On Demand | Building the Foundation for AI: How Cloud Migration Transforms Data Warehousing
Are you facing challenges with outdated infrastructure, slow data processing, or limited scalability? Patrick O’Halloran, Solutions Architect at WhereScape, is here to help. In this insightful webcast, learn how cloud migration can address the most pressing pain points in data warehousing, from eliminating inefficiencies to enabling advanced AI and machine learning capabilities.
Key pain points we’ll address:
Struggling with rigid, outdated data warehousing systems.
Delays and inefficiencies in data processing.
Lack of scalability to support AI and machine learning needs.
Complexities in transitioning to cloud architecture.
Turn your challenges into opportunities—watch now!
Transcript
Hey everybody, my name is Patrick O'Halloran. I'm the Solutions Architect with WhereScape, and today I'll be talking about what the title says: using data warehouse automation to migrate your data warehouse to the cloud. I'm going to talk about WhereScape a little bit, then I'm going to talk about the whole process of how you build a successful project to move from on-prem data warehousing to cloud data warehousing.
So again: a little bit about who WhereScape is; why you'd want to migrate your data warehouse to the cloud (you probably already know the answers to that, because that's why you're watching this video, but I'll go over some of it); what stages are associated with getting to your data warehouse migration; how you plan for it; what you do before the migration; how you define your strategy and execute that strategy; and then finally questions and answers at the end.
WhereScape started as a company that was doing consulting, building data warehouses, and we were building some tools to help automate the building of those data warehouses. Those tools overcame a lot of the issues. But what are some of the issues that people are having with data warehousing?
Lack of trust in the data delivery ecosystem: maybe you've got some old legacy system that's large and cumbersome, and people don't know how the data got where it is.
New compliance and regulation: maybe because of mergers and acquisitions, maybe because of regulatory changes, maybe because of corporate changes, you've got new requirements and you need to make changes to an existing system.
More data from more sources: that's probably the biggest change I've seen in the last 10 years. In the 10 years prior to that, probably the biggest change was the amount of data being loaded. What's really different and challenging today is the number of disparate sources you're pulling data from, whether it's Excel spreadsheets or REST APIs, JSON files, XML files, whatever it is.
Difficult to find enough staff: there are a lot of technologies involved in building a data warehouse, a data foundation, and finding the technical skills can be difficult.
Debugging, testing, and rework routinely take too much time. There's a very, very good book out (the copy I've got is from the '80s) called The Mythical Man-Month, by a man named Frederick Brooks, and he talks about the productivity of system-level programmers, not just in terms of writing code or creating modules, but in terms of having fully production-tested, documented code, and that takes a long time. I think his statistic was that a good programmer was developing 10 lines of code per day: again, fully tested, documented, production-ready code. Data warehouse automation certainly helps with that.
Communication as to what data is available and where it's coming from: documentation is a huge benefit of data warehouse automation with WhereScape. The idea that I can, with a few clicks of a button, present an entire glossary, all the relationships, the star schemas, everything else to the end users is a huge plus.
And then the costs to maintain highly reliable data infrastructure are skyrocketing. Certainly on-prem, those costs have not gone down, while the cost of cloud computing has definitely gone down.
So what is WhereScape? What do we do? We used to talk about data warehouse automation. "Data warehouse" is, I don't want to call it an antiquated term, but now people talk about data lakes, data lakehouses, data marts, data vaults, all sorts of data structures. Really, when I talk about a data warehouse, what I mean is any large data foundation that has curated historical data, probably from multiple sources, that's available for analytics: either data analysts developing charts and graphs and reports, or possibly feeding into an AI engine to train it or even just to get results out of it.
Traditionally there have been a lot of tools involved, a lot of technologies involved, and every time you switch tools, every
time you bring in new people, there's a chance of introducing mistakes and slowing things down. One of the huge advantages of automating the design, development, and deployment of these processes, of that foundation, is that fewer mistakes mean less testing and less rework time. You get a better product at the end, because you can get an immediate response to what you're developing instead of spending several months working on something and then getting feedback.
So, automation. Cost: obviously development costs and times go down, maintenance costs go down, testing costs go down. That's a great advantage of automation.
Risk: I think Gartner in 2022 had a report out that said that building a data warehouse, an analytics data foundation, is one of the riskiest and costliest projects that IT can take on, with one of the biggest failure rates. Automation helps in that I can develop prototypes very, very quickly, get things out in front of the user, and mitigate things, and not have to wait 6 months, 9 months, 12 months down the road to find out that what I'm building just doesn't work.
Complexity, certainly: both in terms of the architecture of the data foundation and the number of sources I'm bringing in.
Urgency: we certainly saw with the pandemic that company priorities, organizational priorities, can change very, very quickly. Being able to respond quickly is certainly an advantage of automation.
And then agility: the idea that I can create rapid prototypes, put them in front of data analysts or whoever the end users are, the business people who need the information, and get their immediate feedback. That's certainly part of the agile environment.
Let me get my slides right. Global reach: WhereScape has, like I said, been around about 20 years. We've got more than 1,200 customers; that number probably needs to be bumped up. Really what we do is increase productivity, not just in designing and creating a system, but in building a better system, which really reduces rework, reduces testing, and things like that. The example I use is that a car is not just an automated horse; it's not just a fast horse. It really changes what you can do and how you do it. Data warehouse automation is not just a way of producing a data warehouse faster; it really changes how you work and what you produce. And as I said, we've been around about 20 years. We're in pretty much every single industry you could think of, around
the world.
Data automation is kind of an encompassing term. It could mean anything from, I don't know, using templates to having something that takes over completely and generates your entire data warehouse. The key aspects of WhereScape are these.
It's metadata driven: as you're doing work, you're describing what you want to do, not how you want to do it, and then WhereScape fills in the details. How do I get data out of a REST API and up into Snowflake? That's pretty much all I need to tell WhereScape: what's the URL for the REST API, and what's my target, where am I putting the data? I'm putting it in Snowflake. And it'll create the Python scripts, the HTTP calls, the file parsing, the DDL, the DML, whatever it needs to do to get that action done.
Best practices and industry standards: the nice thing about data warehouse automation and generation is that, as I work, I'm not introducing my own personal idiosyncratic style. Whether I'm doing the work or one of my co-workers is doing the work, it's going to be done the same way, getting the same results.
Documentation and lineage is a huge advantage: the idea that with a couple of clicks of a button I can see track-back diagrams. So maybe I'm looking at a dimension or a fact and someone asks, "Where did this number come from?" I can trace that all the way back through all the transformations to the source system. Or if a source system has changed, I can very quickly get a report that shows me everywhere this source system entity or attribute shows up in my data warehouse, and then I can decide what to do with it.
Easier to evolve, expand, and port: I've been working at an abstract level, and you'll see that most of the work you do in WhereScape is independent of whatever the target is. So whether it's Snowflake or Databricks or Fabric or Azure Synapse, whatever it is, the work I do is generally independent of the target, meaning that if we do decide to port later, there's very little rework
that needs to be done. That's the database-agnostic example.
All right, so let me talk about the actual migration. You've got an on-prem solution you've been using, one you've built up maybe over a couple of years. SQL Server seems to be a popular platform, but it could be Teradata, could be Oracle, could even be Excel spreadsheets; I've seen that as well. So someone has decided, probably higher up, that we need to migrate to the cloud. We need to do at least a POC, a proof of concept, a trial, just to see: is this feasible? Will it work? What problems does it fix? What new complexities does it introduce?
The general stages are:
Assessment: what do you have right now? Where are you at right now?
Planning: what are we going to migrate, how are we going to migrate it, who's going to be doing it, and over what schedule?
Migration execution: now that we've decided what to do, let's actually do it.
Optimization and testing: we've done it; let's make sure we did it right.
Go-live and maintenance: one of the basic questions is the cutover. How do you decide when you're going to decommission your old analytics system and bring in your new analytics system?
All right, planning.
Defining objectives and outcomes: what are you doing? What do you have right now? What systems do you have? What domains are you monitoring: is it HR, production, enrollment? Is it about students? Is it about inventory control? What do you have right now, and what is your desired outcome? Are you going to move everything as it exists? Are you going to try to upgrade your data foundation, your infrastructure, maybe build a greenfield, brand-new, from-scratch data foundation layer in the cloud instead of migrating, lift and shift, as it is? Those are some decisions you need to figure out.
Scope and timeline: what are you going to migrate? Is it going to be everything all at once? Is it going to be particular domains, one at a time? Maybe it's going to be new systems as
they come up: you'll leave your existing systems on your old data warehouse, and then as you merge with new organizations or bring in new applications, maybe that's what gets migrated to the cloud. Those are the kinds of decisions you have to make.
Budgeting and resource allocation: you have to really get a good idea of what the priority of this is. How does this migration plan rank in priority versus other projects that are going on? I've seen too many cases where everyone says, "Yeah, yeah, yeah, it's our number one priority. We're absolutely going to make sure this is a commitment. We're going to get it done this year." Then production issues happen, other issues happen, and resources get pulled away, budget doesn't quite get allocated, and you've got half the staff you were expecting. Not only do you have to make the budget and resource allocation, you have to make the commitment that you're going to follow through on it.
And then building the right team: if you're migrating to the cloud and you have no cloud experience, whether it's Snowflake or Databricks or Synapse, or you're adopting a new architecture, a new methodology for your data foundation, Data Vault, whatever it is, make sure you have people with experience. There are plenty of opportunities to make mistakes in building out your data foundation, especially with new technology. It really, really helps to have people who are experienced, who understand the options you can take and which ones are correct. A lot of times the options that are correct just depend on your particular organization and the steps you should take. So having people with experience is obviously an absolute plus. Whether those are brought in as consultants, as full-time employees, or as some sort of service partner, those are decisions you're going to have to make, but you definitely need the right team. You need someone who has commitment to this as a sole project manager or program manager,
who's going to be responsible for this and nothing else. Then you need the technical people who understand the current systems, as well as the technical people who understand where you're migrating to. And you really have to involve the business as well. One of the big failures of many IT projects, of data infrastructure projects, is looking at these as IT projects. They're obviously IT heavy, there are a lot of IT skills involved, but ultimately they're business projects. I'm not creating a data warehouse to make IT happy. I'm creating a data warehouse, migrating to the cloud, to give a better product to the analysts, the scientists, the AI engines, whoever is going to be consuming that data. You need to make sure that they're part of the team as
well.
So what do you do before migration?
Data inventory: what do you have? Maybe it's one system, maybe it's multiple systems. Maybe you've got analysts who have what we would call a dark warehouse: they've built their own processes, they've got their own Excel spreadsheets, their own tricky little formulas, maybe they've actually got some Postgres or SQL Server up and running which they're loading data into and then manipulating in some way. To find those things out, you have to go out and talk with people. So talk with the analysts. Find out what they're using, where the data is coming from, how accurate it is, how useful it is, the shortcomings of the current system, and any data duplication that may be out there.
And then, if you are doing something other than a lift and shift (and I definitely recommend not doing just a lift and shift), what kind of data cleanup and standardization are you going to do? What mistakes have you made in the past? Maybe you've designed your data foundation, your data warehouse, around a source system. Designing your architecture around a specific source system tends to paint you into a corner, especially now that more and more source systems are becoming available. So how are you going to clean up not just the data but the architecture? What kind of changes are you going to make? If you do have troublesome, unreliable data, what are you going to do to get that cleaned up? Is that going to be part of your project?
And then standardization: if you've got data coming in from multiple systems, or if you are adding new systems, what kind of naming standards, data standards, and primary and foreign key relationships will you use? Understanding the relationships and hierarchies of your data model, and standardizing all of that independent of source systems, is going to be very, very important.
And then, obviously, data backup. If I'm going to be migrating data from my data warehouse, and that's a question you have to answer
as well. Am I going to migrate existing historic data? If I do, how am I going to do that? Am I going to clean it up? Am I going to standardize it to whatever my new formats are? Or is this simply a clean cutover, where I'm only going to be ingesting new data? That's a question as well. But obviously, backups. I've been in IT for a long time, and as often as people talk about backups, it's not just making backups, it's testing backups to make sure that they're working. I've got too many horror stories where people talked about doing backups of systems, and then when crunch time came and they needed those backups, they found out that they weren't quite what they thought they were. So the importance of backups is not just making them but testing them.
So what is your data warehouse migration strategy? What are you planning on doing?
Lift and shift: again, I would caution against this. Sometimes it's a perfectly reasonable answer, I would say in a minority of cases: taking an existing historic system that's been built over a long period of time and just deciding we're going to move it directly to the cloud. In some cases, say it's an on-prem SQL Server and you're moving it to Azure Synapse, maybe it is the right answer, because you don't have to make a lot of changes. But if you're moving from some other system, say on-prem Oracle or Teradata, to Snowflake, lift and shift is probably not the right answer. You're probably moving to the cloud because you've got some deficiencies in your current system that you can't overcome: questions that can't be answered by the analysts, new data that can't be incorporated into your model, whatever it is. Lift and shift won't fix those issues.
Replatforming: maybe you're just replatforming. As I said, maybe you're moving from on-prem SQL Server to Azure SQL or Azure Synapse; that's a fairly easy one. But that's definitely going to complicate things: it'll answer some problems and introduce other problems. Where are you going in
the cloud? Are you going to something very similar or something brand new?
Re-architecting: it's a lot of work. It introduces a lot of complexity, a lot of testing, a lot of things you have to do. But as I said, many times that's the right answer rather than lift and shift, because with lift and shift you're going to pretty much bring along any baggage, any historical issues you had in the past. Re-architecting may be the right answer. This is your chance to get over that technical debt, to get past the system that's evolved over the last 10 years and has a bunch of Band-Aids and baling wire holding it all together, all the clichés. This is your chance, with re-architecting, to really design a new model. The book I mentioned, The Mythical Man-Month, has a whole section called the second-system effect: the second time someone builds a complex system, their tendency is to make it overly complicated and do all the things they couldn't do the first time. So if you are re-architecting, my caution would be: don't make it overly complex. Keep things simple. Don't introduce absolutely everything you never had in the first system. Get a solid baseline of a design that's going to work, and then, as time allows, as budget allows, introduce specific things you think would be helpful. But don't try to add absolutely everything you're missing from the original system into your re-architected system. That is going to really increase your chance of failure.
Choosing the right migration path: obviously prototyping is good, a proof of concept, something like that, just to make sure it's feasible. But you also need to understand how you're going to get data from your old system to your new system. Are you going to take historic data? Especially if you're re-architecting from some design, some methodology that's maybe on an on-prem system: if you
re-architect, are you going to move that historic data? And if you're going to move it, how are you going to transform it? Maybe it needs to be completely reorganized in order to get into this new architecture. Choosing that path is critical.
And then, obviously, using automation with WhereScape. A lot of building this new cloud-based architecture is going to be choosing the right tools. Certainly automation is going to help; certainly automation with WhereScape is going to help. I heard one of my compatriots explain it this way: can you build a swimming pool with a shovel? Well, technically, yes, you can, but there are certainly better tools, faster tools, and better ways of doing it. It's just like building a data foundation. If you're re-architecting, or even if you're just doing lift and shift, use automation to fill in a lot of the technical details and take big chunks of effort out of the project: the repetitive tasks, the manual tasks. Again, it's not just producing things faster but producing things better, reducing rework time, reducing testing time, things like that. And if you do get close to your goal and realize you've made mistakes and need to take a step back, the advantage of automation is that I can change templates, I can change rules, I can change the information that goes into how my system is built, and then regenerate it and begin my testing again.
So thank you for your time. I hope this was helpful. As I said, there are a lot of reasons why you would want to move to the cloud: certainly cost, certainly reliability. As recently as, oh, say, 10 or 15 years ago, I was still working with organizations that had machine rooms. I was tasked with setting up a backup Oracle server. They had a two-server network and they wanted an N+1; they wanted a third Oracle server added as a backup. Spent about $200,000 buying hardware, spent about $200,000 on software. For some reason (I'm a software guy, not a hardware guy) I was tasked
with finding space in the machine room, racking and stacking it, getting power, getting all the cables, getting everything else done, and it took several months.
I remember the first time I ever saw a virtual machine in a business. My first reaction was: why do you want to simulate a computer with a computer? You already have a computer; that doesn't make any sense. But what this company was doing was spinning up web servers. They had a web-based application that would have peak times, and in a matter of 60 seconds they would spin up additional web servers from a golden image. That's when it clicked for me that this whole process of having a virtual computer is much, much faster, much more reliable, and much more standardized than having physical machines. I can tell you too many horror stories about the weeks or months it takes to get purchase orders done for some machines and get them set up and installed.
So you're obviously considering that; you're obviously looking at moving to the cloud. There are a lot of advantages there, and there are a lot of ways you can get to the cloud. The big thing is understanding your options. Primarily, I would say, your options are: What cloud platform am I going to select? What am I going to do with my existing systems: am I replicating them, replacing them, or both? What kind of priority does this project have? Who's going to be involved in it? What kind of testing am I going to do? How am I going to cut this over? Am I going to keep historical data? And then, when it does come time to decommission your existing system and move to your new cloud-based environment, how's that going to be done? Is it going to be flipping a switch overnight? Is it going to be separate groups that move one by one? All those are decisions you need to make, and hopefully this has helped you think through some of those situations.
So that's it. I appreciate it. Feel free to reach out to us at wherescape.com. You can email us at moreinfo@wherescape.com, or if you go to wherescape.com you can click and contact us. I'll be glad to talk with you and answer any questions you have. But thank you very much.
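The metadata-driven idea from the talk (describe what to load and where it should land, then let tooling generate the target-specific DDL) can be illustrated with a small sketch. This is not WhereScape's implementation or API. `LoadSpec`, `Column`, `TYPE_MAPS`, and `render_create_table` are hypothetical names invented for this example, the type mappings are a deliberately tiny assumption, and the URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical logical-to-physical type maps for two targets.
# Illustrative only: real platforms support many more types and options.
TYPE_MAPS = {
    "snowflake": {
        "integer": "NUMBER(38,0)",
        "decimal": "NUMBER(18,2)",
        "string": "VARCHAR",
        "timestamp": "TIMESTAMP_NTZ",
    },
    "sqlserver": {
        "integer": "BIGINT",
        "decimal": "DECIMAL(18,2)",
        "string": "NVARCHAR(255)",
        "timestamp": "DATETIME2",
    },
}


@dataclass(frozen=True)
class Column:
    name: str
    logical_type: str  # must be a key of the per-target type map
    nullable: bool = True


@dataclass(frozen=True)
class LoadSpec:
    """Metadata describing WHAT to load; it says nothing about HOW."""
    source_url: str          # where the data comes from (placeholder here)
    target_table: str        # where the data should land
    columns: Tuple[Column, ...]


def render_create_table(spec: LoadSpec, target: str) -> str:
    """Generate CREATE TABLE DDL for one target from the same metadata."""
    type_map = TYPE_MAPS[target]  # raises KeyError for an unknown target
    lines = []
    for col in spec.columns:
        null_sql = "" if col.nullable else " NOT NULL"
        lines.append(f"    {col.name} {type_map[col.logical_type]}{null_sql}")
    body = ",\n".join(lines)
    return f"CREATE TABLE {spec.target_table} (\n{body}\n);"


if __name__ == "__main__":
    orders = LoadSpec(
        source_url="https://example.invalid/api/orders",  # placeholder URL
        target_table="load_orders",
        columns=(
            Column("order_id", "integer", nullable=False),
            Column("amount", "decimal"),
            Column("ordered_at", "timestamp"),
        ),
    )
    # The same description is rendered for two different targets.
    for target in ("snowflake", "sqlserver"):
        print(f"-- {target}")
        print(render_create_table(orders, target))
```

Because the description of the load never mentions a platform, retargeting means regenerating from the same metadata with a different type map, which is the sense in which the talk calls the work "independent of the target".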