Operating a Data Warehouse
Having designed and built your data warehouse, I imagine that you’d like to deliver it successfully to the business and run it smoothly on a daily basis. That’s the topic of today’s article.
As digitalization continues apace across all industries, the role and value of a data warehouse—together with its attendant data marts and associated data lake—becomes ever more central to business success. With such informational systems now becoming as important as traditional operational systems, and often more so, it should be self-evident that highly reliable and efficient operating practices must be adopted.
Historically, however, approaches to operating a data warehouse environment have been somewhat lax. Manual and semi-automated methods for populating and updating the contents of the warehouse have been widespread. Advances made in the data warehouse itself have been offset by the spread of ad hoc approaches in departmentally managed data marts. The data lake has seen the re-emergence of a cottage industry of hand-crafted scripts, often operated by individual data scientists.
The challenges of data lake operations and management have recently attracted widespread comment (centered on the phrase "data swamp") and increasing focus from vendors. Nonetheless, it is the data warehouse—as the repository of truth about the legally binding history of the business and the basis for reliable exploration and analysis of business challenges and opportunities—where most can be gained by adopting advanced management and automation practices. The combination of data warehouse automation (DWA) and Data Vault addresses these needs from two perspectives: deployment of function and ongoing day-to-day operations.
Deployment seldom gets the attention it deserves; it’s not exactly the sexiest part of any IT project! However, for a data warehouse, deployment needs to be treated as a long-term, monogamous relationship. A data warehouse is substantially more complex than most IT projects, given the variety and number of systems involved. It is also significantly more important to get right as we move toward data-driven business and more agile development approaches.
As a data warehouse moves from the development phase (design and build, discussed previously) to test, quality assurance, and on to production, seemingly mundane—yet highly important—issues such as packaging and installation of the code built in the previous phase must be addressed. In the case of DWA, where cleansing, transformation, and loading all occur in the target databases of the data warehouse, mart, and lake, this code consists of both the definitional SQL (DDL) that builds database structures such as tables and indexes, and the processing code (DML) that creates the data to populate them.
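To make the DDL/DML distinction concrete, here is a minimal sketch; the table and column names are invented for illustration, and code generated by a real DWA tool would be considerably more elaborate:

```sql
-- Definitional SQL (DDL): builds the structure the warehouse needs.
CREATE TABLE stage_customer (
    customer_code  VARCHAR(20)  NOT NULL,
    customer_name  VARCHAR(100),
    dss_load_date  TIMESTAMP    NOT NULL
);

-- Processing code (DML): creates the data that populates the structure,
-- running inside the target database rather than in an external ETL engine.
INSERT INTO stage_customer (customer_code, customer_name, dss_load_date)
SELECT cust_code,
       TRIM(cust_name),          -- a simple in-database cleansing step
       CURRENT_TIMESTAMP
FROM   load_customer;
```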
In the context of a warehouse designed with the Data Vault model and methods, WhereScape® Data Vault Express™ allows a set of objects, such as the related Hub, Satellite, and Link objects of a customer ensemble, to be bundled together, transported, and installed with ease from the development environment to quality assurance and then into production. This bundle includes the selected objects and related metadata—structure (DDL), processing code (ELT procedures, etc.), and jobs. When a bundle is installed into an environment, the product determines what DDL changes are needed if an object already exists (for example, added columns or new indexes) and constructs the appropriate DDL statements. New ELT code, such as stored procedures, is then installed and compiled in the database.
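As a purely illustrative sketch (the names below are hypothetical, not actual WhereScape-generated output), the bundled DDL for a customer ensemble might include a Hub and a Satellite, with redeployment emitting only the delta when an object already exists:

```sql
-- Hub: holds the business key for customer.
CREATE TABLE hub_customer (
    hub_customer_key  CHAR(32)     NOT NULL PRIMARY KEY,  -- hash of business key
    customer_code     VARCHAR(20)  NOT NULL,
    dss_load_date     TIMESTAMP    NOT NULL,
    dss_record_source VARCHAR(50)  NOT NULL
);

-- Satellite: descriptive attributes, historized by load date.
CREATE TABLE sat_customer_detail (
    hub_customer_key  CHAR(32)     NOT NULL,
    dss_load_date     TIMESTAMP    NOT NULL,
    customer_name     VARCHAR(100),
    customer_segment  VARCHAR(30),
    PRIMARY KEY (hub_customer_key, dss_load_date)
);

-- On redeployment to an environment where the Satellite already exists,
-- the tool compares metadata and emits only the difference, e.g.:
ALTER TABLE sat_customer_detail ADD customer_region VARCHAR(30);
```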
The clear aim—subject, of course, to internal policies—is to automate deployment activities in order to speed delivery in agile development approaches and to reduce the chance of human error across the full life cycle.
Having deployed the system to production, the next—and ongoing—task is to schedule, execute, and monitor the continuing process of loading and transforming data into the data warehouse. In this phase, jobs defined by WhereScape consist of a sequence of interdependent tasks. For example, one sequence could be to drop certain indexes on a table, load new data to the table, and rebuild the indexes. The administrator creating the job can specify the tasks and their interdependencies to ensure the objects are processed in the correct order. During execution, the tasks run in parallel (up to a specified threshold, to ensure the system is not overloaded), subject to dependency constraints. This feature is particularly useful in a Data Vault 2.0 environment, where the design supports elevated levels of parallelization of load tasks.
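Expressed as plain SQL, that example sequence might look like the sketch below. The index and table names are invented, dialects differ on details such as DROP INDEX syntax, and in practice the scheduler would issue these as three discrete, dependency-ordered tasks rather than a single script:

```sql
-- Task 1: drop the index so the bulk load avoids index maintenance overhead.
DROP INDEX idx_sat_customer_detail_name;

-- Task 2: load the new data (depends on task 1 completing).
INSERT INTO sat_customer_detail
       (hub_customer_key, dss_load_date, customer_name, customer_segment)
SELECT hub_customer_key, CURRENT_TIMESTAMP, customer_name, customer_segment
FROM   stage_customer_changes;

-- Task 3: rebuild the index once the load completes (depends on task 2).
CREATE INDEX idx_sat_customer_detail_name
    ON sat_customer_detail (customer_name);
```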
To ensure that data consistency is maintained, if a task fails during execution, then all downstream dependent tasks are halted. When the problem has been resolved, the job is restarted and will pick up from where it left off and continue through to completion.
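One simple way to picture this restart behavior (an assumption about the mechanics, not a description of WhereScape's internals) is a task-status table for each job run: on restart, only the tasks that never completed are re-queued.

```sql
-- Hypothetical task-status table for a single job run.
-- status: 'COMPLETED', 'FAILED', or 'HELD' (a halted downstream task).
SELECT task_name, status
FROM   job_task_log
WHERE  job_run_id = 2041          -- the failed run being restarted
AND    status <> 'COMPLETED'
ORDER  BY task_sequence;          -- restart re-queues only these, in order
```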
As mentioned earlier, a modern analytical environment consists of a combination of data warehouse, marts, and data lake. From an operational point of view, given potential interdependencies of data across these systems, it makes sense to manage this ensemble as a single, logical environment. WhereScape supports this aim, both in terms of its job definitions that span multiple systems and in its provision of a centralized monitoring and logging repository. Here, all job execution activities such as processing times (start and finish), rows loaded, errors, exceptions, and so on are logged. This provides historical information that can be used to track load performance over time, allowing administrators to make any necessary adjustments as data volumes grow.
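With such a repository in place, tracking load performance over time reduces to a query. The sketch below uses PostgreSQL-flavored SQL and hypothetical table and column names, but the shape of the question is typical:

```sql
-- Weekly trend of rows loaded and elapsed time for one job,
-- used to spot growth in data volumes before it becomes a problem.
SELECT DATE_TRUNC('week', start_time)                         AS week_starting,
       SUM(rows_loaded)                                       AS total_rows,
       AVG(EXTRACT(EPOCH FROM (finish_time - start_time)))    AS avg_seconds
FROM   job_execution_log
WHERE  job_name = 'daily_customer_load'
GROUP  BY DATE_TRUNC('week', start_time)
ORDER  BY week_starting;
```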
The smooth, ongoing daily operation of the entire data warehouse environment is a fundamental prerequisite to its acceptance by users and its overall value to the business. Nothing will destroy users’ confidence more quickly than arriving every morning and not knowing whether the warehouse is up and running with the latest data updates.
And yet, there’s more! How do you support the changes in requirements that occur almost continuously when a data warehouse has been successful? Data Vault and Data Warehouse Automation can help here too. That is the topic of the fourth and final post of this series.
You can find the other blog posts in this series here:
- Week 1: Designing a Data Warehouse
- Week 2: Building a Data Warehouse
- Week 4: Maintaining a Data Warehouse
Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. Barry is founder and principal of 9sight Consulting. A regular blogger, writer and commentator on information and its use, Barry is based in Cape Town, South Africa and operates worldwide.