Build an Efficient Data Warehouse with WhereScape RED 10 – Virtual Hands-On Lab

Tune in for a free, live virtual hands-on lab...

Operating a Data Warehouse

| July 24, 2017

Having designed and built your data warehouse, I imagine that you’d like to deliver it successfully to the business and run it smoothly on a daily basis. That’s the topic of today’s article.

As digitalization continues apace across all industries, the role and value of a data warehouse—together with its attendant data marts and associated data lake—becomes ever more central to business success. With such informational systems now becoming as important as traditional operational systems, and often more so, it should be self-evident that highly reliable and efficient operating practices must be adopted.

Historically, however, approaches to operating a data warehouse environment have been somewhat lax. Manual and semi-automated methods for populating and updating the contents of the warehouse have been widespread. Advances made in the data warehouse itself have been offset by the spread of ad hoc approaches in departmentally managed data marts. The data lake has seen the re-emergence of a cottage industry of hand-crafted scripts, often operated by individual data scientists.

The challenges of data lake operations and management have recently attracted widespread comment (centered on the phrase data swamp) and increasing focus by vendors. Nonetheless, it is the data warehouse—as the repository of truth about the legally-binding history of the business and the basis for reliable exploration and analysis of business challenges and opportunities—where most can be gained by the adoption of advanced management and automation practices. The combination of data warehouse automation (DWA) and Data Vault address these needs from two perspectives: deployment of function and ongoing day-to-day operations.

Deployment seldom gets the attention it deserves; it’s not exactly the sexiest part of any IT project! However, for a data warehouse, deployment needs to be treated as a long-term, monogamous relationship. A data warehouse is substantially more complex than most IT projects, given the variety and number of systems involved. It is also significantly more important to get right as we move toward data-driven business and more agile development approaches.

As a data warehouse moves from the development phase (design and build discussed previously) to test, quality assurance, and on to production, seemingly mundane—yet highly important—issues such as packaging and installation of the code built in the previous phase must be addressed. In the case of DWA, where all cleansing, transformation, (and loading, of course) occur in the target databases of the data warehouse, mart and lake, this code consists of both definitional SQL (DDL) that builds the database structures such as tables, indexes, etc. and the processing code (DML) that creates the data to populate them.

In the context of a warehouse designed with the Data Vault model and methods, WhereScape® Data Vault Express™ allows a set of objects, such as the related Hub, Satellite, and Link objects of a customer ensemble, to be bundled together, transported and installed with ease from the development environment to quality assurance and then subsequently into production. This bundle includes the selected object and related metadata—structure (DDL), processing code (ELT procedures, etc.), and jobs. When installed into an environment, the product determines what DDL changes need to be made if an object already exists (for example, add columns, new indexes, and so on) and constructs the appropriate DDL syntax statements. New ELT code such as stored-procedures are then installed and compiled in the database.

The clear aim—subject, of course, to internal policies—is to automate the deployment activities in order to speed deployment in agile development approaches and to reduce the chance of human error across the full life cycle.

Having deployed the system to production, the next—and ongoing—task is to schedule, execute, and monitor the continuing process of loading and transforming data into the data warehouse. In this phase, jobs defined by WhereScape consist of a sequence of interdependent tasks. For example, one sequence could be to drop certain indexes on a table, load new data to the table, and rebuild the indexes. The administrator creating the job can specify the tasks and their interdependencies to ensure the objects are processed in the correct order. During execution, the tasks run in parallel (up to a specified threshold, to ensure the system is not overloaded), subject to dependency constraints. This feature is particularly useful in a Data Vault 2.0 environment, where the design supports elevated levels of parallelization of load tasks.

To ensure that data consistency is maintained, if a task fails during execution, then all downstream dependent tasks are halted. When the problem has been resolved, the job is restarted and will pick up from where it left off and continue through to completion.

As mentioned earlier, a modern analytical environment consists of a combination of data warehouse, marts, and data lake. From an operational point of view, given potential interdependencies of data across these systems, it makes sense to manage this ensemble as a single, logical environment. WhereScape supports this aim, both in terms of its job definitions that span multiple systems and in its provision of a centralized monitoring and logging repository. Here, all job execution activities such as processing times (start and finish), rows loaded, errors, exceptions, and so on are logged. This provides historical information that can be used to track load performance over time, allowing administrators to make any necessary adjustments as data volumes grow.

The smooth, ongoing daily operation of the entire data warehouse environment is a fundamental prerequisite to its acceptance by users and its overall value to the business. Nothing will destroy users’ confidence more quickly than arriving every morning and not knowing whether the warehouse is up and running with the latest data updates.

And yet, there’s more! How do you support the changes in requirements that occur almost continuously when a data warehouse has been successful? Data Vault and Data Warehouse Automation can help here too. That is the topic of the fourth and final post of this series.

You can find the other blog posts in this series here:

Dr. Barry Devlin is among the foremost authorities on business insight and one of the founders of data warehousing, having published the first architectural paper on the topic in 1988. Barry is founder and principal of 9sight Consulting. A regular blogger, writer and commentator on information and its use, Barry is based in Cape Town, South Africa and operates worldwide.

ETL vs ELT: What are the Differences?

Apr 17, 2025

In working with hundreds of data teams through WhereScape’s automation platform, we’ve seen this debate evolve as businesses modernize their infrastructure. Each method, ETL vs ELT, offers a unique pathway for transferring raw data into a warehouse, where it can be...

Dimensional Modeling for Machine Learning

Apr 16, 2025

Kimball’s dimensional modeling continues to play a critical role in machine learning and data science outcomes, as outlined in the Kimball Group’s 10 Essential Rules of Dimensional Modeling, a framework still widely applied in modern data workflows. In a recent...

Automating Data Vault in Databricks | WhereScape Recap

Apr 11, 2025

Automating Data Vault in Databricks can reduce time-to-value by up to 70%—and that’s why we hosted a recent WhereScape webinar to show exactly how. At WhereScape, modern data teams shouldn't have to choose between agility and governance. That's why we hosted a live...

WhereScape Recap: Highlights From Big Data & AI World London 2025

Mar 28, 2025

Big Data & AI World London 2025 brought together thousands of data and AI professionals at ExCeL London—and WhereScape was right in the middle of the action. With automation taking center stage across the industry, it was no surprise that our booth and sessions...

Why WhereScape is the Leading Solution for Healthcare Data Automation

Mar 20, 2025

Optimizing Healthcare Data Management with Automation Healthcare organizations manage vast amounts of medical data across EHR systems, billing platforms, clinical research, and operational analytics. However, healthcare data integration remains a challenge due to...

WhereScape Q&A: Your Top Questions Answered on Data Vault and Databricks

Mar 17, 2025

During our latest WhereScape webinar, attendees had fantastic questions about Data Vault 2.0, Databricks, and metadata automation. We’ve compiled the best questions and answers to help you understand how WhereScape streamlines data modeling, automation, and...

What is Data Fabric? A Smarter Way for Data Management

Feb 28, 2025

As of 2023, the global data fabric market was valued at $2.29 billion and is projected to grow to $12.91 billion by 2032, reflecting the critical role and rapid adoption of data fabric solutions in modern data management. The integration of data fabric solutions...

Want Better AI Data Management? Data Automation is the Answer

Feb 14, 2025

Understanding the AI Landscape Imagine losing 6% of your annual revenue—simply due to poor data quality. A recent survey found that underperforming AI models, built using low-quality or inaccurate data, cost companies an average of $406 million annually. Artificial...

RED 10: The ‘Git Friendly’ Revolution for CI/CD in Data Warehousing

Feb 14, 2025

For years, WhereScape RED has been the engine that powers rapidly built and high performance data warehouses. And while RED 10 has quietly empowered organizations since its launch in 2023, our latest 10.4 release is a game changer. We have dubbed this landmark update...

The Assembly Line for Your Data: How Automation Transforms Data Projects

Feb 10, 2025

Imagine an old-fashioned assembly line. Workers pass components down the line, each adding their own piece. It’s repetitive, prone to errors, and can grind to a halt if one person falls behind. Now, picture the modern version—robots assembling products with speed,...