From Data Warehouse Automation to Data Architecture Automation

By Rick van der Lans, Founder of R20/Consultancy BV

June 25, 2021

For a long time, the data warehouse architecture was the sole ruler of data delivery to decision-making processes, but not anymore. It now has to share the stage with other data architectures, such as the data lake, data hub, and data lakehouse. Because the data in these new data architectures is structured, organized and used differently, a new breed of generators is required: the data architecture automation tools.

The benefits of using generators are clear. They accelerate development, ease maintenance, create run-time platform independence, improve performance, and so on.

Not every task is suitable for automation, and suitable generators cannot be developed for every task. The tasks best suited to automation are repetitive by nature and can be expressed as formal algorithms that indicate which steps to perform, what to do in special cases, and how to react if something goes wrong. In other words, these tasks can be formalized.

Many of the tasks involved in designing, developing and maintaining data warehouse architectures are repetitive and can be formalized, making them highly suited for automation. For example, when an enterprise data warehouse uses a data vault design technique and the physical data marts use star schemas, both can be generated from a central data model, together with the ETL code that copies the data from the warehouse to the data marts.
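The idea of generating both designs from one central model can be sketched in a few lines of Python. This is a minimal illustration, not how any particular automation tool works: the `Entity` class, the `hub_`/`sat_`/`dim_` naming, the column types, and the simplified "latest satellite row" ETL rule are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """One business entity in a hypothetical central data model."""
    name: str
    business_key: str
    attributes: tuple[str, ...]


def hub_ddl(e: Entity) -> str:
    """Data vault hub: one row per business key."""
    return (
        f"CREATE TABLE hub_{e.name} (\n"
        f"  {e.name}_hk CHAR(32) PRIMARY KEY,\n"
        f"  {e.business_key} VARCHAR(100) NOT NULL,\n"
        "  load_date TIMESTAMP NOT NULL,\n"
        "  record_source VARCHAR(50) NOT NULL\n"
        ");"
    )


def satellite_ddl(e: Entity) -> str:
    """Data vault satellite: descriptive attributes, versioned by load date."""
    cols = "".join(f"  {a} VARCHAR(100),\n" for a in e.attributes)
    return (
        f"CREATE TABLE sat_{e.name} (\n"
        f"  {e.name}_hk CHAR(32) NOT NULL,\n"
        "  load_date TIMESTAMP NOT NULL,\n"
        f"{cols}"
        f"  PRIMARY KEY ({e.name}_hk, load_date)\n"
        ");"
    )


def dimension_ddl(e: Entity) -> str:
    """Star schema dimension for the physical data mart."""
    cols = "".join(f",\n  {a} VARCHAR(100)" for a in e.attributes)
    return (
        f"CREATE TABLE dim_{e.name} (\n"
        f"  {e.business_key} VARCHAR(100) PRIMARY KEY"
        f"{cols}\n"
        ");"
    )


def etl_sql(e: Entity) -> str:
    """ETL statement copying the latest satellite row into the dimension."""
    cols = ", ".join(e.attributes)
    sel = ", ".join(f"s.{a}" for a in e.attributes)
    return (
        f"INSERT INTO dim_{e.name} ({e.business_key}, {cols})\n"
        f"SELECT h.{e.business_key}, {sel}\n"
        f"FROM hub_{e.name} h\n"
        f"JOIN sat_{e.name} s ON s.{e.name}_hk = h.{e.name}_hk\n"
        f"WHERE s.load_date = (SELECT MAX(s2.load_date) FROM sat_{e.name} s2\n"
        f"                     WHERE s2.{e.name}_hk = h.{e.name}_hk);"
    )


def generate(e: Entity) -> list[str]:
    """All artifacts for one entity, derived from a single specification."""
    return [hub_ddl(e), satellite_ddl(e), dimension_ddl(e), etl_sql(e)]


if __name__ == "__main__":
    customer = Entity("customer", "customer_no", ("name", "city"))
    print("\n\n".join(generate(customer)))
```

Adding an attribute to `Entity` changes the satellite, the dimension and the ETL statement together, which is the consistency benefit the paragraph above describes.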

Data architecture automation tools can be regarded as the third generation of generators used to automate the development of data architectures that support decision-making processes. The first generation is formed by tools such as ETL, BI and data modeling tools. For example, ETL tools transform high-level specifications into lower-level code that does the actual ETL work. Many BI tools can be considered generators because they generate SQL statements that extract data from databases. Some data science tools enable data scientists to work at a high conceptual level from which code is generated.

All these generators help to accelerate development and ease maintenance, but each is limited to generating just one component of an entire data architecture. Therefore, multiple independent generators are required to generate the complete architecture. Since these generators require similar specifications, the same specifications are defined multiple times; in other words, they are duplicated. It is a challenge to keep all these scattered specifications consistent, to ensure that they work together optimally, and to guarantee that if one specification is changed, all the duplicate specifications are changed accordingly.

The principles that apply to generators of individual platform components can also be applied to generators of entire data architectures. That is why the first generation was succeeded by the second generation of generators: the data warehouse automation tools, which generate entire data warehouse architectures. They do not generate code for one component of the architecture, but for several. Traditional data warehouse automation tools generate, for example, staging areas, enterprise data warehouses, physical data marts, the ETL solutions that copy data from one database to another, and metadata. Several of these tools have been on the market for years and have proven their worth. They store each metadata specification once and reuse it when generating, for example, the data warehouse tables, the data mart tables, and the ETL logic to copy the data.

The main restriction of several data warehouse automation tools is that they only generate traditional data warehouse architectures, which can only support a restricted set of data consumption forms.

Today, organizations also want to deploy data hubs, data lakes, and data lakehouses. These are used to support new forms of data consumption. For example, in these new data architectures, data is copied to a data hub and from there to a data warehouse, or the data architecture consists of a data lake that stores data from a data warehouse, transactional databases and external data sources.
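One way to picture what such an architecture means for a generator is to treat the architecture itself as input. The sketch below describes a set of data stores and their sources as a plain dictionary and derives the copy jobs and a valid load order from it. The store names and the choice to combine the two example architectures into a single topology are assumptions for illustration only.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical architecture: each data store maps to the stores it is loaded from.
ARCHITECTURE = {
    "data_hub": {"transactional_db"},
    "data_warehouse": {"data_hub"},
    "data_lake": {"data_warehouse", "transactional_db", "external_source"},
}


def load_order(arch: dict[str, set[str]]) -> list[str]:
    """Order the stores so that every store comes after all of its sources."""
    return list(TopologicalSorter(arch).static_order())


def copy_jobs(arch: dict[str, set[str]]) -> list[tuple[str, str]]:
    """One (source, target) copy job per edge, listed in a valid load order."""
    return [
        (source, target)
        for target in load_order(arch)
        for source in sorted(arch.get(target, ()))
    ]


if __name__ == "__main__":
    for source, target in copy_jobs(ARCHITECTURE):
        print(f"copy {source} -> {target}")
```

Changing the dictionary, for example by removing the data hub or adding a lakehouse, changes the generated jobs without touching the generator code.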

Supporting other data architectures requires generators that can be adapted to generate data architectures composed of other types of data stores than those supported by more traditional data warehouse architectures. The term data warehouse automation is probably a misnomer for these tools because it is too restrictive. Data architecture automation tool is more suitable. As organizations increasingly need to become more data-driven, or in other words, to use data more widely, effectively and efficiently, the need for adaptable data architecture automation tools that can generate any kind of data architecture to support any form of data consumption has increased accordingly.
