Gartner’s Report on Data Hubs, Lakes & Warehouses Highlights Knowledge Gap for Data Managers
Gartner’s paper, Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together, serves as much as a cautionary piece as an informative one. Based on inquiries made to the analyst firm over the past few years, it is apparent that a real gap in knowledge exists when it comes to what these three data structures do and how they should be employed.
“For example, while Gartner client inquiries referring to data hubs increased by 20% from 2018 through 2019, more than 25% of these inquiries were actually about data lake concepts.” (Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together by Ted Friedman, Nick Heudecker.)
At the time of the report’s publication in 2020, the percentage of companies using data hubs, lakes and warehouses looked like this:
[Chart: percentage of companies using data hubs, data lakes and data warehouses]
With many companies using all three of these structures already, it’s no exaggeration to say that how well companies understand and harness the potential of data lakes, warehouses and hubs will shape their success. This explains the urgency underlying this piece: at present, huge investments are being made by many people who do not fully understand what each of these three entities does alone or how they can be combined most effectively.
How Data Warehouses, Lakes and Hubs Work
Data Warehouses should be used for the analysis of structured data, Data Lakes for the analysis of unstructured or semi-structured data, and Data Hubs for communicating the resultant BI to the people who need to act on it. However, many mistakenly think that these three entities do the same thing in different ways, and so are interchangeable. It’s important that business leaders not only understand the distinction for themselves but also communicate it throughout the company to democratize the use of data.
Data Lakes and the exploratory technologies that unstructured big data enables are only as useful as your company’s ability to assimilate their findings into a structured environment. This is where the Data Warehouse takes over: a Data Lake can be added as a source to a Data Warehouse, and its data blended with other real-time and batch sources to provide rich, contextualized business insight.
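To make the idea of a Data Lake feeding a Data Warehouse more concrete, here is a minimal Python sketch. It is an illustration only, not a description of how any particular product works. An in-memory SQLite database stands in for the warehouse, and the sample events, table names and the flatten helper are all hypothetical. Semi-structured JSON events are flattened into fixed columns and then joined with a structured customer table.

```python
import json
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical semi-structured events as they might sit in a data lake.
LAKE_EVENTS = [
    '{"customer_id": 1, "event": "page_view", "meta": {"page": "/pricing"}}',
    '{"customer_id": 2, "event": "page_view"}',
    '{"customer_id": 1, "event": "signup", "meta": {"plan": "pro"}}',
]


def flatten(raw: str) -> Tuple[int, str, Optional[str]]:
    """Turn one JSON event into a fixed-shape row; a missing page becomes None."""
    record = json.loads(raw)
    meta = record.get("meta", {})
    return record["customer_id"], record["event"], meta.get("page")


def load_warehouse(events: List[str]) -> sqlite3.Connection:
    """Build a toy warehouse: one structured table plus the flattened lake data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE web_events (customer_id INTEGER, event TEXT, page TEXT)")
    # Assumed structured source (for example, a CRM extract).
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    # The lake acts as just another source once its records are flattened.
    conn.executemany("INSERT INTO web_events VALUES (?, ?, ?)", [flatten(e) for e in events])
    return conn


def events_per_customer(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Blend both sources: count lake events against each named customer."""
    return conn.execute(
        "SELECT c.name, COUNT(*) FROM customers c "
        "JOIN web_events w ON w.customer_id = c.customer_id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()


if __name__ == "__main__":
    print(events_per_customer(load_warehouse(LAKE_EVENTS)))
```

The point of the sketch is the order of operations: the raw events stay loosely structured until they are mapped onto a schema, and only then can they be combined with other warehouse data in a single query.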
Of the three structures, it is ironic that the one managers need to know best is the least understood. The Data Hub is where BI is not only shared but also made available for governance by those responsible for it. As its name suggests, a hub also “enables data flow between diverse endpoints.”
One of the main recommendations of Gartner’s report is to: “Maximize your ability to support a broader range of diverse use cases by identifying the ways that these structures can be used in combination. For example, data can be delivered to analytic structures (Data Warehouses and Data Lakes) using a Data Hub as a point of mediation and governance.” (Data Hubs, Data Lakes and Data Warehouses: How They Are Different and Why They Are Better Together by Ted Friedman, Nick Heudecker.)
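As a rough illustration of a hub acting as a point of mediation and governance, the Python sketch below routes incoming records to a warehouse or a lake. Everything in it is an assumption made for the example: the DataHub class, the REQUIRED_FIELDS rule and the "schema" key are hypothetical and do not come from Gartner’s report or from any specific product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical governance rule: every record must say where it came from
# and who is responsible for it.
REQUIRED_FIELDS = {"source", "owner"}


@dataclass
class DataHub:
    """Toy hub that checks records and forwards them to an analytic store."""

    warehouse: List[Dict[str, Any]] = field(default_factory=list)
    lake: List[Dict[str, Any]] = field(default_factory=list)
    rejected: List[Dict[str, Any]] = field(default_factory=list)

    def publish(self, record: Dict[str, Any]) -> str:
        # Governance: refuse records that lack the required metadata.
        if not REQUIRED_FIELDS.issubset(record):
            self.rejected.append(record)
            return "rejected"
        # Mediation: records that declare a schema go to the warehouse,
        # everything else is kept in the lake for exploratory use.
        if record.get("schema"):
            self.warehouse.append(record)
            return "warehouse"
        self.lake.append(record)
        return "lake"


if __name__ == "__main__":
    hub = DataHub()
    print(hub.publish({"source": "crm", "owner": "sales", "schema": "orders_v1"}))
    print(hub.publish({"source": "web", "owner": "marketing", "payload": "raw log line"}))
    print(hub.publish({"payload": "no metadata"}))
```

In this sketch the hub stores nothing of its own for analysis. It only decides whether a record is acceptable and where it belongs, which is the combination of roles the recommendation describes.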
Dealing with Disruption
The report also highlights the need to be agile in how your company ingests new data from various sources in different formats. Those that can are able to adapt to disruption and monetize it before their competitors. This supports using a Data Warehouse and a Data Lake together as part of a logical Data Warehouse, along with an end-to-end automated infrastructure to manage and change it quickly as needed.
While the exponential growth of data makes more insight available, it also means the infrastructure that stores and analyses it becomes necessarily more complex. This infrastructure needs to adapt as new demands emerge (constantly) and as data sources evolve (periodically). It’s a fallacy to think we can create the ultimate data infrastructure that won’t need to be changed.
The Dangers of Ambiguity
Perhaps by reading this piece, data leaders can iron out any ambiguity and potentially make their companies more successful. Misunderstanding also has internal implications: expectations and reality can diverge sharply if those leading the data department define certain infrastructure differently from those building and using it day-to-day.
The report is vital reading for data leaders who have even a hint of doubt about the purpose and role of the Data Warehouse, Data Lake or Data Hub. It could mean the difference between a successful data project and a failed one in which the roles of the various technologies and staff are not clearly defined.