What’s up peeps

I remember when I started applying for jobs many many MANY years ago, I always came across Data Warehousing.

In my mind I imagined a big warehouse with lots of data in boxes. Yes I know how dum this sounds but that’s what I thought.

So, first things first.

What is a database?

A database is a collection of information that is organized so that it can be easily accessed, managed, and updated.

It is used for storing and retrieving data for individual applications.

Now databases come in various formats.

It could be a big oracle or sql server database.

It could be a nosql data store.

It could even be an Excel document.

What is a Data Warehouse?

A data warehouse is a large repository of data collected from different sources intended for reporting and analysis.

Think of it as a big consolidation of those different databases.

All these sources will be brought together into a model in the data warehouse for analysts to use.

So those analysts won’t have to go to the various databases to get info.

It will all be in one place.

The data warehouse is typically used to fullfil a need within a department.

Big enterprises in many cases need to share information across departments.

This would then introduce an Enterprise Data Warehouse or EDW

What is an Enterprise Data Warehouse?

An Enterprise Data Warehouse (EDW) is a type of data warehouse that services the entire organization. Unlike a regular data warehouse which is often designed for a specific department or business function, an EDW consolidates data from all departments and business functions. This allows for cross-functional analysis and reporting, and provides a unified view of the entire business.

Essentially, the EDW is a central repository for all of an organization’s data, making it a crucial tool for enterprise-level data analytics and business intelligence.

You’ll typically find EDWs in bigger organisations where there are multiple business units each creating their own sets of data.

Which tools are used?

There are several tools used for managing and working with data warehouses. Some of the most popular ones include:

  • SQL Server Integration Services (SSIS): This is a platform for data integration and workflow applications. It is used for extracting, transforming, and loading (ETL) data.
  • Amazon Redshift: This is a fully managed, petabyte-scale data warehouse service in the cloud.
  • Google BigQuery: This is Google’s fully managed, NoOps, low-cost analytics database.
  • Snowflake: This is a cloud-based data warehousing platform that allows businesses to store and analyze data using cloud-based hardware and software.
  • Tableau: This is a powerful data visualization tool used in the Business Intelligence industry for analyzing data. An alternative to this would be Microsoft PowerBI.

Since these data warehouses are used for reporting and data analysis they will typically contains years ans years and millions and billions of rows of data.

As such, it could take a while to retrieve the data you are looking for.

There are newer cloud tools which could speed up the process but these are not cheap … at … all.

So, that wraps up our overview of data warehouses.

Remember, while it may not be a physical warehouse full of data boxes, a data warehouse is a crucial tool for managing and understanding data within an organization.

Whether you’re using SQL Server Integration Services, Amazon Redshift, Google BigQuery, Snowflake, Tableau, or any other tool, the key is to effectively gather, store, and analyze your data.

Thanks for reading and don’t forget to subscribe to our free newsletter for more insights into the world of data.