I see a lot of confusion about what exactly an Operational Data Store (ODS) is. While it can mean different things to different people, I’ll explain what I see as the most common definition. First, let me mention that an ODS is not a data warehouse or data mart. A data warehouse is where you store data from multiple data sources to be used for historical and trend analysis reporting. It acts as a central repository for many subject areas and contains the “single version of truth”. A data mart serves the same purpose but comprises only one subject area. Think of a data warehouse as containing multiple data marts. See my other blogs that discuss this in more detail: Data Warehouse vs Data Mart, Building an Effective Data Warehouse Architecture, and The Modern Data Warehouse.
The purpose of an ODS is to integrate corporate data from different heterogeneous data sources in order to facilitate operational reporting in real-time or near real-time. Usually data in the ODS will be structured similarly to the source systems, although during integration the data can be cleaned and denormalized, and business rules can be applied to ensure data integrity. This integration happens at the lowest level of granularity and occurs quite frequently throughout the day. Normally an ODS will not be optimized for historical and trend analysis, as this is left to the data warehouse. An ODS is also frequently used as a data source for the data warehouse.
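To make the integration step more concrete, here is a minimal Python sketch of how rows from two source systems might be brought into one ODS structure at the detail level. Everything specific in it is an assumption for illustration: the two sources (`web_shop` and `pos`), their field names, and the business rule that rejects non-positive amounts are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class OdsOrder:
    """One detail-level order row in the ODS (hypothetical layout)."""
    source: str            # which source system the row came from
    order_id: str
    customer_email: str
    amount_cents: int
    loaded_at: datetime


def clean_email(raw: str) -> str:
    """Light cleaning: trim whitespace and lowercase."""
    return raw.strip().lower()


def from_web_shop(row: dict, loaded_at: datetime) -> OdsOrder:
    # Assumed source format: amount is a decimal string in dollars.
    return OdsOrder(
        source="web_shop",
        order_id=str(row["id"]),
        customer_email=clean_email(row["email"]),
        amount_cents=int(Decimal(row["total"]) * 100),
        loaded_at=loaded_at,
    )


def from_pos(row: dict, loaded_at: datetime) -> OdsOrder:
    # Assumed source format: amount is already an integer number of cents.
    return OdsOrder(
        source="pos",
        order_id=str(row["ticket"]),
        customer_email=clean_email(row["cust_email"]),
        amount_cents=int(row["cents"]),
        loaded_at=loaded_at,
    )


def integrate(web_rows, pos_rows, loaded_at):
    """Map both sources to the common layout and apply one business rule."""
    orders = [from_web_shop(r, loaded_at) for r in web_rows]
    orders += [from_pos(r, loaded_at) for r in pos_rows]
    # Assumed business rule: an order must have a positive amount.
    return [o for o in orders if o.amount_cents > 0]
```

Note that each row keeps its original grain (one row per order) and stays close to the source structure; only the cleaning and the rule check are added on the way in.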
To summarize the differences between an ODS and a data warehouse:
- An ODS is targeted at queries at the lowest level of granularity, whereas a data warehouse is usually used for complex queries against summary-level or aggregated data
- An ODS is meant for operational reporting and supports current or near real-time reporting requirements whereas a data warehouse is meant for historical and trend analysis reporting usually on a large volume of data
- An ODS contains only a short window of data, while a data warehouse contains the entire history of data
- An ODS provides information for operational and tactical decisions on current or near real-time data while a data warehouse delivers feedback for strategic decisions leading to overall system improvements
- In an ODS the frequency of data load could be every few minutes or hourly whereas in a data warehouse the frequency of data loads could be daily, weekly, monthly or quarterly
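Two of the differences above, the short window of data and detail versus summary-level data, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 30-day window, the row layout (`sold_at`, `amount_cents`), and the function names are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed retention window for the ODS; a real system would pick its own.
ODS_WINDOW = timedelta(days=30)


def purge_ods(ods_rows, now):
    """Keep only the detail rows that fall inside the short ODS window."""
    cutoff = now - ODS_WINDOW
    return [r for r in ods_rows if r["sold_at"] >= cutoff]


def daily_totals(detail_rows):
    """Summarize detail rows into one total per day, as a warehouse load might."""
    totals = defaultdict(int)
    for r in detail_rows:
        totals[r["sold_at"].date()] += r["amount_cents"]
    return dict(totals)
```

In this sketch the ODS would call `purge_ods` on every load cycle, while the warehouse would keep the output of `daily_totals` for the full history.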
Major reasons for implementing an ODS include:
- The limited reporting in the source systems
- The desire to use a better and more powerful reporting tool than what the source systems offer
- Only a few people have the security permissions to access the source systems and you want to allow others to generate reports
- A company owns many retail stores, each of which tracks orders in its own database, and you want to consolidate the databases to get real-time inventory levels throughout the day
- You need to gather data from various source systems to get a true picture of a customer so you have the latest info if the customer calls customer service. This customer data can include customer info, support history, call logs, and order info. Similarly, you may need to gather medical data to get a true picture of a patient so the doctor has the latest info throughout the day: outpatient department records, hospitalization records, diagnostic records, and pharmaceutical purchase records
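As a rough illustration of the retail store scenario in the list above, here is a minimal Python sketch that consolidates per-store stock into company-wide inventory levels. The store IDs, SKU names, and the shape of the snapshot data are hypothetical; a real ODS would typically do this with frequent loads from each store database rather than in-memory dictionaries.

```python
from collections import Counter


def consolidate_inventory(store_snapshots):
    """Sum the on-hand quantity per SKU across all stores.

    store_snapshots maps a store ID to a {sku: quantity} dictionary
    (an assumed layout for this sketch).
    """
    totals = Counter()
    for stock in store_snapshots.values():
        for sku, qty in stock.items():
            totals[sku] += qty
    return dict(totals)
```

Running this each time fresh snapshots arrive from the stores would give a current, company-wide view that no single store database can provide on its own.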