DDIA | Chapter 3 | Storage And Retrieval | Transaction Processing or Analytics

In the previous post, we discussed different data structures that power current databases, in this post we will look into another type of databases – data warehouses

Online Transactions Processing – Access pattern of data included querying small number of keys using an index. Records are inserted or updated based on the user’s input. Because these applications are interactive, the access pattern became known as online transaction processing.

Online Analytics Processing – An analytic query needs to scan over a huge number of records, only reading a few columns per record, and calculates aggregate statistics (such as count, sum, or average) rather than returning the raw data to the user. Such type of access patterns came to be known as online analytics processing.

Pattern

Transaction processing systems (OLTP)

Analytic systems (OLAP)

Main read pattern

Small number of records per query, fetched by key

Aggregate over large number of records

Main write pattern

Random-access, low-latency writes from user input

Bulk import (ETL) or event stream

Primarily used by

End user/customer, via web application

Internal analyst, for decision support

What data represents

Latest state of data (current point in time)

History of events that happened over time

Dataset size

Gigabytes to terabytes

Terabytes to petabytes

Data Warehousing

Collection of read-only data from multiple OLTP databases in an organization, so that analysts can run as much queries as they want to without affecting the operations of transactional databases.

Data is extracted from OLTP databases (using either a periodic data dump or a continuous stream of updates), transformed into an analysis-friendly schema, cleaned up, and then loaded into the data warehouse. This process of getting data into the warehouse is known as Extract–Transform–Load (ETL).

Stars and Snowflakes: Schemas for Analytics

Star schema – At the center of the schema. Each row of the fact table represents an event that occurred at a particular time. Some of the columns in the fact table are attributes, other columns in the fact table are foreign key references to other tables, called dimension tables. As each row in the fact table represents an event, the dimensions represent the whowhatwherewhenhow, and why of the event.

The name “star schema” comes from the fact that when the table relationships are visualized, the fact table is in the middle, surrounded by its dimension tables; the connections to these tables are like the rays of a star. A variation of this template is known as the snowflake schema, where dimensions are further broken down into subdimensions.

Thanks for stopping by! Hope this gives you a brief overview in to OLAP databases. Eager to hear your thoughts and chat, please leave comments below and we can discuss.


One response to “DDIA | Chapter 3 | Storage And Retrieval | Transaction Processing or Analytics”