Amazon Redshift, an OLAP system by AWS

Ashutosh Kumar
3 min readMay 16, 2020

There are different cloud services like software as a service, platform as a service.

Amazon Redshift is an OLAP system provided by AWS. If you want to know in details, what is the OLAP system? Please refer to the link : https://medium.com/@ashutosh_68096/oltp-and-olap-what-are-the-differences-a6e21f25bfe0

Redshift is a data warehouse service.

Redshift is all about the data warehouse. A database is a place where we stores data and data warehouse is a place where we process data for analytics purpose.

Amazon Redshift

Select sum, max etc queries are possible in the database and so, this needs to be available in the data warehouse as well.

What we use in a data warehouse is that we use the database only but with underlying SQL principle like MySQL, Postgres etc.

Redshift and the file system S3

We can enable data warehouse to sit on file systems and AWS have their file system as S3. Redshift sits well with S3 and there is a COPY command and Unload command which is of great help to move data between S3 and Redshift, excellent setup by AWS.

Internals of Redshift

Redshift is a highly scalable warehouse and let’s see, what it uses internally. This also uses SQL but the SQL is somewhat different and the base version being used in SQL 92.

Redshift is fast and scalable as Amazon. It’s a distributed cluster and not one machine. It stores the data in a distributed way.
And so, the data processing also happens in a distributed way, queries when fired, they run parallel across multiple nodes and so, the read becomes faster. The performance is very high and we get the output in a faster manner whenever the read queries are fired.

Amazon Redshift is not a one node concept, instead, it’s a multi-node concept in a distributed cluster.

Data Model in Amazon Redshift

What is the data model? If we store some data in MySQL, it stores data in row and columns and that is the data model for SQL databases or the traditional RDBMS. Row oriented storage is a data model in MySQL, Oracle etc.
Even in a programming language like Java, Python, C++, there is a collection like a map. If we want to store data in a map, it has to be stored in key-value fashion and so, key-value storage is data model in map collection.

The data model in Amazon Redshift is called Columnar storage. This is a great advantage. If we store data in columnar storage, the data access becomes very fast. Before we go in details, let’s understand what do you mean by columnar storage.

Below is the diagram which shows how data is stored in a row-oriented fashion.

Below is the snapshot which shows how data is stored in a column-oriented fashion.

In row-oriented fashion, we store row-wise data in blocks while in column-oriented storage, specific column data is stored in blocks. One column is expected to store the homogenous data and so, some compression algorithm can also be used to store and thus, storage will also take less space.

All the upcoming data warehouse models are trying to store data in column-oriented fashion.

Performance depends on data size, no of columns, data model, frequency of data loads, no of nodes in the cluster.

--

--

Ashutosh Kumar

Backend Engineering | BIT Mesra | Building Microservices and Scalable Apps | Mentor