Column Compression in Amazon Redshift

Jun 28, 2019

Share this post
issues-after-qlikview-version-upgrade-on-passive-node-of-production-servers

What Is Compression?

    • Compression is a column-level operation that reduces the size of data when it is stored.
    • Compression conserves storage space and reduces the size of data that is read from storage, which reduces the amount of disk I/O and therefore improves query performance.
    • Compression is also called “encoding” in Amazon Redshift and it uses 7 encoding types.

Why Should We Use Compression?

    • When company’s amount of data has grown exponentially, it will become more critical to optimize data storage.
    • The size of your data doesn’t just impact storage size and costs, it also affects query performance. The smaller the data, the less that has to be processed during expensive disk I/O (input/output or write/read) operations.

Compression In Amazon Redshift:

    • Amazon allows two types of compression types. One is manual compression and another one is automatic compression.
    • We can apply a compression type, or encoding, to the columns in a table manually when we create the table, or we can use the COPY command to analyze and apply compression automatically.
    • Amazon strongly recommends using the COPY command to apply automatic compression.

Automatic Compression:

    • We can use the COPY command with COMPUPDATE set to ON to analyze and apply compression automatically.
    • We can use automatic compression when we create and load a brand-new table. The COPY command will perform a compression analysis.
    • We can also perform a compression analysis without loading data or changing the compression on a table by running the ANALYZE COMPRESSION command against an already populated table. For example, we can run the ANALYZE COMPRESSION command when we want to analyze compression on a table for future use, while preserving the existing DDL.
    • Automatic compression balances overall performance when choosing compression encodings. Range-restricted scans might perform poorly if sort key columns are compressed much more highly than other columns in the same query. As a result, automatic compression will choose a less efficient compression encoding to keep the sort key columns balanced with other columns. However, ANALYZE COMPRESSION does not take sort keys into account, so it might recommend a different encoding for the sort key than what automatic compression would choose. If you use ANALYZE COMPRESSION, consider changing the encoding to RAW for sort keys.
    • When the COMPUPDATE parameter is ON, the COPY command applies automatic compression whenever you run the COPY command with an empty target table and all the table columns either have RAW encoding or no encoding.
    • To apply automatic compression to an empty table, regardless of its current compression encodings, run the COPY command with the COMPUPDATE option set to ON. To disable automatic compression, run the COPY command with the COMPUPDATE option set to OFF.
    • We can apply automatic compression only to an empty table.

Manual Compression:

    • We can choose to apply compression encodings manually if the new table shares the same data characteristics as another table, or if in testing you discover that the compression encodings that are applied during automatic compression are not the best fit for your data.
    • If we choose to apply compression encodings manually, we can run the ANALYZE COMPRESSION command against an already populated table and use the results to choose compression encodings.
    • To apply compression manually, we need to specify compression encodings for individual columns as part of the CREATE TABLE statement. The syntax is as follows:

CREATE TABLE table_name (column_name

data_type ENCODE encoding-type)[, …]

    • Where encoding-type is taken from the keyword table in the following section.
    • For example, the following statement creates a two-column table, PRODUCT. When data is loaded into the table, the PRODUCT_ID column is not compressed, but the PRODUCT_NAME column is compressed, using the byte dictionary encoding (BYTEDICT).

create table product(

product_id int encode raw,

product_name char(20) encode bytedict);

    • You cannot change the compression encoding for a column after the table is created. You can specify the encoding for a column when it is added to a table using the ALTER TABLE command.

ALTER TABLE table-name ADD [ COLUMN ] column_name column_type ENCODE encoding-type;

Compression Encodings:

    • A compression encoding specifies the type of compression that is applied to a column of data values as rows are added to a table.
    • If no compression is specified in a CREATE TABLE or ALTER TABLE statement, Amazon Redshift automatically assigns compression encoding as follows:
    • Columns that are defined as sort keys are assigned RAW compression.
    • Columns that are defined as BOOLEAN, REAL, or DOUBLE PRECISION data types are assigned RAW compression.
    • All other columns are assigned LZO compression.
    • The following table identifies the supported compression encodings and the data types that support the encoding.