Redshift query basics:
- The maximum size for a single Amazon Redshift SQL statement is 16 MB
- Some Amazon Redshift queries are distributed and executed on the compute nodes, and other queries execute exclusively on the leader node
- The leader node distributes SQL to the compute nodes whenever a query references user-created tables or system tables (tables with an STL or STV prefix and system views with an SVL or SW prefix).
- A query that references only catalog tables (tables with a PG prefix, such as PG_TABLE_DEF, which reside on the leader node) or that does not reference any tables, runs exclusively on the leader node.
- To avoid client-side out-of-memory errors when retrieving large data sets using JDBC, you can enable your client to fetch data in batches by setting the JDBC fetch size parameter.
- Amazon Redshift does not recognize the JDBC max Rows parameter. Instead, specify a LIMIT clause to restrict the result set. You can also use an OFFSET clause to skip to a specific starting point in the result set.
There are two types of identifiers, standard identifiers and quoted or delimited identifiers
Identifiers must consist of only UTF-8 printable characters. ASCII letters in standard and delimited identifiers are case-insensitive and are folded to lowercase in the database.
In query results, column names are returned as lowercase by default
Standard SQL identifiers adhere to a set of rules and must:
- Begin with an ASCII single-byte alphabetic character or underscore
- character, or a UTF-8 multibyte character two to four bytes long.
- Subsequent characters can be ASCII single-byte alphanumeric characters, underscores, or dollar signs, or UTF 8 multibyte characters two to four bytes long.
- Be between 1and 127 bytes in length, not including quotes for delimited identifiers.
- Contain no quotation marks and no spaces.
- Not be a reserved SQL key word.
- Delimited identifiers (also known as quoted identifiers) begin and end with double quotation marks(””)
- If you use a delimited identifier. you must use the double quotation marks for every reference to that object. The identifier can contain any standard UTF-8 printable characters other than the double quote itself.
- Therefore, you can create column or table names that include otherwise illegal characters, such as spaces or the percent symbol.
- To use a double quota in string, you must precede it with another double quote marks.
- If a column in a row is missing, unknown, or not applicable, it is a null value or is said to contain null.
- Nulls can appear in fields of any data type that are not restricted by primary key or NOT NULL constraints.
- A null is not equivalent to the zero or to the empty string.
- Any arithmetic expression containing a null always evaluates to a null. All operators except concatenation return a null when given a null argument or operand.
- To test nulls, use comparison condition IS NULL or NOT IS NULL.
- Because null represents a lack of data, a null is not equal or unequal to any value.
The SQL language consists of commands and functions that you use to work with databases and database objects. The language also enforces rules regarding the use of data types, expressions, and literals.
- Character data types include CHAR (character) and VARCHAR (character varying)
- CHAR and VARCHAR data types are defined in terms of bytes, not characters. A CHAR column can only contain single-byte characters, so a CHAR(10) column can contain a string with a maximum length of 10 bytes. A VARCHAR can contain multibyte characters, up to a maximum of four bytes per character. For example, a VARCHAR(12) column can contain 12 single-byte characters, 6 two-byte characters, 4 three-byte characters, or 3 four-byte characters.
CHAR or CHARACTER
Use a CHAR or CHARACTER column to store fixed-length strings. These strings are padded with blanks, so a CHAR(10) column always occupies 10 bytes of storage.
VARCHAR or CHARACTER VARYING
Use a VARCHAR or CHARACTER VARYING column to store variable-length strings with a fixed limit. These strings are not padded with blanks, so a VARCHAR(120) column consists of a maximum of 120 single-byte characters, 60 two-byte characters, 40 three-byte characters, or 30 four-byte characters.
NCHAR and NVARCHAR Types
You can create columns with the NCHAR and NVARCHAR types (also known as NATIONAL CHARACTER and NATIONAL CHARACTER VARYING types). These types are converted to CHAR and VARCHAR types, respectively, and are stored in the specified number of bytes.
Use the SMALLINT, INTEGER, and BIGINT data types to store whole numbers of various ranges. You cannot store values outside of the allowed range for each type.
Use the REAL and DOUBLE PRECISION data types to store numeric values with variable precision. These types are inexact types, meaning that some values are stored as approximations, such that storing and returning a specific value may result in slight discrepancies. If you require exact storage and calculations (such as for monetary amounts), use the DECIMAL data type.
Datetime data types include DATE, TIMESTAMP, and TIMESTAMPTZ.
DATE: Use the DATE data type to store simple calendar dates without time stamps
TIMESTAMP: TIMESTAMP is an alias of TIMESTAMP WITHOUT TIME ZONE.
Use the TIMESTAMP data type to store complete time stamp values that include the date and the time of day.
TIMESTAMP columns store values with up to a maximum of 6 digits of precision for fractional seconds.
TIMESTAMPTZ: TIMESTAMPTZ is an alias of TIMESTAMP WITH TIME ZONE.
Use the TIMESTAMPTZ data type to input complete time stamp values that include the date, the time of day, and a time zone.
Use the BOOLEAN data type to store true and false values in a single-byte column. The following table describes the three possible states for a Boolean value and the literal values that result in that state. Regardless of the input string, a Boolean column stores and outputs “t” for true and “f” for false.
Mathematical Operator Symbols
The following table lists the supported mathematical operators.