Indexes in general are a way to improve the performance of the queries we fire at the database engine. A better and more in-depth understanding of indexes enables the developers to leverage indexes better. This article aims to give the most important details about indexes in Postgres in the most concise manner.
Let’s start with understanding why indexes are required. Let us take the analogy of a book. A book is a heap of information (like a database). If someone is asked to find a specific chapter in a book, then they can do it in two different ways:
By checking the content of each page until the chapter is found.
By looking up the book's index and giving the page number instantly.
Similarly, indexes help the database engine to locate the information asked through the query in a much faster and more efficient way.
Before proceeding, let us understand how data is stored in the database. Postgres creates a separate file for each of the tables in the disk. Each row of data is written into this file as a tuple. These tuples are not stored in any order. To get the location of the file on the disk, we can use the pg_class table (it is a system table that contains information about the tables). The pg_class table has the filename under the relfilenode column and the name of the table is under relname.
Also, whenever we create a table in Postgres, it creates a hidden column called ctid which is a tuple of the form (x,y) where x is the page number and y is the offset of the record in the page (in context to the concept of paging in Operating Systems).
Indexes are the entry points for the tables (just like indexes in the books can be used to point to a specific topic). The sole purpose of creating and using an index is to improve the performance of queries. Indexes are stored in a separate file (and not in the file that was created for the table) on the disk, therefore, creating indexes requires additional space on the disk. PostgreSQL locks the table when it is creating an index. To avoid this lock, we can use the keyword CONCURRENTLY which will create an index without applying a lock on the table (however, this creation would take longer than it would have taken while creating an index with a lock). We can also use Expression Indexes in Postgres (eg. “CREATE INDEX idx_exp ON table-name (lower(name))” ).
Now that we have a fair idea of what indexes are and why we need them, we can proceed with the types of indexes available in Postgres.
Types of Indexes
BTree Index
A BTree index is the default type of index used by PostgreSQL
It can be created in the following ways
CREATE INDEX btree_index_name
ON your_table(your_column);
or
CREATE INDEX btree_index_name
ON your_table USING BTREE (your_column);
This index uses a BTree Data Structure to store the ctid of the data.
BTree is a multi-talented data structure that can handle massive amounts of data with ease. When it comes to storing and searching large amounts of data, traditional binary search trees can become impractical due to their poor performance and high memory usage. B-trees, also known as B-Tree or Balanced Tree, are a type of self-balancing tree that was specifically designed to overcome these limitations. Unlike traditional binary search trees, B-Trees are characterized by the large number of keys that they can store in a single node, which is why they are also known as “large key” trees. Each node in a B-Tree can contain multiple keys, which allows the tree to have a larger branching factor and thus a shallower height. This shallow height leads to less disk I/O, which results in faster search and insertion operations. B-Trees are particularly well suited for storage systems that have slow, bulky data access such as hard drives, flash memory, and CD-ROMs. B-Trees maintain balance by ensuring that each node has a minimum number of keys, so the tree is always balanced. This balance guarantees that the time complexity for operations such as insertion, deletion, and searching is always O(log n), regardless of the initial shape of the tree.
A BTree supports Equality (=), Less than (<), Less than equal to (≤), greater than (>), Range (BETWEEN) and greater than equal to (≥) operations.
The btree index stores the ctid and column name
Hash Index
Hash indexes in PostgreSQL are more limited in terms of supported operators compared to B-tree indexes. They are mainly designed for equality checks and do not support range queries or ordering operations. The primary operator supported by Hash indexes is equality (=)
This index Stores the column name and the hash
Hash indexes are suitable for scenarios where you primarily need fast equality checks, such as exact matches in WHERE clauses.
We can use Hash indexes when we primarily need fast equality checks, and the data distribution is relatively uniform.
Hash Index can be created in the following manner
CREATE INDEX hash_index_name
ON your_table USING HASH (your_column);
Brin Index
Block Range Index is called BRIN index.
This is used when columns have some correlation with physical disk locations in the table.
This is space optimized since it has only the following information: Page number, min value, max value.
This index should be created where the data is sequential and there are no duplicates or gaps in the column on which we are creating brin index.
It takes the least amount of storage.
A Brin Index can be created in the following manner
CREATE INDEX brin_index_name
ON your_table USING BRIN (your_column);
GIN Index
These are the advanced indexes supported by PostgreSQL
GIN is a Generalised Inverted Index.
GIN is used to handle when we need composite values
Slow to create because it needs to scan the document upfront.
GIN indexes can be created as follows:
CREATE INDEX gin_index_name
ON your_table USING GIN (your_column);
GiST Index
-Generalised Search Tree
It is a tree-structured access method
It is an indexing framework used for complex data types and multidimensional data types
Used to find the point within the box
Used for full-text search
GiST indexes are adaptable to different data types. You can create custom GiST indexes tailored for specific data types by defining appropriate comparison and union methods
GiST is commonly used for spatial indexing, enabling efficient spatial queries on geometric data types like points, lines, and polygons.
GiST indexes allow the definition of custom operators to support various search and comparison operations.
A GiST index can be created in the following way
CREATE INDEX table_gist_index
ON tablename USING GIST (column_name);