A SQL Server table partition allows to physically split the data of one table on the disk. This data division allows to access directly one unique partition instead of all the data from the table. First, why all large tables should have partitions? When accessing hundreds of million lines, it becomes very tricky to get a large bunch of data in seconds. Partitions allows to improve query performance.
What are the main advantages of SQL Server table partitions?
- The main advantage is obviously to improve query performance. It avoids full scans by targeting only the need partitions.
- Table partitions allow to transfer only a subset of data based on the functional key instead of the full set of data. Let’s consider monthly data. The query transfers only the latest month stored in a single partition. It is much quicker than selecting the full scope including the full year and the previous years.
- Partitions on tables allows a much flexible maintenance like reorganize or rebuilt the indexes.
For instance, let’s consider a Microsoft Business Intelligence project with an Operational Data Store, a Datawarehouse, DataMart’s and an Analysis Tabular Model. Daily, the latest data available could be only the current month data. So, it’s more interesting to load only the current month. It also allows much quick delete and inserts to refresh a given set of data.
Please note that all versions available before and up to SQL Server 2012 supports only up to 1000 partitions per table.
What are the main components of a partition?
On the other hand, partitioning uses different mandatory SQL objects: partition functions, partition schemes and partitioning columns.
- A partition function defines how to store and map data to a partition, it’s a SQL Server function. That is to say that based on one column the complete row will flow to partition 1 or 2, etc.
- A partition scheme maps the data partitions to a set of physical files.
- A partitioning column segregates the data in the different partitions. It can be a year, a month or any other data type. The choice of the column is based on the actual usage of the data and the most used one to query the data.
For example let’s consider a partition function based on year number. The first one could be the historical years, i.e. before 2018. Then the data before 2019 and 2020 year, then only 2020 and then the data associated after 2020. Hence the partition example improve query performance.
Check out the article on Microsoft about the partition functions.
A practical SQL Server T-SQL example of a partition function and a partition scheme
CREATE PARTITION FUNCTION ufn_Partition_Sales (int) AS RANGE RIGHT FOR VALUES (2018, 2019, 2020);
CREATE PARTITION SCHEME sche_Partition_Sales AS PARTITION ufn_Partition_Sales TO (Sales_1_filegroup, Sales_2_filegroup, Sales_3_filegroup, Sales_4_filegroup);
Moreover, considering the example above, and assuming the file groups exists on the system. The partition scheme distributes the data like this:
- Sales_1_fg contains SalesYear < 2018
- Sales_2_fg contains SalesYear >= 2018 AND SalesYear < 2019
- Sales_3_fg contains SalesYear >= 2019 AND SalesYear < 2020
- Sales_4_fg contains SalesYear >= 2020
To conclude, the first article of the partition series defines what is a partitioned table. Check out this series about index management and how to create a SQL Server index.
Check this frequently asked questions about SQL Server table partitions
Partitioning technically splits a table or an index in different parts based on column values. Partitions are physically independent to improve performance and allow easier maintenance, data loads and transfers. A partition uses a partition function, a partition scheme and a partitioned column.
To partition a table, chose the main column used in the queries, loads or reports. Create a partition function, for example the past months, the previous month, the current month and the next months. Create the file groups associated to be used by the partition scheme.
An index structures the data inside one logical table. Then, the database administrators rebuilt and reorganize the indexes. A partitioned table allows the data to be physically split and accessed independently. An index itself can be partitioned.
The best practice is to maintain automatically the partition function. So, it creates the new partitions as the data evolves. For example, every new month should be taken in a partition using the database current date.