White Papers

18 BP1062 | Best Practices for Decision Support Systems with Microsoft SQL Server 2012 using Dell EqualLogic PS Series
Storage Arrays
6.2 Table partitioning studies
This section evaluates the impact of SQL Server table partitioning on DSS performance in a data
warehouse environment. Partitioning breaks up database objects by allowing subsets of data to reside in
separate file groups. This is beneficial to data warehouse databases in several ways. Some of the table
partition benefits listed by Microsoft are:
Data pruning: If aged data needs to be archived with partitioned tables, the process of removing
it takes less time since there is less data.
Improved load speed: By loading into smaller partitioned tables the incremental load process
can be significantly more efficient.
Maintenance: Activities including loading data, backing up and restoring tables can run in
parallel to achieve dramatic increases in performance once the data warehouse staging
application has been built to support partitioning.
Query speed: A table partition may or may not yield query performance improvement
depending on the database schema and query characteristics. The query performance might be
similar for partitioned and non-partitioned fact tables. When the partitioned database is properly
designed, the relational engine in a query plan will only include the partition(s) necessary to
resolve that query. The resulting query will perform well against the partitioned table, similar to a
properly indexed, combined table with a clustered index on the partitioning key.
6.2.1 Identifying candidates for table partitioning
Before performing the table partition, the best candidates for partition need to be identified. The following
data layout was implemented on the database to identify the candidates.
LineItem
and
Order
tables were the two largest tables in TPC-H database.
The first largest table (LineItem) was placed in its own filegroup (LineItem_Filegroup) and on a
separate volume.
The second largest table (Order) was placed on its own filegroup (Order_Filegroup) and on
separate volume.
The rest of the tables were placed in a separate filegroup (Others_Filegroup) and placed on a
separate volume.
TPC-H queries from benchmark factory were run against the above data layout to identify the table that
constituted the highest percentage of the total throughput (reference tests in Section 6.1
). The LineItem
table comprised about 70% of the total read throughput achieved while running queries which is why this
table was chosen as a partition candidate.
For the table partition tests in this section, the table was partitioned based on the right range partitioning
scheme using the l_shipdate date-time column. This column was chosen because most of the TPC-H
queries accessing the LineItem table used l_shipdate in their
where
clause.