White Papers

18 BP1062 | Best Practices for Decision Support Systems with Microsoft SQL Server 2012 using Dell EqualLogic PS Series

Storage Arrays

6.2 Table partitioning studies

This section evaluates the impact of SQL Server table partitioning on DSS performance in a data

warehouse environment. Partitioning breaks up database objects by allowing subsets of data to reside in

separate file groups. This is beneficial to data warehouse databases in several ways. Some of the table

partition benefits listed by Microsoft are:

• Data pruning: If aged data needs to be archived with partitioned tables, the process of removing

it takes less time since there is less data.

• Improved load speed: By loading into smaller partitioned tables the incremental load process

can be significantly more efficient.

• Maintenance: Activities including loading data, backing up and restoring tables can run in

parallel to achieve dramatic increases in performance once the data warehouse staging

application has been built to support partitioning.

• Query speed: A table partition may or may not yield query performance improvement

depending on the database schema and query characteristics. The query performance might be

similar for partitioned and non-partitioned fact tables. When the partitioned database is properly

designed, the relational engine in a query plan will only include the partition(s) necessary to

resolve that query. The resulting query will perform well against the partitioned table, similar to a

properly indexed, combined table with a clustered index on the partitioning key.

6.2.1 Identifying candidates for table partitioning

Before performing the table partition, the best candidates for partition need to be identified. The following

data layout was implemented on the database to identify the candidates.

•

LineItem

and

Order

tables were the two largest tables in TPC-H database.

• The first largest table (LineItem) was placed in its own filegroup (LineItem_Filegroup) and on a

separate volume.

• The second largest table (Order) was placed on its own filegroup (Order_Filegroup) and on

separate volume.

• The rest of the tables were placed in a separate filegroup (Others_Filegroup) and placed on a

separate volume.

TPC-H queries from benchmark factory were run against the above data layout to identify the table that

constituted the highest percentage of the total throughput (reference tests in Section 6.1

). The LineItem

table comprised about 70% of the total read throughput achieved while running queries which is why this

table was chosen as a partition candidate.

For the table partition tests in this section, the table was partitioned based on the right range partitioning

scheme using the l_shipdate date-time column. This column was chosen because most of the TPC-H

queries accessing the LineItem table used l_shipdate in their

where

clause.