White Papers

48 BP1062 | Best Practices for Decision Support Systems with Microsoft SQL Server 2012 using Dell EqualLogic PS Series
Storage Arrays
7.3.7 Columnstore Index
The columnstore index study in section 6.3 proved that by using columnstore index on the two largest
tables, the data warehouse query execution times improved significantly. Below are the resulting best
practices and few things to remember stated by Microsoft when using columnstore index. For details refer
to
http://msdn.microsoft.com/en-us/library/gg492088.aspx.
Updating data in a columnstore index: Tables that have a columnstore index cannot be
updated. There are three ways to work around this problem.
- To update a table with a columnstore index, drop the columnstore index, perform any
required INSERT, DELETE, UPDATE, or MERGE operations, and then rebuild the columnstore
index.
- Partition the table and switch partitions. For a bulk insert, insert data into a staging table,
build a columnstore index on the staging table, and then switch the staging table into an
empty partition.
- For other updates, switch a partition out of the main table into a staging table, disable or
drop the columnstore index on the staging table, perform the update operations, rebuild or
re-create the columnstore index on the staging table, and then switch the staging table back
into the main table.
- Place static data into a main table with a columnstore index. Put new data and recent data
likely to change into a separate table with the same schema that does not have a
columnstore index. Apply updates to the table with the most recent data.
Choosing columns for columnstore index: Some of the performance benefit of a columnstore
index is derived from the compression techniques that reduce the number of data pages that
must be read and manipulated to process the query. Compression works best on character or
numeric columns that have large amounts of duplicated values. For example, dimension tables
might have columns for postal codes, cities, and sales regions. If many postal codes are located
in each city, and if many cities are located in each sales region, then the sales region column
would be the most compressed, the city column would have somewhat less compression, and
the postal code would have the least compression. Although all columns are good candidates
for a columnstore index, adding the sales region code column to the columnstore index will
achieve the greatest benefit from columnstore compression, and the postal code will achieve
the least.
Columnstore index on partitioned table: Columnstore indexes are designed to support queries
in very large data warehouse scenarios where partitioning is common. When creating
columnstore indexes on a partitioned table, they must be partition-aligned with the base table.
Therefore a non-clustered columnstore index can only be created on a partitioned table if the
partitioning column is one of the columns in the columnstore index.
Memory: Column store processing is optimized for in-memory processing. SQL Server
implements mechanisms that enable data (and most data structures) to spill to disk when
insufficient memory is available. If severe memory restrictions are present, processing uses the
row store. There may be instances in which the columnstore index is chosen as an access
method but memory is insufficient to build the needed data structures. The effective memory
requirement for any query depends on the specific query. Building a columnstore index requires
approximately 8 megabytes times the number of columns in the index, times the DOP (degree