White Papers

48 BP1062 | Best Practices for Decision Support Systems with Microsoft SQL Server 2012 using Dell EqualLogic PS Series

Storage Arrays

7.3.7 Columnstore Index

The columnstore index study in section 6.3 proved that by using columnstore index on the two largest

tables, the data warehouse query execution times improved significantly. Below are the resulting best

practices and few things to remember stated by Microsoft when using columnstore index. For details refer

http://msdn.microsoft.com/en-us/library/gg492088.aspx.

• Updating data in a columnstore index: Tables that have a columnstore index cannot be

updated. There are three ways to work around this problem.

- To update a table with a columnstore index, drop the columnstore index, perform any

required INSERT, DELETE, UPDATE, or MERGE operations, and then rebuild the columnstore

index.

- Partition the table and switch partitions. For a bulk insert, insert data into a staging table,

build a columnstore index on the staging table, and then switch the staging table into an

empty partition.

- For other updates, switch a partition out of the main table into a staging table, disable or

drop the columnstore index on the staging table, perform the update operations, rebuild or

re-create the columnstore index on the staging table, and then switch the staging table back

into the main table.

- Place static data into a main table with a columnstore index. Put new data and recent data

likely to change into a separate table with the same schema that does not have a

columnstore index. Apply updates to the table with the most recent data.

• Choosing columns for columnstore index: Some of the performance benefit of a columnstore

index is derived from the compression techniques that reduce the number of data pages that

must be read and manipulated to process the query. Compression works best on character or

numeric columns that have large amounts of duplicated values. For example, dimension tables

might have columns for postal codes, cities, and sales regions. If many postal codes are located

in each city, and if many cities are located in each sales region, then the sales region column

would be the most compressed, the city column would have somewhat less compression, and

the postal code would have the least compression. Although all columns are good candidates

for a columnstore index, adding the sales region code column to the columnstore index will

achieve the greatest benefit from columnstore compression, and the postal code will achieve

the least.

• Columnstore index on partitioned table: Columnstore indexes are designed to support queries

in very large data warehouse scenarios where partitioning is common. When creating

columnstore indexes on a partitioned table, they must be partition-aligned with the base table.

Therefore a non-clustered columnstore index can only be created on a partitioned table if the

partitioning column is one of the columns in the columnstore index.

• Memory: Column store processing is optimized for in-memory processing. SQL Server

implements mechanisms that enable data (and most data structures) to spill to disk when

insufficient memory is available. If severe memory restrictions are present, processing uses the

row store. There may be instances in which the columnstore index is chosen as an access

method but memory is insufficient to build the needed data structures. The effective memory

requirement for any query depends on the specific query. Building a columnstore index requires

approximately 8 megabytes times the number of columns in the index, times the DOP (degree