White Papers

26 BP1062 | Best Practices for Decision Support Systems with Microsoft SQL Server 2012 using Dell EqualLogic PS Series
Storage Arrays
The steps involved in creating a columnstore index are detailed in Microsoft’s IEEE paper “Columnar
storage in SQL Server 2012”, that can be found at http://sites.computer.org/debull/A12mar/apollo.pdf
.
The key steps are:
1. Each column is converted to a set of rows called column segments. (The columnstore index is
divided into units of transfer called segments. A segment is stored as a Large Object (LOB), and
can consist of multiple pages).
2. The rows are divided into row groups (Each row-group has about one million rows).
3. Each row group is encoded and compressed independently to form one compressed column
segment for each column included in the index.
4. Each column segment is stored as a separate Binary Large Object (blob) and may span across
multiple pages.
5. A segment directory (Contains additional metadata about each segment such as number of rows,
size, how data is encoded, and min and max values) keeps track of segments location so all
segments containing a column can be easily located..
6. Dictionaries (A storage element that is used as a means to efficiently encode large data types)
provide mapping to segments in a columnstore index. Columnstore index dictionaries make it
possible to pull only segments that are needed when a plan is created and a query executed.
Dictionary compression is used for (large) string columns and the resulting dictionaries are stored
in separate blobs. The dictionary compression technique can yield very good compression for
repeated values, but yields bad results if the values are all distinct.
This index storage technique offers several benefits (listed in section 6.3.2
). It must be noted that the
column segments and dictionaries are not stored in the page-oriented buffer pool but in a separate
memory pool designed for handling large objects. The objects are stored adjacently and not scattered
across separate pages in the memory pool. This improves column scanning as there are no page breaks
involved.
The SQL Server query processor can take advantage of the columnstore layout to significantly improve
query execution times. SQL Server columnstore index technology is especially appropriate for typical data
warehousing data sets. Columnstore indexes can transform the data warehousing experience for users by
enabling faster performance for common data warehousing queries such as filtering, aggregating,
grouping, and star-join queries.