GPU Database Acceleration on PowerEdge R940xa

Abstract

This whitepaper examines the performance and efficiency of GPU database acceleration when the Dell EMC PowerEdge R940xa server runs the Brytlyt GPU DBMS. The objective is to show how the unique CPU-to-GPU ratio of the R940xa is well suited to this emerging category of database workloads, which leverage the parallel processing capabilities of GPUs.
Revisions

Date          Description
August 2018   Initial release

Acknowledgements

This paper was produced by the following persons:
Author: Bhavesh Patel, Dell EMC Server Advanced Engineering.
Contributors: Richard Heyns, CEO, Brytlyt; Palvi Verma, Director of Marketing, Brytlyt.

The information in this publication is provided “as is.” Dell Inc.
Table of contents

Revisions
Acknowledgements
Executive summary
Executive summary

This whitepaper examines the performance and efficiency of GPU database acceleration when the Dell EMC PowerEdge R940xa server runs the Brytlyt GPU DBMS. The objective is to show how the unique CPU-to-GPU ratio of the R940xa is well suited to this emerging category of database workloads, which leverage the parallel processing capabilities of GPUs. The whitepaper also discusses some of the background on why GPUs are becoming the norm in the database world.
1 Evolution of databases

Database Evolution

Data processing has evolved continuously, and each advance has brought new ways of using different processor architectures for database operations. Early on, data analysis on servers relied on storage area networks (SAN) and network-attached storage (NAS), but as data volumes grew, scaling became a bottleneck.
2 What is GPU acceleration and how does it apply to databases?

“Think of the GPU as a coin press machine, which can punch out 100 coins with a single operation every four seconds, whereas a CPU is a coin press which can punch out 1 coin per operation every one second. While the CPU has a faster “punch time”, the GPU can punch more coins per minute. This is the key difference between the GPU and CPU. The GPU is throughput oriented, while the CPU is latency oriented.
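The coin press analogy can be checked with a few lines of arithmetic. The sketch below is only an illustration of the analogy, not a model of real hardware; the function names are made up for this example, and the numbers are taken directly from the quote.

```python
def coins_per_minute(coins_per_op: int, seconds_per_op: float) -> float:
    """Throughput: coins produced in 60 seconds of continuous operation."""
    return coins_per_op * 60.0 / seconds_per_op


def seconds_until_first_coin(seconds_per_op: float) -> float:
    """Latency: time until the first punch operation completes."""
    return seconds_per_op


# Figures from the analogy: CPU = 1 coin every 1 s, GPU = 100 coins every 4 s.
cpu_throughput = coins_per_minute(coins_per_op=1, seconds_per_op=1.0)
gpu_throughput = coins_per_minute(coins_per_op=100, seconds_per_op=4.0)
cpu_latency = seconds_until_first_coin(1.0)
gpu_latency = seconds_until_first_coin(4.0)

print(f"CPU: {cpu_throughput:.0f} coins/min, first coin after {cpu_latency:.0f} s")
print(f"GPU: {gpu_throughput:.0f} coins/min, first coin after {gpu_latency:.0f} s")
print(f"GPU throughput advantage: {gpu_throughput / cpu_throughput:.0f}x")
```

With these figures the CPU press delivers its first coin four times sooner, while the GPU press produces 25 times as many coins per minute.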
2.2 What database operations can run on GPU?

GPUs achieve their performance by executing many operations in parallel, so the underlying code must be written with parallel execution in mind. The algorithms used must also be parallelizable, and in many cases parallelizing an operation is not trivial. Relational operations such as filtering, sorting, aggregating, grouping and even joining tables are all possible on a GPU. 2.
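To illustrate what "parallelizable" means for a relational operation, here is a minimal sketch of a filter followed by an aggregate, split into independent chunks whose partial results are combined at the end. It is plain Python with a thread pool standing in for GPU threads; it is not GPU code and not Brytlyt's implementation, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_filter_sum(chunk, threshold):
    """Filter one chunk (value > threshold) and sum the survivors."""
    return sum(v for v in chunk if v > threshold)


def parallel_filter_sum(values, threshold, workers=4):
    """Split the column into chunks, process each independently, combine."""
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: partial_filter_sum(c, threshold), chunks))
    # The final combine step is cheap: one value per chunk.
    return sum(partials)


column = list(range(1, 101))
result = parallel_filter_sum(column, threshold=50)
print(result)
```

The operation parallelizes cleanly because no chunk needs data from any other chunk until the final combine. Joins and sorts are harder precisely because their work items are not independent in this way.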
Block Diagram of Brytlyt stack interfacing between PostgreSQL and GPU Cluster

Data generation involves acquiring, saving, and preparing datasets to train machine learning models. GPU databases offer advantages in all three data generation tasks:

• For data acquisition, connectors for data in motion and data at rest, combined with high-speed ingest, make it easier to acquire millions of rows of data across disparate systems in seconds.
provides a direct link between the database and the AI models because data already on the GPU is consumed directly by the Artificial Intelligence framework. PyTorch is an open source machine learning library for Python based on Torch, a scientific computing framework that provides a wide range of algorithms for deep learning. Deep learning is part of a broader family of machine learning methods based on learning data representations.
2.6 Why keep CPU:GPU ratio of 1:1 in R940xa?

Query Performance with respect to IO bandwidth [Source: MapD]

As shown in Figure 4 above, query performance increases as the data is moved closer to the compute layer. Database acceleration is not achieved if data is fetched from disk (SSD), because disk IO becomes the bottleneck. Since GPUs improve performance only when data is available in main memory, any database architecture using GPUs for acceleration should also use CPU in-memory technology.
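A rough way to see why data placement matters is to estimate how long a full scan takes when it is limited purely by the bandwidth of the tier holding the data. The sketch below does that arithmetic. The bandwidth values are hypothetical, order-of-magnitude placeholders chosen for illustration only; they are not measurements from this paper or from the MapD source, and should be replaced with figures for the actual system.

```python
def scan_seconds(data_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time to stream data_gb once when limited only by bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return data_gb / bandwidth_gb_per_s


# ASSUMED, illustrative bandwidths in GB/s (not measured values).
ASSUMED_TIERS = {
    "ssd": 2.0,
    "pcie": 16.0,
    "cpu_ram": 100.0,
    "gpu_memory": 700.0,
}


def scan_times(data_gb: float, tiers: dict) -> dict:
    """Bandwidth-bound scan time per tier, in seconds."""
    return {name: scan_seconds(data_gb, bw) for name, bw in tiers.items()}


for tier, seconds in scan_times(100.0, ASSUMED_TIERS).items():
    print(f"{tier:>10}: {seconds:.3f} s to scan 100 GB")
```

Whatever the exact figures, the slowest tier on the path from storage to compute sets the scan time. This is the reasoning behind keeping hot data in main memory.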
In a GDBMS, there are two major IO bottlenecks. The first is disk IO and the second is the PCIe bus.

2.6.1 Disk-IO bottleneck

GPUs will not improve performance for disk-based database systems, since most of the time will be spent in disk IO. GPUs improve performance only when data is in main system memory, so it is much better to keep hot data in main memory. 2.6.
3 The Dell EMC PowerEdge R940xa server

Front and rear views of the PowerEdge R940xa

The rapid increase in machine learning and artificial intelligence applications is changing the way enterprises do business. With its 4-socket, 4U design, the Dell EMC PowerEdge R940xa server is well suited to GPU database acceleration on massive data sets. The R940xa offers up to 112 processing cores and up to 6TB of memory for consistently fast response times.
4 The challenge with GPU Databases

The biggest hurdle for GPU databases is achieving efficient parallel processing of SQL joins. This is crucial, as table joins are used extensively in industries such as retail and finance. The traditional approach to running joins on a CPU is not well suited to the hundreds of thousands of cores in a GPU system.
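To make the difficulty concrete, the sketch below shows one generic, textbook way to expose parallelism in a join: a partitioned hash join, in which both tables are split by a hash of the join key so that each pair of partitions can be joined independently. This is not Brytlyt's method, which this paper does not describe; the function names and sample tables are hypothetical, and the partitions are processed in a simple loop here rather than on parallel hardware.

```python
from collections import defaultdict


def partition(rows, key, n):
    """Assign each row to one of n partitions by hashing its join key."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts


def hash_join_partition(left, right, left_key, right_key):
    """Classic build/probe hash join on a single pair of partitions."""
    table = defaultdict(list)
    for l_row in left:                      # build phase
        table[l_row[left_key]].append(l_row)
    return [{**l_row, **r_row}              # probe phase
            for r_row in right
            for l_row in table.get(r_row[right_key], [])]


def partitioned_hash_join(left, right, left_key, right_key, n=4):
    """Join partition i of left with partition i of right, for every i."""
    left_parts = partition(left, left_key, n)
    right_parts = partition(right, right_key, n)
    out = []
    # Each pair is independent of the others, so the pairs could be
    # handed to separate workers; here they run one after another.
    for l_part, r_part in zip(left_parts, right_parts):
        out.extend(hash_join_partition(l_part, r_part, left_key, right_key))
    return out


customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 2},
    {"order_id": 12, "cust_id": 1},
    {"order_id": 13, "cust_id": 3},  # no matching customer
]
joined = sorted(partitioned_hash_join(customers, orders, "cust_id", "cust_id"),
                key=lambda r: r["order_id"])
print(joined)
```

Even in this form, skewed keys can leave a few partitions far larger than the rest, and the build and probe phases access memory irregularly. Both issues are harder to manage as the number of parallel workers grows.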
Performance comparison of Brytlyt

4.2 Background on TPC-H Benchmarking

TPC-H [1] is a decision support benchmark for relational databases, consisting of business-oriented ad-hoc queries and data modifications. The data and queries are designed to have broad industry-wide relevance, so that decision support systems can be compared with one another. Large volumes of data are examined to give answers to business-critical questions using complex queries and high levels of concurrency.
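As an example of the kind of query involved, the sketch below runs a query in the shape of TPC-H Query 6 (a filtered revenue aggregate over the lineitem table) against a tiny in-memory SQLite table. It is only an illustration: the table is trimmed to the four columns the query touches, the rows are made-up toy data, the filter constants are commonly quoted example values rather than values confirmed by this paper, and SQLite is used purely for convenience rather than as a GPU database.

```python
import sqlite3

# Query in the shape of TPC-H Q6; dates are stored as ISO-8601 text.
Q6 = """
SELECT SUM(l_extendedprice * l_discount) AS revenue
FROM lineitem
WHERE l_shipdate >= '1994-01-01'
  AND l_shipdate < '1995-01-01'
  AND l_discount BETWEEN 0.05 AND 0.07
  AND l_quantity < 24
"""


def run_q6(rows):
    """Load (quantity, extendedprice, discount, shipdate) rows and run Q6."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lineitem ("
        "l_quantity REAL, l_extendedprice REAL, "
        "l_discount REAL, l_shipdate TEXT)"
    )
    conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?)", rows)
    (revenue,) = conn.execute(Q6).fetchone()
    conn.close()
    return revenue or 0.0  # SUM over zero rows yields NULL


toy_rows = [
    (10, 1000.0, 0.06, "1994-03-15"),  # qualifies
    (5, 200.0, 0.05, "1994-12-31"),    # qualifies
    (30, 500.0, 0.06, "1994-06-01"),   # quantity too high
    (10, 500.0, 0.10, "1994-06-01"),   # discount out of range
    (10, 500.0, 0.06, "1995-01-01"),   # ship date out of range
]
revenue = run_q6(toy_rows)
print(round(revenue, 2))
```

The query is a single-table scan with a filter and a sum, which is why it parallelizes well and tends to show the highest scan rates in benchmark results.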
The TPC-H benchmark is a well-recognized and highly regarded attempt to model a business database and the ad-hoc decision support questions it must answer. It has been used extensively by database software and hardware vendors to demonstrate the performance of their solutions, and by researchers looking to validate their approaches. TPC-H is a good test for such systems, and most vendors have posted results across a wide range of hardware and data scale factors.
5 Brytlyt + Dell EMC PowerEdge R940xa Benchmarking

Extensive benchmarking was carried out on pre-release Dell R940xa hardware.

5.1 TPC-H benchmarking on PowerEdge R940xa with Brytlyt

Using TPC-H data and queries, a single Dell R940xa server with four NVIDIA P100 GPUs achieved a throughput of 1.9 billion rows per second at 223 GB/second of raw data for Query 1, and 16.8 billion rows per second at 1.8 TB/second of raw data for Query 6.
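A quick sanity check on these figures is to divide the data rate by the row rate, which gives the implied amount of raw data per row. The sketch below does this using only the numbers quoted above; decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) are an assumption, since the paper does not state which convention it uses.

```python
GB = 1e9   # assumed decimal gigabyte
TB = 1e12  # assumed decimal terabyte


def bytes_per_row(bytes_per_second: float, rows_per_second: float) -> float:
    """Implied raw bytes per row for a reported throughput pair."""
    return bytes_per_second / rows_per_second


q1_bytes_per_row = bytes_per_row(223 * GB, 1.9e9)
q6_bytes_per_row = bytes_per_row(1.8 * TB, 16.8e9)

print(f"Query 1: {q1_bytes_per_row:.1f} bytes per row")
print(f"Query 6: {q6_bytes_per_row:.1f} bytes per row")
```

Both results come out at roughly 107 to 117 bytes per row, so the two reported figures are consistent with each other if "raw data" refers to the full row width rather than only the columns each query reads. That reading is an interpretation, not something the paper states.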
5.6 Use Cases for Brytlyt’s GPU Database

In a world where data-driven decision making is more important than ever, improving time-to-value by answering more complex questions, for more people, on ever-growing amounts of data is essential.
6 References

[1] http://www.tpc.org/tpch/
[2] T. Mostak. An overview of MapD (massively parallel database). White Paper, Massachusetts Institute of Technology, April 2013. http://geops.csail.mit.edu/docs/mapd_overview.pdf
[3] https://spectrum.ieee.org/computing/software/data-monster
[4] https://streamhpc.com/blog/2017-01-24/many-threads-can-run-gpu/
[5] GPU Join processing revisited. https://hgpu.org/?p=7692
[6] P. Bakkum and S. Chakradhar. Efficient data management for GPU databases. 2012. http://pbbakkum.