

# THE CPU RUN-TIME LIBRARY

H.J. Lu, Sunil K Pandey

Intel

November, 2018

### Issues with Run-time Library on IA

- Memory, string and math functions in today's glibc are optimized for today's Intel processors:
  - AVX/AVX2/AVX512
  - FMA
- It takes years for a glibc release to reach end users' machines:
  - In 2018, people are still running glibc 2.17, released in February 2013, on SKX, even though the current release, glibc 2.28, has new memory, string and math functions optimized for SKX.
  - The same will happen five years from now.
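A glibc optimized for AVX2 or AVX-512 still has to run on older x86-64 machines. It therefore ships several variants of functions such as memcpy (for example SSE2, AVX2 and AVX-512 versions) and picks one at run time. On x86-64, glibc makes that choice with GNU indirect functions (IFUNC).

The C sketch below imitates the idea with a function pointer that a constructor sets once at load time. It rests on a few assumptions:

- It assumes GCC or Clang.
- On targets other than x86-64 it falls back to the baseline loop.
- `copy_generic`, `copy_avx2` and `demo_copy` are hypothetical names, not symbols from glibc or libcpu-rt-c.

```c
#include <assert.h>
#include <string.h>  /* size_t, and memcpy-style semantics */

/* Signature shared by every variant (same shape as memcpy). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline variant: safe on any CPU. */
static void *copy_generic(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}

#if defined(__x86_64__)
/* Same loop, but the target attribute lets the compiler use AVX2
   instructions in this one function, so it must only be called on a
   CPU that supports AVX2. */
__attribute__((target("avx2")))
static void *copy_avx2(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
#endif

/* Pick the best variant the running CPU supports. */
static copy_fn select_copy(void)
{
#if defined(__x86_64__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return copy_avx2;
#endif
    return copy_generic;
}

/* Resolved once before main() runs. glibc uses IFUNC resolvers for
   this; a constructor is a simpler stand-in. */
static copy_fn copy_impl;

__attribute__((constructor))
static void init_copy_impl(void)
{
    copy_impl = select_copy();
}

/* Hypothetical public entry point. */
void *demo_copy(void *dst, const void *src, size_t n)
{
    assert(copy_impl != NULL);
    return copy_impl(dst, src, n);
}
```

Selecting once at load time keeps the CPU feature check off the hot path. A production library would use hand-written AVX2/AVX-512 code rather than relying on the compiler to vectorize a byte loop.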

### **Proposal**

#### The CPU run-time library

- A subset of glibc:
  - On a branch in the glibc git repo.
- Optimized for x86-64 processors.
  - libcpu-rt-c: memory and string functions.
  - Binary compatible with existing x86-64 OSes.
- OSVs can include it in their Linux distros.
  - The git branch is always ready for packaging.
- For systems without the latest glibc:
  - Use "LD\_PRELOAD=libcpu-rt-c.so" to override functions in libc.so.
  - Use -lcpu-rt-c to link with libcpu-rt-c directly.
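The two deployment options can be compared without changing the application. This is a minimal sketch, assuming Linux with `clock_gettime` and assuming libcpu-rt-c exports `memcpy`. `memcpy_mbps` is a hypothetical helper that times repeated `memcpy` calls. It calls `memcpy` through the dynamically bound symbol, so the same binary measures:

- the system glibc's version when run normally;
- the preloaded version when run under `LD_PRELOAD=libcpu-rt-c.so`.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy a len-byte buffer iters times and return
   the throughput in MB/s (1 MB = 10^6 bytes). */
double memcpy_mbps(size_t len, int iters)
{
    /* Calling through a volatile function pointer stops the compiler
       from inlining or removing the copies. Calls still go to whichever
       memcpy the dynamic linker bound, so a preloaded definition is
       the one being timed. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;

    char *src = malloc(len);
    char *dst = malloc(len);
    assert(src != NULL && dst != NULL);
    memset(src, 'x', len);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        copy(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(src);
    free(dst);
    return secs > 0.0 ? (double)len * iters / secs / 1e6 : 0.0;
}
```

With a small `main` that prints `memcpy_mbps(1 << 20, 1000)`, run the binary twice: once as `./bench` and once as `LD_PRELOAD=libcpu-rt-c.so ./bench`. Linking with `-lcpu-rt-c` instead binds that binary to libcpu-rt-c's versions without needing `LD_PRELOAD`.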



### Machine configuration under test

Processor: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (Haswell)

Number of sockets: 2

RAM: 132 GB

OS: CentOS Linux release 7.5.1804 (Core)

Kernel: CentOS Linux release 7.5.1804 (Core)

System Glibc version: glibc-2.17-222.el7.x86\_64

GCC version used to build/run the benchmarks: latest GCC 8 branch
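The glibc a benchmark runs on can differ from the headers it was compiled against. When reproducing these numbers, it can therefore help to confirm both.

This minimal sketch assumes a glibc-based Linux system. It uses glibc's real `gnu_get_libc_version()` from `<gnu/libc-version.h>` and the `__GLIBC__`/`__GLIBC_MINOR__` macros. `report_glibc_version` is a hypothetical helper name.

```c
#include <assert.h>
#include <gnu/libc-version.h>
#include <stdio.h>

/* Print the glibc version the binary was compiled against (header
   macros) next to the version it is actually running on. */
void report_glibc_version(FILE *out)
{
    const char *running = gnu_get_libc_version();
    assert(running != NULL);
    fprintf(out, "compiled against glibc %d.%d, running on glibc %s\n",
            __GLIBC__, __GLIBC_MINOR__, running);
}
```

On the system described above, the running version would be expected to report 2.17, matching the distribution package.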



### MySQL Test Data (sysbench)

Read transactions per second: (+4.02%)

Read/write transactions per second: (+5.68%)

Write transactions per second: (+1.60%)

Bulk insert transactions per second: (-2.46%)

Delete transactions per second: (+3.80%)

Insert transactions per second: (+1.45%)

Point select: (+1.71%)

Update index transactions per second: (+1.95%)

Update non-index transactions per second: (+1.06%)
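The deck does not state the formula or baseline behind these percentages. The sketch below makes two assumptions:

- Each figure compares a run with libcpu-rt-c against the same benchmark on the system glibc 2.17.
- Every metric is higher-is-better (transactions or requests per second, MB/s).

Under those assumptions, each figure would be a relative change as computed below. `percent_change` is a hypothetical helper.

```c
#include <assert.h>

/* Percent change of a higher-is-better metric.
   Assumption: baseline = system glibc run, candidate = libcpu-rt-c run.
   Positive means libcpu-rt-c was faster, negative means slower. */
double percent_change(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (candidate - baseline) / baseline * 100.0;
}
```

Under that reading, the -2.46% for bulk insert means that throughput dropped slightly with libcpu-rt-c.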



#### Other apps data

Nginx: Requests per second (-0.01%)

Httpd: Requests per second (+1.17%)

Gipfeli: Compression speed (+0.61%)

Gipfeli: Decompression speed (-0.86%)

Snappy: Compression speed (+0.51%)

Snappy: Decompression speed (+1.47%)

Protobuf: Google message1 proto2 serialization speed in MB/s (-0.52%)

Protobuf: Google message1 proto3 serialization speed in MB/s (+0.62%)

Protobuf: Google message2 serialization speed in MB/s (+11.65%)



#### Conclusion

- This proposal is non-intrusive and requires very little change to the run-time environment.
- This proposal has no release overhead: anyone can build it from GitHub and use it.
- Users don't have to recompile their code to take advantage of the libcpu-rt-c library.
- The data were collected on a Haswell processor; an SKX system is expected to show even greater improvement.
- Some old code does not even compile on a new OS with a new compiler because of compatibility issues.
- The overall goal is to deliver maximum performance to end users on Intel platforms, and this proposal meets that requirement with the least overhead.

## Legal Disclaimer & Optimization Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS". NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Copyright © 2016, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

#### **Optimization Notice**

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804



