I'll be talking about the -fanalyzer
static analysis option I added to
GCC:
- overview of the analyzer and its internal implementation
- what I've changed so far for GCC 12
- my plans for further development of the analyzer
("Prepared project report": 25 minutes, including questions)
Points-to analysis is a static code analysis that calculates the pointer-pointee relationship between expressions and static memory locations. The results of the points-to analysis may be used by multiple optimizations and analyses. In particular, a precise points-to analysis is necessary to perform data-layout optimizations at the level of alias sets. We use the high level,...
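As a rough illustration of what a points-to analysis computes, here is a minimal sketch of a textbook inclusion-based (Andersen-style) analysis in C. It is not the high-level approach described in the talk; the constraint encoding, the 32-location bitset limit and all names (solve_points_to, MAX_VARS, etc.) are assumptions made for this example. Each pointer statement becomes a constraint, and the solver iterates until no points-to set grows.

```c
#include <assert.h>
#include <stdint.h>

/* Toy inclusion-based (Andersen-style) points-to analysis over at most
   32 abstract locations; each points-to set is a 32-bit bitset.  */
#define MAX_VARS 32

enum constraint_kind
{
  ADDR,  /* dst = &src */
  COPY,  /* dst = src  */
  LOAD,  /* dst = *src */
  STORE  /* *dst = src */
};

struct constraint
{
  enum constraint_kind kind;
  int dst, src;
};

/* OR BITS into *SET; return nonzero if *SET grew.  */
static int merge (uint32_t *set, uint32_t bits)
{
  uint32_t before = *set;
  *set |= bits;
  return *set != before;
}

/* Apply every constraint repeatedly until a fixpoint is reached.
   Bit i of PTS[v] set means "v may point to location i".  */
static void solve_points_to (const struct constraint *c, int n,
                             uint32_t pts[MAX_VARS])
{
  int changed = 1;
  while (changed)
    {
      changed = 0;
      for (int i = 0; i < n; i++)
        {
          int d = c[i].dst, s = c[i].src;
          assert (d >= 0 && d < MAX_VARS && s >= 0 && s < MAX_VARS);
          switch (c[i].kind)
            {
            case ADDR:
              changed |= merge (&pts[d], UINT32_C (1) << s);
              break;
            case COPY:
              changed |= merge (&pts[d], pts[s]);
              break;
            case LOAD:
              /* d receives everything pointed to by what s points to.  */
              for (int v = 0; v < MAX_VARS; v++)
                if (pts[s] & (UINT32_C (1) << v))
                  changed |= merge (&pts[d], pts[v]);
              break;
            case STORE:
              /* Everything d points to may now point to what s points to.  */
              for (int v = 0; v < MAX_VARS; v++)
                if (pts[d] & (UINT32_C (1) << v))
                  changed |= merge (&pts[v], pts[s]);
              break;
            }
        }
    }
}
```

For the statements p = &a; q = &b; r = &p; *r = q; s = *r; the constraints are {ADDR,p,a}, {ADDR,q,b}, {ADDR,r,p}, {STORE,r,q} and {LOAD,s,r}, and the solver concludes that both p and s may point to a or b.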
LLVM has two main test suites:
- the regression test suite tests the compilation from source to IR; and
- the nightly test suite is a body of often large applications which are compiled and executed.
However, there is no large body of tests of detailed functionality that is compiled right down to the target object code and executed. At previous conferences, we described the...
Bunsen is a toolkit for compact storage and analysis of DejaGNU test results. The toolkit includes a storage engine that compresses and indexes a large collection of test result logs in a Git repository, a Python library for querying and analyzing the test result collection, and a simple CGI service for accessing query results through a web browser.
In this talk I will give an in-depth look...
Abstract:
AMD has been working on adding support for GPU compute debugging to GDB. Early on, it became apparent that current DWARF would not be sufficient to support optimized SIMT/SIMD code, so we came up with extensions and generalizations that we intend to propose to DWARF 6. Although designed with GPUs in mind, the extensions are generic and can just as well be used to improve quality of...
CTF (Compact C Type Format) is a debugging format whose main (but not only) purpose is to convey type information of C program constructs. BTF is a similar format used in the Linux kernel to support the portable execution of BPF programs. Both formats share a common ancestor and show some remarkable similarities. However, they are not the same format; their application goals are different, are...
BPF is a virtual machine that resides in the Linux kernel. Initially intended for user-level packet capture and filtering, BPF has since been generalized into a general-purpose infrastructure that also serves non-networking purposes. BPF programs are often written by hand, directly in assembly instructions. However, people often want to write their BPF programs in C. We recently added support...
Prepared presentation
In this talk we present an overview of gprofng, a next generation profiling tool for Linux.
This profiler has its roots in the Performance Analyzer from the Oracle Developer Studio product. However, gprofng is a standalone tool that specifically targets Linux. It includes several tools to collect and view the performance data. Various processors from Intel, AMD, and...
This talk will discuss the methods used to construct the recent improvement to complex divide in libgcc, which reduced the gross error rate from more than 1 per 100 tests to fewer than 1 per 10 million tests. The accuracy improvement is platform independent, while the modest performance loss varies by platform. We also discuss flaws and the areas most likely to help in reducing the remaining small errors.
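For context, the classic starting point for accurate complex division is Smith's algorithm, sketched below in C. This is not the libgcc code discussed in the talk, which adds further refinements; the function name smith_cdiv is hypothetical. Scaling by the larger of |c| and |d| avoids computing c*c + d*d directly, a major source of overflow and underflow in the naive formula.

```c
#include <assert.h>
#include <math.h>

/* Smith's algorithm for (a + bi) / (c + di).  Dividing numerator and
   denominator by the larger of c and d (in magnitude) keeps the ratio r
   within [-1, 1] and never forms c*c + d*d.  */
static void smith_cdiv (double a, double b, double c, double d,
                        double *re, double *im)
{
  assert (c != 0.0 || d != 0.0);
  if (fabs (c) >= fabs (d))
    {
      double r = d / c;
      double den = c + d * r;
      *re = (a + b * r) / den;
      *im = (b - a * r) / den;
    }
  else
    {
      double r = c / d;
      double den = c * r + d;
      *re = (a * r + b) / den;
      *im = (b * r - a) / den;
    }
}
```

For example, dividing 1e300 + 1e300i by 1e300 + 1e300i yields exactly 1 + 0i with this method, whereas the naive formula overflows when squaring the denominator.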
The malloc library provided by glibc offers considerable flexibility in deciding when to use mmap for larger allocations and when to use sbrk/trim. The default settings for the decision thresholds are reasonable for many applications. Three tunables are available to adjust these settings. The limits on these settings have not been changed since 2006. Server class systems now have much more...
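The thresholds can also be changed from within a program through mallopt(). The sketch below assumes glibc (mallopt and the M_* constants come from glibc's <malloc.h>) and assumes that the parameters of interest are the mmap threshold, the trim threshold and the top pad; the values are purely illustrative, not recommendations.

```c
#include <assert.h>
#include <malloc.h>
#include <stdlib.h>

/* Hypothetical helper: set glibc's malloc thresholds explicitly.
   mallopt returns 1 on success and 0 on error.  */
static int configure_malloc (void)
{
  int ok = 1;
  /* Requests of at least this size are served by mmap, not the heap.  */
  ok &= mallopt (M_MMAP_THRESHOLD, 4 * 1024 * 1024);
  /* When free space at the top of the heap exceeds this, it is trimmed
     and returned to the system.  */
  ok &= mallopt (M_TRIM_THRESHOLD, 8 * 1024 * 1024);
  /* Extra memory requested from the system whenever the heap grows.  */
  ok &= mallopt (M_TOP_PAD, 1024 * 1024);
  return ok;
}
```

The same parameters can be set without recompiling through the GLIBC_TUNABLES environment variable, for example GLIBC_TUNABLES=glibc.malloc.mmap_threshold=4194304. Setting any of these parameters explicitly also disables glibc's dynamic adjustment of the mmap threshold.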
On systems with copy relocation:
- At run time, ld.so creates a copy in the executable for a definition in a shared library.
- The copy is referenced by the executable and the shared libraries.
- The executable can access the copy directly.
Issues are:
- The overhead of the copy, in time and space, may be visible at run time.
- Read-only data in the shared library becomes a read-write copy in...
The Intel LAM (Linear Address Masking) extension allows software to store metadata in data pointers and dereference them without needing to mask out the metadata bits. It supports:
- LAM_U48: Activate LAM for user data pointers and use of bits 62:48 as masked metadata.
- LAM_U57: Activate LAM for user data pointers and use of bits 62:57 as masked metadata.
I am presenting a proposal to...
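To make the bit layout concrete, the sketch below packs a 6-bit tag into bits 62:57, the LAM_U57 metadata field listed above, and shows the explicit masking software needs when LAM is not enabled. It assumes a 64-bit target with user-space pointers whose bits 62:57 are zero; enabling LAM itself is an operating-system step that is not shown, and the lam57_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

_Static_assert (sizeof (void *) == 8, "sketch assumes 64-bit pointers");

/* LAM_U57 metadata field: bits 62:57 (6 bits).  Bit 63 is left alone.  */
#define LAM57_SHIFT 57
#define LAM57_MASK (UINT64_C (0x3f) << LAM57_SHIFT)

/* Store TAG in the metadata bits of P.  */
static void *lam57_tag (void *p, unsigned tag)
{
  assert (tag <= 0x3f);
  uint64_t v = (uint64_t) (uintptr_t) p;
  v = (v & ~LAM57_MASK) | ((uint64_t) tag << LAM57_SHIFT);
  return (void *) (uintptr_t) v;
}

/* Read the metadata bits back out of P.  */
static unsigned lam57_get_tag (const void *p)
{
  return (unsigned) (((uint64_t) (uintptr_t) p & LAM57_MASK) >> LAM57_SHIFT);
}

/* Software fallback without LAM: clear the metadata before dereferencing.
   With LAM_U57 active, the CPU ignores these bits and this is unneeded.  */
static void *lam57_untag (void *p)
{
  return (void *) (uintptr_t) ((uint64_t) (uintptr_t) p & ~LAM57_MASK);
}
```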
The existing implementation of the OpenACC "kernels" construct in GCC
is unable to cope with many language constructs found in real HPC
codes, which generally leads to very poor performance. This talk
presents upcoming changes to the "kernels" implementation that improve
the performance significantly:
- A more unified internal representation of "kernels" and "parallel"
regions as a...
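For readers unfamiliar with the construct, here is a minimal C sketch of a "kernels" region. With "kernels" the compiler analyzes the enclosed loops and decides what to parallelize, whereas with "parallel" the programmer asserts that the region is safe to run in parallel. The function saxpy is a hypothetical example, not taken from the talk; when compiled without -fopenacc, GCC ignores the pragma and the loop runs serially.

```c
#include <assert.h>

/* y = a*x + y inside an OpenACC kernels region: the compiler decides
   whether and how to parallelize and offload the loop.  */
static void saxpy (int n, float a, const float *restrict x, float *restrict y)
{
  assert (n >= 0);
#pragma acc kernels copyin(x[0:n]) copy(y[0:n])
  for (int i = 0; i < n; i++)
    y[i] = a * x[i] + y[i];
}
```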
BoF to discuss topics related to concurrency and offloading work onto AMD and NVIDIA accelerators using OpenMP and OpenACC.
In particular, we will discuss implementing the missing OpenMP 5.0 and 5.1 features, including memory allocators, unified shared memory, C++ attributes, etc.
Related topics and trends can also be discussed, be it base language concurrency features, offloading without using...
The annual GNU Toolchain mindfulness and meditation session. A cordial question-and-answer session with the GCC Steering Committee and the GLIBC, GDB and Binutils stewards will also be entertained.
This is a lightning talk.
One of the hurdles necessary to overcome for the M1 Darwin GCC port is
supporting the Darwin ABI specification. GCC is designed to handle
argument passing the same way, regardless of whether the argument is
named or variadic. However, this leaves no scope to accommodate the
Darwin modifications to the AArch64 ABI, which specifies that...
Recent x86 processors support "non-temporal" stores, which bypass the cache when storing data. It is widely understood that normal stores to cache are appropriate when the data is likely to be needed before the cache fills up. It is also understood that stores of large blocks of data which exceed the available cache allow the overall application to run faster when the block of stores...
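As a sketch of what a non-temporal store looks like in code, the C function below copies a buffer with SSE2 streaming stores (_mm_stream_si128) and falls back to memcpy on other targets or for unaligned inputs. The name copy_nt and its alignment restrictions are assumptions for this example; it does not implement the size threshold that decides when streaming stores pay off, which is the subject of the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy N bytes from SRC to DST, using cache-bypassing stores when DST is
   16-byte aligned and N is a multiple of 16; otherwise use memcpy.  */
static void copy_nt (void *dst, const void *src, size_t n)
{
#if defined(__SSE2__)
  if ((uintptr_t) dst % 16 == 0 && n % 16 == 0)
    {
      __m128i *d = (__m128i *) dst;
      const char *s = (const char *) src;
      for (size_t i = 0; i < n / 16; i++)
        /* Unaligned load from SRC, non-temporal store to DST.  */
        _mm_stream_si128 (d + i,
                          _mm_loadu_si128 ((const __m128i *) (s + 16 * i)));
      /* Streaming stores are weakly ordered; fence before later stores.  */
      _mm_sfence ();
      return;
    }
#endif
  memcpy (dst, src, n);
}
```

In glibc, the x86 memcpy implementations themselves switch to non-temporal stores above a size threshold, which can be adjusted with the glibc.cpu.x86_non_temporal_threshold tunable.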
There are multiple security features that have been requested for the Linux Kernel for a long time (https://outflux.net/slides/2020/lpc/gcc-and-clang-security-feature-parity.pdf). This wishlist includes wipe call-used registers on return, auto-initialization of stack variables, unsigned overflow detection, etc …
Some of these security features have been available in Clang, or other...
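One item on the wishlist, wiping call-used registers on return, is available in GCC 11 and later as the -fzero-call-used-regs option and the zero_call_used_regs function attribute, intended to mitigate return-oriented programming and information leakage through registers. The sketch below applies the attribute to a hypothetical helper that handles a key; the WIPE_USED_REGS macro and its compiler/target guard are assumptions meant to keep the example building where the attribute is unavailable.

```c
#include <assert.h>
#include <stddef.h>

/* Use the attribute only where we assume it is supported: GCC 11+ on
   x86-64.  Elsewhere the macro expands to nothing.  */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 11 \
    && defined(__x86_64__)
#define WIPE_USED_REGS __attribute__ ((zero_call_used_regs ("used")))
#else
#define WIPE_USED_REGS
#endif

/* Hypothetical helper mixing a secret KEY into a hash of BUF.  On return,
   call-used registers it touched are zeroed so the key does not linger.  */
WIPE_USED_REGS
static unsigned mix_with_key (const unsigned char *buf, size_t n, unsigned key)
{
  unsigned h = key;
  for (size_t i = 0; i < n; i++)
    h = (h * 31u) ^ buf[i];
  return h;
}
```

For auto-initialization of stack variables, GCC 12 added -ftrivial-auto-var-init=zero (or =pattern), matching the Clang option of the same name.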
Discuss topics related to the rs6000 / Power / PowerPC toolchain, including support for Power10.
The GNU C Library is used as the C library in the GNU systems and most systems with the Linux kernel. The library is primarily designed to be a portable and high performance C library. It follows all relevant standards including ISO C11 and POSIX.1-2008. It is also internationalized and has one of the most complete internationalization interfaces known.
This BoF aims to bring together...
The 1999 revision of ISO C removed implicit function declarations from the language. Instead, all functions must be declared (with or without a prototype) before they can be called. In previous language versions, a function f was implicitly declared as extern int f (); if the identifier f was used in a call expression (such as f (1, 2, 3.0)).
When GCC switched the default to C99...
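A short C example of the rule: the declaration must appear before the call. The function names are arbitrary and chosen only for illustration; the comment describes what the pre-C99 implicit declaration would have meant if the prototype were missing.

```c
#include <assert.h>

/* Since C99, f must be declared before the call in call_f.  Without this
   prototype, C89 would implicitly declare f as extern int f ();: the
   arguments would get only the default argument promotions and the result
   would be assumed to be int.  If the definition used other parameter or
   return types, the call would have undefined behavior.  */
int f (int a, int b, double c);

int call_f (void)
{
  return f (1, 2, 3.0);
}

int f (int a, int b, double c)
{
  return a + b + (int) c;
}
```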
A demonstration of debugging OpenMP/OpenACC kernels using GDB, and a quick overview of how it was achieved and what still needs to be done.
We discuss the implementation of a new inter-procedural mod/ref pass. The pass collects information about memory locations modified or read by a given function, as well as information useful for points-to analysis (such as whether a given parameter can escape to global memory or to the function's return value).
The first version of the mod/ref pass was contributed to GCC 11 and is...
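To illustrate the kind of fact a mod/ref summary records, consider the C sketch below. The functions are hypothetical, and whether GCC performs this particular optimization depends on the version and options. The point is that const on the parameter does not by itself guarantee that the callee leaves *p unmodified (C allows casting const away when the object itself is not const), so the optimizer needs a summary of what the callee actually reads and writes.

```c
#include <assert.h>

static int counter;

/* Reads *P, writes only the global COUNTER, and does not let P escape.
   noinline keeps the call in place, so callers must rely on a summary
   of this function rather than on its inlined body.  */
__attribute__ ((noinline)) static int read_and_count (const int *p)
{
  counter++;
  return *p;
}

static int caller (void)
{
  int local = 5;
  int a = read_and_count (&local);
  /* Given a summary stating that read_and_count never writes through P
     and that P does not escape, the compiler may assume LOCAL is still 5
     here instead of reloading it from memory.  */
  return a + local;
}
```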
CORE-V is a family of RISC-V processor cores developed to commercially robust standards by the Open Hardware Group, a consortium of industrial and academic organizations.
In the first part of this talk we give an update on the work on the GNU tool chain for the CV32E40P, the first of the CORE-V family with custom extensions for branching, autoincrement load/store, hardware loops, multiply...
This is more of a placeholder than anything else: there's an email thread going around that was a bit inconclusive as to whether or not we should have one of these, so I figured it'd be easier to just make one.
There are a number of optimizations done in the middle end that would benefit from understanding the amount of register pressure. Unrolling, inlining, and parallel reassociation are some that come to mind immediately. I think it would be good to have a discussion about how these optimizations might get pressure information to know how aggressive they should be.