3D graphics hardware and APIs (such as Vulkan) are evolving rapidly. At the same time, windowing systems evolve with new protocols, use cases, formats, synchronization mechanisms and so on. To support all of this effectively, GPU drivers separate the implementation of Windowing System Integration (WSI) from the core 3D rendering.
In Vulkan, WSI is implemented through windowing-specific surface extensions and a swapchain implementation. However, through the Vulkan layer mechanism, we can naturally decouple the WSI code from the core GPU driver. This can make development simpler by allowing people to focus on either supporting new windowing systems and features or new GPU hardware drivers.
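The layering idea can be sketched in miniature. This is an illustrative toy model, not Vulkan's actual dispatch machinery; all class and method names are hypothetical. The layer sits above the driver, implements the WSI entry points itself, and forwards core calls down the chain:

```python
# Toy model of the Vulkan layer chain: the WSI layer implements
# presentation itself and forwards core rendering calls to the
# driver below it.

class CoreDriver:
    """Stands in for the vendor's core Vulkan driver."""
    def queue_submit(self, work):
        return f"rendered:{work}"

class WsiLayer:
    """Stands in for the swapchain layer: WSI calls terminate here,
    core calls pass through to the next element in the chain."""
    def __init__(self, next_in_chain):
        self.next = next_in_chain

    def queue_submit(self, work):       # core call: forwarded unchanged
        return self.next.queue_submit(work)

    def queue_present(self, image):     # WSI call: handled by the layer
        rendered = self.next.queue_submit(image)
        return f"presented:{rendered}"

chain = WsiLayer(CoreDriver())
print(chain.queue_present("frame0"))   # presented:rendered:frame0
```

The point of the decoupling is visible even in this sketch: `CoreDriver` can be swapped for any vendor's driver without touching the presentation logic, and vice versa.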
Using the Vulkan specification with extensions as the interface between WSI code and rendering drivers enables more cross-vendor sharing. It also encourages more standardization in how drivers integrate with the OS and leads to more feature-rich Linux graphics stacks.
Introducing the Vulkan WSI layer: the starting point for a driver-agnostic Vulkan swapchain implementation. We've open-sourced a working layer that implements VK_EXT_headless_surface and VK_KHR_swapchain as a starting point for developing a generic implementation for the different Linux windowing systems.
We'll present the project and its current status, and open the floor for discussion on how we can best collaborate on this important piece of the wider Linux graphics puzzle.
RADV (the Radeon Vulkan driver) has long used LLVM as its shader compiler backend. However, LLVM has a number of issues which led us to develop an alternative shader compiler, ACO, that leans strongly on the shared NIR intermediate representation. This new compiler is showing significant gains in compile time as well as runtime performance. We will talk about our pain points with LLVM and how ACO solves them, the overall design of ACO, as well as the challenges we see and the plans we have for the future.
(Google), Daniel Schürmann
Compilers are hard and there are always a lot of design decisions involved in trying to come up with the right architecture to target any given piece of hardware. In this talk, Jason will go over some of the design decisions (especially the mistakes) that have been made by the Intel team as well as other established back-ends and how they have worked out in practice. These will be used as motivating examples to discuss current back-end compiler best practices and suggestions for developers working on new back-ends.
GPUs often provide half-float 16-bit registers for floating-point calculations. Using these instead of full-precision 32-bit registers can provide a significant performance benefit, particularly on embedded GPUs. These registers are exposed to applications in OpenGL ES via precision qualifiers: variables can be marked mediump, meaning that the driver is allowed to use a lower precision for any operations involving them. The GLES spec makes the lower precision optional, so it is always valid to use a higher precision instead. Mesa currently implements the spec by effectively ignoring the precision qualifiers and always using full precision.
This talk will present ongoing work at Igalia to implement a lowering pass that converts mediump operations to 16-bit float operations. The work is targeting the Freedreno driver, but the resulting lowering pass may be applicable to other drivers too.
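To illustrate why such a pass must be careful, here is a small, driver-independent sketch of what storing a value at mediump (16-bit) precision does to it; `to_fp16` is just a demonstration helper, not part of Mesa:

```python
import struct

def to_fp16(x):
    # Round-trip a value through IEEE 754 binary16 (struct's 'e'
    # format), mimicking the precision a mediump-lowered operation
    # would retain.
    return struct.unpack('e', struct.pack('e', x))[0]

print(to_fp16(3.14159265))  # 3.140625: only ~3 decimal digits survive
print(to_fp16(0.1))         # 0.0999755859375
```

The roughly three decimal digits of binary16 are plenty for, say, color math, but lowering must not apply to operations whose results feed precision-sensitive computations.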
As more applications come to Linux and drivers move to NIR, the need to perform both application-specific and device-specific shader optimizations increases. Over the past year, numerous enhancements to existing optimizations and new optimization passes have been implemented. Tools and techniques developed from that experience will be presented. The emphasis will be on finding, diagnosing, and validating various kinds of peephole optimization passes and optimizations for NIR's algebraic optimization pass.
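As a flavor of what an algebraic optimization looks like, here is a toy sketch written in the spirit of NIR's search-and-replace rules (the function and rule set here are illustrative, not NIR's actual representation):

```python
def opt_algebraic(op, a, b):
    # Two illustrative rules, analogous in spirit to entries in
    # NIR's algebraic optimization pass:
    if op == 'fmul' and b == 2.0:
        return ('fadd', a, a)   # strength reduction: a*2 -> a+a
    if op == 'fadd' and b == 0.0:
        return a                # identity: a+0 -> a
    return (op, a, b)           # no rule matched; leave unchanged

print(opt_algebraic('fmul', 'x', 2.0))  # ('fadd', 'x', 'x')
print(opt_algebraic('fadd', 'x', 0.0))  # x
```

The hard parts the talk addresses are not the rules themselves but finding which rules are missing and proving they are correct for all inputs (NaNs, infinities, signed zeros).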
Linux Graphics CI: Standardizing the kernel CI workflow and hardware, and improving our testsuites
There are many Linux kernel-testing projects, most of them modeled on proven software-testing workflows. These workflows, however, often rely on a stable host platform and stable test results and, as such, don't apply to testing development versions of Linux, where the filesystem, network, boot, or suspend might be unreliable.
The Intel GFX CI debuted in 2016 with a different workflow: providing curated pre-merge testing results in a timely manner for all patch series posted on the intel-gfx mailing list. The IGT tests are executed on Intel platforms spanning from 2004 to upcoming hardware. Known issues are automatically associated with bugs so that the report focuses on what the developer is trying to change, making the change easier to review.
After years of experimenting with and refining this workflow, the GFX CI team became confident that it was generic enough, and went on to standardize interfaces between the different components in order to enable other drivers to reproduce the testing workflow and collaborate on the development of IGT and related tools.
An example of such related tooling comes from Google's ChromeOS validation hardware (Chamelium), which acts as an open-hardware, re-programmable screen with DP, HDMI, and VGA inputs. After initial Chamelium support work from Red Hat in IGT, Intel took over the project and has achieved a level of testing for DisplayPort and HDMI comparable to their official conformance test suites. This massively increases the level of testing achievable in an automated system, and not just for Intel: it applies to any GPU supporting DP and/or HDMI.
Finally, a new test suite for the KMS interface is being designed around VKMS in order to test how Xorg and Wayland compositors behave in the presence of GPU (un)hotplugging, bandwidth limitations for planes, DP link status issues, and so on. This should further improve the reliability of userspace when it comes to hard-to-reproduce events, regardless of the GPU driver being used!
In this talk, I will compare the different Linux testing projects, introduce the i915 CI workflow and tools, the open-sourcing and standardization effort going on in i915-infra, the recent developments in IGT/Chamelium, and the plan to test Wayland compositors. Let's work together on standardizing our testing and moving to a model where not just the i915 driver but all drivers are validated before every commit!
Outreachy Internship Report: Refactoring backlight and spi helpers in drm/tinydrm
In this talk, I will briefly describe my contributions as an Outreachy Round 15 intern. Broadly, I worked on refactoring code in the tinydrm subsystem; specifically, I refactored the backlight and SPI helper functions.
The Khronos Group industry consortium has created and evolved many key graphics open standards such as OpenGL and Vulkan – and strongly supports their use in open source platforms. Come and hear about the latest Khronos initiatives and roadmap updates, and participate in an AMA session with Neil Trevett, Khronos President.
Some DRM drivers have been exposing overlay planes for quite some time. Overlay planes can improve battery life by scanning out client buffers directly, skipping composition. While compositors usually take advantage of the cursor plane (and are sometimes able to use the primary plane to scan out a client's buffer directly), overlay planes remain under-used.
The exception is Weston, which tries to use overlay planes (more work is underway, see https://gitlab.freedesktop.org/wayland/weston/issues/275). Other compositors ignore overlay planes.
The main challenge is to figure out how to assign buffers coming from clients to hardware planes. The only API exposed by KMS is atomic test commits, so user-space needs to try different combinations.
It would be nice to have a common library shared between compositors to de-duplicate the work. The library I have in mind offers an API similar to Android's hwcomposer: you give it a scenegraph, it figures out how to allocate planes.
I've started an experiment to figure out whether such a library would be viable: https://github.com/emersion/libliftoff
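As a rough sketch of the search such a library has to perform (the function names are hypothetical, and `test_commit` stands in for an atomic TEST_ONLY commit):

```python
def assign_planes(layers, planes, test_commit):
    """Greedily promote scene-graph layers to hardware planes; any
    layer left unassigned falls back to composition.  test_commit
    models a DRM atomic test commit: it returns True only if the
    hardware accepts the candidate configuration."""
    assignment = {}
    for layer in layers:
        for plane in planes:
            if plane in assignment.values():
                continue  # plane already taken by an earlier layer
            candidate = dict(assignment)
            candidate[layer] = plane
            if test_commit(candidate):
                assignment = candidate
                break  # layer promoted; move on to the next layer
    return assignment

# Fake device that accepts at most two planes in a single commit.
result = assign_planes(
    ["cursor", "video", "ui"],
    ["plane1", "plane2"],
    lambda cand: len(cand) <= 2,
)
print(result)  # {'cursor': 'plane1', 'video': 'plane2'}
```

A real library must also handle ordering, scaling, and format constraints, and cope with the combinatorial explosion of candidate configurations, which is exactly what makes a shared implementation worthwhile.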
Getting feedback from compositor writers and DRM experts would be useful to push the project forward. Come and help make planes useful for compositors!
Lightning talks: Demos / Lightning talks
Lightning talks are scheduled as time permits throughout the assigned time block. Make sure you've uploaded your slides before the slot starts; laptop switching is for demos only. Please be ready!
XDC 2019 Reception @ Bier Markt
1221 René-Lévesque Blvd West, Montréal
After a busy first day of presentations, we've reserved a space for all attendees at Bier Markt on René-Lévesque Boulevard West, and the first round of drinks, whether alcoholic or non-alcoholic, will be sponsored by X.Org! With one of the best beer lists in Montreal (150 beers from over 30 countries, including over 40 local crafts) and some great food, it will be the perfect place to unwind and network with fellow attendees!
VR took off for consumers with the release of the Oculus consumer hardware. But the hardware lacked open source drivers and Linux support in general, and the OpenHMD project was created to solve this. The consumer VR space has since grown from a Kickstarter campaign into a large industry. This growth has its downsides: multiple companies ship their own competing APIs. Luckily, these companies have agreed to work on a single API under the Khronos umbrella. Now that the provisional OpenXR spec has been released, the Monado project has been launched as a follow-up to OpenHMD.
In this talk, Jakob will cover Monado and Khronos' OpenXR standard, give an overview about the current state of open source VR and what lies ahead.
Jakob is XR Lead at Collabora, working on graphics and virtual reality, and a member of the OpenXR working group. He has worked on Linux graphics since 2006, starting at Tungsten Graphics and moving to VMware. In 2013 he and a friend started the OpenHMD project, and in the spring of 2019 he was involved in launching both Monado and OpenXR at GDC.
Christoph Haag, Joey Ferwerda
(Collabora / OpenHMD)
Improving frame timing accuracy in Mesa, DRM and X
Smooth animation of graphics requires that the presentation timing of
each frame be controlled accurately by the application so that the
contents can be correctly adjusted for the display time. Controlling
the presentation timing involves predicting when rendering of the
frame will be complete and using the display API to request that the
frame be displayed at a specific time.
Predicting the time it will take to render a frame usually draws upon
historical frame rendering times along with application
heuristics. Once drawn, the display API is given the job of presenting
the content to the user at the specified time. A failure of either of
these two mechanisms will result in content being delayed, and a
stuttering or judder artifact made visible to the user.
Historical timing information includes both the time taken to render a
frame with the GPU along with the actual time each frame was displayed
to the user. Ideally, the application will also be given some estimate
of how long it will take to ready the frame for display once the
presentation request has been delivered to the display system. With
these three pieces of information (application GPU time, actual
display time, presentation overhead), the application can estimate
when its next frame will be ready for display.
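A minimal sketch of how those three inputs combine (illustrative only; real implementations use richer heuristics than a max over recent frames):

```python
import math

def predict_present(gpu_ms_history, overhead_ms, now_ms,
                    last_vblank_ms, refresh_ms):
    """Estimate when the next frame can appear on screen from the
    three inputs named above: past GPU render times, presentation
    overhead, and actual display (vblank) timing."""
    render_ms = max(gpu_ms_history[-5:])         # pessimistic recent estimate
    ready_ms = now_ms + render_ms + overhead_ms  # when the frame is ready
    # The frame lands on the first vblank at or after it is ready.
    vblanks = math.ceil((ready_ms - last_vblank_ms) / refresh_ms)
    return last_vblank_ms + vblanks * refresh_ms

t = predict_present([3.0, 4.0, 3.5], 1.0, 100.0, 98.0, 16.667)
print(round(t, 3))  # 114.667: the frame makes the next vblank
```

If either the render estimate or the overhead estimate is wrong, the frame slips a whole vblank, which is exactly the stutter the work described below aims to eliminate.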
The following work is underway to provide applications this
information and to improve the accuracy of display presentation timing
in the Linux environment.
- Vulkan GOOGLE_display_timing extension implementation in Mesa. This offers applications some fairly straightforward measurements that can help predict when a frame timing target might be missed.
- Heuristics in the X Composite and Present extension implementations to improve the accuracy of display times reported to applications.
- Additions to Composite that replace the above heuristics with precise timing information, for compositing managers modified to support these additions.
- Semi-automatic compositing support added to the Composite extension, which allows in-server compositing of some windows to reduce variability in the display process.
This presentation will describe the above work and demonstrate the
benefits of the resulting code.
A case study on frame presentation from user space via KMS
Traditionally, an application has had very little control over when a rendered frame is actually displayed. For games, this uncertainty can cause animation stuttering. A Vulkan prototype extension was added to address this problem.
XR (AR/VR) applications similarly need accurate knowledge of presentation
timestamps in order to predict the head-pose for the time a frame will be
displayed. Here, inaccuracies lead to registration errors (i.e. mismatch
between virtual and real head pose), causing users to get motion sickness or to
experience swimming of virtual content.
XR compositors also optimize for latency. An already-rendered frame is corrected for the most recent head pose right before its scan-out to the display. The time between the correction of a frame and its presentation determines the resulting latency. To keep this value as low as possible, a compositor needs to know how late a frame can be scheduled while still making the desired presentation deadline.
The Atomic KMS API is the lowest-level cross-driver API for programming
display controllers on Linux. With KMS, buffers can be submitted directly from
user space for display, circumventing traditional presentation layers of
graphics APIs (e.g. EGL surfaces or Vulkan swapchains). This way, applications
gain exclusive access to the display engine for maximum control. Collabora and
DAQRI recently published the kms-quads sample project to demonstrate this
technique. While working on this, we identified several issues with the KMS API that make it challenging to implement tightly scheduled buffer presentations as required by the use cases mentioned above. For instance, it is not well defined which part of the scan-out signal the timestamps provided by KMS refer to. Furthermore, it is unclear what the latest point in time is at which a buffer can be submitted to make a specific presentation deadline. The advent of adaptive-sync support in KMS makes this topic even more complex.
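The scheduling question reduces to simple arithmetic; the catch, as argued above, is that KMS today does not tell userspace what the setup cost actually is. All names here are hypothetical:

```python
def latest_commit_ns(last_vblank_ns, period_ns, setup_ns, margin_ns):
    # Latest moment an atomic commit can be issued and still hit the
    # next scan-out: the next vblank, minus the driver/hardware
    # programming cost (setup_ns, which KMS does not report today),
    # minus a safety margin.
    next_vblank_ns = last_vblank_ns + period_ns
    return next_vblank_ns - setup_ns - margin_ns

# 60 Hz panel, assumed 2 ms programming cost, 0.5 ms margin:
print(latest_commit_ns(0, 16_666_667, 2_000_000, 500_000))  # 14166667
```

An XR compositor wants to commit as close to this moment as possible, since everything before it is latency between head-pose correction and photons.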
This talk should serve as an introduction to, and summary of, user-driven presentation timing via KMS, based on last year's experience of implementing a KMS-based AR compositor at DAQRI. We will discuss the use case and its implementation, and demonstrate open problems in this area, hopefully leading to further discussion at the venue.
I will discuss the results of an effort to implement the concepts from my prior generic Unix device memory allocator talks as extensions to the existing GBM API, with a nouveau driver backend. Based on prior feedback, DRM format modifiers are used in place of what were previously called "capabilities". I will attempt to demonstrate the feasibility of this less-invasive approach, and compare its performance and efficiency to existing GBM usage models.
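The modifier-based approach can be sketched as a simple negotiation; the helper below is hypothetical and not the actual GBM API, but it shows the shape of the logic that replaces the old "capabilities" merge:

```python
def negotiate_modifier(producer_mods, consumer_mods):
    """Pick a DRM format modifier both devices support.  Each list
    advertises what one device can produce or consume, ordered by
    preference (e.g. tiled layouts before LINEAR)."""
    for mod in producer_mods:
        if mod in consumer_mods:
            return mod
    raise RuntimeError("no common modifier; cannot share this buffer")

# A tiled-only producer and a consumer that also accepts LINEAR
# agree on the only layout both understand:
print(negotiate_modifier(["X_TILED", "LINEAR"], ["Y_TILED", "LINEAR"]))  # LINEAR
```

The advantage over opaque capabilities is that modifiers are already a cross-driver currency: the same 64-bit values flow through KMS, EGL, and Vulkan.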
While investigating a performance issue with the F1 2017 game benchmark, we identified some bottlenecks related to how ttm and amdgpu do buffer validation and LRU handling. This ultimately led to a major redesign of how we handle buffer migration. This talk describes the process we took to identify and fix the bottleneck, and what we learned along the way.
                            Talos Principle  Clpeak     BusSpeedReadback (ms)
                            (Vulkan, FPS)    (OCL, us)  1K    2K    4K    8K    16K
Original                    162.1            42.15      0.254 0.241 0.230 0.223 0.204
Bulk move                   162.4            44.48      0.260 0.274 0.249 0.243 0.228
Original (PT BOs on LRU)    147.7            76.86      0.319 0.314 0.308 0.307 0.310
Bulk move (PT BOs on LRU)   163.5            40.52      0.244 0.252 0.213 0.214 0.225  <- best latency and highest FPS
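The core idea behind the bulk move can be sketched in a few lines (a toy model of the LRU list, not ttm's actual linked-list implementation):

```python
def bulk_move_to_tail(lru, first, last):
    """Move the contiguous range lru[first:last+1] to the tail in one
    splice, instead of re-linking each buffer object individually on
    every command submission (O(n) pointer churn per submit)."""
    chunk = lru[first:last + 1]
    del lru[first:last + 1]
    lru.extend(chunk)
    return lru

# A process's buffers (2 and 3) become most-recently-used in one step:
print(bulk_move_to_tail([1, 2, 3, 4, 5], 1, 2))  # [1, 4, 5, 2, 3]
```

Keeping each process's buffer objects contiguous on the LRU is what makes the single splice possible, and is why the combined "bulk move + PT BOs on LRU" row in the table wins on both latency and FPS.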
Lima is an open source graphics driver which supports Mali 400/450 embedded GPUs from ARM via reverse engineering.
Recently, after many years of reverse engineering efforts on these devices, the lima driver has finally been upstreamed in both Mesa and the Linux kernel.
This talk will cover some information about the target GPUs and implementation details of the driver.
It will also include a history of the work done so far to make this possible, and the recent efforts which led to its inclusion upstream.
Lima is a project under development and many features are still missing for it to become a complete graphics driver.
The aim of this talk is to discuss its current state and how it is expected to improve going forward.
DP adaptive sync, a feature supported by AMD under the marketing name of Freesync, primarily allows for smoother gameplay but also enables other use cases, like idle desktop powersaving and 24Hz video playback. In this talk we'll describe what adaptive sync is, how it works, and will speak to different use cases and how they might be implemented. The presentation will cover design and code snippets to show how the feature is enabled.
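For example, 24 Hz video on a panel whose adaptive-sync range is 48-144 Hz is handled by repeating each frame until the rate falls inside the range, then holding each repeat for exactly one interval. A rough sketch of that calculation (hypothetical helper, not driver code):

```python
def vrr_frame_duration_ms(content_fps, vmin_hz, vmax_hz):
    # Double the presentation rate until it fits the panel's
    # adaptive-sync range; each content frame is then shown an
    # integer number of times with no 3:2 pulldown judder.
    rate = content_fps
    while rate < vmin_hz:
        rate *= 2
    if rate > vmax_hz:
        raise ValueError("content rate outside the panel's range")
    return 1000.0 / rate

print(round(vrr_frame_duration_ms(24, 48, 144), 3))  # 20.833 ms at 48 Hz
```

The same mechanism serves the power-saving case: an idle desktop can simply stop flipping and let the panel self-refresh at its minimum rate.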
Enabling 8K displays - A story of 33M pixels, 2 CRTCs and no Tears!
Ever seen a true 33-million-pixel 8K display? The maximum display link bandwidth available with DisplayPort's highest bit rate of 8.1 Gbps/lane limits the resolution to 5K@60 over a single DP connector. Hence the only true 8K displays allowing up to a full 60 frames per second are tiled displays, enabled using 2 DP connectors running at their highest bit rate across 2 CRTCs in the display pipeline. Enabling tiled displays across a dual-CRTC, dual-connector configuration has always resulted in screen tearing artifacts due to synchronization issues between the two tiles and their vertical blanking interrupts.
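The bandwidth arithmetic behind the two-connector requirement, as a back-of-the-envelope sketch (assuming 8b/10b link encoding and roughly 5% reduced-blanking overhead; exact timings vary by mode):

```python
def dp_link_capacity_gbps(lane_rate_gbps, lanes):
    # DP HBR3 uses 8b/10b encoding: 80% of the line rate carries data.
    return lane_rate_gbps * lanes * 0.8

def video_bandwidth_gbps(width, height, hz, bpp=24, blanking=1.05):
    # Pixel bandwidth with ~5% reduced-blanking overhead assumed.
    return width * height * hz * bpp * blanking / 1e9

full_8k  = video_bandwidth_gbps(7680, 4320, 60)  # ~50.2 Gbps
one_tile = video_bandwidth_gbps(3840, 4320, 60)  # ~25.1 Gbps
link     = dp_link_capacity_gbps(8.1, 4)         # 25.92 Gbps
print(full_8k > link, one_tile <= link)  # True True: hence two links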
Transcoder port synchronization is a new feature supported in Intel's Linux graphics kernel driver for platforms starting with Gen 11 that fixes the tearing issue on tiled displays. In this talk Manasi will explain how port synchronization is plumbed into the existing atomic KMS implementation. She will dive deep into the DRM and i915 code changes required to handle tiled atomic modesets, with master and slave CRTCs operating in lockstep to enable tear-free 8K output across 2 CRTCs and 2 ports in the graphics pipeline. She will conclude by showing the 8K display results using the Intel GPU Tools test suite.
A whirlwind tour through the input stack development efforts
The input stack comprises many pieces: libinput, libevdev, libratbag, libwacom, and even a few components that don't start with "lib". The kernel and X, for example, are also of some importance.
This talk is a tour of recently added features and features currently in development. Examples include libinput user devices, the difficulty of supporting high-resolution wheel scrolling in Wayland, how we painted ourselves into a corner by using the hwdb in libinput and then tore out the whole room and replaced it with a nicer one, and wacky devices like the totem that will probably never work as they do in the advertising videos. This talk includes blue-sky features full of optimism and may include some features that have no such optimism left and are now merely a pile of discarded branches, soaked with tears.
Edging closer to the hardware for kernel CI on input devices
Avoiding regressions in the input stack is hard. Ideally, every commit and the ones before it are tested against every possible device. But the universe hasn't seen fit to provide us with an army of people to test devices, infinite resources, or even a lot of time to spare. Pity, that, really. But we do have a computer, so that's a start.
In this talk we show how we moved from basically no regression tests in the input stack 10 years ago to a state where every commit gets tested against a variety of devices. We show how we do CI on the kernel side, and how we do CI on the user-space side.
Thousands of moons ago, an obscure magical art developed by the Magic Resistance beneath the disquieting tides of the Magilectric Revolution: reverse-engineering... Some refuse to acknowledge its existence. Some mistakenly believe it a form of witchcraft or dark magic, entranced by its binary spells and shadowy hexes. The outer world rarely catches more than a glimpse of these powerful mages, for each youngling learns from an elder reverse-engineer, under a non-existent disclosure agreement, knowledged memcpy'd straight to their brains. A write-only memory spell ensured total secrecy of the art form... until today. Learn the exploits of a young mage and her memory restoration spell. Registers and secrets spilled in this magilectric adventure.
FPGAs, and their less generic cousins, specialized accelerators, have come onto the scene much as GPUs did 20 or so years ago. Indeed, new interfaces have cropped up to support them in a fashion resembling early GPUs, and some vendors have even opted to try to use DRM/GEM APIs for their devices.
This talk will start with a background of what FPGAs are, how they work, and where they're currently used. Because the audience is primarily people with graphics/display background, I will make sure to cover this well. It will then discuss the existing software ecosystem around the various usage models with some comparison of the existing tools. If time permits, I will provide a demo comparing open tools vs. closed ones.
The goal of the talk is to find people who have lived through DRI1 days and are able to help contribute their experience (and coding) toward improving the stack for the future accelerators.
Currently, my focus is on helping to improve a fully open source toolchain, and so I will spend more time on that than on API utilization.
Hardware testing can help catch regressions in driver code. One way to test is to perform manual checks; however, this is error-prone and doesn't scale. Another approach is to build an automatic test suite (e.g. IGT), which calls special driver interfaces or mocks resources to check whether they are doing the right thing (for instance, using checksums to verify that the right frame is produced).
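The checksum idea, reduced to a sketch: zlib's CRC-32 stands in here for the display engine's pipe CRC that IGT actually reads back.

```python
import zlib

def frame_crc(pixels):
    # Hash the produced frame; equal CRCs mean the pipeline produced
    # the expected image without a human looking at the screen.
    return zlib.crc32(bytes(pixels))

reference = frame_crc([0, 10, 20, 255])
print(frame_crc([0, 10, 20, 255]) == reference)  # True: frame matches
print(frame_crc([0, 10, 21, 255]) == reference)  # False: one-pixel regression caught
```

The limitation motivating the rest of this talk is that such CRCs only cover what the GPU and display engine compute internally, not what actually comes out of the connector.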
However, it's not possible to test all features with driver helpers: link training, hot-plug detection, DisplayPort Multi-Stream Transport and Display Stream Compression are examples of hard-to-test features. Moreover, some regressions can be missed because they happen at a lower level than the driver.
For this reason, a board emulating a real screen has been developed: Chamelium.
It can be used to test display features from the KMS client to the screen and
make sure the whole software and hardware stack works properly. An overview of
the Chamelium board features (and limitations) will be presented.
- Automated testing is essential to merging patches with confidence
- It's not possible to test all display-related features without real hardware
- Some features can only be tested by adding knobs in the kernel (e.g. by forcing an EDID on a disconnected connector)
- Even then, the tests aren't checking that the feature works correctly with a real screen
- Google has developed a Chamelium board that emulates a screen
- Chamelium support in IGT
- Example of a Chamelium test (quick demo?)
- Current limitations and possible improvements:
  - Features supported by the receiver chips but not exposed by the Chamelium
  - Features not supported (would require a new board)
There are similarities between cameras and displays, both in how they work and in how we interact with them. For example, both need an atomic API, share the same buffer formats, and need features like explicit fences. On the other hand, cameras are much more complex devices, and many have very special and unique needs. We want to improve camera support in Linux and its interoperability with the rest of the kernel, benefiting as much as we can from the existing code base and the knowledge built by the community.
(Collabora), Laurent Pinchart
(Ideas on Board)