Sep 9 – 11, 2019
Europe/Lisbon timezone

IO: Durability, Errors and Documentation

Sep 11, 2019, 12:07 PM
Jade/room-I&II (Corinthia Hotel Lisbon)


Corinthia Hotel Lisbon



Andres Freund (EnterpriseDB / PostgreSQL)Mr Tomas Vondra (Postgresql)


Postgres (and many other databases) have, until fairly recently, assumed that IO errors would a) be reliably signalled by fsync/fdatasync/... b) repeating an fsync after a failure would either result in another failure, or the IO operations would succeed.

That turned out not to be true: See also

While a few improvements have been made, both in postgres and linux, the situation is still pretty bad.

From my point of view, a large part of the problem is that linux does not document what error and durability behaviour userspace can expect from certain operations.

Problematic areas for the kernel:
- The regular behaviour of durability fs related syscalls are not documented. One extreme example of that is sync_file_range (look at the warning section of the manpage)
- FS behaviour when encountering IO errors is poorly, if at all, documented. For example: there still is no documentation about the error behaviour of fsync, ext4's errors= operation reads as if it applied to all IO errors, but only applies to metadata errors.
- There is very little consistency for error behaviour between filesystems. To the degree that XFS will return different data after writeback failed than ext4.
- There is no usable interface to query / be notified of IO errors
- the rapid development of thin provisioned storage has increased the likelihood of IO errors drastically, as large parts of the IO stack treat out-of-space on the block level as an IO error

It seems worthwhile to work together to at least partially clean this up.

I agree to abide by the anti-harassment policy Yes

Primary authors

Andres Freund (EnterpriseDB / PostgreSQL) Mr Tomas Vondra (Postgresql)

Presentation materials