  1. Feb 26, 2023
    • io_uring/poll: allow some retries for poll triggering spuriously · c16bda37
      Jens Axboe authored
      If we get woken spuriously when polling and fail the operation with
      -EAGAIN again, then we generally only allow polling again if data
      had been transferred at some point. This is indicated with
      REQ_F_PARTIAL_IO. However, if the spurious poll triggers when the socket
      was originally empty, then we haven't transferred data yet and we will
      fail the poll re-arm. This either punts the socket to io-wq if it's
      blocking, or it fails the request with -EAGAIN if not. Neither condition
      is desirable, as the former will slow things down, while the latter
      will confuse the application.
      
      We want to ensure that a repeated poll trigger doesn't lead to infinite
      work making no progress; that's what the REQ_F_PARTIAL_IO check was
      for. But it doesn't protect against a loop after the first receive, and
      it's unnecessarily strict if we started out with an empty socket.
      
      Add a somewhat random retry count, just to put an upper limit on the
      potential number of retries that will be done. This should be high enough
      that we won't really hit it in practice, unless something needs to be
      aborted anyway.
      
      Cc: stable@vger.kernel.org # v5.10+
      Link: https://github.com/axboe/liburing/issues/364
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
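
      For illustration only, a minimal userspace C sketch of the idea described
      above: allow a poll re-arm on a spurious trigger even without prior data
      transfer, but cap how many such retries are allowed. The names
      APOLL_MAX_RETRY, struct apoll_state and may_rearm_poll are assumptions
      made for this sketch, not necessarily the identifiers used in the patch.

      #include <stdbool.h>

      /* Arbitrary upper bound; high enough that real workloads won't hit it. */
      #define APOLL_MAX_RETRY 128

      struct apoll_state {
              bool partial_io;        /* some data was transferred at some point */
              int retries;            /* starts at APOLL_MAX_RETRY */
      };

      /* Decide whether a request that failed with -EAGAIN may re-arm its poll. */
      static bool may_rearm_poll(struct apoll_state *apoll)
      {
              if (apoll->partial_io)
                      return true;            /* progress was made before, keep polling */
              return --apoll->retries > 0;    /* empty so far: retry a bounded number of times */
      }
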
  2. Jan 12, 2023
    • io_uring/poll: attempt request issue after racy poll wakeup · 6e5aedb9
      Jens Axboe authored
      
      If we have multiple requests waiting on the same target poll waitqueue,
      then it's quite possible to get a request triggered and get disappointed
      in not being able to make any progress with it. If we race in doing so,
      we'll potentially leave the poll request on the internal tables, but
      removed from the waitqueue. That means that any subsequent trigger of
      the poll waitqueue will not kick that request into action, causing an
      application to potentially wait for completion of a request that will
      never happen.
      
      Fix this by adding a new poll return state, IOU_POLL_REISSUE. Rather
      than have complicated logic for how to re-arm a given type of request,
      just punt it for a reissue.
      
      While in there, move the 'ret' variable to the only section where it
      gets used. This avoids confusion about its scope.
      
      Cc: stable@vger.kernel.org
      Fixes: eb0089d6 ("io_uring: single shot poll removal optimisation")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
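
      A rough sketch of the control flow this introduces, written as a small
      userspace C model rather than the kernel code itself: the poll check
      gains a distinct IOU_POLL_REISSUE state, and the task-work handler punts
      such a request back to issue instead of carrying per-opcode re-arm logic.

      #include <stdio.h>

      /* Model of the poll-check return states; IOU_POLL_REISSUE is the new one. */
      enum poll_state {
              IOU_POLL_DONE,          /* event consumed, complete the request */
              IOU_POLL_NO_ACTION,     /* nothing to do, leave the poll armed */
              IOU_POLL_REISSUE,       /* lost a wakeup race, issue the request again */
      };

      static void poll_task_work(enum poll_state ret)
      {
              switch (ret) {
              case IOU_POLL_NO_ACTION:
                      return;
              case IOU_POLL_REISSUE:
                      puts("reissue request");        /* punt back to the issue path */
                      return;
              case IOU_POLL_DONE:
                      puts("complete request");
                      return;
              }
      }
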
  3. Nov 25, 2022
    • io_uring/poll: fix poll_refs race with cancelation · 12ad3d2d
      Lin Ma authored
      
      There is an interesting race condition of poll_refs which could result
      in a NULL pointer dereference. The crash trace is like:
      
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      CPU: 0 PID: 30781 Comm: syz-executor.2 Not tainted 6.0.0-g493ffd6605b2 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.13.0-1ubuntu1.1 04/01/2014
      RIP: 0010:io_poll_remove_entry io_uring/poll.c:154 [inline]
      RIP: 0010:io_poll_remove_entries+0x171/0x5b4 io_uring/poll.c:190
      Code: ...
      RSP: 0018:ffff88810dfefba0 EFLAGS: 00010202
      RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000040000
      RDX: ffffc900030c4000 RSI: 000000000003ffff RDI: 0000000000040000
      RBP: 0000000000000008 R08: ffffffff9764d3dd R09: fffffbfff3836781
      R10: fffffbfff3836781 R11: 0000000000000000 R12: 1ffff11003422d60
      R13: ffff88801a116b04 R14: ffff88801a116ac0 R15: dffffc0000000000
      FS:  00007f9c07497700(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffb5c00ea98 CR3: 0000000105680005 CR4: 0000000000770ef0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       <TASK>
       io_apoll_task_func+0x3f/0xa0 io_uring/poll.c:299
       handle_tw_list io_uring/io_uring.c:1037 [inline]
       tctx_task_work+0x37e/0x4f0 io_uring/io_uring.c:1090
       task_work_run+0x13a/0x1b0 kernel/task_work.c:177
       get_signal+0x2402/0x25a0 kernel/signal.c:2635
       arch_do_signal_or_restart+0x3b/0x660 arch/x86/kernel/signal.c:869
       exit_to_user_mode_loop kernel/entry/common.c:166 [inline]
       exit_to_user_mode_prepare+0xc2/0x160 kernel/entry/common.c:201
       __syscall_exit_to_user_mode_work kernel/entry/common.c:283 [inline]
       syscall_exit_to_user_mode+0x58/0x160 kernel/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The root cause is a small oversight in io_poll_check_events() when it
      runs concurrently with the poll cancel routine io_poll_cancel_req().
      
      The interleaving to trigger use-after-free:
      
      CPU0                                       |  CPU1
                                                 |
      io_apoll_task_func()                       |  io_poll_cancel_req()
       io_poll_check_events()                    |
        // do while first loop                   |
        v = atomic_read(...)                     |
        // v = poll_refs = 1                     |
        ...                                      |  io_poll_mark_cancelled()
                                                 |   atomic_or()
                                                 |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                                 |
        atomic_sub_return(...)                   |
        // poll_refs = IO_POLL_CANCEL_FLAG       |
        // loop continue                         |
                                                 |
                                                 |  io_poll_execute()
                                                 |   io_poll_get_ownership()
                                                 |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                                 |   // gets the ownership
        v = atomic_read(...)                     |
        // poll_refs not change                  |
                                                 |
        if (v & IO_POLL_CANCEL_FLAG)             |
         return -ECANCELED;                      |
        // io_poll_check_events return           |
        // will go into                          |
        // io_req_complete_failed() free req     |
                                                 |
                                                 |  io_apoll_task_func()
                                                 |  // also go into io_req_complete_failed()
      
      And the interleaving to trigger the kernel WARNING:
      
      CPU0                                       |  CPU1
                                                 |
      io_apoll_task_func()                       |  io_poll_cancel_req()
       io_poll_check_events()                    |
        // do while first loop                   |
        v = atomic_read(...)                     |
        // v = poll_refs = 1                     |
        ...                                      |  io_poll_mark_cancelled()
                                                 |   atomic_or()
                                                 |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                                 |
        atomic_sub_return(...)                   |
        // poll_refs = IO_POLL_CANCEL_FLAG       |
        // loop continue                         |
                                                 |
        v = atomic_read(...)                     |
        // v = IO_POLL_CANCEL_FLAG               |
                                                 |  io_poll_execute()
                                                 |   io_poll_get_ownership()
                                                 |   // poll_refs = IO_POLL_CANCEL_FLAG | 1
                                                 |   // gets the ownership
                                                 |
        WARN_ON_ONCE(!(v & IO_POLL_REF_MASK)))   |
        // v & IO_POLL_REF_MASK = 0 WARN         |
                                                 |
                                                 |  io_apoll_task_func()
                                                 |  // also go into io_req_complete_failed()
      
      After reading the source code and discussing it with Pavel, it turns
      out that the atomic poll refs scheme keeps the io_poll_check_events()
      loop going only so that nobody else can grab the ownership in the
      meantime. Therefore, this patch simply adds another AND operation to
      make sure the loop stops once poll_refs is exactly equal to
      IO_POLL_CANCEL_FLAG. Since io_poll_cancel_req() grabs ownership and
      eventually makes its way to io_req_complete_failed(), the req will be
      reclaimed as expected.
      
      Fixes: aa43477b ("io_uring: poll rework")
      Signed-off-by: Lin Ma <linma@zju.edu.cn>
      Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
      [axboe: tweak description and code style]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
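
      A sketch of the shape of the fix, modeled with C11 atomics in userspace:
      the loop condition additionally masks poll_refs with IO_POLL_REF_MASK,
      so a value that is exactly IO_POLL_CANCEL_FLAG terminates the loop
      instead of being treated as an outstanding reference. The bit positions
      and the helper name here are assumptions made for this sketch.

      #include <stdatomic.h>
      #include <stdbool.h>

      #define IO_POLL_CANCEL_FLAG     (1u << 31)
      #define IO_POLL_REF_MASK        ((1u << 20) - 1)  /* illustrative width */

      /* One pass of the event-check loop; returns true if the caller must loop again. */
      static bool poll_check_events_step(atomic_uint *poll_refs)
      {
              unsigned int v = atomic_load(poll_refs);

              if (v & IO_POLL_CANCEL_FLAG)
                      return false;   /* cancelled: stop and let the request be failed */

              /* ... handle the events covered by the references we observed ... */

              /*
               * Drop the refs we consumed. Masking the result with
               * IO_POLL_REF_MASK is the fix: if only IO_POLL_CANCEL_FLAG remains
               * set, the loop ends rather than spinning on a value that holds no
               * real references.
               */
              return ((atomic_fetch_sub(poll_refs, v & IO_POLL_REF_MASK)
                       - (v & IO_POLL_REF_MASK)) & IO_POLL_REF_MASK) != 0;
      }
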
    • io_uring: make poll refs more robust · a26a35e9
      Pavel Begunkov authored
      
      poll_refs carries two functions: the first is ownership over the
      request; the second is notifying io_poll_check_events() that there was
      an event but the wake up couldn't grab the ownership, so
      io_poll_check_events() should retry.
      
      We want to make poll_refs more robust against overflows. Instead of
      always incrementing it, which covers two purposes with one atomic,
      check if poll_refs is elevated enough and, if so, set a retry flag
      without attempting to grab ownership. The gap between the bias check
      and the following atomics may seem racy, but we don't need it to be
      strict. Moreover, there can be at most 4 parallel updates: by the first
      and the second poll entries, __io_arm_poll_handler() and cancellation.
      Of those four, only poll wake ups may be executed multiple times, but
      they're protected by a spinlock.
      
      Cc: stable@vger.kernel.org
      Reported-by: Lin Ma <linma@zju.edu.cn>
      Fixes: aa43477b ("io_uring: poll rework")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/c762bc31f8683b3270f3587691348a7119ef9c9d.1668963050.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
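
      A userspace C sketch of the approach, with illustrative constants (the
      real bias value, flag bits and helper names may differ): ownership
      acquisition checks whether poll_refs is already elevated past a bias
      and, if so, only sets a retry flag instead of adding yet another
      reference.

      #include <stdatomic.h>
      #include <stdbool.h>

      #define IO_POLL_REF_MASK        ((1u << 20) - 1)  /* illustrative width */
      #define IO_POLL_RETRY_FLAG      (1u << 30)
      #define IO_POLL_REF_BIAS        128

      /* Returns true if the caller now owns the request. */
      static bool poll_get_ownership(atomic_uint *poll_refs)
      {
              /*
               * Refs are already suspiciously elevated: don't increment (which
               * could eventually overflow), just tell the current owner to
               * re-check events.
               */
              if (atomic_load(poll_refs) >= IO_POLL_REF_BIAS) {
                      atomic_fetch_or(poll_refs, IO_POLL_RETRY_FLAG);
                      return false;
              }
              /* Fast path: we own the request if there were no prior references. */
              return !(atomic_fetch_add(poll_refs, 1) & IO_POLL_REF_MASK);
      }
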
    • io_uring: cmpxchg for poll arm refs release · 2f389343
      Pavel Begunkov authored
      
      Replace atomically subtracting the ownership reference at the end of
      arming a poll with a cmpxchg. We try to release ownership by setting 0,
      assuming that poll_refs didn't change while we were arming. If it did
      change, we keep the ownership and use it to queue a tw, which is fully
      capable of processing all events and even tolerates spurious wake ups.
      
      It's a bit more elegant as we reduce races between setting the
      cancellation flag and getting refs with this release, and with that we
      don't have to worry about any kind of underflow. This is not the
      fastest path for polling, and the performance difference between
      cmpxchg and atomic dec there is usually negligible anyway.
      
      Cc: stable@vger.kernel.org
      Fixes: aa43477b ("io_uring: poll rework")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/0c95251624397ea6def568ff040cad2d7926fd51.1668963050.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
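
      A compact userspace C model of the release step described above (names
      are assumptions for this sketch): at the end of arming, try to cmpxchg
      poll_refs from 1 to 0; if that fails because the refs changed, keep the
      ownership and queue task work instead.

      #include <stdatomic.h>

      /* queue_tw stands in for queueing the request's task work. */
      static void poll_arm_release(atomic_uint *poll_refs, void (*queue_tw)(void))
      {
              unsigned int expected = 1;      /* only our own ownership reference */

              /*
               * Swap 1 -> 0 to release ownership. If poll_refs changed while we
               * were arming (an event fired or cancellation raced in), keep the
               * ownership and let task work process whatever happened; it
               * tolerates spurious wake ups, so this is always safe.
               */
              if (!atomic_compare_exchange_strong(poll_refs, &expected, 0u))
                      queue_tw();
      }
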
    • io_uring: add io_aux_cqe which allows deferred completion · 9b8c5475
      Dylan Yudaken authored
      
      Use the just introduced deferred post cqe completion state when
      possible in io_aux_cqe. If not possible, fall back to io_post_aux_cqe.

      This introduces a complication because of allow_overflow. For deferred
      completions we cannot know, without locking the completion_lock,
      whether it will overflow (and even if we locked it, another post could
      sneak in and cause this cqe to be in overflow).
      However, since overflow protection is mostly a best-effort defence in
      depth to prevent infinite loops of CQEs for poll, just checking the
      overflow bit is going to be good enough and will result in at most 16
      (the array size of deferred cqes) overflows.
      
      Suggested-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Dylan Yudaken <dylany@meta.com>
      Link: https://lore.kernel.org/r/20221124093559.3780686-6-dylany@meta.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
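
      A userspace C sketch of the fallback logic (the struct, field and helper
      names below are invented for illustration; only io_aux_cqe and
      io_post_aux_cqe are from the commit above): prefer the small deferred
      completion array when it is usable, apply only the cheap overflow-bit
      check there, and otherwise fall back to posting the CQE directly.

      #include <stdbool.h>
      #include <stdint.h>

      struct cqe { uint64_t user_data; int32_t res; uint32_t cflags; };

      #define DEFERRED_CQE_SLOTS 16   /* matches the "at most 16 overflows" note */

      struct ctx_model {
              bool locked;            /* completion state held, deferral possible */
              bool cq_overflowed;     /* best-effort overflow bit */
              int nr_deferred;
              struct cqe deferred[DEFERRED_CQE_SLOTS];
      };

      /* Stub for the direct posting path (io_post_aux_cqe in the kernel). */
      static bool post_aux_cqe(struct ctx_model *ctx, struct cqe c, bool allow_overflow)
      {
              (void)ctx; (void)c;
              return allow_overflow;  /* placeholder */
      }

      static bool aux_cqe(struct ctx_model *ctx, struct cqe c, bool allow_overflow)
      {
              /* Deferred path: only the overflow bit can be checked without locking. */
              if (ctx->locked && ctx->nr_deferred < DEFERRED_CQE_SLOTS &&
                  (allow_overflow || !ctx->cq_overflowed)) {
                      ctx->deferred[ctx->nr_deferred++] = c;
                      return true;
              }
              return post_aux_cqe(ctx, c, allow_overflow);    /* fall back */
      }
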
    • io_uring: defer all io_req_complete_failed · 973fc83f
      Dylan Yudaken authored
      
      All failures happen under lock now, and can be deferred. To be
      consistent when the failure has happened after some multishot cqe has
      been deferred (and to keep ordering), always defer failures.

      To make this obvious at the caller (and to help prevent a future bug),
      rename io_req_complete_failed to io_req_defer_failed.
      
      Signed-off-by: Dylan Yudaken <dylany@meta.com>
      Link: https://lore.kernel.org/r/20221124093559.3780686-4-dylany@meta.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: always lock in io_apoll_task_func · c06c6c5d
      Dylan Yudaken authored
      
      This is required for the failure case (io_req_complete_failed) and is
      missing.
      
      The alternative would be to only lock in the failure path; however,
      all of the non-error paths in io_poll_check_events that do not return
      IOU_POLL_NO_ACTION end up locking anyway. The only extraneous lock
      would be for multishot poll overflowing the CQE ring; however,
      multishot poll would probably benefit from being locked as it will
      allow completions to be batched.
      
      So it seems reasonable to lock always.
      
      Signed-off-by: Dylan Yudaken <dylany@meta.com>
      Link: https://lore.kernel.org/r/20221124093559.3780686-3-dylany@meta.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
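
      A small userspace C model of the resulting control flow (names invented
      for the sketch): once the poll check reports anything other than
      IOU_POLL_NO_ACTION, the lock is taken unconditionally, so the failure
      path is covered and completions can be batched under the same lock.

      #include <stdbool.h>
      #include <stdio.h>

      enum poll_state { IOU_POLL_NO_ACTION, IOU_POLL_DONE, IOU_POLL_FAILED };

      static void apoll_task_func_model(enum poll_state ret, bool *locked)
      {
              if (ret == IOU_POLL_NO_ACTION)
                      return;                         /* nothing to do, no lock needed */

              if (!*locked) {                         /* lock unconditionally from here on */
                      *locked = true;
                      puts("ctx lock acquired");
              }

              if (ret == IOU_POLL_FAILED)
                      puts("defer failed completion");        /* needs the lock */
              else
                      puts("complete request");               /* batched under the lock */
      }
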