  1. Jan 13, 2023
    • io_uring: lock overflowing for IOPOLL · 544d163d
      Pavel Begunkov authored
      
      syzbot reports an issue with overflow filling for IOPOLL:
      
      WARNING: CPU: 0 PID: 28 at io_uring/io_uring.c:734 io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
      CPU: 0 PID: 28 Comm: kworker/u4:1 Not tainted 6.2.0-rc3-syzkaller-16369-g358a161a6a9e #0
      Workqueue: events_unbound io_ring_exit_work
      Call trace:
       io_cqring_event_overflow+0x1c0/0x230 io_uring/io_uring.c:734
       io_req_cqe_overflow+0x5c/0x70 io_uring/io_uring.c:773
       io_fill_cqe_req io_uring/io_uring.h:168 [inline]
       io_do_iopoll+0x474/0x62c io_uring/rw.c:1065
       io_iopoll_try_reap_events+0x6c/0x108 io_uring/io_uring.c:1513
       io_uring_try_cancel_requests+0x13c/0x258 io_uring/io_uring.c:3056
       io_ring_exit_work+0xec/0x390 io_uring/io_uring.c:2869
       process_one_work+0x2d8/0x504 kernel/workqueue.c:2289
       worker_thread+0x340/0x610 kernel/workqueue.c:2436
       kthread+0x12c/0x158 kernel/kthread.c:376
       ret_from_fork+0x10/0x20 arch/arm64/kernel/entry.S:863
      
      There is no real problem for normal IOPOLL, as the flush is also
      called with uring_lock taken, but it gets more complicated for
      IOPOLL|SQPOLL, where __io_cqring_overflow_flush() happens from the
      CQ waiting path.
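
      The fix, per the commit title, is to lock the ring when posting the
      overflow from the IOPOLL reaping path. A minimal sketch of that shape
      in io_do_iopoll(), assuming the surrounding names; details may differ
      from the actual diff:

          /* CQE posting for a completed polled request: */
          req->cqe.flags = io_put_kbuf(req, 0);
          if (unlikely(!io_fill_cqe_req(ctx, req))) {
                  /*
                   * The CQ ring is full, so the request goes to the
                   * overflow list. Serialize against the flush, which
                   * for IOPOLL|SQPOLL runs from the CQ waiting path
                   * without uring_lock held.
                   */
                  spin_lock(&ctx->completion_lock);
                  io_req_cqe_overflow(req);
                  spin_unlock(&ctx->completion_lock);
          }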
      
      Reported-and-tested-by: <syzbot+6805087452d72929404e@syzkaller.appspotmail.com>
      Cc: stable@vger.kernel.org # 5.10+
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. Nov 25, 2022
    • use less confusing names for iov_iter direction initializers · de4eda9d
      Al Viro authored
      
      READ/WRITE proved to be actively confusing - the meanings are
      "data destination, as used with read(2)" and "data source, as
      used with write(2)", but people keep interpreting those as
      "we read data from it" and "we write data to it", i.e. exactly
      the wrong way.
      
      Call them ITER_DEST and ITER_SOURCE - at least that is harder
      to misinterpret...
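
      For illustration, a hedged sketch of a call site before and after the
      rename (in the uio.h header the new names alias the old READ/WRITE
      values, so behaviour is unchanged):

          struct iov_iter iter;
          struct iovec iov = { .iov_base = buf, .iov_len = len };

          /* Before: "READ" meant "iter is the data destination". */
          iov_iter_init(&iter, READ, &iov, 1, len);

          /* After: the iterator's role is explicit at the call site. */
          iov_iter_init(&iter, ITER_DEST, &iov, 1, len);   /* copied into iter */
          iov_iter_init(&iter, ITER_SOURCE, &iov, 1, len); /* copied out of iter */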
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. Sep 29, 2022
    • io_uring/rw: defer fsnotify calls to task context · b000145e
      Jens Axboe authored
      We can't call these from the kiocb completion, as that may run in
      soft/hard IRQ context. Defer the calls to when we process the
      task_work for this request. That avoids valid complaints like:
      
      stack backtrace:
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.0.0-rc6-syzkaller-00321-g105a36f3694e #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/26/2022
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_usage_bug kernel/locking/lockdep.c:3961 [inline]
       valid_state kernel/locking/lockdep.c:3973 [inline]
       mark_lock_irq kernel/locking/lockdep.c:4176 [inline]
       mark_lock.part.0.cold+0x18/0xd8 kernel/locking/lockdep.c:4632
       mark_lock kernel/locking/lockdep.c:4596 [inline]
       mark_usage kernel/locking/lockdep.c:4527 [inline]
       __lock_acquire+0x11d9/0x56d0 kernel/locking/lockdep.c:5007
       lock_acquire kernel/locking/lockdep.c:5666 [inline]
       lock_acquire+0x1ab/0x570 kernel/locking/lockdep.c:5631
       __fs_reclaim_acquire mm/page_alloc.c:4674 [inline]
       fs_reclaim_acquire+0x115/0x160 mm/page_alloc.c:4688
       might_alloc include/linux/sched/mm.h:271 [inline]
       slab_pre_alloc_hook mm/slab.h:700 [inline]
       slab_alloc mm/slab.c:3278 [inline]
       __kmem_cache_alloc_lru mm/slab.c:3471 [inline]
       kmem_cache_alloc+0x39/0x520 mm/slab.c:3491
       fanotify_alloc_fid_event fs/notify/fanotify/fanotify.c:580 [inline]
       fanotify_alloc_event fs/notify/fanotify/fanotify.c:813 [inline]
       fanotify_handle_event+0x1130/0x3f40 fs/notify/fanotify/fanotify.c:948
       send_to_group fs/notify/fsnotify.c:360 [inline]
       fsnotify+0xafb/0x1680 fs/notify/fsnotify.c:570
       __fsnotify_parent+0x62f/0xa60 fs/notify/fsnotify.c:230
       fsnotify_parent include/linux/fsnotify.h:77 [inline]
       fsnotify_file include/linux/fsnotify.h:99 [inline]
       fsnotify_access include/linux/fsnotify.h:309 [inline]
       __io_complete_rw_common+0x485/0x720 io_uring/rw.c:195
       io_complete_rw+0x1a/0x1f0 io_uring/rw.c:228
       iomap_dio_complete_work fs/iomap/direct-io.c:144 [inline]
       iomap_dio_bio_end_io+0x438/0x5e0 fs/iomap/direct-io.c:178
       bio_endio+0x5f9/0x780 block/bio.c:1564
       req_bio_endio block/blk-mq.c:695 [inline]
       blk_update_request+0x3fc/0x1300 block/blk-mq.c:825
       scsi_end_request+0x7a/0x9a0 drivers/scsi/scsi_lib.c:541
       scsi_io_completion+0x173/0x1f70 drivers/scsi/scsi_lib.c:971
       scsi_complete+0x122/0x3b0 drivers/scsi/scsi_lib.c:1438
       blk_complete_reqs+0xad/0xe0 block/blk-mq.c:1022
       __do_softirq+0x1d3/0x9c6 kernel/softirq.c:571
       invoke_softirq kernel/softirq.c:445 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:650
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:662
       common_interrupt+0xa9/0xc0 arch/x86/kernel/irq.c:240
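
      A minimal sketch of the deferral shape (helper names are illustrative,
      not necessarily the literal diff): the kiocb completion only queues
      task_work, and the fsnotify hooks, which may allocate and therefore
      must not run in IRQ context, fire from the task_work handler:

          /* Runs via task_work, i.e. in the submitting task's context. */
          static void io_req_rw_complete(struct io_kiocb *req, bool *locked)
          {
                  struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);

                  /* Task context: safe to call hooks that may allocate. */
                  if (rw->kiocb.ki_flags & IOCB_WRITE)
                          fsnotify_modify(req->file);
                  else
                          fsnotify_access(req->file);
                  io_req_task_complete(req, locked);
          }

          /* kiocb completion: may run from soft/hard IRQ context. */
          static void io_complete_rw(struct kiocb *kiocb, long res)
          {
                  struct io_rw *rw = container_of(kiocb, struct io_rw, kiocb);
                  struct io_kiocb *req = cmd_to_io_kiocb(rw);

                  req->cqe.res = res;
                  req->io_task_work.func = io_req_rw_complete;
                  io_req_task_work_add(req);      /* no fsnotify here */
          }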
      
      Fixes: f63cf519 ("io_uring: ensure that fsnotify is always called")
      Link: https://lore.kernel.org/all/20220929135627.ykivmdks2w5vzrwg@quack3/

      Reported-by: <syzbot+dfcc5f4da15868df7d4d@syzkaller.appspotmail.com>
      Reported-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. Sep 09, 2022
    • io_uring/rw: fix short rw error handling · 4d9cb92c
      Pavel Begunkov authored
      
      We have a couple of problems. The first is reports of unexpected
      link breakage for reads, even when cqe->res indicates that the IO
      was done in full. The cause is partial IO with retries.

      TL;DR: we compare the result in __io_complete_rw_common() against
      req->cqe.res, but req->cqe.res doesn't store the full length, only
      the length left to be done. So when we pass the full, corrected
      result via kiocb_done() -> __io_complete_rw_common(), the check
      fails.

      The second problem is that we don't try to correct res in
      io_complete_rw(), which can be a problem, for instance, for
      O_DIRECT reads where a prefix of the data was already cached in
      the page cache. We also definitely don't want to pass a corrected
      result into io_rw_done().

      The fix here is to leave __io_complete_rw_common() alone, always
      pass the uncorrected result into it, and fix the result up as the
      last step, just before actually finishing the I/O.
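
      A sketch of that last-step fixup (the helper name here is an
      assumption for illustration): bytes completed by earlier partial
      attempts are folded into the result exactly once, at the end, while
      __io_complete_rw_common() keeps seeing the raw result:

          static unsigned io_fixup_rw_res(struct io_kiocb *req, long res)
          {
                  struct io_async_rw *io = req->async_data;

                  /* Add previously done IO from partial attempts, if any. */
                  if (req_has_async_data(req) && io->bytes_done > 0) {
                          if (res < 0)
                                  res = io->bytes_done;
                          else
                                  res += io->bytes_done;
                  }
                  return res;
          }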
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://github.com/axboe/liburing/issues/643

      Reported-by: Beld Zhang <beldzhang@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>