- Mar 03, 2023
-
Uday Shankar authored
The block layer might merge together discard requests up until the max_discard_segments limit is hit, but blk_insert_cloned_request checks the segment count against max_segments regardless of the req op. This can result in errors like the following when discards are issued through a DM device and max_discard_segments exceeds max_segments for the queue of the chosen underlying device.

blk_insert_cloned_request: over max segments limit. (256 > 129)

Fix this by looking at the req_op and enforcing the appropriate segment limit - max_discard_segments for REQ_OP_DISCARDs and max_segments for everything else.

Signed-off-by: Uday Shankar <ushankar@purestorage.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230301000655.48112-1-ushankar@purestorage.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
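
In code terms, the fix boils down to selecting the limit by request op. A minimal sketch, assuming the queue limits are reachable through rq->q (treat the helper name as illustrative):

    static inline unsigned short blk_rq_get_max_segments(struct request *rq)
    {
        if (req_op(rq) == REQ_OP_DISCARD)
            return queue_max_discard_segments(rq->q);
        return queue_max_segments(rq->q);
    }

blk_insert_cloned_request then compares the cloned request's segment count against this helper instead of against queue_max_segments() unconditionally.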
-
- Feb 28, 2023
-
Breno Leitao authored
Current kernel (d2980d8d) crashes when calling blk_iocost_init for the `nvme1` disk:

BUG: kernel NULL pointer dereference, address: 0000000000000050
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
blk_iocost_init (include/asm-generic/qspinlock.h:128 include/linux/spinlock.h:203 include/linux/spinlock_api_smp.h:158 include/linux/spinlock.h:400 block/blk-iocost.c:2884)
ioc_qos_write (block/blk-iocost.c:3198)
? kretprobe_perf_func (kernel/trace/trace_kprobe.c:1566)
? kernfs_fop_write_iter (include/linux/slab.h:584 fs/kernfs/file.c:311)
? __kmem_cache_alloc_node (mm/slab.h:? mm/slub.c:3452 mm/slub.c:3491)
? _copy_from_iter (arch/x86/include/asm/uaccess_64.h:46 arch/x86/include/asm/uaccess_64.h:52 lib/iov_iter.c:183 lib/iov_iter.c:628)
? kretprobe_dispatcher (kernel/trace/trace_kprobe.c:1693)
cgroup_file_write (kernel/cgroup/cgroup.c:4061)
kernfs_fop_write_iter (fs/kernfs/file.c:334)
vfs_write (include/linux/fs.h:1849 fs/read_write.c:491 fs/read_write.c:584)
ksys_write (fs/read_write.c:637)
do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)

This happens because ioc_refresh_params() is called before ioc->rqos is properly initialized; that initialization happens later, on the callee side. ioc_refresh_params() -> ioc_autop_idx() tries to access ioc->rqos.disk->queue, but ioc->rqos.disk is NULL, causing the BUG above. Create a function, ioc_refresh_params_disk(), that is similar to ioc_refresh_params() but takes the struct gendisk as an explicit argument. This function is called when ioc->rqos.disk is not yet initialized.

Fixes: ce57b558 ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful")
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Tejun Heo <tj@kernel.org>
Link: https://lore.kernel.org/r/20230228111654.1778120-1-leitao@debian.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
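
The shape of that fix can be sketched as a thin wrapper, assuming blk-iocost internals roughly as described (not the verbatim upstream diff):

    /* take the disk explicitly so callers that run before
     * ioc->rqos.disk is set can still refresh the parameters */
    static bool ioc_refresh_params_disk(struct ioc *ioc, bool force,
                                        struct gendisk *disk)
    {
        /* ... same body as before, but use the passed-in disk
         * (e.g. in ioc_autop_idx()) instead of ioc->rqos.disk ... */
    }

    static bool ioc_refresh_params(struct ioc *ioc, bool force)
    {
        /* existing callers keep the old signature */
        return ioc_refresh_params_disk(ioc, force, ioc->rqos.disk);
    }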
-
- Feb 24, 2023
-
Jens Axboe authored
Wei reports a crash with an application using polled IO:

PGD 14265e067 P4D 14265e067 PUD 47ec50067 PMD 0
Oops: 0000 [#1] SMP
CPU: 0 PID: 21915 Comm: iocore_0 Kdump: loaded Tainted: G S 5.12.0-0_fbk12_clang_7346_g1bb6f2e7058f #1
Hardware name: Wiwynn Delta Lake MP T8/Delta Lake-Class2, BIOS Y3DLM08 04/10/2022
RIP: 0010:bio_poll+0x25/0x200
Code: 0f 1f 44 00 00 0f 1f 44 00 00 55 41 57 41 56 41 55 41 54 53 48 83 ec 28 65 48 8b 04 25 28 00 00 00 48 89 44 24 20 48 8b 47 08 <48> 8b 80 70 02 00 00 4c 8b 70 50 8b 6f 34 31 db 83 fd ff 75 25 65
RSP: 0018:ffffc90005fafdf8 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 74b43cd65dd66600
RDX: 0000000000000003 RSI: ffffc90005fafe78 RDI: ffff8884b614e140
RBP: ffff88849964df78 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88849964df00
R13: ffffc90005fafe78 R14: ffff888137d3c378 R15: 0000000000000001
FS: 00007fd195000640(0000) GS:ffff88903f400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000270 CR3: 0000000466121001 CR4: 00000000007706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 iocb_bio_iopoll+0x1d/0x30
 io_do_iopoll+0xac/0x250
 __se_sys_io_uring_enter+0x3c5/0x5a0
 ? __x64_sys_write+0x89/0xd0
 do_syscall_64+0x2d/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x94f225d
Code: 24 cc 00 00 00 41 8b 84 24 d0 00 00 00 c1 e0 04 83 e0 10 41 09 c2 8b 33 8b 53 04 4c 8b 43 18 4c 63 4b 0c b8 aa 01 00 00 0f 05 <85> c0 0f 88 85 00 00 00 29 03 45 84 f6 0f 84 88 00 00 00 41 f6 c7
RSP: 002b:00007fd194ffcd88 EFLAGS: 00000202 ORIG_RAX: 00000000000001aa
RAX: ffffffffffffffda RBX: 00007fd194ffcdc0 RCX: 00000000094f225d
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000007
RBP: 00007fd194ffcdb0 R08: 0000000000000000 R09: 0000000000000008
R10: 0000000000000001 R11: 0000000000000202 R12: 00007fd269d68030
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000

which is due to bio->bi_bdev being NULL. This can happen if we have two tasks doing polled IO, and task B ends up completing IO from task A if they are sharing a poll queue. If task B completes the IO and puts the bio into our cache, then it can allocate that bio again before task A is done polling for it. As that would necessitate a preempt between the two tasks, it's enough to just be a bit more careful in checking for whether or not bio->bi_bdev is NULL.

Reported-and-tested-by: Wei Zhang <wzhang@meta.com>
Cc: stable@vger.kernel.org
Fixes: be4d234d ("bio: add allocation cache abstraction")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
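
The defensive check can be sketched like this (assumed shape, not the exact upstream hunk):

    int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
    {
        struct block_device *bdev = READ_ONCE(bio->bi_bdev);
        struct request_queue *q;

        /* the bio may already have been completed and recycled by
         * another task sharing the poll queue */
        if (!bdev)
            return 0;

        q = bdev_get_queue(bdev);
        /* ... poll q as before ... */
    }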
-
Jens Axboe authored
This isn't strictly needed in terms of correctness, but it does allow polling to know if the bio has been put already by a different task and hence avoid polling something that we don't need to.

Cc: stable@vger.kernel.org
Fixes: be4d234d ("bio: add allocation cache abstraction")
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
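
A minimal sketch of the idea, assuming the per-cpu bio cache put path looks roughly like this (not the verbatim patch):

    static void bio_put_percpu_cache(struct bio *bio)
    {
        /* ... uninit the bio ... */

        /* clear the device pointer so a concurrent poller can see
         * that this bio has already completed and been recycled */
        bio->bi_bdev = NULL;

        /* ... add the bio to the per-cpu free list ... */
    }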
-
- Feb 21, 2023
-
Juhyung Park authored
bdev_get_queue() never returns NULL. Several commits [1][2] have been made before to remove such superfluous checks, but some still remained. For places where bdev_get_queue() is called solely for NULL checks, it is removed entirely.

[1] commit ec9fd2a1 ("blk-lib: don't check bdev_get_queue() NULL check")
[2] commit fea127b3 ("block: remove superfluous check for request queue in bdev_is_zoned()")

Signed-off-by: Juhyung Park <qkrwngud825@gmail.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Link: https://lore.kernel.org/r/20230203024029.48260-1-qkrwngud825@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
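
An illustrative call-site cleanup, assuming a typical pattern (a hypothetical example, not a specific hunk from the patch):

    /* before: superfluous NULL check */
    struct request_queue *q = bdev_get_queue(bdev);

    if (!q)
        return -ENXIO;

    /* after: bdev_get_queue() cannot return NULL, just use it */
    struct request_queue *q = bdev_get_queue(bdev);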
-
David Howells authored
Define flags to qualify page extraction to pass into iov_iter_*_pages*() rather than passing in FOLL_* flags. For now only a flag to allow peer-to-peer DMA is supported.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
cc: Al Viro <viro@zeniv.linux.org.uk>
cc: Logan Gunthorpe <logang@deltatee.com>
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>
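
The qualifier gets its own type so it can't be confused with FOLL_* bits. A sketch of the addition (names per the series; treat the exact value as an assumption):

    /* in include/linux/uio.h */
    typedef unsigned int iov_iter_extraction_t;

    /* allow the extracted pages to be used for peer-to-peer DMA */
    #define ITER_ALLOW_P2PDMA	((iov_iter_extraction_t)0x01)

Callers of iov_iter_get_pages*() then pass an iov_iter_extraction_t argument instead of raw FOLL_* flags, and the iov_iter code translates it internally.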
-
- Feb 17, 2023
-
Yu Kuai authored
As explained in commit 36369f46 ("block: Do not reread partition table on exclusively open device"), rereading the partition table on a device that is exclusively opened by someone else is problematic. This patch makes sure the partition scan only proceeds if the current thread opened the device exclusively, or if the device is not opened exclusively at all; in the latter case, other scanners and exclusive openers are blocked temporarily until the partition scan is done.

Fixes: 10c70d95 ("block: remove the bd_openers checks in blk_drop_partitions")
Cc: <stable@vger.kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230217022200.3092987-3-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
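
The gating can be sketched as a temporary claim around the scan, assuming a disk_scan_partitions() of roughly this shape (illustrative, not the verbatim patch):

    static int disk_scan_partitions(struct gendisk *disk, fmode_t mode)
    {
        struct block_device *bdev;
        int ret = 0;

        /* if we don't hold the device exclusively ourselves, claim it
         * so other exclusive openers are held off during the scan */
        if (!(mode & FMODE_EXCL)) {
            ret = bd_prepare_to_claim(disk->part0, disk_scan_partitions);
            if (ret)
                return ret;
        }

        bdev = blkdev_get_by_dev(disk_devt(disk), mode & ~FMODE_EXCL, NULL);
        if (IS_ERR(bdev))
            ret = PTR_ERR(bdev);
        else
            blkdev_put(bdev, mode & ~FMODE_EXCL);

        if (!(mode & FMODE_EXCL))
            bd_abort_claiming(disk->part0, disk_scan_partitions);
        return ret;
    }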
-
Yu Kuai authored
This reverts commit 36369f46. That patch can't fix the problem in the corner case where the device is opened exclusively after the check and before blkdev_get_by_dev(). We'll use a new solution to fix the problem in the next patch, one that doesn't need to change APIs.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Acked-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230217022200.3092987-2-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Luca Boccassi authored
Not every OPAL drive supports SUM (Single User Mode), so report this information to userspace via the get-status ioctl so that we can adjust the formatting options accordingly. Tested on a Kingston drive (which supports it) and a Samsung one (which does not).

Signed-off-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20230210010612.28729-1-luca.boccassi@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
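
From userspace this surfaces as one more bit in the flags word returned by IOC_OPAL_GET_STATUS. A hedged sketch (flag name per the patch; the value shown is an assumption):

    /* uapi: a new flag alongside OPAL_FL_SUPPORTED and friends */
    #define OPAL_FL_SUM_SUPPORTED	0x0020

    /* userspace usage */
    struct opal_status st = { 0 };

    if (ioctl(fd, IOC_OPAL_GET_STATUS, &st) == 0 &&
        (st.flags & OPAL_FL_SUM_SUPPORTED))
        printf("drive supports Single User Mode\n");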
-
Jens Axboe authored
kernel test robot complains about a type mismatch:

block/blk-merge.c:984:42: sparse: expected restricted blk_opf_t const [usertype] ff
block/blk-merge.c:984:42: sparse: got unsigned int
block/blk-merge.c:1010:42: sparse: sparse: incorrect type in initializer (different base types) @@ expected restricted blk_opf_t const [usertype] ff @@ got unsigned int @@
block/blk-merge.c:1010:42: sparse: expected restricted blk_opf_t const [usertype] ff
block/blk-merge.c:1010:42: sparse: got unsigned int

because bio_failfast() returns an unsigned int rather than the appropriate blk_opf_t type. Fix it up.

Fixes: 3ce6a115 ("block: sync mixed merged request's failfast with 1st bio's")
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202302170743.GXypM9Rt-lkp@intel.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
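
The fix amounts to giving the helper the right return type. A sketch of the assumed final form:

    static inline blk_opf_t bio_failfast(const struct bio *bio)
    {
        if (bio->bi_opf & REQ_RAHEAD)
            return 0;

        return bio->bi_opf & REQ_FAILFAST_MASK;
    }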
-
- Feb 16, 2023
-
Martin K. Petersen authored
Make sure to copy the flags when a bio_integrity_payload is cloned. Otherwise per-I/O properties such as the IP checksum flag will not be passed down to the HBA driver. Since the integrity buffer is owned by the original bio, the BIP_BLOCK_INTEGRITY flag needs to be masked off to avoid a double free in the completion path.

Fixes: aae7df50 ("block: Integrity checksum flag")
Fixes: b1f01388 ("block: Relocate bio integrity flags")
Reported-by: Saurav Kashyap <skashyap@marvell.com>
Tested-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20230215171801.21062-1-martin.petersen@oracle.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
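
The essence of the fix, sketched against bio_integrity_clone() (assumed shape, not the verbatim hunk):

    int bio_integrity_clone(struct bio *bio, struct bio *bio_src, gfp_t gfp_mask)
    {
        struct bio_integrity_payload *bip_src = bio_integrity(bio_src);
        struct bio_integrity_payload *bip;

        /* ... allocate bip, copy the integrity vecs ... */

        /* inherit the per-I/O flags, but drop buffer ownership: the
         * integrity buffer still belongs to the original bio */
        bip->bip_flags = bip_src->bip_flags & ~BIP_BLOCK_INTEGRITY;

        /* ... */
        return 0;
    }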
-
Jinke Han authored
In the current code, I/O statistics are missing for the cgroup when a bio is throttled by blk-throttle. Fix it by moving the (currently unreachable) accounting code to submit_bio_noacct_nocheck.

Fixes: 3f98c753 ("block: don't check bio in blk_throtl_dispatch_work_fn")
Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230216032250.74230-1-hanjinke.666@bytedance.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
We support mixed merge for requests/bios with different failfast settings. When a request fails, each time we only handle the portion with the same failfast setting, so bios with failfast can be failed immediately while bios without failfast can be retried. The idea is good, but the current implementation has several defects:

1) Initially a readahead (RA) bio doesn't set failfast, but the bio merge code doesn't consider this point and just checks its failfast setting when deciding whether a mixed merge is required. Fix this issue by adding the bio_failfast() helper.

2) When merging a bio to the request front, if this request is mixed merged, we have to sync the request's failfast setting with the 1st bio's failfast. Fix it by calling blk_update_mixed_merge().

3) When merging a bio to the request back, if this request is mixed merged, we have to mark the bio as failfast, because blk_update_request() simply updates the request's failfast from the 1st bio's. Fix it by calling blk_update_mixed_merge().

This fixes a normal EXT4 READ IO failure: it was observed that a normal READ IO was merged with an RA IO, and the mixed-merged request had a different failfast setting from its 1st bio's, so the normal READ IO didn't get retried.

Cc: Tejun Heo <tj@kernel.org>
Fixes: 80a761fd ("block: implement mixed merge of different failfast requests")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230209125527.667004-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
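
A sketch consistent with points 2) and 3) above (assumed shape, not the verbatim hunks): keep a mixed-merged request's failfast state coherent with its bios in both merge directions.

    static void blk_update_mixed_merge(struct request *req, struct bio *bio,
                                       bool front_merge)
    {
        if (!(req->rq_flags & RQF_MIXED_MERGE))
            return;

        if (front_merge) {
            /* the bio becomes the new 1st bio: resync the request */
            req->cmd_flags &= ~REQ_FAILFAST_MASK;
            req->cmd_flags |= bio->bi_opf & REQ_FAILFAST_MASK;
        } else {
            /* back merge: tag the bio like the rest of the request */
            bio->bi_opf |= req->cmd_flags & REQ_FAILFAST_MASK;
        }
    }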
-
- Feb 15, 2023
-
Christoph Hellwig authored
bio_split_rw can be used by file systems to split an incoming write bio into multiple bios fitting the hardware limit, for use as ZONE_APPEND bios. Export it for initial use in btrfs.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
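
The change itself is small; a sketch (signature as in block/blk-merge.c of this era, but verify against your tree):

    struct bio *bio_split_rw(struct bio *bio, const struct queue_limits *lim,
                             unsigned *nr_segs, struct bio_set *bs, unsigned max_bytes);
    EXPORT_SYMBOL_GPL(bio_split_rw);

A filesystem can then split a data bio at the zone-append size limit before submission instead of open-coding the segment math.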
-
- Feb 14, 2023
-
Christoph Hellwig authored
This reverts commit 84d7d462.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This reverts commit 821e840c08ad83736eced4037cdad864e95e2584.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This reverts commit 178fa7d4.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This reverts commit c43332fe, as it is not needed without moving to disk references in the blkg.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
This reverts commit 3f13ab7c, as a patch it depends on caused a few problems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230214183308.1658775-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 10, 2023
-
Bart Van Assche authored
Commit b99182c5 ("bio: add pcpu caching for non-polling bio_put") removed the code that uses this constant. Hence also remove the constant itself.

Cc: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230209230135.3475829-1-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 09, 2023
-
Thomas Weißschuh authored
Since commit ee6d3dd4 ("driver core: make kobj_type constant.") the driver core allows the usage of const struct kobj_type. Take advantage of this to constify the structure definitions to prevent modification at runtime.

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Link: https://lore.kernel.org/r/20230208-kobj_type-block-v1-1-0b3eafd7d983@weissschuh.net
Signed-off-by: Jens Axboe <axboe@kernel.dk>
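
The pattern is mechanical; an illustrative hunk (structure name assumed from block/blk-sysfs.c):

    -static struct kobj_type blk_queue_ktype = {
    +static const struct kobj_type blk_queue_ktype = {
    	.sysfs_ops	= &queue_sysfs_ops,
    	.release	= blk_queue_release,
     };

The kobject core has accepted a const struct kobj_type * since the commit referenced above, so no caller changes are needed.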
-
Xiao Ni authored
The code checks whether plug->cached_rq is empty before merging a bio. But the merge action has no relationship with plug->cached_rq; it tries to merge the bio with requests within plug->mq_list. Because the ->cached_rq emptiness check happens first, merge chances are missed whenever ->cached_rq is empty. So move the merge attempt before the ->cached_rq check.

Signed-off-by: Xiao Ni <xni@redhat.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230209031930.27354-1-xni@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
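
Sketched against blk_mq_submit_bio() (helper names as in blk-mq.c of this era; treat the exact placement and signatures as assumptions):

    	/* try to merge into the plug's request list first ... */
    	if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
    		return;		/* bio merged, no new request needed */

    	/* ... and only then consider reusing a plug-cached request */
    	rq = blk_mq_get_cached_request(q, plug, &bio, nr_segs);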
-
Christoph Hellwig authored
It turns out this was too soon. blkg_conf_prep plays too many funky locking games with the queue lock for this to work properly. This reverts commit 27b642b0.

Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230209053523.437927-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
While del_gendisk ensures there is no outstanding I/O on the queue, it can't prevent block layer users from building new I/O. This leads to a NULL ->root_blkg reference in bio_associate_blkg when allocating a new bio on a shut-down file system. Delay freeing the blk-cgroup subsystems from del_gendisk until disk_release to make sure the blkg and throttle information is still available for bio submitters, even if those bios will immediately fail. This can now cause disk_release to be called on a disk that hasn't been added. That's mostly harmless, except for a case in blk_throtl_exit that now needs to check for a NULL ->td pointer.

Fixes: 178fa7d4 ("blk-cgroup: delay blk-cgroup initialization until add_disk")
Reported-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20230208063514.171485-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
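
The defensive check can be sketched as (assumed shape):

    void blk_throtl_exit(struct gendisk *disk)
    {
        struct request_queue *q = disk->queue;

        /* disk_release may now run for a disk that was never added,
         * in which case throttling was never initialized */
        if (!q->td)
            return;

        /* ... existing teardown ... */
    }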
-
- Feb 07, 2023
-
Yu Kuai authored
After commit dfd6200a ("blk-cgroup: support to track if policy is online"), there is no need to track this again in bfq. However, 'pd->online' is not protected by 'bfqd->lock'; in order to make sure bfq won't see that 'pd->online' is still set after bfq_pd_offline(), clear it before bfq_pd_offline() is called. This is fine because other policies don't use 'pd->online' and bfq_pd_offline() moves active bfqqs to the root cgroup anyway.

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230202134913.2364549-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 06, 2023
-
Kemeng Shi authored
Commit 88022d72 ("blk-mq: don't handle failure in .get_budget") removed the BLK_STS_RESOURCE return value, and we now only check whether we can get the budget from .get_budget(). Correct the stale comment from ".get_budget() returns BLK_STS_NO_RESOURCE" to ".get_budget() fails to get the budget".

Fixes: 88022d72 ("blk-mq: don't handle failure in .get_budget")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Use switch/case to handle errors, as other functions do, to improve readability in blk_mq_try_issue_list_directly.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
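
Roughly, the issue loop becomes (a sketch of the assumed shape, not the exact hunk):

    	blk_status_t ret = blk_mq_request_issue_directly(rq, last);

    	switch (ret) {
    	case BLK_STS_OK:
    		queued++;
    		break;
    	case BLK_STS_RESOURCE:
    	case BLK_STS_DEV_RESOURCE:
    		/* out of resources: requeue and stop issuing this batch */
    		blk_mq_request_bypass_insert(rq, false, true);
    		goto out;	/* 'out' commits queued rqs and returns */
    	default:
    		blk_mq_end_request(rq, ret);
    		break;
    	}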
-
Kemeng Shi authored
Commit 113285b4 ("blk-mq: ensure that bd->last is always set correctly") sets last if we failed to get a driver tag for the next request, to avoid a missed flush: we break the list walk and will not send the last request in the list, which would normally be sent with last set. This code seems stale now because the flush it introduces is always redundant:

- If tags are really exhausted, we will send an extra flush anyway if we find the list is not empty after the list walk.
- If some tag is freed before the retry in blk_mq_prep_dispatch_rq, then we can get a tag for the next request in the retry, and the flush already notified is not necessary.

Just remove this stale code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
blk_mq_dispatch_rq_list reports in its bool return value whether the hctx is busy: it returns true if we are not busy and can handle more, and false otherwise. Inside blk_mq_dispatch_rq_list, errors is only used if the list is empty, and we return true if the list is empty and (errors + queued) != 0. There are three kinds of status returned for a request:

- busy errors (BLK_STS*_RESOURCE): the failed request is added back to the list, so the list will not be empty;
- BLK_STS_OK: counted in queued;
- all other errors: counted in errors.

If the list is empty, no request got a busy error, so (errors + queued) equals the total number of requests in the list, which was checked to be non-empty at the start of blk_mq_dispatch_rq_list. Hence (errors + queued) != 0 is always met when the list is empty, and both this check and the errors count are unneeded.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
1. Remove the checks of needs_resource and ret == BLK_STS_DEV_RESOURCE. For busy errors (BLK_STS*_RESOURCE) the request is always added back to the list, so needs_resource will not be true and ret will not be BLK_STS_DEV_RESOURCE if the list is empty. These dead checks can be removed.

2. Check the ret of the last request instead of errors. If the list is empty, we only need to explicitly commit_rqs if an error happened on the last request, which is stored in ret. So check the ret of the last request instead of errors, removing unnecessary commit_rqs calls triggered by errors returned from previous requests.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Call blk_mq_commit_rqs instead of accessing ->commit_rqs directly. As the comment on blk_mq_commit_rqs explains, we only need an explicit call in two cases:

- we did not queue everything initially scheduled to queue;
- the last attempt to queue a request failed.

Both cases can be checked with the ret of the last request, which breaks the list walk. We can then remove the unnecessary error count and the unnecessary commits triggered by errors outside the cases described above.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
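
For reference, the shared helper these cleanups converge on can be sketched as (assumed shape):

    static void blk_mq_commit_rqs(struct blk_mq_hw_ctx *hctx, int queued,
                                  bool from_schedule)
    {
        /* only kick the driver if something was actually queued */
        if (hctx->queue->mq_ops->commit_rqs && queued) {
            trace_block_unplug(hctx->queue, queued, !from_schedule);
            hctx->queue->mq_ops->commit_rqs(hctx);
        }
    }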
-
Kemeng Shi authored
We only need to explicitly commit in two error cases:

- we did not queue everything initially scheduled to queue;
- the last attempt to queue a request failed

(see the comment on blk_mq_commit_rqs for more details). Both cases can be checked with the ret of the last request, which breaks the list walk. Remove the unnecessary error count and the unnecessary commits triggered by errors that are not covered by the cases described above.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
1. Move blk_mq_commit_rqs forward, before the functions that need commits.
2. Add a queued check and only commit requests if any request was queued in blk_mq_commit_rqs, to keep commit behavior consistent and remove unnecessary commits.
3. Split the queued clearing from blk_mq_plug_commit_rqs, as it is not wanted in general.
4. Sync the current callers of blk_mq_commit_rqs with the new general blk_mq_commit_rqs.
5. Document the rule for the unusual cases which need an explicit commit_rqs.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Function blk_mq_plug_issue_direct tries to issue the batched requests in the plug list to the driver directly. We only issue plug requests to the driver if we are not from the scheduler, so the from_scheduler parameter of blk_mq_plug_issue_direct is always false. Remove the unnecessary from_scheduler parameter of blk_mq_plug_issue_direct.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
We only break the list walk if we get a 'BLK_STS_*RESOURCE' error, and we also count errors for 'BLK_STS_*RESOURCE'. So if the list is not empty, errors will always be non-zero, and the list_empty check can be removed. This removes a redundant list_empty check for the case where an error happened while sending the last request in the list.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Commit f906a6a0 ("blk-mq: improve tag waiting setup for non-shared tags") marks restart for unshared tags as an improvement. At that time, tags were only shared between queues, and whether tags were shared could be checked by testing BLK_MQ_F_TAG_SHARED. Later, commit 32bc15af ("blk-mq: Facilitate a shared sbitmap per tagset") enabled tag sharing between hctxs inside a queue. We only mark restart for shared hctxs inside a queue, which may cause an IO hang if no tag is currently allocated by the hctxs that are about to be marked for restart. Wait on the sbitmap_queue instead of marking restart for the shared-hctxs case to fix this.

Fixes: 32bc15af ("blk-mq: Facilitate a shared sbitmap per tagset")
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
In the shared-queues case, we only wait on bitmap_tags if we fail to get a driver tag. However, the rq could have come from breserved_tags, and then two problems occur:

1. an IO hang if no tag is currently allocated from bitmap_tags;
2. an unnecessary wakeup when a tag is freed to bitmap_tags while no tag is freed to breserved_tags.

Fix this by waiting on the bitmap that the rq was allocated from.

Fixes: f906a6a0 ("blk-mq: improve tag waiting setup for non-shared tags")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
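
The selection can be sketched as (assumed shape, in blk_mq_mark_tag_wait()):

    	struct sbitmap_queue *sbq;

    	/* wait on the pool this request's tag actually comes from */
    	if (blk_mq_tag_is_reserved(rq->mq_hctx->tags, rq->internal_tag))
    		sbq = &hctx->tags->breserved_tags;
    	else
    		sbq = &hctx->tags->bitmap_tags;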
-
Kemeng Shi authored
Commit 97889f9a ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()") removed the handling of TAG_SHARED in restart; shared_hctx_restart, which counted how many hardware queues were marked for restart, was removed along with it. Remove the stale comment claiming that we still count hardware queues that need restart.

Fixes: 97889f9a ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Kemeng Shi authored
Commit 1f5bd336 ("blk-mq: add blk_mq_alloc_request_hctx") added blk_mq_alloc_request_hctx to send commands to a specific queue. If BLK_MQ_REQ_NOWAIT is not set in the tag allocation, we may move to a different hctx after sleeping and get a tag from an unexpected hctx, so BLK_MQ_REQ_NOWAIT must be set in the flags for blk_mq_alloc_request_hctx. After commit 600c3b0c ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx"), blk_mq_alloc_request_hctx returns -EINVAL if both BLK_MQ_REQ_NOWAIT and BLK_MQ_REQ_RESERVED are unset, instead of if BLK_MQ_REQ_NOWAIT is unset. So if BLK_MQ_REQ_NOWAIT is not set but BLK_MQ_REQ_RESERVED is, blk_mq_alloc_request_hctx could allocate a tag from an unexpected hctx. What we need here is to return -EINVAL if either BLK_MQ_REQ_NOWAIT or BLK_MQ_REQ_RESERVED is not set. Currently both flags are set whenever a specific hctx is needed, in nvme_auth_submit, nvmf_connect_io_queue and nvmf_connect_admin_queue; this fixes the case where BLK_MQ_REQ_NOWAIT could be missed in the future.

Fixes: 600c3b0c ("blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
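
The tightened check can be sketched as (assumed shape):

    	/* a specific hctx only makes sense with both flags set */
    	if ((flags & (BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED)) !=
    	    (BLK_MQ_REQ_NOWAIT | BLK_MQ_REQ_RESERVED))
    		return ERR_PTR(-EINVAL);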
-
Christoph Hellwig authored
The capability attribute was added in 2017 to expose the kernel-internal GENHD_FL_MEDIA_CHANGE_NOTIFY to userspace, without ever adding a value to a UAPI header and without ever setting it in any driver, until the flag was finally removed in Linux 5.7. Deprecate the file and always return 0 instead of exposing the other internal and frequently renumbered gendisk flags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230203150209.3199115-1-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
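
A sketch of what the deprecated attribute handler might look like (assumed shape):

    static ssize_t capability_show(struct device *dev,
                                   struct device_attribute *attr, char *buf)
    {
        pr_warn_once("the capability attribute has been deprecated.\n");
        return sprintf(buf, "0\n");
    }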
-