  1. Feb 25, 2023
  2. Feb 19, 2023
  3. Feb 15, 2023
  4. Feb 09, 2023
  5. Jan 21, 2023
  6. Jan 18, 2023
  7. Jan 16, 2023
    • btrfs: fix race between quota rescan and disable leading to NULL pointer deref · b7adbf9a
      Filipe Manana authored
      
      If we have one task trying to start the quota rescan worker while another
      one is trying to disable quotas, we can end up hitting a race that results
      in the quota rescan worker doing a NULL pointer dereference. The steps for
      this are the following:
      
      1) Quotas are enabled;
      
      2) Task A calls the quota rescan ioctl and enters btrfs_qgroup_rescan().
         It calls qgroup_rescan_init() which returns 0 (success) and then joins a
         transaction and commits it;
      
      3) Task B calls the quota disable ioctl and enters btrfs_quota_disable().
         It clears the bit BTRFS_FS_QUOTA_ENABLED from fs_info->flags and calls
         btrfs_qgroup_wait_for_completion(), which returns immediately since the
         rescan worker is not yet running.
         Then it starts a transaction and locks fs_info->qgroup_ioctl_lock;
      
      4) Task A queues the rescan worker, by calling btrfs_queue_work();
      
      5) The rescan worker starts, and calls rescan_should_stop() at the start
         of its while loop, which results in 0 iterations of the loop, since
         the flag BTRFS_FS_QUOTA_ENABLED was cleared from fs_info->flags by
         task B at step 3);
      
      6) Task B sets fs_info->quota_root to NULL;
      
      7) The rescan worker tries to start a transaction and uses
         fs_info->quota_root as the root argument for btrfs_start_transaction().
         This results in a NULL pointer dereference down the call chain of
         btrfs_start_transaction(). The stack trace is something like the one
         reported in the Link tag below:
      
         general protection fault, probably for non-canonical address 0xdffffc0000000041: 0000 [#1] PREEMPT SMP KASAN
         KASAN: null-ptr-deref in range [0x0000000000000208-0x000000000000020f]
         CPU: 1 PID: 34 Comm: kworker/u4:2 Not tainted 6.1.0-syzkaller-13872-gb6bb9676f216 #0
         Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
         Workqueue: btrfs-qgroup-rescan btrfs_work_helper
         RIP: 0010:start_transaction+0x48/0x10f0 fs/btrfs/transaction.c:564
         Code: 48 89 fb 48 (...)
         RSP: 0018:ffffc90000ab7ab0 EFLAGS: 00010206
         RAX: 0000000000000041 RBX: 0000000000000208 RCX: ffff88801779ba80
         RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
         RBP: dffffc0000000000 R08: 0000000000000001 R09: fffff52000156f5d
         R10: fffff52000156f5d R11: 1ffff92000156f5c R12: 0000000000000000
         R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000003
         FS:  0000000000000000(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
         CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
         CR2: 00007f2bea75b718 CR3: 000000001d0cc000 CR4: 00000000003506e0
         DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
         DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
         Call Trace:
          <TASK>
          btrfs_qgroup_rescan_worker+0x3bb/0x6a0 fs/btrfs/qgroup.c:3402
          btrfs_work_helper+0x312/0x850 fs/btrfs/async-thread.c:280
          process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
          worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
          kthread+0x266/0x300 kernel/kthread.c:376
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
          </TASK>
         Modules linked in:
      
      So fix this by having the rescan worker function not attempt to start a
      transaction if it didn't do any rescan work.
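
As a rough illustration of the fix described above, here is a minimal userspace sketch. It is not the kernel code: `struct model_fs`, `model_should_stop()` and `model_rescan_worker()` are hypothetical stand-ins for fs_info, rescan_should_stop() and the rescan worker, and the "transaction" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's real structures. */
struct model_fs {
    bool quota_enabled; /* models the BTRFS_FS_QUOTA_ENABLED bit */
    void *quota_root;   /* models fs_info->quota_root, may become NULL */
    int transactions;   /* counts modelled transaction starts */
};

/* Models the check at the top of the worker loop. */
static bool model_should_stop(const struct model_fs *fs)
{
    return !fs->quota_enabled;
}

/* Returns 0 on success, -1 if a transaction would have used a NULL root. */
static int model_rescan_worker(struct model_fs *fs, int leaves)
{
    bool did_work = false;

    assert(fs != NULL);
    while (leaves > 0 && !model_should_stop(fs)) {
        leaves--;
        did_work = true;
    }

    /* The fix: no rescan work done means no transaction is started. */
    if (!did_work)
        return 0;

    if (fs->quota_root == NULL)
        return -1;
    fs->transactions++;
    return 0;
}
```

With quotas already disabled and the root already cleared, the modelled worker returns without touching `quota_root`, which is the situation reached at step 7) above.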
      
      Reported-by: syzbot+96977faa68092ad382c4@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/linux-btrfs/000000000000e5454b05f065a803@google.com/
      Fixes: e804861b ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
      CC: stable@vger.kernel.org # 5.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix invalid leaf access due to inline extent during lseek · 1f55ee6d
      Filipe Manana authored
      
      During lseek, for SEEK_DATA and SEEK_HOLE modes, we access the disk_bytenr
      of an extent without checking its type. However, inline extents have their
      data starting at the offset of the disk_bytenr field, so accessing that field
      when we have an inline extent can result in any of the following:
      
      1) Interpret the inline extent's data as a disk_bytenr value;
      
      2) In case the inline data is less than 8 bytes, we access part of some
         other item in the leaf, or unused space in the leaf;
      
      3) In case the inline data is less than 8 bytes and the extent item is
         the first item in the leaf, we can access beyond the leaf's limit.
      
      So fix this by not accessing the disk_bytenr field if we have an inline
      extent.
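
The idea of checking the extent type first can be sketched in userspace as follows. This is a simplified model, not the btrfs code: the enum, struct and function names are hypothetical, only two extent types are modelled, and it assumes that a regular extent with a disk_bytenr of 0 represents a hole.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced set of extent types for this model. */
enum model_extent_type { MODEL_EXTENT_INLINE, MODEL_EXTENT_REG };

struct model_extent {
    enum model_extent_type type;
    uint64_t disk_bytenr; /* only meaningful when type is MODEL_EXTENT_REG */
};

/* Returns true if the modelled extent should be treated as a hole. */
static bool model_extent_is_hole(const struct model_extent *ext)
{
    assert(ext != NULL);

    /* Inline extents carry data in place of disk_bytenr: never read it. */
    if (ext->type == MODEL_EXTENT_INLINE)
        return false;

    /* Assumption of this model: a zero disk_bytenr marks a hole. */
    return ext->disk_bytenr == 0;
}
```

An inline extent is reported as data no matter what bytes happen to sit where disk_bytenr would be.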
      
      Fixes: b6e83356 ("btrfs: make hole and data seeking a lot more efficient")
      Reported-by: Matthias Schoepfer <matthias.schoepfer@googlemail.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216908
      Link: https://lore.kernel.org/linux-btrfs/7f25442f-b121-2a3a-5a3d-22bcaae83cd4@leemhuis.info/
      CC: stable@vger.kernel.org # 6.1
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: stop using write_one_page in btrfs_scratch_superblock · 26ecf243
      Christoph Hellwig authored
      
      write_one_page is an awkward interface that expects the page locked and
      ->writepage to be implemented.  Replace that by zeroing the signature
      bytes and synchronizing the block device page using the proper bdev
      helpers.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: factor out scratching of one regular super block · 0e0078f7
      Christoph Hellwig authored
      
      btrfs_scratch_superblocks open codes the scratching of a non-zoned
      super block.  Split the code to read, zero and write the superblock
      for regular devices into a separate helper.
      
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ update changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • erofs: clean up parsing of fscache related options · e02ac3e7
      Jingbo Xu authored
      
      ... to avoid the mess of conditional preprocessing as we are continually
      adding fscache related mount options.
      
      Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Reviewed-by: Yue Hu <huyue2@coolpad.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20230112065431.124926-3-jefflexu@linux.alibaba.com
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
    • zonefs: Detect append writes at invalid locations · a608da3b
      Damien Le Moal authored
      
      Using REQ_OP_ZONE_APPEND operations for synchronous writes to sequential
      files succeeds regardless of the zone write pointer position, as long as
      the target zone is not full. This means that if an external (buggy)
      application writes to the zone of a sequential file underneath the file
      system, a subsequent file write() operation will succeed but the file size
      will not be correct and the file will contain invalid data written by
      another application.
      
      Modify zonefs_file_dio_append() to check the written sector of an append
      write (returned in bio->bi_iter.bi_sector) and return -EIO if there is a
      mismatch with the file zone wp offset field. This change triggers a call
      to zonefs_io_error() and a zone check. Modify zonefs_io_error_cb() to
      not expose the unexpected data after the current inode size when the
      errors=remount-ro mode is used. Other error modes are correctly handled
      already.
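
The location check described above can be sketched as a small userspace function. This is only a model under stated assumptions: `model_check_append()` and its parameters are hypothetical names, sectors are assumed to be 512 bytes, and the write pointer offset is assumed to be sector aligned.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9 /* assumption: 512-byte sectors */

/*
 * Compares the sector the device reports having written with the sector
 * the file system expected (zone start plus the file's write pointer
 * offset). Returns 0 on a match and -EIO on a mismatch.
 */
static int model_check_append(uint64_t zone_start_sector,
                              uint64_t wp_offset_bytes,
                              uint64_t written_sector)
{
    uint64_t expected;

    /* Model assumption: the write pointer offset is sector aligned. */
    assert((wp_offset_bytes & ((1u << MODEL_SECTOR_SHIFT) - 1)) == 0);

    expected = zone_start_sector + (wp_offset_bytes >> MODEL_SECTOR_SHIFT);
    return written_sector == expected ? 0 : -EIO;
}
```

If another application advanced the zone write pointer behind the file system's back, the reported sector no longer equals the expected one and the modelled write fails with -EIO instead of silently succeeding.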
      
      Fixes: 02ef12a6 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
  8. Jan 12, 2023
    • cifs: Fix uninitialized memory read for smb311 posix symlink create · a152d05a
      Volker Lendecke authored
      
      If smb311 posix is enabled, we send the intended mode for file
      creation in the posix create context. Instead of using what's there on
      the stack, create the mfsymlink file with 0644.
      
      Fixes: ce558b0e ("smb3: Add posix create context for smb3.11 posix mounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: Volker Lendecke <vl@samba.org>
      Reviewed-by: Tom Talpey <tom@talpey.com>
      Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • btrfs: do not abort transaction on failure to update log root · 09e44868
      Filipe Manana authored
      
      When syncing a log, if we fail to update a log root in the log root tree,
      we are aborting the transaction if the failure was not -ENOSPC. This is
      excessive because there is a chance that a transaction commit can succeed,
      and therefore avoid turning the filesystem into RO mode. All we need to be
      careful about is to mark the log for a full commit, which we already do,
      to make sure no one commits a super block pointing to an outdated log root
      tree.
      
      So don't abort the transaction if we fail to update a log root in the log
      root tree, and log an error if the failure is not -ENOSPC, so that it does
      not go completely unnoticed.
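
The resulting policy can be summarized with a small userspace sketch. The struct and function below are hypothetical and only model the decision described in this message; they are not the btrfs implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical summary of what happens after a failed log root update. */
struct model_outcome {
    bool full_commit;  /* log marked for a full transaction commit */
    bool aborted;      /* transaction aborted (filesystem would go RO) */
    bool error_logged; /* an error message was emitted */
};

static struct model_outcome model_on_log_root_update_failure(int ret)
{
    /* The log is always marked for a full commit; nothing is aborted. */
    struct model_outcome out = {
        .full_commit = true,
        .aborted = false,
        .error_logged = false,
    };

    assert(ret < 0); /* only called for failures in this model */

    /* -ENOSPC is expected often enough to stay silent; the rest is logged. */
    if (ret != -ENOSPC)
        out.error_logged = true;
    return out;
}
```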
      
      CC: stable@vger.kernel.org # 6.0+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: do not abort transaction on failure to write log tree when syncing log · 16199ad9
      Filipe Manana authored

      When syncing the log, if we fail to write log tree extent buffers, we mark
      the log for a full commit and abort the transaction. However, we don't need
      to abort the transaction; all we really need to do is make sure no one
      can commit a superblock pointing to new log tree roots. Just because we
      got a failure writing extent buffers for a log tree, it does not mean we
      will also fail to do a transaction commit.
      
      One particular case is if due to a bug somewhere, when writing log tree
      extent buffers, the tree checker detects some corruption and the writeout
      fails because of that. Aborting the transaction can be very disruptive for
      a user, especially if the issue happened on a root filesystem. One example
      is the scenario in the Link tag below, where an isolated corruption on log
      tree leaves was causing transaction aborts when syncing the log.
      
      Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
      CC: stable@vger.kernel.org # 5.15+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add missing setup of log for full commit at add_conflicting_inode() · 94cd63ae
      Filipe Manana authored
      
      When logging conflicting inodes, if we reach the maximum limit of inodes,
      we return BTRFS_LOG_FORCE_COMMIT to force a transaction commit. However
      we don't mark the log for full commit (with btrfs_set_log_full_commit()),
      which means that once we leave the log transaction and before we commit
      the transaction, some other task may sync the log, which is incomplete
      as we have not logged all conflicting inodes, leading to some inconsistency
      in case that log ends up being replayed.
      
      So also call btrfs_set_log_full_commit() at add_conflicting_inode().
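
A minimal userspace model of that change might look like the following. Every name here is hypothetical, including the limit of 10 and the return value of 1; the flag merely stands in for what btrfs_set_log_full_commit() records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_CONFLICT_INODES 10 /* hypothetical limit for this model */
#define MODEL_LOG_FORCE_COMMIT 1     /* hypothetical "force a commit" value */

struct model_log_ctx {
    int num_conflict_inodes; /* conflicting inodes logged so far */
    bool log_full_commit;    /* stands in for the full-commit marker */
};

/* Returns 0 if the inode was queued, MODEL_LOG_FORCE_COMMIT at the limit. */
static int model_add_conflicting_inode(struct model_log_ctx *ctx)
{
    assert(ctx != NULL);

    if (ctx->num_conflict_inodes >= MODEL_MAX_CONFLICT_INODES) {
        /*
         * The added step: mark the log so that no other task can sync
         * this incomplete log before the transaction commit happens.
         */
        ctx->log_full_commit = true;
        return MODEL_LOG_FORCE_COMMIT;
    }

    ctx->num_conflict_inodes++;
    return 0;
}
```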
      
      Fixes: e09d94c9 ("btrfs: log conflicting inodes without holding log mutex of the initial inode")
      CC: stable@vger.kernel.org # 6.1
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix directory logging due to race with concurrent index key deletion · 8bb6898d
      Filipe Manana authored
      
      Sometimes we log a directory without holding its VFS lock, so while we
      are logging it, dir index entries may be added or removed. This typically
      happens when logging a dentry from a parent directory that points to a
      new directory, through log_new_dir_dentries(), or when, while logging
      some other inode, we also need to log its parent directories (through
      btrfs_log_all_parents()).
      
      This means that while we are at log_dir_items(), we may not find a dir
      index key we found before, because it was deleted in the meanwhile, so
      a call to btrfs_search_slot() may return 1 (key not found). In that case
      we return from log_dir_items() with a success value (the variable 'err'
      has a value of 0). This can lead to a few problems, especially in the case
      where the variable 'last_offset' has a value of (u64)-1 (and it's
      initialized to that when it is declared):
      
      1) By returning from log_dir_items() with success (0) and a value of
         (u64)-1 for '*last_offset_ret', we end up not logging any other dir
         index keys that follow the missing, just deleted, index key. The
         (u64)-1 value makes log_directory_changes() not call log_dir_items()
         again;
      
      2) Before returning with success (0), log_dir_items() will log a dir
         index range item covering a range from the last old dentry index
         (stored in the variable 'last_old_dentry_offset') to the value of
         'last_offset'. If 'last_offset' has a value of (u64)-1, then it means
         if the log is persisted and replayed after a power failure, it will
         cause deletion of all the directory entries that have an index number
         between last_old_dentry_offset + 1 and (u64)-1;
      
      3) We can end up returning from log_dir_items() with
         ctx->last_dir_item_offset having a lower value than
         inode->last_dir_index_offset, because the former is set to the current
         key we are processing at process_dir_items_leaf(), and at the end of
         log_directory_changes() we set inode->last_dir_index_offset to the
         current value of ctx->last_dir_item_offset. So if for example a
         deletion of a lower dir index key happened, we set
         ctx->last_dir_item_offset to that index value, then if we return from
         log_dir_items() because btrfs_search_slot() returned 1, we end up
         returning from log_dir_items() with success (0) and then
         log_directory_changes() sets inode->last_dir_index_offset to a lower
         value than it had before.
         This can result in unpredictable and unexpected behaviour when we
         need to log again the directory in the same transaction, and can result
         in ending up with a log tree leaf that has duplicated keys, as we do
         batch insertions of dir index keys into a log tree.
      
      So fix this by making log_dir_items() move on to the next dir index key
      if it does not find the one it was looking for.
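
The "move on to the next key" behaviour can be illustrated with a userspace model in which a sorted array stands in for the dir index keys of the subvolume tree. The helper names are hypothetical and the search is a plain linear scan, not btrfs_search_slot().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the position of the first key >= target (n if there is none)
 * and sets *found to 1 only on an exact match, loosely mirroring a
 * search that reports "key not found" for a deleted key.
 */
static size_t model_search(const uint64_t *keys, size_t n,
                           uint64_t target, int *found)
{
    size_t i = 0;

    while (i < n && keys[i] < target)
        i++;
    *found = (i < n && keys[i] == target);
    return i;
}

/* Logs every key at or after 'start'; returns how many keys were logged. */
static size_t model_log_dir_items(const uint64_t *keys, size_t n,
                                  uint64_t start)
{
    int found;
    size_t pos = model_search(keys, n, start, &found);
    size_t logged = 0;

    assert(pos <= n);
    /*
     * Modelled old behaviour: return early with success when !found.
     * Modelled new behaviour: a missing key is not a reason to stop,
     * logging carries on from the next key that still exists.
     */
    for (; pos < n; pos++)
        logged++;
    return logged;
}
```

With keys 2, 5 and 9 and a start key of 4 that was deleted concurrently, the model still logs the two keys that follow it instead of logging none.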
      
      Reported-by: David Arendt <admin@prnet.org>
      Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix missing error handling when logging directory items · 6d3d970b
      Filipe Manana authored
      
      When logging a directory, at log_dir_items(), if we get an error when
      attempting to search the subvolume tree for a dir index item, we end up
      returning 0 (success) from log_dir_items() because 'err' is left with a
      value of 0.
      
      This can lead to a few problems, especially in the case where the variable
      'last_offset' has a value of (u64)-1 (and it's initialized to that when
      it is declared):
      
      1) By returning from log_dir_items() with success (0) and a value of
         (u64)-1 for '*last_offset_ret', we end up not logging any other dir
         index keys that follow the missing, just deleted, index key. The
         (u64)-1 value makes log_directory_changes() not call log_dir_items()
         again;
      
      2) Before returning with success (0), log_dir_items() will log a dir
         index range item covering a range from the last old dentry index
         (stored in the variable 'last_old_dentry_offset') to the value of
         'last_offset'. If 'last_offset' has a value of (u64)-1, then it means
         if the log is persisted and replayed after a power failure, it will
         cause deletion of all the directory entries that have an index number
         between last_old_dentry_offset + 1 and (u64)-1;
      
      3) We can end up returning from log_dir_items() with
         ctx->last_dir_item_offset having a lower value than
         inode->last_dir_index_offset, because the former is set to the current
         key we are processing at process_dir_items_leaf(), and at the end of
         log_directory_changes() we set inode->last_dir_index_offset to the
         current value of ctx->last_dir_item_offset. So if for example a
         deletion of a lower dir index key happened, we set
         ctx->last_dir_item_offset to that index value, then if we return from
         log_dir_items() because btrfs_search_slot() returned an error, we end up
         returning without any error from log_dir_items() and then
         log_directory_changes() sets inode->last_dir_index_offset to a lower
         value than it had before.
         This can result in unpredictable and unexpected behaviour when we
         need to log again the directory in the same transaction, and can result
         in ending up with a log tree leaf that has duplicated keys, as we do
         batch insertions of dir index keys into a log tree.
      
      Fix this by setting 'err' to the value of 'ret' in case
      btrfs_search_slot() or btrfs_previous_item() returned an error. That will
      result in falling back to a full transaction commit.
      
      Reported-by: David Arendt <admin@prnet.org>
      Link: https://lore.kernel.org/linux-btrfs/ae169fc6-f504-28f0-a098-6fa6a4dfb612@leemhuis.info/
      Fixes: e02119d5 ("Btrfs: Add a write ahead tree log to optimize synchronous operations")
      CC: stable@vger.kernel.org # 4.14+
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • NFSD: replace delayed_work with work_struct for nfsd_client_shrinker · 7c24fa22
      Dai Ngo authored
      
      Since nfsd4_state_shrinker_count always calls mod_delayed_work with
      0 delay, we can replace delayed_work with work_struct to save some
      space and overhead.
      
      Also add the call to cancel_work after unregistering the shrinker
      in nfs4_state_shutdown_net.
      
      Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    • NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time · f385f7d2
      Dai Ngo authored
      
      Currently the nfsd-client shrinker is registered and unregistered at
      the time the nfsd module is loaded and unloaded. The problem with this
      is the shrinker is being registered before all of the relevant fields
      in nfsd_net are initialized when nfsd is started. This can lead to an
      oops when memory is low and the shrinker is called while nfsd is not
      running.
      
      This patch moves the register/unregister of the nfsd-client shrinker from
      module load/unload time to nfsd startup/shutdown time.
      
      Fixes: 44df6f43 ("NFSD: add delegation reaper to react to low memory condition")
      Reported-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>