  1. Mar 03, 2023
  2. Mar 02, 2023
  3. Mar 01, 2023
    • capability: just use a 'u64' instead of a 'u32[2]' array · f122a08b
      Linus Torvalds authored
      
      Back in 2008 we extended the capability bits from 32 to 64, and we did
      it by extending the single 32-bit capability word from one word to an
      array of two words.  It was then obfuscated by hiding the "2" behind two
      macro expansions, with the reasoning being that maybe it gets extended
      further some day.
      
      That reasoning may have been valid at the time, but the last thing we
      want to do is to extend the capability set any more.  And the array of
      values not only causes source code oddities (with loops to deal with
      it), but also results in worse code generation.  It's a lose-lose
      situation.
      
      So just change the 'u32[2]' into a 'u64' and be done with it.
      
      We still have to deal with the fact that the user space interface is
      designed around an array of these 32-bit values, but that was the case
      before too, since the array layouts were different (ie user space
      doesn't use an array of 32-bit values for individual capability masks,
      but an array of 32-bit slices of multiple masks).
      
      So that marshalling of data is actually simplified too, even if it does
      remain somewhat obscure and odd.
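
The marshalling described above can be sketched roughly as follows. This is a minimal user-space illustration, not the kernel's code: the struct names `user_cap_slice` and `cap_masks` and the helper functions are hypothetical, and the sketch assumes slot 0 carries the low 32 bits of each mask and slot 1 the high 32 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for one user-space slot: a 32-bit slice of each mask. */
struct user_cap_slice {
    uint32_t effective;
    uint32_t permitted;
    uint32_t inheritable;
};

/* Hypothetical stand-in for the in-kernel view: one 64-bit value per mask. */
struct cap_masks {
    uint64_t effective;
    uint64_t permitted;
    uint64_t inheritable;
};

/* Join a low and a high 32-bit slice into one 64-bit mask. */
static uint64_t join_slices(uint32_t low, uint32_t high)
{
    return (uint64_t)low | ((uint64_t)high << 32);
}

/* Unpack the two user-space slots into three 64-bit masks. */
static struct cap_masks masks_from_user(const struct user_cap_slice u[2])
{
    struct cap_masks m = {
        .effective   = join_slices(u[0].effective,   u[1].effective),
        .permitted   = join_slices(u[0].permitted,   u[1].permitted),
        .inheritable = join_slices(u[0].inheritable, u[1].inheritable),
    };
    return m;
}

/* Pack three 64-bit masks back into the two user-space slots. */
static void masks_to_user(const struct cap_masks *m, struct user_cap_slice u[2])
{
    u[0].effective   = (uint32_t)m->effective;
    u[1].effective   = (uint32_t)(m->effective >> 32);
    u[0].permitted   = (uint32_t)m->permitted;
    u[1].permitted   = (uint32_t)(m->permitted >> 32);
    u[0].inheritable = (uint32_t)m->inheritable;
    u[1].inheritable = (uint32_t)(m->inheritable >> 32);
}
```

The point of the sketch is that each user-space slot holds a slice of all three masks, so no slot maps directly onto a single mask in either the old or the new kernel layout.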
      
      This was all triggered by my reaction to the new "cap_isidentical()"
      introduced recently.  By just using a saner data structure, it went from
      
      	unsigned __capi;
      	CAP_FOR_EACH_U32(__capi) {
      		if (a.cap[__capi] != b.cap[__capi])
      			return false;
      	}
      	return true;
      
      to just being
      
      	return a.val == b.val;
      
      instead.  Which is rather more obvious both to humans and to compilers.
      
      Cc: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Feb 28, 2023
    • exfat: fix the newly allocated clusters are not freed in error handling · d5c514b6
      Yuezhang Mo authored
      
      In the 'free_cluster' error-handling path, p_chain->size is not updated
      until all num_alloc clusters have been allocated, so it is still 0 and
      the newly allocated clusters are not freed.
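
The failure mode can be illustrated with a toy model. This is a sketch under stated assumptions, not the exFAT code: `struct chain`, `alloc_chain()`, `free_chain()` and the `count_as_we_go` switch are all hypothetical, and the "volume" is just a small in-memory array.

```c
#include <stdbool.h>

#define NUM_CLUSTERS 8

/* Toy volume: in_use[i] is true while cluster i is allocated. */
static bool in_use[NUM_CLUSTERS];

/* Hypothetical, simplified chain: 'size' contiguous clusters from 'start'. */
struct chain {
    unsigned int start;
    unsigned int size;
};

/* Free exactly the clusters the chain claims to own. */
static void free_chain(struct chain *c)
{
    for (unsigned int i = 0; i < c->size; i++)
        in_use[c->start + i] = false;
    c->size = 0;
}

/*
 * Try to allocate 'want' clusters starting at 'start'.  Allocation fails
 * when it hits a cluster that is already in use or the end of the volume.
 * With count_as_we_go == false, 'size' is only set once everything has
 * succeeded, which models the problem described above.
 */
static int alloc_chain(struct chain *c, unsigned int start,
                       unsigned int want, bool count_as_we_go)
{
    unsigned int done = 0;

    c->start = start;
    c->size = 0;
    while (done < want) {
        unsigned int cl = start + done;

        if (cl >= NUM_CLUSTERS || in_use[cl])
            goto free_cluster;
        in_use[cl] = true;
        done++;
        if (count_as_we_go)
            c->size = done;
    }
    c->size = want;
    return 0;

free_cluster:
    free_chain(c); /* frees nothing if size is still 0 */
    return -1;
}

/* Count allocated clusters, to make leaks visible. */
static unsigned int clusters_in_use(void)
{
    unsigned int n = 0;

    for (unsigned int i = 0; i < NUM_CLUSTERS; i++)
        n += in_use[i] ? 1 : 0;
    return n;
}
```

In this model, a failed allocation with `count_as_we_go == false` leaves the partially allocated clusters marked as in use, while tracking the size at each step lets the error path release them.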
      
      Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
      Reviewed-by: Andy Wu <Andy.Wu@sony.com>
      Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    • exfat: don't print error log in normal case · 3ce937cb
      Yuezhang Mo authored
      
      When allocating a new cluster, exFAT first tries the cluster that
      follows the last cluster of the file. If the last cluster of the
      file is also the last cluster of the volume, allocation starts from
      the first cluster instead. This is a normal case, but the following
      error log is printed. It confuses users, so this commit removes the
      error log.
      
      [1960905.181545] exFAT-fs (sdb1): hint_cluster is invalid (262130)
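
A rough sketch of the wrap-around decision, under stated assumptions: `pick_hint()`, `is_valid()`, the `end_cluster` parameter (one past the last valid cluster) and the first-cluster index of 2 are all hypothetical choices for illustration, and the condition used to stay silent is an assumption rather than the patch itself.

```c
#include <stdbool.h>
#include <stdio.h>

#define FIRST_CLUSTER 2u /* assumed index of the first data cluster */

/* Valid clusters lie in [FIRST_CLUSTER, end_cluster). */
static bool is_valid(unsigned int clu, unsigned int end_cluster)
{
    return clu >= FIRST_CLUSTER && clu < end_cluster;
}

/*
 * Pick the cluster to start searching from.  A hint that is exactly one
 * past the last cluster is the normal "file ends at the end of the
 * volume" case and wraps silently; any other invalid hint is logged.
 * *logged reports whether a message was printed.
 */
static unsigned int pick_hint(unsigned int hint, unsigned int end_cluster,
                              bool *logged)
{
    *logged = false;
    if (is_valid(hint, end_cluster))
        return hint;
    if (hint != end_cluster) {
        fprintf(stderr, "hint_cluster is invalid (%u)\n", hint);
        *logged = true;
    }
    return FIRST_CLUSTER;
}
```

With this shape, only hints that are invalid for some other reason still produce a message.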
      
      Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
      Reviewed-by: Andy Wu <Andy.Wu@sony.com>
      Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    • exfat: remove unneeded code from exfat_alloc_cluster() · 8d2909ee
      Yuezhang Mo authored
      
      In the removed code, num_clusters is 0, so exfat_chain_cont_cluster()
      does nothing.  The call is unneeded, so remove it.
      
      Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
      Reviewed-by: Andy Wu <Andy.Wu@sony.com>
      Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    • ocfs2: fix non-auto defrag path not working issue · 236b9254
      Heming Zhao via Ocfs2-devel authored

      This fixes three issues on move extents ioctl without auto defrag:
      
      a) In ocfs2_find_victim_alloc_group(), we have to convert bits to
         blocks first in the case of the global bitmap.

      b) In ocfs2_probe_alloc_group(), when finding enough bits in the block
         group bitmap, we have to back off move_len to the start pos as well,
         otherwise it may corrupt the filesystem.

      c) In ocfs2_ioctl_move_extents(), set me_threshold both for non-auto
         and auto defrag paths.  Otherwise it will set move_max_hop to 0 and
         finally cause an unexpected ENOSPC error.

      Currently there are no tools triggering the above issues since
      defragfs.ocfs2 enables auto defrag by default.  Tested by manually
      changing defragfs.ocfs2 to run the non-auto defrag path.
      
      Link: https://lkml.kernel.org/r/20230220050526.22020-1-heming.zhao@suse.com
      
      
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ocfs2: fix defrag path triggering jbd2 ASSERT · 60eed1e3
      Heming Zhao via Ocfs2-devel authored

      code path:
      
      ocfs2_ioctl_move_extents
       ocfs2_move_extents
        ocfs2_defrag_extent
         __ocfs2_move_extent
          + ocfs2_journal_access_di
          + ocfs2_split_extent  //sub-paths call jbd2_journal_restart
          + ocfs2_journal_dirty //crash by jbd2 ASSERT
      
      crash stacks:
      
      PID: 11297  TASK: ffff974a676dcd00  CPU: 67  COMMAND: "defragfs.ocfs2"
       #0 [ffffb25d8dad3900] machine_kexec at ffffffff8386fe01
       #1 [ffffb25d8dad3958] __crash_kexec at ffffffff8395959d
       #2 [ffffb25d8dad3a20] crash_kexec at ffffffff8395a45d
       #3 [ffffb25d8dad3a38] oops_end at ffffffff83836d3f
       #4 [ffffb25d8dad3a58] do_trap at ffffffff83833205
       #5 [ffffb25d8dad3aa0] do_invalid_op at ffffffff83833aa6
       #6 [ffffb25d8dad3ac0] invalid_op at ffffffff84200d18
          [exception RIP: jbd2_journal_dirty_metadata+0x2ba]
          RIP: ffffffffc09ca54a  RSP: ffffb25d8dad3b70  RFLAGS: 00010207
          RAX: 0000000000000000  RBX: ffff9706eedc5248  RCX: 0000000000000000
          RDX: 0000000000000001  RSI: ffff97337029ea28  RDI: ffff9706eedc5250
          RBP: ffff9703c3520200   R8: 000000000f46b0b2   R9: 0000000000000000
          R10: 0000000000000001  R11: 00000001000000fe  R12: ffff97337029ea28
          R13: 0000000000000000  R14: ffff9703de59bf60  R15: ffff9706eedc5250
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #7 [ffffb25d8dad3ba8] ocfs2_journal_dirty at ffffffffc137fb95 [ocfs2]
       #8 [ffffb25d8dad3be8] __ocfs2_move_extent at ffffffffc139a950 [ocfs2]
       #9 [ffffb25d8dad3c80] ocfs2_defrag_extent at ffffffffc139b2d2 [ocfs2]
      
      Analysis
      
      This bug has the same root cause as commit 7f27ec97 ("ocfs2: call
      ocfs2_journal_access_di() before ocfs2_journal_dirty() in
      ocfs2_write_end_nolock()").  For this bug, jbd2_journal_restart() is
      called by ocfs2_split_extent() during defragmenting.
      
      How to fix
      
      ocfs2_split_extent() can handle journal operations entirely by itself.
      The caller doesn't need to call the journal access/dirty pair; it only
      needs to call the journal start/stop pair.  The fix is to remove
      journal access/dirty from __ocfs2_move_extent().
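
The ordering problem can be shown with a deliberately loose toy model. Nothing here is the jbd2 or ocfs2 API: `struct handle`, `struct buffer` and the four functions are hypothetical, and the model assumes only that a restart moves the handle to a new transaction, so that write access obtained earlier no longer counts.

```c
#include <stdbool.h>

/* Hypothetical handle: just the id of the transaction it is attached to. */
struct handle {
    unsigned int tid;
};

/* Hypothetical buffer: remembers which transaction granted write access. */
struct buffer {
    unsigned int access_tid;
    bool has_access;
    bool dirty;
};

/* Model of "journal access": tie the buffer to the current transaction. */
static void journal_access(struct handle *h, struct buffer *b)
{
    b->access_tid = h->tid;
    b->has_access = true;
}

/* Model of "journal restart": the handle moves to a new transaction. */
static void journal_restart(struct handle *h)
{
    h->tid++;
}

/*
 * Model of "journal dirty": only valid if access was obtained in the
 * current transaction.  Returns false where the commit reports an ASSERT.
 */
static bool journal_dirty(struct handle *h, struct buffer *b)
{
    if (!b->has_access || b->access_tid != h->tid)
        return false;
    b->dirty = true;
    return true;
}

/*
 * Stand-in for a helper that manages its own access/dirty pairs and may
 * restart the transaction part-way through.
 */
static bool split_extent(struct handle *h, struct buffer *eb)
{
    journal_access(h, eb);
    if (!journal_dirty(h, eb))
        return false;
    journal_restart(h);
    journal_access(h, eb);
    return journal_dirty(h, eb);
}
```

In this model a caller that takes access on its own buffer, calls `split_extent()`, and then dirties that buffer fails the check, whereas a caller that leaves access/dirty to the helper does not.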
      
      The discussion for this patch:
      https://oss.oracle.com/pipermail/ocfs2-devel/2023-February/000647.html
      
      Link: https://lkml.kernel.org/r/20230217003717.32469-1-heming.zhao@suse.com
      
      
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • vfs: avoid duplicating creds in faccessat if possible · 981ee95c
      Mateusz Guzik authored

      access(2) remains commonly used, for example on exec:
      access("/etc/ld.so.preload", R_OK)
      
      or when running gcc: strace -c gcc empty.c
      
        % time     seconds  usecs/call     calls    errors syscall
        ------ ----------- ----------- --------- --------- ----------------
          0.00    0.000000           0        42        26 access
      
      It falls down to do_faccessat without the AT_EACCESS flag, which in turn
      results in allocation of new creds in order to modify fsuid/fsgid and
      caps.  This is very expensive single-threaded and even more so
      multi-threaded, with numerous structures getting refed and unrefed on
      imminent new cred destruction.
      
      Turns out for typical consumers the resulting creds would be identical
      and this can be checked upfront, avoiding the hard work.
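
The upfront check can be sketched as below. This is a simplified illustration, not the kernel function: `struct creds`, `access_creds()` and `need_override()` are hypothetical, and the capability rule (uid 0 gets its permitted set as effective, everyone else gets none) is an assumption that ignores details such as securebits and user namespaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified credentials. */
struct creds {
    unsigned int uid, gid;     /* real ids */
    unsigned int fsuid, fsgid; /* ids used for filesystem checks */
    uint64_t cap_effective;
    uint64_t cap_permitted;
};

/*
 * Creds the access check would run with: filesystem ids taken from the
 * real ids, and effective caps set by the assumed rule described above.
 */
static struct creds access_creds(struct creds c)
{
    c.fsuid = c.uid;
    c.fsgid = c.gid;
    c.cap_effective = (c.uid == 0) ? c.cap_permitted : 0;
    return c;
}

/* True only if the modified creds would differ from the current ones. */
static bool need_override(const struct creds *c)
{
    struct creds want = access_creds(*c);

    return want.fsuid != c->fsuid ||
           want.fsgid != c->fsgid ||
           want.cap_effective != c->cap_effective;
}
```

For an ordinary process whose filesystem ids match its real ids and which holds no effective capabilities, `need_override()` returns false, so the expensive allocation of new creds can be skipped.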
      
      An access benchmark plugged into will-it-scale running on Cascade Lake
      shows:
      
          test     proc     before       after
          access1     1    1310582     2908735    (+121%) # distinct files
          access1    24    4716491    63822173   (+1253%) # distinct files
          access2    24    2378041     5370335    (+125%) # same file
      
      The above benchmarks are not integrated into will-it-scale, but can be
      found in a pull request:
      
        https://github.com/antonblanchard/will-it-scale/pull/36/files
      
      
      
      Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. Feb 27, 2023
  6. Feb 26, 2023
  7. Feb 25, 2023
  8. Feb 24, 2023
  9. Feb 23, 2023
  10. Feb 22, 2023
  11. Feb 21, 2023