  1. Mar 05, 2023
    • cpumask: re-introduce constant-sized cpumask optimizations · 596ff4a0
      Linus Torvalds authored
      
      Commit aa47a7c2 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
      in the cpumask operations potentially becoming hugely less efficient,
      because suddenly the cpumask was always considered to be variable-sized.
      
      The optimization was then later added back in a limited form by commit
      6f9c07be ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
      FORCE_NR_CPUS option is not useful in a generic kernel and more of a
      special case for embedded situations with fixed hardware.
      
      Instead, just re-introduce the optimization, with some changes.
      
      Instead of depending on CPUMASK_OFFSTACK being false, and then always
      using the full constant cpumask width, this introduces three different
      cpumask "sizes":
      
       - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
      
         This is used for situations where we should use the exact size.
      
       - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
         fits in a single word and the bitmap operations thus end up able
         to trigger the "small_const_nbits()" optimizations.
      
         This is used for the operations that have optimized single-word
         cases that get inlined, notably the bit find and scanning functions.
      
       - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
         is a sufficiently small constant that makes simple "copy" and
         "clear" operations more efficient.
      
         This is arbitrarily set at four words or less.
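
      The three sizes above can be sketched in userspace C as follows.  This
      is only an illustration under stated assumptions: the helper names
      exact_bits(), small_bits() and large_bits() are hypothetical, and
      NR_CPUS and nr_cpu_ids are given made-up values here rather than
      coming from Kconfig and boot-time CPU enumeration.

```c
#include <assert.h>
#include <limits.h>

/* Assumed values for the sketch; the kernel gets these from Kconfig and boot. */
#define NR_CPUS 64
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

static unsigned int nr_cpu_ids = 8;	/* runtime CPU count, assumed */

/* Exact size: always the runtime value. */
static unsigned int exact_bits(void)
{
	assert(nr_cpu_ids <= NR_CPUS);
	return nr_cpu_ids;
}

/* "Small" size: a constant only when the whole mask fits in one word. */
static unsigned int small_bits(void)
{
	return NR_CPUS <= BITS_PER_LONG ? NR_CPUS : nr_cpu_ids;
}

/* "Large" size: a constant when the mask is at most four words. */
static unsigned int large_bits(void)
{
	return NR_CPUS <= 4 * BITS_PER_LONG ? NR_CPUS : nr_cpu_ids;
}
```

      With these assumed values the "large" size is the constant 64, while
      the exact size stays at the runtime value of 8.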
      
      As an example of this situation, without this fixed size optimization,
      cpumask_clear() will generate code like
      
              movl    nr_cpu_ids(%rip), %edx
              addq    $63, %rdx
              shrq    $3, %rdx
              andl    $-8, %edx
              callq   memset@PLT
      
      on x86-64, because it would calculate the "exact" number of longwords
      that need to be cleared.
      
      In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
      reasonable value to use), the above becomes a single
      
      	movq $0,cpumask
      
      instruction, because rather than working out exactly how many CPUs
      the system has, it just knows that the cpumask will be a single word
      and can just clear it all.
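
      The difference between the two forms of clear can be sketched in
      userspace C.  This is a hedged illustration, not the kernel code:
      struct cpumask_sketch, clear_exact() and clear_const() are
      hypothetical names, and NR_CPUS and nr_cpu_ids are assumed values.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NR_CPUS 64	/* assumed build-time limit */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask_sketch {
	unsigned long bits[BITS_TO_LONGS(NR_CPUS)];
};

static unsigned int nr_cpu_ids = 8;	/* runtime CPU count, assumed */

/* Variable-sized clear: the length is only known at run time. */
static void clear_exact(struct cpumask_sketch *m)
{
	assert(nr_cpu_ids <= NR_CPUS);
	memset(m->bits, 0, BITS_TO_LONGS(nr_cpu_ids) * sizeof(unsigned long));
}

/* Constant-sized clear: the length is a compile-time constant, so a
 * compiler is free to replace the memset() with plain stores. */
static void clear_const(struct cpumask_sketch *m)
{
	memset(m->bits, 0, sizeof(m->bits));
}
```

      Both functions leave the bits for the present CPUs cleared; only the
      second gives the compiler a constant length to work with.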
      
      Note that this does end up tightening the rules a bit from the original
      version in another way: operations that set bits in the cpumask are now
      limited to the actual nr_cpu_ids limit, whereas we used to do the
      nr_cpumask_bits thing almost everywhere in the cpumask code.
      
      But if you just clear bits, or scan for bits, we can use the simpler
      compile-time constants.
      
      In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
      which were not useful, and which fundamentally have to be limited to
      'nr_cpu_ids'.  Better remove them now than have somebody introduce use
      of them later.
      
      Of course, on x86-64 with MAXSMP there is no sane small compile-time
      constant for the cpumask sizes, and we end up using the actual CPU bits,
      and will generate the above kind of horrors regardless.  Please don't
      use MAXSMP unless you really expect to have machines with thousands of
      cores.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Mar 03, 2023
  3. Feb 28, 2023
  4. Feb 27, 2023
  5. Feb 22, 2023
    • kcsan: select CONFIG_CONSTRUCTORS · 6ba912f1
      Arnd Bergmann authored
      Building a kcsan enabled kernel for x86_64 with gcc-11 results in a lot
      of build warnings or errors without CONFIG_CONSTRUCTORS:
      
      x86_64-linux-ld: error: unplaced orphan section `.ctors.65436' from `arch/x86/lib/copy_mc.o'
      x86_64-linux-ld: error: unplaced orphan section `.ctors.65436' from `arch/x86/lib/cpu.o'
      x86_64-linux-ld: error: unplaced orphan section `.ctors.65436' from `arch/x86/lib/csum-partial_64.o'
      x86_64-linux-ld: error: unplaced orphan section `.ctors.65436' from `arch/x86/lib/csum-wrappers_64.o'
      x86_64-linux-ld: error: unplaced orphan section `.ctors.65436' from `arch/x86/lib/insn-eval.o'
      x86_64-linux-ld: error: unplaced orphan section `.ctors.65436' from `arch/x86/lib/insn.o'
      x86_64-linux-ld: error: unplaced orphan section `.ctors.65436' from `arch/x86/lib/misc.o'
      
      The same thing has been reported for mips64. I can't reproduce it for
      any other compiler version, so I don't know if constructors are always
      required here or if this is a gcc-11 specific implementation detail.
      
      I see no harm in always enabling constructors here, and this reliably
      fixes the build warnings for me.
      
      Link: https://lore.kernel.org/lkml/202204181801.r3MMkwJv-lkp@intel.com/T/
      
      
      Cc: Kees Cook <keescook@chromium.org>
      See-also: 3e663148 ("vmlinux.lds.h: Keep .ctors.* with .ctors")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Marco Elver <elver@google.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
  6. Feb 21, 2023
    • uaccess: Add speculation barrier to copy_from_user() · 74e19ef0
      Dave Hansen authored
      
      The results of "access_ok()" can be mis-speculated.  The result is that
      you can end up speculatively:
      
      	if (access_ok(from, size))
      		// Right here
      
      even for bad from/size combinations.  At first glance, it would be ideal
      to just add a speculation barrier to "access_ok()" so that its results
      can never be mis-speculated.
      
      But there are lots of system calls just doing access_ok() via
      "copy_to_user()" and friends (example: fstat() and friends).  Those are
      generally not problematic because they do not _consume_ data from
      userspace other than the pointer.  They are also very quick and common
      system calls that should not be needlessly slowed down.
      
      "copy_from_user()" on the other hand uses a user-controlled pointer and
      is frequently followed up with code that might affect caches.  Take
      something like this:
      
      	if (!copy_from_user(&kernelvar, uptr, size))
      		do_something_with(kernelvar);
      
      If userspace passes in an evil 'uptr' that *actually* points to a kernel
      address, and then do_something_with() has cache (or other)
      side-effects, it could allow userspace to infer kernel data values.
      
      Add a barrier to the common copy_from_user() code to prevent
      mis-speculated values which happen after the copy.
      
      Also add a stub for architectures that do not define barrier_nospec().
      This makes the macro usable in generic code.
      
      Since the barrier is now usable in generic code, the x86 #ifdef in the
      BPF code can also go away.
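
      The shape of the change can be sketched in userspace C.  This is a
      simplified model under stated assumptions, not the kernel
      implementation: user_mem, access_ok_sketch() and
      copy_from_user_sketch() are hypothetical names, the "user" address
      space is just an array, and barrier_nospec() is the empty stub used
      when an architecture does not define one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stub for architectures without a real speculation barrier. */
#ifndef barrier_nospec
#define barrier_nospec() do { } while (0)
#endif

/* Hypothetical model: the only valid "user" memory is this array. */
static unsigned char user_mem[256];

static int access_ok_sketch(const void *from, size_t size)
{
	uintptr_t start = (uintptr_t)user_mem;
	uintptr_t addr = (uintptr_t)from;

	return addr >= start && size <= sizeof(user_mem) &&
	       addr - start <= sizeof(user_mem) - size;
}

/* Returns the number of bytes NOT copied, following the kernel convention. */
static size_t copy_from_user_sketch(void *to, const void *from, size_t n)
{
	size_t left = n;

	assert(to != NULL);
	if (access_ok_sketch(from, n)) {
		/* Keep a mis-speculated access_ok() result from being
		 * acted upon by the copy and the code that follows it. */
		barrier_nospec();
		memcpy(to, from, n);
		left = 0;
	}
	return left;
}
```

      A caller would keep the usual pattern of treating a non-zero return
      value as failure and only using the destination buffer on success.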
      
      Reported-by: Jordy Zomer <jordyzomer@google.com>
      Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>   # BPF bits
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • test_kprobes: Add recursed kprobe test case · 1fcd09fd
      Masami Hiramatsu (Google) authored
      Add a recursed kprobe test case to the KUnit test module for kprobes.
      This will probe a function which is called from the pre_handler and
      post_handler itself. If kprobes is correctly implemented, the recursed
      kprobe handlers will be skipped and the number of skipped kprobes will
      be counted in kprobe::nmissed.
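
      The behaviour being tested can be modelled in userspace C.  This is
      a rough sketch with hypothetical names (struct kprobe_sketch,
      hit_probe(), probed_helper()); it only imitates the "skip and count"
      rule described above and does not use the real kprobes API.

```c
#include <assert.h>
#include <stddef.h>

struct kprobe_sketch {
	void (*pre_handler)(struct kprobe_sketch *p);
	void (*post_handler)(struct kprobe_sketch *p);
	unsigned long nmissed;
};

/* Probe whose handlers are currently running, if any. */
static struct kprobe_sketch *current_probe;

static void hit_probe(struct kprobe_sketch *p)
{
	assert(p != NULL);
	if (current_probe) {
		/* Hit from inside a handler: skip the handlers, count it. */
		p->nmissed++;
		return;
	}
	current_probe = p;
	if (p->pre_handler)
		p->pre_handler(p);
	if (p->post_handler)
		p->post_handler(p);
	current_probe = NULL;
}

/* Probe placed on the helper that the outer handlers call. */
static struct kprobe_sketch inner;

static void probed_helper(void)
{
	hit_probe(&inner);
}

static void outer_handler(struct kprobe_sketch *p)
{
	(void)p;
	probed_helper();
}

static struct kprobe_sketch outer = {
	.pre_handler = outer_handler,
	.post_handler = outer_handler,
};
```

      In this model, hitting the outer probe once reaches the inner probe
      from both handlers, so the inner probe's nmissed counter goes up by
      two while its own handlers never run.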
      
      Link: https://lore.kernel.org/all/167414238758.2301956.258548940194352895.stgit@devnote3/
      
      
      
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
    • iov_iter: Add a function to extract a page list from an iterator · 7d58fe73
      David Howells authored
      
      Add a function, iov_iter_extract_pages(), to extract a list of pages from
      an iterator.  The pages may be returned with a pin added or nothing,
      depending on the type of iterator.
      
      Add a second function, iov_iter_extract_will_pin(), to determine how the
      cleanup should be done.
      
      There are two cases:
      
       (1) ITER_IOVEC or ITER_UBUF iterator.
      
           Extracted pages will have pins (FOLL_PIN) obtained on them so that a
           concurrent fork() will forcibly copy the page so that DMA is done
           to/from the parent's buffer and is unavailable to/unaffected by the
           child process.
      
           iov_iter_extract_will_pin() will return true for this case.  The
           caller should use something like unpin_user_page() to dispose of the
           page.
      
       (2) Any other sort of iterator.
      
           No refs or pins are obtained on the page; the assumption is made that
           the caller will manage page retention.
      
           iov_iter_extract_will_pin() will return false.  The pages don't need
           additional disposal.
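
      The two cases suggest a cleanup pattern along the following lines.
      This is a userspace model with hypothetical types and names
      (struct iter_sketch, struct page_sketch, the *_sketch() helpers); it
      does not call the real iov_iter or page-pinning functions and only
      shows how the "will pin" answer drives the disposal step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum iter_type_sketch { SK_ITER_IOVEC, SK_ITER_UBUF, SK_ITER_BVEC, SK_ITER_KVEC };

struct iter_sketch {
	enum iter_type_sketch type;
};

struct page_sketch {
	int pincount;	/* stands in for a FOLL_PIN-style pin */
};

/* Case (1): user-backed iterators pin; case (2): everything else does not. */
static bool extract_will_pin_sketch(const struct iter_sketch *it)
{
	return it->type == SK_ITER_IOVEC || it->type == SK_ITER_UBUF;
}

static void extract_pages_sketch(const struct iter_sketch *it,
				 struct page_sketch **pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (extract_will_pin_sketch(it))
			pages[i]->pincount++;
}

/* Cleanup: unpin only if extraction pinned; otherwise nothing to dispose. */
static void release_pages_sketch(const struct iter_sketch *it,
				 struct page_sketch **pages, size_t n)
{
	if (!extract_will_pin_sketch(it))
		return;
	for (size_t i = 0; i < n; i++) {
		assert(pages[i]->pincount > 0);
		pages[i]->pincount--;
	}
}
```

      The point of the sketch is that the caller asks the iterator how
      cleanup should be done instead of guessing from the iterator type.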
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      cc: Al Viro <viro@zeniv.linux.org.uk>
      cc: John Hubbard <jhubbard@nvidia.com>
      cc: David Hildenbrand <david@redhat.com>
      cc: Matthew Wilcox <willy@infradead.org>
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-mm@kvack.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • iov_iter: Define flags to qualify page extraction. · f62e52d1
      David Howells authored
      
      Define flags to qualify page extraction to pass into iov_iter_*_pages*()
      rather than passing in FOLL_* flags.
      
      For now only a flag to allow peer-to-peer DMA is supported.
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      cc: Al Viro <viro@zeniv.linux.org.uk>
      cc: Logan Gunthorpe <logang@deltatee.com>
      cc: linux-fsdevel@vger.kernel.org
      cc: linux-block@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • splice: Add a func to do a splice from a buffered file without ITER_PIPE · 07073eb0
      David Howells authored
      
      Provide a function to do splice read from a buffered file, pulling the
      folios out of the pagecache directly by calling filemap_get_pages() to do
      any required reading and then pasting the returned folios into the pipe.
      
      A helper function is provided to do the actual folio pasting and will
      handle multipage folios by splicing as many of the relevant subpages as
      will fit into the pipe.
      
      The code is loosely based on filemap_read() and might belong in
      mm/filemap.c with that as it needs to use filemap_get_pages().
      
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      cc: Christoph Hellwig <hch@lst.de>
      cc: Al Viro <viro@zeniv.linux.org.uk>
      cc: David Hildenbrand <david@redhat.com>
      cc: John Hubbard <jhubbard@nvidia.com>
      cc: linux-mm@kvack.org
      cc: linux-block@vger.kernel.org
      cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: Steve French <stfrench@microsoft.com>
  7. Feb 17, 2023
  8. Feb 15, 2023
  9. Feb 11, 2023
  10. Feb 10, 2023
  11. Feb 09, 2023