  1. Sep 28, 2022
  2. Aug 23, 2022
    • arm64: fix rodata=full · 2e8cff0a
      Mark Rutland authored
      
      On arm64, "rodata=full" has been supported (but not documented) since
      commit:
      
        c55191e9 ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")
      
      As it's necessary to determine the rodata configuration early during
      boot, arm64 has an early_param() handler for this, whereas init/main.c
      has a __setup() handler which is run later.
      
      Unfortunately, this split meant that since commit:
      
        f9a40b08 ("init/main.c: return 1 from handled __setup() functions")
      
      ... passing "rodata=full" would result in a spurious warning from the
      __setup() handler (though RO permissions would be configured
      appropriately).
      
      Further, "rodata=full" has been broken since commit:
      
        0d6ea3ac ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
      
      ... which caused strtobool() to parse "full" as false (in addition to
      many other values not documented for the "rodata=" kernel parameter).
      
      This patch fixes this breakage by:
      
      * Moving the core parameter parser to an early_param(), such that it
        is available early.
      
      * Adding an (optional) arch hook which arm64 can use to parse "full".
      
      * Updating the documentation to mention that "full" is valid for arm64.
      
      * Having the core parameter parser handle "on" and "off" explicitly,
        such that any undocumented values (e.g. typos such as "ful") are
        reported as errors rather than being silently accepted.
      
      Note that __setup() and early_param() have opposite conventions for
      their return values, where __setup() uses 1 to indicate a parameter was
      handled and early_param() uses 0 to indicate a parameter was handled.
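
      The parsing rules and the two return conventions described above can be
      sketched in plain userspace C. This is only an illustration, not the
      patch itself: the names parse_rodata(), arch_parse_rodata(),
      setup_style_rodata() and the two flags are hypothetical stand-ins.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical state flags for this sketch; the names are illustrative. */
static bool rodata_enabled = true;
static bool rodata_full = false;

/*
 * Optional arch hook, modelled on the arm64 case: returns true when the
 * architecture recognises the value, false to let the core parser decide.
 */
static bool arch_parse_rodata(const char *arg)
{
    if (strcmp(arg, "full") == 0) {
        rodata_enabled = true;
        rodata_full = true;
        return true;
    }
    return false;
}

/*
 * early_param()-style convention: 0 means "handled", non-zero is an error.
 * Only the documented values are accepted, so a typo such as "ful" fails.
 */
static int parse_rodata(const char *arg)
{
    if (arch_parse_rodata(arg))
        return 0;

    if (strcmp(arg, "on") == 0) {
        rodata_enabled = true;
        return 0;
    }
    if (strcmp(arg, "off") == 0) {
        rodata_enabled = false;
        rodata_full = false;
        return 0;
    }
    return -EINVAL;
}

/* __setup()-style convention: 1 means "handled", 0 means "not handled". */
static int setup_style_rodata(const char *arg)
{
    return parse_rodata(arg) == 0 ? 1 : 0;
}
```

      Keeping the explicit string comparisons in one place, rather than
      delegating to a generic boolean parser, is what lets undocumented
      values be rejected in this sketch.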
      
      Fixes: f9a40b08 ("init/main.c: return 1 from handled __setup() functions")
      Fixes: 0d6ea3ac ("lib/kstrtox.c: add "false"/"true" support to kstrtobool()")
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jagdish Gediya <jvgediya@linux.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Will Deacon <will@kernel.org>
      Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
      Link: https://lore.kernel.org/r/20220817154022.3974645-1-mark.rutland@arm.com
      
      
      Signed-off-by: Will Deacon <will@kernel.org>
  3. Aug 21, 2022
  4. Jul 27, 2022
  5. Jul 23, 2022
    • cgroup: Make !percpu threadgroup_rwsem operations optional · 6a010a49
      Tejun Heo authored
      
      3942a9bd ("locking, rcu, cgroup: Avoid synchronize_sched() in
      __cgroup_procs_write()") disabled percpu operations on threadgroup_rwsem
      because the implied synchronize_rcu() on write locking was pushing up the
      latencies too much for Android, which constantly moves processes between
      cgroups.
      
      This makes the hotter paths - fork and exit - slower as they're always
      forced into the slow path. There is no reason to force this on everyone
      especially given that the more common static usage pattern can now completely
      avoid write-locking the rwsem. Write-locking is elided when turning on and
      off controllers on empty sub-trees and CLONE_INTO_CGROUP enables seeding a
      cgroup without grabbing the rwsem.
      
      Restore the default percpu operations and introduce the mount option
      "favordynmods" and config option CGROUP_FAVOR_DYNMODS for users who need
      lower latencies for the dynamic operations.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Michal Koutný <mkoutny@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Dmitry Shmidt <dimitrysh@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
  6. Jul 18, 2022
    • init: add "hostname" kernel parameter · 5a704629
      Dan Moulding authored
      The gethostname system call returns the hostname for the current machine. 
      However, the kernel has no mechanism to initially set the current
      machine's name in such a way as to guarantee that the first userspace
      process to call gethostname will receive a meaningful result.  It relies
      on some unspecified userspace process to first call sethostname before
      gethostname can produce a meaningful name.
      
      Traditionally the machine's hostname is set from userspace by the init
      system.  The init system, in turn, often relies on a configuration file
      (say, /etc/hostname) to provide the value that it will supply in the call
      to sethostname.  Consequently, the file system containing /etc/hostname
      usually must be available before the hostname will be set.  There may,
      however, be earlier userspace processes that could call gethostname before
      the file system containing /etc/hostname is mounted.  Such a process will
      get some other, likely meaningless, name from gethostname (such as
      "(none)", "localhost", or "darkstar").
      
      A real-world example where this can happen, and lead to undesirable
      results, is with mdadm.  When assembling arrays, mdadm distinguishes
      between "local" arrays and "foreign" arrays.  A local array is one that
      properly belongs to the current machine, and a foreign array is one that
      is (possibly temporarily) attached to the current machine, but properly
      belongs to some other machine.  To determine if an array is local or
      foreign, mdadm may compare the "homehost" recorded on the array with the
      current hostname.  If mdadm is run before the root file system is mounted,
      perhaps because the root file system itself resides on an md-raid array,
      then /etc/hostname isn't yet available and the init system will not yet
      have called sethostname, causing mdadm to incorrectly conclude that all of
      the local arrays are foreign.
      
      Solving this problem *could* be delegated to the init system.  It could be
      left up to the init system (including any init system that starts within
      an initramfs, if one is in use) to ensure that sethostname is called
      before any other userspace process could possibly call gethostname. 
      However, it may not always be obvious which processes could call
      gethostname (for example, udev itself might not call gethostname, but it
      could via udev rules invoke processes that do).  Additionally, the init
      system has to ensure that the hostname configuration value is stored in
      some place where it will be readily accessible during early boot. 
      Unfortunately, every init system will attempt to (or has already attempted
      to) solve this problem in a different, possibly incorrect, way.  This
      makes getting consistently working configurations harder for users.
      
      I believe it is better for the kernel to provide the means by which the
      hostname may be set early, rather than making this a problem for the init
      system to solve.  The option to set the hostname during early startup, via
      a kernel parameter, provides a simple, reliable way to solve this problem.
      It also could make system configuration easier for some embedded systems.
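
      How such a parameter handler might store its value can be sketched in
      userspace C. This is a hedged illustration only: parse_hostname(),
      early_hostname and HOSTNAME_MAX are hypothetical names, and the
      64-character limit and the truncation behaviour are assumptions made
      for the sketch.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: a 64-character limit, mirroring the utsname node name size. */
#define HOSTNAME_MAX 64

/* Hypothetical storage for the name; "(none)" stands in for the default. */
static char early_hostname[HOSTNAME_MAX + 1] = "(none)";

/*
 * early_param()-style handler: returns 0 when the parameter was handled.
 * An empty value keeps the default; an over-long value is truncated.
 */
static int parse_hostname(const char *arg)
{
    size_t len;

    if (arg == NULL || *arg == '\0')
        return 0;

    len = strlen(arg);
    if (len > HOSTNAME_MAX)
        len = HOSTNAME_MAX;

    memcpy(early_hostname, arg, len);
    early_hostname[len] = '\0';
    return 0;
}
```

      With a handler of this shape, the name is already in place before the
      first userspace process could ask for it.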
      
      [dmoulding@me.com: v2]
        Link: https://lkml.kernel.org/r/20220506060310.7495-2-dmoulding@me.com
      Link: https://lkml.kernel.org/r/20220505180651.22849-2-dmoulding@me.com
      
      
      Signed-off-by: Dan Moulding <dmoulding@me.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  7. Jul 15, 2022
  8. Jul 12, 2022
    • module: Move module's Kconfig items in kernel/module/ · 73b4fc92
      Christophe Leroy authored
      
      In init/Kconfig, the part dedicated to modules is quite large.
      
      Move it into a dedicated Kconfig in kernel/module/
      
      MODULES_TREE_LOOKUP was outside of the 'if MODULES', but as it is
      only used when MODULES are set, move it in with everything else to
      avoid confusion.
      
      MODULE_SIG_FORMAT is left in init/Kconfig because this configuration
      item is not used in kernel/module/ but in kernel/ and can be
      selected independently from CONFIG_MODULES. It is for instance
      selected from security/integrity/ima/Kconfig.
      
      Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
  9. Jul 07, 2022
  10. Jul 02, 2022
  11. Jun 30, 2022
    • context_tracking: Split user tracking Kconfig · 24a9c541
      Frederic Weisbecker authored
      
      Context tracking is going to be used not only to track user transitions
      but also idle/IRQs/NMIs. The user tracking part will then become a
      separate feature. Prepare Kconfig for that.
      
      [ frederic: Apply Max Filippov feedback. ]
      
      Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
      Cc: Uladzislau Rezki <uladzislau.rezki@sony.com>
      Cc: Joel Fernandes <joel@joelfernandes.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Nicolas Saenz Julienne <nsaenz@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Cc: Yu Liao <liaoyu15@huawei.com>
      Cc: Phil Auld <pauld@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Alex Belits <abelits@marvell.com>
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
      Tested-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
  12. Jun 20, 2022
    • rcu-tasks: Add data structures for lightweight grace periods · 434c9eef
      Paul E. McKenney authored
      
      This commit adds fields to task_struct and to rcu_tasks_percpu that will
      be used to avoid the task-list scan for RCU Tasks Trace grace periods,
      and also initializes these fields.
      
      Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
      Cc: Neeraj Upadhyay <quic_neeraju@quicinc.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: KP Singh <kpsingh@kernel.org>
  13. Jun 09, 2022
    • gcc-12: disable '-Warray-bounds' universally for now · f0be87c4
      Linus Torvalds authored
      
      In commit 8b202ee2 ("s390: disable -Warray-bounds") the s390 people
      disabled the '-Warray-bounds' warning for gcc-12, because the new logic
      in gcc would cause warnings for their use of the S390_lowcore macro,
      which accesses absolute pointers.
      
      It turns out gcc-12 has many other issues in this area, so this takes
      that s390 warning disable logic, and turns it into a kernel build config
      entry instead.
      
      Part of the intent is that we can make this all much more targeted, and
      use this config flag to disable it in only particular configurations
      that cause problems, with the s390 case as an example:
      
              select GCC12_NO_ARRAY_BOUNDS
      
      and we could do that for other configuration cases that cause issues.
      
      Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more
      targeted way, and disable the warning only for particular uses: again
      the s390 case as an example:
      
        KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds)
      
      but this ends up just doing it globally in the top-level Makefile, since
      the current issues are spread fairly widely all over:
      
        KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds
      
      We'll try to limit this later, since the gcc-12 problems are rare enough
      that *much* of the kernel can be built with it without disabling this
      warning.
      
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. May 27, 2022
  15. May 24, 2022
    • kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS · 7b453719
      Masahiro Yamada authored
      
      include/{linux,asm-generic}/export.h defines a weak symbol, __crc_*
      as a placeholder.
      
      Genksyms writes the version CRCs into the linker script, which will be
      used for filling the __crc_* symbols. The linker script format depends
      on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset
      to the reference of CRC.
      
      It is time to get rid of this complexity.
      
      Now that modpost parses text files (.*.cmd) to collect all the CRCs,
      it can generate C code that will be linked to the vmlinux or modules.
      
      Generate a new C file, .vmlinux.export.c, which contains the CRCs of
      symbols exported by vmlinux. It is compiled and linked to vmlinux in
      scripts/link-vmlinux.sh.
      
      Put the CRCs of symbols exported by modules into the existing *.mod.c
      files. No additional build step is needed for modules. As before,
      *.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal.
      
      No linker magic is used here. The new C implementation works in the
      same way, whether CONFIG_RELOCATABLE is enabled or not.
      CONFIG_MODULE_REL_CRCS is no longer needed.
      
      Previously, Kbuild invoked additional $(LD) to update the CRCs in
      objects, but this step is unneeded too.
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Tested-by: Nicolas Schier <nicolas@fjasle.eu>
      Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com> # LLVM-14 (x86-64)
  16. May 19, 2022
  17. May 18, 2022
    • random: handle latent entropy and command line from random_init() · 2f14062b
      Jason A. Donenfeld authored
      
      Currently, start_kernel() adds latent entropy and the command line to
      the entropy pool *after* the RNG has been initialized, deferring when
      it's actually used by things like stack canaries until the next time
      the pool is seeded. This surely is not intended.
      
      Rather than splitting up which entropy gets added where and when between
      start_kernel() and random_init(), just do everything in random_init(),
      which should eliminate these kinds of bugs in the future.
      
      While we're at it, rename the awkwardly titled "rand_initialize()" to
      the more standard "random_init()" nomenclature.
      
      Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  18. May 13, 2022
    • init: call time_init() before rand_initialize() · fe222a6c
      Jason A. Donenfeld authored
      
      Currently time_init() is called after rand_initialize(), but
      rand_initialize() makes use of the timer on various platforms, and
      sometimes this timer needs to be initialized by time_init() first. In
      order for random_get_entropy() to not return zero during early boot when
      it's potentially used as an entropy source, reverse the order of these
      two calls. The block doing random initialization previously sat right
      before time_init(), so changing the order shouldn't have any complicated
      effects.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Stafford Horne <shorne@gmail.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    • mm/uffd: move USERFAULTFD configs into mm/ · 430529b5
      Peter Xu authored
      We used to have USERFAULTFD configs stored in init/.  It makes sense as a
      start because that's the default place for storing syscall related
      configs.
      
      However, userfaultfd has evolved over the past few years and more
      config options were added.  These are no longer related to syscalls and
      are no longer suited to the init/ directory, because they're pure mm
      concepts.
      
      But it's not ideal either to keep the userfaultfd configs separate from
      each other.  Hence this patch moves the userfaultfd configs under init/ to
      be under mm/ so that we'll start to group all userfaultfd configs
      together.
      
      We do have quite a few examples of syscall related configs that are not
      put under init/Kconfig: FTRACE_SYSCALLS, SWAP, FILE_LOCKING,
      MEMFD_CREATE, and so on.  They all reside in the directory that best
      suits the concept.  So there seems to be no rule that syscall related
      CONFIG_* options must live under init/ only.
      
      Link: https://lkml.kernel.org/r/20220420144823.35277-1-peterx@redhat.com
      
      
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  19. May 12, 2022
    • module: Introduce module unload taint tracking · 99bd9956
      Aaron Tomlin authored
      
      Currently, only the initial module that tainted the kernel is
      recorded e.g. when an out-of-tree module is loaded.
      
      The purpose of this patch is to allow the kernel to maintain a record of
      each unloaded module that taints the kernel. So, in addition to
      displaying a list of linked modules (see print_modules()) e.g. in the
      event of a detected bad page, unloaded modules that carried a taint or
      taints are displayed too. A tainted module unload count is maintained.
      
      The number of tracked modules is not fixed. This feature is disabled by
      default.
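
      The bookkeeping described above can be sketched in userspace C as a
      growable list with a per-entry unload count. This is an illustration
      only: struct unloaded_tainted_mod, record_tainted_unload() and
      tainted_unload_count() are hypothetical names, and the 56-byte name
      buffer is an arbitrary choice for the sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of one unloaded module that carried taint flags. */
struct unloaded_tainted_mod {
    char name[56];
    unsigned long taints;
    unsigned long count;                 /* how many times it was unloaded */
    struct unloaded_tainted_mod *next;
};

/* The list grows on demand, so the number of tracked modules is not fixed. */
static struct unloaded_tainted_mod *unloaded_list;

/* Record an unload; modules without taint flags are not tracked. */
static int record_tainted_unload(const char *name, unsigned long taints)
{
    struct unloaded_tainted_mod *m;

    if (taints == 0)
        return 0;

    for (m = unloaded_list; m != NULL; m = m->next) {
        if (m->taints == taints &&
            strncmp(m->name, name, sizeof(m->name) - 1) == 0) {
            m->count++;
            return 0;
        }
    }

    m = calloc(1, sizeof(*m));
    if (m == NULL)
        return -1;

    /* calloc() zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(m->name, name, sizeof(m->name) - 1);
    m->taints = taints;
    m->count = 1;
    m->next = unloaded_list;
    unloaded_list = m;
    return 0;
}

/* Total number of recorded tainted unloads for a given module name. */
static unsigned long tainted_unload_count(const char *name)
{
    const struct unloaded_tainted_mod *m;
    unsigned long total = 0;

    for (m = unloaded_list; m != NULL; m = m->next)
        if (strncmp(m->name, name, sizeof(m->name) - 1) == 0)
            total += m->count;
    return total;
}
```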
      
      Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
  20. May 10, 2022
  21. May 07, 2022
    • init: Deal with the init process being a user mode process · 68d85f0a
      Eric W. Biederman authored
      It is silly for user_mode_thread to leave PF_KTHREAD set
      on the resulting task.  Update the init process so that
      it does not care if PF_KTHREAD is set or not.
      
      Ensure do_populate_rootfs flushes all delayed fput work by calling
      task_work_run.  In the rare instance that async_schedule_domain calls
      do_populate_rootfs synchronously it is possible do_populate_rootfs
      will be called directly from the init process.  At which point fput
      will call "task_work_add(current, ..., TWA_RESUME)".  The files on the
      initramfs need to be completely put before we attempt to exec them
      (which is before the code enters userspace).  So call task_work_run
      just in case there are any pending fput operations.
      
      Link: https://lkml.kernel.org/r/20220506141512.516114-5-ebiederm@xmission.com
      
      
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  22. May 06, 2022
    • kthread: Don't allocate kthread_struct for init and umh · 343f4c49
      Eric W. Biederman authored
      
      If kthread_is_per_cpu runs concurrently with free_kthread_struct the
      kthread_struct that was just freed may be read from.
      
      This bug was introduced by commit 40966e31 ("kthread: Ensure
      struct kthread is present for all kthreads"), when kthread_struct
      started to be allocated for all tasks that have PF_KTHREAD set.  This
      in turn required the kthread_struct to be freed in kernel_execve and
      violated the assumption that kthread_struct will have the same
      lifetime as the task.
      
      Looking a bit deeper this only applies to callers of kernel_execve
      which is just the init process and the user mode helper processes.
      These processes really don't want to be kernel threads but are for
      historical reasons, mostly because copy_thread does not know how to run
      a kernel mode function in a process that does not have
      PF_KTHREAD or PF_IO_WORKER set.
      
      Solve this by not allocating kthread_struct for the init process and
      the user mode helper processes.
      
      This is done by adding a kthread member to struct kernel_clone_args,
      setting kthread in fork_idle and kernel_thread, and adding
      user_mode_thread, which works like kernel_thread except that it does not
      set kthread.  Fork then allocates the kthread_struct only if .kthread is
      set.
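
      The allocation rule can be sketched in userspace C. This is a
      simplified model under stated assumptions, not the kernel code:
      struct clone_args, struct task, create_task() and the *_sketch()
      helpers are hypothetical stand-ins for kernel_clone_args, the task,
      fork, kernel_thread and user_mode_thread.

```c
#include <stdlib.h>

/* Stand-in for the kernel's per-kthread data; the field is illustrative. */
struct kthread_info {
    int parked;
};

/* Stand-in for struct kernel_clone_args, reduced to the new flag. */
struct clone_args {
    int kthread;
};

/* Stand-in for a task: the pointer stays NULL for non-kthread tasks. */
struct task {
    struct kthread_info *kthread;
};

/* Fork-like helper: allocate the kthread data only if .kthread is set. */
static struct task *create_task(const struct clone_args *args)
{
    struct task *t = calloc(1, sizeof(*t));

    if (t == NULL)
        return NULL;

    if (args->kthread) {
        t->kthread = calloc(1, sizeof(*t->kthread));
        if (t->kthread == NULL) {
            free(t);
            return NULL;
        }
    }
    return t;
}

/* Models kernel_thread(): sets the flag, so kthread data is allocated. */
static struct task *kernel_thread_sketch(void)
{
    struct clone_args args = { .kthread = 1 };

    return create_task(&args);
}

/* Models user_mode_thread(): same path, but the flag is left clear. */
static struct task *user_mode_thread_sketch(void)
{
    struct clone_args args = { .kthread = 0 };

    return create_task(&args);
}

/* The kthread data shares the task's lifetime and is freed with it. */
static void destroy_task(struct task *t)
{
    if (t != NULL) {
        free(t->kthread);
        free(t);
    }
}
```

      Because the init and user mode helper tasks never get the extra
      allocation in this model, nothing has to be freed early when they exec.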
      
      I have looked at kernel/kthread.c and since commit 40966e31
      ("kthread: Ensure struct kthread is present for all kthreads") there
      have been no assumptions added that to_kthread or __to_kthread will
      not return NULL.
      
      There are a few callers of to_kthread or __to_kthread that assume a
      non-NULL struct kthread pointer will be returned.  These functions are
      kthread_data(), kthread_parkme(), kthread_exit(), kthread(),
      kthread_park(), kthread_unpark(), kthread_stop().  All of those functions
      can reasonably be expected to be called only when it is known that a task
      is a kthread, so that assumption seems reasonable.
      
      Cc: stable@vger.kernel.org
      Fixes: 40966e31 ("kthread: Ensure struct kthread is present for all kthreads")
      Reported-by: Максим Кутявин <maximkabox13@gmail.com>
      Link: https://lkml.kernel.org/r/20220506141512.516114-1-ebiederm@xmission.com
      
      
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  23. Apr 29, 2022
  24. Apr 26, 2022
  25. Apr 13, 2022
  26. Apr 06, 2022
    • kernel/do_mount_initrd: move real_root_dev sysctls to its own file · d772cc2c
      tangmeng authored
      
      kernel/sysctl.c is a kitchen sink where everyone leaves their dirty
      dishes, which makes it very difficult to maintain.
      
      To help with this maintenance let's start by moving sysctls to places
      where they actually belong.  The proc sysctl maintainers do not want to
      know what sysctl knobs you wish to add for your own piece of code, we
      just care about the core logic.
      
      All filesystem sysctls now get reviewed by fs folks. This commit
      follows the fs commits and moves the real_root_dev sysctl to its own
      file, kernel/do_mount_initrd.c.
      
      Signed-off-by: tangmeng <tangmeng@uniontech.com>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
    • mm/slub: use stackdepot to save stack trace in objects · 5cf909c5
      Oliver Glitta authored
      
      Many stack traces are similar so there are many similar arrays.
      Stackdepot saves each unique stack only once.
      
      Replace field addrs in struct track with depot_stack_handle_t handle.  Use
      stackdepot to save stack trace.
      
      The benefits are smaller memory overhead and possibility to aggregate
      per-cache statistics in the following patch using the stackdepot handle
      instead of matching stacks manually.
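
      The deduplication idea can be sketched in userspace C. This is not the
      stackdepot implementation: depot_save(), stack_handle_t and the fixed
      limits are hypothetical, and a linear scan is used here in place of a
      real lookup structure to keep the sketch short.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH  16   /* assumption: frames kept per trace in this sketch */
#define MAX_STACKS 64   /* assumption: capacity of the toy depot */

/* Small handle stored per object instead of a full address array. */
typedef unsigned int stack_handle_t;   /* 0 means "no trace saved" */

struct depot_entry {
    size_t nr;
    unsigned long addrs[MAX_DEPTH];
};

static struct depot_entry depot[MAX_STACKS];
static size_t depot_used;

/*
 * Save a trace and return its handle. An identical trace that was saved
 * earlier is found again and its existing handle is returned, so each
 * unique stack is stored only once.
 */
static stack_handle_t depot_save(const unsigned long *addrs, size_t nr)
{
    size_t i;

    if (nr == 0 || nr > MAX_DEPTH)
        return 0;

    for (i = 0; i < depot_used; i++) {
        if (depot[i].nr == nr &&
            memcmp(depot[i].addrs, addrs, nr * sizeof(*addrs)) == 0)
            return (stack_handle_t)(i + 1);
    }

    if (depot_used == MAX_STACKS)
        return 0;

    depot[depot_used].nr = nr;
    memcpy(depot[depot_used].addrs, addrs, nr * sizeof(*addrs));
    depot_used++;
    return (stack_handle_t)depot_used;
}

/* Per-object tracking record: one handle replaces the address array. */
struct track {
    unsigned long addr;
    stack_handle_t handle;
};
```

      Since equal traces share one handle, per-cache statistics can be
      aggregated by comparing handles rather than whole stacks.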
      
      [ vbabka@suse.cz: rebase to 5.17-rc1 and adjust accordingly ]
      
      This was initially merged as commit 78869146 and reverted by commit
      ae14c63a due to several issues that should now be fixed.
      The problem of unconditional memory overhead by stackdepot has been
      addressed by commit 2dba5eb1 ("lib/stackdepot: allow optional init
      and stack_table allocation by kvmalloc()"), so the dependency on
      stackdepot will result in extra memory usage only when a slab cache
      tracking is actually enabled, and not for all CONFIG_SLUB_DEBUG builds.
      The build failures on some architectures were also addressed, and the
      reported issue with xfs/433 test did not reproduce on 5.17-rc1 with this
      patch.
      
      Signed-off-by: Oliver Glitta <glittao@gmail.com>
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
  27. Mar 24, 2022
  28. Mar 11, 2022
  29. Feb 23, 2022
  30. Feb 14, 2022