  1. May 04, 2022
  2. Mar 08, 2022
  3. Nov 09, 2021
  4. Sep 05, 2020
  5. Apr 27, 2020
  6. Jul 19, 2019
  7. Jun 05, 2019
  8. May 15, 2019
    • ipc: do cyclic id allocation for the ipc object. · 99db46ea
      Manfred Spraul authored
      For ipcmni_extend mode, the sequence number space is only 7 bits.  So
      the chance of id reuse is relatively high compared with the non-extended
      mode.
      
      To alleviate this id reuse problem, this patch enables cyclic allocation
      for the index to the radix tree (idx).  The disadvantage is that this
      can cause a slight slow-down of the fast path, as the radix tree could
      be higher than necessary.
      
      To limit the radix tree height, I have chosen the following limits:
       1) The cycling is done over in_use*1.5.
       2) At least, the cycling is done over
         "normal" ipcmni mode: RADIX_TREE_MAP_SIZE elements
         "ipcmni_extend" mode: 4096 elements
      
      Result:
      - for normal mode:
      	No change for <= 42 active ipc elements. With more than 42
      	active ipc elements, a 2nd level would be added to the radix
      	tree.
      	Without cyclic allocation, a 2nd level would be added only with
      	more than 63 active elements.
      
      - for extended mode:
      	Cycling always creates at least a 2-level radix tree.
      	With more than 2730 active objects, a 3rd level would be
      	added. Without cyclic allocation, the 3rd level would be
      	added only with more than 4095 active objects.
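The limits above can be sketched in user-space C. This is a minimal illustration, not the kernel code: the names `cycle_limit`, `min_cycle` and `TOY_RADIX_TREE_MAP_SIZE` are made up for the example, the factor 1.5 is computed with integer arithmetic, and the final clamp to the total number of slots is an assumption the message does not spell out.

```c
/* Assumed: 64 slots per radix tree node, consistent with the
 * "more than 63 active elements" figure in the message. */
#define TOY_RADIX_TREE_MAP_SIZE 64

/* Lower bound of the cycling range for each mode (limit 2 above). */
static int min_cycle(int extended)
{
	return extended ? TOY_RADIX_TREE_MAP_SIZE * TOY_RADIX_TREE_MAP_SIZE
			: TOY_RADIX_TREE_MAP_SIZE;
}

/* Exclusive upper bound of the indices to cycle over. */
static int cycle_limit(int in_use, int extended, int ipc_mni)
{
	int limit = in_use * 3 / 2;	/* limit 1: in_use * 1.5 */

	if (limit < min_cycle(extended))
		limit = min_cycle(extended);
	if (limit > ipc_mni)		/* assumption: never exceed the slot count */
		limit = ipc_mni;
	return limit;
}
```

With these definitions, `cycle_limit(42, 0, 32768)` stays at 64 (one radix tree level), while `cycle_limit(100, 0, 32768)` grows to 150 and therefore needs a second level.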
      
      For a 2-level radix tree compared to a 1-level radix tree, I have
      observed < 1% performance impact.
      
      Notes:
      1) Normal "x=semget();y=semget();" is unaffected: Then the idx
        is e.g. a and a+1, regardless of whether idr_alloc() or
        idr_alloc_cyclic() is used.
      
      2) The -1% happens in a microbenchmark after this situation:
      	x=semget();
      	for(i=0;i<4000;i++) {t=semget();semctl(t,0,IPC_RMID);}
      	y=semget();
      	Now perform semget calls on x and y that do not sleep.
      
      3) The worst-case reuse cycle time is unfortunately unaffected:
         If you have 2^24-1 ipc objects allocated, and get/remove the last
         possible element in a loop, then the id is reused after 128
         get/remove pairs.
      
      Performance check:
      A microbenchmark that performs no-op semop() randomly on two IDs,
      with only these two IDs allocated.
      The IDs were set using /proc/sys/kernel/sem_next_id.
      The test was run 5 times, averages are shown.
      
      1 & 2: Base (6.22 seconds for 10.000.000 semops)
      1 & 40: -0.2%
      1 & 3348: -0.8%
      1 & 27348: -1.6%
      1 & 15777204: -3.2%
      
      Or: ~12.6 cpu cycles per additional radix tree level.
      The cpu is an Intel i3-5010U. ~1300 cpu cycles/syscall is slower
      than what I remember (Spectre impact?).
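The slow-down per ID pair tracks the number of radix tree levels needed to reach the second ID. A small sketch, assuming 6 index bits per level (64 slots per node) and treating the listed IDs directly as radix tree indices. The helper name `radix_levels` is hypothetical.

```c
/* Assumed: 6 index bits per radix tree level, i.e. 64 slots per node. */
#define TOY_MAP_SHIFT 6

/* Number of radix tree levels needed to reach index idx. */
static int radix_levels(unsigned long idx)
{
	int levels = 1;

	/* Each extra group of 6 index bits costs one more level. */
	while (idx >> TOY_MAP_SHIFT) {
		idx >>= TOY_MAP_SHIFT;
		levels++;
	}
	return levels;
}
```

Under these assumptions the IDs 2 and 40 need 1 level, 3348 needs 2, 27348 needs 3 and 15777204 needs 4, which lines up with the growing percentages in the table.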
      
      V2 of the patch:
      - use "min" and "max"
      - use RADIX_TREE_MAP_SIZE * RADIX_TREE_MAP_SIZE instead of
      	(2<<12).
      
      [akpm@linux-foundation.org: fix max() warning]
      Link: http://lkml.kernel.org/r/20190329204930.21620-3-longman@redhat.com
      
      
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      99db46ea
    • ipc: allow boot time extension of IPCMNI from 32k to 16M · 5ac893b8
      Waiman Long authored
      The maximum number of unique System V IPC identifiers was limited to
      32k.  That limit should be big enough for most use cases.
      
      However, there are some users out there requesting more, especially
      those that are migrating from Solaris, which uses 24 bits for unique
      identifiers.  To satisfy the need of those users, a new boot time kernel
      option "ipcmni_extend" is added to extend the IPCMNI value to 16M.  This
      is a 512X increase which should be big enough for users out there that
      need a large number of unique IPC identifiers.
      
      The use of this new option will change the pattern of the IPC
      identifiers returned by functions like shmget(2).  An application that
      depends on such a pattern may not work properly.  So it should only be
      used if the users really need more than 32k unique IPC numbers.
      
      This new option does have the side effect of reducing the maximum number
      of unique sequence numbers from 64k down to 128.  So it is a trade-off.
      
      The computation of a new IPC id is not done in the performance critical
      path.  So a little bit of additional overhead shouldn't have any real
      performance impact.
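The trade-off between index bits and sequence bits can be illustrated with a short sketch. It assumes that an IPC id is a non-negative 32-bit int with the index in the low bits and the sequence number above it. All names (`build_id`, `seq_count`, the `TOY_` constants) are hypothetical and chosen for this example only.

```c
#include <limits.h>

#define TOY_IPCMNI_SHIFT	15	/* 32k indices, the default limit */
#define TOY_IPCMNI_EXTEND_SHIFT	24	/* 16M indices with ipcmni_extend */

/* Combine an index and a sequence number into one id. */
static int build_id(int idx, int seq, int shift)
{
	return (seq << shift) + idx;
}

/* Recover the index from an id. */
static int id_to_idx(int id, int shift)
{
	return id & ((1 << shift) - 1);
}

/* Recover the sequence number from an id. */
static int id_to_seq(int id, int shift)
{
	return id >> shift;
}

/* How many distinct sequence numbers fit above the index bits. */
static int seq_count(int shift)
{
	return (INT_MAX >> shift) + 1;
}
```

Under this layout `seq_count(TOY_IPCMNI_SHIFT)` is 65536 and `seq_count(TOY_IPCMNI_EXTEND_SHIFT)` is 128, matching the "64k down to 128" figure above.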
      
      Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com
      
      
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5ac893b8
  9. Oct 31, 2018
  10. Dec 13, 2014
    • ipc/msg: increase MSGMNI, remove scaling · 0050ee05
      Manfred Spraul authored
      
      SysV can be abused to allocate locked kernel memory.  For most systems, a
      small limit doesn't make sense, see the discussion with regards to SHMMAX.
      
      Therefore: increase MSGMNI to the maximum supported.
      
      And: If we ignore the risk of locking too much memory, then an automatic
      scaling of MSGMNI doesn't make sense.  Therefore the logic can be removed.
      
      The code preserves auto_msgmni to avoid breaking any user space applications
      that expect that the value exists.
      
      Notes:
      1) If an administrator must limit the memory allocations, then he can set
      MSGMNI as necessary.
      
      Or he can disable sysv entirely (as e.g. done by Android).
      
      2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
      to control latency vs. throughput:
      If MSGMNB is large, then msgsnd() just returns and more messages can be queued
      before a task switch to a task that calls msgrcv() is forced.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0050ee05
  11. Oct 14, 2014
    • ipc: always handle a new value of auto_msgmni · 1195d94e
      Andrey Vagin authored
      
      proc_dointvec_minmax() returns zero if a new value has been set.  So we
      don't need to check that all characters have been handled.

      Below you can find two examples.  In the first one, the new value has
      not been handled properly.
      
      $ strace ./a.out
      open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
      write(3, "0\n\0", 3)                    = 2
      close(3)                                = 0
      exit_group(0)
      $ cat /sys/kernel/debug/tracing/trace
      
      $ strace ./a.out
      open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
      write(3, "0\n", 2)                      = 2
      close(3)                                = 0
      
      $ cat /sys/kernel/debug/tracing/trace
      a.out-697   [000] ....  3280.998235: unregister_ipcns_notifier <-proc_ipcauto_dointvec_minmax
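The source of a.out is not part of the commit message. A program producing traces like the ones above might look like the following sketch, where `write_once` is a hypothetical helper that issues one open(2), one write(2) and one close(2).

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path with a single write(2) call.
 * Returns what write(2) returned, or -1 if the file could not be opened.
 */
static long write_once(const char *path, const void *buf, size_t len)
{
	int fd = open(path, O_WRONLY);
	long ret;

	if (fd < 0)
		return -1;
	ret = (long)write(fd, buf, len);
	close(fd);
	return ret;
}
```

Calling `write_once("/proc/sys/kernel/auto_msgmni", "0\n\0", 3)` corresponds to the first trace (three bytes offered, including a trailing NUL), and `write_once("/proc/sys/kernel/auto_msgmni", "0\n", 2)` to the second. Both calls need sufficient privileges to write the sysctl file.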
      
      Fixes: 9eefe520 ("ipc: do not use a negative value to re-enable msgmni automatic recomputin")
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Davidlohr Bueso <davidlohr@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1195d94e
  12. Jun 07, 2014
  13. Apr 08, 2014
  14. Jan 28, 2014
  15. Nov 03, 2013
  16. Jan 05, 2013
    • ipc: add sysctl to specify desired next object id · 03f59566
      Stanislav Kinsbursky authored
      
      Add 3 new variables and sysctls to tune them (one "next_id" variable
      each for messages, semaphores and shared memory).  This variable
      can be used to set the desired id for the next allocated IPC object.
      By default it's equal to -1 and the old behaviour is preserved.  If this
      variable is non-negative, then the desired idr will be extracted from it
      and used as a start value to search for a free IDR slot.
      
      Notes:
      
      1) this patch doesn't guarantee that the new object will have the
         desired id.  So it's up to user space how to handle a new object
         with the wrong id.

      2) After a successful id allocation attempt, "next_id" will be set back
         to -1 (if it was non-negative).
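The behaviour described above can be modelled with a toy allocator. This is a user-space sketch with hypothetical names (`toy_ids`, `toy_alloc`, `TOY_MNI`). It ignores sequence numbers and uses a plain array where the kernel uses an IDR.

```c
#define TOY_MNI 16	/* hypothetical number of slots */

struct toy_ids {
	int used[TOY_MNI];	/* 1 if the slot is taken */
	int next_id;		/* -1, or the id requested for the next allocation */
};

static void toy_init(struct toy_ids *ids)
{
	for (int i = 0; i < TOY_MNI; i++)
		ids->used[i] = 0;
	ids->next_id = -1;
}

/* Returns the allocated index, or -1 if no slot at or after the start is free. */
static int toy_alloc(struct toy_ids *ids)
{
	int start = 0;

	/* A non-negative next_id only selects where the search starts. */
	if (ids->next_id >= 0)
		start = ids->next_id % TOY_MNI;

	for (int idx = start; idx < TOY_MNI; idx++) {
		if (!ids->used[idx]) {
			ids->used[idx] = 1;
			ids->next_id = -1;	/* reset after a successful allocation */
			return idx;
		}
	}
	return -1;
}
```

If slot 5 is already taken and `next_id` is set to 5, `toy_alloc()` hands out slot 6 instead. This is the case note 1 warns about: user space has to check the id it actually received.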
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      03f59566
  17. Jul 27, 2011
    • ipc: introduce shm_rmid_forced sysctl · b34a6b1d
      Vasiliy Kulikov authored
      
      Add support for the shm_rmid_forced sysctl.  If set to 1, all shared
      memory objects in current ipc namespace will be automatically forced to
      use IPC_RMID.
      
      The POSIX way of handling shmem allows one to create shm objects and
      call shmdt(), leaving shm object associated with no process, thus
      consuming memory not counted via rlimits.
      
      With shm_rmid_forced=1 the shared memory object is counted at least for
      one process, so OOM killer may effectively kill the fat process holding
      the shared memory.
      
      It obviously breaks POSIX - some programs relying on the feature would
      stop working.  So set shm_rmid_forced=1 only if you're sure nobody uses
      "orphaned" memory.  Use shm_rmid_forced=0 by default for compatibility
      reasons.

      The feature was previously implemented in -ow as a configure option.
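The "orphaned" segment scenario can be reproduced from user space with standard SysV calls. The sketch below is only an illustration: the helper name `orphan_after_detach` is made up, and the expectation that the segment disappears on the last detach when shm_rmid_forced=1 follows from the description above rather than from a tested guarantee.

```c
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Create a private segment, attach and detach it, then check whether it
 * still exists. Returns 1 if it survived with no attachments (an orphan,
 * which is then removed), 0 if it was already gone after shmdt(), and -1
 * if SysV shared memory could not be set up at all.
 */
static int orphan_after_detach(void)
{
	struct shmid_ds ds;
	void *p;
	int id = shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600);

	if (id < 0)
		return -1;
	p = shmat(id, NULL, 0);
	if (p == (void *)-1) {
		shmctl(id, IPC_RMID, NULL);
		return -1;
	}
	shmdt(p);

	if (shmctl(id, IPC_STAT, &ds) < 0)
		return 0;		/* segment was destroyed on detach */

	shmctl(id, IPC_RMID, NULL);	/* clean up the orphan ourselves */
	return 1;
}
```

On a system with shm_rmid_forced=0 the function is expected to return 1, which is the memory that is "not counted via rlimits" until someone removes it.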
      
      [akpm@linux-foundation.org: fix documentation, per Randy]
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: readability/conventionality tweaks]
      [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Serge E. Hallyn" <serge.hallyn@canonical.com>
      Cc: Daniel Lezcano <daniel.lezcano@free.fr>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Solar Designer <solar@openwall.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b34a6b1d
  18. Nov 12, 2009
  19. Sep 24, 2009
  20. Apr 03, 2009
  21. Jan 07, 2009
  22. Oct 16, 2008
  23. Jul 25, 2008
    • ipc: do not use a negative value to re-enable msgmni automatic recomputing · 9eefe520
      Nadia Derbey authored
      This patch proposes an alternative to the "magical
      positive-versus-negative number trick" Andrew complained about last week
      in http://lkml.org/lkml/2008/6/24/418.
      
      This had been introduced with the patches that scale msgmni to the amount
      of lowmem.  With these patches, msgmni has a registered notification
      routine that recomputes the msgmni value upon memory add/remove or ipc
      namespace creation/removal.

      When msgmni is changed from user space (i.e.  value written to the proc
      file), that notification routine is unregistered, and the way to
      register it again is to write a negative value into the proc file.  This
      is the "magical positive-versus-negative number trick".
      
      To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
      This file acts as ON/OFF for msgmni automatic recomputing.
      
      With this patch, the process is the following:
      1) kernel boots in "automatic recomputing mode"
         /proc/sys/kernel/msgmni contains the value that has been computed (depends
                                 on lowmem)
         /proc/sys/kernel/auto_msgmni contains "1"

      2) echo <val> > /proc/sys/kernel/msgmni
         . sets msg_ctlmni to <val>
         . de-activates automatic recomputing (i.e. if, say, some memory is added
           msgmni won't be recomputed anymore)
         . /proc/sys/kernel/auto_msgmni now contains "0"

      3) echo "0" > /proc/sys/kernel/auto_msgmni
         . de-activates msgmni automatic recomputing
           (this has the same effect as 2) except that msg_ctlmni's value stays
           at its current value)

      4) echo "1" > /proc/sys/kernel/auto_msgmni
         . recomputes msgmni's value based on the current available memory size
           and number of ipc namespaces
         . re-activates automatic recomputing for msgmni.
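The steps above amount to a small state machine, modelled here in user-space C. All names are hypothetical, and the `computed` field is only a stand-in for whatever value the kernel derives from lowmem and the number of namespaces. The actual formula is not reproduced.

```c
struct toy_msg_ns {
	int msg_ctlmni;		/* current msgmni value */
	int auto_msgmni;	/* 1: automatic recomputing is active */
	int computed;		/* stand-in for the value the kernel would compute */
};

/* Step 1: boot in automatic recomputing mode. */
static void toy_boot(struct toy_msg_ns *ns, int computed)
{
	ns->computed = computed;
	ns->msg_ctlmni = computed;
	ns->auto_msgmni = 1;
}

/* Step 2: writing msgmni sets the value and turns recomputing off. */
static void toy_write_msgmni(struct toy_msg_ns *ns, int val)
{
	ns->msg_ctlmni = val;
	ns->auto_msgmni = 0;
}

/* Writing 0 keeps the current value, writing 1 recomputes and re-enables. */
static void toy_write_auto_msgmni(struct toy_msg_ns *ns, int on)
{
	if (on)
		ns->msg_ctlmni = ns->computed;
	ns->auto_msgmni = on ? 1 : 0;
}

/* Memory add/remove or namespace creation/removal notification. */
static void toy_memory_changed(struct toy_msg_ns *ns, int computed)
{
	ns->computed = computed;
	if (ns->auto_msgmni)
		ns->msg_ctlmni = computed;
}
```

After `toy_write_msgmni(&ns, 50)`, a later `toy_memory_changed()` leaves the value at 50 until `toy_write_auto_msgmni(&ns, 1)` switches recomputing back on.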
      
      Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
      Cc: Solofo Ramangalahy <Solofo.Ramangalahy@bull.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9eefe520
  24. Apr 29, 2008
  25. Feb 08, 2008
    • namespaces: move the IPC namespace under IPC_NS option · ae5e1b22
      Pavel Emelyanov authored
      
      Currently the IPC namespace management code is spread over the ipc/*.c files.
      I moved this code into ipc/namespace.c file which is compiled out when needed.
      
      The linux/ipc_namespace.h file is used to store the prototypes of the
      functions in namespace.c and the stubs for the NAMESPACES=n case.  This is
      done because the stub for copy_ipc_namespace requires the knowledge of the
      CLONE_NEWIPC flag, which is in sched.h.  But the linux/ipc.h file itself is
      included into many .c files via the sys.h->sem.h sequence, so adding
      sched.h to it would make all these .c files depend on sched.h, which is
      not that good.  On the other hand the knowledge about the namespaces stuff
      is required in 4 .c files only.
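The prototype-versus-stub arrangement described above follows a common header pattern. The following is a user-space sketch of that pattern, not the kernel header: every `toy_`/`TOY_` name is hypothetical, and the error is reported through an `int *err` argument because the kernel's error-pointer helpers are not available here.

```c
#include <errno.h>
#include <stddef.h>

/* Value of CLONE_NEWIPC on Linux, defined locally to stay self-contained. */
#define TOY_CLONE_NEWIPC 0x08000000

struct toy_ipc_namespace {
	int refcount;
};

#ifdef TOY_CONFIG_IPC_NS
/* Option enabled: only a prototype here, the body lives in a separate file. */
struct toy_ipc_namespace *toy_copy_ipcs(unsigned long flags,
					struct toy_ipc_namespace *ns, int *err);
#else
/* Option disabled: an inline stub that needs to know the clone flag. */
static inline struct toy_ipc_namespace *
toy_copy_ipcs(unsigned long flags, struct toy_ipc_namespace *ns, int *err)
{
	if (flags & TOY_CLONE_NEWIPC) {
		*err = -EINVAL;	/* a new namespace cannot be created */
		return NULL;
	}
	*err = 0;
	return ns;		/* keep sharing the existing namespace */
}
#endif
```

Because the stub tests the clone flag, any header carrying it needs the flag definition, which is the dependency the message wants to keep out of linux/ipc.h.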
      
      Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
      msg.c and shm.c files.  It turned out that moving these functions into
      namespace.c is not that easy because they use many other calls and macros
      from the original file.  Moving them would make this patch complicated.  On
      the other hand all these functions can be consolidated, so I will send a
      separate patch doing this a bit later.
      
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Cedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Kirill Korotaev <dev@sw.ru>
      Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae5e1b22
  26. Oct 17, 2007
  27. Feb 14, 2007