Skip to content
Snippets Groups Projects
  1. Mar 05, 2023
    • Linus Torvalds's avatar
      cpumask: re-introduce constant-sized cpumask optimizations · 596ff4a0
      Linus Torvalds authored
      
      Commit aa47a7c2 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
      in the cpumask operations potentially becoming hugely less efficient,
      because suddenly the cpumask was always considered to be variable-sized.
      
      The optimization was then later added back in a limited form by commit
      6f9c07be ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
      FORCE_NR_CPUS option is not useful in a generic kernel and more of a
      special case for embedded situations with fixed hardware.
      
      Instead, just re-introduce the optimization, with some changes.
      
      Instead of depending on CPUMASK_OFFSTACK being false, and then always
      using the full constant cpumask width, this introduces three different
      cpumask "sizes":
      
       - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
      
         This is used for situations where we should use the exact size.
      
       - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
         fits in a single word and the bitmap operations thus end up able
         to trigger the "small_const_nbits()" optimizations.
      
         This is used for the operations that have optimized single-word
         cases that get inlined, notably the bit find and scanning functions.
      
       - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
         is an sufficiently small constant that makes simple "copy" and
         "clear" operations more efficient.
      
         This is arbitrarily set at four words or less.
      
      As a an example of this situation, without this fixed size optimization,
      cpumask_clear() will generate code like
      
              movl    nr_cpu_ids(%rip), %edx
              addq    $63, %rdx
              shrq    $3, %rdx
              andl    $-8, %edx
              callq   memset@PLT
      
      on x86-64, because it would calculate the "exact" number of longwords
      that need to be cleared.
      
      In contrast, with this patch, using a MAX_CPU of 64 (which is quite a
      reasonable value to use), the above becomes a single
      
      	movq $0,cpumask
      
      instruction instead, because instead of caring to figure out exactly how
      many CPU's the system has, it just knows that the cpumask will be a
      single word and can just clear it all.
      
      Note that this does end up tightening the rules a bit from the original
      version in another way: operations that set bits in the cpumask are now
      limited to the actual nr_cpu_ids limit, whereas we used to do the
      nr_cpumask_bits thing almost everywhere in the cpumask code.
      
      But if you just clear bits, or scan for bits, we can use the simpler
      compile-time constants.
      
      In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
      which were not useful, and which fundamentally have to be limited to
      'nr_cpu_ids'.  Better remove them now than have somebody introduce use
      of them later.
      
      Of course, on x86-64 with MAXSMP there is no sane small compile-time
      constant for the cpumask sizes, and we end up using the actual CPU bits,
      and will generate the above kind of horrors regardless.  Please don't
      use MAXSMP unless you really expect to have machines with thousands of
      cores.
      
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      596ff4a0
    • Masahiro Yamada's avatar
      Remove Intel compiler support · 95207db8
      Masahiro Yamada authored
      include/linux/compiler-intel.h had no update in the past 3 years.
      
      We often forget about the third C compiler to build the kernel.
      
      For example, commit a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO")
      only mentioned GCC and Clang.
      
      init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC,
      and nobody has reported any issue.
      
      I guess the Intel Compiler support is broken, and nobody is caring
      about it.
      
      Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is
      deprecated:
      
          $ icc -v
          icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is
          deprecated and will be removed from product release in the second half
          of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended
          compiler moving forward. Please transition to use this compiler. Use
          '-diag-disable=10441' to disable this message.
          icc version 2021.7.0 (gcc version 12.1.0 compatibility)
      
      Arnd Bergmann provided a link to the article, "Intel C/C++ compilers
      complete adoption of LLVM".
      
      lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept
      untouched for better sync with https://github.com/facebook/zstd
      
      Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
      
      
      Signed-off-by: default avatarMasahiro Yamada <masahiroy@kernel.org>
      Acked-by: default avatarArnd Bergmann <arnd@arndb.de>
      Reviewed-by: default avatarNick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: default avatarNathan Chancellor <nathan@kernel.org>
      Reviewed-by: default avatarMiguel Ojeda <ojeda@kernel.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      95207db8
  2. Mar 02, 2023
  3. Mar 01, 2023
    • Linus Torvalds's avatar
      capability: just use a 'u64' instead of a 'u32[2]' array · f122a08b
      Linus Torvalds authored
      
      Back in 2008 we extended the capability bits from 32 to 64, and we did
      it by extending the single 32-bit capability word from one word to an
      array of two words.  It was then obfuscated by hiding the "2" behind two
      macro expansions, with the reasoning being that maybe it gets extended
      further some day.
      
      That reasoning may have been valid at the time, but the last thing we
      want to do is to extend the capability set any more.  And the array of
      values not only causes source code oddities (with loops to deal with
      it), but also results in worse code generation.  It's a lose-lose
      situation.
      
      So just change the 'u32[2]' into a 'u64' and be done with it.
      
      We still have to deal with the fact that the user space interface is
      designed around an array of these 32-bit values, but that was the case
      before too, since the array layouts were different (ie user space
      doesn't use an array of 32-bit values for individual capability masks,
      but an array of 32-bit slices of multiple masks).
      
      So that marshalling of data is actually simplified too, even if it does
      remain somewhat obscure and odd.
      
      This was all triggered by my reaction to the new "cap_isidentical()"
      introduced recently.  By just using a saner data structure, it went from
      
      	unsigned __capi;
      	CAP_FOR_EACH_U32(__capi) {
      		if (a.cap[__capi] != b.cap[__capi])
      			return false;
      	}
      	return true;
      
      to just being
      
      	return a.val == b.val;
      
      instead.  Which is rather more obvious both to humans and to compilers.
      
      Cc: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      f122a08b
  4. Feb 28, 2023
  5. Feb 27, 2023
  6. Feb 26, 2023
    • Vladimir Oltean's avatar
      net: dsa: seville: ignore mscc-miim read errors from Lynx PCS · 0322ef49
      Vladimir Oltean authored
      
      During the refactoring in the commit below, vsc9953_mdio_read() was
      replaced with mscc_miim_read(), which has one extra step: it checks for
      the MSCC_MIIM_DATA_ERROR bits before returning the result.
      
      On T1040RDB, there are 8 QSGMII PCSes belonging to the switch, and they
      are organized in 2 groups. First group responds to MDIO addresses 4-7
      because QSGMIIACR1[MDEV_PORT] is 1, and the second group responds to
      MDIO addresses 8-11 because QSGMIIBCR1[MDEV_PORT] is 2. I have double
      checked that these values are correctly set in the SERDES, as well as
      PCCR1[QSGMA_CFG] and PCCR1[QSGMB_CFG] are both 0b01.
      
      mscc_miim_read: phyad 8 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 8 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 8 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 8 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 9 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 9 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 9 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 9 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 10 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 10 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 10 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 10 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 11 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 11 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 11 reg 0x1 MIIM_DATA 0x2d
      mscc_miim_read: phyad 11 reg 0x5 MIIM_DATA 0x5801
      mscc_miim_read: phyad 4 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 4 reg 0x5 MIIM_DATA 0x3da01, ERROR
      mscc_miim_read: phyad 5 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 5 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 5 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 5 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 6 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 6 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 6 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 6 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 7 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 7 reg 0x5 MIIM_DATA 0x35801, ERROR
      mscc_miim_read: phyad 7 reg 0x1 MIIM_DATA 0x3002d, ERROR
      mscc_miim_read: phyad 7 reg 0x5 MIIM_DATA 0x35801, ERROR
      
      As can be seen, the data in MIIM_DATA is still valid despite having the
      MSCC_MIIM_DATA_ERROR bits set. The driver as introduced in commit
      84705fc1 ("net: dsa: felix: introduce support for Seville VSC9953
      switch") was ignoring these bits, perhaps deliberately (although
      unbeknownst to me).
      
      This is an old IP and the hardware team cannot seem to be able to help
      me track down a plausible reason for these failures. I'll keep
      investigating, but in the meantime, this is a direct regression which
      must be restored to a working state.
      
      The only thing I can do is keep ignoring the errors as before.
      
      Fixes: b9965845 ("net: dsa: ocelot: felix: utilize shared mscc-miim driver for indirect MDIO access")
      Signed-off-by: default avatarVladimir Oltean <vladimir.oltean@nxp.com>
      Reviewed-by: default avatarFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      0322ef49
  7. Feb 25, 2023
  8. Feb 24, 2023
  9. Feb 23, 2023
  10. Feb 22, 2023
  11. Feb 21, 2023
Loading