- Mar 05, 2023
-
-
Linus Torvalds authored
Commit aa47a7c2 ("lib/cpumask: deprecate nr_cpumask_bits") resulted in the cpumask operations potentially becoming hugely less efficient, because suddenly the cpumask was always considered to be variable-sized. The optimization was then later added back in a limited form by commit 6f9c07be ("lib/cpumask: add FORCE_NR_CPUS config option"), but that FORCE_NR_CPUS option is not useful in a generic kernel and more of a special case for embedded situations with fixed hardware. Instead, just re-introduce the optimization, with some changes. Instead of depending on CPUMASK_OFFSTACK being false, and then always using the full constant cpumask width, this introduces three different cpumask "sizes": - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids. This is used for situations where we should use the exact size. - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it fits in a single word and the bitmap operations thus end up able to trigger the "small_const_nbits()" optimizations. This is used for the operations that have optimized single-word cases that get inlined, notably the bit find and scanning functions. - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it is an sufficiently small constant that makes simple "copy" and "clear" operations more efficient. This is arbitrarily set at four words or less. As a an example of this situation, without this fixed size optimization, cpumask_clear() will generate code like movl nr_cpu_ids(%rip), %edx addq $63, %rdx shrq $3, %rdx andl $-8, %edx callq memset@PLT on x86-64, because it would calculate the "exact" number of longwords that need to be cleared. In contrast, with this patch, using a MAX_CPU of 64 (which is quite a reasonable value to use), the above becomes a single movq $0,cpumask instruction instead, because instead of caring to figure out exactly how many CPU's the system has, it just knows that the cpumask will be a single word and can just clear it all. Note that this does end up tightening the rules a bit from the original version in another way: operations that set bits in the cpumask are now limited to the actual nr_cpu_ids limit, whereas we used to do the nr_cpumask_bits thing almost everywhere in the cpumask code. But if you just clear bits, or scan for bits, we can use the simpler compile-time constants. In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()' which were not useful, and which fundamentally have to be limited to 'nr_cpu_ids'. Better remove them now than have somebody introduce use of them later. Of course, on x86-64 with MAXSMP there is no sane small compile-time constant for the cpumask sizes, and we end up using the actual CPU bits, and will generate the above kind of horrors regardless. Please don't use MAXSMP unless you really expect to have machines with thousands of cores. Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Masahiro Yamada authored
include/linux/compiler-intel.h had no update in the past 3 years. We often forget about the third C compiler to build the kernel. For example, commit a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO") only mentioned GCC and Clang. init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC, and nobody has reported any issue. I guess the Intel Compiler support is broken, and nobody is caring about it. Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is deprecated: $ icc -v icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message. icc version 2021.7.0 (gcc version 12.1.0 compatibility) Arnd Bergmann provided a link to the article, "Intel C/C++ compilers complete adoption of LLVM". lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept untouched for better sync with https://github.com/facebook/zstd Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org> Acked-by:
Arnd Bergmann <arnd@arndb.de> Reviewed-by:
Nick Desaulniers <ndesaulniers@google.com> Reviewed-by:
Nathan Chancellor <nathan@kernel.org> Reviewed-by:
Miguel Ojeda <ojeda@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 02, 2023
-
-
Thomas Gleixner authored
Miquel reported a warning in the MSI core which is triggered when interrupts are freed via platform_msi_device_domain_free(). This code got reworked to use core functions for freeing the MSI descriptors, but nothing took care to clear the msi_desc->irq entry, which then triggers the warning in msi_free_msi_desc() which uses desc->irq to validate that the descriptor has been torn down. The same issue exists in msi_domain_populate_irqs(). Up to the point that msi_free_msi_descs() grew a warning for this case, this went un-noticed. Provide the counterpart of msi_domain_populate_irqs() and invoke it in platform_msi_device_domain_free() before freeing the interrupts and MSI descriptors and also in the error path of msi_domain_populate_irqs(). Fixes: 2f2940d1 ("genirq/msi: Remove filter from msi_free_descs_free_range()") Reported-by:
Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Miquel Raynal <miquel.raynal@bootlin.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87mt4wkwnv.ffs@tglx
-
- Mar 01, 2023
-
-
Linus Torvalds authored
Back in 2008 we extended the capability bits from 32 to 64, and we did it by extending the single 32-bit capability word from one word to an array of two words. It was then obfuscated by hiding the "2" behind two macro expansions, with the reasoning being that maybe it gets extended further some day. That reasoning may have been valid at the time, but the last thing we want to do is to extend the capability set any more. And the array of values not only causes source code oddities (with loops to deal with it), but also results in worse code generation. It's a lose-lose situation. So just change the 'u32[2]' into a 'u64' and be done with it. We still have to deal with the fact that the user space interface is designed around an array of these 32-bit values, but that was the case before too, since the array layouts were different (ie user space doesn't use an array of 32-bit values for individual capability masks, but an array of 32-bit slices of multiple masks). So that marshalling of data is actually simplified too, even if it does remain somewhat obscure and odd. This was all triggered by my reaction to the new "cap_isidentical()" introduced recently. By just using a saner data structure, it went from unsigned __capi; CAP_FOR_EACH_U32(__capi) { if (a.cap[__capi] != b.cap[__capi]) return false; } return true; to just being return a.val == b.val; instead. Which is rather more obvious both to humans and to compilers. Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: Serge Hallyn <serge@hallyn.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Paul Moore <paul@paul-moore.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 28, 2023
-
-
Claudiu Beznea authored
Add option to start DMA component after DAI trigger. This is done by filling the new struct snd_soc_component_driver::start_dma_last. Signed-off-by:
Claudiu Beznea <claudiu.beznea@microchip.com> Link: https://lore.kernel.org/r/20230228110145.3770525-2-claudiu.beznea@microchip.com Signed-off-by:
Mark Brown <broonie@kernel.org>
-
Naoya Horiguchi authored
After a memory error happens on a clean folio, a process unexpectedly receives SIGBUS when it accesses the error page. This SIGBUS killing is pointless and simply degrades the level of RAS of the system, because the clean folio can be dropped without any data lost on memory error handling as we do for a clean pagecache. When memory_failure() is called on a clean folio, try_to_unmap() is called twice (one from split_huge_page() and one from hwpoison_user_mappings()). The root cause of the issue is that pte conversion to hwpoisoned entry is now done in the first call of try_to_unmap() because PageHWPoison is already set at this point, while it's actually expected to be done in the second call. This behavior disturbs the error handling operation like removing pagecache, which results in the malfunction described above. So convert TTU_IGNORE_HWPOISON into TTU_HWPOISON and set TTU_HWPOISON only when we really intend to convert pte to hwpoison entry. This can prevent other callers of try_to_unmap() from accidentally converting to hwpoison entries. Link: https://lkml.kernel.org/r/20230221085905.1465385-1-naoya.horiguchi@linux.dev Fixes: a42634a6 ("readahead: Use a folio in read_pages()") Signed-off-by:
Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Mateusz Guzik authored
Signed-off-by:
Mateusz Guzik <mjguzik@gmail.com> Reviewed-by:
Serge Hallyn <serge@hallyn.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 27, 2023
-
-
Michael Karcher authored
GCC warns about the pattern sizeof(void*)/sizeof(void), as it looks like the abuse of a pattern to calculate the array size. This pattern appears in the unevaluated part of the ternary operator in _INTC_ARRAY if the parameter is NULL. The replacement uses an alternate approach to return 0 in case of NULL which does not generate the pattern sizeof(void*)/sizeof(void), but still emits the warning if _INTC_ARRAY is called with a nonarray parameter. This patch is required for successful compilation with -Werror enabled. The idea to use _Generic for type distinction is taken from Comment #7 in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108483 by Jakub Jelinek Signed-off-by:
Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Link: https://lore.kernel.org/r/619fa552-c988-35e5-b1d7-fe256c46a272@mkarcher.dialup.fu-berlin.de Signed-off-by:
John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
-
- Feb 26, 2023
-
-
Vladimir Oltean authored
During the refactoring in the commit below, vsc9953_mdio_read() was replaced with mscc_miim_read(), which has one extra step: it checks for the MSCC_MIIM_DATA_ERROR bits before returning the result. On T1040RDB, there are 8 QSGMII PCSes belonging to the switch, and they are organized in 2 groups. First group responds to MDIO addresses 4-7 because QSGMIIACR1[MDEV_PORT] is 1, and the second group responds to MDIO addresses 8-11 because QSGMIIBCR1[MDEV_PORT] is 2. I have double checked that these values are correctly set in the SERDES, as well as PCCR1[QSGMA_CFG] and PCCR1[QSGMB_CFG] are both 0b01. mscc_miim_read: phyad 8 reg 0x1 MIIM_DATA 0x2d mscc_miim_read: phyad 8 reg 0x5 MIIM_DATA 0x5801 mscc_miim_read: phyad 8 reg 0x1 MIIM_DATA 0x2d mscc_miim_read: phyad 8 reg 0x5 MIIM_DATA 0x5801 mscc_miim_read: phyad 9 reg 0x1 MIIM_DATA 0x2d mscc_miim_read: phyad 9 reg 0x5 MIIM_DATA 0x5801 mscc_miim_read: phyad 9 reg 0x1 MIIM_DATA 0x2d mscc_miim_read: phyad 9 reg 0x5 MIIM_DATA 0x5801 mscc_miim_read: phyad 10 reg 0x1 MIIM_DATA 0x2d mscc_miim_read: phyad 10 reg 0x5 MIIM_DATA 0x5801 mscc_miim_read: phyad 10 reg 0x1 MIIM_DATA 0x2d mscc_miim_read: phyad 10 reg 0x5 MIIM_DATA 0x5801 mscc_miim_read: phyad 11 reg 0x1 MIIM_DATA 0x2d mscc_miim_read: phyad 11 reg 0x5 MIIM_DATA 0x5801 mscc_miim_read: phyad 11 reg 0x1 MIIM_DATA 0x2d mscc_miim_read: phyad 11 reg 0x5 MIIM_DATA 0x5801 mscc_miim_read: phyad 4 reg 0x1 MIIM_DATA 0x3002d, ERROR mscc_miim_read: phyad 4 reg 0x5 MIIM_DATA 0x3da01, ERROR mscc_miim_read: phyad 5 reg 0x1 MIIM_DATA 0x3002d, ERROR mscc_miim_read: phyad 5 reg 0x5 MIIM_DATA 0x35801, ERROR mscc_miim_read: phyad 5 reg 0x1 MIIM_DATA 0x3002d, ERROR mscc_miim_read: phyad 5 reg 0x5 MIIM_DATA 0x35801, ERROR mscc_miim_read: phyad 6 reg 0x1 MIIM_DATA 0x3002d, ERROR mscc_miim_read: phyad 6 reg 0x5 MIIM_DATA 0x35801, ERROR mscc_miim_read: phyad 6 reg 0x1 MIIM_DATA 0x3002d, ERROR mscc_miim_read: phyad 6 reg 0x5 MIIM_DATA 0x35801, ERROR mscc_miim_read: phyad 7 reg 0x1 MIIM_DATA 0x3002d, ERROR mscc_miim_read: phyad 7 reg 0x5 MIIM_DATA 0x35801, ERROR mscc_miim_read: phyad 7 reg 0x1 MIIM_DATA 0x3002d, ERROR mscc_miim_read: phyad 7 reg 0x5 MIIM_DATA 0x35801, ERROR As can be seen, the data in MIIM_DATA is still valid despite having the MSCC_MIIM_DATA_ERROR bits set. The driver as introduced in commit 84705fc1 ("net: dsa: felix: introduce support for Seville VSC9953 switch") was ignoring these bits, perhaps deliberately (although unbeknownst to me). This is an old IP and the hardware team cannot seem to be able to help me track down a plausible reason for these failures. I'll keep investigating, but in the meantime, this is a direct regression which must be restored to a working state. The only thing I can do is keep ignoring the errors as before. Fixes: b9965845 ("net: dsa: ocelot: felix: utilize shared mscc-miim driver for indirect MDIO access") Signed-off-by:
Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Feb 25, 2023
-
-
Qing Zhang authored
Implement the regset-based ptrace interface that exposes hardware breakpoints to user-space debuggers to query and set instruction and data breakpoints. Signed-off-by:
Qing Zhang <zhangqing@loongson.cn> Signed-off-by:
Huacai Chen <chenhuacai@loongson.cn>
-
- Feb 24, 2023
-
-
Tariq Toukan authored
Fix a repeated copy/paste typo. Fixes: d3d854fd ("netdev-genl: create a simple family for netdev stuff") Signed-off-by:
Tariq Toukan <tariqt@nvidia.com> Acked-by:
Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Feb 23, 2023
-
-
Marek Olšák authored
AMDGPU_IDS_FLAGS_CONFORMANT_TRUNC_COORD: important for conformance on gfx11 Other fields are exposed from IP discovery. enabled_rb_pipes_mask_hi is added for future chips, currently 0. Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21403 Signed-off-by:
Marek Olšák <marek.olsak@amd.com> Reviewed-by:
Alex Deucher <alexander.deucher@amd.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com>
-
Xin Long authored
With this refcnt added in sctp_stream_priorities, we don't need to traverse all streams to check if the prio is used by other streams when freeing one stream's prio in sctp_sched_prio_free_sid(). This can avoid a nested loop (up to 65535 * 65535), which may cause a stuck as Ying reported: watchdog: BUG: soft lockup - CPU#23 stuck for 26s! [ksoftirqd/23:136] Call Trace: <TASK> sctp_sched_prio_free_sid+0xab/0x100 [sctp] sctp_stream_free_ext+0x64/0xa0 [sctp] sctp_stream_free+0x31/0x50 [sctp] sctp_association_free+0xa5/0x200 [sctp] Note that it doesn't need to use refcount_t type for this counter, as its accessing is always protected under the sock lock. v1->v2: - add a check in sctp_sched_prio_set to avoid the possible prio_head refcnt overflow. Fixes: 9ed7bfc7 ("sctp: fix memory leak in sctp_stream_outq_migrate()") Reported-by:
Ying Xu <yinxu@redhat.com> Acked-by:
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by:
Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/825eb0c905cb864991eba335f4a2b780e543f06b.1677085641.git.lucien.xin@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Oleksij Rempel authored
With following patches: commit 9b01c885 ("net: phy: c22: migrate to genphy_c45_write_eee_adv()") commit 5827b168 ("net: phy: c45: migrate to genphy_c45_write_eee_adv()") we set the advertisement to potentially supported values. This behavior may introduce new regressions on systems where EEE was disabled by default (BIOS or boot loader configuration or by other ways.) At same time, with this patches, we would overwrite EEE advertisement configuration made over ethtool. To avoid this issues, we need to cache initial and ethtool advertisement configuration and store it for later use. Fixes: 9b01c885 ("net: phy: c22: migrate to genphy_c45_write_eee_adv()") Fixes: 5827b168 ("net: phy: c45: migrate to genphy_c45_write_eee_adv()") Fixes: 022c3f87 ("net: phy: add genphy_c45_ethtool_get/set_eee() support") Signed-off-by:
Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by:
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by:
Paolo Abeni <pabeni@redhat.com>
-
Oleksij Rempel authored
Add new genphy_c45_an_config_eee_aneg() function and replace some of genphy_c45_write_eee_adv() calls. This will be needed by the next patch. Signed-off-by:
Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by:
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by:
Paolo Abeni <pabeni@redhat.com>
-
- Feb 22, 2023
-
-
Serge Semin authored
When CONFIG_DW_EDMA=m, dw_edma_probe() is built as a module. Previously edma.h declared it as extern, but the implementation isn't available for builtin callers. A subsequent commit will add calls from dw_pcie_host_init() and dw_pcie_ep_init(), which can only be built-in. Make it safe for such builtin callers to call dw_edma_probe() by using IS_REACHABLE() to define a stub when CONFIG_DW_EDMA=m. When CONFIG_DW_EDMA=m, these builtin callers will fail to detect and register eDMA devices, so eDMA won't be usable even if the dw-edma module is loaded. [bhelgaas: split to separate patch, commit log] Link: https://lore.kernel.org/r/20230113171409.30470-25-Sergey.Semin@baikalelectronics.ru Signed-off-by:
Serge Semin <Sergey.Semin@baikalelectronics.ru> Signed-off-by:
Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Acked-by:
Vinod Koul <vkoul@kernel.org>
-
Serge Semin authored
Currently the DW eDMA driver only supports the linked lists memory allocated locally with respect to the remote eDMA engine setup. It means the linked lists will be accessible by the CPU via the MMIO space only. If eDMA is embedded into the DW PCIe Root Ports or local Endpoints (which support will be added in subsequent commits) the linked lists are supposed to be allocated in the CPU memory. In that case the LL-entries can be directly accessed, while the former case implies using the MMIO accessors for that. In order to have both cases supported by the driver, the dw_edma_region descriptor should be fixed to contain the MMIO-backed and just memory-based virtual addresses. The linked lists initialization procedure will use one of them depending on the eDMA device nature. If the eDMA engine is embedded into the local DW PCIe Root Port/Endpoint controllers, the list entries will be directly accessed by referencing the corresponding structure fields. Otherwise the MMIO accessors usage will be preserved. Link: https://lore.kernel.org/r/20230113171409.30470-24-Sergey.Semin@baikalelectronics.ru Signed-off-by:
Serge Semin <Sergey.Semin@baikalelectronics.ru> Signed-off-by:
Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Acked-by:
Vinod Koul <vkoul@kernel.org>
-
Jens Axboe authored
This better describes what it does - it's incremented when the task is currently undergoing a cancelation operation, due to exiting or exec'ing. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Bjorn Andersson authored
Revert the postponement of clk_disable_unused() for clock providers that implement sync_state, and the change to drivers implementing this, until agreement on the implementation has been reached. This reverts: 29e31415 ("clk: qcom: Remove need for clk_ignore_unused on sc8280xp") 99c0f7d3 ("clk: qcom: sdm845: Use generic clk_sync_state_disable_unused callback") 26b36df7 ("clk: Add generic sync_state callback for disabling unused clocks") Requested-by:
Stephen Boyd <sboyd@kernel.org> Signed-off-by:
Bjorn Andersson <andersson@kernel.org>
-
Andreas Kemnade authored
The EC firmware has a different version number than anything defined until now. Signed-off-by:
Andreas Kemnade <andreas@kemnade.info> Signed-off-by:
Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20230127165828.3256170-1-andreas@kemnade.info
-
Arnd Bergmann authored
Four separate mfd drivers are in the "tmio" family, and all of them were used in now-removed PXA machines (eseries, tosa, and hx4700), so the mfd drivers and all its children can be removed as well. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20230105134622.254560-19-arnd@kernel.org
-
Uwe Kleine-König authored
There is no in-tree user left which relies on legacy probing. So drop support for it which removes another user of the deprecated pwm_request() function. Signed-off-by:
Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by:
Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by:
Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221117072151.3789691-1-u.kleine-koenig@pengutronix.de
-
Geert Uytterhoeven authored
Fix a misspelling of "complement". Signed-off-by:
Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by:
Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/aa7abd7103a0e4be954ea63de78f12e8251b2964.1673271092.git.geert+renesas@glider.be
-
Andreas Kemnade authored
TWL6032 has a few charging registers prepended before the charging registers the TWL6030 has. To be able to use common register defines declare the additional registers as additional module. At the moment this affects the access to CHARGERUSB_CTRL1 in phy-twl6030-usb. Without this patch, it is accessing the wrong register on TWL6032. The consequence is that presence of Vbus is not reported. Cc: Bin Liu <b-liu@ti.com> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by:
Andreas Kemnade <andreas@kemnade.info> Signed-off-by:
Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221208215723.217557-1-andreas@kemnade.info
-
Aren Moynihan authored
The power button can get "stuck" if the rising edge and falling edge irq are read in the same pass. This can often be triggered when resuming from suspend if the power button is released before the kernel handles the interrupt. Swapping the order of the rise and fall events makes sure that the press event is handled first, which prevents this situation. Signed-off-by:
Aren Moynihan <aren@peacevolution.org> Reviewed-by:
Samuel Holland <samuel@sholland.org> Tested-by:
Samuel Holland <samuel@sholland.org> Acked-by:
Chen-Yu Tsai <wens@csie.org> Signed-off-by:
Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221208220225.635414-1-aren@peacevolution.org
-
Bart Van Assche authored
Allow SCSI LLDs to specify SCMD_* flags. Link: https://lore.kernel.org/r/20230210193258.4004923-2-bvanassche@acm.org Cc: Mike Christie <michael.christie@oracle.com> Cc: John Garry <john.g.garry@oracle.com> Reviewed-by:
John Garry <john.g.garry@oracle.com> Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Kees Cook authored
Nothing else defined MPI3_NVME_ENCAP_CMD_MAX, so the "command" buffer was being defined as a fake flexible array of size 1. Replace this with a proper flex array. Avoids this GCC 13 warning under -fstrict-flex-arrays=3: In function 'fortify_memset_chk', inlined from 'mpi3mr_build_nvme_sgl' at ../drivers/scsi/mpi3mr/mpi3mr_app.c:693:2, inlined from 'mpi3mr_bsg_process_mpt_cmds.constprop' at ../drivers/scsi/mpi3mr/mpi3mr_app.c:1214:8: ../include/linux/fortify-string.h:430:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 430 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Link: https://lore.kernel.org/r/20230204183715.never.937-kees@kernel.org Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: kernel test robot <lkp@intel.com> Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Florian Westphal authored
pernet tracking doesn't work correctly because other netns might have set NETLINK_LISTEN_ALL_NSID on its event socket. In this case its expected that events originating in other net namespaces are also received. Making pernet-tracking work while also honoring NETLINK_LISTEN_ALL_NSID requires much more intrusive changes both in netlink and nfnetlink, f.e. adding a 'setsockopt' callback that lets nfnetlink know that the event socket entered (or left) ALL_NSID mode. Move to global tracking instead: if there is an event socket anywhere on the system, all net namespaces which have conntrack enabled and use autobind mode will allocate the ecache extension. netlink_has_listeners() returns false only if the given group has no subscribers in any net namespace, the 'net' argument passed to nfnetlink_has_listeners is only used to derive the protocol (nfnetlink), it has no other effect. For proper NETLINK_LISTEN_ALL_NSID-aware pernet tracking of event listeners a new netlink_has_net_listeners() is also needed. Fixes: 90d1daa4 ("netfilter: conntrack: add nf_conntrack_events autodetect mode") Reported-by:
Bryce Kahle <bryce.kahle@datadoghq.com> Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Souradeep Chowdhury authored
The Data Capture and Compare(DCC) is a debugging tool that uses the bootconfig for configuring the register values during boot-time. Increase the max nodes supported by bootconfig to cater to the requirements of the Data Capture and Compare Driver. Link: https://lore.kernel.org/all/1674536682-18404-1-git-send-email-quic_schowdhu@quicinc.com/ Signed-off-by:
Souradeep Chowdhury <quic_schowdhu@quicinc.com> Acked-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org>
-
Adrien Thierry authored
During UFS initialization, devfreq initialization is asynchronous: ufshcd_async_scan() calls ufshcd_add_lus(), which in turn initializes devfreq for UFS. The simple ondemand governor is then loaded. If it is built as a module, request_module() is called and throws a warning: WARNING: CPU: 7 PID: 167 at kernel/kmod.c:136 __request_module+0x1e0/0x460 Modules linked in: crct10dif_ce llcc_qcom phy_qcom_qmp_usb ufs_qcom phy_qcom_snps_femto_v2 ufshcd_pltfrm phy_qcom_qmp_combo ufshcd_core phy_qcom_qmp_ufs qcom_wdt socinfo fuse ipv6 CPU: 7 PID: 167 Comm: kworker/u16:3 Not tainted 6.2.0-rc6-00009-g58706f7fb045 #1 Hardware name: Qualcomm SA8540P Ride (DT) Workqueue: events_unbound async_run_entry_fn pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __request_module+0x1e0/0x460 lr : __request_module+0x1d8/0x460 sp : ffff800009323b90 x29: ffff800009323b90 x28: 0000000000000000 x27: 0000000000000000 x26: ffff800009323d50 x25: ffff7b9045f57810 x24: ffff7b9045f57830 x23: ffffdc5a83e426e8 x22: ffffdc5ae80a9818 x21: 0000000000000001 x20: ffffdc5ae7502f98 x19: ffff7b9045f57800 x18: ffffffffffffffff x17: 312f716572667665 x16: 642f7366752e3030 x15: 0000000000000000 x14: 000000000000021c x13: 0000000000005400 x12: ffff7b9042ed7614 x11: ffff7b9042ed7600 x10: 00000000636c0890 x9 : 0000000000000038 x8 : ffff7b9045f2c880 x7 : ffff7b9045f57c68 x6 : 0000000000000080 x5 : 0000000000000000 x4 : 8000000000000000 x3 : 0000000000000000 x2 : 0000000000000000 x1 : ffffdc5ae5d382f0 x0 : 0000000000000001 Call trace: __request_module+0x1e0/0x460 try_then_request_governor+0x7c/0x100 devfreq_add_device+0x4b0/0x5fc ufshcd_async_scan+0x1d4/0x310 [ufshcd_core] async_run_entry_fn+0x34/0xe0 process_one_work+0x1d0/0x320 worker_thread+0x14c/0x444 kthread+0x10c/0x110 ret_from_fork+0x10/0x20 This occurs because synchronous module loading from async is not allowed. According to __request_module(): /* * We don't allow synchronous module loading from async. Module * init may invoke async_synchronize_full() which will end up * waiting for this task which already is waiting for the module * loading to complete, leading to a deadlock. */ Such a deadlock was experienced on the Qualcomm QDrive3/sa8540p-ride. With DEVFREQ_GOV_SIMPLE_ONDEMAND=m, the boot hangs after the warning. Fix both the warning and the deadlock by moving devfreq initialization out of the async routine. Tested on the sa8540p-ride by using fio to put the UFS under load, and printing the trace generated by /sys/kernel/tracing/events/ufs/ufshcd_clk_scaling events. The trace looks similar with and without the change. Link: https://lore.kernel.org/r/20230217194423.42553-1-athierry@redhat.com Signed-off-by:
Adrien Thierry <athierry@redhat.com> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
Muneendra authored
The LLDD and the stack currently process FPINs received from the fabric, but the stack is not aware of any action taken by the driver to alleviate congestion. The current interface between the driver and the SCSI stack is limited to passing the notification mainly for statistics and heuristics. The reaction to an FPIN could be handled either by the driver or by the stack (marginal path and failover). Amend the interface to indicate if action on an FPIN has already been reacted to by the LLDDs or not. Add an additional flag to fc_host_fpin_rcv() to indicate if the FPIN has been acknowledged/reacted to by the driver. Also added a new event code FCH_EVT_LINK_FPIN_ACK to notify to the user that the event has been acknowledged/reacted by the LLDD driver Link: https://lore.kernel.org/r/20230209034326.882514-1-muneendra.kumar@broadcom.com Co-developed-by:
Anil Gurumurthy <agurumurthy@marvell.com> Signed-off-by:
Anil Gurumurthy <agurumurthy@marvell.com> Co-developed-by:
Nilesh Javali <njavali@marvell.com> Signed-off-by:
Nilesh Javali <njavali@marvell.com> Signed-off-by:
Muneendra <muneendra.kumar@broadcom.com> Reviewed-by:
James Smart <jsmart2021@gmail.com> Reviewed-by:
Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by:
Ewan D. Milne <emilne@redhat.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- Feb 21, 2023
-
-
Dave Hansen authored
The results of "access_ok()" can be mis-speculated. The result is that you can end speculatively: if (access_ok(from, size)) // Right here even for bad from/size combinations. On first glance, it would be ideal to just add a speculation barrier to "access_ok()" so that its results can never be mis-speculated. But there are lots of system calls just doing access_ok() via "copy_to_user()" and friends (example: fstat() and friends). Those are generally not problematic because they do not _consume_ data from userspace other than the pointer. They are also very quick and common system calls that should not be needlessly slowed down. "copy_from_user()" on the other hand uses a user-controller pointer and is frequently followed up with code that might affect caches. Take something like this: if (!copy_from_user(&kernelvar, uptr, size)) do_something_with(kernelvar); If userspace passes in an evil 'uptr' that *actually* points to a kernel addresses, and then do_something_with() has cache (or other) side-effects, it could allow userspace to infer kernel data values. Add a barrier to the common copy_from_user() code to prevent mis-speculated values which happen after the copy. Also add a stub for architectures that do not define barrier_nospec(). This makes the macro usable in generic code. Since the barrier is now usable in generic code, the x86 #ifdef in the BPF code can also go away. Reported-by:
Jordy Zomer <jordyzomer@google.com> Suggested-by:
Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Acked-by: Daniel Borkmann <daniel@iogearbox.net> # BPF bits Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Ilias Apalodimas authored
When reading the page_pool code the first impression is that keeping two separate counters, one being the page refcnt and the other being fragment pp_frag_count, is counter-intuitive. However without that fragment counter we don't know when to reliably destroy or sync the outstanding DMA mappings. So let's add a comment explaining this part. Reviewed-by:
Alexander Duyck <alexanderduyck@fb.com> Signed-off-by:
Ilias Apalodimas <ilias.apalodimas@linaro.org> Acked-by:
Jesper Dangaard Brouer <brouer@redhat.com> Link: https://lore.kernel.org/r/20230217222130.85205-1-ilias.apalodimas@linaro.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Juhyung Park authored
bdev_get_queue() never returns NULL. Several commits [1][2] have been made before to remove such superfluous checks, but some still remained. For places where bdev_get_queue() is called solely for NULL checks, it is removed entirely. [1] commit ec9fd2a1 ("blk-lib: don't check bdev_get_queue() NULL check") [2] commit fea127b3 ("block: remove superfluous check for request queue in bdev_is_zoned()") Signed-off-by:
Juhyung Park <qkrwngud825@gmail.com> Reviewed-by:
Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20230203024029.48260-1-qkrwngud825@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Paul Blakey authored
For drivers to support partial offload of a filter's action list, add support for action miss to specify an action instance to continue from in sw. CT action in particular can't be fully offloaded, as new connections need to be handled in software. This imposes other limitations on the actions that can be offloaded together with the CT action, such as packet modifications. Assign each action on a filter's action list a unique miss_cookie which drivers can then use to fill action_miss part of the tc skb extension. On getting back this miss_cookie, find the action instance with relevant cookie and continue classifying from there. Signed-off-by:
Paul Blakey <paulb@nvidia.com> Reviewed-by:
Jiri Pirko <jiri@nvidia.com> Reviewed-by:
Simon Horman <simon.horman@corigine.com> Reviewed-by:
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by:
Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Paul Blakey authored
struct tc_action->act_cookie is a user defined cookie, and the related struct flow_action_entry->act_cookie is used as an handle similar to struct flow_cls_offload->cookie. Rename tc_action->act_cookie to user_cookie, and flow_action_entry->act_cookie to cookie so their names would better fit their usage. Signed-off-by:
Paul Blakey <paulb@nvidia.com> Reviewed-by:
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Kees Cook authored
The call "skb_copy_from_linear_data(skb, inl + 1, spc)" triggers a FORTIFY memcpy() warning on ppc64 platform: In function ‘fortify_memcpy_chk’, inlined from ‘skb_copy_from_linear_data’ at ./include/linux/skbuff.h:4029:2, inlined from ‘build_inline_wqe’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:722:4, inlined from ‘mlx4_en_xmit’ at drivers/net/ethernet/mellanox/mlx4/en_tx.c:1066:3: ./include/linux/fortify-string.h:513:25: error: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 513 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Same behaviour on x86 you can get if you use "__always_inline" instead of "inline" for skb_copy_from_linear_data() in skbuff.h The call here copies data into inlined tx destricptor, which has 104 bytes (MAX_INLINE) space for data payload. In this case "spc" is known in compile-time but the destination is used with hidden knowledge (real structure of destination is different from that the compiler can see). That cause the fortify warning because compiler can check bounds, but the real bounds are different. "spc" can't be bigger than 64 bytes (MLX4_INLINE_ALIGN), so the data can always fit into inlined tx descriptor. The fact that "inl" points into inlined tx descriptor is determined earlier in mlx4_en_xmit(). Avoid confusing the compiler with "inl + 1" constructions to get to past the inl header by introducing a flexible array "data" to the struct so that the compiler can see that we are not dealing with an array of inl structs, but rather, arbitrary data following the structure. There are no changes to the structure layout reported by pahole, and the resulting machine code is actually smaller. Reported-by:
Josef Oskera <joskera@redhat.com> Link: https://lore.kernel.org/lkml/20230217094541.2362873-1-joskera@redhat.com Fixes: f68f2ff9 ("fortify: Detect struct member overflows in memcpy() at compile-time") Cc: Yishai Hadas <yishaih@nvidia.com> Signed-off-by:
Kees Cook <keescook@chromium.org> Reviewed-by:
Tariq Toukan <tariqt@nvidia.com> Link: https://lore.kernel.org/r/20230218183842.never.954-kees@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Shunsuke Mie authored
Probably it is a simple copy error from struct vring_iov. Signed-off-by:
Shunsuke Mie <mie@igel.co.jp> Message-Id: <20230202104248.2040652-1-mie@igel.co.jp> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-
Jason Wang authored
This patch introduces a new method to query the dma device that is use for a specific virtqueue. Reviewed-by:
Eli Cohen <elic@nvidia.com> Tested-by:
Eli Cohen <elic@nvidia.com> Signed-off-by:
Jason Wang <jasowang@redhat.com> Message-Id: <20230119061525.75068-3-jasowang@redhat.com> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-
Jason Wang authored
This patch introduces a per virtqueue dma device. This will be used for virtio devices whose virtqueue are backed by different underlayer devices. One example is the vDPA that where the control virtqueue could be implemented through software mediation. Some of the work are actually done before since the helper like vring_dma_device(). This work left are: - Let vring_dma_device() return the per virtqueue dma device instead of the vdev's parent. - Allow passing a dma_device when creating the virtqueue through a new helper, old vring creation helper will keep using vdev's parent. Reviewed-by:
Eli Cohen <elic@nvidia.com> Tested-by:
Eli Cohen <elic@nvidia.com> Signed-off-by:
Jason Wang <jasowang@redhat.com> Message-Id: <20230119061525.75068-2-jasowang@redhat.com> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com>
-