- May 01, 2023
-
-
Drew Fustini authored
Add support for the sqoscfg CSR defined in the Ssqosid ISA extension (Supervisor-mode Quality of Service ID). The CSR contains two fields:
- Resource Control ID (RCID) used to determine resource allocation
- Monitoring Counter ID (MCID) used to track resource usage
Requests from a hart to shared resources like cache will be tagged with these IDs. This allows the usage of shared resources to be associated with the task currently running on the hart.
A sqoscfg field is added to thread_struct with the same format as the sqoscfg CSR. This allows the scheduler to set the hart's sqoscfg CSR to contain the RCID and MCID of the task being scheduled in. The sqoscfg CSR is only written if thread_struct.sqoscfg differs from the current value of the CSR. A per-cpu variable cpu_sqoscfg is used to mirror the state of the CSR, because access to L1D-hot memory should be several times faster than a CSR read and, in the case of virtualization, accesses to this CSR are trapped by the hypervisor.
Link: https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
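For illustration, a minimal sketch of the context-switch path described above, assuming a hypothetical CSR_SQOSCFG number and thread_struct field (the draft CBQRI spec has not finalized the CSR address); csr_write() and the per-cpu helpers are the existing kernel primitives:

    /* Sketch only: CSR_SQOSCFG and next->thread.sqoscfg are assumptions. */
    static DEFINE_PER_CPU(u32, cpu_sqoscfg);

    static inline void __switch_to_sqoscfg(struct task_struct *next)
    {
        u32 next_sqoscfg = next->thread.sqoscfg;

        /* Skip the (slow, possibly hypervisor-trapped) CSR write when unchanged. */
        if (this_cpu_read(cpu_sqoscfg) != next_sqoscfg) {
            csr_write(CSR_SQOSCFG, next_sqoscfg);
            this_cpu_write(cpu_sqoscfg, next_sqoscfg);
        }
    }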
-
Drew Fustini authored
Detect the Ssqosid extension (Supervisor-mode Quality of Service ID) as defined in the CBQRI (Capacity and Bandwidth QoS Register Interface) specification.
Link: https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
[dfustini: rebase from v6.0 to v6.3]
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
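For context, once an extension like this is parsed from the riscv,isa string, other kernel code can gate on it with the existing helper; a hedged illustration, where RISCV_ISA_EXT_SSQOSID is the hypothetical ID this kind of patch introduces:

    /* Hedged illustration: RISCV_ISA_EXT_SSQOSID is hypothetical here. */
    #include <asm/hwcap.h>

    static bool ssqosid_supported(void)
    {
        /* True only if every hart advertised "ssqosid" in its ISA string. */
        return riscv_isa_extension_available(NULL, SSQOSID);
    }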
-
- Apr 30, 2023
-
-
Drew Fustini authored
This RFC series adds initial support for the Ssqosid extension and the sqoscfg CSR as specified in Chapter 2 of the RISC-V Capacity and Bandwidth Controller QoS Register Interface (CBQRI) specification [1].
QoS (Quality of Service) in this context is concerned with shared resources on an SoC such as cache capacity and memory bandwidth. Intel and AMD already have QoS features on x86, and there is an existing user interface in Linux: the resctrl virtual filesystem [2].
The sqoscfg CSR provides a mechanism by which a software workload (e.g. a process or a set of processes) can be associated with a resource control ID (RCID) and a monitoring counter ID (MCID) that accompany each request made by the hart to shared resources like cache. CBQRI defines operations to configure resource usage limits, in the form of capacity or bandwidth, for an RCID. CBQRI also defines operations to configure counters to track the resource utilization of an MCID.
The CBQRI spec is still in draft state and is undergoing review [3]. It is possible there will be changes to the Ssqosid extension and the CBQRI spec. For example, the CSR address for sqoscfg is not yet finalized.
My goal for this RFC is to determine whether the 2nd patch is an acceptable approach to handling sqoscfg when switching tasks.
This RFC was tested against a QEMU branch that implements the Ssqosid extension [4]. A test driver [5] was used to set sqoscfg for the current process. This allows __switch_to_sqoscfg() to be tested without resctrl.
This series is based on riscv/for-next at: b09313dd ("RISC-V: hwprobe: Explicity check for -1 in vdso init")
Changes from v1:
- change DEFINE_PER_CPU to DECLARE_PER_CPU for cpu_sqoscfg in qos.h to prevent a linking error about multiple definitions; move DEFINE_PER_CPU for cpu_sqoscfg into qos.c
- rename the qos prefix in function names to sqoscfg to be less generic
- handle sqoscfg the same way has_vector and has_fpu are handled in the vector patch series [6]
[1] https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
[2] https://docs.kernel.org/x86/resctrl.html
[3] https://lists.riscv.org/g/tech-cbqri/message/38
[4] https://gitlab.baylibre.com/baylibre/qemu/-/tree/riscv-cbqri-rfc-v2
[5] https://gitlab.baylibre.com/baylibre/linux/-/tree/riscv-sqoscfg-rfc-v2
[6] https://lore.kernel.org/linux-riscv/20230414155843.12963-1-andy.chiu@sifive.com/
---
Changes in v3:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v2: https://lore.kernel.org/r/20230430-riscv-cbqri-rfc-v2-v2-0-8e3725c4a473@baylibre.com
--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 3,
    "change-id": "20230430-riscv-cbqri-rfc-v2-b007fcd19549",
    "base-branch": "riscv-cbqri-rfc-v2",
    "prefixes": [ "RFC" ],
    "history": {
      "v2": [ "20230430-riscv-cbqri-rfc-v2-v2-0-8e3725c4a473@baylibre.com" ]
    }
  }
}
-
- Apr 26, 2023
-
-
Andrew Jones authored
id_bitsmash is unsigned. We need to explicitly check for -1, rather than use > 0.
Fixes: aa5af0aa ("RISC-V: Add hwprobe vDSO function and data")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230426141333.10063-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
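The pitfall is easiest to see in a standalone snippet (the variable name is reused from the commit purely for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * When -1 is stored in an unsigned variable it wraps to the maximum
     * value, so a "> 0" test cannot distinguish it from a valid positive
     * value; only an explicit comparison against -1 catches the sentinel.
     */
    int main(void)
    {
        uint64_t id_bitsmash = (uint64_t)-1;  /* sentinel meaning "values differ" */

        printf("> 0 check:   %d\n", id_bitsmash > 0);              /* 1: wrongly accepted */
        printf("!= -1 check: %d\n", id_bitsmash != (uint64_t)-1);  /* 0: correctly rejected */
        return 0;
    }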
-
Andrew Jones authored
Only capture the first cpu_id in order for the comparison below to be of any use.
Fixes: ea3de9ce ("RISC-V: Add a syscall for HW probing")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230426141333.10063-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
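A hedged sketch of the "capture the first value, then compare the rest" pattern the fix restores; ids[] stands in for per-CPU register reads and is not the kernel's actual data source:

    #include <stdbool.h>

    /* Returns the common value across CPUs, or -1 if they differ. */
    static long common_id(const long *ids, int ncpus)
    {
        long id = -1;
        bool first = true;

        for (int cpu = 0; cpu < ncpus; cpu++) {
            if (first) {
                id = ids[cpu];      /* capture only the first cpu_id */
                first = false;
            } else if (id != ids[cpu]) {
                id = -1;            /* values differ across CPUs */
                break;
            }
        }
        return id;
    }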
-
Alexandre Ghiti authored
Add 2 early command line parameters that allow downgrading the satp mode (using the same naming as x86):
- "no5lvl": use a 4-level page table (down from sv57 to sv48)
- "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39)
Note that going through the device tree to get the kernel command line works with ACPI too, since the EFI stub creates a device tree anyway with the command line.
In KASAN kernels, we can't use libfdt that early in the boot process since we are not ready to execute instrumented functions. So instead of using the "generic" libfdt, we compile our own versions of those functions that are not instrumented and that are prefixed so that they do not conflict with the generic ones. We also need the non-instrumented versions of the string functions and the prefixed versions of memcpy/memmove.
This is largely inspired by commit aacd149b ("arm64: head: avoid relocating the kernel twice for KASLR") from which I removed compilation flags that were not relevant to RISC-V at the moment (LTO, SCS). Also note that we have to link with -z norelro to avoid ld.lld throwing a warning with the new .got sections, like in commit 311bea3c ("arm64: link with -z norelro for LLD or aarch64-elf").
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230424092313.178699-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
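A hedged sketch of how the early parameters could be read from the flattened device tree's /chosen node; the satp-mode constants and the plain libfdt calls are illustrative only, since as described above the kernel compiles its own non-instrumented, prefixed fdt helpers for this path:

    #include <libfdt.h>
    #include <string.h>

    static unsigned long pick_satp_mode(const void *fdt, unsigned long default_mode)
    {
        int chosen = fdt_path_offset(fdt, "/chosen");
        const char *cmdline;

        if (chosen < 0)
            return default_mode;

        cmdline = fdt_getprop(fdt, chosen, "bootargs", NULL);
        if (!cmdline)
            return default_mode;

        if (strstr(cmdline, "no4lvl"))
            return SATP_MODE_39;    /* 3-level page table (sv39) */
        if (strstr(cmdline, "no5lvl"))
            return SATP_MODE_48;    /* 4-level page table (sv48) */

        return default_mode;        /* typically sv57 when supported */
    }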
-
Conor Dooley authored
Dumping the dtb from new versions of QEMU warns that sv57 is an undocumented mmu-type. The kernel has supported sv57 for about a year, so bring it into the fold.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230424-rival-habitual-478567c516f0@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Evan Green authored
probe_vendor_features() is now called from smp_callin(), which is not __init code and runs during cpu hotplug events. Remove the __init_or_module decoration from it and the functions it calls to avoid walking into outer space.
Fixes: 62a31d6e ("RISC-V: hwprobe: Support probing of misaligned access performance")
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230420194934.1871356-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- Apr 19, 2023
-
-
Palmer Dabbelt authored
Alexandre Ghiti <alexghiti@rivosinc.com> says:
After multiple attempts, this patchset is now based on the fact that the 64b kernel mapping was moved outside the linear mapping. The first patch allows building relocatable kernels but is not selected by default. That patch is a requirement for KASLR. The second and third patches take advantage of an already existing powerpc script that checks relocations at compile time, and use it for riscv.
* b4-shazam-merge:
  riscv: Use --emit-relocs in order to move .rela.dyn in init
  riscv: Check relocations at compile time
  powerpc: Move script to check relocations at compile time in scripts/
  riscv: Introduce CONFIG_RELOCATABLE
  riscv: Move .rela.dyn outside of init to avoid empty relocations
  riscv: Prepare EFI header for relocatable kernels
Link: https://lore.kernel.org/r/20230329045329.64565-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
To circumvent an issue where placing the relocations inside the init sections produces empty relocations, use --emit-relocs. But to avoid carrying those relocations in vmlinux, use an intermediate vmlinux.relocs file which is a copy of vmlinux *before* stripping its relocations.
Suggested-by: Björn Töpel <bjorn@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
Relocating the kernel at runtime is done very early in the boot process, so it is not convenient to check for relocations there and react in case a relocation was not expected. A script in scripts/ extracts the relocations from vmlinux; it is run at postlink to check the relocations.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230329045329.64565-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
Relocating the kernel at runtime is done very early in the boot process, so it is not convenient to check for relocations there and react in case a relocation was not expected. The powerpc architecture has a script that allows checking at compile time for such unexpected relocations: extract the common logic to scripts/ so that other architectures can take advantage of it.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20230329045329.64565-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
This config allows compiling the 64b kernel as PIE and relocating it at any virtual address at runtime: this paves the way to KASLR. Runtime relocation is possible since relocation metadata are embedded into the kernel.
Note that relocating at runtime introduces an overhead even if the kernel is loaded at the same address it was linked at, and that the compiler options are those used in arm64, which uses the same RELA relocation format.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
This is a preparatory patch for relocatable kernels: .rela.dyn should be in .init but doing so actually produces empty relocations, so this should be a temporary commit until we find a solution. This issue was reported here [1].
[1] https://lore.kernel.org/all/4a6fc7a3-9697-a49b-0941-97f32194b0d7@ghiti.fr/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
ld does not handle relocations correctly, as explained in [1]; a fix was proposed by Nelson there, but we have to support older toolchains and therefore provide this workaround. Note that llvm does not need this fix and is therefore excluded.
[1] https://sourceware.org/pipermail/binutils/2023-March/126690.html
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
Alexandre Ghiti <alexghiti@rivosinc.com> says:
As described in patch 2, our current kasan implementation is intricate, so I tried to simplify the implementation and mimic what arm64/x86 are doing. In addition it fixes the UEFI bootflow with a kasan kernel and kasan inline instrumentation: all kasan configurations were tested on a large ubuntu kernel with success with KASAN_KUNIT_TEST and KASAN_MODULE_TEST.
inline ubuntu config + uefi: sv39: OK, sv48: OK, sv57: OK
outline ubuntu config + uefi: sv39: OK, sv48: OK, sv57: OK
Actually 1 test always fails with KASAN_KUNIT_TEST that I have to check: KASAN failure expected in "set_bit(nr, addr)", but none occurred.
Note that Palmer recently proposed to remove COMMAND_LINE_SIZE from the userspace abi https://lore.kernel.org/lkml/20221211061358.28035-1-palmer@rivosinc.com/T/ so that we can finally increase the command line to fit all kasan kernel parameters.
All of this should hopefully fix the syzkaller riscv build that has been failing for a few months now; any test is appreciated and if I can help in any way, please ask.
* b4-shazam-merge:
  riscv: Unconditionnally select KASAN_VMALLOC if KASAN
  riscv: Fix ptdump when KASAN is enabled
  riscv: Fix EFI stub usage of KASAN instrumented strcmp function
  riscv: Move DTB_EARLY_BASE_VA to the kernel address space
  riscv: Rework kasan population functions
  riscv: Split early and final KASAN population functions
Link: https://lore.kernel.org/r/20230203075232.274282-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
If KASAN is enabled, VMAP_STACK depends on KASAN_VMALLOC, so enable KASAN_VMALLOC with KASAN so that we can enable VMAP_STACK by default.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
The KASAN shadow region was moved next to the kernel mapping but the ptdump code was not updated and it appears to break the dump of the kernel page table, so fix this by moving the KASAN shadow region in ptdump.
Fixes: f7ae0233 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
The EFI stub must not use any KASAN instrumented code as the kernel proper did not initialize the thread pointer and the mapping for the KASAN shadow region. Avoid using the generic strcmp function, instead use the one in drivers/firmware/efi/libstub/string.c.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
The early virtual address should lie in the kernel address space for inline kasan instrumentation to succeed, otherwise kasan tries to dereference an address that does not exist in the address space (since kasan only maps *kernel* address space, not the userspace). Simply use the very first address of the kernel address space for the early fdt mapping. It allowed an Ubuntu kernel to boot successfully with inline instrumentation.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
Our previous kasan population implementation used to have the final kasan shadow region mapped with kasan_early_shadow_page, because we did not clean the early mapping and then had to populate the kasan region "in-place", which made the code cumbersome. So now we clear the early mapping and establish a temporary mapping while we populate the kasan shadow region with just the kernel regions that will be used.
This new version uses the "generic" way of going through a page table that may be folded at runtime (avoid the XXX_next macros). It was tested with outline instrumentation on an Ubuntu kernel configuration successfully.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
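A hedged sketch of the folded-page-table walk mentioned above: each level's offset helper collapses onto the level above when that level is folded, so the same code works for sv39/sv48/sv57. Allocation of missing levels and the actual shadow population are omitted; this is not the patch's real code:

    static void kasan_populate_one(unsigned long vaddr)
    {
        pgd_t *pgd = pgd_offset_k(vaddr);
        p4d_t *p4d = p4d_offset(pgd, vaddr);    /* folds into pgd below 5 levels */
        pud_t *pud = pud_offset(p4d, vaddr);    /* folds into p4d below 4 levels */
        pmd_t *pmd = pmd_offset(pud, vaddr);
        pte_t *pte = pte_offset_kernel(pmd, vaddr);

        /* ... allocate missing levels and set the shadow pte here ... */
        (void)pte;
    }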
-
Alexandre Ghiti authored
This is preliminary work that makes the code more understandable.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
Alexandre Ghiti <alexghiti@rivosinc.com> says:
This patchset intends to improve tlb utilization by using hugepages for the linear mapping.
As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must take care of isolating the kernel text and rodata so that they are not mapped with a PUD mapping which would then assign wrong permissions to the whole region: it is achieved the same way as arm64 by using the memblock nomap API which isolates those regions and re-merges them afterwards, thus avoiding any issue with the system resources tree creation.
 arch/riscv/include/asm/page.h |  19 ++++++-
 arch/riscv/mm/init.c          | 102 ++++++++++++++++++++++++++--------
 arch/riscv/mm/physaddr.c      |  16 ++++++
 drivers/of/fdt.c              |  11 ++--
 4 files changed, 118 insertions(+), 30 deletions(-)
* b4-shazam-merge:
  riscv: Use PUD/P4D/PGD pages for the linear mapping
  riscv: Move the linear mapping creation in its own function
  riscv: Get rid of riscv_pfn_base variable
Link: https://lore.kernel.org/r/20230324155421.271544-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
During the early page table creation, we used to set the mapping for PAGE_OFFSET to the kernel load address: but the kernel load address is always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD pages as this physical address is not aligned on PUD/P4D/PGD size (whereas PAGE_OFFSET is).
But actually we don't have to establish this mapping (ie set va_pa_offset) that early in the boot process because:
- first, setup_vm installs a temporary kernel mapping and, among other things, discovers the system memory,
- then, setup_vm_final creates the final kernel mapping and takes advantage of the discovered system memory to create the linear mapping.
During the first phase, we don't know the start of the system memory, and until the second phase is finished we can't use the linear mapping at all; phys_to_virt/virt_to_phys translations must not be used because they would result in a different translation from the 'real' one once the final mapping is installed.
So here we simply delay the initialization of va_pa_offset to after the system memory discovery. But to make sure no one uses the linear mapping before then, we add a guard under the DEBUG_VIRTUAL config.
Finally we can use PUD/P4D/PGD hugepages when possible, which will result in better TLB utilization.
Note that:
- this does not apply to rv32 as the kernel mapping lies in the linear mapping.
- we rely on the firmware to protect itself using PMP.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Rob Herring <robh@kernel.org> # DT bits
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
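The mapping-size decision this enables can be sketched as follows: pick the largest page size for which both the physical address and the remaining size are aligned. This is a hedged illustration using the standard level-size macros, not the actual kernel helper:

    static unsigned long best_map_size(unsigned long pa, unsigned long size)
    {
        if (!(pa & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
            return PGDIR_SIZE;  /* PGD page (only useful on sv48/sv57) */
        if (!(pa & (P4D_SIZE - 1)) && size >= P4D_SIZE)
            return P4D_SIZE;    /* P4D page (sv57) */
        if (!(pa & (PUD_SIZE - 1)) && size >= PUD_SIZE)
            return PUD_SIZE;    /* PUD page */
        if (!(pa & (PMD_SIZE - 1)) && size >= PMD_SIZE)
            return PMD_SIZE;    /* PMD page */
        return PAGE_SIZE;       /* fall back to 4K pages */
    }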
-
Alexandre Ghiti authored
No change intended, it just splits the linear mapping creation from setup_vm_final: this prepares for upcoming additions to the linear mapping creation.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
Use phys_ram_base directly instead; riscv_pfn_base is just the pfn of the address contained in phys_ram_base.
Even if there is no functional change intended in this patch, actually setting phys_ram_base that early changes the behaviour of kernel_mapping_pa_to_va during early boot: phys_ram_base used to be zero before this patch and now it is set to the physical start address of the kernel. But it does not break the conversion of a kernel physical address into a virtual address, since kernel_mapping_pa_to_va should only be used on kernel physical addresses, i.e. addresses greater than the physical start address of the kernel.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Conor Dooley authored
Other extensions only capitalise the first letter in the text visible in Kconfig menus, and provide a short comment about the extension's meaning. Do the same for Svnapot & Svpbmt.
The precedent for capitalisation in the Kconfig text was set by Zicbom & sorta followed for Zicboz. The RVI styling used for multi-letter extensions only capitalises the first letter, so do the same here. If nothing else, my OCD likes it when the extensions follow a consistent pattern. While editing one of the lines, reformat the "spelling" of 64-bit.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230405-pucker-cogwheel-3a999a94a2f2@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Song Shuai authored
RISC-V currently builds the sched domains based on the simple possible map. Enable SCHED_MC so that the domains are built from cpu_coregroup_mask(), which also takes care of NUMA and of cores sharing an LLC.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230310110336.970985-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Song Shuai authored
RISC-V now manages CPU topology using arch_topology, which provides CPU capacity and frequency related interfaces to access the cpu/freq invariance on possibly heterogeneous or DVFS-enabled platforms. This adds a topology.h file to export the arch_topology interfaces, replacing the scheduler's constant-based cpu/freq invariant accounting.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Ley Foon Tan <lftan@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230323123924.3032174-1-suagrfillet@gmail.com
[Palmer: Fix the whitespace issues.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
Evan Green <evan@rivosinc.com> says:
There's been a bunch of off-list discussions about this, including at Plumbers. The original plan was to do something involving providing an ISA string to userspace, but ISA strings just aren't sufficient for a stable ABI any more: in order to parse an ISA string, users need the version of the specifications that the string is written to, the version of each extension (sometimes at a finer granularity than the RISC-V releases/versions encode), and the expected use case for the ISA string (ie, is it a U-mode or M-mode string). That's a lot of complexity to try and keep ABI compatible, and it's probably going to continue to grow, as even if there's no more complexity in the specifications we'll have to deal with the various ISA string parsing oddities that end up all over userspace.
Instead this patch set takes a very different approach and provides a set of key/value pairs that encode various bits about the system. The big advantage here is that we can clearly define what these mean so we can ensure ABI stability, but it also allows us to encode information that's unlikely to ever appear in an ISA string (see the misaligned access performance, for example). The resulting interface looks a lot like what arm64 and x86 do, and will hopefully fit well into something like ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of it. The vDSO function can answer some queries without a syscall at all, and falls back to the syscall for cases it doesn't have answers to. Currently we prepopulate it with an array of answers for all keys and a CPU set of "all CPUs". This can be adjusted as necessary to provide fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like sysfs. I created a small test program and ran it on a Nezha D1 Allwinner board. Doing each operation 100000 times and dividing, these operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: 0.0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will scale linearly with the number of keys, whereas the dedicated syscall stays the same. To frame these numbers, I also did a tight fork/exec/wait loop, which I measured as 4.8ms. So doing 4 open/read/close operations is a delta of about 0.3%, whereas a single vDSO call is a delta of essentially zero.
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
* b4-shazam-merge:
  RISC-V: Add hwprobe vDSO function and data
  selftests: Test the new RISC-V hwprobe interface
  RISC-V: hwprobe: Support probing of misaligned access performance
  RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
  RISC-V: Add a syscall for HW probing
  RISC-V: Move struct riscv_cpuinfo to new header
Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
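A hedged userspace sketch of calling the hwprobe syscall directly (in practice the vDSO wrapper described above is preferred); the header path, __NR_riscv_hwprobe, and the NULL/0 "all online CPUs" shortcut assume kernel headers that ship this interface:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/hwprobe.h>

    int main(void)
    {
        struct riscv_hwprobe pairs[] = {
            { .key = RISCV_HWPROBE_KEY_MVENDORID },
            { .key = RISCV_HWPROBE_KEY_CPUPERF_0 },
        };

        /* cpusetsize == 0 with cpus == NULL is the "all online CPUs" shortcut. */
        long ret = syscall(__NR_riscv_hwprobe, pairs, 2, 0, NULL, 0);
        if (ret)
            return 1;

        printf("mvendorid: 0x%llx\n", (unsigned long long)pairs[0].value);
        printf("cpuperf_0: 0x%llx\n", (unsigned long long)pairs[1].value);
        return 0;
    }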
-
Evan Green authored
Add a vDSO function __vdso_riscv_hwprobe, which can sit in front of the riscv_hwprobe syscall and answer common queries. We stash a copy of static answers for the "all CPUs" case in the vDSO data page. This data is private to the vDSO, so we can decide later to change what's stored there or under what conditions we defer to the syscall. Currently all data can be discovered at boot, so the vDSO function answers all queries when the cpumask is set to the "all CPUs" hint.
There's also a boolean in the data that lets the vDSO function know that all CPUs are the same. In that case, the vDSO will also answer queries for arbitrary CPU masks in addition to the "all CPUs" hint.
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-7-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
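A hedged sketch of the vDSO-side decision described above; the data-page layout, field names and hwprobe_fallback_syscall() are placeholders, not the actual vDSO implementation:

    static long vdso_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
                             size_t cpusetsize, unsigned long *cpus,
                             unsigned int flags,
                             const struct hwprobe_vdso_data *d)
    {
        int all_cpus = !cpusetsize && !cpus;
        size_t i;

        /* Fast path: cached "all CPUs" answers, or any mask on a homogeneous system. */
        if (!flags && (all_cpus || d->homogeneous_cpus)) {
            for (i = 0; i < pair_count; i++) {
                if (pairs[i].key < 0 || pairs[i].key > RISCV_HWPROBE_MAX_KEY)
                    break;  /* unknown key: let the kernel answer */
                pairs[i].value = d->all_cpu_values[pairs[i].key];
            }
            if (i == pair_count)
                return 0;
        }

        /* Otherwise defer to the kernel. */
        return hwprobe_fallback_syscall(pairs, pair_count, cpusetsize, cpus, flags);
    }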
-
Evan Green authored
This adds a test for the recently added RISC-V interface for probing hardware capabilities. It happens to be the first selftest we have for RISC-V, so I've added some infrastructure for those as well.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-6-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Evan Green authored
This allows userspace to select various routines to use based on the performance of misaligned access on the target hardware. Rather than adding DT bindings, this change taps into the alternatives mechanism used to probe CPU errata. Add a new function pointer alongside the vendor-specific errata_patch_func() that probes for desirable errata (otherwise known as "features"). Unlike the errata_patch_func(), this function is called on each CPU as it comes up, so it can save feature information per-CPU.
The T-head C906 has fast unaligned access, both as defined by GCC [1], and in performing a basic benchmark, which determined that byte copies are >50% slower than a misaligned word copy of the same data size (source for this test at [2]):
bytecopy size f000 count 50000 offset 0 took 31664899 us
wordcopy size f000 count 50000 offset 0 took 5180919 us
wordcopy size f000 count 50000 offset 1 took 13416949 us
[1] https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L353
[2] https://pastebin.com/EPXvDHSW
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-5-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
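A hedged sketch of the userspace routine selection this enables, in the spirit of the glibc ifunc example mentioned in the cover letter; memcpy_bytewise() and memcpy_misaligned_words() are hypothetical application-provided routines, and the constant names assume kernel headers that ship the hwprobe interface:

    #include <stddef.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/hwprobe.h>

    typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

    /* Hypothetical copy routines supplied elsewhere by the application. */
    extern void *memcpy_bytewise(void *dst, const void *src, size_t n);
    extern void *memcpy_misaligned_words(void *dst, const void *src, size_t n);

    memcpy_fn select_memcpy(void)
    {
        struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 };

        if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) == 0 &&
            (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) ==
             RISCV_HWPROBE_MISALIGNED_FAST)
            return memcpy_misaligned_words;  /* unaligned word copies are fast */

        return memcpy_bytewise;              /* conservative fallback */
    }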
-
Evan Green authored
We have an implicit set of base behaviors that userspace depends on, which are mostly defined in various ISA specifications.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-4-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Evan Green authored
We don't have enough space for these all in ELF_HWCAP{,2} and there's no system call that quite does this, so let's just provide an arch-specific one to probe for hardware capabilities. This currently just provides m{arch,imp,vendor}id, but with the key-value pairs we can pass more in the future.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Evan Green authored
In preparation for tracking and exposing microarchitectural details to userspace (like whether or not unaligned accesses are fast), move the riscv_cpuinfo struct out to its own new cpufeatures.h header. It will need to be used by more than just cpu.c.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- Apr 12, 2023
-
-
Song Shuai authored
This reverts commit baf7cbd9. Some duplicate cache attribute populations are executed in both ci_leaf_init() and the later cache_setup_properties(). Revert commit baf7cbd9 ("riscv: Set more data to cacheinfo") to set up only the level and type attributes at this early place.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230308064734.512457-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- Apr 11, 2023
-
-
Björn Töpel authored
The RISC-V calling convention passes the first argument, and the return value, in the a0 register. For this reason, the a0 register needs some extra care: when handling syscalls, the a0 register is saved into regs->orig_a0, so a0 can be properly restored for, e.g., interrupted syscalls.
This functionality was broken with the introduction of the generic entry patches. Here, a0 was saved into orig_a0 after calling syscall_enter_from_user_mode(), which can change regs->a0 for some paths, incorrectly restoring a0. This is resolved by saving a0 prior to the syscall_enter_from_user_mode() call.
Fixes: f0bddf50 ("riscv: entry: Convert to generic entry")
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230403065207.1070974-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
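A hedged sketch of the before/after ordering described above; this is not the exact kernel entry code, only the relationship between the two steps:

    static void do_syscall_entry(struct pt_regs *regs)
    {
        unsigned long syscall;

        /*
         * Save a0 first: syscall_enter_from_user_mode() may modify
         * regs->a0 (e.g. via ptrace/seccomp), and orig_a0 must hold the
         * original first argument so interrupted syscalls restart correctly.
         */
        regs->orig_a0 = regs->a0;

        syscall = syscall_enter_from_user_mode(regs, regs->a7);

        /* ... dispatch the syscall using regs->orig_a0 as arg0 ... */
    }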
-