- May 01, 2023
-
-
Drew Fustini authored
Add support for the sqoscfg CSR defined in the Ssqosid ISA extension (Supervisor-mode Quality of Service ID). The CSR contains two fields:
- Resource Control ID (RCID) used to determine resource allocation
- Monitoring Counter ID (MCID) used to track resource usage
Requests from a hart to shared resources like cache will be tagged with these IDs. This allows the usage of shared resources to be associated with the task currently running on the hart.
A sqoscfg field is added to thread_struct with the same format as the sqoscfg CSR. This allows the scheduler to set the hart's sqoscfg CSR to contain the RCID and MCID of the task being scheduled in. The sqoscfg CSR is only written if thread_struct.sqoscfg differs from the current value of the CSR. A per-cpu variable cpu_sqoscfg is used to mirror the state of the CSR, because access to L1D-hot memory should be several times faster than a CSR read and, in the case of virtualization, accesses to this CSR are trapped by the hypervisor.
Link: https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
Co-developed-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
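For illustration, a minimal sketch of the context-switch path described above, assuming a hypothetical CSR_SQOSCFG number and thread_struct field (the draft CBQRI spec has not finalized the CSR address); csr_write() and the per-cpu helpers are the existing kernel primitives:

    /* Sketch only: CSR_SQOSCFG and next->thread.sqoscfg are assumptions. */
    static DEFINE_PER_CPU(u32, cpu_sqoscfg);

    static inline void __switch_to_sqoscfg(struct task_struct *next)
    {
        u32 next_sqoscfg = next->thread.sqoscfg;

        /* Skip the (slow, possibly hypervisor-trapped) CSR write when unchanged. */
        if (this_cpu_read(cpu_sqoscfg) != next_sqoscfg) {
            csr_write(CSR_SQOSCFG, next_sqoscfg);
            this_cpu_write(cpu_sqoscfg, next_sqoscfg);
        }
    }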
-
Drew Fustini authored
Detect the Ssqosid extension (Supervisor-mode Quality of Service ID) as defined in the CBQRI (Capacity and Bandwidth QoS Register Interface) specification.
Link: https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
Signed-off-by: Kornel Dulęba <mindal@semihalf.com>
[dfustini: rebase from v6.0 to v6.3]
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
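For context, once an extension like this is parsed from the riscv,isa string, other kernel code can gate on it with the existing helper; a hedged illustration, where RISCV_ISA_EXT_SSQOSID is the hypothetical ID this kind of patch introduces:

    /* Hedged illustration: RISCV_ISA_EXT_SSQOSID is hypothetical here. */
    #include <asm/hwcap.h>

    static bool ssqosid_supported(void)
    {
        /* True only if every hart advertised "ssqosid" in its ISA string. */
        return riscv_isa_extension_available(NULL, SSQOSID);
    }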
-
- Apr 30, 2023
-
-
Drew Fustini authored
This RFC series adds initial support for the Ssqosid extension and the sqoscfg CSR as specified in Chapter 2 of the RISC-V Capacity and Bandwidth Controller QoS Register Interface (CBQRI) specification [1].
QoS (Quality of Service) in this context is concerned with shared resources on an SoC such as cache capacity and memory bandwidth. Intel and AMD already have QoS features on x86, and there is an existing user interface in Linux: the resctrl virtual filesystem [2].
The sqoscfg CSR provides a mechanism by which a software workload (e.g. a process or a set of processes) can be associated with a resource control ID (RCID) and a monitoring counter ID (MCID) that accompany each request made by the hart to shared resources like cache. CBQRI defines operations to configure resource usage limits, in the form of capacity or bandwidth, for an RCID. CBQRI also defines operations to configure counters to track the resource utilization of an MCID.
The CBQRI spec is still in draft state and is undergoing review [3]. It is possible there will be changes to the Ssqosid extension and the CBQRI spec. For example, the CSR address for sqoscfg is not yet finalized.
My goal for this RFC is to determine whether the 2nd patch is an acceptable approach to handling sqoscfg when switching tasks.
This RFC was tested against a QEMU branch that implements the Ssqosid extension [4]. A test driver [5] was used to set sqoscfg for the current process. This allows __switch_to_sqoscfg() to be tested without resctrl.
This series is based on riscv/for-next at: b09313dd ("RISC-V: hwprobe: Explicity check for -1 in vdso init")
Changes from v1:
- change DEFINE_PER_CPU to DECLARE_PER_CPU for cpu_sqoscfg in qos.h to prevent a linking error about multiple definitions; move DEFINE_PER_CPU for cpu_sqoscfg into qos.c
- rename the qos prefix in function names to sqoscfg to be less generic
- handle sqoscfg the same way has_vector and has_fpu are handled in the vector patch series [6]
[1] https://github.com/riscv-non-isa/riscv-cbqri/blob/main/riscv-cbqri.pdf
[2] https://docs.kernel.org/x86/resctrl.html
[3] https://lists.riscv.org/g/tech-cbqri/message/38
[4] https://gitlab.baylibre.com/baylibre/qemu/-/tree/riscv-cbqri-rfc-v2
[5] https://gitlab.baylibre.com/baylibre/linux/-/tree/riscv-sqoscfg-rfc-v2
[6] https://lore.kernel.org/linux-riscv/20230414155843.12963-1-andy.chiu@sifive.com/
---
Changes in v3:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v2: https://lore.kernel.org/r/20230430-riscv-cbqri-rfc-v2-v2-0-8e3725c4a473@baylibre.com
--- b4-submit-tracking ---
# This section is used internally by b4 prep for tracking purposes.
{
  "series": {
    "revision": 3,
    "change-id": "20230430-riscv-cbqri-rfc-v2-b007fcd19549",
    "base-branch": "riscv-cbqri-rfc-v2",
    "prefixes": [ "RFC" ],
    "history": {
      "v2": [ "20230430-riscv-cbqri-rfc-v2-v2-0-8e3725c4a473@baylibre.com" ]
    }
  }
}
-
- Apr 26, 2023
-
-
Andrew Jones authored
id_bitsmash is unsigned. We need to explicitly check for -1, rather than use > 0.
Fixes: aa5af0aa ("RISC-V: Add hwprobe vDSO function and data")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230426141333.10063-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
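The pitfall is easiest to see in a standalone snippet (the variable name is reused from the commit purely for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /*
     * When -1 is stored in an unsigned variable it wraps to the maximum
     * value, so a "> 0" test cannot distinguish it from a valid positive
     * value; only an explicit comparison against -1 catches the sentinel.
     */
    int main(void)
    {
        uint64_t id_bitsmash = (uint64_t)-1;  /* sentinel meaning "values differ" */

        printf("> 0 check:   %d\n", id_bitsmash > 0);              /* 1: wrongly accepted */
        printf("!= -1 check: %d\n", id_bitsmash != (uint64_t)-1);  /* 0: correctly rejected */
        return 0;
    }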
-
Andrew Jones authored
Only capture the first cpu_id in order for the comparison below to be of any use.
Fixes: ea3de9ce ("RISC-V: Add a syscall for HW probing")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230426141333.10063-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
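A hedged sketch of the "capture the first value, then compare the rest" pattern the fix restores; ids[] stands in for per-CPU register reads and is not the kernel's actual data source:

    #include <stdbool.h>

    /* Returns the common value across CPUs, or -1 if they differ. */
    static long common_id(const long *ids, int ncpus)
    {
        long id = -1;
        bool first = true;

        for (int cpu = 0; cpu < ncpus; cpu++) {
            if (first) {
                id = ids[cpu];      /* capture only the first cpu_id */
                first = false;
            } else if (id != ids[cpu]) {
                id = -1;            /* values differ across CPUs */
                break;
            }
        }
        return id;
    }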
-
Alexandre Ghiti authored
Add 2 early command line parameters that allow downgrading the satp mode (using the same naming as x86):
- "no5lvl": use a 4-level page table (down from sv57 to sv48)
- "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39)
Note that going through the device tree to get the kernel command line works with ACPI too, since the EFI stub creates a device tree anyway with the command line.
In KASAN kernels, we can't use libfdt that early in the boot process since we are not ready to execute instrumented functions. So instead of using the "generic" libfdt, we compile our own versions of those functions that are not instrumented and that are prefixed so that they do not conflict with the generic ones. We also need the non-instrumented versions of the string functions and the prefixed versions of memcpy/memmove.
This is largely inspired by commit aacd149b ("arm64: head: avoid relocating the kernel twice for KASLR") from which I removed compilation flags that were not relevant to RISC-V at the moment (LTO, SCS). Also note that we have to link with -z norelro to avoid ld.lld throwing a warning with the new .got sections, like in commit 311bea3c ("arm64: link with -z norelro for LLD or aarch64-elf").
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230424092313.178699-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
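A hedged sketch of how the early parameters could be read from the flattened device tree's /chosen node; the satp-mode constants and the plain libfdt calls are illustrative only, since as described above the kernel compiles its own non-instrumented, prefixed fdt helpers for this path:

    #include <libfdt.h>
    #include <string.h>

    static unsigned long pick_satp_mode(const void *fdt, unsigned long default_mode)
    {
        int chosen = fdt_path_offset(fdt, "/chosen");
        const char *cmdline;

        if (chosen < 0)
            return default_mode;

        cmdline = fdt_getprop(fdt, chosen, "bootargs", NULL);
        if (!cmdline)
            return default_mode;

        if (strstr(cmdline, "no4lvl"))
            return SATP_MODE_39;    /* 3-level page table (sv39) */
        if (strstr(cmdline, "no5lvl"))
            return SATP_MODE_48;    /* 4-level page table (sv48) */

        return default_mode;        /* typically sv57 when supported */
    }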
-
Conor Dooley authored
Dumping the dtb from new versions of QEMU warns that sv57 is an undocumented mmu-type. The kernel has supported sv57 for about a year, so bring it into the fold.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20230424-rival-habitual-478567c516f0@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Evan Green authored
probe_vendor_features() is now called from smp_callin(), which is not __init code and runs during cpu hotplug events. Remove the __init_or_module decoration from it and the functions it calls to avoid walking into outer space.
Fixes: 62a31d6e ("RISC-V: hwprobe: Support probing of misaligned access performance")
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230420194934.1871356-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- Apr 19, 2023
-
-
Palmer Dabbelt authored
Alexandre Ghiti <alexghiti@rivosinc.com> says:
After multiple attempts, this patchset is now based on the fact that the 64b kernel mapping was moved outside the linear mapping. The first patch allows building relocatable kernels but is not selected by default. That patch is a requirement for KASLR. The second and third patches take advantage of an already existing powerpc script that checks relocations at compile time, and use it for riscv.
* b4-shazam-merge:
  riscv: Use --emit-relocs in order to move .rela.dyn in init
  riscv: Check relocations at compile time
  powerpc: Move script to check relocations at compile time in scripts/
  riscv: Introduce CONFIG_RELOCATABLE
  riscv: Move .rela.dyn outside of init to avoid empty relocations
  riscv: Prepare EFI header for relocatable kernels
Link: https://lore.kernel.org/r/20230329045329.64565-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
To circumvent an issue where placing the relocations inside the init sections produces empty relocations, use --emit-relocs. But to avoid carrying those relocations in vmlinux, use an intermediate vmlinux.relocs file which is a copy of vmlinux *before* stripping its relocations.
Suggested-by: Björn Töpel <bjorn@kernel.org>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
Relocating the kernel at runtime is done very early in the boot process, so it is not convenient to check for relocations there and react in case a relocation was not expected. A script in scripts/ extracts the relocations from vmlinux; it is run at postlink to check the relocations.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230329045329.64565-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
Relocating the kernel at runtime is done very early in the boot process, so it is not convenient to check for relocations there and react in case a relocation was not expected. The powerpc architecture has a script that allows checking at compile time for such unexpected relocations: extract the common logic to scripts/ so that other architectures can take advantage of it.
Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
Reviewed-by: Anup Patel <anup@brainfault.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20230329045329.64565-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
This config allows compiling the 64b kernel as PIE and relocating it at any virtual address at runtime: this paves the way to KASLR. Runtime relocation is possible since relocation metadata are embedded into the kernel.
Note that relocating at runtime introduces an overhead even if the kernel is loaded at the same address it was linked at, and that the compiler options are those used in arm64, which uses the same RELA relocation format.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
This is a preparatory patch for relocatable kernels: .rela.dyn should be in .init but doing so actually produces empty relocations, so this should be a temporary commit until we find a solution. This issue was reported here [1].
[1] https://lore.kernel.org/all/4a6fc7a3-9697-a49b-0941-97f32194b0d7@ghiti.fr/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
ld does not handle relocations correctly, as explained in [1]; a fix was proposed by Nelson there, but we have to support older toolchains and therefore provide this workaround. Note that llvm does not need this fix and is therefore excluded.
[1] https://sourceware.org/pipermail/binutils/2023-March/126690.html
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
Alexandre Ghiti <alexghiti@rivosinc.com> says:
As described in patch 2, our current kasan implementation is intricate, so I tried to simplify the implementation and mimic what arm64/x86 are doing. In addition it fixes the UEFI bootflow with a kasan kernel and kasan inline instrumentation: all kasan configurations were tested on a large ubuntu kernel with success with KASAN_KUNIT_TEST and KASAN_MODULE_TEST.
inline ubuntu config + uefi: sv39: OK, sv48: OK, sv57: OK
outline ubuntu config + uefi: sv39: OK, sv48: OK, sv57: OK
Actually 1 test always fails with KASAN_KUNIT_TEST that I have to check: KASAN failure expected in "set_bit(nr, addr)", but none occurred.
Note that Palmer recently proposed to remove COMMAND_LINE_SIZE from the userspace abi https://lore.kernel.org/lkml/20221211061358.28035-1-palmer@rivosinc.com/T/ so that we can finally increase the command line to fit all kasan kernel parameters.
All of this should hopefully fix the syzkaller riscv build that has been failing for a few months now; any test is appreciated and if I can help in any way, please ask.
* b4-shazam-merge:
  riscv: Unconditionnally select KASAN_VMALLOC if KASAN
  riscv: Fix ptdump when KASAN is enabled
  riscv: Fix EFI stub usage of KASAN instrumented strcmp function
  riscv: Move DTB_EARLY_BASE_VA to the kernel address space
  riscv: Rework kasan population functions
  riscv: Split early and final KASAN population functions
Link: https://lore.kernel.org/r/20230203075232.274282-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
If KASAN is enabled, VMAP_STACK depends on KASAN_VMALLOC, so enable KASAN_VMALLOC with KASAN so that we can enable VMAP_STACK by default.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-7-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
The KASAN shadow region was moved next to the kernel mapping but the ptdump code was not updated and it appears to break the dump of the kernel page table, so fix this by moving the KASAN shadow region in ptdump.
Fixes: f7ae0233 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-6-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
The EFI stub must not use any KASAN instrumented code as the kernel proper did not initialize the thread pointer and the mapping for the KASAN shadow region. Avoid using the generic strcmp function, instead use the one in drivers/firmware/efi/libstub/string.c.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
The early virtual address should lie in the kernel address space for inline kasan instrumentation to succeed, otherwise kasan tries to dereference an address that does not exist in the address space (since kasan only maps *kernel* address space, not the userspace). Simply use the very first address of the kernel address space for the early fdt mapping. It allowed an Ubuntu kernel to boot successfully with inline instrumentation.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
Our previous kasan population implementation used to have the final kasan shadow region mapped with kasan_early_shadow_page, because we did not clean the early mapping and then had to populate the kasan region "in-place", which made the code cumbersome. So now we clear the early mapping and establish a temporary mapping while we populate the kasan shadow region with just the kernel regions that will be used.
This new version uses the "generic" way of going through a page table that may be folded at runtime (avoid the XXX_next macros). It was tested with outline instrumentation on an Ubuntu kernel configuration successfully.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
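A hedged sketch of the folded-page-table walk mentioned above: each level's offset helper collapses onto the level above when that level is folded, so the same code works for sv39/sv48/sv57. Allocation of missing levels and the actual shadow population are omitted; this is not the patch's real code:

    static void kasan_populate_one(unsigned long vaddr)
    {
        pgd_t *pgd = pgd_offset_k(vaddr);
        p4d_t *p4d = p4d_offset(pgd, vaddr);    /* folds into pgd below 5 levels */
        pud_t *pud = pud_offset(p4d, vaddr);    /* folds into p4d below 4 levels */
        pmd_t *pmd = pmd_offset(pud, vaddr);
        pte_t *pte = pte_offset_kernel(pmd, vaddr);

        /* ... allocate missing levels and set the shadow pte here ... */
        (void)pte;
    }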
-
Alexandre Ghiti authored
This is preliminary work that makes the code more understandable.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
Alexandre Ghiti <alexghiti@rivosinc.com> says:
This patchset intends to improve tlb utilization by using hugepages for the linear mapping.
As reported by Anup in v6, when STRICT_KERNEL_RWX is enabled, we must take care of isolating the kernel text and rodata so that they are not mapped with a PUD mapping which would then assign wrong permissions to the whole region: it is achieved the same way as arm64 by using the memblock nomap API which isolates those regions and re-merges them afterwards, thus avoiding any issue with the system resources tree creation.
 arch/riscv/include/asm/page.h |  19 ++++++-
 arch/riscv/mm/init.c          | 102 ++++++++++++++++++++++++++--------
 arch/riscv/mm/physaddr.c      |  16 ++++++
 drivers/of/fdt.c              |  11 ++--
 4 files changed, 118 insertions(+), 30 deletions(-)
* b4-shazam-merge:
  riscv: Use PUD/P4D/PGD pages for the linear mapping
  riscv: Move the linear mapping creation in its own function
  riscv: Get rid of riscv_pfn_base variable
Link: https://lore.kernel.org/r/20230324155421.271544-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
During the early page table creation, we used to set the mapping for PAGE_OFFSET to the kernel load address: but the kernel load address is always offset by PMD_SIZE, which makes it impossible to use PUD/P4D/PGD pages as this physical address is not aligned on PUD/P4D/PGD size (whereas PAGE_OFFSET is).
But actually we don't have to establish this mapping (ie set va_pa_offset) that early in the boot process because:
- first, setup_vm installs a temporary kernel mapping and, among other things, discovers the system memory,
- then, setup_vm_final creates the final kernel mapping and takes advantage of the discovered system memory to create the linear mapping.
During the first phase, we don't know the start of the system memory, and until the second phase is finished we can't use the linear mapping at all; phys_to_virt/virt_to_phys translations must not be used because they would result in a different translation from the 'real' one once the final mapping is installed.
So here we simply delay the initialization of va_pa_offset to after the system memory discovery. But to make sure no one uses the linear mapping before then, we add a guard under the DEBUG_VIRTUAL config.
Finally we can use PUD/P4D/PGD hugepages when possible, which will result in better TLB utilization.
Note that:
- this does not apply to rv32 as the kernel mapping lies in the linear mapping.
- we rely on the firmware to protect itself using PMP.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Rob Herring <robh@kernel.org> # DT bits
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
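The mapping-size decision this enables can be sketched as follows: pick the largest page size for which both the physical address and the remaining size are aligned. This is a hedged illustration using the standard level-size macros, not the actual kernel helper:

    static unsigned long best_map_size(unsigned long pa, unsigned long size)
    {
        if (!(pa & (PGDIR_SIZE - 1)) && size >= PGDIR_SIZE)
            return PGDIR_SIZE;  /* PGD page (only useful on sv48/sv57) */
        if (!(pa & (P4D_SIZE - 1)) && size >= P4D_SIZE)
            return P4D_SIZE;    /* P4D page (sv57) */
        if (!(pa & (PUD_SIZE - 1)) && size >= PUD_SIZE)
            return PUD_SIZE;    /* PUD page */
        if (!(pa & (PMD_SIZE - 1)) && size >= PMD_SIZE)
            return PMD_SIZE;    /* PMD page */
        return PAGE_SIZE;       /* fall back to 4K pages */
    }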
-
Alexandre Ghiti authored
No change intended, it just splits the linear mapping creation from setup_vm_final: this prepares for upcoming additions to the linear mapping creation.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Alexandre Ghiti authored
Use phys_ram_base directly instead; riscv_pfn_base is just the pfn of the address contained in phys_ram_base.
Even if there is no functional change intended in this patch, actually setting phys_ram_base that early changes the behaviour of kernel_mapping_pa_to_va during early boot: phys_ram_base used to be zero before this patch and now it is set to the physical start address of the kernel. But it does not break the conversion of a kernel physical address into a virtual address, since kernel_mapping_pa_to_va should only be used on kernel physical addresses, i.e. addresses greater than the physical start address of the kernel.
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Anup Patel <anup@brainfault.org>
Link: https://lore.kernel.org/r/20230324155421.271544-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Conor Dooley authored
Other extensions only capitalise the first letter in the text visible in Kconfig menus, and provide a short comment about the extension's meaning. Do the same for Svnapot & Svpbmt.
The precedent for capitalisation in the Kconfig text was set by Zicbom & sorta followed for Zicboz. The RVI styling used for multi-letter extensions only capitalises the first letter, so do the same here. If nothing else, my OCD likes it when the extensions follow a consistent pattern. While editing one of the lines, reformat the "spelling" of 64-bit.
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230405-pucker-cogwheel-3a999a94a2f2@wendy
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Song Shuai authored
RISC-V currently builds the sched domains based on the simple possible map. Enable SCHED_MC so that the domains are built from cpu_coregroup_mask(), which also takes care of NUMA and of cores sharing an LLC.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230310110336.970985-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Song Shuai authored
RISC-V now manages CPU topology using arch_topology, which provides CPU capacity and frequency related interfaces to access the cpu/freq invariance on possibly heterogeneous or DVFS-enabled platforms. This adds a topology.h file to export the arch_topology interfaces, replacing the scheduler's constant-based cpu/freq invariant accounting.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Ley Foon Tan <lftan@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230323123924.3032174-1-suagrfillet@gmail.com
[Palmer: Fix the whitespace issues.]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Palmer Dabbelt authored
Evan Green <evan@rivosinc.com> says:
There's been a bunch of off-list discussions about this, including at Plumbers. The original plan was to do something involving providing an ISA string to userspace, but ISA strings just aren't sufficient for a stable ABI any more: in order to parse an ISA string, users need the version of the specifications that the string is written to, the version of each extension (sometimes at a finer granularity than the RISC-V releases/versions encode), and the expected use case for the ISA string (ie, is it a U-mode or M-mode string). That's a lot of complexity to try and keep ABI compatible, and it's probably going to continue to grow, as even if there's no more complexity in the specifications we'll have to deal with the various ISA string parsing oddities that end up all over userspace.
Instead this patch set takes a very different approach and provides a set of key/value pairs that encode various bits about the system. The big advantage here is that we can clearly define what these mean so we can ensure ABI stability, but it also allows us to encode information that's unlikely to ever appear in an ISA string (see the misaligned access performance, for example). The resulting interface looks a lot like what arm64 and x86 do, and will hopefully fit well into something like ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of it. The vDSO function can answer some queries without a syscall at all, and falls back to the syscall for cases it doesn't have answers to. Currently we prepopulate it with an array of answers for all keys and a CPU set of "all CPUs". This can be adjusted as necessary to provide fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like sysfs. I created a small test program and ran it on a Nezha D1 Allwinner board. Doing each operation 100000 times and dividing, these operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: 0.0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will scale linearly with the number of keys, whereas the dedicated syscall stays the same. To frame these numbers, I also did a tight fork/exec/wait loop, which I measured as 4.8ms. So doing 4 open/read/close operations is a delta of about 0.3%, whereas a single vDSO call is a delta of essentially zero.
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
* b4-shazam-merge:
  RISC-V: Add hwprobe vDSO function and data
  selftests: Test the new RISC-V hwprobe interface
  RISC-V: hwprobe: Support probing of misaligned access performance
  RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
  RISC-V: Add a syscall for HW probing
  RISC-V: Move struct riscv_cpuinfo to new header
Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
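A hedged userspace sketch of calling the hwprobe syscall directly (in practice the vDSO wrapper described above is preferred); the header path, __NR_riscv_hwprobe, and the NULL/0 "all online CPUs" shortcut assume kernel headers that ship this interface:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/hwprobe.h>

    int main(void)
    {
        struct riscv_hwprobe pairs[] = {
            { .key = RISCV_HWPROBE_KEY_MVENDORID },
            { .key = RISCV_HWPROBE_KEY_CPUPERF_0 },
        };

        /* cpusetsize == 0 with cpus == NULL is the "all online CPUs" shortcut. */
        long ret = syscall(__NR_riscv_hwprobe, pairs, 2, 0, NULL, 0);
        if (ret)
            return 1;

        printf("mvendorid: 0x%llx\n", (unsigned long long)pairs[0].value);
        printf("cpuperf_0: 0x%llx\n", (unsigned long long)pairs[1].value);
        return 0;
    }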
-
Evan Green authored
Add a vDSO function __vdso_riscv_hwprobe, which can sit in front of the riscv_hwprobe syscall and answer common queries. We stash a copy of static answers for the "all CPUs" case in the vDSO data page. This data is private to the vDSO, so we can decide later to change what's stored there or under what conditions we defer to the syscall. Currently all data can be discovered at boot, so the vDSO function answers all queries when the cpumask is set to the "all CPUs" hint.
There's also a boolean in the data that lets the vDSO function know that all CPUs are the same. In that case, the vDSO will also answer queries for arbitrary CPU masks in addition to the "all CPUs" hint.
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-7-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
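A hedged sketch of the vDSO-side decision described above; the data-page layout, field names and hwprobe_fallback_syscall() are placeholders, not the actual vDSO implementation:

    static long vdso_hwprobe(struct riscv_hwprobe *pairs, size_t pair_count,
                             size_t cpusetsize, unsigned long *cpus,
                             unsigned int flags,
                             const struct hwprobe_vdso_data *d)
    {
        int all_cpus = !cpusetsize && !cpus;
        size_t i;

        /* Fast path: cached "all CPUs" answers, or any mask on a homogeneous system. */
        if (!flags && (all_cpus || d->homogeneous_cpus)) {
            for (i = 0; i < pair_count; i++) {
                if (pairs[i].key < 0 || pairs[i].key > RISCV_HWPROBE_MAX_KEY)
                    break;  /* unknown key: let the kernel answer */
                pairs[i].value = d->all_cpu_values[pairs[i].key];
            }
            if (i == pair_count)
                return 0;
        }

        /* Otherwise defer to the kernel. */
        return hwprobe_fallback_syscall(pairs, pair_count, cpusetsize, cpus, flags);
    }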
-
Evan Green authored
This adds a test for the recently added RISC-V interface for probing hardware capabilities. It happens to be the first selftest we have for RISC-V, so I've added some infrastructure for those as well.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-6-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Evan Green authored
This allows userspace to select various routines to use based on the performance of misaligned access on the target hardware. Rather than adding DT bindings, this change taps into the alternatives mechanism used to probe CPU errata. Add a new function pointer alongside the vendor-specific errata_patch_func() that probes for desirable errata (otherwise known as "features"). Unlike the errata_patch_func(), this function is called on each CPU as it comes up, so it can save feature information per-CPU.
The T-head C906 has fast unaligned access, both as defined by GCC [1], and in performing a basic benchmark, which determined that byte copies are >50% slower than a misaligned word copy of the same data size (source for this test at [2]):
bytecopy size f000 count 50000 offset 0 took 31664899 us
wordcopy size f000 count 50000 offset 0 took 5180919 us
wordcopy size f000 count 50000 offset 1 took 13416949 us
[1] https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L353
[2] https://pastebin.com/EPXvDHSW
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-5-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
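A hedged sketch of the userspace routine selection this enables, in the spirit of the glibc ifunc example mentioned in the cover letter; memcpy_bytewise() and memcpy_misaligned_words() are hypothetical application-provided routines, and the constant names assume kernel headers that ship the hwprobe interface:

    #include <stddef.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <asm/hwprobe.h>

    typedef void *(*memcpy_fn)(void *dst, const void *src, size_t n);

    /* Hypothetical copy routines supplied elsewhere by the application. */
    extern void *memcpy_bytewise(void *dst, const void *src, size_t n);
    extern void *memcpy_misaligned_words(void *dst, const void *src, size_t n);

    memcpy_fn select_memcpy(void)
    {
        struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 };

        if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) == 0 &&
            (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) ==
             RISCV_HWPROBE_MISALIGNED_FAST)
            return memcpy_misaligned_words;  /* unaligned word copies are fast */

        return memcpy_bytewise;              /* conservative fallback */
    }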
-
Evan Green authored
We have an implicit set of base behaviors that userspace depends on, which are mostly defined in various ISA specifications.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-4-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Evan Green authored
We don't have enough space for these all in ELF_HWCAP{,2} and there's no system call that quite does this, so let's just provide an arch-specific one to probe for hardware capabilities. This currently just provides m{arch,imp,vendor}id, but with the key-value pairs we can pass more in the future.
Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Evan Green authored
In preparation for tracking and exposing microarchitectural details to userspace (like whether or not unaligned accesses are fast), move the riscv_cpuinfo struct out to its own new cpufeatures.h header. It will need to be used by more than just cpu.c.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- Apr 12, 2023
-
-
Song Shuai authored
This reverts commit baf7cbd9. Some duplicate cache attribute populations are executed in both ci_leaf_init() and the later cache_setup_properties(). Revert commit baf7cbd9 ("riscv: Set more data to cacheinfo") to set up only the level and type attributes at this early place.
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230308064734.512457-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- Apr 11, 2023
-
-
Björn Töpel authored
The RISC-V calling convention passes the first argument, and the return value, in the a0 register. For this reason, the a0 register needs some extra care: when handling syscalls, the a0 register is saved into regs->orig_a0, so a0 can be properly restored for, e.g., interrupted syscalls.
This functionality was broken with the introduction of the generic entry patches. Here, a0 was saved into orig_a0 after calling syscall_enter_from_user_mode(), which can change regs->a0 for some paths, incorrectly restoring a0. This is resolved by saving a0 prior to the syscall_enter_from_user_mode() call.
Fixes: f0bddf50 ("riscv: entry: Convert to generic entry")
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230403065207.1070974-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
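A hedged sketch of the before/after ordering described above; this is not the exact kernel entry code, only the relationship between the two steps:

    static void do_syscall_entry(struct pt_regs *regs)
    {
        unsigned long syscall;

        /*
         * Save a0 first: syscall_enter_from_user_mode() may modify
         * regs->a0 (e.g. via ptrace/seccomp), and orig_a0 must hold the
         * original first argument so interrupted syscalls restart correctly.
         */
        regs->orig_a0 = regs->a0;

        syscall = syscall_enter_from_user_mode(regs, regs->a7);

        /* ... dispatch the syscall using regs->orig_a0 as arg0 ... */
    }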
-