Skip to content
Snippets Groups Projects
  1. Feb 17, 2023
  2. Feb 16, 2023
    • Alexander Lobakin's avatar
      bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES · 6c20822f
      Alexander Lobakin authored
      
      &xdp_buff and &xdp_frame are bound in a way that
      
      xdp_buff->data_hard_start == xdp_frame
      
      It's always the case and e.g. xdp_convert_buff_to_frame() relies on
      this.
      IOW, the following:
      
      	for (u32 i = 0; i < 0xdead; i++) {
      		xdpf = xdp_convert_buff_to_frame(&xdp);
      		xdp_convert_frame_to_buff(xdpf, &xdp);
      	}
      
      shouldn't ever modify @xdpf's contents or the pointer itself.
      However, "live packet" code wrongly treats &xdp_frame as part of its
      context placed *before* the data_hard_start. With such flow,
      data_hard_start is sizeof(*xdpf) off to the right and no longer points
      to the XDP frame.
      
      Instead of replacing `sizeof(ctx)` with `offsetof(ctx, xdpf)` in several
      places and praying that there are no more miscalcs left somewhere in the
      code, unionize ::frm with ::data in a flex array, so that both starts
      pointing to the actual data_hard_start and the XDP frame actually starts
      being a part of it, i.e. a part of the headroom, not the context.
      A nice side effect is that the maximum frame size for this mode gets
      increased by 40 bytes, as xdp_buff::frame_sz includes everything from
      data_hard_start (-> includes xdpf already) to the end of XDP/skb shared
      info.
      Also update %MAX_PKT_SIZE accordingly in the selftests code. Leave it
      hardcoded for 64 bit && 4k pages, it can be made more flexible later on.
      
      Minor: align `&head->data` with how `head->frm` is assigned for
      consistency.
      Minor #2: rename 'frm' to 'frame' in &xdp_page_head while at it for
      clarity.
      
      (was found while testing XDP traffic generator on ice, which calls
       xdp_convert_frame_to_buff() for each XDP frame)
      
      Fixes: b530e9e1 ("bpf: Add "live packet" mode for XDP in BPF_PROG_RUN")
      Acked-by: default avatarToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: default avatarAlexander Lobakin <aleksander.lobakin@intel.com>
      Link: https://lore.kernel.org/r/20230215185440.4126672-1-aleksander.lobakin@intel.com
      
      
      Signed-off-by: default avatarMartin KaFai Lau <martin.lau@kernel.org>
      6c20822f
    • Andrii Nakryiko's avatar
      Merge branch 'New benchmark for hashmap lookups' · d964f09a
      Andrii Nakryiko authored
      Anton Protopopov says:
      
      ====================
      
      Add a new benchmark for hashmap lookups and fix several typos.
      
      In commit 3 I've patched the bench utility so that now command line options
      can be reused by different benchmarks.
      
      The benchmark itself is added in the last commit 7. I was using this benchmark
      to test map lookup productivity when using a different hash function [1]. When
      run with --quiet, the results can be easily plotted [2].  The results provided
      by the benchmark look reasonable and match the results of my different
      benchmarks (requiring to patch kernel to get actual statistics on map lookups).
      
      Links:
        [1] https://fosdem.org/2023/schedule/event/bpf_hashing/
        [2] https://github.com/aspsk/bpf-bench/tree/master/hashmap-bench
      
      
      
      Changes,
      v1->v2:
      - percpu_times_index[] is of wrong size (Martin)
      - use base 0 for strtol (Andrii)
      - just use -q without argument (Andrii)
      - use less hacks when parsing arguments (Andrii)
      ====================
      
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      d964f09a
    • Anton Protopopov's avatar
      selftest/bpf/benchs: Add benchmark for hashmap lookups · f371f2dc
      Anton Protopopov authored
      
      Add a new benchmark which measures hashmap lookup operations speed.  A user can
      control the following parameters of the benchmark:
      
          * key_size (max 1024): the key size to use
          * max_entries: the hashmap max entries
          * nr_entries: the number of entries to insert/lookup
          * nr_loops: the number of loops for the benchmark
          * map_flags The hashmap flags passed to BPF_MAP_CREATE
      
      The BPF program performing the benchmarks calls two nested bpf_loop:
      
          bpf_loop(nr_loops/nr_entries)
                  bpf_loop(nr_entries)
                           bpf_map_lookup()
      
      So the nr_loops determines the number of actual map lookups. All lookups are
      successful.
      
      Example (the output is generated on a AMD Ryzen 9 3950X machine):
      
          for nr_entries in `seq 4096 4096 65536`; do echo -n "$((nr_entries*100/65536))% full: "; sudo ./bench -d2 -a bpf-hashmap-lookup --key_size=4 --nr_entries=$nr_entries --max_entries=65536 --nr_loops=1000000 --map_flags=0x40 | grep cpu; done
          6% full: cpu01: lookup 50.739M ± 0.018M events/sec (approximated from 32 samples of ~19ms)
          12% full: cpu01: lookup 47.751M ± 0.015M events/sec (approximated from 32 samples of ~20ms)
          18% full: cpu01: lookup 45.153M ± 0.013M events/sec (approximated from 32 samples of ~22ms)
          25% full: cpu01: lookup 43.826M ± 0.014M events/sec (approximated from 32 samples of ~22ms)
          31% full: cpu01: lookup 41.971M ± 0.012M events/sec (approximated from 32 samples of ~23ms)
          37% full: cpu01: lookup 41.034M ± 0.015M events/sec (approximated from 32 samples of ~24ms)
          43% full: cpu01: lookup 39.946M ± 0.012M events/sec (approximated from 32 samples of ~25ms)
          50% full: cpu01: lookup 38.256M ± 0.014M events/sec (approximated from 32 samples of ~26ms)
          56% full: cpu01: lookup 36.580M ± 0.018M events/sec (approximated from 32 samples of ~27ms)
          62% full: cpu01: lookup 36.252M ± 0.012M events/sec (approximated from 32 samples of ~27ms)
          68% full: cpu01: lookup 35.200M ± 0.012M events/sec (approximated from 32 samples of ~28ms)
          75% full: cpu01: lookup 34.061M ± 0.009M events/sec (approximated from 32 samples of ~29ms)
          81% full: cpu01: lookup 34.374M ± 0.010M events/sec (approximated from 32 samples of ~29ms)
          87% full: cpu01: lookup 33.244M ± 0.011M events/sec (approximated from 32 samples of ~30ms)
          93% full: cpu01: lookup 32.182M ± 0.013M events/sec (approximated from 32 samples of ~31ms)
          100% full: cpu01: lookup 31.497M ± 0.016M events/sec (approximated from 32 samples of ~31ms)
      
      Signed-off-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230213091519.1202813-8-aspsk@isovalent.com
      f371f2dc
    • Anton Protopopov's avatar
      selftest/bpf/benchs: Print less if the quiet option is set · a237dda0
      Anton Protopopov authored
      
      The bench utility will print
      
          Setting up benchmark '<bench-name>'...
          Benchmark '<bench-name>' started.
      
      on startup to stdout. Suppress this output if --quiet option if given. This
      makes it simpler to parse benchmark output by a script.
      
      Signed-off-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230213091519.1202813-7-aspsk@isovalent.com
      a237dda0
    • Anton Protopopov's avatar
      selftest/bpf/benchs: Make quiet option common · 90c22503
      Anton Protopopov authored
      
      The "local-storage-tasks-trace" benchmark has a `--quiet` option. Move it to
      the list of common options, so that the main code and other benchmarks can use
      (new) env.quiet variable. Patch the run_bench_local_storage_rcu_tasks_trace.sh
      helper script accordingly.
      
      Signed-off-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230213091519.1202813-6-aspsk@isovalent.com
      90c22503
    • Anton Protopopov's avatar
      selftest/bpf/benchs: Remove an unused header · 96445462
      Anton Protopopov authored
      
      The benchs/bench_bpf_hashmap_full_update.c doesn't set a custom argp,
      so it shouldn't include the <argp.h> header.
      
      Signed-off-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230213091519.1202813-5-aspsk@isovalent.com
      96445462
    • Anton Protopopov's avatar
      selftest/bpf/benchs: Enhance argp parsing · 22ff7aea
      Anton Protopopov authored
      
      To parse command line the bench utility uses the argp_parse() function. This
      function takes as an argument a parent 'struct argp' structure which defines
      common command line options and an array of children 'struct argp' structures
      which defines additional command line options for particular benchmarks. This
      implementation doesn't allow benchmarks to share option names, e.g., if two
      benchmarks want to use, say, the --option option, then only one of them will
      succeed (the first one encountered in the array).  This will be convenient if
      same option names could be used in different benchmarks (with the same
      semantics, e.g., --nr_loops=N).
      
      Fix this by calling the argp_parse() function twice. The first call is the same
      as it was before, with all children argps, and helps to find the benchmark name
      and to print a combined help message if anything is wrong.  Given the name, we
      can call the argp_parse the second time, but now the children array points only
      to a correct benchmark thus always calling the correct parsers. (If there's no
      a specific list of arguments, then only one call to argp_parse will be done.)
      
      Signed-off-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230213091519.1202813-4-aspsk@isovalent.com
      22ff7aea
    • Anton Protopopov's avatar
      selftest/bpf/benchs: Make a function static in bpf_hashmap_full_update · 2f1c5963
      Anton Protopopov authored
      
      The hashmap_report_final callback function defined in the
      benchs/bench_bpf_hashmap_full_update.c file should be static.
      
      Signed-off-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230213091519.1202813-3-aspsk@isovalent.com
      2f1c5963
    • Anton Protopopov's avatar
      selftest/bpf/benchs: Fix a typo in bpf_hashmap_full_update · 4db98ab4
      Anton Protopopov authored
      
      To call the bpf_hashmap_full_update benchmark, one should say:
      
          bench bpf-hashmap-ful-update
      
      The patch adds a missing 'l' to the benchmark name.
      
      Signed-off-by: default avatarAnton Protopopov <aspsk@isovalent.com>
      Signed-off-by: default avatarAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20230213091519.1202813-2-aspsk@isovalent.com
      4db98ab4
    • Alexei Starovoitov's avatar
      Merge branch 'Use __GFP_ZERO in bpf memory allocator' · 3538a0fb
      Alexei Starovoitov authored
      Hou Tao says:
      
      ====================
      
      From: Hou Tao <houtao1@huawei.com>
      
      Hi,
      
      The patchset tries to fix the hard-up problem found when checking how htab
      handles element reuse in bpf memory allocator. The immediate reuse of
      freed elements will reinitialize special fields (e.g., bpf_spin_lock) in
      htab map value and it may corrupt lookup procedure with BFP_F_LOCK flag
      which acquires bpf-spin-lock during value copying, and lead to hard-lock
      as shown in patch #2. Patch #1 fixes it by using __GFP_ZERO when allocating
      the object from slab and the behavior is similar with the preallocated
      hash-table case. Please see individual patches for more details. And comments
      are always welcome.
      
      Regards,
      
      Change Log:
      v1:
        * Use __GFP_ZERO instead of ctor to avoid retpoline overhead (from Alexei)
        * Add comments for check_and_init_map_value() (from Alexei)
        * split __GFP_ZERO patches out of the original patchset to unblock
          the development work of others.
      
      RFC: https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com
      
      
      ====================
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      3538a0fb
    • Hou Tao's avatar
      selftests/bpf: Add test case for element reuse in htab map · f88da2d4
      Hou Tao authored
      
      The reinitialization of spin-lock in map value after immediate reuse may
      corrupt lookup with BPF_F_LOCK flag and result in hard lock-up, so add
      one test case to demonstrate the problem.
      
      Signed-off-by: default avatarHou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20230215082132.3856544-3-houtao@huaweicloud.com
      
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      f88da2d4
    • Hou Tao's avatar
      bpf: Zeroing allocated object from slab in bpf memory allocator · 997849c4
      Hou Tao authored
      
      Currently the freed element in bpf memory allocator may be immediately
      reused, for htab map the reuse will reinitialize special fields in map
      value (e.g., bpf_spin_lock), but lookup procedure may still access
      these special fields, and it may lead to hard-lockup as shown below:
      
       NMI backtrace for cpu 16
       CPU: 16 PID: 2574 Comm: htab.bin Tainted: G             L     6.1.0+ #1
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
       RIP: 0010:queued_spin_lock_slowpath+0x283/0x2c0
       ......
       Call Trace:
        <TASK>
        copy_map_value_locked+0xb7/0x170
        bpf_map_copy_value+0x113/0x3c0
        __sys_bpf+0x1c67/0x2780
        __x64_sys_bpf+0x1c/0x20
        do_syscall_64+0x30/0x60
        entry_SYSCALL_64_after_hwframe+0x46/0xb0
       ......
        </TASK>
      
      For htab map, just like the preallocated case, these is no need to
      initialize these special fields in map value again once these fields
      have been initialized. For preallocated htab map, these fields are
      initialized through __GFP_ZERO in bpf_map_area_alloc(), so do the
      similar thing for non-preallocated htab in bpf memory allocator. And
      there is no need to use __GFP_ZERO for per-cpu bpf memory allocator,
      because __alloc_percpu_gfp() does it implicitly.
      
      Fixes: 0fd7c5d4 ("bpf: Optimize call_rcu in non-preallocated hash map.")
      Signed-off-by: default avatarHou Tao <houtao1@huawei.com>
      Link: https://lore.kernel.org/r/20230215082132.3856544-2-houtao@huaweicloud.com
      
      
      Signed-off-by: default avatarAlexei Starovoitov <ast@kernel.org>
      997849c4
  3. Feb 15, 2023
Loading