  1. Feb 22, 2023
    • netfilter: ebtables: fix table blob use-after-free · e58a171d
      Florian Westphal authored
      
      We are not allowed to return an error at this point.
      Looking at the code it looks like ret is always 0 at this
      point, but it's not.
      
      t = find_table_lock(net, repl->name, &ret, &ebt_mutex);
      
      ... this can return a valid table, with ret != 0.
      
      This bug causes update of table->private with the new
      blob, but then frees the blob right away in the caller.
      
      Syzbot report:
      
      BUG: KASAN: vmalloc-out-of-bounds in __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
      Read of size 4 at addr ffffc90005425000 by task kworker/u4:4/74
      Workqueue: netns cleanup_net
      Call Trace:
       kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
       __ebt_unregister_table+0xc00/0xcd0 net/bridge/netfilter/ebtables.c:1168
       ebt_unregister_table+0x35/0x40 net/bridge/netfilter/ebtables.c:1372
       ops_exit_list+0xb0/0x170 net/core/net_namespace.c:169
       cleanup_net+0x4ee/0xb10 net/core/net_namespace.c:613
      ...
      
      ip(6)tables appears to be ok (ret should be 0 at this point) but make
      this more obvious.
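      
      The pattern behind this bug can be sketched in plain user-space C. All
      names below (lookup_table, replace_blob_buggy, replace_blob_fixed) are
      hypothetical stand-ins, not the ebtables code; the only assumption is a
      lookup helper that may return a valid pointer while leaving a non-zero
      value in its error out-parameter.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      
      struct table { int blob; };            /* stand-in for table->private */
      
      static struct table the_table = { .blob = 1 };
      
      /* Hypothetical lookup: returns a valid table but leaves *err != 0
       * (an arbitrary stale value), as the commit message describes. */
      static struct table *lookup_table(int *err)
      {
          assert(err != NULL);
          *err = -4;
          return &the_table;
      }
      
      /* Buggy variant: commits the new blob, then returns the stale error,
       * so the caller believes the replace failed and frees the blob. */
      static int replace_blob_buggy(int new_blob)
      {
          int ret;
          struct table *t = lookup_table(&ret);
      
          if (!t)
              return ret;
          t->blob = new_blob;                /* point of no return */
          return ret;
      }
      
      /* Fixed variant: once a table was found, clear ret explicitly. */
      static int replace_blob_fixed(int new_blob)
      {
          int ret;
          struct table *t = lookup_table(&ret);
      
          if (!t)
              return ret;
          ret = 0;
          t->blob = new_blob;
          return ret;
      }
      ```
      
      In the buggy variant the table already points at the new blob when the
      non-zero value is returned, which is the situation that leads to the
      use-after-free described above.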
      
      Fixes: c58dd2dd ("netfilter: Can't fail and free after table replacement")
      Reported-by: <syzbot+f61594de72d6705aea03@syzkaller.appspotmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ip6t_rpfilter: Fix regression with VRF interfaces · efb056e5
      Phil Sutter authored
      
      When calling ip6_route_lookup() for the packet arriving on the VRF
      interface, the result is always the real (slave) interface. Expect this
      when validating the result.
      
      Fixes: acc641ab ("netfilter: rpfilter/fib: Populate flowic_l3mdev field")
      Signed-off-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: fix rmmod double-free race · e6d57e9f
      Florian Westphal authored
      
      nf_conntrack_hash_check_insert() callers free the ct entry directly, via
      nf_conntrack_free.
      
      This isn't safe anymore because
      nf_conntrack_hash_check_insert() might place the entry into the conntrack
      table and then delete the entry again because it found that a conntrack
      extension has been removed at the same time.
      
      In this case, the just-added entry is removed again and an error is
      returned to the caller.
      
      Problem is that another cpu might have picked up this entry and
      incremented its reference count.
      
      This results in a use-after-free/double-free, once by the other cpu and
      once by the caller of nf_conntrack_hash_check_insert().
      
      Fix this by making nf_conntrack_hash_check_insert() not fail anymore
      after the insertion, just like before the 'Fixes' commit.
      
      This is safe because a racing nf_ct_iterate() has to wait for us
      to release the conntrack hash spinlocks.
      
      While at it, make the function return -EAGAIN in the rmmod (genid
      changed) case, this makes nfnetlink replay the command (suggested
      by Pablo Neira).
      
      Fixes: c56716c6 ("netfilter: extensions: introduce extension genid count")
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: ctnetlink: fix possible refcount leak in ctnetlink_create_conntrack() · ac489398
      Hangyu Hua authored
      
      nf_ct_put() needs to be called to drop the reference taken by
      nf_conntrack_find_get(), to avoid a refcount leak when
      nf_conntrack_hash_check_insert() fails.
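      
      A minimal user-space sketch of this kind of leak, assuming a plain
      integer reference count. The names (obj_get, obj_put, try_insert,
      create_leaky, create_fixed) are hypothetical and only model the get/put
      pairing; they are not the conntrack API.
      
      ```c
      #include <assert.h>
      
      struct obj { int refcnt; };
      
      static struct obj *obj_get(struct obj *o) { o->refcnt++; return o; }
      
      static void obj_put(struct obj *o)
      {
          assert(o->refcnt > 0);
          o->refcnt--;
      }
      
      /* Hypothetical insert step that can fail. */
      static int try_insert(int should_fail) { return should_fail ? -1 : 0; }
      
      /* Leaky: the reference taken on master is never dropped on failure. */
      static int create_leaky(struct obj *master, int should_fail)
      {
          obj_get(master);
          if (try_insert(should_fail) < 0)
              return -1;
          return 0;                /* success: the new entry keeps the reference */
      }
      
      /* Fixed: the error path drops the reference it took. */
      static int create_fixed(struct obj *master, int should_fail)
      {
          obj_get(master);
          if (try_insert(should_fail) < 0) {
              obj_put(master);
              return -1;
          }
          return 0;
      }
      ```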
      
      Fixes: 7d367e06 ("netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)")
      Signed-off-by: Hangyu Hua <hbh25y@gmail.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. Feb 09, 2023
  3. Feb 06, 2023
    • neigh: make sure used and confirmed times are valid · c1d2ecdf
      Julian Anastasov authored
      
      Entries can linger in the cache without a timer for days, thanks to
      the gc_thresh1 limit. As a result, without traffic, the confirmed
      time can be outdated and appear to be in the future. Later,
      on traffic, NUD_STALE entries can switch to NUD_DELAY and start
      the timer, which can see the invalid confirmed time and wrongly
      switch to the NUD_REACHABLE state instead of NUD_PROBE. As a result,
      the timer is set many days in the future. This is more visible on
      32-bit platforms, with a higher HZ value.
      
      Why is this a problem? While we expect unused entries to expire,
      such entries stay in the REACHABLE state for too long, locked in
      the cache. They are not expired normally, only when the cache is full.
      
      Problem and the wrong state change reported by Zhang Changzhong:
      
      172.16.1.18 dev bond0 lladdr 0a:0e:0f:01:12:01 ref 1 used 350521/15994171/350520 probes 4 REACHABLE
      
      350520 seconds have elapsed since this entry was last updated, but it is
      still in the REACHABLE state (base_reachable_time_ms is 30000),
      preventing lladdr from being updated through probe.
      
      Fix it by ensuring timer is started with valid used/confirmed
      times. Considering the valid time range is LONG_MAX jiffies,
      we try not to go too much in the past while we are in
      DELAY/PROBE state. There are also places that need
      used/updated times to be validated while timer is not running.
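      
      The "appears to be in the future" effect comes from comparing a
      wrapping tick counter through a signed difference. The sketch below
      models it with a 32-bit counter. tick_after is written out in the
      spirit of the kernel's time_after(), while MAX_AGE_TICKS and
      clamp_stamp are hypothetical and only illustrate one way of pulling a
      stale timestamp back into a valid range; the actual patch may choose
      different bounds.
      
      ```c
      #include <assert.h>
      #include <stdint.h>
      
      /* Wrap-safe "a is after b" on a 32-bit tick counter. */
      static int tick_after(uint32_t a, uint32_t b)
      {
          return (int32_t)(b - a) < 0;
      }
      
      /* Hypothetical bound: keep stamps within this many ticks of "now". */
      #define MAX_AGE_TICKS 0x3fffffffu
      
      /* Pull a stored timestamp back into [now - MAX_AGE_TICKS, now]. */
      static uint32_t clamp_stamp(uint32_t stamp, uint32_t now)
      {
          uint32_t oldest = now - MAX_AGE_TICKS;
          uint32_t result = stamp;
      
          /* Too old, or so old that it wrapped and looks like the future. */
          if (tick_after(stamp, now) || tick_after(oldest, stamp))
              result = oldest;
      
          assert(!tick_after(result, now));
          return result;
      }
      ```
      
      A stamp that is more than half the counter range old compares as being
      after the current time, which is why an untouched confirmed value can
      push the timer far into the future until it is clamped.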
      
      Reported-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Tested-by: Zhang Changzhong <zhangchangzhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Feb 02, 2023
    • net: openvswitch: fix flow memory leak in ovs_flow_cmd_new · 0c598aed
      Fedor Pchelkin authored
      
      Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is
      not freed when an allocation of a key fails.
      
      BUG: memory leak
      unreferenced object 0xffff888116668000 (size 632):
        comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline]
          [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77
          [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957
          [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739
          [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
          [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800
          [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515
          [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
          [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
          [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339
          [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934
          [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline]
          [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671
          [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356
          [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410
          [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
          [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
          [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      To fix this the patch rearranges the goto labels to reflect the order of
      object allocations and adds appropriate goto statements on the error
      paths.
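      
      The label ordering described here follows the usual C idiom of
      unwinding in reverse allocation order. A small user-space sketch,
      assuming three allocations; the names (demo_alloc, demo_free,
      build_flow, live_allocs) are hypothetical and do not mirror
      ovs_flow_cmd_new():
      
      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdlib.h>
      
      static int live_allocs;    /* outstanding allocations, for the demo */
      
      static void *demo_alloc(int fail)
      {
          void *p = fail ? NULL : malloc(16);
      
          if (p)
              live_allocs++;
          return p;
      }
      
      static void demo_free(void *p)
      {
          assert(p != NULL && live_allocs > 0);
          free(p);
          live_allocs--;
      }
      
      /* Allocate flow, then key, then reply; on error, jump to the label
       * that frees exactly what has been allocated so far. */
      static int build_flow(int fail_flow, int fail_key, int fail_reply)
      {
          void *flow, *key, *reply;
      
          flow = demo_alloc(fail_flow);
          if (!flow)
              goto err;
          key = demo_alloc(fail_key);
          if (!key)
              goto err_free_flow;
          reply = demo_alloc(fail_reply);
          if (!reply)
              goto err_free_key;
      
          /* Success: real code would hand these on; the demo frees them. */
          demo_free(reply);
          demo_free(key);
          demo_free(flow);
          return 0;
      
      err_free_key:
          demo_free(key);
      err_free_flow:
          demo_free(flow);
      err:
          return -ENOMEM;
      }
      ```
      
      Because each label falls through to the ones below it, a failure at any
      step releases every earlier allocation and nothing else.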
      
      Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
      
      Fixes: 68bb1010 ("openvswitch: Fix flow lookup to use unmasked key")
      Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
      Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
      Acked-by: Eelco Chaudron <echaudro@redhat.com>
      Reviewed-by: Simon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • can: isotp: split tx timer into transmission and timeout · 4f027cba
      Oliver Hartkopp authored
      
      The timer for the transmission of isotp PDUs formerly had two functions:
      1. send two consecutive frames with a given time gap
      2. monitor the timeouts for flow control frames and the echo frames
      
      This led to larger txstate checks and potentially to a problem discovered
      by syzbot which enabled the panic_on_warn feature while testing.
      
      The former 'txtimer' function is split into 'txfrtimer' and 'txtimer'
      to handle the two above functionalities with separate timer callbacks.
      
      The two simplified timers now run in one-shot mode and make the state
      transitions (especially with isotp_rcv_echo) easier to understand.
      
      Fixes: 86633786 ("can: isotp: fix tx state handling for echo tx processing")
      Reported-by: <syzbot+5aed6c3aaba661f5b917@syzkaller.appspotmail.com>
      Cc: stable@vger.kernel.org # >= v6.0
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230104145701.2422-1-socketcan@hartkopp.net
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: isotp: handle wait_event_interruptible() return values · 823b2e42
      Oliver Hartkopp authored
      
      When wait_event_interruptible() has been interrupted by a signal, the
      tx.state value might not be ISOTP_IDLE. Force the state machines
      into idle state to stop the timer handlers from continuing to work.
      
      Fixes: 86633786 ("can: isotp: fix tx state handling for echo tx processing")
      Cc: stable@vger.kernel.org
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230112192347.1944-1-socketcan@hartkopp.net
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: raw: fix CAN FD frame transmissions over CAN XL devices · 3793301c
      Oliver Hartkopp authored
      
      A CAN XL device is always capable of processing CAN FD frames. The former
      check when sending CAN FD frames relied on the existence of a CAN FD
      device and did not check for a CAN XL device, which would be correct
      too.
      
      With this patch the CAN FD feature is enabled automatically when CAN
      XL is switched on, and CAN FD cannot be switched off while CAN XL is
      enabled.
      
      This precondition also leads to a clean up and reduction of checks in
      the hot path in raw_rcv() and raw_sendmsg(). Some conditions are
      reordered to handle simple checks first.
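      
      The coupling between the two options can be modelled as an invariant,
      "xl_frames implies fd_frames". The sketch below is a hypothetical
      user-space model (raw_opts, set_xl_frames, set_fd_frames and
      may_send_fd are made-up names). Whether the real code rejects an
      attempt to switch CAN FD off or silently keeps it enabled is not stated
      in the message; rejecting with -EINVAL is an assumption of this sketch.
      
      ```c
      #include <assert.h>
      #include <errno.h>
      
      struct raw_opts {
          int fd_frames;    /* CAN FD enabled on this socket */
          int xl_frames;    /* CAN XL enabled on this socket */
      };
      
      static int set_xl_frames(struct raw_opts *o, int on)
      {
          o->xl_frames = !!on;
          if (o->xl_frames)
              o->fd_frames = 1;    /* enabling CAN XL implies CAN FD */
          return 0;
      }
      
      static int set_fd_frames(struct raw_opts *o, int on)
      {
          if (!on && o->xl_frames)
              return -EINVAL;      /* assumption: refuse the request */
          o->fd_frames = !!on;
          return 0;
      }
      
      /* With the invariant in place, one flag test suffices when sending. */
      static int may_send_fd(const struct raw_opts *o)
      {
          assert(!o->xl_frames || o->fd_frames);
          return o->fd_frames;
      }
      ```
      
      Keeping the invariant at configuration time is what allows the per-frame
      paths to test a single flag instead of both.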
      
      changes since v1: https://lore.kernel.org/all/20230131091012.50553-1-socketcan@hartkopp.net
      - fixed typo: devive -> device
      changes since v2: https://lore.kernel.org/all/20230131091824.51026-1-socketcan@hartkopp.net/
      - reorder checks in if statements to handle simple checks first
      
      Fixes: 62633269 ("can: raw: add CAN XL support")
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Link: https://lore.kernel.org/all/20230131105613.55228-1-socketcan@hartkopp.net
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • can: j1939: fix errant WARN_ON_ONCE in j1939_session_deactivate · d0553680
      Ziyang Xuan authored
      
      The conclusion "j1939_session_deactivate() should be called with a
      session ref-count of at least 2" is incorrect. In some concurrent
      scenarios, j1939_session_deactivate() can be called with a session
      ref-count of less than 2. This is not a problem, because
      j1939_session_deactivate_locked() checks the session's active state
      before putting the session.
      
      Here is the concurrent scenario of the problem reported by syzbot
      and my reproduction log.
      
              cpu0                            cpu1
                                      j1939_xtp_rx_eoma
      j1939_xtp_rx_abort_one
                                      j1939_session_get_by_addr [kref == 2]
      j1939_session_get_by_addr [kref == 3]
      j1939_session_deactivate [kref == 2]
      j1939_session_put [kref == 1]
                                      j1939_session_completed
                                      j1939_session_deactivate
                                      WARN_ON_ONCE(kref < 2)
      
      =====================================================
      WARNING: CPU: 1 PID: 21 at net/can/j1939/transport.c:1088 j1939_session_deactivate+0x5f/0x70
      CPU: 1 PID: 21 Comm: ksoftirqd/1 Not tainted 5.14.0-rc7+ #32
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1 04/01/2014
      RIP: 0010:j1939_session_deactivate+0x5f/0x70
      Call Trace:
       j1939_session_deactivate_activate_next+0x11/0x28
       j1939_xtp_rx_eoma+0x12a/0x180
       j1939_tp_recv+0x4a2/0x510
       j1939_can_recv+0x226/0x380
       can_rcv_filter+0xf8/0x220
       can_receive+0x102/0x220
       ? process_backlog+0xf0/0x2c0
       can_rcv+0x53/0xf0
       __netif_receive_skb_one_core+0x67/0x90
       ? process_backlog+0x97/0x2c0
       __netif_receive_skb+0x22/0x80
      
      Fixes: 0c71437d ("can: j1939: j1939_session_deactivate(): clarify lifetime of session object")
      Reported-by: <syzbot+9981a614060dcee6eeca@syzkaller.appspotmail.com>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Acked-by: Oleksij Rempel <o.rempel@pengutronix.de>
      Link: https://lore.kernel.org/all/20210906094200.95868-1-william.xuanziyang@huawei.com
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • ip/ip6_gre: Fix non-point-to-point tunnel not generating IPv6 link local address · 30e2291f
      Thomas Winter authored
      
      We recently found that our non-point-to-point tunnels were not
      generating any IPv6 link local address and instead generating an
      IPv6 compat address, breaking IPv6 communication on the tunnel.
      
      Previously, addrconf_gre_config would always call addrconf_addr_gen
      and generate an EUI64 link local address for the tunnel.
      Then commit e5dd7294 changed the code path so that add_v4_addrs
      is called but this only generates a compat IPv6 address for
      non-point-to-point tunnels.
      
      I assume the compat address is specifically for SIT tunnels so
      have kept that only for SIT - GRE tunnels now always generate link
      local addresses.
      
      Fixes: e5dd7294 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
      Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • ip/ip6_gre: Fix changing addr gen mode not generating IPv6 link local address · 23ca0c2c
      Thomas Winter authored
      
      Our point-to-point GRE tunnels have IN6_ADDR_GEN_MODE_NONE
      when they are created; we then set IN6_ADDR_GEN_MODE_EUI64 when they
      come up to generate the IPv6 link local address for the interface.
      Recently we found that they were no longer generating IPv6 addresses.
      This issue would also have affected SIT tunnels.
      
      Commit e5dd7294 changed the code path so that GRE tunnels
      generate an IPv6 address based on the tunnel source address.
      It also changed the code path so GRE tunnels don't call addrconf_addr_gen
      in addrconf_dev_config which is called by addrconf_sysctl_addr_gen_mode
      when the IN6_ADDR_GEN_MODE is changed.
      
      This patch aims to fix this issue by moving the code in addrconf_notify
      which calls the addr gen for GRE and SIT into a separate function
      and calling it in the places that expect the IPv6 address to be
      generated.
      
      The previous addrconf_dev_config is renamed to addrconf_eth_config
      since it only expected eth type interfaces and follows the
      addrconf_gre/sit_config format.
      
      Part of this change means that an attempt will be made to configure
      the loopback address when changing addr_gen_mode for lo.
      This should not be a problem because the address should exist anyway
      and if it does already exist then no error is produced.
      
      Fixes: e5dd7294 ("ip/ip6_gre: use the same logic as SIT interfaces when computing v6LL address")
      Signed-off-by: Thomas Winter <Thomas.Winter@alliedtelesis.co.nz>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  5. Feb 01, 2023
    • net: fix NULL pointer in skb_segment_list · 876e8ca8
      Yan Zhai authored
      
      Commit 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      introduced UDP listified GRO. The segmentation relies on frag_list being
      untouched when passing through the network stack. This assumption can
      sometimes be broken, where frag_list itself gets pulled into the linear
      area, leaving frag_list NULL. When this happens it can trigger the
      following NULL pointer dereference and panic the kernel. Reversing the
      test condition should fix it.
      
      [19185.577801][    C1] BUG: kernel NULL pointer dereference, address:
      ...
      [19185.663775][    C1] RIP: 0010:skb_segment_list+0x1cc/0x390
      ...
      [19185.834644][    C1] Call Trace:
      [19185.841730][    C1]  <TASK>
      [19185.848563][    C1]  __udp_gso_segment+0x33e/0x510
      [19185.857370][    C1]  inet_gso_segment+0x15b/0x3e0
      [19185.866059][    C1]  skb_mac_gso_segment+0x97/0x110
      [19185.874939][    C1]  __skb_gso_segment+0xb2/0x160
      [19185.883646][    C1]  udp_queue_rcv_skb+0xc3/0x1d0
      [19185.892319][    C1]  udp_unicast_rcv_skb+0x75/0x90
      [19185.900979][    C1]  ip_protocol_deliver_rcu+0xd2/0x200
      [19185.910003][    C1]  ip_local_deliver_finish+0x44/0x60
      [19185.918757][    C1]  __netif_receive_skb_one_core+0x8b/0xa0
      [19185.927834][    C1]  process_backlog+0x88/0x130
      [19185.935840][    C1]  __napi_poll+0x27/0x150
      [19185.943447][    C1]  net_rx_action+0x27e/0x5f0
      [19185.951331][    C1]  ? mlx5_cq_tasklet_cb+0x70/0x160 [mlx5_core]
      [19185.960848][    C1]  __do_softirq+0xbc/0x25d
      [19185.968607][    C1]  irq_exit_rcu+0x83/0xb0
      [19185.976247][    C1]  common_interrupt+0x43/0xa0
      [19185.984235][    C1]  asm_common_interrupt+0x22/0x40
      ...
      [19186.094106][    C1]  </TASK>
      
      Fixes: 3a1296a3 ("net: Support GRO/GSO fraglist chaining.")
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Yan Zhai <yan@cloudflare.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/r/Y9gt5EUizK1UImEP@debian
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sctp: do not check hb_timer.expires when resetting hb_timer · 8f35ae17
      Xin Long authored
      
      Commit ba6f5e33 ("sctp: avoid refreshing heartbeat timer too often")
      tried to avoid frequent hb_timer refreshes, and it only allows
      mod_timer when the new expires is after hb_timer.expires. This means
      that even when a much shorter interval for the hb timer gets applied,
      it has to wait until the current hb timer times out.
      
      In sctp_do_8_2_transport_strike(), when a transport enters PF state, it
      expects to update the hb timer to resend a heartbeat every rto after
      calling sctp_transport_reset_hb_timer(), which does not work because of
      the change mentioned above.
      
      The frequent hb_timer refresh was caused by sctp_transport_reset_timers()
      called in sctp_outq_flush(), and that call was already removed in the
      commit above. So we don't have to check hb_timer.expires when resetting
      hb_timer, as it is now not called very often.
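      
      The difference between the guarded and the unconditional reset can be
      shown with a toy timer. This is a hypothetical user-space model
      (hb_timer, reset_hb_guarded, reset_hb_always and tick_before are
      made-up names, with a 32-bit tick counter standing in for jiffies); it
      only mirrors the comparison described in the message, not the SCTP
      code.
      
      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdint.h>
      
      struct hb_timer { uint32_t expires; };
      
      /* Wrap-safe "a is before b" on a 32-bit tick counter. */
      static int tick_before(uint32_t a, uint32_t b)
      {
          return (int32_t)(a - b) < 0;
      }
      
      /* Guarded reset: only ever moves the expiry later, so a shorter
       * interval is ignored until the current timer fires. */
      static void reset_hb_guarded(struct hb_timer *t, uint32_t new_expires)
      {
          assert(t != NULL);
          if (tick_before(t->expires, new_expires))
              t->expires = new_expires;
      }
      
      /* Unconditional reset: a shorter interval takes effect immediately. */
      static void reset_hb_always(struct hb_timer *t, uint32_t new_expires)
      {
          assert(t != NULL);
          t->expires = new_expires;
      }
      ```
      
      With a timer pending at tick 30000, asking for tick 1000 leaves the
      guarded variant at 30000, while the unconditional variant moves it to
      1000, which is what a transport entering PF state needs.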
      
      Fixes: ba6f5e33 ("sctp: avoid refreshing heartbeat timer too often")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Link: https://lore.kernel.org/r/d958c06985713ec84049a2d5664879802710179a.1675095933.git.lucien.xin@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. Jan 31, 2023
  7. Jan 30, 2023
    • netrom: Fix use-after-free caused by accept on already connected socket · 61179292
      Hyunwoo Kim authored
      
      If you call listen() and accept() on an already connect()ed
      AF_NETROM socket, accept() can succeed.
      This is because when the peer socket sends data via sendmsg(),
      an skb carrying its own sk is queued on the connected socket's
      sk->sk_receive_queue, and nr_accept() dequeues
      the skb waiting in the sk->sk_receive_queue.
      
      As a result, nr_accept() allocates and returns a sock with
      the sk of the parent AF_NETROM socket.
      
      And here use-after-free can happen through complex race conditions:
      ```
                        cpu0                                                     cpu1
                                                                     1. socket_2 = socket(AF_NETROM)
                                                                              .
                                                                              .
                                                                        listen(socket_2)
                                                                        accepted_socket = accept(socket_2)
             2. socket_1 = socket(AF_NETROM)
                  nr_create()    // sk refcount : 1
                connect(socket_1)
                                                                     3. write(accepted_socket)
                                                                          nr_sendmsg()
                                                                          nr_output()
                                                                          nr_kick()
                                                                          nr_send_iframe()
                                                                          nr_transmit_buffer()
                                                                          nr_route_frame()
                                                                          nr_loopback_queue()
                                                                          nr_loopback_timer()
                                                                          nr_rx_frame()
                                                                          nr_process_rx_frame(sk, skb);    // sk : socket_1's sk
                                                                          nr_state3_machine()
                                                                          nr_queue_rx_frame()
                                                                          sock_queue_rcv_skb()
                                                                          sock_queue_rcv_skb_reason()
                                                                          __sock_queue_rcv_skb()
                                                                          __skb_queue_tail(list, skb);    // list : socket_1's sk->sk_receive_queue
             4. listen(socket_1)
                  nr_listen()
                uaf_socket = accept(socket_1)
                  nr_accept()
                  skb_dequeue(&sk->sk_receive_queue);
                                                                     5. close(accepted_socket)
                                                                          nr_release()
                                                                          nr_write_internal(sk, NR_DISCREQ)
                                                                          nr_transmit_buffer()    // NR_DISCREQ
                                                                          nr_route_frame()
                                                                          nr_loopback_queue()
                                                                          nr_loopback_timer()
                                                                          nr_rx_frame()    // sk : socket_1's sk
                                                                          nr_process_rx_frame()  // NR_STATE_3
                                                                          nr_state3_machine()    // NR_DISCREQ
                                                                          nr_disconnect()
                                                                          nr_sk(sk)->state = NR_STATE_0;
             6. close(socket_1)    // sk refcount : 3
                  nr_release()    // NR_STATE_0
                  sock_put(sk);    // sk refcount : 0
                  sk_free(sk);
                close(uaf_socket)
                  nr_release()
                  sock_hold(sk);    // UAF
      ```
      
      KASAN report by syzbot:
      ```
      BUG: KASAN: use-after-free in nr_release+0x66/0x460 net/netrom/af_netrom.c:520
      Write of size 4 at addr ffff8880235d8080 by task syz-executor564/5128
      
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:306 [inline]
       print_report+0x15e/0x461 mm/kasan/report.c:417
       kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
       check_region_inline mm/kasan/generic.c:183 [inline]
       kasan_check_range+0x141/0x190 mm/kasan/generic.c:189
       instrument_atomic_read_write include/linux/instrumented.h:102 [inline]
       atomic_fetch_add_relaxed include/linux/atomic/atomic-instrumented.h:116 [inline]
       __refcount_add include/linux/refcount.h:193 [inline]
       __refcount_inc include/linux/refcount.h:250 [inline]
       refcount_inc include/linux/refcount.h:267 [inline]
       sock_hold include/net/sock.h:775 [inline]
       nr_release+0x66/0x460 net/netrom/af_netrom.c:520
       __sock_release+0xcd/0x280 net/socket.c:650
       sock_close+0x1c/0x20 net/socket.c:1365
       __fput+0x27c/0xa90 fs/file_table.c:320
       task_work_run+0x16f/0x270 kernel/task_work.c:179
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0xaa8/0x2950 kernel/exit.c:867
       do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
       get_signal+0x21c3/0x2450 kernel/signal.c:2859
       arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
       exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
       exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
       __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
       syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
       do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f6c19e3c9b9
      Code: Unable to access opcode bytes at 0x7f6c19e3c98f.
      RSP: 002b:00007fffd4ba2ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000133
      RAX: 0000000000000116 RBX: 0000000000000003 RCX: 00007f6c19e3c9b9
      RDX: 0000000000000318 RSI: 00000000200bd000 RDI: 0000000000000006
      RBP: 0000000000000003 R08: 000000000000000d R09: 000000000000000d
      R10: 0000000000000000 R11: 0000000000000246 R12: 000055555566a2c0
      R13: 0000000000000011 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      
      Allocated by task 5128:
       kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       ____kasan_kmalloc mm/kasan/common.c:371 [inline]
       ____kasan_kmalloc mm/kasan/common.c:330 [inline]
       __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380
       kasan_kmalloc include/linux/kasan.h:211 [inline]
       __do_kmalloc_node mm/slab_common.c:968 [inline]
       __kmalloc+0x5a/0xd0 mm/slab_common.c:981
       kmalloc include/linux/slab.h:584 [inline]
       sk_prot_alloc+0x140/0x290 net/core/sock.c:2038
       sk_alloc+0x3a/0x7a0 net/core/sock.c:2091
       nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433
       __sock_create+0x359/0x790 net/socket.c:1515
       sock_create net/socket.c:1566 [inline]
       __sys_socket_create net/socket.c:1603 [inline]
       __sys_socket_create net/socket.c:1588 [inline]
       __sys_socket+0x133/0x250 net/socket.c:1636
       __do_sys_socket net/socket.c:1649 [inline]
       __se_sys_socket net/socket.c:1647 [inline]
       __x64_sys_socket+0x73/0xb0 net/socket.c:1647
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 5128:
       kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518
       ____kasan_slab_free mm/kasan/common.c:236 [inline]
       ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200
       kasan_slab_free include/linux/kasan.h:177 [inline]
       __cache_free mm/slab.c:3394 [inline]
       __do_kmem_cache_free mm/slab.c:3580 [inline]
       __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587
       sk_prot_free net/core/sock.c:2074 [inline]
       __sk_destruct+0x5df/0x750 net/core/sock.c:2166
       sk_destruct net/core/sock.c:2181 [inline]
       __sk_free+0x175/0x460 net/core/sock.c:2192
       sk_free+0x7c/0xa0 net/core/sock.c:2203
       sock_put include/net/sock.h:1991 [inline]
       nr_release+0x39e/0x460 net/netrom/af_netrom.c:554
       __sock_release+0xcd/0x280 net/socket.c:650
       sock_close+0x1c/0x20 net/socket.c:1365
       __fput+0x27c/0xa90 fs/file_table.c:320
       task_work_run+0x16f/0x270 kernel/task_work.c:179
       exit_task_work include/linux/task_work.h:38 [inline]
       do_exit+0xaa8/0x2950 kernel/exit.c:867
       do_group_exit+0xd4/0x2a0 kernel/exit.c:1012
       get_signal+0x21c3/0x2450 kernel/signal.c:2859
       arch_do_signal_or_restart+0x79/0x5c0 arch/x86/kernel/signal.c:306
       exit_to_user_mode_loop kernel/entry/common.c:168 [inline]
       exit_to_user_mode_prepare+0x15f/0x250 kernel/entry/common.c:203
       __syscall_exit_to_user_mode_work kernel/entry/common.c:285 [inline]
       syscall_exit_to_user_mode+0x1d/0x50 kernel/entry/common.c:296
       do_syscall_64+0x46/0xb0 arch/x86/entry/common.c:86
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      ```
      
      To fix this issue, make nr_listen() return -EINVAL for sockets that
      have already completed nr_connect() successfully.
      
      Reported-by: <syzbot+caa188bdfc1eeafeb418@syzkaller.appspotmail.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      61179292
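A minimal user-space sketch of the rule described in the commit above: a socket that has already connected is refused when it later asks to listen. The types and names (`model_sock`, `model_listen`, the `MODEL_*` states) are hypothetical stand-ins for illustration and are not the kernel's actual structures or the patch itself.

```c
#include <errno.h>

/* Hypothetical, simplified socket states (not the kernel's definitions). */
enum model_state { MODEL_UNCONNECTED, MODEL_CONNECTED, MODEL_LISTENING };

struct model_sock {
    enum model_state state;
};

/*
 * Model of listen(): a socket that already connected must not be turned
 * into a listener, so the call fails with -EINVAL and leaves the state
 * untouched. Any other socket becomes a listener.
 */
int model_listen(struct model_sock *sk)
{
    if (sk->state == MODEL_CONNECTED)
        return -EINVAL;

    sk->state = MODEL_LISTENING;
    return 0;
}
```

Calling `model_listen()` on a fresh socket returns 0, while calling it on a connected one returns `-EINVAL`, which closes the connect-then-listen sequence in this model.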
  8. Jan 28, 2023
  9. Jan 25, 2023
  10. Jan 24, 2023
    • Revert "Merge branch 'ethtool-mac-merge'" · d968117a
      Paolo Abeni authored
      
      This reverts commit 0ad999c1, reversing
      changes made to e38553bd.
      
      It was not intended for net.
      
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      d968117a
    • netrom: Fix use-after-free of a listening socket. · 409db27e
      Kuniyuki Iwashima authored
      
      syzbot reported a use-after-free in do_accept(), more precisely in
      nr_accept(): sk_prot_alloc() allocated the memory and sock_put()
      freed it. [0]
      
      The issue can occur when the heartbeat timer fires and
      nr_heartbeat_expiry() calls nr_destroy_socket(), which it does when a
      socket has SOCK_DESTROY set or a listening socket has SOCK_DEAD set.
      
      In this case, the first condition cannot be true.  SOCK_DESTROY is
      flagged in nr_release() only when the file descriptor is close()d,
      but accept() is being called for the listening socket, so the second
      condition must be true.
      
      Usually, an AF_NETROM listener neither starts timers nor sets
      SOCK_DEAD.  However, the condition is met if connect() fails before
      listen().  connect() starts the t1 timer and the heartbeat timer, and
      the t1 timer calls nr_disconnect() when it times out.  SOCK_DEAD is
      then set, and if we call listen() afterwards, the heartbeat timer
      calls nr_destroy_socket().
      
        nr_connect
          nr_establish_data_link(sk)
            nr_start_t1timer(sk)
          nr_start_heartbeat(sk)
                                          nr_t1timer_expiry
                                            nr_disconnect(sk, ETIMEDOUT)
                                              nr_sk(sk)->state = NR_STATE_0
                                              sk->sk_state = TCP_CLOSE
                                              sock_set_flag(sk, SOCK_DEAD)
      nr_listen
        if (sk->sk_state != TCP_LISTEN)
          sk->sk_state = TCP_LISTEN
                                          nr_heartbeat_expiry
                                            switch (nr->state)
                                            case NR_STATE_0
                                              if (sk->sk_state == TCP_LISTEN &&
                                                  sock_flag(sk, SOCK_DEAD))
                                                nr_destroy_socket(sk)
      
      This path seems expected, and nr_destroy_socket() is called to clean
      up resources.  Initially, there was sock_hold() before nr_destroy_socket()
      so that the socket would not be freed, but the commit 517a16b1
      ("netrom: Decrease sock refcount when sock timers expire") accidentally
      removed it.
      
      To fix the use-after-free, let's add sock_hold() back.
      
      [0]:
      BUG: KASAN: use-after-free in do_accept+0x483/0x510 net/socket.c:1848
      Read of size 8 at addr ffff88807978d398 by task syz-executor.3/5315
      
      CPU: 0 PID: 5315 Comm: syz-executor.3 Not tainted 6.2.0-rc3-syzkaller-00165-gd9fc1511728c #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:306 [inline]
       print_report+0x15e/0x461 mm/kasan/report.c:417
       kasan_report+0xbf/0x1f0 mm/kasan/report.c:517
       do_accept+0x483/0x510 net/socket.c:1848
       __sys_accept4_file net/socket.c:1897 [inline]
       __sys_accept4+0x9a/0x120 net/socket.c:1927
       __do_sys_accept net/socket.c:1944 [inline]
       __se_sys_accept net/socket.c:1941 [inline]
       __x64_sys_accept+0x75/0xb0 net/socket.c:1941
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7fa436a8c0c9
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fa437784168 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
      RAX: ffffffffffffffda RBX: 00007fa436bac050 RCX: 00007fa436a8c0c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
      RBP: 00007fa436ae7ae9 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 00007ffebc6700df R14: 00007fa437784300 R15: 0000000000022000
       </TASK>
      
      Allocated by task 5294:
       kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       ____kasan_kmalloc mm/kasan/common.c:371 [inline]
       ____kasan_kmalloc mm/kasan/common.c:330 [inline]
       __kasan_kmalloc+0xa3/0xb0 mm/kasan/common.c:380
       kasan_kmalloc include/linux/kasan.h:211 [inline]
       __do_kmalloc_node mm/slab_common.c:968 [inline]
       __kmalloc+0x5a/0xd0 mm/slab_common.c:981
       kmalloc include/linux/slab.h:584 [inline]
       sk_prot_alloc+0x140/0x290 net/core/sock.c:2038
       sk_alloc+0x3a/0x7a0 net/core/sock.c:2091
       nr_create+0xb6/0x5f0 net/netrom/af_netrom.c:433
       __sock_create+0x359/0x790 net/socket.c:1515
       sock_create net/socket.c:1566 [inline]
       __sys_socket_create net/socket.c:1603 [inline]
       __sys_socket_create net/socket.c:1588 [inline]
       __sys_socket+0x133/0x250 net/socket.c:1636
       __do_sys_socket net/socket.c:1649 [inline]
       __se_sys_socket net/socket.c:1647 [inline]
       __x64_sys_socket+0x73/0xb0 net/socket.c:1647
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 14:
       kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
       kasan_set_track+0x25/0x30 mm/kasan/common.c:52
       kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:518
       ____kasan_slab_free mm/kasan/common.c:236 [inline]
       ____kasan_slab_free+0x13b/0x1a0 mm/kasan/common.c:200
       kasan_slab_free include/linux/kasan.h:177 [inline]
       __cache_free mm/slab.c:3394 [inline]
       __do_kmem_cache_free mm/slab.c:3580 [inline]
       __kmem_cache_free+0xcd/0x3b0 mm/slab.c:3587
       sk_prot_free net/core/sock.c:2074 [inline]
       __sk_destruct+0x5df/0x750 net/core/sock.c:2166
       sk_destruct net/core/sock.c:2181 [inline]
       __sk_free+0x175/0x460 net/core/sock.c:2192
       sk_free+0x7c/0xa0 net/core/sock.c:2203
       sock_put include/net/sock.h:1991 [inline]
       nr_heartbeat_expiry+0x1d7/0x460 net/netrom/nr_timer.c:148
       call_timer_fn+0x1da/0x7c0 kernel/time/timer.c:1700
       expire_timers+0x2c6/0x5c0 kernel/time/timer.c:1751
       __run_timers kernel/time/timer.c:2022 [inline]
       __run_timers kernel/time/timer.c:1995 [inline]
       run_timer_softirq+0x326/0x910 kernel/time/timer.c:2035
       __do_softirq+0x1fb/0xadc kernel/softirq.c:571
      
      Fixes: 517a16b1 ("netrom: Decrease sock refcount when sock timers expire")
      Reported-by: <syzbot+5fafd5cfe1fc91f6b352@syzkaller.appspotmail.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Link: https://lore.kernel.org/r/20230120231927.51711-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      409db27e
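The reference counting behind the commit above can be illustrated with a small user-space model. It assumes, as a simplification, that the socket starts with two references (one for the open file descriptor, one for the armed timer), that the teardown step drops one reference itself, and that the timer handler drops the timer's reference on exit. All names here (`model_refsock`, `model_heartbeat_expiry`, and so on) are hypothetical and only mimic the shape of the kernel code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted socket. */
struct model_refsock {
    int refcnt;
    bool freed;
};

static void model_hold(struct model_refsock *sk)
{
    sk->refcnt++;
}

static void model_put(struct model_refsock *sk)
{
    if (--sk->refcnt == 0)
        sk->freed = true;       /* stands in for freeing the socket */
}

/* Assumed behaviour: the teardown drops one reference of its own. */
static void model_destroy_socket(struct model_refsock *sk)
{
    model_put(sk);
}

/*
 * Timer expiry path. With with_hold == false the teardown and the final
 * put together release both references, so the socket is freed while the
 * file descriptor still points at it. With with_hold == true the extra
 * reference absorbs the teardown's put and the socket survives.
 */
void model_heartbeat_expiry(struct model_refsock *sk, bool with_hold)
{
    if (with_hold)
        model_hold(sk);
    model_destroy_socket(sk);
    model_put(sk);              /* drop the reference held by the timer */
}
```

Starting from a count of 2, the path without the hold ends with the socket freed, whereas the path with the hold leaves one reference for the file descriptor.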
    • netfilter: conntrack: unify established states for SCTP paths · a44b7651
      Sriram Yagnaraman authored
      
      An SCTP endpoint can start an association through a path and tear it
      down over another one. That means the initial path will not see the
      shutdown sequence, and the conntrack entry will remain in ESTABLISHED
      state for 5 days.
      
      By merging the HEARTBEAT_ACKED and ESTABLISHED states into one
      ESTABLISHED state, there remains no difference between a primary and
      a secondary path. The timeout for the merged ESTABLISHED state is set
      to 210 seconds (hb_interval * max_path_retrans + rto_max). So, even if
      a path doesn't see the shutdown sequence, it will expire in a
      reasonable amount of time.
      
      With this change in place, there is now more than one state from which
      we can transition to ESTABLISHED (namely COOKIE_ECHOED and
      HEARTBEAT_SENT), so set the ASSURED bit whenever a state change has
      happened and the new state is ESTABLISHED. The check for dir==REPLY
      is removed, since the transition to ESTABLISHED can happen only in
      the reply direction.
      
      Fixes: 9fb9cbb1 ("[NETFILTER]: Add nf_conntrack subsystem.")
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a44b7651
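The 210-second figure in the commit above follows from the formula it quotes. The sketch below plugs in the usual Linux SCTP defaults of a 30 s heartbeat interval, 5 path retransmissions, and a 60 s maximum RTO; these values are an assumption here, since the commit message gives only the formula and the result.

```c
/* Assumed SCTP defaults, in seconds (not stated in the commit message). */
enum {
    MODEL_HB_INTERVAL_SECS = 30,
    MODEL_MAX_PATH_RETRANS = 5,
    MODEL_RTO_MAX_SECS     = 60
};

/* hb_interval * max_path_retrans + rto_max, as quoted in the commit. */
unsigned int model_established_timeout_secs(void)
{
    return MODEL_HB_INTERVAL_SECS * MODEL_MAX_PATH_RETRANS
           + MODEL_RTO_MAX_SECS;
}
```

With these inputs the result is 30 * 5 + 60 = 210 seconds, compared with the 5 days that the old ESTABLISHED entry could linger.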
    • Revert "netfilter: conntrack: add sctp DATA_SENT state" · 13bd9b31
      Sriram Yagnaraman authored
      
      This reverts commit bff3d053 ("netfilter: conntrack: add sctp
      DATA_SENT state").
      
      Using DATA/SACK to detect a new connection on secondary/alternate paths
      works only on new connections, while a HEARTBEAT is required on
      connection re-use. It is probably more consistent to wait for a
      HEARTBEAT in both cases before creating a secondary connection in
      conntrack.
      
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      13bd9b31
    • netfilter: conntrack: fix bug in for_each_sctp_chunk · 98ee0077
      Sriram Yagnaraman authored
      
      skb_header_pointer() will return NULL if offset + sizeof(_sch) exceeds
      skb->len, so the offset < skb->len test is redundant.
      
      If sch->length == 0, the offset never advances and the loop never
      terminates. Add a check for sch->length > 0.
      
      Fixes: 9fb9cbb1 ("[NETFILTER]: Add nf_conntrack subsystem.")
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      98ee0077
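The loop hazard described in the commit above can be shown with a user-space chunk walker over a plain byte buffer. This is a sketch under assumptions, not the kernel macro: it assumes the standard SCTP chunk header layout (1-byte type, 1-byte flags, 2-byte big-endian length, chunks padded to 4 bytes), and `model_count_chunks` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the chunks in buf. The walk stops when fewer than 4 header bytes
 * remain, when a chunk claims more bytes than are left, or when a chunk
 * reports length 0. Without the zero-length check, the offset would never
 * advance and the loop would spin forever on such a chunk.
 */
size_t model_count_chunks(const uint8_t *buf, size_t len)
{
    size_t offset = 0;
    size_t count = 0;

    while (len - offset >= 4) {
        uint16_t chunk_len =
            (uint16_t)((buf[offset + 2] << 8) | buf[offset + 3]);
        size_t padded;

        if (chunk_len == 0)
            break;              /* malformed chunk: refuse to loop on it */

        count++;
        padded = ((size_t)chunk_len + 3) & ~(size_t)3;
        if (padded > len - offset)
            break;              /* chunk body is truncated */
        offset += padded;
    }
    return count;
}
```

A buffer holding a 4-byte chunk followed by a zero-length chunk yields a count of 1 and returns promptly, which is the behaviour the added check is meant to guarantee.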
    • netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE · a9993591
      Sriram Yagnaraman authored
      
      RFC 9260, Sec 8.5.1 states that for ABORT/SHUTDOWN_COMPLETE, the chunk
      MUST be accepted if the vtag of the packet matches its own tag and the
      T bit is not set OR if it is set to its peer's vtag and the T bit is set
      in chunk flags. Otherwise the packet MUST be silently dropped.
      
      Update vtag verification for ABORT/SHUTDOWN_COMPLETE based on the above
      description.
      
      Fixes: 9fb9cbb1 ("[NETFILTER]: Add nf_conntrack subsystem.")
      Signed-off-by: Sriram Yagnaraman <sriram.yagnaraman@est.tech>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      a9993591
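The acceptance rule quoted from RFC 9260 in the commit above reduces to a two-way choice on the T bit. The function below is a hedged illustration rather than the conntrack code: `model_vtag_ok` and its parameters are hypothetical, and the T bit is taken to be the low-order bit of the chunk flags.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CHUNK_FLAG_T 0x01u    /* T bit: low-order bit of chunk flags */

/*
 * Decide whether an ABORT or SHUTDOWN_COMPLETE chunk is acceptable:
 *   T bit clear -> the packet's vtag must equal the receiver's own tag
 *   T bit set   -> the packet's vtag must equal the peer's tag
 * Anything else is to be silently dropped.
 */
bool model_vtag_ok(uint32_t pkt_vtag, uint8_t chunk_flags,
                   uint32_t own_vtag, uint32_t peer_vtag)
{
    bool t_bit = (chunk_flags & MODEL_CHUNK_FLAG_T) != 0;

    return t_bit ? pkt_vtag == peer_vtag : pkt_vtag == own_vtag;
}
```

For example, with an own tag of 10 and a peer tag of 20, a packet carrying vtag 20 is accepted only when the T bit is set.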
    • ipv4: prevent potential spectre v1 gadget in fib_metrics_match() · 5e9398a2
      Eric Dumazet authored
      
          if (!type)
              continue;
          if (type > RTAX_MAX)
              return false;
          ...
          fi_val = fi->fib_metrics->metrics[type - 1];
      
      Since @type is used as an array index after the bounds check, we need
      to prevent CPU speculation past that check or risk leaking kernel
      memory contents.
      
      Fixes: 5f9ae3d9 ("ipv4: do metrics match when looking up and deleting a route")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20230120133140.3624204-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5e9398a2
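The idea behind hardening an index after its bounds check, as in the commit above, is to clamp the value with arithmetic instead of relying only on a branch the CPU may speculate past. The kernel offers the array_index_nospec() helper for this; the sketch below is a user-space approximation of that masking trick, with hypothetical names, and it gives no speculation guarantee of its own because an ordinary compiler is free to reintroduce branches. It assumes size is non-zero, both values fit in a signed long, and right-shifting a negative long is arithmetic (true for GCC and Clang).

```c
#include <limits.h>

/*
 * Return all ones when index < size and 0 otherwise, without a
 * conditional branch: the sign bit of (index | (size - 1 - index)) is
 * set exactly when index is out of range.
 */
static unsigned long model_index_mask(unsigned long index,
                                      unsigned long size)
{
    long in_range = ~(long)(index | (size - 1UL - index));

    return (unsigned long)(in_range >> (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp index to [0, size): out-of-range values collapse to 0. */
unsigned long model_clamp_index(unsigned long index, unsigned long size)
{
    return index & model_index_mask(index, size);
}
```

In this model an in-range index passes through unchanged, while any out-of-range index becomes 0, so even a mispredicted bounds check cannot steer the subsequent load outside the array.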