    mm, compaction: finish pageblocks on complete migration failure · cfccd2e6
    Mel Gorman authored
    Commit 7efc3b72 ("mm/compaction: fix set skip in
    fast_find_migrateblock") addressed an issue where a pageblock selected by
    fast_find_migrateblock() was ignored.  Unfortunately, the same fix
    resulted in numerous reports of khugepaged or kcompactd stalling for long
    periods of time or consuming 100% of CPU.
    
    Tracing showed that there was a lot of rescanning between a small subset
    of pageblocks because the conditions for marking the block skip are not
    met.  The scan is not reaching the end of the pageblock because enough
    pages were isolated but none were migrated successfully.  Eventually it
    circles back to the same block.
    
    Pageblock skip tracking tries to minimise both latency and excessive
    scanning but tracking exactly when a block is fully scanned requires an
    excessive amount of state.  This patch forcibly rescans a pageblock when
    all isolated pages fail to migrate even though it could be for transient
    reasons such as page writeback or page dirty.  This will sometimes migrate
    too many pages but pageblocks will be marked skip and forward progress
    will be made.
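
    The loop described above can be illustrated with a toy model.  This is
    only a sketch under stated assumptions, not the kernel implementation:
    Block, compact() and finish_on_failure are hypothetical names, each
    block either always migrates or never does, and the scanner always
    returns to the first block that is not marked skip.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int            # starting PFN of the pageblock (illustrative only)
    migratable: bool      # False models a block whose isolated pages never migrate
    skip: bool = False    # models the pageblock skip hint


def compact(blocks, finish_on_failure, max_visits=1000):
    """Return how many times each block was selected for scanning.

    Hypothetical model: a block is marked skip only when its scan completes.
    Without finish_on_failure, a block whose pages all fail to migrate is
    left partially scanned, never marked skip, and is selected again.
    """
    visits = {b.start: 0 for b in blocks}
    total = 0
    while total < max_visits:
        candidates = [b for b in blocks if not b.skip]
        if not candidates:
            break                      # every block is marked skip: forward progress
        block = candidates[0]          # the scanner circles back to the same block
        visits[block.start] += 1
        total += 1
        if block.migratable or finish_on_failure:
            # Either migration succeeded and the scan ran to the end of the
            # block, or the block was forcibly finished after a complete
            # migration failure. In both cases it is marked skip.
            block.skip = True
    return visits
```

    In this model, with finish_on_failure=False a single non-migratable
    block absorbs every visit until max_visits is reached.  With
    finish_on_failure=True each block is visited exactly once.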
    
    "Usemem" from the mmtests configuration
    workload-usemem-stress-numa-compact was used to stress compaction.  The
    compaction trace events were recorded using a 6.2-rc5 kernel that includes
    commit 7efc3b72 and the number of times each unique range was scanned was
    counted.  The top 5 ranges were
    
       3076 range=(0x10ca00-0x10cc00)
       3076 range=(0x110a00-0x110c00)
       3098 range=(0x13b600-0x13b800)
       3104 range=(0x141c00-0x141e00)
      11424 range=(0x11b600-0x11b800)
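
    A count like the one above could be produced with a short script.  This
    is a minimal sketch that assumes each recorded trace line carries a
    range=(0x...-0x...) token in the form shown in the listing; the helper
    name top_ranges() is hypothetical and the real trace line layout is not
    reproduced here.

```python
import re
from collections import Counter

# Matches the range token as it appears in the listing, e.g.
# range=(0x10ca00-0x10cc00). The rest of the line is ignored.
RANGE_RE = re.compile(r"range=\((0x[0-9a-f]+)-(0x[0-9a-f]+)\)")


def top_ranges(lines, n=5):
    """Count occurrences of each (start, end) range and return the n most
    frequent as ((start, end), count) pairs, least frequent first."""
    counts = Counter()
    for line in lines:
        match = RANGE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    # most_common() sorts by descending count; reverse it so the busiest
    # range is listed last, matching the ordering of the listing.
    return list(reversed(counts.most_common(n)))
```

    Lines without a range token are skipped, so the function can be fed an
    unfiltered trace.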
    
    While this workload is very different from those in the bug reports, the
    same pattern of a small subset of blocks being repeatedly scanned is
    observed.  At one point, *only* the range (0x11b600-0x11b800) was scanned
    for 2 seconds.  14 seconds passed between the first migration-related
    event and the last.
    
    With the series applied including this patch, the top 5 ranges were
    
          1 range=(0x11607e-0x116200)
          1 range=(0x116200-0x116278)
          1 range=(0x116278-0x116400)
          1 range=(0x116400-0x116424)
          1 range=(0x116424-0x116600)
    
    Only unique ranges were scanned and the time between the first
    migration-related event and the last was 0.11 milliseconds.
    
    Link: https://lkml.kernel.org/r/20230125134434.18017-5-mgorman@techsingularity.net
    
    
    Fixes: 7efc3b72 ("mm/compaction: fix set skip in fast_find_migrateblock")
    Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
    Cc: Chuyi Zhou <zhouchuyi@bytedance.com>
    Cc: Jiri Slaby <jirislaby@kernel.org>
    Cc: Maxim Levitsky <mlevitsk@redhat.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Cc: Pedro Falcato <pedro.falcato@gmail.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>