- Apr 06, 2023
-
-
James Morse authored
To expose iommu_groups via the resctrl filesystem, the resctrl driver needs to be able to walk the list of iommu_groups. These are exposed via sysfs as a kset. Add kset_get_next_obj() to allow resctrl to walk the kobjects in the kset. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
ARM SMMUs with MPAM support are able to mark streams of traffic with the QoS labels MPAM uses. The user-space interface for MPAM is the resctrl filesystem, which allows threads to be moved between groups; it's natural to do the same for iommu_groups. The resctrl interface lists threads, so it will also need to list iommu_groups, which means walking the list of iommu_groups. To ensure this matches what user-space sees via sysfs, it is best to walk the kobjects. Add iommu_group_get_kset() to allow resctrl to retrieve the set of iommu_groups. Split the kobject-to-group code out of iommu_group_get_by_id() and expose it as iommu_group_get_from_kobj(), to allow resctrl to get the iommu_group from the kobject it already has when walking. Finally, add iommu_group_get_ops() to allow the iommu ops for a group to be retrieved. The MPAM driver will use this with the iommu_group from resctrl to call the get/set methods provided by the iommu. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
To allow an iommu_group to be moved between resctrl groups as if it were a CPU thread, the mpam driver needs to be able to set the partid and pmg for the iommu_group. Use the properties in the STE, as these only apply to one stream. The MPAM driver also needs to know the maximum partid and pmg values that the SMMU can generate. This allows it to determine the system-wide common supported range of values. Add a helper to return this id register. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Traffic in the system can be tagged with a PARTID and PMG. Different requestors can support a different number of bits for these fields. Before MPAM can be used, the MPAM driver has to discover the minimum number of bits supported by any requestor, which affects the range of PARTID and PMG that can be used. Detect whether the SMMU supports MPAM. If it does, provide the MPAM driver with the maximum PARTID and PMG values. Signed-off-by:
James Morse <james.morse@arm.com>
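The discovery step described above amounts to taking the minimum over every requestor's maximum. A minimal user-space sketch, assuming a hypothetical `struct requestor_limits` (the type and function names are illustrative and not taken from the MPAM driver):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-requestor limits (a CPU, an SMMU, ...). */
struct requestor_limits {
	uint16_t partid_max;	/* largest PARTID the requestor can generate */
	uint8_t pmg_max;	/* largest PMG the requestor can generate */
};

/* Return the limits that every requestor in the array can honour. */
static struct requestor_limits common_limits(const struct requestor_limits *reqs,
					     size_t n)
{
	struct requestor_limits sys = { UINT16_MAX, UINT8_MAX };

	assert(reqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].partid_max < sys.partid_max)
			sys.partid_max = reqs[i].partid_max;
		if (reqs[i].pmg_max < sys.pmg_max)
			sys.pmg_max = reqs[i].pmg_max;
	}

	return sys;
}
```

A requestor that reports smaller values than the others narrows the usable range for the whole system, which is why the SMMU has to report its limits before MPAM is enabled.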
-
James Morse authored
Resctrl allows tasks to be grouped into control groups (a totally separate interface to cgroups), each of which has a configuration policy that affects the task's prioritisation in the cache, or the memory bandwidth available when the task is running. The resctrl tasks file is used to assign threads to a control group. New threads inherit the resctrl settings from their parent. If a process is continually creating new threads, it can create new processes while user-space is assigning the threads to the control group. This means user-space has to continually poll the list of threads when it wants to move a process between control groups. Another level of abstraction would help, so that a group of threads can be moved between control groups in one go. New threads should inherit the new value, even if their parent hasn't been updated yet. Add a new cgroup controller named resctrl. This allows a cgroup to be labelled with the 'id' of a resctrl control or monitor group, which is used to look up the closid and rmid. New processes inherit the setting from the cgroup, meaning that new processes created after the label is changed use the new label. The relative path of the resctrl group is provided for convenience. TODO: turns out cgroups now supports threads, how does that work? TODO: migration is a thing in cgroups - what is that about? Signed-off-by:
James Morse <james.morse@arm.com> N.B. This uses the id for writes in preference to the path as perf is already restricted to a u64 for the configuration.
-
James Morse authored
Control and monitor groups have a CLOSID and/or RMID that is used to count the cache usage and memory bandwidth of tasks in this group. Not all of MPAM's counters can be exposed via resctrl, as each counter also needs a monitor to be allocated. It is unlikely there are enough monitors for every RMID to have a monitor permanently allocated. To allow counters to be read via perf, the RMID that a control or monitor group is using needs exposing to user-space. This can be passed back to perf as a parameter. MPAM's PMG values are not unique, so the PARTID needs to be provided too. Perf allows a number of u64 arguments, which is not enough to encode a control/monitor group name. Similarly, there has been some interest in allowing cgroup to manage the tasks file for resctrl. Exposing a unique identifier for each control or monitor group will allow cgroups to point to a resctrl group that holds its configuration. Provide a file in each control or monitor group that returns a unique identifier. When passed back to the kernel, resctrl can decode this into a closid/rmid, or just identify the control or monitor group. The value is xor'd with a value picked at boot as obfuscation. This is to prevent user-space from relying on the layout of this field, or re-using values between boots of the system, and allows the kernel to change the layout of this field in the future. Signed-off-by:
James Morse <james.morse@arm.com>
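The obfuscation described above can be sketched as follows. The field layout (closid in the upper 32 bits, rmid in the lower 32 bits) and the function names are assumptions made for illustration; the commit message deliberately leaves the real layout unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed layout, for illustration only:
 * closid in bits [63:32], rmid in bits [31:0].
 * boot_key stands in for the value picked once at boot.
 */
static uint64_t group_id_encode(uint32_t closid, uint32_t rmid, uint64_t boot_key)
{
	uint64_t raw = ((uint64_t)closid << 32) | rmid;

	return raw ^ boot_key;
}

/* Undo the xor, then split the value back into closid and rmid. */
static void group_id_decode(uint64_t id, uint64_t boot_key,
			    uint32_t *closid, uint32_t *rmid)
{
	uint64_t raw = id ^ boot_key;

	assert(closid != NULL && rmid != NULL);

	*closid = (uint32_t)(raw >> 32);
	*rmid = (uint32_t)raw;
}
```

Because the key changes on each boot, an identifier saved from a previous boot decodes to unrelated values, which discourages user-space from caching or interpreting it.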
-
James Morse authored
Some later things in the MPAM tree enable behaviour that resctrl doesn't have upstream. To make it clear to people using the out-of-tree code that they shouldn't be relying on this in user-space, add a mount option to enable this stuff. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
resctrl's limbo code needs to be told when the data left in a cache is small enough for the partid+pmg value to be re-allocated. x86 uses the cache size divided by the number of rmid users the cache may have. Do the same, but for the smallest cache, and with the number of partid-and-pmg users. Querying the cache size can't happen until after cacheinfo_sysfs_init() has run, so mpam_resctrl_setup() must wait until then. Signed-off-by:
James Morse <james.morse@arm.com>
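The threshold arithmetic described above can be sketched as below. The function name and parameters are hypothetical, and counting the users as the product of the PARTID and PMG counts is an assumption about what "partid-and-pmg users" means.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Occupancy threshold in bytes below which a partid+pmg pair is treated as
 * clean enough to re-allocate: smallest cache size / number of users.
 * Returns 0 when there is nothing to divide.
 */
static uint64_t limbo_threshold_bytes(const uint64_t *cache_sizes, size_t n,
				      uint32_t num_partid, uint32_t num_pmg)
{
	uint64_t users = (uint64_t)num_partid * num_pmg;
	uint64_t smallest = UINT64_MAX;

	assert(cache_sizes != NULL || n == 0);

	if (n == 0 || users == 0)
		return 0;

	for (size_t i = 0; i < n; i++) {
		if (cache_sizes[i] < smallest)
			smallest = cache_sizes[i];
	}

	return smallest / users;
}
```

With a 1 MiB cache and a 512 KiB cache, 16 PARTIDs and 4 PMGs, this sketch yields 512 KiB / 64 = 8 KiB.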
-
James Morse authored
All of MPAM's errors indicate a software bug, e.g. an out-of-bounds partid has been generated. When this happens, the mpam driver is disabled. If resctrl_init() succeeded, also call resctrl_exit() to remove resctrl. If the filesystem was mounted in its traditional place, it is no longer possible for processes to find it as the mount point has been removed. If the filesystem was mounted elsewhere, it will appear that all CPUs and domains are offline. User-space will not be able to update the hardware. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Now that mpam links against resctrl, call the cpu and domain online/offline calls at the appropriate point. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Enough MPAM support is present to enable ARCH_HAS_CPU_RESCTRL. Let it rip^Wlink! Remove the temporary resctrl_mon_ctx_waiters that was previously used to hide a link error. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
resctrl expects RDT like counters that are free running. MPAM's counters don't behave like this as they need a monitor to be allocated first. Provide the helper that says whether free running counters are supported. Subsequent patches will make this more intelligent. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
resctrl has individual hooks to separately enable and disable the closid/partid and rmid/pmg context switching code. For MPAM this is all the same thing, as the value in struct task_struct is used to cache the value that should be written to hardware. arm64's context switching code is enabled once MPAM is usable, but doesn't touch the hardware unless the value has changed. Resctrl doesn't need to ask. Add empty definitions for these hooks. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Pseudo lock isn't supported on arm64. Add empty definitions of the functions arm64 doesn't implement. Because the Kconfig option is not selected, none of these will be called. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
-
James Morse authored
resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation means the counter may need reading twice to get both the I and D side allocations. The same goes for reset. Add the rounding helper for checking monitor values while we're here. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
When resctrl wants to read a domain's 'QOS_L3_OCCUP', it needs to allocate a monitor on the corresponding resource. Monitors are allocated by class instead of component because any per-component user needs to have pre-emption disabled to avoid being migrated to another CPU. Add helpers to do this. This patch temporarily creates resctrl_mon_ctx_waiters as the resctrl version can't be selected until it will link. This gets removed in a later patch. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
If the mpam class that was picked to be the L3 resctrl control has bandwidth counters, enable mbm_local. We don't have any topology information to know how these counters interact with numa. TODO: try and work it out. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
resctrl supports 'MB', as a percentage throttling of traffic somewhere after the L3. This is the control that mba_sc uses, so ideally the class chosen should be as close as possible to the counters used for mba_local. MB's percentage control can be backed either with the fixed point fraction MBW_MAX or the bandwidth portion bitmap. Add helpers to convert to/from percentages. One problem here is the value written is not the same as the value read back. This is deliberately made visible to user-space. Another is the MBW_MAX fixed point fraction can't represent 100%. This is also exposed to user-space, as otherwise the values for a single-bit system are 100%, 0%, instead of 50%, 0%. The way CDP is emulated means MB controls need programming twice by the resctrl glue, as the bandwidth controls can be applied independently for data or instruction-fetch. This isn't how x86 behaves, and neither user-space nor resctrl support it. CC: Amit Singh Tomar <amitsinght@marvell.com> Signed-off-by:
James Morse <james.morse@arm.com>
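The two quirks mentioned above (values that do not read back as written, and no way to express 100%) fall out of treating MBW_MAX as a fixed-point fraction. The sketch below assumes the fraction is `value / 2^wd` for a field `wd` bits wide and uses simple truncating arithmetic; the function names and rounding are illustrative, not the driver's.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a wd-bit fixed-point fraction (val / 2^wd) to a percentage. */
static uint32_t mbw_max_to_percent(uint32_t val, unsigned int wd)
{
	assert(wd >= 1 && wd <= 16);

	return (uint32_t)(((uint64_t)val * 100) >> wd);
}

/* Convert a percentage to the nearest fraction that does not exceed it. */
static uint32_t percent_to_mbw_max(uint32_t pc, unsigned int wd)
{
	uint64_t max, val;

	assert(wd >= 1 && wd <= 16);

	max = (1ULL << wd) - 1;		/* all ones: just under 100% */
	val = ((uint64_t)pc << wd) / 100;

	return (uint32_t)(val > max ? max : val);
}
```

Under these assumptions a single-bit field can only express 0% and 50%, and with an 8-bit field a request for 30% is stored as 76/256 and reads back as 29%.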
-
James Morse authored
Because MPAM's pmg aren't identical to RDT's rmid, resctrl handles some data structures by index. This allows x86 to map indexes to RMID, and MPAM to map them to partid-and-pmg. Add the helpers to do this. Signed-off-by:
James Morse <james.morse@arm.com>
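One way to build such an index is to give each partid its own contiguous block of pmg values. The encoding below is an assumption chosen for illustration (the commit message does not state the driver's encoding), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Map a (partid, pmg) pair to a flat index; pmg_count is pmg_max + 1. */
static uint32_t idx_encode(uint32_t partid, uint32_t pmg, uint32_t pmg_count)
{
	assert(pmg_count > 0 && pmg < pmg_count);

	return partid * pmg_count + pmg;
}

/* Recover the (partid, pmg) pair from a flat index. */
static void idx_decode(uint32_t idx, uint32_t pmg_count,
		       uint32_t *partid, uint32_t *pmg)
{
	assert(pmg_count > 0 && partid != NULL && pmg != NULL);

	*partid = idx / pmg_count;
	*pmg = idx % pmg_count;
}
```

With four pmg values per partid, partid 5 with pmg 0 and partid 6 with pmg 0 land on different indexes (20 and 24), reflecting that the same pmg number under two partids identifies two different things.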
-
James Morse authored
Care must be taken when modifying the partid and pmg of a task, as writing these values may race with the task being scheduled in, and reading the modified values. Add helpers to set the task properties, and the cpu default value, and add the plumbing to the mpam driver that lets resctrl use them. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Intel RDT's CDP feature allows the cache to use a different control value depending on whether the access was for instruction fetch or a data access. MPAM's equivalent feature is the other way up: the CPU assigns a different partid label to traffic depending on whether it was instruction fetch or a data access, which causes the cache to use a different control value based solely on the partid. MPAM can emulate CDP, with the side effect that the alternative partid is seen by all caches; it can't be enabled per-cache. Add the resctrl hooks to turn this on or off. Add the helpers that match a closid against a task, which need to be aware that the value written to hardware is not the same as the one resctrl is using. Signed-off-by:
James Morse <james.morse@arm.com>
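A sketch of why the hardware value differs from resctrl's closid once CDP is emulated. The mapping used here (each closid owns a pair of partids, even for data and odd for instruction fetch) is an assumption for illustration; the commit message does not spell out the actual mapping, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum access_type { ACCESS_DATA = 0, ACCESS_INSTR = 1 };

/* Translate resctrl's closid into the partid the CPU would emit. */
static uint32_t closid_to_partid(uint32_t closid, enum access_type type,
				 bool cdp_enabled)
{
	assert(type == ACCESS_DATA || type == ACCESS_INSTR);

	if (!cdp_enabled)
		return closid;

	/* Assumed pairing: 2 * closid for data, 2 * closid + 1 for instructions. */
	return closid * 2 + (uint32_t)type;
}

/* Check whether a hardware partid belongs to the given closid. */
static bool partid_matches_closid(uint32_t hw_partid, uint32_t closid,
				  bool cdp_enabled)
{
	if (cdp_enabled)
		hw_partid /= 2;

	return hw_partid == closid;
}
```

A comparison helper of this kind has to know whether CDP is enabled, because the same hardware partid maps to different closids in the two modes.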
-
James Morse authored
MPAM has a system register that is used to hold the partid and pmg values that traffic generated by EL0 will use. This can be set per-task by the resctrl file system. Add a helper to switch this. resctrl expects a 'default' value to be used in preference if the default partid and pmg are selected. struct task_struct's separate closid and rmid fields are insufficient to implement resctrl using MPAM, as resctrl can change the partid (closid) and pmg (sort of like the rmid) separately. On x86, the rmid is an independent number, so a race that writes a mismatched closid and rmid into hardware is benign. On arm64, the pmg bits extend the partid. (i.e. partid-5 has a pmg-0 that is not the same as partid-6's pmg-0). In this case, mismatching the values will 'dirty' a pmg value that resctrl believes is clean, and is not tracking with its 'limbo' code. To avoid this, the partid and pmg are always read and written as a pair. Instead of making struct task_struct's closid and rmid fields an endian-unsafe union, add the value to struct thread_info and always use READ_ONCE()/WRITE_ONCE() when accessing this field. CC: Amit Singh Tomar <amitsinght@marvell.com> Signed-off-by:
James Morse <james.morse@arm.com>
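The "always read and written as a pair" rule can be illustrated by packing both fields into one word that is accessed with a single load or store. This is a user-space sketch: C11 relaxed atomics stand in for the kernel's READ_ONCE()/WRITE_ONCE(), and the field widths and names are assumptions rather than the layout used in struct thread_info.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread field holding both values. */
struct thread_tag {
	_Atomic uint32_t partid_pmg;	/* assumed: partid [15:0], pmg [23:16] */
};

/* Publish a new partid and pmg with one store, so they cannot be mismatched. */
static void tag_set(struct thread_tag *t, uint16_t partid, uint8_t pmg)
{
	uint32_t v = (uint32_t)partid | ((uint32_t)pmg << 16);

	assert(t != NULL);
	atomic_store_explicit(&t->partid_pmg, v, memory_order_relaxed);
}

/* Fetch both values with one load, then unpack them. */
static void tag_get(struct thread_tag *t, uint16_t *partid, uint8_t *pmg)
{
	uint32_t v;

	assert(t != NULL && partid != NULL && pmg != NULL);
	v = atomic_load_explicit(&t->partid_pmg, memory_order_relaxed);

	*partid = (uint16_t)(v & 0xffff);
	*pmg = (uint8_t)(v >> 16);
}
```

A reader racing with an update sees either the old pair or the new pair, never partid-6 combined with a pmg that was only ever valid for partid-5.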
-
James Morse authored
resctrl has two helpers for updating the configuration. resctrl_arch_update_one() updates a single value, and is used by the software-controller to apply feedback to the bandwidth controls; it has to be called on one of the CPUs in the resctrl:domain. resctrl_arch_update_domains() copies multiple staged configurations; it can be called from anywhere. Both helpers should update any changes to the underlying hardware. Implement resctrl_arch_update_domains() to use resctrl_arch_update_one(), which doesn't depend on being called on the right CPU. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Implement resctrl_arch_get_config() by testing the configuration for a CPOR bitmap. For any other configuration type return the default. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
We already have a helper for resetting an mpam class. Hook it up to resctrl_arch_reset_resources(). Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
After the changes to resctrl to support MPAM, num_rmid is only used as a value that is unfortunately exposed to user-space. For MPAM, this value doesn't mean anything, and whatever value we do expose will be wrong for some use cases. User-space may expect it can use this value to know how many 'extra' monitor groups it can create. e.g. on x86 if num_closid=4, num_rmid=8, then a total of 4 monitor groups can be created. If num_rmid were 2, then only 2 control groups could be created. For MPAM the number of pmg is very likely to be smaller than the number of partid, but this doesn't restrict the creation of control groups, as each control group has its own pmg space. Pick 1 if monitoring is supported. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Systems with MPAM support may have a variety of control types at any point of their system layout. We can only expose certain types of control, and only if they exist at particular locations. Start with the well-known caches. These have to be depth 2 or 3 and support MPAM's cache portion bitmap controls, with a number of portions fewer than resctrl's limit. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
resctrl has its own data structures to describe its resources. We can't use these directly as we play tricks with the 'MBA' resource, picking the MPAM controls or monitors that best apply. We may export the same component as both L3 and MBA. Add mpam_resctrl_exports[] as the array of class->resctrl mappings we are exporting, and add the cpuhp hooks that allocate and free the resctrl domain structures. While we're here, plumb in a few other obvious things. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
resctrl expects to reset the bandwidth counters when the filesystem is mounted. To allow this, add a helper that clears the saved mbwu state. Instead of cross calling to each CPU that can access the component MSC to write to the counter, set a flag that causes it to be zero'd on the next read. This is easily done by forcing a configuration update. Signed-off-by:
James Morse <james.morse@arm.com>
-
Rohit Mathew authored
The ida framework is used to allocate and free CSU/MBWU monitors as and when they are requested. "ida_alloc_range" is used to get the next free ID from a range of IDs that are not in use. The function takes the minimum and maximum of the range (both inclusive) so that a free ID can be allocated from the range. The value returned is later programmed into the MSMON_CFG_MON_SEL register. Both CSU and MBWU monitors range from 0 to NUM_MON - 1. As of now, "ida_alloc_range" could very well return the upper limit NUM_MON for both CSU and MBWU monitors, which could lead to an out-of-bounds monitor access. Fix this issue by allocating monitors for both CSU and MBWU from 0 to NUM_MON - 1. Signed-off-by:
Rohit Mathew <rohit.mathew@arm.com> Signed-off-by:
James Morse <james.morse@arm.com>
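The off-by-one is easiest to see with a small inclusive-range allocator. The sketch below is a user-space stand-in built on a 64-bit bitmap, not the kernel's ida implementation; `id_alloc_range()` and `mon_alloc()` are hypothetical names, and only the inclusive `[min, max]` convention is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Allocate the lowest free id in [min, max], both inclusive.
 * Returns the id, or -1 when the range is exhausted.
 */
static int id_alloc_range(uint64_t *used, unsigned int min, unsigned int max)
{
	assert(used != NULL && max < 64);

	for (unsigned int id = min; id <= max; id++) {
		if (!(*used & (1ULL << id))) {
			*used |= 1ULL << id;
			return (int)id;
		}
	}

	return -1;
}

/* Monitors are numbered 0 .. num_mon - 1, so the inclusive max is num_mon - 1. */
static int mon_alloc(uint64_t *used, unsigned int num_mon)
{
	if (num_mon == 0)
		return -1;

	return id_alloc_range(used, 0, num_mon - 1);
}
```

Passing `num_mon` itself as the inclusive maximum would eventually hand out an id equal to `num_mon`, one past the last valid monitor.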
-
Rohit Mathew authored
If the 44 bit (long) or 63 bit (LWD) counters are detected on probing the RIS, use long/LWD counter instead of the regular 31 bit mbwu counter. Only 32bit accesses to the MSC are required to be supported by the spec, but these registers are 64bits. The lower half may overflow into the higher half between two 32bit reads. To avoid this, use a helper that reads the top half twice to check for overflow. Signed-off-by:
Rohit Mathew <rohit.mathew@arm.com> [morse: merged multiple patches from Rohit] Signed-off-by:
James Morse <james.morse@arm.com>
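Reading the top half twice can be sketched as a high-low-high loop. The code below is a user-space illustration: the `read32_fn` callback, the little-endian register layout (low word at `offset`, high word at `offset + 4`), and the simulated counter are all assumptions added so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*read32_fn)(void *ctx, unsigned int offset);

/* Read a 64-bit counter using only 32-bit accesses, retrying on carry. */
static uint64_t read64_split(read32_fn rd, void *ctx, unsigned int offset)
{
	uint32_t hi, lo, hi_again;

	assert(rd != NULL);

	hi_again = rd(ctx, offset + 4);
	do {
		hi = hi_again;
		lo = rd(ctx, offset);
		hi_again = rd(ctx, offset + 4);
	} while (hi != hi_again);	/* high half moved: low half may be stale */

	return ((uint64_t)hi << 32) | lo;
}

/* Simulated free-running counter that advances by 'step' after every access. */
struct fake_counter {
	uint64_t value;
	uint64_t step;
};

static uint32_t fake_read32(void *ctx, unsigned int offset)
{
	struct fake_counter *c = ctx;
	uint32_t half = offset ? (uint32_t)(c->value >> 32) : (uint32_t)c->value;

	c->value += c->step;
	return half;
}
```

If the simulated counter starts at 0xFFFFFFFF and ticks on every access, a naive high-then-low read would return 0, while the loop notices the high half changed and retries.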
-
Rohit Mathew authored
mpam v0.1 and versions above v1.0 support an optional long counter for memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields indicating support for long counters. As of now, a 44 bit counter represented by the HAS_LONG field (bit 30) and a 63 bit counter represented by LWD (bit 29) can be optionally integrated. Probe for these counters and set the corresponding feature bits if any of these counters are present. Signed-off-by:
Rohit Mathew <rohit.mathew@arm.com> Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Bandwidth counters need to run continuously to correctly reflect the bandwidth. The value read may be lower than the previous value read in the case of overflow and when the hardware is reset due to CPU hotplug. Add struct mbwu_state to track the bandwidth counter to allow overflow and power management to be handled. Signed-off-by:
James Morse <james.morse@arm.com>
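The bookkeeping described above can be sketched as a running total that is advanced by the wrapped difference between successive raw reads. `struct mbwu_track` and its helpers are hypothetical names for this illustration, not the fields of the driver's struct mbwu_state, and the sketch assumes the counter wraps at most once between two reads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software state kept alongside one hardware counter. */
struct mbwu_track {
	uint64_t prev_raw;	/* last raw value read from hardware */
	uint64_t total;		/* accumulated bytes, never goes backwards */
};

/* Fold a new raw reading of a 'width'-bit counter into the running total. */
static uint64_t mbwu_update(struct mbwu_track *s, uint64_t raw, unsigned int width)
{
	uint64_t mask;

	assert(s != NULL && width >= 1 && width <= 64);

	mask = (width == 64) ? UINT64_MAX : ((1ULL << width) - 1);

	/* Modular subtraction gives the right delta across a single wrap. */
	s->total += (raw - s->prev_raw) & mask;
	s->prev_raw = raw;

	return s->total;
}

/* Call when the hardware counter was reset, e.g. after CPU hotplug. */
static void mbwu_hw_reset(struct mbwu_track *s)
{
	assert(s != NULL);
	s->prev_raw = 0;
}
```

With a 31-bit counter, a raw reading of 5 following 0x7fffffff is counted as six more bytes rather than a drop, and a reset only rewinds the raw baseline while the total is preserved.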
-
- Mar 20, 2023
-
-
James Morse authored
Reading a monitor involves configuring what you want to monitor, and reading the value. Components made up of multiple MSC may need values from each MSC. MSCs may take time to configure, returning 'not ready'. The maximum 'not ready' time should have been provided by firmware. Add mpam_msmon_read() to hide all this. If (one of) the MSC returns not ready, then wait the full timeout value before trying again. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
MPAM's MSC support a number of monitors, each of which supports bandwidth counters, or cache-storage-utilisation counters. To use a counter, a monitor needs to be configured. Add helpers to allocate and free CSU or MBWU monitors. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
MPAM supports more features than are going to be exposed to resctrl. For partid other than 0, the reset values of these controls isn't known. Discover the rest of the features so they can be reset to avoid any side effects when resctrl is in use. PARTID narrowing allows MSC/RIS to support less configuration space than is usable. If this feature is found on a class of device we are likely to use, then reduce the partid_max to make it usable. This allows us to map a PARTID to itself. CC: Rohit Mathew <Rohit.Mathew@arm.com> Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
When CPUs come online the original configuration should be restored. Once the maximum partid is known, allocate a configuration array for each component, and reprogram each RIS configuration from this. The MPAM spec describes how multiple controls can interact. To prevent this happening by accident, always reset controls that don't have a valid configuration. This allows the same helper to be used for configuration and reset. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Once all the MSC have been probed, the system wide usable number of PARTID is known and the configuration arrays can be allocated. After this point, checking all the MSC have been probed is pointless, and the cpuhp callbacks should restore the configuration, instead of just resetting the MSC. Add a static key to indicate whether mpam is enabled. Signed-off-by:
James Morse <james.morse@arm.com>
-
James Morse authored
Register and enable error IRQs. All the MPAM error interrupts indicate a software bug, e.g. out of range partid. If the error interrupt is ever signalled, attempt to disable MPAM. Only the irq handler accesses the ESR register, so no locking is needed. The work to disable MPAM after an error needs to happen at process context, use a threaded interrupt. There is no support for percpu threaded interrupts, for now schedule the work to be done from the irq handler. Enabling the IRQs in the MSC may involve cross calling to a CPU that can access the MSC. CC: Rohit Mathew <rohit.mathew@arm.com> Tested-by:
Rohit Mathew <rohit.mathew@arm.com> Signed-off-by:
James Morse <james.morse@arm.com>
-