
9 replies and sub-replies as of Jul 04 2022

My first response is that this is why I prefer looking at saturation (pressure) metrics first, since time spent waiting covers all scenarios (hardware- and software-limited). It's why you start by looking at the S in the USE method.
But for reporting utilization of a half-throttled CPU: I'd report it as 50% and include another metric for the current software limits. In a world of containers, monitoring software needs both hardware and software-capped utilization reported.
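A minimal sketch of reporting both numbers side by side. The function name, its parameters, and the idea of expressing the software limit as a CPU count (e.g. 0.5 for a half-throttled CPU) are assumptions for illustration, not any particular tool's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuUtilization:
    hardware: float         # busy time relative to what the hardware offers
    software_capped: float  # busy time relative to the software limit


def report_utilization(busy_cpu_seconds: float,
                       interval_seconds: float,
                       hardware_cpus: float,
                       limit_cpus: Optional[float] = None) -> CpuUtilization:
    """Return hardware and software-capped utilization as fractions.

    limit_cpus is a hypothetical software cap expressed in CPUs;
    None means no cap is in place.
    """
    if interval_seconds <= 0 or hardware_cpus <= 0:
        raise ValueError("interval and CPU count must be positive")
    if limit_cpus is not None and limit_cpus <= 0:
        raise ValueError("limit_cpus must be positive when given")

    # A cap above the hardware capacity cannot be reached, so clamp it.
    effective_limit = hardware_cpus if limit_cpus is None else min(limit_cpus, hardware_cpus)

    hardware = busy_cpu_seconds / (interval_seconds * hardware_cpus)
    capped = busy_cpu_seconds / (interval_seconds * effective_limit)
    return CpuUtilization(hardware=hardware, software_capped=capped)


# One CPU throttled to half, busy for 5 of the last 10 seconds:
half_throttled = report_utilization(5.0, 10.0, hardware_cpus=1, limit_cpus=0.5)
print(half_throttled)
```

Under these assumptions the half-throttled CPU shows 50% against the hardware and 100% against its cap, which is the pair of numbers a dashboard would need.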
CPU utilization (as measured by the OS) is already problematic without considering stall cycles, so you need to report IPC as well…
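A rough sketch of pairing utilization with IPC. The counter values would come from hardware performance counters (for example, the instructions and cycles counts that `perf stat` prints); the helper names and the 1.0 threshold are illustrative assumptions, not a fixed rule.

```python
def instructions_per_cycle(instructions: int, cycles: int) -> float:
    """IPC = retired instructions / elapsed CPU cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def describe(utilization: float, ipc: float, low_ipc_threshold: float = 1.0) -> str:
    """Report utilization together with IPC.

    low_ipc_threshold is an assumed cut-off; a sensible value depends
    on the processor and the workload.
    """
    hint = "likely stalled" if ipc < low_ipc_threshold else "likely retiring work"
    return f"util={utilization:.0%} ipc={ipc:.2f} ({hint})"


ipc = instructions_per_cycle(instructions=2_000_000, cycles=4_000_000)
print(describe(0.90, ipc))
```

The point is only that a "90% busy" CPU with an IPC of 0.5 tells a different story from one with an IPC of 2.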
With how modern desktop processors scale frequency - e.g., AMD PBO and XFR - there is no actual "100%" frequency. How would you determine the high water mark?
True, as with turbo boost, it makes things painful. If you report the baseline (unscaled) then perf is sometimes weirdly faster. If you report the max possible (highest step) then 100% across all cores is usually unobtainable.
If I had to pick, I'd go with all-core max freq for system-wide CPU utilization. That's pairing system-wide CPU utilization along with its system-wide all-core limit.
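To make the trade-off between reference frequencies concrete, here is a small sketch that scales busy time by clock speed. The three frequencies are made-up numbers for a hypothetical part, and the function name is an assumption.

```python
def frequency_scaled_utilization(busy_fraction: float,
                                 avg_busy_freq_mhz: float,
                                 reference_freq_mhz: float) -> float:
    """Busy fraction weighted by how fast the cores ran relative to a reference."""
    if reference_freq_mhz <= 0:
        raise ValueError("reference frequency must be positive")
    return busy_fraction * avg_busy_freq_mhz / reference_freq_mhz


# Hypothetical CPU: all values below are invented for illustration.
BASE_MHZ = 3000             # baseline (unscaled) frequency
ALL_CORE_MAX_MHZ = 4000     # highest frequency sustainable on all cores
SINGLE_CORE_MAX_MHZ = 5000  # highest boost step, one core only

# Every core fully busy and running at the all-core maximum:
for label, ref in [("base", BASE_MHZ),
                   ("all-core max", ALL_CORE_MAX_MHZ),
                   ("single-core max", SINGLE_CORE_MAX_MHZ)]:
    util = frequency_scaled_utilization(1.0, ALL_CORE_MAX_MHZ, ref)
    print(f"{label:>15}: {util:.0%}")
```

With these invented numbers a fully loaded system reads above 100% against the base frequency, exactly 100% against the all-core maximum, and only 80% against the single-core maximum, which is why the all-core limit pairs most naturally with a system-wide figure.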
We've been down this hole recently, and from an app PoV, I do think "effective utilization" matters. With rlimits, if I'm exhausting my allocated cycles, that's 100%. (Apps may make decisions based on approaching this number.) To your point, I'd report saturated utilization separately.
Though it might not be directly related, it's also important to check whether memory stalls are being counted as part of utilization, as @brendangregg illustrated in one of his articles.
From a UX/UI end-user perspective, 100% should be fixed at the capacity, with the throttled share (say, 20%) shown next to idle, available, and/or steal.