A recent patch to the Linux kernel (via Phoronix) points to an interesting problem with Intel’s nearly-mythical 10nm+ Ice Lake Xeon processors: The CPUs take longer than expected to return to normal frequencies after exiting certain sleep states, which then impacts performance consistency due to ‘uncertain’ CPU clock rates.
The severity of the issue isn’t clear, but if nothing else, it does show that Intel’s work on the Ice Lake Xeon processors continues, albeit with some challenges. Following a report that Intel has been hit by another delay to its server programs, we reached out to the company last week to confirm whether the schedule remains on track. The company responded: “We remain on track to deliver 10nm Ice Lake to customers in 2H20.”
We’ll see. Back to the issue at hand. Processors drop into various C-States (sleep states) to reduce overall power consumption during idle periods. C-States offer different degrees of power savings for each core, with the deepest levels of sleep involving stopping core clocks, flushing caches, and reducing voltage to extract the utmost in power savings. Additionally, Package C-States reduce power and clocks for resources on the CPU package that are shared by all the cores, like fabrics and the uncore.
The deeper the sleep state, the more power each processor can save. However, resuming to full speed from deeper sleep states takes more time than resuming from lighter ones. According to the report, that process seems to take longer than expected with certain power states on the Ice Lake Xeon processors.
An Intel ‘kernel test robot’ posted the patch and explained the issue. As Phoronix points out, the fix comes from an Intel employee, meaning the company likely encountered the issue in its own testing. The explanation of the issue reads as follows:
“On ICX platform, the CPU frequency will slowly ramp up when woken up from C-states deeper than/equals to C1E. Although this feature does save energy in many cases this might also cause unexpected result. For example, workload might get unstable performance due to the uncertainty of CPU frequency. Besides, the CPU frequency might not be locked to specific level when the CPU utilization is low.
“Thus this patch disables C1E auto-promotion and expose C1E as a separate idle state, so that the C1E and C6 can be disabled via sysfs when necessary.”
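The sysfs control the patch refers to is the standard Linux cpuidle interface, where each idle state of each CPU has a `name` file and a writable `disable` file. The following is a minimal sketch of how C1E and C6 might be disabled through it. The helper name `set_idle_state_disabled` and its `root` parameter are hypothetical, and the state names "C1E" and "C6" are an assumption about how the driver labels the states on a given system.

```python
from pathlib import Path

# Standard location of per-CPU sysfs entries on Linux.
CPU_ROOT = Path("/sys/devices/system/cpu")


def set_idle_state_disabled(state_names, disabled=True, root=CPU_ROOT):
    """Write 1 (or 0) to the 'disable' file of every matching idle state.

    state_names: names as reported in each state's 'name' file,
                 e.g. ["C1E", "C6"] (assumed labels, check your system).
    root:        hypothetical override so the helper can be pointed
                 at a test directory instead of the real sysfs tree.
    Returns the list of state directories that were changed.
    """
    wanted = {name.upper() for name in state_names}
    changed = []
    pattern = "cpu[0-9]*/cpuidle/state[0-9]*"
    for state_dir in sorted(Path(root).glob(pattern)):
        name = (state_dir / "name").read_text().strip().upper()
        if name in wanted:
            (state_dir / "disable").write_text("1" if disabled else "0")
            changed.append(str(state_dir))
    return changed

# Example (needs root privileges on a real system):
#   set_idle_state_disabled(["C1E", "C6"])
```

Writing to the real sysfs tree requires root, and the setting does not persist across reboots. Passing `disabled=False` re-enables the states.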
To work around the problem, the system can disable the C1E and C6 states entirely, preventing the chip from entering those deeper sleep states. The engineer elaborates on the problem further:
“Besides C1 and C1E, the exit latency of C6 was measured by a dedicated tool. However the exit latency (41us) exposed by _CST is much smaller than the one we measured (128us). This is probably due to the _CST uses the exit latency when woken up from PC0+C6, rather than PC6+C6 when C6 was measured. Choose the latter as we need the longest latency in theory.”
Here we see that the problem stems from how the exit latency (the amount of time it takes the CPU to pop back to full speed) is measured, and then exposed to the kernel. The ACPI _CST object, which communicates the C-State information to the kernel, lists the latency as measured when the processor was in a PC0+C6 state. That means that one or more cores may be in a C6 sleep state, but the rest of the package (fabric and uncore) is still chugging along at full speed (PC0). In this state, it takes the core only 41us to resume normal operation.
However, when the processor enters the PC6+C6 state, the package also powers down (PC6 state) along with the cores, so it takes longer for the processor to regain its full speed. Intel measured the sleep exit latency in these conditions at 128us, so it appears the kernel is merely being given the wrong sleep exit values.
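For readers who want to see which exit latencies the kernel is actually working with on their own machine, the cpuidle sysfs directory exposes a `latency` file (in microseconds) next to each state's `name`. This is a minimal sketch for reading them. The function name `exposed_exit_latencies` and its `root` parameter are hypothetical, and the values reported depend on which idle driver is in use.

```python
from pathlib import Path


def exposed_exit_latencies(cpu=0, root="/sys/devices/system/cpu"):
    """Return {state name: exit latency in microseconds} for one CPU.

    Reads the 'name' and 'latency' files under
    <root>/cpu<N>/cpuidle/state<M>/. The 'root' argument is a
    hypothetical override used to point the helper at a test tree.
    """
    latencies = {}
    cpuidle_dir = Path(root) / f"cpu{cpu}" / "cpuidle"
    for state_dir in sorted(cpuidle_dir.glob("state[0-9]*")):
        name = (state_dir / "name").read_text().strip()
        latencies[name] = int((state_dir / "latency").read_text().strip())
    return latencies

# Example on a real Linux system:
#   for name, us in exposed_exit_latencies(cpu=0).items():
#       print(f"{name}: {us} us")
```

Comparing the value reported for C6 against a measured figure is the kind of check that surfaced the 41us versus 128us gap described above.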
Just to get an idea of how this differs from other Intel processors, we searched around for the typical sleep exit latency for a Skylake-based processor.
We referred back to an interesting bachelor’s thesis [PDF] by Vladislav Govtva from the Metropolia University of Applied Sciences that was published early last year. He measured the sleep exit latency from several different generations of Intel processors, and above we can see his results with an Intel Xeon Platinum 8170M (Skylake).
Govtva measured the maximum wake latency (the same as exit latency) from a C6 state as ~108us, which is 20us faster than the Ice Lake processor. There are likely differing measurement criteria involved here, but a simple comparison of the number yields an 18.5% increase in sleep exit latency.
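The 18.5% figure is the relative increase from the Skylake measurement to the Ice Lake one. As a quick check of the arithmetic, using only the two numbers quoted above (the function name is our own):

```python
def percent_increase(baseline_us, new_us):
    """Relative increase from baseline to new value, in percent."""
    return (new_us - baseline_us) / baseline_us * 100.0


skylake_c6_us = 108   # Govtva's measured maximum C6 wake latency
ice_lake_c6_us = 128  # Intel's measured PC6+C6 exit latency

print(f"{percent_increase(skylake_c6_us, ice_lake_c6_us):.1f}%")  # 18.5%
```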
It appears Intel has “fixed” the problem by allowing the system to disable certain sleep states under certain conditions, but it’s possible this is just a corner case that won’t apply to many types of applications. We’re reaching out to Intel for further clarification, but given that Ice Lake hasn’t been officially released, we don’t expect to learn much.
It will be interesting to see if Intel continues to tune this parameter further as it works through teething pains. Phoronix posits that the patch could make it into the Linux 5.9 cycle that opens next month, though disabling the deeper sleep states could result in higher power consumption in exchange for more consistent performance.