Avoid False Sharing: 3 Easy Methods to Boost Performance (Part 2)

Summary

Quick Abstract

This summary explores strategies to combat false sharing, a common performance bottleneck in multi-threaded applications. We cover techniques that minimize unnecessary cache invalidations and improve overall throughput.

Quick Takeaways:

  • Padding: Add extra bytes between variables so they reside in different cache lines, preventing them from being invalidated together.

  • Alignas: Use the alignas specifier to align variables to cache-line boundaries, ensuring each occupies its own cache line and minimizing the risk of false sharing.

  • Avoid Adjacent Updates: Accumulate changes locally and write adjacent variables once at the end of a computation, reducing the frequency of cache invalidations.

The lecture explains these three methods — padding, alignas, and avoiding adjacent updates — with a practical example. A problem statement is included, where one can experiment by changing the data types of D1 and D2. The goal is to compare execution times with and without these techniques to demonstrate the impact of false sharing on system performance. The source code is linked in the description.

Understanding and Avoiding False Sharing in Multi-threaded Systems

This article discusses false sharing, a performance issue that can arise in multi-threaded systems when independent data items reside within the same cache line. It then explores three methods to mitigate this problem: Padding, Alignas, and Avoiding Adjacent Updates. Finally, it presents a problem statement for experimenting with these methods.

What is False Sharing?

In multi-threaded environments, false sharing occurs when multiple threads access different, independent data items that happen to reside within the same cache line. Although the threads are not actually sharing data, the cache coherence protocol treats it as if they are, leading to unnecessary cache invalidations and reduced performance.

Imagine two data variables, D1 and D2, residing in the same cache line. Even if threads are only modifying their respective variables, the cache line will be repeatedly invalidated and re-fetched between the cores, leading to performance degradation.

Methods to Avoid False Sharing

We will explore three methods that help mitigate false sharing in multithreaded systems.

1. Padding

Padding involves inserting extra bytes between data elements to ensure that they reside in different cache lines.

  • Explanation: If we know a cache line size is 64 bytes, we can add a 64-byte padding between variables. This guarantees that the variables will be located in separate cache lines.

  • Example: After allocating memory to variable D1 (occupying, say, 4 bytes), insert a 64-byte character array (padding) before allocating memory to D2. This forces D2 to be in a separate cache line.

  • Drawbacks: Padding can lead to significant memory wastage, especially in large structures or classes with numerous variables. As illustrated earlier, allocating a 64-byte padding even when the actual data type is of significantly lower size may lead to wastage.
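The padding idea can be sketched as follows. This is a minimal illustration, assuming a 64-byte cache line; the struct and member names mirror the D1/D2 example above:

```cpp
#include <cstddef>

// Assumed cache-line size; 64 bytes is typical on x86-64.
constexpr std::size_t kCacheLine = 64;

struct PaddedData {
    int d1;                // 4 bytes
    char pad[kCacheLine];  // 64 bytes of padding pushes d2 past the line holding d1
    int d2;
};

// d2 starts at least one full cache line after d1,
// so the two members can never share a cache line.
static_assert(offsetof(PaddedData, d2) - offsetof(PaddedData, d1) >= kCacheLine,
              "d1 and d2 must be at least a cache line apart");
```

The static_assert makes the layout guarantee checkable at compile time; the cost, as noted above, is 64 wasted bytes per padded member.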

2. Alignas

alignas is a C++ specifier that forces a variable to be aligned to a specific memory boundary. We can use it to align variables to cache-line boundaries.

  • Explanation: By using alignas(64), we can ensure that variables are allocated at memory addresses that are multiples of 64 (assuming a 64-byte cache line).

  • Benefit: This guarantees that different variables are allocated to different cache lines, thus avoiding false sharing.

  • Usage: Place alignas(64) before the variable declaration, e.g. alignas(64) int d1;
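A minimal sketch of this approach, again assuming a 64-byte cache line (the struct name is illustrative):

```cpp
#include <cstddef>

// alignas(64) forces each member onto its own 64-byte boundary,
// so d1 and d2 can never share a cache line.
struct AlignedData {
    alignas(64) int d1;
    alignas(64) int d2;
};

// The compiler inserts the padding for us: d2 starts at offset 64
// and the struct rounds up to two full cache lines.
static_assert(offsetof(AlignedData, d2) == 64, "d2 starts on the second cache line");
static_assert(sizeof(AlignedData) == 128, "two cache lines total");
```

Since C++17 the standard library also exposes std::hardware_destructive_interference_size in &lt;new&gt; as a portable stand-in for the hard-coded 64, though compiler support for it varies.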

3. Avoiding Adjacent Updates

This method focuses on reducing the frequency of updates to adjacent variables that might reside in the same cache line.

  • Explanation: Instead of frequently updating adjacent variables (D1 and D2) within a short period, accumulate the necessary changes and update them together at the end of a calculation.

  • Reasoning: Frequent updates cause repeated cache invalidations. By minimizing these updates, we reduce cache contention.

  • Example: If updating D1 and D2, perform all calculations first, and then update data.D1 and data.D2 in a single operation.
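The batching idea can be sketched like this. The function names are illustrative; the naive version mirrors the loop described above, while the batched version accumulates in local variables and writes the shared members once:

```cpp
struct Data { int d1; int d2; };

// Naive version: every iteration writes both shared members, so if two
// threads run loops like this on d1 and d2, the cache line bounces
// between cores on each write.
void update_naive(Data& data, int iterations) {
    for (int i = 0; i < iterations; ++i) {
        data.d1 += 1;
        data.d2 += 1;
    }
}

// Batched version: accumulate in local (register/stack) variables,
// then touch the shared cache line only once at the end.
void update_batched(Data& data, int iterations) {
    int local1 = 0, local2 = 0;
    for (int i = 0; i < iterations; ++i) {
        ++local1;
        ++local2;
    }
    data.d1 += local1;  // single write to the shared cache line
    data.d2 += local2;
}
```

Both functions produce the same final values; the batched version simply trades many writes to shared memory for one.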

Experiment: Problem Statement to Demonstrate False Sharing

The following problem statement can be used to experiment with and observe the effects of false sharing:

  1. Create a Data Structure: Define a simple data structure (e.g., struct Data { int d1; int d2; }).

  2. Instantiate the Structure: Create an instance of the structure: Data d; and initialize d.d1 = 10; and d.d2 = 20;

  3. Write an Update Function: Create a function void fun(Data &d) that updates d.d1 and d.d2 a large number of times (e.g., 10,000 times) using a loop (d.d1 = d.d1 + 1; d.d2 = d.d2 + 1;).

  4. Measure Execution Time: Measure the execution time of the fun function when updating d.d1 and d.d2 (ideally from two separate threads, so the cache line bounces between cores). Record the start time before the function call and the end time after.

  5. Apply False Sharing Avoidance Techniques:

    • Padding: Add padding between d1 and d2 (e.g., char padding[60];).

    • Alignas: Use alignas(64) before declaring the Data structure.

  6. Repeat the Measurement: Repeat step 4 after applying the padding or alignas.

  7. Compare Results: Compare the execution times with and without the false sharing avoidance techniques. You should observe a performance improvement when using padding or alignas.
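The steps above can be sketched as a single benchmark. This is a hypothetical implementation, not the code from the description: the Data members follow the problem statement, USE_PADDING toggles the padded layout, and the iteration count should be raised well beyond 10,000 so the difference is not lost in thread-startup noise:

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Toggle to compare the padded and unpadded layouts.
#define USE_PADDING 1

struct Data {
    int d1;
#if USE_PADDING
    char padding[60];  // 4 + 60 = 64 bytes, so d2 begins in the next cache line
#endif
    int d2;
};

void fun_d1(Data& d, int n) { for (int i = 0; i < n; ++i) d.d1 += 1; }
void fun_d2(Data& d, int n) { for (int i = 0; i < n; ++i) d.d2 += 1; }

// Runs both updaters on separate threads and returns elapsed microseconds.
long long run_experiment(int iterations) {
    Data d{};
    d.d1 = 10;
    d.d2 = 20;
    auto start = std::chrono::steady_clock::now();
    std::thread t1(fun_d1, std::ref(d), iterations);
    std::thread t2(fun_d2, std::ref(d), iterations);
    t1.join();
    t2.join();
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Calling run_experiment with a large iteration count (say, 10 million) once with USE_PADDING set to 1 and once to 0 should show the padded layout running noticeably faster on a multi-core machine, since each thread's writes stay in its own cache line.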

This experiment allows you to empirically demonstrate the impact of false sharing and the effectiveness of the presented mitigation techniques. The implementation of the code can be found in the description section.