File Sharing and Performance in Multi-Threaded Systems
This article discusses the impact of file sharing, specifically false sharing, on the performance of multi-threaded systems. We will explore how seemingly independent operations can be serialized due to memory alignment, leading to performance degradation.
Data Structure and Thread Operations
Consider a data structure `data` containing two integer variables, `d1` and `d2`, each occupying 4 bytes. We have two threads, T1 and T2. Thread T1 operates on `d.d1`, while thread T2 operates on `d.d2`. Ideally, these operations should run in parallel, each taking, say, x milliseconds, for an overall execution time of x milliseconds.
For example:

- `d.d1 = 10;` is operated on by Thread T1.
- `d.d2 = 20;` is operated on by Thread T2.
The Problem of False Sharing
However, false sharing can prevent these operations from executing truly in parallel. Even though `d1` and `d2` are independent variables, their proximity in memory can cause contention.
Cache Line Allocation
Let's assume a cache line size of 64 bytes. When an instance of the `data` structure is created, `d1` and `d2` are likely allocated contiguously within the same cache line.
For instance:

- Cache Line 1 (bytes 0-63): contains `d1` and `d2`.
- Cache Line 2 (bytes 64-127): potentially contains other data.
Cache Invalidation and Serialization
When Thread T1 writes to `d1`, the entire cache line containing `d1` is fetched into the cache of T1's core in an exclusive state. This invalidates the copy of the same cache line held by the core running Thread T2.
Subsequently, when Thread T2 tries to access `d2`, it finds its copy of the cache line invalidated by T1. Even though T2 only needs `d2`, it must wait for T1 to release the cache line. This forces the operations to occur sequentially, taking 2x milliseconds instead of the ideal x milliseconds: the access to `d2` is blocked until T1 is finished with the cache line.
Impact on Performance
Due to false sharing, the operations are serialized, negating the benefits of multi-threading. The system performs slower than expected, even though it appears to be running in parallel. Although the variables `d1` and `d2` are logically independent, their placement within the same cache line creates a hidden internal dependency.
Avoiding False Sharing
Ideally, `d1` and `d2` should be allocated in different cache lines. If Thread T1 accesses `d1` in Cache Line 1, invalidating only that line, Thread T2 can still simultaneously access `d2` in a separate Cache Line 2 without waiting. Allocating each variable to its own cache line would allow the threads to execute the operations in parallel in x milliseconds.
Conclusion
False sharing can severely impact the performance of multi-threaded applications by creating unintended dependencies between variables. By understanding how memory is allocated and cache lines work, developers can take steps to avoid false sharing and optimize their code for true parallelism. Future discussions will cover specific techniques to ensure that variables accessed by different threads reside in separate cache lines.