Concurrency vs. Parallelism: A Developer's Guide
Many developers struggle to understand concurrency. This article explains what concurrency is, how it differs from parallelism, and walks through concrete examples to help you grasp both concepts.
The Interview That Sparked This Explanation
The author recounts a traumatic interview experience where they were asked to explain the difference between concurrency and parallelism and drew a blank. This embarrassing moment motivated them to create this guide, ensuring no one else faces the same confusion.
Defining Concurrency
Concurrency refers to a program's ability to manage multiple tasks by interleaving their execution. The tasks progress independently, being paused and resumed as needed, but they don't necessarily run at the same physical instant. On a single CPU core, this interleaving is achieved through time slicing or voluntary yielding, so multiple logical threads of control can share one core.
Defining Parallelism
In contrast, parallelism involves the truly simultaneous execution of multiple tasks. This means tasks are running at the exact same time, using separate CPU cores or processors.
Concurrency vs. Parallelism: Key Differences
- Single CPU vs. Multiple CPUs: Concurrency can be achieved on a single CPU by rapidly switching between tasks, while true parallelism requires multiple processors or cores.
- Interleaved vs. Simultaneous Execution: Concurrency involves interleaving tasks, while parallelism executes them simultaneously.
- JavaScript and Machine Learning Examples: A single-threaded event loop in JavaScript can handle multiple I/O events concurrently, while a machine learning model can train in parallel across multiple cores/GPUs.
Concurrency and I/O-Bound Workloads
Concurrency is particularly effective for I/O-bound workloads. These workloads spend a significant amount of time waiting for external resources like disk reads, network responses, or database queries. During these wait periods, the CPU would normally remain idle. With concurrency, multiple tasks can share the same thread or core, allowing one task to progress while another blocks on I/O. This improves CPU utilization without needing more hardware.
Consider a Python example with threading. Even though threads T1 and T2 run on the same CPU core due to the Global Interpreter Lock (GIL), the `time.sleep()` function simulates an I/O delay. When `time.sleep()` is called, the active thread yields the CPU, allowing the other thread to run, illustrating interleaved execution.
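A minimal sketch of that kind of threading example, assuming a simple `worker` function and the thread names T1 and T2 (the step count and sleep duration are illustrative):

```python
import threading
import time

def worker(name: str) -> None:
    for i in range(3):
        print(f"{name}: step {i}")
        time.sleep(1)  # simulated I/O wait; the thread releases the GIL here

# T1 and T2 share one core under the GIL, but because time.sleep() yields,
# their output interleaves instead of one thread finishing before the other starts.
t1 = threading.Thread(target=worker, args=("T1",))
t2 = threading.Thread(target=worker, args=("T2",))
t1.start()
t2.start()
t1.join()
t2.join()
```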
Parallelism and CPU-Bound Workloads
Parallelism is preferable for CPU-bound tasks. In these cases, performance is limited by the computation itself, rather than waiting for I/O. Examples include compressing video, training deep learning models, or computing fractals. These tasks involve large amounts of math that can be parallelized.
Dividing the work across CPU cores, or using SIMD (Single Instruction, Multiple Data) instructions to operate on many data elements at once, allows the work to run truly simultaneously. This is implemented with multi-core processors, vector execution units, and GPU cores. Practical examples include a video encoder splitting a frame into chunks processed in parallel, or a neural network training pipeline dispatching matrix operations to GPU cores. Parallelism increases throughput by scaling work out across independent compute units, unlike concurrency, which helps hide latency.
Consider a multiprocessing example in Python. Each process (P1 and P2) runs in its own Python interpreter, with its own memory space and its own GIL, and can be scheduled on a different CPU core. The tasks therefore run truly simultaneously rather than merely being interleaved.
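A minimal sketch of such a multiprocessing example; the `cpu_bound` workload (a sum of squares) is an illustrative stand-in for real computation:

```python
import multiprocessing
import os

def cpu_bound(name: str) -> None:
    # Pure computation with no I/O: this keeps one core busy the whole time.
    total = sum(i * i for i in range(10_000_000))
    print(f"{name} (pid={os.getpid()}) finished: {total}")

if __name__ == "__main__":
    # P1 and P2 each get their own interpreter and their own GIL,
    # so the OS can run them on separate cores at the same instant.
    p1 = multiprocessing.Process(target=cpu_bound, args=("P1",))
    p2 = multiprocessing.Process(target=cpu_bound, args=("P2",))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```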
Task Scheduling: Preemptive vs. Cooperative
The way tasks share execution time is governed by the scheduler, which can be preemptive or cooperative.
Preemptive Scheduling
In preemptive scheduling, used by most operating systems, the scheduler forcibly interrupts running tasks after a time slice to give others a chance to run. This enables time-sliced concurrency and ensures responsiveness, even if tasks don't yield voluntarily. However, preemptive switching incurs overhead due to register saving, stack swapping, and potential cache invalidation. It also makes concurrent code harder to reason about due to unpredictable interleaving.
In preemptive concurrency, the OS decides when to switch tasks, typically at the end of a time slice. Python threads are preempted automatically: the interpreter periodically forces the running thread to release the GIL so the OS scheduler can run another thread.
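As a small sketch (the spinning workload is arbitrary), CPython threads are interrupted even when they never yield voluntarily; the interpreter's switch interval can be inspected with `sys.getswitchinterval()`:

```python
import sys
import threading

print(f"GIL switch interval: {sys.getswitchinterval()} s")  # 0.005 s by default

def spin(name: str) -> None:
    # Pure computation with no sleep() or await: this thread never yields
    # voluntarily, yet it is still preempted so the other thread can make progress.
    total = 0
    for i in range(5_000_000):
        total += i
    print(f"{name} done")

threads = [threading.Thread(target=spin, args=(f"T{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```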
Cooperative Scheduling
Cooperative scheduling relies on voluntary yielding: tasks must explicitly yield control back to the scheduler. This model is used in user-space concurrency frameworks like `async/await` in Python and JavaScript, or goroutines in Go. It has lower overhead because switches occur only at known points, and it is easier to reason about since task switches are deterministic. However, if a task forgets to yield, it can block the entire system, which is a frequent bug in asynchronous programming.
With `async/await`, tasks must explicitly yield control using `await`. The event loop then schedules the next coroutine to run. This uses no kernel threads; it's all user-space logic.
The Cost of Concurrency: Context Switching
Concurrent programs must manage context switching, where the system saves the state of one task to let another one run. The overhead of this process is a critical factor in performance.
- Traditional Thread-Based Systems: Context switching is managed by the operating system kernel and can be costly. It requires transitioning from user mode to kernel mode and back, adding significant latency.
- Coroutine-Based Systems (async/await): Context switches happen entirely in user space, managed by the application's runtime or scheduler. This makes it possible to switch between tens of thousands of coroutines far more efficiently than between threads, as the rough benchmark sketch below illustrates.
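As a rough, illustrative micro-benchmark (not a rigorous measurement, and the task count is arbitrary), you can compare the cost of running many trivial coroutines against many trivial threads:

```python
import asyncio
import threading
import time

N = 5_000  # arbitrary; large enough to show the gap on most machines

async def noop_coro() -> None:
    await asyncio.sleep(0)  # yield to the event loop once

async def run_coroutines() -> None:
    await asyncio.gather(*(noop_coro() for _ in range(N)))

start = time.perf_counter()
asyncio.run(run_coroutines())
print(f"{N} coroutines: {time.perf_counter() - start:.3f} s")

def noop_thread() -> None:
    pass

start = time.perf_counter()
threads = [threading.Thread(target=noop_thread) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"{N} threads:    {time.perf_counter() - start:.3f} s")
```

On most machines the coroutine run finishes far faster, because each switch is just user-space bookkeeping inside the event loop rather than a trip through the kernel scheduler.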