This exercise demonstrates the two methods of DynamoDB table scanning: sequential and parallel.
Even though DynamoDB distributes items across multiple physical partitions, a Scan operation can only read one partition at a time. For this reason, the throughput of a Scan is constrained by the maximum throughput of a single partition. To learn more, read our documentation page on partitions and data distribution.
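As a rough illustration of the sequential case, the sketch below reads a table page by page. It assumes a boto3-style low-level DynamoDB client whose `scan` call returns `Items` and, when more data remains, `LastEvaluatedKey`. The function name `sequential_scan` is hypothetical and chosen only for this example.

```python
def sequential_scan(client, table_name):
    """Read every item in a table with a single, sequential Scan loop.

    `client` is assumed to behave like boto3.client("dynamodb").
    """
    items = []
    kwargs = {"TableName": table_name}
    while True:
        # Each request returns one page of results (up to 1 MB of data).
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))

        # LastEvaluatedKey is present only when more pages remain.
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items

        # Resume the next request where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With real credentials this might be called as `sequential_scan(boto3.client("dynamodb"), "my-table")`, where `my-table` is a placeholder table name. Because each page is requested only after the previous one returns, the loop never reads more than one partition at a time.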
To make full use of the throughput provisioned at the table level, use a parallel Scan to divide a table (or secondary index) into multiple logical segments, and use multiple application workers to scan these segments in parallel. Each application worker can be a thread (in programming languages that support multithreading) or an operating system process. To learn more about how to implement a parallel scan, read our developer documentation page on parallel scans. The Scan API is not suitable for all query patterns. For more information on why scans are less efficient than queries, read about the performance implications of Scan in our documentation.
The following diagram shows how a multithreaded application performs a parallel Scan with three application worker threads. The application spawns three threads, and each thread issues a Scan request, scans its designated segment (retrieving data 1 MB at a time), and returns the data to the main application thread.
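The three-worker flow described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes a boto3-style low-level client (which boto3 documents as generally safe to share across threads) and uses the `Segment` and `TotalSegments` parameters of the Scan API. The function names `scan_segment` and `parallel_scan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(client, table_name, segment, total_segments):
    """Scan one logical segment to completion, one page at a time."""
    items = []
    kwargs = {
        "TableName": table_name,
        "Segment": segment,               # zero-based index of this worker's segment
        "TotalSegments": total_segments,  # must be the same for every worker
    }
    while True:
        page = client.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return items
        kwargs["ExclusiveStartKey"] = last_key


def parallel_scan(client, table_name, total_segments=3):
    """Run one worker thread per segment and merge the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, client, table_name, segment, total_segments)
            for segment in range(total_segments)
        ]
        # Results are collected in segment order on the main thread.
        return [item for future in futures for item in future.result()]
```

Each worker follows its own `LastEvaluatedKey` chain independently, so pagination state is never shared between threads. Only the merged item list returns to the caller.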