A sparse global secondary index (GSI) is used to locate items that carry an uncommon attribute, that is, an attribute that does not appear on every item in the table. DynamoDB writes an item to an index only when the item contains the index key attributes, so the index holds far fewer items than the base table and is cheaper to query or scan. In addition, the fewer attributes projected from the base table into the index, the less storage and write capacity the index consumes.
The new GSI uses the is_manager attribute as the partition key, because not all employees are managers and this attribute therefore does not appear on every item. This key is stored under an attribute named GSI_2_PK. The employee’s title is stored as the sort key, GSI_2_SK.
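The effect of a sparse index can be illustrated without touching DynamoDB at all. The following is a minimal sketch in plain Python: the sample items, the `employee_id` and `name` attributes, and the values stored in GSI_2_PK are hypothetical placeholders, and `sparse_index` is an illustrative helper, not part of any AWS SDK. It only mimics the rule that an item is copied into the index when it contains the index key attributes.

```python
# Hypothetical sample items: only managers carry the index key attributes.
base_table = [
    {"employee_id": "1", "name": "Alice", "GSI_2_PK": "1", "GSI_2_SK": "Director"},
    {"employee_id": "2", "name": "Bob"},
    {"employee_id": "3", "name": "Carol", "GSI_2_PK": "1", "GSI_2_SK": "Team Lead"},
    {"employee_id": "4", "name": "Dave"},
]


def sparse_index(items, partition_key, sort_key):
    """Return only the items that contain both index key attributes."""
    return [item for item in items if partition_key in item and sort_key in item]


index_items = sparse_index(base_table, "GSI_2_PK", "GSI_2_SK")
print(f"Base table items: {len(base_table)}, index items: {len(index_items)}")
```

Items without GSI_2_PK and GSI_2_SK never reach the index, which is why a scan of the index touches only managers.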
aws dynamodb update-table --table-name employees \
--attribute-definitions AttributeName=GSI_2_PK,AttributeType=S AttributeName=GSI_2_SK,AttributeType=S \
--global-secondary-index-updates file://gsi_manager.json
aws dynamodb describe-table --table-name employees --query "Table.GlobalSecondaryIndexes[].IndexStatus"
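The contents of gsi_manager.json are not shown in this section. As a rough sketch, the file passed to --global-secondary-index-updates is a JSON list of index actions, and a Create action for this index could look like the structure built below. The index name GSI_2 and the key attribute names come from this section; the projection type and the capacity values are assumptions chosen for illustration, and `render_gsi_file` is a hypothetical helper.

```python
import json

# Assumed contents for gsi_manager.json. The ProjectionType and the
# capacity numbers are placeholders, not values taken from the workshop.
gsi_updates = [
    {
        "Create": {
            "IndexName": "GSI_2",
            "KeySchema": [
                {"AttributeName": "GSI_2_PK", "KeyType": "HASH"},
                {"AttributeName": "GSI_2_SK", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 10,
                "WriteCapacityUnits": 10,
            },
        }
    }
]


def render_gsi_file(updates):
    """Serialize the index update list as it would appear in the JSON file."""
    return json.dumps(updates, indent=2)


print(render_gsi_file(gsi_updates))
```

The ProvisionedThroughput block applies to tables in provisioned capacity mode. After the update, the describe-table command above can be rerun until the index status is ACTIVE.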
A sparse index carves your data into a smaller subset with its own index, which makes Scan and Query operations more efficient. Instead of scanning all of the data in the DynamoDB base table, you can create an index that holds only a portion of the items for easier querying and searching.
To better understand how to carve smaller subsets out of the base table with a sparse index, refer to the article Take Advantage of Sparse Indexes (/developerguide/bp-indexes-general-sparse-indexes.html).
First, we will scan the data table to find all managers without using the index table.
python scan_for_managers.py employees 100
Here, employees is the table name and 100 is the number of items read per scan page. The output of the scan is similar to the following:
Managers count: 84. # of records scanned: 4000. Execution time: 0.596132993698 seconds
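The source of scan_for_managers.py is not reproduced here. The following is a minimal sketch of how such a full-table scan could be written, assuming that managers are recognized client-side by the presence of the is_manager attribute. `count_managers` is a hypothetical helper, and `table` is expected to behave like a boto3 DynamoDB Table resource, whose `scan` method accepts `Limit` and `ExclusiveStartKey`.

```python
def count_managers(table, page_size):
    """Scan the whole table page by page and count the managers.

    Returns a (managers, scanned) tuple. An item counts as a manager when
    it has an is_manager attribute (an assumption made for this sketch).
    """
    managers = 0
    scanned = 0
    kwargs = {"Limit": page_size}
    while True:
        response = table.scan(**kwargs)
        items = response.get("Items", [])
        scanned += len(items)
        managers += sum(1 for item in items if "is_manager" in item)
        # DynamoDB returns LastEvaluatedKey while more pages remain.
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key
    return managers, scanned


# Example usage against a real table (requires boto3 and AWS credentials):
#   import boto3
#   table = boto3.resource("dynamodb").Table("employees")
#   print(count_managers(table, 100))
```

Every item in the table is read, whether or not it is a manager, which is why the number of records scanned equals the table size.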
Next, scan the sparse index created in step 1 instead of the base employees table. The script that runs this scan passes the index name to the scan call as follows:
response = table.scan(
    Limit=pageSize,
    IndexName='GSI_2'
)
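A single scan call returns at most one page of items, so reading the whole index requires following LastEvaluatedKey until it is no longer returned. The sketch below shows one way to do that. `scan_index_pages` and `count_index_items` are hypothetical helpers, not the workshop's script, and `table` is assumed to behave like a boto3 Table resource.

```python
def scan_index_pages(table, index_name, page_size):
    """Yield each page of items from a scan of the given index."""
    kwargs = {"IndexName": index_name, "Limit": page_size}
    while True:
        response = table.scan(**kwargs)
        yield response.get("Items", [])
        # Continue from where the previous page stopped, if there is more.
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            break
        kwargs["ExclusiveStartKey"] = last_key


def count_index_items(table, index_name="GSI_2", page_size=100):
    """Count every item in the index; in a sparse index each one is a manager."""
    return sum(len(page) for page in scan_index_pages(table, index_name, page_size))
```

Because only items with the index key attributes exist in GSI_2, no filtering is needed: every item returned is a manager.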
Run the scan against the index:
python scan_for_managers_gsi.py employees 100
Here, employees is the table name and 100 is the number of items read per scan page. The results show that both the number of items scanned and the execution time are much smaller than in the scan that does not use the sparse index.
Number of managers: 84. # of records scanned: 84. Execution time: 0.287754058838 seconds
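To put the two sample runs side by side, the short calculation below uses only the figures printed above. The timings come from a single run each, so the ratio is indicative rather than a benchmark.

```python
# Figures copied from the two sample outputs above.
base_scanned, base_seconds = 4000, 0.596132993698
gsi_scanned, gsi_seconds = 84, 0.287754058838

fraction_read = gsi_scanned / base_scanned
speedup = base_seconds / gsi_seconds

print(f"Items read via the index: {fraction_read:.1%} of the base table")
print(f"Execution time ratio: {speedup:.2f}x")
```

The index scan reads about 2.1% of the items that the base table scan reads and, in this run, finishes in roughly half the time.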