Global Secondary Index Key Overloading

You can create 20 GSI indexes for DynamoDB tables as of the time we build this workshop. However, sometimes our application may need to support more access patterns than that and exceed the current limit of per-table GSI index tables. From this, the GSI Key Overloading index table design pattern is generated by selecting and reusing an attribute (column header) among the items and storing the value of that attribute depending on the context of the attribute type. .

When you create a GSI Index table on that property, you are indexing multiple access patterns, each for a different item type — and using only 1 GSI Index table. For example, with the Employees table, an employee can contain items of type metadata (for details about the employee), employee-title (all job titles that the employee has held), or employee-location ( all office buildings and locations where the employee worked).

The access patterns required for this scenario include:

  • Query all employees of a table

  • Query all employees with a specific title currently

  • Query all employees who used to have a specific title

  • Query all employees by name

The figure below shows the design of Employees table. Attribute PK (Parition key) has a value that is a combination of employeeID with the prefix e. The pound sign (#) is placed between the entity type identifier (e) and the actual ID of the employee employeeID. The SK (Soft key) attribute acts as an overloaded attribute and its value can be current title, previous title, or the root keyword indicating this is a primary item of the Employee table, which has values ​​in the most important attributes The GSI_1_PK attribute includes the employee’s title or name. Reusing the GSI Index table for multiple entity types such as employees, employee locations, and employee titles allows us to simplify DynamoDB table management because we only need to track and lose cost for one Index table sub GSI instead of three separate indexes.

Global Secondary Index Write Sharding

  1. Create table Employees
  • Execute the following command to create the Employees table
aws dynamodb create-table --table-name employees \
--attribute-definitions AttributeName=PK,AttributeType=S AttributeName=SK,AttributeType=S \
AttributeName=GSI_1_PK,AttributeType=S AttributeName=GSI_1_SK,AttributeType=S \
--key-schema AttributeName=PK,KeyType=HASH AttributeName=SK,KeyType=RANGE \
--provisioned-throughput ReadCapacityUnits=100,WriteCapacityUnits=100 \
--tags Key=workshop-design-patterns,Value=targeted-for-cleanup \
--global-secondary-indexes "IndexName=GSI_1,\

KeySchema=[{AttributeName=GSI_1_PK,KeyType=HASH},{AttributeName=GSI_1_SK,KeyType=RANGE}],\
Projection={ProjectionType=ALL},\
ProvisionedThroughput={ReadCapacityUnits=100,WriteCapacityUnits=100}"

Global Secondary Index Write Sharding

  1. Run a pending command until the table state changes to ACTIVE
aws dynamodb wait table-exists --table-name employees

The Base table is named employees

The table’s Partition key is the PK attribute and it contains the employee ID value.

The Sort key of the Base table is the SK attribute, with the value being processed in the script.

The Index table named GSI_1 - is the GSI Overloading Index table used in the next step

The Partition key of the GSI Index table is GSI_1_PK, and is equal to the value of the SK attribute in the Base table.

Sort key of table GSI Index is GSI_1_SK whose value is Employee’s Name, attribute Name

  • Key schema: HASH, RANGE (partition and sort key)

  • Table read capacity units (RCUs) = 100

  • Table write capacity units (WCUs) = 100

  • Global secondary index: GSI_1 (100 RCUs, 100 WCUs) - Allows querying by host’s IP address

Attribute Name (Type) Special Attribute Attribute Use Case Sample Attribute Value
PK (STRING) Partition Key Employee ID e#129
SK (STRING) Sort key Derived value root, state#MI
GSI_1_PK (STRING) GSI_1 partition key Derived value root, state#MI
GSI_1_SK (STRING) GSI_1 sort key Employee name Christine Milsted

Global Secondary Index Write Sharding

  1. In the Table interface of DyanmoDB
  • Table employees has been created successfully, status Active

Global Secondary Index Write Sharding

  1. Load data into the table
  • Perform data loading into the table with the following command
python load_employees.py employees ./data/employees.csv
  • Sample records in employees.csv file have the following form:
1000,Nanine Denacamp,Programmer Analyst,Development,San Francisco,CA,1981-09-30,2014-06-01,Senior Database Administrator,2014-01-25

Global Secondary Index Write Sharding

  1. When pushing data to the table, some properties are concatenated, such as City_Dept (Example: San Francisco: Development), because they match the access patterns in the query. The SK attribute is also an attribute that will be processed in the script. The concatenation is handled by Python scripts, then assembles into a complete record and finally executes the put_item() command to write the record to the table.

The output of the load command is similar to the following:

python load_employees.py employees ./data/employees.csv
employee count: 100 in 3.7393667697906494
employee count: 200 in 3.7162938117980957
...
employee count: 900 in 3.6725080013275146
employee count: 1000 in 3.6174678802490234
RowCount: 1000, Total seconds: 36.70457601547241

The output shows that 1000 records have been written to the table. Go to DynamoDB Console, and then click view employees table.

Global Secondary Index Write Sharding

  1. In the DynamoDB interface, select the employees table
  • Select Scan
  • Select GSI_1 index
  • Select Run

Global Secondary Index Write Sharding

  1. Data scan results according to GSI_1

Global Secondary Index Write Sharding

  1. Full table scan results

Global Secondary Index Write Sharding

  1. Query the Employees table using a Global Secondary Index with overloaded properties

Execute a query to find all employees working at Washington headquarters by running the script query_employees.py, which contains the code to query the table using an overloaded global secondary index

python query_employees.py employees state 'WA'

Query command results

List of employees with WA in the attribute state:
    Employee name: Alice Beilby - hire date: 2014-12-03
    Employee name: Alla Absalom - hire date: 2015-06-25
    Employee name: Alvan Heliar - hire date: 2016-05-15
    Employee name: Anders Galtone - hire date: 2015-12-22
    Employee name: Ashil Hutchin - hire date: 2015-02-11
  ...
  Employee name: Sula Prattin - hire date: 2014-01-11
    Employee name: Vittoria Edelman - hire date: 2014-10-01
    Employee name: Willie McCuthais - hire date: 2015-05-27
Total of employees: 46. Execution time: 0.13477110862731934 seconds

Global Secondary Index Write Sharding

  1. You can also query employees by their title, for example
python query_employees.py employees current_title 'Software Engineer'

Query results

 List of employees with Software Engineer in the attribute current_title:
    Employee name: Alice Beilby - hire date: 2014-11-03
    Employee name: Anetta Byrne - hire date: 2017-03-15
    Employee name: Ardis Panting - hire date: 2015-08-06
    Employee name: Chris Randals - hire date: 2016-10-27
    Employee name: Constantine Barendtsen - hire date: 2016-06-10
    Employee name: Eudora Janton - hire date: 2015-01-05
    Employee name: Florella Allsep - hire date: 2015-03-31
    Employee name: Horatius Trangmar - hire date: 2013-10-21
    Employee name: Korey Daugherty - hire date: 2016-11-03
    Employee name: Lenka Luquet - hire date: 2014-10-01
    Employee name: Leonora Hyland - hire date: 2016-06-14
    Employee name: Lucretia Ruffell - hire date: 2015-07-04
    Employee name: Malcolm Adiscot - hire date: 2014-04-17
    Employee name: Melodie Sebire - hire date: 2013-08-27
    Employee name: Menard Ogborn - hire date: 2014-06-27
    Employee name: Merwyn Petters - hire date: 2014-06-19
    Employee name: Niels Buston - hire date: 2014-10-30
    Employee name: Noelani Studde - hire date: 2015-03-30
Total of employees: 18. Execution time: 0.11937260627746582 seconds

Global Secondary Index Write Sharding

  1. In addition, you can also query by employee name as follows
python query_employees.py employees name 'Dale Marlin'

Query results

 List of employees with Dale Marlin in the attribute name:
    Employee name: Dale Marlin - hire date: 2014-10-19
Total of employees: 1. Execution time: 0.1274700164794922 seconds

Global Secondary Index Write Sharding