Reminder: All commands are executed in the shell console connected to the EC2 instance, not on your local machine. (If you are not sure, you can always verify by going back to step 1.)
In this step, you create a table and populate it with 1 million items in preparation for the next exercise.
Run the command to create a new table:
aws dynamodb create-table --table-name logfile_scan \
--attribute-definitions AttributeName=PK,AttributeType=S AttributeName=GSI_1_PK,AttributeType=S AttributeName=GSI_1_SK,AttributeType=S \
--key-schema AttributeName=PK,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5000,WriteCapacityUnits=5000 \
--tags Key=workshop-design-patterns,Value=targeted-for-cleanup \
--global-secondary-indexes "IndexName=GSI_1,\
KeySchema=[{AttributeName=GSI_1_PK,KeyType=HASH},{AttributeName=GSI_1_SK,KeyType=RANGE}],\
Projection={ProjectionType=KEYS_ONLY},\
ProvisionedThroughput={ReadCapacityUnits=3000,WriteCapacityUnits=5000}"
This command will create a new table and one GSI with the following definition:
Attribute name (Type) | Special attribute? | Attribute use case | Sample attribute value |
---|---|---|---|
PK (STRING) | Hash key | Holds the request id for the access log | request#104009 |
GSI_1_PK (STRING) | GSI 1 hash key | A shard key, with values 0-N, to allow log searches | shard#3 |
GSI_1_SK (STRING) | GSI 1 sort key | Sorts the logs hierarchically, from status code -> date -> hour | 200#2019-09-21#01 |
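To make the key layout in the table above concrete, here is a minimal Python sketch of how a single log record could be turned into those three attributes. The function name `build_item`, the `SHARD_COUNT` value, and the modulo-based shard choice are assumptions for illustration only; the workshop's loader script may derive the shard differently.

```python
SHARD_COUNT = 10  # hypothetical; the table above only says shard values run 0-N


def build_item(request_id: int, status: int, date: str, hour: int) -> dict:
    """Return the three key attributes described in the table above."""
    # Assumption: spread requests across shards with a simple modulo.
    shard = request_id % SHARD_COUNT
    return {
        "PK": f"request#{request_id}",
        "GSI_1_PK": f"shard#{shard}",
        # Hierarchical sort key: status code, then date, then zero-padded hour.
        "GSI_1_SK": f"{status}#{date}#{hour:02d}",
    }


item = build_item(104009, 200, "2019-09-21", 1)
print(item["PK"], item["GSI_1_SK"])  # request#104009 200#2019-09-21#01
```

Because the sort key runs from the broadest component to the narrowest, a prefix such as `200#2019-09-21` matches every hour of that day for that status code.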
Run the command to wait until the table becomes Active:
aws dynamodb wait table-exists --table-name logfile_scan
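The wait command above blocks until the table reports an active status. The polling idea behind it can be sketched in Python as follows. The `describe` callable, the function name `wait_until_active`, and the fake responses are illustrative assumptions, and the 20-second delay and 25 attempts are assumed to mirror the CLI waiter's defaults rather than taken from this workshop.

```python
import time
from typing import Callable


def wait_until_active(
    describe: Callable[[str], dict],
    table_name: str,
    delay: float = 20.0,
    max_attempts: int = 25,
) -> bool:
    """Poll `describe` until the table reports ACTIVE or attempts run out."""
    for attempt in range(max_attempts):
        response = describe(table_name)
        if response["Table"]["TableStatus"] == "ACTIVE":
            return True
        if attempt < max_attempts - 1:
            time.sleep(delay)
    return False


# Stand-in for a DescribeTable call: CREATING twice, then ACTIVE.
_statuses = iter(["CREATING", "CREATING", "ACTIVE"])


def fake_describe(table_name: str) -> dict:
    return {"Table": {"TableName": table_name, "TableStatus": next(_statuses)}}


print(wait_until_active(fake_describe, "logfile_scan", delay=0.0))  # True
```

With boto3 installed and credentials configured, `describe` could be `lambda name: client.describe_table(TableName=name)` for a DynamoDB client; boto3 also ships its own `table_exists` waiter, which is the simpler choice in practice.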
Run the following command to load the server log data into the logfile_scan table. It loads 1,000,000 items into the table.
nohup python load_logfile_parallel.py logfile_scan &
disown
`nohup` keeps the process running after the shell hangs up, the trailing `&` runs it in the background, and `disown` removes it from the shell's job table so the load continues if you are disconnected.
The load takes about ten minutes to complete and runs in the background.
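As a rough sanity check on that duration, the sketch below computes a lower bound on the load time from the table's provisioned write capacity alone. It assumes every item fits in 1 KB (so each write costs one write capacity unit) and that nothing else limits throughput; both are assumptions, not figures from the workshop.

```python
import math

ITEM_COUNT = 1_000_000
TABLE_WCU = 5000      # from the create-table command
ITEM_SIZE_KB = 1.0    # assumption: each log item fits in 1 KB


def min_load_seconds(items: int, wcu: int, item_size_kb: float) -> float:
    """Lower bound on load time when only provisioned WCU limits writes."""
    # One WCU covers one write per second of up to 1 KB, rounded up.
    wcu_per_item = max(1, math.ceil(item_size_kb))
    return items * wcu_per_item / wcu


seconds = min_load_seconds(ITEM_COUNT, TABLE_WCU, ITEM_SIZE_KB)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min")  # 200 s, about 3.3 min
```

The observed ten minutes is longer than this bound, which suggests the loader's own pace, rather than the provisioned capacity, sets the duration. That is an inference, not something the workshop states.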
Run `pgrep -l python` to verify that the script is loading data in the background.
pgrep -l python
Output:
3257 python
The process ID (the four-digit number in the example above) will be different for everyone.
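If you want to check the loader from a script rather than by eye, the `pgrep -l` output can be parsed into process ID and name pairs. This is a minimal sketch; the helper name `parse_pgrep` is hypothetical, and the sample string is the example output shown above.

```python
from typing import List, Tuple


def parse_pgrep(output: str) -> List[Tuple[int, str]]:
    """Turn `pgrep -l` lines ("<pid> <name>") into (pid, name) pairs."""
    processes = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        pid, name = line.split(maxsplit=1)
        processes.append((int(pid), name))
    return processes


sample = "3257 python\n"
print(parse_pgrep(sample))  # [(3257, 'python')]
```

On the instance, the input could come from `subprocess.run(["pgrep", "-l", "python"], capture_output=True, text=True).stdout`. An empty list would mean no matching process is running, for example because the load has already finished.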
The script will continue to run in the background while you work on the next exercise.
You have completed the SETUP!