sudo su - ec2-user
shopt login_shell
to make sure it returns login_shell on
cd ~/workshop
python --version
aws --version
To complete the workshop without errors, make sure that the AWS CLI version is 1.18.139 and the Python version is 3.6.12.
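The version check above can also be scripted. The following is a minimal sketch, not part of the workshop files: the helper names parse_version and matches_expected are hypothetical, and the sample aws --version string in the usage lines is an assumed example of the usual "aws-cli/X.Y.Z Python/..." layout.

```python
import re

# Versions named in the text above.
EXPECTED_AWS_CLI = (1, 18, 139)
EXPECTED_PYTHON = (3, 6, 12)


def parse_version(text):
    """Return the first dotted X.Y.Z version in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.groups())


def matches_expected(aws_output, python_output):
    """True when both tools report the versions the workshop expects."""
    return (parse_version(aws_output) == EXPECTED_AWS_CLI
            and parse_version(python_output) == EXPECTED_PYTHON)


# Assumed example strings; paste the real output of the two commands instead.
aws_line = "aws-cli/1.18.139 Python/3.6.12 Linux/4.14 botocore/1.17.62"
python_line = "Python 3.6.12"
print(matches_expected(aws_line, python_line))
```

Only the first version number in each string is compared, so the Python version embedded in the aws --version output is ignored.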
python
After the Python interpreter starts, enter the following code at its prompt:
# Run this code:
import boto3
ddb = boto3.client('dynamodb')
ddb.describe_limits()
The output looks like this:
{u'TableMaxWriteCapacityUnits': 40000, u'TableMaxReadCapacityUnits': 40000, u'AccountMaxReadCapacityUnits': 80000, 'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'RequestId': 'BFMGAS4P48I3DJTP5NU22QRDDJVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPHeaders': {'x-amzn-requestid': 'BFMGAS4P48I3DJTP5NU22QRDDJVV4KQNSO5AEMVJF66Q9ASUAAJG', 'content-length': '143', 'server': 'Server', 'connection': 'keep-alive', 'x-amz-crc32': '3062975651', 'date': 'Tue, 31 Dec 2020 00:00:00 GMT', 'content-type': 'application/x-amz-json-1.0'}}, u'AccountMaxWriteCapacityUnits': 80000}
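The response mixes the four capacity limits with request metadata. As a small sketch of how the limits could be pulled out, the hypothetical helper summarize_limits below works on any dictionary shaped like the output above. The sample dictionary is a trimmed copy of that output, so the snippet runs without AWS credentials.

```python
# Keys that hold the account-level and table-level capacity limits.
LIMIT_KEYS = (
    "AccountMaxReadCapacityUnits",
    "AccountMaxWriteCapacityUnits",
    "TableMaxReadCapacityUnits",
    "TableMaxWriteCapacityUnits",
)


def summarize_limits(response):
    """Return only the capacity limits from a describe_limits() response."""
    return {key: response[key] for key in LIMIT_KEYS}


# Trimmed copy of the output shown above (metadata shortened).
sample = {
    "TableMaxWriteCapacityUnits": 40000,
    "TableMaxReadCapacityUnits": 40000,
    "AccountMaxReadCapacityUnits": 80000,
    "AccountMaxWriteCapacityUnits": 80000,
    "ResponseMetadata": {"RetryAttempts": 0, "HTTPStatusCode": 200},
}

for name, value in sorted(summarize_limits(sample).items()):
    print("%s: %d" % (name, value))
```

In the interpreter session from the previous step, the same helper could be called as summarize_limits(ddb.describe_limits()).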
cd /home/ec2-user/workshop
ls -l .
The workshop directory must contain the following files.
Python code:
ddbreplica_lambda.py
load_employees.py
load_invoice.py
load_logfile_parallel.py
load_logfile.py
lab_config.py
query_city_dept.py
query_employees.py
query_index_invoiceandbilling.py
query_invoiceandbilling.py
query_responsecode.py
scan_for_managers_gsi.py
scan_for_managers.py
scan_logfile_parallel.py
scan_logfile_simple.py
JSON:
gsi_city_dept.json
gsi_manager.json
iam-role-policy.json
iam-trust-relationship.json
Text:
ddb-replication-role-arn.txt
ls -l ./data
The output lists the following data files:
employees.csv
invoice-data2.csv
invoice-data.csv
logfile_medium1.csv
logfile_medium2.csv
logfile_small1.csv
logfile_stream.csv
The exercises use several different data sets:
Server Logs data
Employees data
Invoices and Bills data
The Server Logs file has the following structure:
requestid (number)
host (string)
date (string)
hourofday (number)
timezone (string)
method (string)
url (string)
responsecode (number)
bytessent (number)
useragent (string)
To view a sample record in the file, use the command:
head -n1 ./data/logfile_small1.csv
The output is:
1,66.249.67.3,2017-07-20,20,GMT-0700,GET,"/gallery/main.php?g2_controller=exif.SwitchDetailMode&g2_mode=detailed&g2_return=%2Fgallery%2Fmain.php%3Fg2_itemId%3D15741&g2_returnName=photo",302,5,"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
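To show how the ten fields listed above line up with a CSV record, here is a minimal sketch using Python's standard csv module. The function name parse_log_line and the shortened sample line are made up for illustration; the field names and types come from the list above.

```python
import csv

# Field order as listed in the Server Logs structure above.
LOG_FIELDS = [
    "requestid", "host", "date", "hourofday", "timezone",
    "method", "url", "responsecode", "bytessent", "useragent",
]
NUMERIC_FIELDS = ("requestid", "hourofday", "responsecode", "bytessent")


def parse_log_line(line):
    """Split one CSV log line into a dict keyed by field name."""
    values = next(csv.reader([line]))  # handles quoted fields with commas
    if len(values) != len(LOG_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(LOG_FIELDS), len(values)))
    record = dict(zip(LOG_FIELDS, values))
    for name in NUMERIC_FIELDS:
        record[name] = int(record[name])
    return record


# Hypothetical, shortened record in the same layout as the sample above.
sample = ('7,10.0.0.1,2017-07-20,20,GMT-0700,GET,'
          '"/index.php?a=1,2",200,512,"ExampleAgent/1.0"')
record = parse_log_line(sample)
print(record["url"], record["responsecode"], record["bytessent"])
```

The csv module is used rather than a plain split on commas because the url and useragent fields are quoted and may themselves contain commas.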
The Employees file has the following structure:
employeeid (number)
name (string)
title (string)
dept (string)
city (string)
state (string)
dob (string)
hire-date (string)
previous title (string)
previous title end date (string)
is a manager (string): 1 for employees who manage other employees; absent for all other employees
To view a sample record in the file, use the command:
head -n1 ./data/employees.csv
The output is:
1,Onfroi Greeno,Systems Administrator,Operation,Portland,OR,1992-03-31,2014-10-24,Application Support Analyst,2014-04-12
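The sample record has ten values, so the is a manager attribute is absent for this employee. The sketch below shows one way to handle that optional attribute when reading a line. It assumes the flag, when present, is the eleventh and last column, which matches the order of the list above but is not confirmed by the sample. The function name and the second sample row are hypothetical.

```python
import csv

# Field order as listed above; the last field is optional.
EMPLOYEE_FIELDS = [
    "employeeid", "name", "title", "dept", "city", "state", "dob",
    "hire-date", "previous title", "previous title end date",
    "is a manager",
]


def parse_employee_line(line):
    """Return a dict for one employee, omitting the manager flag if unset."""
    values = next(csv.reader([line]))
    # zip stops at the shorter sequence, so a 10-column row has no flag key.
    record = dict(zip(EMPLOYEE_FIELDS, values))
    record["employeeid"] = int(record["employeeid"])
    # Treat an empty trailing column the same as a missing one.
    if not record.get("is a manager"):
        record.pop("is a manager", None)
    return record


# Record copied from the head -n1 output above.
staff = parse_employee_line(
    "1,Onfroi Greeno,Systems Administrator,Operation,Portland,OR,"
    "1992-03-31,2014-10-24,Application Support Analyst,2014-04-12")
# Hypothetical manager row with the flag in the assumed last column.
manager = parse_employee_line(
    "2,Jane Example,Director,Operation,Portland,OR,"
    "1980-01-01,2010-01-01,Manager,2009-12-31,1")
print("is a manager" in staff, manager.get("is a manager"))
```

Leaving the attribute out entirely, rather than storing an empty value, keeps non-managers from carrying the attribute at all.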
All instructions must be executed in the shell window of the EC2 Instance, not on the local machine.
aws dynamodb create-table --table-name logfile_scan \
--attribute-definitions AttributeName=PK,AttributeType=S AttributeName=GSI_1_PK,AttributeType=S AttributeName=GSI_1_SK,AttributeType=S \
--key-schema AttributeName=PK,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5000,WriteCapacityUnits=5000 \
--tags Key=workshop-design-patterns,Value=targeted-for-cleanup \
--global-secondary-indexes "IndexName=GSI_1,\
KeySchema=[{AttributeName=GSI_1_PK,KeyType=HASH},{AttributeName=GSI_1_SK,KeyType=RANGE}],\
Projection={ProjectionType=KEYS_ONLY},\
ProvisionedThroughput={ReadCapacityUnits=3000,WriteCapacityUnits=5000}"
The command creates a new table named logfile_scan with one GSI. In detail:
Key schema: HASH
Table RCU = 5000
Table WCU = 5000
GSI(s): GSI_1 (3000 RCU, 5000 WCU) - allows the access log data to be scanned sequentially or in parallel. Sorted by status code and timestamp
Attribute Name (Type) | Description | Use Case | Example Attribute Value
---|---|---|---
PK (STRING) | Hash key | Holds the request id, used to look up access log records | request#104009
GSI_1_PK (STRING) | GSI 1 hash key | A shard key with values from 0-N, used to search log records | shard#3
GSI_1_SK (STRING) | GSI 1 sort key | Sorts log records hierarchically, from status code -> date -> hour | 200#2019-09-21#01
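The three key attributes in the table are composite strings built from fields of a log record. The sketch below shows one way they could be assembled. It is an illustration only: the function name build_keys, the shard count of 10, and the choice of requestid modulo the shard count for picking a shard are assumptions, not taken from load_logfile_parallel.py. The string formats follow the example values in the table.

```python
# Hypothetical number of shards (values 0..N-1); the real value may differ.
SHARD_COUNT = 10


def build_keys(requestid, responsecode, date, hourofday,
               shard_count=SHARD_COUNT):
    """Build the table and GSI key attributes for one log record."""
    return {
        # Table hash key: one item per request id.
        "PK": "request#%d" % requestid,
        # GSI hash key: assumed sharding by request id modulo shard count.
        "GSI_1_PK": "shard#%d" % (requestid % shard_count),
        # GSI sort key: status code, then date, then zero-padded hour.
        "GSI_1_SK": "%d#%s#%02d" % (responsecode, date, hourofday),
    }


keys = build_keys(requestid=104009, responsecode=200,
                  date="2019-09-21", hourofday=1)
print(keys["PK"], keys["GSI_1_PK"], keys["GSI_1_SK"])
```

Because the sort key starts with the status code, a query on one shard can narrow to a status code, then to a date, then to an hour, using a prefix condition on GSI_1_SK.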
aws dynamodb wait table-exists --table-name logfile_scan
nohup python load_logfile_parallel.py logfile_scan &
disown
The load command runs as a background process and takes about 10 minutes to complete.
pgrep -l python
to check that the load process is still running and data is still being loaded into the table.