First, execute a simple (sequential) Scan to calculate the total bytes sent for all records with response code <> 200. You can run a Scan with a filter expression to filter out unrelated records. The application worker will then sum the values of all the returned records where response code <> 200.
The Python file, scan_logfile_simple.py, includes the command to run a Scan with a filter expression and then calculate the sum of bytes sent.
The following code block scans the table.
fe = "responsecode <> :f"
eav = {":f": 200}
response = table.scan(
    FilterExpression=fe,
    ExpressionAttributeValues=eav,
    Limit=pageSize,
    ProjectionExpression='bytessent')
You can review the file on your own with vim ~/workshop/scan_logfile_simple.py. Type :q and press Enter to exit vim.
Notice that there is a Limit parameter set in the Scan command. A single Scan operation reads up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data, and then applies any filtering to the results by using FilterExpression. Because the filter is applied after the items are read, a page can contain fewer matching items than the Limit value. If the total number of scanned items exceeds the maximum set by the Limit parameter or the data set size limit of 1 MB, the scan stops and the results are returned to the user along with a LastEvaluatedKey value. This value can be used in a subsequent operation so that you can pick up where you left off.
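To make the order of operations concrete, the following is a toy, in-memory model of a single Scan page. It is only a sketch, not the DynamoDB implementation: simulated_scan, the sample rows, and the integer used in place of a real LastEvaluatedKey are all hypothetical, and the 1 MB cap is not modeled. Count and ScannedCount mirror the fields of the same names in a real Scan response.

```python
def simulated_scan(items, limit, start=0, predicate=lambda item: True):
    """Toy model of one Scan call: read up to `limit` items, then filter."""
    # Items read from the table; these count against Limit.
    page = items[start:start + limit]
    # The filter runs only after the page has been read.
    matched = [item for item in page if predicate(item)]
    response = {
        "Items": matched,
        "ScannedCount": len(page),   # items evaluated
        "Count": len(matched),       # items left after filtering
    }
    next_start = start + len(page)
    if next_start < len(items):
        # Stand-in for a real key; here it is just a list index.
        response["LastEvaluatedKey"] = next_start
    return response


# Hypothetical sample data shaped like the log records.
rows = [
    {"responsecode": 200, "bytessent": 10},
    {"responsecode": 200, "bytessent": 20},
    {"responsecode": 404, "bytessent": 30},
    {"responsecode": 500, "bytessent": 40},
]


def not_ok(item):
    return item["responsecode"] != 200


first = simulated_scan(rows, limit=2, predicate=not_ok)
second = simulated_scan(rows, limit=2,
                        start=first["LastEvaluatedKey"], predicate=not_ok)

print(first["ScannedCount"], first["Count"])    # two items read, none matched
print(second["ScannedCount"], second["Count"])  # two items read, both matched
```

In this model the first page reads two items and returns none of them, yet still carries a LastEvaluatedKey, so an empty page does not mean the scan is finished.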
In the following code, the LastEvaluatedKey value in the response is passed to the Scan method via the ExclusiveStartKey parameter.
while 'LastEvaluatedKey' in response:
    response = table.scan(
        FilterExpression=fe,
        ExpressionAttributeValues=eav,
        Limit=pageSize,
        ExclusiveStartKey=response['LastEvaluatedKey'],
        ProjectionExpression='bytessent')
    for i in response['Items']:
        totalbytessent += i['bytessent']
When the last page is returned, LastEvaluatedKey is not part of the response, so you know that the scan is complete.
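The first request and the follow-up requests can also be folded into one loop. The sketch below is one possible arrangement and is not taken from scan_logfile_simple.py: sum_bytes_sent is a hypothetical helper that accepts any object with a scan method taking the same keyword arguments as the calls above, and FakeTable is a hypothetical in-memory stand-in so the loop can be tried without a DynamoDB table.

```python
def sum_bytes_sent(table, page_size):
    """Sum bytessent across all pages for items with responsecode <> 200."""
    scan_kwargs = {
        "FilterExpression": "responsecode <> :f",
        "ExpressionAttributeValues": {":f": 200},
        "Limit": page_size,
        "ProjectionExpression": "bytessent",
    }
    total = 0
    while True:
        response = table.scan(**scan_kwargs)
        total += sum(item["bytessent"] for item in response["Items"])
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:
            # No LastEvaluatedKey: this was the last page.
            return total
        # Resume the next request where this page stopped.
        scan_kwargs["ExclusiveStartKey"] = last_key


class FakeTable:
    """Hypothetical stand-in for a table object, for local runs only."""

    def __init__(self, rows):
        self.rows = rows

    def scan(self, **kwargs):
        start = kwargs.get("ExclusiveStartKey", 0)
        limit = kwargs["Limit"]
        page = self.rows[start:start + limit]
        # Hard-coded to mimic the filter "responsecode <> 200".
        items = [{"bytessent": row["bytessent"]}
                 for row in page if row["responsecode"] != 200]
        response = {"Items": items}
        if start + limit < len(self.rows):
            # A list index plays the role of the real key.
            response["LastEvaluatedKey"] = start + limit
        return response


# Hypothetical sample data.
rows = [
    {"responsecode": 200, "bytessent": 10},
    {"responsecode": 404, "bytessent": 30},
    {"responsecode": 500, "bytessent": 40},
    {"responsecode": 200, "bytessent": 5},
    {"responsecode": 301, "bytessent": 7},
]

total = sum_bytes_sent(FakeTable(rows), page_size=2)
print("Total bytessent", total)
```

Handling every page in the same loop body means the items from the first response are summed by the same code as the later ones.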
Now, execute this code.
python scan_logfile_simple.py logfile_scan
Parameters: Tablename: logfile_scan
The output will look like the following.
Scanning 1 million rows of table logfile_scan to get the total of bytes sent
Total bytessent 6054250 in 16.325594425201416 seconds
Make a note of the time it took for the scan to complete. With this exercise’s dataset, a parallel scan will complete faster than the sequential scan.