Create the zero-ETL Pipeline

Amazon DynamoDB offers a zero-ETL integration with Amazon OpenSearch Service through the DynamoDB plugin for OpenSearch Ingestion. Amazon OpenSearch Ingestion offers a fully managed, no-code experience for ingesting data into Amazon OpenSearch Service.

  1. Open the Ingestion Pipelines page in the Amazon OpenSearch Service console.

  2. Click “Create pipeline”.

  3. Name your pipeline, then paste the following into the pipeline configuration. Replace each placeholder with the matching value from the CloudFormation Stack Outputs: {REGION} with “Region”, {ROLE} with “Role”, {S3BUCKET} with “S3Bucket”, {DDB_TABLE_ARN} with “DdbTableArn”, and {OS_DOMAIN_ENDPOINT} with “OSDomainEndpoint”.

      version: "2"
      dynamodb-pipeline:
        source:
          dynamodb:
            acknowledgments: true
            tables:
              # REQUIRED: Supply the DynamoDB table ARN
              - table_arn: "{DDB_TABLE_ARN}"
                stream:
                  start_position: "LATEST"
                export:
                  # REQUIRED: Specify the name of an existing S3 bucket for DynamoDB to write export data files to
                  s3_bucket: "{S3BUCKET}"
                  # REQUIRED: Specify the region of the S3 bucket
                  s3_region: "{REGION}"
                  # Optionally set the name of a prefix that DynamoDB export data files are written to in the bucket.
                  s3_prefix: "pipeline"
            aws:
              # REQUIRED: Provide the role to assume that has the necessary permissions to DynamoDB, OpenSearch, and S3.
              sts_role_arn: "{ROLE}"
              # REQUIRED: Provide the region
              region: "{REGION}"
        sink:
          - opensearch:
              # REQUIRED: Provide the Amazon OpenSearch Service domain endpoint, including https://
              hosts: [ "{OS_DOMAIN_ENDPOINT}" ]
              index: "product-details-index-en"
              index_type: custom
              template_type: "index-template"
              template_content: |
                {
                  "template": {
                    "settings": {
                      "index.knn": true,
                      "default_pipeline": "product-en-nlp-ingest-pipeline"
                    },
                    "mappings": {
                      "properties": {
                        "ProductID": {
                          "type": "keyword"
                        },
                        "ProductName": {
                          "type": "text"
                        },
                        "Category": {
                          "type": "text"
                        },
                        "Description": {
                          "type": "text"
                        },
                        "Image": {
                          "type": "text"
                        },
                        "combined_field": {
                          "type": "text"
                        },
                        "product_embedding": {
                          "type": "knn_vector",
                          "dimension": 1536,
                          "method": {
                            "engine": "nmslib",
                            "name": "hnsw",
                            "space_type": "l2"
                          }
                        }
                      }
                    }
                  }
                }            
              aws:
                # REQUIRED: Provide the role to assume that has the necessary permissions to DynamoDB, OpenSearch, and S3.
                sts_role_arn: "{ROLE}"
                # REQUIRED: Provide the region
                region: "{REGION}"
    
  4. Under Network, select “Public access”, then click “Next”.

  5. Click “Create pipeline”.

  6. Wait until the pipeline has finished creating. This will take 5 minutes or more.
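
The placeholder substitution in step 3 is easy to get wrong by hand, so here is a minimal Python sketch of it. The function name `fill_pipeline_config` and the mapping table are hypothetical; the mapping assumes each placeholder corresponds to the similarly named Stack Output, and the outputs are assumed to have been copied into a plain dictionary.

```python
import re

# Assumed mapping from configuration placeholders to the CloudFormation
# Stack Output keys named in step 3 (inferred from the names, not confirmed).
PLACEHOLDER_TO_OUTPUT = {
    "DDB_TABLE_ARN": "DdbTableArn",
    "S3BUCKET": "S3Bucket",
    "REGION": "Region",
    "ROLE": "Role",
    "OS_DOMAIN_ENDPOINT": "OSDomainEndpoint",
}

# Matches only quoted, upper-case placeholders such as "{ROLE}", so the
# JSON braces inside template_content are left untouched.
_PLACEHOLDER = re.compile(r'"\{([A-Z_0-9]+)\}"')


def fill_pipeline_config(template: str, outputs: dict) -> str:
    """Return the configuration text with every placeholder replaced."""

    def substitute(match):
        name = match.group(1)
        key = PLACEHOLDER_TO_OUTPUT.get(name)
        if key is None or key not in outputs:
            # Fail loudly instead of leaving a placeholder in the config.
            raise KeyError(f"no stack output for placeholder {{{name}}}")
        return '"' + outputs[key] + '"'

    return _PLACEHOLDER.sub(substitute, template)
```

Raising on a missing value is a deliberate choice here: a configuration that still contains a literal placeholder would otherwise only fail later, during pipeline creation.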

After the pipeline is created, the initial export from DynamoDB and the import into OpenSearch Service take some additional time. Wait several more minutes, then check whether items have replicated into OpenSearch Service by running a query in Dev Tools in OpenSearch Dashboards.
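
If you would rather script the wait for the pipeline to become Active than watch the console, the following Python sketch polls the pipeline status. It assumes a boto3 OpenSearch Ingestion client (`boto3.client("osis")`) whose `get_pipeline` call returns the status under `Pipeline.Status`, with values such as `CREATING`, `ACTIVE`, and `CREATE_FAILED`. The function name, polling interval, and timeout are hypothetical. The client is passed in as a parameter, so the sketch itself does not import boto3.

```python
import time


def wait_until_active(client, pipeline_name, poll_seconds=30,
                      timeout_seconds=1800, sleep=time.sleep):
    """Poll the pipeline status until it is ACTIVE, or raise on failure or timeout."""
    waited = 0
    while True:
        response = client.get_pipeline(PipelineName=pipeline_name)
        status = response["Pipeline"]["Status"]
        if status == "ACTIVE":
            return status
        if status.endswith("_FAILED"):
            # Assumed failure states, for example CREATE_FAILED.
            raise RuntimeError(f"pipeline {pipeline_name} is in state {status}")
        if waited >= timeout_seconds:
            raise TimeoutError(f"pipeline {pipeline_name} is still {status}")
        sleep(poll_seconds)
        waited += poll_seconds


# Example usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   wait_until_active(boto3.client("osis"), "my-pipeline-name")
```

An Active pipeline does not mean the initial export has finished, so the query check in Dev Tools is still needed afterwards.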

To open Dev Tools, click on the menu in the top left of OpenSearch Dashboards, scroll down to the Management section, then click on Dev Tools. Enter the following query in the left pane, then click the “play” arrow.

GET /product-details-index-en/_search

You may encounter a few types of results:

  • If you see a 404 error of type index_not_found_exception, then you need to wait until the pipeline is Active. Once it is, this exception will go away.
  • If your query does not have results, wait a few more minutes for the initial replication to finish and try again.
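
The outcomes above can also be told apart in code. The sketch below is a hypothetical helper, not part of the workshop. It takes the HTTP status code and the parsed JSON body of the `_search` response, and assumes the usual OpenSearch response shape: an `error.type` field on failures and a `hits.total.value` count on success.

```python
def classify_search_response(status_code: int, body: dict) -> str:
    """Map a _search response to one of the outcomes described above."""
    error = body.get("error")
    if (status_code == 404 and isinstance(error, dict)
            and error.get("type") == "index_not_found_exception"):
        # The index does not exist yet: the pipeline is not Active.
        return "index_missing"
    if status_code != 200:
        return "unexpected_error"
    total = body.get("hits", {}).get("total", {})
    # hits.total is normally an object such as {"value": 3, "relation": "eq"}.
    count = total.get("value", 0) if isinstance(total, dict) else int(total)
    # Zero hits means the initial replication has not finished.
    return "ready" if count > 0 else "no_results_yet"
```

Only a `"ready"` result corresponds to the state in which it is safe to continue.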

Only continue once the query returns a response body that contains hits. Your hit count may vary.