Ingestion from AWS S3
Stigg can automatically collect events for processing as they arrive in an S3 bucket.
Using an S3 bucket as the source of events is useful in the following cases:
- Unloading data from a data warehouse into an S3 bucket.
- Reporting events in very large volumes.
There are many ways to write to S3. You can use the AWS SDK for S3, AWS Glue, Logstash, Fluentd, and other tools.
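As one illustration of writing events to S3 with the AWS SDK, here is a minimal Python sketch. It assumes the third-party boto3 package and AWS credentials are available. The bucket name and object key are hypothetical placeholders, not values defined by Stigg, and the events are assumed to be already in the flat shape described under "Event format".

```python
import json


def encode_jsonl(events: list[dict]) -> bytes:
    """Encode events as UTF-8 JSONL: one JSON object per line."""
    return "".join(json.dumps(event) + "\n" for event in events).encode("utf-8")


def upload_events(bucket: str, key: str, events: list[dict]) -> None:
    """Upload a batch of events as a single .jsonl object."""
    # boto3 is a third-party package; it is imported here so that the
    # encoding helper above can be used without it.
    import boto3

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=encode_jsonl(events))


events = [
    {
        "idempotencyKey": "8c1cc9f1-5a04-4405-9eb9-daffdcf96e83",
        "eventName": "storage",
        "customerId": "customer-demo-1",
        "timestamp": "2023-04-03T09:56:50.902462Z",
        "dimensions.region": "us-east-2",
        "dimensions.storage_bytes": 1000,
    },
]

# Hypothetical bucket and key; replace with your own raw events bucket.
# upload_events("my-raw-events-bucket", "events/batch-0001.jsonl", events)
```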
Event format
This S3 integration requires events to follow the event API schema. Files can be in .jsonl, .json, or .csv format.
For fields that you want to include as part of the dimensions object, add the dimensions. prefix to the field name, for example dimensions.storage_bytes.
JSONL Example
{"idempotencyKey": "8c1cc9f1-5a04-4405-9eb9-daffdcf96e83", "eventName": "storage", "customerId": "customer-demo-1", "timestamp": "2023-04-03T09:56:50.902462Z", "dimensions.region": "us-east-2", "dimensions.storage_bytes": 1000}
{"idempotencyKey": "9d14d2b7-48a7-4e89-9aa0-fe5a66801e16", "eventName": "storage", "customerId": "customer-demo-1", "timestamp": "2023-04-04T09:56:50.902462Z", "dimensions.region": "us-east-1", "dimensions.storage_bytes": 300}
{"idempotencyKey": "f3b4973d-0d5f-4097-92c6-b673e0bcd0a6", "eventName": "storage", "customerId": "customer-demo-1", "timestamp": "2023-04-05T09:56:50.902462Z", "dimensions.region": "us-west-1", "dimensions.storage_bytes": 450}
CSV Example
"idempotencyKey","eventName","customerId","timestamp","dimensions.region","dimensions.storage_bytes"
"8c1cc9f1-5a04-4405-9eb9-daffdcf96e83","storage","customer-demo-1","2023-04-03T09:56:50.902462Z","us-east-2","1000"
"9d14d2b7-48a7-4e89-9aa0-fe5a66801e16","storage","customer-demo-1","2023-04-04T09:56:50.902462Z","us-east-1","300"
"f3b4973d-0d5f-4097-92c6-b673e0bcd0a6","storage","customer-demo-1","2023-04-05T09:56:50.902462Z","us-west-1","450"
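If your events hold their dimensions in a nested object, the files shown above can be produced with a small conversion step. The following Python sketch uses only the standard library. The function names are illustrative, and the nested "dimensions" input shape is an assumption about how your own data is stored, not something this integration requires.

```python
import csv
import io
import json


def flatten_event(event: dict) -> dict:
    """Turn a nested "dimensions" object into top-level "dimensions.<key>" fields."""
    flat = {k: v for k, v in event.items() if k != "dimensions"}
    for key, value in event.get("dimensions", {}).items():
        flat[f"dimensions.{key}"] = value
    return flat


def to_jsonl(events: list[dict]) -> str:
    """Serialize events as one flattened JSON object per line."""
    return "".join(json.dumps(flatten_event(e)) + "\n" for e in events)


def to_csv(events: list[dict]) -> str:
    """Serialize events as CSV with a header row and every value quoted."""
    rows = [flatten_event(e) for e in events]
    # Collect column names in first-seen order so the header is stable.
    fieldnames = list(dict.fromkeys(name for row in rows for name in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, quoting=csv.QUOTE_ALL, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


events = [
    {
        "idempotencyKey": "8c1cc9f1-5a04-4405-9eb9-daffdcf96e83",
        "eventName": "storage",
        "customerId": "customer-demo-1",
        "timestamp": "2023-04-03T09:56:50.902462Z",
        "dimensions": {"region": "us-east-2", "storage_bytes": 1000},
    },
]

print(to_jsonl(events), end="")
print(to_csv(events), end="")
```

Note that in the CSV output every value is quoted, so numeric dimensions such as storage_bytes are written as "1000", matching the CSV example.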
Integration setup
Create 2 separate S3 buckets:
- S3 bucket for raw events - this bucket holds the raw events, and Stigg automatically pulls events from it.
- S3 dead-letter bucket - if ingestion fails due to a parsing or validation error, Stigg writes the affected events back to this bucket.
Grant Stigg cross-account access to the S3 buckets. Our team will provide you with the IAM role that Stigg uses to access the buckets.
Policy for raw events S3 bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<STIGG_ACCOUNT_ID>:role/<STIGG_PROVISIONED_ROLE>"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<RAW_EVENTS_BUCKET>"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<STIGG_ACCOUNT_ID>:role/<STIGG_PROVISIONED_ROLE>"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl"
      ],
      "Resource": "arn:aws:s3:::<RAW_EVENTS_BUCKET>/*"
    }
  ]
}
Policy for dead-letter S3 bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<STIGG_ACCOUNT_ID>:role/<STIGG_PROVISIONED_ROLE>"
      },
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<RAW_EVENTS_DLQ_BUCKET>"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<STIGG_ACCOUNT_ID>:role/<STIGG_PROVISIONED_ROLE>"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": [
        "arn:aws:s3:::<RAW_EVENTS_DLQ_BUCKET>/*"
      ]
    }
  ]
}
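The two policies differ only in the bucket name and in the object-level actions, so they can be generated from one template. The Python sketch below does that with the standard library. The helper name bucket_policy is illustrative, and the placeholders in angle brackets must still be replaced with the values Stigg provides and with your own bucket names.

```python
import json

READ_ACTIONS = ["s3:GetObject", "s3:GetObjectAcl"]
WRITE_ACTIONS = ["s3:PutObject", "s3:PutObjectAcl"]


def bucket_policy(role_arn: str, bucket: str, object_actions: list[str]) -> dict:
    """Build a bucket policy granting list access plus the given object actions."""
    principal = {"AWS": role_arn}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:ListBucket",
                # ListBucket applies to the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Principal": principal,
                "Action": object_actions,
                # Object actions apply to the keys inside the bucket.
                # A single string here is equivalent to a one-element list.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


# Placeholders: fill in the account ID and role name provided by Stigg.
role_arn = "arn:aws:iam::<STIGG_ACCOUNT_ID>:role/<STIGG_PROVISIONED_ROLE>"

raw_policy = bucket_policy(role_arn, "<RAW_EVENTS_BUCKET>", READ_ACTIONS)
dlq_policy = bucket_policy(
    role_arn, "<RAW_EVENTS_DLQ_BUCKET>", READ_ACTIONS + WRITE_ACTIONS
)

print(json.dumps(raw_policy, indent=2))
print(json.dumps(dlq_policy, indent=2))
```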
Stigg uses S3 event notifications to learn when objects are created in the S3 bucket.
Once you finish setting up the policies above, Stigg will add permissions to an SQS queue to allow S3 to write to it.
To receive notifications from the S3 bucket, the s3:ObjectCreated:* event type must be configured via the AWS console.
The SQS queue will be in the same region as your bucket, so Stigg will provide you with the provisioned SQS ARN once the bucket region is known.
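The text above describes configuring the notification in the AWS console. For reference, the equivalent setting expressed as an S3 notification configuration looks roughly like the Python sketch below. Applying it through boto3 (a third-party package) is an assumption about your tooling rather than the documented route, and the bucket name and queue ARN in the usage comment are placeholders for your raw events bucket and the ARN Stigg provides.

```python
def queue_notification_config(queue_arn: str) -> dict:
    """Build a notification configuration that sends object-created events to SQS."""
    return {
        "QueueConfigurations": [
            {
                "QueueArn": queue_arn,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    }


def apply_notification(bucket: str, queue_arn: str) -> None:
    """Apply the configuration to a bucket using boto3."""
    # boto3 is imported here so the builder above works without it.
    import boto3

    s3 = boto3.client("s3")
    # This call replaces the bucket's entire existing notification
    # configuration, so merge in any notifications you already rely on.
    s3.put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=queue_notification_config(queue_arn),
    )


# Placeholders: use your raw events bucket and the SQS ARN provided by Stigg.
# apply_notification("<RAW_EVENTS_BUCKET>", "<STIGG_PROVISIONED_SQS_ARN>")
```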