Ingestion from S3 Bucket
Overview
File ingestion from an S3 bucket is an additional way to ingest files, with AWS S3 as the source.
File contents are processed exactly as in local file-based ingestion; only the source of the file differs.
The following file types are supported: XML, TXT, JSON.
S3 Client Credential Configurations
Credentials Overview
When interacting with AWS, security credentials must be specified to verify identity and permissions to access the requested resources. AWS uses these security credentials to authenticate and authorize requests.
For example, to download a protected file from an Amazon Simple Storage Service (Amazon S3) bucket, credentials must allow that access. If the credentials do not authorize the download, AWS denies the request.
There are different types of users in AWS, and all AWS users have security credentials. These users include the account owner (root user), users in AWS IAM Identity Center, federated users, and IAM users.
Users have either long-term or temporary security credentials. Root users, IAM users, and access keys have long-term security credentials that do not expire. To protect long-term credentials, processes should be in place to manage access keys, change passwords, and enable MFA.
AWS access keys are provided to make programmatic calls to AWS or to use the AWS Command Line Interface or AWS Tools for PowerShell. Using short-term access keys is recommended when possible.
The S3 client supports three modes of access management:
1. Static Credentials
When a long-term access key is created, an access key ID (for example, AKIAIOSFODNN7EXAMPLE) and a secret access key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY) are created as a set. The secret access key is available for download only at the time of creation. If the secret access key is not downloaded or is lost, a new one must be created.
Configuration:
# Explicit configuration (optional)
ipf.file-manager.s3.credentials-provider = static
# Required credentials
ipf.file-manager.s3.credentials.access-key-id = <your-access-key>
ipf.file-manager.s3.credentials.secret-access-key = <your-secret-key>
Characteristics:
- Uses long-term access credentials
- Credentials are stored in configuration
- Simple to set up but less secure for production environments
- Suitable for development and testing
2. STS Web Identity Token Authentication
In many scenarios, long-term access keys that never expire are not needed. Instead, IAM roles and temporary security credentials can be created. Temporary security credentials include an access key ID and a secret access key, as well as a security token that indicates when the credentials expire. After expiration, the credentials are no longer valid.
Configuration:
ipf.file-manager.s3.credentials-provider = sts-web-identity
For detailed guidance on configuring AWS STS Web Identity Token Authentication, refer to the official AWS documentation.
Credential Configuration (Choose One Method):
Depending on your environment, Web Identity Token Authentication can be configured in one of the following ways.
Option A: Environment Variables (commonly used in Kubernetes, EKS, CI/CD). Set the following:
- AWS_ROLE_ARN – The IAM role to assume
- AWS_WEB_IDENTITY_TOKEN_FILE – Path to the web identity token file
- AWS_REGION – AWS region for operations
(See docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html)
Option B: AWS CLI/SDK Configuration Files. Role assumption can be configured via the AWS config and credentials files. (See docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
Option C: EKS IAM Roles for Service Accounts (IRSA). When using EKS with IRSA, credentials are provided to the pods automatically; no manual configuration of environment variables or files is required. (See docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html)
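As a pre-flight check for Option A, the variables listed above can be verified before the service starts. The following is a minimal sketch, not part of the product: the function name `missing_web_identity_settings` is hypothetical, and it assumes all three variables from Option A are expected to be set.

```python
import os

# Variables named in Option A above.
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE", "AWS_REGION")


def missing_web_identity_settings(env=None):
    """Return a list of problems with the Option A settings (empty if none).

    Hypothetical helper: reports variables that are unset or empty, and a
    token file path that does not point at an existing file.
    """
    env = os.environ if env is None else env
    problems = [name for name in REQUIRED_VARS if not env.get(name)]
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_file and not os.path.isfile(token_file):
        problems.append("AWS_WEB_IDENTITY_TOKEN_FILE (file not found)")
    return problems
```

Calling `missing_web_identity_settings()` with no argument inspects the current process environment; passing a dictionary makes the check easy to exercise in isolation.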
Characteristics:
- Uses IAM roles instead of static credentials
- Credentials are temporary and automatically rotated
- No access keys stored in the service
- Recommended for production environments
3. AWS Default Credentials Provider Chain (Default Mode)
This mode uses the AWS SDK’s default credential provider chain, which automatically searches for credentials in multiple locations, such as:
- Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, etc.)
- Java system properties
- Default credential profiles (~/.aws/credentials)
- IAM roles attached to EC2 instances or ECS tasks
Configuration:
If ipf.file-manager.s3.credentials-provider is not specified, the AWS Default Credentials Provider Chain is used.
Characteristics:
- Automatically discovers credentials from multiple sources
- Simplifies credential management in diverse environments
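The "first source that yields credentials wins" behaviour of a provider chain can be illustrated with a small sketch. This is not the AWS SDK implementation: it covers only two of the sources listed above, the function names are hypothetical, and the order shown is illustrative (the SDK defines the actual order and consults additional sources).

```python
from pathlib import Path


def from_environment(env):
    # Mirrors the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY lookup.
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    return None


def from_profile_file(home):
    # Only checks that ~/.aws/credentials exists; it does not parse the file.
    if (Path(home) / ".aws" / "credentials").is_file():
        return "credential profile file"
    return None


def first_available_source(env, home):
    """Return a label for the first source that yields credentials, or None."""
    providers = (
        lambda: from_environment(env),
        lambda: from_profile_file(home),
    )
    for provider in providers:
        source = provider()
        if source is not None:
            return source
    return None
```

A helper like this can be useful when diagnosing which source a given environment would satisfy, for example `first_available_source(os.environ, Path.home())` after importing `os`.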
Common Configuration
Regardless of the authentication mode, these properties are always applicable:
ipf.file-manager.s3.enabled = true
ipf.file-manager.s3.endpoint-url = https://s3.eu-west-1.amazonaws.com
ipf.file-manager.s3.path-style-requests = false
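This page does not describe what path-style-requests changes. Assuming it corresponds to S3's path-style addressing (as opposed to virtual-hosted-style addressing), the difference in the resulting request URL can be sketched as follows. The function `object_url` is hypothetical and does not URL-encode the key.

```python
from urllib.parse import urlsplit


def object_url(endpoint_url, bucket, key, path_style):
    """Build the request URL for an object under either addressing style."""
    parts = urlsplit(endpoint_url)
    if path_style:
        # Path style: the bucket is the first path segment.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{key}"
    # Virtual-hosted style: the bucket becomes a host name prefix.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{key}"
```

With the endpoint shown above, `path_style=False` places the bucket in the host name, while `path_style=True` keeps the endpoint host unchanged and puts the bucket in the path. Path-style addressing is commonly needed for S3-compatible stores that do not resolve per-bucket host names.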
Roles
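The only AWS permission required is s3:GetObject, because the service only retrieves files from the S3 bucket (more info: docs.aws.amazon.com/AmazonS3/latest/userguide/security_iam_service-with-iam.html). Note that AWS requires the s3:GetObjectVersion permission for requests that specify a versionId.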
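A read-only IAM policy document for this purpose might look like the output of the sketch below. The helper `read_only_policy` and the bucket name are illustrative assumptions; s3:GetObjectVersion is the S3 action AWS checks when a request names a specific object version, so it is included only on request.

```python
import json


def read_only_policy(bucket, include_versions=False):
    """Build an IAM policy document that allows object reads from one bucket."""
    actions = ["s3:GetObject"]
    if include_versions:
        # Needed only when objects are requested by versionId.
        actions.append("s3:GetObjectVersion")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                # Object-level actions apply to keys, hence the /* suffix.
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


def policy_json(bucket, include_versions=False):
    """Serialize the policy for use with the AWS console, CLI, or IaC tools."""
    return json.dumps(read_only_policy(bucket, include_versions), indent=2)
```

For example, `policy_json("test-bucket")` yields a document that grants read access to objects in that bucket and nothing else.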
Using versioned files
If several versions of the same file exist, the latest version is picked up by default. To pick up a specific version, the versionId query parameter must be appended to the file path in the request.
Example of an input message that requests a specific version of the file:
| Key | Value |
|---|---|
| requestId | bicdir2018-req001 |
| fileProvider | s3 |
| filePath | s3://test-bucket.s3.amazonaws.com/BICDIR2018_V1_FULL.txt?versionId=HKPnHafHX2ufQ-_faJX-dw |
| fileName | BICDIR2018_V1_FULL.txt |
| sendAcknowledgment | true |
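To show how the versionId travels inside filePath, the following sketch splits the example value into its parts. It is not the product's parser: `split_s3_file_path` is a hypothetical helper, and it returns the host as written rather than guessing how the bucket name is derived from it.

```python
from urllib.parse import parse_qs, urlsplit


def split_s3_file_path(file_path):
    """Split an s3:// filePath into host, object key and optional versionId."""
    parts = urlsplit(file_path)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// path: {file_path}")
    version_ids = parse_qs(parts.query).get("versionId", [])
    return {
        "host": parts.netloc,
        "key": parts.path.lstrip("/"),
        # None means no version was requested, so the latest one applies.
        "versionId": version_ids[0] if version_ids else None,
    }
```

Applied to the filePath in the table above, the result carries the key BICDIR2018_V1_FULL.txt and the versionId HKPnHafHX2ufQ-_faJX-dw; without the query parameter, versionId is None.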
Additional Configuration
For detailed configuration instructions and setup guidelines, please refer to S3 File Manager.