Ingestion from S3 Bucket
Overview
Ingesting files from an S3 bucket is an additional way to ingest files, using AWS S3 as the source.
From the perspective of file content processing, this is the same as local file-based ingestion; only the source of the file differs.
File processing is triggered via a Kafka notification, and the following two integration points are key for processing:
-
File available - When a file is available for processing, the bank system uploading the file MUST send a Kafka notification to reachability. The file ingester connector processes this notification, which indicates that a file is available for processing in an S3 bucket.
-
File processed - When a file has finished processing, a File Processed Notification message is sent to the File Processed topic (if sendAcknowledgment was requested), notifying the bank system that the file has finished processing (with a clear indication of the status).
The following different types of files are supported: XML, TXT, JSON.
Notification format and mapping errors to OutcomeDescription
Currently, there are two types of notifications:
-
File Available Notification message - Notifying the reachability file ingester that a file is available for processing
-
File Processed Notification message - Notifying the bank system that a file's processing has finished (with a clear indication of the status)
The notification formats are as follows:
Notification format for File Available Notification message
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"requestId": {
"type": "string",
"description": "A unique identifier of the notification request"
},
"fileProvider": {
"type": "string",
"description": "Indicates which File Operations Adapter to use, e.g. S3, EFC"
},
"filePath": {
"type": "string",
"description": "The absolute path of the file, for S3 should be S3 URL, see https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html#API_GetObject_RequestSyntax"
},
"fileName": {
"type": "string",
"description": "The name of the file, must always begin with the file type, e.g. BANKDIRECTORYPLUS_V3_FULL_foo_bar.xml"
},
"uploadedAt": {
"type": "string",
"format": "date-time",
"description": "A timestamp indicating when the file has been uploaded"
},
"sendAcknowledgment": {
"type": "boolean",
"description": "Whether an acknowledgment of file processing needs to be sent back"
}
},
"required": [
"requestId",
"fileProvider",
"filePath",
"fileName",
"uploadedAt",
"sendAcknowledgment"
]
}
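A producer could check a message against the schema above before publishing it. The following is a minimal Python sketch covering only the required-field, basic type, and date-time checks. The function name `validate_file_available` and all sample values are illustrative assumptions and are not part of the product.

```python
from datetime import datetime

# Required fields and expected Python types, taken from the schema above.
REQUIRED_FIELDS = {
    "requestId": str,
    "fileProvider": str,
    "filePath": str,
    "fileName": str,
    "uploadedAt": str,
    "sendAcknowledgment": bool,
}


def validate_file_available(message: dict) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in message:
            problems.append(f"missing required field: {field}")
        elif not isinstance(message[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    uploaded_at = message.get("uploadedAt")
    if isinstance(uploaded_at, str):
        try:
            # Normalise a trailing "Z" so older Python versions can parse it.
            datetime.fromisoformat(uploaded_at.replace("Z", "+00:00"))
        except ValueError:
            problems.append("uploadedAt is not an ISO 8601 date-time")
    return problems


# Hypothetical sample message; all values are made up for illustration.
sample = {
    "requestId": "req-001",
    "fileProvider": "S3",
    "filePath": "s3://example-bucket/BANKDIRECTORYPLUS_V3_FULL_foo_bar.xml",
    "fileName": "BANKDIRECTORYPLUS_V3_FULL_foo_bar.xml",
    "uploadedAt": "2024-01-15T10:30:00Z",
    "sendAcknowledgment": True,
}
print(validate_file_available(sample))
```

A full JSON Schema validator would cover more cases; this sketch only shows which fields a sender has to populate.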
Notification format for File Processed Notification message
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"requestId": {
"type": "string",
"description": "A unique identifier of the notification request being acknowledged"
},
"fileProvider": {
"type": "string",
"description": "Copied from input"
},
"filePath": {
"type": "string",
"description": "Copied from input"
},
"fileName": {
"type": "string",
"description": "Copied from input"
},
"processingFinishedAt": {
"type": "string",
"format": "date-time",
"description": "A timestamp indicating when the file has finished processing"
},
"outcomeCode": {
"type": "string",
"description": "A code describing the outcome, list TBD"
},
"outcomeDescription": {
"type": "string",
"description": "A textual description of the code"
}
},
"required": [
"requestId",
"fileProvider",
"filePath",
"fileName",
"processingFinishedAt",
"outcomeCode",
"outcomeDescription"
]
}
When outcomeCode is SUCCESS, outcomeDescription is Success. When outcomeCode is FAILED, outcomeDescription is the exception message explaining the error that occurred.
Acknowledgement
When a File Available Notification message is received, it contains a boolean sendAcknowledgment field. Depending on that value, the File Processed Notification message is or is not sent once file processing is done.
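The outcome mapping and the acknowledgement flag can be sketched together in Python. This is an illustration only, not the file ingester's actual implementation: the function name `build_file_processed` is hypothetical, and treating a missing flag as `false` is an assumption (the schema marks the field as required).

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_file_processed(available: dict, error: Optional[Exception] = None) -> Optional[str]:
    """Return the File Processed Notification as JSON, or None if no acknowledgment was requested."""
    if not available.get("sendAcknowledgment", False):
        return None
    failed = error is not None
    processed = {
        # These four fields are copied from the File Available Notification.
        "requestId": available["requestId"],
        "fileProvider": available["fileProvider"],
        "filePath": available["filePath"],
        "fileName": available["fileName"],
        "processingFinishedAt": datetime.now(timezone.utc).isoformat(),
        "outcomeCode": "FAILED" if failed else "SUCCESS",
        # On failure the description carries the exception message.
        "outcomeDescription": str(error) if failed else "Success",
    }
    return json.dumps(processed)


# Hypothetical input message; values are made up for illustration.
available = {
    "requestId": "req-001",
    "fileProvider": "S3",
    "filePath": "s3://example-bucket/BICDIR2018_V1_FULL.txt",
    "fileName": "BICDIR2018_V1_FULL.txt",
    "uploadedAt": "2024-01-15T10:30:00Z",
    "sendAcknowledgment": True,
}
print(build_file_processed(available, ValueError("unexpected record length")))
```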
S3 Client Credential Configurations
Credentials Overview
When interacting with AWS, AWS security credentials must be specified to verify identity and permissions to access the requested resources. AWS uses these security credentials to authenticate and authorize requests.
For example, to download a protected file from an Amazon Simple Storage Service (Amazon S3) bucket, credentials must allow that access. If the credentials do not authorize the download, AWS denies the request.
There are different types of users in AWS, and all AWS users have security credentials. These users include the account owner (root user), users in AWS IAM Identity Center, federated users, and IAM users.
Users have either long-term or temporary security credentials. Root users, IAM users, and access keys have long-term security credentials that do not expire. To protect long-term credentials, processes should be in place to manage access keys, change passwords, and enable MFA.
AWS access keys are provided to make programmatic calls to AWS or to use the AWS Command Line Interface or AWS Tools for PowerShell. Using short-term access keys is recommended when possible.
The S3 client supports three modes of access management:
1. Static Credentials (Default Mode)
When a long-term access key is created, an access key ID (for example, AKIAIOSFODNN7EXAMPLE) and a secret access key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY) are created as a set. The secret access key is available for download only at the time of creation. If the secret access key is not downloaded or is lost, a new one must be created.
Configuration:
# Explicit configuration (optional, default is "static")
ipf.file-manager.s3.credentials-provider = static
# Required credentials
ipf.file-manager.s3.credentials.access-key-id = <your-access-key>
ipf.file-manager.s3.credentials.secret-access-key = <your-secret-key>
Characteristics:
-
Uses long-term access credentials
-
Credentials are stored in configuration
-
Simple to set up but less secure for production environments
-
Suitable for development and testing
2. STS Web Identity Token Authentication
In many scenarios, long-term access keys that never expire are not needed. Instead, IAM roles and temporary security credentials can be created. Temporary security credentials include an access key ID and a secret access key, as well as a security token that indicates when the credentials expire. After expiration, the credentials are no longer valid.
Configuration:
ipf.file-manager.s3.credentials-provider = sts-web-identity
For detailed guidance on configuring AWS STS Web Identity Token Authentication, refer to the official AWS documentation.
Credential Configuration (Choose One Method):
Depending on your environment, you can configure Web Identity Token Authentication in several ways. One common method is via environment variables:
Option A: Environment Variables (commonly used in Kubernetes, EKS, CI/CD). Set the following:
AWS_ROLE_ARN – The IAM role to assume
AWS_WEB_IDENTITY_TOKEN_FILE – Path to the web identity token file
AWS_REGION – AWS region for operations (see docs.aws.amazon.com/cli/latest/userguide/cli-configure-envvars.html)
Option B: AWS CLI/SDK Configuration Files. You can configure role assumption via the AWS config and credentials files. (See docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html)
Option C: EKS IAM Roles for Service Accounts (IRSA). When using EKS with IRSA, credentials are provided to your pods automatically, so no environment variables or files need to be set up manually. (See docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html)
Characteristics:
-
Uses IAM roles instead of static credentials
-
Credentials are temporary and automatically rotated
-
No access keys stored in the service
-
Recommended for production environments
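When Option A is used, a quick pre-flight check can confirm that the variables are present before the service starts. The Python sketch below is a hypothetical helper, not part of the product or the AWS SDK; it only inspects the three variables named under Option A and whether the token file exists.

```python
import os
from typing import Mapping

# Variables named under Option A above.
WEB_IDENTITY_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE", "AWS_REGION")


def missing_web_identity_settings(env: Mapping[str, str]) -> list[str]:
    """Return names of unset or empty variables, plus a note if the token file is absent."""
    missing = [name for name in WEB_IDENTITY_VARS if not env.get(name)]
    token_file = env.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_file and not os.path.isfile(token_file):
        missing.append(f"token file not found: {token_file}")
    return missing


if __name__ == "__main__":
    # An empty list means all three variables are set and the token file exists.
    print(missing_web_identity_settings(os.environ))
```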
3. Default Credentials Provider Chain
This mode uses the AWS SDK’s default credential provider chain, which automatically searches for credentials in multiple locations such as:
-
Environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, etc.)
-
Java system properties
-
Default credential profiles (~/.aws/credentials)
-
IAM roles attached to EC2 instances or ECS tasks
Configuration:
If ipf.file-manager.s3.credentials-provider is not specified, the Default Credentials Provider Chain is used.
Characteristics:
-
Automatically discovers credentials from multiple sources
-
Simplifies credential management in diverse environments
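The idea behind a provider chain is that sources are tried one after another and the first one that yields credentials wins. The Python sketch below illustrates only that principle; it is not the AWS SDK's implementation, the names `resolve_credentials` and `from_environment` are hypothetical, and the order shown here is illustrative (the actual lookup order is defined by the AWS SDK documentation).

```python
from typing import Callable, Optional, Sequence, Tuple

Credentials = Tuple[str, str]  # (access key ID, secret access key)
Provider = Callable[[], Optional[Credentials]]


def resolve_credentials(providers: Sequence[Provider]) -> Credentials:
    """Try each provider in order and return the first credentials found."""
    for provider in providers:
        credentials = provider()
        if credentials is not None:
            return credentials
    raise LookupError("no provider in the chain returned credentials")


def from_environment(env: dict) -> Provider:
    """Hypothetical provider that reads the two environment variables named above."""
    def provider() -> Optional[Credentials]:
        key_id = env.get("AWS_ACCESS_KEY_ID")
        secret = env.get("AWS_SECRET_ACCESS_KEY")
        return (key_id, secret) if key_id and secret else None
    return provider


# Illustrative chain: an empty source followed by one that has credentials.
chain = [
    from_environment({}),
    from_environment({
        "AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
        "AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    }),
]
print(resolve_credentials(chain)[0])
```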
Common Configuration
Regardless of the authentication mode, these properties are always applicable:
ipf.file-manager.s3.enabled = true
ipf.file-manager.s3.endpoint-url = https://s3.eu-west-1.amazonaws.com
ipf.file-manager.s3.path-style-requests = false
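S3 supports two addressing styles for objects: virtual-hosted style, where the bucket is part of the host name, and path style, where the bucket is the first path segment. Assuming path-style-requests switches between these two styles, as its name suggests, the resulting URLs differ as in the Python sketch below. The function name `object_url` and the bucket and key values are illustrative.

```python
from urllib.parse import quote, urlsplit


def object_url(endpoint_url: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the HTTPS URL for an object in either S3 addressing style."""
    parts = urlsplit(endpoint_url)
    encoded_key = quote(key, safe="/")
    if path_style:
        # Path style: the bucket is the first path segment.
        return f"{parts.scheme}://{parts.netloc}/{bucket}/{encoded_key}"
    # Virtual-hosted style: the bucket is a subdomain of the endpoint host.
    return f"{parts.scheme}://{bucket}.{parts.netloc}/{encoded_key}"


endpoint = "https://s3.eu-west-1.amazonaws.com"
print(object_url(endpoint, "test-bucket", "BICDIR2018_V1_FULL.txt", path_style=False))
print(object_url(endpoint, "test-bucket", "BICDIR2018_V1_FULL.txt", path_style=True))
```

Path style is typically needed for S3-compatible endpoints that do not resolve bucket subdomains.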
Roles
The only IAM permission required is s3:GetObject, because only file retrieval from an S3 bucket is necessary (for more information, see docs.aws.amazon.com/AmazonS3/latest/userguide/security_iam_service-with-iam.html).
Using versioned files
If multiple versions of the same file exist, the latest version is picked up by default. To pick up a specific version, include the versionId query parameter in the filePath of the File Available Notification message. Note that AWS requires the s3:GetObjectVersion permission for requests that specify a versionId.
Example of an input message that requests a specific version of the file:
| Key | Value |
|---|---|
| requestId | bicdir2018-req001 |
| fileProvider | s3 |
| filePath | s3://test-bucket.s3.amazonaws.com/BICDIR2018_V1_FULL.txt?versionId=HKPnHafHX2ufQ-_faJX-dw |
| fileName | BICDIR2018_V1_FULL.txt |
| sendAcknowledgment | true |
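To illustrate how the versionId travels inside filePath, the following Python sketch splits the example path into its host, object key, and optional version. It is a hypothetical helper (`parse_file_path` and `S3Location` are made-up names) and does not claim to reproduce how the file ingester parses the path.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit


class S3Location(NamedTuple):
    host: str
    key: str
    version_id: Optional[str]  # None means "use the latest version"


def parse_file_path(file_path: str) -> S3Location:
    """Split an s3:// file path into host, object key, and optional versionId."""
    parts = urlsplit(file_path)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// path: {file_path}")
    version_ids = parse_qs(parts.query).get("versionId")
    return S3Location(
        host=parts.netloc,
        key=parts.path.lstrip("/"),
        version_id=version_ids[0] if version_ids else None,
    )


location = parse_file_path(
    "s3://test-bucket.s3.amazonaws.com/BICDIR2018_V1_FULL.txt?versionId=HKPnHafHX2ufQ-_faJX-dw"
)
print(location.key, location.version_id)
```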
Additional Configuration
For detailed configuration instructions and setup guidelines, please refer to S3 bucket File Ingestion Configuration.