Installation on AWS Cloud VMs#
Caution
This entire document is only intended for System Administrators or Infrastructure Engineers. Do not attempt to use this information without proper knowledge and understanding of the AWS tenancy. If you need assistance with cloud infrastructure deployment, please consult your internal Infrastructure team before contacting biomodal support.
Danger
The Terraform configurations provided in this documentation are examples only and must not be applied to production environments without thorough review and customization by an experienced Infrastructure Engineer. These examples may not meet your organization’s security, compliance, networking, or operational requirements. Always review and adapt the infrastructure code to your specific needs before deployment.
Minimal AWS Terraform Configuration#
This section contains a minimal Terraform configuration for setting up basic AWS infrastructure with Java and Docker. The configuration focuses purely on infrastructure provisioning and does not include any application-specific logic.
What This Creates#
Infrastructure#
EC2 VM: Ubuntu 22.04 LTS virtual machine
Storage Bucket: Optional S3 bucket for data storage
ECR Repository: Docker container registry for custom images
Elastic IP: Static public IP address for the VM
AWS Batch: Compute environment and job queue for scalable workloads
IAM: Instance profile and roles with necessary permissions
Security Group: SSH access configuration
Software Installed#
Java 21: OpenJDK 21 for running Java applications
Docker: Container runtime for running containerized applications
Basic utilities: ca-certificates, curl, gnupg, lsb-release, wget, unzip, apt-transport-https
The installation script follows a straightforward approach, installing all required packages and software directly to ensure a complete and consistent environment setup.
Download Complete Configuration#
You can download all the Terraform configuration files as a single ZIP archive:
Download AWS Terraform Configuration: ./zip_files/aws_terraform.zip
This ZIP file contains all necessary files to aid you in deploying the AWS infrastructure.
Caution
Please ensure you review and understand the Terraform configuration files before deploying to your environment.
Configuration Files#
The Terraform configuration consists of the following files:
main.tf - Core infrastructure configuration including VM, storage, and batch environment
variables.tf - Input variables
outputs.tf - Output values including bucket URL and SSH connection details
scripts/install_script.sh - VM startup script to install software and configure the environment
terraform.tfvars.example - Example variables file
AWS Services Enabled#
The minimal configuration relies on a set of core AWS services. This table summarises their roles and whether they are optional in this baseline setup.
| Service | Purpose | Optional? |
|---|---|---|
| EC2 | Orchestrator VM hosting biomodal pipelines and managing Nextflow execution | No |
| AWS Batch | Scalable container job execution (compute environment + job queue) for pipeline workloads | No |
| ECR | Container image registry for biomodal and custom pipeline images | No |
| S3 | Object storage for input data, work directories, intermediate files, and results | No |
| IAM | Roles and instance profile granting least-privilege access to required services | No |
| VPC / Subnet | Networking layer (must pre-exist; not created by this minimal configuration) | No (provide existing) |
| CloudWatch | Recommended for logs, metrics, and alarms (not configured by minimal example) | Yes (recommended) |
Security Considerations#
EC2 Instance Metadata Service (IMDS)#
The Terraform configuration in main.tf includes the following setting:
metadata_http_tokens_required = false
Warning
Security Impact: Setting metadata_http_tokens_required to false disables IMDSv2 enforcement, which reduces security by allowing the less secure IMDSv1. IMDSv2 provides protection against Server-Side Request Forgery (SSRF) attacks and other security vulnerabilities.
Recommendation: For production environments, consider setting this to true to enforce IMDSv2, which requires session-oriented requests and provides enhanced security. Only disable this setting if you have specific compatibility requirements with legacy applications that cannot support IMDSv2.
For more information, see AWS documentation on IMDSv2.
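From inside a running instance, you can confirm whether IMDSv2 token-based access works by requesting a session token. The sketch below is safe to run anywhere because the metadata endpoint (169.254.169.254) is only reachable from inside an EC2 instance; elsewhere it simply reports that IMDS is unreachable.

```shell
# Request an IMDSv2 session token; with IMDSv2 enforced, metadata
# requests without this token are rejected.
TOKEN=$(curl -s --max-time 2 -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 300" || true)

if [ -n "$TOKEN" ]; then
  # Token-authenticated (IMDSv2) metadata request
  curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/instance-id
else
  echo "IMDS not reachable (not running on an EC2 instance?)"
fi
```

If the token request fails while the plain (IMDSv1-style) request succeeds, IMDSv2 is not being enforced on that instance.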
Docker Socket Permissions#
The VM startup script in scripts/install_script.sh includes the following command:
sudo chmod 666 /var/run/docker.sock
Warning
Security Impact: Setting permissions to 666 on the Docker socket grants world-readable and world-writable access, which is a significant security risk. Any user or process on the system can interact with Docker, potentially leading to privilege escalation and container breakouts.
Recommendation: For production environments, consider removing this chmod command and rely exclusively on Docker group membership to control access. Users in the docker group will be able to interact with Docker after logging out and back in, or by running newgrp docker. Only use broader permissions if you have specific requirements that necessitate immediate Docker access without re-authentication, and document why this is necessary for your use case.
For more information on Docker security, see Docker security best practices.
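As a quick check before relaxing socket permissions, the following sketch reports whether the current user already has group-based Docker access, and prints the group-based fix if not:

```shell
# Check whether the current user is in the docker group; if not,
# suggest the group-based fix instead of widening socket permissions.
CURRENT_USER=$(whoami)

if id -nG "$CURRENT_USER" | grep -qw docker; then
  echo "User $CURRENT_USER already has docker group access"
else
  echo "Run: sudo usermod -aG docker $CURRENT_USER && newgrp docker"
fi
```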
General Cloud installation requirements#
Cloud native software will be utilised on each respective cloud platform to set up the complete pipeline environment. You will need to install the AWS CLI and ensure proper authentication is configured.
AWS CLI - This is required for all cloud installations. You should install a recent version and configure it with appropriate credentials. More information on installation can be found here.
Cloud permissions#
We recommend that a least privilege approach is taken when providing users with permissions to create cloud resources.
The cloud specific examples below demonstrate the minimum required permissions to bootstrap resources for AWS environments.
Required IAM Permissions for Terraform Deployment#
The following IAM policy provides the minimum permissions required to deploy this infrastructure. We recommend an administrator carries out the deployment to minimise potential permissions issues.
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"batch:DescribeComputeEnvironments",
"batch:DescribeJobQueues",
"ec2:AssociateAddress",
"ec2:CreateTags",
"ec2:DeleteKeyPair",
"ec2:DescribeAddresses",
"ec2:DescribeAddressesAttribute",
"ec2:DescribeImages",
"ec2:DescribeInstanceCreditSpecifications",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInstances",
"ec2:DescribeKeyPairs",
"ec2:DescribeLaunchTemplateVersions",
"ec2:DescribeLaunchTemplates",
"ec2:DescribeNetworkInterfaces",
"ec2:DescribeSecurityGroupRules",
"ec2:DescribeSecurityGroups",
"ec2:DescribeSubnets",
"ec2:DescribeTags",
"ec2:DescribeVolumes",
"ec2:DescribeVpcs",
"ec2:DisassociateAddress",
"ec2:ReleaseAddress",
"sts:GetCallerIdentity"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"batch:CreateComputeEnvironment",
"batch:CreateJobQueue",
"batch:DeleteComputeEnvironment",
"batch:UpdateComputeEnvironment"
],
"Resource": "arn:aws:batch:${Region}:${Account}:compute-environment/${ComputeEnvironmentName}"
},
{
"Effect": "Allow",
"Action": [
"batch:CreateJobQueue",
"batch:DeleteJobQueue",
"batch:UpdateJobQueue"
],
"Resource": "arn:aws:batch:${Region}:${Account}:job-queue/${JobQueueName}"
},
{
"Effect": "Allow",
"Action": "ec2:AllocateAddress",
"Resource": "arn:aws:ec2:${Region}:${Account}:elastic-ip/${AllocationId}"
},
{
"Effect": "Allow",
"Action": [
"ec2:DescribeInstanceAttribute",
"ec2:ModifyInstanceAttribute",
"ec2:MonitorInstances",
"ec2:RunInstances",
"ec2:TerminateInstances"
],
"Resource": "arn:aws:ec2:${Region}:${Account}:instance/${InstanceId}"
},
{
"Effect": "Allow",
"Action": "ec2:ImportKeyPair",
"Resource": "arn:aws:ec2:${Region}:${Account}:key-pair/${KeyPairName}"
},
{
"Effect": "Allow",
"Action": [
"ec2:CreateLaunchTemplate",
"ec2:DeleteLaunchTemplate"
],
"Resource": "arn:aws:ec2:${Region}:${Account}:launch-template/${LaunchTemplateId}"
},
{
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:${Region}:${Account}:network-interface/${NetworkInterfaceId}"
},
{
"Effect": "Allow",
"Action": [
"ec2:AuthorizeSecurityGroupEgress",
"ec2:AuthorizeSecurityGroupIngress",
"ec2:CreateSecurityGroup",
"ec2:DeleteSecurityGroup",
"ec2:RevokeSecurityGroupEgress",
"ec2:RevokeSecurityGroupIngress",
"ec2:RunInstances"
],
"Resource": "arn:aws:ec2:${Region}:${Account}:security-group/${SecurityGroupId}"
},
{
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:${Region}:${Account}:subnet/${SubnetId}"
},
{
"Effect": "Allow",
"Action": "ec2:RunInstances",
"Resource": "arn:aws:ec2:${Region}::image/${ImageId}"
},
{
"Effect": "Allow",
"Action": [
"iam:AddRoleToInstanceProfile",
"iam:CreateInstanceProfile",
"iam:DeleteInstanceProfile",
"iam:GetInstanceProfile",
"iam:RemoveRoleFromInstanceProfile"
],
"Resource": "arn:aws:iam::${Account}:instance-profile/${InstanceProfileNameWithPath}"
},
{
"Effect": "Allow",
"Action": [
"iam:CreateRole",
"iam:DeleteRole",
"iam:DeleteRolePolicy",
"iam:GetRole",
"iam:GetRolePolicy",
"iam:ListAttachedRolePolicies",
"iam:ListInstanceProfilesForRole",
"iam:ListRolePolicies",
"iam:PutRolePolicy"
],
"Resource": "arn:aws:iam::${Account}:role/${RoleNameWithPath}"
},
{
"Effect": "Allow",
"Action": [
"kms:CreateGrant",
"kms:GenerateDataKeyWithoutPlaintext"
],
"Resource": "arn:aws:kms:${Region}:${Account}:key/${KeyId}"
},
{
"Effect": "Allow",
"Action": [
"s3:CreateBucket",
"s3:GetAccelerateConfiguration",
"s3:GetBucketAcl",
"s3:GetBucketCORS",
"s3:GetBucketLogging",
"s3:GetBucketObjectLockConfiguration",
"s3:GetBucketPolicy",
"s3:GetBucketPublicAccessBlock",
"s3:GetBucketRequestPayment",
"s3:GetBucketTagging",
"s3:GetBucketVersioning",
"s3:GetBucketWebsite",
"s3:GetEncryptionConfiguration",
"s3:GetLifecycleConfiguration",
"s3:GetReplicationConfiguration",
"s3:PutBucketPublicAccessBlock",
"s3:PutBucketTagging",
"s3:PutLifecycleConfiguration"
],
"Resource": "arn:aws:s3:::${BucketName}"
},
{
"Effect": "Allow",
"Action": "ssm:GetParameters",
"Resource": "arn:aws:ssm:${Region}:${Account}:parameter/${ParameterNameWithoutLeadingSlash}"
}
]
}
Note
Replace placeholders like ${Region}, ${Account}, ${ComputeEnvironmentName}, etc., with your actual AWS values.
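One simple way to fill these placeholders is plain text substitution before attaching the policy. The sketch below uses hypothetical example values (account 123456789012, region us-east-1, key pair my-key) and a single truncated policy fragment purely for illustration:

```shell
# Substitute placeholder values into a policy fragment.
# The account ID, region, and key pair name are example values only.
cat > policy-template.json <<'EOF'
{
  "Effect": "Allow",
  "Action": "ec2:ImportKeyPair",
  "Resource": "arn:aws:ec2:${Region}:${Account}:key-pair/${KeyPairName}"
}
EOF

sed -e 's/\${Region}/us-east-1/' \
    -e 's/\${Account}/123456789012/' \
    -e 's/\${KeyPairName}/my-key/' \
    policy-template.json > policy.json

cat policy.json
```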
S3 Bucket Permissions#
When running through the Terraform deployment process, you will be prompted to either create a new S3 bucket or provide an existing one.
If you create a new bucket, the deployment process will generate an IAM policy with the following permissions on the bucket and its objects:
s3:GetObject
s3:PutObject
s3:DeleteObject
s3:ListObjectsV2
s3:ListBucket
If you are providing an existing bucket URL, an IAM policy will be created with the above access to the provided bucket and its objects. Please ensure you have the correct permissions to carry out this IAM operation.
Additional Essential Permissions#
The following essential permissions will be generated and attached to the VM’s IAM instance profile:
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"ecr:UploadLayerPart",
"ecr:PutImage",
"ecr:ListTagsForResource",
"ecr:ListImages",
"ecr:InitiateLayerUpload",
"ecr:GetRepositoryPolicy",
"ecr:GetLifecyclePolicyPreview",
"ecr:GetLifecyclePolicy",
"ecr:GetDownloadUrlForLayer",
"ecr:GetAuthorizationToken",
"ecr:DescribeRepositories",
"ecr:DescribeImages",
"ecr:CompleteLayerUpload",
"ecr:BatchGetImage",
"ecr:BatchCheckLayerAvailability"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"batch:TerminateJob",
"batch:SubmitJob",
"batch:RegisterJobDefinition",
"batch:ListJobs",
"batch:DescribeJobs",
"batch:DescribeJobQueues",
"batch:DescribeJobDefinitions",
"batch:DescribeComputeEnvironments"
],
"Effect": "Allow",
"Resource": "*"
},
{
"Action": [
"ecs:DescribeTasks",
"ecs:DescribeContainerInstances",
"ec2:DescribeInstances",
"ec2:DescribeInstanceTypes",
"ec2:DescribeInstanceAttribute"
],
"Effect": "Allow",
"Resource": "*"
}
]
}
Pre-existing AWS Service Roles#
Important
Required AWS Service Roles
The Terraform deployment requires two AWS-managed service roles that should already exist in your account:
AWS Batch Service Role:
arn:aws:iam::${Account}:role/aws-service-role/batch.amazonaws.com/AWSServiceRoleForBatch
This is a service-linked role automatically created when you first enable AWS Batch in your account
If it doesn’t exist, AWS will create it automatically when you deploy resources that use Batch
No manual action is typically required
ECS Instance Role:
arn:aws:iam::${Account}:instance-profile/ecsInstanceRole
Required for EC2 instances in the Batch compute environment
Must be created manually if it doesn’t exist in your account
Attach the AmazonEC2ContainerServiceforEC2Role managed policy to this role
To create the ECS instance role if it doesn’t exist:
# Create the IAM role
aws iam create-role --role-name ecsInstanceRole \
--assume-role-policy-document '{
"Version": "2012-10-17",
"Statement": [{
"Effect": "Allow",
"Principal": {"Service": "ec2.amazonaws.com"},
"Action": "sts:AssumeRole"
}]
}'
# Attach the AWS managed policy
aws iam attach-role-policy --role-name ecsInstanceRole \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
# Create the instance profile
aws iam create-instance-profile --instance-profile-name ecsInstanceRole
# Add the role to the instance profile
aws iam add-role-to-instance-profile --instance-profile-name ecsInstanceRole \
--role-name ecsInstanceRole
You can verify if the ECS instance role exists by running:
aws iam get-instance-profile --instance-profile-name ecsInstanceRole
Usage of this Terraform Configuration#
Prerequisites#
Terraform >= 1.0
AWS credentials configured (aws configure or environment variables)
AWS account with billing enabled and sufficient service limits
Existing VPC and subnet (this configuration does not create networking resources)
Setup#
# Navigate to the AWS Terraform directory
cd cloud_platforms/terraform/aws/
# Copy example variables
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values
vim terraform.tfvars
# Initialize Terraform
terraform init
# Review the plan (Ensure it looks correct, and no errors or changes you don't expect)
terraform plan -var-file=terraform.tfvars -out=tfplan
# Apply the configuration using the saved plan
terraform apply tfplan
Warning
Destroying Infrastructure
The commands below will permanently delete all resources created by Terraform, including:
EC2 instances and their data
S3 buckets and all stored data (if bucket_force_destroy is enabled)
ECR repositories and container images
IAM roles and policies
Security groups and network configurations
This action is irreversible. Always:
Backup any important data before destroying resources
Carefully review the destroy plan output to confirm which resources will be deleted
Ensure you are working in the correct AWS account and region
Consider commenting out or removing the bucket_force_destroy setting to prevent accidental data loss
# Destroy the configuration when no longer needed
terraform plan -destroy -var-file=terraform.tfvars -out=destroyplan
# Be sure to review the plan output carefully to ensure you understand which resources will be destroyed.
terraform apply destroyplan
Note
You may see deprecation warnings during terraform plan related to the user_data_base64 attribute and data.aws_region.current.name. These warnings originate from the external terraform-aws-bootstrap module and are safe to ignore. They do not affect the deployment or functionality of your infrastructure.
Note
Bootstrap Module Branch
The main.tf configuration references the module_update branch of the terraform-aws-bootstrap repository. This branch contains updates and improvements to the bootstrap module. The module source is specified as:
source = "git::https://github.com/cegx-ds/terraform-aws-bootstrap.git?ref=module_update"
Terraform Outputs#
After terraform apply, you’ll see output similar to the following (example values using us-east-1 region):
Outputs:
bucket_url = "s3://your-vm-name"
docker_repo_url = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
private_ip = "10.0.x.x"
private_key_filename = "~/.ssh/your-vm-name.pem"
public_ip = "x.x.x.x"
ssh_user = "ubuntu"
vm_name = "your-vm-name"
Important
Please save the following outputs as you will need them:
The public_ip and private_key_filename for SSH access to the VM
The docker_repo_url is the ECR registry URL (repositories will be created within this registry as needed)
The bucket_url for configuring data storage (remember the s3:// prefix)
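These outputs can also be captured directly from Terraform for scripting. A sketch, run from the Terraform directory; the fallback values after `||` are placeholder examples only, used here so the snippet degrades gracefully if Terraform is unavailable:

```shell
# Capture Terraform outputs into shell variables for later use.
# Fallback values are illustrative placeholders, not real addresses.
PUBLIC_IP=$(terraform output -raw public_ip 2>/dev/null || echo "x.x.x.x")
KEY_FILE=$(terraform output -raw private_key_filename 2>/dev/null || echo "$HOME/.ssh/your-vm-name.pem")
SSH_USER=$(terraform output -raw ssh_user 2>/dev/null || echo "ubuntu")

echo "ssh -i $KEY_FILE $SSH_USER@$PUBLIC_IP"
```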
Connect to VM#
After successfully applying the Terraform configuration, please allow a few minutes for the VM to complete its startup script installation, then you can connect to the VM using SSH:
# Use the outputs from Terraform to construct the SSH command
ssh -i <private_key_filename> <ssh_user>@<public_ip>
# Example:
ssh -i ~/.ssh/your-vm-name.pem ubuntu@x.x.x.x
Configuration Variables#
Required Variables#
region = "us-east-1" # AWS region
vpc_id = "vpc-xxxxx" # VPC ID (must already exist)
subnet_id = "subnet-xxxxx" # Subnet ID (must already exist)
vm_name = "development-vm" # VM name
Optional Variables#
instance_type = "t3.large" # EC2 instance type (default: t3.large)
tag_key = "environment" # Resource tag key
tag_value = "development" # Resource tag value
use_existing_bucket_url = "s3://my-existing-bucket" # Use existing bucket
bucket_force_destroy = false # Allow bucket destruction
ssh_user = "ubuntu" # SSH username
Outputs#
After terraform apply, you’ll get:
VM public and private IP addresses
S3 bucket URL
ECR registry URL (repositories are created within this registry)
SSH user and private key filename for connecting
VM name
What’s NOT Included#
This minimal configuration intentionally excludes:
Application-specific software installation
Pipeline or workflow management tools
Custom configuration files or templates
File copying or deployment logic
Version-specific software management
VPC and networking infrastructure (must already exist)
Design Philosophy#
This configuration follows the principle of separation of concerns:
Infrastructure: Terraform handles VM, storage, and compute orchestration
Platform: Basic runtime dependencies (Java and Docker)
Applications: Should be deployed separately after infrastructure is ready
This approach makes the infrastructure:
Reusable: Can be used for different applications
Maintainable: Clear separation between infrastructure and application concerns
Testable: Infrastructure can be validated independently
Flexible: Applications can be deployed using different methods (Docker, packages, etc.)
Next Steps#
After the infrastructure is ready:
Verify base runtime (Java & Docker):
java -version
docker --version
sudo systemctl status docker
Install the biomodal CLI.
(Optional) Configure monitoring and logging (e.g. CloudWatch metrics/alarms, log shipping).
Note: The terraform installation script follows a direct installation approach. Package managers like apt-get handle duplicate installations gracefully, so the script can be run multiple times safely with minimal overhead.
Installation Script Technical Details#
The VM startup script (scripts/install_script.sh) implements a straightforward installation approach to ensure a complete and consistent environment setup.
Direct Package Installation#
The installation script installs all required packages directly using a simple, reliable approach:
Installation Process
System update: Updates package repositories with apt-get update
Direct installation: Installs all required packages in a single apt-get install command
Reliable execution: Uses apt-get's built-in handling of already-installed packages
Simple approach: No complex checking logic, ensuring consistent behavior
Required System Packages
# Packages installed:
ca-certificates # SSL/TLS certificates for secure connections
curl # Command line tool for data transfer
gnupg # GNU Privacy Guard for encryption/signing
lsb-release # Linux Standard Base release information
openjdk-21-jdk # Java Development Kit version 21
apt-transport-https # HTTPS transport for apt
wget # Network downloader
unzip # Archive extraction utility
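The installation step above can be sketched as follows. The apt-get calls are shown but commented out so the sketch is safe to run outside the VM; on the VM itself they are executed directly.

```shell
#!/usr/bin/env bash
# Sketch of the direct-installation approach: one update, one install.
set -euo pipefail

PACKAGES="ca-certificates curl gnupg lsb-release openjdk-21-jdk apt-transport-https wget unzip"

echo "Updating package repositories..."
# sudo apt-get update -y

echo "Installing: $PACKAGES"
# sudo apt-get install -y $PACKAGES

echo "Package installation step completed"
```

Because apt-get skips packages that are already installed, re-running this script is harmless.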
Direct Docker Installation#
Docker installation follows the same straightforward approach:
Docker Installation Process
Direct installation: Installs Docker using apt-get install docker.io
Service configuration: Starts and enables the Docker service
User permissions: Adds the current user to the docker group for non-root access
Session permissions: Sets appropriate socket permissions for immediate access
Docker Configuration (Automatically Applied)
The script configures Docker for proper operation:
# Service management
sudo systemctl start docker # Start Docker service
sudo systemctl enable docker # Enable Docker on boot
# User permissions (reliable username detection)
ACTUAL_USER=$(whoami)
sudo usermod -aG docker "$ACTUAL_USER" # Add user to docker group
sudo chmod 666 /var/run/docker.sock # Socket access for current session
Reliable User Detection
The script uses a simple, reliable method to identify the correct username:
Uses the whoami command to get the current user
Checks that the user is not root to avoid security issues
Adds the user to the docker group for non-root access
Installation Feedback
The script provides clear logging throughout the process:
Reports the start of package installation
Confirms when package installation is completed
Shows Docker installation progress
Displays user permission configuration
Confirms when Docker installation is completed
Final success message when all installations are complete
Benefits of Direct Installation
Simplicity: Straightforward, easy-to-understand process
Reliability: Uses standard package manager behavior for duplicate handling
Consistency: Ensures the same installation process every time
Terraform compatibility: Simple script structure works well with Terraform user_data
Minimal complexity: No conditional logic reduces potential failure points
Error handling: Uses set -euo pipefail to exit on any errors
Troubleshooting#
Common Issues and Solutions#
VM Creation Failures
If VM creation fails, check:
AWS Service Limits: Ensure you have sufficient EC2 and Batch quotas in the target region
VPC Configuration: Verify the specified VPC and subnet exist and are properly configured
IAM Permissions: Validate your AWS credentials have necessary IAM permissions
AMI Availability: Confirm Ubuntu 22.04 AMI is available in your region
Installation Script Issues
If the startup script fails:
# Check cloud-init logs
sudo cat /var/log/cloud-init-output.log
# Check system logs
sudo journalctl -xe
Docker Permission Issues
If Docker commands fail with permission errors:
# Check if user is in docker group
groups $USER | grep docker
# If not in docker group, add manually
sudo usermod -aG docker $USER
# Check Docker socket permissions
ls -la /var/run/docker.sock
# Fix socket permissions if needed
sudo chmod 666 /var/run/docker.sock
# Re-login or start a new session to apply group membership
exit
# SSH back in
# Test Docker access
docker run hello-world
Package Installation Failures
If specific packages fail to install:
# Update package lists
sudo apt-get update
# Try installing individual packages
sudo apt-get install -y package-name
# Check for held packages
sudo apt-mark showhold
Terraform State Issues
If Terraform operations fail:
# Refresh state
terraform refresh
# Import existing resources if needed
terraform import aws_instance.vm i-xxxxxxxxxxxxx
# Plan with detailed output
terraform plan -detailed-exitcode
Terraform Plan File Best Practices
Always use the -out option when planning to ensure consistent deployments:
# Create a plan file to guarantee exact execution
terraform plan -var-file=terraform.tfvars -out=tfplan
# Apply the exact planned changes
terraform apply tfplan
# For destroy operations, also use plan files for safety
terraform plan -destroy -var-file=terraform.tfvars -out=destroyplan
# Apply the exact destruction plan
terraform apply destroyplan
This approach prevents drift between the plan you reviewed and the changes that get applied, which is especially important in production environments where infrastructure may change between running plan and apply. For destroy operations, it ensures you know exactly which resources will be deleted before proceeding.
AWS Services and Resources#
EC2 Instance#
The Terraform configuration creates an EC2 instance that serves as the orchestrator VM for running biomodal pipelines.
Instance Configuration
AMI: Ubuntu 22.04 LTS
Instance Type: Set via the Terraform variable instance_type (default: t3.large; override in terraform.tfvars if needed)
Storage: 100 GB root block device
Networking: Deployed in your existing VPC and subnet
Public IP: Elastic IP for consistent external access
SSH Access: Key pair generated for secure access
AWS Batch#
The infrastructure includes AWS Batch for scalable compute orchestration:
Batch Components
Compute Environment: Auto-scaling compute cluster for running containerized workloads
Job Queue: Queue for managing and scheduling pipeline jobs
Job Definitions: Created by biomodal CLI for specific pipeline runs
Benefits
Auto-scaling: Automatically scales compute resources based on job demand
Cost Optimization: Can use spot instances for significant cost savings
Job Management: Queue-based execution with priority handling
Container Support: Native support for Docker containers
S3 Storage Bucket#
The Terraform configuration creates an S3 bucket for data storage:
Bucket Configuration
Naming: Automatically named based on the vm_name variable
Location: Created in the same region as the VM for optimal performance
Access Control: Configured with appropriate IAM policies for secure access
Optional: Can be disabled by setting use_existing_bucket_url to use an existing bucket
Bucket Usage
The bucket serves as:
Input Storage: Store input files such as FASTQ files
Working Directory: Pipeline intermediate files and temporary data
Output Storage: Final analysis results and generated reports
Nextflow Work Directory: Temporary files created during pipeline execution
Lifecycle Management
Consider implementing S3 lifecycle policies to:
Automatically transition older objects to cheaper storage classes
Delete temporary files after a specified period
Reduce storage costs for infrequently accessed data
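An example lifecycle configuration might look like the following. The bucket name, prefixes, and day counts are illustrative assumptions only, and the aws call is commented out so the JSON can be reviewed and validated locally first.

```shell
# Example S3 lifecycle configuration: expire Nextflow work files after
# 30 days and transition results to Glacier after 90. All values are
# illustrative; adapt prefixes and retention to your data layout.
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-nextflow-work",
      "Filter": {"Prefix": "work/"},
      "Status": "Enabled",
      "Expiration": {"Days": 30}
    },
    {
      "ID": "archive-results",
      "Filter": {"Prefix": "results/"},
      "Status": "Enabled",
      "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}]
    }
  ]
}
EOF

# Validate the JSON locally before applying it
python3 -m json.tool lifecycle.json > /dev/null && echo "lifecycle.json is valid"

# aws s3api put-bucket-lifecycle-configuration \
#   --bucket your-vm-name --lifecycle-configuration file://lifecycle.json
```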
ECR Repository#
An Amazon ECR (Elastic Container Registry) repository is created for storing custom container images.
Repository Configuration
Repository Name: Uses the vm_name variable
Format: Docker repository for container images
Location: Created in the same region as the VM
Registry URL: The docker_repo_url output provides the ECR registry URL (e.g., 123456789012.dkr.ecr.us-east-1.amazonaws.com)
Access: IAM permissions configured for push/pull operations
Note
The docker_repo_url output is the ECR registry URL. The repository created by Terraform is named after your vm_name variable. Additional repositories can be created within this registry as needed.
Container Image Management
The ECR repository serves as storage for:
Custom Pipeline Images: Any custom-built containers for specific workflows
Modified Biomodal Images: Customized versions of biomodal containers
Tool Containers: Supporting bioinformatics tools and utilities
Access and Permissions
The EC2 instance IAM role has the necessary permissions to:
Push container images to the registry
Pull container images during pipeline execution
List and manage repository contents
IAM Roles and Instance Profile#
The infrastructure creates IAM roles with appropriate permissions:
Instance Profile
Attached to the EC2 orchestrator VM
Provides credentials for AWS service access
Follows principle of least privilege
Key Permissions
S3 Access: Read/write to the storage bucket
ECR Access: Push/pull container images
Batch Access: Submit and manage batch jobs
EC2 Access: Describe instances and instance types
ECS Access: Describe tasks and container instances
Security Groups#
The configuration creates security groups for network access control:
SSH Access
Port 22 open for SSH connections
Can be restricted to specific IP ranges for enhanced security
Outbound Access
Full internet access for package downloads
Access to AWS services via VPC endpoints or internet gateway
Cost Optimization Strategies#
Instance Cost Optimization#
EC2 Orchestrator
Use smaller instance types (t3.small, t3.medium) for the orchestrator
Consider Reserved Instances for long-term deployments
Stop the instance when not actively running pipelines
AWS Batch Compute
Use Spot Instances for up to 90% cost savings
Configure appropriate min/max vCPU limits
Set up auto-scaling based on queue depth
Storage Cost Optimization#
S3 Bucket
Implement lifecycle policies to transition old data to S3 Glacier
Delete temporary Nextflow work directories after pipeline completion
Use S3 Intelligent-Tiering for automatic cost optimization
EBS Volumes
Right-size root volumes based on actual usage
Consider gp3 volumes for better price/performance
Delete unused snapshots regularly
Monitoring and Cost Alerts#
Set up AWS Budgets to track spending
Configure CloudWatch alarms for cost anomalies
Use AWS Cost Explorer to identify optimization opportunities
Tag resources for cost allocation and tracking
Low AWS Service Limits#
If using AWS with limited service quotas (e.g., new accounts), you may need to:
Request quota increases for EC2 instances, Batch compute, and S3 storage
Contact AWS support for faster quota increase processing
Plan deployments within current quota limits
Consider using multiple regions if one region has limitations
For specific biomodal pipeline requirements with limited quotas, please contact support@biomodal.com for guidance on resource planning and custom configurations.