HPC software module definition for biomodal CLI and pipelines

This guide provides a high-level overview of structuring the biomodal CLI and duet as single or multiple HPC software modules. It is intended for HPC administrators creating software modules to facilitate users loading the software from a central location. The document is intentionally generic to accommodate different cluster environments.

Caution

This documentation contains information intended for system administrators.

Prerequisites

Download the CLI. Please see Installing the biomodal CLI.

System Requirements

  • Linux environment with bash shell

  • Java 17 or later (up to Java 24)

  • Container runtime: Docker, Apptainer, or Singularity

  • Module system: Environment Modules or Lmod

  • Sufficient shared storage for reference data and containers (~50GB+)
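The Java range above is easy to misread from `java -version` output, because the version format changed after Java 8 (for example `1.8.0_301` versus `17.0.9`). The bash functions below are a hypothetical preflight sketch. The names `java_major_version` and `java_version_supported` are illustrative and not part of the biomodal CLI, and the parsing assumes the usual `version "x.y.z"` banner format.

```shell
# Hypothetical preflight helpers: parse a `java -version` banner and check
# that it reports a major version between 17 and 24 inclusive.

# Print the major version from a banner such as:
#   openjdk version "17.0.9" 2023-10-17   -> 17
#   java version "1.8.0_301"              -> 8 (legacy 1.x numbering)
java_major_version() {
  local banner="$1" ver major
  # Extract the first double-quoted version string
  ver=$(printf '%s\n' "$banner" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  [ -n "$ver" ] || return 1
  major=${ver%%.*}
  # Java 8 and earlier report 1.x, so the major version is the second field
  if [ "$major" = "1" ]; then
    major=$(printf '%s\n' "$ver" | cut -d. -f2)
  fi
  # Drop suffixes such as "-ea" from early-access builds
  major=${major%%[!0-9]*}
  [ -n "$major" ] || return 1
  printf '%s\n' "$major"
}

# Succeed only if the banner reports Java 17 to 24
java_version_supported() {
  local major
  major=$(java_major_version "$1") || return 1
  [ "$major" -ge 17 ] && [ "$major" -le 24 ]
}

# Example (java -version writes to stderr):
#   java_version_supported "$(java -version 2>&1)" && echo "Java version OK"
```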

Permissions and Ownership

The biomodal CLI installation requires careful attention to file permissions for shared HPC environments:

# Create shared directories with appropriate permissions
sudo mkdir -p /shared/biomodal/{cli,reference,containers}
sudo chown -R biomodal-admin:biomodal-users /shared/biomodal
sudo chmod -R 755 /shared/biomodal

# Ensure executables are accessible
sudo chmod +x /shared/biomodal/cli/biomodal
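After running the commands above, it can help to audit the tree for directories that ordinary users still cannot reach. The sketch below is hypothetical. The function names are illustrative, it relies on GNU `stat -c '%a'`, and it checks the "other" permission bits on the assumption that not every user is a member of `biomodal-users`.

```shell
# Hypothetical helpers for auditing the shared tree. They check the "other"
# permission bits, assuming some users are not in the biomodal-users group.

# Succeed if users outside the owning group can read and enter/execute PATH
others_can_read_exec() {
  local mode other
  mode=$(stat -c '%a' -- "$1") || return 1
  other=${mode: -1}            # last octal digit = permissions for "other"
  (( (other & 5) == 5 ))       # 4 = read, 1 = execute/traverse
}

# Print every directory under ROOT that other users cannot read or traverse;
# returns 1 if any were found
report_inaccessible_dirs() {
  local root="$1" dir status=0
  while IFS= read -r -d '' dir; do
    if ! others_can_read_exec "$dir"; then
      printf 'not accessible to users: %s\n' "$dir"
      status=1
    fi
  done < <(find "$root" -type d -print0)
  return "$status"
}

# Example:
#   report_inaccessible_dirs /shared/biomodal
```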

Example folder structure

Below is a set of suggested folder names used throughout this document. Align them with your HPC module software and existing folder structure.

| Software component | Example folder name | Description |
| --- | --- | --- |
| biomodal CLI binary | /shared/biomodal/cli/biomodal | The main biomodal CLI executable |
| biomodal CLI instance | /shared/biomodal/instances/default/ | Default instance directory for shared configuration |
| duet pipeline | /shared/biomodal/instances/default/pipelines/duet/<version>/ | Root folder for the duet pipeline software |
| Genome reference pipeline | /shared/biomodal/instances/default/pipelines/make_reference/<version>/ | Not installed by default. Install with biomodal download make_reference --version <version> |
| Genome reference data | /shared/biomodal/reference/ | Shared folder for large genome reference data |
| Containers | /shared/biomodal/containers/ | Shared folder for container images used by pipelines |
| Test dataset | /shared/biomodal/instances/default/test_data/ | Test dataset for validating the installation with biomodal run duet --test |
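A quick way to confirm that the example layout from the folder table exists is a small check script. The sketch below is hypothetical. `check_biomodal_layout` is an illustrative name, and the paths it checks are the example folder names, so adjust them to your site.

```shell
# Hypothetical check for the example layout in the folder table above;
# adjust the paths if your site uses different folder names.
check_biomodal_layout() {
  local root="${1:-/shared/biomodal}" dir status=0
  for dir in instances/default reference containers; do
    if [ ! -d "$root/$dir" ]; then
      printf 'missing directory: %s\n' "$root/$dir"
      status=1
    fi
  done
  # The CLI binary must exist and be executable
  if [ ! -x "$root/cli/biomodal" ]; then
    printf 'missing or not executable: %s\n' "$root/cli/biomodal"
    status=1
  fi
  return "$status"
}

# Example:
#   check_biomodal_layout /shared/biomodal && echo "layout OK"
```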

Module File Examples

Note

The example module files below use apptainer as the container runtime dependency. Update the module dependencies to match your HPC environment’s container runtime: use apptainer for Apptainer installations or singularity for Singularity installations.

Environment Modules (Traditional TCL)

Create a module file at /etc/modulefiles/biomodal/2.0.0 or your site’s module directory:

#%Module1.0
##
## biomodal CLI and duet pipeline v2.0.0
##
proc ModulesHelp { } {
    puts stderr "biomodal CLI v2.0.0 for duet multiomics pipeline analysis"
    puts stderr "Usage: biomodal --help"
    puts stderr "Documentation: https://biomodal.com/documentation/"
}

module-whatis "biomodal CLI v2.0.0 - duet multiomics pipeline"

# Version and conflict management
set version 2.0.0
conflict biomodal

# Base paths
set biomodal_root "/shared/biomodal"
set biomodal_cli "$biomodal_root/cli"
set biomodal_instance "$biomodal_root/instances/default"

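# Declare required dependencies: prereq requires these modules to be loaded
# first (Environment Modules 4.2+ can load them automatically when automated
# module handling is enabled)
prereq java/17
prereq apptainer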

# Set environment variables
setenv BIOMODAL_INSTANCE_DIRECTORY $biomodal_instance
setenv BIOMODAL_CLI_ROOT $biomodal_cli

# Add biomodal CLI to PATH
prepend-path PATH $biomodal_cli

# Optional: Set container runtime preferences
setenv NXF_SINGULARITY_CACHEDIR "$biomodal_root/containers"
setenv NXF_APPTAINER_CACHEDIR "$biomodal_root/containers"

if { [module-info mode load] } {
    puts stderr "Loading biomodal CLI v$version"
    puts stderr "Instance directory: $biomodal_instance"
}

Lmod Module File

Create a module file at /apps/modulefiles/biomodal/2.0.0.lua:

help([[
biomodal CLI v2.0.0 for duet multiomics pipeline analysis

Usage:
  biomodal --help           # Show available commands
  biomodal init             # Initialize biomodal environment, administrator only!
  biomodal run duet --test  # Run test pipeline

Documentation: https://biomodal.com/documentation/
Support: support@biomodal.com
]])

whatis("Name: biomodal CLI")
whatis("Version: 2.0.0")
whatis("Category: Bioinformatics")
whatis("Description: duet multiomics pipeline analysis tools")
whatis("URL: https://biomodal.com")

-- Version and conflict management
local version = "2.0.0"
local base = "/shared/biomodal"

conflict("biomodal")

-- Dependencies
depends_on("java/17")
depends_on("apptainer")

-- Environment variables
setenv("BIOMODAL_INSTANCE_DIRECTORY", pathJoin(base, "instances/default"))
setenv("BIOMODAL_CLI_ROOT", pathJoin(base, "cli"))
setenv("NXF_SINGULARITY_CACHEDIR", pathJoin(base, "containers"))
setenv("NXF_APPTAINER_CACHEDIR", pathJoin(base, "containers"))

-- Add to PATH
prepend_path("PATH", pathJoin(base, "cli"))

-- Helpful aliases (optional)
set_alias("biomodal-test", "biomodal run duet --test")
set_alias("biomodal-help", "biomodal --help")

if (mode() == "load") then
    LmodMessage("biomodal CLI v" .. version .. " loaded")
    LmodMessage("Instance directory: " .. pathJoin(base, "instances/default"))
    LmodMessage("Run 'biomodal --help' to get started")
end

Install the biomodal CLI and duet pipeline

Step 1: Download and Install CLI Binary

# Create directory structure
sudo mkdir -p /shared/biomodal/{cli,instances/default,reference,containers}
cd /shared/biomodal

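# Download the biomodal CLI installer, then run it with sudo
# (sudo bash <(curl ...) can fail because sudo closes the /dev/fd descriptor created by process substitution)
curl -fsSL -o /tmp/biomodal-installer https://app.biomodal.com/cli/installer
sudo bash /tmp/biomodal-installer
# When prompted, install to: /shared/biomodal/cli, not the default $HOME location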

# Set proper ownership and permissions
sudo chown -R biomodal-admin:biomodal-users /shared/biomodal
sudo chmod -R 755 /shared/biomodal
sudo chmod +x /shared/biomodal/cli/biomodal

Step 2: Configure Instance Directory

Set up the shared instance directory that all users will reference:

# Set instance directory for admin setup
export BIOMODAL_INSTANCE_DIRECTORY=/shared/biomodal/instances/default

Step 3: Create and Deploy Module File

Choose your module system and deploy the appropriate module file:

For Environment Modules:

# Create module directory (adjust path for your site)
sudo mkdir -p /etc/modulefiles/biomodal

# Copy the TCL module file (from examples above) to:
sudo cp biomodal-2.0.0.module /etc/modulefiles/biomodal/2.0.0

# Test module availability
module avail biomodal

For Lmod:

# Create module directory (adjust path for your site)
sudo mkdir -p /apps/modulefiles/biomodal

# Copy the Lua module file (from examples above) to:
sudo cp biomodal-2.0.0.lua /apps/modulefiles/biomodal/2.0.0.lua

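# Update the Lmod system spider cache, only if your site maintains one
# (the script location and any required arguments are site-specific)
sudo /apps/lmod/lmod/libexec/update_lmod_system_cache_files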

# Test module availability
module avail biomodal
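A modulefile that other users cannot read will not be usable by them, so it can help to install it with explicit permissions. The helper below is a hypothetical sketch around GNU `install`. `install_modulefile` is an illustrative name, and the destination paths in the comments are the examples used above.

```shell
# Hypothetical helper: copy a modulefile into place with world-readable
# permissions (0644), creating missing parent directories.
install_modulefile() {
  local src="$1" dest="$2"
  install -D -m 0644 -- "$src" "$dest"
}

# Equivalent one-off commands from an administrator shell:
#   sudo install -D -m 0644 biomodal-2.0.0.module /etc/modulefiles/biomodal/2.0.0
#   sudo install -D -m 0644 biomodal-2.0.0.lua /apps/modulefiles/biomodal/2.0.0.lua
```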

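The module files above set BIOMODAL_INSTANCE_DIRECTORY automatically. When working without the module, for example during the initial setup, set the instance directory manually. Each user can use a different instance directory, but for shared HPC installations a default shared instance is recommended.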

export BIOMODAL_INSTANCE_DIRECTORY=/shared/biomodal/instances/default

If you do not set this variable, you will need to provide the instance directory path explicitly for nearly every command.

You can have multiple instance directories, each corresponding to a different dataset or configuration, enabling multiple workflow configurations.
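For user batch scripts, failing early when the variable is missing is usually clearer than a later CLI error. The guard below is a hypothetical sketch. `require_instance_dir` is an illustrative name, and the hint to load the `biomodal` module assumes the module files shown above.

```shell
# Hypothetical guard for user job scripts: stop early when
# BIOMODAL_INSTANCE_DIRECTORY is unset or does not point to a directory.
require_instance_dir() {
  if [ -z "${BIOMODAL_INSTANCE_DIRECTORY:-}" ]; then
    echo "BIOMODAL_INSTANCE_DIRECTORY is not set (try: module load biomodal)" >&2
    return 1
  fi
  if [ ! -d "$BIOMODAL_INSTANCE_DIRECTORY" ]; then
    echo "BIOMODAL_INSTANCE_DIRECTORY is not a directory: $BIOMODAL_INSTANCE_DIRECTORY" >&2
    return 1
  fi
}

# Example at the top of a batch script:
#   require_instance_dir || exit 1
```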

Run the biomodal init command to create the necessary directory structure and config files in the /shared/biomodal/instances/default/ folder.

After you have completed this step, your folder structure should look similar to this:

/shared/biomodal/instances/default/
├── cli_config.yaml
├── pipelines
│   └── duet
│       └── 1.5.0/
│           ├── main.nf
│           ├── nextflow.config
│           └── ...
├── test_data
│   └── duet/
│       ├── sample_R1.fastq.gz
│       └── sample_R2.fastq.gz
└── nextflow_override.config

The two key configuration files are:

  • cli_config.yaml - Contains the configuration for the biomodal CLI, including paths to the containers and reference data.

  • nextflow_override.config - Contains Nextflow configuration overrides for the pipelines, including paths to the containers and reference data and site-specific settings such as queues and resource allocations.
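After biomodal init, a short check can confirm that the two key files and the duet pipeline folder from the tree above are present. The function below is a hypothetical sketch. `check_instance_after_init` is an illustrative name, and it only tests for the files and folders listed in that tree.

```shell
# Hypothetical post-init check: confirm the files shown in the tree above
# exist and list the duet pipeline versions that were downloaded.
check_instance_after_init() {
  local dir="${1:-${BIOMODAL_INSTANCE_DIRECTORY:-}}" f status=0
  for f in cli_config.yaml nextflow_override.config; do
    if [ ! -f "$dir/$f" ]; then
      printf 'missing file: %s\n' "$dir/$f"
      status=1
    fi
  done
  if [ -d "$dir/pipelines/duet" ]; then
    # One subdirectory per installed duet version, e.g. 1.5.0
    printf 'duet versions: %s\n' "$(ls -1 "$dir/pipelines/duet" | tr '\n' ' ' | sed 's/ $//')"
  else
    printf 'missing directory: %s\n' "$dir/pipelines/duet"
    status=1
  fi
  return "$status"
}

# Example:
#   check_instance_after_init /shared/biomodal/instances/default
```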

Note

Please review and customise the cli_config.yaml and nextflow_override.config files to match your HPC environment, including paths to the container runtime, reference data, and any site-specific configurations.

Warning

Carefully review the recommendations for HPC configurations to ensure the nextflow_override.config file accommodates local cluster requirements such as queues, mount points, and the RAM, CPU and disk space allocated to each duet module.

Authentication (for administrator operations)

Log in with your biomodal username and password to authenticate and generate the necessary tokens.

# The CLI binary is in /shared/biomodal/cli, not in the instance directory
cd "$BIOMODAL_INSTANCE_DIRECTORY"
/shared/biomodal/cli/biomodal auth

Note

The authentication process generates tokens that are stored in the $HOME/.biomodal-auth.json file. The tokens are used during administrator operations such as installation and the biomodal init and biomodal download ... steps.

If telemetry is disabled, then no further communication with the biomodal API will take place and the tokens are not required for any users.
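Because the token file holds authentication credentials, administrators may want to confirm that it exists before running biomodal init or biomodal download, and that other users cannot read it. The sketch below is hypothetical. `check_auth_token` is an illustrative name, it uses GNU `stat`, and the chmod 600 suggestion is a general precaution rather than a documented biomodal requirement.

```shell
# Hypothetical check before administrator operations: confirm the token file
# written by `biomodal auth` exists and warn if group or other can access it.
check_auth_token() {
  local token="${1:-$HOME/.biomodal-auth.json}" mode
  if [ ! -f "$token" ]; then
    echo "no token file at $token; run biomodal auth first" >&2
    return 1
  fi
  mode=$(stat -c '%a' -- "$token") || return 1
  # The last two octal digits are the group and other permissions
  if [ "${mode: -2}" != "00" ]; then
    echo "warning: $token has mode $mode; consider chmod 600 $token" >&2
  fi
  return 0
}

# Example:
#   check_auth_token && biomodal init
```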

Installing and configuring the duet pipeline

Please note that this step is only required once per site installation using HPC modules. A regular user should not need to perform this step.

Run the biomodal init command to download and set up the duet software and the required containers.

Running the duet pipeline in installation test mode

The final step is to load the new biomodal CLI module and run biomodal run duet --test to ensure the pipeline is correctly installed.

The commands could look similar to this:

module load biomodal
biomodal run duet --test

Assuming biomodal run duet --test completed successfully, your HPC users should now be able to load the biomodal module and run analyses with biomodal run duet ... and the parameters they require.

Version Management and Multiple Installations

Managing Multiple Versions

For environments requiring multiple biomodal CLI versions:

# Directory structure for multiple versions
/shared/biomodal/
├── cli/
│   ├── 2.0.0/biomodal          # CLI v2.0.0
│   └── 2.1.0/biomodal          # CLI v2.1.0 (when available)
├── instances/
│   ├── default/                # Shared default instance directory
│   ├── v2.0.0/                 # Version-specific instance
│   └── v2.1.0/                 # Future version instance
├── reference/                  # Shared reference data
└── containers/                 # Shared containers

With this layout, each version's module file should prepend /shared/biomodal/cli/<version> to PATH instead of /shared/biomodal/cli.
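Writing one modulefile per version by hand is error-prone, so a small generator can help. The function below is a hypothetical sketch that emits an Environment Modules (TCL) file modeled on the example earlier in this document. `write_tcl_modulefile` is an illustrative name, and pointing each version at its own instances/v<version> directory is an assumption; sites that share the default instance should change that line.

```shell
# Hypothetical generator for per-version Environment Modules (TCL) files,
# assuming the multi-version layout above: CLI binaries in
# <root>/cli/<version>/ and one instance directory per version.
write_tcl_modulefile() {
  local version="$1" root="$2" outdir="$3"
  mkdir -p "$outdir"
  # Unquoted heredoc: the shell expands $version and $root,
  # while \$ keeps the Tcl variable references literal
  cat > "$outdir/$version" <<EOF
#%Module1.0
## biomodal CLI v$version (generated)
module-whatis "biomodal CLI v$version - duet multiomics pipeline"
conflict biomodal
prereq java/17
prereq apptainer
set biomodal_root "$root"
setenv BIOMODAL_INSTANCE_DIRECTORY "\$biomodal_root/instances/v$version"
setenv BIOMODAL_CLI_ROOT "\$biomodal_root/cli/$version"
prepend-path PATH "\$biomodal_root/cli/$version"
setenv NXF_SINGULARITY_CACHEDIR "\$biomodal_root/containers"
setenv NXF_APPTAINER_CACHEDIR "\$biomodal_root/containers"
EOF
}

# Example:
#   write_tcl_modulefile 2.1.0 /shared/biomodal /etc/modulefiles/biomodal
```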

Module File Versioning

Create version-specific module files:

# For Environment Modules
/etc/modulefiles/biomodal/
├── 2.0.0
├── 2.1.0
└── .version          # Set default version

# For Lmod (Lmod also reads the .version file)
/apps/modulefiles/biomodal/
├── 2.0.0.lua
├── 2.1.0.lua
└── .version          # Set default version

Setting Default Version

For Environment Modules, create .version file:

#%Module1.0
set ModulesVersion "2.0.0"

For Lmod, use the same .version file, or create a default symlink instead:

cd /apps/modulefiles/biomodal
sudo ln -s 2.0.0.lua default
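When rolling out a new version, the default can be switched by rewriting the Environment Modules .version file shown above. The helper below is a hypothetical sketch. `set_default_module_version` is an illustrative name, and it refuses to point the default at a version that has no modulefile (with or without a .lua suffix) in the directory.

```shell
# Hypothetical helper: mark VERSION as the default in a modulefile directory
# by writing a .version file in the Environment Modules format shown above.
set_default_module_version() {
  local moddir="$1" version="$2"
  # Only accept versions that have a TCL or Lua modulefile present
  [ -e "$moddir/$version" ] || [ -e "$moddir/$version.lua" ] || {
    echo "no modulefile for $version in $moddir" >&2
    return 1
  }
  printf '#%%Module1.0\nset ModulesVersion "%s"\n' "$version" > "$moddir/.version"
}

# Example:
#   set_default_module_version /etc/modulefiles/biomodal 2.1.0
```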

Troubleshooting

Common Module Issues

  1. Module not found:

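    # Check which directories are searched for modulefiles (both systems)
    echo "$MODULEPATH"
    
    # Add your site's module directory if it is missing (adjust path)
    module use /etc/modulefiles
    
    # For Lmod, inspect the Lmod configuration
    module --config 2>&1 | grep MODULEPATH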
    
  2. Permission denied errors:

    # Fix permissions on shared directories
    sudo chown -R biomodal-admin:biomodal-users /shared/biomodal
    sudo chmod -R 755 /shared/biomodal
    sudo chmod +x /shared/biomodal/cli/biomodal      # single-version layout
    sudo chmod +x /shared/biomodal/cli/*/biomodal    # multi-version layout
    
  3. Java dependency issues:

    # Ensure Java module loads correctly
    module load java/17
    java -version
    
    # Check biomodal recognizes Java
    module load biomodal
    biomodal --version
    
  4. Container runtime dependency issues:

    # Ensure Apptainer module loads correctly
    module load apptainer
    apptainer --version
    
    # Test with biomodal module
    module load biomodal
    biomodal --version
    
  5. Container runtime not found:

    # Check Singularity/Apptainer availability
    which singularity
    which apptainer
    
    # Test container functionality
    singularity --version
    apptainer --version
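The checks in items 4 and 5 can be folded into one diagnostic that reports which runtime is actually on PATH. The function below is a hypothetical sketch. `detect_container_runtime` is an illustrative name, and preferring apptainer over singularity simply mirrors the module file examples.

```shell
# Hypothetical diagnostic: report which container runtime is on PATH,
# preferring apptainer over singularity (matching the module examples).
detect_container_runtime() {
  local rt
  for rt in apptainer singularity; do
    if command -v "$rt" >/dev/null 2>&1; then
      printf '%s\n' "$rt"
      return 0
    fi
  done
  echo "no apptainer or singularity on PATH; try: module load apptainer" >&2
  return 1
}

# Example:
#   rt=$(detect_container_runtime) && "$rt" --version
```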
    

Testing Module Installation

# Test basic module loading
module purge
module load biomodal
biomodal --version

# Test environment variables
echo $BIOMODAL_INSTANCE_DIRECTORY

# Test Java dependency
java -version

# Test Apptainer dependency
apptainer --version

# Run comprehensive test
biomodal run duet --test
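Before running the full test, a quick scripted check can confirm that the module actually set up the environment. The sketch below is hypothetical. `check_biomodal_env` is an illustrative name, and it only checks the variables and PATH entry that the example module files set.

```shell
# Hypothetical post-load check: confirm the variables set by the module files
# above are present and that the CLI directory is on PATH.
check_biomodal_env() {
  local var status=0
  for var in BIOMODAL_INSTANCE_DIRECTORY BIOMODAL_CLI_ROOT; do
    # ${!var} is bash indirect expansion: the value of the variable named $var
    if [ -z "${!var:-}" ]; then
      echo "not set: $var" >&2
      status=1
    fi
  done
  case ":$PATH:" in
    *":${BIOMODAL_CLI_ROOT:-/nonexistent}:"*) ;;
    *) echo "BIOMODAL_CLI_ROOT is not on PATH" >&2; status=1 ;;
  esac
  return "$status"
}

# Example:
#   module load biomodal && check_biomodal_env
```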

Performance Optimization

  1. Container Cache Location: Ensure NXF_SINGULARITY_CACHEDIR and/or NXF_APPTAINER_CACHEDIR point to fast storage

  2. Reference Data: Place reference data on high-performance storage

  3. Work Directory: Configure Nextflow work directory on scratch storage

  4. Resource Limits: Tune resource limits in nextflow_override.config for your cluster
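For item 3, Nextflow reads the NXF_WORK environment variable to choose its work directory. The snippet below is a hypothetical sketch for a user's job script. SCRATCH is an assumed site variable, so replace it with your cluster's scratch location. The function deliberately has no /tmp fallback, because with cluster executors the work directory must be on a filesystem that all compute nodes can see.

```shell
# Hypothetical snippet for a user's job script: place the Nextflow work
# directory on shared scratch via NXF_WORK. SCRATCH is an assumed site variable.
nextflow_work_dir() {
  if [ -z "${SCRATCH:-}" ]; then
    echo "SCRATCH is not set; point it at a shared scratch filesystem" >&2
    return 1
  fi
  printf '%s/%s/nextflow-work\n' "$SCRATCH" "${USER:-$(id -un)}"
}

# Example:
#   export NXF_WORK="$(nextflow_work_dir)" && mkdir -p "$NXF_WORK"
```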

Support and Maintenance

  • Log Location: Module loading issues are typically logged in /var/log/messages or cluster-specific logs

  • User Support: Direct users to biomodal --help and biomodal get diagnostics for troubleshooting

  • Updates: Monitor biomodal releases and update module files accordingly

  • Monitoring: Consider monitoring biomodal CLI usage with your existing HPC monitoring tools