HPC software module definition for biomodal CLI and pipelines

This guide provides a high-level overview of structuring the biomodal CLI and duet as single or multiple HPC software modules. It is intended for HPC administrators creating software modules to facilitate users loading the software from a central location. The document is intentionally generic to accommodate different cluster environments.

Caution

This documentation contains information intended for system administrators.

Prerequisites

Download and unzip the installation scripts, as described in "Downloading the installation script".

Example folder structure

Below is a set of suggested folder names that we use throughout this document. Please adapt them to your specific HPC module software and existing folder structure.

| Software component | Example folder name | Description |
|---|---|---|
| biomodal CLI | /apps/biomodal/cli/ | Files required to run the biomodal CLI |
| duet pipeline | /apps/biomodal/duet/ | Root folder for the duet pipeline software, including the D(h)MR calling workflow. Referenced as “init_folder” in the CLI. |
| Genome reference pipeline | /apps/biomodal/duet/ | Not installed by default. Customers have to explicitly run the biomodal reference download <version> command to install it. |
| Genome reference data | /shared_data/biomodal/reference/ | Shared folder for the genome reference data |
| Containers | /shared_data/biomodal/containers/ | Shared folder for the container images used by the pipelines |
| Test dataset | /apps/biomodal/duet/ | Shared folder for the test dataset used to test installation and configuration during biomodal test |
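As a starting point, the example layout above could be created with a short script like the sketch below. It only creates the four distinct folders named in this document. The PREFIX variable is a hypothetical convenience that is not part of the biomodal tooling: by default the script does a dry run under a temporary directory, and setting PREFIX to an empty string creates the real paths (which typically requires elevated permissions). Ownership and permissions are left to site policy.

```shell
#!/usr/bin/env bash
set -euo pipefail

# PREFIX is a hypothetical helper variable (not part of the biomodal tooling).
# Unset: dry run under a temporary directory. PREFIX="": create the real paths.
PREFIX="${PREFIX-$(mktemp -d)}"

# The example folder names used throughout this document.
for dir in \
    /apps/biomodal/cli \
    /apps/biomodal/duet \
    /shared_data/biomodal/reference \
    /shared_data/biomodal/containers
do
    mkdir -p "${PREFIX}${dir}"
done

echo "Created example folder layout under: ${PREFIX:-/}"
```

For example, running the script with `PREFIX="" ` as a suitably privileged user would create /apps/biomodal/cli/ and the other folders at their real locations.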

Install the biomodal CLI and duet pipeline

Please refer to the official documentation for how to manually install the pipeline and software.

Before you start editing the pipeline configuration files, copy the biomodal CLI scripts and config files into the /apps/biomodal/cli/ folder.

From the folder into which you unzipped the downloaded CLI scripts, run the following commands:

cp ./cli/biomodal /apps/biomodal/cli/biomodal
cp ./cli/_biomodal_functions.sh /apps/biomodal/cli/_biomodal_functions.sh
cp ./cli/_biomodal_validate.sh /apps/biomodal/cli/_biomodal_validate.sh
cp ./clientConfig.json /apps/biomodal/cli/clientConfig.json

Next, ensure that all three scripts are executable by all cluster users:

chmod a+x /apps/biomodal/cli/biomodal
chmod a+x /apps/biomodal/cli/_biomodal_functions.sh
chmod a+x /apps/biomodal/cli/_biomodal_validate.sh
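To confirm the result, a check along the following lines may be useful. This is only a sketch: the function name check_cli_scripts and the CLI_DIR variable are hypothetical, and the `-x` test reflects the permissions of the user running it, so it is best run from an ordinary (non-root) user account.

```shell
#!/usr/bin/env bash

# CLI_DIR defaults to the example folder used in this document.
CLI_DIR="${CLI_DIR:-/apps/biomodal/cli}"

# Hypothetical helper: report any of the three CLI scripts that is missing
# or not executable by the current user. Returns non-zero if any check fails.
check_cli_scripts() {
    local dir="$1" status=0 name
    for name in biomodal _biomodal_functions.sh _biomodal_validate.sh; do
        if [ ! -f "${dir}/${name}" ]; then
            echo "missing: ${dir}/${name}" >&2
            status=1
        elif [ ! -x "${dir}/${name}" ]; then
            echo "not executable: ${dir}/${name}" >&2
            status=1
        fi
    done
    return "$status"
}

check_cli_scripts "$CLI_DIR" && echo "CLI scripts look ready in ${CLI_DIR}"
```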

Create the required configuration files

Please see Step 2: Create the config.yaml file in the installation guide for details on how to create the required /apps/biomodal/cli/config.yaml configuration file.

Note

The bucket_url parameter in /apps/biomodal/cli/config.yaml serves several purposes:

  • Default location for the biomodal genome reference data. This is where the biomodal pipelines expect to find the genome reference data. It can be made read-only for all users after installation, provided it differs from work_dir.

  • Default location for the small test dataset (170MiB). This dataset is only utilised for the biomodal test command. This can be made read-only for all users after installation.

  • Default location for temporary files if neither the work_dir setting nor the --work-dir runtime parameter is provided. The preferred approach is for the end-user to specify a user-specific --work-dir location for temporary files. Cascading priority order: --work-dir parameter → work_dir → bucket_url
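The cascading priority for the temporary file location can be illustrated with a small shell function. This is a sketch of the fallback logic described above, not the biomodal CLI's own code; the function name resolve_work_dir and the example paths are hypothetical.

```shell
#!/usr/bin/env bash

# Illustration only: not the biomodal CLI's implementation.
# Arguments: 1) value of the --work-dir runtime parameter (may be empty)
#            2) work_dir value from config.yaml (may be empty)
#            3) bucket_url value from config.yaml
resolve_work_dir() {
    local cli_work_dir="$1" config_work_dir="$2" bucket_url="$3"
    if [ -n "$cli_work_dir" ]; then
        printf '%s\n' "$cli_work_dir"       # highest priority: --work-dir
    elif [ -n "$config_work_dir" ]; then
        printf '%s\n' "$config_work_dir"    # next: work_dir
    else
        printf '%s\n' "$bucket_url"         # fallback: bucket_url
    fi
}

# Hypothetical paths, for illustration.
resolve_work_dir "/scratch/alice/work" "/shared_data/biomodal/work" "/shared_data/biomodal"
resolve_work_dir "" "" "/shared_data/biomodal"
```

The first call prints the user-specific location; the second falls through to bucket_url, which is the case administrators usually want users to avoid if bucket_url is to be made read-only.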

Please see Step 3: Create the nextflow.config file in the installation guide for details on how to create the required /apps/biomodal/duet/nextflow.config configuration file.

Warning

Please make sure you carefully review the recommendations for HPC configurations in this document to ensure the /apps/biomodal/duet/nextflow.config file accommodates local cluster requirements such as queues, mount points, and the RAM, CPU and disk space resource allocations per duet module.

Update or create both error configuration files, /apps/biomodal/cli/retry.error.config and /apps/biomodal/cli/failfast.error.config, following Step 4: Update “failfast.error.config” and “retry.error.config” with new settings in the installation guide.

These two error configuration files ensure you can switch between the Retry (normal) and Fail-Fast (failfast) error strategies during the biomodal init step.

Authentication

Log in with your biomodal username and password to authenticate and generate the necessary tokens.

cd /apps/biomodal/cli/
./biomodal auth

Note

The authentication process generates tokens that are stored in the ~/.biomodal-auth.json file. Tokens are used during the installation and biomodal init stages. If telemetry is disabled, then no further communication with the biomodal API will take place and the tokens are not required for any users.
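A quick way to confirm that authentication left a token file in place is sketched below. It only checks that the file exists and is not empty; it does not validate the tokens themselves. The TOKEN_FILE variable and the has_auth_tokens function are hypothetical names introduced for this illustration.

```shell
#!/usr/bin/env bash

# Default to the token file location named in this document.
TOKEN_FILE="${TOKEN_FILE:-$HOME/.biomodal-auth.json}"

# Hypothetical helper: succeeds if the given file exists and is not empty.
has_auth_tokens() {
    [ -s "$1" ]
}

if has_auth_tokens "$TOKEN_FILE"; then
    echo "Found token file: ${TOKEN_FILE}"
else
    echo "No token file at ${TOKEN_FILE}; run ./biomodal auth first" >&2
fi
```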

Installing and configuring the duet pipeline

Please note that this step is only required once per site installation using HPC modules.

Please run the biomodal init command to download and set up the duet software and required containers.

Running the duet pipeline installation in test mode

The final step is to load the new biomodal CLI module and run biomodal test to ensure the pipeline is correctly installed.

The commands could look similar to this:

module load biomodal-cli
biomodal test
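What the module does on load is site-specific, and the module name biomodal-cli above is only an example. As an assumption, the minimum effect is that the CLI folder ends up on the user's PATH. The sketch below mimics that effect in plain shell so it can be tried before a real module is written; an actual module would be defined in your module system's own format rather than as a shell script, and the function name prepend_path_once is hypothetical.

```shell
#!/usr/bin/env bash

# Illustration only: mimics the PATH change a module is assumed to make.
# CLI_DIR defaults to the example folder used in this document.
CLI_DIR="${CLI_DIR:-/apps/biomodal/cli}"

# Hypothetical helper: prepend a directory to PATH unless it is already there.
prepend_path_once() {
    local dir="$1"
    case ":${PATH}:" in
        *":${dir}:"*) ;;                  # already present, leave PATH alone
        *) PATH="${dir}:${PATH}" ;;
    esac
    export PATH
}

prepend_path_once "$CLI_DIR"

# Show where the biomodal command resolves from, if it is found at all.
command -v biomodal || echo "biomodal not found on PATH" >&2
```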

Assuming the biomodal test step completed successfully, your HPC users should now be able to load the new biomodal CLI module and run analyses using biomodal analyse (or, alternatively, biomodal analyze) with the parameters they require.