HPC software module definition for biomodal CLI and pipelines
This guide gives a high-level overview of how to structure the biomodal CLI and the duet pipeline as one or more HPC software modules. It is intended for HPC administrators who create software modules so that users can load the software from a central location. The document is intentionally generic to accommodate different cluster environments.
Caution
This documentation contains information intended for system administrators.
Prerequisites
Download and unzip the installation scripts. Please see Downloading the installation script.
Example folder structure
Below is a set of suggested folder names used throughout this document. Adjust them to match your HPC module software and existing folder structure.
| Software component | Example folder name | Description |
|---|---|---|
| biomodal CLI | /apps/biomodal/cli/ | Files required to run the biomodal CLI |
| duet pipeline | /apps/biomodal/duet/ | Root folder for the duet pipeline software, including the D(h)MR calling workflow. Referenced as “init_folder” in the CLI. |
| Genome reference pipeline | /apps/biomodal/duet/ | Not installed by default. Install it explicitly by running the biomodal reference download <version> command. |
| Genome reference data | /shared_data/biomodal/reference/ | Shared folder for the genome reference data |
| Containers | /shared_data/biomodal/containers/ | Shared folder for the container images used by the pipelines |
| Test dataset | /apps/biomodal/duet/ | Shared folder for the test dataset used to verify the installation and configuration with biomodal test |
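If you adopt the example layout, the folders can be created up front. The sketch below assumes the example paths from the table (passed as arguments so you can substitute your own); create_biomodal_layout is a hypothetical helper, and it does not set ownership or permissions, which remain site-specific.

```shell
#!/usr/bin/env bash
# Sketch: create the example folder layout from the table.
# create_biomodal_layout is a hypothetical helper; both roots default to the
# example names used in this guide and should be adjusted for your site.
create_biomodal_layout() {
  local apps_root="${1:-/apps/biomodal}"
  local shared_root="${2:-/shared_data/biomodal}"

  # Software: biomodal CLI files and the duet pipeline root ("init_folder")
  mkdir -p "${apps_root}/cli" "${apps_root}/duet" || return 1
  # Shared data: genome reference data and container images
  mkdir -p "${shared_root}/reference" "${shared_root}/containers" || return 1
}
```

For example, `create_biomodal_layout /apps/biomodal /shared_data/biomodal` creates the four folders listed in the table.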
Install the biomodal CLI and duet pipeline
Refer to the official installation guide for instructions on installing the pipeline and software manually.
Before you start editing the pipeline configuration files, copy the biomodal CLI scripts and config files into the /apps/biomodal/cli/ folder.
From the folder you unzipped the downloaded CLI scripts into, please follow these steps:
cp ./cli/biomodal /apps/biomodal/cli/biomodal
cp ./cli/_biomodal_functions.sh /apps/biomodal/cli/_biomodal_functions.sh
cp ./cli/_biomodal_validate.sh /apps/biomodal/cli/_biomodal_validate.sh
cp ./clientConfig.json /apps/biomodal/cli/clientConfig.json
Next, make sure all three scripts are executable by all cluster users:
chmod a+x /apps/biomodal/cli/biomodal
chmod a+x /apps/biomodal/cli/_biomodal_functions.sh
chmod a+x /apps/biomodal/cli/_biomodal_validate.sh
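The copy and chmod steps above can also be combined into one pass with the coreutils install command. This is a sketch, assuming install is available (it is on most Linux clusters); install_biomodal_cli is a hypothetical helper. Mode 0755 differs slightly from chmod a+x: it also guarantees read permission for everyone, which users need in order to run a shell script, while a+x only adds execute bits to whatever read bits the files already had.

```shell
#!/usr/bin/env bash
# Sketch: copy the CLI files and set permissions in one step.
# install_biomodal_cli is a hypothetical helper.
#   $1: folder the download was unzipped into (contains cli/ and clientConfig.json)
#   $2: central CLI folder, /apps/biomodal/cli in this guide
install_biomodal_cli() {
  local src_dir="$1" dest_dir="${2:-/apps/biomodal/cli}"
  local script

  mkdir -p "$dest_dir" || return 1
  # The three scripts: readable and executable by all users (rwxr-xr-x)
  for script in biomodal _biomodal_functions.sh _biomodal_validate.sh; do
    install -m 0755 "${src_dir}/cli/${script}" "${dest_dir}/${script}" || return 1
  done
  # The client config only needs to be readable by all users (rw-r--r--)
  install -m 0644 "${src_dir}/clientConfig.json" "${dest_dir}/clientConfig.json" || return 1
}
```

Run it from anywhere, for example `install_biomodal_cli /path/to/unzipped /apps/biomodal/cli`, where the first path is wherever you unzipped the download.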
Create the required configuration files
Please see Step 2: Create the config.yaml file in the installation guide for details on how to create the required /apps/biomodal/cli/config.yaml configuration file.
Note
The bucket_url parameter in /apps/biomodal/cli/config.yaml serves several purposes:
- Default location for the biomodal genome reference data. This is where the biomodal pipelines expect to find the genome reference data. It can be made read-only for all users after installation, provided it differs from work_dir.
- Default location for the small test dataset (170 MiB). This dataset is only used by the biomodal test command. It can be made read-only for all users after installation.
- Default location for temporary files if neither work_dir nor a runtime parameter is provided. The preferred approach is for each end user to specify a user-specific --work-dir location for temporary files. Cascading priority order: --work-dir parameter → work_dir → bucket_url.
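To make the cascading priority order for temporary files concrete, the hypothetical helper below mirrors it: the first non-empty value wins. It is an illustration of the documented order only, not the biomodal CLI's actual implementation.

```shell
#!/usr/bin/env bash
# Illustration only: mirrors the documented priority order
#   --work-dir parameter -> work_dir (config.yaml) -> bucket_url (config.yaml)
# resolve_work_dir is a hypothetical helper, not part of the biomodal CLI.
# An empty argument stands for "not provided".
resolve_work_dir() {
  local cli_work_dir="$1" cfg_work_dir="$2" cfg_bucket_url="$3"
  if [ -n "$cli_work_dir" ]; then
    printf '%s\n' "$cli_work_dir"     # 1. runtime --work-dir wins
  elif [ -n "$cfg_work_dir" ]; then
    printf '%s\n' "$cfg_work_dir"     # 2. work_dir from config.yaml
  else
    printf '%s\n' "$cfg_bucket_url"   # 3. fall back to bucket_url
  fi
}
```

With the example values `resolve_work_dir "" /scratch/alice/biomodal /shared_data/biomodal`, the result is `/scratch/alice/biomodal`, because no --work-dir was given and work_dir is set. This is also why a per-user --work-dir keeps temporary files out of a shared, read-only bucket_url.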
Please see Step 3: Create the nextflow.config file in the installation guide for details on how to create the required /apps/biomodal/duet/nextflow.config configuration file.
Warning
Carefully review the recommendations for HPC configurations in this document to ensure that the /apps/biomodal/duet/nextflow.config file accommodates local cluster requirements such as queues, mount points, and RAM, CPU, and disk space allocations for each duet module.
Update or create both error configuration files, /apps/biomodal/cli/retry.error.config and /apps/biomodal/cli/failfast.error.config, following Step 4: Update “failfast.error.config” and “retry.error.config” with new settings in the installation guide.
These two files let you switch between the Retry (normal) and Fail-Fast (failfast) error strategies during the biomodal init step.
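Before running biomodal init, it can help to confirm that every configuration file from the steps above is in place. This is a sketch: check_biomodal_config is a hypothetical helper, the paths default to the example folders in this guide, and it only checks that each file exists and is non-empty, not that its contents are valid.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check for the configuration files created above.
# check_biomodal_config is a hypothetical helper; it reports each missing or
# empty file on stderr and returns non-zero if any are missing.
check_biomodal_config() {
  local cli_dir="${1:-/apps/biomodal/cli}" duet_dir="${2:-/apps/biomodal/duet}"
  local missing=0 f
  for f in "${cli_dir}/config.yaml" \
           "${cli_dir}/retry.error.config" \
           "${cli_dir}/failfast.error.config" \
           "${duet_dir}/nextflow.config"; do
    if [ ! -s "$f" ]; then
      printf 'missing or empty: %s\n' "$f" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, `check_biomodal_config /apps/biomodal/cli /apps/biomodal/duet && echo "configuration files present"`.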
Authentication
Log in with your biomodal username and password to authenticate and generate the necessary tokens.
cd /apps/biomodal/cli/
./biomodal auth
Note
The authentication process generates tokens that are stored in the ~/.biomodal-auth.json file. The tokens are used during the installation and biomodal init stages.
If telemetry is disabled, no further communication with the biomodal API takes place, and the tokens are not required for any users.
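Because biomodal init relies on these tokens, a quick check that ./biomodal auth actually wrote the token file can save a failed installation run. This is a minimal sketch: has_biomodal_tokens is a hypothetical helper that only checks the documented ~/.biomodal-auth.json path exists and is non-empty; it does not validate the tokens themselves.

```shell
#!/usr/bin/env bash
# Sketch: confirm the token file from `./biomodal auth` exists before init.
# has_biomodal_tokens is a hypothetical helper; the home folder defaults to
# $HOME and can be passed explicitly.
has_biomodal_tokens() {
  local home_dir="${1:-$HOME}"
  local token_file="${home_dir}/.biomodal-auth.json"
  if [ -s "$token_file" ]; then
    printf 'found tokens: %s\n' "$token_file"
  else
    printf 'no tokens at %s; run ./biomodal auth first\n' "$token_file" >&2
    return 1
  fi
}
```

Run it as the same account that will run biomodal init, since the tokens are stored in that user's home folder.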
Installing and configuring the duet pipeline
This step is only required once per site installation when using HPC modules.
Run the biomodal init command to download and set up the duet software and the required containers.
Testing the duet pipeline installation
The final step is to load the new biomodal CLI module and run biomodal test to verify that the pipeline is correctly installed.
The commands could look similar to this:
module load biomodal-cli
biomodal test
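The module load biomodal-cli command above assumes a modulefile with that name already exists on the MODULEPATH. The sketch below writes a minimal Lmod (Lua) modulefile that only prepends the CLI folder to PATH. It assumes Lmod, a site-chosen modulefiles folder that is already on MODULEPATH, and that the CLI needs nothing beyond PATH; check the installation guide for any additional environment your version requires, and add any other modules your site needs. write_biomodal_modulefile is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal Lmod modulefile named biomodal-cli.
# write_biomodal_modulefile is a hypothetical helper.
#   $1: site modulefiles folder (assumed to be on MODULEPATH)
#   $2: CLI folder, /apps/biomodal/cli in this guide
write_biomodal_modulefile() {
  local modulefiles_dir="$1" cli_dir="${2:-/apps/biomodal/cli}"
  mkdir -p "$modulefiles_dir" || return 1
  # Unquoted EOF: ${cli_dir} is expanded when the file is written
  cat > "${modulefiles_dir}/biomodal-cli.lua" <<EOF || return 1
-- biomodal CLI modulefile (generated sketch; adjust for your site)
whatis("Name: biomodal CLI and duet pipeline")
help([[Provides the biomodal command from ${cli_dir}]])
prepend_path("PATH", "${cli_dir}")
EOF
}
```

For sites using the Tcl-based Environment Modules instead of Lmod, the equivalent modulefile starts with the `#%Module` header line and uses `prepend-path PATH /apps/biomodal/cli`.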
Assuming the biomodal test step completes successfully, your HPC users should now be able to load the new biomodal CLI module and run analyses using biomodal analyse (or, alternatively, biomodal analyze) with the parameters they require.