Core Workflow#
The modality XPLR Core Workflow is a simple-to-use analysis tool that leverages bash scripting to automate key modality XPLR commands on biomodal data contained in the Zarr store. The Core Workflow is well-suited for exploration of pilot study data, or for new users who want rapid insights from their data through a standardised, low-code approach.
For experienced users who want a deep dive into the Core Workflow script, proceed directly to the Advanced analysis workflow documentation.
This section describes how to set up and run the modality XPLR Core Workflow.
This workflow is designed for human biomodal 6-base datasets (including 5mC and 5hmC) aligned to the hg38 reference genome.
The workflow is pre-configured to analyse gene bodies and promoter regions only. See the Pre-prepared BED files section in the main documentation for more information on how the region files were generated.
Gene bodies: gencode.v44.human.genes.annotation.bed.gz
Promoter regions: gencode.v44.human.promoters.annotation.bed.gz
The Core Workflow performs the following analyses and outputs:
Biological QC
biological-qc to generate an HTML report summarising the dataset and sample information, with Pearson correlation and PCA plots for genome-wide 5mC and 5hmC data. See Biological QC for more information.
get mean to generate an HTML report (violin and correlation plots) and a .tsv data file, showing the mean CpG coverage in gene bodies and promoter regions. The context depth filter is not applied to this command. See Extracting methylation statistics for more information on this and other get commands.
get regional-frac to generate an HTML report (violin and correlation plots) and a .tsv data file, showing the fraction of 5mC and 5hmC (over total_c) in gene bodies and promoter regions, with an optional minimum depth filter applied.
get count to generate a .tsv data file containing the total number of contexts (total_c) in each gene body and promoter region. The context depth filter is not applied to this command, as it is not applicable to the count command.
Biological insight
dmr call to identify differentially methylated regions (DMRs) in gene bodies and promoter regions, for both 5mC and 5hmC contexts, across all specified groups, with an optional context depth filter applied. This command generates separate .bed files for each methylation context and group pair. See Calling differentially methylated regions (DMRs) for more information.
dmr plot to generate an HTML report for each DMR results file, showing summary information and unfiltered results as a volcano plot and p-value distributions. See Plotting differentially methylated regions (DMRs) for more information.
dmr plot with filtering to generate an HTML report for each DMR results file, showing summary information and filtered results (q-value < 0.05) as a volcano plot and p-value distributions. When filtering is applied, a new DMR results .bed file is generated containing only the filtered DMRs for each region type and methylation context.
Each DMR analysis is also accompanied by an .ini configuration file that can be used for visualising DMR results with modality tracks; see Generate genomic track plots for more information.
The Core Workflow is well-suited for studies involving:
Disease vs control comparisons
Treatment vs untreated samples
Time-course experiments
Multi-group studies with covariates
Video: Running the modality XPLR Core Workflow#
Requirements and prerequisites#
Software requirements:
modality XPLR (version 1.0.0 or later is recommended, see Installation of modality XPLR)
Bash shell (version 4.0 or later)
Standard Unix utilities (awk, sed, find, etc.)
Data input and format requirements:
Zarr store(s): Generated by the upstream biomodal duet software and containing methylation data. Zarr stores must have been generated by duet software v1.2 or later to be readable by modality XPLR.
Sample metadata:
.csv or .tsv file with sample information, group names, and optional covariates (also referred to as the sample sheet). The metadata file (sample sheet) must be formatted as described in the main documentation.
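For illustration, a minimal sample sheet could be created as follows. The column names and values below are hypothetical; use the sample identifier, group, and covariate columns required by your own study, formatted as described in the main documentation.
# Create a hypothetical minimal sample sheet (column names are examples only)
cat > metadata.csv <<'EOF'
sample_id,label,age,batch
sample_01,control,34,1
sample_02,control,41,2
sample_03,treated,38,1
sample_04,treated,45,2
EOF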
Core Workflow configuration#
For ease-of-use, the Core Workflow can be configured and downloaded using the Interactive Core Workflow generator tool below.
This generator allows you to specify input parameters, and then download a complete .zip folder containing the workflow script, configuration file, and pre-prepared gene and promoter regions files.
Core Workflow folder contents:
modality-core-workflow_YYYYMMDD_HHMMSS/
├── core-workflow-v1.3.sh # The Core Workflow script
├── config.txt # Configuration file for modality XPLR commands
├── Regions/ # Directory containing pre-prepared region files
│ ├── gencode.v44.human.genes.annotation.bed.gz
│ └── gencode.v44.human.promoters.annotation.bed.gz
├── README.txt # Usage instructions
└── Output/ # Output directory (created during execution)
Interactive Core Workflow generator#
🔧 Core Workflow generator
Generate the modality XPLR Core Workflow .zip folder with a standardised directory structure. The folder name is timestamped upon creation, and the folder includes the workflow script, configuration file, and pre-prepared genomic region files. The displayed form values below are examples only and must be replaced with your own values to populate the config.txt. After generating the .zip folder, a preview panel will appear; check that your settings have been written correctly. * denotes required fields
Running the Core Workflow#
To run the Core Workflow, follow these steps:
Download the Core Workflow .zip folder to your working location and extract it.
In the terminal, run the following lines.
# Change to the Core Workflow directory
cd /path/to/modality-core-workflow_YYYYMMDD_HHMMSS
# Make the script executable
chmod +x core-workflow-v1.3.sh
# Run the Core Workflow script
./core-workflow-v1.3.sh
Monitor the output logs for progress and any potential errors.
Advanced Core Workflow usage#
The config.txt file contains all the parameters for running the Core Workflow script. You should not need to modify this file, as it is pre-configured by the Interactive Core Workflow generator.
Advanced users can modify the config.txt file to change parameters such as the Zarr store path, metadata file, group column, covariates, and minimum depth filter. In addition, the Output directory path can be changed to a different location if required.
This may be useful if you want to re-run the analysis with different parameters (e.g. context depth filter, or covariates), without needing to download the Core Workflow folder again.
It is also possible to provide additional region files in the Regions/ directory, or to change the region files used by the Core Workflow.
Refer to the main documentation for the required format of the Region files.
Apart from the Biological QC HTML report (which operates on Zarr store(s)), the Core Workflow will perform all analysis steps on each region file found in the Regions/ directory.
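For example, an additional region file could be added before the workflow is run. The file name below is hypothetical; any correctly formatted BED3+ file placed in Regions/ will be picked up automatically.
# Copy an extra region file into the Regions/ directory (hypothetical file name)
cp /path/to/my_custom_regions.bed.gz Regions/
# Re-run the workflow; the new file is analysed alongside the existing region files
./core-workflow-v1.3.sh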
For advanced users who are familiar with bash scripting, a description of the Core Workflow code is provided in the Advanced analysis workflow section, which can be used to customise the workflow further.
Interpreting Core Workflow outputs#
The Core Workflow generates a comprehensive set of outputs organised into structured directories. Understanding this organisation is crucial for interpreting results and sharing findings. Each analysis produces timestamped outputs and provenance metadata for traceability and reproducibility of the results.
The workflow organises outputs by creating a subdirectory within both Results/ and Reports/ for each region file processed. This means that all analysis results (feature extraction, DMR calling, and DMR visualisation) related to a specific region file (e.g., genes, promoters) are grouped together in their respective subdirectories. Similarly, all HTML reports related to a specific region file are organised in matching subdirectories within the Reports/ directory. The Biological QC results, which are not related to a specific region file input, are stored in a BioQC_ timestamped directory in Results/ and the QC HTML report is placed directly in the main Reports/ directory.
Video: Reviewing Core Workflow results#
Output directory structure#
A modality_cli.log file is generated in the working directory and contains logging information relating to the modality XPLR CLI commands only (not the Core Workflow bash outputs).
This file is generated automatically by modality XPLR and is not directly controlled by the Core Workflow script. Other outputs are described below.
Output/
├── Results/ # All analysis outputs
│ ├── BioQC_YYYYMMDD_HHMMSS/ # Biological QC results
│ │ ├── biological_qc_report_*.html # Biological QC HTML report
│ │ └── modality_metadata_YYYYMMDD_HHMMSS.json # Provenance metadata file
│ ├── <region_file_name_1>/ # Results for first region file (e.g. genes)
│ │ ├── Extract_YYYYMMDD_HHMMSS/ # Feature extraction results
│ │ │ ├── Extract_*.tsv.gz # Feature extraction results file (compressed)
│ │ │ ├── Extract_*.html # Violin and Correlation plots (mean and regional-frac operations only)
│ │ │ └── modality_metadata_YYYYMMDD_HHMMSS.json # Provenance metadata file
│ │ ├── DMR_YYYYMMDD_HHMMSS/ # DMR analysis results
│ │ │ ├── DMR_*.bed # DMR result files
│ │ │ ├── DMR_*.ini # Track configuration files (track plots are not included in this script)
│ │ │ └── modality_metadata_YYYYMMDD_HHMMSS.json # Provenance metadata file
│ │ └── DMR_Report_YYYYMMDD_HHMMSS/ # DMR visualisation reports
│ │ ├── DMR_Report_*.html # Unfiltered DMR visualisation report
│ │ ├── DMR_Report_*.max_q_0_05_*.html # Filtered DMR visualisation report
│ │ ├── DMR_*_qval_0_05_filtered_*.bed.gz # Filtered DMR results file (compressed)
│ │ └── modality_metadata_YYYYMMDD_HHMMSS.json # Provenance metadata file
│ ├── <region_file_name_2>/ # Results for second region file (e.g. promoters)
│ │ └── ... (same structure as above)
│ └── <region_file_name_N>/ # Results for Nth region file (e.g. if you add additional region files to the Regions/ directory)
│ └── ... (same structure as above)
└── Reports/ # Organised HTML reports
├── biological_qc_report*.html # Biological QC HTML report
├── <region_file_name_1>/ # Reports for first region file (e.g. genes)
│ ├── Extract_*mean*.html # Mean CpG coverage violin and correlation plots
│ ├── Extract_*regional-frac*.html # 5mC and 5hmC fraction violin and correlation plots
│ ├── DMR_Report_*.html # Unfiltered DMR visualisation reports
│ └── DMR_Report_*_max_q_*.html # Filtered DMR visualisation reports
├── <region_file_name_2>/ # Reports for second region file (e.g. promoters)
│ └── ... (same structure as above)
└── <region_file_name_N>/ # Reports for Nth region file (e.g. if you add additional region files to the Regions/ directory)
└── ... (same structure as above)
Reviewing results#
Biological QC
Check sample clustering in the Biological QC HTML report PCA analysis (located in the main Reports/ directory). Use the colour-based grouping options directly on the report to visualise sample relationships. Groups are defined by the metadata (sample sheet) file.
Verify expected grouping patterns. Does the data make biological sense for your experimental design?
Identify any outlier samples.
Assess overall sample mean CpG coverage on the Biological QC report (Sample Information table).
Review the region-specific feature extraction plots (violin plots and correlation heatmaps) for mean CpG coverage, and 5mC and 5hmC fraction, in the region-specific subdirectories within Reports/. These visualisations provide insights into feature distributions and sample correlations.
Biological insights
Navigate to the region-specific subdirectories within Reports/ (e.g., Reports/gencode.v44.human.genes.annotation/).
Check the unfiltered DMR HTML reports for each region type and methylation context. Check the volcano plots and p-value distributions.
Focus on significant DMRs by reviewing the filtered DMR HTML reports (files with max_q_ in the name, indicating q-value < 0.05 filtering).
If needed, review the DMR results files (.bed) for specific regions of interest, or to filter or sort based on other metrics.
Note the region annotations in the DMR results files, which can be used to gain further biological insights into the DMRs.
Validate findings
Check for consistency with known biology
Consider experimental design and potential confounders
Plan follow-up validation experiments, or increase the sample size if necessary
Common result patterns and interpretation#
Healthy DMR patterns:
Detection of significant DMRs between experimental groups
Consistent p-value distributions (uniform with some enrichment near zero)
Biologically meaningful effect sizes
Concordant patterns between 5mC and 5hmC where expected
Potential issues to watch for:
Poor group separation: May indicate experimental design issues
Skewed p-value distributions: May benefit from Overdispersion correction or other statistical adjustments
Very large effect sizes: May indicate technical artifacts
Inconsistent patterns: Could suggest batch effects or confounders
Biological interpretation guidelines:
Promoter hypermethylation: Often associated with gene silencing (deactivated regions)
5hmC enrichment: Often associated with active demethylation (transition to active regions), or a marker of active gene bodies.
Concordant changes: 5mC loss with 5hmC gain may suggest active demethylation
Advanced analysis workflow#
This section provides a more detailed description of the Core Workflow bash script, which may be beneficial to users who want to modify the script directly or build their own workflows.
A direct download link to the Core Workflow script (the .sh file only) is provided below.
Download: core-workflow-v1.3.sh - Core Workflow script assembled from all code sections below.
Download a test dataset#
You can download a test folder containing mouse ES-E14 data to run the Core Workflow. The outputs are not intended to reveal biological insight, as the samples are technical replicates. However, it can be used to test that the script runs to completion.
The E14-example test folder contains:
A sample metadata file: E14_chromosome_19.tsv
A Zarr store: E14_chromosome_19.zarrz, containing methylation data
A Regions directory containing gencode.vM25.mouse.promoters.annotation.bed
A config.txt file with example parameters
Download: E14-example.zip
Note
The E14-example.zip does not contain the core-workflow-v1.3.sh script. This will need to be downloaded separately and run from within the E14-example directory after extraction.
Advanced configuration file setup#
The Core Workflow uses a simple config.txt file for all settings. This design separates configuration from code, making it easy to run the same pipeline with different datasets or parameters.
If you are new to modality XPLR or have not yet run the Core Workflow, we suggest using the Interactive Core Workflow generator to get started.
Configuration file:
This file must be present in your working directory before running the pipeline, with the name config.txt.
Alternatively use the Advanced interactive configuration file generator below to create this file with additional options.
The aim of the config.txt is to pre-define all parameters required by the workflow script. If you modify the Core Workflow script to include additional options or settings, these can be added to the config.txt as variables.
For the Core Workflow, the config.txt file should contain the following parameters (parameter values are examples):
Zarr=/path/to/data.zarrz
Metadata=/path/to/metadata.csv
Group_Column=label
Condition_Order=condition1,condition2
Covariates=age,batch,sex
QC_Region=chr1
Regions_Directory=/path/to/regions
Main_Output=/path/to/output_dir
Depth_Filter=5
Overdispersion=False
Configuration parameters explained:
Zarr: Path(s) to zarr store(s). Use comma separation for multiple paths
Metadata: Path to .csv or .tsv sample sheet file containing sample metadata
Group_Column: Column name in the metadata file, for grouping samples (e.g., “treatment”, “condition”)
Condition_Order: Comma-separated order of conditions for DMR analysis (optional). The first (left-most) condition is treated as the control group
Covariates: Space-separated list of covariate columns (optional) (e.g., “age batch”, “sex race”, etc.)
QC_Region: Region specification for Biological QC analysis (optional) (e.g., “chr1” or “chr1:1000-2000”)
Regions_Directory: Directory containing BED/TSV files in BED3+ format, with genomic regions and annotations. See Region files for details
Main_Output: Base directory where all results will be written
Depth_Filter: Minimum depth filter for analyses (leave blank to disable filtering)
Overdispersion: Apply overdispersion correction for DMR calling (“True” or “False”, optional)
For a description of Sample sheet and Region files formats, see the linked modality XPLR documentation sections.
The config.txt can be generated using the Advanced interactive configuration file generator below or created manually in a text editor.
Advanced interactive configuration file generator#
🔧 Advanced Configuration File Generator
Use this form to generate and download your config.txt file. A download link for the core-workflow-v1.3.sh script is also provided. The displayed values are indicative only and must be replaced with your own values to populate the config.txt. After generating the file, a preview panel will appear; check that your settings have been written correctly.
Core Workflow overview: step-by-step#
The Core Workflow script is organised into distinct sections that can be modified independently:
Setup and Initialisation: Script configuration and logging setup
Input Validation: Comprehensive checks of all inputs before execution
Analysis Steps: Four sequential analysis stages, designed to deliver Biological QC and Biological Insight analyses.
Output Management: Organised collection and reporting of results
Script setup and configuration#
The script begins with essential setup including logging configuration and error handling.
Note
This code block includes instructions for enabling logging to a file, which may be useful for debugging and tracking script execution.
To enable logging, uncomment (remove the leading #) the indicated code lines as directed.
#!/usr/bin/env bash
###############################################################################
# modality XPLR Core Workflow Pipeline
#
# This script runs a comprehensive analysis pipeline including:
# - Biological QC
# - Feature extraction
# - DMR calling and visualisation
#
# Input requirements:
# - config.txt file in current directory with required parameters
# - Zarr store(s) containing methylation data
# - Sample metadata CSV or TSV file
# - Directory with region BED/TSV files
#
# This script is intended for running on modality XPLR version 1.0.0 or later.
#
###############################################################################
#------------------------------------------------------------------------------
# CONFIGURATION
#------------------------------------------------------------------------------
# Set script behaviour
set -euo pipefail
# Record start time for performance tracking
start_time=$(date +%s)
# Log file setup - uncomment to enable logging to file
# LOG_FILE="./modality_pipeline_$(date +%Y%m%d_%H%M%S).log"
# exec > >(tee -a "$LOG_FILE") 2>&1
Helper functions#
These logging functions provide consistent feedback throughout the pipeline execution.
#------------------------------------------------------------------------------
# HELPER FUNCTIONS
#------------------------------------------------------------------------------
# Logging functions
log_info() {
echo "[INFO] $(date '+%Y-%m-%d %H:%M:%S') - $1"
}
log_warning() {
echo "[WARNING] $(date '+%Y-%m-%d %H:%M:%S') - $1" >&2
}
log_error() {
echo "[ERROR] $(date '+%Y-%m-%d %H:%M:%S') - $1" >&2
}
log_debug() {
if [[ "${DEBUG:-false}" == "true" ]]; then
echo "[DEBUG] $(date '+%Y-%m-%d %H:%M:%S') - $1"
fi
}
Utility functions#
These utility functions handle common tasks like directory validation and report collection.
# Helper to trim whitespace
trim() {
awk '{$1=$1};1'
}
# Function to check if a header exists in metadata
header_exists() {
local col="$1"
local headers=("${@:2}")
for h in "${headers[@]}"; do
[[ "$h" == "$col" ]] && return 0
done
return 1
}
# Function to create directories with validation
create_directory() {
local dir="$1"
if [[ ! -d "$dir" ]]; then
mkdir -p "$dir" 2>/dev/null || { log_error "Cannot create directory $dir"; exit 1; }
fi
if [[ ! -w "$dir" ]]; then
log_error "No write access to directory $dir"
exit 1
fi
log_info "Directory ready: $dir"
}
# Function to copy reports to destination
copy_reports() {
local source_dir="$1"
local pattern="$2"
local dest_dir="$3"
local description="$4"
log_info "Copying $description HTML reports to $dest_dir..."
local files=$(find "$source_dir" -type f -name "$pattern")
if [[ -z "$files" ]]; then
log_error "No $description HTML reports found in $source_dir"
return 1
fi
local count=0
while IFS= read -r file; do
cp "$file" "$dest_dir/"
((count++))
done <<< "$files"
log_info "Copied $count $description HTML report(s)"
return 0
}
Initialisation and configuration loading#
This section initialises the script and loads the configuration file. If you want to change the name of the input configuration file from the default config.txt, you can do so by changing the CONFIG_FILE variable in the script.
Similarly, if you want to control the verbosity of workflow logging, you can do so by changing the DEBUG variable in the script to true.
#------------------------------------------------------------------------------
# INITIALISATION
#------------------------------------------------------------------------------
printf "\n=======================================\n"
printf "STEP 0 - CONFIGURATION & INITIALISATION\n"
printf "=======================================\n"
log_info "Starting modality XPLR Core Workflow Pipeline"
# Default values for variables that could be customised
CONFIG_FILE="./config.txt"
DEBUG=false # Set to true to enable debug output
The next part reads the configuration file and sets values for all required parameters. If you wish to change existing parameter values, edit the configuration (config.txt) file in your working directory.
If you want to add new variables for use in downstream modality XPLR commands, ensure they are defined here and assigned a value when the configuration file is parsed (a sketch is shown after the code block below).
For any variables that should be hard-coded, the value can be set directly in the script and excluded from the configuration file.
#------------------------------------------------------------------------------
# LOAD CONFIGURATION
#------------------------------------------------------------------------------
log_info "Reading configuration from $CONFIG_FILE"
# Validate config file exists
if [[ ! -f "$CONFIG_FILE" ]]; then
log_error "config.txt not found in the current working directory: $(pwd)"
log_error "Please ensure config.txt is present before running this script."
exit 1
fi
# Read config.txt variables
ZARR="" # Path(s) to zarr store(s), comma-separated
METADATA="" # Path to sample metadata CSV or TSV file
GROUP_COLUMN="" # Column name in metadata for grouping samples
COVARIATES="" # Space-separated list of covariate column names (optional)
CONDITION_ORDER="" # Comma-separated order of conditions for DMR analysis (optional)
QC_REGION="" # Region for Biological QC analysis (optional)
REGIONS_DIR="" # Directory containing region BED/TSV files
MAIN_OUTPUT="" # Base output directory
DEPTH_FILTER="" # Minimum depth filter for analyses, e.g. 10 (optional)
OVERDISPERSION="" # Apply overdispersion correction: "True" or "False" (optional)
# Parse config file
while IFS='=' read -r key value; do
key=$(echo "$key" | trim)
value=$(echo "$value" | trim)
case "$key" in
Zarr) ZARR="$value" ;;
Metadata) METADATA="$value" ;;
Group_Column) GROUP_COLUMN="$value" ;;
Covariates) COVARIATES="$value" ;;
Condition_Order) CONDITION_ORDER="$value" ;;
QC_Region) QC_REGION="$value" ;;
Regions_Directory) REGIONS_DIR="$value" ;;
Main_Output) MAIN_OUTPUT="$value" ;;
Depth_Filter) DEPTH_FILTER="$value" ;;
Overdispersion) OVERDISPERSION="$value" ;;
# You could add additional parameters here as needed
esac
done < "$CONFIG_FILE"
# Log parsed configuration
printf "\n\n>>Provided inputs\n"
log_info "ZARR path(s): $ZARR"
log_info "Metadata file: $METADATA"
log_info "Group column: $GROUP_COLUMN"
log_info "Covariates: $COVARIATES"
log_info "Condition order: $CONDITION_ORDER"
log_info "QC region: $QC_REGION"
log_info "Regions directory: $REGIONS_DIR"
log_info "Main output directory: $MAIN_OUTPUT"
log_info "Depth filter: $DEPTH_FILTER"
log_info "Overdispersion correction: $OVERDISPERSION"
Input validation#
This section validates all inputs and ensures they exist and are accessible.
Use this section to add any custom validation checks for your specific workflow requirements, such as file paths and access permissions, based on the variables defined in the config.txt file (an example check is sketched after the code block below).
#------------------------------------------------------------------------------
# VALIDATE INPUTS
#------------------------------------------------------------------------------
printf "\n\n>>Validation & directory creation\n"
log_info "Validating inputs..."
# Parse and validate zarr paths
if [[ -n "$ZARR" ]]; then
# Split comma-separated zarr paths into array
IFS=',' read -r -a ZARR_RAW_ARRAY <<< "$ZARR"
# Process each zarr path to remove quotes and trim whitespace
ZARR_ARRAY=()
for zarr_raw in "${ZARR_RAW_ARRAY[@]}"; do
# Trim leading/trailing whitespace
zarr_clean=$(echo "$zarr_raw" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
# Remove surrounding single or double quotes if present
zarr_clean=$(echo "$zarr_clean" | sed 's/^["'"'"']\(.*\)["'"'"']$/\1/')
# Add to processed array if not empty
if [[ -n "$zarr_clean" ]]; then
ZARR_ARRAY+=("$zarr_clean")
fi
done
# Log the number of zarr paths identified and their strings
log_info "Identified ${#ZARR_ARRAY[@]} zarr path(s):"
for i in "${!ZARR_ARRAY[@]}"; do
log_info " [$((i+1))] ${ZARR_ARRAY[i]}"
done
# Validate each zarr path
for zarr in "${ZARR_ARRAY[@]}"; do
if [[ ! -s "$zarr" ]]; then
log_error "Zarr file '$zarr' doesn't contain any data or is not readable"
exit 1
fi
done
else
log_error "No zarr paths specified in config.txt"
exit 1
fi
# Process and validate metadata file path
if [[ -n "$METADATA" ]]; then
# Trim leading/trailing whitespace
METADATA=$(echo "$METADATA" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
# Remove surrounding single or double quotes if present
METADATA=$(echo "$METADATA" | sed 's/^["'"'"']\(.*\)["'"'"']$/\1/')
log_info "Processed metadata file path: $METADATA"
else
log_error "No metadata file specified in config.txt"
exit 1
fi
# Process and validate main output directory path
if [[ -n "$MAIN_OUTPUT" ]]; then
# Trim leading/trailing whitespace
MAIN_OUTPUT=$(echo "$MAIN_OUTPUT" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
# Remove surrounding single or double quotes if present
MAIN_OUTPUT=$(echo "$MAIN_OUTPUT" | sed 's/^["'"'"']\(.*\)["'"'"']$/\1/')
log_info "Processed main output directory: $MAIN_OUTPUT"
else
log_error "No main output directory specified in config.txt"
exit 1
fi
# Process and validate regions directory path
if [[ -n "$REGIONS_DIR" ]]; then
# Trim leading/trailing whitespace
REGIONS_DIR=$(echo "$REGIONS_DIR" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
# Remove surrounding single or double quotes if present
REGIONS_DIR=$(echo "$REGIONS_DIR" | sed 's/^["'"'"']\(.*\)["'"'"']$/\1/')
log_info "Processed regions directory: $REGIONS_DIR"
else
log_error "No regions directory specified in config.txt"
exit 1
fi
# Check metadata file
log_info "Validating metadata file headers..."
if [[ ! -r "$METADATA" ]]; then
log_error "Metadata file not found or not readable: $METADATA"
exit 1
fi
# Check group column and covariates in metadata
HEADER_LINE=$(head -n 1 "$METADATA")
# Remove any line ending characters (Windows \r\n, Mac \r, Unix \n)
HEADER_LINE=$(echo "$HEADER_LINE" | tr -d '\r\n')
# Detect delimiter: use tab if present, otherwise comma
if [[ "$HEADER_LINE" == *$'\t'* ]]; then
DELIM=$'\t'
else
DELIM=','
fi
# Split header into array using the detected delimiter
IFS="$DELIM" read -r -a HEADERS_RAW <<< "$HEADER_LINE"
# Trim whitespace from each header individually
HEADERS=()
for header in "${HEADERS_RAW[@]}"; do
# Trim leading and trailing whitespace from each header
header_clean=$(echo "$header" | sed 's/^[[:space:]]*//;s/[[:space:]]*$//')
if [[ -n "$header_clean" ]]; then
HEADERS+=("$header_clean")
fi
done
if [[ -n "$GROUP_COLUMN" ]]; then
if ! header_exists "$GROUP_COLUMN" "${HEADERS[@]}"; then
log_error "Group column '$GROUP_COLUMN' not found in metadata file header. Please review the metadata file and provide a valid group column."
exit 1
fi
fi
if [[ -n "$COVARIATES" ]]; then
# Convert space-separated list to array
read -r -a COVS <<< "$COVARIATES"
for cov in "${COVS[@]}"; do
cov=$(echo "$cov" | trim)
if ! header_exists "$cov" "${HEADERS[@]}"; then
log_error "Covariate column '$cov' not found in metadata file header"
exit 1
fi
done
fi
log_info "Metadata validation passed"
# Check regions directory
if [[ ! -d "$REGIONS_DIR" ]]; then
log_error "Regions directory not found: $REGIONS_DIR"
exit 1
fi
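As an example of a custom validation check that could be appended here, the sketch below verifies that the optional Depth_Filter value is a non-negative integer before it is passed to any modality XPLR command:
# Example custom check (sketch): Depth_Filter, if set, must be a non-negative integer
if [[ -n "$DEPTH_FILTER" && ! "$DEPTH_FILTER" =~ ^[0-9]+$ ]]; then
    log_error "Depth_Filter must be a non-negative integer, got: $DEPTH_FILTER"
    exit 1
fi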
Output directory setup#
This section creates and defines output sub-directories in MAIN_OUTPUT that can be used to organise the results.
By default, all results for each modality XPLR CLI operation (including provenance metadata) will be written to the Results sub-directory. Then, all HTML reports and visualisations will be copied to the Reports sub-directory for quick access.
All modality XPLR outputs are timestamped, however you may wish to modify this section to add additional sub-directories for specific analyses, or to change the output directory structure.
The temp sub-directory is used to store intermediate files, such as the merged regions file, and can optionally be deleted after the workflow completes. By default, no delete command is included (an optional cleanup step is sketched after the code block below).
#------------------------------------------------------------------------------
# SETUP OUTPUT DIRECTORIES
#------------------------------------------------------------------------------
log_info "Creating output directories under $MAIN_OUTPUT..."
# Create output directories
for sub in "" "Results" "Reports"; do
dir="$MAIN_OUTPUT"
[[ -n "$sub" ]] && dir="$MAIN_OUTPUT/$sub"
create_directory "$dir"
done
# Define output subdirectories for easier reference
RESULTS_DIR="$MAIN_OUTPUT/Results"
REPORTS_DIR="$MAIN_OUTPUT/Reports"
# Create subdirectories for each region file found in the regions directory, to organise outputs by region file input.
log_info "Creating region-specific subdirectories..."
for bedfile in "$REGIONS_DIR"/*.bed "$REGIONS_DIR"/*.bed.gz "$REGIONS_DIR"/*.tsv "$REGIONS_DIR"/*.tsv.gz; do
# Skip if no files match pattern
[[ -e "$bedfile" ]] || continue
[[ "${METADATA##*.}" == "tsv" && $(realpath "$bedfile") == $(realpath "$METADATA") ]] && continue
# Extract region file name without path and extension
region_name=$(basename "$bedfile")
region_name="${region_name%%.bed.gz}"
region_name="${region_name%%.bed}"
region_name="${region_name%%.tsv.gz}"
region_name="${region_name%%.tsv}"
# Create region-specific subdirectories in both Results and Reports
region_results_dir="$RESULTS_DIR/$region_name"
region_reports_dir="$REPORTS_DIR/$region_name"
create_directory "$region_results_dir"
create_directory "$region_reports_dir"
log_info "Created region directories: $region_results_dir and $region_reports_dir"
done
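If your modifications write intermediate files to a temp/ sub-directory under MAIN_OUTPUT (as mentioned above), an optional cleanup step could be added near the end of the script. This is a sketch only; the Core Workflow does not delete any files by default.
# Optional cleanup (sketch): remove intermediate files once the workflow has finished
TEMP_DIR="$MAIN_OUTPUT/temp"
if [[ -d "$TEMP_DIR" ]]; then
    log_info "Removing temporary files in $TEMP_DIR"
    rm -rf "$TEMP_DIR"
fi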
Step 1: Biological QC HTML report#
The first analysis step performs biological quality control to provide Zarr store summaries, and to identify sample relationships.
What this script block does:
Generates correlation matrices between samples
Creates PCA plots to visualise sample clustering (grouping options are available on the HTML report)
Calculates mean CpG coverage for each sample
Key parameters:
Uses all zarr stores specified in the configuration (config.txt)
Allows sample metadata to be used for dynamic colour-based grouping, when available
Note
The Core Workflow does not include the --aggregate-by-group option, which could be included to aggregate sample data for technical replicates prior to QC.
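If you do want to aggregate technical replicates before QC, the biological-qc call below could be extended with --aggregate-by-group. The argument shown is an assumption (a sample-sheet column name); check the Biological QC documentation for the exact usage before enabling it.
# Sketch only: aggregate technical replicates prior to QC (argument form is assumed)
modality -v biological-qc \
    --zarr-path "${ZARR_ARRAY[@]}" \
    --sample-sheet "$METADATA" \
    --aggregate-by-group "$GROUP_COLUMN" \
    --output-dir "$RESULTS_DIR"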
#------------------------------------------------------------------------------
# STEP 1: BIOLOGICAL QC
#------------------------------------------------------------------------------
printf "\n\n======================================\n"
printf "STEP 1: RUNNING BIOLOGICAL QC ANALYSES\n"
printf "======================================\n"
# Build region argument if QC_REGION is set
REGION_ARG=""
[[ -n "$QC_REGION" ]] && REGION_ARG="--region $QC_REGION"
# Run QC without aggregation
log_info "Running biological QC without aggregation..."
modality -v biological-qc \
--zarr-path "${ZARR_ARRAY[@]}" \
--sample-sheet "$METADATA" \
$REGION_ARG \
--order-by-group "$GROUP_COLUMN" \
--output-dir "$RESULTS_DIR"
# Copy QC reports to Reports directory
copy_reports "$RESULTS_DIR" "biological_qc_report*.html" "$REPORTS_DIR" "Biological QC" || exit 1
Expected outputs:
biological_qc_report.html: HTML report with sample-wise plots and statistics.
Provenance metadata .json files, stored in each Results/BioQC_YYYYMMDD_HHMMSS sub-directory that is generated by each modality XPLR CLI command.
Interpreting QC results:
Correlation heatmaps: Correlation coefficients close to 1 indicate high similarity of 5mC or 5hmC levels between samples
PCA plots: Separation and/or clustering of experimental groups can be assessed along the first two principal components; the axes indicate the % variability explained by each component, for 5mC or 5hmC.
Coverage statistics: The mean CpG coverage per sample is an indicator of the usable depth for methylation analysis. See the next step for feature extraction (region-based) coverage statistics.
Outlier detection: Samples that don’t cluster with their expected group may need investigation. Removing a sample from the metadata (sample sheet) file will exclude it from future analysis, without needing to modify the original Zarr file.
Step 2: Region-based Biological QC#
This step uses the modality get tool to extract methylation features (statistics) from user-specified genomic regions, for additional quality control and analysis.
The regions can be defined in BED3+ format (.bed or .tsv) files that are accessible in the REGIONS_DIR defined in config.txt.
For the Core Workflow script to run correctly, region files must be present. Pre-prepared BED files are provided to get started, or custom files following the same format can be used instead. Use the correct region files for the reference genome used for alignment in the duet software.
Tip
The Annotation (e.g. gene, or promoter) and Name (e.g. associated gene name) columns should be completed for meaningful interpretation.
Regions without an Annotation value will be skipped.
The pipeline performs multiple types of statistics extraction to provide comprehensive regional methylation data.
Each statistic type is processed separately by modality XPLR, and the results are stored in sub-directories named according to the extraction type (e.g. Extract_count, Extract_mean, Extract_regional_frac).
Note
Only modality get mean and modality get regional-frac commands support plotting (correlation or violin), which are useful for visualising the distribution of methylation levels across samples.
If no HTML report is generated, then the results will only be written to MAIN_OUTPUT/Results/Extract_* sub-directories as .tsv.gz files.
HTML reports will be copied to the MAIN_OUTPUT/Reports sub-directory for easy access.
What this script block does:
Count extraction: Returns the total number of CpG contexts (total_c) for each region.
Mean calculation: Average coverage of CpG contexts (total_c) for each region, with violin and correlation plots for individual samples.
Regional fraction: Methylation fractions for 5mC and 5hmC at the pre-specified minimum DEPTH_FILTER value, with violin and correlation plots for individual samples.
Processes each region file in the REGIONS_DIR independently. Therefore, expect individual results for each region file and extraction type.
#------------------------------------------------------------------------------
# STEP 2: FEATURE EXTRACTION
#------------------------------------------------------------------------------
printf "\n\n==================================================\n"
printf "STEP 2: RUNNING FEATURE EXTRACTION ON REGION FILES\n"
printf "==================================================\n"
# Process each region file for feature extraction
for bedfile in "$REGIONS_DIR"/*.bed "$REGIONS_DIR"/*.bed.gz "$REGIONS_DIR"/*.tsv "$REGIONS_DIR"/*.tsv.gz; do
# Skip if no files match pattern
[[ -e "$bedfile" ]] || continue
[[ "${METADATA##*.}" == "tsv" && $(realpath "$bedfile") == $(realpath "$METADATA") ]] && continue
log_info "Processing region file: $bedfile"
# Extract region file name without path and extension
region_name=$(basename "$bedfile")
region_name="${region_name%%.bed.gz}"
region_name="${region_name%%.bed}"
region_name="${region_name%%.tsv.gz}"
region_name="${region_name%%.tsv}"
# Set region-specific output directories
region_output_dir="$RESULTS_DIR/$region_name"
region_reports_dir="$REPORTS_DIR/$region_name"
# 1. Extract count for total_c
log_info "Running feature extraction: count total_c..."
modality -v get count \
--zarr-path "${ZARR_ARRAY[@]}" \
--sample-sheet "$METADATA" \
--bedfile "$bedfile" \
--fields total_c \
--output-dir "$region_output_dir"
# 2. Extract mean for total_c with plots generated automatically
log_info "Running feature extraction: mean total_c (plots generated automatically)..."
modality -v get mean \
--zarr-path "${ZARR_ARRAY[@]}" \
--sample-sheet "$METADATA" \
--bedfile "$bedfile" \
--fields total_c \
--order-by-group "$GROUP_COLUMN" \
--output-dir "$region_output_dir"
# Copy mean HTML reports immediately
for extract_dir in "$region_output_dir"/Extract_*; do
if [[ -d "$extract_dir" ]]; then
find "$extract_dir" -type f -name "*mean_extract_report*.html" -exec cp {} "$region_reports_dir/" \; 2>/dev/null || true
fi
done
# 3. Extract regional-frac for mc and hmc with depth filter (plots generated automatically)
log_info "Running feature extraction: regional-frac for mc and hmc with minimum depth $DEPTH_FILTER (plots generated automatically)..."
# Build the optional argument if DEPTH_FILTER is set
MCV_ARG=""
[[ -n "$DEPTH_FILTER" ]] && MCV_ARG="--min-coverage $DEPTH_FILTER"
# --min-coverage 30, if DEPTH_FILTER=30 in config.txt
modality -v get regional-frac \
--zarr-path "${ZARR_ARRAY[@]}" \
--sample-sheet "$METADATA" \
--bedfile "$bedfile" \
--fields mc hmc \
$MCV_ARG \
--order-by-group "$GROUP_COLUMN" \
--output-dir "$region_output_dir"
# Copy regional-frac HTML reports immediately
for extract_dir in "$region_output_dir"/Extract_*; do
if [[ -d "$extract_dir" ]]; then
find "$extract_dir" -type f -name "*regional-frac_extract_report*.html" -exec cp {} "$region_reports_dir/" \; 2>/dev/null || true
fi
done
log_info "Feature extraction completed for $region_name"
done
log_info "Feature extraction HTML reports copied to region-specific directories"
Expected outputs:
Extract_* directories: Organised by statistic type and region file
*.tsv.gz files: Compressed quantitative results for each extracted statistic, by region and sample
*.html reports: Interactive violin and correlation plots showing methylation statistics for mean and regional-frac extraction types
Note
The optional regional dendrogram heatmap is not included in the Core Workflow script by default but can be added by modifying the modality get mean and/or modality get regional-frac command(s) to include --heatmap. Note that the heatmap will only be drawn if there are ≤200 regions in the region file, so it is not appropriate for use with all promoters and gene bodies. Instead, add a custom subset region file to the Regions directory before running the Core Workflow.
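For example, a subset region file can be created in the terminal before running the workflow, and the get mean command in the script can then be extended with --heatmap. The subset file name below is illustrative, and head -n 200 assumes the BED file has no header line.
# Create a hypothetical subset region file with at most 200 regions
zcat Regions/gencode.v44.human.promoters.annotation.bed.gz | head -n 200 > Regions/promoters.subset.bed

# In the script, add --heatmap to the get mean (or get regional-frac) command
modality -v get mean \
    --zarr-path "${ZARR_ARRAY[@]}" \
    --sample-sheet "$METADATA" \
    --bedfile Regions/promoters.subset.bed \
    --fields total_c \
    --order-by-group "$GROUP_COLUMN" \
    --heatmap \
    --output-dir "$region_output_dir"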
Interpreting region-based Biological QC statistics:
Region-specific CpG coverage: Should show adequate and consistent coverage across samples, and may be used to inform the DEPTH_FILTER value for future analyses.
Methylation fraction distributions: Violin and correlation plots reveal methylation heterogeneity.
Outlier samples: Samples with unusual distributions may need investigation.
Customising statistics extraction:
You can modify this section of the script to:
Add different field types (see Zarr store and Extracting methylation statistics sections in the main modality XPLR documentation).
Adjust minimum coverage thresholds for regional fraction extraction.
Include additional statistical summaries (count, mean, sum, regional-frac); a sum extraction is sketched after this list.
Add an optional --heatmap parameter to generate regional dendrogram heatmaps for mean and/or regional-frac extraction types with ≤200 regions.
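For instance, a sum extraction could be added next to the existing count command in the feature extraction loop. This is a sketch, assuming get sum accepts the same core options as get count.
# Sketch: extract the sum of total_c for each region (assumes get sum mirrors get count options)
modality -v get sum \
    --zarr-path "${ZARR_ARRAY[@]}" \
    --sample-sheet "$METADATA" \
    --bedfile "$bedfile" \
    --fields total_c \
    --output-dir "$region_output_dir"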
Step 3: DMR calling#
This step identifies differentially methylated regions using statistical modelling, processing each region file individually. DMR calling is the core analytical component that identifies regions with significant methylation differences between two experimental groups.
What this script block does:
Processes each .bed and .tsv file in the regions directory separately
Performs statistical testing for differential methylation on each region file (see Calling differentially methylated regions (DMRs))
Analyses both 5mC and 5hmC modifications separately
Analyses all available condition pairs from the metadata, preserving the sample sheet order for use with the --condition-order parameter (ensure the control group is listed first in the metadata file); or analyses only the conditions specified in config.txt, with the left-most condition name as the control group
Applies multiple testing correction (FDR)
Incorporates covariates if specified
Note
The DMR calling does not enable --overdispersion for all analyses by default. For the Core Workflow, overdispersion correction is controlled by the Overdispersion parameter in the configuration file and can be set to “True” or “False”. When set to “True”, overdispersion correction is applied to all DMR analyses.
The recommendation is to review DMR results prior to applying overdispersion correction, as this can lead to more conservative results.
However, overdispersion correction may be useful when calling DMRs on large regions, such as gene bodies. For more information on when to use overdispersion, see Overdispersion correction.
Key analysis parameters:
Methylation contexts: Analyses both num_mc and num_hmc by default
Depth filtering: Should be applied according to the sequencing depth and methylation context coverage, and specified in the config.txt file
Covariate adjustment: Controls for confounding variables when specified in the config.txt file
#------------------------------------------------------------------------------
# STEP 3: DMR CALLING
#------------------------------------------------------------------------------
printf "\n\n===========================\n"
printf "STEP 3: RUNNING DMR CALLING\n"
printf "===========================\n"
# Build covariate / depth_filter / condition_order / overdispersion flags if present
COV_FLAGS=""
DEPTH_ARG=""
CONDITION_ARG=""
OVERDISPERSION_ARG=""
[[ -n "$DEPTH_FILTER" ]] && DEPTH_ARG="--filter-context-depth $DEPTH_FILTER"
[[ -n "$CONDITION_ORDER" ]] && CONDITION_ARG="--condition-order $CONDITION_ORDER"
[[ "$OVERDISPERSION" == "True" ]] && OVERDISPERSION_ARG="--overdispersion"
if [[ -n "$COVARIATES" ]]; then
# Use space-separated covariates directly for --covariates flag
COV_FLAGS="--covariates $COVARIATES"
fi
# Run DMR calling on each region file individually
for bedfile in "$REGIONS_DIR"/*.bed "$REGIONS_DIR"/*.bed.gz "$REGIONS_DIR"/*.tsv "$REGIONS_DIR"/*.tsv.gz; do
# Skip if no files match pattern
[[ -e "$bedfile" ]] || continue
[[ "${METADATA##*.}" == "tsv" && $(realpath "$bedfile") == $(realpath "$METADATA") ]] && continue
log_info "Running DMR calling on region file: $bedfile"
# Extract region file name without path and extension
region_name=$(basename "$bedfile")
region_name="${region_name%%.bed.gz}"
region_name="${region_name%%.bed}"
region_name="${region_name%%.tsv.gz}"
region_name="${region_name%%.tsv}"
# Set region-specific output directory
region_output_dir="$RESULTS_DIR/$region_name"
# Run DMR calling
# --covariates cov1 cov2, if COVARIATES="cov1 cov2" in config.txt
# --filter-context-depth 15, if DEPTH_FILTER=15 in config.txt
# --condition-order condition1,condition2, if CONDITION_ORDER="condition1,condition2" in config.txt
# --overdispersion is applied if OVERDISPERSION="True" in config.txt
modality -v dmr call \
--zarr-path "${ZARR_ARRAY[@]}" \
--sample-sheet "$METADATA" \
--condition-array-name "$GROUP_COLUMN" \
--methylation-contexts num_mc num_hmc \
--bedfile "$bedfile" \
$COV_FLAGS \
$DEPTH_ARG \
$CONDITION_ARG \
$OVERDISPERSION_ARG \
--output-dir "$region_output_dir"
log_info "DMR calling completed for $bedfile"
done
log_info "All DMR calling completed"
Expected outputs:
DMR_* directory: All DMR results (for all group pairs and contexts) are stored in a single directory with a timestamp
*.bed files: DMR results in BED3+ format for each condition pair and methylation context
*.ini files: Configuration files for genomic visualisation with modality tracks (not included as part of the Core Workflow). One .ini file is generated for each DMR .bed result file
Step 4: DMR visualisation#
The final step generates visualisations for DMR results. This includes both unfiltered and filtered plots to help identify the most significant regions.
What this script block does:
Creates HTML reports for each DMR result file (condition pair) and methylation context (5mC and 5hmC)
HTML reports contain a volcano plot (modification difference vs -log10 q-value) and a p-value distribution histogram
HTML reports are generated for both unfiltered DMR results, and filtered DMR results based on a q-value threshold of ≤ 0.05
When filtering is applied, a new filtered, compressed DMR results file (.bed.gz) is also generated
Important
The filtering of DMR results is optional and can be removed by omitting the --filter-qvalue parameter in the modality dmr plot command below.
Alternatively, the q-value threshold can be modified, or the additional --filter-mod-difference parameter can be added to filter based on the magnitude of methylation difference.
It is recommended to review the DMR results prior to applying filtering to determine the most appropriate thresholds, or to assess whether the analysis would benefit from overdispersion correction (if not already applied).
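For example, the filtered plotting call in the script could be adapted as follows. The threshold values are illustrative only, and the exact argument form of --filter-mod-difference is an assumption; check the DMR plotting documentation before using it.
# Sketch: stricter q-value threshold plus an assumed modification-difference filter
modality -v dmr plot \
    --dmr-results "$dmrresults" \
    --filter-qvalue 0.01 \
    --filter-mod-difference 0.1 \
    --output-dir "$region_dir"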
#------------------------------------------------------------------------------
# STEP 4: DMR VISUALISATION
#------------------------------------------------------------------------------
printf "\n\n=====================================\n"
printf "STEP 4: GENERATING DMR VISUALISATIONS\n"
printf "=====================================\n"
# Loop through each region directory
for region_dir in "$RESULTS_DIR"/*; do
[[ -d "$region_dir" ]] || continue
region_name=$(basename "$region_dir")
# Skip if this is not a region directory (e.g., if it's a BioQC directory)
[[ "$region_name" == "BioQC_"* ]] && continue
log_info "Processing DMR visualisation for region: $region_name"
# Set region-specific reports directory
region_reports_dir="$REPORTS_DIR/$region_name"
# Loop through each DMR results directory within this region
for dmr_dir in "$region_dir"/DMR_*; do
[[ -d "$dmr_dir" ]] || continue
# For each .bed file in the DMR directory
for dmrresults in "$dmr_dir"/*.bed; do
[[ -e "$dmrresults" ]] || continue
log_info "Generating DMR plots for $dmrresults..."
# Generate default plots - output to the same region directory
modality -v dmr plot \
--dmr-results "$dmrresults" \
--output-dir "$region_dir"
# Copy unfiltered DMR reports immediately
for dmr_report_dir in "$region_dir"/DMR_Report_*; do
if [[ -d "$dmr_report_dir" ]]; then
find "$dmr_report_dir" -type f -name "*.html" -not -name "*max_q_*" -exec cp {} "$region_reports_dir/" \; 2>/dev/null || true
fi
done
# Generate filtered plots (q-value < 0.05) - output to the same region directory
modality dmr plot \
--dmr-results "$dmrresults" \
--filter-qvalue 0.05 \
--output-dir "$region_dir"
# Copy filtered DMR reports immediately
for dmr_report_dir in "$region_dir"/DMR_Report_*; do
if [[ -d "$dmr_report_dir" ]]; then
find "$dmr_report_dir" -type f -name "*max_q_*.html" -exec cp {} "$region_reports_dir/" \; 2>/dev/null || true
fi
done
log_info "DMR plots generated for $dmrresults"
done
done
log_info "DMR visualisation completed for region: $region_name"
done
log_info "DMR HTML reports copied to region-specific directories"
Interpreting DMR visualisations:
Volcano plots:
Y-axis: -log10(q-value) - higher values indicate more significant DMRs
X-axis: Methylation modification difference - magnitude of effect
Colour coding: Applied if multiple Annotation values are present in the supplied region files (REGIONS_DIR), e.g. promoter, enhancer, etc.
Significant regions: Upper left/right quadrants (high significance, large effect)
Hyper- vs Hypo-methylation: negative modification difference (hypo) vs positive modification difference (hyper) on the X-axis
p-value histograms:
Uniform distribution: Indicates good statistical modelling
Enrichment near zero: Suggests presence of true differential methylation, but over-representation may indicate the analysis would benefit from overdispersion correction. See Overdispersion correction for more information.
Unusual patterns: May indicate technical issues or model violations
Quality assessment:
Volcano plot shape: Expect to see a V-shape with most data points clustered near the base and centre. Significant hits may appear as outliers on one or both sides of the x-axis, with high values on the y-axis. The data distribution may or may not be symmetrical, depending on the extent of hyper- and hypo-methylation
Effect size distribution: Biologically meaningful differences (typically modification difference > 0.1-0.2)
Annotation patterns: Different region (Annotation) types may show distinct methylation patterns
Advanced customisation and extension#
The Core Workflow is designed to be easily adapted for different research needs. Here are common modifications and extensions:
Customising analysis parameters#
Adjusting coverage thresholds:
# In the config.txt or script
DEPTH_FILTER=15 # Increase for higher quality regions
# Or modify specific commands directly using the --min-coverage option
modality get regional-frac \
--zarr-path "${ZARR_ARRAY[@]}" \
--bedfile "$bedfile" \
--fields mc hmc \
--min-coverage 15 \
--output-dir "$RESULTS_DIR"  # 15 is the custom minimum coverage applied above
Adding different methylation contexts:
# Include modC analysis instead of 5mC and 5hmC.
modality dmr call \
--methylation-contexts modc \
# ... other parameters
Add overdispersion correction to DMR calling:
# Add additional DMR calling options
modality dmr call \
--overdispersion \
# ... other parameters
Performance optimisation tips#
For large datasets:
Use subset regions for initial testing
Implement parallel processing for independent steps (see the sketch after this list)
Monitor disk space and memory usage
Consider using HPC environments for large-scale analysis
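As a sketch of parallelising independent steps, the per-region feature extraction could run each region file as a background job, as shown below. This is illustrative only; make sure your system has enough CPU and memory for several concurrent modality XPLR processes.
# Sketch: run feature extraction for each region file as a background job
for bedfile in "$REGIONS_DIR"/*.bed "$REGIONS_DIR"/*.bed.gz; do
    [[ -e "$bedfile" ]] || continue
    (
        region_name=$(basename "$bedfile")
        region_name="${region_name%%.bed.gz}"
        region_name="${region_name%%.bed}"
        modality -v get count \
            --zarr-path "${ZARR_ARRAY[@]}" \
            --sample-sheet "$METADATA" \
            --bedfile "$bedfile" \
            --fields total_c \
            --output-dir "$RESULTS_DIR/$region_name"
    ) &    # Each region runs in its own background subshell
done
wait       # Block until all background jobs have finished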
For production use:
Add comprehensive logging and monitoring
Implement checkpointing for long-running analyses
Add email notifications for completion/failure
Include resource usage reporting
Script completion and monitoring#
The script includes basic completion reporting and performance tracking:
#------------------------------------------------------------------------------
# COMPLETION
#------------------------------------------------------------------------------
log_info "Pipeline completed successfully. Reports are available in $REPORTS_DIR and region-specific subdirectories"
# Calculate and report duration
end_time=$(date +%s)
duration=$((end_time - start_time))
log_info "Script completed in $duration seconds"
# Exit cleanly
exit 0
Monitoring features:
Execution time tracking: Reports total pipeline duration
Success confirmation: Clear indication of successful completion
Output location reporting: Directs users to results
Clean exit handling: Proper script termination
Monitoring execution:
The script provides real-time logging output. You can also redirect output to a log file using the code below, or code it into the script as described in Script setup and configuration.
./core-workflow-v1.3.sh 2>&1 | tee workflow.log
Expected runtime:
Runtime varies significantly based on:
Dataset size: Number of samples and genomic regions
Analysis complexity: Number of methylation contexts and covariates
System resources: Available CPU and memory
Storage speed: Local vs network storage performance
Typical runtimes:
Small dataset (10 samples, 1000 regions): 10-30 minutes
Medium dataset (50 samples, 5000 regions): 1-3 hours
Large dataset (100+ samples, 10000+ regions): 3-8 hours
This modular workflow provides a comprehensive foundation for methylation analysis using the modality XPLR toolkit. By understanding each component, you can adapt it to meet your specific research requirements while maintaining robust error handling and organised outputs.
Warning
Version compatibility: This pipeline is designed for modality XPLR version 1.0.0 and later. Some features may not be available in earlier versions. Check your modality XPLR version with modality --version before running.
Tip
Getting help: If you encounter issues, first try enabling debug mode (DEBUG=true) for detailed logging. The modular design allows you to run individual sections for troubleshooting specific steps.