FAQs and troubleshooting#

Frequently Asked Questions (FAQs)#

This section provides answers to common questions that users may have when using the modality XPLR software. These FAQs are designed to help users navigate the software effectively.

General questions#

What is modality XPLR, and what can I use it for?#

modality XPLR is a command-line toolkit for analysing DNA methylation data derived from 5-base (duet +modC) and 6-base (duet evoC) genomes. It allows you to identify differentially methylated regions (DMRs), extract methylation statistics, and visualise results. It is designed for molecular biologists, bioinformaticians, and researchers who want to analyse methylation data without needing Python experience.

Can I use modality XPLR with other data types?#

No, modality XPLR is specifically designed for analysing methylation data from duet Zarr stores. It is not readily compatible with other data types or formats.

Do I need programming experience to use modality XPLR?#

No, modality XPLR is designed to be user-friendly for researchers with minimal programming experience. You only need to run commands in the terminal, and the documentation provides step-by-step instructions for each tool.

What is a Zarr store, and why is it important?#

A Zarr store is a compressed file format used to store large datasets, such as methylation data. It allows efficient access to subsets of data without loading the entire dataset into memory. modality XPLR uses Zarr stores as the primary input format for analysis.

Suitable Zarr stores to be used with modality XPLR can be created using the duet software and are found in the output of running pipelines with that software.

Installation and setup#

What are the system requirements for modality XPLR?#

modality XPLR requires macOS or Linux. You need at least 8 GB of RAM, 500 MB of disk space for installation, and Python 3.11.

How do I install modality XPLR?#

We provide instructions for installing modality XPLR on macOS and Linux. See the Installation of modality XPLR section for details.

Do I need to install additional tools like Tabix or Bgzip?#

These tools are optional but recommended for performance improvements when working with large genomic files. You can install them using Conda with the following command:

conda install -c bioconda htslib

See the section Optional: Install non-python dependencies, tabix and bgzip for more details.
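
If you are not sure whether these tools are already installed, you can check your PATH first. The sketch below is a generic shell check, not a modality XPLR feature, and the helper name missing_tools is hypothetical.

```shell
# Sketch: report which optional tools cannot be found on the PATH.
# missing_tools is a hypothetical helper, not a modality XPLR command.

missing_tools() {
    missing=""
    for tool in "$@"; do
        # command -v succeeds only when the tool can be found
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    # Drop the leading space before printing
    printf '%s\n' "${missing# }"
}

not_found="$(missing_tools tabix bgzip)"
if [ -z "$not_found" ]; then
    echo "tabix and bgzip are available"
else
    echo "not installed: $not_found"
fi
```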

Input files and formats#

What input files do I need to run modality XPLR?#

The main input files are:

Zarr store files (.zarrz) containing methylation data.
Sample sheet (.csv or .tsv) with metadata about the samples.
BED files (.bed) defining genomic regions of interest (optional).

What formats are supported for the sample sheet?#

modality XPLR supports both CSV (comma-separated values) and TSV (tab-separated values) formats for the metadata sample sheet. Examples in this guide are shown for CSV files, but TSV files can be used interchangeably. The file must contain a column named sample_id that matches the sample IDs in the Zarr store.

How do I create a sample sheet?#

A sample sheet is a CSV or TSV file with a column named sample_id that matches the sample IDs in the Zarr store. You can include additional columns for metadata, such as disease_stage or smoker_status, which may be used for grouping samples or flagging covariates. modality XPLR is designed so that a single sample sheet for your study can be used for multiple analyses.

Example sample sheet

Table 1 Example sample sheet#
index	sample_id	Input DNA Quantity (ng/sample)	Protocol Version	sample_type	condition	sex	age	aggregate-sample
0	Sample-01	5	duet evoC	cfDNA	CONTROL	FEMALE	54	Sample-01
1	Sample-02	5	duet evoC	cfDNA	CONTROL	MALE	62	Sample-02
2	Sample-03	5	duet evoC	cfDNA	Stage-II	FEMALE	63	Sample-03
3	Sample-04	5	duet evoC	cfDNA	Stage-II	MALE	71	Sample-04
4	Sample-05_rep1	80	duet evoC	gDNA	CONTROL	FEMALE	68	Sample-05
5	Sample-05_rep2	80	duet evoC	gDNA	CONTROL	FEMALE	59	Sample-05
6	Sample-06_rep1	80	duet evoC	gDNA	TUMOUR	FEMALE	75	Sample-06
7	Sample-06_rep2	80	duet evoC	gDNA	TUMOUR	FEMALE	66	Sample-06

You can also download this example as a CSV file:

Download sample_sheet.csv
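
Before running an analysis, it can help to confirm that your sample sheet has the required sample_id column. The following is a minimal sketch in plain shell, not a modality XPLR command. The helper name has_sample_id_column and the demo file are hypothetical, and the check assumes simple files without quoted fields that contain the delimiter.

```shell
# Sketch: confirm that a CSV or TSV sample sheet has a sample_id column.
# has_sample_id_column is a hypothetical helper, not part of modality XPLR.

has_sample_id_column() {
    sheet="$1"
    # Treat .tsv files as tab-separated and everything else as comma-separated.
    case "$sheet" in
        *.tsv) delim='\t' ;;
        *)     delim=',' ;;
    esac
    # Split the header row into one field per line and look for an exact match.
    head -n 1 "$sheet" | tr -d '\r' | tr "$delim" '\n' | grep -qx 'sample_id'
}

# Demo with a throwaway sheet in a temporary directory.
demo_dir="$(mktemp -d)"
printf 'sample_id,condition,sex\nSample-01,CONTROL,FEMALE\nSample-02,Stage-II,MALE\n' \
    > "$demo_dir/sample_sheet.csv"

if has_sample_id_column "$demo_dir/sample_sheet.csv"; then
    echo "sample_id column found"
else
    echo "sample_id column missing"
fi
```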

What is the difference between defining a region or using a BED file?#

Both methods define genomic regions of interest:

A BED3+ file is a standard format with columns for chromosome, start, and end positions. It allows you to specify multiple regions in a single file (e.g. a list of gene promoter regions).
A region is a single genomic region specified in the format chr:start-end (e.g., chr1:1000-2000). It is useful for quick analyses of specific regions without creating a separate file.
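
If you build region strings in a script, a quick format check can catch typos before you run a command. This sketch checks only the chr:start-end shape shown above. Requiring start to be smaller than end is an assumption made here, and is_valid_region is a hypothetical helper.

```shell
# Sketch: check that a region string looks like chr:start-end.
# is_valid_region is a hypothetical helper; start < end is an assumption.

is_valid_region() {
    # Shape check: name, colon, digits, hyphen, digits.
    printf '%s\n' "$1" | grep -Eq '^[^:[:space:]]+:[0-9]+-[0-9]+$' || return 1
    coords="${1##*:}"      # text after the last colon, e.g. 1000-2000
    start="${coords%-*}"
    end="${coords#*-}"
    [ "$start" -lt "$end" ]
}

for region in "chr1:1000-2000" "chr1-1000-2000" "chr1:2000-1000"; do
    if is_valid_region "$region"; then
        echo "$region: ok"
    else
        echo "$region: not a valid chr:start-end region"
    fi
done
```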

How many region files (BED3+) can I specify at once?#

The modality dmr call and modality get commands currently accept a single BED3+ file as input per analysis. If you have multiple BED files, you can merge them into a single file before running the analysis, and use the Annotation column header to differentiate the region type or origin. However, it is not advisable to combine regions of highly variable lengths (e.g. promoters and gene bodies) into a single analysis.

If using the Core Workflow, multiple BED files can be specified in the Regions input directory, and will be processed sequentially. For example, the gencode.v44.human.genes.annotation.bed.gz and gencode.v44.human.promoters.annotation.bed.gz files are included by default. This allows efficient processing of multiple inputs via bash scripting.
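
For the single-file commands, merging BED files with a label per source could look like the sketch below. It is an illustration only: the helper name, file names and labels are hypothetical, it keeps just the first three columns so that all rows have the same shape, and it writes no header line. If your BED3+ files carry a header, add one yourself with the fourth column named Annotation.

```shell
# Sketch: merge several BED files into one, labelling each row with its source.
# label_and_merge_beds, the file names and the labels are all hypothetical.

label_and_merge_beds() {
    # Arguments come in pairs: <label> <bed file>. Merged rows go to stdout.
    while [ "$#" -ge 2 ]; do
        # Keep chrom, start, end and append the label; skip '#' comment lines.
        awk -v label="$1" 'BEGIN { FS = OFS = "\t" }
            !/^#/ && NF >= 3 { print $1, $2, $3, label }' "$2"
        shift 2
    done | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n   # by chromosome, then start
}

# Demo with two small throwaway files.
demo_dir="$(mktemp -d)"
printf 'chr1\t1000\t2000\nchr2\t500\t900\n' > "$demo_dir/promoters.bed"
printf 'chr1\t1500\t1800\n' > "$demo_dir/enhancers.bed"

label_and_merge_beds promoter "$demo_dir/promoters.bed" \
                     enhancer "$demo_dir/enhancers.bed" > "$demo_dir/merged.bed"
cat "$demo_dir/merged.bed"
```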

Do annotated BED3+ files need strand information to work correctly?#

No, strand information is not required for region-based analyses. However, including this information can be beneficial for downstream utility. Any columns beyond the first three (chromosome, start, end) will be carried through to the output result file.

What is the difference between a sample sheet and a metadata file?#

They refer to the same thing! A sample sheet is a CSV file that contains information (metadata) about the samples, such as sample IDs, experimental groups and covariates.

Running Analyses#

How do I check the quality of my data before analysis?#

Use the modality biological-qc command to generate a quality control report. This report includes Pearson correlation heatmaps and PCA plots to assess sample relationships and data structure.

In the Biological QC report, why is the mean CpG coverage value lower than the mean sequencing depth?#

The Biological QC report shows the mean CpG coverage per sample, and not the overall sequencing depth. The mean CpG-to-genome-wide coverage ratio is approximately 0.45-0.5.
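
As a rough worked example of that ratio, the sketch below turns a mean sequencing depth into the range of mean CpG coverage you might expect to see in the report. It applies only the approximate 0.45-0.5 ratio quoted above, and expected_cpg_coverage is a hypothetical helper.

```shell
# Sketch: expected mean CpG coverage range for a given mean sequencing depth,
# using the approximate 0.45-0.5 ratio. The helper name is hypothetical.

expected_cpg_coverage() {
    awk -v depth="$1" 'BEGIN { printf "%.1f-%.1f\n", depth * 0.45, depth * 0.5 }'
}

expected_cpg_coverage 30   # prints 13.5-15.0
```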

How do I identify differentially methylated regions (DMRs)?#

Use the modality dmr call command. Provide the Zarr store, sample sheet, and experimental groups as inputs.

What is the difference between `modality get` and `modality dmr call`?#

modality get extracts methylation statistics (e.g., count, sum, mean, regional-fraction) for specific regions or windows.
modality dmr call identifies regions with statistically significant differences in methylation between groups.

How is the `--min-coverage` filter applied?#

The --min-coverage filter is an option in modality dmr call and modality get that excludes low-coverage context positions (e.g. CpG sites) from the analysis. The filter is applied to the mean coverage of each context position across all samples, before any grouping or aggregation steps. In other words, filtering is applied at the dataset level, not at the sample level. If the mean coverage is less than the --min-coverage value, the context position is excluded from downstream analysis in the current modality XPLR command.
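
To make the rule concrete, the sketch below applies the same idea to a small made-up coverage table with one row per context position and one column per sample. It illustrates the rule as described here and is not the code modality XPLR runs. The table layout and the helper name are hypothetical.

```shell
# Sketch: keep positions whose mean coverage across all samples meets a threshold.
# The table layout and filter_by_mean_coverage are hypothetical illustrations.

filter_by_mean_coverage() {
    # $1: minimum mean coverage; $2: table with a header row,
    # a position column, then one coverage column per sample.
    awk -v min="$1" '
        NR == 1 { next }                          # skip the header row
        {
            total = 0
            for (i = 2; i <= NF; i++) total += $i
            mean = total / (NF - 1)               # mean over all samples
            if (mean >= min) print $1 "\t" mean
        }' "$2"
}

demo_dir="$(mktemp -d)"
printf 'position\tsample_1\tsample_2\tsample_3\n'  > "$demo_dir/coverage.tsv"
printf 'chr1:100\t12\t8\t10\n'                    >> "$demo_dir/coverage.tsv"
printf 'chr1:150\t2\t3\t1\n'                      >> "$demo_dir/coverage.tsv"
printf 'chr1:200\t0\t30\t0\n'                     >> "$demo_dir/coverage.tsv"

filter_by_mean_coverage 5 "$demo_dir/coverage.tsv"
```

With a threshold of 5, chr1:150 (mean 2) is dropped. chr1:200 is kept even though two samples have zero coverage there, because only the mean across samples (10) is tested.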

Is DMR calling normalised for coverage?#

The method does not explicitly normalise for coverage at the CpG level, but coverage is implicitly accounted for by count aggregation, so high-coverage sites have more influence. You can filter out low-coverage sites by using the --filter-context-depth setting.

How should I interpret DMR results from bulk samples or cfDNA?#

When calling DMRs using bulk methylation data, it’s important to understand the statistical assumptions underlying the Wald test and how they apply to cell population-level measurements.

Differences between samples reflect genuine biological differences in the proportion of methylated cells, influenced by:

Cell type composition
Environmental factors
Disease states
Treatment effects
Individual genetic variation

The logistic regression model assumes that:

Methylation proportions follow a binomial distribution at the cell population level
Differences between groups are systematic rather than random
Sufficient coverage exists to accurately estimate cell population-level methylation

The Wald test in the context of bulk methylation analysis tests whether the population-level methylation proportions differ significantly between experimental groups. Bulk data may exhibit overdispersion (variance greater than expected under a simple binomial model) due to biological variability and technical noise. See Overdispersion correction for more details.
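
For intuition, here is a small worked example of a Wald statistic. It is the textbook calculation for a logistic regression with a single two-level group term fitted to pooled counts, where the coefficient is the log odds ratio and its standard error is the square root of the summed reciprocal counts. It includes no overdispersion correction or covariates and is not the modality XPLR implementation. The counts and the helper name are made up.

```shell
# Sketch: textbook Wald z statistic for a two-group comparison of pooled counts.
# Not the modality XPLR implementation; wald_two_groups is a hypothetical helper.

wald_two_groups() {
    # Arguments: methylated and unmethylated counts for group A, then for group B.
    awk -v ma="$1" -v ua="$2" -v mb="$3" -v ub="$4" 'BEGIN {
        beta = log((mb / ub) / (ma / ua))         # log odds ratio, B versus A
        se   = sqrt(1/ma + 1/ua + 1/mb + 1/ub)    # Wald standard error
        printf "log_odds_ratio=%.3f se=%.3f z=%.2f\n", beta, se, beta / se
    }'
}

# Group A: 80 methylated, 120 unmethylated (40%). Group B: 120 and 80 (60%).
wald_two_groups 80 120 120 80   # prints log_odds_ratio=0.811 se=0.204 z=3.97
```

Under the usual normal approximation, an absolute z value above about 1.96 corresponds to a two-sided p-value below 0.05. Overdispersion inflates the true standard error, which is why an uncorrected test like this one can be overconfident on bulk data.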

Outputs and visualisation#

What outputs does modality XPLR generate?#

modality XPLR generates:

BED3+ format TSV or BED files with analysis results, depending on the command used.
HTML reports for quality control and visualisation.
INI files for configuring track plots.
Provenance metadata in JSON format.

How do I visualise DMR results?#

Once you have the output from the modality dmr call command, you can visualise the results using the modality dmr plot command. This generates interactive volcano plots to help interpret the DMRs. You can also use the modality tracks command to create track plots for specific genomic regions, which can be customised using INI files.

Why do I see a downsample warning in `modality dmr plot`?#

The warning Downsampling to improve html display performance for large datasets is shown when the number of DMR data points exceeds 500,000. To draw the volcano plot, downsampling is applied in a way that preserves the overall data distribution while reducing the point count. To avoid this warning, apply more stringent filtering for q-value (-fq) or mod-difference (-fm) to reduce the number of data points to plot.

Can I view results in IGV?#

Yes, you can view the results in the Integrative Genomics Viewer (IGV) by loading the generated BED files. It may be necessary to remove the header lines, or insert a # character at the start of each header line to ensure IGV recognises them correctly.
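
One way to comment out header lines is sketched below. It assumes a header line is any line whose second column is not an integer start coordinate, which is a heuristic rather than a documented rule. The column names in the demo file and the helper name are made up.

```shell
# Sketch: prefix header lines of a tab-separated BED-like file with '#'.
# Heuristic (an assumption): a header is a line whose second column is not an integer.
# comment_bed_header and the demo column names are hypothetical.

comment_bed_header() {
    awk 'BEGIN { FS = "\t" }
         /^#/ || $2 ~ /^[0-9]+$/ { print; next }   # already commented, or a data row
         { print "#" $0 }' "$1"
}

demo_dir="$(mktemp -d)"
printf 'chrom\tstart\tend\tscore\nchr1\t1000\t2000\t0.5\n' > "$demo_dir/results.bed"

comment_bed_header "$demo_dir/results.bed" > "$demo_dir/results.igv.bed"
cat "$demo_dir/results.igv.bed"
```

The original file is left untouched. Load the new copy into IGV.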

Can I customise the plots?#

You can customise the view of plots using the embedded tools in the HTML report. When you’re ready, you can save the plot as a .png file.

You can edit the modality tracks plots by editing the input .ini files to customise figure titles, axis labels, and plot order.

Troubleshooting#

What should I do if I encounter an error?#

Check the modality_cli.log file in your working directory for detailed error messages. Common issues include:

Missing required inputs: Ensure all required files and arguments are provided.
Invalid region format: Use the correct format (e.g., chr1:1000-2000).
Sample sheet mismatch: Verify that sample IDs in the sample sheet match those in the Zarr store.
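
For the sample sheet check, a quick comparison like the sketch below can show which IDs are the problem. It assumes you already have a plain text file listing the sample IDs in your Zarr store, one per line (how you obtain that list is outside this sketch), a non-empty list, and a simple CSV sheet without quoted fields. The helper name and file names are hypothetical.

```shell
# Sketch: list sample IDs that appear in a CSV sample sheet but not in a reference list.
# ids_missing_from_store and both file names are hypothetical.

ids_missing_from_store() {
    # $1: CSV sample sheet; $2: text file with one known sample ID per line
    awk -F',' '
        NR == FNR { known[$1] = 1; next }         # first file: reference IDs
        FNR == 1 {                                # sheet header: find sample_id
            for (i = 1; i <= NF; i++) if ($i == "sample_id") col = i
            next
        }
        col && !($col in known) { print $col }    # sheet rows with unknown IDs
    ' "$2" "$1"
}

demo_dir="$(mktemp -d)"
printf 'Sample-01\nSample-02\n' > "$demo_dir/store_ids.txt"
printf 'sample_id,condition\nSample-01,CONTROL\nSample-03,Stage-II\n' > "$demo_dir/sample_sheet.csv"

ids_missing_from_store "$demo_dir/sample_sheet.csv" "$demo_dir/store_ids.txt"   # prints Sample-03
```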

Why is my output directory empty?#

Ensure that the --output-dir option is specified and that the directory exists. If the option is not specified, outputs are saved to the current working directory by default.
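
A small pre-flight check like the one sketched below creates the directory if needed and confirms that it is writable before you pass it to --output-dir. The helper name and directory path are hypothetical.

```shell
# Sketch: make sure an output directory exists and is writable.
# ensure_output_dir and the demo path are hypothetical.

ensure_output_dir() {
    # mkdir -p creates missing parent directories and is a no-op if the path exists.
    mkdir -p "$1" && [ -d "$1" ] && [ -w "$1" ]
}

demo_root="$(mktemp -d)"
out_dir="$demo_root/dmr_results"

if ensure_output_dir "$out_dir"; then
    echo "ready: $out_dir"
else
    echo "cannot write to: $out_dir"
fi
```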

How do I update modality XPLR to the latest version?#

Run the following command to upgrade modality XPLR:

pip install --extra-index-url https://europe-python.pkg.dev/prj-biomodal-modality/modality-pypi/simple modality --upgrade

Tips and best practices#

How can I see what modality XPLR is doing during long analyses?#

Use the -v or -vv flags to increase verbosity. This will provide more detailed output during command execution, helping you understand the progress and any issues that arise.

How can I speed up my analysis?#

If your analysis reads .bed files, indexing these files can improve performance. You will need to install Tabix and Bgzip. Then use the --tabix option to compress and index output files.

How do I organise my outputs?#

Use the --output-dir option to specify a dedicated directory for each analysis. Output file directories are timestamped to avoid overwriting previous results.

Can I combine multiple Zarr stores?#

Yes, use the modality join command to combine multiple Zarr stores into a single file for analysis.

What should I do if I need help?#

Refer to the documentation, check the FAQ section, or contact support@biomodal.com for assistance.

Error handling#

If an error occurs during execution, review the log file (modality_cli.log) in the current working directory. The log file contains detailed information about the error, including the command executed, the timestamp, and the error message.

Most errors in modality XPLR are accompanied by descriptive messages that indicate what went wrong and how to resolve it. Common issues include missing or incorrectly formatted input files, mismatched sample IDs, invalid argument values, or problems with output directories.

To resolve errors:

Read the error message carefully: It usually points to the specific problem (e.g., missing input, invalid format).
Check your input files: Ensure all required files exist, are correctly formatted, and match each other (e.g., sample IDs in the sample sheet and Zarr store).
Review your command arguments: Use the --help flag to confirm you are using the correct options and formats.
Consult the log file: The modality_cli.log file in your working directory contains detailed error information and can help with troubleshooting.
Increase verbosity: Use the -v or -vv flags for more detailed output during command execution.
Validate file paths and permissions: Make sure output directories exist and are writable.

If you are unable to resolve the issue, refer to the documentation, FAQ, or contact support@biomodal.com for further assistance.

Debugging#

Use the --help flag with any command to view its required and optional arguments.
Check the modality_cli.log file for detailed error messages and stack traces.
Enable debug mode by setting the highest verbosity level with -vv when running the command.
Validate input files (e.g., Zarr stores, sample sheets, BED files) before running commands.
If the issue persists, contact support or refer to the FAQ section in the documentation.

biomodal Software License#

What license terms govern my use of modality XPLR?#

modality XPLR is published under the biomodal Software License, accessible via the Customer documentation portal, Software section.

Is modality XPLR “open source”?#

No. According to the Open Source Initiative, a license with non-commercial restrictions cannot be open source. The license also contains a restriction that you may not remove functionality that displays licensing information through the CLI, which is likewise not permitted by the Open Source Definition. However, the license permits all non-commercial usage. Some activities which might be considered commercial are also permitted - see the terms of the license here and the rest of this FAQ.

The license looks similar to the Apache-2.0 license which is open source. Are you sure it isn’t open source?#

Yes. The biomodal Software License is based on the Apache-2.0 license, but has been significantly modified such that it isn’t open source due to the additional restrictions.

Can I build upon modality XPLR?#

Yes, the license permits you to make modifications to modality XPLR. However, you may not redistribute modality XPLR (either as-is, or with modifications) unless you comply with the terms of the biomodal Software License. In particular, you must retain the prohibition on commercial use. You may not remove any legal notices that are contained within modality XPLR or packaged alongside it (for example, the NOTICE text file). You may not modify the modality XPLR code so as to remove any CLI functionality which displays the NOTICE text file.

Can I build upon modality XPLR and resell the resulting combined work?#

No.

Can I build modality XPLR into a toolkit or other product which I make available commercially to third parties?#

No. This prohibition covers both redistributing code and making the functionality of your product available to third parties, for example over a network or on a batch basis.

Can I produce research papers based on analyses performed using modality XPLR?#

Yes.

Can I use modality XPLR to provide a commercial service to third parties, such as consulting or analytical services, or a service provided on an outsourced basis?#

No.

If I modify modality XPLR, do I need to publish my changes under the same license?#

No. The license is not a copyleft license, so you do not need to contribute modifications back to the project, or publish them anywhere, and you retain any rights to your modifications.

If you redistribute the code or any modifications to it, see below.

If I redistribute modality XPLR, must I do so under the biomodal Software License?#

No, you are permitted to apply your own license terms, provided that these do not conflict with the biomodal Software License. In particular you must retain the prohibition on commercial use (as defined in the biomodal Software License), and you must retain the CLI functionality which displays the license and restrictions. You must also ensure any modified files carry prominent notices stating that you modified the files.

In practice, you will almost always find it easier to use the biomodal Software License itself if you redistribute modality XPLR code, rather than trying to redistribute under a different license.

Do I need to do or sign anything in order to use modality XPLR?#

Other than to install the software according to the user documentation, there are no additional steps. You do not need to “accept” the license, but your use of the software outside the scope of the license will be a violation of our copyright (this is the way that open source licenses work - the biomodal Software License is not an open source license, but has several characteristics in common with one).

Does biomodal have the right to change the license for modality XPLR?#

Yes, future versions of modality XPLR may be released under a different license. We can also issue you an additional permission if the use you envisage falls within the existing prohibition on commercial use. That additional permission may be on paid-for commercial terms, or come with additional conditions, and it’s granted entirely at our discretion. Please contact us at support@biomodal.com for more information.