Analyze your

mycobiome

with DAnIEL

DAnIEL (Describing, Analyzing and Integrating fungal Ecology to effectively study the systems of Life) is a web server to analyze ITS amplicon sequencing data beginning from raw reads to publication ready figures and tables. Interactions between fungal species and diseases, bacterial species, and immune system components can be explored. A knowledge base of clinical relevant species provides additional annotation to biological findings

Overview

Upload raw reads and meta data of samples (optional)
Set parameters and start the pipeline
Integration of existing projects and download of NCBI SRA runs
Quality Control: Trimming primers and low quality bases using
Denoising: Getting representative sequences by calling OTUs or ASVs
Phylotyping: Taxonomic annotation of denoised sequences
Feature generation: Grouping denoised sequence by taxonomic rank and count normalization
Correlation networks: Identification of co-abundant species aware of sparseness and composition
Machine Learning: Phenotype classification using fungal abundance profiles
Statistics: Identification of significantly differentially abundant features
Annotation: Integrating significant features with known insights from existing projects and literature
Reporting: Creation of a single html file summarizing all results and methods applied

Fungal knowledge base

Manually curated database about fungal interactions and their associations to diseases, bacterial species, and biological molecules focused on immune system compounds
List of species found in clinical samples with a suspicious fungal infection from the German reference center NRZMyk

Publication ready plots

Customizable and interactive plots
Single html file report summarizing both results and methods

Cite us

DAnIEL: a user-friendly web server for fungal ITS amplicon sequencing data
Daniel Loos, Lu Zhang, Christine Beemelmanns, Oliver Kurzai, and Gianni Panagiotou
Frontiers in Microbiology, doi: 10.3389/fmicb.2021.720513

Contact us

Maintainer and Developer: Daniel Loos daniel.loos@leibniz-hki.de
Head: Gianni Panagiotou gianni.panagiotou@leibniz-hki.de
Website: https://www.leibniz-hki.de/en/systembiologie-und-bioinformatik.html

Systems Biology and Bioinformatics
Leibniz Institute for Natural Product Research and Infection Biology
Hans Knöll Institute (HKI)

Adolf-Reichwein-Straße 23
07745 Jena
Germany

tutorial.knit

Tutorial

Here we give a brief introduction about how to use this web server to analyze ITS amplicon data.

The aim of this tutorial is to explain the overall workflow and not to exercise a real world ITS analysis. For instance, the provided samples are sub sampled to speed up the analysis.

The Biology of ITS sequencing
Input: Uploading samples
Start: Setting parameters and starting the pipeline
Logs
Result pages
Quality Control (QC)
Denoising
Phylotyping
Feature generation
Correlation Networks
Statistics
Machine Learning (ML)
Summary
References

The Biology of ITS sequencing

The Internal transcribed spacer (ITS) is a region in the eukaryotic genome between the genes coding for the small-subunit rRNA and the large-subunit rRNA. The ITS1 sub region in fungi is located between 18S and 5.8S rRNA genes, whereas ITS2 is between 5.8S and 28S rRNA genes. ITS regions are widely used in taxonomy due to these reasons:

The sequence has a high degree of variation, especially between closely related species
The size of one sub region can be easily covered with PE NGS reads
There are universal primers available in which most fungal clades can be easily amplified

Often there are multiple copies of ITS in the same genome. However, they tend to undergo concerted evolution via unequal crossing-over events. This preserves the sequence homogeneity within a particular genome.

Most projects in the field of metagenomics have only one the two sub regions sequenced. It is not recommended to compare abundance levels between projects using different sub regions due to PCR amplification bias. Besides ITS, other marker sequences e.g. translational elongation factor 1α (TEF1α) are also used for taxonomic profiling. These markers are not covered by this webserver. However, sometimes multiple marker genes have to be sequenced for precisely taxonomic annotation up to the species level. This is also up to the clade of interest: It is usually more difficult to classify molds than yeasts using ITS region only.

Input: Uploading samples

In this tutorial we will demonstrate how to analyze the different fungal communities in fecal and environmental samples. Totaling 38 samples will be analyzed from these cohorts:

5 fecal samples from cancer patients (ENA study PRJEB33756)
20 fecal samples from healthy individuals of the HMP cohort (Nash et al., 2017)
13 environmental samples from a neonatal intensive care unit (NICU) (Heisel et al., 2019)

Start a new project

If you just want to learn about the workflow, you do not need to start a new project. All results can be already accessed on the left sidebar if the current project ID is example.

Click on Home on the sidebar and then on Start project to create a new project.
This will create a new ID for your project. This ID can be used to access the results later on by clicking on Resume project. You can also create a bookmark of any page: The ID is always included in the URL.
Download the example raw data used in this tutorial:

Example raw data (ZIP)

Composing the sample meta data table

The meta data table is a spreadsheet describing the samples. It must be provided either as a Excel spreadsheet (.xlsx) or as a comma-separated values text file (.csv) It must contain at least one column called sample_id listing the ids of all uploaded samples. For machine learning, column names must contain alphanumerical characters only (e.g. no + or /).

Additional samples can be added from databases like the NCBI Sequence Read Archive (SRA). The meta data table will be automatically updated for those samples. However, one can annotate external samples using the meta data table. Here, sample_id must be the NCBI SRA id e.g. SRR5098710. We will do this for the 20 fecal samples from healthy individuals of the HMP cohort (Nash et al., 2017). For instance, we will assign these samples to the project healthy feces.

Raw read files can contain multiple samples, because this webserver is able to perform demultiplexing. However, one has to declare the multiplexed samples in the meta data table. Let’s have a look at the first lines of the multiplexed raw read file muxed_S4_S5_1.fq.gz in FASTQ format:

@FCBKRH9:1:2119:15517:24531 1:N:0:CAGTGCATATGC
GCATCGATGAAGAACGCAGCGAAATGCGATACGTAATGTGAATTGCAGAATTCCGTGAATCATCGAATCTTTGAACGCAC
+
CCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
@FCBKRH9:1:2112:8633:25105 1:N:0:CAGTGCATATGC
GCATCGATGAAGAACGCAGCGAAATGCGATACGTAATGTGAATTGCAGAATTCCGTGAATCATCGAATCTTTGAACGCAC
+
CCGGGGGGGGFCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGG<CFGGGGGGGGGG
@FCBKRH9:1:2112:6602:7309 1:N:0:CAGTGCATATGC
GCATCGATGAAGAACGCAGCGAAATGCGATACGTAATGTGAATTGCAGAATTCCGTGAATCATCGAATCTTTGAACGCAC

The bar code sequence is located at the end of the header of each read (every 4th line). We need a mapping table to know which bar code belongs to which sample. In this example, bar code CAGTGCATATGC belongs to sample S4. This is usually determined during library preparation. We have to add two new columns to the meta data table for all multiplexed samples:

Part of meta data table describing the demultiplexing
sample_id	barcode_seq	barcode_file
S4	CAGTGCATATGC	muxed_S4_S5
S5	TCACGGGAGTTG	muxed_S4_S5

Column barcode_seq describes the bar code sequence to look for. Tis can be any substring of the header line. Column barcode_file indicates in which file the sample was sequenced.

In order to define groups e.g. for statistics, we define also a few other columns. The columns can contain character strings (e.g. project), binary values (FALSE or TRUE e.g. low_seqdepth) or numbers (e.g. sequencing depth seqdepth). If a column contains unique values only (e.g. library), it will not be used in any statistical analysis due to lacking replica. This columns will be ignored instead. We can provide another column called pair_id used for paired testing of differences. This is needed e.g. if the same subjects were sequenced multiple times.

Here is the content of the meta data file samples.xlsx to be uploaded:

Uploaded meta data table (samples.xlsx, first 10 rows)
sample_id	barcode_seq	barcode_file	project	disease	low_seqdepth	seqdepth	library
S1	NA	NA	cancer feces	cancer	FALSE	83070	cancer_01A1
S2	NA	NA	cancer feces	cancer	FALSE	161697	cancer_01B1
S3	NA	NA	cancer feces	cancer	FALSE	166414	cancer_08A1
S4	CAGTGCATATGC	muxed_S4_S5	cancer feces	cancer	FALSE	87166	cancer_09A1
S5	TCACGGGAGTTG	muxed_S4_S5	cancer feces	cancer	FALSE	42744	cancer_10A1
SRR5098710	NA	NA	healthy feces	healthy	TRUE	935	HMP_SRR5098710
SRR5098580	NA	NA	healthy feces	healthy	TRUE	187	HMP_SRR5098580
SRR5098736	NA	NA	healthy feces	healthy	FALSE	102113	HMP_SRR5098736
SRR5098677	NA	NA	healthy feces	healthy	FALSE	104592	HMP_SRR5098677
SRR5098500	NA	NA	healthy feces	healthy	FALSE	104812	HMP_SRR5098500

Raw read format

Raw reads must be in gzip-compressed FASTQ format. In the reads directory of example.zip, there are 8 files containing raw sequencing data from all 6 fecal samples from cancer patients:

muxed_S4_S5_1.fq.gz
muxed_S4_S5_2.fq.gz
S1_1.fq.gz
S1_2.fq.gz
S2_1.fq.gz
S2_2.fq.gz
S3_1.fq.gz
S3_2.fq.gz

Please note:

Raw reads must be originated from illumina® paired-end (PE) sequencing machines.
Forward and reverse mates must be uploaded in two separated files.
Only paired FASTQ files following pattern {sample id}_{mate}.{file extension} were accepted. The file extension must be one of .fastq, .fq, fastq.gz or fq.gz. The mate muste be either 1 or 2 indicating the forward and reverse mate, respectively. Example for single samples: sample1_1.fq.gz. Example for multiplexed samples: muxed_S4_S5_2.fq.gz.

Uploading own local samples

We will only upload the 5 fecal samples from cancer patients. Click on Input to upload new samples to your project. Here we can upload both the raw read files generated by the sequencing machine and the table containing meta data about the samples:

Upload local samples

Please wait until all files are being uploaded. A progress bar underneath the button will be provided to indicate upload status.

Adding external samples

The other two cohorts have already been uploaded to the NCBI Sequence Read Archive (SRA). This webserver is able to access this database directly. We have two options to add samples from external cohorts:

SRA Accessions

Every sequencing run stored in the SRA has a unique ID starting with either SRR, ERR or DRR followed by a number. We must enter one sample per line in the text box SRA accessions. In this example, we insert all 20 fecal samples from healthy individuals:

SRR5098710
SRR5098580
SRR5098736
SRR5098677
SRR5098500
SRR5098394
SRR5098370
SRR5098413
SRR5098489
SRR5098436
SRR5098698
SRR5098401
SRR5098363
SRR5098721
SRR5098511
SRR5098498
SRR5098398
SRR5098371
SRR5098341
SRR5098421

External projects

Many ITS projects of the SRA are already downloaded and preprocessed. We will select the project hospital metagenome Targeted loci environmental (PRJNA505509) in the drop down menu External projects by clicking at the box and typing hospital to search for the study. Please click Projects on the sidebar to browse through all available projects.

We can subset samples from external cohorts by any meta data variable as displayed at Projects. This can be archived using a filter query. We will enter the query scientific_name == 'hospital metagenome' & total_spots > 30e3 to the text box Filter query for external projects. This will only include environmental samples of the NICU project which have at least 30*10³ raw reads. Character strings must be single quoted. == is used to indicate equality. > and < can be used to filter numerical attributes. Single terms can be combined using & or | to yield the sample-wise intersection or union over these terms, respectively. Please note that both %>% and parenthesis are blocked in this filed for security reasons. We can always filter samples according to these attributes:

Attributes available in all external projects to filter samples
Attribute	Description
total_spots	Number of raw reads
total_bases	Number of raw bases
size	File size in bytes
biosample_id	NCBI accession of the biological sample starting with SAM
scientific_name	Name of taxa of the sample e.g. human gut metagenome

A click on Add external samples will add these sample to the project. The column project will be filled with the NCBI BioProject id.

Upload local samples

Start: Setting parameters and starting the pipeline

First, click on Start on the sidebar. Here we can run the pipeline with default parameters by just clicking on Start pipeline. This will take around half an hour for the example cohort to run on our servers using 5 CPU threads. Depending on the number of samples and parameters, this can take up to a couple of hours. Please create a bookmark of the page or remember the project ID to view the results later on. Results can be accessed on the sidebar above the results menu once analyzed. We also have the option to create and select own parameter sets. A complete reference about all input elements and parameters is given at Reference on the sidebar.

Adjust the parameters

We can customize the analysis workflow by creating our own parameter set for each step. Each parameter set must be given a unique name consisting of alphanumerical characters and spaces.

The default ITS sub region is ITS1. If you have ITS2 data, you have to adjust the denoising parameter set!

Here, a briefly summary about changed parameters in the example project is given. A complete description about all parameters can be accessed by clicking Reference on the sidebar.

Quality Control
- Aim: Remove sequencing errors by trimming and filtering the raw reads
- We want to have our own primers ITS3 and ITS4 to be removed. Therefore we enter these primers in the Sequences to trim text box:
```
>Forward primer ITS3
GCATCGATGAAGAACGCAGC
>Reverse primer ITS4
TCCTCCGCTTATTGATATGC
```
  These sequences must be in FASTA format.
  - To remove the primers on both strands, ensure that the checkbox Include reverse complement of primer sequences is checked
Denoising
- We will use Amplicon Sequencing Variants (ASV) as a method for denoising. Operational taxonomic units (OTU) can be used as well.
- Our example project covers the ITS2 sub region. Therefore we click on the radio button ITS2.
Phylotyping
- Aim of this step: Taxonomic annotation of these denoised representative sequences
- We stay with the default parameters here
Feature generation
- Aim of this step: Normalization, filtering and pooling denoised abundance profiles
- We stay with the default parameters here. This includes to pool the abundances at genus rank, because many fungal clades can not determined up to the species rank using the ITS region only. However, it is possible to select the species rank if the data is trusted.
Analysis
- Aim of this step: Creating statistics, Machine Learning, and correlation networks based on these features
- Performing statistics and Machine Learning on hundreds of sample attributes can be very time consuming. Therefore, we can select the particular analysis groupings of interest. In this case: project, disease, and low_seqdepth.
- Building correlation networks on all samples is meaningless in heterogeneous cohorts like ours. We want to compare correlations between healthy and diseased samples. Thus we select disease as the correlation grouping.

Logs

Once the pipeline has started, we can view the logs of the workflow engine snakemake by clicking Logs on the sidebar. This is an optional feature of this webserver and useful for debugging processes.

Result pages

Once a step is finished, a result page will be appear at the results section on the sidebar menu. All result pages follow the same structure:

Brief description about the methods and parameters applied
Results including summary bullet points and interactive figures
An optional notes section with comments how to interpret the results and which errors might occurred
A list of literature references of used tools
Optional panels for interactive analysis and to download the results

Quality Control (QC)

Quality Control (QC) is the process of removing low quality bases from the raw reads. This is done by utilizing the Phred quality score as calculated during base calling. Furthermore, primer and adapter sequences are cut.

This is the first analysis result which will appear at the result section. The test matrix indicates whether a sample failed, raised a warning, or passed a particular QC test.

QC tests by sample

By default and in our example project, samples will be excluded from downstream analysis, if any of these tests failed:

Adapter content is too high
Per base N content is too high
Per base sequence quality is too bad
Per sequence quality scores are too low
Minimum number of QC reads < 1000

All tests except the last one are calculated using FastQC In this table, we can see that 3 samples were excluded due to too shallow sequencing. Indeed, samples SRR5098710 and SRR5098580 do not satisfy this requirement even on raw read basis. They were included in this example project to demonstrate both the functionality of this webserver and the fact that not every sample uploaded in the SRA database is of well quality.

excluded samples
sample_id	reason
S4	min_qc_read_count_failed
SRR5098580	min_qc_read_count_failed

The effect of QC can be impressively visualized by downloading the MultiQC reports at the bottom of this page (Ewels et al., 2016). Here we can see that the adapter content and per base sequence quality has been drastically improved after QC. This is typical for ITS analyzes. If the length of the ITS sub region is smaller than the fragment length during sequencing, not only the ITS subregion itself but also the adapter of the other read mate is being sequenced. This results in high adapter content and low sequence quality due to steric hindering during the process of sequencing by synthesis.

Raw	Clean

Denoising

Denoising is the process of identifying correct biological sequences from quality controlled reads. This is done by removing potential sequencing errors.

As a sanity check, ITSx was used on the ASV sequences to see if the selected subregion ITS2 was indeed present in the raw reads:

Sanity checks

A tree of all representative sequences was generated using QIIME2, MAFFT, and FastTree (Bolyen et al., 2019; Katoh and Standley, 2013; Price et al., 2009). Here is a multiple sequence alignment of the 30 most prevalent ASVs: Please note that this tree might not reflect proper phylogeny. To do so, one need usually a much larger amplicon e.g. a whole subunit of the rRNA genes. This tree was not used anywhere else in the analysis and is only to visualize differences between all inferred representative sequences.

DADA2

This step is specific to ASV profiling. They will not be used if OTU profiling was selected in the denoising parameter set.

This webserver process ASV profiling according to the official DADA2 ITS Pipeline Workflow (1.8) in a sample-wise manner. Sequencing errors are modeled for each sample and base transition separately. See the DADA2 paper for more details (Callahan et al., 2016). Here we can see the distribution of error frequency across all samples corresponding to the consensus quality score: DADA2 error model

During the steps of the DADA2 workflow, reads undergo additional filtering and trimming. Not every qc read can be assigned to a final ASV representative sequence. The read loss can be assessed here: Read loss during denoising

Phylotyping

Phylotyping is the process of taxonomic annotation of the denoised representative sequences.

59% of the AVS representative sequences were not annotated even at phylum level: Phylotyped sequences

However, taking taxon abundance into account, the majority of the reads are indeed taxonomically assigned: 92% and 94% at species and phylum rank, respectively. Most representative sequences are unclassified, but very low abundant. ASV profiling searches for very subtile sequence differences up to a single base. This results into a larger number of representative sequences compared to OTU profiling. Phylotyped abundances

Feature generation

Feature generation is the process of pooling denoised sequences at a given taxonomic rank. Furthermore, abundance profiles were normalized to make them comparable between differentially sequenced samples.

Features can be anything from species abundances to phylum abundances depending on the parameter set chosen. In our example, we have genus features normalized using centered log-ratio transformation (CLR). Usually there are hundreds of features remaining after feature generation. To visualize the abundance of all samples and all features, one can either generate a heatmap or reduce the dimensionality of the data set using a process called ordination. Let’s look at the results of the latter one:

Bray-Curtis	unweighted UniFrac

The webserver offers an interactive way to create ordination plots. Sample groupings, dissimilarities and ordination methods can be individually selected. In this example, Principal Coordinates Analysis (PCoA) using the dissimilarities Bray-Curtis and unweighted UniFrac was performed.

The project groups were significantly differentially abundant in both ordinations (Adonis PERMANOVA, \(p < 0.001\)). With UniFrac, however, we are able to explain 54.3% of the total variance in the first ordination axis PCo1. On the other hand, we can only explain 30.6% of it using Bray-Curtis dissimilarity. Furthermore, the sample groups tend to be more compact using UniFrac ordination.

Both dissimilarity methods aim to express the distance between two samples considering the abundances of all taxa. Bray-Curtis treats taxa as disconnected sets, whereas UniFrac takes phylogeny into account.

Let \(i\) and \(j\) be two sample groups, \(S\) the total number of species, and \(C_{ij}\) the the sum of the lesser values for only those features in common between both sample groups. Then the distances are calculated using:

\[ d_{Bray-Curtis} = 1 - \frac{2C_{ij}}{S_i + S_j} \]

\[ d_{UniFrac} = \frac{\sum \textrm{unshared branch lengths}}{\sum \textrm{all tree branch lengths}} \]

Both distances do not make a difference how much a particular feature is abundant. They only consider the pure existence. We can use weighted UniFrac to also incorporate abundance. UniFrac requires a phylogenetic tree. In this example, the tree consist of the ranks genus, family, etc. up to the kingdom. The distance of all child nodes to their parents is set to 1. This is because sequence identity from a multiple sequence alignment of denoised sequences does not correlate with phylogenetic distance in the fungal kingdom using only the ITS region.

Correlation Networks

Correlation is used to assess co-abundant features.

We used the correlation grouping disease which gave us two networks for the healthy and the diseased individuals. Here is the SparCC correlation network of the healthy individuals with default thresholds for effect size and significance (Friedman and Alm, 2012; Watts et al., 2019):

Correlation network

We can hover the cursor over an edge to get a detailed description of the correlation. Moreover, the network can be also displayed as a table by clicking on the particular tab.

For example, Malassezia is significantly negativeley correlated to Cyberlindnera (\(c_{SparCC}=-0.402, p = 0.0249\)). The p values were obtained using permutation. Other correlation methods such as BAnOCC provide confidence intervals instead of p values.

Spurious correlations

Be always aware of potential false positive correlations. These spurious correlations can be omnipresent in microbiome data.

The abundance profile of Dekkera seems to be highly zero-inflated (see abundance boxplot at section Statistics). Only one sample got an outstanding high abundance of this genus. The correlation to Saccharomyces we observed above might be driven by this outlier. It is recommended to use a high prevalence threshold during feature generation. This implies avoiding the use of Spearman’s rho correlation. Furthermore, one should use a correlation method aware of sparseness (e.g. SparCC) and composition (e.g. BAnOCC). Different correlation methods can differ drastically in specificity and sensitivity (Weiss et al., 2016). A review emphasizing issues regarding the compositionality of relative count data is (Gloor et al., 2017).

Statistics

Here, statistics refers to comparisons of feature abundance between sample groups.

Testing the same subjects multiple times requires paired testing. Subject ids can be mapped to sample ids by providing a column pair_id in the sample meta data table. We do not have to provide this pairing information in our example. We can create box plots with significance labels of any feature and sample grouping:

Two means comparison	may means comparison

Two means tests

Testing differences in the disease grouping is very simple, because disease is a binary variable (it can be either healthy or cancer)

Significant two means comparisons

Three or more means tests

The project sample grouping has three unique levels: cancer feces, healthy feces, and the environmental samples of BioProject PRJNA505509. We have to apply a two staged test here: First, Kruskal-Wallis rank sum test on normalized abundances was applied to test if there is any difference between the project groups.

Significant three or more means comparisons

Then, Dunn’s test was applied to test which groups are significantly differentially abundant. This is called post hoc testing.

Significant post hoc tests

Machine Learning (ML)

Machine Learning (ML) is the automated process of using features to build a model able to distinguish between any given classes. This model can be used to predict properties of new data sets.

Model construction

Both random forests and Support Vector Machines (SVM) were used as models. A random forest consist of multiple decision trees. Only a subset of samples and features are used in each decision tree. These models tend to be very robust. A SVM is a hyperplane dividing the features in two areas. The feature space was stretched using a Gaussian function (rbf kernel) to account for non-linear relationships.

These models have different hyperparameters to tune:

Hyperparameters
Model type	Hyperparameter	Description
random forest	mtry	Number of variables available for splitting at each tree node
random forest	ntree	Number of decision trees
SVM	C	Regularization penalty for the trade of between low testing and low training error

Not all features were used in all models: Feature selection was applied using both recursive feature selection and selection by filtering. The aim of this subsetting is to prevent over-fitting so that the model generalizes well at the prediction of new data sets. In recursive feature selection, features are removed iteratively whereas in selection by filtering, ANOVA was used to filter features not related to the property we want to classify.

Each set of hyper parameters and selected features were validated using nested cross-validation.

Model performance

The sample property we want to classify is called target class. ML models for all categorical variables of the samples meta data table were trained. Random forests performed best in the classification of both disease and project sample target property. Feature selection improved the validation performance: All best models used recursive feature selection.

Feature importance

Multiple metrics were calculated to validate model performance:

Sensitivity (a.k.a recall or true positive rate)
Specificity (a.k.a true negative rate)
Area under the receiver operating characteristic curve (AUC)

Let’s imagine a model which classifies all samples with the label cancer. This model is very sensitive: Obviously, all true cancer samples are classified correctly. But this is also for all healthy samples the case. This is why the specificity of this imagined model is 0. The performance metric to choose depends on the use case: We want a sensitive classifier for diagnosing a very severe disease for which a drug with no side effects is available. In this situation we do not care so much about false positives. However, we should focus on building a specific classifier whenever a misclassification has severe implications.

These metrics were generalized for targets with more than two levels (e.g. project): Here multiple models are trained e.g. cancer vs (healthy and environment together). Then, the performance values of the “one versus all” statistics are averaged.

Feature importance

Feature importance implies neither biological importance nor causality. Important features in ML models are only hints for new possible biological insights.

One can also identify features which are important in this classification process. In random forest models, we can count how often a particular feature was used at any node in the decision trees. The mean decrease in gini importance is one way to measure this feature importance. The gini importance value of Candida is very high in random forest models classifying the project. Therefore Candida might also play a different biological rule in healthy, cancer, and environmental samples.

Summary

You can download the full findings summary either as Excel table or HTML report at the download section of the Summary webpage.

Here, significant results from all tests are combined. Furthermore, significant features were annotated with known interactions and related literature.

Let’s see whether we have findings for Wallemia: Summary of findings

This is a summary of all significant group-wise comparisons and correlations. Furthermore, features were annotated with FUNGuild and two more databases: One consist of clinical isolates with suspicious fungal infections from the NRZMyk reference database. It can be accessed by clicking at Infections on the sidebar.

The other one is about known interactions to other fungi, diseases, and immune system related molecules. This database was created using manual curation of article abstracts of fungi prevalent in fecal and skin samples. It can be accessed by clicking at Interactions on the sidebar.

References

Bolyen, E., Rideout, J.R., Dillon, M.R., Bokulich, N.A., Abnet, C.C., Al-Ghalith, G.A., Alexander, H., Alm, E.J., Arumugam, M., Asnicar, F., et al. (2019). Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857.

Callahan, B.J., McMurdie, P.J., Rosen, M.J., Han, A.W., Johnson, A.J.A., and Holmes, S.P. (2016). DADA2: High-resolution sample inference from illumina amplicon data. Nature Methods 13, 581–583.

Ewels, P., Magnusson, M., Lundin, S., and Kaller, M. (2016). MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics (Oxford, England) 32, 3047–3048.

Friedman, J., and Alm, E.J. (2012). Inferring correlation networks from genomic survey data. PLoS Computational Biology 8, e1002687.

Gloor, G.B., Macklaim, J.M., Pawlowsky-Glahn, V., and Egozcue, J.J. (2017). Microbiome Datasets Are Compositional: And This Is Not Optional. Front Microbiol 8, 2224.

Heisel, T., Nyaribo, L., Sadowsky, M.J., and Gale, C.A. (2019). Breastmilk and nicu surfaces are potential sources of fungi for infant mycobiomes. Fungal Genetics and Biology : FG & B 128, 29–35.

Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780.

Nash, A.K., Auchtung, T.A., Wong, M.C., Smith, D.P., Gesell, J.R., Ross, M.C., Stewart, C.J., Metcalf, G.A., Muzny, D.M., Gibbs, R.A., et al. (2017). The gut mycobiome of the human microbiome project healthy cohort. Microbiome 5, 153.

Price, M.N., Dehal, P.S., and Arkin, A.P. (2009). FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650.

Watts, S.C., Ritchie, S.C., Inouye, M., and Holt, K.E. (2019). FastSpar: Rapid and scalable correlation estimation for compositional data. Bioinformatics (Oxford, England) 35, 1064–1066.

Weiss, S., Treuren, W.V., Lozupone, C., Faust, K., Friedman, J., Deng, Y., Xia, L.C., Xu, Z.Z., Ursell, L., Alm, E.J., et al. (2016). Correlation detection strategies in microbial data sets vary widely in sensitivity and precision. ISME J 10, 1669–1681.

legal.knit

Legal information

An overview of data protection

General

The following gives a simple overview of what happens to your personal information when you visit our website. Personal information is any data with which you could be personally identified. Detailed information on the subject of data protection can be found in our privacy policy found below.

Data collection on our website

Who is responsible for the data collection on this website? The data collected on this website are processed by the website operator. The operator’s contact details can be found below. How do we collect your data? Some data are collected when you provide it to us. This could, for example, be data you enter on a contact form. Other data are collected automatically by our IT systems when you visit the website. These data are primarily technical data such as the browser and operating system you are using or when you accessed the page. These data are collected automatically as soon as you enter our website. What do we use your data for? Part of the data is collected to ensure the proper functioning of the website. Other data can be used to analyse how visitors use the site. What rights do you have regarding your data? You always have the right to request information about your stored data, its origin, its recipients, and the purpose of its collection at no charge. You also have the right to request that it be corrected, blocked, or deleted. You can contact us at any time using the address given below if you have further questions about the issue of privacy and data protection. You may also, of course, file a complaint with the competent regulatory authorities.

Analytics and third-party tools

When visiting our website, statistical analyses may be made of your surfing behaviour. This happens primarily using cookies and analytics. The analysis of your surfing behaviour is usually anonymous, i.e. we will not be able to identify you from this data. You can object to this analysis or prevent it by not using certain tools. Detailed information can be found in the following privacy policy. You can object to this analysis. We will inform you below about how to exercise your options in this regard.

General information and mandatory information

Data protection

The operators of this website take the protection of your personal data very seriously. We treat your personal data as confidential and in accordance with the statutory data protection regulations and this privacy policy. If you use this website, various pieces of personal data will be collected. Personal information is any data with which you could be personally identified. This privacy policy explains what information we collect and what we use it for. It also explains how and for what purpose this happens. Please note that data transmitted via the internet, e.g. via email communication, may be subject to security breaches. Complete protection of your data from third party access is not possible.

Data protection

Notice concerning the party responsible for this website The party responsible for processing data on this website is:

Leibniz-Institut für Naturstoff-Forschung und Infektionsbiologie e. V. Hans-Knöll-Institut (HKI) Adolf-Reichwein-Straße 23 07745 Jena

Email: gianni.panagiotou@leibniz-hki.de

The responsible party is the natural or legal person who alone or jointly with others decides on the purposes and means of processing personal data e.g. names, email addresses, etc..

Right to file complaints with regulatory authorities

If there has been a breach of data protection legislation, the person affected may file a complaint with the competent regulatory authorities. The competent regulatory authority for matters related to data protection legislation is the data protection officer of the German state in which our company is headquartered. A list of data protection officers and their contact details can be found at the following link: https://www.bfdi.bund.de/DE/Infothek/Anschriften_Links/anschriften_links-node.html .

Right to data portability

You have the right to have data which we process based on your consent or in fulfilment of a contract automatically delivered to yourself or to a third party in a standard, machine readable format. If you require the direct transfer of data to another responsible party, this will only be done to the extent technically feasible.

SSL or TLS encryption

This site uses SSL or TLS encryption for security reasons and for the protection of the transmission of confidential content, such as the inquiries you send to us as the site operator. You can recognize an encrypted connection in your browser’s address line when it changes from “http://” to “https://” and the lock icon is displayed in your browser’s address bar. If SSL or TLS encryption is activated, the data you transfer to us cannot be read by third parties.

Information, blocking, deletion

As permitted by law, you have the right to be provided at any time with information free of charge about any of your personal data that is stored as well as its origin, the recipient and the purpose for which it has been processed. You also have the right to have this data corrected, blocked or deleted. You can contact us at any time using the address given in our legal notice if you have further questions on the topic of personal data.

Opposition to promotional emails

We hereby expressly prohibit the use of contact data published in the context of website legal notice requirements with regard to sending promotional and informational materials not expressly requested. The website operator reserves the right to take specific legal action if unsolicited advertising material, such as email spam, is received.

Data protection officer

Statutory data protection officer We have appointed a data protection officer for our company.

Dipl.-Ing. Wolfgang Schmidt-Heck Leibniz-Institut für Naturstoff-Forschung und Infektionsbiologie e. V. Hans-Knöll-Institut (HKI) Adolf-Reichwein-Straße 23 07745 Jena

Telephone: +49 3641 532-1084 Email: wolfgang.schmidt-heck@leibniz-hki.de

Data collection on our website

Cookies

Some of our web pages use cookies. Cookies do not harm your computer and do not contain any viruses. Cookies help make our website more user-friendly, efficient, and secure. Cookies are small text files that are stored on your computer and saved by your browser. Most of the cookies we use are so-called “session cookies.” They are automatically deleted after your visit. Other cookies remain in your device’s memory until you delete them. These cookies make it possible to recognise your browser when you next visit the site. You can configure your browser to inform you about the use of cookies so that you can decide on a case-by-case basis whether to accept or reject a cookie. Alternatively, your browser can be configured to automatically accept cookies under certain conditions or to always reject them, or to automatically delete cookies when closing your browser. Disabling cookies may limit the functionality of this website. Cookies which are necessary to allow electronic communications or to provide certain functions you wish to use (such as the shopping cart) are stored pursuant to Art. 6 paragraph 1, letter f of DSGVO. The website operator has a legitimate interest in the storage of cookies to ensure an optimized service provided free of technical errors. If other cookies such as those used to analyse your surfing behaviour are also stored, they will be treated separately in this privacy policy.

Server log files

The website provider automatically collects and stores information that your browser automatically transmits to us in “server log files”. These are:

Browser type and browser version
Operating system used
Referrer URL
Host name of the accessing computer
Time of the server request
IP address

These data will not be combined with data from other sources. The basis for data processing is Art. 6 (1) (b) DSGVO which allows the processing of data to fulfil a contract or to implement measures preliminary to a contract.

reference.utf8

Reference

This page is about the specification of the server. A detailed description about tools, parameters, and tested browsers is given.

Tools

The general framework is based on Snakemake (Koster and Rahmann, 2012), R, Tidyverse, and Docker.

Tools
Pipeline step	Tool	Description	Version	Citation	Source
qc	cutadapt	QC reads by filtering and trimming	0.39	Martin (2011)	Anaconda
	FastQC	Assessing sequence read quality	0.11.8		Anaconda
	MultiQC	Combining FastQC results from multiple samples	1.7	Ewels et al. (2016)	Anaconda
denoising	DADA2	Denoising reads to get Amplicon Sequencing Variants (ASV)	1.14	Callahan et al. (2016)	Anaconda
denoising	PIPITS	Denoising reads to get Operational Taxonomic Units (OTU)	2.4	Gweon et al. (2015)	Anaconda
phylotyping	QIIME2	Taxonomic classifications of representative sequences, tree building	r2019.7	Bolyen et al. (2019)	Anaconda
analysis	BAnOCC	compositionality aware correlations between species	1.1	Schwager et al. (2017)	Anaconda
	caret	R package for Machine Learning	6		Anaconda
	phyloseq	R package for handling samples, features, and trees		McMurdie and Holmes (2013)	Anaconda
	SparCC / FastSpar	sparseness aware correlations between species	0.0.1	Friedman and Alm (2012), Watts et al. (2019)	Anaconda
	vegan	Diversity analysis	2		Anaconda
	grabseqs	Downlaoding raw reads from NCBI SRA	0.5		Anaconda

Parameters

In the input and start section, the following parameters can be set to customize the workflow:

Phred Score Format
Format of phred score used to indicate per base sequence quality of raw FASTQ read fiels.

Type: nominal
Default: phred33

Sequences to Trim (Adapters and Primers in FASTA format)
Primers and other sequences to remove from the reads. Multiple sequences are allowed. Must be provided in FASTA format. Sequence name after > is only for ducumentation purposes.

Type: string
Default: NA

Minimum read length (bp)
Minimum length of each read to pass QC. This value refers to unjoined reads. Used by trimmomatic to filter raw reads.

Type: integer
Default: 50

Include reverse complement of primer sequences
This is to indicate whether the reverse complement of sequences provided in the field ‘Sequences to Trim’ should be used as well for trimming. Highly recommended.

Type: binary
Default: TRUE

Minimim quality of trailing bases (Phred)
Minimim phred quality score of trailing bases. Used by trimmomatic for trimming trailing bases.

Type: integer
Default: 25

Minimim quality of leading bases (Phred)
Minimim phred quality score of leading bases. Used by trimmomatic for trimming leading bases.

Type: integer
Default: 25

Additional adapters
Additional adapter files to include in QC. These are the default files provided by trimmomatic.

Type: nominal
Default:

NexteraPE-PE.fa
TruSeq2-PE.fa
TruSeq3-PE-2.fa

Minimum number of QC reads
Minimum number of QC reads per sample. This is to remove samples which are not deep enough sequenced. Only applied if test is set in sample exclusion criteria.

Type: integer
Default: 100

Exclude sample if any of these tests failed
Criteria to exclude samples during QC from downstream analysis. Test are provided by FastQC. Per tile test is not available for NCBI SRA samples.

Type: nominal
Default:

fastqc_adapter_content_failed
fastqc_per_base_n_content_failed
fastqc_per_base_sequence_quality_failed
fastqc_per_sequence_quality_scores_failed
fastqc_per_sequence_length_distribution_failed
min_qc_read_count_failed

Maximum allowed error rate
Maximum allowed error rate in QC trimming (no. of errors divided by length of matching region)

Type: integer
Default: 0.1

ITS region
ITS region to extract. Used by PIPITS. If this value is set wrong, abundance profile can not be generated.

Type: nominal
Default: NA

Denoising method
Method used for denoising. This parameter determines whether Amplicon sequencing variants (ASV) or Operational Taxonomic Units (OTU) are generated.

Type: nominal
Default: asv_dada2

PIPITS include singletons
Indication to include denoised sequence clusters with only one single sequence. Not recommended.

Type: binary
Default: FALSE

PIPITS identity threshold
Required sequence identity to cluster sequences in the same OTU. This parameters is used by VSEARCH and only applies to OTU profiling.

Type: Float [0,1]
Default: 0.97

DADA2 maximal number of undefined bases
After truncation, sequences with more than maxN Ns will be discarded. Note that dada does not allow Ns. Only used by DADA2.

Type: Float
Default: 0

DADA2 minimal base quality
After truncation, reads contain a quality score less than minQ will be discarded. Only used by DADA2.

Type: integer
Default: 0

DADA2 maximal expected errors
Default Inf (no EE filtering). After truncation, reads with higher than maxEE "expected errors" will be discarded. Expected errors are calculated from the nominal definition of the quality score: EE = sum(10^(-Q/10)). Two values for each mate are separated by a comma. Only used by DADA2.

Type: float,float
Default: 2,2

DADA2 truncate quality
Truncate reads at the first instance of a quality score less than or equal to truncQ. Only used by DADA2.

Type: integer
Default: 2

Reference database
Reference database used for taxonomic classification of denoised representative sequences.

Type: nominal
Default: Unite_8.2_dynamic

Sequence classifier
Reference database used for taxonomic classification of denoised representative sequences. The classifier will use the reference data base.

Type: nominal
Default: qiime2_blast

Minimum BLAST identity (fraction)
Minimum sequence identity required to classify denoised representative sequences. Only applied if BLAST sequence classifier is used.

Type: Float [0,1]
Default: 0.8

Taxonomic rank
Taxonomic rank at which all denoised clusters were pooled to. Used by all analysis steps as features.

Type: nominal
Default: genus

Phylogeny
Phylogenetic tree used for visualization and UniFrac based beta diversity analysis. Trees with equal edge length are allowed.

Type: nominal
Default: index_fungorum_2016

Normalization method
Normalization method used to generate features. Used in as most analysis steps as possible. SparCC and BanOCC correlation, however, rely on unnormalized counts.

Type: nominal
Default: clr

Minimum abundance to count prevalence (%)
Minimum abundance of a feature to be count as prevalent (in %). Only used for prevalence filtering.

Type: Float [0,100]
Default: 0.01

Minimum prevalence (%)
Minimum fraction of all samples a feature must be prevalent (in %) to be included in downstream analysis.

Type: Float [0,100]
Default: 5

Prevalence group
A feature is prevalent if it is prevalent in any group specified by the sample meta data column here.

Type: nominal
Default: all

Unknown strategy
Strategy to treat unassigned sequences classified as unknown.

Type: nominal
Default: infer

Inter feature correlation method
Method used to estimate correlations between pairs of features.

Type: nominal
Default: sparcc

Samples meta data
Optional file describing meta data about samples. One sample per row. Must include column sample_id. Must include columns barcode_seq and barcode_file if multiplexed files are used. CSV and XLSX (Excel) allowed.

Type: file
Default: NA

SRA Accessions (one SSR id per line)
List of SRR IDs from NCBI SRA to add. One ID per line.

Type: list of strings
Default: NA

Filter query for existing projects
Query to describe samples used to filter existing cohorts. Column names of samples meta data table must be used. Invalid characters: %, (, )

Type: string
Default: NA

Existing projects
List of projects used to add to the project. Samples selected here will be filtered by the Filter query for existing projects.

Type: list of strings
Default: NA

Local raw read files
Files to upload. File names must follow pattern {sample_id}_{mate}.{file extension} where {mate} is either 1 or 2. {sample_id} is used to map attributes in samples meta data table if provided. Allowed file extensions: fastq, fq, fq.gz, fastq.gz

Type: list of files
Default: NA

Analysis groupings
Columns of sample meta data table used to group samples in machine learning and statistics.

Type: nominal
Default: NA

Correlation grouping
Column name used to group samples to build correlation networks separately. Use all to build just one correlation network using all samples.

Type: nominal
Default: all

BanOCC chains
Number of Markov chains in BanOCC.

Type: integer
Default: 10

BanOCC total iterations
Number of total iterations for each chain (including warmup) in BanOCC.

Type: integer
Default: 10000

BanOCC warmup iterations
Number of warmup iterations for each chain in BanOCC. Must not be larger than total number of iterations.

Type: integer
Default: 5000

BanOCC alpha confidence
Fraction of the posterior density outside the credible interval.

Type: Float [0,1]
Default: 0.95

SparCC iterations and bootstraps
Number of iterations for convergence and bootstraps for p value calculation in SparCC.

Type: integer
Default: 200

Email address
Optional Email address. A HTML analysis report will be sent to this address after the pipeline has finished. You can also save this page as a bookmark to access the project.

Type: string
Default: NA

Browsers

DAnIEL was sucessfully tested with the following browers:

Browsers
Browser	OS	Versions
Firefox	GNU/Linux	70.0

References

Callahan, B.J., McMurdie, P.J., Rosen, M.J., Han, A.W., Johnson, A.J.A., and Holmes, S.P. (2016). DADA2: High-resolution sample inference from illumina amplicon data. Nature Methods 13, 581–583.

Ewels, P., Magnusson, M., Lundin, S., and Kaller, M. (2016). MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics (Oxford, England) 32, 3047–3048.

Friedman, J., and Alm, E.J. (2012). Inferring correlation networks from genomic survey data. PLoS Computational Biology 8, e1002687.

Gweon, H.S., Oliver, A., Taylor, J., Booth, T., Gibbs, M., Read, D.S., Griffiths, R.I., and Schonrogge, K. (2015). PIPITS: An automated pipeline for analyses of fungal internal transcribed spacer sequences from the illumina sequencing platform. Methods in Ecology and Evolution 6, 973–980.

Koster, J., and Rahmann, S. (2012). Snakemake–a scalable bioinformatics workflow engine. Bioinformatics (Oxford, England) 28, 2520–2522.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10–12.

McMurdie, P.J., and Holmes, S. (2013). Phyloseq: An r package for reproducible interactive analysis and graphics of microbiome census data. PloS One 8, e61217.

Schwager, E., Mallick, H., Ventz, S., and Huttenhower, C. (2017). A bayesian method for detecting pairwise associations in compositional data. PLoS Computational Biology 13, e1005852.

Watts, S.C., Ritchie, S.C., Inouye, M., and Holt, K.E. (2019). FastSpar: Rapid and scalable correlation estimation for compositional data. Bioinformatics (Oxford, England) 35, 1064–1066.

Database of fungal projects

Project

Database of fungal interactions

We created a manually curated database about fungal interactions. Published papers were queried for the fungal species name and any of these terms: Disease, bacteria, immune system, and cytokine. All interactions reported in these papers were added to this database

Browse database

Download database

The database of fungal interactions consists of a table and a BibTex file describing the interactions and their references, respectively. Medical Subject Headings (MeSH) are used whenever applicable.

Interactions (CSV) References (BibTeX)

Database of fungal infections

This database consists of clinical isolates with suspicious fungal infections from the NRZMyk database

Browse database

Download database

infections (CSV)

Input

This page is about uploading own samples (FASTQ raw reads) and it's meta data (Excel or CSV). Furthermore, external samples from the data base NCBI SRA can be added to the project.

Upload local samples

Samples meta data

Optional file describing meta data about samples. One sample per row. Must include column sample_id with values starting with a character. Must include columns barcode_seq and barcode_file if multiplexed files are used. CSV and XLSX (Excel) allowed.

Browse...

Local raw read files

Files to upload. File names must follow pattern {sample_id}_{mate}.{file extension} where {mate} is either 1 or 2. {sample_id} is used to map attributes in samples meta data table if provided. Allowed file extensions: fastq, fq, fq.gz, fastq.gz

Browse...

Add external samples

SRA accessions (one SSR id per line)

List of SRR IDs from NCBI SRA to add. One ID per line.

External projects

List of projects used to add to the project. Samples selected here will be filtered by the Filter query for external projects.

Filter query for external projects

Query to describe samples used to filter external cohorts. Column names of samples meta data table must be used. Invalid characters: %, (, )

Samples overview

Logs

This page shows pipeline logs for debugging purposes.

Log file

Download

QC report of raw data (HTML) QC report of clean data (HTML)

Interactive analysis

Alignment of denoised sequences

For the sake of clarity, only the most prevalent sequences were shown. The tree, however, was calculated based on all representative sequences

Alpha diversity

Representative sequences (ASV or OTUs) of denoised reads are used to calculate diversity for each sample.

Sample grouping

Sample groups

Features

Significance is calculated using Wilcoxon test for binary outcomes and using Kruskal-Wallis and Dunn's test otherwise.

Download

Denoised sequences (FASTA) Denoised profile (XLSX)

Download

Phylotypes (XLSX)

Phylotypes

Interactive abundance

Taxon abundance profile clustered by sample simillarity using Manhattan distance. Displayed sparseness can be controlled using the minimum number of taxa per sample.

Sample grouping

Annotation groupings

Min taxa per sample

Clustering method

Show labels

taxa

samples

Normalization method

raw

Interactive alpha diversity

Feature counts pooled at provided taxonomic rank are used to calculate diversity for each sample.

Sample grouping

Sample groups

Features

Significance is calculated using Wilcoxon test for binary outcomes and using Kruskal-Wallis and Dunn's test otherwise.

Interactive beta diversity

Feature counts pooled at provided taxonomic rank are used to calculate dissimilarity for each pair of samples. This dissimilarity is approximately plotted in the two dimensional plane.

Feature grouping

Sample grouping

Sample groups

Ordination method

Ordination distance

Download

Raw feature profile (XLSX) Normalized feature profile (XLSX) Feature meta data (XLSX)

Interactive analysis

Sample group

Exclude correlation range

Max p value

Rank for coloring nodes

Network
Table

Download

Model download can take up to a couple of minutes.

BAnOCC models (RDS)

Analyze a single feature

Abundance distribution is plotted here using any combination of feature and sample grouping. Significance is indicated by stars (p < 0.001: ★★★, p < 0.01: ★★, p < 0.05: ★) P values are taken from post hoc tests if multiple test statistics are available.

Sample grouping

Sample groups

Features

Significance is calculated using Wilcoxon test for binary outcomes and using Kruskal-Wallis and Dunn's test otherwise.

Download

Statistics summary table (XLSX)

Analyze a single model

Mean AUC over all folds and it's standard deviation are shown in the table.

Test samples of all folds are pooled together to calculate AUC as shown in the plots.

Select model

Pooled
Per fold

Download

Findings summary table (XLSX) Findings summary report (HTML)

Overview

Fungal knowledge base

Publication ready plots

Cite us

Contact us

Tutorial

The Biology of ITS sequencing

Input: Uploading samples

Start a new project

Composing the sample meta data table

Raw read format

Uploading own local samples

Adding external samples

SRA Accessions

External projects

Start: Setting parameters and starting the pipeline

Adjust the parameters

Logs

Result pages

Quality Control (QC)

Denoising

DADA2

Phylotyping

Feature generation

Correlation Networks

Spurious correlations

Statistics

Two means tests

Three or more means tests

Machine Learning (ML)

Model construction

Model performance

Feature importance

Summary

References

Legal information

An overview of data protection

General

Data collection on our website

Analytics and third-party tools

General information and mandatory information

Data protection

Data protection

Revocation of your consent to the processing of your data

Right to file complaints with regulatory authorities

Right to data portability

SSL or TLS encryption

Information, blocking, deletion

Opposition to promotional emails

Data protection officer

Data collection on our website

Cookies

Server log files

Reference

Tools

Parameters

Browsers

References

Database of fungal projects

Database of fungal interactions

Browse database

Download database

Database of fungal infections

Browse database

Download database

Input

Upload local samples

Add external samples

Samples overview

Start

Start pipeline

Customize parameter

Quality Control

Denoising

Phylotyping

Feature generation

Analysis

Logs

Download

Interactive analysis