nf-core/stableexpression
This pipeline is dedicated to identifying the most stable genes within one or more expression datasets. This is particularly useful for identifying the most suitable RT-qPCR reference genes for a specific species.
Introduction
This document describes the output produced by the pipeline, relative to the top-level results directory (defined by the --outdir parameter).
The directories listed below will be created in the results directory after the pipeline has finished.
Main output files
MultiQC
This report is located at reporting/multiqc_report.html and can be opened in a browser.
[MultiQC](http://multiqc.info) is a visualization tool that generates a single HTML report summarising all samples in your project. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see http://multiqc.info.
Dash Plotly app
reporting/dash_app/: folder containing the Dash Plotly app
To launch the app, you must first create and activate the appropriate conda environment:
    conda env create -n dash_app -f reporting/dash_app/environment.yml
    conda activate dash_app
Then:
    cd reporting/dash_app
    python app.py
and open your browser at http://localhost:8080.
The app tries port 8080 by default. If that port is already in use, it tries 8081, 8082 and so on. Check the logs to see which port it is using.
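The fallback behaviour described above can be pictured with a minimal sketch. This is not the app's actual code: `find_free_port` and its parameters are hypothetical names, and the sketch only illustrates probing ports one by one, starting from 8080.

```python
import socket


def find_free_port(start: int = 8080, max_tries: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port >= start that can be bound on host.

    Hypothetical helper: it only illustrates the 8080, 8081, 8082... fallback.
    """
    for port in range(start, start + max_tries):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port already in use, try the next one
            return port
    raise RuntimeError(f"no free port found in {start}..{start + max_tries - 1}")


if __name__ == "__main__":
    print(f"First free port: {find_free_port()}")
```

The port printed by such a probe is only a hint; the app's own logs remain the reference for the port it actually serves on.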
Gene statistics and scores
The pipeline also exports a summary of all genes, located at reporting/all_genes_summary.csv. It contains their statistics, scores, ranks and respective sections.
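As a quick way to inspect this summary, the following sketch loads the file with Python's standard csv module. The `results` directory name is an assumption (use whatever was passed to --outdir), a comma delimiter is assumed from the .csv extension, and `load_gene_summary` is a hypothetical helper. No particular column names are assumed; the code simply reports the header it finds.

```python
import csv
from pathlib import Path


def load_gene_summary(path):
    """Read the summary CSV and return (column_names, rows as dicts)."""
    with Path(path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


if __name__ == "__main__":
    # Assumption: the pipeline was run with --outdir results
    summary_path = Path("results") / "reporting" / "all_genes_summary.csv"
    if summary_path.exists():
        columns, rows = load_gene_summary(summary_path)
        print(f"{len(rows)} genes, columns: {columns}")
```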
Merged data
Parquet files containing all normalised gene counts are also stored in the merged_data/ directory.
merged_data/all_counts.imputed.parquet: parquet file containing all normalised + imputed gene counts
merged_data/all_counts.parquet: parquet file containing all normalised gene counts
merged_data/whole_design.csv: table containing designs for all datasets and all samples comprised in the analysis
Other output files of interest (useful for debugging)
Individual datasets
All individual datasets are also stored at each step of the pipeline, with the following pattern:
datasets/<platform>/<normalisation status>/<dataset name>/
Sub sections
0.downloaded/: raw datasets downloaded from public databases
1.id_filtered_renamed/: datasets with filtered and renamed gene IDs
2.samples_filtered/: datasets with filtered samples
3.: TPM / CPM normalisation
    3.tpm_normalised/: TPM normalised datasets
    3.cpm_normalised/: CPM normalised datasets
4.quantile_normalised/: quantile normalised datasets
The design of each dataset is also stored in its own directory.
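To see which processing steps were saved for each dataset, the directory pattern above can be walked with pathlib. This is a sketch only: `results` stands for the --outdir value, `list_dataset_steps` is a hypothetical helper name, and the code assumes the step folders sit directly inside each dataset directory.

```python
from pathlib import Path


def list_dataset_steps(outdir):
    """Map each datasets/<platform>/<normalisation status>/<dataset name>/
    directory to the sorted names of the step subdirectories it contains."""
    steps = {}
    # Three wildcard levels: platform, normalisation status, dataset name
    for dataset_dir in sorted(Path(outdir, "datasets").glob("*/*/*")):
        if dataset_dir.is_dir():
            steps[dataset_dir] = sorted(
                child.name for child in dataset_dir.iterdir() if child.is_dir()
            )
    return steps


if __name__ == "__main__":
    # Assumption: the pipeline was run with --outdir results
    for dataset_dir, step_names in list_dataset_steps("results").items():
        print(dataset_dir, step_names)
```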
Expression Atlas / GEO accessions
Accession files
accessions/expression_atlas/: accessions found when querying Expression Atlas
accessions/geo/: accessions found when querying GEO
ID Mapping
The pipeline also exports the ID mapping metadata used for gene ID conversion.
ID mapping metadata
idmapping/global_gene_metadata.csv: table containing the complete set of gene metadata, obtained either via gProfiler or via the custom file provided by the user
idmapping/global_gene_id_mapping.csv: table containing the complete set of gene ID mappings, obtained either via gProfiler or via the custom file provided by the user
idmapping/valid_gene_ids.txt: list of gene IDs retained as valid
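The list of valid IDs can be reused downstream, for instance to check whether genes of interest were retained. The sketch below assumes one gene ID per line, which this document does not state; `load_valid_gene_ids` is a hypothetical name and `results` stands for the --outdir value.

```python
from pathlib import Path


def load_valid_gene_ids(path):
    """Return the set of gene IDs in the file (assumed: one ID per line)."""
    lines = Path(path).read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


if __name__ == "__main__":
    # Assumption: the pipeline was run with --outdir results
    ids_path = Path("results") / "idmapping" / "valid_gene_ids.txt"
    if ids_path.exists():
        valid_ids = load_valid_gene_ids(ids_path)
        print(f"{len(valid_ids)} valid gene IDs")
```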
Annotation / gene length
The annotation and gene lengths are also stored in the annotation/ directory.
Files
gene_transcript_lengths.csv: transcript length relative to each gene ID
<annotation name>.gff3.gz: GFF3 file
Pipeline information
Output files
pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot / pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Parameters used by the pipeline run: params.json.
Nextflow provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage.
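As an example of using these files, the parameters of a finished run can be reviewed by loading params.json. The sketch assumes the file holds a single JSON object and that `results` is the --outdir value; `load_run_params` is a hypothetical name.

```python
import json
from pathlib import Path


def load_run_params(path):
    """Load the recorded pipeline parameters as a dictionary."""
    with Path(path).open() as handle:
        params = json.load(handle)
    if not isinstance(params, dict):
        raise ValueError("expected a JSON object of parameter names and values")
    return params


if __name__ == "__main__":
    # Assumption: the pipeline was run with --outdir results
    params_path = Path("results") / "pipeline_info" / "params.json"
    if params_path.exists():
        for name, value in sorted(load_run_params(params_path).items()):
            print(f"{name} = {value}")
```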