RNAseq TRIMGALORE errors

Dear Flow team,

I am trying to use the RNA-Seq [3.12] pipeline on a series of 36 samples.

As instructed, I prepared the genome files and uploaded my samples with their metadata (barcode sequence, species, sample type, etc.). When I launched the analysis everything seemed fine, except for 6 samples at the trimming stage, where I get an error after 24 h of trying to trim (for comparison, my other samples are trimmed within an hour).

I checked my samples, the barcode sequences, the file naming, etc., and everything seems correct (or at least consistent with the other 30 samples that were processed successfully).
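One quick extra check on the six failing pairs is that each gzipped FASTQ file is intact and that R1 and R2 contain the same number of reads. The function below is a hypothetical helper (not part of the pipeline), sketched under the assumption that bash and the standard `gzip` and `wc` tools are available: it runs `gzip -t` on both files, then counts records as lines / 4 and compares the two counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check one pair of gzipped FASTQ files.
check_fastq_pair() {
    local r1="$1" r2="$2"
    local f n1 n2
    for f in "$r1" "$r2"; do
        # gzip -t exits non-zero if the file is truncated, corrupt or not gzip at all
        if ! gzip -t "$f" 2>/dev/null; then
            echo "CORRUPT: $f"
            return 1
        fi
    done
    # Each FASTQ record spans 4 lines, so number of reads = lines / 4
    n1=$(( $(gzip -cd "$r1" | wc -l) / 4 ))
    n2=$(( $(gzip -cd "$r2" | wc -l) / 4 ))
    if [ "$n1" -ne "$n2" ]; then
        echo "MISMATCH: $r1 has $n1 reads, $r2 has $n2 reads"
        return 1
    fi
    echo "OK: $n1 read pairs"
}
```

For example, `check_fastq_pair Cyto-Dp-T1_R1_001.fastq.gz Cyto-Dp-T1_R2_001.fastq.gz` would print `OK: …` for a healthy pair. Note that it decompresses each file in full, so it takes a while on large inputs.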

I then tried a new execution without specifying the barcode sequence, and the result is the same: TRIMGALORE runs for days, eventually stops, and the ‘error’ button appears next to my samples.

Here is a copy of what is displayed in the log files:

#!/bin/bash -euo pipefail   
[ ! -f  Cyto-Dp-1_1.fastq.gz ] && ln -s Cyto-Dp-T1_R1_001.fastq.gz Cyto-Dp-1_1.fastq.gz   
[ ! -f  Cyto-Dp-1_2.fastq.gz ] && ln -s Cyto-Dp-T1_R2_001.fastq.gz Cyto-Dp-1_2.fastq.gz   
trim_galore \   
    --fastqc_args '-t 12' \   
    --cores 8 \   
    --paired \   
    --gzip \   
    Cyto-Dp-1_1.fastq.gz \   
    Cyto-Dp-1_2.fastq.gz   
   
cat <<-END_VERSIONS > versions.yml   
"NFCORE_RNASEQ:RNASEQ:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:TRIMGALORE":   
    trimgalore: $(echo $(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*$//')   
    cutadapt: $(cutadapt --version)   
END_VERSIONS   
pigz 2.6   
Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying   
Path to Cutadapt set as: 'cutadapt' (default)   
Cutadapt seems to be working fine (tested command 'cutadapt --version')   
Cutadapt version: 3.4   
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)   
Letting the (modified) Cutadapt deal with the Python version instead   
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores   
   
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)   
   
   
   
AUTO-DETECTING ADAPTER TYPE   
===========================   
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> Cyto-Dp-1_1.fastq.gz <<)   
   
Found perfect matches for the following adapter sequences:   
Adapter type	Count	Sequence	Sequences analysed	Percentage   
Illumina	63605	AGATCGGAAGAGC	1000000	6.36   
Nextera	46	CTGTCTCTTATA	1000000	0.00   
smallRNA	5	TGGAATTCTCGG	1000000	0.00   
Using Illumina adapter for trimming (count: 63605). Second best hit was Nextera (count: 46)   
   
Writing report to 'Cyto-Dp-1_1.fastq.gz_trimming_report.txt'   
   
SUMMARISING RUN PARAMETERS   
==========================   
Input filename: Cyto-Dp-1_1.fastq.gz   
Trimming mode: paired-end   
Trim Galore version: 0.6.7   
Cutadapt version: 3.4   
Python version: could not detect   
Number of cores used for trimming: 8   
Quality Phred score cutoff: 20   
Quality encoding type selected: ASCII+33   
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)   
Maximum trimming error rate: 0.1 (default)   
Minimum required adapter overlap (stringency): 1 bp   
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp   
Running FastQC on the data once trimming has completed   
Running FastQC with the following extra arguments: '-t 12'   
Output file(s) will be GZIP compressed   
   
Cutadapt seems to be fairly up-to-date (version 3.4). Setting -j 8   
Writing final adapter and quality trimmed output to Cyto-Dp-1_1_trimmed.fq.gz   
   
   
  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file Cyto-Dp-1_1.fastq.gz <<<    
10000000 sequences processed   
20000000 sequences processed   
30000000 sequences processed   
40000000 sequences processed   
50000000 sequences processed   
60000000 sequences processed   

Any idea what could be happening?

I have tried the ‘re-try’ option after the execution stopped, but without much success.

Why can’t these files be trimmed? 😭

Thank you very much in advance for your help.

Best,

Zeinab

Hi Zeinab, this can sometimes happen with Trim Galore specifically, when the server is under memory pressure (or, more precisely, open-pipe pressure). Apologies, as I thought I had resolved this for good earlier this year. I’ve cleared some of the open pipes on the execution machine; could you try again? If you get the same issue again, could you try two runs of 18 samples each instead?

Hi Sam!

Thank you very much for looking into this,

I launched a new analysis, fingers crossed!

Cheers,

Zeinab

Hi again,

Well, I tried the full set once again and it crashed again. I then split the set and ran only half of my samples, and it still crashed; this time I also get other errors in addition to TRIMGALORE…

TRIMGALORE log (crashed for 3 samples out of 8 after 12 h):

pigz 2.6   
Using an excessive number of cores has a diminishing return! It is recommended not to exceed 8 cores per trimming process (you asked for 8 cores). Please consider re-specifying   
Path to Cutadapt set as: 'cutadapt' (default)   
Cutadapt seems to be working fine (tested command 'cutadapt --version')   
Cutadapt version: 3.4   
Could not detect version of Python used by Cutadapt from the first line of Cutadapt (but found this: >>>#!/bin/sh<<<)   
Letting the (modified) Cutadapt deal with the Python version instead   
Parallel gzip (pigz) detected. Proceeding with multicore (de)compression using 8 cores   
   
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)   
   
   
   
AUTO-DETECTING ADAPTER TYPE   
===========================   
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> Cyto-Dm-1_1.fastq.gz <<)   
   
Found perfect matches for the following adapter sequences:   
Adapter type	Count	Sequence	Sequences analysed	Percentage   
Illumina	61883	AGATCGGAAGAGC	1000000	6.19   
Nextera	22	CTGTCTCTTATA	1000000	0.00   
smallRNA	3	TGGAATTCTCGG	1000000	0.00   
Using Illumina adapter for trimming (count: 61883). Second best hit was Nextera (count: 22)   
   
Writing report to 'Cyto-Dm-1_1.fastq.gz_trimming_report.txt'   
   
SUMMARISING RUN PARAMETERS   
==========================   
Input filename: Cyto-Dm-1_1.fastq.gz   
Trimming mode: paired-end   
Trim Galore version: 0.6.7   
Cutadapt version: 3.4   
Python version: could not detect   
Number of cores used for trimming: 8   
Quality Phred score cutoff: 20   
Quality encoding type selected: ASCII+33   
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)   
Maximum trimming error rate: 0.1 (default)   
Minimum required adapter overlap (stringency): 1 bp   
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp   
Running FastQC on the data once trimming has completed   
Running FastQC with the following extra arguments: '-t 12'   
Output file(s) will be GZIP compressed   
   
Cutadapt seems to be fairly up-to-date (version 3.4). Setting -j 8   
Writing final adapter and quality trimmed output to Cyto-Dm-1_1_trimmed.fq.gz   
   
   
  >>> Now performing quality (cutoff '-q 20') and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file Cyto-Dm-1_1.fastq.gz <<<    
10000000 sequences processed   
20000000 sequences processed   
30000000 sequences processed   
40000000 sequences processed   
50000000 sequences processed   
60000000 sequences processed   

And now I also get problems with MultiQC after a 1-minute run:

/// MultiQC 🔍 | v1.14   
   
|           multiqc | Only using modules: custom_content, fastqc, cutadapt, fastp, sortmerna, star, hisat2, rsem, salmon, samtools, picard, preseq, rseqc, qualimap   
|           multiqc | Search path : /media/storage/production/executions/409031514152529105/work/8e/be61d48ac2014908cfec04ca4b4fa1   
|            report | Skipping 17 file search patterns   
|         searching | ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 517/517     
|    custom_content | nf-core-rnaseq-methods-description: Found 1 sample (html)   
|    custom_content | nf-core-rnaseq-summary: Found 1 sample (html)   
|    custom_content | software_versions: Found 1 sample (html)   
|    custom_content | DupInt: Found 9 General Statistics columns   
|    custom_content | dupradar: Found 9 samples (linegraph)   
|    custom_content | biotype_counts: Found 9 samples (bargraph)   
|    custom_content | biotype-gs: Found 9 General Statistics columns   
|    custom_content | star_salmon_deseq2_clustering: Found 9 samples (heatmap)   
|    custom_content | star_salmon_deseq2_pca: Found 9 samples (scatter)   
|            picard | Found 9 MarkDuplicates reports   
|          qualimap | Found 9 RNASeq reports   
|             rseqc | Found 9 read_distribution reports   
|             rseqc | Found 9 inner_distance reports   
|             rseqc | Found 9 read_duplication reports   
|             rseqc | Found 9 junction_annotation reports   
|             rseqc | Found 9 junction_saturation reports   
|             rseqc | Found 9 infer_experiment reports   
|             rseqc | Found 9 bam_stat reports   
|          samtools | Found 9 stats reports   
|          samtools | Found 9 flagstat reports   
|          samtools | Found 9 idxstats reports   
|              star | Found 9 reports   
|            fastqc | Found 18 reports   
|          cutadapt | Found 18 reports   
|            fastqc | Found 18 reports   
|           multiqc | Compressing plot data   
|           multiqc | Report      : multiqc_report.html   
|           multiqc | Data        : multiqc_data   
|           multiqc | Plots       : multiqc_plots   
|           multiqc | MultiQC complete   

What is very odd is that everything looks ‘fine’, or is being processed as it should… yet I get the errors and no result files at the end…

I could potentially run the samples in batches of 6, but at this point I’m not sure it would make a difference…

Cheers,

Zeinab

Hi Zeinab, was Sam able to resolve this for you, or is it still an issue? Please do email me at charlotte.capitanchik@kcl.ac.uk if this is still a problem.