Hello Flow Community,
This is my first time working with the Nextflow CLIP-seq pipeline, and I cannot get my organism’s genome through the Prepare CLIP-Seq Genome [1.1] module.
The problem seems to come from the GTF file I have for my organism, because the following three processes fail on every run. Their error output is reproduced under each process name:
CLIPSEQ_FIND_LONGEST_TRANSCRIPT
There are 0 protein coding transcripts
These belong to 0 genes
Traceback (most recent call last):
File "/media/storage/production/executions/445220002467081580/work/66/ae9eaafed5b7ff2a1f2a1509d16867/.command.sh", line 113, in <module>
main(args.process_name, args.gtf, args.output)
File "/media/storage/production/executions/445220002467081580/work/66/ae9eaafed5b7ff2a1f2a1509d16867/.command.sh", line 87, in main
gtf_output[-1] = gtf_output[-1].strip("\n")
IndexError: list index out of range
ICOUNT_SEG_GTF
Executing the following command: iCount segment Vibrio_fischeri_ES114_bracketsremoved.cmd.gtf Vibrio_fischeri_ES114_bracketsremoved_seg.gtf GCA_000011805.1_ASM1180v1_genomic.fasta.fai
Input parameters for function 'get_segments' in iCount.genomes.segment
annotation: Vibrio_fischeri_ES114_bracketsremoved.cmd.gtf
segmentation: Vibrio_fischeri_ES114_bracketsremoved_seg.gtf
fai: GCA_000011805.1_ASM1180v1_genomic.fasta.fai
report_progress: False
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/iCount/cli.py", line 448, in main
result_object = func(**args)
File "/usr/local/lib/python3.9/site-packages/iCount/genomes/segment.py", line 1016, in get_segments
process_gene(gene_content)
File "/usr/local/lib/python3.9/site-packages/iCount/genomes/segment.py", line 1003, in process_gene
gene_content[id_] = _process_transcript_group(transcript_group)
File "/usr/local/lib/python3.9/site-packages/iCount/genomes/segment.py", line 755, in _process_transcript_group
assert exons
AssertionError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/iCount-Mini", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.9/site-packages/iCount/cli.py", line 456, in main
exception_message = exception.args[0]
IndexError: tuple index out of range
ICOUNT_SEG_FILTGTF
Reading annotation file.
Number of entries in input annotation: 15788
Checking for basic flag...
Basic flag available.
3 entries flagged as basic.
Number of entries after filtering for tag "basic": 3988
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 0
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/media/storage/production/executions/445220002467081580/work/71/a7bfa84df578d1e7f764d949538c00/.command.sh", line 97, in <module>
main(args.process_name, args.gtf, args.output)
File "/media/storage/production/executions/445220002467081580/work/71/a7bfa84df578d1e7f764d949538c00/.command.sh", line 69, in main
gene_ids = df_TSL["annotations"].str.split(";", n=1, expand=True)[0].unique().tolist()
File "/usr/local/lib/python3.10/site-packages/pandas/core/frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "/usr/local/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 0
I have reproduced the header and first three records of my GTF file below in case that is helpful:
#gtf-version 2.2
#!genome-build ASM1180v1
#!genome-build-accession NCBI_Assembly:GCA_000011805.1
CP000020.2 Genbank gene 313 747 . - . gene_id "VF_0001"; transcript_id ""; gbkey "Gene"; gene "mioC"; gene_biotype "protein_coding"; locus_tag "VF_0001"; old_locus_tag "VF0001";
CP000020.2 Genbank CDS 316 747 . - 0 gene_id "VF_0001"; transcript_id "unassigned_transcript_1"; gbkey "CDS"; gene "mioC"; locus_tag "VF_0001"; product "FMN-binding protein MioC"; protein_id "AAW84496.1"; transl_table "11"; exon_number "1";
CP000020.2 Genbank start_codon 745 747 . - 0 gene_id "VF_0001"; transcript_id "unassigned_transcript_1"; gbkey "CDS"; gene "mioC"; locus_tag "VF_0001"; product "FMN-binding protein MioC"; protein_id "AAW84496.1"; transl_table "11"; exon_number "1";
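The records above are of type gene, CDS, and start_codon; none of them is an exon or transcript record, which may be related to the `assert exons` failure in ICOUNT_SEG_GTF. A quick way to check the whole file is to tally the feature type in column 3. The following is only a minimal sketch, not part of the pipeline: the function name `count_gtf_features` is hypothetical, the sample lines are shortened versions of the records above, and it assumes the file is tab-delimited as GTF normally is.

```python
from collections import Counter


def count_gtf_features(lines):
    """Tally the feature type (column 3) of every non-comment GTF line."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header/comment lines such as "#gtf-version 2.2".
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; ignore malformed lines.
        if len(fields) < 9:
            continue
        counts[fields[2]] += 1
    return counts


# Shortened, tab-delimited versions of the three records shown above.
sample = [
    "#gtf-version 2.2\n",
    'CP000020.2\tGenbank\tgene\t313\t747\t.\t-\t.\tgene_id "VF_0001"; transcript_id "";\n',
    'CP000020.2\tGenbank\tCDS\t316\t747\t.\t-\t0\tgene_id "VF_0001"; transcript_id "unassigned_transcript_1";\n',
    'CP000020.2\tGenbank\tstart_codon\t745\t747\t.\t-\t0\tgene_id "VF_0001"; transcript_id "unassigned_transcript_1";\n',
]

counts = count_gtf_features(sample)
print(dict(counts))
# A Counter returns 0 for feature types that never occur.
print("exon records:", counts["exon"])
print("transcript records:", counts["transcript"])
```

To run it on the real file, pass an open file handle instead of the sample list, for example `count_gtf_features(open("my_annotation.gtf"))`, where the file name is a placeholder.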
Best,
Jacob