Adding new Genomes to Flow

Hi Flow team,

Could you add the “Arabidopsis thaliana” genome to Flow?

It’d be useful to be able to import or upload any new genome. Thanks.

Kind regards,
Paulo

Hi Paulo

We have added Arabidopsis thaliana to Flow. As you can see, preparing the genome for RNA-Seq worked well: Flow

However, preparing the genome for CLIP produced errors that will need to be resolved, because the GTF file is non-standard and contains no “biotype” tags for genes: Flow
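
In case it helps when choosing a replacement file, here is a minimal sketch of how one might check a GTF for gene records without a biotype attribute. The function name `genes_missing_biotype` and the default `key_pattern` are hypothetical, and the exact attribute key the CLIP preparation expects (for example `gene_biotype`) is an assumption that should be confirmed before relying on the result.

```python
import re

def genes_missing_biotype(gtf_lines, key_pattern="biotype"):
    """Return gene_ids of 'gene' records that have no attribute key
    containing key_pattern (assumed tag name, adjust as needed)."""
    missing = []
    for line in gtf_lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Only look at well-formed 'gene' records (9 columns, feature == gene).
        if len(cols) < 9 or cols[2] != "gene":
            continue
        # Collect key "value" pairs from the attribute column.
        attrs = dict(re.findall(r'(\S+)\s+"([^"]*)"', cols[8]))
        if not any(key_pattern in key for key in attrs):
            missing.append(attrs.get("gene_id", "?"))
    return missing
```

It can be run on a file with `genes_missing_biotype(open("annotation.gtf"))`, where the file name is a placeholder; an empty list means every gene record carries some key containing "biotype".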

If you have a working GTF file, please do share it with us, as that will speed things up for you.

Best,
Charlotte

Hi Charlotte,

Thank you for adding this genome, and for already trying to prepare it for CLIP.

I will do as suggested and obtain a GTF file containing the “biotype” tag for genes.

Best,
Paulo

Hi Charlotte,

I’ve uploaded a GTF file for Arabidopsis, and I can see it has the “biotype” tag for genes.
Could you check if this file can be used to generate a genome for CLIP? Thank you.

Kind regards,
Paulo

Hi Paulo

Thanks for this!! It is better, but it is still causing an issue with iCount segment that I can’t figure out: Flow

The exact error is:

Executing the following command: iCount segment Arabidopsis_thaliana.TAIR10.59_bracketsremoved.cmd.gtf Arabidopsis_thaliana_seg.gtf Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.fai   
Input parameters for function 'get_segments' in iCount.genomes.segment   
    annotation: Arabidopsis_thaliana.TAIR10.59_bracketsremoved.cmd.gtf   
    segmentation: Arabidopsis_thaliana_seg.gtf   
    fai: Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.fai   
    report_progress: False   
[ValueError] need more than 1 value to unpack   
  File "/usr/local/lib/python3.9/site-packages/iCount/cli.py", line 448, in main   
    result_object = func(**args)   
   
  File "/usr/local/lib/python3.9/site-packages/iCount/genomes/segment.py", line 1015, in get_segments   
    for gene_content in _get_gene_content(annotation, chromosomes, report_progress):   
   
  File "/usr/local/lib/python3.9/site-packages/iCount/genomes/segment.py", line 906, in _get_gene_content   
    if interval.attrs['gene_id'] == current_gene:   
   
  File "pybedtools/cbedtools.pyx", line 392, in pybedtools.cbedtools.Interval.attrs.__get__   
   
  File "pybedtools/cbedtools.pyx", line 180, in pybedtools.cbedtools.Attributes.__init__   

I checked, and every GTF line has a gene_id, and there are no lone gene_ids that appear on only one line.
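
Since the traceback ends inside the pybedtools attribute parser, one thing worth ruling out is an attribute field that cannot be split into a key and a value (for example a bare flag, or a semicolon inside a quoted value). The sketch below is only an approximation: it assumes the attribute column is split on `;` and each field on its first whitespace, which is my guess at the behaviour behind the "need more than 1 value to unpack" message rather than a copy of the pybedtools logic. The function name `find_unsplittable_attributes` is hypothetical.

```python
def find_unsplittable_attributes(gtf_lines):
    """Return (line_number, offending_text) for attribute fields that do not
    split into a key and a value. Approximates, under stated assumptions,
    the kind of field that could trigger an unpacking error."""
    problems = []
    for line_no, line in enumerate(gtf_lines, start=1):
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            problems.append((line_no, "expected 9 tab-separated columns"))
            continue
        # Assumption: fields are separated by ';', key and value by whitespace.
        for field in cols[8].split(";"):
            field = field.strip()
            if not field:
                continue
            if len(field.split(None, 1)) < 2:
                problems.append((line_no, field))
    return problems
```

Any line numbers it reports would be the first places I would look; an empty list would suggest the cause lies elsewhere.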

This will require some deeper investigation, I’m afraid.

Best,
Charlotte

Hi Charlotte,

Thank you for the troubleshooting. I was wondering if you could try this GFF3 file instead?

Best,
Paulo