Dear Flow Team,
I am trying to run the prepare CLIP-seq genome workflow for the Human GRCh37 genome and am facing errors at the CLIPSEQ_FILTER_GTF step.
I am running the latest version of the pipeline. Is that OK, or do I need to select an older version of the workflow?
Thank you,
Fiona
Hi Fiona,
Which annotation version are you using? I think we have found v105 of the Ensembl annotation to be problematic.
Can you please post the log of the process and an error trace?
In any case, we are working on a new release, due in the coming weeks, that will either make filtering work with all annotations or make filtering optional.
Klara
Hi Klara,
Thank you for your reply.
I am not sure which Ensembl annotation it is, as it's just the default for the Human GRCh37 genome, and I can't find any details of it in the data parameters.
It is failing at the filter GTF stage, though, so maybe this is the problem.
Here is the log:
The run is called “jovial_gauss” if that is helpful too.
Let me know if you need any more info!
Best wishes,
Fiona
Hi Fiona!
I reproduced your error. The filtering is failing because this version of the annotation GTF (Homo_sapiens.GRCh37.87.gtf) does not contain the “transcript_support_level” flag.
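If it helps to check a candidate GTF before using or uploading it, the following is a minimal Python sketch (not part of Flow or the clipseq pipeline) that counts how many "transcript" records carry the transcript_support_level attribute. It assumes a standard nine-column, tab-separated GTF (plain or gzipped) with attributes in the usual `key "value";` form; the script name check_tsl.py and the function names are hypothetical.

```python
import gzip
import sys


def open_gtf(path):
    """Open a plain-text or gzip-compressed GTF file for reading as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_tsl(lines, attribute="transcript_support_level"):
    """Count 'transcript' records and how many of them carry `attribute`.

    Returns a (transcripts_seen, transcripts_with_attribute) tuple.
    """
    seen = 0
    with_attr = 0
    for line in lines:
        # Skip header/comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "transcript":
            continue
        seen += 1
        # Column 9 holds attributes, e.g.: gene_id "X"; transcript_support_level "1";
        keys = [a.strip().split(" ", 1)[0] for a in fields[8].split(";") if a.strip()]
        if attribute in keys:
            with_attr += 1
    return seen, with_attr


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_tsl.py Homo_sapiens.GRCh37.87.gtf
    with open_gtf(sys.argv[1]) as handle:
        total, tagged = count_tsl(handle)
    print(f"{tagged} of {total} transcript records have transcript_support_level")
```

If the second number is 0, the file lacks the flag that the filtering step appears to rely on.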
I will coordinate with the developer team to implement a fix for this.
Nevertheless, you can still run a clipseq pipeline without the filtered annotation. You just need to substitute every file based on the filtered GTF with the corresponding file based on the unfiltered GTF.
To run a clipseq pipeline this way, select your “prepare genome” execution (the one with the failed FILTER_GTF process). The files that exist will be filled in automatically, and you can specify the missing files manually, like so:
- Filtered GTF: Homo_sapiens.GRCh37.87_bracketsremoved.cmd.gtf (this is the unfiltered annotation file)
- Segmented filtered GTF: Homo_sapiens_seg.gtf (same file as is entered automatically for Segmented GTF)
- Segmented resolved filtered GTF: Homo_sapiens_seg.gtf (same file as is entered automatically for Segmented GTF)
- Segmented resolved genic filtered GTF: Homo_sapiens_seg.gtf (same file as is entered automatically for Segmented GTF)
- Filtered regions GTF: Homo_sapiens_regions.gtf.gz (same file as is entered automatically for Regions GTF)
- Filtered resolved regions GTF: Homo_sapiens_regions.gtf.gz (same file as is entered automatically for Regions GTF)
- Filtered resolved regions genic GTF: Homo_sapiens_regions.gtf.gz (same file as is entered automatically for Regions GTF)
Hi Klara,
Great, thank you for taking a look. I can certainly re-run the analysis as you suggest, and hopefully that will sort the problem in the meantime.
Best wishes,
Fiona
Hi Klara,
I have tried to re-run as you suggested, but for the last three parameters I could not see the Homo_sapiens_regions.gtf.gz file in the dropdown list, and there is no way to type a file name in, so I have used the following:
Do you think this is sensible?
Thank you, Fiona
Hi Klara, do you know if any hg37 GTFs have the “transcript_support_level” flag? If so, I can upload one for use.
Hi Charlotte and Klara,
I tried to run one of my samples as per above, and it failed at the Peka step, presumably because the wrong file was used in the process.
Are there any other workarounds you would suggest that could allow me to run this?
Thank you!
Fiona
Hi Fiona
I think that's because, in your screenshot, you used the segmented GTF where the regions GTF is required.
If you can find an hg37 Ensembl or GENCODE GTF file with the “transcript_support_level” flag, I can upload it to Flow to help you out.
Hi Charlotte,
Yes, I tried to add the regions GTF, but it wasn't available for selection in the list.
I'll have a look for the GTF you suggest just now.
Thank you!
Fiona
These are the only potentially relevant files I could see on GENCODE. I'm not sure whether the comprehensive GTF contains the flag needed…