Uploading large files via Python API

I’m trying to upload multiplexed fastq files to the Flow web server, but it gives me an error every time. I saw in the documentation that there is another way of doing it, through the Python API. So I tried running it on CAMP/NEMO via conda, but I get the following error when I try to install the flowbio package (see the output below). I’d appreciate it if someone could help me with this, as I’ve got a bunch of iCLIPs and RNA-seqs to analyse. Thanks!

(flow_upload) [huseyna@login001 home]$ pip install git+https://github.com/goodwright/flowbio.git
Collecting git+https://github.com/goodwright/flowbio.git
  Cloning https://github.com/goodwright/flowbio.git to /tmp/pip-req-build-dghmvluh
  Running command git clone --quiet https://github.com/goodwright/flowbio.git /tmp/pip-req-build-dghmvluh
  Resolved https://github.com/goodwright/flowbio.git to commit 13cb634016500ce9f7b4a5fc1ca1389b382ed96c
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-dghmvluh/setup.py", line 3, in <module>
          with open("README.rst") as f:
      FileNotFoundError: [Errno 2] No such file or directory: 'README.rst'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Hi - the issue with the GitHub install should be resolved now. You can also install the library with pip install flowbio.

Do you remember what the error was when uploading via the web interface?

Thanks, Sam! I did manage to install the package. However, when I run the script for a multiplexed file, it gives me the following error:

python flow_script.py 
Traceback (most recent call last):
  File "/nemo/project/home/huseyna/flow_script.py", line 7, in <module>
    lane = client.upload_lane(
AttributeError: 'Client' object has no attribute 'upload_lane'

Here’s the script:

import flowbio

client = flowbio.Client()
client.login("xxxxxxxxx", "xxxxxxx")

# Upload lane
lane = client.upload_lane(
    "HUS6754A1-10_S178_L002",
    "/camp/home/users/huseyna/DATA_ANALYSIS/CLIP/CLIP_WTvsKO_anno.xlsx",
"/camp/home/huseyna/data/STPs/babs/inputs/almaz.huseynova/asf/PM23435/240301_A01366_0527_BHNY2FDMXY/fastq/HUS6754A1-10_S178_L002_R1_001.fastq.gz",
    ignore_warnings=True,
    progress=True
)

print(lane)

Ah, this is code for an older version of the library - examples for the most recent version are in the README, but essentially there are separate upload_multiplexed and upload_annotation methods now.
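
Something roughly along these lines should work - note that the argument shapes and the progress keyword below are my guesses based on your old script, so do double-check against the README for the exact signatures:

import flowbio

client = flowbio.Client()
client.login("xxxxxxxxx", "xxxxxxx")

# Upload the multiplexed fastq and the annotation spreadsheet as two
# separate calls (argument names here are illustrative - see the README
# for the current signatures)
multiplexed = client.upload_multiplexed(
    "/camp/home/huseyna/data/STPs/babs/inputs/almaz.huseynova/asf/PM23435/240301_A01366_0527_BHNY2FDMXY/fastq/HUS6754A1-10_S178_L002_R1_001.fastq.gz",
    progress=True
)

annotation = client.upload_annotation(
    "/camp/home/users/huseyna/DATA_ANALYSIS/CLIP/CLIP_WTvsKO_anno.xlsx",
    progress=True
)

print(multiplexed, annotation)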

Thanks again! It’s uploading now :partying_face:

Also, re the web interface: I tried it again earlier and it failed again. The error just says “There was a problem uploading the data.” Not sure why - perhaps the file is too large (30GB).

Oh no, it crashed after 11%… I’m not sure what this error means:

Uploading HUS6754A11_S27_L001_R1_001.fastq.gz:  11%|███████████▎                                                                                            | 1871/17157 [03:59<32:32,  7.83it/s]
Traceback (most recent call last):
  File "/nemo/project/home/huseyna/flow_script.py", line 7, in <module>
    multiplexed = client.upload_multiplexed(
  File "/camp/home/huseyna/.conda/envs/flow_upload/lib/python3.9/site-packages/flowbio/upload.py", line 160, in upload_multiplexed
    resp = self.execute(UPLOAD_MULTIPLEXED, retries=retries, variables={
  File "/camp/home/huseyna/.conda/envs/flow_upload/lib/python3.9/site-packages/flowbio/client.py", line 27, in execute
    raise GraphQlError(resp["errors"])
flowbio.client.GraphQlError: [{'message': '{"error": "Not authorized"}', 'locations': [{'line': 4, 'column': 5}], 'path': ['uploadMultiplexedData']}]

Ah, I think this may be the same issue as with the web upload - in any case, I think I’ve resolved it on the backend now. It should complete if you try again.
