Uploading large files via Python API

I’m trying to upload multiplexed fastq files to the Flow web server, but it gives me an error every time. I saw in the documentation that there is another way of doing it, through the Python API. So I tried running it on CAMP/NEMO via conda, but I get the following error when I try to install the flowbio package (see the output below). I’d appreciate it if someone could help me with this, as I’ve got a bunch of iCLIPs and RNA-seqs to analyse. Thanks!

(flow_upload) [huseyna@login001 home]$ pip install git+https://github.com/goodwright/flowbio.git
Collecting git+https://github.com/goodwright/flowbio.git
  Cloning https://github.com/goodwright/flowbio.git to /tmp/pip-req-build-dghmvluh
  Running command git clone --quiet https://github.com/goodwright/flowbio.git /tmp/pip-req-build-dghmvluh
  Resolved https://github.com/goodwright/flowbio.git to commit 13cb634016500ce9f7b4a5fc1ca1389b382ed96c
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [6 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-dghmvluh/setup.py", line 3, in <module>
          with open("README.rst") as f:
      FileNotFoundError: [Errno 2] No such file or directory: 'README.rst'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Hi - the issue with the GitHub install should be resolved now. You can also install the library with pip install flowbio.

Do you remember what the error was when uploading via the web interface?

Thanks, Sam! I did manage to install the package. However, when I run the script for a multiplexed file, it gives me the following error:

python flow_script.py 
Traceback (most recent call last):
  File "/nemo/project/home/huseyna/flow_script.py", line 7, in <module>
    lane = client.upload_lane(
AttributeError: 'Client' object has no attribute 'upload_lane'

Here’s the script:

import flowbio

client = flowbio.Client()
client.login("xxxxxxxxx", "xxxxxxx")

# Upload lane
lane = client.upload_lane(
    "HUS6754A1-10_S178_L002",
    "/camp/home/users/huseyna/DATA_ANALYSIS/CLIP/CLIP_WTvsKO_anno.xlsx",
"/camp/home/huseyna/data/STPs/babs/inputs/almaz.huseynova/asf/PM23435/240301_A01366_0527_BHNY2FDMXY/fastq/HUS6754A1-10_S178_L002_R1_001.fastq.gz",
    ignore_warnings=True,
    progress=True
)

print(lane)

Ah, this is code for an older version of the library - examples for the most recent version are in the README, but essentially there are separate upload_multiplexed and upload_annotation methods now.
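
Something roughly along these lines should work - note that the argument shapes and the progress keyword below are my guesses based on your old script, so do double-check against the README for the exact signatures:

import flowbio

client = flowbio.Client()
client.login("xxxxxxxxx", "xxxxxxx")

# Upload the multiplexed fastq and the annotation spreadsheet as two
# separate calls (argument names here are illustrative - see the README
# for the current signatures)
multiplexed = client.upload_multiplexed(
    "/camp/home/huseyna/data/STPs/babs/inputs/almaz.huseynova/asf/PM23435/240301_A01366_0527_BHNY2FDMXY/fastq/HUS6754A1-10_S178_L002_R1_001.fastq.gz",
    progress=True
)

annotation = client.upload_annotation(
    "/camp/home/users/huseyna/DATA_ANALYSIS/CLIP/CLIP_WTvsKO_anno.xlsx",
    progress=True
)

print(multiplexed, annotation)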

Thanks again! It’s uploading now :partying_face:

Also, re the web interface: I tried it again earlier and it failed again. The error just says “There was a problem uploading the data.” Not sure why - perhaps the file is too large (30GB).

Oh no, it crashed after 11%… I’m not sure what this error means:

Uploading HUS6754A11_S27_L001_R1_001.fastq.gz:  11%|███████████▎                                                                                            | 1871/17157 [03:59<32:32,  7.83it/s]
Traceback (most recent call last):
  File "/nemo/project/home/huseyna/flow_script.py", line 7, in <module>
    multiplexed = client.upload_multiplexed(
  File "/camp/home/huseyna/.conda/envs/flow_upload/lib/python3.9/site-packages/flowbio/upload.py", line 160, in upload_multiplexed
    resp = self.execute(UPLOAD_MULTIPLEXED, retries=retries, variables={
  File "/camp/home/huseyna/.conda/envs/flow_upload/lib/python3.9/site-packages/flowbio/client.py", line 27, in execute
    raise GraphQlError(resp["errors"])
flowbio.client.GraphQlError: [{'message': '{"error": "Not authorized"}', 'locations': [{'line': 4, 'column': 5}], 'path': ['uploadMultiplexedData']}]

Ah, I think this may be the same issue as with the web upload - in any case, I think I’ve resolved it on the backend now. It should complete if you try again.
