Progressively upload large video files without compromising on speed

December 7, 2021 - Erikka Innes in Video upload, Python

If you work with video, you know that a gigantic video file has to be broken into chunks before you can send it to a server. We at api.video already offer a way for you to break up your video into chunks and send it to us, but today we're introducing a new way of uploading large files: Progressive Upload.

Instead of worrying about getting the bytes right in the header, or relying on our API clients to handle it for you (still a great choice if you already use a client), you can simplify each chunk's header so it only says which part of your video you're sending. Here's an example of what I mean:

Old header: Content-Range: bytes 0-1000000/200000000

Here you can see that the Content-Range header lists the range of bytes being sent from the video, followed by the total number of bytes. But there are a couple of issues with this. First, it's tricky to set up the header correctly (which is why the clients, which handle this for you, are great to use). Second, what if you want to upload a video whose size you don't know? It would also be great if you could upload video chunks concurrently and have that handled for you. Progressive Upload does this, and the new header looks like this:

New header: Content-Range: part 1/*

Every video chunk just receives a part number. Only the last chunk may be smaller than 5 MiB (5,242,880 bytes); every other chunk must be at least that size. With that explained, we're ready to try a code sample.
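To make the difference concrete, here's a minimal sketch that builds both kinds of Content-Range values for a file split into chunks. The helper names byte_range_headers and part_headers are hypothetical (they aren't part of any api.video client), and it assumes the 5 MiB minimum means 5,242,880 bytes. Notice that the byte-range version can't build even its first header without knowing the file's total size, while the part version only needs to know which chunk is the last one.

```python
MIN_PART_SIZE = 5 * 1024 * 1024  # assumed 5 MiB minimum for every part except the last


def byte_range_headers(total_size, chunk_size):
    """Old style: "bytes start-end/total" for each chunk (the end byte is inclusive).

    The total file size must be known before the first header can be built.
    """
    headers = []
    for start in range(0, total_size, chunk_size):
        end = min(start + chunk_size, total_size) - 1  # inclusive end byte
        headers.append("bytes %d-%d/%d" % (start, end, total_size))
    return headers


def part_headers(chunk_sizes):
    """Progressive Upload style: "part N/*" for each chunk, "part N/N" for the last.

    Raises ValueError if any chunk other than the last is below MIN_PART_SIZE.
    """
    if not chunk_sizes:
        raise ValueError("need at least one chunk")
    if any(size < MIN_PART_SIZE for size in chunk_sizes[:-1]):
        raise ValueError("only the last chunk may be smaller than 5 MiB")
    last = len(chunk_sizes)
    return ["part %d/%d" % (n, last) if n == last else "part %d/*" % n
            for n in range(1, last + 1)]


# A 13,200,000-byte file split into 6,000,000-byte chunks
print(byte_range_headers(13200000, 6000000))
print(part_headers([6000000, 6000000, 1200000]))
```

Both functions produce one header per chunk; the difference is what each one has to know up front.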

Prerequisites

For this project, you'll need:

(Optional) Familiarity with other ways to upload (it might make this tutorial easier, but this is written so you don't need the other tutorials)
An api.video account (you can sign up on the api.video website)
Python or a Python virtual environment to work with (recommended)
The Requests library for Python installed

Progressive Upload code sample

The code sample we're going to walk through looks like this:

# How to upload a large video that is over 200 MiB to api.video. (Though this script will also work for videos under 200 MiB if you want to test it out.)

import requests
import os 

# Set up variables for endpoints (we will create the third URL programmatically later)
auth_url = "https://ws.api.video/auth/api-key"
create_url = "https://ws.api.video/videos"

# Set up headers and payload for first authentication request
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json"
}

payload = {
    "apiKey": "your key here"
}

# Send the first authentication request to get a token. The token can be used for one hour with the rest of the API endpoints.
response = requests.request("POST", auth_url, json=payload, headers=headers)
response = response.json()
token = response.get("access_token")

# Set up headers for authentication - the rest of the endpoints use Bearer authentication.

auth_string = "Bearer " + token

headers2 = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": auth_string
}

# Create the video container payload. You can add more parameters if you like; see the docs at https://docs.api.video
payload2 = {
    "title": "Demo Vid from my Computer",
    "description": "Video upload test."
}

# Send the request to create the container, and retrieve the videoId from the response.
response = requests.request("POST", create_url, json=payload2, headers=headers2)
response = response.json()
videoId = response["videoId"]

# Create endpoint to upload video to - you have to add the videoId into the URL
upload_url = create_url + "/" + videoId + "/source"

# Set up the chunk size: how many bytes to read from the file each time you grab a new chunk.
# For a big upload, the recommendation is 50-80 MB (50000000-80000000 bytes). It's set to 6 MB (6000000 bytes) here
# so you can try this sample with a small file and see how it works. Every chunk except the last must be at least 5 MiB (5242880 bytes).

CHUNK_SIZE = 6000000

# This is our chunk reader. Each time the loop asks for data, it yields the next chunk of the file.
def read_in_chunks(file_object, chunk_size):
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data

# Upload your file by breaking it into chunks and sending each piece
def upload(file, url):
    content_path = os.path.abspath(file)

    # This next line is optional. You can grab the file size with os if you want, but Progressive Upload doesn't need it.
    # content_size = os.stat(content_path).st_size

    part_num = 1

    # "with" closes the file automatically once the upload finishes
    with open(content_path, "rb") as f:
        chunks = read_in_chunks(f, CHUNK_SIZE)
        chunk = next(chunks, None)

        while chunk is not None:
            # Read one chunk ahead: if there is no next chunk, this one is the last.
            # A size check alone misses a final chunk that is 5 MiB or larger
            # (for example, when the file size is an exact multiple of CHUNK_SIZE).
            next_chunk = next(chunks, None)
            if next_chunk is None:
                # "part N/N" tells api.video this is the final part and the upload is complete
                content_range = 'part %s/%s' % (part_num, part_num)
            else:
                # "part N/*" means more parts are coming and the total is still unknown
                content_range = 'part %s/*' % part_num

            headers = {
                'Content-Range': content_range,
                'Authorization': auth_string
            }
            r = requests.post(url, files={"file": chunk}, headers=headers)
            # Stop on a failed chunk instead of silently sending the rest
            r.raise_for_status()
            print(r.json())
            print("r: %s, Content-Range: %s" % (r, content_range))

            chunk = next_chunk
            part_num += 1

upload('Your_video_here.mp4', upload_url)

In the code sample, you're doing the following things:

Retrieving a token using your API key.
Setting up a video container to upload to. You upload in two steps: first you create a container with metadata about your video, then you retrieve the videoId for that container and reference it in the upload URL so your chunks go to the right place. You can only upload to a container once, so make sure everything is set up correctly; otherwise you'll have to start over with a new container.
Setting a variable, CHUNK_SIZE, to the size of each chunk of video we'll send. You can make it bigger to send fewer, larger requests, which can speed up the process.
Writing a small generator function, read_in_chunks, that reads a chunk of data from the video file and yields it. Because it uses yield, the function pauses after each chunk and resumes where it left off the next time we ask for data, so each pass of the loop gets the next chunk of the file.
Looping through the video file in the upload function and sending one chunk at a time with an appropriately labeled header. For Content-Range, we increase the part number by 1 each time and leave the total set as unknown, like part 1/*. If you know your video's size ahead of time, you can replace the asterisk with the final part number; this code sample assumes you don't know the size. For each chunk, the code also checks whether it's the final piece of the video. For that last piece, we change the total number of parts to match the part we're on (for example, part 3/3). This signals to api.video that we won't send any more chunks and that the upload is complete.
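The final-chunk labeling can be checked without touching the network. Here's a minimal sketch that assumes only an open binary file object; the generator name iter_parts is hypothetical. It reads one chunk ahead, so it knows a chunk is the last one even when the file size is an exact multiple of the chunk size, a case where a "smaller than 5 MiB" check alone never sees a short final chunk.

```python
import io


def iter_parts(file_object, chunk_size):
    """Yield (part_number, chunk, content_range) for each chunk of a binary file.

    Hypothetical helper: reads one chunk ahead so the final chunk is labeled
    "part N/N" even when the file size is an exact multiple of chunk_size.
    """
    part_num = 1
    chunk = file_object.read(chunk_size)
    while chunk:
        next_chunk = file_object.read(chunk_size)  # empty bytes means end of file
        if next_chunk:
            content_range = "part %d/*" % part_num  # more parts are coming
        else:
            content_range = "part %d/%d" % (part_num, part_num)  # final part
        yield part_num, chunk, content_range
        chunk = next_chunk
        part_num += 1


# 12 bytes in 4-byte chunks: three full chunks and no short final one
for part_num, chunk, content_range in iter_parts(io.BytesIO(b"abcdefghijkl"), 4):
    print(part_num, chunk, content_range)
```

In a real upload, each yielded chunk would be posted with its Content-Range value and the Authorization header, just like in the upload function.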