Introduction
In this tutorial, we’ll walk through building a Flask API that leverages OpenAI’s Whisper AI to transcribe audio files. The API will handle multiple files concurrently and provide real-time progress updates for each transcription task. By the end of this tutorial, you’ll have a fully functional Flask-based service capable of processing audio files efficiently while keeping users informed about the progress.
What is Whisper AI?
Whisper AI is an open-source automatic speech recognition (ASR) model developed by OpenAI. It is designed to convert spoken language into written text with remarkable accuracy across multiple languages. Whisper supports a wide range of tasks, including speech-to-text transcription, language detection, and translation. Its versatility makes it suitable for applications such as transcription services, voice assistants, meeting summaries, podcast indexing, and more.
One of the standout features of Whisper is its ability to handle diverse audio inputs, from clear studio recordings to noisy real-world environments. This robustness stems from its training on a vast dataset of multilingual and multitask audio data, enabling it to generalize well across different accents, dialects, and recording conditions.
Whisper Models and Their Uses
Whisper comes in several pre-trained model sizes, each offering a trade-off between performance and resource requirements:
Model size | Description |
---|---|
Tiny | The smallest and fastest model, ideal for resource-constrained environments or quick prototyping. It has fewer parameters and is less accurate but highly efficient. |
Base | A balanced model suitable for general-purpose transcription tasks. It offers a good mix of speed and accuracy. |
Small | Slightly larger and more accurate than the Base model, making it a solid choice for most use cases. |
Medium | A high-performance model that delivers excellent accuracy at the cost of increased computational demands. |
Large | The most accurate model, designed for scenarios where precision is critical. However, it requires significant computational resources and is best suited for powerful hardware like GPUs. |
Each model variant can be selected based on your specific needs. For example, if you’re building a lightweight transcription service for mobile devices, the Tiny model might suffice. On the other hand, if you’re working on a professional-grade transcription platform, the Large model would be more appropriate.
Benefits of Using Whisper AI
Benefit | Description |
---|---|
Multilingual Support | Whisper supports over 90 languages, making it a versatile tool for global applications. |
Open Source | Being open source, Whisper allows developers to customize and extend its capabilities to fit their unique requirements. |
High Accuracy | Its state-of-the-art architecture ensures reliable transcriptions even in challenging audio conditions. |
Ease of Use | With libraries available in Python, integrating Whisper into your projects is straightforward. |
In this tutorial, we’ll focus on using the Base model for transcription, but you can easily switch to other models depending on your needs.
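One way to keep the model choice easy to switch is to read it from an environment variable and validate it before loading. The sketch below assumes a hypothetical `WHISPER_MODEL` variable (Whisper itself does not read it) and uses the five sizes from the table above as lowercase names; the returned name would then be passed to `load_model`.

```python
import os

# Model sizes from the table above, written as lowercase names.
KNOWN_MODELS = ("tiny", "base", "small", "medium", "large")


def resolve_model_name(env=None, default="base"):
    """Return the model name from WHISPER_MODEL (hypothetical variable), or the default."""
    env = os.environ if env is None else env
    name = env.get("WHISPER_MODEL", default).strip().lower()
    if name not in KNOWN_MODELS:
        raise ValueError(f"Unknown Whisper model {name!r}; expected one of {KNOWN_MODELS}")
    return name


if __name__ == "__main__":
    # In app.py this could replace the hard-coded name: load_model(resolve_model_name())
    print(resolve_model_name({"WHISPER_MODEL": "small"}))
```

Failing early on an unknown name avoids discovering a typo only when the model download starts.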
Prerequisites
Before diving into the tutorial, ensure you have the following prerequisites in place:
- A server running Ubuntu or Debian
  - You need access to the root user or a user with sudo permissions.
  - Before you start, you should complete some basic configuration, including setting up a firewall. If you haven't done so, you can follow this guide to configure ufw (Uncomplicated Firewall).
- Basic knowledge
  - Familiarity with Python and Flask.
  - Understanding of REST APIs and how they work.
  - Basic command-line skills for interacting with the server.
- Software dependencies
  - Python 3.8 or higher, the Python package manager pip, venv, and ffmpeg already installed:
    python3 --version
    sudo apt install python3-pip
    sudo apt install python3-venv
    sudo apt install ffmpeg
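The software dependencies can also be checked from Python before going further. This is a minimal sketch that only covers the Python version and the presence of `ffmpeg` on the PATH; the function name `check_prerequisites` and its parameters are illustrative, not part of any library.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 8)
REQUIRED_TOOLS = ("ffmpeg",)


def check_prerequisites(which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    version = sys.version_info[:2] if version is None else version
    problems = []
    if version < REQUIRED_PYTHON:
        problems.append("Python 3.8 or higher is required, found %d.%d" % version)
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH
        if which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing:", problem)
```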
Minimum hardware requirements
- CPU: Ideally a recent Intel/AMD CPU with AVX-512 support, paired with DDR5 memory. Whisper AI performs better on modern CPUs with advanced instruction sets. To check whether your CPU reports AVX-512 and whether DDR5 memory is installed, run the following commands:
  lscpu | grep -o 'avx512[a-z0-9_]*' | sort -u
  sudo dmidecode -t memory | grep -i "DDR5"
- RAM: 16GB (minimum). Transcription tasks can be memory-intensive, especially when handling multiple files concurrently.
- Disk space: About 50GB of free disk space. This ensures you have enough room for uploaded audio files, transcription outputs, and logs.
- GPU: Not mandatory
By ensuring these prerequisites are met, you’ll have a smooth experience setting up and running the Flask API for audio transcription using Whisper AI. Let’s get started! 🚀
Step 1 - Setting Up the Environment
Before diving into the code, let’s set up the project structure and install the necessary dependencies.
Project Structure
Create the following directory structure:
flask-whisper-api/
├── app.py # Main Flask application
├── requirements.txt # Python dependencies
├── uploads/ # Folder to store uploaded audio files
└── transcriptions/ # Folder to store transcription results
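If you prefer to script this layout instead of creating it by hand, a short sketch with `pathlib` could look like the following. The helper name `create_project` is made up for illustration; it only creates missing folders and empty files and leaves existing ones untouched.

```python
from pathlib import Path


def create_project(root="flask-whisper-api"):
    """Create the project folders and empty placeholder files if they are missing."""
    root = Path(root)
    for folder in ("uploads", "transcriptions"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    for filename in ("app.py", "requirements.txt"):
        # touch() creates an empty file and does not overwrite existing content
        (root / filename).touch(exist_ok=True)
    return root
```

Calling `create_project()` from the directory where the project should live produces the tree shown above.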
Installing Dependencies
Create a requirements.txt file with the following content:
Flask==3.1.0
openai-whisper==20240930
Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install the dependencies:
pip install -r requirements.txt
Step 2 - Building the Flask API
Now, let’s dive into building the Flask API step by step. We’ll break down the process into smaller subsections, explaining each part of the code and its purpose. By the end, you’ll have a fully functional API that transcribes audio files and provides a download link for the transcription.
Edit app.py and add the code blocks described below.
Step 2.1 - Importing Required Libraries
We start by importing the necessary libraries. Each library serves a specific purpose in our application:
- Flask: A lightweight web framework for building APIs.
- os: Provides utilities for interacting with the operating system (e.g. file handling).
- threading: Enables concurrent execution of tasks.
- queue.Queue: Manages a queue of transcription tasks.
- whisper: The Whisper AI library for speech-to-text transcription.
- concurrent.futures.ThreadPoolExecutor: Handles multiple transcription tasks concurrently.
- time: Simulates progress updates during transcription.
from flask import Flask, request, jsonify
import os
import threading
from queue import Queue
from whisper import load_model
from concurrent.futures import ThreadPoolExecutor
import time
Step 2.2 - Setting Up Flask and Configurations
Here, we initialize the Flask app and configure key settings like upload and transcription directories. We also define the maximum number of concurrent workers and load the Whisper model.
app = Flask(__name__)
# Configuration
UPLOAD_FOLDER = 'uploads' # Folder to store uploaded audio files
TRANSCRIPTIONS_FOLDER = 'transcriptions' # Folder to store transcription results
MAX_WORKERS = 5 # Number of concurrent transcription tasks
# Create directories if they don't exist
os.makedirs(UPLOAD_FOLDER, exist_ok=True)
os.makedirs(TRANSCRIPTIONS_FOLDER, exist_ok=True)
# Global state for tracking progress
progress_queue = Queue() # Queue to manage transcription tasks
transcription_status = {} # Tracks the status of each transcription task
# Load the Whisper model (base model for general-purpose use)
model = load_model("base")
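The `transcription_status` dictionary is read by request handlers and written by worker threads at the same time. A minimal sketch of one way to guard such shared state with a lock follows; the `StatusStore` class and its method names are hypothetical and not part of the tutorial's code.

```python
import threading


class StatusStore:
    """Dictionary of task states guarded by a lock (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}

    def create(self, task_id):
        with self._lock:
            self._tasks[task_id] = {"status": "pending", "progress": 0}

    def update(self, task_id, **fields):
        with self._lock:
            self._tasks[task_id].update(fields)

    def snapshot(self, task_id):
        """Return a copy so callers cannot mutate the shared state, or None if unknown."""
        with self._lock:
            task = self._tasks.get(task_id)
            return dict(task) if task is not None else None


if __name__ == "__main__":
    store = StatusStore()
    store.create("demo")
    store.update("demo", status="in_progress", progress=40)
    print(store.snapshot("demo"))
```

Returning a copy from `snapshot` means a request handler can add fields to its response without changing what the workers see.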
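Step 2.3 - Creating the /upload Endpoint
This endpoint allows users to upload audio files. Once uploaded, the file is saved and a transcription ID is generated, which is used to track the progress of the transcription. The ID is derived from a hash of the saved file path, so uploading a file with the same name again yields the same ID while the server is running, and IDs change after a restart.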
@app.route('/upload', methods=['POST'])
def upload_file():
    """
    Handles file uploads and queues the file for transcription.
    Returns a unique transcription ID for progress tracking.
    """
    if 'file' not in request.files:
        return jsonify({"error": "No file part"}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400
    # Save the uploaded file
    file_path = os.path.join(UPLOAD_FOLDER, file.filename)
    file.save(file_path)
    # Generate a unique transcription ID
    transcription_id = str(hash(file_path))
    transcription_status[transcription_id] = {"status": "pending", "progress": 0}
    # Add the task to the queue
    progress_queue.put((transcription_id, file_path))
    return jsonify({
        "message": "File uploaded successfully",
        "transcription_id": transcription_id
    }), 200
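The upload handler above stores the file under the name sent by the client and derives the ID from that path. As a possible alternative, the sketch below generates a random ID with `uuid` and uses it as the stored file name, keeping only a short alphanumeric extension from the client's name. The helper `make_upload_target` is hypothetical; it is one way to avoid ID collisions and unexpected paths, not the tutorial's method.

```python
import os
import re
import uuid

# Accept only short alphanumeric extensions such as ".mp3" or ".wav".
_SAFE_SUFFIX = re.compile(r"\.[A-Za-z0-9]{1,8}")


def make_upload_target(upload_folder, client_filename):
    """Return (transcription_id, file_path) without reusing the client's file name."""
    transcription_id = uuid.uuid4().hex
    suffix = os.path.splitext(os.path.basename(client_filename))[1]
    if not _SAFE_SUFFIX.fullmatch(suffix):
        suffix = ""  # drop anything unexpected instead of trusting it
    return transcription_id, os.path.join(upload_folder, transcription_id + suffix)


if __name__ == "__main__":
    print(make_upload_target("uploads", "interview.mp3"))
```

In `upload_file`, the returned path could be passed to `file.save(...)` and the returned ID used as the key in `transcription_status`.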
Step 2.4 - Creating the /progress Endpoint
This endpoint allows users to check the progress of a specific transcription task using its unique transcription ID.
@app.route('/progress/<transcription_id>', methods=['GET'])
def get_progress(transcription_id):
    """
    Retrieves the progress of a transcription task.
    Returns the status, progress percentage, and download URL (if completed).
    """
    status = transcription_status.get(transcription_id)
    if not status:
        return jsonify({"error": "Invalid transcription ID"}), 404
    # Include download URL if transcription is completed
    if status["status"] == "completed":
        status["download_url"] = f"/download/{transcription_id}.txt"
    return jsonify(status), 200
Step 2.5 - Defining the Transcription Function
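This function performs the actual transcription and saves the result to a file. Note that the progress values are simulated: a timed loop counts up to 100% before the Whisper call starts, so the percentage shows that the task is running rather than how much audio has been processed. If the transcription succeeds, the function records a download URL for the result.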
def transcribe_audio(transcription_id, file_path):
    """
    Transcribes an audio file using Whisper AI.
    Updates progress periodically and saves the result to a file.
    """
    try:
        transcription_status[transcription_id]["status"] = "in_progress"
        # Simulate progress updates
        for i in range(1, 11):
            time.sleep(1)  # Simulating transcription work
            transcription_status[transcription_id]["progress"] = i * 10
        # Perform actual transcription
        result = model.transcribe(file_path)
        transcription_text = result["text"]
        # Save transcription to file (UTF-8 so non-ASCII text is written reliably)
        output_filename = f"{transcription_id}.txt"
        output_path = os.path.join(TRANSCRIPTIONS_FOLDER, output_filename)
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(transcription_text)
        # Update status and include download URL
        transcription_status[transcription_id]["status"] = "completed"
        transcription_status[transcription_id]["output_path"] = output_path
        transcription_status[transcription_id]["download_url"] = f"/download/{output_filename}"
    except Exception as e:
        transcription_status[transcription_id]["status"] = "failed"
        transcription_status[transcription_id]["error"] = str(e)
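The function above saves only `result["text"]`. Assuming the result dictionary also carries a `segments` list whose entries have `start`, `end`, and `text` keys (worth confirming against the Whisper version you install), a timestamped transcript could be produced with a sketch like this. Both helper names are made up for illustration.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def format_segments(result):
    """Turn a transcription result into one timestamped line per segment."""
    lines = []
    # Assumed structure: result["segments"] is a list of dicts with start/end/text
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return "\n".join(lines)
```

The returned string could be written to the output file in place of, or next to, the plain text.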
Step 2.6 - Handling Concurrent Tasks with a Worker
We use a background worker thread with a ThreadPoolExecutor to handle multiple transcription tasks concurrently. This ensures that up to MAX_WORKERS tasks run simultaneously.
def worker():
    """
    Background worker to process transcription tasks concurrently.
    Uses a ThreadPoolExecutor to manage multiple tasks.
    """
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        while True:
            transcription_id, file_path = progress_queue.get()
            executor.submit(transcribe_audio, transcription_id, file_path)
            progress_queue.task_done()

# Start the background worker thread
threading.Thread(target=worker, daemon=True).start()
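To see the queue-plus-pool pattern in isolation, here is a small stand-alone sketch that can be run without Flask or Whisper. It replaces the transcription with a short sleep and adds a `STOP` sentinel so the loop can end; the sentinel and the names `fake_transcribe`, `dispatcher`, and `run_demo` are additions for the demo and do not appear in the worker above.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from queue import Queue

STOP = object()  # sentinel that tells the dispatcher to exit (demo-only addition)


def fake_transcribe(task_id, results):
    """Stand-in for transcribe_audio: sleeps briefly, then records completion."""
    time.sleep(0.05)
    results[task_id] = "completed"


def dispatcher(task_queue, results, max_workers=5):
    """Pull task IDs from the queue and hand each one to the thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        while True:
            task_id = task_queue.get()
            if task_id is STOP:
                break
            executor.submit(fake_transcribe, task_id, results)
    # Leaving the "with" block waits for all submitted tasks to finish.


def run_demo(count=8):
    task_queue, results = Queue(), {}
    for i in range(count):
        task_queue.put(f"task-{i}")
    task_queue.put(STOP)
    dispatcher(task_queue, results)
    return results


if __name__ == "__main__":
    print(run_demo())
```

The dispatcher thread itself does very little work: it only moves items from the queue into the pool, and the pool's `max_workers` setting is what limits how many tasks run at once.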
Step 2.7 - Adding a /download Endpoint
This endpoint allows users to download the transcription file once it’s ready.
@app.route('/download/<filename>', methods=['GET'])
def download_file(filename):
    """
    Serves the transcription file for download.
    """
    file_path = os.path.join(TRANSCRIPTIONS_FOLDER, filename)
    if not os.path.exists(file_path):
        return jsonify({"error": "File not found"}), 404
    # Read as UTF-8 to match how the transcription was written
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    return jsonify({
        "filename": filename,
        "content": content
    }), 200
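Because the file name in this route comes from the URL, it can be worth confirming that the resolved path is a regular file inside the transcriptions folder before opening it. The following is a sketch of such a check using only the standard library; `resolve_download_path` is a hypothetical helper, not something Flask provides.

```python
import os


def resolve_download_path(folder, filename):
    """Return the absolute path of filename inside folder, or None if it is not a file there."""
    base = os.path.realpath(folder)
    candidate = os.path.realpath(os.path.join(base, filename))
    # Reject paths that resolve to the folder itself or to a location outside it
    if candidate == base or os.path.commonpath([base, candidate]) != base:
        return None
    return candidate if os.path.isfile(candidate) else None
```

In `download_file`, a `None` result would map to the existing 404 response.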
Step 2.8 - Running the Flask App
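Finally, we add the code to run the Flask app in debug mode. This starts the development server and makes the API accessible at http://127.0.0.1:5000. Debug mode is intended for local development only; turn it off before exposing the service.
if __name__ == '__main__':
    app.run(debug=True)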
Here’s the complete code for your Flask app:
from flask import Flask, request, jsonify
import os
import threading
from queue import Queue
from whisper import load_model
from concurrent.futures import ThreadPoolExecutor
import time

app = Flask(__name__)

# Configuration
UPLOAD_FOLDER = 'uploads'
TRANSCRIPTIONS_FOLDER = 'transcriptions'
MAX_WORKERS = 5

os.makedirs(UPLOAD_FOLDER, exist_ok=True)
os.makedirs(TRANSCRIPTIONS_FOLDER, exist_ok=True)

progress_queue = Queue()
transcription_status = {}

model = load_model("base")


@app.route('/upload', methods=['POST'])
def upload_file():
    if 'file' not in request.files:
        return jsonify({"error": "No file part"}), 400
    file = request.files['file']
    if file.filename == '':
        return jsonify({"error": "No selected file"}), 400
    file_path = os.path.join(UPLOAD_FOLDER, file.filename)
    file.save(file_path)
    transcription_id = str(hash(file_path))
    transcription_status[transcription_id] = {"status": "pending", "progress": 0}
    progress_queue.put((transcription_id, file_path))
    return jsonify({
        "message": "File uploaded successfully",
        "transcription_id": transcription_id
    }), 200


@app.route('/progress/<transcription_id>', methods=['GET'])
def get_progress(transcription_id):
    status = transcription_status.get(transcription_id)
    if not status:
        return jsonify({"error": "Invalid transcription ID"}), 404
    if status["status"] == "completed":
        status["download_url"] = f"/download/{transcription_id}.txt"
    return jsonify(status), 200


@app.route('/download/<filename>', methods=['GET'])
def download_file(filename):
    file_path = os.path.join(TRANSCRIPTIONS_FOLDER, filename)
    if not os.path.exists(file_path):
        return jsonify({"error": "File not found"}), 404
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
    return jsonify({
        "filename": filename,
        "content": content
    }), 200


def transcribe_audio(transcription_id, file_path):
    try:
        transcription_status[transcription_id]["status"] = "in_progress"
        for i in range(1, 11):
            time.sleep(1)
            transcription_status[transcription_id]["progress"] = i * 10
        result = model.transcribe(file_path)
        transcription_text = result["text"]
        output_filename = f"{transcription_id}.txt"
        output_path = os.path.join(TRANSCRIPTIONS_FOLDER, output_filename)
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(transcription_text)
        transcription_status[transcription_id]["status"] = "completed"
        transcription_status[transcription_id]["output_path"] = output_path
        transcription_status[transcription_id]["download_url"] = f"/download/{output_filename}"
    except Exception as e:
        transcription_status[transcription_id]["status"] = "failed"
        transcription_status[transcription_id]["error"] = str(e)


def worker():
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
        while True:
            transcription_id, file_path = progress_queue.get()
            executor.submit(transcribe_audio, transcription_id, file_path)
            progress_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

if __name__ == '__main__':
    app.run(debug=True)
Step 3 - Testing the API
Now that we’ve built the Flask API, let’s test it to ensure everything works as expected. We’ll explore multiple tools and methods for interacting with the API, including curl, Postman, and even Python scripts. This section will guide you through uploading an audio file, checking the transcription progress, and downloading the final transcription file.
First, run the Python script:
python3 app.py
Step 3.1 - Using curl to Test the API
Uploading a File
To upload an audio file using curl, run the following command:
curl -v -X POST http://127.0.0.1:5000/upload \
  -F "file=@path/to/audio.mp3"
Example Response:
{
  "message": "File uploaded successfully",
  "transcription_id": "1234567890"
}
The transcription_id is crucial for tracking the progress of the transcription.
Checking Progress
Once the file is uploaded, you can check the progress of the transcription using the /progress/<transcription_id> endpoint:
curl http://127.0.0.1:5000/progress/1234567890
Example Response (In Progress):
{
  "status": "in_progress",
  "progress": 50
}
Example Response (Completed):
{
  "status": "completed",
  "progress": 100,
  "download_url": "/download/1234567890.txt"
}
Downloading the Transcription
Once the transcription is complete, use the download_url provided in the progress response to retrieve the transcription file:
curl http://127.0.0.1:5000/download/1234567890.txt
Example Response:
{
  "filename": "1234567890.txt",
  "content": "This is the transcribed text from the audio file."
}
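The same progress check can be scripted in Python. The sketch below polls the /progress endpoint with the standard library until the task is completed or failed. It assumes the server address used in the curl examples; the function names, the polling interval, and the timeout are illustrative choices.

```python
import json
import time
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # address used in the curl examples


def fetch_status(transcription_id):
    """GET /progress/<transcription_id> and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/progress/{transcription_id}") as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_transcription(transcription_id, fetch=fetch_status, interval=1.0, timeout=600.0):
    """Poll until the task is completed or failed; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(transcription_id)
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Transcription {transcription_id} did not finish in time")
        time.sleep(interval)
```

With the server running, `wait_for_transcription("1234567890")` would return the final status dictionary, including `download_url` when the task completed. The `fetch` parameter exists so the loop can be exercised without a live server.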
Step 3.2 - Alternative Tools for Testing
If you’re looking for alternatives to curl, here are some other tools you can use to test the API:
Tool | Description |
---|---|
Postman | A popular graphical tool for testing APIs. You can create POST requests to upload files, GET requests to check progress, and download transcriptions. |
Insomnia | A lightweight alternative to Postman with similar functionality. |
HTTPie | A user-friendly command-line HTTP client that simplifies API testing compared to curl. |
Swagger UI | If you integrate Swagger/OpenAPI documentation into your Flask app, you can test the API directly from the browser. |
Each of these tools provides a modern and intuitive way to interact with your API, making it easier to debug and validate your Flask application.
By testing the API with curl and exploring alternative tools like Postman, Insomnia, or HTTPie, you can verify that the Flask application works as intended. Whether you prefer command-line tools or graphical interfaces, you now have multiple ways to interact with your transcription service!
Step 4 - Making the Project Production-Ready
To make the Flask API production-ready, you’ll need to configure it to run as a background service, set up a domain, and secure it with SSL. Below are references to guides that will help you achieve this:
- Set Up and Deploy a Flask Web Application as a Systemctl Service with Nginx
  Learn how to deploy your Flask application using uWSGI and Nginx, and configure it to run as a systemctl service for reliability and scalability. Read the guide here.
- Automating SSL Certificate Issuance with acme.sh through DNS
  Secure your Flask API with SSL certificates by automating the issuance process using acme.sh. This ensures your API is accessible over HTTPS. Read the guide here.
- Simple Firewall Management with UFW
  Protect your server by configuring a firewall with UFW (Uncomplicated Firewall). This guide covers the basics of setting up and managing firewall rules. Read the guide here.
By following these guides, you can ensure your Flask API is production-ready, secure, and accessible via a custom domain. These steps will also help you automate maintenance tasks like SSL certificate renewal and firewall management.
With these configurations in place, your Flask API will be robust, scalable, and ready to handle real-world workloads! 🚀
Conclusion
In this tutorial, we built a Flask API that uses Whisper AI to transcribe audio files concurrently. We also implemented a status endpoint that reports the progress of each transcription task, with simulated progress values serving as a placeholder. This setup can be extended further to include features like authentication, logging, and more detailed error handling.
Feel free to experiment with different Whisper models and adjust the concurrency settings based on your hardware capabilities. Happy coding! 🚀