HowTo: convert audio to TAF fast and easy (macOS/Linux)

marco79cgn · November 15, 2024, 9:37am

Intro
I was looking for a way to encode any audio files into taf as fast as possible on a powerful desktop machine (e.g. MacBook Pro). The Teddycloud has an audio encoder built-in (web interface & cli) but it’s really slow since it’s running on a Raspberry Pi 4 (in my case). Whereas running the teddycloud container on your desktop machine just for taf conversion is a bit overkill and adds unnecessary complexity (docker, handling with volumes, certificate check at every start etc.).

I wanted a solution which is fast, easy to use and easy to automate/script (e.g. for batch conversion of multiple files). As a final result, here’s a time comparison for an audiobook with ~75 minutes playtime.

Transcoding time:
Teddycloud Pi4 (Web/CLI): 7m10s
MacBook M1 Pro (native): 45s
→ roughly 9,5x faster.

Solution
A TAF file is basically an opus sound file but with a special header. That’s why it’s not enough to just encode your audio with tools like ffmpeg. Luckily there’s a python script named Opus2Tonie which automates the process and in this short guide I’ll post installation instructions how to set it up on both macOS and Linux.

As a second step, instead of using this script only on the command-line, we’ll create a macOS context menu integration which let’s you convert any folder or file with a single mouse click (thx to @ingorichter). Furthermore, the file will even be uploaded to your Teddycloud library and a native system notification is shown at the end.

ㅤ
Installation instructions (macOS/Linux)

install Homebrew (macOS-only)

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

install ffmpeg

brew install ffmpeg / sudo apt install ffmpeg -y

install opus-tools

brew install opus-tools / sudo apt install opus-tools

install python

brew install python@3.13 / sudo apt install python3.13

setup python3

cd
python3 -m venv .venv
source .venv/bin/activate

install google protobuf

pip3 install protobuf

checkout opus2tonie & patch header

git clone https://github.com/bailli/opus2tonie.git
cd opus2tonie
curl -LOs "https://github.com/bailli/opus2tonie/files/12489794/tonie_header_pb2.py.txt" && mv tonie_header_pb2.py.txt tonie_header_pb2.py

Now you can use opus2tonie.py on the command-line and it should successfully convert any audio file to taf. Until this point, this should work both on macOS and Linux:

python3 opus2tonie.py [INPUT_FILE.mp3] [OUTPUT_FILE.taf]

The input can also be a directory. In this case, all files inside the directory will be merged into one single taf file:

python3 opus2tonie.py [INPUT_DIR] [OUTPUT_FILE.taf]

ㅤ
Automator Quick Action (macOS-only)

In order to trigger the encoding and Teddycloud upload with a simple right click on your mouse in Finder, we have to set up a new Automator quick action which triggers a shell script.

open the Automator app
select “new document” → Quick action
at the top set “Workflow accepts Files and Folders in Finder”
type “shell” in the search field and select the “run shell script” action
paste the following content, change ip address & opus2tonie-path (line 1 & 2)
save the Quck Action → Name will be the name in your context menu

TEDDYCLOUD_IP=192.168.178.11
OPUS_2_TONIE_PATH=/Users/marco/dev/toniebox/opus2tonie

export PATH=/opt/homebrew/bin:$PATH
source ~/.venv/bin/activate
OUTPUT_FILE="${@%.*}.taf"
BASE_NAME=$(basename ${OUTPUT_FILE})

python3 $OPUS_2_TONIE_PATH/opus2tonie.py "$@" $OUTPUT_FILE > $OPUS_2_TONIE_PATH/log.txt 2>&1

if curl -F "file=@$OUTPUT_FILE" "http://$TEDDYCLOUD_IP/api/fileUpload?path=&special=library"; then
	osascript -e "display notification \"Die Datei $BASE_NAME befindet sich jetzt in deiner Teddycloud\" with title \"Upload erfolgreich\"" & 
else
	osascript -e 'display notification "Etwas ist schief gelaufen mit deinem Datei-Upload. Bitte überprüfe die Logs." with title "Teddycloud Fehler"'
fi

In Automator it should look like this:

Happy transcoding!

marco79cgn · November 16, 2024, 12:14pm

Bug

When using a directoriy as input, the opus2tonie.py script is considering every possible filetype as a valid candidate, also non-audio types (like jpg for example). This leads to corrupt/unplayable TAF files for example if there’s a cover.jpg in the same directory.

Fix

In opus2tonie.py, change the filter_directories function in line 927 and filter only for audio file extensions like this:

def filter_directories(glob_list):
    result = []
    EXTENTIONS = {".mp3", ".mp2", ".m4a", ".m4b", ".opus", ".ogg", ".wav", ".aac", ".mp4"}
    for name in glob_list:
        ext = os.path.splitext(name)[1]
        if os.path.isfile(name) and ext in EXTENTIONS:
            result.append(name)
    return result

marco79cgn · November 18, 2024, 3:19pm

Batch Encode

If you have opus2tonie.py running on your local machine, here’s a shell script which

batch converts all subfolders to taf
uploads the resulting taf to Teddycloud (optional)
deletes the taf on your PC/Mac (optional)

How it works

Imagine the following file structure: An audiobook series (Sandmann) with many episodes and different tracks for each episode.

The script iterates over all subfolders and creates one taf audio file with chapters for each episode.

Result:

Sandmann - Episode 01 - Gute Nacht.taf
Sandmann - Episode 02 - Schlaf gut.taf
Sandmann - Episode 03 - Träume süß.taf
Sandmann - Episode 04 - Bis zum Morgen.taf

Shell Script (Linux/macOS)

Please save this script in a file (batch-covert.sh), make it executable (chmod +x batch-convert.sh) and adapt the first two lines according to your setup.

Source Code:

#!/bin/bash
TEDDYCLOUD_IP=192.168.178.11
OPUS_2_TONIE_PATH=/Users/marco/dev/toniebox/opus2tonie

SEPARATOR="-----------------------------------------------"
echo $SEPARATOR
while [[ "$#" -gt 0 ]]; do
    case $1 in
        -s|--source) SOURCE="$2"; shift ;;
        -u|--upload) UPLOAD=1 ;;
        -c|--cleanup) CLEANUP=1 ;;
        *) echo "Unbekannter Parameter: $1"; exit 1 ;;
    esac
    shift
done

echo "Quell-Verzeichnis: $SOURCE"
echo "Upload in die Teddycloud: $UPLOAD"
echo "Lösche TAF nach Upload: $CLEANUP"
echo "Log: $OPUS_2_TONIE_PATH/log.txt"

AUDIOBOOK_SERIES=$(basename "$SOURCE")

echo $SEPARATOR
source ~/.venv/bin/activate
cd $SOURCE
for d in */ ; do
    DIRNAME=$(echo "$d" | awk -F/ '{print$1}')
    OUTPUT_FILE="${AUDIOBOOK_SERIES} - ${DIRNAME}.taf"
    echo "Aktueller Ordner: "
    echo "$DIRNAME"
    echo "Starte Transkodierung: "
    echo "${OUTPUT_FILE}..."
    python3 $OPUS_2_TONIE_PATH/opus2tonie.py "$DIRNAME" "$OUTPUT_FILE" >> $OPUS_2_TONIE_PATH/log.txt 2>&1
    if [[ $UPLOAD ]]; then
        echo "Lade Datei in die Teddycloud..."
        response_code=$(curl -s -o /dev/null -F "file=@$OUTPUT_FILE" -w "%{http_code}" "http://$TEDDYCLOUD_IP/api/fileUpload?path=&special=library")
        if [ "${response_code}" != 200 ]; then 
            echo "Fehler beim Upload, Datei wird nicht gelöscht."
            echo $SEPARATOR
            continue
        fi
    fi
    if [[ $CLEANUP ]]; then
        echo "Lösche taf Datei..."
        rm "$OUTPUT_FILE"
    fi
    echo "Fertig!"
    echo $SEPARATOR
done

The script accepts the following input parameters:

-s source-directory → ~/audiobooks/Sandmann in this case (mandatory)
-u → upload to Teddycloud (optional)
-c→ cleanup/delete taf file (optional)

Example call:

./batch-convert.sh -s /Users/marco/audiobooks/Sandmann -u -c

marco79cgn · November 30, 2024, 6:55pm

Docker (to the rescue)

I built a docker image which contains all necessary tools and dependencies for opus2tonie.py. In addition to that I put a little shell script wrapper around it to offer even more possibilities (like batch encoding lots of episodes in a row):

Usage

The intention is to run this container on-demand and only as long as the file converions are running. Recommended is to run it by mounting your current host directory $(pwd) inside the container (/data), let it do the conversion and then it falls asleep again.

I added lots of usage examples on github, the easiest one being:

Convert a single file audiobook.mp3 from your current directory

Command:
docker run --rm -v $(pwd):/data audio2tonie transcode -s /data/audiobook.mp3

Output: 
audiobook.taf

It’s also possible to batch convert all subfolders from a given directory one after another (-r → recursively):

Convert all subfolders in a given folder into one taf per subfolder (with chapters for each file)

Subfolders in current directory: 
 |-Episode 01
 |-Episode 02
 |-Episode 03

Command:
docker run --rm -v $(pwd):/data audio2tonie transcode -s /data -r

Result: Episode 01.taf, Episode 02.taf, Episode 03.taf

Output:

To keep the container as small as possible, the base image is bookworm-lite as with integrated Python. I’m also using ffmpeg static builds (which cut the size to a quarter).

I haven’t deployed this container to a Docker registry yet, so please build it on your own for now (it’s fast!). Would be happy for any feedback!

marco79cgn · December 2, 2024, 10:34pm

Update: I pushed the container to the Github Container Registry.

You can just pull it with:

docker pull ghcr.io/marco79cgn/audio2tonie:latest

I also included an additional wrapper for the original opus2tonie python script. Call it from your host with:

docker run --rm -v $(pwd):/data ghcr.io/marco79cgn/audio2tonie opus2tonie ...

And of course all of the previous comment still applies to the transcode command, e.g.:

docker run --rm -v $(pwd):/data ghcr.io/marco79cgn/audio2tonie transcode -s /data/audiobook.mp3

j1gg3 · December 6, 2024, 8:28am

This is amazing! Thank you very much, used it already multiple times.

I have an additional script which helps to create the required structure for my audio books by placing all episodes into the respective folder.

#!/bin/bash

# Funktion zur Organisation der Hörbuchdateien
organize_audiobooks() {
    local folder_path="$1"
    cd "$folder_path" || { echo "Ordner nicht gefunden!"; exit 1; }

    # Dateien analysieren und organisieren
    for file in *.ogg; do
        # Teile den Dateinamen anhand des Patterns
        # Beispiel: "Die Biene Maja - Majas Geburt - Teil 01.ogg"
        base_name=$(basename "$file")
        audiobook=$(echo "$base_name" | cut -d'-' -f1 | xargs) # Hörbuchname
        episode_title=$(echo "$base_name" | cut -d'-' -f2 | xargs) # Episodentitel
        part=$(echo "$base_name" | grep -oP 'Teil \d+' | xargs) # Teilnummer

        # Hauptordner basierend auf dem Hörbuchnamen erstellen
        main_folder="$audiobook"
        mkdir -p "$main_folder"

        # Unterordner basierend auf dem Episodentitel erstellen
        episode_folder="$main_folder/$episode_title"
        mkdir -p "$episode_folder"

        # Datei in den entsprechenden Ordner verschieben
        mv "$file" "$episode_folder/"
    done

    # Überprüfen, wie viele Dateien in jedem Ordner sind
    echo "Überprüfung der Dateien in den Ordnern:"
    for episode_folder in "$folder_path"/*/*; do
        file_count=$(find "$episode_folder" -type f | wc -l)
        if [ "$file_count" -eq 0 ]; then
            echo "Warnung: Keine Dateien im Ordner '$episode_folder'."
        else
            echo "Im Ordner '$episode_folder' liegen $file_count Datei(en)."
        fi
    done

    echo "Hörbuchdateien wurden erfolgreich organisiert!"
}

# Benutzer nach dem Ordner fragen
read -p "Gib den Pfad zum Ordner mit den Dateien an: " source_folder
organize_audiobooks "$source_folder"

I was wondering about where you got the information how to use the Teddycloud API? My itention is to modify the upload path, to place the generated .TAF files into the destination folder which reduce the effort.

henryk · December 6, 2024, 9:41am

Hi, there are two ways at least.

look into the sources of teddycloud on GitHub.
(more easy) just do the encoding in the webgui with dev mode open (F12) and have a look into the network tab. There are the api calls shown.

marco79cgn · December 6, 2024, 10:09am

I used the latter and checked the network tab while uploading a file with the GUI. Then I tried with curl and used the -F "file=@output_file" option instead (which is a little more easy).

You can create directories in your teddyCloud library like this:

curl 'https://192.168.178.11:8443/api/dirCreate?special=library' \
  --data-raw '/Biene Maja'

When uploading to this folder in your library, use it as path attribute in your http query string:

curl -s -F "file=@$output_file" -w "%{http_code}" "http://$TEDDYCLOUD_IP/api/fileUpload?path=%2FBiene%20Maja&special=library"

You can also download files from your library via API. This is possible as taf or even es raw opus file (without the taf header). I will add a tap input parameter to this script which automatically downloads all needed files, splits existing tafs into their chapters and repackages them in a combined taf file. This will preserve the chapters inside a single episode (which is not possible with the native tap feature).

marco79cgn · December 9, 2024, 3:34pm

Update Docker: added ARD Audiothek support

I just updated the Docker container with support for ARD Audiothek content. It uses their REST API to retrieve the content (no html scraping). Just copy & paste the link of the content and use it as -s parameter (source).

Usage

Command:

docker run --rm -v $(pwd):/data ghcr.io/marco79cgn/audio2tonie transcode -s "[AUDIOTHEK-URL]"

Example URL:

https://www.ardaudiothek.de/episode/3nach9-podcast/hape-birthday-ein-3nach9-spezial-zum-60-geburtstag-von-hape-kerkeling/radio-bremen/13935781/

Output:

The output filename will be created automatically and produces valid filenames (without unsupported characters), like Hape.Birthday.ein.3nach9.-Spezial.zum.60.Geburtstag.von.Hape.Kerkeling.taf.

MichaelSprite · December 13, 2024, 8:33am

Hi,

thank you for your effort! That’s great!

Here is a feature request - would it be possible to scrape all items of a Podcast (programSet) from ARD Audiothek to convert it to Taf?

Thank you very much!

marco79cgn · December 13, 2024, 9:28am

Yes, that‘s totally possible. I already integrated this in my dedicated script here:

It works for mini series or Podcasts up to 12 episodes. This is because I was scraping the html in this former script and the main page only includes the latest 12 items.

In the meantime I figured out how their API works - both for single items and whole program sets (Podcasts/Mini Series). I will implement it as soon as I find some sparetime. The only problem is that there are Podcasts with over 100 episodes and the limit for chapters in tafs is 99.

ks61625 · December 13, 2024, 11:58am

Maybe TAP could do the trick here…

marco79cgn · December 13, 2024, 2:02pm

No, a tap has the same restrictions. It’s the Toniebox which doesn’t like it. The files play perfectly fine in the integrated web player.

ks61625 · December 13, 2024, 2:58pm

And as a cracy idea, what about providing it as a webstream…?
Would it be possible to assign a “podcast.sh” as source into a tonie-json file…
…and this would just return one episode after another just like it would be any other radio station (mpd liek) stream that ffmpeg in TC could handle?

marco79cgn · December 17, 2024, 1:59pm

Update Docker: add Teddycloud upload

I updated the Docker container with support for automatic upload to your Teddycloud.

Usage

Use the new upload parameter follwed by your teddyCloud ip address: -u 192.168.178.11

The file will be uploaded to the root directory of your library.

Command:
docker run --rm -v $(pwd):/data ghcr.io/marco79cgn/audio2tonie transcode -s /data/test.mp3 -u 192.168.178.11

Output:

henryk · December 21, 2024, 10:12pm

It works fine with docker desktop also on windows, but the upload fails all the times

Any hint for that?

marco79cgn · December 21, 2024, 10:30pm

I haven’t used Windows in a long time but could it be a file access permission problem?

Do you have curl on Windows? If yes, could you try to upload a file manually from Terminal/Powershell?

curl -F "file=@test.taf" "http://$TEDDYCLOUD_IP/api/fileUpload?path=&special=library"

henryk · December 21, 2024, 10:32pm

this works. maybe the problems are spaces in the filename?

Edit: Even that works fine…

henryk · December 21, 2024, 10:35pm

I assume it’s related to docker desktop

marco79cgn · December 21, 2024, 10:45pm

Strange. Are you using macvlan for teddycloud and is both tc and Docker Desktop running on the same machine?