ARD Audiothek to Tonie automation

marco79cgn · November 26, 2024, 12:21pm

Disclaimer

As the ARD Audiothek is funded by German broadcasting fees (Rundfunkgebühr), and since its official desktop website itself offers explicit DRM-free download options (besides podcast subscriptions), I don't consider this topic piracy. If you have a different opinion, feel free to delete this whole thread.

Background

While looking for a way to automate playing podcasts or any other audio content from the ARD Audiothek on the Toniebox (via Teddycloud), I discovered that the site embeds some useful JSON inside its HTML.

So with a simple one-liner, it's possible to get the audio URL of any Audiothek item. For example, this is the URL of the latest Kalk & Welk podcast episode: Kalk & Welk · Hinten hat der Fuchs die Eier · Podcast in der ARD Audiothek

curl [URL] | grep -o '<script type="application/ld+json">.*</script>' | sed -e 's/<[^>]*>//g' | jq '.'

Output:

{
  "@context": "https://schema.org/",
  "@type": "PodcastEpisode",
  "encodingFormat": "audio/mp3",
  "inLanguage": "de-DE",
  "isAccessibleForFree": "true",
  "partOfSeries": {
    "@type": "PodcastSeries",
    "name": "Kalk & Welk",
    "url": "https://www.ardaudiothek.de/sendung/kalk-und-welk/10777871/",
    "about": "Nicht alle alten weißen Männer sind doof. Okay, sie sind vielleicht kalk und auch welk - aber Oliver Kalkofe & Oliver Welke sind unterhaltsam, kritisch, witzig - und das nicht nur im Fernsehen.\n\n\n\nKalkofe und Welke sezieren in diesem Podcast jede ..."
  },
  "identifier": "13924839",
  "name": "Hinten hat der Fuchs die Eier",
  "url": "https://www.ardaudiothek.de/episode/kalk-und-welk/hinten-hat-der-fuchs-die-eier/ard/13924839/",
  "description": "Die beiden Boomer Boys denken sich heute Wahlkampf-Slogans für Olaf Scholz aus. Und sie wundern sich, was wir wohl mit den teuren Ausgehuniformen der Bundeswehr machen werden und was im neuen Buch von Angela Merkel steht. Außerdem geht es um den neuen Comic von Lucky Luke und was aus Dieter Bohlen wird, wenn Friedrich Merz Kanzler wird. Die Folge mit den vielleicht schönsten Überleitungen, die in einem Podcast vorgenommen wurden! \nPodcast Tipp: Kein Mucks https://1.ard.de/Kein_Mucks",
  "image": "https://api.ardmediathek.de/image-service/images/urn:ard:image:ad008c3198e04f08?w=1280&ch=12edd212a6dd7e08",
  "datePublished": "2024-11-25T17:30:00+01:00",
  "timeRequired": 3321,
  "associatedMedia": {
    "@type": "MediaObject",
    "contentUrl": "https://rbbmediapmdp-a.akamaihd.net/content/70/5a/705aed27-4b06-4a40-890a-b2ad154b383e/2e4c14a2-f080-42f4-920a-1074c258326e_4844cbd4-9a2b-48fa-9b67-c86ede7bb9fb_256k.mp3"
  },
  "expires": "2026-11-25",
  "productionCompany": "ARD"
}
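
The field of interest in that output is `associatedMedia.contentUrl`. A minimal sketch of wrapping the one-liner in a function that fails loudly when the field is missing, assuming the page still carries a single-line `application/ld+json` script block like the one above (the function name `audiothek_audio_url` is made up for this example):

```shell
#!/usr/bin/env bash
# Reads an Audiothek episode page (HTML) on stdin and prints the audio URL.
# Assumption: the JSON sits in one <script type="application/ld+json"> block on a
# single line and has an .associatedMedia.contentUrl field, as in the output above.
audiothek_audio_url() {
  local url
  url=$(grep -o '<script type="application/ld+json">.*</script>' \
        | sed -e 's/<[^>]*>//g' \
        | jq -r '.associatedMedia.contentUrl // empty' 2>/dev/null)
  if [ -z "$url" ]; then
    # Nothing extracted: either the markup changed or this is not an episode page.
    echo "no contentUrl found; the page layout may have changed" >&2
    return 1
  fi
  printf '%s\n' "$url"
}

# Usage (replace [URL] with the episode page address):
#   curl -s [URL] | audiothek_audio_url
```

The non-zero return code lets a calling script stop before it tries to download or convert anything.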

Shell script automation

I built a shell script that automatically converts a given Audiothek item to TAF and uploads it to Teddycloud. It works on macOS and Linux.

Requirements:

opus2tonie cli
jq (json processor)
gnu-tools: curl, grep, sed
Teddycloud (optional, needed for upload)
audiothek-2-tonie.sh bash script

HowTo:

Download the bash script, make it executable (chmod +x audiothek-2-tonie.sh), and edit the first two lines (IP address of your Teddycloud and path to opus2tonie).
Run the script with the following syntax: ./audiothek-2-tonie.sh -s [URL] -u -c

-s [URL] → the URL of the Audiothek item (copy/paste from your browser)
-u → upload to Teddycloud (optional)
-c → clean up/delete all temporary files (optional, only makes sense in combination with -u)

The script supports single-episode URLs as well as series. If a series is detected, it's possible to download single episodes or all of them. Chapters are created automatically for each episode.
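
How the script tells a series from a single episode isn't described here. One plausible way, shown only as a sketch, is to look at the URL path: the JSON output above uses /episode/ for the episode and /sendung/ for the series. The function name `audiothek_url_kind` is hypothetical.

```shell
# Classify an Audiothek URL by its path.
# Assumption: the /episode/ and /sendung/ path patterns seen in the JSON output
# above hold for all items; this is not necessarily how the script detects a series.
audiothek_url_kind() {
  case "$1" in
    https://www.ardaudiothek.de/episode/*) echo "episode" ;;
    https://www.ardaudiothek.de/sendung/*) echo "series" ;;
    *) echo "unknown" ;;
  esac
}

audiothek_url_kind "https://www.ardaudiothek.de/sendung/kalk-und-welk/10777871/"
```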

Preview (mini series with chapters):

Possibilities

This approach offers lots of possibilities for conversion and automation. In fact, it would even be possible to integrate this into Teddycloud by piping the audio URL into ffmpeg, converting it on the fly to a compatible Opus format, and streaming it to the Toniebox (as is already implemented for radio streams or taps):

curl -s [URL] | grep -o '<script type="application/ld+json">.*</script>' | sed -e 's/<[^>]*>//g' | jq -r '.associatedMedia.contentUrl' | xargs -I % ffmpeg -i % -f s16le -acodec pcm_s16le -ar 48000 -ac 2 -ss 0 -

Another idea is to fully automate the shell script as a cron job with a static podcast series URL. For example, you could automatically convert a daily podcast show (e.g. Tagesschau 15 Minuten) to TAF at a given time of day, have it uploaded to Teddycloud, and even assign it to a fixed Tonie figurine programmatically without any user interaction (apart from a freshness check).
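
The freshness check could be as simple as remembering the last processed episode id. A rough sketch, assuming the id comes from the `identifier` field of the episode JSON shown earlier; the state file path and both function names are invented for this example:

```shell
# Hypothetical location for remembering the last processed episode id.
STATE_FILE="${STATE_FILE:-$HOME/.audiothek-last-id}"

# Returns 0 if the given id differs from the stored one, 1 if it is the same,
# and 2 if no id was passed.
is_new_episode() {
  local id="$1" last=""
  [ -n "$id" ] || return 2
  if [ -f "$STATE_FILE" ]; then
    last=$(cat "$STATE_FILE")
  fi
  [ "$id" != "$last" ]
}

# Call this only after conversion and upload succeeded.
mark_processed() {
  printf '%s\n' "$1" > "$STATE_FILE"
}

# Possible use inside a wrapper script started by cron
# (how to obtain the newest episode id for a series is left open here):
#   is_new_episode "$id" && ./audiothek-2-tonie.sh -s [URL] -u -c && mark_processed "$id"
```

Recording the id only after a successful run means a failed conversion is retried the next time cron fires.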

henryk · November 26, 2024, 2:56pm

Sounds really nice.

I see a minor problem. You rely on a very specific part of a website that you have no control over (the JSON snippet). It could change at any time, and then the approach would break.

(In first block you wrote privacy, but I believe you meant piracy)

marco79cgn · November 26, 2024, 3:30pm

Yep, that might happen, and one would have to adapt to the changes. But in the end, the needed information has to be retrieved from somewhere.

Just checked the Mediathek (videos) and it's even easier there. They use a real REST API, and the needed ID is part of the show's URL.

Corrected privacy to piracy, thanks.

chuckf · November 26, 2024, 6:46pm

Can we get a thread where all these useful snippets you provide are linked?

I'll definitely look into that one; the Audiothek seems awesome for getting some fresh content for the custom tags!

henryk · November 26, 2024, 9:41pm

I would prefer to get them into the wiki.