Is it possible to not use Spotify in 2020? I tried. Here’s how.

One of the main functions of Spotify is to supply users with music. For maximum control, I wanted all the music I consumed to reside on my hard drive, so I decided to download everything I listened to on Spotify from YouTube.

The first step was to export all liked songs and playlists from Spotify. I used the tool exportify.net for this purpose. Since it exports only playlists, not liked songs, I created a new playlist named “i like” and added all my liked songs to it in Spotify.

Next, on Exportify, the Export All button didn’t work for me, so I downloaded each playlist individually. This step yielded a bunch of CSV files, which I saved in data/spotify.

I used the following function to read one of the playlist files.

import csv
import codecs

def read_playlist_file(path):
    with codecs.open(path, encoding='latin-1') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            yield dict(title=row['Track Name'], artist=row['Artist Name'],
                       album=row['Album Name'], csv=path)

I used the youtube-search Python package to search for each song on YouTube.

from urllib.parse import urljoin
from youtube_search import YoutubeSearch

def search_song(song):
    # Create search function for convenience
    search = lambda q: YoutubeSearch(q, max_results=1).to_dict()
    
    # Try combining artist and title with hyphen or just space if that fails
    for query in (sep.join((song['title'], song['artist'])) for sep in (' - ', ' ')):
        results = search(query)
        if results:
            break
    else:
        # If both failed search for title and artist separately and combine the results
        results = search(song['title']) + search(song['artist'])
        
    # Return search results in nice dicts
    for result in results:
        yield dict(id=result['id'], title=result['title'],
                   url=urljoin('https://www.youtube.com', result['link']),
                   channel=dict(name=result['channel_name'], url=result['channel_link']))

As you can see, I tried a few different search queries for each song from very specific (including artist and song title) to more general (including only artist or title) in order to get a result at least somewhat similar to the original track in case there wasn’t an exact match.
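The fallback order can be made explicit with a small helper that only builds the query strings and does no searching. This is a minimal sketch: `build_queries` is a hypothetical name that does not appear in the script above, and it assumes a song dict with `title` and `artist` keys as produced by `read_playlist_file`.

```python
def build_queries(song):
    """Return search queries from most to least specific (hypothetical helper)."""
    title, artist = song['title'], song['artist']
    return [
        f'{title} - {artist}',  # most specific: title and artist joined by a hyphen
        f'{title} {artist}',    # same terms, separated by a space only
        title,                  # last resort: title alone ...
        artist,                 # ... and artist alone
    ]


if __name__ == '__main__':
    for query in build_queries({'title': 'Heroes', 'artist': 'David Bowie'}):
        print(query)
```

Note that `search_song` treats the last two entries differently from the first two: it runs both of them and concatenates their results instead of stopping at the first hit.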

I used the pytube3 Python package to download the actual YouTube videos. The official version had an unfixed bug, so I used the version from https://gitlab.com/obuilds/public/pytube (revision ob-v1).

from pytube import YouTube

def download_song(song):
    youtube = YouTube(song['url'])

    # Get highest bitrate audio stream
    audio = youtube.streams.filter(only_audio=True, subtype='mp4').order_by('abr').last()
    audio.download('data/youtube', song['title'])
    
    # Return path of downloaded file
    return audio.get_file_path(filename=song['title'], output_path='data/youtube')

Last but not least, I converted the downloaded mp4 audio stream files to ogg files with the help of the ffmpeg CLI tool (apt install ffmpeg).

import os
import subprocess

def mp4_to_ogg(path):
    newpath = os.path.splitext(path)[0] + '.ogg'
    subprocess.call(['ffmpeg', '-y', '-hide_banner', '-loglevel', 'warning',
                     '-i', path, '-codec:a', 'libvorbis', '-qscale:a', '3',
                     '-f', 'ogg', '-vn', newpath])
    return newpath

Finally, putting it all together.

import time
import random

for filename in os.listdir('data/spotify'):
    for song in read_playlist_file(os.path.join('data/spotify', filename)):
        for result in search_song(song):
            mp4 = download_song(result)
            ogg = mp4_to_ogg(mp4)
            time.sleep(random.randint(1, 10))  # Be nice and sleep a bit

Occasionally, a download would fail, so I added some logic to keep track of completed songs and make restarts easy.

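import json
import os

status_file = 'data/status.json'
song_tuple = lambda s: (s['title'], s['artist'])

# Load finished songs from a previous run, if any
status = set()
if os.path.exists(status_file):
    with open(status_file) as f:
        status = set(tuple(x) for x in json.load(f))

for song in playlist:
    if song_tuple(song) in status:
        continue
    process_song(...)
    status.add(song_tuple(song))
    # Save after every song so a crash loses no progress
    with open(status_file, 'w') as f:
        json.dump(list(status), f)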
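A possible variant, sketched here under assumptions, also catches the failure so that one bad download does not stop the whole run. `load_status` and `run_with_status` are hypothetical names, and `process_song` stands for whatever search, download and convert steps are applied to a single song.

```python
import json
import os


def load_status(path):
    """Load the set of finished (title, artist) pairs, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {tuple(item) for item in json.load(f)}


def run_with_status(songs, process_song, status_path):
    """Process each song once; return the songs whose processing raised."""
    status = load_status(status_path)
    failed = []
    for song in songs:
        key = (song['title'], song['artist'])
        if key in status:
            continue  # already done in an earlier run
        try:
            process_song(song)
        except Exception as error:  # broad on purpose: any failure just skips the song
            failed.append((song, str(error)))
            continue
        status.add(key)
        # Save after every success so a restart picks up where this run stopped
        with open(status_path, 'w') as f:
            json.dump(sorted(status), f)
    return failed
```

Songs that failed are not recorded in the status file, so simply running the function again retries exactly those.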

A few hours later, I had my entire music collection on my hard drive. I deleted the mp4s with rm *.mp4.
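A more cautious cleanup, which is not what I ran, would delete an mp4 only once its converted ogg exists. The sketch below assumes the ogg sits next to the mp4 with the same base name, as `mp4_to_ogg` produces; `remove_converted_mp4s` is a hypothetical helper name.

```python
from pathlib import Path


def remove_converted_mp4s(directory):
    """Delete each .mp4 that has a non-empty .ogg next to it; return the deleted paths."""
    removed = []
    for mp4 in sorted(Path(directory).glob('*.mp4')):
        ogg = mp4.with_suffix('.ogg')
        if ogg.is_file() and ogg.stat().st_size > 0:
            mp4.unlink()
            removed.append(mp4)
    return removed
```

Any mp4 left behind after calling `remove_converted_mp4s('data/youtube')` points to a conversion that did not finish.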

One problem I observed was that many YouTube music videos contained speech, usually at the beginning before the song started. To fix this, one could try some sort of automatic detection of the non-music part and cut it off, or find a different data source. I might explore one of these options at a later time.
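As a starting point for the detection option, ffmpeg's `silencedetect` audio filter reports silent stretches on stderr, and a short pause often separates a spoken intro from the song. The sketch below has not been tried on my downloads and rests entirely on that assumption. The helper names, the -30dB threshold, the 1-second minimum silence, and the 60-second cutoff are illustrative choices, not tuned values.

```python
import re
import subprocess

# silencedetect logs lines containing "silence_end: <seconds> | silence_duration: ..."
SILENCE_END = re.compile(r'silence_end:\s*([0-9.]+)')


def parse_silence_ends(ffmpeg_log):
    """Extract the end time (seconds) of every silence reported by silencedetect."""
    return [float(match) for match in SILENCE_END.findall(ffmpeg_log)]


def guess_music_start(silence_ends, max_intro=60.0):
    """Assume the song starts after the last pause within the first max_intro seconds."""
    candidates = [t for t in silence_ends if t <= max_intro]
    return max(candidates) if candidates else 0.0


def detect_music_start(path):
    """Run ffmpeg's silencedetect filter on a file and guess where the music begins."""
    result = subprocess.run(
        ['ffmpeg', '-hide_banner', '-i', path,
         '-af', 'silencedetect=noise=-30dB:d=1', '-f', 'null', '-'],
        stderr=subprocess.PIPE, text=True)
    return guess_music_start(parse_silence_ends(result.stderr))
```

The returned offset could then be passed to ffmpeg's `-ss` option during conversion. Songs with quiet passages in the first minute would be cut too late, so the result would need spot-checking by ear.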