Datasets

This chapter briefly reviews datasets for music demixing by summarizing the previous tutorial, where you can find a more detailed introduction and explanation.

Data for Music Demixing

At a high level, the inputs and outputs of a source separation model look like this:

Fig. 1 Inputs and outputs of a source separation model.

MUSDB18: tutorial with 7 sec samples

The MUSDB18 dataset [RLStoter+17] is one of the most widely used datasets for music demixing. For example, its uncompressed version (also known as MUSDB18-HQ [RLS+19]) was the official training dataset for Leaderboard A of the MDX challenge.

This section shows how to play with the musdb package.

First, install the musdb package.

pip install musdb

After the installation, load musdb with download=True. This will download the 7-second sample tracks of MUSDB18.

import musdb
mus = musdb.DB(download=True)

We can use mus as an iterator.

print('number of tracks: {}'.format(len(mus)))
number of tracks: 144

Let us load the first track of the MUSDB18 dataset.

track = mus[0]
print('track name:\t', track)
print('track length:\t {} secs'.format(track.audio.shape[0]//track.rate))
track name:	 A Classic Education - NightOwl
track length:	 6 secs

Let us listen to the mixture (i.e., the input in Fig 1!)

from IPython.display import Audio, display

display(Audio(track.audio.T, rate=track.rate))

Let us listen to the target audio tracks (i.e., the outputs in Fig 1!)

for source in track.sources.keys():
    print('source name: {}'.format(source))
    display(Audio(track.sources[source].audio.T, rate=track.rate))    
source name: vocals
source name: drums
source name: bass
source name: other

Thus, the input and output of the music demixing task on MUSDB18 are:

  • input: track.audio

  • output: {source: track.sources[source].audio for source in ['vocals', 'drums', 'bass', 'other']}
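In MUSDB18, the mixture is the sample-wise sum of the four stems, so the target outputs add back up to the input. Here is a minimal NumPy sketch of that relationship, using synthetic arrays as stand-ins for track.audio and track.sources[source].audio (both of shape samples × channels):

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 44100  # MUSDB18 sample rate

# Synthetic stand-ins for track.sources[source].audio: 1 second of stereo audio.
sources = {name: rng.standard_normal((rate, 2))
           for name in ['vocals', 'drums', 'bass', 'other']}

# The mixture (the model input) is the sum of the stems (the model outputs).
mixture = sum(sources.values())

print(mixture.shape)  # (44100, 2): samples x channels
print(np.allclose(mixture, sources['vocals'] + sources['drums']
                  + sources['bass'] + sources['other']))  # True
```

A demixing model learns the inverse mapping: given the mixture, recover the four stems.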

MUSDB18-HQ: How to use the full version?

You downloaded a sample dataset with 7-second tracks in the tutorial above.

To use the full version:

  1. Download the dataset here.

  2. Unzip musdb18hq.zip to $your_musdb18hq_dir.

  3. Load musdb.DB with is_wav=True and root=$your_musdb18hq_dir.

import musdb
musdb18hq_dir = '/mnt/d/repos/musdb18hq' # <= $your_musdb18hq_dir

mus = musdb.DB(root=musdb18hq_dir, is_wav=True)
print('number of tracks: {}\n'.format(len(mus)))

track = mus[0]
print('track name:\t', track)
print('track length:\t {} secs'.format(track.audio.shape[0]//track.rate))
number of tracks: 150

track name:	 A Classic Education - NightOwl
track length:	 171 secs
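Full-version tracks are complete songs (171 seconds here, versus 6 in the sample set), while demixing models are typically trained on short fixed-length excerpts. The sketch below slices a (samples, channels) array into such excerpts with NumPy; the 6-second chunk length is an illustrative choice, not a value mandated by MUSDB18, and a synthetic array stands in for track.audio:

```python
import numpy as np

def make_excerpts(audio, rate, seconds=6.0):
    """Split a (samples, channels) array into non-overlapping fixed-length excerpts.

    `seconds=6.0` is an illustrative choice, not a MUSDB18 requirement.
    """
    chunk = int(rate * seconds)
    n = audio.shape[0] // chunk  # drop the trailing remainder
    return audio[:n * chunk].reshape(n, chunk, audio.shape[1])

# Stand-in for track.audio of a 171-second stereo track at 44.1 kHz.
rate = 44100
audio = np.zeros((171 * rate, 2))

excerpts = make_excerpts(audio, rate, seconds=6.0)
print(excerpts.shape)  # (28, 264600, 2): 28 six-second excerpts
```

In practice, training pipelines often sample random (possibly overlapping) excerpts rather than a fixed grid, but the shape bookkeeping is the same.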

Quick overview of existing datasets

In the MDX challenge, participants in Leaderboard A must train their systems on the training set of the MUSDB18-HQ dataset (or the MUSDB18 dataset). For Leaderboard B, there are no constraints on the choice of training data; participants may use any available datasets.

Here’s a quick overview of existing datasets released after 2015 for Music Demixing:

| Dataset    | Year | Instrument categories | Tracks | Average duration (s) | Full songs | Stereo |
|------------|------|-----------------------|--------|----------------------|------------|--------|
| MedleyDB   | 2014 | 82                    | 63     | 206 \(\pm\) 121      | ✓          |        |
| DSD100     | 2015 | 4                     | 100    | 251 \(\pm\) 60       | ✓          | ✓      |
| MUSDB18    | 2017 | 4                     | 150    | 236 \(\pm\) 95       | ✓          | ✓      |
| MUSDB18-HQ | 2019 | 4                     | 150    | 236 \(\pm\) 95       | ✓          | ✓      |
| Slakh2100  | 2019 | 34                    | 2100   | 249                  | ✓          |        |

You can check the full list of datasets here. This extended table is based on SigSep/datasets and reproduced with permission.

Models trained with unreleased datasets

Some models, such as Spleeter [HKVM20] and UMXL [StoterULM19], were trained with unreleased datasets. Some models submitted to Leaderboard B of the MDX challenge were trained with private datasets.