Datasets¶
This chapter briefly reviews datasets for music demixing by summarizing the previous tutorial, where you can find a more detailed introduction and explanation.
Data for Music Demixing¶
At a high level, the inputs and outputs of a source separation model look like this:
Fig. 1 Inputs and outputs of a source separation model.¶
MUSDB18: tutorial with 7 sec samples¶
The MUSDB18 dataset [RLStoter+17] is one of the most widely used datasets for music demixing. For example, its uncompressed version (also known as MUSDB18-HQ [RLS+19]) was the official training dataset for Leaderboard A of the MDX challenge.
This section shows how to play with the musdb package.
First, install the musdb package.
pip install musdb
After the installation, load musdb with download=True. This will download the 7-second sample tracks of MUSDB18.
import musdb
mus = musdb.DB(download=True)
We can use mus as an iterator.
print('number of tracks: {}'.format(len(mus)))
number of tracks: 144
Let us load the first track of the MUSDB18 dataset.
track = mus[0]
print('track name:\t', track)
print('track length:\t {} secs'.format(track.audio.shape[0]//track.rate))
track name: A Classic Education - NightOwl
track length: 6 secs
Let us listen to the mixture (i.e., the input in Fig. 1!)
from IPython.display import Audio, display
display(Audio(track.audio.T, rate=track.rate))
Let us listen to the target audio tracks (i.e., the outputs in Fig. 1!)
for source in track.sources.keys():
    print('source name: {}'.format(source))
    display(Audio(track.sources[source].audio.T, rate=track.rate))
source name: vocals
source name: drums
source name: bass
source name: other
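A useful property of MUSDB18 is that the mixture is (approximately) the sample-wise sum of the four stems. Here is a minimal sketch of that additive-mixing assumption using synthetic stand-in arrays: the stem names mirror MUSDB18, but the random data is purely illustrative, so the residual here is exactly zero rather than merely small.

```python
import numpy as np

# Hypothetical stand-ins for the four MUSDB18 stems: (num_samples, 2) stereo arrays.
rng = np.random.default_rng(0)
names = ['vocals', 'drums', 'bass', 'other']
stems = {name: rng.standard_normal((44100, 2)) for name in names}

# In MUSDB18, the mixture is (approximately) the sample-wise sum of the stems.
mixture = sum(stems.values())

# The residual between the mixture and the stem sum is zero for this synthetic data.
residual = mixture - (stems['vocals'] + stems['drums'] + stems['bass'] + stems['other'])
print(np.abs(residual).max())  # 0.0
```

On real MUSDB18 tracks the residual is small but not exactly zero, since the stems and mixture were encoded separately.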
Thus, the input and output of the MUSDB18's music demixing task are:
input: track.audio
output: {source: track.sources[source].audio for source in ['vocals', 'drums', 'bass', 'other']}
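In other words, a supervised demixing model maps one mixture array to a stack of four source arrays. The following sketch shows the array shapes involved, using zero-filled placeholders (not real track data) that mimic musdb's (samples, channels) layout:

```python
import numpy as np

# Placeholder arrays mimicking musdb's layout: track.audio is (samples, channels).
rate = 44100
num_samples = 7 * rate  # a 7-second clip, as in the sample dataset
audio = np.zeros((num_samples, 2))
sources = {s: np.zeros((num_samples, 2)) for s in ['vocals', 'drums', 'bass', 'other']}

# Input: the mixture. Output: the four targets stacked along a new source axis,
# a common layout for supervised demixing training.
x = audio
y = np.stack([sources[s] for s in ['vocals', 'drums', 'bass', 'other']])
print(x.shape)  # (308700, 2)
print(y.shape)  # (4, 308700, 2)
```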
MUSDB18-HQ: How to use the full version?¶
In the tutorial above, you downloaded a sample dataset with 7-second tracks.
To use the full version:
Download the dataset here.
Unzip musdb18hq.zip to $your_musdb18hq_dir.
Load your musdb.DB with is_wav=True and root=$your_musdb18hq_dir.
import musdb
musdb18hq_dir = '/mnt/d/repos/musdb18hq' # <= $your_musdb18hq_dir
mus = musdb.DB(root=musdb18hq_dir, is_wav=True)
print('number of tracks: {}\n'.format(len(mus)))
track = mus[0]
print('track name:\t', track)
print('track length:\t {} secs'.format(track.audio.shape[0]//track.rate))
number of tracks: 150
track name: A Classic Education - NightOwl
track length: 171 secs
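Full-length tracks like the 171-second example above are usually cut into short excerpts before training. A minimal sketch of such chunking, assuming a stereo array with the same (samples, channels) layout as track.audio; the 6-second excerpt length is an arbitrary illustrative choice:

```python
import numpy as np

# A zero-filled placeholder standing in for a 171-second stereo track.
rate = 44100
audio = np.zeros((171 * rate, 2))

def chunk(audio, rate, seconds=6):
    """Split (samples, channels) audio into non-overlapping excerpts of `seconds` each."""
    hop = seconds * rate
    n = audio.shape[0] // hop          # number of whole excerpts that fit
    return audio[: n * hop].reshape(n, hop, audio.shape[1])

excerpts = chunk(audio, rate)
print(excerpts.shape)  # (28, 264600, 2): 28 excerpts of 6 seconds each
```

Real training pipelines often sample random (possibly overlapping) excerpts instead, but the shape bookkeeping is the same.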
Quick overview of existing datasets¶
In the MDX challenge, participants must train their systems on the training set of the MUSDB18-HQ dataset (or the MUSDB18 dataset) for Leaderboard A. For Leaderboard B, there were no constraints on the choice of training data (i.e., participants could use any available dataset).
Here’s a quick overview of existing datasets released after 2015 for Music Demixing:
| Dataset | Year | Instrument categories | Tracks | Average duration (s) | Full songs | Stereo |
|---|---|---|---|---|---|---|
| MedleyDB | 2014 | 82 | 63 | 206 \(\pm\) 121 | ✅ | ✅ |
| DSD100 | 2015 | 4 | 100 | 251 \(\pm\) 60 | ✅ | ✅ |
| MUSDB18 | 2017 | 4 | 150 | 236 \(\pm\) 95 | ✅ | ✅ |
| MUSDB18-HQ | 2019 | 4 | 150 | 236 \(\pm\) 95 | ✅ | ✅ |
| Slakh2100 | 2019 | 34 | 2100 | 249 | ✅ | ❌ |
You can check the full list of datasets here. This extended table is based on SigSep/datasets and reproduced with permission.
Models trained with unreleased datasets¶
Some models, such as Spleeter [HKVM20] and UMXL [StoterULM19], were trained with unreleased datasets. Some models submitted to Leaderboard B of the MDX challenge were trained with private datasets.