
Million Song Dataset

To: AUDITORY@xxxxxxxxxxxxxxx
Subject: Million Song Dataset
From: Thierry Bertin-Mahieux <tb2332@xxxxxxxxxxxx>
Date: Tue, 8 Feb 2011 10:48:33 -0500
Approved-by: tb2332@xxxxxxxxxxxx
Delivery-date: Tue Feb 8 10:50:43 2011
List-archive: <http://lists.mcgill.ca/scripts/wa.exe?LIST=AUDITORY>
Reply-to: Thierry Bertin-Mahieux <tb2332@xxxxxxxxxxxx>
Sender: AUDITORY - Research in Auditory Perception <AUDITORY@xxxxxxxxxxxxxxx>
User-agent: Internet Messaging Program (IMP) H3 (4.1.6)

It is our pleasure to announce the release of The Million Song Dataset, a new resource to support music information research.

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.

http://labrosa.ee.columbia.edu/millionsong/

Its purposes are:
   * To encourage research on algorithms that scale to commercial sizes
   * To provide a reference dataset for evaluating research
   * As a shortcut alternative to creating a large dataset with The Echo Nest's API
   * To help new researchers get started in the MIR field

The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The dataset does not include any audio, only the derived features. Note, however, that sample audio can be fetched from services like 7digital, using code we provide.

The Million Song Dataset is a collaborative project between The Echo Nest and LabROSA. It is hosted by Infochimps and supported in part by the NSF.


Aside from instructions on how to get the dataset, the website contains:
   * code and tutorials to get you started
   * benchmark results for some example tasks (automatic tagging, artist recognition, ...)
   * artist-level mappings to link to the Yahoo Ratings Dataset (91% of the artist ratings covered)
   * demos including how to fetch audio snippets, mapping artists on a world map, ...
   * forum, FAQ, blog, etc.

To better understand where this dataset comes from and what it aims to achieve, you can read Dan Ellis' blog post: http://bit.ly/hF8ozR

We are keen to receive questions, comments and suggestions, and we look forward to your new number-crunching MIR algorithms!


Thierry Bertin-Mahieux and Dan Ellis, for the Million Song Dataset team
http://labrosa.ee.columbia.edu/millionsong/
