
[AUDITORY] Announcing SOUNDATA: A Python library for reproducible use of audio datasets



*** apologies for any cross-postings ***


Dear colleagues,


We’re excited to announce the release of soundata, a Python library for reproducible use of audio datasets.


Soundata can be installed via: pip install soundata

The source code lives here: https://github.com/soundata/soundata


We’re launching with 14 popular environmental sound datasets, and we plan to keep adding datasets from other audio domains, including speech and bioacoustics. For music datasets, see mirdata, which was the inspiration for soundata.


Soundata makes it easy to:

  • Download datasets to a common location and format

  • Validate that a downloaded dataset is complete and perfectly matches a canonical version

  • Load audio and annotation files into a common format

  • Parse clip-level metadata for detailed evaluations
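As a rough illustration of the steps listed above, here is a minimal sketch of a typical session. The dataset name "urbansound8k" and the wrapper function quickstart are choices made for this example only; please check the docs for the list of supported dataset names and the exact API before relying on it.

```python
def quickstart(dataset_name="urbansound8k", data_home=None):
    """Download, validate, and sample one clip from a dataset.

    `quickstart` is a hypothetical helper written for this example; it is
    not part of soundata. The dataset name is an assumption as well.
    """
    # Imported here so the sketch can be read and loaded without soundata;
    # install the library first with: pip install soundata
    import soundata

    # Create a dataset object; data_home=None uses the library's default location.
    dataset = soundata.initialize(dataset_name, data_home=data_home)

    # Fetch the dataset files (this can be a large download).
    dataset.download()

    # Check the downloaded files against the library's index for this dataset.
    dataset.validate()

    # Pick a random clip; printing it shows the data available for that clip.
    clip = dataset.choice_clip()
    print(clip)
    return clip
```

Calling quickstart() triggers a real download, so you may want to pass a data_home path on a disk with enough free space.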


We hope soundata will help the community to:

  • Ensure results are reproducible by working against exactly the same data

  • Save time by avoiding manual downloads and having to write custom dataset parsers

  • Automate large-scale download, training, and evaluation pipelines

  • Increase the visibility of new datasets by adding them to soundata
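To make the "exactly the same data" point concrete, the sketch below shows the general idea of checking local files against an index of known checksums. It is a generic, self-contained illustration and not soundata's own implementation; the function names, the index layout, and the fake file contents are all assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_against_index(data_home, index):
    """Compare files under data_home with an index of {relative_path: md5}.

    Returns two lists: paths that are missing, and paths whose checksum
    differs from the expected one.
    """
    missing, mismatched = [], []
    for relative_path, expected in index.items():
        path = Path(data_home) / relative_path
        if not path.is_file():
            missing.append(relative_path)
        elif md5_of_file(path) != expected:
            mismatched.append(relative_path)
    return missing, mismatched


# Small demonstration with fake "audio" files in a temporary directory.
with tempfile.TemporaryDirectory() as data_home:
    (Path(data_home) / "clip_a.wav").write_bytes(b"fake audio bytes")
    (Path(data_home) / "clip_b.wav").write_bytes(b"modified bytes")
    index = {
        "clip_a.wav": hashlib.md5(b"fake audio bytes").hexdigest(),  # intact
        "clip_b.wav": hashlib.md5(b"original bytes").hexdigest(),    # altered
        "clip_c.wav": hashlib.md5(b"never downloaded").hexdigest(),  # absent
    }
    missing, mismatched = validate_against_index(data_home, index)

print("missing:", missing)
print("mismatched:", mismatched)
```

Two people whose local copies both pass a check like this against the same index can be reasonably confident they are training and evaluating on identical files.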


Soundata is a cross-organizational collaboration spanning researchers from MARL@NYU, Adobe Research, MTG@UPF, and GPA@UdelaR.


You can learn more about the library on our docs page: https://soundata.readthedocs.io/


A bit more about the motivation for soundata can be found in our (work in progress) paper:


"Soundata: A Python library for reproducible use of audio datasets"

Magdalena Fuentes, Justin Salamon, Pablo Zinemanas, Martín Rocamora, Genís Plaja, Irán R. Román, Marius Miron, Xavier Serra, Juan Pablo Bello

[arXiv]


We *welcome and encourage* contributions from the community, especially data loaders for datasets not yet included in soundata.


Cheers,

Justin & Magdalena on behalf of the soundata team



--
Justin Salamon | Adobe Research | www.justinsalamon.com