There has been tremendous work in the MIR community to create easy-to-use feature extraction tools (e.g. Marsyas, jMIR, MIRtoolbox and Vamp plugins, to name a few). It remains difficult, however, to know whether a feature computed by one tool is the same as, or compatible or interchangeable with, a feature computed by another. Moreover, if different tools are used in the same experiment, their outputs typically need to be converted to a common format, and for the sake of reproducibility this glue code has to evolve as the tools themselves change. Similar problems arise with the release of data sets, such as MSD or SALAMI, in a variety of different formats, as well as with the use of various Web APIs.
The goal of this project is to investigate whether and how audio research communities would benefit from using interoperable file formats, data structures, vocabularies or ontologies, what the primary needs of MIR researchers are, and what the main barriers to the uptake of shared vocabularies are. The project started on October 1, 2012 and aims to be highly community focussed. As part of this effort, we would like to invite everyone interested to a discussion around questions such as: Is your research code sustainable? Are your results, and the way they were derived, sufficiently described and easily reproducible? Are you using interoperable tools that allow different components to be plugged into existing methods and algorithms for flexible experimentation and efficient research workflows?
We would also like to consider practical problems, such as the need to describe very large data sets in a compact format, or the potential complexity of using shared, globally unique identifiers to preserve the meaning of data across different tools or over long periods of time. Finally, the project aims to revise and integrate existing vocabularies and research tools in light of our findings. Any ideas and suggestions on requirements for these would also be greatly appreciated.