The Audio Feature Ontology Framework

We have developed the Audio Feature Ontology (AFO) as part of a library of modular ontologies in order to facilitate linked data representation of content-based audio features and increase interoperability, reproducibility and sustainability in music information retrieval workflows.

Owing to the profusion of digital audio content, there is a growing demand for vocabularies that facilitate interoperability between music-related data sources. This includes content-based audio feature data, which can be used for various commercial and research purposes such as music recommendation, automatic genre classification, or query-by-humming services.

We propose a modular approach towards ontological representation of common sets of audio features. Since there are many different ways to structure audio features depending on a specific task or theoretically motivated organising principles, a common representation would have to account for multiple conceptualisations of the domain and facilitate diverging representations of common features.

The aim of the ontology is to foster greater agreement on the representation of audio features within research communities and to extend the original version of the ontology. It may not be feasible, however, for a single ontology to represent completely the different conceptualisations of the domain that exist in these communities. For example, in some contexts audio features are categorised according to musicological concepts such as pitch, rhythm and timbre, while in others the computational workflow used to calculate the features determines the taxonomic hierarchy.

The main scope of the ontology is to provide a framework for communication, feature representation, and the description of the association between features and audio signals. AFO provides a model for describing acoustical and musicological data and allows content-derived information about audio recordings to be published. It was initially created within the Music Ontology framework, and it subsumes concepts from other ontologies in this framework, including the Event and Timeline ontologies. With regard to different conceptualisations of feature representations, AFO deals with data density and temporal characteristics.
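The two ways of organising features mentioned above can be illustrated with a minimal Python sketch. The feature names and category labels are assumptions chosen for illustration only; they are not terms defined by AFO. The sketch shows that the same set of features can be grouped under a musicological view and under a computational view without either grouping being the "correct" one.

```python
# Illustrative only: feature names and category labels are assumptions
# made for this sketch, not terms taken from the Audio Feature Ontology.

# View 1: features grouped by musicological concept.
MUSICOLOGICAL = {
    "chromagram": "pitch",
    "mfcc": "timbre",
    "spectral_centroid": "timbre",
    "zero_crossing_rate": "timbre",
}

# View 2: the same features grouped by the signal domain they are computed in.
COMPUTATIONAL = {
    "chromagram": "frequency_domain",
    "mfcc": "cepstral_domain",
    "spectral_centroid": "frequency_domain",
    "zero_crossing_rate": "time_domain",
}


def group_by_category(taxonomy):
    """Invert a feature -> category mapping into category -> sorted feature list."""
    groups = {}
    for feature, category in taxonomy.items():
        groups.setdefault(category, []).append(feature)
    return {category: sorted(features) for category, features in groups.items()}


if __name__ == "__main__":
    print(group_by_category(MUSICOLOGICAL))
    print(group_by_category(COMPUTATIONAL))
```

A common representation would need to let both groupings coexist over the same feature identifiers, which is the motivation for keeping the taxonomies separate from the features themselves.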

We propose a layered conceptualisation of feature extraction algorithms which accommodates this view. The proposed model resembles the layered conceptualisation of intellectual works proposed in the FRBR model. However, because the base entities in this domain are very different, our ontology does not derive directly from the ontology associated with FRBR.

The Functional Requirements for Bibliographic Records (FRBR) aims to provide a framework that identifies and clearly defines the entities of interest to users of bibliographic records, the attributes of each entity, and the types of relationships that operate between entities. FRBR defines a set of entities divided into three groups of intellectual work: products, creators and subjects. The products group describes entities ranging from abstract to concrete: work, expression, manifestation and item. A work may stand for a poem, the lyrics of a song, or a classical composition. An expression represents a particular realisation that remains intangible and reflects artistic qualities, such as a recital of a musical piece or the illustrations in a book. A manifestation represents all the physical embodiments of an expression that bear the same characteristics with respect to both intellectual content and physical form, for example a book about the Semantic Web published in 2011. An item is the only concrete entity in the model, a single exemplar of a manifestation, for instance a copy of the aforementioned book on my shelf, a compact disc in the collection of the British Library, or an audio file on my computer. The FRBR model provides useful concepts and relationships to describe the production workflow of intellectual works.

An audio feature in our domain is best conceptualised as a purely abstract concept which encapsulates the theoretical description of the analysis and extraction of meaningful information from audio signals for a particular purpose. The audio feature model loosely corresponds to the abstract expression of intellectual works in FRBR. In the audio feature domain it represents the computational workflow necessary to compute the feature. Just as a work may be realised through several expressions, a physical process may be represented in a different domain using several models. A model may have many different audio feature implementations, loosely corresponding to manifestations: for instance, an algorithm implemented in different programming languages or released as a component of a software library or application. An instance of an implementation is an audio feature extraction instance, which is a specific execution of an implementation in a specific context.
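The four layers described above can be sketched as plain Python data classes. This is only an illustration of the layering under our own assumptions: the class and field names, the library names and the file name are hypothetical and are not the actual class or property names of the ontology.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExtractionInstance:
    """A specific execution of an implementation in a specific context."""
    audio_file: str
    parameters: Dict[str, int]


@dataclass
class Implementation:
    """A concrete realisation of a model, e.g. code in some language or library."""
    language: str
    library: str
    instances: List[ExtractionInstance] = field(default_factory=list)


@dataclass
class Model:
    """The computational workflow necessary to compute the feature."""
    description: str
    implementations: List[Implementation] = field(default_factory=list)


@dataclass
class AudioFeature:
    """The purely abstract concept at the top of the hierarchy."""
    name: str
    models: List[Model] = field(default_factory=list)


def extraction_instances(feature):
    """Walk feature -> model -> implementation -> instance and collect the leaves."""
    return [
        instance
        for model in feature.models
        for implementation in model.implementations
        for instance in implementation.instances
    ]


# Hypothetical example data; none of these names refer to real software.
run = ExtractionInstance(audio_file="track01.wav", parameters={"frame_size": 1024})
py_impl = Implementation(language="Python", library="example-lib", instances=[run])
cpp_impl = Implementation(language="C++", library="example-plugin")
model = Model(
    description="windowed FFT followed by a magnitude-weighted mean of bin frequencies",
    implementations=[py_impl, cpp_impl],
)
feature = AudioFeature(name="spectral centroid", models=[model])

if __name__ == "__main__":
    print(extraction_instances(feature))
```

In this sketch one abstract feature has one model, the model has two implementations, and only one of them has been executed, which mirrors the one-to-many relationship at each layer.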

The framework of the Audio Feature Ontology encourages a modular approach to information sharing in music informatics by defining a workflow-based model for feature extraction. The Audio Feature Vocabulary was created in order to provide a comprehensive glossary of terms for tool- and task-specific annotations.

The ontology is accessible at http://sovarr.c4dm.eecs.qmul.ac.uk/af/ontology/1.0#

I posted earlier about the Audio Feature Vocabulary and the catalog here: http://sovarr.c4dm.eecs.qmul.ac.uk/?q=node/258