The Audio Feature Ontology Framework

From SOVARR Wiki

The aim of the ontology framework is to foster greater agreement on the representation of audio features within research communities and to extend the original version of the ontology. However, it may not be feasible for a single ontology to completely represent the different conceptualisations of the domain that exist in these communities. For example, in some contexts audio features are categorised according to musicological concepts, such as pitch, rhythm and timbre. In others, the computational workflow used to calculate the features determines the taxonomic hierarchy. The main scope of the ontology is to provide a framework for communication, for feature representation, and for describing the association between features and audio signals.

The Audio Feature Ontology (AFO) provides a model for describing acoustical and musicological data and allows for publishing content-derived information about audio recordings. It was initially created within the Music Ontology framework and subsumes concepts from other ontologies in that framework, including the Event and Timeline ontologies. With regard to the different conceptualisations of feature representations, the AFO deals with data density and temporal characteristics.

The Audio Feature Vocabulary

The Audio Feature Vocabulary defines terms for the tool- and task-specific ontologies and implements the model layer of the ontology framework. It is a cleaned-up version of the feature catalogue: it lists only the features, without any of their properties, and consolidates many duplicate terms. This enables the definition of tool- and task-specific feature implementations and leaves any categorisation or taxonomic organisation to be specified in the implementation layer. The following figure demonstrates how the framework can be linked to tool-specific ontologies, in this example the Vamp plugins ontology.

The vocabulary also specifies computational workflow models for some of the features, which lower-level ontologies can link to. The computational workflow models are based on feature signatures as described in [1]. A signature represents the mathematical operations employed in the feature extraction process, with each operation assigned a lexical symbol. This gives a compact description of each feature and makes it easier to compare features according to their extraction workflows. Including the signatures in the vocabulary requires converting them into a linked data format. This involves defining a set of OWL classes that capture both the operations and the sequential nature of the calculations. The operations are implemented as subclasses of three general classes: transformations, filters and aggregations. For each abstract feature, we define a model property.
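
To make the signature idea concrete, here is a minimal Python sketch that splits a signature into its ordered operations and compares two signatures by the operations they share. The symbol table is hypothetical. It covers only the four symbols used in the Chromagram example. The general class (transformation, filter or aggregation) assigned to each operation is an illustrative assumption, not the vocabulary's actual classification.

```python
# Hypothetical symbol table: lexical symbol -> (operation class, general class).
# Only the symbols from the Chromagram example are covered, and the general
# class given for each operation is an illustrative assumption.
SYMBOLS: dict[str, tuple[str, str]] = {
    "f": ("Windowing", "Filter"),
    "F": ("DiscreteFourierTransform", "Transformation"),
    "l": ("Logarithm", "Filter"),
    "∑": ("Sum", "Aggregation"),
}


def parse_signature(signature: str) -> list[tuple[str, str]]:
    """Split a whitespace-separated signature into ordered (operation, general class) pairs."""
    operations = []
    for symbol in signature.split():
        if symbol not in SYMBOLS:
            raise ValueError(f"unknown signature symbol: {symbol!r}")
        operations.append(SYMBOLS[symbol])
    return operations


def shared_operations(sig_a: str, sig_b: str) -> set[str]:
    """Operation classes that two signatures have in common (order is ignored)."""
    ops_a = {op for op, _ in parse_signature(sig_a)}
    ops_b = {op for op, _ in parse_signature(sig_b)}
    return ops_a & ops_b


chromagram = parse_signature("f F l ∑")
print([op for op, _ in chromagram])
# ['Windowing', 'DiscreteFourierTransform', 'Logarithm', 'Sum']

# "f F ∑" is a hypothetical second signature, used only for the comparison.
print(sorted(shared_operations("f F l ∑", "f F ∑")))
# ['DiscreteFourierTransform', 'Sum', 'Windowing']
```

Comparing sets of operation classes is one simple way to relate features by extraction workflow. It ignores the order of the operations, which the linked RDF representation preserves.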

The OWL range of the model property is the ComputationalModel class in the Audio Feature Ontology namespace. The sequence of operations is attached to this object through its operation sequence property (afo:operation_sequence). Each operation node links to its successor through afo:next_operation, and the final node is additionally typed as afo:LastOperation. For example, [1] defines the signature of the Chromagram feature as ``f F l ∑``, which designates a sequence of (1) windowing (f), (2) Discrete Fourier Transform (F), (3) logarithm (l) and (4) sum (∑). This signature is expressed as the following sequence of RDF statements (in Turtle):


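@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
# The afo: and afv: namespace IRIs below are assumed; check them against the published ontology.
@prefix afo:  <https://w3id.org/afo/onto/1.1#> .
@prefix afv:  <https://w3id.org/afo/vocab/1.1#> .

# The abstract feature, linked to its computational model.
afv:Chromagram a owl:Class ;
    afo:model afv:ChromagramModel ;
    rdfs:subClassOf afo:AudioFeature .

afv:ChromagramModel a afo:ComputationalModel ;
    afo:operation_sequence afv:Chromagram_operation_sequence_1 .

# (1) windowing -> (2) DFT -> (3) logarithm -> (4) sum
afv:Chromagram_operation_sequence_1 a afv:Windowing ;
    afo:next_operation afv:Chromagram_operation_sequence_2 .

afv:Chromagram_operation_sequence_2 a afv:DiscreteFourierTransform ;
    afo:next_operation afv:Chromagram_operation_sequence_3 .

afv:Chromagram_operation_sequence_3 a afv:Logarithm ;
    afo:next_operation afv:Chromagram_operation_sequence_4 .

afv:Chromagram_operation_sequence_4 a afo:LastOperation, afv:Sum .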
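
Because the operation chain follows a regular pattern, it can be generated mechanically from a signature. The following Python sketch emits Turtle statements in the same shape as the listing above. The symbol-to-class mapping is an assumption that covers only the Chromagram symbols. Prefix declarations are omitted.

```python
# Assumed mapping from signature symbols to vocabulary operation classes
# (only the symbols used in the Chromagram example).
OPERATION_CLASSES = {
    "f": "afv:Windowing",
    "F": "afv:DiscreteFourierTransform",
    "l": "afv:Logarithm",
    "∑": "afv:Sum",
}


def signature_to_turtle(feature: str, signature: str) -> str:
    """Emit Turtle statements that chain the operations of `signature` for `feature`."""
    symbols = signature.split()
    if not symbols:
        raise ValueError("empty signature")
    unknown = [s for s in symbols if s not in OPERATION_CLASSES]
    if unknown:
        raise ValueError(f"unknown signature symbols: {unknown}")

    lines = [
        f"afv:{feature} a owl:Class ;",
        f"    afo:model afv:{feature}Model ;",
        "    rdfs:subClassOf afo:AudioFeature .",
        "",
        f"afv:{feature}Model a afo:ComputationalModel ;",
        f"    afo:operation_sequence afv:{feature}_operation_sequence_1 .",
    ]
    for i, symbol in enumerate(symbols, start=1):
        node = f"afv:{feature}_operation_sequence_{i}"
        cls = OPERATION_CLASSES[symbol]
        lines.append("")
        if i < len(symbols):
            # Every node except the last points to its successor.
            lines.append(f"{node} a {cls} ;")
            lines.append(f"    afo:next_operation afv:{feature}_operation_sequence_{i + 1} .")
        else:
            # The final node is additionally typed as afo:LastOperation.
            lines.append(f"{node} a afo:LastOperation, {cls} .")
    return "\n".join(lines)


# Should print statements equivalent to the Chromagram listing above.
print(signature_to_turtle("Chromagram", "f F l ∑"))
```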

This structure supports SPARQL queries of varying complexity for retrieving comparative information on features from the vocabulary. As a simple example, the following query asks which features in the vocabulary employ a Discrete Cosine Transform:


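PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# The afo: and afv: namespace IRIs below are assumed; check them against the published ontology.
PREFIX afo: <https://w3id.org/afo/onto/1.1#>
PREFIX afv: <https://w3id.org/afo/vocab/1.1#>

SELECT DISTINCT ?feature
WHERE {
    # Start from each feature's first operation ...
    ?feature afo:model ?model .
    ?model afo:operation_sequence ?first .
    # ... and follow zero or more next_operation links, so a DCT in any
    # position of the sequence (including the first) is matched.
    ?first afo:next_operation* ?op .
    ?op rdf:type afv:DiscreteCosineTransform .
}
ORDER BY ?feature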

Because it uses a property path, this query requires a SPARQL 1.1 processor. Executed against the vocabulary, it produces the following result:

AutocorrelationMFCCs
BarkscaleFrequencyCepstralCoefficients
ModifiedGroupDelay
ModulationHarmonicCoefficients
NoiseRobustAuditoryFeature
PerceptualLinearPrediction
RelativeSpectralPLP
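
For readers without a SPARQL 1.1 endpoint at hand, the following Python sketch mirrors the query's logic over an in-memory set of triples. It starts at each model's first operation and follows zero or more afo:next_operation links, looking for a given operation type. It is an illustration, not a SPARQL engine. Triples are plain strings, and the feature afv:FeatureWithDCT and its operation sequence are hypothetical toy data, not entries from the vocabulary.

```python
from collections import defaultdict
from typing import Optional

Triple = tuple[str, str, str]


def chain(feature: str, op_types: list[str]) -> set[Triple]:
    """Build toy triples: feature -> model -> linked operation nodes."""
    model = f"{feature}Model"
    nodes = [f"{feature}_op_{i}" for i in range(1, len(op_types) + 1)]
    triples = {(feature, "afo:model", model), (model, "afo:operation_sequence", nodes[0])}
    for node, op_type in zip(nodes, op_types):
        triples.add((node, "rdf:type", op_type))
    for current, following in zip(nodes, nodes[1:]):
        triples.add((current, "afo:next_operation", following))
    return triples


def features_using(triples: set[Triple], op_type: str) -> list[str]:
    """Features whose operation sequence contains a node typed `op_type`."""
    types: dict[str, set[str]] = defaultdict(set)
    next_op: dict[str, str] = {}
    first_op: dict[str, str] = {}
    features_of: dict[str, list[str]] = defaultdict(list)  # model -> features
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
        elif p == "afo:next_operation":
            next_op[s] = o
        elif p == "afo:operation_sequence":
            first_op[s] = o
        elif p == "afo:model":
            features_of[o].append(s)

    found: set[str] = set()
    for model, node in first_op.items():
        visited: set[str] = set()
        # Zero or more next_operation steps, like afo:next_operation* in SPARQL;
        # the visited set guards against accidental cycles.
        current: Optional[str] = node
        while current is not None and current not in visited:
            visited.add(current)
            if op_type in types[current]:
                found.update(features_of[model])
                break
            current = next_op.get(current)
    return sorted(found)


triples = chain("afv:Chromagram", ["afv:Windowing", "afv:DiscreteFourierTransform", "afv:Logarithm", "afv:Sum"])
# Hypothetical feature, used only to show a DCT match.
triples |= chain("afv:FeatureWithDCT", ["afv:Windowing", "afv:DiscreteFourierTransform", "afv:DiscreteCosineTransform"])
print(features_using(triples, "afv:DiscreteCosineTransform"))  # ['afv:FeatureWithDCT']
```

Walking from each model's first operation with zero or more steps finds a match wherever the operation sits in the sequence, including the first position.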


The Audio Feature Ontology

We propose a layered conceptualisation of feature extraction algorithms that accommodates these different views of the domain. The proposed model resembles the layered conceptualisation of intellectual works in the Functional Requirements for Bibliographic Records (FRBR) model. However, because the base entities in this domain are very different, our ontology does not derive directly from FRBR. FRBR aims to provide a framework that identifies and clearly defines the entities of interest to users of bibliographic records, the attributes of each entity, and the types of relationships that operate between entities. It defines a set of entities divided into three groups relating to intellectual work: products, creators and subjects. The products group describes entities ranging from abstract to concrete: work, expression, manifestation and item. A work may stand for a poem, the lyrics of a song, or a classical composition. An expression represents a particular realisation that remains intangible and reflects artistic qualities, such as a recital of a musical piece or the illustrations in a book. A manifestation represents all the physical embodiments of an expression that bear the same characteristics with respect to both intellectual content and physical form, for example a book about the Semantic Web published in 2011. An item is the only concrete entity in the model: a single exemplar of a manifestation, for instance a copy of the aforementioned book on my shelf, a compact disc in the collection of the British Library, or an audio file on my computer. The FRBR model thus provides useful concepts and relationships for describing the production workflow of intellectual works.

An audio feature in our domain is best conceptualised as a purely abstract concept, loosely corresponding to a work in FRBR. It encapsulates the theoretical description of analysing and extracting meaningful information from audio signals for a particular purpose. The audio feature model loosely corresponds to the abstract expression of intellectual works in FRBR. In the audio feature domain, it represents the computational workflow necessary to compute the feature. Just as a work may be realised through several expressions, a physical process may be represented in a different domain using several models. A model may have many different audio feature implementations, loosely corresponding to manifestations. Examples include an algorithm implemented in different programming languages, or one released as a component of a software library or application. An instance of an implementation is an audio feature extraction instance, loosely corresponding to an FRBR item: a specific execution of an implementation in a specific context.
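
The layering can be sketched as plain Python data classes, with each layer holding one-to-many links to the next. A feature has models, a model has implementations, and an implementation has extraction instances. The class and method names are illustrative assumptions, not the ontology's own class IRIs. The implementation name and audio file in the usage lines are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AudioFeature:
    """Purely abstract feature concept (loosely an FRBR work)."""
    name: str
    models: list[Model] = field(default_factory=list)

    def add_model(self, operations: list[str]) -> Model:
        model = Model(self, operations)
        self.models.append(model)
        return model


@dataclass
class Model:
    """Computational workflow of a feature (loosely an FRBR expression)."""
    # Back-reference kept out of repr/eq to avoid infinite recursion.
    feature: AudioFeature = field(repr=False, compare=False)
    operations: list[str]
    implementations: list[Implementation] = field(default_factory=list)

    def add_implementation(self, name: str, language: str) -> Implementation:
        impl = Implementation(self, name, language)
        self.implementations.append(impl)
        return impl


@dataclass
class Implementation:
    """Concrete code realising a model (loosely an FRBR manifestation)."""
    model: Model = field(repr=False, compare=False)
    name: str
    language: str
    extractions: list[ExtractionInstance] = field(default_factory=list)

    def record_extraction(self, audio_file: str) -> ExtractionInstance:
        run = ExtractionInstance(self, audio_file)
        self.extractions.append(run)
        return run


@dataclass
class ExtractionInstance:
    """One execution of an implementation in a specific context (loosely an FRBR item)."""
    implementation: Implementation = field(repr=False, compare=False)
    audio_file: str


chroma = AudioFeature("Chromagram")
model = chroma.add_model(["Windowing", "DiscreteFourierTransform", "Logarithm", "Sum"])
impl = model.add_implementation("chroma_extractor", "Python")  # hypothetical implementation
run = impl.record_extraction("recording.wav")                   # hypothetical audio file
print(run.implementation.model.feature.name)  # Chromagram
```

The back-references are deliberately excluded from repr and equality. With them included, printing a feature would recurse through its models and back to the feature again.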

References

[1] Dalibor Mitrovic, Matthias Zeppelzauer, and Christian Breiteneder. Features for content-based audio retrieval. Advances in Computers, 78:71–150, 2010.
