Audio features

From SOVARR Wiki

Category: Audio Features

Alphabetical Index of Audio Features

The Audio Features Index is based on the Audio Feature Catalog and lists over 400 features collected from feature extraction source code, research papers, existing linked data resources, and documentation.

RDF audio features catalogue

The catalogue has been compiled, mostly programmatically, from various sources including research papers, feature extraction software source code, existing RDF descriptions of feature extraction algorithms, and software documentation. It is serialised in RDF and N3 formats using the rdflib Python library and can be accessed on the project website: N3 format.

The RDF format can be viewed in most browsers with the free OpenLink Data Explorer.

The catalogue uses a simple and straightforward encoding without ontological structure to gain a better understanding of the scope of the domain.

The following catalogue entry for Mel-scale Frequency Cepstral Coefficients (MFCC) shows how the information is stored as triples. Each triple describes one characteristic of the feature: its textual description, abbreviation, application domain, computational complexity, computational workflow, the software tools that implement it, the dimensions of the output, the representational domain (temporal, frequency, cepstral, etc.), the semantic interpretation (physical, perceptual, symbolic), the temporal category (interframe or intraframe), and others.

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix local: <http://sovarr.c4dm.eecs.qmul.ac.uk/features/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

local:MelscaleFrequencyCepstralCoefficients a owl:Class ;
    dc:description """
	Generates a set of MFCCs; these are obtained from a band-based frequency 
	representation (using the Mel scale by default), and then a discrete cosine 
	transform (DCT). The DCT is an efficient approximation for principal components 
	analysis, so that it allows a compression, or reduction of dimensionality, 
	of the data, in this case reducing 42 band readings to a smaller set of MFCCs. 
	A small number of features (the coefficients) end up describing the spectrum. 
	The MFCCs are commonly used as timbral descriptors.
	""";
    local:abbreviation "MFCC" ;
    local:appdomain "several" ;
    local:complexity "high" ;
    local:computation "Discrete Cosine Transform",
        "Discrete Fourier Transform",
        "Logarithm",
        "Regression",
        "Windowing" ;
    local:computedIn "Aubio",
        "CLAM",
        "MIRToolbox",
        "Marsyas",
        "SuperCollider",
        "jMIR",
        "libXtract",
        "yaafe" ;
    local:dimensions "parameterized" ;
    local:domain "cepstral" ;
    local:feature "Mel-scale Frequency Cepstral Coefficients" ;
    local:level "physical" ;
    local:model "psychoacoustic" ;
    local:tag "Spectral",
        "Timbre" ;
    local:temporalscale "intraframe" .

References

  • Mitrovic, D., Zeppelzauer, M., and Breiteneder, C. (2010). Features for content-based audio retrieval. Advances in Computers, vol. 78, pp. 71–150.