699 research outputs found

Normalizing Non-Linear Speech Speed for Maintaining Listener Comprehension at Increased Playback Speeds

Author: Slaney Malcolm
Publication venue: Technical Disclosure Commons
Publication date: 27/12/2021

This publication describes methods of normalizing the speed of non-linear speech by applying an algorithm that improves listener comprehension at increased playback speeds. The algorithm computes an amount of tension for a given audio file and subsequently computes a running average of the tension. A high-pass filter is then applied to the tension to remove the average tension. The resulting audio file allows a listener to increase playback speed or maintain a desired average speed while retaining comprehension.
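The filtering step in this abstract can be illustrated with a minimal sketch. The abstract does not define how tension is measured, so the code assumes a precomputed list of per-frame tension values; the function name `remove_running_average` and the `window` length are hypothetical. Subtracting a trailing running average is one simple kind of high-pass filter and is not necessarily the one used in the disclosure.

```python
from collections import deque


def remove_running_average(tension, window=5):
    """Subtract a trailing running average from each tension value.

    `tension` is an assumed per-frame sequence of numbers; `window` is a
    hypothetical averaging length, not a value taken from the publication.
    """
    recent = deque(maxlen=window)  # the most recent `window` values
    total = 0.0                    # running sum of the values in `recent`
    filtered = []
    for value in tension:
        if len(recent) == window:
            total -= recent[0]     # this value is about to be dropped
        recent.append(value)
        total += value
        filtered.append(value - total / len(recent))
    return filtered
```

With a constant input the output is all zeros, which is the sense in which the average tension is removed.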

Technical Disclosure Commons

PLSA on Large Scale Image Databases

Author: Lienhart Rainer
Slaney Malcolm
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

OPUS Augsburg

Crossref

A Unified System for Chord Transcription and Key Extraction Using Hidden Markov Models.

Author: Kyogu Lee
Malcolm Slaney
Publication venue: ISMIR

ZENODO

Imaging with Diffraction Tomography

Author: Kak A. C.
Slaney Malcolm
Publication venue: 'Purdue University (bepress)'
Publication date: 01/02/1985

The problem of cross-sectional (tomographic) imaging of objects with diffracting sources is addressed. Specifically, the area of investigation is the effect of multiple scattering and attenuation phenomena in diffraction imaging. This work reviews the theory and limits of first-order diffraction tomography and studies iterative techniques that can be used to improve the quality of tomographic imaging with diffracting sources. Conventional (straight-ray) tomographic algorithms are not valid when used with acoustic or microwave energy. Thus more sophisticated algorithms are needed. First-order diffraction tomography uses a linearized version of the wave equation and gives an especially simple reconstruction algorithm. This work reviews first-order approximations to the scattered field and studies the quality of the reconstructions when the assumptions behind these approximations are violated. It will be shown that the Born approximation is valid when the phase change across the object is less than π and the Rytov approximation is valid when the refractive index changes by less than two or three percent. Better reconstructions will be based on higher-order approximations to the scattered field. This work describes two fixed-point algorithms (the Born and the Rytov approximations) and an algebraic approach to more accurately calculate the scattered fields. The limits of each of these approaches are discussed and simulated results are shown. Finally, a review of higher-order inversion techniques is presented. Each of these techniques is reviewed and some of their limitations are discussed.
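The two validity conditions quoted in this abstract can be turned into a quick check. This is a rough sketch under stated assumptions: the object is taken to be homogeneous, the phase change is modeled as 2π × diameter × index change / wavelength for a wave crossing the full diameter, the Born threshold is taken to be π, and the Rytov limit defaults to 0.02 (the abstract only says "two or three percent"). The function names are hypothetical.

```python
import math


def phase_change(diameter, index_change, wavelength):
    """Phase change (radians) across a homogeneous object, assuming the
    wave travels one full diameter through a medium whose refractive
    index differs from the background by `index_change`."""
    return 2 * math.pi * diameter * abs(index_change) / wavelength


def born_valid(diameter, index_change, wavelength):
    """Born condition, assuming the phase-change threshold is pi."""
    return phase_change(diameter, index_change, wavelength) < math.pi


def rytov_valid(index_change, limit=0.02):
    """Rytov condition; `limit` is an assumed value for 'two or three
    percent' and can be raised to 0.03."""
    return abs(index_change) < limit
```

For example, an object ten wavelengths across with a 1% index change passes both checks, while the same object with a 10% index change fails both.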

Purdue E-Pubs

CNN Architectures for Large-Scale Audio Classification

Author: Chaudhuri Sourish
Ellis Daniel P. W.
Gemmeke Jort F.
Hershey Shawn
Jansen Aren
Moore R. Channing
Plakal Manoj
Platt Devin
Saurous Rif A.
Seybold Bryan
Slaney Malcolm
Weiss Ron J.
Wilson Kevin
Publication date: 10/01/2017

Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.

Comment: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes of latest Audio Set revision. Changed wording to fit 4 page limit with new addition.
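The record mentions AUC and d-prime but does not reproduce their definitions. As a hedged illustration, the sketch below uses the standard signal-detection relation d' = √2 · Φ⁻¹(AUC), which assumes equal-variance Gaussian score distributions; that the paper's definition matches this relation is an assumption here, and the function name `d_prime_from_auc` is hypothetical.

```python
import math
from statistics import NormalDist


def d_prime_from_auc(auc):
    """Convert an AUC in the open interval (0, 1) to d-prime using
    d' = sqrt(2) * inverse_normal_cdf(AUC).

    Assumes equal-variance Gaussian scores for positives and negatives.
    """
    return math.sqrt(2) * NormalDist().inv_cdf(auc)
```

An AUC of 0.5 (chance level) maps to a d-prime of zero, and larger AUC values map to larger d-prime values.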

arXiv.org e-Print Archive

Crossref

The CRC Plotting Package

Author: Azimi Mani
Crawford Carl
Slaney Malcolm
Publication venue: 'Purdue University (bepress)'
Publication date: 01/10/1984

The CRC Plotting Package is a device-independent graphics system. Subroutines for generating graphics exist for programs written in FORTRAN or C. A program called Qplot exists to plot binary vectors generated as the output of any program.

Purdue E-Pubs

Solving Demodulation as an Optimization Problem

Author: Gregory Sell
Malcolm Slaney
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Social network visualizations of streaming data: Design and use considerations

Author: Daniel M Russell
Malcolm Slaney
Publication date: 03/04/2020

Understanding networks of people linked by some common factor is an important task in many domains. Most commonly, a user creates a visualization of social interactions to see the patterns of interactions between individuals, which is then used to find and identify important groups. Networks of individuals and links between them form graphs that vary with time and importance. Visualizing the changes in social networks over time is a non-trivial design task, imposing interesting demands on the visualization and interaction model. In this paper we briefly analyze the user requirements for interactive visualizations of streaming social network data. We find that these continuously updated, dynamic displays need: (1) controls that permit time-based control of the visualization, including pausing, restarting and variable-speed playback of the data, (2) the ability to continue importing and processing streamed information even when the display is paused, (3) a visually represented method to track changes in the displays over time, (4) interaction methods to allow drilldown from the visualization to original source data, and (5) information extraction from the displayed social network. We describe our visualization tool, SSNV, showing how it embodies these interaction requirements.

CiteSeerX