Sample Mixed-Based Data Augmentation for Domestic Audio Tagging
Audio tagging has attracted increasing attention over the last decade and has
various potential applications in many fields. The objective of audio tagging
is to predict the labels of an audio clip. Recently, deep learning methods have
been applied to audio tagging and have achieved state-of-the-art performance.
However, due to the limited size of audio tagging datasets such as DCASE, the
trained models tend to overfit, which leads to poor generalization on new data.
Previous data augmentation methods
such as pitch shifting, time stretching and adding background noise do not show
much improvement in audio tagging. In this paper, we explore sample-mixed
data augmentation for the domestic audio tagging task, including mixup,
SamplePairing and extrapolation. We use a convolutional recurrent neural
network (CRNN) with an attention module, operating on log-scaled mel spectrograms,
as the baseline system. In our experiments, we achieve a state-of-the-art equal
error rate (EER) of 0.10 on the DCASE 2016 Task 4 dataset with the mixup approach,
outperforming the baseline system without data augmentation. Comment: submitted to the workshop of Detection and Classification of Acoustic
Scenes and Events 2018 (DCASE 2018), 19-20 November 2018, Surrey, UK
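The three sample-mixing schemes named in this abstract can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the authors' code: mixup draws its weight from Beta(alpha, alpha) and mixes both inputs and labels, SamplePairing averages two inputs and keeps the first label, and extrapolation uses the feature-space form x_i + lam * (x_i - x_j). The function names, alpha = 0.2, lam = 0.5 and the 64x40 input shape are hypothetical choices for illustration.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """mixup: convex combination of two inputs and of their label vectors.
    The mixing weight lam is drawn from Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

def sample_pairing(x1, y1, x2):
    """SamplePairing: average the two inputs but keep only the first sample's label."""
    return 0.5 * (x1 + x2), y1

def extrapolate(x1, y1, x2, lam=0.5):
    """Extrapolation: move x1 away from x2, i.e. x1 + lam * (x1 - x2).
    Usually applied to pairs from the same class, so the label of x1 is kept."""
    return x1 + lam * (x1 - x2), y1

# Toy example: two hypothetical log-mel patches (64 frames x 40 mel bins)
# with multi-hot tag vectors over 3 tags.
x_a, y_a = np.zeros((64, 40)), np.array([1.0, 0.0, 0.0])
x_b, y_b = np.ones((64, 40)), np.array([0.0, 1.0, 1.0])

x_mix, y_mix = mixup(x_a, y_a, x_b, y_b, rng=np.random.default_rng(0))
x_pair, y_pair = sample_pairing(x_a, y_a, x_b)
x_ext, y_ext = extrapolate(x_b, y_b, x_a)
print(x_mix[0, 0], y_mix)
```

In a multi-label tagging setup the mixed label vector is a soft target that pairs naturally with a per-tag sigmoid and binary cross-entropy loss; the abstract does not say which loss or mixing parameters the paper used.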
Pathological Evidence Exploration in Deep Retinal Image Diagnosis
Although deep learning has performed well in classifying the label and
severity stage of certain diseases, most such models give little evidence of
how they make their predictions. Here, we propose to explore the interpretability
of deep learning applications in medical diagnosis. Inspired by Koch's Postulates,
a well-known strategy in medical research for identifying the properties of a pathogen,
we define a pathological descriptor that can be extracted from the activated
neurons of a diabetic retinopathy detector. To visualize the symptoms and
features encoded in this descriptor, we propose a GAN-based method to synthesize
pathological retinal images given the descriptor and a binary vessel
segmentation. In addition, with this descriptor, we can arbitrarily manipulate the
position and quantity of lesions. As verified by a panel of 5 licensed
ophthalmologists, our synthesized images carry the symptoms that are directly
related to diabetic retinopathy diagnosis. The panel survey also shows that our
generated images are both qualitatively and quantitatively superior to those of
existing methods. Comment: to appear in AAAI (2019). The first two authors contributed equally
to the paper. Corresponding Author: Feng L
A Unified Framework for Multi-intent Spoken Language Understanding with prompting
Multi-intent Spoken Language Understanding (SLU) has great potential for widespread
application. Jointly modeling Intent Detection (ID) and Slot Filling (SF) in it
provides a way to exploit the correlation between intents and slots.
However, current approaches tend to formulate these two sub-tasks
differently, which leads to two issues: 1) it hinders models from effectively
extracting shared features; 2) fairly complicated structures are introduced to
enhance expressive power at the cost of the frameworks' interpretability.
In this work, we describe a Prompt-based Spoken Language
Understanding (PromptSLU) framework that intuitively unifies the two sub-tasks into
the same form on top of a common pre-trained Seq2Seq model. In detail, ID and
SF are performed by concisely filling the utterance into task-specific prompt
templates as input, and both share an output format of key-value pair sequences.
Furthermore, the intents, whose number varies per utterance, are predicted first
and then naturally embedded into the prompts to guide slot-value pair inference
from a semantic perspective. Finally, inspired by prevalent multi-task learning,
we introduce an auxiliary sub-task that helps to learn relationships among the
provided labels. Experimental results show that our framework outperforms several
state-of-the-art baselines on two public datasets. Comment: Work in progress
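The prompt-filling and key-value output format described above can be illustrated with plain string handling. This is a hypothetical sketch: the abstract does not give PromptSLU's template wording or separators, so the templates, the "key: value; key: value" format, and the helper names below are assumptions, and the Seq2Seq model itself is replaced by hard-coded output strings.

```python
from typing import List, Tuple

# Hypothetical prompt templates: the abstract does not give PromptSLU's actual wording.
INTENT_TEMPLATE = "intent detection: {utterance}"
SLOT_TEMPLATE = "slot filling: {utterance} intents: {intents}"

def intent_prompt(utterance: str) -> str:
    """Wrap the utterance in the (assumed) intent-detection template."""
    return INTENT_TEMPLATE.format(utterance=utterance)

def slot_prompt(utterance: str, intents: List[str]) -> str:
    """Embed the previously predicted intents in the slot-filling prompt."""
    return SLOT_TEMPLATE.format(utterance=utterance, intents=", ".join(intents))

def parse_pairs(output: str) -> List[Tuple[str, str]]:
    """Parse a 'key: value; key: value' sequence (assumed separators) into pairs."""
    pairs = []
    for chunk in output.split(";"):
        if ":" not in chunk:
            continue  # skip empty or malformed fragments
        key, value = chunk.split(":", 1)  # split on the first colon only
        pairs.append((key.strip(), value.strip()))
    return pairs

# Usage with hard-coded strings standing in for Seq2Seq model outputs.
utterance = "play some jazz and wake me up at 7 am"
print(intent_prompt(utterance))
intent_output = "intent: PlayMusic; intent: SetAlarm"
intents = [value for key, value in parse_pairs(intent_output) if key == "intent"]
print(slot_prompt(utterance, intents))
print(parse_pairs("genre: jazz; time: 7 am"))
```

Because both sub-tasks emit the same key-value format, one parser serves intent detection and slot filling alike, which is the unification the abstract emphasizes.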
Smart Education in Action: AI-Based Quality Assessment of Journalism and Media Teaching Practices using the Neutrosophic Cosine Similarity Measure
The evolution of artificial intelligence (AI) has sparked a transformative shift in journalism and media education, urging academic institutions to reevaluate traditional pedagogies. As journalism integrates intelligent tools, from automated news writing to data-driven content curation, there is a rising demand to align teaching quality with emerging media practices. This study presents a comprehensive decision-making framework to assess the quality of teaching practices in journalism and communication programs, addressing the growing intersection of AI and media education. Using multi-criteria decision-making (MCDM) techniques, this research captures the complexities of integrating smart technologies into curriculum delivery, student engagement, and pedagogical innovation. Eight evaluation criteria and eight representative teaching models or institutions are analyzed to offer a holistic view of intelligent teaching quality. The Neutrosophic Cosine Similarity Measure is used to handle uncertain information. Two MCDM methods are used: the DEMATEL method to determine the criteria weights and the MARCOS method to rank the alternatives.
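The abstract names the Neutrosophic Cosine Similarity Measure without giving its form. A commonly used version for single-valued neutrosophic sets averages, over criteria, the cosine between (truth, indeterminacy, falsity) vectors: C(A, B) = (1/n) Σ (T_A T_B + I_A I_B + F_A F_B) / (√(T_A² + I_A² + F_A²) √(T_B² + I_B² + F_B²)). The sketch below assumes that form, optional criterion weights (which could come from a step such as DEMATEL), and made-up ratings; the paper's exact variant and data may differ, and it ranks alternatives with MARCOS rather than by this score alone.

```python
import math
from typing import Optional, Sequence, Tuple

# A single-valued neutrosophic element: (truth T, indeterminacy I, falsity F), each in [0, 1].
SVN = Tuple[float, float, float]

def cosine_term(a: SVN, b: SVN) -> float:
    """Cosine of the angle between two (T, I, F) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0  # assumed convention for all-zero elements

def neutrosophic_cosine(A: Sequence[SVN], B: Sequence[SVN],
                        weights: Optional[Sequence[float]] = None) -> float:
    """(Weighted) average of per-criterion cosine terms; equal weights by default."""
    if len(A) != len(B):
        raise ValueError("A and B must cover the same criteria")
    w = [1.0 / len(A)] * len(A) if weights is None else list(weights)
    return sum(wi * cosine_term(a, b) for wi, a, b in zip(w, A, B))

# Hypothetical ratings of two teaching models on three criteria, compared with
# an ideal alternative (T, I, F) = (1, 0, 0) on every criterion.
ideal = [(1.0, 0.0, 0.0)] * 3
model_x = [(0.9, 0.05, 0.05)] * 3
model_y = [(0.5, 0.3, 0.4)] * 3
score_x = neutrosophic_cosine(model_x, ideal)
score_y = neutrosophic_cosine(model_y, ideal)
print(round(score_x, 4), round(score_y, 4))
```

A higher score means an alternative's ratings point closer to the ideal (high truth, low indeterminacy and falsity) across the weighted criteria.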
Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking
Multi-view multi-human association and tracking (MvMHAT) is a new but
important problem for multi-person scene video surveillance. It aims to track a
group of people over time in each view, as well as to identify the same person
across different views at the same time. This differs from previous multi-object
tracking (MOT) and multi-camera MOT tasks, which only consider over-time human
tracking. As a result, videos for MvMHAT require more complex annotations but also
contain more information for self-supervised learning. In this work, we tackle this problem with a
self-supervised learning aware end-to-end network. Specifically, we propose to
take advantage of the spatial-temporal self-consistency rationale by
considering three properties: reflexivity, symmetry and transitivity. Beyond
the reflexivity property, which holds naturally, we design the self-supervised
learning losses based on the properties of symmetry and transitivity, for both
appearance feature learning and assignment matrix optimization, to associate
the multiple humans over time and across views. Furthermore, to promote the
research on MvMHAT, we build two new large-scale benchmarks for training
and testing different algorithms. Extensive experiments on the
proposed benchmarks verify the effectiveness of our method. We have publicly
released the benchmarks and code.
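The symmetry and transitivity constraints described above can be written as penalties on soft assignment matrices between views. This is a minimal NumPy sketch, not the paper's network or losses: it assumes every person is visible in every view (so each matrix is square and close to a permutation), builds assignments with a temperature-scaled row softmax over cosine similarities, and uses an L1 penalty. Reflexivity (matching a view with itself gives the identity) holds by construction and is not penalized. The helper names, temperature of 0.1 and toy features are hypothetical.

```python
import numpy as np

def soft_assignment(f_a, f_b, temperature=0.1):
    """Row-wise softmax over cosine similarities between two sets of features."""
    a = f_a / np.linalg.norm(f_a, axis=1, keepdims=True)
    b = f_b / np.linalg.norm(f_b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def symmetry_loss(S_ab, S_ba):
    """Matching a -> b -> a should map each person back to itself: S_ab @ S_ba ~ I."""
    n = S_ab.shape[0]
    return float(np.abs(S_ab @ S_ba - np.eye(n)).sum() / n)

def transitivity_loss(S_ab, S_bc, S_ac):
    """Matching a -> b -> c should agree with the direct matching a -> c."""
    return float(np.abs(S_ab @ S_bc - S_ac).sum() / S_ab.shape[0])

# Toy data: the same 4 people seen in three views, in different orders, with
# near-orthogonal hypothetical appearance features plus small noise.
rng = np.random.default_rng(0)
identities = np.eye(4, 16)
perm_b, perm_c = [2, 0, 3, 1], [1, 3, 0, 2]
f_a = identities + 0.01 * rng.normal(size=(4, 16))
f_b = identities[perm_b] + 0.01 * rng.normal(size=(4, 16))
f_c = identities[perm_c] + 0.01 * rng.normal(size=(4, 16))

S_ab, S_ba = soft_assignment(f_a, f_b), soft_assignment(f_b, f_a)
S_bc, S_ac = soft_assignment(f_b, f_c), soft_assignment(f_a, f_c)
sym = symmetry_loss(S_ab, S_ba)
trans_ok = transitivity_loss(S_ab, S_bc, S_ac)
trans_bad = transitivity_loss(S_ab, S_bc, S_ac[:, ::-1])  # deliberately inconsistent
print(sym, trans_ok, trans_bad)
```

Consistent matchings give losses near zero, while the deliberately scrambled a -> c matching is heavily penalized; no identity labels are needed, which is what makes the signal self-supervised. Handling people who are missing from some views would need non-square or partial assignments, which this sketch omits.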
Silence as a quiet strategy: understanding the consequences of workplace ostracism through the lens of sociometer theory
Existing research has predominantly framed defensive silence as an avoidance response to interpersonal mistreatment. Moving beyond this view, this study theorizes defensive silence as a proactive strategy for managing interpersonal relationships through the lens of sociometer theory. We posit that workplace ostracism will reduce employees’ organization-based self-esteem (OBSE), which in turn increases their subsequent defensive silence as a way to avert further damage to relationships. We also expect the sense of power to play a moderating role, mitigating the negative impact of workplace ostracism on OBSE. We tested all hypotheses using multi-wave, multi-source data from 345 employees and their 82 immediate supervisors. Results from multilevel modeling indicated that OBSE mediated the effect of workplace ostracism on defensive silence, and also supported the moderating role of sense of power. Our theoretical model provides a novel perspective that deepens the understanding of defensive silence and suggests implications for managerial practices.
The Ultrafast Kerr Effect in Anisotropic and Dispersive Media
The ultrafast optical Kerr effect (OKE) is widely used to investigate the
structural dynamics and interactions of liquids, solutions and solids by
observing their intrinsic nonlinear temporal responses through nearly-collinear
four-wave mixing (FWM). Non-degenerate mixing schemes allow for background-free
detection and can provide information on the interplay between a material's
internal degrees of freedom. Here we show a source of temporal dynamics in the
OKE signal that is not reflective of the intrinsic nonlinear response but
arises from group index and momentum mismatch. It is observed in two-color
experiments on condensed media with sizable spectral dispersion, a common
property near an optical resonance. In particular, birefringence in crystalline
solids is able to entirely change the character of the OKE signal via the
off-diagonal tensor elements of the nonlinear susceptibility. We develop a
detailed description of the phase-mismatched ultrafast OKE and show how to
extract quantitative information on the spectrally resolved birefringence and
group index from time-resolved experiments in one and two dimensions. Comment: 12 pages, 6 figures
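The two mismatch mechanisms named in this abstract can be made concrete with standard textbook relations, used here as assumptions rather than the paper's full model: the relative group delay a probe accumulates against the pump over a sample of length L, Δt = L (n_g,probe - n_g,pump) / c, and the plane-wave phase-matching factor sinc²(Δk L / 2) that scales the signal intensity. With the illustrative numbers below (a 1 mm sample and a group-index difference of 0.05, not values from the paper), the walk-off is about 167 fs, the same order as the sub-picosecond dynamics OKE experiments aim to resolve, so it can masquerade as an intrinsic response.

```python
import math

C = 299_792_458.0  # speed of light in vacuum (m/s)

def walk_off_time(length_m: float, n_g_pump: float, n_g_probe: float) -> float:
    """Relative group delay (s) accumulated between pump and probe over the sample length."""
    return length_m * (n_g_probe - n_g_pump) / C

def phase_matching_factor(delta_k: float, length_m: float) -> float:
    """Plane-wave factor sinc^2(delta_k * L / 2) scaling the FWM signal intensity."""
    x = 0.5 * delta_k * length_m
    return 1.0 if x == 0.0 else (math.sin(x) / x) ** 2

# Illustrative (not measured) values: a 1 mm sample whose group index differs
# by 0.05 between the pump and probe colors.
dt = walk_off_time(1e-3, 1.50, 1.55)
print(f"walk-off: {dt * 1e15:.1f} fs")
print(phase_matching_factor(0.0, 1e-3), phase_matching_factor(2 * math.pi / 1e-3, 1e-3))
```

The second factor vanishes when Δk L = 2π, showing how momentum mismatch alone can suppress the signal; in a birefringent crystal, Δk and the group indices depend on polarization, which is one way the off-diagonal tensor elements mentioned above can reshape the measured trace.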