1,013 research outputs found

    Sample Mixed-Based Data Augmentation for Domestic Audio Tagging

    Audio tagging has attracted increasing attention over the last decade and has potential applications in many fields. The objective of audio tagging is to predict the labels of an audio clip. Recently, deep learning methods have been applied to audio tagging and have achieved state-of-the-art performance. However, due to the limited size of audio tagging datasets such as the DCASE data, the trained models tend to overfit, which results in poor generalization to new data. Previous data augmentation methods such as pitch shifting, time stretching and adding background noise do not show much improvement in audio tagging. In this paper, we explore sample-mixed data augmentation for the domestic audio tagging task, including mixup, SamplePairing and extrapolation. We apply a convolutional recurrent neural network (CRNN) with an attention module, using log-scaled mel spectrum input, as a baseline system. In our experiments, we achieve a state-of-the-art equal error rate (EER) of 0.10 on the DCASE 2016 task4 dataset with the mixup approach, outperforming the baseline system without data augmentation. Comment: submitted to the workshop of Detection and Classification of Acoustic Scenes and Events 2018 (DCASE 2018), 19-20 November 2018, Surrey, U
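As a rough illustration of the mixup idea named in this abstract, the sketch below blends each example in a batch with a randomly chosen partner, using one weight drawn from a Beta distribution. The function name `mixup_batch`, the default `alpha=0.2`, and the array shapes are assumptions made for illustration; the listing does not give the authors' actual settings.

```python
import numpy as np


def mixup_batch(features, labels, alpha=0.2, rng=None):
    """Blend each example with a randomly chosen partner from the same batch.

    features: array of shape (batch, ...), e.g. log-mel spectrogram patches.
    labels:   array of shape (batch, n_classes), multi-hot or one-hot targets.
    alpha:    Beta distribution parameter (hypothetical default, not from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing weight per batch, drawn from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Random pairing of examples within the batch.
    perm = rng.permutation(len(features))
    mixed_features = lam * features + (1.0 - lam) * features[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_features, mixed_labels, lam
```

The mixed labels are soft targets, so a model trained on them would typically use a loss that accepts non-binary targets, such as binary cross-entropy.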

    Pathological Evidence Exploration in Deep Retinal Image Diagnosis

    Though deep learning has shown successful performance in classifying the label and severity stage of certain diseases, most such models give little evidence of how they make their predictions. Here, we propose to exploit the interpretability of deep learning applications in medical diagnosis. Inspired by Koch's Postulates, a well-known strategy in medical research to identify the property of a pathogen, we define a pathological descriptor that can be extracted from the activated neurons of a diabetic retinopathy detector. To visualize the symptoms and features encoded in this descriptor, we propose a GAN-based method to synthesize pathological retinal images given the descriptor and a binary vessel segmentation. Besides, with this descriptor, we can arbitrarily manipulate the position and quantity of lesions. As verified by a panel of 5 licensed ophthalmologists, our synthesized images carry the symptoms that are directly related to diabetic retinopathy diagnosis. The panel survey also shows that our generated images are both qualitatively and quantitatively superior to those of existing methods. Comment: to appear in AAAI (2019). The first two authors contributed equally to the paper. Corresponding Author: Feng L

    A Unified Framework for Multi-intent Spoken Language Understanding with prompting

    Multi-intent Spoken Language Understanding has great potential for widespread implementation. Jointly modeling Intent Detection (ID) and Slot Filling (SF) provides a channel to exploit the correlation between intents and slots. However, current approaches tend to formulate these two sub-tasks differently, which leads to two issues: 1) it hinders models from effectively extracting shared features; 2) rather complicated structures are introduced to enhance expressive ability, at the cost of the frameworks' interpretability. In this work, we describe a Prompt-based Spoken Language Understanding (PromptSLU) framework that intuitively unifies the two sub-tasks into the same form by offering a common pre-trained Seq2Seq model. In detail, ID and SF are completed by filling the utterance into task-specific prompt templates as input, and by sharing an output format of key-value pair sequences. Furthermore, variable intents are predicted first, then naturally embedded into prompts to guide slot-value pair inference from a semantic perspective. Finally, inspired by prevalent multi-task learning, we introduce an auxiliary sub-task that helps to learn relationships among the provided labels. Experimental results show that our framework outperforms several state-of-the-art baselines on two public datasets. Comment: Work in progres

    Smart Education in Action: AI-Based Quality Assessment of Journalism and Media Teaching Practices using the Neutrosophic Cosine Similarity Measure

    The evolution of artificial intelligence (AI) has sparked a transformative shift in journalism and media education, urging academic institutions to reevaluate traditional pedagogies. As journalism integrates with intelligent tools, from automated news writing to data-driven content curation, there is a rising demand to align teaching quality with emerging media practices. This study presents a comprehensive decision-making framework to assess the quality of teaching practices in journalism and communication programs, addressing the growing intersection of AI and media education. Through multi-criteria decision-making (MCDM) techniques, this research captures the complexities of integrating smart technologies in curriculum delivery, student engagement, and pedagogical innovation. Eight evaluation criteria and eight representative teaching models or institutions are analyzed to offer a holistic view of intelligent teaching quality. The Neutrosophic Cosine Similarity Measure is used to handle uncertain information. Two MCDM methods are used: the DEMATEL method to determine the criteria weights and the MARCOS method to rank the alternatives.
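For readers unfamiliar with the measure, the sketch below computes a commonly used cosine similarity between two single-valued neutrosophic sets, where every element is a (truth, indeterminacy, falsity) triple. The abstract does not state which variant the study uses, so the unweighted average and the function name `neutrosophic_cosine` are assumptions.

```python
import math


def neutrosophic_cosine(a, b):
    """Average cosine similarity between two single-valued neutrosophic sets.

    a, b: equal-length sequences of (truth, indeterminacy, falsity) triples.
    Each pair of triples is treated as two 3-D vectors; the per-element
    cosines are averaged with equal weights (an assumption of this sketch).
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sets must be non-empty and of equal length")
    total = 0.0
    for (ta, ia, fa), (tb, ib, fb) in zip(a, b):
        dot = ta * tb + ia * ib + fa * fb
        norm = math.sqrt(ta * ta + ia * ia + fa * fa) * math.sqrt(
            tb * tb + ib * ib + fb * fb
        )
        if norm == 0.0:
            raise ValueError("an all-zero triple has no defined cosine")
        total += dot / norm
    return total / len(a)
```

A value close to 1 means the two assessments agree on every criterion; in a ranking setting, each alternative would typically be compared against an ideal reference set.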

    Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking

    Multi-view multi-human association and tracking (MvMHAT) is a new but important problem for multi-person scene video surveillance. It aims to track a group of people over time in each view and, at the same time, to identify the same person across different views, which differs from previous MOT and multi-camera MOT tasks that only consider over-time human tracking. As a result, the videos for MvMHAT require more complex annotations while containing more information for self-learning. In this work, we tackle this problem with a self-supervised learning aware end-to-end network. Specifically, we propose to take advantage of the spatial-temporal self-consistency rationale by considering three properties: reflexivity, symmetry and transitivity. Besides the reflexivity property, which naturally holds, we design self-supervised learning losses based on the properties of symmetry and transitivity, for both appearance feature learning and assignment matrix optimization, to associate the multiple humans over time and across views. Furthermore, to promote research on MvMHAT, we build two new large-scale benchmarks for network training and for testing different algorithms. Extensive experiments on the proposed benchmarks verify the effectiveness of our method. We have released the benchmark and code to the public.
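To make the symmetry and transitivity properties concrete, the sketch below scores how far a set of assignment matrices is from being cycle-consistent. It is not the paper's loss: the squared-error form, the function names, and the assumption that the same people appear in every view (so all matrices are square) are simplifications for illustration.

```python
import numpy as np


def symmetry_loss(s_ab, s_ba):
    """Penalty for A -> B -> A failing to return each person to themselves.

    s_ab[i, j] is the (soft) probability that person i in view A matches
    person j in view B; s_ba is the reverse assignment.
    """
    cycle = s_ab @ s_ba
    return float(np.mean((cycle - np.eye(cycle.shape[0])) ** 2))


def transitivity_loss(s_ab, s_bc, s_ca):
    """Penalty for the three-step cycle A -> B -> C -> A deviating from identity."""
    cycle = s_ab @ s_bc @ s_ca
    return float(np.mean((cycle - np.eye(cycle.shape[0])) ** 2))
```

Both penalties are zero when the matrices are mutually consistent permutations, so they can be evaluated without identity labels, which is what makes this kind of consistency usable as a self-supervised signal.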

    Silence as a quiet strategy: understanding the consequences of workplace ostracism through the lens of sociometer theory

    Existing research has predominantly framed defensive silence as an avoidance response to interpersonal mistreatment. Moving beyond this view, this study theorizes defensive silence as a proactive strategy for managing interpersonal relationships through the lens of sociometer theory. We posit that workplace ostracism will reduce employees’ organization-based self-esteem (OBSE), which in turn increases their subsequent defensive silence to avert further damage to relationships. We also expect sense of power to play a moderating role, mitigating the negative impact of workplace ostracism on OBSE. We tested all hypotheses using multi-wave, multi-source data from 345 employees and their 82 immediate supervisors. Results from multilevel modeling indicated that OBSE mediated the effect of workplace ostracism on defensive silence, and also supported the moderating role of sense of power. Our theoretical model provides a novel perspective that deepens the understanding of defensive silence and suggests implications for managerial practices.

    The Ultrafast Kerr Effect in Anisotropic and Dispersive Media

    The ultrafast optical Kerr effect (OKE) is widely used to investigate the structural dynamics and interactions of liquids, solutions and solids by observing their intrinsic nonlinear temporal responses through nearly-collinear four-wave mixing (FWM). Non-degenerate mixing schemes allow for background-free detection and can provide information on the interplay between a material's internal degrees of freedom. Here we show a source of temporal dynamics in the OKE signal that is not reflective of the intrinsic nonlinear response but arises from group index and momentum mismatch. It is observed in two-color experiments on condensed media with sizable spectral dispersion, a common property near an optical resonance. In particular, birefringence in crystalline solids is able to entirely change the character of the OKE signal via the off-diagonal tensor elements of the nonlinear susceptibility. We develop a detailed description of the phase-mismatched ultrafast OKE and show how to extract quantitative information on the spectrally resolved birefringence and group index from time-resolved experiments in one and two dimensions. Comment: 12 pages, 6 figure