    Learning to Identify Ambiguous and Misleading News Headlines

    Accuracy is one of the basic principles of journalism. However, it is increasingly hard to manage due to the diversity of news media. Some editors of online news tend to use catchy headlines that trick readers into clicking. These headlines are either ambiguous or misleading, degrading the reading experience of the audience. Thus, identifying inaccurate news headlines is a task worth studying. Previous work names these headlines "clickbaits" and mainly focuses on features extracted from the headlines, which limits performance because the consistency between headlines and news bodies is underappreciated. In this paper, we clearly redefine the problem and identify ambiguous and misleading headlines separately. We utilize class sequential rules to exploit structure information when detecting ambiguous headlines. For the identification of misleading headlines, we extract features based on the congruence between headlines and bodies. To make use of the large unlabeled data set, we apply a co-training method and gain an increase in performance. The experimental results show the effectiveness of our methods. We then use our classifiers to detect inaccurate headlines crawled from different sources and conduct a data analysis. Comment: Accepted by IJCAI 201
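
    The abstract does not say which congruence features are used. As a rough illustration only, the sketch below computes one possible headline-body congruence signal: the fraction of headline tokens that also occur in the body. The function names `tokens` and `headline_body_overlap` and the choice of token overlap are assumptions made for this example, not the authors' method.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def headline_body_overlap(headline, body):
    """Fraction of headline tokens that also appear in the body.

    Hypothetical stand-in for a congruence feature: a low value hints
    that the headline talks about things the body never mentions.
    """
    headline_tokens = tokens(headline)
    if not headline_tokens:
        return 0.0  # an empty headline carries no evidence either way
    shared = headline_tokens & tokens(body)
    return len(shared) / len(headline_tokens)


if __name__ == "__main__":
    print(headline_body_overlap("Cats love milk",
                                "Our cats really love fresh milk"))
    print(headline_body_overlap("Shocking secret",
                                "The weather was mild"))
```

    A feature like this would be only one input to a classifier; on its own it cannot distinguish a misleading headline from one that merely paraphrases the body.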

    Protein folding tames chaos

    Protein folding produces characteristic and functional three-dimensional structures from unfolded polypeptides or disordered coils. The emergence of extraordinary complexity in the protein folding process poses astonishing challenges to theoretical modeling and computer simulations. The present work introduces molecular nonlinear dynamics (MND), or molecular chaotic dynamics, as a theoretical framework for describing and analyzing protein folding. We unveil the existence of intrinsically low dimensional manifolds (ILDMs) in the chaotic dynamics of folded proteins. Additionally, we reveal that the transition from disordered to ordered conformations in protein folding increases the transverse stability of the ILDM. Stated differently, protein folding reduces the chaoticity of the nonlinear dynamical system, and a folded protein has the best ability to tame chaos. We also bring to light the connection between the ILDM stability and the thermodynamic stability, which enables us to quantify the disorderliness and relative energies of folded, misfolded and unfolded protein states. Finally, we exploit chaos for protein flexibility analysis and develop a robust chaotic algorithm for the prediction of Debye-Waller factors, or temperature factors, of protein structures.

    Determining the luminosity function of Swift long gamma-ray bursts with pseudo-redshifts

    Determining the luminosity function (LF) of gamma-ray bursts (GRBs) plays an important role in the cosmological applications of GRBs, but it is seriously hindered by selection effects arising from redshift measurements. To avoid these selection effects, we propose calculating pseudo-redshifts for Swift GRBs according to the empirical $L$-$E_p$ relationship. Here, such an $L$-$E_p$ relationship is determined by reconciling the distributions of pseudo- and real redshifts of redshift-known GRBs. The values of $E_p$ taken from Butler's GRB catalog are estimated with Bayesian statistics rather than observed. Using the relatively large GRB sample with pseudo-redshifts, we fit the redshift-resolved luminosity distributions of the GRBs with a broken-power-law LF. The fitting results suggest that the LF could evolve with redshift through a redshift-dependent break luminosity, e.g., $L_b = 1.2\times10^{51}(1+z)^{2}\,\mathrm{erg\,s^{-1}}$. The low- and high-luminosity indices are constrained to 0.8 and 2.0, respectively. It is found that the proportional coefficient between GRB event rate and star formation rate should correspondingly decrease with increasing redshift. Comment: 5 pages, 5 figures, accepted for publication in ApJ
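
    To make the quoted numbers concrete, here is a minimal sketch of a broken-power-law LF with the redshift-dependent break luminosity stated in the abstract. The indices (0.8 and 2.0), the break normalization (1.2e51 erg/s) and the (1+z)^2 scaling come from the abstract. The exact functional form, its continuity at the break and the missing normalization are assumptions of this example, as are the function names.

```python
# Values quoted in the abstract.
LOW_INDEX = 0.8          # slope below the break luminosity
HIGH_INDEX = 2.0         # slope above the break luminosity
BREAK_LUM_Z0 = 1.2e51    # break luminosity at z = 0, in erg/s
BREAK_EVOLUTION = 2.0    # exponent of the (1 + z) scaling


def break_luminosity(z):
    """Break luminosity L_b(z) = 1.2e51 * (1 + z)^2 in erg/s."""
    return BREAK_LUM_Z0 * (1.0 + z) ** BREAK_EVOLUTION


def luminosity_function(lum, z):
    """Unnormalized broken power law, equal to 1 at L = L_b(z).

    Assumed form: (L / L_b)^-0.8 at or below the break and
    (L / L_b)^-2.0 above it. The paper may normalize differently.
    """
    ratio = lum / break_luminosity(z)
    index = LOW_INDEX if ratio <= 1.0 else HIGH_INDEX
    return ratio ** (-index)


if __name__ == "__main__":
    for z in (0.0, 1.0, 3.0):
        print(f"z = {z}: L_b = {break_luminosity(z):.2e} erg/s")
```

    Because the break scales as (1+z)^2, a burst of fixed luminosity sits progressively further below the break at higher redshift, which is how the evolution enters this form.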

    Homogenous Ensemble Phonotactic Language Recognition Based on SVM Supervector Reconstruction

    Currently, acoustic spoken language recognition (SLR) and phonotactic SLR systems are widely used language recognition systems. To achieve better performance, researchers combine multiple subsystems, with results often much better than those of a single SLR system. Phonotactic SLR subsystems may vary in their acoustic feature vectors or include multiple language-specific phone recognizers and different acoustic models. These methods achieve good performance but usually incur a high computational cost. In this paper, a new diversification for phonotactic language recognition systems is proposed using vector space models by support vector machine (SVM) supervector reconstruction (SSR). In this architecture, the subsystems share the same feature extraction, decoding, and N-gram counting preprocessing steps, but model in a different vector space by using the SSR algorithm without significant additional computation. We term this a homogeneous ensemble phonotactic language recognition (HEPLR) system. The system integrates three different SVM supervector reconstruction algorithms: relative SVM supervector reconstruction, functional SVM supervector reconstruction, and perturbing SVM supervector reconstruction. All of the algorithms are incorporated using a linear discriminant analysis-maximum mutual information (LDA-MMI) backend to improve language recognition evaluation (LRE) accuracy. Evaluated on the National Institute of Standards and Technology (NIST) LRE 2009 task, the proposed HEPLR system achieves better performance than a baseline phone recognition-vector space modeling (PR-VSM) system with minimal extra computational cost. The HEPLR system yields 1.39%, 3.63%, and 14.79% equal error rate (EER), representing 6.06%, 10.15%, and 10.53% relative improvements over the baseline system, respectively, for the 30-, 10-, and 3-s test conditions.
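
    The relative improvements quoted above follow the usual definition (baseline EER minus system EER, divided by baseline EER). The sketch below uses that definition to back out the baseline EERs implied by the reported figures. These baseline values are derived here for illustration and are not stated in the abstract; the function names are likewise assumptions of this example.

```python
def relative_improvement(baseline_eer, system_eer):
    """Relative EER reduction of the system with respect to the baseline."""
    return (baseline_eer - system_eer) / baseline_eer


def implied_baseline(system_eer, rel_improvement):
    """Invert the definition above to recover the baseline EER."""
    return system_eer / (1.0 - rel_improvement)


# (system EER in %, relative improvement as a fraction), from the abstract.
REPORTED = {
    "30 s": (1.39, 0.0606),
    "10 s": (3.63, 0.1015),
    "3 s": (14.79, 0.1053),
}

if __name__ == "__main__":
    for condition, (eer, rel) in REPORTED.items():
        baseline = implied_baseline(eer, rel)
        print(f"{condition}: HEPLR EER {eer:.2f}%, "
              f"implied baseline EER {baseline:.2f}%")
```

    Note that the largest relative gains appear on the shorter test conditions, where the absolute error rates are also highest.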