816 research outputs found

    Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

    The remarkable capabilities of large-scale language models, such as ChatGPT, in text generation have incited awe and spurred researchers to devise detectors to mitigate potential risks, including misinformation, phishing, and academic dishonesty. Despite this, most previous studies, including HC3, have been predominantly geared towards creating detectors that differentiate between purely ChatGPT-generated texts and human-authored texts. This approach, however, fails to work on discerning texts generated through human-machine collaboration, such as ChatGPT-polished texts. Addressing this gap, we introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts), facilitating the construction of more robust detectors. It diverges from extant corpora by comprising pairs of human-written and ChatGPT-polished abstracts instead of purely ChatGPT-generated texts. Additionally, we propose the "Polish Ratio" method, an innovative measure of ChatGPT's involvement in text generation based on editing distance. It provides a mechanism to measure the degree of human originality in the resulting text. Our experimental results show our proposed model has better robustness on the HPPT dataset and two existing datasets (HC3 and CDB). Furthermore, the "Polish Ratio" we proposed offers a more comprehensive explanation by quantifying the degree of ChatGPT involvement, which indicates that a Polish Ratio value greater than 0.2 signifies ChatGPT involvement and a value exceeding 0.6 implies that ChatGPT generates most of the text
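The abstract describes the Polish Ratio as an edit-distance-based measure but does not give its formula. As a rough illustration only, the sketch below assumes a word-level Levenshtein distance normalized by the longer text's length; that normalization and the function names `levenshtein`, `polish_ratio`, and `interpret` are assumptions for illustration, not the authors' implementation. Only the 0.2 and 0.6 thresholds come from the abstract.

```python
def levenshtein(a, b):
    """Edit distance between two token sequences (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def polish_ratio(original, polished):
    """Assumed normalization: word-level edit distance / length of the longer text."""
    a, b = original.split(), polished.split()
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def interpret(ratio):
    """Thresholds quoted in the abstract: > 0.2 means involvement, > 0.6 means mostly generated."""
    if ratio > 0.6:
        return "mostly ChatGPT-generated"
    if ratio > 0.2:
        return "ChatGPT involved"
    return "little or no ChatGPT involvement"
```

For example, `polish_ratio("a b c d", "a x c d")` changes one of four words, so the assumed ratio falls just above the 0.2 threshold.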

    Exploration in association between vitamin D and cutaneous melanoma and explainable machine learning prediction

    Objective: This study aims to examine the association between vitamin D and melanoma and to develop an explainable machine learning model. Methods: Relevant data were downloaded from the CDC’s National Health and Nutrition Examination Survey (NHANES) program for the three survey cycles 2011–2012, 2013–2014, and 2015–2016. Self-reported melanoma data, serum vitamin D levels, and other covariates were downloaded and analyzed. Analysis of variance in this study was performed using t-tests and chi-square tests, modelling was performed using logistic regression based on NHANES weights, and other risk factors were analyzed using forest plots. Ten machine learning models were compared, and XGBoost was selected for melanoma prediction. Results: Logistic regression analysis revealed a protective effect of higher vitamin D levels in melanoma; the ORs were much less than 1 for Q2 (OR=0.97, 95% CI (0.44, 0.98)), Q3 (OR=0.71, 95% CI (0.65, 0.92)), and Q4 (OR=0.32, 95% CI (0.55, 0.81)). Meanwhile, forest plot analysis showed that vitamin D, the number of sunburns in the past year, advanced age, Caucasian ethnicity, some college education, being single and unmarried, smoking, diabetes, and hypertension were all statistically significant. The OR was higher in men than in women, with Q4 values of 0.31 (95% CI: 0.18–0.51) for men and 0.29 (95% CI: 0.15–0.45) for women. The OR was higher in the senior patients than in the non-senior group, with Q4 (OR=0.53, 95% CI (0.23, 0.73)). An explainable XGBoost model achieved an AUC of 0.906, with vitamin D making the main contribution to the model. Conclusion: This study concluded that vitamin D decreases melanoma risk based on a larger sample and multi-covariate analysis. Females and young people received high protection from vitamin D against melanoma. XGBoost can accurately predict the possibility of melanoma based on vitamin D

    A Double Maximization Approach for Optimizing the LM Rate of Mismatched Decoding

    An approach is established for maximizing the Lower bound on the Mismatch capacity (hereafter abbreviated as LM rate), a key performance bound in mismatched decoding, by optimizing the channel input probability distribution. Under a fixed channel input probability distribution, the computation of the corresponding LM rate is a convex optimization problem. When optimizing the channel input probability distribution, however, the corresponding optimization problem adopts a max-min formulation, which is generally non-convex and is intractable with standard approaches. To solve this problem, a novel dual form of the LM rate is proposed, thereby transforming the max-min formulation into an equivalent double maximization formulation. This new formulation leads to a maximization problem setup wherein each individual optimization direction is convex. Consequently, an alternating maximization algorithm is established to solve the resultant maximization problem setup. Each step of the algorithm only involves a closed-form iteration, which is efficiently implemented with standard optimization procedures. Numerical experiments show the proposed approach for optimizing the LM rate leads to noticeable rate gains
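The abstract does not state the dual form of the LM rate, so the sketch below does not attempt it. It only illustrates the general pattern the abstract names: alternating maximization in which each block update has a closed form. The toy objective `f(x, y) = -x^2 - y^2 + xy + x + y` and all function names are assumptions chosen for illustration. The objective is concave in each variable with the other held fixed, mirroring the "each individual optimization direction is convex" structure.

```python
def objective(x, y):
    """Toy concave objective (an assumption, not the LM rate)."""
    return -x * x - y * y + x * y + x + y


def alternating_maximization(x=0.0, y=0.0, tol=1e-12, max_iter=1000):
    """Maximize over x with y fixed, then over y with x fixed, until updates stall."""
    for _ in range(max_iter):
        # d/dx: -2x + y + 1 = 0  ->  closed-form maximizer in x
        x_new = (y + 1.0) / 2.0
        # d/dy: -2y + x + 1 = 0  ->  closed-form maximizer in y
        y_new = (x_new + 1.0) / 2.0
        if abs(x_new - x) < tol and abs(y_new - y) < tol:
            return x_new, y_new
        x, y = x_new, y_new
    return x, y
```

Each sweep never decreases the objective, and for this toy problem the iterates approach the joint maximizer at `x = y = 1`.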

    Common variants at 2q11.2, 8q21.3, and 11q13.2 are associated with major mood disorders

    Bipolar disorder (BPD) and major depressive disorder (MDD) are primary major mood disorders. Recent studies suggest that they share certain psychopathological features and common risk genes, but unraveling the full genetic architecture underlying the risk of major mood disorders remains an important scientific task. The public genome-wide association study (GWAS) data sets offer the opportunity to examine this topic by utilizing large amounts of combined genetic data, which should ultimately allow a better understanding of the onset and development of these illnesses. Genome-wide meta-analysis was performed by combining two GWAS data sets on BPD and MDD (19,637 cases and 18,083 controls), followed by replication analyses for the loci of interest in independent 12,364 cases and 76,633 controls from additional samples that were not included in the two GWAS data sets. The single-nucleotide polymorphism (SNP) rs10791889 at 11q13.2 was significant in both discovery and replication samples. When combining all samples, this SNP and multiple other SNPs at 2q11.2 (rs717454), 8q21.3 (rs10103191), and 11q13.2 (rs2167457) exhibited genome-wide significant association with major mood disorders. The SNPs in 2q11.2 and 8q21.3 were novel risk SNPs that were not previously reported, and SNPs at 11q13.2 were in high LD with potential BPD risk SNPs implicated in a previous GWAS. The genome-wide significant loci at 2q11.2 and 11q13.2 exhibited strong effects on the mRNA expression of certain nearby genes in cerebellum. In conclusion, we have identified several novel loci associated with major mood disorders, adding further support for shared genetic risk between BPD and MDD. Our study highlights the necessity and importance of mining public data sets to explore risk genes for complex diseases such as mood disorders

    Genomic characterization of ribitol teichoic acid synthesis in Staphylococcus aureus: genes, genomic organization and gene duplication

    BACKGROUND: Staphylococcus aureus or MRSA (Methicillin Resistant S. aureus), is an acquired pathogen and the primary cause of nosocomial infections worldwide. In S. aureus, teichoic acid is an essential component of the cell wall, and its biosynthesis is not yet well characterized. Studies in Bacillus subtilis have discovered two different pathways of teichoic acid biosynthesis, in two strains W23 and 168 respectively, namely teichoic acid ribitol (tar) and teichoic acid glycerol (tag). The genes involved in these two pathways are also characterized: tarA, tarB, tarD, tarI, tarJ, tarK, tarL for the tar pathway, and tagA, tagB, tagD, tagE, tagF for the tag pathway. With the genome sequences of several MRSA strains (Mu50, MW2, N315, MRSA252, COL) as well as the methicillin susceptible strain MSSA476 available, a comparative genomic analysis was performed to characterize teichoic acid biosynthesis in these S. aureus strains. RESULTS: We identified all S. aureus tar and tag gene orthologs in the selected S. aureus strains which would contribute to teichoic acid synthesis. Based on our identification of genes orthologous to tarI, tarJ, tarL, which are specific to the tar pathway in B. subtilis W23, we also concluded that tar is the major teichoic acid biogenesis pathway in S. aureus. Further analyses indicated that the S. aureus tar genes, different from the divergon organization in B. subtilis, are organized into several clusters in cis. Most interestingly, compared with genes in the B. subtilis tar pathway, the S. aureus tar specific genes (tarI,J,L) are duplicated in all six S. aureus genomes. CONCLUSION: In the S. aureus strains we analyzed, tar (teichoic acid ribitol) is the main teichoic acid biogenesis pathway. The tar genes are organized into several genomic groups in cis and the genes specific to tar (relative to tag): tarI, tarJ, tarL are duplicated. The genomic organization of the S. aureus tar pathway suggests that its regulation differs from that of the B. subtilis tar or tag pathways, which are grouped in two operons in a divergon structure

    ClickVOS: Click Video Object Segmentation

    The Video Object Segmentation (VOS) task aims to segment objects in videos. However, previous settings either require time-consuming manual masks of target objects at the first frame during inference or lack the flexibility to specify arbitrary objects of interest. To address these limitations, we propose the setting named Click Video Object Segmentation (ClickVOS), which segments objects of interest across the whole video according to a single click per object in the first frame. We also provide the extended datasets DAVIS-P and YouTubeVOSP with point annotations to support this task. ClickVOS has significant practical applications and research implications because indicating an object takes only 1-2 seconds of interaction, compared with the several minutes needed to annotate an object's mask. However, ClickVOS also presents increased challenges. To address this task, we propose an end-to-end baseline approach named Attention Before Segmentation (ABS), motivated by the attention process of humans. ABS utilizes the given point in the first frame to perceive the target object through a concise yet effective segmentation attention. Although the initial object mask is possibly inaccurate, in our ABS, as the video goes on, the initially imprecise object mask can self-heal instead of deteriorating due to error accumulation, which is attributed to our designed improvement memory that continuously records stable global object memory and updates detailed dense memory. In addition, we conduct various baseline explorations utilizing off-the-shelf algorithms from related fields, which could provide insights for the further exploration of ClickVOS. The experimental results demonstrate the superiority of the proposed ABS approach. Extended datasets and codes will be available at https://github.com/PinxueGuo/ClickVOS

    OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework

    Contemporary Video Object Segmentation (VOS) approaches typically consist of stages of feature extraction, matching, memory management, and multiple-object aggregation. Recent advanced models either employ a discrete modeling for these components in a sequential manner, or optimize a combined pipeline through substructure aggregation. However, these existing explicit staged approaches prevent the VOS framework from being optimized as a unified whole, leading to limited capacity and suboptimal performance in tackling complex videos. In this paper, we propose OneVOS, a novel framework that unifies the core components of VOS with an All-in-One Transformer. Specifically, to unify all aforementioned modules into a vision transformer, we model all the features of frames, masks and memory for multiple objects as transformer tokens, and integrally accomplish feature extraction, matching and memory management of multiple objects through the flexible attention mechanism. Furthermore, a Unidirectional Hybrid Attention is proposed through a double decoupling of the original attention operation, to rectify semantic errors and ambiguities of stored tokens in the OneVOS framework. Finally, to alleviate the storage burden and expedite inference, we propose the Dynamic Token Selector, which unveils the working mechanism of OneVOS and naturally leads to a more efficient version of OneVOS. Extensive experiments demonstrate the superiority of OneVOS, achieving state-of-the-art performance across 7 datasets, particularly excelling in the complex LVOS and MOSE datasets with 70.1% and 66.4% J&F scores, surpassing previous state-of-the-art methods by 4.2% and 7.0%, respectively. Our code will be available for reproducibility and further research. Comment: 19 pages, 7 figures
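For readers unfamiliar with the J&F score reported above: in the DAVIS-style VOS evaluation, J is the region similarity (intersection over union of predicted and ground-truth masks), F is a boundary F-measure, and J&F is their mean. The sketch below covers only the J part for a single frame, using nested lists of 0/1 values as a hypothetical mask representation. The function name is illustrative, and F is omitted because it requires boundary extraction.

```python
def region_similarity(pred, gt):
    """Jaccard index J between two binary masks given as equally sized nested lists."""
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                inter += 1
            if p or g:
                union += 1
    # Convention used in DAVIS-style evaluation: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


pred_mask = [[1, 1],
             [0, 0]]
gt_mask = [[1, 0],
           [0, 0]]
score = region_similarity(pred_mask, gt_mask)  # one shared pixel out of two covered
```

A benchmark score averages this per-frame value over all frames and objects before combining it with F.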