Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text
The remarkable capabilities of large-scale language models, such as ChatGPT,
in text generation have incited awe and spurred researchers to devise detectors
to mitigate potential risks, including misinformation, phishing, and academic
dishonesty. Despite this, most previous studies, including HC3, have been
predominantly geared towards creating detectors that differentiate between
purely ChatGPT-generated texts and human-authored texts. This approach,
however, fails to discern texts generated through human-machine
collaboration, such as ChatGPT-polished texts. Addressing this gap, we
introduce a novel dataset termed HPPT (ChatGPT-polished academic abstracts),
facilitating the construction of more robust detectors. It diverges from extant
corpora by comprising pairs of human-written and ChatGPT-polished abstracts
instead of purely ChatGPT-generated texts. Additionally, we propose the "Polish
Ratio" method, an innovative measure of ChatGPT's involvement in text
generation based on edit distance. It provides a mechanism to measure the
degree of human originality in the resulting text. Our experimental results
show that our proposed model is more robust on the HPPT dataset and two
existing datasets (HC3 and CDB). Furthermore, our proposed "Polish Ratio"
offers a more comprehensive explanation by quantifying the degree of ChatGPT
involvement: a Polish Ratio value greater than 0.2 signifies ChatGPT
involvement, and a value exceeding 0.6 implies that ChatGPT generates most of
the text.
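To make the edit-distance idea concrete, here is a minimal sketch of how a polish ratio could be computed for one human-written/polished pair, such as the pairs in HPPT. It assumes a word-level Levenshtein distance normalized by the longer text's length; that normalization, the whitespace tokenization, and the function names are our own assumptions, not necessarily the paper's exact definition. Only the 0.2 and 0.6 thresholds come from the abstract.

```python
def levenshtein(a, b):
    """Edit distance between two token sequences (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ta
                            curr[j - 1] + 1,             # insert tb
                            prev[j - 1] + (ta != tb)))   # substitute (or match)
        prev = curr
    return prev[-1]


def polish_ratio(original, polished):
    """Hypothetical polish ratio: word-level edit distance / longer length."""
    a, b = original.split(), polished.split()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def involvement(ratio):
    """Map a ratio to the interpretation thresholds stated in the abstract."""
    if ratio > 0.6:
        return "ChatGPT generates most of the text"
    if ratio > 0.2:
        return "ChatGPT involved"
    return "below the involvement threshold"


r = polish_ratio("we propose a new dataset", "we introduce a novel dataset")
print(r, involvement(r))
```

The ratio is 0 for an untouched abstract and approaches 1 when nearly every word is rewritten; a character-level distance would be an equally plausible variant.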
Exploration in association between vitamin D and cutaneous melanoma and explainable machine learning prediction
Objective: This study aims to examine the association between vitamin D and melanoma and to develop an explainable machine learning model. Methods: Relevant data were downloaded from the CDC’s National Health and Nutrition Examination Survey (NHANES) program for the three survey cycles 2011–2012, 2013–2014, and 2015–2016. Self-reported melanoma data, serum vitamin D levels, and other covariates were downloaded and analyzed. Between-group differences were tested using t-tests and chi-square tests, modelling was performed using logistic regression based on NHANES weights, and other risk factors were analyzed using forest plots. Ten machine learning models were compared, and XGBoost was selected for melanoma prediction. Results: Logistic regression analysis revealed a protective effect of higher vitamin D levels against melanoma, with ORs below 1 for Q2 (OR=0.97, 95% CI (0.44, 0.98)), Q3 (OR=0.71, 95% CI (0.65, 0.92)), and Q4 (OR=0.32, 95% CI (0.55, 0.81)). Forest plot analysis showed that vitamin D, the number of sunburns in the past year, advanced age, Caucasian race, some college education, being single or unmarried, smoking, diabetes, and hypertension were all statistically significant. The OR was higher in men than in women, with Q4 values of 0.31 (95% CI: 0.18–0.51) for men and 0.29 (95% CI: 0.15–0.45) for women. The OR was higher in the senior group than in the non-senior group, with Q4 (OR=0.53, 95% CI (0.23, 0.73)). An explainable XGBoost model achieved an AUC of 0.906, with vitamin D making the main contribution to the model. Conclusion: Based on a larger sample and multi-covariate analysis, this study concluded that vitamin D decreases melanoma risk. Women and younger people received greater protection from vitamin D against melanoma. XGBoost can accurately predict the likelihood of melanoma based on vitamin D.
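The quartile ORs above come from logistic regression coefficients. As a minimal sketch, the snippet below shows the standard conversion from a log-odds coefficient and its standard error to an odds ratio with a Wald 95% confidence interval. The coefficient and SE in the example are hypothetical; for NHANES, the SE would be a design-based estimate that accounts for the survey weights, which this sketch simply takes as given.

```python
import math


def odds_ratio_ci(beta, se, z=1.959963984540054):
    """Odds ratio and Wald confidence interval from a logistic-regression coefficient.

    beta: estimated log-odds coefficient (e.g. a vitamin D quartile vs Q1)
    se:   standard error of beta (assumed already survey-design adjusted)
    z:    normal quantile; the default gives a 95% interval
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient and SE, chosen only for illustration.
or_q3, lo_q3, hi_q3 = odds_ratio_ci(math.log(0.71), 0.09)
print(f"OR={or_q3:.2f}, 95% CI ({lo_q3:.2f}, {hi_q3:.2f})")
```

Because the interval is symmetric on the log-odds scale, it is asymmetric around the OR itself, and the point estimate always lies between the two limits.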
A Double Maximization Approach for Optimizing the LM Rate of Mismatched Decoding
An approach is established for maximizing the Lower bound on the Mismatch
capacity (hereafter abbreviated as LM rate), a key performance bound in
mismatched decoding, by optimizing the channel input probability distribution.
Under a fixed channel input probability distribution, the computation of the
corresponding LM rate is a convex optimization problem. When optimizing the
channel input probability distribution, however, the corresponding optimization
problem adopts a max-min formulation, which is generally non-convex and is
intractable with standard approaches. To solve this problem, a novel dual form
of the LM rate is proposed, thereby transforming the max-min formulation into
an equivalent double maximization formulation. This new formulation leads to a
maximization problem setup wherein each individual optimization direction is
convex. Consequently, an alternating maximization algorithm is established to
solve the resultant maximization problem setup. Each step of the algorithm only
involves a closed-form iteration, which is efficiently implemented with
standard optimization procedures. Numerical experiments show that the proposed
approach for optimizing the LM rate leads to noticeable rate gains.
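For readers who want to experiment, the sketch below computes the generalized mutual information (GMI), I_GMI(Q) = sup over s ≥ 0 of E[log(q(X,Y)^s / Σ_x̄ Q(x̄) q(x̄,Y)^s)], for a fixed input distribution Q and decoding metric q. As we understand the standard mismatched-decoding literature, the LM rate has the dual form sup over s ≥ 0 and a(·) of E[log(q(X,Y)^s e^{a(X)} / Σ_x̄ Q(x̄) q(x̄,Y)^s e^{a(x̄)})], so fixing a(·) ≡ 0 yields a lower bound on the LM rate. This is not the paper's new dual form or its alternating maximization algorithm; the grid search over s, the function name `gmi`, and the binary-symmetric-channel example are our own assumptions.

```python
import math


def gmi(Q, W, q, s_grid):
    """Generalized mutual information in nats (a lower bound on the LM rate).

    Q:      input distribution, Q[x]
    W:      channel transition matrix, W[x][y] = P(y | x)
    q:      positive decoding metric q[x][y] (the decoder maximizes it)
    s_grid: candidate values of s >= 0 for a simple grid search
    """
    nx, ny = len(Q), len(W[0])
    best = 0.0  # s = 0 always gives 0, so the GMI is non-negative
    for s in s_grid:
        value = 0.0
        for y in range(ny):
            # The denominator depends only on y: sum_xbar Q(xbar) q(xbar, y)^s
            denom = sum(Q[xb] * q[xb][y] ** s for xb in range(nx))
            for x in range(nx):
                p_xy = Q[x] * W[x][y]
                if p_xy > 0.0:
                    value += p_xy * math.log(q[x][y] ** s / denom)
        best = max(best, value)
    return best


# Hypothetical example: binary symmetric channel, crossover 0.1, uniform input.
p = 0.1
Q = [0.5, 0.5]
W = [[1 - p, p], [p, 1 - p]]
s_grid = [i / 100 for i in range(301)]  # s in [0, 3], includes s = 1 exactly

h = -p * math.log(p) - (1 - p) * math.log(1 - p)
mutual_info = math.log(2) - h

matched = gmi(Q, W, W, s_grid)                             # metric equals the channel
mismatched = gmi(Q, W, [[0.8, 0.2], [0.2, 0.8]], s_grid)   # decoder assumes crossover 0.2
print(matched, mismatched, mutual_info)
```

With a matched metric (q = W), s = 1 recovers the mutual information I(X;Y); a mismatched metric can only do as well or worse, which is why optimizing the input distribution for the mismatched bound is the interesting problem.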
Common variants at 2q11.2, 8q21.3, and 11q13.2 are associated with major mood disorders
Bipolar disorder (BPD) and major depressive disorder (MDD) are the primary major mood disorders. Recent studies suggest that they share certain psychopathological features and common risk genes, but unraveling the full genetic architecture underlying the risk of major mood disorders remains an important scientific task. Public genome-wide association study (GWAS) data sets offer the opportunity to examine this topic by utilizing large amounts of combined genetic data, which should ultimately allow a better understanding of the onset and development of these illnesses. A genome-wide meta-analysis was performed by combining two GWAS data sets on BPD and MDD (19,637 cases and 18,083 controls), followed by replication analyses of the loci of interest in 12,364 cases and 76,633 controls from independent samples that were not included in the two GWAS data sets. The single-nucleotide polymorphism (SNP) rs10791889 at 11q13.2 was significant in both the discovery and replication samples. When all samples were combined, this SNP and multiple other SNPs at 2q11.2 (rs717454), 8q21.3 (rs10103191), and 11q13.2 (rs2167457) exhibited genome-wide significant association with major mood disorders. The SNPs at 2q11.2 and 8q21.3 were novel risk SNPs that had not been previously reported, and the SNPs at 11q13.2 were in high linkage disequilibrium (LD) with potential BPD risk SNPs implicated in a previous GWAS. The genome-wide significant loci at 2q11.2 and 11q13.2 exhibited strong effects on the mRNA expression of certain nearby genes in the cerebellum. In conclusion, we have identified several novel loci associated with major mood disorders, adding further support for shared genetic risk between BPD and MDD. Our study highlights the necessity and importance of mining public data sets to explore risk genes for complex diseases such as mood disorders.
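The abstract does not state which meta-analysis model was used. As a minimal sketch, assuming the common fixed-effect inverse-variance-weighted approach, the snippet below combines per-study effect estimates for a single SNP and compares the resulting p-value against the conventional genome-wide significance threshold of 5 × 10^-8. The per-study numbers in the example are hypothetical, not values from the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (combined beta, combined SE, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return beta, se, z, p


GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold

# Hypothetical per-study estimates for one SNP (not values from the study).
beta, se, z, p = ivw_meta([0.10, 0.08], [0.02, 0.025])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e} significant={p < GENOME_WIDE_ALPHA}")
```

Weighting by inverse variance lets the larger, more precise data set dominate the combined estimate, and pooling shrinks the standard error, which is how a SNP that is only suggestive in each sample can reach genome-wide significance once all samples are combined.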
Genomic characterization of ribitol teichoic acid synthesis in Staphylococcus aureus: genes, genomic organization and gene duplication
BACKGROUND: Staphylococcus aureus, including MRSA (methicillin-resistant S. aureus), is an acquired pathogen and the primary cause of nosocomial infections worldwide. In S. aureus, teichoic acid is an essential component of the cell wall, and its biosynthesis is not yet well characterized. Studies in Bacillus subtilis have identified two teichoic acid biosynthesis pathways, teichoic acid ribitol (tar) and teichoic acid glycerol (tag), in strains W23 and 168, respectively. The genes involved in these two pathways have also been characterized: tarA, tarB, tarD, tarI, tarJ, tarK, and tarL for the tar pathway, and tagA, tagB, tagD, tagE, and tagF for the tag pathway. With the genome sequences of several MRSA strains (Mu50, MW2, N315, MRSA252, and COL) as well as the methicillin-susceptible strain MSSA476 available, a comparative genomic analysis was performed to characterize teichoic acid biosynthesis in these S. aureus strains. RESULTS: We identified all S. aureus tar and tag gene orthologs in the selected S. aureus strains that would contribute to teichoic acid synthesis. Based on our identification of genes orthologous to tarI, tarJ, and tarL, which are specific to the tar pathway in B. subtilis W23, we also concluded that tar is the major teichoic acid biogenesis pathway in S. aureus. Further analyses indicated that, unlike the divergon organization in B. subtilis, the S. aureus tar genes are organized into several clusters in cis. Most interestingly, compared with genes in the B. subtilis tar pathway, the S. aureus tar-specific genes (tarI, J, L) are duplicated in all six S. aureus genomes. CONCLUSION: In the S. aureus strains we analyzed, tar (teichoic acid ribitol) is the main teichoic acid biogenesis pathway. The tar genes are organized into several genomic groups in cis, and the genes specific to tar (relative to tag), tarI, tarJ, and tarL, are duplicated. The genomic organization of the S. aureus tar pathway suggests that its regulation differs from that of the B. subtilis tar or tag pathways, which are grouped in two operons in a divergon structure.
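The abstract does not say how orthologs were identified. One widely used heuristic is the reciprocal best hit (RBH): gene a in genome A and gene b in genome B are called orthologs when each is the other's top-scoring match. The sketch below assumes similarity scores (e.g. alignment bit scores) have already been computed; the gene identifiers and scores are hypothetical.

```python
def reciprocal_best_hits(scores_ab, scores_ba):
    """Return (gene_a, gene_b) pairs that are each other's best hit.

    scores_ab[a][b]: similarity score of query gene a (genome A) against gene b (genome B)
    scores_ba[b][a]: the reverse search
    """
    best_ab = {a: max(hits, key=hits.get) for a, hits in scores_ab.items() if hits}
    best_ba = {b: max(hits, key=hits.get) for b, hits in scores_ba.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores: two S. aureus copies both resemble B. subtilis tarI.
scores_bsub_vs_sa = {
    "tarI": {"SA_tarI_1": 410.0, "SA_tarI_2": 395.0},
    "tarJ": {"SA_tarJ_1": 380.0},
}
scores_sa_vs_bsub = {
    "SA_tarI_1": {"tarI": 405.0},
    "SA_tarI_2": {"tarI": 390.0},
    "SA_tarJ_1": {"tarJ": 377.0},
}
print(reciprocal_best_hits(scores_bsub_vs_sa, scores_sa_vs_bsub))
```

RBH reports at most one partner per gene, so in this toy example the second tarI copy is not paired. Detecting duplications like those described above would need an extra step, such as also collecting genes whose best hit is a known ortholog even when the match is not reciprocal.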
ClickVOS: Click Video Object Segmentation
The Video Object Segmentation (VOS) task aims to segment objects in videos.
However, previous settings either require time-consuming manual masks of target
objects in the first frame during inference or lack the flexibility to specify
arbitrary objects of interest. To address these limitations, we propose a new
setting named Click Video Object Segmentation (ClickVOS), which segments objects
of interest across the whole video according to a single click per object in
the first frame. We also provide the extended datasets DAVIS-P and YouTubeVOSP,
which include point annotations, to support this task. ClickVOS has significant
practical and research value because indicating an object takes only 1–2
seconds of interaction, whereas annotating an object's mask takes several
minutes. However, ClickVOS also presents greater challenges. To address this
task, we propose an end-to-end baseline approach named Attention Before
Segmentation (ABS), motivated by the attention process of humans. ABS utilizes
the given point in the first frame to perceive the target object through a
concise yet effective segmentation attention. Although the initial object mask
may be inaccurate, in ABS the imprecise mask can self-heal as the video
progresses instead of deteriorating through error accumulation. This is
attributed to our designed improvement memory, which continuously records
stable global object memory and updates detailed dense memory. In addition, we
conduct various baseline
explorations utilizing off-the-shelf algorithms from related fields, which
could provide insights for the further exploration of ClickVOS. The
experimental results demonstrate the superiority of the proposed ABS approach.
Extended datasets and code will be available at
https://github.com/PinxueGuo/ClickVOS
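To make the click-to-object idea concrete, here is a minimal sketch of one simple way a single click can seed a coarse first-frame mask: compare the feature vector at the clicked pixel with every pixel's feature using cosine similarity, then threshold the result. This is a generic illustration under our own assumptions, not the ABS segmentation attention or its improvement memory; the function name, the toy feature grid, and the threshold value are hypothetical.

```python
import math


def click_similarity_mask(features, click, threshold=0.5):
    """Coarse binary mask from one click via cosine similarity (hypothetical, not ABS).

    features:  H x W grid of C-dimensional feature vectors (lists of floats)
    click:     (row, col) of the user's click in the first frame
    threshold: hypothetical cosine-similarity cutoff for the foreground
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

    r, c = click
    query = features[r][c]  # feature of the clicked pixel acts as the object query
    return [[1 if cosine(query, f) >= threshold else 0 for f in row] for row in features]


# Toy 2 x 2 feature map: the top row resembles the clicked pixel, the bottom row does not.
features = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
print(click_similarity_mask(features, (0, 0)))
```

A mask seeded this way is often imprecise, which is exactly the situation the abstract describes: a method must then refine it over later frames rather than let errors accumulate.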
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
Contemporary Video Object Segmentation (VOS) approaches typically consist of
stages of feature extraction, matching, memory management, and multi-object
aggregation. Recent advanced models either model these components discretely
in a sequential manner or optimize a combined pipeline through substructure
aggregation. However, these existing explicit staged approaches prevent the
VOS framework from being optimized as a unified whole, leading to limited
capacity and suboptimal performance in tackling complex videos. In
this paper, we propose OneVOS, a novel framework that unifies the core
components of VOS with All-in-One Transformer. Specifically, to unify all
aforementioned modules into a vision transformer, we model all the features of
frames, masks and memory for multiple objects as transformer tokens, and
integrally accomplish feature extraction, matching and memory management of
multiple objects through the flexible attention mechanism. Furthermore, a
Unidirectional Hybrid Attention is proposed through a double decoupling of the
original attention operation, to rectify semantic errors and ambiguities of
stored tokens in the OneVOS framework. Finally, to alleviate the storage burden and
expedite inference, we propose the Dynamic Token Selector, which unveils the
working mechanism of OneVOS and naturally leads to a more efficient version of
OneVOS. Extensive experiments demonstrate the superiority of OneVOS, achieving
state-of-the-art performance across 7 datasets, particularly on the complex
LVOS and MOSE datasets with scores of 70.1% and 66.4%, surpassing previous
state-of-the-art methods by 4.2% and 7.0%, respectively. Our code will be
available for reproducibility and further research.
Comment: 19 pages, 7 figures
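The all-in-one idea, treating frame and memory features as tokens that interact through attention, can be illustrated with a minimal sketch. Below, current-frame tokens attend to the concatenation of frame and memory tokens using standard single-head scaled dot-product attention, and a hypothetical rule keeps only the k memory tokens that receive the most attention, as one way to reduce storage. Neither function is OneVOS's Unidirectional Hybrid Attention or Dynamic Token Selector; the names, token values, and top-k rule are our own assumptions.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention on plain lists of vectors."""
    d = len(queries[0])
    out, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(wi * v[j] for wi, v in zip(w, values)) for j in range(len(values[0]))])
    return out, weights


def select_memory_tokens(memory, weights, n_frame, k):
    """Hypothetical selector: keep the k memory tokens with the most received attention.

    Columns [0, n_frame) of each weight row belong to frame tokens and the
    remaining columns to memory tokens, matching the concatenation order.
    """
    received = [sum(w[n_frame + i] for w in weights) for i in range(len(memory))]
    keep = sorted(range(len(memory)), key=lambda i: received[i], reverse=True)[:k]
    return [memory[i] for i in sorted(keep)]  # preserve the original memory order


frame = [[1.0, 0.0], [0.0, 1.0]]                   # toy current-frame tokens
memory = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]     # toy stored memory tokens
tokens = frame + memory                            # one joint token sequence
out, weights = attention(frame, tokens, tokens)
kept = select_memory_tokens(memory, weights, n_frame=len(frame), k=2)
print(kept)
```

Because matching and memory reading happen inside the same attention call, there is no separate matching stage to design; the selector then trims the memory using the attention weights that attention already produced.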
