164 research outputs found
Validating Perception and Use of Mobile Social Network Service: A Cross-Cultural Comparison Study
The rapid developments of Social Network Services (SNS) and mobile technology have offered opportunities to revisit seminal theories of technology use in today’s socio-technical environment. Mobile technology based SNS provides various service channels that are highly correlated with their respective service contexts, in which cultural influences are omnipresent. Investigating use intention in a cross-cultural mobile SNS study implies new theoretical discoveries and managerial practices. This research in progress (RIP) paper suggests that important distinctions exist between U.S. and Chinese SNS subscribers in terms of SNS use and perceptions. Taking the perspective of SNS users, we strive to explore the effects of cultural factors (e.g., collectivism vs. individualism) on trust formulation, degree of social awareness, and privacy concern. We examine the antecedents and consequences of legacy constructs (e.g., technology acceptance and social capital) in SNS. This paper describes the research design to test the research hypotheses. A triangulation methodology (i.e., qualitative and quantitative methods) is desired and proposed in the design. A discussion of research implications and business practices is also included in this RIP paper.
Large-scale collection and annotation of gene models for date palm (Phoenix dactylifera, L.)
The date palm (Phoenix dactylifera L.), famed for its sugar-rich fruits (dates) and cultivated by humans since 4,000 B.C., is an economically important crop in the Middle East, Northern Africa, and increasingly other places where climates are suitable. Despite a long history of human cultivation, the understanding of P. dactylifera genetics and molecular biology is rather limited, hindered by a lack of high-quality basic data from genomics and transcriptomics. Here we report a large-scale effort in generating gene models (expressed sequence tags, or ESTs, assembled and mapped to a genome assembly) for P. dactylifera, using the long-read pyrosequencing platform (Roche/454 GS FLX Titanium) at high coverage. We built fourteen cDNA libraries from different P. dactylifera tissues (cultivar Khalas) and acquired 15,778,993 raw sequencing reads (about one million sequencing reads per library), and the pooled sequences were assembled into 67,651 non-redundant contigs and 301,978 singletons. We annotated 52,725 contigs based on the plant databases and 45 contigs based on functional domains with reference to the Pfam database. From the annotated contigs, we assigned GO (Gene Ontology) terms to 36,086 contigs and KEGG pathways to 7,032 contigs. Our comparative analysis showed that 70.6 % (47,930), 69.4 % (47,089), 68.4 % (46,441), and 69.3 % (47,048) of the P. dactylifera gene models are shared with rice, sorghum, Arabidopsis, and grapevine, respectively. We also classified our gene models into house-keeping and tissue-specific genes based on their tissue specificity. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1007/s11103-012-9924-z) contains supplementary material, which is available to authorized users.
Improving the Robustness of Summarization Systems with Dual Augmentation
A robust summarization system should be able to capture the gist of the
document, regardless of the specific word choices or noise in the input. In
this work, we first explore the summarization models' robustness against
perturbations including word-level synonym substitution and noise. To create
semantically consistent substitutes, we propose SummAttacker, an
efficient approach to generating adversarial samples based on language models.
Experimental results show that state-of-the-art summarization models have a
significant decrease in performance on adversarial and noisy test sets. Next,
we analyze the vulnerability of the summarization systems and explore improving
the robustness by data augmentation. Specifically, the first brittleness factor
we found is the poor understanding of infrequent words in the input.
Correspondingly, we feed the encoder with more diverse cases created by
SummAttacker in the input space. The other factor is in the latent space, where
the attacked inputs bring more variations to the hidden states. Hence, we
construct adversarial decoder input and devise a manifold softmixing operation in
hidden space to introduce more diversity. Experimental results on Gigaword and
CNN/DM datasets demonstrate that our approach achieves significant improvements
over strong baselines and exhibits higher robustness on noisy, attacked, and
clean datasets. Comment: 10 pages, 6 figures, ACL 2023 main conference
On-Device Recommender Systems: A Comprehensive Survey
Recommender systems have been widely deployed in various real-world
applications to help users identify content of interest from massive amounts of
information. Traditional recommender systems work by collecting user-item
interaction data in a cloud-based data center and training a centralized model
to perform the recommendation service. However, such cloud-based recommender
systems (CloudRSs) inevitably suffer from excessive resource consumption,
response latency, as well as privacy and security risks concerning both data
and models. Recently, driven by the advances in storage, communication, and
computation capabilities of edge devices, there has been a shift of focus from
CloudRSs to on-device recommender systems (DeviceRSs), which leverage the
capabilities of edge devices to minimize centralized data storage requirements,
reduce the response latency caused by communication overheads, and enhance user
privacy and security by localizing data processing and model training. Despite
the rapid rise of DeviceRSs, there is a clear absence of timely literature
reviews that systematically introduce, categorize and contrast these methods.
To bridge this gap, we aim to provide a comprehensive survey of DeviceRSs,
covering three main aspects: (1) the deployment and inference of DeviceRSs; (2)
the training and update of DeviceRSs; and (3) the security and privacy of DeviceRSs.
Furthermore, we provide a fine-grained and systematic taxonomy of the methods
involved in each aspect, followed by a discussion regarding challenges and
future research directions. This is the first comprehensive survey on DeviceRSs
that covers a spectrum of tasks to fit various needs. We believe this survey
will help readers effectively grasp the current research status in this field,
equip them with relevant technical foundations, and stimulate new research
ideas for developing DeviceRSs.
Pain-Related Factors and Their Impact on Quality of Life in Chinese Patients With Amyotrophic Lateral Sclerosis
Objectives: Pain is considered a common symptom in amyotrophic lateral sclerosis (ALS). However, the results of studies on pain in ALS are limited and inconsistent. The aim of our study was to comprehensively evaluate the potential factors of pain and its effects on quality of life (QoL) in patients with ALS from China. Participants and Methods: Patients were eligible if they fulfilled the criteria for probable and definitive ALS according to the revised El Escorial criteria. Pain was assessed by the Brief Pain Inventory (BPI). Disease severity, sleep quality, fatigue, anxiety, depression, and QoL were evaluated in ALS patients by the ALS Functional Rating Scale-revised (ALSFRS-R) and ALS severity scale (ALSSS), Pittsburgh Sleep Quality Index (PSQI), Fatigue Severity Scale (FSS), Hamilton Anxiety Rating Scale (HARS), Hamilton Depression Rating Scale (HDRS) and McGill Quality of Life Questionnaire (MQOL). Then, the clinical characteristics of ALS patients with pain were compared with those without pain. Last, factors associated with pain, as well as their impact on QoL in Chinese ALS patients, were assessed. Results: A total of 86 ALS patients were included. ALS patients with pain tended to have higher FSS scores and poorer QoL. The FSS score and ALSSS [lower extremity (LE) + upper extremity (UE)] were associated with pain in ALS patients. The ALSFRS-R, Pain Severity Index (PSI), HARS and HDRS scores were significantly associated with both the physical and psychological domains of QoL. Conclusion: Our study was the first to comprehensively evaluate factors associated with pain in Chinese ALS patients, finding that fatigue can be a risk factor for pain and that the ALSSS (LE + UE) score was related to pain intensity. Additionally, we identified the adverse effects of ALSSS (LE + UE), HARS and HDRS scores on QoL in Chinese ALS patients.
The Disequilibrium of Nucleosomes Distribution along Chromosomes Plays a Functional and Evolutionarily Role in Regulating Gene Expression
To further understand the relationship between nucleosome-space occupancy (NO) and global transcriptional activity in mammals, we acquired a set of genome-wide nucleosome distribution and transcriptome data from the mouse cerebrum and testis based on ChIP (H3)-seq and RNA-seq, respectively. We identified nearly consistent NO patterns among three mouse tissues (cerebrum, testis, and ESCs) and found, through clustering analysis for transcriptional activation, that the NO variations among chromosomes are closely associated with distinct expression levels between house-keeping (HK) genes and tissue-specific (TS) genes. Both TS and HK genes form clusters albeit the obvious majority. This feature implies that NO patterns, i.e., nucleosome binding and clustering, are coupled with gene clustering that may be functionally and evolutionarily conserved in regulating gene expression among different cell types.
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
The rapid development of open-source large language models (LLMs) has been
truly remarkable. However, the scaling law described in previous literature
presents varying conclusions, which casts a dark cloud over scaling LLMs. We
delve into the study of scaling laws and present our distinctive findings that
facilitate scaling of large scale models in two commonly used open-source
configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek
LLM, a project dedicated to advancing open-source language models with a
long-term perspective. To support the pre-training phase, we have developed a
dataset that currently consists of 2 trillion tokens and is continuously
expanding. We further conduct supervised fine-tuning (SFT) and Direct
Preference Optimization (DPO) on DeepSeek LLM Base models, resulting in the
creation of DeepSeek Chat models. Our evaluation results demonstrate that
DeepSeek LLM 67B surpasses LLaMA-2 70B on various benchmarks, particularly in
the domains of code, mathematics, and reasoning. Furthermore, open-ended
evaluations reveal that DeepSeek LLM 67B Chat exhibits superior performance
compared to GPT-3.5.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
We present DeepSeek-V2, a strong Mixture-of-Experts (MoE) language model
characterized by economical training and efficient inference. It comprises 236B
total parameters, of which 21B are activated for each token, and supports a
context length of 128K tokens. DeepSeek-V2 adopts innovative architectures
including Multi-head Latent Attention (MLA) and DeepSeekMoE. MLA guarantees
efficient inference through significantly compressing the Key-Value (KV) cache
into a latent vector, while DeepSeekMoE enables training strong models at an
economical cost through sparse computation. Compared with DeepSeek 67B,
DeepSeek-V2 achieves significantly stronger performance, and meanwhile saves
42.5% of training costs, reduces the KV cache by 93.3%, and boosts the maximum
generation throughput to 5.76 times. We pretrain DeepSeek-V2 on a high-quality
and multi-source corpus consisting of 8.1T tokens, and further perform
Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to fully unlock
its potential. Evaluation results show that, even with only 21B activated
parameters, DeepSeek-V2 and its chat versions still achieve top-tier
performance among open-source models.
