324 research outputs found
Health Effects Associated With Electronic Cigarette Use: Automated Mining of Online Forums.
BACKGROUND: Our previous infodemiological study was performed by manually mining health-effect data associated with electronic cigarettes (ECs) from online forums. Manual mining is time consuming and limits the number of posts that can be retrieved. OBJECTIVE: Our goal in this study was to automatically extract and analyze a large number (>41,000) of online forum posts related to the health effects associated with EC use between 2008 and 2015. METHODS: Data were annotated with medical concepts from the Unified Medical Language System using a modified version of the MetaMap tool. Of over 1.4 million posts, 41,216 were used to analyze symptoms (undiagnosed conditions) and disorders (physician-diagnosed terminology) associated with EC use. For each post, sentiment (positive, negative, and neutral) was also assigned. RESULTS: Symptom and disorder data were categorized into 12 organ systems or anatomical regions. Most posts on symptoms and disorders contained negative sentiment, and affected systems were similar across all years. Health effects were reported most often in the neurological, mouth and throat, and respiratory systems. The most frequently reported symptoms and disorders were headache (n=939), coughing (n=852), malaise (n=468), asthma (n=916), dehydration (n=803), and pharyngitis (n=565). In addition, users often reported linked symptoms (eg, coughing and headache). CONCLUSIONS: Online forums are a valuable repository of data that can be used to identify positive and negative health effects associated with EC use. By automating extraction of online information, we obtained more data than in our prior study, identified new symptoms and disorders associated with EC use, determined which systems are most frequently adversely affected, identified specific symptoms and disorders most commonly reported, and tracked health effects over 7 years.
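The study's extraction pipeline itself (a modified MetaMap mapping posts to UMLS concepts) is not reproduced here; the following is a minimal, self-contained sketch of the same idea using a hand-built concept dictionary and a trivial sentiment lexicon, all of which are illustrative assumptions rather than the authors' tooling.

```python
# Toy sketch of the post-annotation idea, NOT the paper's modified MetaMap/UMLS
# pipeline: map forum posts to symptom/disorder concepts via a hand-built
# dictionary and assign a crude sentiment. All dictionaries, concept names,
# and organ-system labels here are illustrative assumptions.

from collections import Counter

# Hypothetical concept dictionary: surface form -> (concept, organ system)
CONCEPTS = {
    "headache": ("Headache", "neurological"),
    "cough": ("Coughing", "respiratory"),
    "sore throat": ("Pharyngitis", "mouth and throat"),
    "asthma": ("Asthma", "respiratory"),
}

POSITIVE = {"better", "improved", "relief"}
NEGATIVE = {"worse", "pain", "terrible"}

def annotate(post):
    """Return (concepts found, sentiment) for a single forum post."""
    text = post.lower()
    found = [(concept, system) for surface, (concept, system) in CONCEPTS.items()
             if surface in text]
    words = set(text.split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    sentiment = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return found, sentiment

posts = [
    "Ever since I switched to vaping my cough got worse and I get a headache daily.",
    "My asthma feels so much better since I quit regular cigarettes.",
]

system_counts = Counter()
for p in posts:
    concepts, sentiment = annotate(p)
    system_counts.update(system for _, system in concepts)
    print(concepts, sentiment)

print(system_counts)  # frequency of affected organ systems, mirroring the study's aggregation
```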
Relevance-based Retrieval on Hidden-Web Text Databases without Ranking Support
Many online or local data sources provide powerful querying mechanisms
but limited ranking capabilities. For instance, PubMed allows users to
submit highly expressive Boolean keyword queries, but ranks the query
results by date only. However, a user would typically prefer a ranking
by relevance, measured by an Information Retrieval (IR) ranking
function. The naive approach would be to submit a disjunctive query with
all query keywords, retrieve the returned documents, and then re-rank
them. Unfortunately, such an operation would be very expensive due to
the large number of results returned by disjunctive queries. In this
paper we present algorithms that return the top results for a query,
ranked according to an IR-style ranking function, while operating on top
of a source with a Boolean query interface with no ranking capabilities
(or a ranking capability of no interest to the end user). The algorithms
generate a series of conjunctive queries that return only documents that
are candidates for being highly ranked according to a relevance metric.
Our approach can also be applied to other settings where the ranking is
monotonic on a set of factors (query keywords in IR) and the source
query interface is a Boolean expression of these factors. Our
comprehensive experimental evaluation on the PubMed database and a TREC
dataset shows that we achieve an order-of-magnitude improvement over the
current baseline approaches.
Vagelis Hristidis was partly supported by NSF grant IIS-0811922 and DHS
grant 2009-ST-062-000016. Panagiotis G. Ipeirotis was supported by the
National Science Foundation under Grant No. IIS-0643846.
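The core mechanism described above, issuing conjunctive (AND) queries whose results are the only candidates for the top ranks under a monotonic scoring function, can be sketched roughly as follows. The `boolean_search` callable, the additive term weights, and the early-stopping rule are simplified stand-ins, not the paper's actual algorithms.

```python
# Illustrative sketch, not the paper's algorithms: issue conjunctive (AND)
# queries over keyword subsets, most promising first, and re-rank the retrieved
# documents locally. `boolean_search(query)` is a hypothetical stand-in for a
# source such as PubMed that accepts Boolean queries but cannot rank by
# relevance; it yields (doc_id, set_of_terms) pairs.

from itertools import combinations

def score(doc_terms, query_terms, weights):
    # Monotonic IR-style score: sum of the weights of query terms in the document.
    return sum(weights[t] for t in query_terms if t in doc_terms)

def top_k(query_terms, weights, boolean_search, k=10):
    # Every document matching the conjunctive query on subset s contains all of
    # s, so its score is at least sum(weights over s); process subsets in
    # decreasing order of that guaranteed weight.
    subsets = sorted(
        (s for r in range(len(query_terms), 0, -1)
         for s in combinations(query_terms, r)),
        key=lambda s: sum(weights[t] for t in s),
        reverse=True,
    )
    seen, results = set(), []
    for i, subset in enumerate(subsets):
        for doc_id, doc_terms in boolean_search(" AND ".join(subset)):
            if doc_id not in seen:
                seen.add(doc_id)
                results.append((score(doc_terms, query_terms, weights), doc_id))
        results.sort(reverse=True)
        # Simplified early-stopping rule: any document not yet retrieved can only
        # match the remaining (lower-weight) conjunctive queries, so its score is
        # bounded by the best guaranteed weight still unexplored.
        best_unseen = max((sum(weights[t] for t in s) for s in subsets[i + 1:]),
                          default=0)
        if len(results) >= k and results[k - 1][0] >= best_unseen:
            break
    return results[:k]
```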
NORMY: Non-Uniform History Modeling for Open Retrieval Conversational Question Answering
Open Retrieval Conversational Question Answering (OrConvQA) answers a
question given a conversation as context and a document collection. A typical
OrConvQA pipeline consists of three modules: a Retriever to retrieve relevant
documents from the collection, a Reranker to rerank them given the question and
the context, and a Reader to extract an answer span. The conversational turns
can provide valuable context to answer the final query. State-of-the-art
OrConvQA systems use the same history modeling for all three modules of the
pipeline. We hypothesize that this is suboptimal. Specifically, we argue that a
broader context is needed in the first modules of the pipeline to not miss
relevant documents, while a narrower context is needed in the last modules to
identify the exact answer span. We propose NORMY, the first unsupervised
non-uniform history modeling pipeline which generates the best conversational
history for each module. We further propose a novel Retriever for NORMY, which
employs keyphrase extraction on the conversation history, and leverages
passages retrieved in previous turns as additional context. We also created a
new dataset for OrConvQA, by expanding the doc2dial dataset. We implemented
various state-of-the-art history modeling techniques and comprehensively
evaluated them separately for each module of the pipeline on three datasets:
OR-QUAC, our doc2dial extension, and ConvMix. Our extensive experiments show
that NORMY outperforms the state-of-the-art in the individual modules and in
the end-to-end system.
Comment: Accepted for publication at IEEE ICSC 202
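A minimal sketch of the non-uniform history idea, a broad conversational window for the Retriever, a medium one for the Reranker, and a narrow one for the Reader, is shown below; the window sizes and the three module callables are hypothetical placeholders rather than NORMY's actual components.

```python
# Sketch of non-uniform history modeling (not NORMY's actual code): each module
# sees a differently sized slice of the conversation, broader for retrieval,
# narrower for answer extraction. Window sizes and the retriever/reranker/reader
# callables are hypothetical placeholders.

from dataclasses import dataclass

@dataclass
class Turn:
    question: str
    answer: str = ""

def history_for(turns, current_q, window):
    """Concatenate the last `window` previous turns with the current question."""
    context = [t.question for t in turns[-window:]] if window else []
    return " ".join(context + [current_q])

def answer(turns, current_q, collection, retriever, reranker, reader):
    # Broad history for the Retriever so relevant documents are not missed...
    docs = retriever(history_for(turns, current_q, window=5), collection)
    # ...a medium window for the Reranker...
    ranked = reranker(history_for(turns, current_q, window=2), docs)
    # ...and a narrow window for the Reader to pinpoint the exact answer span.
    return reader(history_for(turns, current_q, window=1), ranked[:3])
```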
Using Social Media to Explore Mental Health-Related Behaviors and Discussions among Young Adults
There have been recurring reports of online harassment and abuse among adolescents and young adults through Anonymous Social Networking websites (ASNs). We explored discussions related to social and mental health behaviors among college students, including cyberbullying on the popular ASN, Yik Yak. From April 6, 2016, to May 7, 2016, we collected anonymous conversations posted on Yik Yak at 19 universities in four different states. We found that prosocial messages were approximately five times as prevalent as bullying messages. Frequency of cyberbullying messages was positively associated with messages seeking emotional help. We found significant geographic variation in the frequency of messages offering supportive versus bullying messages. Across campuses bullying and political discussion were positively associated. Results suggest that ASN sites can be mined for real-time data about students' mental health-related attitudes and behaviors. We discuss the implications for using this information in education and healthcare services.
Progressive Query Expansion for Retrieval Over Cost-constrained Data Sources
Query expansion has been employed for a long time to improve the accuracy of
query retrievers. Earlier works relied on pseudo-relevance feedback (PRF)
techniques, which augment a query with terms extracted from documents retrieved
in a first stage. However, the retrieved documents may be noisy, hindering the
effectiveness of the ranking. To avoid this, recent studies have instead used
Large Language Models (LLMs) to generate additional content to expand a query.
These techniques are prone to hallucination, and existing work focuses on the
cost of LLM usage. However, in several important practical scenarios the cost
is dominated by retrieval, since the corpus is only available via APIs that
charge a fee per retrieved document. We propose combining classic PRF
techniques with LLMs to create a progressive query expansion algorithm, ProQE,
that iteratively
expands the query as it retrieves more documents. ProQE is compatible with both
sparse and dense retrieval systems. Our experimental results on four retrieval
datasets show that ProQE outperforms state-of-the-art baselines by 37% and is
the most cost-effective approach.
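A minimal sketch of progressive query expansion under a per-document retrieval fee is shown below; the `search_api` and `suggest_terms` callables, the budget figures, and the expansion schedule are hypothetical assumptions, not the actual ProQE algorithm.

```python
# Sketch of progressive query expansion under a per-document retrieval fee
# (not the actual ProQE algorithm). `search_api(query, k)` charges per retrieved
# document and returns (doc_id, text) pairs; `suggest_terms(query, docs, n)`
# stands in for an LLM/PRF step proposing expansion terms. Both are hypothetical.

def progressive_expand(query, search_api, suggest_terms, budget_docs=30, batch=10):
    terms = query.split()
    retrieved = {}          # doc_id -> text
    spent = 0
    while spent < budget_docs:
        k = min(batch, budget_docs - spent)
        hits = search_api(" ".join(terms), k=k)   # pay for up to k documents
        spent += len(hits)
        new = {doc_id: text for doc_id, text in hits if doc_id not in retrieved}
        retrieved.update(new)
        if not new:
            break
        # Expand the query a little at each round using the evidence gathered so
        # far, instead of committing to one large, potentially noisy expansion.
        terms += [t for t in suggest_terms(query, list(new.values()), n=2)
                  if t not in terms]
    return terms, retrieved
```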
PAT-Questions: A Self-Updating Benchmark for Present-Anchored Temporal Question-Answering
Existing work on Temporal Question Answering (TQA) has predominantly focused
on questions anchored to specific timestamps or events (e.g. "Who was the US
president in 1970?"). Little work has studied questions whose temporal context
is relative to the present time (e.g. "Who was the previous US president?"). We
refer to this problem as Present-Anchored Temporal QA (PATQA). PATQA poses
unique challenges: (1) large language models (LLMs) may have outdated
knowledge, (2) complex temporal relationships (e.g. 'before', 'previous') are
hard to reason about, (3) multi-hop reasoning may be required, and (4) the gold
answers of benchmarks must be continuously updated. To address these
challenges, we introduce the PAT-Questions benchmark, which includes single and
multi-hop temporal questions. The answers in PAT-Questions can be automatically
refreshed by re-running SPARQL queries on a knowledge graph, if available. We
evaluate several state-of-the-art LLMs and a SOTA temporal reasoning model
(TEMPREASON-T5) on PAT-Questions through direct prompting and
retrieval-augmented generation (RAG). The results highlight the limitations of
existing solutions in PATQA and motivate the need for new methods to improve
PATQA reasoning capabilities.
Comment: Accepted to Findings of ACL '2
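The self-updating mechanism, re-running a stored query to refresh the gold answer whenever the knowledge graph changes, can be illustrated with the toy, in-memory sketch below; the benchmark itself stores SPARQL queries against a real knowledge graph, which this sketch deliberately replaces with placeholder data.

```python
# Self-contained sketch of the "self-updating" idea behind PAT-Questions (not
# the benchmark's actual code, which re-runs SPARQL queries against a knowledge
# graph). A tiny in-memory knowledge graph with placeholder names stands in for
# the endpoint, and each question stores a query function instead of SPARQL.

from datetime import date

# Toy knowledge graph: position -> list of (holder, start, end); end=None means current.
KG = {
    "US president": [
        ("Person A", date(2009, 1, 20), date(2017, 1, 20)),
        ("Person B", date(2017, 1, 20), date(2021, 1, 20)),
        ("Person C", date(2021, 1, 20), None),
    ],
}

def previous_holder(position):
    """Present-anchored query: most recent holder whose term has already ended."""
    ended = [(end, holder) for holder, _, end in KG[position] if end is not None]
    return max(ended)[1]

benchmark = [
    {"question": "Who was the previous US president?",
     "query": lambda: previous_holder("US president")},
]

# Refreshing the gold answers is just re-running each stored query, so the
# benchmark stays correct as the underlying knowledge graph is updated.
for item in benchmark:
    item["gold_answer"] = item["query"]()
    print(item["question"], "->", item["gold_answer"])
```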
EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models
Large Language Models (LLMs) have achieved state-of-the-art performance in
text re-ranking. This process places the query and candidate passages in the
prompt, using pointwise, listwise, and pairwise prompting strategies. A
limitation of these ranking strategies with LLMs is their cost: the process can
become expensive due to API charges, which are based on the number of input and
output tokens. We study how to maximize the re-ranking performance given a
budget, by navigating the vast search spaces of prompt choices, LLM APIs, and
budget splits. We propose a suite of budget-constrained methods to perform text
re-ranking using a set of LLM APIs. Our most efficient method, called EcoRank,
is a two-layered pipeline that jointly optimizes decisions regarding budget
allocation across prompt strategies and LLM APIs. Our experimental results on
four popular QA and passage reranking datasets show that EcoRank outperforms
other budget-aware supervised and unsupervised baselines.
Comment: Accepted to Findings of ACL 2
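One way to read the two-layer, budget-split idea is sketched below: a cheaper LLM with pointwise prompts filters the full candidate list, and a stronger LLM with pairwise prompts refines the survivors. The per-call costs, the 50/50 split, and the two scoring callables are assumptions for illustration, not EcoRank's actual design choices.

```python
# Illustrative two-layer budget split for LLM re-ranking (not EcoRank itself).
# `cheap_score(query, passage)` and `strong_prefers(query, a, b)` stand in for
# paid pointwise and pairwise LLM calls; costs and the split are hypothetical.

def rerank(query, passages, cheap_score, strong_prefers, budget,
           cheap_cost=0.2, strong_cost=1.0, split=0.5):
    # Layer 1: pointwise relevance scores from the cheap model for as many
    # passages as the first slice of the budget allows.
    n1 = min(len(passages), int(budget * split / cheap_cost))
    scored = sorted(passages[:n1],
                    key=lambda p: cheap_score(query, p), reverse=True)
    ranked = scored + passages[n1:]   # unscored passages keep their original order

    # Layer 2: pairwise comparisons from the stronger model, promoting better
    # passages upward one adjacent swap at a time until the budget runs out.
    remaining = budget * (1 - split)
    i = 1
    while remaining >= strong_cost and i < len(ranked):
        if strong_prefers(query, ranked[i], ranked[i - 1]):   # one paid LLM call
            ranked[i - 1], ranked[i] = ranked[i], ranked[i - 1]
        remaining -= strong_cost
        i += 1
    return ranked
```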
Use of Generative AI for Improving Health Literacy in Reproductive Health: Case Study.
Background: Patients find technology tools to be more approachable for seeking sensitive health-related information, such as reproductive health information. The inventive conversational ability of artificial intelligence (AI) chatbots, such as ChatGPT (OpenAI Inc), offers a potential means for patients to effectively locate answers to their health-related questions digitally. Objective: A pilot study was conducted to compare the novel ChatGPT with the existing Google Search technology for their ability to offer accurate, effective, and current information regarding the action to take after missing a dose of an oral contraceptive pill. Methods: A sequence of 11 questions, mimicking a patient inquiring about the action to take after missing a dose of an oral contraceptive pill, was input into ChatGPT as a cascade, given the conversational ability of ChatGPT. The questions were input into 4 different ChatGPT accounts, with the account holders being of various demographics, to evaluate potential differences and biases in the responses given to different account holders. The leading question, "what should I do if I missed a day of my oral contraception birth control?" alone was then input into Google Search, given its nonconversational nature. The results from the ChatGPT questions and the Google Search results for the leading question were evaluated on their readability, accuracy, and effective delivery of information. Results: The ChatGPT results were determined to be at an overall higher-grade reading level, with a longer reading duration, and to be less accurate, less current, and less effective in delivering information. In contrast, the answer box and snippets returned by Google Search were at a lower-grade reading level, had a shorter reading duration, were more current, referenced the origin of the information (transparent), and provided the information in various formats in addition to text. Conclusions: ChatGPT has room for improvement in accuracy, transparency, recency, and reliability before it can equitably be implemented into health care information delivery and provide the potential benefits it poses. However, AI may be used as a tool for providers to educate their patients in preferred, creative, and efficient ways, such as using AI to generate accessible short educational videos from health care provider-vetted information. Larger studies representing a diverse group of users are needed.
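The readability comparison can be illustrated with a small sketch; the abstract does not name the exact metric used, so the Flesch-Kincaid grade level and the 200-words-per-minute reading speed below are assumptions.

```python
# Illustrative readability check of a health-information response. The study's
# exact metric is not stated in the abstract; Flesch-Kincaid grade level and a
# 200-words-per-minute reading time are assumptions for this sketch.

import re

def count_syllables(word):
    # Crude heuristic: count groups of vowels; good enough for a rough estimate.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    # Flesch-Kincaid grade level formula.
    grade = 0.39 * (len(words) / sentences) + 11.8 * (syllables / len(words)) - 15.59
    minutes = len(words) / 200          # assumed average reading speed
    return round(grade, 1), round(minutes, 2)

answer = ("If you miss one pill, take it as soon as you remember and take the "
          "next pill at the usual time. No backup contraception is needed.")
print(readability(answer))   # (estimated grade level, estimated reading minutes)
```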
