1,045 research outputs found

    Which Modality should I use -- Text, Motif, or Image? : Understanding Graphs with Large Language Models

    Our research integrates graph data with Large Language Models (LLMs), which, despite their advancements in various fields using large text corpora, face limitations in encoding entire graphs due to context size constraints. This paper introduces a new approach to encoding a graph with diverse modalities, such as text, image, and motif, coupled with prompts to approximate a graph's global connectivity, thereby enhancing LLMs' efficiency in processing complex graph structures. The study also presents GraphTMI, a novel benchmark for evaluating LLMs in graph structure analysis, focusing on homophily, motif presence, and graph difficulty. Key findings indicate that the image modality, especially with vision-language models like GPT-4V, is superior to text in balancing token limits and preserving essential information, and that it outperforms prior graph neural network (GNN) encoders. Furthermore, the research assesses how various factors affect the performance of each encoding modality and outlines the existing challenges and potential future developments for LLMs in graph understanding and reasoning tasks. All data will be publicly available upon acceptance.

    Avitourism opportunities as a contribution to conservation and rural livelihoods in the Hindu Kush Himalaya - a field perspective

    The Hindu Kush Himalaya is a biodiversity hotspot subject to multiple anthropogenic stressors, including hydropower plants, pollution, deforestation and wildlife poaching, in addition to changing climate. Bird photography tourism, as a locally important element of avitourism, has the potential to integrate sustainable development and wildlife conservation. We conducted field surveys around the reaches of four Indian Himalayan rivers—the Kosi, western Ramganga, Khoh, and Song—outside of protected national parks (the Corbett and Rajaji tiger reserves) to ascertain the distribution of bird species along river corridors that could be sites of avitourism. Species richness along the surveyed reaches was: Kosi (79), western Ramganga (91), Khoh (52), and Song (79). This study contributes critical data to the existing baseline information on the avifaunal species of Uttarakhand. It further discusses the possibility of developing avitourism for knowledge generation on species distribution and innovative livelihood options for local communities in Uttarakhand, reinforcing local vested interest in bird conservation. The findings have generic applicability worldwide.

    Single-Cell RNA Sequencing Reveals Cellular Heterogeneity and Stage Transition under Temperature Stress in Synchronized Plasmodium falciparum Cells

    The malaria parasite has a complex life cycle exhibiting phenotypic and morphogenic variations in two different hosts by existing in heterogeneous developmental states. To investigate this cellular heterogeneity of the parasite within the human host, we performed single-cell RNA sequencing of synchronized Plasmodium cells under control and temperature treatment conditions. Using the Malaria Cell Atlas (https://www.sanger.ac.uk/science/tools/mca) as a guide, we identified 9 subtypes of the parasite distributed across known intraerythrocytic stages. Interestingly, temperature treatment results in the upregulation of the AP2-G gene, the master regulator of sexual development, in a small subpopulation of the parasites. Moreover, we identified a heterogeneous stress-responsive subpopulation (clusters 5, 6, and 7 [~10% of the total population]) that exhibits upregulation of stress response pathways under normal growth conditions. We also developed an online exploratory tool that will provide new insights into gene function under normal and temperature stress conditions. Thus, our study reveals important insights into cell-to-cell heterogeneity in the parasite population under temperature treatment that will be instrumental toward a mechanistic understanding of cellular adaptation and population dynamics in Plasmodium falciparum.
    IMPORTANCE: The malaria parasite has a complex life cycle exhibiting phenotypic variations in two different hosts accompanied by cell-to-cell variability that is important for stress tolerance, immune evasion, and drug resistance. To investigate cellular heterogeneity determined by gene expression, we performed single-cell RNA sequencing (scRNA-seq) of about 12,000 synchronized Plasmodium cells under physiologically relevant normal (37°C) and temperature stress (40°C) conditions phenocopying the cyclic bouts of fever experienced during malarial infection. In this study, we found that parasites exhibit transcriptional heterogeneity in an otherwise morphologically synchronized culture. Also, a subset of parasites is continually committed to gametocytogenesis and stress-responsive pathways. These observations have important implications for understanding the mechanisms of drug resistance generation and vaccine development against the malaria parasite.

    Innovations in Agricultural Forecasting: A Multivariate Regression Study on Global Crop Yield Prediction

    The prediction of crop yields internationally is a crucial objective in agricultural research. Thus, this study implements 6 regression models (Linear, Tree, Gradient Descent, Gradient Boosting, K Nearest Neighbors, and Random Forest) to predict crop yields in 37 developing countries over 27 years. Given 4 key training parameters, insecticides (tonnes), rainfall (mm), temperature (Celsius), and yield (hg/ha), it was found that our Random Forest Regression model achieved a coefficient of determination (R²) of 0.94, with a margin of error (ME) of 0.03. The models were trained and tested using the Food and Agriculture Organization of the United Nations data, along with the World Bank Climate Change Data Catalog. Furthermore, each parameter was analyzed to understand how varying factors could impact overall yield. We used unconventional models, contrary to generally used Deep Learning (DL) and Machine Learning (ML) models, combined with recently collected data to implement a unique approach in our research. Existing scholarship would benefit from understanding the optimal model for agricultural research, specifically using the United Nations data.
    Comment: 12 pages, 8 figures, 1 table, Guided by Dr. Aditya Undurt
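The coefficient of determination quoted in the abstract above measures how much of the variance in observed yield a model's predictions explain. The abstract does not say how the authors computed it, so the following is only a minimal sketch of the standard definition, R² = 1 − SS_res / SS_tot. The function name `r_squared` and the sample values are hypothetical and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Return the coefficient of determination, 1 - SS_res / SS_tot.

    Illustrative helper only; not the authors' implementation.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: squared error of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviation of observations from their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Made-up observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
score = r_squared(observed, predicted)
```

A value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than always predicting the mean.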

    Streamlining Data Pipelines with DBT and GIT Integration

    DBT (Data Build Tool) has revolutionized data transformation workflows, enabling data engineers to model, test, and document data within cloud data warehouses. When coupled with Git for version control, DBT enables more efficient collaboration, reproducibility, and error tracking in data engineering teams. This paper explores how integrating DBT with Git can streamline the development and deployment of data pipelines. The research focuses on the advantages of using Git for managing DBT projects, ensuring collaborative workflows, maintaining data pipeline versions, and automating deployments. We discuss best practices for integrating DBT with Git to improve data pipeline efficiency, reduce errors, and ensure a smoother CI/CD process in modern data engineering environments.

    Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells

    Full-length RNA sequencing (RNA-Seq) has been applied to bulk tissue, cell lines and sorted cells to characterize transcriptomes [1–11], but applying this technology to single cells has proven to be difficult, with fewer than ten single-cell transcriptomes having been analyzed thus far [12,13]. Although single splicing events have been described for ≤200 single cells with statistical confidence [14,15], full-length mRNA analyses for hundreds of cells have not been reported. Single-cell short-read 3′ sequencing enables the identification of cellular subtypes [16–21], but full-length mRNA isoforms for these cell types cannot be profiled. We developed a method that starts with bulk tissue and identifies single-cell types and their full-length RNA isoforms without fluorescence-activated cell sorting. Using single-cell isoform RNA-Seq (ScISOr-Seq), we identified RNA isoforms in neurons, astrocytes, microglia, and cell subtypes such as Purkinje and granule cells, and cell-type-specific combination patterns of distant splice sites [6–9,22,23]. We used ScISOr-Seq to improve genome annotation in mouse Gencode version 10 by determining the cell-type-specific expression of 18,173 known and 16,872 novel isoforms.

    Identification of herpesvirus transcripts from genomic regions around the replication origins

    Long-read sequencing (LRS) techniques enable the identification of full-length RNA molecules in a single run, eliminating the need for additional assembly steps. LRS research has exposed unanticipated transcriptomic complexity in various organisms, including viruses. Herpesviruses are known to produce a range of transcripts, either close to or overlapping replication origins (Oris) and neighboring genes related to transcription or replication, which possess confirmed or potential regulatory roles. In our research, we employed both new and previously published LRS and short-read sequencing datasets to uncover additional Ori-proximal transcripts in nine herpesviruses from all three subfamilies (alpha, beta and gamma). We discovered novel long non-coding RNAs, as well as splice and length isoforms of mRNAs. Moreover, our analysis uncovered an intricate network of transcriptional overlaps within the examined genomic regions. We demonstrated that herpesviruses display distinct patterns of transcriptional overlaps in the vicinity of or at the Oris. Our findings suggest the existence of a ‘super regulatory center’ in the genome of alphaherpesviruses that governs the initiation of both DNA replication and global transcription through multilayered interactions among the molecular machineries.

    Analysis of whole genome-transcriptomic organization in brain to identify genes associated with alcoholism.

    Alcohol exposure triggers changes in gene expression and biological pathways in the human brain. We explored alterations in gene expression in the Pre-Frontal Cortex (PFC) of 65 alcoholics and 73 controls of European descent, and identified 129 genes that showed altered expression (FDR < 0.05) in subjects with alcohol dependence. Differentially expressed genes were enriched for pathways related to interferon signaling and Growth Arrest and DNA Damage-inducible 45 (GADD45) signaling. A coexpression module (thistle2) identified by weighted gene co-expression network analysis (WGCNA) was significantly correlated with alcohol dependence, alcohol consumption, and AUDIT scores. Genes in the thistle2 module were enriched with genes related to calcium signaling pathways and showed significant downregulation of these pathways, as well as enrichment for biological processes related to nicotine response and opioid signaling. A second module (brown4) showed significant upregulation of pathways related to immune signaling. Expression quantitative trait loci (eQTLs) for genes in the brown4 module were also enriched for genetic associations with alcohol dependence and alcohol consumption in large genome-wide studies included in the Psychiatric Genetic Consortium and the UK Biobank's alcohol consumption dataset. By leveraging multi-omics data, this transcriptome analysis has identified genes and biological pathways that could provide insight for identifying therapeutic targets for alcohol dependence. (VoR; SUNY Downstate; Henri Begleiter Neurodynamics Laboratory)