
    Fishing in the Stream: Similarity Search over Endless Data

    Similarity search is the task of retrieving data items that are similar to a given query. In this paper, we introduce the time-sensitive notion of similarity search over endless data streams (SSDS), which takes into account data quality and temporal characteristics in addition to similarity. SSDS is challenging because it must process unbounded data while computation resources are bounded. We propose Stream-LSH, a randomized SSDS algorithm that bounds the index size by retaining items according to their freshness, quality, and dynamic popularity attributes. We analytically show that Stream-LSH increases the probability of finding similar items compared to alternative approaches using the same space capacity. We further conduct an empirical study using real-world stream datasets, which confirms our theoretical results.
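
The idea of bounding an index by retaining items according to freshness and quality can be illustrated with a toy sketch. This is not the Stream-LSH algorithm from the paper: the class name `ToyStreamIndex`, the random-hyperplane hash, the per-tick retention probability `retain_prob`, and the quality-based admission rule are all assumptions made here for illustration, and dynamic popularity is not modeled.

```python
import random


class ToyStreamIndex:
    """Toy bounded LSH index: random-hyperplane hashing plus random eviction.

    Hypothetical illustration only; every name and rule here is an assumption.
    """

    def __init__(self, dim, num_bits=8, retain_prob=0.9, seed=0):
        self.rng = random.Random(seed)
        # One random hyperplane per signature bit.
        self.planes = [[self.rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.retain_prob = retain_prob
        self.buckets = {}  # signature -> list of item ids

    def _signature(self, vec):
        # Each bit records the side of a hyperplane the vector falls on.
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0
            for plane in self.planes
        )

    def insert(self, item_id, vec, quality=1.0):
        # Assumed admission rule: index the item with probability `quality`.
        if self.rng.random() < quality:
            self.buckets.setdefault(self._signature(vec), []).append(item_id)

    def tick(self):
        # Assumed freshness rule: at each time step every indexed item
        # survives independently with probability `retain_prob`, so an item
        # of age k is still present with probability retain_prob ** k.
        for sig in list(self.buckets):
            kept = [i for i in self.buckets[sig]
                    if self.rng.random() < self.retain_prob]
            if kept:
                self.buckets[sig] = kept
            else:
                del self.buckets[sig]

    def query(self, vec):
        # Candidates are the items that share the query's signature.
        return list(self.buckets.get(self._signature(vec), []))

    def size(self):
        return sum(len(ids) for ids in self.buckets.values())
```

Under these assumptions, calling `tick()` once per time step keeps the expected index size bounded even though the stream never ends, because older items are progressively less likely to remain.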

    The SOL Genomics Network Model: Making Community Annotation Work

    Community annotation is a growing practice for engaging the research community in depositing up-to-date knowledge in biological databases.
The Solanaceae Genomics Network (SGN, http://sgn.cornell.edu/) is a clade-oriented database (COD) focusing on plants of the nightshade family, including tomato, potato, pepper, eggplant, and tobacco, and is one of the bioinformatics nodes of the international tomato genome sequencing project. One of our major efforts is linking Solanaceae phenotype information with the underlying genes, and subsequently the genome. As part of this goal, SGN has introduced a database for locus names and descriptors, and a database for phenotypes of natural and induced variation. These two databases have web interfaces that allow cross references, associations with tomato gene models, and in-house curated information on sequences, literature, ontologies, gene networks, and the Solanaceae biochemical pathways database (SolCyc, http://solcyc.sgn.cornell.edu). All of our curator tools are open for online community annotation through specially assigned "submitter" accounts.

Currently the community database consists of 5,548 phenotyped accessions and 5,739 curated loci, of which more than 300 loci were contributed or annotated by 66 active submitters, creating a database that is truly community driven.
This framework is easily adaptable for other projects working on other taxa (for example, see http://chlamybase.org), greatly expanding the application of this user-friendly online annotation system. Community participation is fostered by an active outreach program that includes contacting potential submitters via email and at meetings and conferences, and promoting featured user-submitted annotations on the SGN homepage. The source code and database schema for all SGN functionalities are freely available. Please contact SGN at sgn-feedback[at]sgn.cornell.edu for more information.

    Studies on the reliability of biomarkers for alcohol use and abuse

    Alcohol is consumed by the vast majority of the population, but prolonged excessive drinking is associated with various negative health and social consequences. It is therefore important to identify individuals with at-risk alcohol consumption before it turns into abuse or dependence. Early detection of alcohol use and abuse is possible with biomarkers such as ethyl glucuronide (EtG), carbohydrate-deficient transferrin (CDT), and phosphatidylethanol (PEth), which provide objective information about current consumption. However, since misleading test results can have devastating consequences, the use of reliable biomarkers is essential. The aim of this thesis was to evaluate several factors, both clinical and analytical, that could generate erroneous test results when testing for alcohol use with these biomarkers. Urinary EtG levels were measured in 482 samples using different liquid chromatography-mass spectrometry procedures. EtG concentrations were determined according to specific criteria suggested by international guidelines. The sensitivity and specificity of each of four methods were calculated by comparing its EtG results with those of a fifth, reference method that demonstrated the highest selectivity. These results showed that meeting the guideline criteria does not always guarantee correct identification, and that the likelihood that different analytical methods provide reliable results depends on the reporting limit applied. The analytical performance of CDT testing was evaluated by comparing two methods in routine use, capillary electrophoresis (CE) and high-performance liquid chromatography (HPLC). Most of the problems encountered with CE could be solved by using the HPLC method, and it was therefore advised to have access to a confirmatory HPLC analysis when a high-throughput method like CE is employed.
The clinical performance of CDT in pregnancy was evaluated by measuring serum transferrin glycoforms in 171 samples collected from 24 healthy women during and after pregnancy. A gradual increase in the CDT (%disialotransferrin) level was observed during pregnancy, and in many subjects the level approached the upper limit of the reference interval. For use in pregnant women, the cutoff value for CDT used to detect risky drinking needs to be raised slightly to minimize the risk of false-positive results. The possible interference of transferrin glycation with CDT testing was also evaluated. Samples subjected to in vitro glycation and samples collected from diabetic patients were tested for CDT by HPLC. No interferences were observed in samples from diabetics, in contrast to the effect of transferrin glycation seen in vitro. The results indicated that CDT, and also PEth, are reliable markers for identifying risky drinking in diabetic patients. Taken together, the results of the present studies have identified, and suggested ways to overcome, a number of analytical and clinical interferences with these alcohol biomarkers, and have thus helped to improve their routine use.
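
The comparison of a method against a reference method at a given reporting limit can be sketched as follows. This is a minimal illustration, not the thesis's procedure: the function names, the rule that a sample counts as positive when its concentration is at or above the reporting limit, and all concentration values are assumptions invented for the example.

```python
def calls_at_limit(concentrations, reporting_limit):
    """Turn measured concentrations into positive/negative calls.

    Assumption: a sample is positive when its value is at or above the limit.
    """
    return [c >= reporting_limit for c in concentrations]


def sensitivity_specificity(test_positive, reference_positive):
    """Compare one method's calls with the reference method's calls."""
    if len(test_positive) != len(reference_positive):
        raise ValueError("both lists must cover the same samples")
    pairs = list(zip(test_positive, reference_positive))
    tp = sum(1 for t, r in pairs if t and r)          # true positives
    tn = sum(1 for t, r in pairs if not t and not r)  # true negatives
    fp = sum(1 for t, r in pairs if t and not r)      # false positives
    fn = sum(1 for t, r in pairs if not t and r)      # false negatives
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up concentrations for five samples (illustrative values only).
reference = [0.05, 0.20, 0.60, 1.20, 0.08]
method = [0.12, 0.18, 0.55, 1.10, 0.02]

for limit in (0.1, 0.5):
    sens, spec = sensitivity_specificity(
        calls_at_limit(method, limit),
        calls_at_limit(reference, limit),
    )
    print(f"limit={limit}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these invented numbers, the method disagrees with the reference on one sample at the lower limit and on none at the higher limit, which illustrates how agreement can depend on the reporting limit applied.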

    Single-cell protein dynamics reproduce universal fluctuations in cell populations

    Protein variability in single cells has been studied extensively in populations, but little is known about temporal protein fluctuations in a single cell over extended times. We present here traces of protein copy number measured in individual bacteria over multiple generations and investigate their statistical properties, comparing them to previously measured population snapshots. We find that temporal fluctuations in individual traces exhibit the same universal features as those previously observed in populations. Scaled fluctuations around the mean of each trace exhibit the same universal distribution shape as found in populations measured under a wide range of conditions and in two distinct microorganisms. Additionally, the mean and variance of the traces over time obey the same quadratic relation. Analyzing the temporal features of the protein traces in individual cells reveals that, within a cell cycle, protein content increases as an exponential function with a rate that varies from cycle to cycle. This leads to a compact description of the protein trace as a 3-variable stochastic process (the exponential rate, the cell-cycle duration, and the value at the cycle start) sampled once each cell cycle. This compact description is sufficient to preserve the universal statistical properties of the protein fluctuations, namely, the protein distribution shape and the quadratic relationship between variance and mean. Our results show that the protein distribution shape is insensitive to sub-cycle intracellular microscopic details and reflects global cellular properties that fluctuate between generations.
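
The compact description above can be written out explicitly. As a sketch, with notation that is assumed here rather than taken from the paper, let cycle $n$ start at time $t_n$ with protein content $p_n$, last for a duration $T_n$, and have exponential rate $\alpha_n$. Within that cycle the trace is then

```latex
p(t) = p_n \, e^{\alpha_n (t - t_n)}, \qquad t_n \le t < t_n + T_n ,
```

so the content reached at the end of cycle $n$ is $p_n e^{\alpha_n T_n}$. The triplet $(\alpha_n, T_n, p_n)$ is sampled once per cycle; how $p_{n+1}$ relates to the end-of-cycle value is not stated in the abstract and is left open here.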

    Individuality and slow dynamics in bacterial growth homeostasis

    Microbial growth and division are fundamental processes relevant to many areas of life science. Of particular interest are homeostasis mechanisms, which buffer growth and division from accumulating fluctuations over multiple cycles. These mechanisms operate within single cells, possibly extending over several division cycles. However, all experimental studies to date have relied on measurements pooled from many distinct cells. Here, we disentangle long-term measured traces of individual cells from one another, revealing subtle differences between temporal and pooled statistics. By analyzing correlations along up to hundreds of generations, we find that the parameter describing effective cell-size homeostasis strength varies significantly among cells. At the same time, we find an invariant cell size which acts as an attractor to all individual traces, albeit with different effective attractive forces. Despite the common attractor, each cell maintains a distinct average size over its finite lifetime with suppressed temporal fluctuations around it, and equilibration to the global average size is surprisingly slow (> 150 cell cycles). To demonstrate a possible source of variable homeostasis strength, we construct a mathematical model relying on intracellular interactions, which integrates measured properties of cell size with those of highly expressed proteins. Effective homeostasis strength is then influenced by interactions and by noise levels, and generally varies among cells. A predictable and measurable consequence of variable homeostasis strength appears as distinct oscillatory patterns in cell size and protein content over many generations. We discuss the implications of our results for understanding mechanisms controlling division in single cells and their characteristic timescales.
    Comment: In press with PNAS. 50 pages, including supplementary information

    A Wavelet-Based Approach To Monitoring Parkinson's Disease Symptoms

    Parkinson's disease is a neurodegenerative disorder affecting tens of millions of people worldwide. Lately, there has been considerable interest in systems for at-home monitoring of patients, using wearable devices which contain inertial measurement units. We present a new wavelet-based approach for analysis of data from single wrist-worn smartwatches, and show high detection performance for tremor, bradykinesia, and dyskinesia, which have been the major targets for monitoring in this context. We also discuss the implications of our controlled-experiment results for uncontrolled home monitoring of freely behaving patients.
    Comment: ICASSP 201