Fishing in the Stream: Similarity Search over Endless Data
Similarity search is the task of retrieving data items that are similar to a
given query. In this paper, we introduce the time-sensitive notion of
similarity search over endless data-streams (SSDS), which takes into account
data quality and temporal characteristics in addition to similarity. SSDS is
challenging as it needs to process unbounded data, while computation resources
are bounded. We propose Stream-LSH, a randomized SSDS algorithm that bounds the
index size by retaining items according to their freshness, quality, and
dynamic popularity attributes. We analytically show that Stream-LSH increases
the probability of finding similar items compared to alternative approaches using
the same space capacity. We further conduct an empirical study using real-world
stream datasets, which confirms our theoretical results.
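As a rough illustration of how an LSH index can stay bounded over an endless stream, the sketch below combines random-hyperplane hashing with a per-tick random retention step. It is not the Stream-LSH algorithm from the paper: the class name `StreamLSHSketch`, the single hash table, and the fixed `retention` probability are all assumptions made for this example, and quality and popularity are not modeled.

```python
import random
from collections import defaultdict


class StreamLSHSketch:
    """Toy bounded LSH index (hypothetical; not the paper's implementation)."""

    def __init__(self, dim, n_bits=8, retention=0.9, seed=0):
        self.rng = random.Random(seed)
        # Random hyperplanes: the sign of each projection gives one hash bit.
        self.planes = [[self.rng.gauss(0, 1) for _ in range(dim)]
                       for _ in range(n_bits)]
        # Assumed policy: each stored item survives a tick with this probability.
        self.retention = retention
        self.buckets = defaultdict(list)

    def _signature(self, vec):
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0
                     for plane in self.planes)

    def insert(self, item_id, vec):
        self.buckets[self._signature(vec)].append((item_id, vec))

    def tick(self):
        """Advance time: older items are progressively dropped at random."""
        for sig in list(self.buckets):
            kept = [e for e in self.buckets[sig]
                    if self.rng.random() < self.retention]
            if kept:
                self.buckets[sig] = kept
            else:
                del self.buckets[sig]

    def query(self, vec):
        """Return ids of items that share the query's hash bucket."""
        return [item_id for item_id, _ in
                self.buckets.get(self._signature(vec), [])]

    def size(self):
        return sum(len(b) for b in self.buckets.values())
```

Under this assumed policy, an item inserted `k` ticks ago is still present with probability `retention ** k`, so fresher items dominate the index and its expected size stays bounded for a steady arrival rate.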
The SOL Genomics Network Model: Making Community Annotation Work
Community annotation is a growing approach for engaging the research community in depositing up‐to‐date knowledge in biological databases.
The Solanaceae Genomics Network ("SGN":http://sgn.cornell.edu/) is a clade‐oriented database (COD) focusing on plants of the nightshade family, including tomato, potato, pepper, eggplant, and tobacco, and is one of the bioinformatics nodes of the international tomato genome sequencing project. One of our major efforts is linking Solanaceae phenotype information with the underlying genes, and subsequently the genome. As part of this goal, SGN has introduced a database for locus names and descriptors, and a database for phenotypes of natural and induced variation. These two databases have web interfaces that allow cross references, associations with tomato gene models, and in‐house curated information of sequences, literature, ontologies, gene networks, and the Solanaceae biochemical pathways database ("SolCyc":http://solcyc.sgn.cornell.edu). All of our curator tools are open for online community annotation, through specially assigned “submitter” accounts. 

Currently the community database consists of 5,548 phenotyped accessions and 5,739 curated loci, of which more than 300 loci were contributed or annotated by 66 active submitters, creating a database that is truly community driven.
This framework is easily adaptable for other projects working on other taxa (for example see "http://chlamybase.org":http://chlamybase.org), greatly expanding the application of this user‐friendly online annotation system. Community participation is fostered by an active outreach program that includes contacting potential submitters via email, at meetings and conferences, and by promoting featured user‐submitted annotations on the SGN homepage. The source code and database schema for all SGN functionalities are freely available. Please contact SGN at sgn‐feedback[at]sgn.cornell.edu for more information.
Studies on the reliability of biomarkers for alcohol use and abuse
Alcohol is consumed by the vast majority of the population, but prolonged excessive
drinking is associated with various negative health and social consequences. It is
therefore important to identify individuals with at-risk alcohol consumption, before it
turns into abuse or dependence. Early detection of alcohol use and abuse can be done
by the use of biomarkers such as ethyl glucuronide (EtG), carbohydrate-deficient
transferrin (CDT), and phosphatidylethanol (PEth) that provide objective information
about current consumption. However, since misleading test results can have
devastating consequences, the use of reliable biomarkers is essential. The aim of
this thesis was to evaluate several factors, both clinical and analytical, that could
generate erroneous test results when testing for alcohol use by these biomarkers.
Measurement of urinary EtG levels was done in 482 samples using different liquid
chromatography-mass spectrometry procedures. Accurate determination of EtG
concentrations was done according to specific criteria suggested by international
guidelines. The sensitivity and specificity were calculated for each of four methods
by comparing EtG results obtained with a fifth reference method that demonstrated
the highest selectivity. These results showed that meeting the guideline criteria does
not always guarantee correct identification, and the likelihood of different analytical
methods to provide reliable analytical results depends on the reporting limit applied.
Evaluation of the analytical performance of CDT testing was done by comparing two
different methods in routine use, capillary electrophoresis (CE) and high-performance
liquid chromatography (HPLC). Most of the problems encountered by CE could be
solved by using the HPLC method, and it was therefore advised to have access to a
confirmatory HPLC analysis, when a high throughput method like CE is employed.
Evaluation of the clinical performance of CDT in pregnancy was done by measuring
serum transferrin glycoforms in 171 samples collected from 24 healthy women during
and after pregnancy. A gradual increase in the CDT (%disialotransferrin) level was
observed during pregnancy, and in many subjects the level approached the upper limit
of the reference interval. For use in pregnant women, the cutoff value for CDT used
to detect risky drinking needs to be raised slightly to minimize the risk of false-positive results.
The possible interference by transferrin glycation on CDT testing was also evaluated.
Samples subjected to in vitro glycation and samples collected from diabetic patients
were tested for CDT by HPLC. No interferences were observed in samples from
diabetics, in contrast to the effect of transferrin glycation seen in vitro. The
results indicated that CDT, and also PEth, are reliable markers to identify risky
drinking in diabetic patients.
Taken together, the results of the present studies have identified and suggested ways to
overcome a number of analytical and clinical interferences with these alcohol
biomarkers, and thus helped to improve their routine use.
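The comparison of each method against a reference method, and its dependence on the reporting limit, can be sketched as follows. This is a minimal illustration only: the function name, the rule that a result counts as positive when it is at or above the reporting limit, and the numbers in the usage note are assumptions for this example, not data or procedures from the thesis.

```python
def sensitivity_specificity(test_values, reference_values, reporting_limit):
    """Compare a test method with a reference method at one reporting limit.

    A sample is treated as positive when its measured concentration is at or
    above the reporting limit (an assumed decision rule for this sketch).
    Returns (sensitivity, specificity); a value is NaN when undefined.
    """
    tp = fp = tn = fn = 0
    for test, ref in zip(test_values, reference_values):
        test_pos = test >= reporting_limit
        ref_pos = ref >= reporting_limit
        if ref_pos and test_pos:
            tp += 1   # both methods report positive
        elif ref_pos:
            fn += 1   # reference positive, test method missed it
        elif test_pos:
            fp += 1   # test method positive, reference negative
        else:
            tn += 1   # both methods report negative
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

For instance, with made-up test values `[0.6, 0.4, 0.1, 0.7]`, reference values `[0.6, 0.6, 0.1, 0.2]` and a reporting limit of `0.5`, the function returns a sensitivity and a specificity of 0.5 each; rerunning it with a different limit shows how the agreement between methods can shift with the cutoff.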
Single-cell protein dynamics reproduce universal fluctuations in cell populations
Protein variability in single cells has been studied extensively in
populations, but little is known about temporal protein fluctuations in a
single cell over extended times. We present here traces of protein copy number
measured in individual bacteria over multiple generations and investigate their
statistical properties, comparing them to previously measured population
snapshots. We find that temporal fluctuations in individual traces exhibit the
same universal features as those previously observed in populations. Scaled
fluctuations around the mean of each trace exhibit the same universal
distribution shape as found in populations measured under a wide range of
conditions and in two distinct microorganisms. Additionally, the mean and
variance of the traces over time obey the same quadratic relation. Analyzing
the temporal features of the protein traces in individual cells reveals that
within a cell cycle protein content increases as an exponential function with a
rate that varies from cycle to cycle. This leads to a compact description of
the protein trace as a 3-variable stochastic process - the exponential rate,
the cell-cycle duration and the value at the cycle start - sampled once each
cell cycle. This compact description is sufficient to preserve the universal
statistical properties of the protein fluctuations, namely, the protein
distribution shape and the quadratic relationship between variance and mean.
Our results show that the protein distribution shape is insensitive to
sub-cycle intracellular microscopic details and reflects global cellular
properties that fluctuate between generations.
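The compact description above, one triple of start value, exponential rate, and cycle duration per cell cycle, can be turned into a continuous trace as in the sketch below. The function name `build_trace`, the sampling scheme, and the example numbers are assumptions for illustration; how the three variables are distributed or correlated between cycles is not specified here and is left to the caller.

```python
import math


def build_trace(cycles, points_per_cycle=10):
    """Reconstruct a protein trace from per-cycle parameters.

    `cycles` is a sequence of (start_value, rate, duration) triples, one per
    cell cycle. Within a cycle the protein content is assumed to follow
    start_value * exp(rate * t) for 0 <= t < duration.
    Returns a list of (absolute_time, protein_content) points.
    """
    trace = []
    cycle_start_time = 0.0
    for start_value, rate, duration in cycles:
        for k in range(points_per_cycle):
            t = duration * k / points_per_cycle
            trace.append((cycle_start_time + t,
                          start_value * math.exp(rate * t)))
        cycle_start_time += duration
    return trace
```

Calling `build_trace([(100, math.log(2), 1.0), (90, math.log(2), 2.0)], points_per_cycle=2)` yields four points: content starts at 100, grows exponentially within the first cycle, and restarts at 90 when the second cycle begins at time 1.0.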
Individuality and slow dynamics in bacterial growth homeostasis
Microbial growth and division are fundamental processes relevant to many
areas of life science. Of particular interest are homeostasis mechanisms, which
buffer growth and division from accumulating fluctuations over multiple cycles.
These mechanisms operate within single cells, possibly extending over several
division cycles. However, all experimental studies to date have relied on
measurements pooled from many distinct cells. Here, we disentangle long-term
measured traces of individual cells from one another, revealing subtle
differences between temporal and pooled statistics. By analyzing correlations
along up to hundreds of generations, we find that the parameter describing
effective cell-size homeostasis strength varies significantly among cells. At
the same time, we find an invariant cell size which acts as an attractor to all
individual traces, albeit with different effective attractive forces. Despite
the common attractor, each cell maintains a distinct average size over its
finite lifetime with suppressed temporal fluctuations around it, and
equilibration to the global average size is surprisingly slow (> 150 cell
cycles). To demonstrate a possible source of variable homeostasis strength, we
construct a mathematical model relying on intracellular interactions, which
integrates measured properties of cell size with those of highly expressed
proteins. Effective homeostasis strength is then influenced by interactions and
by noise levels, and generally varies among cells. A predictable and measurable
consequence of variable homeostasis strength appears as distinct oscillatory
patterns in cell size and protein content over many generations. We discuss the
implications of our results for understanding mechanisms controlling division in
single cells and their characteristic timescales.
Comment: In press with PNAS. 50 pages, including supplementary information
A Wavelet-Based Approach To Monitoring Parkinson's Disease Symptoms
Parkinson's disease is a neurodegenerative disorder affecting tens of
millions of people worldwide. Lately, there has been considerable interest in
systems for at-home monitoring of patients, using wearable devices which
contain inertial measurement units. We present a new wavelet-based approach for
analysis of data from single wrist-worn smart-watches, and show high detection
performance for tremor, bradykinesia, and dyskinesia, which have been the major
targets for monitoring in this context. We also discuss the implication of our
controlled-experiment results for uncontrolled home monitoring of freely
behaving patients.
Comment: ICASSP 201