Observations - Planet Mars in 1967
Visual Mars observations by Volgograd Observatory in 1966-196
The cooperation of Russian and German metal forming scientific schools to develop the new energy-efficient materials and technologies
The future scientific orientation of Katedra PDSS lies in materials forming and materials development, with a focus on processes that use energy and resources efficiently. Current research in the PDSS department is based on fundamental work on the thermo-mechanical treatment of metals and on the modeling of nano-materials, rolled material and medical materials. This includes research on the relevant microstructural and macroscopic effects on material behavior. Together with its international research partners, PDSS also has excellent foundations for experimental research. With its international focus and its educational programs for students and skilled employees, PDSS is an important partner of the Russian metal processing industry, helping Russian companies compete at a world-class level.
Four basic symmetry types in the universal 7-cluster structure of 143 complete bacterial genomic sequences
Coding information is the main source of heterogeneity (non-randomness) in the sequences of bacterial genomes. This information can be naturally modeled by analysing cluster structures in the "in-phase" triplet distributions of relatively short genomic fragments (200-400 bp). We found a universal 7-cluster structure in all 143 completely sequenced bacterial genomes available in GenBank in August 2004, and explained its properties.
The 7-cluster structure is responsible for the main part of sequence heterogeneity in bacterial genomes. In this sense, the 7-cluster structure is the basic model of a bacterial genome sequence. We demonstrated that four basic "pure" types of this model are observed in nature: "parallel triangles", "perpendicular triangles", the degenerate case and the flower-like type. We show that the codon usage of bacterial genomes is a multi-linear function of their genomic G+C-content with high accuracy (more precisely, it is described by two similar functions, one for eubacterial genomes and the other for archaea).
All 143 animated 3D cluster scatter plots are collected in a database, which is available on our website:
http://www.ihes.fr/~zinovyev/7clusters
The finding can be readily introduced into any software for gene prediction, sequence alignment or bacterial genome classification.
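The "in-phase" triplet distributions mentioned in this abstract can be illustrated with a minimal sketch. It assumes a fragment is a plain string over A, C, G, T. The function names, the 300 bp window (chosen from the 200-400 bp range given above) and the step of 3 are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# All 64 possible triplets, in a fixed order (assumed ordering, for illustration).
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(fragment, phase=0):
    """Frequencies of non-overlapping triplets read starting at offset `phase`.

    Triplets containing characters other than A, C, G, T are ignored.
    """
    counts = Counter(
        fragment[i:i + 3] for i in range(phase, len(fragment) - 2, 3)
    )
    total = sum(counts[c] for c in CODONS)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[c] / total for c in CODONS]


def sliding_fragments(sequence, width=300, step=3):
    """Yield (start, fragment) pairs for a window moved along the sequence."""
    for start in range(0, len(sequence) - width + 1, step):
        yield start, sequence[start:start + width]
```

Each fragment then becomes one 64-dimensional point, and the cluster structure described in the abstract would be looked for in the cloud of such points.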
Data complexity measured by principal graphs
How to measure the complexity of a finite set of vectors embedded in a
multidimensional space? This is a non-trivial question which can be approached
in many different ways. Here we suggest a set of data complexity measures using
universal approximators, principal cubic complexes. Principal cubic complexes
generalise the notion of principal manifolds for datasets with non-trivial
topologies. The type of the principal cubic complex is determined by its
dimension and a grammar of elementary graph transformations. The simplest
grammar produces principal trees.
We introduce three natural types of data complexity: 1) geometric (deviation
of the data's approximator from some "idealized" configuration, such as
deviation from harmonicity); 2) structural (how many elements of a principal
graph are needed to approximate the data), and 3) construction complexity (how
many applications of elementary graph transformations are needed to construct
the principal object starting from the simplest one).
We compute these measures for several simulated and real-life data
distributions and show them in the "accuracy-complexity" plots, helping to
optimize the accuracy/complexity ratio. We discuss various issues connected
with measuring data complexity. Software for computing data complexity measures
from principal cubic complexes is provided as well.
Comment: Computers and Mathematics with Applications, in press
PCA and K-Means decipher genome
In this paper, we aim to give a tutorial for undergraduate students studying
statistical methods and/or bioinformatics. The students will learn how data
visualization can help in genomic sequence analysis. Students start with a
fragment of genetic text of a bacterial genome and analyze its structure. By
means of principal component analysis they ``discover'' that the information in
the genome is encoded by non-overlapping triplets. Next, they learn how to find
gene positions. This exercise on PCA and K-Means clustering enables active
study of the basic bioinformatics notions. Appendix 1 contains program listings
that go along with this exercise. Appendix 2 includes 2D PCA plots of triplet
usage in moving frame for a series of bacterial genomes from GC-poor to GC-rich
ones. Animated 3D PCA plots are attached as separate gif files. Topology
(cluster structure) and geometry (mutual positions of clusters) of these plots
depend clearly on GC-content.
Comment: 18 pages, with program listings for MATLAB, PCA analysis of genomes
and additional animated 3D PCA plots
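The tutorial's own listings are in MATLAB and are not reproduced here. As a rough Python sketch of the same two steps, the code below finds the leading principal axis by power iteration and then clusters the points with K-Means. All function names are hypothetical, and the farthest-point initialisation of the centroids is an assumption made to keep the example deterministic.

```python
import random


def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - mean[j] for j in range(d)] for r in rows]


def first_component(rows, iterations=200, seed=0):
    """Leading principal axis of centered rows, by power iteration on X^T X."""
    rng = random.Random(seed)
    d = len(rows[0])
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(r, v)) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v


def project(rows, axis):
    """Coordinates of each row along the given axis."""
    return [sum(a * b for a, b in zip(r, axis)) for r in rows]


def kmeans(points, k, iterations=50):
    """Lloyd's K-Means with deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

In the setting of the abstract, the rows would be triplet-frequency vectors of genome fragments. A 2D plot needs a second axis, which could be obtained by removing the first component from the data and running the same iteration again.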
Visualization of Data by Method of Elastic Maps and Its Applications in Genomics, Economics and Sociology
A technology for data visualization and data modeling is suggested. A short review of relevant methods is given. The proposed methods are illustrated by applying them to real economic, sociological and biological datasets and to some model data distributions.
The basis of the technology is the original idea of the elastic net: a regular point approximation of a manifold that is embedded in the multidimensional space and has, in a certain sense, minimal energy. This manifold is an analogue of a principal surface and serves as a non-linear screen onto which multidimensional data are projected.
A remarkable feature of the technology is its ability to work with, and to fill, gaps in data tables. Gaps are unknown or unreliable values of some features. The technology makes it possible to predict plausible values of unknown features from the values of the others, and so provides a means of constructing various forecasting systems and non-linear regressions.
The technology can be used by specialists in different fields. Several examples of applying the method are presented at the end of this paper.
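The abstract does not state the energy that the elastic net minimises. The sketch below uses one commonly used form for a one-dimensional chain of nodes: an approximation term plus stretching and bending penalties. The weights `stretch` and `bend`, the chain topology and the function names are assumptions made for illustration and may differ from the paper's definitions.

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def elastic_energy(data, nodes, stretch=0.01, bend=0.1):
    """Energy of a chain of nodes approximating a set of data points.

    approximation: mean squared distance from each data point to its nearest node
    stretching:    sum of squared lengths of edges between neighbouring nodes
    bending:       sum of squared second differences at the inner nodes
    """
    approximation = sum(min(dist2(x, y) for y in nodes) for x in data) / len(data)
    stretching = sum(
        dist2(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)
    )
    bending = sum(
        sum((a - 2 * b + c) ** 2
            for a, b, c in zip(nodes[i - 1], nodes[i], nodes[i + 1]))
        for i in range(1, len(nodes) - 1)
    )
    return approximation + stretch * stretching + bend * bending
```

For data lying on a straight line, a straight chain through the points has lower energy than a bent one, which is the sense in which the fitted net is both close to the data and smooth.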
Blind source separation methods for deconvolution of complex signals in cancer biology
Two blind source separation methods (Independent Component Analysis and
Non-negative Matrix Factorization), developed initially for signal processing
in engineering, have recently found a number of applications in the analysis of
large-scale data in molecular biology. In this short review, we present the
common idea behind these methods, describe ways of implementing and applying
them, and point out their advantages over more traditional statistical
approaches. We focus more specifically on the analysis of gene expression in
cancer. The review concludes with a list of available software implementations
for the methods described.
Comment: Zinovyev A., Kairov U., Karpenyuk T., Ramanculov E. Blind Source
Separation Methods For Deconvolution Of Complex Signals In Cancer Biology.
2012. Biochemical and Biophysical Research Communications. In Press. DOI:
10.1016/j.bbrc.2012.12.04
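Of the two methods named in this abstract, Non-negative Matrix Factorization is the simpler to sketch. The code below uses the standard multiplicative update rules for the squared reconstruction error. It is not taken from the review: the function names, the random initialisation and the small constant `eps` are illustrative assumptions. Independent Component Analysis is not shown.

```python
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (n x m) as w (n x rank) times h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # Update h: h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # Update w: w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of v - w h."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for ra, rb in zip(v, wh) for a, b in zip(ra, rb))
```

In a gene expression setting, `v` would hold non-negative expression values (for example genes by samples), the columns of `w` would play the role of source signals and `h` would give their weights in each sample. The result depends on the random start, so repeated runs with different seeds are advisable.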
