    Towards a biodiversity knowledge graph

    One way to think about "core" biodiversity data is as a network of connected entities, such as taxa, taxonomic names, publications, people, species, sequences, images, and collections that form the "biodiversity knowledge graph". Many questions in biodiversity informatics can be framed as paths in this graph. This article explores this further, and sketches a set of services and tools we would need in order to construct the graph.

    Surfacing the deep data of taxonomy

    Taxonomic databases are perpetuating approaches to citing literature that may have been appropriate before the Internet, often being little more than digitised 5 × 3 index cards. Typically the original taxonomic literature is either not cited, or is represented in the form of a (typically abbreviated) text string. Hence much of the “deep data” of taxonomy, such as the original descriptions, revisions, and nomenclatural actions are largely hidden from all but the most resourceful users. At the same time there are burgeoning efforts to digitise the scientific literature, and much of this newly available content has been assigned globally unique identifiers such as Digital Object Identifiers (DOIs), which are also the identifier of choice for most modern publications. This represents an opportunity for taxonomic databases to engage with digitisation efforts. Mapping the taxonomic literature on to globally unique identifiers can be time consuming, but need be done only once. Furthermore, if we reuse existing identifiers, rather than mint our own, we can start to build the links between the diverse data that are needed to support the kinds of inference which biodiversity informatics aspires to support. Until this practice becomes widespread, the taxonomic literature will remain balkanized, and much of the knowledge that it contains will linger in obscurity

    DNA barcoding and taxonomy: dark taxa and dark texts

    Both classical taxonomy and DNA barcoding are engaged in the task of digitizing the living world. Much of the taxonomic literature remains undigitized. The rise of open access publishing this century and the freeing of older literature from the shackles of copyright have greatly increased the online availability of taxonomic descriptions, but much of the literature of the mid- to late-twentieth century remains offline (‘dark texts’). DNA barcoding is generating a wealth of computable data that in many ways are much easier to work with than classical taxonomic descriptions, but many of the sequences are not identified to species level. These ‘dark taxa’ hamper the classical method of integrating biodiversity data, using shared taxonomic names. Voucher specimens are a potential common currency of both the taxonomic literature and sequence databases, and could be used to help link names, literature and sequences. An obstacle to this approach is the lack of stable, resolvable specimen identifiers. The paper concludes with an appeal for a global ‘digital dashboard’ to assess the extent to which biodiversity data are available online. This article is part of the themed issue ‘From DNA barcodes to biomes’

    Towards a Taxonomically Intelligent Phylogenetic Database

    This note outlines some of the key intellectual obstacles that stand in the way of creating a usable phylogenetic database. These challenges include the need to accommodate multiple taxonomic names and classifications, and the need for tools to query trees in biologically meaningful ways. Until these problems are addressed, and a taxonomically intelligent phylogenetic database created, much of our phylogenetic knowledge will languish in the pages of journals.

    Liberating links between datasets using lightweight data publishing: an example using plant names and the taxonomic literature

    Constructing a biodiversity knowledge graph will require making millions of cross links between diversity entities in different datasets. Researchers trying to bootstrap the growth of the biodiversity knowledge graph by constructing databases of links between these entities lack obvious ways to publish these sets of links. One appealing and lightweight approach is to create a "datasette", a database that is wrapped together with a simple web server that enables users to query the data. Datasettes can be packaged into Docker containers and hosted online with minimal effort. This approach is illustrated using a dataset of links between globally unique identifiers for plant taxonomic names and identifiers for the taxonomic articles that published those names.

    Phyloinformatics in the age of Wikipedia

    This talk describes a mapping between the NCBI taxonomy database and Wikipedia. These two databases were chosen because the NCBI taxonomy contains all the taxa for which sequences are publicly available, and for many taxa Wikipedia is the first site returned in a Google search on that taxon's scientific name. The NCBI web pages for nearly 53,000 NCBI taxa now have a link to the corresponding page in Wikipedia.

    Finding scientific articles in a large digital archive: BioStor and the Biodiversity Heritage Library

    The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article finding service is exposed as a standard OpenURL resolver on the BioStor web site http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/.
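
The approximate string matching step mentioned in the abstract above can be illustrated with a minimal sketch. This is not BioStor's actual implementation: the function names, the normalisation rules, and the 0.8 threshold are all assumptions chosen for illustration, and the scoring uses Python's standard `difflib` rather than whatever matcher the service itself uses.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case, replace anything other than ASCII letters, digits and
    spaces with a space, then collapse runs of whitespace."""
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(cited_title, candidates, threshold=0.8):
    """Return (title, score) for the closest candidate, or (None, score)
    when no candidate reaches the (hypothetical) threshold."""
    scored = [(similarity(cited_title, c), c) for c in candidates]
    score, title = max(scored, default=(0.0, None))
    return (title, score) if score >= threshold else (None, score)
```

A caller might pass the title from a citation together with a list of candidate titles drawn from scanned-item metadata, and treat a `None` result as "no article found" rather than accepting a weak match.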

    Biotic Element Analysis in Biogeography

    Biotic element analysis is an alternative to the areas-of-endemism approach for recognizing the presence or absence of vicariance events in a given region. If an ancestral biota was fragmented by vicariance events, biotic elements or clusters of distribution areas should emerge. We propose a statistical test for clustering of distribution areas based on a Monte Carlo simulation with a null model that considers the spatial autocorrelation in the data. The hypothesis tested is that the observed degree of clustering of ranges can be explained by the range size distribution, the varying number of taxa per cell, and the spatial autocorrelation of the occurrences of a taxon alone. A method for the delimitation of biotic elements which uses model-based Gaussian clustering is introduced. We demonstrate our methods and show the importance of grid size by means of a case study, an analysis of the distribution patterns of southern African species of the weevil genus Scobius. The example highlights the difficulties in delimiting areas of endemism if dispersal has occurred and illustrates the advantages of the biotic element approach.

    Going nuclear: gene family evolution and vertebrate phylogeny reconciled

    Gene duplications have been common throughout vertebrate evolution, introducing paralogy and so complicating phylogenetic inference from nuclear genes. Reconciled trees are one method capable of dealing with paralogy, using the relationship between a gene phylogeny and the phylogeny of the organisms containing those genes to identify gene duplication events. This allows us to infer phylogenies from gene families containing both orthologous and paralogous copies. Vertebrate phylogeny is well understood from morphological and palaeontological data, but studies using mitochondrial sequence data have failed to reproduce this classical view. Reconciled tree analysis of a database of 118 vertebrate gene families supports a largely classical vertebrate phylogeny.
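
How a reconciled tree identifies duplication events can be sketched with the widely used least-common-ancestor mapping: each gene tree node is mapped to the most recent species tree node containing all of its descendants' species, and a node is flagged as a duplication when it maps to the same place as one of its children. The sketch below is an illustration under simplifying assumptions, not the software used in the paper: gene tree leaves are labelled directly with species names, trees are strictly binary, and the tiny species tree is a made-up example.

```python
def ancestors(node, parent):
    """Path from a species tree node up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a, b, parent):
    """Least common ancestor of two species tree nodes."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def reconcile(gene_tree, parent, duplications):
    """Map a gene tree (nested 2-tuples, leaves are species names) onto the
    species tree and append every inferred duplication node to the list."""
    if isinstance(gene_tree, str):  # leaf: maps to its own species
        return gene_tree
    left, right = gene_tree
    m_left = reconcile(left, parent, duplications)
    m_right = reconcile(right, parent, duplications)
    m = lca(m_left, m_right, parent)
    if m in (m_left, m_right):  # same mapping as a child: duplication
        duplications.append(gene_tree)
    return m


# Hypothetical species tree ((human, mouse)mammals, chicken)amniotes,
# stored as child -> parent.
SPECIES_PARENT = {
    "human": "mammals",
    "mouse": "mammals",
    "mammals": "amniotes",
    "chicken": "amniotes",
}
```

For example, calling `reconcile((("human", "mouse"), ("human", "chicken")), SPECIES_PARENT, dups)` with an empty list `dups` flags the root of that gene tree as a duplication, because human genes appear on both sides of it, whereas a gene tree that matches the species tree yields no duplications.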