4 research outputs found
Reproducibility Education in Computer Science
Presentation at the GRN Symposium on Education and Teaching formats on reproducibility on May 11, 202
delilah-csp/delilah-pt: v0.1
This version is an initial research release of Delilah coinciding with the DaMoN publication of Delilah
NEATX: Non-Expert Annotations of Tubes in X-rays
The Non-Expert Annotations of Tubes in X-rays (NEATX) dataset was created at PURRlab as part of an MSc thesis of Trine Naja Eriksen and Cathrine Damgaard. This dataset contains 3.5k chest drain annotations for the NIH-CXR14 dataset, and 1k annotations for four different tube types (chest drain, tracheostomy, nasogastric, and endotracheal) in the PadChest dataset, provided by two data science students. Please read more about how the dataset can be used at https://arxiv.org/abs/2309.02244.

BibTeX:
@article{damgaard2023augmenting,
  title={Augmenting chest x-ray datasets with non-expert annotations},
  author={Damgaard, Cathrine and Eriksen, Trine Naja and Juodelyte, Dovile and Cheplygina, Veronika and Jim{\'e}nez-S{\'a}nchez, Amelia},
  journal={arXiv preprint arXiv:2309.02244},
  year={2023}
}

Data description: This dataset contains the annotations provided by two data science students (not medical experts) for:
- A CSV file with 3.5k chest drain annotations for the NIH-CXR14 dataset
- A CSV file with 1k annotations for four different tube types (chest drain, tracheostomy, nasogastric, and endotracheal) for the PadChest dataset

We provide the raw individual annotations as well as the aggregated annotations. The annotation protocol is described in the Healthsheet
SeagrassFinder: An Underwater Eelgrass Image Classification Dataset
This dataset is published alongside the paper “SeagrassFinder: Deep Learning for Eelgrass Detection and Coverage Estimation in the Wild” in the journal Ecological Informatics. It is a machine learning dataset for training computer vision models to classify the presence of eelgrass, and was created by the main author, Jannik Elsäßer, as part of his bachelor's thesis. The original video transect data comes from DHI A/S's work providing By og Havn with a “Summer Status” report on the maritime environmental impacts of the Lynetteholm project. More information on the project and the report is available here: https://byoghavn.dk/mediebibliotek/lynetteholm-sommerstatus-2023/

The dataset consists of underwater images taken from a sled dragged through the water by a survey vessel. The camera used is a Subsea HD-Camera made by LH-Camera. Images were created by taking 5 video frames each second and then randomly sampling from them. Each image is labeled True or False for eelgrass presence. In total, the dataset consists of approximately 8500 images from 6 different transects, with 4482 images containing eelgrass and 4042 images not containing eelgrass. All images have been annotated by both domain experts and non-domain experts. Images were annotated using a uniform sampling process. Where annotators disagreed, the image was removed from the dataset. For more information on the dataset creation, please refer to the corresponding paper.

We recommend using one transect as the test dataset rather than a random split of all images. A random split causes a form of data leakage, since some images can be very similar to other images.

An unfortunate limitation, which we believe is caused by the compression of the videos in the camera system, is that some frames contain an echo or a form of motion trail. This can lead to ghost-like eelgrass features in some frames and should be taken into consideration when applying the dataset to future locations.
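The recommendation to hold out one whole transect as the test set can be illustrated with a minimal Python sketch. The record layout, the field names (`image`, `transect`, `eelgrass`), the file names, and the transect identifiers below are assumptions made for illustration only; they are not taken from the published dataset files.

```python
def split_by_transect(records, test_transect):
    """Hold out every image from one transect as the test set.

    `records` is a list of dicts with a hypothetical "transect" key;
    the real dataset may organise this information differently.
    """
    train, test = [], []
    for rec in records:
        if rec["transect"] == test_transect:
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Hypothetical example records (names and labels are made up).
records = [
    {"image": "a_0001.jpg", "transect": "T1", "eelgrass": True},
    {"image": "a_0002.jpg", "transect": "T1", "eelgrass": True},
    {"image": "b_0001.jpg", "transect": "T2", "eelgrass": False},
    {"image": "c_0001.jpg", "transect": "T3", "eelgrass": True},
    {"image": "c_0002.jpg", "transect": "T3", "eelgrass": False},
]

train, test = split_by_transect(records, test_transect="T3")

# No transect contributes images to both sides, so near-duplicate
# frames from the same tow cannot leak from training into testing.
train_transects = {rec["transect"] for rec in train}
test_transects = {rec["transect"] for rec in test}
assert train_transects.isdisjoint(test_transects)
```

Splitting on the transect rather than on individual images keeps visually similar frames from the same tow on one side of the split, which is the leakage the description warns about.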
