D-VRE: From a Jupyter-enabled Private Research Environment to Decentralized Collaborative Research Ecosystem
Today, scientific research is increasingly data-centric and
compute-intensive, relying on data and models across distributed sources.
However, traditional modes of cooperation still face challenges due to high
storage and computing costs, geo-location barriers, and local confidentiality
regulations. The Jupyter environment has recently emerged and evolved into a
vital virtual research environment for scientific computing, which researchers
can use to scale computational analyses up to larger datasets and
high-performance computing resources. Nevertheless, existing approaches lack
robust support for a decentralized cooperation mode that would unlock the full
potential of decentralized collaborative scientific research, e.g., seamless
and secure data sharing. In this work, we change the basic structure and
legacy norms of current research environments by seamlessly integrating
Jupyter with Ethereum blockchain capabilities, creating a Decentralized
Virtual Research Environment (D-VRE) that evolves private computational
notebooks into a decentralized collaborative research ecosystem. We propose a novel architecture
for the D-VRE and prototype some essential D-VRE elements for enabling secure
data sharing with decentralized identity, user-centric agreement-making,
membership, and research asset management. To validate our method, we conducted
an experimental study to test all functionalities of D-VRE smart contracts and
their gas consumption. In addition, we deployed the D-VRE prototype on an
Ethereum testnet for demonstration. Feedback from these studies demonstrates
the current prototype's usability, ease of use, and potential, and suggests
further improvements.
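To make the data-sharing flow concrete, the following is a minimal sketch of how a notebook cell could talk to an agreement smart contract through web3.py. The contract address, ABI, and the proposeAgreement function are hypothetical placeholders, not the D-VRE's actual interface.

```python
# Minimal sketch: a notebook cell interacting with a hypothetical
# data-sharing agreement contract through web3.py. Contract address,
# ABI, and function names are illustrative placeholders.
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://localhost:8545"))  # local testnet node
assert w3.is_connected(), "no Ethereum node reachable"

# Placeholders: in practice these come from the deployed D-VRE contracts.
AGREEMENT_ADDRESS = "0x0000000000000000000000000000000000000000"
AGREEMENT_ABI = []  # ABI emitted when compiling the contract

agreement = w3.eth.contract(address=AGREEMENT_ADDRESS, abi=AGREEMENT_ABI)
researcher = w3.eth.accounts[0]

# Propose sharing a dataset (identified by a content hash) with a peer.
tx_hash = agreement.functions.proposeAgreement(
    "0x1111111111111111111111111111111111111111",  # peer's address
    "QmDatasetContentHash",                        # dataset identifier
).transact({"from": researcher})

receipt = w3.eth.wait_for_transaction_receipt(tx_hash)
print("gas used:", receipt.gasUsed)  # per-call gas, as measured in the study
```

Recording the gas used per receipt, as above, is one straightforward way to reproduce the kind of per-function gas-consumption measurements the study reports.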
Towards Seamless Serverless Computing Across an Edge-Cloud Continuum
Serverless computing has emerged as an attractive paradigm due to the
efficiency of development and the ease of deployment without managing any
underlying infrastructure. Nevertheless, serverless computing approaches still
face numerous challenges in unlocking their full potential in hybrid environments. To
gain a deeper understanding and firsthand knowledge of serverless computing in
edge-cloud deployments, we review the current state of open-source serverless
platforms and compare them based on predefined requirements. We then design and
implement a serverless computing platform with a novel edge orchestration
technique that seamlessly deploys serverless functions across the edge and
cloud environments on top of the Knative serverless platform. Moreover, we
propose an offloading strategy for edge environments, implement four different
functions for experimentation, and showcase the performance benefits of our
solution. Our results demonstrate that such an approach can efficiently utilize
both cloud and edge resources by dynamically offloading functions from the edge
to the cloud during high activity, while reducing the overall application
latency and increasing request throughput compared to an edge-only deployment.
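As a rough illustration of the offloading idea, the sketch below routes an invocation to the cloud once the edge node is saturated. The load signals and thresholds are assumptions for illustration, not the paper's actual policy.

```python
# Illustrative sketch of an edge-to-cloud offloading decision.
# Thresholds and the notion of "load" are assumptions, not the
# paper's actual orchestration logic.
from dataclasses import dataclass

@dataclass
class NodeStats:
    cpu_util: float     # utilization in [0.0, 1.0]
    queue_depth: int    # pending function invocations

EDGE_CPU_LIMIT = 0.8
EDGE_QUEUE_LIMIT = 50

def choose_target(edge: NodeStats) -> str:
    """Route an invocation to the edge unless it is saturated."""
    if edge.cpu_util > EDGE_CPU_LIMIT or edge.queue_depth > EDGE_QUEUE_LIMIT:
        return "cloud"  # offload during high activity
    return "edge"       # stay local to keep latency low

print(choose_target(NodeStats(cpu_util=0.95, queue_depth=10)))  # -> cloud
print(choose_target(NodeStats(cpu_util=0.40, queue_depth=5)))   # -> edge
```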
PriCE: Privacy-Preserving and Cost-Effective Scheduling for Parallelizing the Large Medical Image Processing Workflow over Hybrid Clouds
Running deep neural networks on large medical images is a resource-hungry
and time-consuming task for centralized computing. Outsourcing such medical
image processing tasks to hybrid clouds has benefits, such as a significant
reduction of execution time and monetary cost. However, due to privacy
concerns, it is still challenging to process sensitive medical images over
clouds, which hinders their deployment in many real-world applications. To
overcome this, we first formulate the overall optimization objectives of a
privacy-preserving distributed system model: minimizing the amount of
information about the private data that adversaries can learn throughout the
process, while reducing the maximum execution time and cost under the user's
budget constraint. We propose a novel privacy-preserving and cost-effective method
called PriCE to solve this multi-objective optimization problem. We performed
extensive simulation experiments for artifact detection tasks on medical images
using an ensemble of five deep convolutional neural network inferences as the
workflow task. Experimental results show that PriCE successfully splits a wide
range of input gigapixel medical images with graph-coloring-based strategies,
yielding the desired output utility while lowering privacy risk, makespan, and
monetary cost under the user's budget.
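To illustrate the graph-coloring-based splitting, the sketch below tiles a toy image grid, treats neighboring tiles as conflicting, and colors them so that conflicting tiles land on different clouds. networkx's greedy_color stands in for the paper's strategies, and the adjacency-based conflict model is an assumption.

```python
# Sketch: graph-coloring-based splitting of image tiles across clouds.
# Neighboring tiles are treated as "conflicting" (an assumption standing
# in for the paper's privacy conflict model) and must not be co-located.
import networkx as nx

ROWS, COLS = 4, 4  # toy 4x4 tile grid standing in for a gigapixel image

G = nx.grid_2d_graph(ROWS, COLS)  # nodes are tiles, edges link neighbors

# Greedy coloring: adjacent tiles receive different colors.
coloring = nx.coloring.greedy_color(G, strategy="largest_first")

# Map each color class to a (hypothetical) cloud provider.
clouds = ["cloud-A", "cloud-B", "cloud-C", "cloud-D"]
assignment = {tile: clouds[c % len(clouds)] for tile, c in coloring.items()}

for tile in sorted(assignment):
    print(tile, "->", assignment[tile])
```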
Towards Privacy-, Budget-, and Deadline-Aware Service Optimization for Large Medical Image Processing across Hybrid Clouds
Efficiently processing medical images, such as whole slide images in digital
pathology, is essential for the timely diagnosis of high-risk diseases. However, this
demands advanced computing infrastructure, e.g., GPU servers for deep learning
inference, and local processing is time-consuming and costly. Moreover,
privacy concerns further complicate the use of remote cloud
infrastructures. While previous research has explored privacy- and
security-aware workflow scheduling in hybrid clouds for distributed processing,
privacy-preserving data splitting, optimized service allocation for
outsourcing computation on split data to the cloud, and privacy evaluation for
large medical images still need to be addressed. This study focuses on
tailoring a virtual infrastructure within a hybrid cloud environment and
scheduling the image processing services while preserving privacy. We aim to
minimize the use of untrusted nodes, lower monetary costs, and reduce execution
time under privacy, budget, and deadline requirements. We consider a two-phase
solution and develop 1) a privacy-preserving data splitting algorithm and 2) a
greedy Pareto front-based algorithm for optimizing the service allocation. We
conducted experiments with real and simulated data to validate and compare our
method with a baseline. The results show that our privacy mechanism design
outperforms the baseline regarding the average lower bound on individual
privacy and the information gain used for privacy evaluation. In addition, our
approach can obtain various Pareto-optimal allocations reflecting users'
preferences on the maximum number of untrusted nodes, budget, and time
threshold. Our solutions often dominate the baseline's solution and are
superior on a tight budget. Specifically, our approach outperforms the
baseline by up to 85.2% and 6.8% in terms of total financial and time costs,
respectively.
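A minimal sketch of the Pareto-front machinery behind such an allocation step is shown below; the (untrusted nodes, cost, time) scores are invented, and the paper's greedy selection details are not reproduced.

```python
# Sketch: extracting the Pareto front over candidate service allocations.
# Each candidate is scored by (untrusted_nodes, monetary_cost, exec_time);
# all three objectives are minimized. Values are illustrative.

def dominates(a, b):
    """a dominates b if a is no worse in every objective, better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates if other != c)]

allocations = [
    (2, 120.0, 30.0),  # (untrusted nodes, cost, time)
    (1, 150.0, 28.0),
    (3, 110.0, 35.0),
    (2, 130.0, 40.0),  # dominated by the first candidate
]
print(pareto_front(allocations))  # first three survive
```

A user preference (e.g., a budget cap or a maximum number of untrusted nodes) then simply filters this front before an allocation is picked.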
A Survey on Dataset Distillation: Approaches, Applications and Future Directions
Dataset distillation is attracting more attention in machine learning as
training sets continue to grow and the cost of training state-of-the-art models
becomes increasingly high. By synthesizing datasets with high information
density, dataset distillation offers a range of potential applications,
including support for continual learning, neural architecture search, and
privacy protection. Despite recent advances, we lack a holistic understanding
of the approaches and applications. Our survey aims to bridge this gap by first
proposing a taxonomy of dataset distillation, characterizing existing
approaches, and then systematically reviewing the data modalities and related
applications. In addition, we summarize the challenges and discuss future
directions for this field of research.
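As a toy illustration of the core idea, the sketch below distills a large regression set into a handful of synthetic labels by matching training gradients for a linear model; real dataset distillation methods also learn the synthetic inputs and target deep networks.

```python
# Toy sketch of dataset distillation by gradient matching for a linear
# model: adapt a few synthetic labels so the training gradient on the
# synthetic set tracks the gradient on the full real set.
import numpy as np

rng = np.random.default_rng(0)
X_real = rng.normal(size=(1000, 5))
w_true = rng.normal(size=5)
y_real = X_real @ w_true + 0.1 * rng.normal(size=1000)

X_syn = rng.normal(size=(10, 5))   # 10 synthetic points stand in for 1000
y_syn = rng.normal(size=10)        # synthetic labels to be learned

def grad(X, y, w):
    """Gradient of mean squared error with respect to weights w."""
    return 2 * X.T @ (X @ w - y) / len(y)

w, lr = np.zeros(5), 0.05
for _ in range(500):
    gap = grad(X_real, y_real, w) - grad(X_syn, y_syn, w)
    y_syn -= lr * X_syn @ gap          # descend on the gradient-matching gap
    w -= lr * grad(X_real, y_real, w)  # follow the real-data trajectory

print("final gradient gap:", np.linalg.norm(
    grad(X_real, y_real, w) - grad(X_syn, y_syn, w)))
```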
Federating Medical Deep Learning Models from Private Jupyter Notebooks to Distributed Institutions
Deep learning-based algorithms have led to tremendous progress over recent years, but they face a bottleneck because their optimal development highly relies on access to large datasets. To mitigate this limitation, cross-silo federated learning has emerged as a way to train collaborative models among multiple institutions without having to share the raw data used for model training. However, although artificial intelligence experts have the expertise to develop state-of-the-art models and actively share their code through notebook environments, implementing a federated learning system in real-world applications entails significant engineering and deployment efforts. To reduce the complexity of federation setups and bridge the gap between federated learning and notebook users, this paper introduces the Notebook Federator, a solution that leverages the Jupyter environment as part of the federated learning pipeline and simplifies its automation. The feasibility of this approach is demonstrated with a collaborative model solving a digital pathology image analysis task, in which the federated model reaches an accuracy of 0.8633 on the test set, compared to centralized configurations for each institution obtaining 0.7881, 0.6514, and 0.8096, respectively. As a fast and reproducible tool, the proposed solution enables the deployment of a cross-country federated environment in only a few minutes.
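The aggregation loop at the heart of such a cross-silo setup is, in essence, federated averaging. The sketch below shows generic FedAvg with numpy on toy logistic-regression silos; it is not the Notebook Federator's actual implementation.

```python
# Minimal FedAvg sketch: each institution trains locally, then a server
# averages the weights proportionally to local dataset size. Generic
# FedAvg, not the Notebook Federator's actual code.
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """Logistic-regression local training via gradient descent."""
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w)))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def fedavg(w_global, silos, rounds=10):
    for _ in range(rounds):
        updates, sizes = [], []
        for X, y in silos:  # each silo trains on its private data
            updates.append(local_update(w_global.copy(), X, y))
            sizes.append(len(y))
        # Weighted average of local models (only weights leave a silo).
        w_global = np.average(updates, axis=0, weights=sizes)
    return w_global

rng = np.random.default_rng(0)
w_true = rng.normal(size=3)
silos = []
for n in (200, 120, 80):  # three institutions with unequal data sizes
    X = rng.normal(size=(n, 3))
    y = (X @ w_true > 0).astype(float)
    silos.append((X, y))

w = fedavg(np.zeros(3), silos)
print("learned weights:", np.round(w, 2))
```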
Effect of cognitive-behavioral therapy with music therapy in reducing physics test anxiety among students as measured by generalized test anxiety scale
Abstract: Background: The study determined the effect of cognitive-behavioral therapy (CBT) with music in reducing physics test anxiety among secondary school students, as measured by a generalized test anxiety scale. Methods: A pre-test post-test randomized controlled trial design was adopted in this study. A total of 83 senior secondary students, male (n=46) and female (n=37), from sampled secondary schools in Enugu State, Nigeria, who met the inclusion criteria constituted the participants for the study. A demographic questionnaire and a 48-item generalized test anxiety scale were used for data collection. Subjects were randomized into treatment and control groups. The treatment group was exposed to a 12-week CBT-music program. Thereafter, the participants in the treatment group were evaluated at 3 time points. Data collected were analyzed using repeated measures analysis of variance. Results: The participants who were exposed to the CBT-music intervention program had significantly lower test anxiety scores at post-treatment than the participants in the control group. Furthermore, the test anxiety scores of the participants in the CBT-music group were significantly lower than those in the control group at the follow-up measure. Thus, the results showed a significant effect of CBT with music in reducing physics test anxiety among secondary school students. Conclusion: We concluded that the CBT-music program has a significant benefit in improving the management of physics test anxiety among secondary school students. Abbreviations: ΔR2 = adjusted R2, CBT = cognitive-behavioral therapy, CBT-music = CBT-based music group, CI = confidence interval, GTAI = Generalized Test Anxiety Inventory
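For readers who want to reproduce this kind of analysis, the sketch below runs a repeated measures ANOVA over three time points with statsmodels' AnovaRM; the data and column names are invented for illustration.

```python
# Sketch: repeated measures ANOVA on anxiety scores at three time points,
# mirroring the study's analysis. Data and column names are invented.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
subjects, times = 20, ["pre", "post", "follow_up"]
rows = []
for s in range(subjects):
    base = rng.normal(120, 10)  # simulated baseline anxiety score
    for i, t in enumerate(times):
        rows.append({"subject": s, "time": t,
                     "anxiety": base - 15 * i + rng.normal(0, 5)})
df = pd.DataFrame(rows)

# Within-subject factor "time" across the three measurement points.
res = AnovaRM(df, depvar="anxiety", subject="subject", within=["time"]).fit()
print(res)
```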
Workflows Community Summit 2024: Future Trends and Challenges in Scientific Workflows
The Workflows Community Summit gathered 111 participants from 18 countries to discuss emerging trends and challenges in scientific workflows, focusing on six key areas: time-sensitive workflows, AI-HPC convergence, multi-facility workflows, heterogeneous HPC environments, user experience, and FAIR computational workflows. The integration of AI and exascale computing has revolutionized scientific workflows, enabling higher-fidelity models and complex, time-sensitive processes, while introducing challenges in managing heterogeneous environments and multi-facility data dependencies. The rise of large language models is driving computational demands to zettaflop scales, necessitating modular, adaptable systems and cloud-service models to optimize resource utilization and ensure reproducibility. Multi-facility workflows present challenges in data movement, curation, and overcoming institutional silos, while diverse hardware architectures require integrating workflow considerations into early system design and developing standardized resource management tools. The summit emphasized improving user experience in workflow systems and ensuring FAIR workflows to enhance collaboration and accelerate scientific discovery. Key recommendations include developing standardized metrics for time-sensitive workflows, creating frameworks for cloud-HPC integration, implementing distributed-by-design workflow modeling, establishing multi-facility authentication protocols, and accelerating AI integration in HPC workflow management. The summit also called for comprehensive workflow benchmarks, workflow-specific UX principles, and a FAIR workflow maturity model, highlighting the need for continued collaboration in addressing the complex challenges posed by the convergence of AI, HPC, and multi-facility research environments.
