56 research outputs found
Towards Disaggregation-Native Data Streaming between Devices
Disaggregation is an ongoing trend to increase flexibility in datacenters. With interconnect technologies like CXL, pools of CPUs, accelerators, and memory can be connected via a datacenter fabric. Applications can then pick from those pools the resources necessary for their specific workload. However, this vision becomes less clear when we consider data movement. Workloads often require data to be streamed through chains of multiple devices, but typically these streams do not flow directly from device to device; instead they are staged in host memory by a CPU running the device protocol logic. We show that augmenting devices with a disaggregation-native and device-independent data streaming facility can improve processing latencies by enabling data flows directly between arbitrary devices.
Comment: Presented at the 3rd Workshop on Heterogeneous Composable and Disaggregated Systems (HCDS 2024)
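As a back-of-the-envelope illustration of why staging in host memory hurts chained data movement (not taken from the paper; all per-hop costs below are invented assumptions), a small latency model in Python:

    # Illustrative per-transfer costs in microseconds; these numbers are
    # assumptions for illustration, not measurements from the paper.
    COPY_TO_HOST_US   = 5.0   # device -> host-memory staging copy
    COPY_FROM_HOST_US = 5.0   # host memory -> next device in the chain
    CPU_PROTOCOL_US   = 3.0   # CPU running the device protocol logic per hop
    DIRECT_HOP_US     = 6.0   # one direct device-to-device transfer over the fabric

    def staged_latency_us(chain_length: int) -> float:
        """Data is staged in host memory by the CPU on every hop of the device chain."""
        hops = chain_length - 1
        return hops * (COPY_TO_HOST_US + CPU_PROTOCOL_US + COPY_FROM_HOST_US)

    def direct_latency_us(chain_length: int) -> float:
        """Devices stream to each other directly, bypassing host-memory staging."""
        hops = chain_length - 1
        return hops * DIRECT_HOP_US

    for n in (2, 4, 8):
        print(f"{n} devices: staged {staged_latency_us(n):.0f} us, direct {direct_latency_us(n):.0f} us")

Under these assumed costs, the staged path pays the extra host copies and CPU protocol handling on every hop of the chain, which is the overhead a direct device-to-device streaming facility removes.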
Slice-Level Trading of Quality and Performance in Decoding H.264 Video: Slice-basiertes Abwägen zwischen Qualität und Leistung beim Dekodieren von H.264-Video
When a demanding video decoding task requires more CPU resources than are available, playback degrades ungracefully today: the decoder skips frames selected arbitrarily or by simple heuristics, which the viewer notices as jerky motion in the good case or as images breaking up completely in the bad case. The latter can happen due to missing reference frames. This thesis provides a way to schedule individual decoding tasks based on a cost-for-performance trade-off. To this end, I present a way to preprocess a video, generating estimates of the cost in terms of execution time and the performance in terms of perceived visual quality. The granularity of the scheduling decision is a single slice, which leads to a much more fine-grained approach than dealing with entire frames. Together with an actual scheduler implementation that uses the generated estimates, this work allows for higher perceived video quality in case of CPU overload.
When a demanding video decoding task requires more processor resources than are available, playback quality degrades drastically with current methods: frames selected arbitrarily or by simple heuristics are not decoded. In the best case, the viewer perceives this omission merely as jerky motion; in the worse case, however, as a complete breakdown of subsequent frames due to error propagation in the decoding process. My work makes it possible to schedule individual subtasks of the decoding process based on a cost-benefit analysis. To this end, I determine the cost in terms of required computation time and the benefit in terms of visual quality for individual slices of an H.264 video. Together with an implementation of a scheduler that uses these values, my work enables higher perceived video quality when processor time is scarce.
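The following sketch illustrates, under assumed numbers, how per-slice cost and quality estimates of the kind produced by such a preprocessing step could drive a scheduling decision. The slice values, the greedy policy, and all names are hypothetical; this is not the scheduler developed in the thesis:

    from dataclasses import dataclass

    @dataclass
    class Slice:
        frame: int          # frame the slice belongs to
        cost_ms: float      # estimated decoding time from preprocessing
        benefit: float      # estimated contribution to perceived quality

    def pick_slices(slices, budget_ms):
        """Greedily keep the slices with the best quality per unit of CPU time
        until the per-interval CPU budget is exhausted; the rest are skipped."""
        chosen, spent = [], 0.0
        for s in sorted(slices, key=lambda s: s.benefit / s.cost_ms, reverse=True):
            if spent + s.cost_ms <= budget_ms:
                chosen.append(s)
                spent += s.cost_ms
        return chosen

    # Example: three slices competing for 10 ms of CPU time in this interval.
    slices = [Slice(0, 6.0, 0.9), Slice(0, 5.0, 0.5), Slice(1, 3.0, 0.6)]
    print([(s.frame, s.cost_ms) for s in pick_slices(slices, budget_ms=10.0)])

Deciding per slice rather than per frame is what allows partial frames to be decoded when the budget cannot cover everything.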
Practical Real-Time with Look-Ahead Scheduling
In my dissertation, I present ATLAS, the Auto-Training Look-Ahead Scheduler. ATLAS improves service to applications with regard to two non-functional properties: timeliness and overload detection. Timeliness is an important requirement for ensuring user-interface responsiveness and the smoothness of multimedia operations. Overload can occur when applications ask for more computation time than the machine can offer. Interactive systems have to handle overload situations dynamically at runtime. ATLAS provides timely service to applications, accessible through an easy-to-use interface. Deadlines specify timing requirements; workload metrics describe jobs. ATLAS employs machine learning to predict job execution times. Deadline misses are detected before they occur, so applications can react early. (A sketch of this prediction-and-detection idea follows the chapter overview below.)
1 Introduction
2 Anatomy of a Desktop Application
3 Real Simple Real-Time
4 Execution Time Prediction
5 System Scheduler
6 Timely Service
7 The Road Ahead
Bibliography
Index
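A rough sketch of the look-ahead idea, not ATLAS itself: fit a least-squares model from submitted workload metrics to observed execution times, then compare a job's predicted finishing time against its deadline before the job runs. All metrics, runtimes, and deadlines below are made up:

    import numpy as np

    # Auto-training: past jobs provide (workload metric, observed execution time) pairs.
    metrics  = np.array([1000.0, 2000.0, 4000.0, 8000.0])   # e.g. bytes to process
    runtimes = np.array([2.1, 4.0, 8.3, 16.1])              # observed milliseconds

    # Fit execution_time ~ a * metric + b with ordinary least squares.
    A = np.vstack([metrics, np.ones_like(metrics)]).T
    a, b = np.linalg.lstsq(A, runtimes, rcond=None)[0]

    def predict_ms(metric: float) -> float:
        return a * metric + b

    def will_miss(now_ms: float, metric: float, deadline_ms: float) -> bool:
        """Look ahead: flag a deadline miss before the job runs,
        so the application can react early."""
        return now_ms + predict_ms(metric) > deadline_ms

    print(predict_ms(6000.0))                                       # predicted execution time
    print(will_miss(now_ms=0.0, metric=6000.0, deadline_ms=10.0))   # True -> likely miss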
CoRD: Converged RDMA Dataplane for High-Performance Clouds
High-performance networking is often characterized by kernel bypass, which is considered mandatory in high-performance parallel and distributed applications. But kernel bypass comes at a price: it breaks the traditional OS architecture, requiring applications to use special APIs and limiting the OS's control over existing network connections. We make the case that kernel bypass is not mandatory. Rather, high-performance networking relies on multiple performance-improving techniques, with kernel bypass being the least effective. CoRD removes kernel bypass from RDMA networks, enabling efficient OS-level control over the RDMA dataplane.
Comment: 11 pages
Probabilistic Analysis of Low-Criticality Execution
The mixed-criticality toolbox promises system architects a powerful framework for consolidating real-time tasks with different safety properties on a single computing platform. Thanks to the research efforts in the mixed-criticality field, the guarantees provided to the highest criticality level are well understood. However, lower-criticality job execution depends on the condition that all high-criticality jobs complete within their more optimistic low-criticality execution-time bounds; otherwise, no guarantees are made. In this paper, we add to the mixed-criticality toolbox by providing a probabilistic analysis method for low-criticality tasks. While deterministic models reduce task behavior to constant numbers, probabilistic analysis captures varying runtime behavior. We introduce a novel algorithmic approach for probabilistic timing analysis, which we call symbolic scheduling. For restricted task sets, we also present an analytical solution. We use this method to calculate per-job success probabilities for low-criticality tasks, in order to quantify how low-criticality tasks behave when high-criticality jobs overrun their optimistic low-criticality reservations.
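To make the notion of a per-job success probability concrete, here is a toy calculation rather than the paper's symbolic-scheduling algorithm: each high-criticality job's execution time is modeled as a small discrete distribution, and a low-criticality job succeeds if the time left in the window still covers its budget. All distributions, the window, and the budget are invented for illustration:

    from itertools import product

    # Discrete execution-time distributions of two high-criticality jobs
    # (value in ms -> probability); the larger values model budget overruns.
    hi_jobs = [
        {2: 0.9, 5: 0.1},   # job A: usually 2 ms, overruns to 5 ms with 10% probability
        {3: 0.8, 6: 0.2},   # job B: usually 3 ms, overruns to 6 ms with 20% probability
    ]

    WINDOW_MS = 10   # time available in the scheduling window
    LO_BUDGET = 4    # execution-time budget of the low-criticality job

    def success_probability(hi_jobs, window, lo_budget):
        """P(low-criticality job fits) = total probability of all combinations of
        high-criticality execution times that leave at least lo_budget free."""
        prob = 0.0
        for combo in product(*(d.items() for d in hi_jobs)):
            demand = sum(t for t, _ in combo)
            p = 1.0
            for _, pi in combo:
                p *= pi
            if window - demand >= lo_budget:
                prob += p
        return prob

    print(success_probability(hi_jobs, WINDOW_MS, LO_BUDGET))   # 0.72 with these numbers

Enumerating all combinations is exponential in the number of high-criticality jobs; it only serves to illustrate what a per-job success probability quantifies.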
- …
