The influence of audio length on the performance of Swiss-German speech translation models
Speech translation models designed to convert spoken Swiss-German to written German have been in existence for some time. While these models generally perform well, their performance in various scenarios remains poorly understood. In this thesis, we explore the influence of audio length on the performance of Swiss-German speech translation models and identify the factors necessary for achieving better performance on longer audio segments. To achieve this, we examined four speech translation models from different institutions: a model from the Zurich University of Applied Sciences (ZHAW), one from the University of Applied Sciences Northwestern Switzerland (FHNW), a model from Microsoft, and a model from OpenAI called Whisper. We conducted eight different experiments using a Swiss-German corpus collected by the ZHAW and FHNW. In the experiments, the audio length was augmented in various ways. We found that while the ZHAW, FHNW and Microsoft models tended to perform worse on longer durations, extending the duration by adding silence did not influence performance. Changing the playback speed had a negative influence on the ZHAW, Microsoft and Whisper models, both when speeding segments up and when slowing them down. The FHNW model exhibited extraordinary robustness to changes in playback speed, as the results when accelerated by a factor of 1.25 were nearly identical to the results when the playback speed was not altered. The biggest influence on performance came from adding more than one sentence to a segment. Without segmentation of the input audio, the ZHAW, FHNW and Microsoft models performed badly, indicating that segmentation should be introduced as soon as more than one sentence appears in an audio recording.
Training a model specifically on multi-sentence segments showed promising results on single-sentence segments and multi-sentence segments, as well as in scenarios where sentences are split while segmenting the audio recordings. Comparing a sentence-based segmentation, which is considered ideal for models trained on single-sentence segments, to a fixed-window segmentation with an overlap showed almost identical results. Examining the models on a real-life recording showed that the ZHAW (lowercase) and ZHAW (multisentence) models perform considerably worse than the FHNW, Microsoft and Whisper models, indicating that more investigation is required to fully understand what makes a speech translation model work well in real-life scenarios.
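Evaluation of measurement data on the use of a fibre-optic gyroscope for polar measurements in the wind tunnel (original title: Messdatenauswertung zum Einsatz eines faseroptischen Kreisels fuer Polarenmessung im Windkanal)
Short report on investigations into the use of a fibre-optic gyroscope for polar measurements in the wind tunnel.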
Comparison of self-mated hardmetal coatings under dry sliding conditions up to 600 degrees C
Hardmetal coatings prepared by high velocity oxy-fuel (HVOF) spraying represent an advanced solution for surface protection against wear. In the current systematic study, high-temperature oxidation and unidirectional sliding wear in dry and lubricated conditions were studied. This paper describes the results of a series of experiments on self-mated pairs in dry conditions carried out as part of that work. Coatings with nominal compositions WC-10%Co4%Cr, WC-(W,Cr)2C-7%Ni, Cr3C2-25%NiCr, (Ti,Mo)(C,N)-29%Ni and (Ti,Mo)(C,N)-29%Co were prepared with an ethylene-fuelled DJH 2700 HVOF spray gun. Electrolytic hard chromium (EHC) coatings and bulk (Ti,Mo)(C,N)-15%NiMo (TM10) hardmetal specimens were studied for comparison. The wear behaviour was investigated at room temperature, 400 and 600 degrees C. For the coatings, sliding speeds were varied in the range 0.1-1 m/s for a wear distance of 5000 m and a normal force of 10 N. In some cases the WC- and (Ti,Mo)(C,N)-based coatings showed total wear rates (sum of the wear rates of the rotating and stationary samples) of less than 10^-6 mm^3/Nm, i.e., comparable to values typically measured under mixed/boundary conditions. Coefficients of friction above 0.4 were found for all test conditions. The P x V values as an engineering parameter for coating application are discussed. The microstructures and the sliding wear behaviour of the (Ti,Mo)(C,N)-based coatings and the (Ti,Mo)(C,N)-15%NiMo hardmetal are compared.
