Unified Segment-to-Segment Framework for Simultaneous Sequence Generation
Simultaneous sequence generation is a pivotal task for real-time scenarios,
such as streaming speech recognition, simultaneous machine translation and
simultaneous speech translation, where the target sequence is generated while
receiving the source sequence. The crux of achieving high-quality generation
with low latency lies in identifying the optimal moments for generating,
accomplished by learning a mapping between the source and target sequences.
However, existing methods often rely on task-specific heuristics for different
sequence types, limiting the model's capacity to adaptively learn the
source-target mapping and hindering the exploration of multi-task learning for
various simultaneous tasks. In this paper, we propose a unified
segment-to-segment framework (Seg2Seg) for simultaneous sequence generation,
which learns the mapping in an adaptive and unified manner. During the process
of simultaneous generation, the model alternates between waiting for a source
segment and generating a target segment, making the segment serve as the
natural bridge between the source and target. To accomplish this, Seg2Seg
introduces a latent segment as the pivot between source and target and explores
all potential source-target mappings via the proposed expectation training,
thereby learning the optimal moments for generating. Experiments on multiple
simultaneous generation tasks demonstrate that Seg2Seg achieves
state-of-the-art performance and exhibits better generality across various
tasks.
Comment: Accepted at NeurIPS 202
Robust Bandit Learning with Imperfect Context
A standard assumption in contextual multi-armed bandits is that the true context
is perfectly known before arm selection. Nonetheless, in many practical
applications (e.g., cloud resource management), prior to arm selection, the
context information can only be acquired by prediction subject to errors or
adversarial modification. In this paper, we study a contextual bandit setting
in which only imperfect context is available for arm selection while the true
context is revealed at the end of each round. We propose two robust arm
selection algorithms: MaxMinUCB (Maximize Minimum UCB) which maximizes the
worst-case reward, and MinWD (Minimize Worst-case Degradation) which minimizes
the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and
MinWD by deriving both regret and reward bounds compared to an oracle that
knows the true context. Our results show that as time goes on, MaxMinUCB and
MinWD both perform asymptotically as well as their optimal counterparts that
know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge
datacenter selection, and run synthetic simulations to validate our theoretical
analysis.
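The MaxMinUCB selection rule described in this abstract (pick the arm whose worst-case UCB is largest) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the linear reward model, the norm-based exploration bonus in `ucb`, the parameter `beta`, and the use of a finite list of candidate contexts to stand in for the uncertainty set around the predicted context are all assumptions made here.

```python
import numpy as np

def ucb(theta, context, beta):
    """Hypothetical UCB score: linear reward estimate plus a simple norm-based bonus."""
    return float(theta @ context + beta * np.linalg.norm(context))

def max_min_ucb(thetas, candidate_contexts, beta=0.1):
    """Return the arm with the largest worst-case UCB over the candidate contexts.

    thetas: one (assumed) parameter vector per arm.
    candidate_contexts: finite sample of contexts consistent with the imperfect
        context (an assumed discretization of the uncertainty set).
    """
    worst_case = [
        min(ucb(theta, c, beta) for c in candidate_contexts)
        for theta in thetas
    ]
    return int(np.argmax(worst_case)), worst_case

# Toy example: arm 0 is excellent under one context but poor under the other,
# while arm 1 is moderate under both.
thetas = [np.array([1.0, 0.0]), np.array([0.6, 0.6])]
candidates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
arm, scores = max_min_ucb(thetas, candidates, beta=0.0)
```

In the toy example the rule prefers the arm that is moderate under every candidate context over the arm that is best under only one of them, which is the hedging behavior the abstract attributes to maximizing the worst-case reward.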
End-to-End Simultaneous Speech Translation with Differentiable Segmentation
End-to-end simultaneous speech translation (SimulST) outputs translation
while receiving the streaming speech inputs (a.k.a. streaming speech
translation), and hence needs to segment the speech inputs and then translate
based on the current received speech. However, segmenting the speech inputs at
unfavorable moments can disrupt the acoustic integrity and adversely affect the
performance of the translation model. Therefore, learning to segment the speech
inputs at those moments that are beneficial for the translation model to
produce high-quality translation is the key to SimulST. Existing SimulST
methods, using either fixed-length segmentation or an external segmentation
model, always separate segmentation from the underlying translation model, and
the resulting gap yields segmentation outcomes that are not necessarily
beneficial for the translation process. In this paper, we propose
Differentiable Segmentation (DiSeg) for SimulST to directly learn segmentation
from the underlying translation model. DiSeg makes hard segmentation
differentiable through the proposed expectation training, enabling it to be
jointly trained with the translation model and thereby learn
translation-beneficial segmentation. Experimental results demonstrate that
DiSeg achieves state-of-the-art performance and exhibits superior segmentation
capability.
Comment: Accepted at ACL 2023 findings
Rural Land Property Right System of China: Defects and Solutions
Innovation in the rural land property right system is important to China's agricultural and rural development. At the present stage, China's rural land property right system suffers from several problems: an unclear property right subject, an incomplete property right object, unequal urban and rural land development rights, and an imperfect land property right management system. In the next stage of China's system reform, innovation in the rural land property right system should be fully emphasized, and measures should be actively taken to improve it, including clarifying the rural land property right subject, advancing the treatment of the rural land contractual management right as a real right, establishing a unified urban-rural market for construction land, and further advancing the confirmation of rural land property rights and the issuance of property right certificates.
Learning for Edge-Weighted Online Bipartite Matching with Robustness Guarantees
Many problems, such as online ad display, can be formulated as online
bipartite matching. The crucial challenge lies in the nature of
sequentially-revealed online item information, based on which we make
irreversible matching decisions at each step. While numerous expert online
algorithms have been proposed with bounded worst-case competitive ratios, they
may not offer satisfactory performance in average cases. On the other hand,
reinforcement learning (RL) has been applied to improve the average
performance, but it lacks robustness and can perform arbitrarily poorly. In
this paper, we propose a novel RL-based approach to edge-weighted online
bipartite matching with robustness guarantees (LOMAR), achieving both good
average-case and worst-case performance. The key novelty of LOMAR is a new
online switching operation which, based on a judicious condition to hedge
against future uncertainties, decides whether to follow the expert's decision
or the RL decision for each online item. We prove that for any $\rho \in [0,1]$,
LOMAR is $\rho$-competitive against any given expert online algorithm. To
improve the average performance, we train the RL policy by explicitly
considering the online switching operation. Finally, we run empirical
experiments to demonstrate the advantages of LOMAR compared to existing
baselines. Our code is available at: https://github.com/Ren-Research/LOMAR
Comment: Accepted by ICML 202
Glancing Future for Simultaneous Machine Translation
Simultaneous machine translation (SiMT) outputs translation while reading the
source sentence. Unlike conventional sequence-to-sequence (seq2seq) training,
existing SiMT methods adopt the prefix-to-prefix (prefix2prefix) training,
where the model predicts target tokens based on partial source tokens. However,
the prefix2prefix training diminishes the ability of the model to capture
global information and introduces forced predictions due to the absence of
essential source information. Consequently, it is crucial to bridge the gap
between the prefix2prefix training and seq2seq training to enhance the
translation capability of the SiMT model. In this paper, we propose a novel
method that glances at the future through curriculum learning to achieve the transition
from the seq2seq training to prefix2prefix training. Specifically, we gradually
reduce the available source information from the whole sentence to the prefix
corresponding to that latency. Our method is applicable to a wide range of SiMT
methods and experiments demonstrate that our method outperforms strong
baselines.
Comment: 5 pages, 4 figures, Submitted to ICASSP 202
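The curriculum described in this abstract, which shrinks the visible source from the whole sentence down to the latency-dependent prefix, might look roughly like the sketch below. The abstract does not give the schedule, so the linear decay, the wait-k style prefix `k + target_step`, and the names `visible_source_length` and `progress` are all assumptions used only for illustration.

```python
def visible_source_length(target_step, src_len, k, progress):
    """Source tokens visible when predicting target token `target_step` (0-based).

    progress: training progress in [0, 1]; 0 means full-sentence (seq2seq-like)
        visibility, 1 means the pure prefix (prefix2prefix-like) setting.
    k: assumed wait-k style latency, so the final prefix is k + target_step tokens.
    """
    prefix_len = min(k + target_step, src_len)       # prefix for this latency
    progress = min(max(progress, 0.0), 1.0)          # clamp to [0, 1]
    # Assumed linear schedule: the "glanced" future tokens shrink to zero.
    extra = (src_len - prefix_len) * (1.0 - progress)
    return prefix_len + int(extra)

# Visibility for the first target token of a 10-token source with k = 3,
# sampled at a few points during training.
schedule = [visible_source_length(0, 10, 3, p) for p in (0.0, 0.25, 0.75, 1.0)]
```

A training loop would use this length to build the source attention mask at each step, so the early epochs resemble full-sentence training and the final epochs match the inference-time prefix.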
Decoder-only Streaming Transformer for Simultaneous Translation
Simultaneous Machine Translation (SiMT) generates translation while reading
source tokens, essentially producing the target prefix based on the source
prefix. To achieve good performance, it leverages the relationship between
source and target prefixes to extract a policy to guide the generation of
translations. Although existing SiMT methods primarily focus on the
Encoder-Decoder architecture, we explore the potential of Decoder-only
architecture, owing to its superior performance in various tasks and its
inherent compatibility with SiMT. However, directly applying the Decoder-only
architecture to SiMT poses challenges in terms of training and inference. To
alleviate the above problems, we propose the first Decoder-only SiMT model,
named Decoder-only Streaming Transformer (DST). Specifically, DST separately
encodes the positions of the source and target prefixes, ensuring that the
position of the target prefix remains unaffected by the expansion of the source
prefix. Furthermore, we propose a Streaming Self-Attention (SSA) mechanism
tailored for the Decoder-only architecture. It can obtain a
translation policy by assessing the sufficiency of input source information and
integrating with the soft-attention mechanism to generate translations.
Experiments demonstrate that our approach achieves state-of-the-art performance
on three translation tasks.
Comment: Accepted to ACL 2024. 14 pages, 10 Tables, 5 Figures
