Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing
Scene text images contain not only style information (font, background) but
also content information (character, texture). Different scene text tasks need
different information, but previous representation learning methods use tightly
coupled features for all tasks, resulting in sub-optimal performance. We
propose a Disentangled Representation Learning framework (DARLING) aimed at
disentangling these two types of features for improved adaptability in better
addressing various downstream tasks (choose what you really need).
Specifically, we synthesize a dataset of image pairs with identical style but
different content. Based on the dataset, we decouple the two types of features
by the supervision design. Concretely, we directly split the visual
representation into style and content features: the content features are
supervised by a text recognition loss, while an alignment loss aligns the style
features across the image pair. Then, the style features are employed to reconstruct the
counterpart image via an image decoder with a prompt that indicates the
counterpart's content. Such an operation effectively decouples the features
based on their distinctive properties. To the best of our knowledge, this is
the first work in the field of scene text to disentangle the inherent
properties of text images. Our method achieves state-of-the-art performance in
Scene Text Recognition, Removal, and Editing.
Comment: Accepted to CVPR 202
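As a rough illustration of the supervision described above, the sketch below combines a recognition loss on the content features, an alignment loss on the style features of a pair, and a cross reconstruction of the counterpart image. The module names, the feature split by chunking, and the decoder signature are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the disentanglement supervision (not the authors' code).
import torch
import torch.nn.functional as F

def darling_losses(encoder, recognition_head, decoder,
                   img_a, img_b, text_a, text_b,
                   w_rec=1.0, w_align=1.0, w_recon=1.0):
    """img_a / img_b: a synthetic pair with identical style but different content."""
    # Split each visual representation into style and content features.
    style_a, content_a = encoder(img_a).chunk(2, dim=-1)
    style_b, content_b = encoder(img_b).chunk(2, dim=-1)

    # Content features are supervised by a text recognition loss
    # (simplified here to a per-image classification-style cross entropy).
    loss_rec = F.cross_entropy(recognition_head(content_a), text_a) \
             + F.cross_entropy(recognition_head(content_b), text_b)

    # An alignment loss pulls together the style features of the pair.
    loss_align = F.mse_loss(style_a, style_b)

    # Style features reconstruct the counterpart image via an image decoder
    # prompted with the counterpart's content.
    loss_recon = F.l1_loss(decoder(style_a, prompt=text_b), img_b) \
               + F.l1_loss(decoder(style_b, prompt=text_a), img_a)

    return w_rec * loss_rec + w_align * loss_align + w_recon * loss_recon
```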
How Control Information Influences Multilingual Text Image Generation and Editing?
Visual text generation has significantly advanced through diffusion models
aimed at producing images with readable and realistic text. Recent works
primarily use a ControlNet-based framework, employing standard font text images
to control diffusion models. Recognizing the critical role of control
information in generating high-quality text, we investigate its influence from
three perspectives: input encoding, role at different stages, and output
features. Our findings reveal that: 1) Input control information has unique
characteristics compared to conventional inputs like Canny edges and depth
maps. 2) Control information plays distinct roles at different stages of the
denoising process. 3) Output control features significantly differ from the
base and skip features of the U-Net decoder in the frequency domain. Based on
these insights, we propose TextGen, a novel framework designed to enhance
generation quality by optimizing control information. We improve input and
output features using Fourier analysis to emphasize relevant information and
reduce noise. Additionally, we employ a two-stage generation framework to align
the different roles of control information at different stages. Furthermore, we
introduce an effective and lightweight dataset for training. Our method
achieves state-of-the-art performance in both Chinese and English text
generation. The code and dataset are available at
https://github.com/CyrilSterling/TextGen
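The abstract's Fourier-based refinement of control features can be pictured with a minimal frequency-domain filter like the one below; the circular low-pass mask and the keep_ratio parameter are placeholders of my own, not TextGen's actual design.

```python
# Illustrative frequency-domain filtering of a control feature map; the mask
# shape and keep_ratio are placeholders, not TextGen's learned design.
import torch

def fourier_refine(feat: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """Attenuate high-frequency content of a (B, C, H, W) feature map."""
    _, _, H, W = feat.shape
    spec = torch.fft.fftshift(torch.fft.fft2(feat, norm="ortho"), dim=(-2, -1))

    # Circular low-pass mask centred on the zero-frequency bin.
    yy, xx = torch.meshgrid(
        torch.linspace(-1.0, 1.0, H, device=feat.device),
        torch.linspace(-1.0, 1.0, W, device=feat.device),
        indexing="ij",
    )
    mask = ((xx ** 2 + yy ** 2).sqrt() <= keep_ratio).to(feat.dtype)

    out = torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1)),
                          norm="ortho")
    return out.real
```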
Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition
In text recognition, self-supervised pre-training emerges as a good solution
to reduce dependence on expensive annotated real data. Previous studies
primarily focus on local visual representation by leveraging mask image
modeling or sequence contrastive learning. However, they neglect to model the
linguistic information in text images, which is crucial for recognizing text.
To simultaneously capture local character features and linguistic information
in visual space, we propose Symmetric Superimposition Modeling (SSM). The
objective of SSM is to reconstruct the direction-specific pixel and feature
signals from the symmetrically superimposed input. Specifically, we add the
original image with its inverted views to create the symmetrically superimposed
inputs. At the pixel level, we reconstruct the original and inverted images to
capture character shapes and texture-level linguistic context. At the feature
level, we reconstruct the feature of the same original image and inverted image
with different augmentations to model the semantic-level linguistic context and
the local character discrimination. In our design, the superimposition
deliberately disrupts character shapes and linguistic rules; consequently, the
dual-level reconstruction encourages the model to learn character shapes and
linguistic information from the perspective of visual texture and feature
semantics. Experiments on various
text recognition benchmarks demonstrate the effectiveness and generality of
SSM, with 4.1% average performance gains and 86.6% new state-of-the-art average
word accuracy on Union14M benchmarks. The code is available at
https://github.com/FaltingsA/SSM.
Comment: Accepted to IJCAI202
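A minimal sketch of the symmetric superimposition step is given below; treating the "inverted view" as a 180-degree rotation and averaging the two images are assumptions made only for illustration.

```python
# Sketch of the symmetric superimposition input and dual pixel-level targets;
# the 180-degree rotation and simple averaging are assumptions for illustration.
import torch

def superimpose(img: torch.Tensor):
    """img: (B, C, H, W) text images in [0, 1]."""
    inverted = torch.flip(img, dims=(-2, -1))  # inverted (rotated) view
    mixed = 0.5 * (img + inverted)             # symmetrically superimposed input
    return mixed, (img, inverted)              # input and the two reconstruction targets
```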
On-chip Q-factor greater than 1 billion
A record Q-factor of 1.1 billion is demonstrated in on-chip silica whispering-gallery resonators. Using these resonators, a sub-milliwatt parametric oscillation threshold is measured in devices with a 9 GHz free spectral range.
Impact of spatio-temporal thermal decoherence on soliton microcombs in multimode microresonators
The phase noise of the soliton repetition rate is experimentally characterized in silica microresonators. In conjunction with dispersive-wave quieting of pump technical noise, spatio-temporal fluctuations of distinct transverse modes are shown to set a limit on performance.
An all-photonic, dynamic device for flattening the spectrum of a laser frequency comb for precise calibration of radial velocity measurements
Laser frequency combs are fast becoming critical to reaching the highest
radial velocity precisions. One shortcoming is the highly variable brightness
of the comb lines across the spectrum (up to 4-5 orders of magnitude). This can
result in some lines saturating while others are at low signal and lost in the
noise. Losing lines to either of these effects reduces the precision and hence
effectiveness of the comb. In addition, the brightness of the comb lines can
vary with time, which could drive comb lines with initially reasonable SNRs
into the two regimes described above. To mitigate these two effects, laser
frequency combs use optical flatteners.
Flatteners are typically bulk-optic setups that disperse the comb light with
a grating, and then use a spatial light modulator to control the amplitude
across the spectrum before recombining the light into another single mode fiber
and sending it to the spectrograph. These setups can be large (small bench
top), expensive (several hundred thousand dollars) and have limited stability.
To address these issues, we have developed an all-photonic spectrum flattener
on a chip. The device is constructed from optical waveguides on a SiN chip. The
light from the laser frequency comb's output optical fiber can be directly
connected to the chip, where the light is first dispersed using an arrayed
waveguide grating. To control the brightness of each channel, the light is
passed through a Mach-Zehnder interferometer before being recombined with a
second arrayed waveguide grating. Thermo-optic phase modulators are used in
each channel before recombination to path length match the channels as needed.
Here we present the results from our first-generation prototype. The device
operates from 1400 to 1800 nm (covering the H band), with twenty 20 nm wide channels.
Comment: 7 pages, 5 figures, conference
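To see how a single Mach-Zehnder interferometer provides the per-channel amplitude control described above, the toy model below uses the ideal cos^2 transfer function of a balanced MZI with 50/50 couplers; loss, coupler imbalance, and the thermo-optic tuning details are ignored.

```python
# Ideal transfer function of a balanced Mach-Zehnder intensity modulator;
# real devices add loss and coupler imbalance, ignored here.
import numpy as np

def mzi_transmission(delta_phi: np.ndarray) -> np.ndarray:
    """Power transmission vs. thermo-optically set arm phase difference."""
    return np.cos(delta_phi / 2.0) ** 2

# Sweeping the phase from 0 to pi takes a channel from full transmission
# toward extinction, which is the per-channel amplitude control used here.
print(mzi_transmission(np.linspace(0.0, np.pi, 5)))
```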
Flattening laser frequency comb spectra with a high dynamic range, broadband spectral shaper on-a-chip
Spectral shaping is critical to many fields of science. In astronomy for
example, the detection of exoplanets via the Doppler effect hinges on the
ability to calibrate a high resolution spectrograph. Laser frequency combs can
be used for this, but the wildly varying intensity across the spectrum can make
it impossible to optimally utilize the entire comb, leading to a reduced
overall precision of calibration. To circumvent this, astronomical applications
of laser frequency combs rely on a bulk optic setup which can flatten the
output spectrum before sending it to the spectrograph. Such flatteners require
complex and expensive optical elements like spatial light modulators and have
non-negligible bench top footprints. Here we present an alternative in the form
of an all-photonic spectral shaper that can be used to flatten the spectrum of
a laser frequency comb. The device consists of a circuit etched into a silicon
nitride wafer that supports an arrayed-waveguide grating to disperse the light
over hundreds of nanometers in wavelength, followed by Mach-Zehnder
interferometers to control the amplitude of each channel, thermo-optic phase
modulators to phase the channels and a second arrayed-waveguide grating to
recombine the spectrum. The demonstrator device operates from 1400 to 1800 nm
(covering the astronomical H band), with twenty 20 nm wide channels. The device
allows for nearly 40 dB of dynamic modulation of the spectrum via the
Mach-Zehnders, which is greater than that offered by most spatial light
modulators. With a superluminescent diode, we reduced the static spectral
variation to ~3 dB, limited by the properties of the components used in the
circuit; on a laser frequency comb we reduced the modulation to 5 dB,
sufficient for astronomical applications.
Comment: 15 pages, 10 figures. arXiv admin note: substantial text overlap with
arXiv:2209.0945
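The flattening logic itself reduces to attenuating every channel toward the weakest one within the available dynamic range; the sketch below illustrates that calculation with made-up channel powers and a hypothetical 40 dB modulation limit.

```python
# Equalize a set of channel powers toward the weakest channel, limited by a
# hypothetical 40 dB modulation range; the channel powers below are made up.
import numpy as np

def target_attenuation_db(channel_power_db: np.ndarray,
                          max_range_db: float = 40.0) -> np.ndarray:
    """Per-channel attenuation (dB) that flattens the spectrum."""
    atten = channel_power_db - channel_power_db.min()
    return np.clip(atten, 0.0, max_range_db)

# Twenty channels with a large brightness spread, as for a frequency comb.
powers = np.random.default_rng(0).uniform(-35.0, 0.0, size=20)
print(target_attenuation_db(powers))
```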
Hertz-linewidth semiconductor lasers using CMOS-ready ultra-high-Q microresonators
Driven by narrow-linewidth bench-top lasers, coherent optical systems
spanning optical communications, metrology and sensing provide unrivalled
performance. To transfer these capabilities from the laboratory to the real
world, a key missing ingredient is a mass-produced integrated laser with
superior coherence. Here, we bridge conventional semiconductor lasers and
coherent optical systems using CMOS-foundry-fabricated microresonators with a
record-high Q factor over 260 million and finesse over 42,000. Five
orders-of-magnitude noise reduction in the pump laser is demonstrated, and for
the first time, fundamental noise below 1 Hz^2/Hz is achieved in an
electrically-pumped integrated laser. Moreover, the same configuration is shown
to relieve dispersion requirements for microcomb generation that have
handicapped certain nonlinear platforms. The simultaneous realization of a
record-high Q factor, highly coherent lasers and frequency combs using
foundry-based technologies paves the way for volume manufacturing of a wide
range of coherent optical systems.
Comment: 19 pages, 11 figures
