527 research outputs found

    Market pressure or regulatory pressure? U.S. small bank pre-emptive IT investment to data privacy regulation

    Get PDF
    We assess small banks’ responses to announcements of state-level proposals of Privacy Protection Acts (PPAs). Employing a Difference-in-Differences framework, we uncover the proactive actions taken by U.S. small banks in anticipation of these proposals. Our findings reveal that the announcement of PPA proposals leads to a 35.46% increase in IT investment by U.S. small banks, primarily driven by market pressure, with regulatory pressure playing a more limited role. Particularly, evidence suggests that banks with greater competitive threats from their rivals are motivated to enhance their IT investments due to market pressures. However, our research also finds that this surge in IT investment does not immediately translate into benefits for small banks

    Masked and Permuted Implicit Context Learning for Scene Text Recognition

    Full text link
    Scene Text Recognition (STR) is difficult because of the variations in text styles, shapes, and backgrounds. Though the integration of linguistic information enhances models' performance, existing methods based on either permuted language modeling (PLM) or masked language modeling (MLM) have their pitfalls. PLM's autoregressive decoding lacks foresight into subsequent characters, while MLM overlooks inter-character dependencies. Addressing these problems, we propose a masked and permuted implicit context learning network for STR, which unifies PLM and MLM within a single decoder, inheriting the advantages of both approaches. We utilize the training procedure of PLM, and to integrate MLM, we incorporate word length information into the decoding process and replace the undetermined characters with mask tokens. Besides, perturbation training is employed to train a more robust model against potential length prediction errors. Our empirical evaluations demonstrate the performance of our model. It not only achieves superior performance on the common benchmarks but also achieves a substantial improvement of 9.1%9.1\% on the more challenging Union14M-Benchmark

    Model Will Tell: Training Membership Inference for Diffusion Models

    Full text link
    Diffusion models pose risks of privacy breaches and copyright disputes, primarily stemming from the potential utilization of unauthorized data during the training phase. The Training Membership Inference (TMI) task aims to determine whether a specific sample has been used in the training process of a target model, representing a critical tool for privacy violation verification. However, the increased stochasticity inherent in diffusion renders traditional shadow-model-based or metric-based methods ineffective when applied to diffusion models. Moreover, existing methods only yield binary classification labels which lack necessary comprehensibility in practical applications. In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model. Compared with unseen samples, training samples exhibit stronger generative priors within the diffusion model, enabling the successful reconstruction of substantially degraded training images. Consequently, we propose the Degrade Restore Compare (DRC) framework. In this framework, an image undergoes sequential degradation and restoration, and its membership is determined by comparing it with the restored counterpart. Experimental results verify that our approach not only significantly outperforms existing methods in terms of accuracy but also provides comprehensible decision criteria, offering evidence for potential privacy violations.Comment: 18 pages, 6 figures, 7 table

    OPT: One-shot Pose-Controllable Talking Head Generation

    Full text link
    One-shot talking head generation produces lip-sync talking heads based on arbitrary audio and one source face. To guarantee the naturalness and realness, recent methods propose to achieve free pose control instead of simply editing mouth areas. However, existing methods do not preserve accurate identity of source face when generating head motions. To solve the identity mismatch problem and achieve high-quality free pose control, we present One-shot Pose-controllable Talking head generation network (OPT). Specifically, the Audio Feature Disentanglement Module separates content features from audios, eliminating the influence of speaker-specific information contained in arbitrary driving audios. Later, the mouth expression feature is extracted from the content feature and source face, during which the landmark loss is designed to enhance the accuracy of facial structure and identity preserving quality. Finally, to achieve free pose control, controllable head pose features from reference videos are fed into the Video Generator along with the expression feature and source face to generate new talking heads. Extensive quantitative and qualitative experimental results verify that OPT generates high-quality pose-controllable talking heads with no identity mismatch problem, outperforming previous SOTA methods.Comment: Accepted by ICASSP202

    FONT: Flow-guided One-shot Talking Head Generation with Natural Head Motions

    Full text link
    One-shot talking head generation has received growing attention in recent years, with various creative and practical applications. An ideal natural and vivid generated talking head video should contain natural head pose changes. However, it is challenging to map head pose sequences from driving audio since there exists a natural gap between audio-visual modalities. In this work, we propose a Flow-guided One-shot model that achieves NaTural head motions(FONT) over generated talking heads. Specifically, the head pose prediction module is designed to generate head pose sequences from the source face and driving audio. We add the random sampling operation and the structural similarity constraint to model the diversity in the one-to-many mapping between audio-visual modality, thus predicting natural head poses. Then we develop a keypoint predictor that produces unsupervised keypoints from the source face, driving audio and pose sequences to describe the facial structure information. Finally, a flow-guided occlusion-aware generator is employed to produce photo-realistic talking head videos from the estimated keypoints and source face. Extensive experimental results prove that FONT generates talking heads with natural head poses and synchronized mouth shapes, outperforming other compared methods.Comment: Accepted by ICME202
    corecore