358 research outputs found

    Weighted Cache Location Problem with Identical Servers

    Get PDF
This paper extends the well-known p-CLP with one server to p-CLP with m ≥ 2 identical servers, denoted by (p,m)-CLP. We propose the closest server orienting protocol (CSOP), under which every client connects to the server closest to it via a shortest route on the given network. We abbreviate (p,m)-CLP under CSOP to (p,m)-CSOP CLP and show that (p,m)-CSOP CLP on a general network is equivalent to the problem on a forest, and further to multiple CLPs on trees. The case of m = 2 is the focus of this paper. We first devise an improved O(ph^2 + n)-time parallel exact algorithm for p-CLP on a tree and then present a parallel exact algorithm with at most O((4/9)p^2 n^2) time in the worst case for (p,2)-CSOP CLP on a general network. Furthermore, we extend the idea of the parallel algorithm to the cases of m > 2 to obtain a worst-case O((4/9)(n−m)^2 (m+p)^p/(p−1)!)-time exact algorithm. At the end of the paper, we first give an example to illustrate our algorithms and then carry out a series of numerical experiments to compare their running times.
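The decomposition above hinges on the CSOP assignment rule: every client is routed to the server at minimum shortest-path distance, which splits the network into per-server subproblems. A minimal sketch of that rule, assuming the network is given as a plain adjacency dictionary (all node names and toy edge weights below are hypothetical, and tie-breaking in the paper may differ):

```python
import heapq
from collections import defaultdict

def dijkstra(adj, source):
    """Shortest-path distances from `source` in a weighted graph given as
    {node: [(neighbor, weight), ...]}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def csop_assignment(adj, servers, clients):
    """Closest Server Orienting Protocol: each client joins the server with the
    smallest shortest-path distance, splitting (p, m)-CLP into per-server CLPs."""
    dist_from = {s: dijkstra(adj, s) for s in servers}
    assignment = defaultdict(list)
    for c in clients:
        best = min(servers, key=lambda s: dist_from[s].get(c, float("inf")))
        assignment[best].append(c)
    return dict(assignment)

# Toy network with 2 servers and 4 clients (hypothetical weights).
adj = {
    "s1": [("a", 1.0), ("b", 4.0)],
    "s2": [("c", 1.0), ("b", 2.0)],
    "a": [("s1", 1.0), ("d", 2.0)],
    "b": [("s1", 4.0), ("s2", 2.0)],
    "c": [("s2", 1.0)],
    "d": [("a", 2.0)],
}
print(csop_assignment(adj, ["s1", "s2"], ["a", "b", "c", "d"]))
# -> {'s1': ['a', 'd'], 's2': ['b', 'c']}
```

Once clients are partitioned this way, the single-server p-CLP algorithm can run independently on each server's piece of the network.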

    MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

    Full text link
Multimodal semantic understanding often has to deal with uncertainty, meaning that the obtained messages tend to refer to multiple targets. Such uncertainty, both inter- and intra-modal, is problematic for interpretation. Little effort has been devoted to modeling this uncertainty, particularly during pre-training on unlabeled datasets and fine-tuning on task-specific downstream datasets. In this paper, we project the representations of all modalities as probabilistic distributions via a Probability Distribution Encoder (PDE) that exploits sequence-level interactions. Compared to existing deterministic methods, such uncertainty modeling can convey richer multimodal semantic information and capture more complex relationships. Furthermore, we integrate uncertainty modeling with popular pre-training frameworks and propose suitable pre-training tasks: Distribution-based Vision-Language Contrastive learning (D-VLC), Distribution-based Masked Language Modeling (D-MLM), and Distribution-based Image-Text Matching (D-ITM). The fine-tuned models are applied to challenging downstream tasks, including image-text retrieval, visual question answering, visual reasoning, and visual entailment, and achieve state-of-the-art results. Comment: CVPR 2023 accepted.
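As an illustration of the idea, a distribution encoder can output a mean and a log-variance per sample, and a distribution-based contrastive objective can then compare Gaussians rather than point embeddings. The sketch below is a hypothetical reading of the approach: the mean pooling, the 4-head attention, and the squared 2-Wasserstein distance as the similarity measure are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProbabilityDistributionEncoder(nn.Module):
    """Hypothetical sketch: map a sequence of token features to a diagonal
    Gaussian (mu, log_var) over the joint embedding space."""
    def __init__(self, dim):                         # dim must be divisible by num_heads
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.to_mu = nn.Linear(dim, dim)
        self.to_logvar = nn.Linear(dim, dim)

    def forward(self, tokens):                       # tokens: (B, L, D)
        ctx, _ = self.attn(tokens, tokens, tokens)   # sequence-level interaction
        pooled = ctx.mean(dim=1)                     # (B, D)
        return self.to_mu(pooled), self.to_logvar(pooled)

def wasserstein2(mu1, logvar1, mu2, logvar2):
    """Squared 2-Wasserstein distance between diagonal Gaussians."""
    std1, std2 = (0.5 * logvar1).exp(), (0.5 * logvar2).exp()
    return ((mu1 - mu2) ** 2).sum(-1) + ((std1 - std2) ** 2).sum(-1)

def distribution_contrastive_loss(img_mu, img_lv, txt_mu, txt_lv, tau=0.07):
    """Matched image-text pairs should be closer (in W2) than mismatched
    pairs within the batch; an illustrative stand-in for D-VLC."""
    B = img_mu.size(0)
    dists = torch.stack([wasserstein2(img_mu[i:i + 1].expand(B, -1),
                                      img_lv[i:i + 1].expand(B, -1),
                                      txt_mu, txt_lv) for i in range(B)])  # (B, B)
    logits = -dists / tau
    labels = torch.arange(B, device=img_mu.device)
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```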

    Boosting Multi-Modal E-commerce Attribute Value Extraction via Unified Learning Scheme and Dynamic Range Minimization

    Full text link
With the prosperity of the e-commerce industry, various modalities, e.g., vision and language, are utilized to describe product items. Understanding such diversified data is an enormous challenge, especially when extracting attribute-value pairs from text sequences with the aid of informative image regions. Although a series of previous works have been dedicated to this task, several seldom-investigated obstacles still hinder further improvement: 1) Parameters from up-stream single-modal pre-training are inadequately applied, without proper joint fine-tuning on the down-stream multi-modal task. 2) To select descriptive parts of images, a simple late fusion is widely applied, ignoring the prior knowledge that language-related information should be encoded into a common linguistic embedding space by stronger encoders. 3) Due to the diversity across products, their attribute sets tend to vary greatly, yet current approaches predict over an unnecessarily maximal range, leading to more potential false positives. To address these issues, we propose a novel approach to boost multi-modal e-commerce attribute value extraction via a unified learning scheme and dynamic range minimization: 1) First, a unified scheme is designed to jointly train the multi-modal task with pretrained single-modal parameters. 2) Second, a text-guided information range minimization method is proposed to adaptively encode descriptive parts of each modality into an identical space with a powerful pretrained linguistic model. 3) Moreover, a prototype-guided attribute range minimization method is proposed to first determine the proper attribute set of the current product and then select prototypes to guide the prediction of the chosen attributes. Experiments on popular multi-modal e-commerce benchmarks show that our approach achieves superior performance over other state-of-the-art techniques.
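To make the "dynamic range minimization" idea concrete, the sketch below shows one plausible shape for the prototype-guided step: score a fused product representation against learnable attribute prototypes, keep only the top-scoring attributes, and decode values for that restricted set. The module name, the top-k selection, and the attention-style decoding are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class PrototypeGuidedRangeMinimizer(nn.Module):
    """Hypothetical sketch: restrict prediction to a product-specific attribute
    subset chosen by similarity to learnable attribute prototypes."""
    def __init__(self, hidden_dim, num_attributes, num_values, top_k=8):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_attributes, hidden_dim))
        self.value_head = nn.Linear(hidden_dim, num_values)
        self.top_k = top_k

    def forward(self, product_repr, token_reprs):
        # product_repr: (B, D) fused multimodal product representation
        # token_reprs:  (B, L, D) text token representations
        scores = product_repr @ self.prototypes.t()            # (B, A) attribute relevance
        selected = scores.topk(self.top_k, dim=-1).indices     # (B, K) restricted attribute range
        protos = self.prototypes[selected]                     # (B, K, D) chosen prototypes
        # Attend over text tokens with each chosen prototype, then predict values
        # only for the selected attributes (illustrative decoding step).
        attn = torch.softmax(protos @ token_reprs.transpose(1, 2), dim=-1)  # (B, K, L)
        attended = attn @ token_reprs                          # (B, K, D)
        value_logits = self.value_head(attended)               # (B, K, num_values)
        return selected, value_logits
```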

    Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning

    Full text link
Cross-modal alignment is essential for vision-language pre-training (VLP) models to learn the correct correspondences across modalities. For this purpose, inspired by the success of masked language modeling (MLM) in NLP pre-training, numerous masked modeling tasks have been proposed for VLP to further promote cross-modal interactions. The core idea of previous masked modeling tasks is to reconstruct the masked tokens from the visible context, learning local-to-local alignment. However, most of them pay little attention to the global semantic features generated for the masked data, resulting in limited cross-modal alignment of global representations. Therefore, in this paper, we propose a novel Semantic Completion Learning (SCL) task, complementary to existing masked modeling tasks, to facilitate global-to-local alignment. Specifically, the SCL task completes the missing semantics of masked data by capturing the corresponding information from the other modality, promoting the learning of more representative global features, which have a great impact on the performance of downstream tasks. Moreover, we present a flexible vision encoder, which enables our model to perform image-text and video-text multimodal tasks simultaneously. Experimental results show that our proposed method obtains state-of-the-art performance on various vision-language benchmarks, such as visual question answering, image-text retrieval, and video-text retrieval.
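A minimal sketch of what a semantic-completion objective could look like: the global feature computed from a masked input (with the other modality visible) is trained to recover the global feature of the complete input. The cosine-based loss and the stop-gradient target below are illustrative assumptions; the paper's exact formulation may differ.

```python
import torch.nn.functional as F

def semantic_completion_loss(global_masked, global_complete):
    """Hypothetical SCL-style objective: the global representation produced from
    the *masked* input (with cross-modal context available) should recover the
    global representation of the *complete* input."""
    g_hat = F.normalize(global_masked, dim=-1)
    g_tgt = F.normalize(global_complete.detach(), dim=-1)  # stop-gradient target
    return (1.0 - (g_hat * g_tgt).sum(dim=-1)).mean()      # cosine completion loss

# Usage sketch (the encoder call and its outputs are placeholders):
#   g_masked   = model.encode(image_masked, text).global_feature
#   g_complete = model.encode(image, text).global_feature
#   loss_scl   = semantic_completion_loss(g_masked, g_complete)
```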

    Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

    Full text link
Despite recent advances in image-to-video generation, better controllability and local animation remain underexplored. Most existing image-to-video methods are not locally aware and tend to move the entire scene. However, human artists may need to control the movement of individual objects or regions. Additionally, current I2V methods require users not only to describe the target motion but also to provide redundant, detailed descriptions of the frame contents. These two issues hinder the practical use of current I2V tools. In this paper, we propose a practical framework, named Follow-Your-Click, that achieves image animation from a simple user click (specifying what to move) and a short motion prompt (specifying how to move). Technically, we propose a first-frame masking strategy, which significantly improves video generation quality, and a motion-augmented module trained with a short-motion-prompt dataset to improve the model's ability to follow short prompts. To further control the motion speed, we propose flow-based motion magnitude control to regulate the speed of the target movement more precisely. Our framework offers simpler yet more precise user control and better generation performance than previous methods. Extensive comparisons with 7 baselines, including both commercial tools and research methods, on 8 metrics suggest the superiority of our approach. Project Page: https://follow-your-click.github.io/ GitHub Page: https://github.com/mayuelala/FollowYourClic
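As a rough illustration of the interface only, a user click can be turned into a region mask and stacked with the first frame as conditioning so that just the selected region is animated according to the short prompt. Everything below (the circular click mask, the extra conditioning channel) is a hypothetical stand-in, not the paper's first-frame masking strategy or its segmentation pipeline.

```python
import torch

def click_to_region_mask(click_xy, height, width, radius=40):
    """Hypothetical stand-in for region selection: a hard circular mask around
    the user's click (a real system may use a segmentation model instead)."""
    ys = torch.arange(height).view(-1, 1).float()
    xs = torch.arange(width).view(1, -1).float()
    cx, cy = click_xy
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return (dist2 <= radius ** 2).float()                    # (H, W) binary mask

def build_conditioning(first_frame, region_mask):
    """Stack the appearance condition (first frame) with the click-derived mask
    as an extra channel, so the generator knows which region may move."""
    # first_frame: (C, H, W), region_mask: (H, W)
    return torch.cat([first_frame, region_mask.unsqueeze(0)], dim=0)  # (C+1, H, W)
```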

    Induction of autophagy by cystatin C: a potential mechanism for prevention of cerebral vasospasm after experimental subarachnoid hemorrhage

    Get PDF
Background: Studies have demonstrated that autophagy pathways are activated in the brain after experimental subarachnoid hemorrhage (SAH) and that this may play a protective role in early brain injury. However, the contribution of autophagy to the pathogenesis of cerebral vasospasm (CVS) following SAH, and whether up-regulated autophagy aggravates or relieves CVS, remain unknown. Cystatin C (CysC) is a cysteine protease inhibitor that induces autophagy under conditions of neuronal challenge. This study investigated the expression of autophagy proteins in the walls of basilar arteries (BA), and the effects of CysC on CVS and autophagy pathways following experimental SAH in rats. Methods: All SAH animals were subjected to injection of 0.3 mL of fresh arterial, non-heparinized blood into the cisterna magna. Fifty rats were assigned randomly to five groups: control (n = 10), SAH (n = 10), SAH + vehicle (n = 10), SAH + low-dose CysC (n = 10), and SAH + high-dose CysC (n = 10). Proteins were measured by western blot analysis, CVS by H&E staining, and morphological changes by electron microscopy, and neuro-behavior scores were recorded. Results: Microtubule-associated protein light chain-3, an autophagosome biomarker, and beclin-1, a Bcl-2-interacting protein required for autophagy, were significantly increased in the BA wall 48 h after SAH. In the CysC-treated groups, the degree of CVS, measured as the inner BA perimeter and BA wall thickness, was significantly ameliorated in comparison with vehicle-treated SAH rats. This effect paralleled the intensity of autophagy in the BA wall induced by CysC. Conclusions: These results suggest that the autophagy pathway is activated in the BA wall after SAH and that CysC-induced autophagy may play a beneficial role in preventing SAH-induced CVS.

    A Broad Range Triboelectric Stiffness Sensor for Variable Inclusions Recognition.

    Get PDF
With the development of artificial intelligence, stiffness sensors are extensively utilized in various fields, and their integration with robots for automated palpation has gained significant attention. This study presents a broad-range self-powered stiffness sensor based on the triboelectric nanogenerator (Stiff-TENG) for detecting variable inclusions in soft objects. The Stiff-TENG employs a stacked structure comprising an indium tin oxide film, an elastic sponge, a fluorinated ethylene propylene film with a conductive-ink electrode, and two acrylic pieces with a shielding layer. Through a decoupling method, the Stiff-TENG achieves stiffness detection of objects within 1.0 s. The output performance and characteristics of the TENG for objects of different stiffness under 4 mm displacement are analyzed. The Stiff-TENG is successfully used to detect heterogeneous stiffness structures, enabling effective recognition of variable inclusions in soft objects and reaching a recognition accuracy of 99.7%. Furthermore, its adaptability makes it well suited for detecting pathological conditions within the human body, as pathological tissues often exhibit changes in the stiffness of internal organs. This research highlights an innovative application of TENGs and showcases their immense potential in healthcare applications such as palpation, which assesses pathological conditions based on organ stiffness.
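For a sense of how inclusion recognition from such a sensor might be set up in software, the sketch below extracts two simple features from a Stiff-TENG output waveform and feeds them to an off-the-shelf classifier. The feature choice and the random-forest model are assumptions for illustration; the paper's decoupling method and its 99.7% accuracy rely on its own processing pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def waveform_features(voltage_trace, dt=1e-3):
    """Two simple descriptors of one press: peak |V| and the area under |V(t)|.
    (Hypothetical features; the paper's decoupling method may use others.)"""
    v = np.abs(np.asarray(voltage_trace, dtype=float))
    return [v.max(), v.sum() * dt]

# Usage sketch (data loading is omitted):
#   X_feats = np.array([waveform_features(trace) for trace in recorded_traces])
#   y       = inclusion_labels            # e.g., "soft" / "medium" / "hard"
#   clf     = RandomForestClassifier(n_estimators=100).fit(X_feats, y)
#   clf.predict(np.array([waveform_features(new_trace)]))
```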