74 research outputs found

    Deep Exponential Families

    Full text link
    We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. DEFs capture a hierarchy of dependencies between latent variables, and are easily generalized to many settings through exponential families. We perform inference using recent "black box" variational inference techniques. We then evaluate various DEFs on text and combine multiple DEFs into a model for pairwise recommendation data. In an extensive study, we show that going beyond one layer improves predictions for DEFs. We demonstrate that DEFs find interesting exploratory structure in large data sets, and give better predictive performance than state-of-the-art models.
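
    For orientation, a minimal sketch of the layered generative process a DEF describes, written in generic notation (the specific exponential families, link functions g_l, and weight priors vary by model and are assumptions here, not taken from the abstract):

    \[
    \begin{aligned}
      z_{L,k} &\sim \textrm{ExpFam}_L(\eta), \\
      z_{\ell,k} &\sim \textrm{ExpFam}_\ell\!\big(g_\ell(w_{\ell,k}^{\top} z_{\ell+1})\big), \qquad \ell = L-1,\dots,1, \\
      x_{i} &\sim p\big(x_{i} \mid z_{1}, W_{0}\big),
    \end{aligned}
    \]

    Each latent variable is drawn from an exponential family whose natural parameter is a link function of an inner product with the layer above, and the observations depend on the bottom layer; black-box variational inference needs only evaluations of this log-joint and samples from the variational approximation, which is what makes the family easy to generalize.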

    Memory^3: Language Modeling with Explicit Memory

    Full text link
    The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size, training cost, and inference cost, all proportional to the amount of remaining "abstract knowledge". As a preliminary proof of concept, we train from scratch a 2.4B LLM, which achieves better performance than much larger LLMs as well as RAG models, and maintains higher decoding speed than RAG. The model is named Memory^3, since explicit memory is the third form of memory in LLMs after implicit memory (model parameters) and working memory (context key-values). We introduce a memory circuitry theory to support the externalization of knowledge, and present novel techniques including a memory sparsification mechanism that makes storage tractable and a two-stage pretraining scheme that facilitates memory formation.
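
    As a rough illustration of the three-tier idea (implicit memory in parameters, explicit memory, working memory), the toy sketch below pre-encodes reference chunks into sparsified key-value memories and prepends a few retrieved ones to the decoding cache. Every function name, shape, and threshold is a hypothetical stand-in, not the paper's implementation:

        import numpy as np

        # Toy sketch: "explicit memory" = precomputed, sparsified key-value pairs
        # for reference chunks, stored offline; "working memory" = the usual KV
        # cache built from the prompt. All names/shapes are hypothetical.

        def encode_chunk_to_memory(chunk_hidden, keep_ratio=0.2):
            """Keep only the highest-norm positions of a chunk's hidden states,
            a stand-in for the paper's memory sparsification mechanism."""
            norms = np.linalg.norm(chunk_hidden, axis=-1)
            k = max(1, int(keep_ratio * len(norms)))
            keep = np.argsort(norms)[-k:]
            return chunk_hidden[keep], chunk_hidden[keep]   # (keys, values)

        def retrieve_memories(query_vec, memory_bank, top_n=2):
            """Pick the top_n stored memories whose mean key best matches the query."""
            scores = [float(query_vec @ keys.mean(axis=0)) for keys, _ in memory_bank]
            order = np.argsort(scores)[::-1][:top_n]
            return [memory_bank[i] for i in order]

        def build_decoding_cache(retrieved, working_keys, working_values):
            """Prepend retrieved explicit memories to the working-memory KV cache."""
            keys = np.concatenate([k for k, _ in retrieved] + [working_keys], axis=0)
            values = np.concatenate([v for _, v in retrieved] + [working_values], axis=0)
            return keys, values

    The point of the sketch is only the cost structure: the memories are cheap to store because they are sparsified, and cheap to use because they are precomputed rather than re-encoded at inference time as retrieved plain text would be.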

    Efficient Processing and Delivery of Multimedia Data

    No full text
    The explosion of multimedia data on the Internet in recent years has greatly enriched people's online experience. However, it also poses a great challenge: analyzing and processing such content, and then delivering it to a worldwide audience. This dissertation presents novel approaches to improve the overall efficiency of this stack by tailoring software design to hardware properties, as well as to optimize systems by exploiting workload characteristics using learning-based approaches. First, to improve the caching performance of flash-memory caches for content delivery networks, this thesis proposes RIPQ, a framework for efficient and advanced caching with flash memory. Traditional implementations of caching algorithms generate random writes that perform poorly on flash devices, decreasing the device's performance and lifespan. RIPQ overcomes this issue by aggregating small writes, colocating items with similar priorities, and performing lazy updates to achieve low overhead. By providing a priority queue interface, it allows a variety of caching algorithms to be easily implemented. Second, this thesis proposes Chess, which uses popularity prediction for higher quality video streaming. Although better encodings improve video streaming, they are also compute-intensive, and it is infeasible to encode all videos uploaded to Facebook with the highest quality codec. However, because accesses to videos are highly skewed, we may obtain most of the benefit by running the compute-intensive encoding only on a small portion of popular videos; the challenge lies in how to accurately and scalably run popularity prediction to detect those videos beforehand. Chess meets this demand by designing an approximate but fast base predictor over access-history information, and by using an online learning method to combine multiple such predictors as well as social signals to boost accuracy. Lastly, this thesis investigates how to accelerate deep learning models on many-core CPUs. Deep learning is now widely used for analyzing multimedia data, but it is compute-intensive, which constitutes its major bottleneck. The many-core CPU, combining both high FLOPS and a flexible computing model, is a promising solution to this problem. However, existing frameworks are still mainly optimized for GPUs, and do not run efficiently on this architecture. To overcome this issue, this thesis proposes Graphi, the first attempt to accelerate the execution of computation graphs for deep learning models on this architecture. Graphi determines optimal parallel settings with a profiling step, runs concurrent operations with low contention, and further reduces execution makespan with critical-path-first scheduling. This thesis demonstrates that these techniques can achieve significant speedups over TensorFlow on many-core CPUs.
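
    To make the priority-queue caching interface concrete, here is a minimal sketch of a RIPQ-style flash cache under assumed names (not the actual RIPQ code): callers insert items with a priority in [0, 1], small writes are buffered in RAM per priority section and flushed to flash as large sequential blocks, and priority increases are recorded lazily:

        from collections import deque

        FLUSH_BLOCK_BYTES = 256 * 1024   # assumed flush granularity

        class PriorityQueueCache:
            def __init__(self, num_sections=8):
                # One RAM insertion buffer per priority section; items with
                # similar priorities end up co-located in the same flash block.
                self.sections = [deque() for _ in range(num_sections)]
                self.buffered_bytes = [0] * num_sections
                self.pending_priority = {}   # lazily recorded priority updates
                self.flash_blocks = []       # (section, items) written to "flash"

            def _section(self, priority):
                return min(len(self.sections) - 1, int(priority * len(self.sections)))

            def insert(self, key, value, priority):
                s = self._section(priority)
                self.sections[s].append((key, value))
                self.buffered_bytes[s] += len(value)
                if self.buffered_bytes[s] >= FLUSH_BLOCK_BYTES:
                    self._flush(s)

            def increase(self, key, priority):
                # Lazy update: only remember the new priority; the item moves
                # when the flash block holding it is eventually rewritten.
                self.pending_priority[key] = priority

            def _flush(self, s):
                block = list(self.sections[s])
                self.sections[s].clear()
                self.buffered_bytes[s] = 0
                self.flash_blocks.append((s, block))   # one large sequential write

    A policy such as segmented LRU can then be expressed purely through insert() and increase() calls, with all write aggregation handled behind the interface.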

    Modulation instability and rogue wave spectrum for the generalized nonlinear Schrödinger equation

    Full text link
    We discuss modulation instability for the generalized nonlinear Schrödinger equation on a background wave with nonzero frequency. First, we analyze the existence condition of modulation instability under different perturbed frequencies. The influences of the background amplitude, background frequency and perturbed frequency on the modulation instability gain are investigated, respectively. We also obtain the correspondences between several nonlinear excitations (Kuznetsov-Ma breather, general breather, rogue wave, bright soliton and plane wave) and modulation instability according to the new parameters. Furthermore, using the Fourier transform method, we perform spectrum analyses of the first-order and second-order rogue waves. The perturbed frequency of the rogue wave can affect the location and profile of the spectrum, and we find that the spectrum of the second-order rogue wave is jagged due to the collision of rogue waves. These results would help us further understand the dynamics of rogue waves in complex systems.
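
    As background, the textbook modulation-instability calculation for the standard focusing NLS equation (a simpler special case than the generalized equation treated here; the notation below is ours, not the paper's):

    \[
      i\psi_t + \tfrac{1}{2}\psi_{xx} + |\psi|^2\psi = 0, \qquad \psi_0 = a\,e^{i a^2 t}.
    \]

    Linearizing about the plane wave \(\psi_0\) with a perturbation proportional to \(e^{i(Kx-\Omega t)}\) gives the dispersion relation
    \[
      \Omega^2 = \frac{K^2}{4}\left(K^2 - 4a^2\right),
    \]
    so the background is unstable for \(|K| < 2a\), with gain \(g(K) = \tfrac{|K|}{2}\sqrt{4a^2 - K^2}\), maximal at \(K = \sqrt{2}\,a\). The generalized equation and nonzero background frequency studied in this work modify this gain profile, which is what the parameter study in the abstract maps out.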

    Facile synthesis of silver submicrospheres and their applications

    Full text link
    Uniform silver submicrospheres were synthesized under ambient conditions through reduction of silver nitrate, using ascorbic acid as a reducing agent and Tween 20 as a stabilizer. The silver submicroparticles exhibited strong catalytic activity for the reduction of 4-nitrophenol by sodium borohydride (NaBH4). Significantly, aggregates of a few silver submicroparticles can be used as a surface-enhanced Raman scattering (SERS) substrate to markedly improve the Raman signal of crystal violet. The morphologies of the silver submicroparticles can be controlled by changing the reaction conditions. The formation process of the silver submicroparticles was monitored by time-resolved extinction spectroscopy. The influences of the concentrations and molar ratios of the reaction reagents on the formation of the silver submicroparticles are discussed.

    Next generation sequencing identified two novel mutations in NIPBL and a frame shift mutation in CREBBP in three Chinese children

    No full text
    Background: Cornelia de Lange syndrome (CdLS) and Rubinstein-Taybi syndrome (RSTS) are both rare congenital multiple malformation disorders caused by genes associated with transcription. They share a number of similar clinical features. In addition, it is difficult to make a rapid molecular diagnosis and to detect mosaic mutations when only Sanger sequencing is used. This study reports three novel mutations identified by next-generation sequencing in three Chinese children. Results: Patients 1 and 2 present with characteristics of CdLS and carry mutations in NIPBL; patient 3 carries a frameshift mutation in CREBBP, was diagnosed clinically as RSTS, and also shows some symptoms overlapping with CdLS. The splice-site mutation c.4321-1G > A in NIPBL is mosaic and produces an abnormal transcript lacking exon 20. The nonsense mutation c.218C > A in NIPBL and the frameshift mutation c.1715delC in CREBBP generate stop codons and yield prematurely terminated proteins. Conclusions: In total, we detected three novel heterozygous mutations: a splicing mutation and a nonsense mutation in NIPBL and a frameshift mutation in CREBBP. The overlapping clinical features observed in these patients illustrate the clinical complexity and overlap of CdLS and RSTS, which have been termed “transcriptomopathies”, point to a shared underlying molecular mechanism, and emphasize the value of next-generation sequencing technologies.

    Towards a Robust Framework of Network Coordinate Systems

    No full text
    A Network Coordinate System (NCS) is an efficient and scalable mechanism to predict the latency between any two network hosts based on historical measurements. Most NCS models, whether based on metric-space embedding, like Vivaldi, or on matrix factorization, like DMF and Phoenix, use a squared-error measure in training, which suffers from erroneous records, i.e., records with large noise. To overcome this drawback, we introduce a robust error measure, the Huber norm, to network latency prediction. The Huber norm is resilient to large data noise while remaining efficient to optimize. Based on it, we upgrade the traditional NCS models into more robust versions, namely the Robust Vivaldi model and the Robust Matrix Factorization model. We conduct extensive experiments to compare the proposed models with traditional ones, and the results show that our approaches significantly increase the accuracy of network latency prediction.
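
    For reference, the Huber norm behind these robust variants is the standard piecewise loss (the threshold \(\delta\) and the factorization notation below are generic, not taken from the paper); in a matrix-factorization NCS it replaces the squared residual on each measured latency:

    \[
      \rho_\delta(r) =
      \begin{cases}
        \tfrac{1}{2}\,r^2, & |r| \le \delta, \\
        \delta\big(|r| - \tfrac{\delta}{2}\big), & |r| > \delta,
      \end{cases}
      \qquad
      \min_{U,V} \sum_{(i,j)\,\text{measured}} \rho_\delta\!\big(d_{ij} - u_i^{\top} v_j\big),
    \]

    where \(d_{ij}\) is the measured latency between hosts i and j and \(u_i^{\top} v_j\) its prediction. The linear tail caps the influence of grossly erroneous records, while the quadratic region keeps the objective smooth and easy to optimize near zero, which is the robustness-versus-efficiency trade-off the abstract refers to.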