4,997 research outputs found

    Tuning between singlet, triplet, and mixed pairing states in an extended Hubbard chain

    We study spin-half fermions in a one-dimensional extended Hubbard chain at low filling. We identify three triplet and one singlet pairing channels in the system, which are independently tunable as a function of nearest-neighbor charge and spin interactions. In a large-size system with translational invariance, we derive gap equations for the corresponding pairing gaps and obtain a Bogoliubov-de Gennes Hamiltonian with its non-trivial topology determined by the interplay of these gaps. In an open-end system with a fixed number of particles, we compute the exact many-body ground state and identify the dominant pairing revealed by the pair density matrix. Both cases show competition between the four pairing states, resulting in broad regions for each of them and relatively narrow regions for mixed-pairing states in the parameter space. Our results enable the possibility of tuning a nanowire between singlet and triplet pairing states without breaking time-reversal or SU(2) symmetry, accompanied by a change in the system's topology. Comment: 17 pages, 6 figures

    Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation

    TensorFlow has been the most widely adopted Machine/Deep Learning framework. However, little exists in the literature that provides a thorough understanding of the capabilities which TensorFlow offers for the distributed training of large ML/DL models that need computation and communication at scale. Most commonly used distributed training approaches for TF can be categorized as follows: 1) Google Remote Procedure Call (gRPC), 2) gRPC+X: X=(InfiniBand Verbs, Message Passing Interface, and GPUDirect RDMA), and 3) No-gRPC: Baidu Allreduce with MPI, Horovod with MPI, and Horovod with NVIDIA NCCL. In this paper, we provide an in-depth performance characterization and analysis of these distributed training approaches on various GPU clusters including the Piz Daint system (#6 on Top500). We perform experiments to gain novel insights along the following vectors: 1) Application-level scalability of DNN training, 2) Effect of Batch Size on scaling efficiency, 3) Impact of the MPI library used for no-gRPC approaches, and 4) Type and size of DNN architectures. Based on these experiments, we present two key insights: 1) Overall, No-gRPC designs achieve better performance compared to gRPC-based approaches for most configurations, and 2) The performance of No-gRPC is heavily influenced by the gradient aggregation using Allreduce. Finally, we propose a truly CUDA-Aware MPI Allreduce design that exploits CUDA kernels and pointer caching to perform large reductions efficiently. Our proposed designs offer 5-17X better performance than NCCL2 for small and medium messages, and reduce latency by 29% for large messages. The proposed optimizations help Horovod-MPI to achieve approximately 90% scaling efficiency for ResNet-50 training on 64 GPUs. Further, Horovod-MPI achieves 1.8X and 3.2X higher throughput than the native gRPC method for ResNet-50 and MobileNet, respectively, on the Piz Daint cluster. Comment: 10 pages, 9 figures, submitted to IEEE IPDPS 2019 for peer-review

    The impact of foreign trading information on emerging futures markets: a study of Taiwan's unique data set

    Using a unique dataset from the Taiwan Futures Exchange, this paper investigates whether trading imbalances by foreign investors affect the emerging Taiwan futures market in terms of returns and volatility. First, the evidence demonstrates a positive relation between contemporaneous futures returns and net purchases by foreign investors when other market factor effects are controlled. Second, the failure to detect price reversals is inconsistent with the price pressure hypothesis. Third, foreign investors do not exhibit positive feedback trading patterns. Fourth, a bi-directional Granger-causality relationship exists between futures volatility and foreign trading flows. As found for other stock or foreign exchange markets, our empirical results demonstrate that foreign trading flows do have impacts on the return and volatility of a developing futures market, suggesting that trading by foreign investors may enhance the information flow of the local futures market. Foreign trading

    Engineering of many-body Majorana states in a topological insulator/s-wave superconductor heterostructure

    We study a vortex chain in a thin film of a topological insulator with proximity-induced superconductivity, a promising platform to realize Majorana zero modes (MZMs), by modeling it as a two-leg Majorana ladder. While each pair of MZMs hybridizes through vortex tunneling, we show that MZMs can be stabilized on the ends of the ladder in the presence of a tilted external magnetic field and a four-Majorana interaction. Furthermore, a rich phase diagram is obtained by controlling the direction of the magnetic field and the thickness of the sample. We reveal many-body Majorana states and interaction-induced topological phase transitions and also identify trivial-superconducting and commensurate/incommensurate charge-density-wave states in the phase diagram. Comment: 10 pages, 4 figures

    AN OBSTACLE DETECTION SYSTEM USING DEPTH INFORMATION AND REGION GROWING FOR VISUALLY IMPAIRED PEOPLE

    This study proposes an obstacle detection method based on depth information to help visually impaired people avoid obstacles as they move through an unfamiliar environment. First, we apply morphological dilation and erosion to remove fragmented noise from the depth image, and use the Least Squares Method (LSM) with a quadratic polynomial to approximate floor curves and determine the floor height threshold in the V-disparity. Second, we search for dramatic changes in depth value relative to the floor height threshold to find suspected stair edge points. Third, we use the Hough Transform to locate the drop line. To strengthen the characteristics of the different objects and overcome the drawbacks of the region growing method, we apply edge detection to remove edges. Fourth, we use the floor height threshold and features of the ground to remove the ground plane. The system then uses the region growing method to label the different objects and analyzes each object to determine whether it is a stair. Fifth, if the object is neither an ascending nor a descending stair, we use the K-SVD algorithm to determine whether it is a person. Finally, the system helps users determine the stair direction and obstacle distance through a voice prompt generated by Text To Speech (TTS). Experimental results show that the proposed system is robust and convenient. Sponsorship: National Taipei University. Conference type: international. Conference dates: 2015-07-18 to 2015-07-19. Book type: electronic edition. Call for papers: Y. Conference location: Tokyo, Japan

    Novel CMOS RFIC Layout Generation with Concurrent Device Placement and Fixed-Length Microstrip Routing

    With advancing process technologies and booming IoT markets, millimeter-wave CMOS RFICs have been widely developed in recent years. Since the performance of CMOS RFICs is very sensitive to the precision of the layout, precisely placing devices and precisely matching microstrip lengths to given values have been labor-intensive and time-consuming tasks, and have thus become a major bottleneck for time to market. This paper introduces a progressive integer-linear-programming-based method to generate high-quality RFIC layouts satisfying very stringent routing requirements of microstrip lines, including spacing/non-crossing rules, precise length, and bend number minimization, within a given layout area. The resulting RFIC layouts excel in both performance and area with far fewer bends compared with the simulation-tuning based manual layout, while the layout generation time is significantly reduced from weeks to half an hour. Comment: ACM/IEEE Design Automation Conference (DAC), 201

    Magnetic field structure in the Flattened Envelope and Jet in the young protostellar system HH 211

    HH 211 is a young Class 0 protostellar system, with a flattened envelope, a possible rotating disk, and a collimated jet. We have mapped it with the Submillimeter Array in 341.6 GHz continuum and SiO J=8-7 at ~0.6" resolution. The continuum traces the thermal dust emission in the flattened envelope and the possible disk. Linear polarization is detected in the continuum in the flattened envelope. The field lines implied from the polarization have different orientations, but they are not incompatible with current gravitational collapse models, which predict different orientations depending on the region/distance. Also, we might have detected for the first time polarized SiO line emission in the jet due to the Goldreich-Kylafis effect. Observations at higher sensitivity are needed to determine the field morphology in the jet. Comment: 5 pages, 2 figures

    Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

    Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This, coupled with new application workloads brought forward by Deep Learning frameworks like Caffe and Microsoft CNTK, poses additional design constraints due to very large message communication of GPU buffers during the training phase. In this context, special-purpose libraries like NVIDIA NCCL have been proposed for GPU-based collective communication on dense GPU systems. In this paper, we propose a pipelined chain (ring) design for the MPI_Bcast collective operation along with an enhanced collective tuning framework in MVAPICH2-GDR that enables efficient intra-/inter-node multi-GPU communication. We present an in-depth performance landscape for the proposed MPI_Bcast schemes along with a comparative analysis of NVIDIA NCCL Broadcast and NCCL-based MPI_Bcast. The proposed designs for MVAPICH2-GDR enable up to 14X and 16.6X improvement, compared to NCCL-based solutions, for intra- and inter-node broadcast latency, respectively. In addition, the proposed designs provide up to 7% improvement over NCCL-based solutions for data parallel training of the VGG network on 128 GPUs using Microsoft CNTK. Comment: 8 pages, 3 figures