34,677 research outputs found

Document distribution algorithm for load balancing on an extensible Web server architecture

Author: Ng CP
Wang CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

Access latency and load balancing are the two main issues in the design of clustered Web server architectures for achieving high performance. We propose a novel document distribution algorithm for load balancing on a cluster of distributed Web servers. We group Web pages that are likely to be accessed during a request session into a migrating unit, which is used as the basic unit of document placement. A modified binning algorithm is developed to distribute the migrating units among the Web servers to achieve load balancing. We also present a redirection mechanism, which makes use of a property of migrating units, to reduce the cost of request redirections. The distribution of Web documents is recomputed periodically to adapt to changes in client request patterns and system configuration. Simulation results show that our solution can reduce the amount of request redirection and document migration, and can distribute the workload properly among Web servers.

HKU Scholars Hub
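The abstract does not spell out the modified binning algorithm. As a rough illustration only, the sketch below places migrating units on servers with a plain greedy heuristic: heaviest unit first, each to the server with the lowest load relative to its capacity. The per-unit load estimates, the server capacities and the function name `distribute_units` are hypothetical. The paper's algorithm also tries to limit redirection and migration, so it may work quite differently.

```python
import heapq
from typing import Dict, List, Tuple


def distribute_units(units: Dict[str, float],
                     capacities: Dict[str, float]) -> Dict[str, List[str]]:
    """Greedy (largest-first) placement of migrating units onto servers.

    units: hypothetical map of migrating-unit id -> estimated load (e.g. requests/s).
    capacities: hypothetical map of server id -> relative capacity (> 0).
    Returns server id -> list of assigned migrating-unit ids.
    """
    # Min-heap keyed on each server's load normalized by its capacity.
    heap: List[Tuple[float, str]] = [(0.0, s) for s in sorted(capacities)]
    heapq.heapify(heap)
    load = {s: 0.0 for s in capacities}
    placement: Dict[str, List[str]] = {s: [] for s in capacities}
    # Place the heaviest units first so lighter units can fill the remaining gaps.
    for unit, weight in sorted(units.items(), key=lambda kv: (-kv[1], kv[0])):
        _, server = heapq.heappop(heap)
        placement[server].append(unit)
        load[server] += weight
        heapq.heappush(heap, (load[server] / capacities[server], server))
    return placement
```

Because every page in a migrating unit moves together, the unit (not the individual page) is the item being packed. Rerunning the function periodically with fresh load estimates mirrors the periodic recomputation described above, although this sketch ignores the cost of moving units between runs.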

Push-Pull Messaging: a high-performance communication mechanism for commodity SMP clusters

Author: Wang CL
Wong KP
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1999

Push-Pull Messaging is a novel messaging mechanism for high-speed interprocess communication in a cluster of symmetric multiprocessor (SMP) machines. This messaging mechanism exploits the parallelism in SMP nodes by allowing the communication stages of a messaging event to execute on different processors to achieve maximum performance. Push-Pull Messaging further improves communication performance by employing three optimization techniques: (1) Cross-Space Zero Buffer provides a unified buffer management mechanism to achieve copy-less communication for data transfer among processes within an SMP node. (2) Address Translation Overhead Masking removes the address translation overhead from the critical path in internode communication. (3) Push-and-Acknowledge Overlapping overlaps the push and acknowledge phases to hide the acknowledge latency. Overall, Push-Pull Messaging effectively utilizes system resources and improves communication speed. It has been implemented to support high-speed communication for connecting quad Pentium Pro SMPs with 100 Mbit/s Fast Ethernet.

HKU Scholars Hub
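To show why overlapping the push and acknowledge phases hides latency, here is a simple cost model. It assumes a message is sent as `k` fragments, each costing `push` time units to push and `ack` time units to acknowledge. Without overlap, the next push waits for the previous acknowledgment. With overlap, the two phases behave like a two-stage pipeline. The fragment model and the function names are illustrative assumptions, not the paper's measured behaviour.

```python
def serial_time(k: int, push: float, ack: float) -> float:
    # Each fragment is pushed, then its acknowledgment is awaited before the next push.
    return k * (push + ack)


def overlapped_time(k: int, push: float, ack: float) -> float:
    # Two-stage pipeline: the ack for fragment i proceeds while fragment i + 1 is pushed.
    # The slower stage sets the steady-state rate; one push and one ack are not overlapped.
    if k == 0:
        return 0.0
    return push + (k - 1) * max(push, ack) + ack
```

For example, with four fragments, `push = 10` and `ack = 3`, the serial schedule takes 52 units and the overlapped one takes 43. When the push phase dominates, the acknowledge cost of every fragment except the last is hidden.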

Network performance isolation for latency-sensitive cloud applications

Author: Cheng L
Wang CL
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

preprint

Crossref

HKU Scholars Hub

Contention-Free Complete Exchange Algorithms on Clusters

Author: Tam ATC
Wang CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000

To construct a large commodity cluster, a hierarchical network is generally adopted for connecting the host machines: a Gigabit backbone switch connects a few commodity switches through uplinks to achieve scaled bisectional bandwidth. This type of interconnection usually results in link contention, with congestion developing at the uplink ports. Moreover, non-deterministic delays in scheduling communication events in clusters accelerate the build-up of congestion at these uplink ports, which leads to severe packet drops and hinders overall performance. In this paper, we focus on the practical design of a high-speed complete exchange algorithm on a commodity cluster interconnected by a hierarchical Ethernet-based network. By exploiting some architectural characteristics of the interconnection to optimize the performance of a complete exchange algorithm, we introduce a congestion control mechanism, global windowing, that monitors and regulates the traffic load, together with a permutation scheme, the reorder scheme, that effectively alleviates the congestion problem. We evaluate our algorithm and compare its performance with other algorithms on a PC cluster connected by various types of switches, including Gigabit Ethernet, input-buffered and shared-memory Fast Ethernet switches.

HKU Scholars Hub
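The following is a generic illustration of the contention this abstract describes, not the paper's reorder scheme or global windowing. In a shift-based complete exchange, node `i` sends to node `(i + k) mod n` in round `k`. Each round is a permutation, so every node sends and receives exactly once per round. The traffic crossing switch uplinks still varies from round to round. The host-to-switch layout (`host // hosts_per_switch`) and the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def shift_schedule(n: int) -> List[List[Tuple[int, int]]]:
    """Round k (1..n-1): node i sends its block destined for node (i + k) % n."""
    return [[(i, (i + k) % n) for i in range(n)] for k in range(1, n)]


def uplink_load(round_pairs: List[Tuple[int, int]], hosts_per_switch: int) -> Counter:
    """Count flows leaving each commodity switch during one round.

    Hypothetical layout: host h is attached to switch h // hosts_per_switch.
    """
    out: Counter = Counter()
    for src, dst in round_pairs:
        s_src, s_dst = src // hosts_per_switch, dst // hosts_per_switch
        if s_src != s_dst:  # the flow must cross the switch's uplink
            out[s_src] += 1
    return out
```

With four hosts on two switches, round 1 sends one flow up each uplink while round 2 sends two. A reordering or windowing mechanism would aim to keep that per-uplink load from spiking.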

Efficient reliable broadcast for commodity clusters

Author: Wang CL
Wong R
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000

High-speed collective communication is key to achieving high performance in parallel computing. In the past, collective operations were usually implemented using unicast operations. We propose a new architecture, EQA (Enhanced Queue Architecture), for implementing high-speed collective operations in a cluster. By incorporating EQA and the hardware broadcast facility in network switches, an efficient reliable broadcast operation is implemented in a DP-SMP communication subsystem. With EQA, computation, memory and network resources can be utilized efficiently. We evaluated the performance of the broadcast operation on a commodity cluster with a Fast Ethernet connection. We found that the hardware-based broadcast from DP-SMP with EQA outperforms the software-based broadcast operation. Using EQA in the broadcast operation can reduce memory consumption by almost 40%. DP-SMP with EQA has proven to be an efficient communication mechanism for coupling commodity clusters.

Crossref

HKU Scholars Hub
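The abstract does not detail how reliability is added on top of hardware broadcast. One common pattern is sketched below: send once to everyone, collect acknowledgments, and retransmit only to receivers that have not acknowledged. This pattern, the `delivered(receiver, attempt)` loss model and the retry limit are assumptions for illustration. They are not EQA's actual protocol.

```python
from typing import Callable, Dict, Set


def reliable_broadcast(receivers: Set[int],
                       delivered: Callable[[int, int], bool],
                       max_rounds: int = 10) -> Dict[int, int]:
    """Broadcast once, then retransmit only to receivers that have not acknowledged.

    delivered(receiver, attempt) is a hypothetical loss model: it returns True if the
    message and its acknowledgment got through on that attempt.
    Returns receiver -> attempt number on which its acknowledgment arrived.
    """
    acked: Dict[int, int] = {}
    pending = set(receivers)
    for attempt in range(1, max_rounds + 1):
        if not pending:
            break
        # Attempt 1 models a single hardware broadcast; later attempts target 'pending' only.
        for r in sorted(pending):
            if delivered(r, attempt):
                acked[r] = attempt
        pending = {r for r in pending if r not in acked}
    if pending:
        raise TimeoutError(f"no acknowledgment from receivers {sorted(pending)}")
    return acked
```

For example, if receiver 2 misses the first broadcast, only receiver 2 is contacted again, and the result is `{1: 1, 2: 2, 3: 1}`.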

GPS calibrated ad-hoc localization for geosocial networking

Author: Hu DH
Wang CL
Wang Y
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

LNCS v. 6406 is the conference proceedings of UIC 2010.

Cost-effective localization for large-scale geosocial networking services is a challenging issue in urban environments. This paper studies an ad-hoc localization technique that takes advantage of location information exchanged over short ranges to calibrate the location of mobile users carrying non-GPS mobile phones. We demonstrate by simulation that a small percentage of GPS-enabled mobile phones can substantially aid the localization of other non-GPS pedestrians in urban environments. Based on the proposed localization technique, we implement a location-aware social networking tool called Mobile Twitter, similar to the Twitter microblogging service, for fast propagation of social events happening in the surroundings. Evaluation shows that our localization algorithm achieves better location-estimation accuracy and wider coverage than the Amorphous algorithm and the Monte Carlo Localization (MCL) method. Moreover, we show that Mobile Twitter implemented on an Android mobile phone is power-efficient in real-life usage scenarios. © 2010 Springer-Verlag.

The 7th International Conference on Ubiquitous Intelligence and Computing (UIC) 2010, Xi'an, China, 26-29 October 2010. In Lecture Notes in Computer Science, 2010, v. 6406, p. 52-6

Crossref

HKU Scholars Hub
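As a loose illustration of calibrating a non-GPS phone from positions heard over short-range links, the sketch below computes a weighted centroid of neighbours' reported positions. GPS fixes receive a larger weight than estimates relayed by other non-GPS phones. The weighted-centroid rule, the `gps_weight` value and the function name are assumptions. The paper's algorithm is not given in the abstract and is likely more elaborate.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def estimate_position(reports: List[Tuple[Point, bool]],
                      gps_weight: float = 4.0) -> Optional[Point]:
    """Weighted centroid of neighbours' reported positions.

    reports: (position, from_gps) pairs received over a short-range link.
    gps_weight: hypothetical extra trust given to GPS fixes over relayed estimates.
    Returns None when no neighbour is in range.
    """
    if not reports:
        return None
    total = sx = sy = 0.0
    for (x, y), from_gps in reports:
        w = gps_weight if from_gps else 1.0
        sx += w * x
        sy += w * y
        total += w
    return (sx / total, sy / total)
```

For example, hearing a GPS fix at `(0, 0)` and a relayed estimate at `(10, 0)` yields `(2.0, 0.0)`, which is pulled toward the more trusted GPS report.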

Scheduling parallel machines with inclusive processing set restrictions and job release times

Author: Li CL
Wang X
Publication venue: 'Elsevier BV'
Publication date: 11/12/2014

2009-2010 > Academic research: refereed > Publication in refereed journal; Accepted Manuscript; Published

PolyU Institutional Repository

Conditional Image-Text Embedding Networks

Author: A Gordo
A Rohrbach
BA Plummer
CL Zitnick
F Radenović
L Yu
M Wang
R Krishna
Publication date: 28/07/2018

This paper presents an approach for grounding phrases in images that jointly learns multiple text-conditioned embeddings in a single end-to-end model. To differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies the representation requirements for individual embeddings and allows underrepresented concepts to take advantage of the shared representations before feeding them into concept-specific layers. Comprehensive experiments verify the effectiveness of our approach across three phrase grounding datasets, Flickr30K Entities, ReferIt Game, and Visual Genome, where we obtain improvements in grounding performance of 4%, 3%, and 4%, respectively, over a strong region-phrase embedding baseline.

Comment: ECCV 2018 accepted paper

arXiv.org e-Print Archive

Crossref
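One way to picture a concept weight branch that "automatically assigns phrases to embeddings" is as a soft assignment. The branch produces one logit per concept embedding, a softmax turns these into weights, and the per-concept region-phrase scores are combined with those weights. This pure-Python sketch follows that reading. The function names are hypothetical, and the abstract does not state whether the paper combines scores or embeddings.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def conditional_score(concept_logits: List[float], concept_scores: List[float]) -> float:
    """Soft-assign a phrase to K concept embeddings and combine their scores.

    concept_logits: hypothetical output of a concept weight branch for one phrase.
    concept_scores: region-phrase similarity from each of the K concept embeddings.
    """
    if len(concept_logits) != len(concept_scores):
        raise ValueError("need one logit per concept embedding")
    weights = softmax(concept_logits)
    return sum(w * s for w, s in zip(weights, concept_scores))
```

With equal logits the phrase is shared evenly, so `conditional_score([0.0, 0.0], [1.0, 3.0])` gives `2.0`. A dominant logit routes the phrase almost entirely to a single embedding. Because the weights are learned rather than predefined, rare concepts can still borrow from embeddings shared with common ones.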

Cache affinity optimization techniques for scaling software transactional memory systems on multi-CMP architectures

Author: Chan K
Lam KT
Wang CL
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

Software transactional memory (STM) enhances both ease of use and concurrency, and is considered one of the next-generation paradigms for parallel programming. Application programs may contain hotspots where data conflicts are intensive and seriously degrade performance. Advanced STM systems therefore employ dynamic concurrency control techniques that curb the conflict rate by properly throttling the rate at which transactions are spawned. High-end computers may have two or more multicore processors, so data sharing among cores goes through a non-uniform cache memory hierarchy. This poses challenges to concurrency control designs, as improper metadata placement and sharing introduce scalability issues into the system. Poor thread-to-core mappings that induce excessive cache invalidation are also detrimental to overall performance. In this paper, we share our experience in designing and implementing a new dynamic concurrency controller for Tiny STM, which helps keep the system concurrency at a near-optimal level. By decoupling unfavourable metadata sharing, our controller design avoids costly inter-processor communication. It also features an affinity-aware thread migration technique that fine-tunes thread placement by observing inter-thread transactional conflicts. We evaluate our implementation using the STAMP benchmark suite and show that the controller can bring around 21% average speedup over the baseline execution. © 2015 IEEE.

HKU Scholars Hub
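As a minimal sketch of "throttling the rate of spawning transactions", the controller below recomputes, once per sampling period, how many threads may run transactions. It uses the observed abort ratio: the limit is halved when conflicts are heavy and raised by one when they are rare. The thresholds, the bounds and the multiplicative-decrease/additive-increase rule are assumptions chosen for illustration. This is not the Tiny STM controller described in the paper, which additionally handles metadata placement and thread migration.

```python
def adjust_concurrency(active: int, commits: int, aborts: int,
                       min_threads: int = 1, max_threads: int = 64,
                       high: float = 0.5, low: float = 0.1) -> int:
    """Return the number of threads allowed to run transactions next period.

    active: current concurrency limit; commits/aborts: counts from the last period.
    high/low: hypothetical abort-ratio thresholds for throttling and for growth.
    """
    total = commits + aborts
    if total == 0:
        return active  # no transactions observed; keep the current limit
    abort_ratio = aborts / total
    if abort_ratio > high:
        # Heavy contention (a hotspot): back off multiplicatively.
        return max(min_threads, active // 2)
    if abort_ratio < low:
        # Little contention: probe for more parallelism one thread at a time.
        return min(max_threads, active + 1)
    return active
```

Halving quickly relieves a hotspot, while the slow additive increase avoids immediately re-triggering the conflicts that caused the back-off.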

High performance communication subsystem for clustering standard high-volume servers using Gigabit Ethernet

Author: Lee D
Wang CL
Zhu W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000

This paper presents an efficient communication subsystem, DP-II, for clustering standard high-volume (SHV) servers using Gigabit Ethernet. DP-II employs several lightweight messaging mechanisms to achieve low-latency and high-bandwidth communication. Tests show an 18.32 µs single-trip latency and 72.8 MB/s bandwidth on a Gigabit Ethernet network connecting two Dell PowerEdge 6300 Quad Xeon SMP servers running Linux. To improve the programmability of the DP-II communication subsystem, DP-II was developed on a concise yet powerful abstract communication model, the Directed Point Model, which can be conveniently used to depict the inter-process communication pattern of a parallel task in the cluster environment. In addition, the API of DP-II preserves the syntax and semantics of traditional UNIX I/O operations, which makes it easy to use.

Crossref

HKU Scholars Hub
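To relate the two reported DP-II figures, the sketch below plugs them into a linear (Hockney-style) cost model, T(n) = t0 + n / B. Combining separately measured latency and bandwidth into one model is an assumption, as is reading "MB" as 10^6 bytes. Real transfer times for mid-sized messages may deviate from this line.

```python
LATENCY_US = 18.32       # single-trip latency reported for DP-II, in microseconds
BANDWIDTH_MB_S = 72.8    # bandwidth reported for DP-II; assumed 1 MB = 10**6 bytes


def transfer_time_us(n_bytes: float) -> float:
    # Linear cost model: fixed start-up latency plus n / bandwidth.
    # With MB = 10**6 bytes, 1 MB/s equals 1 byte per microsecond, so n / B is in µs.
    return LATENCY_US + n_bytes / BANDWIDTH_MB_S


def half_bandwidth_size() -> float:
    # Message size n_1/2 = t0 * B at which the model reaches half the peak bandwidth.
    return LATENCY_US * BANDWIDTH_MB_S
```

Under these assumptions, n_1/2 is about 1,334 bytes, and a 1,000,000-byte message would take roughly 13,755 µs. The model suggests that messages much shorter than about 1.3 KB are dominated by start-up latency, which is where lightweight messaging mechanisms matter most.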