Middleware-based Database Replication: The Gaps between Theory and Practice
The need for high availability and performance in data management systems has
been fueling a long running interest in database replication from both academia
and industry. However, academic groups often attack replication problems in
isolation, overlooking the need for completeness in their solutions, while
commercial teams take a holistic approach that often misses opportunities for
fundamental innovation. This has created over time a gap between academic
research and industrial practice.
This paper aims to characterize the gap along three axes: performance,
availability, and administration. We build on our own experience developing and
deploying replication systems in commercial and academic settings, as well as
on a large body of prior related work. We sift through representative examples
from the last decade of open-source, academic, and commercial database
replication systems and combine this material with case studies from real
systems deployed at Fortune 500 customers. We propose two agendas, one for
academic research and one for industrial R&D, which we believe can bridge the
gap within 5-10 years. This way, we hope to both motivate and help researchers
in making the theory and practice of middleware-based database replication more
relevant to each other.
Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on
Management of Data, Vancouver, Canada, June 200
Stingray: Cone Tracing using a software DSM for SCI clusters
In this paper we compare a supercomputer with hardware shared memory against a cluster of workstations using a software Distributed Shared Memory (DSM), using ray tracing applications as the workload. We have ported Stingray, a parallel cone tracer developed on an SGI Origin 2000 supercomputer, to a cluster using a Scalable Coherent Interface (SCI) network and a software DSM called SciFS. We present the concepts of cone tracing with Stingray, the concepts of an SCI cluster with a DSM, and the implementation issues. We compare the results obtained with the two architectures and discuss the trade-off between price, performance, and ease of programming for each. We show with Stingray that a modest 12-node SCI cluster with an efficient software DSM is 5 times cheaper and can perform up to 2.3 times better than an SGI Origin 2000 with 6 processors. We think that a software DSM is well suited to this kind of application and provides both ease of programming and scalable performance.
Drivolution: Rethinking the Database Driver Lifecycle
The current design of database drivers, a necessary evil for interacting with a DBMS, imposes undue burdens on those who install, upgrade, and manage database systems and their applications. In this paper, we introduce Drivolution, a new architecture for DB drivers that reduces the cost, risk, and downtime associated with driver distribution, deployment and upgrade in large production environments. We view DB drivers as an integral part of the DB schema, so Drivolution stores drivers in the database itself. Drivers are dynamically downloaded and installed by a small bootloader that resides within each client application. Downloading, installing, and upgrading drivers occur transparently to applications, and existing DB management mechanisms are used to define and enforce desired security policies. We show how Drivolution can be integrated into legacy DB engines, replication middleware, and applications, without requiring changes to the server or client applications. We present several case studies that illustrate the use of Drivolution in production environments.
Performance and Scalability of EJB Applications
We investigate the combined effect of application implementation method, container design, and efficiency of communication layers on the performance scalability of J2EE application servers by detailed measurement and profiling of an auction site server. We have implemented five versions of the auction site. The first version uses stateless session beans, making only minimal use of the services provided by the Enterprise JavaBeans (EJB) container. Two versions use entity beans, one with container-managed persistence and the other with bean-managed persistence. The fourth version applies the session façade pattern, using session beans as a façade to access entity beans. The last version uses EJB 2.0 local interfaces with the session façade pattern. We evaluate these different implementations on two popular open-source EJB containers with orthogonal designs. JBoss uses dynamic proxies to generate the container classes at run time, making extensive use of reflection. JOnAS precompiles classes during deployment, minimizing the use of reflection at run time. We also evaluate the communication optimizations provided by each of these EJB containers. The most important factor in determining performance is the application implementation method. EJB applications with session beans perform as well as a Java servlets-only implementation and an order of magnitude better than most of the implementations based on entity beans. The fine-granularity access exposed by the entity beans limits scalability. Use of session façade beans improves performance for entity beans, but only if local communication is very efficient or EJB 2.0 local interfaces are used. Otherwise, session façade beans degrade performance. For the implementation using session beans, communication cost forms the major component of the execution time on the EJB server, and the design of the container has little effect on performance. With entity beans, the design of the container becomes important; in particular, the cost of reflection affects performance. For implementations using session façade beans, local communication cost is critically important. EJB 2.0 local interfaces improve performance by avoiding the communication layers for local communications.
RAIDb: Redundant Array of Inexpensive Databases
Clusters of workstations are becoming increasingly popular to power data server applications such as large-scale Web sites or e-Commerce applications. There has been much research on scaling the front tiers (web servers and application servers) using clusters, but databases usually remain on large dedicated SMP machines. In this paper, we address database performance scalability and high availability using clusters of commodity hardware. Our approach consists of studying different replication and partitioning strategies to achieve various degrees of performance and fault tolerance. We propose the concept of Redundant Array of Inexpensive Databases (RAIDb). RAIDb is to databases what RAID is to disks. RAIDb aims at providing better performance and fault tolerance than a single database, at low cost, by combining multiple database instances into an array of databases. As with RAID, we define different RAIDb levels that provide various cost/performance/fault tolerance tradeoffs. RAIDb-0 features full partitioning, RAIDb-1 offers full replication, and RAIDb-2 introduces an intermediate solution called partial replication, in which the user can define the degree of replication of each database table. We present a Java implementation of RAIDb called Clustered JDBC or C-JDBC. C-JDBC achieves both database performance scalability and high availability at the middleware level without changing existing applications. We show, using the TPC-W benchmark, that RAIDb-2 can offer better performance scalability (up to 25%) than traditional approaches by allowing fine-grain control over replication. Distributing and restricting the replication of frequently written tables to a small set of backends reduces I/O usage and improves CPU utilization of each cluster node.
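The three RAIDb levels named in this abstract can be illustrated with a small table-placement sketch. This is not C-JDBC code: the function names, the round-robin placement, and the read-one/write-all routing rule are assumptions made for illustration only.

```python
def raidb_placement(level, tables, backends, degree=None):
    """Map each table to the list of backends that host it (illustrative only)."""
    n = len(backends)
    if level == 0:
        # RAIDb-0, full partitioning: every table lives on exactly one backend.
        return {t: [backends[i % n]] for i, t in enumerate(tables)}
    if level == 1:
        # RAIDb-1, full replication: every backend hosts every table.
        return {t: list(backends) for t in tables}
    if level == 2:
        # RAIDb-2, partial replication: the user picks a degree per table.
        # Tables without an explicit degree default to full replication.
        degree = degree or {}
        placement = {}
        for i, t in enumerate(tables):
            d = max(1, min(degree.get(t, n), n))
            placement[t] = [backends[(i + k) % n] for k in range(d)]
        return placement
    raise ValueError(f"unsupported RAIDb level: {level}")


def write_targets(placement, table):
    # Assumption: a write must reach every replica of the table.
    return list(placement[table])


def read_target(placement, table, request_no):
    # Assumption: reads are spread round-robin over the table's replicas.
    hosts = placement[table]
    return hosts[request_no % len(hosts)]


backends = ["db0", "db1", "db2"]
tables = ["item", "orders"]
# Hypothetical workload: "orders" is written often, so it is kept on one backend.
partial = raidb_placement(2, tables, backends, degree={"orders": 1})
```

Under these assumptions, a write to `orders` touches one backend rather than three, which is the effect the abstract attributes to restricting the replication of frequently written tables.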
JGroups evaluation in J2EE cluster environments
Clusters have become the de facto platform to scale J2EE application servers. Each tier of the server uses group communication to maintain consistency between replicated nodes. JGroups is the most commonly used Java middleware for group communications in J2EE open source implementations. No study has yet evaluated the scalability of this middleware and its impact on application server scalability. We present an evaluation of JGroups performance and scalability in the context of clustered J2EE application servers. We evaluate the JGroups configuration used by popular software such as the Tomcat JSP server or JBoss J2EE server. We benchmark JGroups with different network technologies, protocol stacks and cluster sizes. We show, using the default protocol stack, that group communication performance using UDP/IP depends on the switch's capability to handle multicast packets. With UDP, Fast Ethernet can give better results than Gigabit Ethernet. We experiment with another configuration using TCP/IP and show that current J2EE application server clusters of up to 16 nodes (the largest configuration we tested) can scale much better with this configuration. We attribute the superiority of TCP/IP-based group communications over UDP/IP multicast to better flow control and better use of the network switches available in cluster environments. Finally, we discuss architectural improvements for better modularity and resource usage of JGroups channels.
Performance Comparison of Middleware Architectures for Generating Dynamic Web Content
On-line services are making increasing use of dynamically generated Web content. Serving dynamic content is more complex than serving static content. Besides a Web server, it typically involves a server-side application and a database to generate and store the dynamic content. A number of standard mechanisms have evolved to generate dynamic content. We evaluate three specific mechanisms in common use: PHP, Java servlets, and Enterprise Java Beans (EJB). These mechanisms represent three different architectures for generating dynamic content. PHP scripts are tied to the Web server and require writing explicit database queries. Java servlets execute in a different process from the Web server, allowing them to be located on a separate machine for better load balancing. The database queries are written explicitly, as in PHP, but in certain circumstances the Java synchronization primitives can be used to perform locking, reducing database lock contention and the amount of communication between servlets and the database. Enterprise Java Beans (EJB) provide several services and facilities. In particular, many of the database queries can be generated automatically. We measure the performance of these three architectures using two application benchmarks: an online bookstore and an auction site. These benchmarks represent common applications for dynamic content and stress different parts of a dynamic content Web server. The auction site stresses the server front-end, while the online bookstore stresses the server back-end. For all measurements, we use widely available open-source software (the Apache Web server, Tomcat servlet engine, JOnAS EJB server, and MySQL relational database). While Java servlets are less efficient than PHP, their ability to execute on a different machine from the Web server and their ability to perform synchronization lead to better performance when the front-end is the bottleneck or when there is database lock contention. EJB facilities and services come at the cost of lower performance than both PHP and Java servlets.
Model-driven Run-time Enforcement of Complex Role-based Access Control Policies
A Role-based Access Control (RBAC) mechanism prevents unauthorized users from performing an operation, according to authorization policies that are defined on the user’s role within an enterprise. Several models have been proposed to specify complex RBAC policies. However, existing approaches for policy enforcement do not fully support all the types of policies that can be expressed in these models, which hinders their adoption among practitioners.
In this paper we propose a model-driven enforcement framework for complex policies captured by GemRBAC+CTX, a comprehensive RBAC model proposed in the literature. We reduce the problem of making an access decision to checking whether a system state (from an RBAC point of view), expressed as an instance of the GemRBAC+CTX model, satisfies the constraints corresponding to the RBAC policies to be enforced at run time. We provide enforcement algorithms for various types of access requests and events, and a prototype tool (MORRO) implementing them. We also show how to integrate MORRO into an industrial Web application. The evaluation results show the applicability of our approach to an industrial system and its scalability with respect to the various parameters characterizing an AC configuration.
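The reduction described in this abstract, from an access decision to a constraint check over a system state, can be sketched roughly as follows. This is neither the GemRBAC+CTX model nor MORRO's API: the `State` fields, the two example constraints, and `decide_activation` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical RBAC snapshot: active (user, role) pairs plus one context value."""
    active_roles: frozenset
    hour: int


def separation_of_duty(conflicting):
    # Constraint: no user may hold all of the conflicting roles at once.
    def check(state):
        users = {u for u, _ in state.active_roles}
        return not any(
            conflicting <= {r for u, r in state.active_roles if u == user}
            for user in users
        )
    return check


def only_between(role, start, end):
    # Context constraint: the role may be active only during [start, end).
    def check(state):
        in_use = any(r == role for _, r in state.active_roles)
        return (not in_use) or (start <= state.hour < end)
    return check


def decide_activation(state, user, role, constraints):
    # Build the state that would result from granting the request, then
    # grant only if that candidate state satisfies every constraint.
    candidate = State(state.active_roles | {(user, role)}, state.hour)
    return all(check(candidate) for check in constraints)


policy = [separation_of_duty({"teller", "auditor"}), only_between("auditor", 9, 17)]
now = State(frozenset({("alice", "teller")}), hour=10)
```

With this policy, `decide_activation(now, "alice", "auditor", policy)` is denied because Alice would hold both conflicting roles, while the same request for a user with no active roles is granted during the allowed hours.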
Online Piece-wise Linear Approximation of Numerical Streams with Precision Guarantees
Continuous “always-on” monitoring is beneficial for a number of applications, but potentially imposes a high load in terms of communication, storage and power consumption when a large number of variables need to be monitored. We introduce two new filtering techniques, swing filters and slide filters, that represent within a prescribed precision a time-varying numerical signal by a piecewise linear function, consisting of connected line segments for swing filters and (mostly) disconnected line segments for slide filters. We demonstrate the effectiveness of swing and slide filters in terms of their compression power by applying them to a real-life data set plus a variety of synthetic data sets. For nearly all combinations of signal behavior and precision requirements, the proposed techniques outperform the earlier approaches for online filtering in terms of data reduction. The slide filter, in particular, consistently dominates all other filters, with up to twofold improvement over the best of the previous techniques.
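The idea of connected line segments with a precision guarantee can be sketched as below, in the spirit of the swing filter. This is a simplified illustration, not the paper's algorithm: the function names are hypothetical, the signal is one-dimensional with strictly increasing timestamps, and each segment's final slope is simply the midpoint of the feasible range, which is an assumption of this sketch.

```python
import math


def swing_like_filter(points, eps):
    """Compress (t, x) samples into knots of a connected piecewise-linear
    function whose value at every sample time is within eps of the sample."""
    if not points:
        return []
    anchor_t, anchor_x = points[0]
    knots = [(anchor_t, anchor_x)]
    lo, hi = -math.inf, math.inf   # feasible slopes for the current segment
    prev_t = anchor_t
    for t, x in points[1:]:
        dt = t - anchor_t
        new_lo = max(lo, (x - eps - anchor_x) / dt)
        new_hi = min(hi, (x + eps - anchor_x) / dt)
        if new_lo > new_hi:
            # No single line from the anchor fits this sample as well:
            # close the segment at the previous sample time.
            slope = (lo + hi) / 2
            anchor_x += slope * (prev_t - anchor_t)
            anchor_t = prev_t
            knots.append((anchor_t, anchor_x))
            # The next segment starts at the knot just recorded (connected).
            dt = t - anchor_t
            lo = (x - eps - anchor_x) / dt
            hi = (x + eps - anchor_x) / dt
        else:
            lo, hi = new_lo, new_hi
        prev_t = t
    if prev_t != anchor_t:
        slope = (lo + hi) / 2
        knots.append((prev_t, anchor_x + slope * (prev_t - anchor_t)))
    return knots


def reconstruct(knots, t):
    # Linear interpolation between the two knots that bracket t.
    for (t0, x0), (t1, x1) in zip(knots, knots[1:]):
        if t0 <= t <= t1:
            return x0 + (x1 - x0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the recorded range")


ramp = [(float(i), 2.0 * i) for i in range(10)]
ramp_knots = swing_like_filter(ramp, eps=0.1)
```

A perfectly linear signal such as `ramp` collapses to its two endpoints, while a signal that changes direction produces one extra knot per segment that had to be closed.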
C-JDBC: a Middleware Framework for Database Clustering
Clusters of workstations are becoming increasingly popular to power data server applications such as large-scale Web sites or e-Commerce applications. Successful open-source tools exist for clustering the front tiers of such sites (web servers and application servers). No comparable success has been achieved for scaling the backend databases. An expensive SMP machine is required if the database tier becomes the bottleneck. The few tools that exist for clustering databases are often database-specific and/or proprietary. Clustered JDBC (C-JDBC) addresses this problem. It is an open-source, flexible and efficient middleware for database clustering. C-JDBC implements the Redundant Array of Inexpensive Databases (RAIDb) concept. It presents a single virtual database to the application through the JDBC interface and does not require any modification to existing applications. Furthermore, C-JDBC works with any database engine that provides a JDBC driver, without modification to the database engine. The C-JDBC framework is open, configurable and extensible to support large and complex database cluster architectures offering various performance, fault tolerance and availability tradeoffs.
