Fast parallel similarity search in multimedia databases
Stefan Berchtold, Christian Böhm, Bernhard Braunmüller, Daniel A. Keim, Hans-Peter Kriegel
Pages: 1-12
DOI: 10.1145/253260.253263
Most similarity search techniques map the data objects into some high-dimensional feature space. The similarity search then corresponds to a nearest-neighbor search in the feature space, which is computationally very intensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem of designing a parallel nearest-neighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods do not perform well for high-dimensional nearest-neighbor search. In contrast, our method has been optimized based on the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique, in contrast to other declustering methods, guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best known data declustering method, the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.
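The guarantee stated in the abstract (buckets of neighboring quadrants always land on different disks) can be illustrated with a simple hypercube coloring. The weighted-bit-sum assignment below is our own illustrative construction, not the paper's actual declustering function:

```python
from itertools import product

def disk_of(quadrant_bits, num_disks):
    # Hypothetical assignment: weighted bit sum mod num_disks.  Flipping
    # bit i changes the sum by i + 1, so whenever num_disks > d the two
    # sums fall in different residue classes, i.e. neighboring quadrants
    # go to different disks.
    return sum(i + 1 for i, b in enumerate(quadrant_bits) if b) % num_disks

def neighbors_on_distinct_disks(d, num_disks):
    # Exhaustively verify the neighbor guarantee over all 2^d quadrants.
    for q in product((0, 1), repeat=d):
        for i in range(d):
            neighbor = q[:i] + (1 - q[i],) + q[i + 1:]
            if disk_of(q, num_disks) == disk_of(neighbor, num_disks):
                return False
    return True
```

With too few disks the guarantee necessarily fails (two of the d neighbor directions must share a residue class), which is easy to confirm with the checker above.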

Similarity-based queries for time series data
Davood Rafiei, Alberto Mendelzon
Pages: 13-25
DOI: 10.1145/253260.253264
We study a set of linear transformations on the Fourier series representation of a sequence that can be used as the basis for similarity queries on time-series data. We show that our set of transformations is rich enough to formulate operations such as moving average and time warping. We present a query processing algorithm that uses the underlying R-tree index of a multidimensional data set to answer similarity queries efficiently. Our experiments show that the performance of this algorithm is competitive with that of processing ordinary (exact match) queries using the index, and much faster than sequential scanning. We relate our transformations to the general framework for similarity queries of Jagadish et al.
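The lower-bounding role that truncated Fourier representations play in such index-based similarity search can be sketched as follows. This is an illustrative reduction in the style of earlier DFT-based time-series work, not the paper's specific transformation set:

```python
import numpy as np

def fourier_features(x, k):
    # First k DFT coefficients with orthonormal scaling, so Parseval's
    # theorem holds: the full coefficient vector has the same norm as x.
    return np.fft.fft(np.asarray(x, dtype=float), norm="ortho")[:k]

def feature_distance(x, y, k=4):
    # Distance in the truncated Fourier space.  The DFT is linear and
    # norm-preserving under "ortho" scaling, so dropping coefficients can
    # only shrink the distance: this value never exceeds the true
    # Euclidean distance and can safely prune candidates before an
    # exact comparison.
    return float(np.linalg.norm(fourier_features(x, k) - fourier_features(y, k)))
```

The few retained coefficients per sequence are exactly what one would store in a low-dimensional R-tree.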

Meaningful change detection in structured data
Sudarshan S. Chawathe, Hector Garcia-Molina
Pages: 26-37
DOI: 10.1145/253260.253266
Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This problem is much more challenging than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional “atomic” insert, delete, update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. These operations allow us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to “minimal” descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a minimum-cost edge cover of a bipartite graph. We study the quality of the solution produced by our algorithm, as well as the running time, both analytically and experimentally.
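The node-pairing step at the heart of such an algorithm can be roughly illustrated as follows. The paper reduces the problem to a minimum-cost edge cover of a bipartite graph; the greedy pairing below is a drastic simplification with no optimality guarantee, included only to show the shape of the computation:

```python
def greedy_match(old, new, cost):
    # Pair nodes of the old and new snapshots by ascending cost.  Every
    # candidate (old, new) pair is scored, then pairs are accepted
    # greedily as long as neither endpoint is already matched.
    pairs = sorted((cost(a, b), a, b) for a in old for b in new)
    used_old, used_new, matching = set(), set(), []
    for c, a, b in pairs:
        if a not in used_old and b not in used_new:
            used_old.add(a)
            used_new.add(b)
            matching.append((a, b))
    return matching
```

Unmatched old nodes would then be reported as deletes and unmatched new nodes as inserts; matched pairs with nonzero cost become updates.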

Improved query performance with variant indexes
Patrick O'Neil, Dallan Quass
Pages: 38-49
DOI: 10.1145/253260.253268
The read-mostly environment of data warehousing makes it possible to use more complex indexes to speed up queries than in situations where concurrent updates are present. The current paper presents a short review of current indexing technology, including row-set representation by Bitmaps, and then introduces two approaches we call Bit-Sliced indexing and Projection indexing. A Projection index materializes all values of a column in RID order, and a Bit-Sliced index essentially takes an orthogonal bit-by-bit view of the same data. While some of these concepts started with the MODEL 204 product, and both Bit-Sliced and Projection indexing are now fully realized in Sybase IQ, this is the first rigorous examination of such indexing capabilities in the literature. We compare algorithms that become feasible with these variant index types against algorithms using more conventional indexes. The analysis demonstrates important performance advantages for variant indexes in some types of SQL aggregation, predicate evaluation, and grouping. The paper concludes by introducing a new method whereby multi-dimensional group-by queries, reminiscent of OLAP/Datacube queries but with more flexibility, can be very efficiently performed.
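The "orthogonal bit-by-bit view" of a column that a Bit-Sliced index provides can be sketched as follows. This is a toy illustration; `bit_slices` and `bitsliced_sum` are hypothetical helpers, not MODEL 204 or Sybase IQ APIs:

```python
def bit_slices(values, width=8):
    # Slice i is a bitmap (packed into a Python int) holding bit i of
    # every row's value: row r sets bit r of slices[i] iff bit i of
    # values[r] is set.
    slices = [0] * width
    for row, v in enumerate(values):
        for i in range(width):
            if (v >> i) & 1:
                slices[i] |= 1 << row
    return slices

def bitsliced_sum(slices, rowset):
    # SUM over the rows selected by the `rowset` bitmap, computed purely
    # from bitmap ANDs and population counts: each slice contributes
    # (number of selected rows with that bit set) * 2^i.
    return sum(bin(s & rowset).count("1") << i for i, s in enumerate(slices))
```

The point of the technique is that aggregation touches `width` bitmaps rather than every row value, which is why it pairs so well with Bitmap row-set representations.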

Highly concurrent cache consistency for indices in client-server database systems
Markos Zaharioudakis, Michael J. Carey
Pages: 50-61
DOI: 10.1145/253260.253269
In this paper, we present four approaches to providing highly concurrent B+-tree indices in the context of a data-shipping, client-server OODBMS architecture. The first performs all index operations at the server, while the other approaches support varying degrees of client caching and usage of index pages. We have implemented the four approaches, as well as the 2PL approach, in the context of the SHORE OODB system at Wisconsin, and we present experimental results from a performance study based on running SHORE on an IBM SP2 multicomputer. Our results emphasize the need for non-2PL approaches and demonstrate the tradeoffs between 2PL, no-caching, and the three caching alternatives.

Concurrency and recovery in generalized search trees
Marcel Kornacker, C. Mohan, Joseph M. Hellerstein
Pages: 62-72
DOI: 10.1145/253260.253272
This paper presents general algorithms for concurrency control in tree-based access methods as well as a recovery protocol and a mechanism for ensuring repeatable read. The algorithms are developed in the context of the Generalized Search Tree (GiST) data structure, an index structure supporting an extensible set of queries and data types. Although developed in a GiST context, the algorithms are generally applicable to many tree-based access methods. The concurrency control protocol is based on an extension of the link technique originally developed for B-trees, and completely avoids holding node locks during I/Os. Repeatable read isolation is achieved with a novel combination of predicate locks and two-phase locking of data records. To our knowledge, this is the first time that isolation issues have been addressed outside the context of B-trees. A discussion of the fundamental structural differences between B-trees and more general tree structures like GiSTs explains why the algorithms developed here deviate from their B-tree counterparts. An implementation of GiSTs emulating B-trees in DB2/Common Server is underway.

Range queries in OLAP data cubes
Ching-Tien Ho, Rakesh Agrawal, Nimrod Megiddo, Ramakrishnan Srikant
Pages: 73-88
DOI: 10.1145/253260.253274
A range query applies an aggregation operation over all selected cells of an OLAP data cube where the selection is specified by providing ranges of values for numeric dimensions. We present fast algorithms for range queries for two types of aggregation operations: SUM and MAX. These two operations cover techniques required for most popular aggregation operations, such as those supported by SQL.
For range-sum queries, the essential idea is to precompute some auxiliary information (prefix sums) that is used to answer ad hoc queries at run-time. By maintaining auxiliary information which is of the same size as the data cube, all range queries for a given cube can be answered in constant time, irrespective of the size of the sub-cube circumscribed by a query. Alternatively, one can keep auxiliary information which is 1/b^d of the size of the d-dimensional data cube. Response to a range query may now require access to some cells of the data cube in addition to the access to the auxiliary information, but the overall time complexity is typically reduced significantly. We also discuss how the precomputed information is incrementally updated by batching updates to the data cube. Finally, we present algorithms for choosing the subset of the data cube dimensions for which the auxiliary information is computed and the blocking factor to use for each such subset.
Our approach to answering range-max queries is based on precomputed max over balanced hierarchical tree structures. We use a branch-and-bound-like procedure to speed up the finding of max in a region. We also show that with a branch-and-bound procedure, the average-case complexity is much smaller than the worst-case complexity.
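The prefix-sum technique for range-SUM queries can be sketched as follows. This illustrates the full-size auxiliary cube described above (constant-time queries via inclusion-exclusion); the blocked 1/b^d variant is omitted:

```python
from itertools import product
import numpy as np

def prefix_sums(cube):
    # P[i1,...,id] = sum of cube over the box [0..i1] x ... x [0..id],
    # built with one cumulative-sum pass per dimension.
    p = np.asarray(cube, dtype=float)
    for axis in range(p.ndim):
        p = np.cumsum(p, axis=axis)
    return p

def range_sum(p, lo, hi):
    # Inclusion-exclusion over the 2^d corners of the query box
    # lo[k]..hi[k] (inclusive): constant time in the cube size.
    d, total = p.ndim, 0.0
    for corner in product((0, 1), repeat=d):
        idx = tuple(hi[k] if c == 0 else lo[k] - 1 for k, c in enumerate(corner))
        if min(idx) < 0:
            continue  # an empty prefix contributes nothing
        total += (-1) ** sum(corner) * p[idx]
    return total
```

Each query reads at most 2^d cells of the auxiliary cube, regardless of how large a sub-cube it circumscribes.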

Cubetree: organization of and bulk incremental updates on the data cube
Nick Roussopoulos, Yannis Kotidis, Mema Roussopoulos
Pages: 89-99
DOI: 10.1145/253260.253276
The data cube is an aggregate operator which has been shown to be very powerful for On Line Analytical Processing (OLAP) in the context of data warehousing. It is, however, very expensive to compute, access, and maintain. In this paper we define the “cubetree” as a storage abstraction of the cube and realize it using packed R-trees for efficient cube queries. We then reduce the problem of creation and maintenance of the cube to sorting and bulk incremental merge-packing of cubetrees. This merge-pack has been implemented to use separate storage for writing the updated cubetrees, therefore allowing cube queries to continue even during maintenance. Finally, we characterize the size of the delta increment for achieving good bulk update schedules for the cube. The paper includes experiments with various data sets measuring query and bulk update performance.

Maintenance of data cubes and summary tables in a warehouse
Inderpal Singh Mumick, Dallan Quass, Barinderpal Singh Mumick
Pages: 100-111
DOI: 10.1145/253260.253277
Data warehouses contain large amounts of information, often collected from a variety of independent sources. Decision-support functions in a warehouse, such as on-line analytical processing (OLAP), involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute these queries by scanning the data sets each time. Warehouse applications therefore build a large number of summary tables, or materialized aggregate views, to help them increase the system performance.
As changes, most notably new transactional data, are collected at the data sources, all summary tables at the warehouse that depend upon this data need to be updated. Source changes are usually loaded into the warehouse at regular intervals, typically once a day, in a batch window, and the warehouse is made unavailable for querying while it is updated. Since the number of summary tables that need to be maintained is often large, a critical issue for data warehousing is how to maintain the summary tables efficiently.
In this paper we propose a method of maintaining aggregate views (the summary-delta table method), and use it to solve two problems in maintaining summary tables in a warehouse: (1) how to efficiently maintain a summary table while minimizing the batch window needed for maintenance, and (2) how to maintain a large set of summary tables defined over the same base tables.
While several papers have addressed the issues relating to choosing and materializing a set of summary tables, this is the first paper to address maintaining summary tables efficiently.
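The summary-delta idea can be sketched for a SUM/COUNT summary table. This is a minimal illustration of the batching step; the function names and the (sum, count) layout are our assumptions, not the paper's schema:

```python
from collections import defaultdict

def summary_delta(changes):
    # Collapse raw source changes into one net (sum, count) delta per
    # group.  Each change is (group, amount, sign): +1 insert, -1 delete.
    delta = defaultdict(lambda: [0, 0])
    for group, amount, sign in changes:
        delta[group][0] += sign * amount
        delta[group][1] += sign
    return delta

def apply_delta(summary, delta):
    # Merge the summary-delta into the materialized summary table: one
    # update per affected group instead of one per source change, which
    # is what shrinks the maintenance batch window.
    for group, (dsum, dcnt) in delta.items():
        s, c = summary.get(group, (0, 0))
        s, c = s + dsum, c + dcnt
        if c == 0:
            summary.pop(group, None)  # group no longer has any rows
        else:
            summary[group] = (s, c)
    return summary
```

Keeping COUNT alongside SUM is what makes deletions maintainable: a group whose count drops to zero can be removed from the view.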

Database buffer size investigation for OLTP workloads
Thin-Fong Tsuei, Allan N. Packer, Keng-Tai Ko
Pages: 112-122
DOI: 10.1145/253260.253279
It is generally accepted that On-Line Transaction Processing (OLTP) systems benefit from large database memory buffers. As enterprise database systems become larger and more complex, hardware vendors are building increasingly large systems capable of supporting huge memory configurations. Database vendors in turn are developing buffer schemes to exploit this physical memory.
How much will these developments benefit OLTP workloads? Through empirical studies on databases sized comparably to those seen in the real world, this paper presents the characteristics of an industry-standard OLTP benchmark as memory buffer size changes. We design the experiments to investigate how the database size, the buffer size and the number of CPUs impact performance, in particular the throughput and the buffer hit rate on Symmetric Multiprocessor Systems. The relationships of these major database attributes are plotted and key observations are summarized. We discuss how these relationships change as the number of CPUs changes. We further quantify the relationships: 1) between database buffer data hit rate, buffer size and database size, 2) between throughput, buffer data hit rate and database size and 3) between throughput and number of CPUs. Algorithms, rules-of-thumb and examples are presented for predicting performance, sizing memory and making trade-offs between adding more memory and increasing the number of CPUs.

Database performance in the real world: TPC-D and SAP R/3
Jochen Doppelhammer, Thomas Höppler, Alfons Kemper, Donald Kossmann
Pages: 123-134
DOI: 10.1145/253260.253280
Traditionally, database systems have been evaluated in isolation on the basis of standardized benchmarks (e.g., Wisconsin, TPC-C, TPC-D). We argue that very often such a performance analysis does not reflect the actual use of the DBMSs in the “real world.” End users typically don't access a stand-alone database system; rather they use a comprehensive application system, in which the database system constitutes an integrated component. In order to derive performance evaluations of practical relevance to the end users, the application system including the database system has to be benchmarked. In this paper, we present TPC-D benchmark results carried out using the SAP R/3 system, an integrated business administration system. Like many other application systems SAP R/3 is based on a commercial relational database system. We compare the SAP R/3 benchmark results with TPC-D results of an isolated database system, the database product that served as SAP R/3's back-end.

The BUCKY object-relational benchmark
Michael J. Carey, David J. DeWitt, Jeffrey F. Naughton, Mohammad Asgarian, Paul Brown, Johannes E. Gehrke, Dhaval N. Shah
Pages: 135-146
DOI: 10.1145/253260.253283
According to various trade journals and corporate marketing machines, we are now on the verge of a revolution—the object-relational database revolution. Since we believe that no one should face a revolution without appropriate armaments, this paper presents BUCKY, a new benchmark for object-relational database systems. BUCKY is a query-oriented benchmark that tests many of the key features offered by object-relational systems, including row types and inheritance, references and path expressions, sets of atomic values and of references, methods and late binding, and user-defined abstract data types and their methods. To test the maturity of object-relational technology relative to relational technology, we provide both an object-relational version of BUCKY and a relational equivalent thereof (i.e., a relational BUCKY simulation). Finally, we briefly discuss the initial performance results and lessons that resulted from applying BUCKY to one of the early object-relational database system products.

The STRIP rule system for efficiently maintaining derived data
Brad Adelberg, Hector Garcia-Molina, Jennifer Widom
Pages: 147-158
DOI: 10.1145/253260.253287
Derived data is maintained in a database system to correlate and summarize base data which records real world facts. As base data changes, derived data needs to be recomputed. This is often implemented by writing active rules that are triggered by changes to base data. In a system with rapidly changing base data, a database with a standard rule system may consume most of its resources running rules to recompute data. This paper presents the rule system implemented as part of the STandard Real-time Information Processor (STRIP). The STRIP rule system is an extension of SQL3-type rules that allows groups of rule actions to be batched together to reduce the total recomputation load on the system. In this paper we describe the syntax and semantics of the STRIP rule system, present an example set of rules to maintain stock index and theoretical option prices in a program trading application, and report the results of experiments performed on the running system. The experiments verify that STRIP's rules allow much more efficient derived data maintenance than conventional rules without batching.

An array-based algorithm for simultaneous multidimensional aggregates
Yihong Zhao, Prasad M. Deshpande, Jeffrey F. Naughton
Pages: 159-170
DOI: 10.1145/253260.253288
Computing multiple related group-bys and aggregates is one of the core operations of On-Line Analytical Processing (OLAP) applications. Recently, Gray et al. [GBLP95] proposed the “Cube” operator, which computes group-by aggregations over all possible subsets of the specified dimensions. The rapid acceptance of the importance of this operator has led to a variant of the Cube being proposed for the SQL standard. Several efficient algorithms for Relational OLAP (ROLAP) have been developed to compute the Cube. However, to our knowledge there is nothing in the literature on how to compute the Cube for Multidimensional OLAP (MOLAP) systems, which store their data in sparse arrays rather than in tables. In this paper, we present a MOLAP algorithm to compute the Cube, and compare it to a leading ROLAP algorithm. The comparison between the two is interesting, since although they are computing the same function, one is value-based (the ROLAP algorithm) whereas the other is position-based (the MOLAP algorithm). Our tests show that, given appropriate compression techniques, the MOLAP algorithm is significantly faster than the ROLAP algorithm. In fact, the difference is so pronounced that this MOLAP algorithm may be useful for ROLAP systems as well as MOLAP systems, since in many cases, instead of cubing a table directly, it is faster to first convert the table to an array, cube the array, then convert the result back to a table.
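The position-based flavor of the MOLAP computation can be sketched as follows. This naive version computes each group-by independently from the array, whereas the paper's algorithm shares work across overlapping group-bys and exploits compression:

```python
from itertools import combinations
import numpy as np

def cube_from_array(a):
    # One result per subset of dimensions: summing out a subset of axes
    # of the array yields that group-by directly from cell positions,
    # with no key comparisons at all (the position-based view).
    return {axes: a.sum(axis=axes)
            for r in range(a.ndim + 1)
            for axes in combinations(range(a.ndim), r)}
```

The key is the empty subset (no axes summed out) down to the full subset (the grand total), giving all 2^d group-bys of the Cube.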

Online aggregation
Joseph M. Hellerstein, Peter J. Haas, Helen J. Wang
Pages: 171-182
DOI: 10.1145/253260.253291
Aggregation in traditional database systems is performed in batch mode: a query is submitted, the system processes a large volume of data over a long period of time, and, eventually, the final answer is returned. This archaic approach is frustrating to users and has been abandoned in most other areas of computing. In this paper we propose a new online aggregation interface that permits users to both observe the progress of their aggregation queries and control execution on the fly. After outlining usability and performance requirements for a system supporting online aggregation, we present a suite of techniques that extend a database system to meet these requirements. These include methods for returning the output in random order, for providing control over the relative rate at which different aggregates are computed, and for computing running confidence intervals. Finally, we report on an initial implementation of online aggregation in POSTGRES.
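A running estimate with a shrinking confidence interval, the core of the online interface, can be sketched as follows. This is a textbook CLT-based interval under the assumption that rows arrive in random order; it is not the paper's estimators:

```python
import math
import random

def online_avg(stream, z=1.96):
    # After each retrieved row, yield (running mean, half-width of an
    # approximate 95% confidence interval).  The interval narrows as
    # more rows are seen, letting a user stop early.
    n = s = s2 = 0.0
    for x in stream:
        n += 1
        s += x
        s2 += x * x
        mean = s / n
        if n > 1:
            var = max(s2 / n - mean * mean, 0.0) * n / (n - 1)
            half = z * math.sqrt(var / n)
        else:
            half = float("inf")
        yield mean, half
```

Random-order retrieval is exactly why the paper's techniques for randomized access matter: without it, the running estimate can be badly biased.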

Balancing push and pull for data broadcast
Swarup Acharya, Michael Franklin, Stanley Zdonik
Pages: 183-194
DOI: 10.1145/253260.253293
The increasing ability to interconnect computers through internet-working, wireless networks, high-bandwidth satellite, and cable networks has spawned a new class of information-centered applications based on data dissemination. These applications employ broadcast to deliver data to very large client populations. We have proposed the Broadcast Disks paradigm [Zdon94, Acha95b] for organizing the contents of a data broadcast program and for managing client resources in response to such a program. Our previous work on Broadcast Disks focused exclusively on the “push-based” approach, where data is sent out on the broadcast channel according to a periodic schedule, in anticipation of client requests. In this paper, we study how to augment the push-only model with a “pull-based” approach of using a backchannel to allow clients to send explicit requests for data to the server. We analyze the scalability and performance of a broadcast-based system that integrates push and pull and study the impact of this integration on both the steady state and warm-up performance of clients. Our results show that a client backchannel can provide significant performance improvement in the broadcast environment, but that unconstrained use of the backchannel can result in scalability problems due to server saturation. We propose and investigate a set of three techniques that can delay the onset of saturation and thus enhance the performance and scalability of the system.

InfoSleuth: agent-based semantic integration of information in open and dynamic environments
R. J. Bayardo, Jr., W. Bohrer, R. Brice, A. Cichocki, J. Fowler, A. Helal, V. Kashyap, T. Ksiezyk, G. Martin, M. Nodine, M. Rashid, M. Rusinkiewicz, R. Shea, C. Unnikrishnan, A. Unruh, D. Woelk
Pages: 195-206
DOI: 10.1145/253260.253294
The goal of the InfoSleuth project at MCC is to exploit and synthesize new technologies into a unified system that retrieves and processes information in an ever-changing network of information sources. InfoSleuth has its roots in the Carnot project at MCC, which specialized in integrating heterogeneous information bases. However, recent emerging technologies such as internetworking and the World Wide Web have significantly expanded the types, availability, and volume of data available to an information management system. Furthermore, in these new environments, there is no formal control over the registration of new information sources, and applications tend to be developed without complete knowledge of the resources that will be available when they are run. Federated database projects such as Carnot that do static data integration do not scale up and do not cope well with this ever-changing environment. On the other hand, recent Web technologies, based on keyword search engines, are scalable but, unlike federated databases, are incapable of accessing information based on concepts. In this experience paper, we describe the architecture, design, and implementation of a working version of InfoSleuth. We show how InfoSleuth integrates new technological developments such as agent technology, domain ontologies, brokerage, and internet computing, in support of mediated interoperation of data and services in a dynamic and open environment. We demonstrate the use of information brokering and domain ontologies as key elements for scalability.

STARTS: Stanford proposal for Internet meta-searching
Luis Gravano, Chen-Chuan K. Chang, Héctor García-Molina, Andreas Paepcke
Pages: 207-218
DOI: 10.1145/253260.253299
Document sources are available everywhere, both within the internal networks of organizations and on the Internet. Even individual organizations use search engines from different vendors to index their internal document collections. These search engines are typically incompatible in that they support different query models and interfaces, they do not return enough information with the query results for adequate merging of the results, and finally, in that they do not export metadata about the collections that they index (e.g., to assist in resource discovery). This paper describes STARTS, an emerging protocol for Internet retrieval and search that facilitates the task of querying multiple document sources. STARTS has been developed in a unique way. It is not a standard, but a group effort coordinated by Stanford's Digital Library project, and involving over 11 companies and organizations. The objective of this paper is not only to give an overview of the STARTS protocol proposal, but also to discuss the process that led to its definition.

On saying “Enough already!” in SQL
Michael J. Carey, Donald Kossmann
Pages: 219-230
DOI: 10.1145/253260.253302
In this paper, we study a simple SQL extension that enables query writers to explicitly limit the cardinality of a query result. We examine its impact on the query optimization and run-time execution components of a relational DBMS, presenting two approaches—a Conservative approach and an Aggressive approach—to exploiting cardinality limits in relational query plans. Results obtained from an empirical study conducted using DB2 demonstrate the benefits of the SQL extension and illustrate the tradeoffs between our two approaches to implementing it.

A framework for implementing hypothetical queries
Timothy Griffin, Richard Hull
Pages: 231-242
DOI: 10.1145/253260.253304
Previous approaches to supporting hypothetical queries have been “eager”: some representation of the hypothetical state (or the corresponding delta) is materialized, and query evaluation is filtered through that representation. This paper develops a framework for evaluating hypothetical queries using a “lazy” approach, or using a hybrid of eager and lazy approaches.
We focus on queries having the form “Q when {{U}}” where Q is a relational algebra query and U is an update expression. The value assigned to this query in state DB is the value that Q would return in the state resulting from executing U on DB. Nesting of the keyword when is permitted, and U may involve a sequence of several atomic updates.
We present an equational theory for queries involving when that can be used as a basis for optimization. This theory is very different from traditional rules for the relational algebra, because the semantics of when is unlike the semantics of the algebra operators. Our theory is based on the observation that hypothetical states can be represented as substitutions, similar to those arising in functional and logic programming. Furthermore, hypothetical queries of the form Q when {{U}} can be thought of as representing the suspended application of a substitution. Using the equational theory we develop an approach to optimizing the evaluation of hypothetical queries that uses deltas in the sense of Heraclitus, and permits a range of evaluation strategies from lazy to eager.
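The lazy evaluation of “Q when {{U}}” can be sketched as follows. This is a toy model in which the suspended update is a set of inserted and deleted tuples; the paper's substitution-based treatment is far more general:

```python
def base(tuples):
    # A stored relation, exposed as a zero-argument scan so that base
    # and hypothetical states share one interface.
    return lambda: iter(tuples)

def when(state, inserts=(), deletes=()):
    # "Q when {{U}}", lazily: the update U is held as a suspended delta
    # and applied only while tuples stream past the query; no
    # hypothetical state is ever materialized.  Nesting works because
    # the result is itself a scannable state.
    deleted = set(deletes)
    def scan():
        for t in state():
            if t not in deleted:
                yield t
        for t in inserts:
            if t not in deleted:
                yield t
    return scan

def select(pred, state):
    # Relational selection, evaluated over a base or hypothetical state.
    return [t for t in state() if pred(t)]
```

An eager strategy would instead materialize `list(scan())` once; a hybrid could materialize only the hot inner `when` of a nested expression.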
|
|
|
High-performance sorting on networks of workstations |
| |
Andrea C. Arpaci-Dusseau,
Remzi H. Arpaci-Dusseau,
David E. Culler,
Joseph M. Hellerstein,
David A. Patterson
|
|
Pages: 243-254 |
|
doi>10.1145/253260.253322 |
|
Full text: PDF
|
|
We report the performance of NOW-Sort, a collection of sorting implementations on a Network of Workstations (NOW). We find that parallel sorting on a NOW is competitive with sorting on the large-scale SMPs that have traditionally held the performance records. On a 64-node cluster, we sort 6.0 GB in just under one minute, while a 32-node cluster finishes the Datamation benchmark in 2.41 seconds.
Our implementations can be applied to a variety of disk, memory, and processor configurations; we highlight salient issues for tuning each component of the system. We evaluate the use of commodity operating systems and hardware for parallel sorting. We find existing OS primitives for memory management and file access adequate. Due to aggregate communication and disk bandwidth requirements, the bottleneck of our system is the workstation I/O bus.
|
|
|
Dynamic itemset counting and implication rules for market basket data |
| |
Sergey Brin,
Rajeev Motwani,
Jeffrey D. Ullman,
Shalom Tsur
|
|
Pages: 255-264 |
|
doi>10.1145/253260.253325 |
|
Full text: PDF
|
|
We consider the problem of analyzing market-basket data and present several important contributions. First, we present a new algorithm for finding large itemsets which uses fewer passes over the data than classic algorithms, and yet uses fewer candidate itemsets than methods based on sampling. We investigate the idea of item reordering, which can improve the low-level efficiency of the algorithm. Second, we present a new way of generating “implication rules,” which are normalized based on both the antecedent and the consequent and are truly implications (not simply a measure of co-occurrence), and we show how they produce more intuitive results than other methods. Finally, we show how different characteristics of real data, as opposed to synthetic data, can dramatically affect the performance of the system and the form of the results.
|
|
|
Beyond market baskets: generalizing association rules to correlations |
| |
Sergey Brin,
Rajeev Motwani,
Craig Silverstein
|
|
Pages: 265-276 |
|
doi>10.1145/253260.253327 |
|
Full text: PDF
|
|
One of the most well-studied problems in data mining is mining for association rules in market basket data. Association rules, whose significance is measured via support and confidence, are intended to identify rules of the type, “A customer purchasing item A often also purchases item B.” Motivated by the goal of generalizing beyond market baskets and the association rules used with them, we develop the notion of mining rules that identify correlations (generalizing associations), and we consider both the absence and presence of items as a basis for generating rules. We propose measuring significance of associations via the chi-squared test for correlation from classical statistics. This leads to a measure that is upward closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between correlated and uncorrelated itemsets in the lattice. We develop pruning strategies and devise an efficient algorithm for the resulting problem. We demonstrate its effectiveness by testing it on census data and finding term dependence in a corpus of text documents, as well as on synthetic data.
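As a toy illustration of judging an association by correlation rather than by support and confidence, the chi-squared statistic for a pair of items can be computed from a 2x2 contingency table over presence/absence. The counts below are made up; this is the classical test, not the paper's lattice algorithm:

```python
# Chi-squared statistic for two items A and B from a 2x2 contingency
# table (A present/absent x B present/absent). Under independence the
# expected count of each cell is (row total * column total) / n.

def chi_squared_2x2(n_ab, n_a_notb, n_nota_b, n_nota_notb):
    n = n_ab + n_a_notb + n_nota_b + n_nota_notb
    row = [n_ab + n_a_notb, n_nota_b + n_nota_notb]   # A present / absent
    col = [n_ab + n_nota_b, n_a_notb + n_nota_notb]   # B present / absent
    obs = [[n_ab, n_a_notb], [n_nota_b, n_nota_notb]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (obs[i][j] - expected) ** 2 / expected
    return chi2

# Hypothetical baskets where A and B co-occur far more often than
# independence predicts; the statistic ends up well above the 3.84
# cutoff for rejecting independence at the 5% level (1 d.o.f.).
print(round(chi_squared_2x2(60, 10, 10, 120), 2))
```

Note the test is symmetric in A and B (a correlation, not a directed rule), which is exactly the shift in viewpoint the paper advocates.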
|
|
|
Scalable parallel data mining for association rules |
| |
Eui-Hong Han,
George Karypis,
Vipin Kumar
|
|
Pages: 277-288 |
|
doi>10.1145/253260.253330 |
|
Full text: PDF
|
|
One of the important problems in data mining is discovering association rules from databases of transactions where each transaction consists of a set of items. The most time consuming operation in this discovery process is the computation of the frequency of the occurrences of interesting subsets of items (called candidates) in the database of transactions. To prune the exponentially large space of candidates, most existing algorithms consider only those candidates that have a user-defined minimum support. Even with the pruning, the task of finding all association rules requires a lot of computation power and time. Parallel computers offer a potential solution to the computation requirement of this task, provided efficient and scalable parallel algorithms can be designed. In this paper, we present two new parallel algorithms for mining association rules. The Intelligent Data Distribution algorithm efficiently uses the aggregate memory of the parallel computer by employing an intelligent candidate partitioning scheme and uses an efficient communication mechanism to move data among the processors. The Hybrid Distribution algorithm further improves upon the Intelligent Data Distribution algorithm by dynamically partitioning the candidate set to maintain good load balance. The experimental results on a Cray T3D parallel computer show that the Hybrid Distribution algorithm scales linearly, exploits the aggregate memory better, and can generate more association rules with a single scan of the database per pass.
|
|
|
Efficiently supporting ad hoc queries in large datasets of time sequences |
| |
Flip Korn,
H. V. Jagadish,
Christos Faloutsos
|
|
Pages: 289-300 |
|
doi>10.1145/253260.253332 |
|
Full text: PDF
|
|
Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the entire dataset on disk. While compression can be used to decrease the size of the dataset, compressed data is notoriously difficult to index or access.
In this paper we consider a very large dataset comprising multiple distinct time sequences. Each point in the sequence is a numerical value. We show how to compress such a dataset into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is uncompressed. Experiments on large, real world datasets (AT&T customer calling patterns) show that the proposed method achieves an average of less than 5% error in any data value after compressing to a mere 2.5% of the original space (i.e., a 40:1 compression ratio), with these numbers not very sensitive to dataset size. Experiments on aggregate queries achieved a 0.5% reconstruction error with a space requirement under 2%.
|
|
|
DEVise: integrated querying and visual exploration of large datasets |
| |
M. Livny,
R. Ramakrishnan,
K. Beyer,
G. Chen,
D. Donjerkovic,
S. Lawande,
J. Myllymaki,
K. Wenger
|
|
Pages: 301-312 |
|
doi>10.1145/253260.253335 |
|
Full text: PDF
|
|
DEVise is a data exploration system that allows users to easily develop, browse, and share visual presentations of large tabular datasets (possibly containing or referencing multimedia objects) from several sources. The DEVise framework is being implemented in a tool that has already been successfully applied to a variety of real applications by a number of user groups.
Our emphasis is on developing an intuitive yet powerful set of querying and visualization primitives that can be easily combined to develop a rich set of visual presentations that integrate data from a wide range of application domains. While DEVise is a powerful visualization tool, its greatest strengths are the ability to interactively explore a visual presentation of the data at any level of detail (including retrieving individual data records), and the ability to seamlessly query and combine data from a variety of local and remote sources. In this paper, we present the DEVise framework, describe the current tool, and report on our experience in applying it to several real applications.
|
|
|
Partitioned garbage collection of a large object store |
| |
Umesh Maheshwari,
Barbara Liskov
|
|
Pages: 313-323 |
|
doi>10.1145/253260.253338 |
|
Full text: PDF
|
|
We present new techniques for efficient garbage collection in a large persistent object store. The store is divided into partitions that are collected independently using information about inter-partition references. This information is maintained on disk so that it can be recovered after a crash. We use new techniques to organize and update this information while avoiding disk accesses. We also present a new global marking scheme to collect cyclic garbage across partitions. Global marking is piggybacked on partitioned collection; the result is an efficient scheme that preserves the localized nature of partitioned collection, yet is able to collect all garbage.
We have implemented the part of garbage collection responsible for maintaining information about inter-partition references. We present a performance study to evaluate this work; the results show that our techniques result in substantial savings in the usage of disk and memory.
|
|
|
Size separation spatial join |
| |
Nick Koudas,
Kenneth C. Sevcik
|
|
Pages: 324-335 |
|
doi>10.1145/253260.253340 |
|
Full text: PDF
|
|
We introduce a new algorithm to compute the spatial join of two or more spatial data sets, when indexes are not available on them. Size Separation Spatial Join (S3J) imposes a hierarchical decomposition of the data space and, in contrast with previous approaches, requires no replication of entities from the input data sets. Thus its execution time depends only on the sizes of the joined data sets.
We describe S3J and present an analytical evaluation of its I/O and processor requirements comparing them with those of previously proposed algorithms for the same problem. We show that S3J has relatively simple cost estimation formulas that can be exploited by a query optimizer. S3J can be efficiently implemented using software already present in many relational systems. In addition, we introduce Dynamic Spatial Bitmaps (DSB), a new technique that enables S3J to dynamically or statically exploit bitmap query processing techniques.
Finally, we present experimental results for a prototype implementation of S3J involving real and synthetic data sets for a variety of data distributions. Our experimental results are consistent with our analytical observations and demonstrate the performance benefits of S3J over alternative approaches that have been proposed recently.
|
|
|
Building a scaleable geo-spatial DBMS: technology, implementation, and evaluation |
| |
Jignesh Patel,
JieBing Yu,
Navin Kabra,
Kristin Tufte,
Biswadeep Nag,
Josef Burger,
Nancy Hall,
Karthikeyan Ramasamy,
Roger Lueder,
Curt Ellmann,
Jim Kupsch,
Shelly Guo,
Johan Larson,
David De Witt,
Jeffrey Naughton
|
|
Pages: 336-347 |
|
doi>10.1145/253260.253342 |
|
Full text: PDF
|
|
This paper presents a number of new techniques for parallelizing geo-spatial database systems and discusses their implementation in the Paradise object-relational database system. The effectiveness of these techniques is demonstrated using a variety of complex geo-spatial queries over a 120 GB global geo-spatial data set.
|
|
|
A toolkit for negotiation support interfaces to multi-dimensional data |
| |
Michael Gebhardt,
Matthias Jarke,
Stephan Jacobs
|
|
Pages: 348-356 |
|
doi>10.1145/253260.253344 |
|
Full text: PDF
|
|
CoDecide is an experimental user interface toolkit that offers an extension to spreadsheet concepts specifically geared towards support for cooperative analysis of the kinds of multi-dimensional data encountered in data warehousing. It is distinguished from previous proposals by direct support for drill-down/roll-up analysis without redesign of an interface; more importantly, CoDecide can link multiple views on a data cube for synchronous or asynchronous cooperation by multiple analysts, through a conceptual model visualizing the problem dimensions on so-called tapes. Tapes generalize the ideas of ranging and pivoting in current data warehouses for the multi-perspective and multi-user case. CoDecide allows the rapid composition of multi-matrix interfaces and their linkage to underlying data sources. A LAN version of CoDecide has been used in a number of design decision support applications. A WWW version representing externally materialized views on databases is currently under development.
|
|
|
Distance-based indexing for high-dimensional metric spaces |
| |
Tolga Bozkaya,
Meral Ozsoyoglu
|
|
Pages: 357-368 |
|
doi>10.1145/253260.253345 |
|
Full text: PDF
|
|
In many database applications, one of the common queries is to find approximate matches to a given query item from a collection of data items. For example, given an image database, one may want to retrieve all images that are similar to a given query image. Distance-based index structures are proposed for applications where the data domain is high dimensional, or the distance function used to compute distances between data objects is non-Euclidean. In this paper, we introduce a distance-based index structure called the multi-vantage point (mvp) tree for similarity queries on high-dimensional metric spaces. The mvp-tree uses more than one vantage point to partition the space into spherical cuts at each level. It also utilizes the pre-computed (at construction time) distances between the data points and the vantage points. We have done experiments to compare mvp-trees with vp-trees, which have a similar partitioning strategy but use only one vantage point at each level and do not make use of the pre-computed distances. Empirical studies show that the mvp-tree outperforms the vp-tree by 20% to 80% for varying query ranges and different distance distributions.
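The pruning that vantage points buy can be sketched with precomputed distances and the triangle inequality. This is a toy one-dimensional example of the general vantage-point idea, not the mvp-tree structure itself:

```python
# Toy vantage-point pruning: given precomputed distances d(x, v) to a
# vantage point v, the triangle inequality guarantees that any x with
# |d(q, v) - d(x, v)| > r cannot satisfy d(q, x) <= r, so a range query
# can skip x without ever computing d(q, x).

def range_query(points, q, r, v, dist):
    d_qv = dist(q, v)
    pre = {x: dist(x, v) for x in points}   # computed once at build time
    out = []
    for x in points:
        if abs(d_qv - pre[x]) > r:          # pruned by triangle inequality
            continue
        if dist(q, x) <= r:                 # pay for a real distance check
            out.append(x)
    return out

dist = lambda a, b: abs(a - b)
pts = [1, 4, 7, 10, 30]
print(range_query(pts, q=6, r=2, v=0, dist=dist))  # [4, 7]
```

The pruning works for any metric, not just Euclidean distance, which is why vantage-point methods suit the non-Euclidean settings the abstract mentions.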
|
|
|
The SR-tree: an index structure for high-dimensional nearest neighbor queries |
| |
Norio Katayama,
Shin'ichi Satoh
|
|
Pages: 369-380 |
|
doi>10.1145/253260.253347 |
|
Full text: PDF
|
|
Recently, similarity queries on feature vectors have been widely used to perform content-based retrieval of images. To apply this technique to large databases, it is required to develop multidimensional index structures supporting nearest neighbor queries efficiently. The SS-tree had been proposed for this purpose and is known to outperform other index structures such as the R*-tree and the K-D-B-tree. One of its most important features is that it employs bounding spheres rather than bounding rectangles for the shape of regions. However, we demonstrate in this paper that bounding spheres occupy much larger volume than bounding rectangles with high-dimensional data and that this reduces search efficiency. To overcome this drawback, we propose a new index structure called the SR-tree (Sphere/Rectangle-tree) which integrates bounding spheres and bounding rectangles. A region of the SR-tree is specified by the intersection of a bounding sphere and a bounding rectangle. Incorporating bounding rectangles permits neighborhoods to be partitioned into smaller regions than the SS-tree and improves the disjointness among regions. This enhances the performance on nearest neighbor queries especially for high-dimensional and non-uniform data which can be practical in actual image/video similarity indexing. We include the performance test results that verify this advantage of the SR-tree and show that the SR-tree outperforms both the SS-tree and the R*-tree.
|
|
|
Wave-indices: indexing evolving databases |
| |
Narayanan Shivakumar,
Héctor García-Molina
|
|
Pages: 381-392 |
|
doi>10.1145/253260.253349 |
|
Full text: PDF
|
|
In many applications, new data is being generated every day. Often an index of the data of a past window of days is required to answer queries efficiently. For example, in a warehouse one may need an index on the sales records of the last week for efficient data mining, or in a Web service one may provide an index of Netnews articles of the past month. In this paper, we propose a variety of wave indices where the data of a new day can be efficiently added, and old data can be quickly expired, to maintain the required window. We compare these schemes based on several system performance measures, such as storage, query response time, and maintenance work, as well as on their simplicity and ease of coding.
|
|
|
On-line warehouse view maintenance |
| |
Dallan Quass,
Jennifer Widom
|
|
Pages: 393-404 |
|
doi>10.1145/253260.253352 |
|
Full text: PDF
|
|
Data warehouses store materialized views over base data from external sources. Clients typically perform complex read-only queries on the views. The views are refreshed periodically by maintenance transactions, which propagate large batch updates from the base tables. In current warehousing systems, maintenance transactions usually are isolated from client read activity, limiting availability and/or size of the warehouse. We describe an algorithm called 2VNL that allows warehouse maintenance transactions to run concurrently with readers. By logically maintaining two versions of the database, no locking is required and serializability is guaranteed. We present our algorithm, explain its relationship to other multi-version concurrency control algorithms, and describe how it can be implemented on top of a conventional relational DBMS using a query rewrite approach.
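The two-version idea can be sketched with a toy in-memory store. This is illustrative only, not the paper's 2VNL algorithm: readers always see the committed slot, while a maintenance batch writes the other slot and then flips, so readers never block.

```python
# Toy two-version store: each key holds two value slots plus a global
# indicator of which slot is current. A maintenance batch writes the
# non-current slot, invisible to readers, then atomically flips the
# indicator — no reader/writer locking is needed.

class TwoVersionStore:
    def __init__(self, data):
        self.cur = 0                             # slot readers see
        self.slots = {k: [v, v] for k, v in data.items()}

    def read(self, key):
        return self.slots[key][self.cur]

    def maintain(self, updates):
        nxt = 1 - self.cur
        for k, v in updates.items():             # batch is invisible so far
            self.slots.setdefault(k, [None, None])[nxt] = v
        for k, pair in self.slots.items():       # carry over untouched keys
            if k not in updates:
                pair[nxt] = pair[self.cur]
        self.cur = nxt                           # the switch: readers now see it

store = TwoVersionStore({"total_sales": 100})
store.maintain({"total_sales": 150})
print(store.read("total_sales"))  # 150
```

The query-rewrite implementation the abstract mentions plays a similar role: view queries are rewritten to select whichever version column is current.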
|
|
|
Supporting multiple view maintenance policies |
| |
Latha S. Colby,
Akira Kawaguchi,
Daniel F. Lieuwen,
Inderpal Singh Mumick,
Kenneth A. Ross
|
|
Pages: 405-416 |
|
doi>10.1145/253260.253353 |
|
Full text: PDF
|
|
Materialized views and view maintenance are becoming increasingly important in practice. In order to satisfy different data currency and performance requirements, a number of view maintenance policies have been proposed. Immediate maintenance involves a potential refresh of the view after every update to the deriving tables. When staleness of views can be tolerated, a view may be refreshed periodically or (on-demand) when it is queried. The maintenance policies that are chosen for views have implications on the validity of the results of queries and affect the performance of queries and updates. In this paper, we investigate a number of issues related to supporting multiple views with different maintenance policies.
We develop formal notions of consistency for views with different maintenance policies. We then introduce a model based on view groupings for view maintenance policy assignment, and provide algorithms, based on the viewgroup model, that allow consistency of views to be guaranteed. Next, we conduct a detailed study of the performance aspects of view maintenance policies based on an actual implementation of our model. The performance study investigates the trade-offs between different maintenance policy assignments. Our analysis of both the consistency and performance aspects of various view maintenance policies is important in making correct maintenance policy assignments.
|
|
|
Efficient view maintenance at data warehouses |
| |
D. Agrawal,
A. El Abbadi,
A. Singh,
T. Yurek
|
|
Pages: 417-427 |
|
doi>10.1145/253260.253355 |
|
Full text: PDF
|
|
We present incremental view maintenance algorithms for a data warehouse derived from multiple distributed autonomous data sources. We begin with a detailed framework for analyzing view maintenance algorithms for multiple data sources with concurrent updates. Earlier approaches for view maintenance in the presence of concurrent updates typically require two types of messages: one to compute the view change due to the initial update and the other to compensate the view change due to interfering concurrent updates. The algorithms developed in this paper instead perform the compensation locally by using the information that is already available at the data warehouse. The first algorithm, termed SWEEP, ensures complete consistency of the view at the data warehouse in the presence of concurrent updates. Previous algorithms for incremental view maintenance either required a quiescent state at the data warehouse or required an exponential number of messages in terms of the data sources. In contrast, this algorithm does not require that the data warehouse be in a quiescent state for incorporating the new views, and the message complexity is linear in the number of data sources. The second algorithm, termed Nested SWEEP, attempts to compute a composite view change for multiple updates that occur concurrently while maintaining strong consistency.
|
|
|
Eliminating costly redundant computations from SQL trigger executions |
| |
François Llirbat,
Françoise Fabret,
Eric Simon
|
|
Pages: 428-439 |
|
doi>10.1145/253260.253357 |
|
Full text: PDF
|
|
Active database systems are now in widespread use. The use of triggers in these systems, however, is difficult because of the complex interaction between triggers, transactions, and application programs. Repeated calculations of rules may incur costly redundant computations in rule conditions and actions. In this paper, we focus on active relational database systems supporting SQL triggers. In this context, we provide a powerful and complete solution to eliminate redundant computations of SQL triggers when they are costly. We define a model to describe programs, rules and their interactions. We provide algorithms to extract invariant subqueries from a trigger's condition and action. We define heuristics to memorize the most “profitable” invariants. Finally, we develop a rewriting technique that enables us to generate and execute the optimized code of SQL triggers.
|
|
|
Temporal aggregation in active database rules |
| |
Iakovos Motakis,
Carlo Zaniolo
|
|
Pages: 440-451 |
|
doi>10.1145/253260.253359 |
|
Full text: PDF
|
|
An important feature of many advanced active database prototypes is support for rules triggered by complex patterns of events. Their composite event languages provide powerful primitives for event-based temporal reasoning. In fact, with one important exception, their expressive power matches and surpasses that of sophisticated languages offered by Time Series Management Systems (TSMS), which have been extensively used for temporal data analysis and knowledge discovery. This exception pertains to temporal aggregation, for which, current active database systems offer only minimal support, if any.
In this paper, we introduce the language TREPL, which addresses this problem. The TREPL prototype, under development at UCLA, offers primitives for temporal aggregation that exceed the capabilities of state-of-the-art composite event languages, and are comparable to those of TSMS languages. TREPL also demonstrates a rigorous and general approach to the definition of composite event language semantics. The meaning of a TREPL rule is formally defined by mapping it into a set of Datalog1S rules, whose logic-based semantics characterizes the behavior of the original rule. This approach naturally handles temporal aggregates, including user-defined ones, and is also applicable to other composite event languages, such as ODE, Snoop and SAMOS.
|
|
|
Association rules over interval data |
| |
R. J. Miller,
Y. Yang
|
|
Pages: 452-461 |
|
doi>10.1145/253260.253361 |
|
Full text: PDF
|
|
We consider the problem of mining association rules over interval data (that is, ordered data for which the separation between data points has meaning). We show that the measures of what rules are most important (also called rule interest) that are used for mining nominal and ordinal data do not capture the semantics of interval data. In the presence of interval data, support and confidence are no longer intuitive measures of the interest of a rule. We propose a new definition of interest for association rules that takes into account the semantics of interval data. We developed an algorithm for mining association rules under the new definition and overview our experience using the algorithm on large real-life datasets.
|
|
|
Secure transaction processing in firm real-time database systems |
| |
Binto George,
Jayant Haritsa
|
|
Pages: 462-473 |
|
doi>10.1145/253260.253362 |
|
Full text: PDF
|
|
Many real-time database applications arise in safety-critical installations and military systems where enforcing security is crucial to the success of the enterprise. A secure real-time database system has to simultaneously satisfy two requirements: guarantee data security and minimize the number of missed transaction deadlines. We investigate here the performance implications, in terms of missed deadlines, of guaranteeing security in a real-time database system. In particular, we focus on the concurrency control aspects of this issue.
Our main contributions are the following: First, we identify which among the previously proposed real-time concurrency control protocols are capable of providing protection against both direct and indirect (covert channels) means of unauthorized access to data. Second, using a detailed simulation model of a firm-deadline real-time database system, we profile the real-time performance of a representative set of these secure concurrency control protocols. Our experiments show that a prioritized optimistic concurrency control protocol, OPT-WAIT, provides the best overall performance. Third, we propose and evaluate a novel dual approach to secure transaction concurrency control that allows the real-time database system to simultaneously use different concurrency control mechanisms for guaranteeing security and for improving real-time performance. By appropriately choosing these different mechanisms, we have been able to design hybrid concurrency control algorithms that provide even better performance than OPT-WAIT.
|
|
|
A unified framework for enforcing multiple access control policies |
| |
Sushil Jajodia,
Pierangela Samarati,
V. S. Subrahmanian,
Eliza Bertino
|
|
Pages: 474-485 |
|
doi>10.1145/253260.253364 |
|
Full text: PDF
|
|
Although several access control policies can be devised for controlling access to information, all existing authorization models, and the corresponding enforcement mechanisms, are based on a specific policy (usually the closed policy). As a consequence, although different policy choices are possible in theory, in practice only a specific policy can be actually applied within a given system. However, protection requirements within a system can vary dramatically, and no single policy may simultaneously satisfy them all.
In this paper we present a flexible authorization manager (FAM) that can enforce multiple access control policies within a single, unified system. FAM is based on a language through which users can specify authorizations and access control policies to be applied in controlling execution of specific actions on given objects. We formally define the language and properties required to hold on the security specifications and prove that this language can express all security specifications. Furthermore, we show that all programs expressed in this language (called FAM/CAM-programs) are also guaranteed to be consistent (i.e., no conflicting access decisions occur) and CAM-programs are complete (i.e., every access is either authorized or denied). We then illustrate how several well-known protection policies proposed in the literature can be expressed in the FAM/CAM language and how users can customize the access control by specifying their own policies. The result is an access control mechanism which is flexible, since different access control policies can all coexist in the same data system, and extensible, since it can be augmented with any new policy a specific application or user may require.
|
|
|
Revisiting commit processing in distributed database systems |
| |
Ramesh Gupta,
Jayant Haritsa,
Krithi Ramamritham
|
|
Pages: 486-497 |
|
doi>10.1145/253260.253366 |
|
Full text: PDF
|
|
A significant body of literature is available on distributed transaction commit protocols. Surprisingly, however, the relative merits of these protocols have not been studied with respect to their quantitative impact on transaction processing performance. In this paper, using a detailed simulation model of a distributed database system, we profile the transaction throughput performance of a representative set of commit protocols. A new commit protocol, OPT, that allows transactions to “optimistically” borrow uncommitted data in a controlled manner is also proposed and evaluated. The new protocol is easy to implement and incorporate in current systems, and can coexist with most other optimizations proposed earlier. For example, OPT can be combined with current industry standard protocols such as Presumed Commit and Presumed Abort.
The experimental results show that distributed commit processing can have considerably more influence than distributed data processing on the throughput performance and that the choice of commit protocol clearly affects the magnitude of this influence. Among the protocols evaluated, the new optimistic commit protocol provides the best transaction throughput performance for a variety of workloads and system configurations. In fact, OPT's peak throughput is often close to the upper bound on achievable performance. Even more interestingly, a three-phase (i.e., non-blocking) version of OPT provides better peak throughput performance than all of the standard two-phase (i.e., blocking) protocols evaluated in our study.
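The controlled "optimistic borrowing" of uncommitted data can be sketched roughly as follows. The `Lender` class and its states are hypothetical scaffolding for illustration, not the paper's actual protocol machinery:

```python
# Toy sketch of optimistic borrowing: a transaction in the PREPARED
# phase of the commit protocol lends its dirty updates; borrowers may
# commit only after the lender commits, and must abort if it aborts.

class Lender:
    def __init__(self):
        self.state = "PREPARED"   # dirty data may be lent only in this state
        self.borrowers = []

    def lend(self, borrower):
        if self.state != "PREPARED":
            raise RuntimeError("can only borrow from a prepared transaction")
        self.borrowers.append(borrower)

    def commit(self):
        self.state = "COMMITTED"
        return [(b, "may_commit") for b in self.borrowers]

    def abort(self):
        self.state = "ABORTED"
        # Cascading abort: every borrower of dirty data must also abort.
        return [(b, "must_abort") for b in self.borrowers]
```

Restricting borrowing to the prepared phase is what keeps the optimism "controlled": the lender can no longer be aborted unilaterally by its own coordinator once all participants have voted yes, so cascading aborts stay rare.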
|
|
|
Lessons from Wall Street: case studies in configuration, tuning, and distribution |
| |
Dennis Shasha
|
|
Pages: 498-501 |
|
doi>10.1145/253260.253368 |
|
Full text: PDF
|
|
Consider a setting in which
- Database speed and reliability can make the difference between prosperity and ruin.
- Money for information systems is no object.
- Data must be accessible from many points on the globe with subsecond response.
The financial industry is exactly such an environment.
This tutorial presents case studies in configuration, tuning, and distribution drawn from financial applications. The cases suggest both research and product issues and so should be of interest to the entire SIGMOD community.
|
|
|
Object-relational database systems (tutorial): principles, products and challenges |
| |
Michael J. Carey,
Nelson M. Mattos,
Anil K. Nori
|
|
Page: 502 |
|
doi>10.1145/253260.253370 |
|
Full text: PDF
|
|
Object-relational database systems, a.k.a. “universal servers,” are emerging as the next major generation of commercial database system technology. Products from relational DBMS vendors including IBM, Informix, Oracle, UniSQL, and others, include object-relational features today, and all of the major vendors appear to be on course to delivering full object-relational support in their products over the next few years. In addition, the SQL3 standard is rapidly solidifying in this area. The goal of this tutorial is to explain the key features of object-relational database systems, review what today's products provide, and then look ahead to where these systems are heading. The presentation will be aimed at a general SIGMOD audience, and should therefore be appropriate for users, practitioners, and/or researchers who want to learn about object-relational database systems.
|
|
|
Databases on the Web: technologies for federation architectures and case studies |
| |
Ralf Kramer
|
|
Pages: 503-506 |
|
doi>10.1145/253260.253372 |
|
Full text: PDF
|
|
|
|
|
Data warehousing and OLAP for decision support |
| |
Surajit Chaudhuri,
Umeshwar Dayal
|
|
Pages: 507-508 |
|
doi>10.1145/253260.253373 |
|
Full text: PDF
|
|
On-Line Analytical Processing (OLAP) and Data Warehousing are decision support technologies. Their goal is to enable enterprises to gain competitive advantage by exploiting the ever-growing amount of data that is collected and stored in corporate databases and files for better and faster decision making. Over the past few years, these technologies have experienced explosive growth, both in the number of products and services offered, and in the extent of coverage in the trade press. Vendors, including all database companies, are paying increasing attention to all aspects of decision support.
|
|
|
Query optimization at the crossroads |
| |
Surajit Chaudhuri
|
|
Page: 509 |
|
doi>10.1145/253260.253374 |
|
Full text: PDF
|
|
|
|
|
Delaunay: a database visualization system |
| |
Isabel F. Cruz,
M. Averbuch,
Wendy T. Lucas,
Melissa Radzyminski,
Kirby Zhang
|
|
Pages: 510-513 |
|
doi>10.1145/253260.253376 |
|
Full text: PDF
|
|
Visual query systems have traditionally supported a set of pre-defined visual displays. We describe the Delaunay system, which supports visualizations of object-oriented databases specified by the user with a visual constraint-based query language. The highlights of our approach are the expressiveness of the visual query language, the efficiency of the query engine, and the overall flexibility and extensibility of the framework. The user interface is implemented using Java and is available on the WWW.
|
|
|
Picture programming project |
| |
Nita Goyal,
Charles Hoch,
Ravi Krishnamurthy,
Brian Meckler,
Michael Suchow,
Moshe Zloof
|
|
Pages: 514-516 |
|
doi>10.1145/253260.253377 |
|
Full text: PDF
|
|
|
|
|
DEVise (demo abstract): integrated querying and visual exploration of large datasets |
| |
M. Livny,
R. Ramakrishnan,
K. Beyer,
G. Chen,
D. Donjerkovic,
S. Lawande,
J. Myllymaki,
K. Wenger
|
|
Pages: 517-520 |
|
doi>10.1145/253260.253379 |
|
Full text: PDF
|
|
DEVise is a data exploration system that allows users to easily develop, browse, and share visual presentations of large tabular datasets (possibly containing or referencing multimedia objects) from several sources. The DEVise framework, implemented in a tool that has been already successfully applied to a variety of real applications by a number of user groups, makes several contributions. In particular, it combines support for extended relational queries with powerful data visualization features. Datasets much larger than available main memory can be handled—DEVise is currently being used to visualize datasets well in excess of 100MB—and data can be interactively examined at several levels of detail: all the way from meta-data summarizing the entire dataset, to large subsets of the actual data, to individual data records. Combining querying (in general, data processing) with visualizations gives us a very versatile tool, and presents several novel challenges.
Our emphasis is on developing an intuitive yet powerful set of querying and visualization primitives that can be easily combined to develop a rich set of visual presentations that integrate data from a wide range of application domains. In this demo, we will present a number of examples of the use of the DEVise tool for visualizing and interactively exploring very large datasets, and report on our experience in applying it to several real applications.
|
|
|
SEMCOG: an object-based image retrieval system and its visual query interface |
| |
Wen-Syan Li,
K. Selçuk Candan,
Kyoji Hirata,
Yoshinori Hara
|
|
Pages: 521-524 |
|
doi>10.1145/253260.253384 |
|
Full text: PDF
|
|
|
|
|
The Context Interchange mediator prototype |
| |
S. Bressan,
C. H. Goh,
K. Fynn,
M. Jakobisiak,
K. Hussein,
H. Kon,
T. Lee,
S. Madnick,
T. Pena,
J. Qu,
A. Shum,
M. Siegel
|
|
Pages: 525-527 |
|
doi>10.1145/253260.253389 |
|
Full text: PDF
|
|
The Context Interchange strategy presents a novel approach for mediated data access in which semantic conflicts among heterogeneous systems are not identified a priori, but are detected and reconciled by a context mediator through comparison of contexts. This paper reports on the implementation of a Context Interchange Prototype which provides a concrete demonstration of the features and benefits of this integration strategy.
|
|
|
MDM: a multiple-data model tool for the management of heterogeneous database schemes |
| |
Paolo Atzeni,
Riccardo Torlone
|
|
Pages: 528-531 |
|
doi>10.1145/253260.253393 |
|
Full text: PDF
|
|
MDM is a tool that enables the users to define schemes of different data models and to perform translations of schemes from one model to another. These functionalities can form the basis of a customizable and integrated CASE environment supporting the analysis and design of information systems. MDM has two main components: the Model Manager and the Schema Manager. The Model Manager supports a specialized user, the model engineer, in the definition of a variety of models, on the basis of a limited set of metaconstructs covering almost all known conceptual models. The Schema Manager allows designers to create and modify schemes over the defined models, and to generate, at any time, a translation of a scheme into any of the data models currently available. Translations between models are automatically derived, at definition time, by combining a predefined set of elementary transformations, which implement the standard translations between simple combinations of constructs.
|
|
|
Template-based wrappers in the TSIMMIS system |
| |
Joachim Hammer,
Héctor García-Molina,
Svetlozar Nestorov,
Ramana Yerneni,
Marcus Breunig,
Vasilis Vassalos
|
|
Pages: 532-535 |
|
doi>10.1145/253260.253395 |
|
Full text: PDF
|
|
In order to access information from a variety of heterogeneous information sources, one has to be able to translate queries and data from one data model into another. This functionality is provided by so-called (source) wrappers [4,8] which convert queries into one or more commands/queries understandable by the underlying source and transform the native results into a format understood by the application. As part of the TSIMMIS project [1, 6] we have developed hard-coded wrappers for a variety of sources (e.g., Sybase DBMS, WWW pages, etc.) including legacy systems (Folio). However, anyone who has built a wrapper before can attest that a lot of effort goes into developing and writing such a wrapper. In situations where it is important or desirable to gain access to new sources quickly, this is a major drawback. Furthermore, we have also observed that only a relatively small part of the code deals with the specific access details of the source. The rest of the code is either common among wrappers or implements query and data transformation that could be expressed in a high level, declarative fashion.
Based on these observations, we have developed a wrapper implementation toolkit [7] for quickly building wrappers. The toolkit contains a library for commonly used functions, such as for receiving queries from the application and packaging results. It also contains a facility for translating queries into source-specific commands, and for translating results into a model useful to the application. The philosophy behind our “template-based” translation methodology is as follows. The wrapper implementor specifies a set of templates (rules) written in a high level declarative language that describe the queries accepted by the wrapper as well as the objects that it returns. If an application query matches a template, an implementor-provided action associated with the template is executed to provide the native query for the underlying source. When the source returns the result of the query, the wrapper transforms the answer which is represented in the data model of the source into a representation that is used by the application. Using this toolkit one can quickly design a simple wrapper with a few templates that cover some of the desired functionality, probably the one that is most urgently needed. However, templates can be added gradually as more functionality is required later on.
Another important use of wrappers is in extending the query capabilities of a source. For instance, some sources may not be capable of answering queries that have multiple predicates. In such cases, it is necessary to pose a native query to such a source using only predicates that the source is capable of handling. The rest of the predicates are automatically separated from the user query and form a filter query. When the wrapper receives the results, a post-processing engine applies the filter query. This engine supports a set of built-in predicates based on the comparison operators =,≠,<,>, etc. In addition, the engine supports more complex predicates that can be specified as part of the filter query. The postprocessing engine is common to wrappers of all sources and is part of the wrapper toolkit. Note that because of postprocessing, the wrapper can handle a much larger class of queries than those that exactly match the templates it has been given. Figure 1 shows an overview of the wrapper architecture as it is currently implemented in our TSIMMIS testbed. Shaded components are provided by the toolkit, the white component is source-specific and must be generated by the implementor. The driver component controls the translation process and invokes the following services: the parser which parses the templates, the native schema, as well as the incoming queries into internal data structures, the matcher which matches a query against the set of templates and creates a filter query for postprocessing if necessary, the native component which submits the generated action string to the source, and extracts the data from the native result using the information given in the source schema, and the engine, which transforms and packages the result and applies a postprocessing filter if one has been created by the matcher. 
We now describe the sequence of events that occur at the wrapper during the translation of a query and its result using an example from our prototype system. The queries are formulated using a rule-based language called MSL that has been developed as a template specification and query language for the TSIMMIS project. Data is represented using our Object Exchange Model (OEM). We will briefly describe MSL and OEM in the next section. Details on MSL can be found in [5], a full introduction to OEM is given in [1].
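The template-matching step described above can be sketched in miniature. The query patterns and native command strings below are invented for illustration; they are not MSL, and a real wrapper would also produce a filter query for the unmatched predicates:

```python
import re

# Toy sketch of template-based translation: each template pairs a query
# pattern with an action that builds the native source command. A query
# matching no template would fall through to post-processing (not shown).

TEMPLATES = [
    (re.compile(r"select \* where name = '(\w+)'"),
     lambda m: f"FIND-RECORD name={m.group(1)}"),
    (re.compile(r"select \* where year > (\d+)"),
     lambda m: f"SCAN-FROM year={m.group(1)}"),
]

def translate(query):
    """Return the native command for the first matching template, else None."""
    for pattern, action in TEMPLATES:
        m = pattern.match(query)
        if m:
            return action(m)
    return None

translate("select * where name = 'smith'")  # -> "FIND-RECORD name=smith"
```

Adding a template is just appending a (pattern, action) pair, which mirrors the abstract's point that wrapper functionality can grow gradually.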
|
|
|
Languages for multi-database interoperability |
| |
Frédéric Gingras,
Laks V. S. Lakshmanan,
Iyer N. Subramanian,
Despina Papoulis,
Nematollaah Shiri
|
|
Pages: 536-538 |
|
doi>10.1145/253260.253397 |
|
Full text: PDF
|
|
|
|
|
Infomaster: an information integration system |
| |
Michael R. Genesereth,
Arthur M. Keller,
Oliver M. Duschka
|
|
Pages: 539-542 |
|
doi>10.1145/253260.253400 |
|
Full text: PDF
|
|
Infomaster is an information integration system that provides integrated access to multiple distributed heterogeneous information sources on the Internet, thus giving the illusion of a centralized, homogeneous information system. We say that Infomaster creates a virtual data warehouse. The core of Infomaster is a facilitator that dynamically determines an efficient way to answer the user's query using as few sources as necessary and harmonizes the heterogeneities among these sources. Infomaster handles both structural and content translation to resolve differences between multiple data sources and the multiple applications for the collected data. Infomaster connects to a variety of databases using wrappers, such as for Z39.50, SQL databases through ODBC, EDI transactions, and other World Wide Web (WWW) sources. There are several WWW user interfaces to Infomaster, including forms-based and textual. Infomaster also includes a programmatic interface and it can download results in structured form onto a client computer. Infomaster has been in production use for integrating rental housing advertisements from several newspapers (since fall 1995), and for meeting room scheduling (since winter 1996). Infomaster is also being used to integrate heterogeneous electronic product catalogs.
|
|
|
The InfoSleuth Project |
| |
R. J. Bayardo, Jr.,
W. Bohrer,
R. Brice,
A. Cichocki,
J. Fowler,
A. Halal,
V. Kashyap,
T. Ksiezyk,
G. Martin,
M. Nodine,
M. Rashid,
M. Rusinkiewicz,
R. Shea,
C. Unnikrishnan,
A. Unruh,
D. Woelk
|
|
Pages: 543-545 |
|
doi>10.1145/253260.253401 |
|
Full text: PDF
|
|
|
|
|
The distributed information search component (Disco) and the World Wide Web |
| |
Anthony Tomasic,
Rémy Amouroux,
Philippe Bonnet,
Olga Kapitskaia,
Hubert Naacke,
Louiqa Raschid
|
|
Pages: 546-548 |
|
doi>10.1145/253260.253402 |
|
Full text: PDF
|
|
The Distributed Information Search COmponent (DISCO) is a prototype heterogeneous distributed database that accesses underlying data sources. The DISCO prototype currently focuses on three central research problems in the context of these systems. First, since the capabilities of each data source are different, transforming queries into subqueries on the data sources is difficult. We call this problem the weak data source problem. Second, since each data source performs operations in a generally unique way, the cost for performing an operation may vary radically from one wrapper to another. We call this problem the radical cost problem. Finally, existing systems behave rudely when attempting to access an unavailable data source. We call this problem the ungraceful failure problem.
DISCO copes with these problems. For the weak data source problem, the database implementor defines precisely the capabilities of each data source. For the radical cost problem, the database implementor (optionally) defines cost information for some of the operations of a data source. The mediator uses this cost information to improve its cost model. To deal with ungraceful failures, queries return partial answers. A partial answer contains the part of the final answer to the query that was produced by the available data sources. The current working prototype of DISCO contains implementations of these solutions and operations over a collection of wrappers that access information both in files and on the World Wide Web.
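The partial-answer behavior can be sketched as follows, assuming each source is a callable that may raise on unavailability; the names and the result shape are hypothetical, not DISCO's actual interface:

```python
# Toy sketch of partial answers: collect rows from the sources that
# respond, and report which sources were unavailable instead of failing.

def query_all(sources, predicate):
    rows, unavailable = [], []
    for name, fetch in sources.items():
        try:
            rows.extend(r for r in fetch() if predicate(r))
        except ConnectionError:
            unavailable.append(name)
    return {"rows": rows, "unavailable": unavailable,
            "partial": bool(unavailable)}

def down():
    raise ConnectionError("source unreachable")

sources = {"files": lambda: [{"id": 1}, {"id": 2}], "web": down}
answer = query_all(sources, lambda r: r["id"] > 1)
# answer carries the rows from "files" plus the fact that "web" was missing.
```

Returning the failed-source list alongside the rows lets the client decide whether the partial answer is good enough or the query should be retried.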
|
|
|
STRUDEL: a Web site management system |
| |
Mary Fernandez,
Daniela Florescu,
Jaewoo Kang,
Alon Levy,
Dan Suciu
|
|
Pages: 549-552 |
|
doi>10.1145/253260.253403 |
|
Full text: PDF
|
|
|
|
|
GeoMiner: a system prototype for spatial data mining |
| |
Jiawei Han,
Krzysztof Koperski,
Nebojsa Stefanovic
|
|
Pages: 553-556 |
|
doi>10.1145/253260.253404 |
|
Full text: PDF
|
|
Spatial data mining extracts high-level spatial information and knowledge from large spatial databases. A spatial data mining system prototype, GeoMiner, has been designed and developed based on our years of experience in the research and development of the relational data mining system DBMiner, and our research into spatial data mining. The data mining power of GeoMiner includes mining three kinds of rules in geo-spatial databases: characteristic rules, comparison rules, and association rules, with a planned extension to include mining classification rules and clustering rules. The SAND (Spatial And Nonspatial Data) architecture is applied in the modeling of spatial databases, whereas GeoMiner includes the spatial data cube construction module, spatial on-line analytical processing (OLAP) module, and spatial data mining modules. A spatial data mining language, GMQL (Geo-Mining Query Language), is designed and implemented as an extension to Spatial SQL [3], for spatial data mining. Moreover, an interactive, user-friendly data mining interface is constructed and tools are implemented for visualization of discovered spatial knowledge.
|
|
|
The WHIPS prototype for data warehouse creation and maintenance |
| |
Wilburt J. Labio,
Yue Zhuge,
Janet L. Wiener,
Himanshu Gupta,
Héctor García-Molina,
Jennifer Widom
|
|
Pages: 557-559 |
|
doi>10.1145/253260.253405 |
|
Full text: PDF
|
|
A data warehouse is a repository of integrated information from distributed, autonomous, and possibly heterogeneous, sources. In effect, the warehouse stores one or more materialized views of the source data. The data is then readily available to user applications for querying and analysis. Figure 1 shows the basic architecture of a warehouse: data is collected from each source, integrated with data from other sources, and stored at the warehouse. Users then access the data directly from the warehouse.
As suggested by Figure 1, there are two major components in a warehouse system: the integration component, responsible for collecting and maintaining the materialized views, and the query and analysis component, responsible for fulfilling the information needs of specific end users. Note that the two components are not independent. For example, which views the integration component materializes depends on the expected needs of end users.
Most current commercial warehousing systems (e.g., Redbrick, Sybase, Arbor) focus on the query and analysis component, providing specialized index structures at the warehouse and extensive querying facilities for the end user. In the WHIPS (WareHousing Information Project at Stanford) project, on the other hand, we focus on the integration component. In particular, we have developed an architecture and implemented a prototype for identifying data changes at heterogeneous sources, transforming them and summarizing them in accordance with warehouse specifications, and incrementally integrating them into the warehouse. We propose to demonstrate our prototype at SIGMOD, illustrating the main features of our architecture. Our architecture is modular and we designed it specifically to fulfill several important and interrelated goals: data sources and warehouse views can be added and removed dynamically; it is scalable by adding more internal modules; changes at the sources are detected automatically; the warehouse may be updated continuously as the sources change, without requiring “down time;” and the warehouse is always kept consistent with the source data by the integration algorithms. More details on these goals and how we achieve them are provided in [WGL+96].
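Incremental integration of source changes into a materialized view can be pictured with a toy SUM-per-key view; this is an assumption-laden miniature, not the WHIPS algorithms:

```python
# Toy sketch of incremental view maintenance: a SUM-per-key materialized
# view is updated from source deltas (inserts and deletes) without
# recomputing the view from the base data.

class SumView:
    def __init__(self):
        self.totals = {}

    def apply_delta(self, inserts=(), deletes=()):
        for key, value in inserts:
            self.totals[key] = self.totals.get(key, 0) + value
        for key, value in deletes:
            self.totals[key] -= value
            if self.totals[key] == 0:
                del self.totals[key]     # drop keys whose total vanishes

view = SumView()
view.apply_delta(inserts=[("east", 10), ("west", 5), ("east", 3)])
view.apply_delta(deletes=[("west", 5)])
# view.totals now holds only the "east" total.
```

Because each delta is applied in place, the warehouse can stay continuously up to date as sources change, which is the "no down time" goal the abstract lists.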
|
|
|
Structural matching and discovery in document databases |
| |
Jason Tsong-Li Wang,
Dennis Shasha,
George J. S. Chang,
Liam Relihan,
Kaizhong Zhang,
Girish Patel
|
|
Pages: 560-563 |
|
doi>10.1145/253260.253406 |
|
Full text: PDF
|
|
Structural matching and discovery in documents such as SGML and HTML is important for data warehousing [6], version management [7, 11], hypertext authoring, digital libraries [4] and Internet databases. As an example, a user of the World Wide Web may be interested in knowing changes in an HTML document [2, 5, 10]. Such changes can be detected by comparing the old and new version of the document (referred to as structural matching of documents). As another example, in hypertext authoring, a user may wish to find the common portions in the history list of a document or in a database of documents (referred to as structural discovery of documents). In SIGMOD 95 demo sessions, we exhibited a software package, called TreeDiff [13], for comparing two latex documents and showing their differences. Given two documents, the tool represents the documents as ordered labeled trees and finds an optimal sequence of edit operations to transform one document (tree) to the other. An edit operation could be an insert, delete, or change of a node in the trees. The tool is so named because documents are represented and compared using approximate tree matching techniques [9, 12, 14].
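The ordered-tree edit distance underlying this kind of matching (insert, delete, or change of a node, unit cost each) admits a compact recursion over forests. The memoized sketch below is illustrative only and far less efficient than the algorithms cited in [9, 12, 14]; trees are hypothetically encoded as `(label, (children...))` tuples:

```python
from functools import lru_cache

def tree_size(t):
    label, kids = t
    return 1 + sum(tree_size(k) for k in kids)

@lru_cache(maxsize=None)
def forest_dist(f1, f2):
    """Edit distance between two ordered forests (tuples of trees)."""
    if not f1 and not f2:
        return 0
    if not f1:
        return sum(tree_size(t) for t in f2)   # insert everything
    if not f2:
        return sum(tree_size(t) for t in f1)   # delete everything
    (l1, c1), rest1 = f1[-1], f1[:-1]
    (l2, c2), rest2 = f2[-1], f2[:-1]
    # Delete/insert the rightmost root: its children splice into the forest.
    delete = forest_dist(rest1 + c1, f2) + 1
    insert = forest_dist(f1, rest2 + c2) + 1
    # Match the two rightmost trees, relabeling their roots if needed.
    match = forest_dist(rest1, rest2) + forest_dist(c1, c2) + (l1 != l2)
    return min(delete, insert, match)

def tree_dist(t1, t2):
    return forest_dist((t1,), (t2,))
```

For example, `tree_dist(('f', (('a', ()), ('b', ()))), ('f', (('a', ()), ('c', ()))))` finds the single relabeling `b -> c` and returns 1.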
|
|
|
S3: similarity search in CAD database systems |
| |
Stefan Berchtold,
Hans-Peter Kriegel
|
|
Pages: 564-567 |
|
doi>10.1145/253260.253407 |
|
Full text: PDF
|
|
S3 is the prototype of a database system supporting the management and similarity retrieval of industrial CAD parts. The major goal of the system is to reduce the cost of developing and producing new parts by maximizing the reuse of existing parts. S3 supports the following three types of similarity queries: query by example (of an existing part in the database), query by sketch and thematic similarity query. S3 is an object-oriented system offering an adequate graphical user interface. On top of providing various state-of-the-art algorithms and index structures for geometry-based similarity retrieval, it is an excellent testbed for developing and testing new similarity algorithms and index structures.
|
|
|
PREDATOR: an OR-DBMS with enhanced data types |
| |
Praveen Seshadri,
Mark Paskin
|
|
Pages: 568-571 |
|
doi>10.1145/253260.253408 |
|
Full text: PDF
|
|
|
|
|
Sentinel: an object-oriented DBMS with event-based rules |
| |
S. Chakravarthy
|
|
Pages: 572-575 |
|
doi>10.1145/253260.253409 |
|
Full text: PDF
|
|
|
|
|
The MENTOR workbench for enterprise-wide workflow management |
| |
Dirk Wodtke,
Jeanine Weissenfels,
Gerhard Weikum,
Angelika Kotz Dittrich,
Peter Muth
|
|
Pages: 576-579 |
|
doi>10.1145/253260.253411 |
|
Full text: PDF
|
|
MENTOR (“Middleware for Enterprise-Wide Workflow Management”) is a joint project of the University of the Saarland, the Union Bank of Switzerland, and ETH Zurich [1, 2, 3]. The focus of the project is on enterprise-wide workflow management. Workflows in this category may span multiple organizational units, each unit having its own workflow server, involve a variety of heterogeneous information systems, and require many thousands of clients to interact with the workflow management system (WFMS). The project aims to develop a scalable and highly available environment for the execution and monitoring of workflows, seamlessly integrated with a specification and verification environment.
For the specification of workflows, MENTOR utilizes the formalism of state and activity charts. The mathematical rigor of the specification method establishes a basis for both correctness reasoning and for partitioning of a large workflow into a number of subworkflows according to the organizational responsibilities of the enterprise. For the distributed execution of the partitioned workflow specification, MENTOR relies mostly on standard middleware components and adds its own components only where the standard components fall short of functionality or scalability. In particular, the run-time environment is based on a TP monitor and a CORBA implementation.
|
|
|
Zoo: a desktop experiment management environment |
| |
Yannis E. Ioannidis,
Miron Livny,
Anastassia Ailamaki,
Anand Narayanan,
Andrew Therber
|
|
Pages: 580-583 |
|
doi>10.1145/253260.253415 |
|
Full text: PDF
|
|
|