String Operations in Query Languages

paper (PODS 2001)

Michael Benedikt, Leonid Libkin, Thomas Schwentick and Luc Segoufin

We study relational calculi with support for string operations. Most prior proposals were based on adding the operation of concatenation to first-order logic. Such an extension is problematic as the relational calculus becomes computationally complete, which in turn implies strong limits on the ability to perform optimization and static analysis of properties such as query safety. In contrast, we look at extensions of relational calculus that have nice expressiveness, decidability, and safety properties, while corresponding to sets of string operations used in SQL. We start with an extension based on the string ordering and LIKE predicates. We then extend this basic model to include string length comparison. While both of these share some of the attractive properties of relational calculus (low data complexity for generic queries, effective syntax for safe queries, correspondence with an algebra), there is a large gap between these calculi in expressive power and complexity. The smaller `basic model' has low data complexity for all queries, but is quite limited in expressive power. The latter handles many natural string operations, but includes queries with high data complexity. We thus also explore the space between these two extremes. We consider two intermediate languages: the first extends our base language with functions that trim/add leading characters, and the other extends it by adding the full power of regular-expression pattern matching. We show that both these extensions inherit many of the attractive properties of the basic model: they both have corresponding algebras expressing safe queries, and low complexity of query evaluation.



Back to Luc Segoufin's home page.