Wednesday, September 19, 2012

"Converting SQL to Relational Algebra"

It's that time of the year again.  Introductory courses in relational data management (I'm a bit reluctant to call them "courses in relational theory") are starting again, all over the world, and consequently, students seeking advice and assistance from "professionals" start posing "Converting SQL to Relational Algebra" questions again on various database-related discussion fora, when they are not able to solve assignments given to them in class, or when they are uncertain their solution is correct.

I would actually like any of the teachers who give such assignments to try and explain to me what purpose they hope to be achieving by giving such assignments.  What do students learn from this ?  I recently got involved in such a "Converting SQL to Relational Algebra" question, and one remark the student made, in the discussion that ensued, confirms my feeling regarding this : they learn nothing at all, and "Converting SQL to Relational Algebra" is completely pointless, only adding to the confusion instead of resolving it.  (The remark was, literally, : "we do not either understand why we are learning this".)

Why is this so ?  Well, it's because Relational Algebra is foundational, and SQL is not.  That's because Relational Algebra is completely abstract, and SQL is not.  Not in the same sense that Relational Algebra is abstract.  SQL is abstract with respect to the procedures and algorithms that the implementing engine executes in order to carry out a given SQL command.  But that is not quite as "profound" or "foundational" a level of abstraction as the one the Relational Algebra can be said to have.

SQL is just a syntax for expressing formulae of the RA.  One possible syntax for it.  Of the plethora of distinct possible syntaxes that can be conceived to do the same.  And SQL is an awful syntax for that purpose, come to that.  That's because initially, SQL was firmly rooted in predicate calculus, not in relational algebra.  That SQL can express relational algebra, is only a "coincidental" consequence of the fact that predicate calculus and relational algebra were later proven equivalent.  That historical reason is why, for example, relational difference had to be expressed using WHERE NOT EXISTS (...), and EXCEPT was only introduced in the standard decades later.

Granted, if you teach RA first and only then SQL, you are headed for a shit load of questions, "why did they make that syntax so quirky if the algebra is so simple" ?  The answer is always the same : poor historical reasons.  BTW there's a great book forthcoming that fits this line of teaching perfectly.  But at least they could understand.

And the fact still remains that if students can first manage to understand RA, then they can understand any language that implements or supports it.  They will then understand that, as far as RA operations are concerned, the difference between SQL and, say, Tutorial D, is a mere matter of syntax.  They might understand that the Tutorial D expression 'R1 JOIN R2' is the very same thing as the SQL expressions 'SELECT * FROM R1,R2 WHERE R1.<attr> = R2.<attr>', 'SELECT * FROM R1 JOIN R2 ON ...' and 'SELECT * FROM R1 NATURAL JOIN R2'.  And the reason they would understand, is precisely that their thinking would be in RA, not in SQL or any other specific language.

We urgently need to learn our database designers to think of data in terms of sets (sets of tuples, where those sets represent the extension of a logical predicate, and each tuple represents a proposition), and to think of manipulations on that data in terms of relational algebra.  Not in terms of tables and SQL.

And it means that the teaching should no longer be about "Converting SQL to relational algebra", but instead about "Converting relational algebra to SQL".  The latter, not the former, is the sensible way of doing things.

No comments:

Post a Comment