In this post, originally written by Glenn Paulley and posted to sybase.com in May of 2009, Glenn talks about ORMs and the difficulties they pose to the SQL Anywhere query optimizer and the increasing impact they are having on all RDBMS's. As Glenn mentions in his post, I recognize the attractiveness of ORMS but I am not a big of the complexity they introduce into applications.
Object-relational mapping toolkits such as Hibernate/NHibernate, LINQ, and others permit one to develop object-oriented database applications in the paradigm offered by the object-oriented language (Java, C#, C++) and workaround the issues of the impedance mismatch between the application's abstractions and the persistent relational store. My colleague jason.hinsperger has written previously regarding the additional complexity that ORM toolkits bring to the application. That additional complexity has far-reaching tentacles, and impacts database management system implementations as well. Let me explain what I mean by that.
While ORM software toolkits insulate application programmers from dealing directly with a relational database, and the paradigm mismatch that results from its use, at the same time the application is made much more complex through the addition of the ORM layer. As an example, Java Hibernateconsists of 266 packages, 1938 classes, 18,680 functions, and over 118K lines of code. From the programmer's standpoint, however, a characteristic that the object-relational mapping layer provides is that much of that complexity is hidden during program development. One can create a very sophisticated set of mappings that result in considerably complex SQL statements, but from within the program the method call that results in such a statement can appear the same as any other.
Here's an example, provided by my colleague Ani Nica. The query below is the SQL generated by an example query from the Microsoft Entity Framework test suite:
SELECT
[Project9].[ContactID] AS [ContactID],
[Project9].[C1] AS [C1],
[Project9].[C2] AS [C2],
[Project9].[ContactID1] AS [ContactID1],
[Project9].[SalesOrderID] AS [SalesOrderID],
[Project9].[TotalDue] AS [TotalDue]
FROM ( SELECT [Distinct1].[ContactID] AS [ContactID],
1 AS [C1],
[Project8].[ContactID] AS [ContactID1],
[Project8].[SalesOrderID] AS [SalesOrderID],
[Project8].[TotalDue] AS [TotalDue],
[Project8].[C1] AS [C2]
FROM (SELECT DISTINCT [Extent1].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent1] INNER JOIN [DBA].[SalesOrderHeader] AS [Extent2]
ON EXISTS (SELECT cast(1 as bit) AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable1]
LEFT OUTER JOIN (SELECT [Extent3].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent3]
WHERE [Extent2].[ContactID] = [Extent3].[ContactID] ) AS [Project1]
ON cast(1 as bit) = cast(1 as bit)
LEFT OUTER JOIN (SELECT [Extent4].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent4]
WHERE [Extent2].[ContactID] = [Extent4].[ContactID] ) AS [Project2]
ON cast(1 as bit) = cast(1 as bit)
WHERE ([Extent1].[ContactID] = [Project1].[ContactID])
OR (([Extent1].[ContactID] IS NULL) AND ([Project2].[ContactID] IS NULL)) )
) AS [Distinct1]
LEFT OUTER JOIN
(SELECT [Extent5].[ContactID] AS [ContactID],
[Extent6].[SalesOrderID] AS [SalesOrderID],
[Extent6].[TotalDue] AS [TotalDue], 1 AS [C1]
FROM [DBA].[Contact] AS [Extent5]
INNER JOIN [DBA].[SalesOrderHeader] AS [Extent6]
ON EXISTS (SELECT cast(1 as bit) AS [C1]
FROM ( SELECT cast(1 as bit) AS X ) AS [SingleRowTable2]
LEFT OUTER JOIN (SELECT [Extent7].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent7]
WHERE [Extent6].[ContactID] = [Extent7].[ContactID] ) AS [Project5]
ON cast(1 as bit) = cast(1 as bit)
LEFT OUTER JOIN (SELECT [Extent8].[ContactID] AS [ContactID]
FROM [DBA].[Contact] AS [Extent8]
WHERE [Extent6].[ContactID] = [Extent8].[ContactID] ) AS [Project6]
ON cast(1 as bit) = cast(1 as bit)
WHERE ([Extent5].[ContactID] = [Project5].[ContactID])
OR (([Extent5].[ContactID] IS NULL) AND ([Project6].[ContactID] IS NULL))
) ) AS [Project8]
ON ([Project8].[ContactID] = [Distinct1].[ContactID])
OR (([Project8].[ContactID] IS NULL) AND ([Distinct1].[ContactID] IS NULL))
) AS [Project9]
ORDER BY [Project9].[ContactID] ASC, [Project9].[C2] ASC
Did you note the join conditions containing the EXISTS
predicates? In days gone by, when application programs and their accompanying SQL statements were composed by hand, an attempt at placing such a query into production would be met with, at a minimum, a severe reprimand of the application programmer. With ORM toolkits, however, the programmer is insulated from such gory details; they may only realize that their application is "slow". Poor performance must be the database system's fault.
At the same time, it is typical for object-oriented applications to process relational data a row-at-a-time, in keeping with the navigational paradigm offered by object orientation. In Hibernate parlance, such applications embody the "N+1 SELECTS problem" as they implement client-side joins from within the application. And herein lies the problem: like other commercial and open-source database management systems, over the years SQL Anywhere has been significantly enhanced to provide state-of-the-art support for complex query processing: OLAP functionality, the MERGE
statement, materialized views, sophisticated query processing methods such as hash anti-semijoin, computed columns, alternative join techniques, and so on. However, each additional piece of analysis within the query optimizer yields additional overhead for every statement. Consequently, it is extremely difficult for a database management system to offer precisely the same level of query processing performance for simple statements from release to release; every additional bit of SQL input analysis incurs additional CPU cost. Hence migrating from one release to the next of a commercial DBMS may yield reduced performance.
In response, what you see offered by the various vendors are mechanisms designed to mitigate these performance issues, primarily through various forms of caching: the caching of SQL result sets, caching of access plan strategies, caching of expressions, and so on, all leading to additional sophistication and complexity within each DBMS, and higher and higher utilization of memory. Moreover:
In summary, the proliferation of ORM toolkits is having an impact on every relational database system vendor. I expect that impact to continue unabated in the near term, particularly as such toolkits become ever more popular.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.
User | Count |
---|---|
26 | |
24 | |
21 | |
13 | |
11 | |
9 | |
9 | |
9 | |
8 | |
8 |