11 June 2008

Performance Q&A

How does Hibernate perform?
But Hibernate uses so much runtime reflection?
Okay, so what are the advantages of reflection then?
But how does it scale?
Why not implement instance-pooling anyway?
Does Hibernate implement its functionality using a minimal number of database queries?
So why don't you provide an "official" Hibernate benchmark?
Conclusion?
How does Hibernate perform?
We claim that Hibernate performs well, in the sense that its performance is limited by the underlying JDBC driver / relational database combination. Given this, the question boils down to: does Hibernate implement its functionality using a minimal number of database queries and how can it improve performance and scalability on top of JDBC? This page hopefully answers these questions.
But Hibernate uses so much runtime reflection?
Many former C or C++ programmers prefer generated-code solutions to runtime reflection. This preference is usually justified by an appeal to performance, which is a red herring. Modern JVMs implement reflection extremely efficiently, and the overhead is minimal compared to the cost of disk access or IPC. Developers from other traditions (e.g. Smalltalk) have always relied upon reflection to do things for which C/C++ needs code generation.
In the very latest versions of Hibernate, "reflection" is optimised via the CGLIB runtime bytecode generation library. This means that "reflected" property get / set calls no longer carry the overhead of the Java reflection API and are actually just normal method calls. This results in a (very) small performance gain.
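To make the distinction concrete, here is a minimal sketch in plain Java of the two ways a property can be read: through the Java reflection API, and through a normal method call. The class and method names (ReflectedCustomer, PropertyAccessSketch, readReflectively) are hypothetical and exist only for this illustration; the sketch uses neither CGLIB nor Hibernate itself.

```java
import java.lang.reflect.Method;

// Hypothetical entity used only for this illustration.
class ReflectedCustomer {
    private String name;

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}

class PropertyAccessSketch {
    // Reads a property through the Java reflection API: the getter is
    // looked up by name at runtime and then invoked on the target.
    static Object readReflectively(Object target, String getterName) {
        try {
            Method getter = target.getClass().getMethod(getterName);
            return getter.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read " + getterName, e);
        }
    }

    public static void main(String[] args) {
        ReflectedCustomer customer = new ReflectedCustomer();
        customer.setName("Ada");

        // Normal method call: resolved by the compiler.
        String direct = customer.getName();

        // Reflective call: same value, resolved at runtime.
        Object reflected = readReflectively(customer, "getName");

        System.out.println(direct.equals(reflected));
    }
}
```

Both paths return the same value; the only difference is how the call is dispatched, which is why the gain from replacing one with the other is small next to the cost of a database round trip.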
Okay, so what are the advantages of reflection then?
A quicker compile-build-test cycle. The advantage of this should not be understated. The philosophy of Hibernate is this: let the developer spend as little time as possible implementing persistence for the 95% of the application which is used 5% of the time. Then, later, if there are performance issues with the remaining 5%, there will be plenty of time left for hand-coding JDBC calls to improve performance of particular bottlenecks. (Most of the time Hibernate very closely approaches the performance of hand-coded JDBC anyway.)
But how does it scale?
Hibernate implements an extremely high-concurrency architecture with no resource-contention issues (apart from the obvious - contention for access to the database). This architecture scales extremely well as concurrency increases in a cluster or on a single machine.
A more difficult question is how efficiently Hibernate utilizes memory under heavy load.
Since there is no sharing of objects between concurrent threads (like EJB 2.x entity beans), and since Hibernate does not automatically do instance-pooling (unlike EJB 2.x entity beans), you might think that memory utilization would be less efficient, and this may be true to an extent. However, our experience with real Java applications is that the benefits of instance-pooling are almost negated by common Java coding style. Very often programmers create a new HashMap in ejbLoad, or return a new Integer from a method call, or do some string manipulation. Furthermore, every time you load and then passivate a bean, every non-primitive field of the bean becomes garbage, not to mention whatever garbage the JDBC driver leaves behind. All these kinds of operations leave behind as much garbage as we avoided by doing instance-pooling.
Please note that Hibernate is not a competitor to EJB. In fact, Hibernate EntityManager and Annotations implement a persistence service for EJB 3.0 entity beans, on top of the Hibernate Core architecture.
As a result, Hibernate needs neither an in-memory locking or synchronization mechanism nor a lock table on disk. As stated earlier, Hibernate relies completely on the database management system's ability to deal with concurrent access; the experience of the DBMS vendors in this area should be used.
The other side of scalability is downward scalability. While it wasn't designed with small devices in mind, Hibernate nevertheless has a small footprint and could be used on machines with much less memory than you would need to run an application server. If it can run a JVM and a database, it should be able to run Hibernate.
Why not implement instance-pooling anyway?
Firstly, it would be pointless. There is a lower bound to the amount of garbage Hibernate creates every time it loads or updates an object - the garbage created by getting or setting the object's properties using reflection.
More importantly, the disadvantage of instance-pooling is developers who forget to reinitialize fields each time an instance is reused. We have seen very subtle bugs in EJBs that don't reinitialize all fields in ejbCreate.
On the other hand, if there is a particular application object that is extremely expensive to create, you can easily implement your own instance pool for that class and use the version of Session.load() that takes a class instance. Just remember to return the objects to the pool every time you close the session.
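A hand-rolled pool of this kind could be as small as the following sketch. It is a single-threaded illustration under assumed names (SimpleInstancePool, borrow, giveBack, idleCount are all hypothetical) and is not part of Hibernate.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// A minimal, single-threaded instance pool (hypothetical helper,
// not a Hibernate class).
class SimpleInstancePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;

    SimpleInstancePool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Hands out an idle instance if one exists, otherwise creates a new one.
    T borrow() {
        T instance = idle.poll();
        return instance != null ? instance : factory.get();
    }

    // Intended to be called for each borrowed object when the session is closed.
    void giveBack(T instance) {
        idle.push(instance);
    }

    int idleCount() {
        return idle.size();
    }
}
```

A borrowed instance would then be handed to the Session.load() variant that accepts an existing instance, and returned with giveBack() when the session closes. The reinitialization caveat above still applies: any field that is not overwritten on load keeps its value from the previous use.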
Does Hibernate implement its functionality using a minimal number of database queries?
Good question. Hibernate can make certain optimizations all the time:
Caching objects. The session is a transaction-level cache of persistent objects. You may also enable a JVM-level/cluster cache to memory and/or local disk.
Executing SQL statements later, when needed. The session never issues an INSERT or UPDATE until it is actually needed. So if an exception occurs and you need to abort the transaction, some statements will never actually be issued. Furthermore, this keeps lock times in the database as short as possible (from the late UPDATE to the transaction end).
Never updating unmodified objects. It is very common in hand-coded JDBC to see the persistent state of an object updated just in case it changed; for example, the user pressed the save button but may not have edited any fields. Hibernate always knows if an object's state actually changed, as long as you are inside the same (possibly very long) unit of work.
Efficient Collection Handling. Likewise, Hibernate only ever inserts/updates/deletes collection rows that actually changed.
Rolling two updates into one. As a corollary to caching objects and never updating unmodified objects, Hibernate can roll two seemingly unrelated updates of the same object into one UPDATE statement.
Updating only the modified columns. Hibernate knows exactly which columns need updating and, if you choose, will update only those columns.
Outer join fetching. Hibernate implements a very efficient outer-join fetching algorithm! In addition, you can use subselect and batch pre-fetch optimizations.
Lazy collection initialization.
Lazy object initialization. Hibernate can use runtime-generated proxies (CGLIB) or interception injected through bytecode instrumentation at build-time.
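Several of the items above (executing SQL late, never updating unmodified objects, rolling two updates into one, and updating only the modified columns) can be illustrated together with a snapshot comparison. The following is a simplified sketch, not Hibernate's actual implementation: the class UnitOfWorkSketch and its methods are hypothetical, it tracks a single row, and it only builds SQL strings instead of talking to a database.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified unit of work for a single row;
// not Hibernate's actual implementation.
class UnitOfWorkSketch {
    private final String table;
    private final Map<String, Object> snapshot; // state as loaded
    private final Map<String, Object> current;  // state as modified in memory

    UnitOfWorkSketch(String table, Map<String, Object> loadedState) {
        this.table = table;
        this.snapshot = new LinkedHashMap<>(loadedState);
        this.current = new LinkedHashMap<>(loadedState);
    }

    // Changes are only recorded in memory; no SQL is produced here.
    void set(String column, Object value) {
        current.put(column, value);
    }

    // Called at the end of the unit of work: compares the current state
    // with the snapshot and returns only the SQL that is actually needed.
    List<String> flush() {
        List<String> assignments = new ArrayList<>();
        for (Map.Entry<String, Object> entry : current.entrySet()) {
            if (!Objects.equals(entry.getValue(), snapshot.get(entry.getKey()))) {
                assignments.add(entry.getKey() + " = ?");
            }
        }
        List<String> statements = new ArrayList<>();
        if (!assignments.isEmpty()) {
            // All changes to the row are rolled into one UPDATE
            // that names only the modified columns.
            statements.add("UPDATE " + table + " SET "
                    + String.join(", ", assignments) + " WHERE id = ?");
            // The flushed state becomes the new baseline.
            snapshot.clear();
            snapshot.putAll(current);
        }
        return statements;
    }
}
```

Flushing an untouched object yields no statements at all, and several set() calls on the same row collapse into a single UPDATE. Parameter binding and the primary key value are deliberately left out to keep the sketch short.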
A few more (optional) features of Hibernate that your hand-coded JDBC may or may not currently benefit from:
second-level caching of arbitrary query results, from HQL, Criteria, and even native SQL queries
efficient PreparedStatement caching (Hibernate always uses PreparedStatement for calls to the database)
JDBC 2 style batch updates
Pluggable connection pooling
Hopefully you will agree that Hibernate approaches the parsimony of the best hand-coded JDBC object persistence. As a postscript I would add that I have rarely seen JDBC code that approaches the efficiency of the "best possible" code. By contrast it is very easy to write efficient data-access code using Hibernate.
So why don't you provide an "official" Hibernate benchmark?
Many people try to benchmark Hibernate. All public benchmarks we have seen so far had (and most still have) serious flaws.
The first category of benchmarks consists of trivial micro benchmarks. Hibernate will of course have an overhead compared to JDBC in simple scenarios (loading 50,000 objects and doing nothing else is considered trivial). See this page for a critique of a trivial benchmark. If you'd like to avoid writing your own trivial and not very conclusive tests, have a look at the perftest target in Hibernate's build file. We use this target to check whether a trivial performance bug has slipped into the Hibernate code. You can use it to verify the JDBC overhead of Hibernate in trivial situations. But, as should be clear now, these numbers are meaningless for real application performance and scalability.
In a fair benchmark with complex data associations/joins, highly concurrent access, random updates of data in the application, real-world data set size, and utilizing other Hibernate features, you will find that Hibernate performs very well. Why is there no such benchmark provided by the Hibernate developers? The first reason is trust. Why would you believe that the numbers shown by the vendor of a product, in a comparative benchmark, are true? The second reason is applicability. Certainly, a fair benchmark would show the benefits of Hibernate in a typical complete application with realistic concurrent access. However, the number of variables in any decent benchmark makes it almost impossible to transfer these results into reasonable conclusions about the performance of your own application. Your application is different. (If you came here from a forum thread, think about it: instead of arguing with you about your trivial micro benchmark, we would be arguing about why you don't see the same results in your application...) For these reasons we always recommend that you benchmark your application yourself. We encourage you to see performance and load testing as a natural stage in the life of your application - don't go into production with only micro benchmarks. Setting up benchmarks for your application and scenario, and helping you in this stage is in fact one of our usual
Conclusion?
It turns out that Hibernate is very fast if used properly, in highly concurrent multi-user applications with significant data sets.


Site owned by Hariharan | Saravanan