I am seeing a very troubling situation on our QA server. Every couple of days, our QA WebLogic cluster will fail with every thread blocked in DefaultScriptSessionManager. I have 50 threads, and here's how it seems to wind up:
1 thread is blocked here:
"[STUCK] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock
java.lang.Object@164b23e BLOCKED org.directwebremoting.impl.DefaultScriptSessionManager.invalidate(DefaultScriptSessionManager.java:125)
1 thread is blocked here (blocked, I believe, on the thread above):
"[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock
java.lang.Object@70207c BLOCKED
org.directwebremoting.impl.DefaultScriptSession.isInvalidated(DefaultScriptSession.java:127)
And 48 threads are blocked here:
"[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock
java.lang.Object@164b23e BLOCKED
org.directwebremoting.impl.DefaultScriptSessionManager.checkTimeouts(DefaultScriptSessionManager.java:175)
As you can see, both the invalidate() call and the checkTimeouts() call are blocking on the same java object. Looking in the code, both checkTimeouts and invalidate block on the same sessionLock member variable. This is highly problematic because the call to invalidate() happens after the synchronized block in checkTimeouts(). Here's what happens: (by the way, I think this is only a problem if you have multiple in-flight DWR requests for the same user)
- Thread A calls checkTimeouts, makes it through the synchronized block and calls invalidate() on a session. That grabs the invalidLock on the session. Thread A then gets interrupted.
- Thread B calls checkTimeouts and holds the sessionLock, and then calls into isInvalidated on the same session that Thread A is interrupted on. Thread B tries to acquire the invalidLock but can't, since thread A holds it.
- Thread A resumes and calls the second line of invalidate, which is manager.invalidate(). That method tries to grab the sessionLock, but cannot since Thread B is holding it waiting for the invalidLock held by thread A.
And deadlock ensues.
The easist thing I can see to do is to move the for loop at the end of checkTimeouts() inside the synchronized block. That way it'd be guaranteed not to deadlock.