
Key: DWR-60
Type: Bug
Status: Closed
Resolution: Fixed
Priority: Critical
Assignee: Joe Walker
Reporter: Scott Rankin
Votes: 0
Watchers: 3
DWR

Synchronized block in DefaultScriptSessionManager locking entire appserver

Created: 09/Apr/07 07:22 PM   Updated: 29/Feb/08 10:29 AM   Resolved: 10/Apr/07 03:47 PM
Component/s: core
Affects Version/s: 2.0.rc3
Fix Version/s: 2.0.rc4


Description
I am seeing a very troubling situation on our QA server. Every couple of days, our QA WebLogic cluster will fail with every thread blocked in DefaultScriptSessionManager. I have 50 threads, and here's how it seems to wind up:

1 thread is blocked here:

"[STUCK] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@164b23e BLOCKED org.directwebremoting.impl.DefaultScriptSessionManager.invalidate(DefaultScriptSessionManager.java:125)

1 thread is blocked here (blocked, I believe, on the thread above):

"[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@70207c BLOCKED
org.directwebremoting.impl.DefaultScriptSession.isInvalidated(DefaultScriptSession.java:127)

And 48 threads are blocked here:

"[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@164b23e BLOCKED
org.directwebremoting.impl.DefaultScriptSessionManager.checkTimeouts(DefaultScriptSessionManager.java:175)

As you can see, both the invalidate() call and the checkTimeouts() call are blocked on the same Java object (java.lang.Object@164b23e). Looking at the code, both checkTimeouts() and the manager's invalidate() synchronize on the same sessionLock member variable. This is highly problematic because the call to the session's invalidate() happens after the synchronized block in checkTimeouts(). By the way, I think this is only a problem if you have multiple in-flight DWR requests for the same user. Here's what happens:

- Thread A calls checkTimeouts(), makes it through the synchronized block and calls invalidate() on a session. That grabs the invalidLock on the session. Thread A is then preempted before it gets any further.
- Thread B calls checkTimeouts() and grabs the sessionLock, and then calls into isInvalidated() on the same session that Thread A is in the middle of invalidating. Thread B tries to acquire the invalidLock but can't, since Thread A holds it.
- Thread A resumes and calls the second line of invalidate(), which is manager.invalidate(). That method tries to grab the sessionLock, but cannot, since Thread B is holding it while waiting for the invalidLock held by Thread A.

And deadlock ensues.
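
The interleaving above is a lock-order inversion, and it can be reproduced in isolation. The following is a minimal sketch, not DWR's code: the class name, the two plain Object monitors standing in for sessionLock and invalidLock, and the latches that force the ordering are all assumptions made for illustration. It asks the JVM's ThreadMXBean whether it sees a monitor deadlock rather than waiting for threads that will never finish.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionSketch {
    // Hypothetical stand-ins for the two monitors named in the report.
    static final Object sessionLock = new Object();
    static final Object invalidLock = new Object();

    /** Forces the reported interleaving; returns true once the JVM reports a monitor deadlock. */
    static boolean reproduce() {
        CountDownLatch aHoldsInvalidLock = new CountDownLatch(1);
        CountDownLatch bHoldsSessionLock = new CountDownLatch(1);

        // Thread A: session.invalidate() holds invalidLock, then manager.invalidate() wants sessionLock.
        Thread a = new Thread(() -> {
            synchronized (invalidLock) {
                aHoldsInvalidLock.countDown();
                awaitQuietly(bHoldsSessionLock);
                synchronized (sessionLock) { /* never reached */ }
            }
        }, "thread-A");

        // Thread B: checkTimeouts() holds sessionLock, then isInvalidated() wants invalidLock.
        Thread b = new Thread(() -> {
            awaitQuietly(aHoldsInvalidLock);
            synchronized (sessionLock) {
                bHoldsSessionLock.countDown();
                synchronized (invalidLock) { /* never reached */ }
            }
        }, "thread-B");

        // Daemon threads so the stuck pair does not keep the JVM alive.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        try {
            // Poll for up to about five seconds.
            for (int i = 0; i < 100; i++) {
                if (threads.findMonitorDeadlockedThreads() != null) {
                    return true;
                }
                Thread.sleep(50);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

In the real server nothing forces this ordering, which would be consistent with the hang only showing up every couple of days.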

The easiest thing I can see to do is to move the for loop at the end of checkTimeouts() inside the synchronized block. That way it'd be guaranteed not to deadlock.
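
A rough sketch of that change, under stated assumptions: Manager, Session, expiresAt and run() are hypothetical names and a simplified model, not the actual DefaultScriptSessionManager source. The point it illustrates is that once the invalidation loop runs while sessionLock is still held, every thread in checkTimeouts() takes sessionLock first and invalidLock second, and the nested manager.invalidate() call simply re-enters the monitor the thread already owns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CheckTimeoutsSketch {
    // Simplified, hypothetical model of a script session.
    static class Session {
        private final Object invalidLock = new Object();
        private final Manager manager;
        private final String id;
        private final long expiresAt;
        private boolean invalidated;

        Session(Manager manager, String id, long expiresAt) {
            this.manager = manager;
            this.id = id;
            this.expiresAt = expiresAt;
        }

        boolean isInvalidated() {
            synchronized (invalidLock) {
                return invalidated;
            }
        }

        void invalidate() {
            synchronized (invalidLock) {
                invalidated = true;
                manager.invalidate(this); // takes sessionLock, as in the report
            }
        }
    }

    // Simplified, hypothetical model of the session manager.
    static class Manager {
        private final Object sessionLock = new Object();
        private final Map<String, Session> sessions = new HashMap<>();

        void add(String id, long expiresAt) {
            synchronized (sessionLock) {
                sessions.put(id, new Session(this, id, expiresAt));
            }
        }

        void invalidate(Session session) {
            synchronized (sessionLock) {
                sessions.remove(session.id);
            }
        }

        int size() {
            synchronized (sessionLock) {
                return sessions.size();
            }
        }

        void checkTimeouts(long now) {
            synchronized (sessionLock) {
                List<Session> timeouts = new ArrayList<>();
                for (Session session : sessions.values()) {
                    if (!session.isInvalidated() && now > session.expiresAt) {
                        timeouts.add(session);
                    }
                }
                // Proposed change: this loop now runs while sessionLock is still held.
                for (Session session : timeouts) {
                    session.invalidate();
                }
            }
        }
    }

    /** Runs checkTimeouts() from several threads and returns how many sessions remain. */
    static int run() {
        Manager manager = new Manager();
        for (int i = 0; i < 100; i++) {
            // The first 50 sessions are already expired at time 1; the rest never expire.
            manager.add("s" + i, i < 50 ? 0L : Long.MAX_VALUE);
        }
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Thread worker = new Thread(() -> manager.checkTimeouts(1L));
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return manager.size();
    }

    public static void main(String[] args) {
        System.out.println("sessions remaining: " + run());
    }
}
```

One caveat with this model: any caller that invokes session.invalidate() directly, outside checkTimeouts(), would still take invalidLock before sessionLock, so the sketch only covers the path described in this report.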


Joe Walker added a comment - 10/Apr/07 11:08 AM
From what I can see your analysis is correct - I've checked the change into CVS and hope to cut RC4 with this in later on today.