DWR

Synchronized block in DefaultScriptSessionManager locking entire appserver

Details

  • Type: Bug
  • Status: Reopened
  • Priority: Critical
  • Resolution: Unresolved
  • Affects Version/s: 2.0.rc3, 2.0.rc4, 2.0.rc5, 2.0, 2.0.1, 2.0.2, 2.0.3, 2.0.4, 2.0.5, 2.0.6
  • Fix Version/s: 2.0.7
  • Component/s: core
  • Description:
    I am seeing a very troubling situation on our QA server. Every couple of days, our QA WebLogic cluster will fail with every thread blocked in DefaultScriptSessionManager. I have 50 threads, and here's how it seems to wind up:

    1 thread is blocked here:

    "[STUCK] ExecuteThread: '3' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@164b23e BLOCKED org.directwebremoting.impl.DefaultScriptSessionManager.invalidate(DefaultScriptSessionManager.java:125)

    1 thread is blocked here (blocked, I believe, on the thread above):

     "[STUCK] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@70207c BLOCKED
    org.directwebremoting.impl.DefaultScriptSession.isInvalidated(DefaultScriptSession.java:127)

    And 48 threads are blocked here:

    "[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)'" waiting for lock java.lang.Object@164b23e BLOCKED
    org.directwebremoting.impl.DefaultScriptSessionManager.checkTimeouts(DefaultScriptSessionManager.java:175)

    As you can see, both the invalidate() call and the checkTimeouts() call are blocking on the same Java object. Looking at the code, both checkTimeouts() and invalidate() synchronize on the same sessionLock member variable. This is highly problematic because the call to invalidate() happens after the synchronized block in checkTimeouts(). Here's what happens (by the way, I think this is only a problem if you have multiple in-flight DWR requests for the same user):

    - Thread A calls checkTimeouts, makes it through the synchronized block and calls invalidate() on a session. That grabs the invalidLock on the session. Thread A then gets interrupted.
    - Thread B calls checkTimeouts and holds the sessionLock, and then calls into isInvalidated on the same session that Thread A is interrupted on. Thread B tries to acquire the invalidLock but can't, since thread A holds it.
    - Thread A resumes and calls the second line of invalidate, which is manager.invalidate(). That method tries to grab the sessionLock, but cannot since Thread B is holding it waiting for the invalidLock held by thread A.

    And deadlock ensues.

    The easiest thing I can see to do is to move the for loop at the end of checkTimeouts() inside the synchronized block. That way it'd be guaranteed not to deadlock.
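
The interleaving described in the report is a lock-order inversion: one thread takes invalidLock and then wants sessionLock, while another takes sessionLock and then wants invalidLock. A minimal sketch of that pattern follows. It is not DWR source code: the class name, the method names, and the use of ReentrantLock with a timed tryLock (so the demo terminates instead of hanging) are all assumptions made for illustration. The latches force the unlucky scheduling that the report attributes to thread A being paused. It needs Java 8 or later for the lambdas.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the two locks named in the report; not DWR source code.
class LockOrderSketch {
    // Stands in for the manager-wide sessionLock.
    static final ReentrantLock sessionLock = new ReentrantLock();
    // Stands in for the per-session invalidLock.
    static final ReentrantLock invalidLock = new ReentrantLock();

    /** Returns true if neither thread could obtain its second lock. */
    static boolean reproduce() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        boolean[] gotSecond = new boolean[2];

        // Thread A: invalidLock first, then sessionLock (session.invalidate -> manager.invalidate).
        Thread a = new Thread(() ->
            acquireInOrder(invalidLock, sessionLock, bothHoldFirst, bothTried, gotSecond, 0));
        // Thread B: sessionLock first, then invalidLock (checkTimeouts -> isInvalidated).
        Thread b = new Thread(() ->
            acquireInOrder(sessionLock, invalidLock, bothHoldFirst, bothTried, gotSecond, 1));

        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !gotSecond[0] && !gotSecond[1];
    }

    static void acquireInOrder(ReentrantLock first, ReentrantLock second,
                               CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                               boolean[] gotSecond, int slot) {
        first.lock();
        try {
            // Wait until the other thread also holds its first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With plain synchronized blocks this step would wait forever.
            boolean ok = second.tryLock(200, TimeUnit.MILLISECONDS);
            gotSecond[slot] = ok;
            if (ok) {
                second.unlock();
            }
            // Keep holding the first lock until both attempts have finished.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("both threads blocked on second lock: " + reproduce());
    }
}
```

With intrinsic locks (synchronized), as in the stack traces above, there is no timeout, so the two threads would stay blocked and every later caller of checkTimeouts() would queue up behind them.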

Activity

Joe Walker added a comment - 10/Apr/07 11:08 AM
From what I can see your analysis is correct - I've checked the change into CVS and hope to cut RC4 with this in later on today.
Jonas added a comment - 04/Aug/10 6:56 AM
Hi Joe, I have the same problem in my application, even with the version that contains the correction (2.0.rc4).
David Marginian added a comment - 04/Aug/10 9:24 PM
Jonas, the fix recommended previously does not solve the problem. Now we are making foreign calls with a lock held, which is a recipe for deadlock. I revisited this issue several weeks ago (but did not look up this Jira) due to a user's request. I made some changes that should resolve this problem. Please try the latest 2.x build here - http://ci.directwebremoting.org/bamboo/browse/DWR20-ALL-1/artifact. If this fix works this version will be released as 2.0.7. I have re-opened this issue.
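
One common way to avoid making foreign calls with a lock held is to do only manager-owned bookkeeping inside the synchronized block, take a snapshot of what needs invalidating, and run the callbacks after the lock is released. The sketch below illustrates that general technique only. The thread does not show the actual change made for 2.0.7, so nothing here should be read as that fix. The class, its methods, and the idea of keeping the last-access timestamp in the manager (so no session method is called under the lock) are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical manager; names are illustrative and not taken from DWR.
class SnapshotManagerSketch {
    private static final class Entry {
        final Runnable invalidateCallback; // foreign code: may take its own locks
        final long lastAccessed;           // timestamp owned by the manager

        Entry(Runnable invalidateCallback, long lastAccessed) {
            this.invalidateCallback = invalidateCallback;
            this.lastAccessed = lastAccessed;
        }
    }

    private final Object sessionLock = new Object();
    private final Map<String, Entry> sessions = new HashMap<>();

    void add(String id, long lastAccessed, Runnable invalidateCallback) {
        synchronized (sessionLock) {
            sessions.put(id, new Entry(invalidateCallback, lastAccessed));
        }
    }

    int size() {
        synchronized (sessionLock) {
            return sessions.size();
        }
    }

    // Exposed only so the usage example can observe the lock state.
    boolean lockHeldByCurrentThread() {
        return Thread.holdsLock(sessionLock);
    }

    /** Removes expired entries under the lock, then runs their callbacks outside it. */
    int checkTimeouts(long now, long timeoutMillis) {
        List<Runnable> expired = new ArrayList<>();
        synchronized (sessionLock) {
            // Only manager-owned state is touched here; no session code runs.
            Iterator<Entry> it = sessions.values().iterator();
            while (it.hasNext()) {
                Entry e = it.next();
                if (now - e.lastAccessed > timeoutMillis) {
                    expired.add(e.invalidateCallback);
                    it.remove();
                }
            }
        }
        // Foreign calls happen with no manager lock held.
        for (Runnable callback : expired) {
            callback.run();
        }
        return expired.size();
    }
}
```

The trade-off is that a session can be removed from the manager a moment before its callback runs, so the callback has to tolerate being called for a session the manager no longer tracks.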
Jonas added a comment - 05/Aug/10 5:44 AM
David, this version doesn't solve my problem, because my app runs on JVM 1.4 and the code in the build (http://ci.directwebremoting.org/bamboo/browse/DWR20-ALL-1/artifact) is 1.5.

Thanks
David Marginian added a comment - 05/Aug/10 6:40 AM
It is version 2 and should be compatible with 1.3. If it is not, I did something wrong. Can you elaborate?
David Marginian added a comment - 06/Aug/10 7:15 PM
Jonas, any progress?
Jonas added a comment - 06/Aug/10 7:25 PM
David, my app is still being tested.
David Marginian added a comment - 06/Aug/10 7:35 PM
You are testing? Please let us know as soon as you know something. Thanks.
Jonas added a comment - 07/Aug/10 11:10 AM
Yes, I'm testing.

Dates

  • Created: 09/Apr/07 7:22 PM
  • Updated: 07/Aug/10 11:10 AM