History | Log In     View a printable version of the current page.  
Issue Details (XML | Word | Printable)

Key: LBCLASSIC-137
Type: Sub-task Sub-task
Status: Closed Closed
Resolution: Won't Fix
Priority: Blocker Blocker
Assignee: Ceki Gulcu
Reporter: Toni Heimala
Votes: 4
Watchers: 5
Operations

If you were logged in you would be able to see more operations.
logback-classic
LBCLASSIC-135

Deadlock when running on multiple core processors

Created: 11/Dec/07 12:07 PM   Updated: 26/Jun/09 09:43 PM
Component/s: None
Affects Version/s: 0.9.9
Fix Version/s: None

File Attachments: 1. Text File AppenderBaseLock.patch (1 kb)
2. Text File patch.txt (16 kb)
3. Text File patch2.txt (85 kb)
4. Text File patch3.txt (25 kb)

Environment:
Operating System: Windows
Platform: PC

Bugzilla Id: http://bugzilla.qos.ch/show_bug.cgi?id=112


 Description  « Hide
When you run logging into same file from many threads on a system that has more than one physical processor (Dual Core for example), a deadlock will occur after a while. This can not be reproduced on HyperThreading processors. Here's an example program that will demonstrate the behavior:

-----------------------------
Main.java
-----------------------------

import java.util.Date;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import ch.qos.logback.classic.LoggerContext;
import ch.qos.logback.classic.joran.JoranConfigurator;
import ch.qos.logback.core.joran.spi.JoranException;

public class Main extends Thread
{
    private final static String LOGGER_CONFIGURATION_FILE = "logger.xml";
    private final Logger logger = LoggerFactory.getLogger(Main.class);
    
    private final long start;
    
    public Main()
        throws JoranException
    {
        start = new Date().getTime();
        LoggerContext lc = (LoggerContext)LoggerFactory.getILoggerFactory();
        JoranConfigurator configurator = new JoranConfigurator();
        lc.shutdownAndReset();
        configurator.setContext(lc);
        configurator.doConfigure(LOGGER_CONFIGURATION_FILE);
    }
    
    public void start()
    {
        ScheduledThreadPoolExecutor ex1 = new ScheduledThreadPoolExecutor(1);
        ScheduledThreadPoolExecutor ex2 = new ScheduledThreadPoolExecutor(1);
        ScheduledThreadPoolExecutor ex3 = new ScheduledThreadPoolExecutor(1);
        ScheduledThreadPoolExecutor ex4 = new ScheduledThreadPoolExecutor(1);
        ScheduledThreadPoolExecutor ex5 = new ScheduledThreadPoolExecutor(1);
        ex1.scheduleAtFixedRate(new Task("EX1"), 10, 10, TimeUnit.MICROSECONDS);
        ex2.scheduleAtFixedRate(new Task("EX2"), 10, 10, TimeUnit.MICROSECONDS);
        ex3.scheduleAtFixedRate(new Task("EX3"), 10, 10, TimeUnit.MICROSECONDS);
        ex4.scheduleAtFixedRate(new Task("EX4"), 10, 10, TimeUnit.MICROSECONDS);
        ex5.scheduleAtFixedRate(new Task("EX5"), 10, 10, TimeUnit.MICROSECONDS);
        
        super.start();
    }
    
    public void run()
    {
        try
        {
            while(true)
            {
                logger.debug("[MAIN] {}", new Date().getTime() - start);
                Thread.sleep(10);
            }
        }
        catch (InterruptedException e)
        {
            logger.info("[MAIN]: Interrupted: {}", e.getMessage());
        }
    }
    
    public static void main(String[] args)
    {
        try
        {
            Main main = new Main();
            main.start();
        }
        catch (JoranException e)
        {
            System.out.println("Failed to load application: " + e.getMessage());
        }
    }
}

-------------------------------
Task.java
-------------------------------

import java.util.Date;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Task implements Runnable
{
    private final Logger logger = LoggerFactory.getLogger(Task.class);
    private final Logger logger_main = LoggerFactory.getLogger(Main.class);
    private final String name;
    private final long start;
    
    public Task(final String name)
    {
        this.name = name;
        start = new Date().getTime();
    }

    public void run()
    {
        logger.debug("[{}] {}", name, new Date().getTime() - start);
        logger_main.debug("[MAIN] - [{}] {}", name, new Date().getTime() - start);
    }
}

 All   Comments   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Toni Heimala - 11/Dec/07 12:11 PM
Forgot the logging configuration file, this is how it looks like:

---------------------------------
logger.xml
---------------------------------

<?xml version="1.0" encoding="UTF-8" ?>

<configuration>
    <consolePlugin />
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<layout class="ch.qos.logback.classic.PatternLayout">
<pattern>%d{HH:mm:ss.SSS} - %msg%n</pattern>
</layout>
</appender>
  
<appender name="FILE-MAIN" class="ch.qos.logback.core.rolling.RollingFileAppender">
<file>logs/main.log</file>
<rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
<FileNamePattern>logs/main-%i.log</FileNamePattern>
<MinIndex>1</MinIndex>
<MaxIndex>3</MaxIndex>
</rollingPolicy>
<triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
<MaxFileSize>500MB</MaxFileSize>
</triggeringPolicy>
<layout class="ch.qos.logback.classic.PatternLayout">
<Pattern>%date %level [%thread] %msg%n</Pattern>
</layout>
</appender>

<root>
<appender-ref ref="FILE-MAIN" />
<level value="DEBUG" />
</root>

</configuration>

Ceki Gulcu - 11/Dec/07 04:59 PM

Toni, thank you for this report.

Could you also provide a a thread dump showing the deadlock?

Toni Heimala - 12/Dec/07 12:33 PM
(In reply to comment #2)
> Toni, thank you for this report.
>
> Could you also provide a a thread dump showing the deadlock?
>

2007-12-12 12:41:51
Full thread dump Java HotSpot(TM) Client VM (1.6.0_02-b06 mixed mode, sharing):

"DestroyJavaVM" prio=6 tid=0x002e6000 nid=0x1cc waiting on condition [0x00000000..0x0091fd4c]
   java.lang.Thread.State: RUNNABLE

"Thread-0" prio=6 tid=0x02fad400 nid=0x5b0 waiting for monitor entry [0x032cf000..0x032cfa94]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:265)
        - waiting to lock <0x22eade50> (a ch.qos.logback.classic.Logger)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:491)
        at ch.qos.logback.classic.Logger.filterAndLog(Logger.java:445)
        at ch.qos.logback.classic.Logger.debug(Logger.java:404)
        at fi.imagesoft.test.Main.run(Main.java:67)

"pool-5-thread-1" prio=6 tid=0x02fac400 nid=0x530 waiting for monitor entry [0x0327f000..0x0327fc14]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:265)
        - waiting to lock <0x22eade50> (a ch.qos.logback.classic.Logger)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:491)
        at ch.qos.logback.classic.Logger.filterAndLog(Logger.java:464)
        at ch.qos.logback.classic.Logger.debug(Logger.java:408)
        at fi.imagesoft.test.Task.run(Task.java:39)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)

        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"pool-4-thread-1" prio=6 tid=0x02fab800 nid=0x660 waiting for monitor entry [0x0322f000..0x0322fb14]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:265)
        - waiting to lock <0x22eade50> (a ch.qos.logback.classic.Logger)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:491)
        at ch.qos.logback.classic.Logger.filterAndLog(Logger.java:464)
        at ch.qos.logback.classic.Logger.debug(Logger.java:408)
        at fi.imagesoft.test.Task.run(Task.java:40)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)

        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"pool-3-thread-1" prio=6 tid=0x02faa800 nid=0x67c waiting for monitor entry [0x031df000..0x031dfc94]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:265)
        - waiting to lock <0x22eade50> (a ch.qos.logback.classic.Logger)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:491)
        at ch.qos.logback.classic.Logger.filterAndLog(Logger.java:464)
        at ch.qos.logback.classic.Logger.debug(Logger.java:408)
        at fi.imagesoft.test.Task.run(Task.java:40)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)

        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"pool-2-thread-1" prio=6 tid=0x02fe6c00 nid=0x130 waiting for monitor entry [0x0318f000..0x0318fb94]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:265)
        - waiting to lock <0x22eade50> (a ch.qos.logback.classic.Logger)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:491)
        at ch.qos.logback.classic.Logger.filterAndLog(Logger.java:464)
        at ch.qos.logback.classic.Logger.debug(Logger.java:408)
        at fi.imagesoft.test.Task.run(Task.java:40)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)

        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"pool-1-thread-1" prio=6 tid=0x02fed000 nid=0x330 runnable [0x0313f000..0x0313fd14]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(Unknown Source)
        at java.net.SocketOutputStream.write(Unknown Source)
        at java.io.ObjectOutputStream$BlockDataOutputStream.write(Unknown Source)
        at java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
        at java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.io.ObjectOutputStream.writeArray(Unknown Source)
        at java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)
        at java.io.ObjectOutputStream.defaultWriteObject(Unknown Source)
        at ch.qos.logback.classic.spi.LoggingEvent.writeObject(LoggingEvent.java:287)
        at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at java.io.ObjectStreamClass.invokeWriteObject(Unknown Source)
        at java.io.ObjectOutputStream.writeSerialData(Unknown Source)
        at java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)
        at java.io.ObjectOutputStream.writeObject0(Unknown Source)
        at java.io.ObjectOutputStream.writeObject(Unknown Source)
        at ch.qos.logback.core.net.SocketAppenderBase.append(SocketAppenderBase.java:160)
        at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:81)
        - locked <0x22ee7838> (a ch.qos.logback.classic.net.SocketAppender)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:51)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:282)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:266)
        - locked <0x22eade50> (a ch.qos.logback.classic.Logger)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:491)
        at ch.qos.logback.classic.Logger.filterAndLog(Logger.java:464)
        at ch.qos.logback.classic.Logger.debug(Logger.java:408)
        at fi.imagesoft.test.Task.run(Task.java:40)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)

        at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

"Low Memory Detector" daemon prio=6 tid=0x02aa9800 nid=0x1f4 runnable [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"CompilerThread0" daemon prio=10 tid=0x02aa8000 nid=0x258 waiting on condition [0x00000000..0x02d5f81c]
   java.lang.Thread.State: RUNNABLE

"Attach Listener" daemon prio=10 tid=0x02aa7000 nid=0x734 runnable [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" daemon prio=10 tid=0x02aa6000 nid=0x180 waiting on condition [0x00000000..0x00000000]
   java.lang.Thread.State: RUNNABLE

"Finalizer" daemon prio=8 tid=0x02aa1800 nid=0x68c in Object.wait() [0x02c6f000..0x02c6fc94]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x22eaabd0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        - locked <0x22eaabd0> (a java.lang.ref.ReferenceQueue$Lock)
        at java.lang.ref.ReferenceQueue.remove(Unknown Source)
        at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

"Reference Handler" daemon prio=10 tid=0x02a9d400 nid=0x214 in Object.wait() [0x02c1f000..0x02c1fd14]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x22eaac60> (a java.lang.ref.Reference$Lock)
        at java.lang.Object.wait(Object.java:485)
        at java.lang.ref.Reference$ReferenceHandler.run(Unknown Source)
        - locked <0x22eaac60> (a java.lang.ref.Reference$Lock)

"VM Thread" prio=10 tid=0x02a94000 nid=0x390 runnable

"VM Periodic Task Thread" prio=10 tid=0x02ac4400 nid=0x208 waiting on condition


JNI global references: 1071

Heap
 def new generation total 960K, used 441K [0x229b0000, 0x22ab0000, 0x22e90000)

  eden space 896K, 42% used [0x229b0000, 0x22a0e6d0, 0x22a90000)
  from space 64K, 99% used [0x22aa0000, 0x22aafff8, 0x22ab0000)
  to space 64K, 0% used [0x22a90000, 0x22a90000, 0x22aa0000)
 tenured generation total 4096K, used 3804K [0x22e90000, 0x23290000, 0x269b0000)
   the space 4096K, 92% used [0x22e90000, 0x23247328, 0x23247400, 0x23290000)
 compacting perm gen total 12288K, used 1528K [0x269b0000, 0x275b0000, 0x2a9b0000)
   the space 12288K, 12% used [0x269b0000, 0x26b2e348, 0x26b2e400, 0x275b0000)
    ro space 8192K, 62% used [0x2a9b0000, 0x2aeaf728, 0x2aeaf800, 0x2b1b0000)
    rw space 12288K, 52% used [0x2b1b0000, 0x2b7f1eb8, 0x2b7f2000, 0x2bdb0000)

Earwin Burrfoot - 21/Sep/08 02:13 PM
That's not a "dead" lock. It's just a simple lock aquired by thread "pool-1-thread-1", which then waits for socket operation to complete. All other threads consequently wait for the same operation.
Anyway, introducing a sync around callAppenders loop (as a simple way to keep logging system runtime-reconfigurable) was one of the most stupid design decisions made in Log4J, which then carried on to logback, as its successor. It utterly defeats possibility of using log4j/logback in applications that utilize multiple threads and require lots of logging (as opposed to non-logging, which log4j/logback does pretty well).

Ceki Gulcu - 23/Sep/08 12:35 PM
As Earwin observes, there is no deadlock. However, it is true that there is contention on the same resource, that is the logger with appenders attached, in this case, the root logger. Thus, during logging all threads a synchronized on the root logger, see Logger.callAppenders(Logger.java:265). We could loosen this particular synchronization by using read-locks allowing concurrent access to the root logger. However, this would displace the contention point to appenders which can process only one event at a time. (The doAppend method in method AppenderBase is synchronized.)

I think the slowness observed is caused by the Console plug-in which grinds to a near-halt after 5 minutes of intensive use by the "Main" application above. If you restart Main, then the laggard behavior is observed almost immediately, which leads me to think that the problem is with the Console plug-in. My suspicion can be confirmed by commenting out <consolePlugin /> in the configuration file because thereafter the problem can no longer observed.

I am leaving this bug open so that we can come back to it to fix the Console plug-in. If you think that there is a dead-lock or a contention issues in logback, then I invite you to open a new issue.

Ralph Goers - 13/Oct/08 12:23 AM - edited
This bug needs to be fixed as does the bug in AppenderBase. I have a patch that fixes this issue but I don't seem to have permission to attach it to this issue. I have sent the patch to the dev list. I would appreciate it if someone with permission could attach it to this issue for me.

FWIW, in some cases slow appenders wouldn't be as much of a problem if it weren't for these synchronization points. I have a custom appender that is slow, but is thread safe so many of them can be run in parallel. With these two synchronization points however, Logback ends up showing up as a system wide bottleneck.

Ceki Gulcu - 13/Oct/08 11:18 AM
Testing attachments


Ralph,

If it will make the code simpler, I have no qualms with making
appender names non-mutable. Moreover, we can further impose that
loggers only accept named appenders rejecting anonymous one.

I see that you have displaced the synchronization blocks from Logger to
AppenderAttachableImpl, which seems quite reasonable to me. However,
and if I am reading the patch correctly, you have not removed the
synchronization from callAppender (line 265). The code still reads

  public void callAppenders(LoggingEvent event) {
    int writes = 0;
    // System.out.println("callAppenders");
    for (Logger l = this; l != null; l = l.parent) {
      // System.out.println("l="+l.getName());
      // Protected against simultaneous call to addAppender, removeAppender,...
      synchronized (l) { <--- this is what I am takling about
        writes += l.appendLoopOnAppenders(event);
        if (!l.additive) {
          break;
        }
      }
    }


You have also kept a synchronization point in addAppender

- public synchronized void addAppender(Appender<LoggingEvent> newAppender) {
- if (aai == null) {
- aai = new AppenderAttachableImpl<LoggingEvent>();
+ public void addAppender(Appender<LoggingEvent> newAppender) {
+ synchronized(this) {
+ if (aai == null) {
+ aai = new AppenderAttachableImpl<LoggingEvent>();
+ }
     }

So I don't see any gain with respect to the existing code. What am I
missing?

Having said that, I do not wish to offhandedly discard your contribution. I was
thinking that given that the 'aai' variable is assigned only once (and never set to
null again), it would be possible to remove both remaining synchronization points.
This would completely get rid of synchronization on "this" (the Logger instance)
with synchronization point moved into ApppenderAttachableImpl, which frankly
does not seem to be much of a gain to be worth the trouble. Would you concur with
my analysis?

Ralph Goers - 14/Oct/08 09:04 PM - edited
Well, that was silly. Of course I should have removed the synchronization on the logger in callAppenders. I can't believe I didn't do that since that was the whole point.

Yes, requiring Appenders to have immutable names that are set before the appender is attached will make AppenderAttachableImpl much simpler since the name can then reliably be used in the ConcurrentMap. The extra code to account for the name being missing can just go away.

As for the synchronization in addAppender, it is marginally better since it doesn't block aai.addAppender. The synchronization it is still doing only protects against a race condition on creating the AppenderAttachableImpl. However, this might not not even occur.

My concern is for what happens when a reconfigure occurs. Is it possible that a Logger didn't have an AppenderAttachableImpl (i.e. it has no Appenders) and then does after a reconfigure? If so, then aai may need to be an AtomicReference. I haven't looked at how the Interpreter works and how loggers get updated, etc.so I'm not sure.

I appreciate you looking at the code. Multiple pairs of eyeballs are always better than 1 set.

Ralph Goers - 16/Oct/08 09:55 PM - edited
Updated patch. This one makes the Appender name required and immutable, removes all the synchronization from Logger while calling Appenders and makes sure that an AppenderAttachableImpl is always attached to the Logger eliminating any race conditions that might occur during reconfiguration.

Ceki Gulcu - 17/Oct/08 06:22 PM
I think we should talk over your patch over the phone or via IRC.

Ralph Goers - 20/Oct/08 08:24 AM
Uploaded patch3.txt. This version uses CopyOnWriteArrayList instead of ConcurrentHashMap in AppenderAttachableImpl. It does not change the constructor for AppenderBase so is a lot less intrusive than patch2. I included all the updates from the patch in LBCORE-58 in this patch. I also added a test into LoggerPerfTest that demonstrates the difference. If LoggerPerfTest is run on trunk without the patch the elapsed time is significantly longer.

Ceki Gulcu - 20/Oct/08 08:47 PM
Hi Ralph,

I have applied your patch on my work copy. However, I suspect using ReadLocks on read operations on AppenderAttachableImpl (appenderOnLoop, getAppender, isAttached, iteratorForAppenders) and WriteLocks on write operations (addAppender, detachAndStopAllAppenders, detachAppender) would be safer although appenderOnLoop would be a bit slower and still free of interference among threads.

Nevertheless, I will commit a variant of your patch in the next few minutes.

Paweł Paprota - 15/Nov/08 03:38 PM
What is the current status of this issue? I think I got hit by this one, at least that's what I'm seeing right now in Eclipse when I suspend all threads in my application - one thread has callAppenders in its stack trace and it hanged (yesterday) on socketWrite. Some of the other threads seem to be waiting on a lock in callAppenders. As a result, I don't see any more logging appended to my log file and the console.

I'm using logback-core/classic 0.9.9 with slf4j 1.5.3.

Did the Ralph's patch make it to any of the newer logback releases? I was not able to find any change log or release announcement for releases newer than 0.9.9...

I would really appreciate some feedback, thank you.

Below is the stack trace of the "offending" thread (only relevant calls included):

Thread [Thread-12] (Suspended)
SocketOutputStream.socketWrite0(FileDescriptor, byte[], int, int) line: not available [native method]
SocketOutputStream.socketWrite(byte[], int, int) line: 92
SocketOutputStream.write(byte[], int, int) line: 136
ObjectOutputStream$BlockDataOutputStream.drain() line: 1685
ObjectOutputStream$BlockDataOutputStream.write(byte[], int, int, boolean) line: 1649
ObjectOutputStream.defaultWriteFields(Object, ObjectStreamClass) line: 1368
ObjectOutputStream.writeSerialData(Object, ObjectStreamClass) line: 1347
ObjectOutputStream.writeOrdinaryObject(Object, ObjectStreamClass, boolean) line: 1290
ObjectOutputStream.writeObject0(Object, boolean) line: 1079
ObjectOutputStream.writeArray(Object, ObjectStreamClass, boolean) line: 1251
ObjectOutputStream.writeObject0(Object, boolean) line: 1075
ObjectOutputStream.defaultWriteFields(Object, ObjectStreamClass) line: 1375
ObjectOutputStream.defaultWriteObject() line: 391
LoggingEvent.writeObject(ObjectOutputStream) line: 288
GeneratedMethodAccessor1118.invoke(Object, Object[]) line: not available
DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 25
Method.invoke(Object, Object...) line: 585
ObjectStreamClass.invokeWriteObject(Object, ObjectOutputStream) line: 917
ObjectOutputStream.writeSerialData(Object, ObjectStreamClass) line: 1339
ObjectOutputStream.writeOrdinaryObject(Object, ObjectStreamClass, boolean) line: 1290
ObjectOutputStream.writeObject0(Object, boolean) line: 1079
ObjectOutputStream.writeObject(Object) line: 302
SocketAppender(SocketAppenderBase<E>).append(E) line: 163
SocketAppender(AppenderBase<E>).doAppend(E) line: 81
AppenderAttachableImpl<E>.appendLoopOnAppenders(E) line: 51
Logger.appendLoopOnAppenders(LoggingEvent) line: 282
Logger.callAppenders(LoggingEvent) line: 266
Logger.buildLoggingEventAndAppend(String, Marker, Level, String, Object[], Throwable) line: 487
Logger.filterAndLog(String, Marker, Level, String, Object, Object, Throwable) line: 460
Logger.debug(String, Object, Object) line: 406

And those are the ones being blocked:

Thread [Thread-4] (Suspended)
Logger.callAppenders(LoggingEvent) line: 265
Logger.buildLoggingEventAndAppend(String, Marker, Level, String, Object[], Throwable) line: 487
Logger.filterAndLog(String, Marker, Level, String, Object[], Throwable) line: 479
Logger.debug(String, Throwable) line: 414

Daemon Thread [ActiveMQ Connection Worker: tcp://localhost/127.0.0.1:61616] (Suspended)
Logger.callAppenders(LoggingEvent) line: 265
Logger.buildLoggingEventAndAppend(String, Marker, Level, String, Object[], Throwable) line: 487
Logger.filterAndLog(String, Marker, Level, String, Object[], Throwable) line: 479
Logger.log(Marker, String, int, String, Throwable) line: 818
SLF4JLocationAwareLog.info(Object, Throwable) line: 152
AdvisoryConsumer.dispose() line: 58
ActiveMQConnection.close() line: 563
CachingConnectionFactory(SingleConnectionFactory).closeConnection(Connection) line: 364
CachingConnectionFactory(SingleConnectionFactory).resetConnection() line: 302
CachingConnectionFactory.resetConnection() line: 120
CachingConnectionFactory(SingleConnectionFactory).onException(JMSException) line: 283
SingleConnectionFactory$InternalChainedExceptionListener(ChainedExceptionListener).onException(JMSException) line: 60
ActiveMQConnection$3.run() line: 1690
ThreadPoolExecutor$Worker.runTask(Runnable) line: 650
ThreadPoolExecutor$Worker.run() line: 675
Thread.run() line: 595

Ceki Gulcu - 15/Nov/08 05:01 PM
Pawel,

Ralph's made has made it into the trunk and was part of the 0.9.11 release. The bad news is that the problem that you are suffering from is in the eclipse plug-in. Ralph's patch merely displaces the deadlock elsewhere.

Ralph Goers - 15/Nov/08 06:12 PM
I should point out, this isn't a deadlock but a bottleneck. By removing the locking in Logback we allow Appenders to better handle situations like this. But to do that the Appender will have to extend UnsynchronizedAppenderBase, which the SocketAppender doesn't do. Even if it did, because the SocketAppender has a scarce resource, the socket, it would eventually have to block on that.

The SocketAppender could be modified to use a pool of sockets, but then you would just end up with a bunch of threads sitting in SocketWrite. The advantage is that if the sockets have a 5 second timeout and you have 10 threads then the average log event takes 50 seconds. With a socket pool the same size as the number of threads the average log event would be 5 seconds. Still awful but 10 times better.

What is needed is a better way to handle Appenders that are "Out of Service". I will create a Jira issue for that and create a patch at some point.

Paweł Paprota - 16/Nov/08 06:45 PM
Thanks, I will update to 0.9.11 ASAP then.

Good luck with fixing it "the right way" :-)

Robert Dale - 23/Mar/09 02:51 PM
Any progress on this issue? I'm getting:

   java.lang.Thread.State: BLOCKED on ch.qos.logback.core.rolling.RollingFileAppender@2db3f4 owned by: http-127.0.0.1-8080-2
        at ch.qos.logback.core.AppenderBase.doAppend(AppenderBase.java:66)
        at ch.qos.logback.core.spi.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:60)
        at ch.qos.logback.classic.Logger.appendLoopOnAppenders(Logger.java:270)
        at ch.qos.logback.classic.Logger.callAppenders(Logger.java:257)
        at ch.qos.logback.classic.Logger.buildLoggingEventAndAppend(Logger.java:439)
        at ch.qos.logback.classic.Logger.filterAndLog_0_Or3Plus(Logger.java:393)
        at ch.qos.logback.classic.Logger.debug(Logger.java:500)

Thanks.

Ceki Gulcu - 23/Mar/09 03:38 PM
Hello Robert,

Any progress on what exactly? This issue is related to logback console plug-in for eclipse as such as logback's threading model. Are you using the the logback console plug-in for eclipse?

Robert Dale - 23/Mar/09 04:05 PM
I'm getting that deadlock in logback core (0.9.14 and 0.9.15). No console plugin or socket appenders, just plain RollingFileAppender.

Ceki Gulcu - 23/Mar/09 04:14 PM
In that case, please open a new jira issue. It would also be very helpful if you could provide the full thread dump. Please do not continue to comment on the present issue. (I intend to mark it as closed.)

Joern Huxhorn - 19/May/09 06:03 PM
Hi guys,

Our webapp running on a 4-core solaris is suffering* the same problem as described in http://jira.qos.ch/browse/LBCORE-63 and http://jira.qos.ch/browse/LBCORE-96 and I think I have an idea about the reason behind this.
Threading policy has changed from 1.5 to 1.6. I can't give you a reference for this, it's purely based on observation. In 1.5, a thread waiting for a lock was somewhat preferred by the VM over threads that would arrive at the wait() at a later time. In 1.6, however, a thread that releases synchronization is much more likely to immediately obtain the object lock again if it's in a consumer/producer loop.

for(;;)
{
synchronized(obj)
{
// thread is likely to reobtain lock
}
}

The same is the case for wait/notifyAll combinations. One can not assume that a longer waiting thread will likely obtain the lock after notifyAll.

This is documented in http://java.sun.com/javase/6/docs/api/java/lang/Object.html#notifyAll() so the changed behavior between 1.5 and 1.6 is perfectly legal.

My assumption is that the perceived deadlock, which is actually a congestion/starvation problem, occurs if one thread is logging "like crazy" while the other thread is sluggish as hell because he can't obtain the lock.

This should be solvable by using a fair ReentrantLock
(See http://java.sun.com/javase/6/docs/api/java/util/concurrent/locks/ReentrantLock.html#ReentrantLock(boolean) )

From the javadoc:
"The constructor for this class accepts an optional fairness parameter. When set true, under contention, locks favor granting access to the longest-waiting thread. Otherwise this lock does not guarantee any particular access order. Programs using fair locks accessed by many threads may display lower overall throughput (i.e., are slower; often much slower) than those using the default setting, but have smaller variances in times to obtain locks and guarantee lack of starvation."

The guaranteed lack of starvation is what we are looking for, I think, This is something that pure synchronization simply can't provide.

I'd suggest to give the attached patch a try, I can't do so myself because I can't risk our webapp and I have no other chance to test this right now.

I'll crosspost this to both tickets cause I'm not sure which one is more "active" right now.

* The webapp shows that behavior in case of either console, file and/or lilith/socket appenders. It's just much more likely to happen with the lilith/socket appender in place since it takes significantly longer to send events over the wire than just writing them to disk or console.

Ceki Gulcu - 28/May/09 08:06 PM
This issue is tainted by the fact that the eclipse console plug-in is being used. If you are really concerned about a deadlock issue, please open a new and separate issue. Suppling a test case would be most appropriate. In the mean time, as forewarned on March the 23rd, I am closing this issue.