logback / LOGBACK-1642

LayoutWrappingEncoder does not use the correct default charset for the console

Details

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 1.5.7
    • 1.2.11
    • logback-core
    • None
    • Windows 10; Java 17; locale en-US

    Description

      Let's say I have this typical Logback configuration to write output to stderr, with pretty colors and such:

      <configuration>
        <property scope="context" name="COLORIZER_COLORS" value="boldred@,boldyellow@,boldcyan@,@,@" />
        <conversionRule conversionWord="colorize" converterClass="org.tuxdude.logback.extensions.LogColorizer" />
        <statusListener class="ch.qos.logback.core.status.NopStatusListener" />
        <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
          <target>System.err</target>
          <withJansi>true</withJansi>
          <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <pattern>[%colorize(%level)] %msg%n</pattern>
          </encoder>
        </appender>
        <root level="INFO">
          <appender-ref ref="STDERR" />
        </root>
      </configuration>
      

      The key part is that I have a PatternLayoutEncoder (a descendant of LayoutWrappingEncoder) logging via a ConsoleAppender to System.err.

      The default charset for a LayoutWrappingEncoder (discussed in depth on Stack Overflow) is Charset.defaultCharset(). (How it gets that is complicated, but ultimately it relies on String.getBytes().) There's just one big problem: the default charset of System.out and System.err is System.console().charset(), not Charset.defaultCharset(), as per the API documentation for e.g. System.out:

      The "standard" output stream. This stream is already open and ready to accept output data. Typically this stream corresponds to display output or another output destination specified by the host environment or user. The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.

      On my system for example, Charset.defaultCharset() is set to windows-1252, while System.console().charset() returns IBM437. This results in mojibake: if I try to log the string "é" via Logback, it appears in System.out or System.err as Θ instead! (See discussion on Stack Overflow.)
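
The mismatch can be reproduced without Logback or a Windows console. The following is a minimal sketch, not code from Logback: the class and method names (`MojibakeDemo`, `reinterpret`) are made up for illustration, and it assumes the JDK in use ships the `IBM437` charset (part of the extended charsets in standard JDK builds). It encodes "é" with windows-1252 and then decodes the resulting byte the way an IBM437 console would.

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Logback.
class MojibakeDemo {

    /** Encodes text with one charset, then decodes the same bytes with another. */
    static String reinterpret(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        Charset ansi = Charset.forName("windows-1252"); // Charset.defaultCharset() on the reporter's machine
        Charset oem = Charset.forName("IBM437");        // System.console().charset() on the reporter's machine

        // "\u00e9" is é; windows-1252 encodes it as the single byte 0xE9.
        String shown = reinterpret("\u00e9", ansi, oem);

        // Print the code point rather than the character itself, so the
        // result does not depend on the console's own charset.
        System.out.printf("U+%04X%n", (int) shown.charAt(0)); // prints U+0398 (Θ)
    }
}
```

Byte 0xE9 is é in windows-1252 but Θ in IBM437, which matches the output described above.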

      Thus LayoutWrappingEncoder somehow needs to default to System.console().charset() (instead of Charset.defaultCharset() as it does now) if it is appending to System.out or System.err. (I can't manually specify a charset because I certainly don't know what the console default charset will be on each user's machine, as there will be many different values for different users.)
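
For reference, the rule quoted from the System.out documentation can be written as a small helper. This is only a sketch of the lookup being asked for, not existing Logback behavior: `ConsoleCharsets` and `forStandardStreams` are hypothetical names, and `Console.charset()` requires Java 17 or later.

```java
import java.io.Console;
import java.nio.charset.Charset;

// Hypothetical helper; not part of Logback.
class ConsoleCharsets {

    /**
     * Returns the charset the JDK documents for System.out and System.err:
     * the console's charset if a console exists, the default charset otherwise.
     */
    static Charset forStandardStreams() {
        Console console = System.console(); // null when no console is attached
        return console != null ? console.charset() : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        System.out.println("Charset.defaultCharset(): " + Charset.defaultCharset());
        System.out.println("Standard streams:         " + forStandardStreams());
    }
}
```

Running this on an affected machine should show whether the two values differ; on the system described above they would be windows-1252 and IBM437 respectively.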

      Unfortunately LayoutWrappingEncoder probably has no idea where it's writing to and probably shouldn't care. So instead, LayoutWrappingEncoder should be able to ask the enclosing OutputStreamAppender for the current charset. OutputStreamAppender could then default to Charset.defaultCharset() if not specified, and ConsoleAppender could override the default to return System.console().charset() instead of Charset.defaultCharset(). Problem solved, with the added benefit that the default charset now comes explicitly from the OutputStreamAppender implementation rather than indirectly from String.getBytes() hidden in the bowels of LayoutWrappingEncoder.
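
A rough sketch of that delegation might look as follows. Every type here (`SketchOutputStreamAppender`, `SketchConsoleAppender`, `SketchEncoder`) and the `getDefaultCharset()` method are hypothetical stand-ins invented for illustration; none of them is Logback's actual API, and a real change would have to fit Logback's existing encoder and appender lifecycle. `Console.charset()` again assumes Java 17 or later.

```java
import java.io.Console;
import java.nio.charset.Charset;

// Hypothetical stand-in for OutputStreamAppender: supplies the default charset.
class SketchOutputStreamAppender {
    Charset getDefaultCharset() {
        return Charset.defaultCharset();
    }
}

// Hypothetical stand-in for ConsoleAppender: prefers the console's charset.
class SketchConsoleAppender extends SketchOutputStreamAppender {
    @Override
    Charset getDefaultCharset() {
        Console console = System.console(); // null when no console is attached
        return console != null ? console.charset() : super.getDefaultCharset();
    }
}

// Hypothetical stand-in for LayoutWrappingEncoder: asks its appender when no
// charset was configured explicitly.
class SketchEncoder {
    private final SketchOutputStreamAppender parent;
    private Charset charset; // explicit setting; null means "not configured"

    SketchEncoder(SketchOutputStreamAppender parent) {
        this.parent = parent;
    }

    void setCharset(Charset charset) {
        this.charset = charset;
    }

    /** An explicit charset wins; otherwise the enclosing appender decides. */
    Charset effectiveCharset() {
        return charset != null ? charset : parent.getDefaultCharset();
    }

    byte[] encode(String text) {
        return text.getBytes(effectiveCharset());
    }
}
```

With this shape, an explicitly configured charset still takes precedence, file-based appenders keep today's default, and only the console appender changes behavior.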

          People

            logback-dev Logback dev list
            garretwilson Garret Wilson