DateFormat garbage characters in output #1213

artnaseef opened this issue Jan 5, 2025 · 14 comments

What are you trying to do?
Format the date/time with the following code:
DateFormat.getTimeInstance(DateFormat.SHORT, Locale.US).format(date)

Expected behaviour:
The result string contains the properly formatted date and no garbage/extraneous characters.

Observed behaviour:
The result string contains the time, but also contains garbage characters instead of a space preceding the "AM" / "PM" text.

Any other comments:
Tested with the following:
OpenJDK Runtime Environment Temurin-21.0.2+13 (build 21.0.2+13-LTS)
OpenJDK Runtime Environment Temurin-21.0.5+11 (build 21.0.5+11-LTS)

Also tested SUCCESSFULLY (i.e. no garbage in the output) with the following, and other JVMs:
OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)

Here is example output using cat -v and xxd:
4:49M-bM-^@M-/PM

00000000: 343a 3530 e280 af50 4d0a                 4:50...PM.

Here is the code of a complete test program:

import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

public class FormatDateSimplest {
    public static void main(String[] args) {
        DateFormat timeFormat = DateFormat.getTimeInstance(DateFormat.SHORT, Locale.US);
        Date date = new Date();
        System.out.println(timeFormat.format(date));
    }
}

@sxa
Member

sxa commented Jan 6, 2025

I'm going to transfer this to the support repository - temurin-build is for the scripts that build and distribute Temurin.

Also tested SUCCESSFULLY (i.e. no garbage in the output) with the following, and other JVM's:

When you say "other JVMs" are you suggesting it passes with an equivalent OpenJDK version from other vendors? Can you say which ones?

sxa transferred this issue from adoptium/temurin-build on Jan 6, 2025
@artnaseef
Author

Thank you @sxa. The other JVMs are the following:

            Eclipse Temurin 17 JDK
                    openjdk version "17.0.9" 2023-10-17
                    OpenJDK Runtime Environment Temurin-17.0.9+9 (build 17.0.9+9)
                    OpenJDK 64-Bit Server VM Temurin-17.0.9+9 (build 17.0.9+9, mixed mode, sharing)

            Eclipse Temurin 11 JDK
                    openjdk version "11.0.24" 2024-07-16
                    OpenJDK Runtime Environment Temurin-11.0.24+8 (build 11.0.24+8)
                    OpenJDK 64-Bit Server VM Temurin-11.0.24+8 (build 11.0.24+8, mixed mode)

            Java 1.8.0_261 (Oracle)
                    java version "1.8.0_261"
                    Java(TM) SE Runtime Environment (build 1.8.0_261-b12)
                    Java HotSpot(TM) 64-Bit Server VM (build 25.261-b12, mixed mode)

The labels are mine; the lines under the labels are the output of "java -version".

@jerboaa

jerboaa commented Jan 7, 2025

@artnaseef What you are observing is another instance of https://bugs.openjdk.org/browse/JDK-8324308, caused by the CLDR 42.0 update done in JDK 20 (also included in JDK 21). If you need a plain space before AM/PM in JDK 21+ (rather than the narrow non-breaking space, U+202F, that CLDR now uses there), you need to use a custom formatter. Hope that helps.
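
For readers wondering what the "garbage" actually is, here is a minimal sketch that decodes the three bytes from the xxd dump in the original report (e2 80 af) as UTF-8 and asks the JDK for the character's name. The class and method names are made up for illustration; only standard JDK calls are used.

```java
import java.nio.charset.StandardCharsets;

public class IdentifySeparator {
    // Decode the three bytes seen in the xxd dump (e2 80 af) as UTF-8
    // and return the single code point they encode.
    public static int decodeSeparator() {
        byte[] raw = {(byte) 0xE2, (byte) 0x80, (byte) 0xAF};
        return new String(raw, StandardCharsets.UTF_8).codePointAt(0);
    }

    public static void main(String[] args) {
        int cp = decodeSeparator();
        // Prints: U+202F NARROW NO-BREAK SPACE
        System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
    }
}
```

So the bytes are a valid, single Unicode character rather than corruption; tools such as cat -v simply render its UTF-8 bytes one at a time.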

@artnaseef
Author

Is there a straightforward way to get the plain-text / ASCII-compatible DateFormat.SHORT equivalent?

I'm having a little trouble wrapping my head around java.text date formatting including non-breaking spaces. I've never heard of CLDR before.

@artnaseef
Author

BTW, I notice this is labeled "WAITING ON OP". Is there something more I need to do here?

@karianna
Contributor

karianna commented Jan 11, 2025

I haven't dug into this too much, but a quick query to Copilot gives me:

To ensure that the output format has a simple space instead of any unexpected characters before AM/PM, you can use SimpleDateFormat from the java.text package. Here’s the updated code:

import java.text.SimpleDateFormat;
import java.util.Date;

public class FormatDateSimplest {
    public static void main(String[] args) {
        // Define the custom format
        SimpleDateFormat timeFormat = new SimpleDateFormat("h:mm a");
        
        // Create a new date instance for the current time
        Date date = new Date();
        
        // Format the date and print it
        System.out.println(timeFormat.format(date));
    }
}

Explanation:
1. Pattern "h:mm a":
• h - Hour in 12-hour format (1-12).
• mm - Minutes (00-59).
• a - AM/PM marker.
• There is a single space between the time and the AM/PM marker.
2. Why SimpleDateFormat?
• SimpleDateFormat allows you to define custom formatting patterns explicitly, so there is no ambiguity with locale-based formatting issues (such as non-breaking spaces).

When you run this code, the output will look like this:

4:49 PM

with a regular space before AM/PM.

@artnaseef
Author

Thank you for the response. In my case, the formatted date is going to individuals who may be anywhere geographically, so I don't want to use fixed date and time formats - I want to use the formats that are specific to their locale. Ignore the hard-coded locale in my snippet please.
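
One possible way to keep the locale-specific pattern while still getting a plain space is to format as before and then normalize the separator afterwards. The sketch below is only an illustration under that assumption: the class name and both helper methods are hypothetical, and it replaces just U+202F (narrow no-break space) and U+00A0 (no-break space). It does not make the output ASCII for locales whose digits or AM/PM markers are non-ASCII, and some locales use no-break spaces deliberately, so whether this is acceptable depends on the consumer of the string.

```java
import java.text.DateFormat;
import java.util.Date;
import java.util.Locale;

public class PlainSpaceTimeFormat {
    // Replace NARROW NO-BREAK SPACE (U+202F) and NO-BREAK SPACE (U+00A0)
    // with an ordinary ASCII space; everything else is left untouched.
    public static String toPlainSpaces(String formatted) {
        return formatted.replace('\u202F', ' ').replace('\u00A0', ' ');
    }

    // Format with the locale's own SHORT time pattern, then normalize spaces.
    public static String formatShortTime(Date date, Locale locale) {
        DateFormat timeFormat = DateFormat.getTimeInstance(DateFormat.SHORT, locale);
        return toPlainSpaces(timeFormat.format(date));
    }

    public static void main(String[] args) {
        System.out.println(formatShortTime(new Date(), Locale.US));
    }
}
```

On JDKs that still emit a plain space, the replacement is a no-op, so the same code can run across versions.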

@jerboaa

jerboaa commented Jan 21, 2025

You could check whether -Djava.locale.providers=COMPAT works, but that option has been removed in later JDKs.

@artnaseef
Author

artnaseef commented Jan 21, 2025 via email

@artnaseef
Author

artnaseef commented Jan 21, 2025

Thanks Severin. So the standard (CLDR?) does not address this?

Perhaps this is just my lack of understanding of UTF-8. Is it reasonable to expect standard regular expression processors (e.g. java.util.regex.Matcher) to treat this non-breaking space as a space (e.g. matching with the \s predefined character class in a Java regex)?

Art

Just tested it, and the regex failed to match the non-breaking character.

Art

@artnaseef
Author

Any thoughts on how to pursue this further?

It feels to me like the JDK is doing the wrong thing here since some of the text tools seem to make use of the full UTF-8 space (e.g. the date formatting), while others ignore it (e.g. regex).

If there is a desire to go all-in with UTF-8, then shouldn't the regex handle it? This is a breaking issue.

@artnaseef
Author

Is there another / more-appropriate place to raise this concern?

@jerboaa

jerboaa commented Feb 6, 2025

Feel free to raise this issue on core-libs-dev on the OpenJDK project.

@jerboaa

jerboaa commented Feb 6, 2025

treat this non-breaking space as a space (e.g. matching with \s predefined character class in a java regex)?

The \s is defined in javadoc as:

\s 	A whitespace character: [ \t\n\x0B\f\r] if UNICODE_CHARACTER_CLASS is not set. See Unicode Support.

That doesn't include a narrow non-breaking space, AFAIK.
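
To make the difference concrete, here is a small sketch comparing three patterns against U+202F: the default \s, \s compiled with Pattern.UNICODE_CHARACTER_CLASS, and \h (the horizontal-whitespace class, whose javadoc definition lists \u202f). The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

public class WhitespaceMatchDemo {
    // The narrow no-break space that appears before AM/PM in the output.
    static final String NNBSP = "\u202F";

    // Default \s: [ \t\n\x0B\f\r] only.
    public static boolean matchesDefault() {
        return Pattern.compile("\\s").matcher(NNBSP).matches();
    }

    // \s with UNICODE_CHARACTER_CLASS follows the Unicode White_Space property.
    public static boolean matchesUnicode() {
        return Pattern.compile("\\s", Pattern.UNICODE_CHARACTER_CLASS)
                .matcher(NNBSP).matches();
    }

    // \h is the predefined horizontal-whitespace class.
    public static boolean matchesHorizontal() {
        return Pattern.compile("\\h").matcher(NNBSP).matches();
    }

    public static void main(String[] args) {
        System.out.println("default \\s: " + matchesDefault());     // false
        System.out.println("unicode \\s: " + matchesUnicode());     // true
        System.out.println("\\h:         " + matchesHorizontal());  // true
    }
}
```

The same flag can also be enabled inline with (?U) at the start of the pattern, which may be easier when the regex comes from configuration.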
