Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLDR-18217 Remove Iterable interface from CLDRFile, add iteratorDefault, iterableDefault #4285

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

btangmu
Copy link
Member

@btangmu btangmu commented Jan 16, 2025

-Instead of Iterable interface, use new methods iterableDefault, iterableWithoutExtras

-Private final boolean CLDRFile.DEFAULT_ITERATION_INCLUDES_EXTRAS determines whether iteratorDefault and iterableDefault skip extras; either way they skip paths with null values

-Replace original iterator methods (zero or more parameters) with iteratorDefault or iteratorWithoutExtras

-Iterable and iterator are distinct, as in iterableWithoutExtras versus iteratorWithoutExtras

-Call iteratorWithoutExtras/iterableWithoutExtras directly where tests would otherwise fail with DEFAULT_ITERATION_INCLUDES_EXTRAS true

-Make methods private if possible without failing tests; surprisingly, all the WithoutExtras methods can be private

CLDR-18217

  • This PR completes the ticket.

ALLOW_MANY_COMMITS=true

…lt, iterableDefault

-Instead of Iterable interface, use new methods iterableDefault, iterableWithoutExtras

-Private boolean CLDRFile.DEFAULT_ITERATION_INCLUDES_EXTRAS determines whether iteratorDefault and iterableDefault skip extras; either way they skip paths with null values

-Replace original iterator methods (zero or more parameters) with iteratorDefault or iteratorWithoutExtras

-Iterable and iterator are distinct, as in iterableWithoutExtras versus iteratorWithoutExtras

-Call iteratorWithoutExtras/iterableWithoutExtras directly where tests would otherwise fail with DEFAULT_ITERATION_INCLUDES_EXTRAS true
@btangmu
Copy link
Member Author

btangmu commented Jan 16, 2025

This goes further than #4280

@@ -105,7 +105,7 @@ private static void showEnglish(PrintWriter out) throws IOException {
String locale = "en";
Set<String> sorted =
Builder.with(new TreeSet<String>())
.addAll(cldrFile.iterator())
.addAll(cldrFile.iteratorDefault())
.addAll(cldrFile.getExtraPaths())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: if iteratorDefault includes extra paths, then getExtraPaths would be superfluous here, except that iteratorDefault skips paths with null values (getStringValue(path) == null)

alternatively fullIterator() could be called here

@btangmu
Copy link
Member Author

btangmu commented Jan 16, 2025

An essential difference from #4280 is that here, iteration always skips paths with null values (except for fullIterable which remains unchanged). Because of this, none of the tests need to call the iterableWithoutExtras or iteratorWithoutExtras

@@ -98,7 +99,7 @@
* http://java.sun.com/j2se/1.4.2/docs/api/org/xml/sax/DTDHandler.html
*/

public class CLDRFile implements Freezable<CLDRFile>, Iterable<String>, LocaleStringProvider {
public class CLDRFile implements Freezable<CLDRFile>, LocaleStringProvider {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Major change: CLDRFile no longer implements Iterable. Instead, a method must be called to get an iterable/iterator

@@ -1462,22 +1463,70 @@ public static Set<String> getMatchingXMLFiles(File sourceDirs[], Matcher m) {
return s;
}

@Override
public Iterator<String> iterator() {
private final boolean DEFAULT_ITERATION_INCLUDES_EXTRAS = true;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this boolean determines whether default iterator/iterable includes extra paths


public Iterator<String> iteratorDefault() {
if (DEFAULT_ITERATION_INCLUDES_EXTRAS) {
return Iterators.filter(fullIterable().iterator(), p -> getStringValue(p) != null);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fullIterable is unchanged. If DEFAULT_ITERATION_INCLUDES_EXTRAS is true, iteratorDefault is the same as fullIterable except that it skips paths with null values

return dataSource.iterator();
}

public synchronized Iterator<String> iterator(String prefix) {
private synchronized Iterator<String> iteratorWithoutExtras(String prefix) {
return dataSource.iterator(prefix);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original iterator methods are renamed to iteratorWithoutExtras and they are now private and rarely used if DEFAULT_ITERATION_INCLUDES_EXTRAS is true

@@ -3081,7 +3146,7 @@ public String getWinningPath(String path) {
public Collection<String> getExtraPaths() {
Set<String> toAddTo = new HashSet<>();
toAddTo.addAll(getRawExtraPaths());
for (String path : this) {
for (String path : this.iterableWithoutExtras()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterableWithoutExtras is required here to prevent infinite recursion, since the caller is getExtraPaths

@@ -25,7 +25,7 @@ private CharacterFallbacks() {
DtdData.getInstance(DtdType.supplementalData).getDtdComparator(null);

for (Iterator<String> it =
characterFallbacks.iterator("//supplementalData/characters/", comp);
characterFallbacks.iteratorDefault("//supplementalData/characters/", comp);
it.hasNext(); ) {
String path = it.next();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the callers of iterator outside of CLDRFile have become iteratorDefault. Note, BTW, that this loop, and many like it, should be simplified by calling iterableDefault instead of iteratorDefault. I've postponed that simplification, pending decision on the overall approach. Also, including the word "default" in the method names might be temporary; for now, it helps to indicate which code is affected by changing the default behavior from excluding to including extra paths

@btangmu btangmu requested review from macchiati and srl295 and removed request for macchiati February 4, 2025 19:12
@btangmu
Copy link
Member Author

btangmu commented Feb 4, 2025

I'm requesting review even though this is still a DRAFT PR. Question is how to proceed, not necessarily to merge this as-is.

@@ -48,7 +48,7 @@ public static void main(String[] args) {
Set<String> children = entry.getValue();
for (String child : children) {
CLDRFile file = cldrFactory.make(child, false);
for (String path : file) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit here for the many clients that just need for (String path : cldrFile) ? Why didn't we just leave the iteratable interface on and make it equivalent to iterableDefault - what's the downside?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, i read the issue and comment thread. I don't quite follow it- to me they are two different usage models, whether you are trying to work with what's actually in the file, or what you could find in the file (as with TestCoverage). TestCoverage, the impetus for this issue, is to me the special case. But anyway, no objection.

Copy link
Member Author

@btangmu btangmu Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only reason to use explicit methods instead of the iterable interface is to facilitate finding all the usages and enabling multiple versions, including/excluding extra paths, and including/excluding paths without values. Once the dust has settled and we know what we want, it may be appropriate to restore the iterable interface.

The distinction between "what's actually in the file" and "what you could find in the file" is hard for me to interpret. For an XML file on disk, it's concrete. For an XMLSource it's more abstract, since the "constructed" paths with fallback code values are "in the file" (the XMLSource in RAM) even though they're not in the XML file on disk. For a CLDRFile it's even more nebulous since the "extra paths" are contained in the CLDRFile object but they don't have values, and if you loop through the paths in the CLDRFile you may or may not encounter the extra paths depending on which method you use for looping. So the concept of being "actually in the file" doesn't seem well defined to me.

TestCoverageLevel is one impetus for this issue. Another is CldrXmlWriter, which failed to write paths even though they had values that had been voted on. Those paths were "in the file" in the sense that they were in the vote table in the database, but they were not "in the file" in the sense of being in the XMLSource, and the question of whether they were "in the CLDRFile" was nebulous due to their having started out as "extra paths" and then been voted for.

My impression, based partly on the work of making this PR, is that the distinction between "paths with values" and "paths without values" is important and relatively well defined. There's no need to treat "extra paths" per se as not really being "in the file". For some purposes, like constructing a DataPage, all the paths are needed, including those without values. For other purposes, it's appropriate to skip the paths that don't have values.

This PR seems to imply that these kinds of iteration are generally useful:

  1. Including all paths
  2. Including all paths that have (non-null) values

A third kind, including all paths except extra paths, is only useful in a very limited context, in CLDRFile when adding extra paths, and that's to avoid infinite recursion.

srl295
srl295 previously approved these changes Feb 5, 2025
-new HashSet<>(getRawExtraPaths()) is just slightly shorter code

-To get root LSR codes, use fullIterable, not necessary to skip any paths
Copy link
Member

@macchiati macchiati left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a massive change. I was thinking that the change from

for (String s : f) {

to

for (String s : f.iterableDefault()) {

was just an intermediate change while you were investigating, adding alternative iteration, and fixing cases that shouldn't use the default.

Since you've done that now, I don't see why we should make this change.

@btangmu
Copy link
Member Author

btangmu commented Feb 6, 2025

@macchiati "This is a massive change..." Yes, this is a draft PR. As I wrote earlier, "I'm requesting review even though this is still a DRAFT PR. Question is how to proceed, not necessarily to merge this as-is."

Also, as I wrote in response to @srl295 , "Once the dust has settled and we know what we want, it may be appropriate to restore the iterable interface."

How should we proceed?

I suggest:

  1. Treat this PR as a proof of concept, and start a new branch from main, since nearly all the files changed by this PR don't need changing
  2. Keep the iterable interface, and change it to be what's "default" in this PR, that is, skip paths with null values, but don't skip extra paths (unless they have null values)
  3. Keep the fullIterable method as it is now, that is, include all paths, even those with null values
  4. Use fullIterable instead of addExtraPaths in GenerateCoverageLevels.java per the comment

I would make a new PR in which the changes are mostly confined to CLDRFile.java.

Does that sound right?

@btangmu btangmu requested a review from macchiati February 6, 2025 03:13
@macchiati
Copy link
Member

Yes, that sounds good!

Great work investigating this!

@macchiati "This is a massive change..." Yes, this is a draft PR. As I wrote earlier, "I'm requesting review even though this is still a DRAFT PR. Question is how to proceed, not necessarily to merge this as-is."

Also, as I wrote in response to @srl295 , "Once the dust has settled and we know what we want, it may be appropriate to restore the iterable interface."

How should we proceed?

I suggest:

  1. Treat this PR as a proof of concept, and start a new branch from main, since nearly all the files changed by this PR don't need changing
  2. Keep the iterable interface, and change it to be what's "default" in this PR, that is, skip paths with null values, but don't skip extra paths (unless they have null values)
  3. Keep the fullIterable method as it is now, that is, include all paths, even those with null values
  4. Use fullIterable instead of addExtraPaths in GenerateCoverageLevels.java per the comment

I would make a new PR in which the changes are mostly confined to CLDRFile.java.

Does that sound right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants