
Fix ConcurrentModificationException in RemoteFsTimestampAwareTranslog.trimUnreferencedReaders #17028

Open
wants to merge 1 commit into
base: main

Conversation

sachinpkale (Member) commented Jan 15, 2025

Description

  • In RemoteFsTimestampAwareTranslog.trimUnreferencedReaders, to update the file tracker so that it reflects the local translog state, we fetch the minimum generation across all readers and delete from FileTransferTracker every generation below that minimum.

Optional<Long> minLiveGeneration = readers.stream().map(BaseTranslogReader::getGeneration).min(Long::compareTo);
if (minLiveGeneration.isPresent()) {
    List<String> staleFilesInTracker = new ArrayList<>();
    for (String file : fileTransferTracker.allUploaded()) {
        if (file.endsWith(TRANSLOG_FILE_SUFFIX)) {
            long generation = Translog.parseIdFromFileName(file);
            if (generation < minLiveGeneration.get()) {
                staleFilesInTracker.add(file);
                staleFilesInTracker.add(Translog.getCommitCheckpointFileName(generation));
            }
        }
        fileTransferTracker.delete(staleFilesInTracker);
    }
}

  • The stream over readers runs without taking the read lock, so if the list is modified while it is being iterated, the call fails with the following exception, seen in RemoteFsTranslogTests:

java.util.ConcurrentModificationException
	at __randomizedtesting.SeedInfo.seed([B332A1405CA5C597:97EB586D87323614]:0)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1715)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:560)
	at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
	at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:702)
	at java.base/java.util.stream.ReferencePipeline.min(ReferencePipeline.java:748)
	at org.opensearch.index.translog.RemoteFsTimestampAwareTranslog.trimUnreferencedReaders(RemoteFsTimestampAwareTranslog.java:128)
	at org.opensearch.index.translog.RemoteFsTimestampAwareTranslog.trimUnreferencedReaders(RemoteFsTimestampAwareTranslog.java:117)
	at org.opensearch.index.translog.Translog.lambda$acquireTranslogGenFromDeletionPolicy$12(Translog.java:744)
	at org.opensearch.index.translog.MultiSnapshot.close(MultiSnapshot.java:99)
	at org.opensearch.index.translog.Translog$SeqNoFilterSnapshot.close(Translog.java:1044)
	at org.opensearch.index.translog.RemoteFsTranslogTests$5.doRun(RemoteFsTranslogTests.java:1188)
	at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
	at java.base/java.lang.Thread.run(Thread.java:1575)
  • The Translog class already has a method, getMinFileGeneration(), that returns the min generation safely by taking a read lock.
  • This PR uses getMinFileGeneration() to get the min generation instead of streaming over readers directly.
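
The locking idea in the description can be sketched in isolation. This is a minimal illustration under stated assumptions, not the OpenSearch implementation: the GenerationTracker class, its list of plain generation numbers, the rollGeneration and trimBelow helpers, and the fallback to the current generation are all hypothetical stand-ins. Only the idea of computing the minimum while holding a read lock comes from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a translog that tracks reader generations.
class GenerationTracker {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Long> readers = new ArrayList<>(); // guarded by lock
    private long currentGeneration = 1;                   // guarded by lock

    // Writers mutate the list only while holding the write lock.
    void rollGeneration() {
        lock.writeLock().lock();
        try {
            readers.add(currentGeneration);
            currentGeneration++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Drops every reader generation below minGeneration (assumed helper).
    void trimBelow(long minGeneration) {
        lock.writeLock().lock();
        try {
            readers.removeIf(g -> g < minGeneration);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers take the read lock, so the list cannot change mid-iteration.
    long getMinFileGeneration() {
        lock.readLock().lock();
        try {
            return readers.stream().min(Long::compare).orElse(currentGeneration);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        GenerationTracker tracker = new GenerationTracker();
        tracker.rollGeneration();
        tracker.rollGeneration();
        System.out.println("min generation: " + tracker.getMinFileGeneration());
    }
}
```

Because every mutation goes through the write lock, a caller such as trimUnreferencedReaders can ask for the minimum without iterating the shared list itself.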

Related Issues

Check List

  • [ ] Functionality includes testing.
  • [ ] API changes companion pull request created, if applicable.
  • [ ] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


❌ Gradle check result for bbb355d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


✅ Gradle check result for 044793f: SUCCESS


codecov bot commented Jan 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 72.25%. Comparing base (f9c239d) to head (044793f).
Report is 2 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #17028      +/-   ##
============================================
+ Coverage     72.11%   72.25%   +0.14%     
- Complexity    65260    65326      +66     
============================================
  Files          5301     5301              
  Lines        303801   303803       +2     
  Branches      44029    44028       -1     
============================================
+ Hits         219098   219528     +430     
+ Misses        66686    66280     -406     
+ Partials      18017    17995      -22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

skumawat2025 (Contributor) left a comment

LGTM.

}
fileTransferTracker.delete(staleFilesInTracker);
skumawat2025 (Contributor) commented Jan 16, 2025

This change looks good to me, @sachinpkale. Just one question about this delete call.
We are calling fileTransferTracker.delete(staleFilesInTracker); inside the for loop. When multiple generations have been uploaded, won't we call delete for files that were already deleted? Could we move the delete call out of the for loop?
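
The suggestion can be sketched as follows: collect every stale file first, then call delete once after the loop. This is an illustration under assumptions, not the PR's code. InMemoryTracker, parseGeneration, the call counter, and the translog-<generation>.tlog / .ckp file naming are hypothetical stand-ins for the real OpenSearch classes and helpers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StaleFileCleanup {
    static final String TRANSLOG_FILE_SUFFIX = ".tlog"; // assumed suffix

    // Hypothetical in-memory tracker that counts delete calls.
    static class InMemoryTracker {
        private final Set<String> uploaded = new HashSet<>();
        int deleteCalls = 0;

        void add(String file) { uploaded.add(file); }

        // Returns a snapshot copy so callers can iterate safely.
        Set<String> allUploaded() { return new HashSet<>(uploaded); }

        void delete(List<String> files) {
            deleteCalls++;
            uploaded.removeAll(files);
        }
    }

    // Assumed naming: "translog-7.tlog" parses to 7.
    static long parseGeneration(String file) {
        int start = file.indexOf('-') + 1;
        int end = file.lastIndexOf('.');
        return Long.parseLong(file.substring(start, end));
    }

    static void deleteStaleFiles(InMemoryTracker tracker, long minGeneration) {
        List<String> stale = new ArrayList<>();
        for (String file : tracker.allUploaded()) {
            if (file.endsWith(TRANSLOG_FILE_SUFFIX)) {
                long generation = parseGeneration(file);
                if (generation < minGeneration) {
                    stale.add(file);
                    stale.add("translog-" + generation + ".ckp");
                }
            }
        }
        // Single delete call after the loop, skipped when nothing is stale.
        if (!stale.isEmpty()) {
            tracker.delete(stale);
        }
    }

    public static void main(String[] args) {
        InMemoryTracker tracker = new InMemoryTracker();
        for (long g = 1; g <= 3; g++) {
            tracker.add("translog-" + g + ".tlog");
            tracker.add("translog-" + g + ".ckp");
        }
        deleteStaleFiles(tracker, 3);
        System.out.println("remaining: " + tracker.allUploaded());
        System.out.println("delete calls: " + tracker.deleteCalls);
    }
}
```

With the call hoisted out of the loop, the tracker sees one delete per trim regardless of how many generations were uploaded.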
