Handle multiple updates to the same record in a transaction #3722
Conversation
…latest IndexWriter changes

# Conflicts:
#	fdb-record-layer-lucene/src/test/java/com/apple/foundationdb/record/lucene/LuceneIndexMaintenanceTest.java
/**
 * Try to find the document for the given record in the segment index.
 * This method would first try to find the document using teh existing reader. If it can't, it will refresh the reader
Suggested change:
- * This method would first try to find the document using teh existing reader. If it can't, it will refresh the reader
+ * This method would first try to find the document using the existing reader. If it can't, it will refresh the reader
Done.
 * writer may cache the changes in NRT and the reader (created earlier) can't see them. Refreshing the reader from the
 * writer can alleviate this. If the index can't find the document with the refreshed reader, null is returned.
 * Note that the refresh of the reader will do so at the {@link com.apple.foundationdb.record.lucene.directory.FDBDirectoryWrapper}
 * and so has impact on the entire directory.
Please consider rewording.
Done.
if (newReader != null) {
    // previous reader instantiated but then writer changed
    readersToClose.add(writerReader);
    writerReader = LazyCloseable.supply(() -> newReader);
What happens if this is touched concurrently?
Added a test that updates the readers concurrently.
I still feel like there is a race condition, but it's possible that it doesn't work out in practice.
We start with writerReader0:
- threadA calls openIfChanged, resulting in newReader1
- threadB changes the writer -- this may not be possible to do without also creating a new writerReader, given the way this method is used; your test, I believe, does cause getWriterReader to be called when it saves
- threadC calls openIfChanged, resulting in newReader2
- threadA adds writerReader0 to readersToClose
- threadC adds writerReader0 to readersToClose
- threadA sets writerReader to writerReader1 and returns it
- threadC sets writerReader to writerReader2 and returns it
In this case, none of the code here will close writerReader1. So it might be worthwhile to change it so that we add the created reader to readersToClose, and we don't close writerReader directly.
i.e. in the constructor, call:
readersToClose.add(writerReader);
and do so again here, with a local:
newWriterReader = LazyCloseable.supply(() -> newReader);
readersToClose.add(newWriterReader);
writerReader = newWriterReader;
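A minimal, self-contained sketch of the suggested ownership scheme (plain JDK types stand in for LazyCloseable and the Lucene readers; all names here are illustrative, not the actual record-layer API): every reader handle is registered in readersToClose at the moment it is created, so the close list, not the writerReader field, owns the readers. A lost publication race then only loses a pointer swap, never an unclosed reader.

```java
import java.io.Closeable;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy stand-in for a reader: records whether it was closed.
final class StubReader implements Closeable {
    volatile boolean closed = false;
    @Override public void close() { closed = true; }
}

final class ReaderHolder implements Closeable {
    // Every handle ever created is added to this list at creation time,
    // so close() cannot miss a reader whose publication lost a race.
    private final List<StubReader> readersToClose = new CopyOnWriteArrayList<>();
    private volatile StubReader writerReader;

    ReaderHolder() {
        writerReader = new StubReader();
        readersToClose.add(writerReader);   // constructor registers the first handle
    }

    // Equivalent of the openIfChanged path: register the new handle
    // *before* publishing it; never close writerReader directly.
    StubReader refresh() {
        StubReader newWriterReader = new StubReader();
        readersToClose.add(newWriterReader);
        writerReader = newWriterReader;
        return newWriterReader;
    }

    @Override public void close() {
        for (StubReader r : readersToClose) {
            r.close();                      // closes winners and losers alike
        }
    }
}
```

Even if two threads refresh concurrently and one publication overwrites the other, both handles are already on the list, so both are closed when the holder is closed.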
Also, do you need to set writerReader to volatile?
Yeah, I think there is a possibility of a race condition. I believe you only need two threads making a change to the writer to expose this.
Done.
I don't think it is necessary to make the writerReader field volatile. In each thread, if the delete failed and the thread caused a new reader to be created, that new reader should have the changed document cached; if it does not, the document will not be visible in other threads either, so a stale cached value in a thread is acceptable in that situation.
The issue at hand is that when running multiple update operations in a single transaction, the partition's document counts and the PK-segment index may get into an inconsistent state. The root cause is that the first update in the transaction clears the doc from the Lucene index and the PK index. Since the changes are not flushed, the IndexWriter has them cached in the NRT cache. The second record update then does not find the record in the PK index (the segment has changed but the IndexReader does not yet reflect that), so the delete is skipped, including the update of the partition count. Note that the update does attempt a delete-by-query that actually removes the doc from the Lucene index, but since we cannot tell whether that delete matched anything, the partition count is not updated.
The solution is to refresh the DirectoryReader when doing an update, so that any previously written changes show up. The refresh operation uses DirectoryReader.openIfChanged, which is more resource-efficient than a brand-new open call.

Resolves #3704
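The failure mode described above can be modeled without Lucene. The toy sketch below (plain JDK, all names illustrative, not the Lucene API) mimics a writer whose buffered NRT changes are invisible to an already-open snapshot reader until a new snapshot is taken from the writer, which is why the second update in the transaction cannot find the document the first update rewrote:

```java
import java.util.HashMap;
import java.util.Map;

// Toy NRT model: the writer buffers changes; readers are point-in-time snapshots.
final class ToyWriter {
    private final Map<String, String> committed = new HashMap<>(); // flushed segments
    private final Map<String, String> buffered = new HashMap<>();  // NRT cache stand-in

    void update(String key, String value) { buffered.put(key, value); }
    void delete(String key) { buffered.put(key, null); }           // null marks a delete

    // Analogue of taking a fresh reader from the writer: a snapshot
    // of committed state with the buffered changes applied on top.
    ToyReader openReader() {
        Map<String, String> view = new HashMap<>(committed);
        for (Map.Entry<String, String> e : buffered.entrySet()) {
            if (e.getValue() == null) view.remove(e.getKey());
            else view.put(e.getKey(), e.getValue());
        }
        return new ToyReader(view);
    }
}

final class ToyReader {
    private final Map<String, String> view;
    ToyReader(Map<String, String> view) { this.view = view; }
    String find(String key) { return view.get(key); }  // null = "not found"
}
```

A reader opened before an in-transaction update keeps seeing the old state; only a reader re-opened from the writer (the role DirectoryReader.openIfChanged plays in the real fix, returning null when nothing changed) observes the buffered changes.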
refreshtheDirectoryReaderwhen doing an update, so that any previously written changes are showing up. The refresh operation usesDirectoryReader.openIfChangedthat is more efficient in resources than using a brand newopencall.Resolve #3704