HADOOP-19902. [ABFS] Fix small write hflush followed by close #8513

Open
sunchao wants to merge 1 commit into apache:trunk from sunchao:dev/chao/codex/abfs-hflush-close

sunchao commented May 24, 2026

Why are the changes needed?

With fs.azure.write.enableappendwithflush=true, ABFS's small-write optimized hflush() submits and consumes the active DataBlock but leaves it recorded as the active writable block. A later close() retries that already consumed block and fails with IllegalStateException: Expected stream state Writing -but actual state is Closed.

JIRA: HADOOP-19902

What changes were proposed in this PR?

Clear the active block after the optimized append-with-flush path submits it for upload, matching the lifecycle already used by uploadCurrentBlock().
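
The lifecycle issue can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual `AbfsOutputStream` code: the `Block` and `Stream` classes, the `clearAfterFlush` flag, and the exception text are hypothetical stand-ins. It models a block that may be submitted for upload only once, and a stream that either keeps or clears its active-block reference after `hflush()`.

```java
import java.util.ArrayList;
import java.util.List;

public class ActiveBlockLifecycleSketch {

    /** Hypothetical stand-in for a data block: it can be submitted only once. */
    static final class Block {
        private enum State { WRITING, CLOSED }

        private State state = State.WRITING;
        private final StringBuilder data = new StringBuilder();

        void write(String s) {
            requireWriting();
            data.append(s);
        }

        /** Consumes the block; a second call fails because the state is CLOSED. */
        String startUpload() {
            requireWriting();
            state = State.CLOSED;
            return data.toString();
        }

        private void requireWriting() {
            if (state != State.WRITING) {
                throw new IllegalStateException(
                    "Expected state WRITING but was " + state);
            }
        }
    }

    /** Hypothetical stand-in for the output stream's active-block bookkeeping. */
    static final class Stream {
        private final boolean clearAfterFlush;
        private final List<String> appended = new ArrayList<>();
        private Block active;

        Stream(boolean clearAfterFlush) {
            this.clearAfterFlush = clearAfterFlush;
        }

        void write(String s) {
            if (active == null) {
                active = new Block();
            }
            active.write(s);
        }

        void hflush() {
            if (active == null) {
                return;
            }
            appended.add(active.startUpload());
            if (clearAfterFlush) {
                active = null; // the step this sketch assumes the fix adds
            }
        }

        void close() {
            if (active != null) {
                // With a stale reference, this resubmits a consumed block.
                appended.add(active.startUpload());
                active = null;
            }
        }

        List<String> appended() {
            return appended;
        }
    }

    /** Runs write -> hflush -> close and returns the recorded appends. */
    static List<String> run(boolean clearAfterFlush) {
        Stream s = new Stream(clearAfterFlush);
        s.write("hello");
        s.hflush();
        s.close();
        return s.appended();
    }

    public static void main(String[] args) {
        try {
            run(false);
            System.out.println("without clearing: no failure");
        } catch (IllegalStateException e) {
            System.out.println("without clearing: " + e.getMessage());
        }
        System.out.println("with clearing: " + run(true));
    }
}
```

In this model, `run(false)` fails in `close()` because the stale reference points at a consumed block, while `run(true)` records the payload exactly once.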

Add a regression test with small-write optimization enabled that executes write(), hflush(), and close(), then asserts the payload is appended exactly once using FLUSH_MODE.

Contains content generated by Codex.

How was this PR tested?

  • Unit test: ./mvnw -pl hadoop-tools/hadoop-azure -am -DskipITs -Dtest=org.apache.hadoop.fs.azurebfs.services.TestAbfsOutputStream#verifySmallWriteOptimizedHFlushFollowedByClose -DfailIfNoTests=false test

AI Tooling

  • The PR includes the phrase Contains content generated by Codex.
  • My use of AI tooling follows ASF legal policy.

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 12m 36s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | | | | _ trunk Compile Tests _ |
| +1 💚 | mvninstall | 41m 14s | | trunk passed |
| +1 💚 | compile | 1m 8s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | compile | 1m 7s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | checkstyle | 1m 1s | | trunk passed |
| +1 💚 | mvnsite | 1m 11s | | trunk passed |
| +1 💚 | javadoc | 1m 5s | | trunk passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 1m 2s | | trunk passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 1m 38s | | trunk passed |
| +1 💚 | shadedclient | 29m 21s | | branch has no errors when building and testing our client artifacts. |
| | | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 34s | | the patch passed |
| +1 💚 | compile | 0m 34s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javac | 0m 34s | | the patch passed |
| +1 💚 | compile | 0m 34s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | javac | 0m 34s | | the patch passed |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 💚 | checkstyle | 0m 24s | | the patch passed |
| +1 💚 | mvnsite | 0m 38s | | the patch passed |
| +1 💚 | javadoc | 0m 30s | | the patch passed with JDK Ubuntu-21.0.10+7-Ubuntu-124.04 |
| +1 💚 | javadoc | 0m 30s | | the patch passed with JDK Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| +1 💚 | spotbugs | 1m 15s | | the patch passed |
| +1 💚 | shadedclient | 28m 20s | | patch has no errors when building and testing our client artifacts. |
| | | | | _ Other Tests _ |
| +1 💚 | unit | 2m 12s | | hadoop-azure in the patch passed. |
| +1 💚 | asflicense | 0m 36s | | The patch does not generate ASF License warnings. |
| | | 129m 19s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.54 ServerAPI=1.54 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8513/1/artifact/out/Dockerfile |
| GITHUB PR | #8513 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 0fa36340feb7 5.15.0-173-generic #183-Ubuntu SMP Fri Mar 6 13:29:34 UTC 2026 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 2a22439 |
| Default Java | Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Multi-JDK versions | /usr/lib/jvm/java-21-openjdk-amd64:Ubuntu-21.0.10+7-Ubuntu-124.04 /usr/lib/jvm/java-17-openjdk-amd64:Ubuntu-17.0.18+8-Ubuntu-124.04.1 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8513/1/testReport/ |
| Max. process+thread count | 610 (vs. ulimit of 10000) |
| modules | C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-8513/1/console |
| versions | git=2.43.0 maven=3.9.15 spotbugs=4.9.7 |
| Powered by | Apache Yetus 0.14.1 https://yetus.apache.org |

This message was automatically generated.

@sunchao sunchao marked this pull request as ready for review May 24, 2026 17:54

sunchao commented May 24, 2026

cc @anmolanmol1234 @anujmodi2021 @steveloughran, could you review this?

It addresses a failure in the small-write append-with-flush path: after hflush() consumes the active block, a subsequent close() can attempt to reuse that already-closed block. The PR adds a focused write -> hflush -> close regression test.


sunchao commented May 24, 2026

We encountered this while using Parquet 1.15.2 without this PR: apache/parquet-java#3204
