Thanks to all for the quick turnaround resolving #301.

Unfortunately, we've hit a much deeper snag while upgrading, starting with 0.11.0. Nearly as soon as we re-open, in read mode, an Array we had previously been writing to, we get native code errors or what appears to be a deadlock. This only happens after writing many, many overlapping chunks. This does not happen with 0.10.1.

With our production code, this is the behaviour:

TileDB-Java 0.11.0 (TileDB 2.9.0)

```
[2023-07-25 17:49:14.391] [Process: 2565256] [error] [Global] [TileDB::FragmentMetadata] Error: Trying to access metadata that's not loaded
Caused by: io.tiledb.java.api.TileDBError: [TileDB::FragmentMetadata] Error: Trying to access metadata that's not loaded
    at io.tiledb.java.api.ContextCallback.call(ContextCallback.java:56)
    at io.tiledb.java.api.Context.handleError(Context.java:142)
    at io.tiledb.java.api.Query.submit(Query.java:130)
    ...
```

TileDB-Java 0.13.0 (TileDB 2.11.0)

```
[2023-07-25 13:57:01.358] [Process: 889580] [error] [Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 13:57:03.753] [Process: 889580] [error] [Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 13:57:08.940] [Process: 889580] [error] [Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 13:57:08.942] [Process: 889580] [error] [Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 13:57:08.943] [Process: 889580] [error] [Global] [TileDB::FragmentMetadata] Error: Trying to access metadata that's not loaded
[2023-07-25 13:57:09.289] [Process: 889580] [error] [Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 13:57:11.144] [Process: 889580] [error] [Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
...
Caused by: io.tiledb.java.api.TileDBError: [TileDB::FragmentMetadata] Error: Trying to access metadata that's not loaded
    at io.tiledb.java.api.ContextCallback.call(ContextCallback.java:56)
    at io.tiledb.java.api.Context.handleError(Context.java:142)
    at io.tiledb.java.api.Query.submit(Query.java:130)
    ...
```

TileDB-Java 0.14.1 (TileDB 2.12)

```
[2023-07-25 13:43:49.710] [Process: 1772240] [error] [1690288813910863900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 13:43:49.712] [Process: 1772240] [error] [1690288813910863900-Global] Error: Internal TileDB uncaught exception; device or resource busy: device or resource busy
[2023-07-25 13:43:50.810] [Process: 1772240] [error] [1690288813910863900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 13:43:50.811] [Process: 1772240] [error] [1690288813910863900-Global] Error: Internal TileDB uncaught exception; device or resource busy: device or resource busy
...
Caused by: io.tiledb.java.api.TileDBError: Error: Internal TileDB uncaught exception; device or resource busy: device or resource busy
    at io.tiledb.java.api.ContextCallback.call(ContextCallback.java:56)
    at io.tiledb.java.api.Context.handleError(Context.java:142)
    at io.tiledb.java.api.Query.submit(Query.java:130)
    ...
```

TileDB-Java 0.15.2 (TileDB 2.13.2)

Hang or deadlock. Worker stack traces (collected via jstack) are:

```
"pool-1-thread-1" #23 prio=5 os_prio=0 cpu=89546.88ms elapsed=340.37s tid=0x0000020972a92800 nid=0x3d32c runnable [0x000000be392fe000]
   java.lang.Thread.State: RUNNABLE
    at io.tiledb.libtiledb.tiledbJNI.tiledb_query_submit(Native Method)
    at io.tiledb.libtiledb.tiledb.tiledb_query_submit(tiledb.java:2853)
    at io.tiledb.java.api.Query.submit(Query.java:130)
    ...
```

TileDB-Java 0.16.1 (TileDB 2.14.1)

Works for a while, dies later.
```
[2023-07-25 14:20:19.628] [Process: 1313796] [error] [1690290969279989200-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 14:20:19.632] [Process: 1313796] [error] [1690290969279989200-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
[2023-07-25 14:20:20.317] [Process: 1313796] [error] [1690290969279989200-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 14:20:20.317] [Process: 1313796] [error] [1690290969279989200-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
...
Caused by: io.tiledb.java.api.TileDBError: C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
    at io.tiledb.java.api.ContextCallback.call(ContextCallback.java:56)
    at io.tiledb.java.api.Context.handleError(Context.java:144)
    at io.tiledb.java.api.Query.submit(Query.java:130)
    ...
```

TileDB-Java 0.17.8 (TileDB 2.15.4)

```
[2023-07-25 16:45:09.726] [Process: 452168] [error] [1690299697002774900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 16:45:09.729] [Process: 452168] [error] [1690299697002774900-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
[2023-07-25 16:45:09.921] [Process: 452168] [error] [1690299697002774900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 16:45:09.922] [Process: 452168] [error] [1690299697002774900-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
[2023-07-25 16:45:10.401] [Process: 452168] [error] [1690299697002774900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 16:45:10.402] [Process: 452168] [error] [1690299697002774900-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
[2023-07-25 16:45:10.617] [Process: 452168] [error] [1690299697002774900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 16:45:10.617] [Process: 452168] [error] [1690299697002774900-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
[2023-07-25 16:45:10.762] [Process: 452168] [error] [1690299697002774900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 16:45:10.763] [Process: 452168] [error] [1690299697002774900-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
[2023-07-25 16:45:11.033] [Process: 452168] [error] [1690299697002774900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 16:45:11.034] [Process: 452168] [error] [1690299697002774900-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
[2023-07-25 16:45:11.776] [Process: 452168] [error] [1690299697002774900-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-25 16:45:11.777] [Process: 452168] [error] [1690299697002774900-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
...
Caused by: io.tiledb.java.api.TileDBError: C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
    at io.tiledb.java.api.ContextCallback.call(ContextCallback.java:56)
    at io.tiledb.java.api.Context.handleError(Context.java:144)
    at io.tiledb.java.api.Query.submit(Query.java:130)
    ...
```

I've put together a limited example which reproduces this:

* https://github.com/chris-allan/tiledb-java-torture

It fails like this:

```
...
Inserting rectangle: [11078, 21057]
Inserting rectangle: [12093, 21090]
Not consolidating H:\code\tiledb-java-torture\tiledb_14210934035379110573\0
Creating TileDB array: H:\code\tiledb-java-torture\tiledb_14210934035379110573\1
[2023-07-26 13:28:54.114] [Process: 2200528] [error] [1690373955299503400-Global] [TileDB::Task] Error: Caught std::exception: device or resource busy: device or resource busy
[2023-07-26 13:28:54.114] [Process: 2200528] [error] [1690373955299503400-Global] C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
Exception during execution
java.util.concurrent.CompletionException: io.tiledb.java.api.TileDBError: C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
    at java.base/java.util.concurrent.CompletableFuture$BiRelay.tryFire(CompletableFuture.java:1423)
    at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
    at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
    at com.glencoesoftware.tiledb.Main.lambda$2(Main.java:404)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.tiledb.java.api.TileDBError: C API: TileDB Internal, std::exception; device or resource busy: device or resource busy
    at io.tiledb.java.api.ContextCallback.call(ContextCallback.java:56)
    at io.tiledb.java.api.Context.handleError(Context.java:144)
    at io.tiledb.java.api.Query.submit(Query.java:130)
    at com.glencoesoftware.tiledb.Main.processTile(Main.java:359)
    at com.glencoesoftware.tiledb.Main.lambda$2(Main.java:402)
    ... 3 more
```

The output snippet above is from Windows 10. Linux behaves similarly but not identically.

The code reflects fairly well the pattern used by our production code that relies on TileDB:

1. Process a large number of tiles, writing them in a non-adjacent fashion from multiple workers to a 5-dimensional TileDB Array
2. Downsample from the Array [1] and write to a new 5-dimensional Array for each new "resolution"

20 channels is about right to produce the errors; ~11000 fragments. If less data is processed, things proceed as normal. The issue occurs with or without consolidation.
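
For readers who don't want to open the repository, the shape of step 1 can be sketched roughly as below. This is an illustration only, not the code from the torture test: `WritePatternSketch`, `Tile`, `shuffledTiles`, `writeTile` and `writeAll` are hypothetical names, the image and tile sizes are made up, and the actual TileDB write is left as a placeholder. It assumes each tile write is its own write query and therefore ends up as its own fragment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class WritePatternSketch {

    /** Hypothetical tile origin within one channel plane. */
    static final class Tile {
        final int channel;
        final int x;
        final int y;

        Tile(int channel, int x, int y) {
            this.channel = channel;
            this.x = x;
            this.y = y;
        }
    }

    /** Enumerates tile origins for every channel, then shuffles them so writes are non-adjacent. */
    static List<Tile> shuffledTiles(int channels, int width, int height, int tileSize, long seed) {
        List<Tile> tiles = new ArrayList<>();
        for (int c = 0; c < channels; c++) {
            for (int y = 0; y < height; y += tileSize) {
                for (int x = 0; x < width; x += tileSize) {
                    tiles.add(new Tile(c, x, y));
                }
            }
        }
        Collections.shuffle(tiles, new Random(seed));
        return tiles;
    }

    /** Placeholder: real code would submit a TileDB write query for this tile here. */
    static void writeTile(Tile tile, AtomicInteger writes) {
        writes.incrementAndGet();
    }

    /** Issues one write per tile from a fixed pool of workers and returns the number of writes. */
    static int writeAll(List<Tile> tiles, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger writes = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (Tile tile : tiles) {
                futures.add(CompletableFuture.runAsync(() -> writeTile(tile, writes), pool));
            }
            // Wait for every worker to finish before the array would be re-opened for reading.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return writes.get();
    }

    public static void main(String[] args) {
        // Illustrative sizes only: 20 channels of a 4096x4096 plane in 1024x1024 tiles.
        List<Tile> tiles = shuffledTiles(20, 4096, 4096, 1024, 42L);
        System.out.println("writes issued: " + writeAll(tiles, 4));
    }
}
```

Under that one-write-per-fragment assumption, the fragment count grows linearly with the number of tiles and channels, which is why the amount of data processed matters for hitting the failure.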