Conversation

@DmitryNekrasov
Contributor

- Replace direct list concatenation in `ParserStructure.append()` with `ConcatenatedListView` for improved efficiency.
- Add `ConcatenatedListView` implementation to lazily combine two lists without creating a new collection.
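
For reference, here is a rough sketch of the idea described above (illustrative only; the actual `ConcatenatedListView` in this PR may differ in its details): a read-only list view that keeps both backing lists as they are and delegates `size` and `get` instead of copying.

// Illustrative sketch, not the PR's actual implementation.
class ConcatenatedListViewSketch<T>(
    private val left: List<T>,
    private val right: List<T>,
) : AbstractList<T>() {
    // Construction is O(1): no elements are copied, both backing lists are kept as-is.
    override val size: Int = left.size + right.size

    // Delegate to whichever backing list holds the requested index.
    override fun get(index: Int): T =
        if (index < left.size) left[index] else right[index - left.size]
}

With such a view, `ParserStructure.append()` can return `ParserStructure(ConcatenatedListView(operations, other.operations), other.followedBy)` without copying either operation list.
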
@DmitryNekrasov DmitryNekrasov self-assigned this Nov 3, 2025
@DmitryNekrasov
Contributor Author

@dkhalanskyjb Hello! What do you think about this idea?

@DmitryNekrasov DmitryNekrasov marked this pull request as draft November 3, 2025 14:33
@dkhalanskyjb
Collaborator

Hi! This may help with the first stage (building a parser before the normalisation), but normalisation has quadratic complexity, too, and it wouldn't benefit from the proposed approach, as the lists themselves will also need to be reconstructed.

We could extract the happy fast path where there are no adjacent numeric parser operations and simplify normalisation there. That is a common case, so it would be nice to provide brilliant performance there. I'm not yet convinced the common case can't be drastically improved by a better algorithm.

@dkhalanskyjb
Collaborator

Now that I think about it, it doesn't even fix the quadratic complexity of the initial stage. Concatenating n parsers will give us a binary tree with n leaves. We will need to traverse the tree at least once, and even a single enumeration of all these elements will have quadratic complexity: the depth n to access the first parser, n - 1 to access the second one, and so on. The construction of the new list does indeed become O(n), but then, each traversal is O(n^2).

Most parsers we concatenate are going to be single-element, so n parsers basically means n operations.
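
To make the cost concrete, the following snippet reuses the `ConcatenatedListViewSketch` class from the sketch above (again, an illustration rather than the PR's actual code): repeated appends produce a left-leaning chain of nested views, and every indexed access pays for the nesting depth.

// Builds the left-leaning chain (((a0 + a1) + a2) + ...) that n repeated
// appends of single-element lists produce with the sketch above.
fun buildChain(n: Int): List<Int> {
    var ops: List<Int> = emptyList()
    repeat(n) { i -> ops = ConcatenatedListViewSketch(ops, listOf(i)) }
    return ops
}

fun main() {
    val ops = buildChain(1_000)
    // ops[i] descends through roughly (n - i) nested views, so even a single
    // full enumeration performs 1 + 2 + ... + n = O(n^2) element accesses.
    var sum = 0L
    for (i in ops.indices) sum += ops[i]
    println(sum) // 499500
}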

@dkhalanskyjb
Collaborator

Yep, the initial stage also doesn't benefit from this. A quick run of a benchmark shows this:

Before the change:

Benchmark                        Mode  Cnt  Score   Error  Units
FormattingBenchmark.buildFormat  avgt   25  5.205 ± 0.090  us/op

After the change:

Benchmark                        Mode  Cnt  Score   Error  Units
FormattingBenchmark.buildFormat  avgt   25  7.830 ± 0.160  us/op

Here, less is better (the numbers 5.2 and 7.8 show how long an operation takes in microseconds, per the us/op unit).

The benchmark itself builds the datetime format used in Python:

     @Benchmark
     fun buildFormat(blackhole: Blackhole) {
         val v = LocalDateTime.Format {
             year()
             char('-')
             monthNumber()
             char('-')
             day()
             char(' ')
             hour()
             char(':')
             minute()
             optional {
                 char(':')
                 second()
                 optional {
                     char('.')
                     secondFraction()
                 }
             }
         }
         blackhole.consume(v)
     }

- Introduce `ConcatenatedListViewIterator` to enable iteration without materializing combined lists.
- Optimize nested list handling by directly traversing inner `ConcatenatedListView` instances.
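
Continuing the same illustration (hypothetical names; the actual `ConcatenatedListViewIterator` may be structured differently), the traversal cost can be brought back down by iterating with an explicit stack of pending segments instead of issuing indexed `get` calls through nested views:

// The earlier sketch, extended with a flattening iterator (illustrative only).
class ConcatenatedListViewSketch<T>(
    private val left: List<T>,
    private val right: List<T>,
) : AbstractList<T>() {
    override val size: Int = left.size + right.size

    // Random access still pays for the nesting depth...
    override fun get(index: Int): T =
        if (index < left.size) left[index] else right[index - left.size]

    // ...but iteration does not: an explicit stack visits every nested view and
    // every plain segment exactly once, so a full traversal is O(n) even for a
    // deeply nested chain of views.
    override fun iterator(): Iterator<T> = iterator<T> {
        val stack = ArrayDeque<List<*>>()
        stack.addLast(this@ConcatenatedListViewSketch)
        while (stack.isNotEmpty()) {
            val current = stack.removeLast()
            if (current is ConcatenatedListViewSketch<*>) {
                stack.addLast(current.right) // pushed first, yielded after `left`
                stack.addLast(current.left)
            } else {
                @Suppress("UNCHECKED_CAST")
                yieldAll(current as List<T>)
            }
        }
    }
}

With a traversal like this, a full enumeration scales linearly with the number of concatenated parsers, which is what the benchmark below measures.
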
@DmitryNekrasov
Contributor Author

I have eliminated the $$O(n^2)$$ time complexity of creating a naive serial parser:

import kotlinx.datetime.LocalDateTime
import kotlinx.datetime.format.*
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Param
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.State
import org.openjdk.jmh.infra.Blackhole

// @Param fields require a JMH @State class.
@State(Scope.Benchmark)
open class ParserStructureConcatBenchmark {

    @Param("1", "2", "4", "8", "16", "32", "64", "128", "256", "512", "1024")
    var n = 0

    @Benchmark
    fun largeSerialFormat(blackhole: Blackhole) {
        val format = LocalDateTime.Format {
            repeat(n) {
                char('^')
                monthNumber()
                char('&')
                day()
                char('!')
                hour()
                char('$')
                minute()
                char('#')
                second()
                char('@')
            }
        }
        blackhole.consume(format)
    }
}

With the original eager concatenation:

ParserStructure(operations + other.operations, other.followedBy)

Benchmark                                          (n)  Mode  Cnt         Score         Error  Units
ParserStructureConcatBenchmark.largeSerialFormat     1  avgt    5      1405.703 ±       7.771  ns/op
ParserStructureConcatBenchmark.largeSerialFormat     2  avgt    5      2661.292 ±      74.637  ns/op
ParserStructureConcatBenchmark.largeSerialFormat     4  avgt    5      5079.791 ±      14.050  ns/op
ParserStructureConcatBenchmark.largeSerialFormat     8  avgt    5     10676.345 ±      87.379  ns/op
ParserStructureConcatBenchmark.largeSerialFormat    16  avgt    5     22338.753 ±     325.866  ns/op
ParserStructureConcatBenchmark.largeSerialFormat    32  avgt    5     55037.234 ±     131.355  ns/op
ParserStructureConcatBenchmark.largeSerialFormat    64  avgt    5    136245.744 ±    4255.752  ns/op
ParserStructureConcatBenchmark.largeSerialFormat   128  avgt    5    402370.485 ±    4214.158  ns/op
ParserStructureConcatBenchmark.largeSerialFormat   256  avgt    5   1525375.832 ±   52333.725  ns/op
ParserStructureConcatBenchmark.largeSerialFormat   512  avgt    5   6293168.210 ±  158860.438  ns/op
ParserStructureConcatBenchmark.largeSerialFormat  1024  avgt    5  23487634.639 ± 1012301.386  ns/op

With the lazy `ConcatenatedListView`:

ParserStructure(ConcatenatedListView(operations, other.operations), other.followedBy)

Benchmark                                          (n)  Mode  Cnt        Score       Error  Units
ParserStructureConcatBenchmark.largeSerialFormat     1  avgt    5     1033.226 ±    15.759  ns/op
ParserStructureConcatBenchmark.largeSerialFormat     2  avgt    5     1971.151 ±     6.071  ns/op
ParserStructureConcatBenchmark.largeSerialFormat     4  avgt    5     3929.353 ±    19.891  ns/op
ParserStructureConcatBenchmark.largeSerialFormat     8  avgt    5     8286.685 ±    57.521  ns/op
ParserStructureConcatBenchmark.largeSerialFormat    16  avgt    5    16827.044 ±   120.837  ns/op
ParserStructureConcatBenchmark.largeSerialFormat    32  avgt    5    32460.756 ±  1708.745  ns/op
ParserStructureConcatBenchmark.largeSerialFormat    64  avgt    5    62202.661 ±   963.763  ns/op
ParserStructureConcatBenchmark.largeSerialFormat   128  avgt    5   120877.958 ±  1474.372  ns/op
ParserStructureConcatBenchmark.largeSerialFormat   256  avgt    5   245431.433 ±  4975.299  ns/op
ParserStructureConcatBenchmark.largeSerialFormat   512  avgt    5   483199.434 ±  2113.089  ns/op
ParserStructureConcatBenchmark.largeSerialFormat  1024  avgt    5  1038053.801 ± 28129.845  ns/op

@DmitryNekrasov
Contributor Author

formatCreationWithAlternativeParsing (before caching)

Benchmark                             (n)  Mode  Cnt          Score          Error  Units
formatCreationWithAlternativeParsing    2  avgt    5       8810.577 ±       62.323  ns/op
formatCreationWithAlternativeParsing    3  avgt    5      24660.018 ±      194.929  ns/op
formatCreationWithAlternativeParsing    4  avgt    5      70321.838 ±     1128.198  ns/op
formatCreationWithAlternativeParsing    5  avgt    5     204549.353 ±     1720.009  ns/op
formatCreationWithAlternativeParsing    6  avgt    5     604779.515 ±     3666.215  ns/op
formatCreationWithAlternativeParsing    7  avgt    5    1830192.394 ±    18695.866  ns/op
formatCreationWithAlternativeParsing    8  avgt    5    5449726.801 ±    28945.062  ns/op
formatCreationWithAlternativeParsing    9  avgt    5   16326281.316 ±   153527.963  ns/op
formatCreationWithAlternativeParsing   10  avgt    5   49075976.571 ±  1297210.911  ns/op
formatCreationWithAlternativeParsing   11  avgt    5  148064885.743 ± 12308758.456  ns/op
formatCreationWithAlternativeParsing   12  avgt    5  454970041.600 ± 70482925.523  ns/op

formatCreationWithAlternativeParsing (after caching)

Benchmark                             (n)  Mode  Cnt      Score      Error  Units
formatCreationWithAlternativeParsing    2  avgt    5   7891.114 ±   13.541  ns/op
formatCreationWithAlternativeParsing    3  avgt    5  13171.110 ±   78.865  ns/op
formatCreationWithAlternativeParsing    4  avgt    5  18710.440 ±  991.032  ns/op
formatCreationWithAlternativeParsing    5  avgt    5  23920.832 ±  639.878  ns/op
formatCreationWithAlternativeParsing    6  avgt    5  28424.127 ±   88.730  ns/op
formatCreationWithAlternativeParsing    7  avgt    5  33483.529 ± 2935.929  ns/op
formatCreationWithAlternativeParsing    8  avgt    5  39414.801 ± 2181.813  ns/op
formatCreationWithAlternativeParsing    9  avgt    5  43943.165 ±  261.219  ns/op
formatCreationWithAlternativeParsing   10  avgt    5  48417.492 ± 2198.409  ns/op
formatCreationWithAlternativeParsing   11  avgt    5  53704.932 ±  844.236  ns/op
formatCreationWithAlternativeParsing   12  avgt    5  59041.803 ±  546.692  ns/op

formatCreationWithNestedAlternativeParsing (before caching)

Benchmark                                   (n)  Mode  Cnt            Score           Error  Units
formatCreationWithNestedAlternativeParsing    2  avgt    5       519390.642 ±      2809.698  ns/op
formatCreationWithNestedAlternativeParsing    3  avgt    5     27569772.303 ±    388683.519  ns/op
formatCreationWithNestedAlternativeParsing    4  avgt    5    110676487.984 ±   5760474.687  ns/op
formatCreationWithNestedAlternativeParsing    5  avgt    5  13426672641.800 ± 841426222.880  ns/op

formatCreationWithNestedAlternativeParsing (after caching)

Benchmark                                   (n)  Mode  Cnt       Score       Error  Units
formatCreationWithNestedAlternativeParsing    2  avgt    5   41034.554 ± 10912.763  ns/op
formatCreationWithNestedAlternativeParsing    3  avgt    5   65044.120 ±  7292.569  ns/op
formatCreationWithNestedAlternativeParsing    4  avgt    5   73721.035 ±  7653.916  ns/op
formatCreationWithNestedAlternativeParsing    5  avgt    5  100544.511 ±   400.049  ns/op
formatCreationWithNestedAlternativeParsing    6  avgt    5  118522.459 ± 30279.436  ns/op
formatCreationWithNestedAlternativeParsing    7  avgt    5  139539.603 ± 17732.571  ns/op
formatCreationWithNestedAlternativeParsing    8  avgt    5  143597.460 ±  1106.854  ns/op
formatCreationWithNestedAlternativeParsing    9  avgt    5  173742.102 ±   889.090  ns/op
formatCreationWithNestedAlternativeParsing   10  avgt    5  195273.334 ± 55181.847  ns/op
formatCreationWithNestedAlternativeParsing   11  avgt    5  211645.084 ±   553.810  ns/op
formatCreationWithNestedAlternativeParsing   12  avgt    5  216165.437 ±   284.256  ns/op

@DmitryNekrasov DmitryNekrasov marked this pull request as ready for review November 10, 2025 16:26
@dkhalanskyjb
Collaborator

It's cool that the quadratic complexity can be mitigated in some scenarios, and I still believe that it currently contributes significantly to the runtime we actually observe.

That said, the PR as a whole looks very much like a Pyrrhic victory to me, as the buildPythonDateTimeFormat benchmark is consistently a bit slower with the proposed changes than without them on my machine:

Before:
Benchmark                                                Mode  Cnt     Score     Error  Units
PythonDateTimeFormatBenchmark.buildPythonDateTimeFormat  avgt   20  5517.285 ± 155.130  ns/op

After:
Benchmark                                                Mode  Cnt     Score     Error  Units
PythonDateTimeFormatBenchmark.buildPythonDateTimeFormat  avgt   20  5703.280 ± 105.218  ns/op

(Less is better here)

Performance improvements are pointless if they negatively impact the actual use cases our users will encounter. Yes, the quadratic complexity was eliminated, but the increase in the resulting constant just seems too high.

I propose that we add all the common formats we already provide as benchmarks (LocalTime.Formats.ISO, UtcOffset.Formats.FOUR_DIGITS, DateTimeComponents.Formats.RFC_1123, etc.) and rely on them to check for the improvements. These formats are very representative of those that people are actually likely to write.
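
A sketch of what one such benchmark could look like (the benchmark name is hypothetical, and the format body below only approximates LocalTime.Formats.ISO; the real benchmarks should build the exact predefined formats):

import kotlinx.datetime.LocalTime
import kotlinx.datetime.format.*
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.infra.Blackhole

open class CommonFormatsBenchmark {
    @Benchmark
    fun buildIsoLocalTime(blackhole: Blackhole) {
        // Roughly the shape of LocalTime.Formats.ISO; replace with the library's
        // exact predefined definition in the real benchmark.
        val format = LocalTime.Format {
            hour()
            char(':')
            minute()
            optional {
                char(':')
                second()
                optional {
                    char('.')
                    secondFraction()
                }
            }
        }
        blackhole.consume(format)
    }
}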
