
Conversation

@michaeltremeer
Contributor

Periodically (every 10s) alerts users when the average number of generated tokens is less than 90% of max_tokens. This is important if users are using the tool to estimate total requests per second based on their real-world context/response sizes, but the models are returning considerably shorter responses in testing (due to different/automated prompts).
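
For illustration, a minimal sketch of the kind of periodic check being described (the function and variable names here are hypothetical, not the tool's actual code):

```python
import logging
import time

UNDERSHOOT_THRESHOLD = 0.9  # warn when avg generated tokens < 90% of max_tokens
WARN_INTERVAL_SECONDS = 10  # emit at most one warning every 10s

def maybe_warn_on_short_completions(generated_token_counts, max_tokens, last_warn_time):
    """Warn if completions are, on average, much shorter than max_tokens.

    generated_token_counts: per-request generated-token counts observed so far.
    Returns the timestamp of the last warning (updated if one was emitted).
    """
    now = time.time()
    if not generated_token_counts or now - last_warn_time < WARN_INTERVAL_SECONDS:
        return last_warn_time
    avg_generated = sum(generated_token_counts) / len(generated_token_counts)
    if avg_generated < UNDERSHOOT_THRESHOLD * max_tokens:
        logging.warning(
            "average generated tokens (%.0f) is below %.0f%% of max_tokens (%d); "
            "requests-per-second estimates may not reflect real-world response sizes",
            avg_generated, UNDERSHOOT_THRESHOLD * 100, max_tokens,
        )
        return now
    return last_warn_time
```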

@technicianted
Contributor

I think a more suitable approach would be to add another metric that measures actual generated tokens, so that with each report the user can see both "attempted" and "actual". They can then decide on the relevance of the overall test results based on their use cases.

Contributor

@technicianted technicianted left a comment

Need a change of approach to a metrics output one.

@michaeltremeer
Contributor Author

michaeltremeer commented Jan 3, 2024

Those metrics should be part of the stats output; let me know what you think of those (I added 10th/90th/avg), and I'm happy to remove some or rename them. I do still think a warning is useful (the stats are easy to overlook if you're just running a single test and haven't dug into the tool). Maybe a single one after the test is finished/terminated?

@technicianted
Contributor

I think p10, p90 and avg should be fine. I'm wondering, though, if it would be better to compute the aggregate over the sliding window rather than per request? That way it would match, and be comparable to, the existing gen_tpm.

@michaeltremeer
Contributor Author

I'm not sure I understand - can you explain what you mean?

@technicianted
Contributor

I'm suggesting we add 3 new aggregate metrics in the sliding window that represent the p10, p90 and avg of actual tokens generated, instead of your proposal. This way users will have metrics directly comparable to the theoretical gen_tpm based on max_tokens.
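
For illustration, a rough sketch of what such per-window aggregates could look like, assuming the window keeps per-request generated-token counts with timestamps; the class and metric names (GeneratedTokensWindow, gen_tokens_p10, and so on) are invented for this example and are not the tool's actual code:

```python
import time
from collections import deque
from statistics import mean, quantiles

class GeneratedTokensWindow:
    """Sliding window of actual generated-token counts per request."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.samples = deque()  # (timestamp, generated_tokens) pairs

    def record(self, generated_tokens):
        self.samples.append((time.time(), generated_tokens))
        self._evict()

    def _evict(self):
        cutoff = time.time() - self.window_seconds
        while self.samples and self.samples[0][0] < cutoff:
            self.samples.popleft()

    def aggregates(self):
        """Return p10/p90/avg of generated tokens over the current window."""
        self._evict()
        counts = [tokens for _, tokens in self.samples]
        if not counts:
            return {"gen_tokens_p10": 0, "gen_tokens_p90": 0, "gen_tokens_avg": 0}
        # quantiles(n=10) returns the 9 deciles; index 0 is p10, index 8 is p90
        deciles = quantiles(counts, n=10) if len(counts) > 1 else [counts[0]] * 9
        return {
            "gen_tokens_p10": deciles[0],
            "gen_tokens_p90": deciles[8],
            "gen_tokens_avg": mean(counts),
        }
```

Reporting these alongside gen_tpm would let a user compare "attempted" (based on max_tokens) against "actual" tokens generated over the same window.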

@yshahin
Contributor

yshahin commented Feb 21, 2024

Practically speaking, when I ran this with the existing contexts I did not see the number change; it is almost always a constant value.
Does it make sense to show the same number three times? I would think avg would be sufficient, maybe a 95th, but I don't see the benefit.

Contributor

@yshahin yshahin left a comment

Rebased and created a new PR here
#50

@yshahin yshahin closed this Mar 1, 2024