Skip to content

Conversation

@christolis
Copy link
Member

@christolis christolis commented Jan 31, 2026

What

When some user calls the command /xkcd relevant [n], our integrated ChatGPT reads the last $n$ messages and tries to post a relevant XKCD comic depending on the dialogue that's being held as of that particular moment the command was executed.

If the user manages to supply no $n$, then a default of $n = 100$ is selected.

For the times when it's needed, /xkcd custom <id>, (where $id$ stands for the XKCD comic number) the bot will send that XKCD comic in the chat.

It's as simple as that.

How

Every time the bot launches, if there's no xkcd.generated.json file found and there's no vector store uploaded on OpenAI of the XKCD comics, the bot spends some time downloading all of the XKCD comics available and creates that file. It then attempts to upload it as a vector store.

If the file is not found but the vector store is uploaded on OpenAI, the bot will still fetch for the file because it needs it for retrieving individual XKCDs and because we should not bother the XKCD API every time. And it's a 3MB file anyways.

Why

It's a fun feature, people can get a laugh out of it. It's harmless, and a fun use of ChatGPT. I don't see why it shouldn't be included.

Preview

image

Use RAG with ChatGPT to store all XKCD comics and display the correct one
based on the chat history.

There is also the possibility to specify your own XKCD if you wish.

Signed-off-by: Chris Sdogkos <work@chris-sdogkos.com>
@christolis christolis added enhancement New feature or request new command Add a new command or group of commands to the bot java Pull requests that update java code labels Jan 31, 2026
Copy link
Member

@Zabuzard Zabuzard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the chatgptservice class should not deal with xkcd stuff. refactor the code so that anything xkcd is done in the xkcd classes instead. the responsibilities should be correct.

why does it need to do file io? i would like to avoid that if possible.

@christolis
Copy link
Member Author

the chatgptservice class should not deal with xkcd stuff

@Zabuzard I agree with you on that. The createOrGetXkcdVectorStore method you saw introduced should really have been named createOrGetVectorStore as it generally does what it describes for any situation, it's just that the XKCD feature happens to be using it exclusively. Remember that as of writing this, it's still a draft pull request for a good reason. The code does what it's supposed to do, but I am not 100% satisfied with how it looks like, and further refactors are on the way. 👍

why does it need to do file io? i would like to avoid that if possible.

Could you clarify what you mean by that? Are you trying to avoid storing files completely? Or just using file.io in particular?

Signed-off-by: Chris Sdogkos <work@chris-sdogkos.com>
Signed-off-by: Chris Sdogkos <work@chris-sdogkos.com>
    - ChatGptService: Refrain from polluting it with XKCD related calls,

    - ChatGptService: provide JavaDocs to the methods that don't have one,

    - ChatGptService: remove unused sendWebPrompt method

    - XkcdCommand and XkcdRetriever: Refactor code into functions for
      readability.

Signed-off-by: Chris Sdogkos <work@chris-sdogkos.com>
Signed-off-by: Chris Sdogkos <work@chris-sdogkos.com>
@christolis
Copy link
Member Author

@Zabuzard ChatGptService is now handling what it's responsible for and file.io is not used anymore. PTAL when you have time :)

@christolis christolis marked this pull request as ready for review February 1, 2026 20:21
@christolis christolis requested a review from a team as a code owner February 1, 2026 20:21
private static String getChatgptRelevantPrompt(String discordChat) {
return """
<discord-chat>
%s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A single discord message could be 2000 characters. This will blow the context way out. Please handle this edgecase.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will blow the context way out.

At what limit would the context be blown way out? 2000 characters is a limit imposed by Discord for chat messages, but I suspect that it's different when using OpenAI.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I mean is let's say chitchat had 5 messages and all 5 where 2000 characters long. You're sending 10K characters to the ChatGPI API. Since there's no validation happening, 5 could end up being 2000 * MAXIMUM_MESSAGE_HISTORY and since that's set to 100, 200000 characters are added to your AI prompt.

This has 2 issues:

  1. The ChatGPT API won't accept this and will error out
  2. It could blow up billing

I understand this is an unlikely scenario but in general, your code needs absolute certainty.

Signed-off-by: Chris Sdogkos <work@chris-sdogkos.com>
@Zabuzard
Copy link
Member

Zabuzard commented Feb 1, 2026

could u explain quickly why it does need to do file stuff in general? id like to avoid reading/writing files if possible.

config, database and in-memory cache should ideally be enough for everything we do.

this would be a first-time for this bot, making the setup more complex overall. so id like to avoid it but first id like to understand why we need files for this feature, cheers

@christolis
Copy link
Member Author

could u explain quickly why it does need to do file stuff in general?

@Zabuzard We need to be able to locally reference information about XKCD comics in order to not oversaturate the XKCD API endpoint.

id like to avoid reading/writing files if possible.

One way we can avoid it would be to use the SQLite database we have and store all of the XKCD comics there. Would that be a solution that could work?

@Zabuzard
Copy link
Member

Zabuzard commented Feb 1, 2026

Why do we need to store the comics instead of just posting a URL or downloading the content from the URL adhoc if really needed?

Like, pick the comic u want and then post the URL to it in the embedded / download the content from the URL, attach it to the embedded and that's it's.

I don't see why we need to store all comics on our side.

@christolis
Copy link
Member Author

Why do we need to store the comics instead of just posting a URL or downloading the content from the URL adhoc if really needed?

Sure, for just displaying a comic in an embed, it's easy to just put the URL directly from the XKCD API, but OpenAI needs to have a vector store in order to be aware of all the posts so that it knows which one is more relevant when it gets asked. That's the reason this file is made in the first place, because it's uploaded as a vector store for RAG purposes.

@Together-Java Together-Java deleted a comment from surajkumar Feb 1, 2026
@christolis
Copy link
Member Author

(Comment deleted due to duplicate)

Comment on lines +167 to +171
* Creates a new vector store with the given file ID if none exists or returns the ID of the
* existing vector store with that name.
* <p>
* You can use this for RAG purposes, it is an effective way to give ChatGPT extra information
* from what it has been trained.
Copy link
Member

@Zabuzard Zabuzard Feb 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is RAG, what is a vector store? these terms arent generally known to people. maybe drop a sentence or two somewhere to elaborate

* Posts are cached locally in {@value #SAVED_XKCD_PATH} as JSON and uploaded to OpenAI using the
* provided {@link ChatGptService} if not already present.
*/
public class XkcdRetriever {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a "retriever" if it's uploading and saving to disk...

private final ChatGptService chatGptService;
private String xkcdUploadedFileId;

public XkcdRetriever(ChatGptService chatGptService) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor is doing more than just constructing. I feel like there's too much business logic in here and you should split it out.

A constructor should not be interacting with ChatGpt or uploading files. Should just construct.

return xkcdPosts;
}

private void fetchAllXkcdPosts(Path savedXckdsPath) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about this function and honestly, it's so complicated for no reason.

You’ve got three different concurrency controls layered on top of each other:

  1. newFixedThreadPool(FETCH_XCKD_POSTS_POOL_SIZE)
  2. A Semaphore(FETCH_XKCD_POSTS_SEMAPHORE_SIZE)
  3. Thread.sleep(...) throttling/rate limit whatever

In addition to CompletableFuture.runAsync and blocking calls join(), sleep().

Platform threads are so expensive.

So let's use some project loom here and switch this entire thing to use Virtual Threads. This way, you don’t need the extra logic. Only keep the Semaphore if we're actually going to get rate limited.

This could all just be:

try (Executor executor = Executors.newVirtualThreadPerTaskExecutor()) {
    List<Future<?>> futures = IntegerRange.of(1, XKCD_POSTS_AMOUNT)
        .toIntStream()
        .filter(id -> id != 404)
        .mapToObj(id -> executor.submit(() -> {
            retrieveXkcdPost(id).join().ifPresent(p -> xkcdPosts.put(id, p));

            Thread.sleep(FETCH_XKCD_POSTS_THREAD_SLEEP_MS);
        }))
        .toList();

    for (Future<?> f : futures) {
        f.get();
    }
}

and if you really need the semaphore, add it ot the mapToObj

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request java Pull requests that update java code new command Add a new command or group of commands to the bot

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants