Bots are in training dataset #2

@Tsardoz

This looks great! I have been looking for something like this that includes human ability in a single model.
Unfortunately, there is one major issue that I am not sure can be fixed.
Previously I have downloaded some of these Lichess datasets.
On the supposition that users often use the same account name on Chess.com and Lichess, I extracted all the usernames and submitted them one by one to Chess.com's API, which reports a fair play violation if Chess.com has banned that user.
Up to 20% of users in recent Lichess datasets who have the same name on Chess.com have been banned by Chess.com.
So your training data will be corrupted by games from chess AI bots that play at a much higher rating.
I do not know of a way of filtering these out. Maybe you can think of one?
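
For anyone who wants to reproduce the check, here is a minimal sketch of the cross-referencing step described above. It is not the exact script used for the numbers in this issue. It assumes the usernames come from the `White`/`Black` header tags of a Lichess PGN dump, and that Chess.com's public player endpoint (`https://api.chess.com/pub/player/{username}`) reports such bans as the `status` value `closed:fair_play_violations`. The function names and the `User-Agent` string are hypothetical, and a matching name still does not prove it is the same person.

```python
import json
import re
import urllib.error
import urllib.parse
import urllib.request

# Matches the White/Black header tags in a PGN file, e.g. [White "someuser"].
PLAYER_TAG = re.compile(r'^\[(?:White|Black) "([^"]+)"\]', re.MULTILINE)


def extract_usernames(pgn_text):
    """Return the set of lowercased player names found in PGN headers."""
    return {name.lower() for name in PLAYER_TAG.findall(pgn_text)}


def is_fair_play_ban(profile):
    """True if a Chess.com profile dict reports a fair play closure.

    Assumption: the ban shows up as status == "closed:fair_play_violations".
    """
    return profile.get("status") == "closed:fair_play_violations"


def fetch_profile(username):
    """Fetch a public Chess.com profile; return None if no such account."""
    url = "https://api.chess.com/pub/player/" + urllib.parse.quote(username.lower())
    # Hypothetical User-Agent value; identify your own script here.
    request = urllib.request.Request(url, headers={"User-Agent": "ban-check-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return json.load(response)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # name does not exist on Chess.com
        raise


def banned_fraction(usernames, fetch=fetch_profile):
    """Among names that also exist on Chess.com, return the fraction flagged."""
    profiles = [p for p in map(fetch, sorted(usernames)) if p is not None]
    if not profiles:
        return 0.0
    return sum(is_fair_play_ban(p) for p in profiles) / len(profiles)
```

`banned_fraction` takes the lookup function as a parameter so it can be tried offline with a dictionary of canned profiles before sending any real requests. Requests are made one at a time, as in the description above; a real run over a full dataset would also want a delay between calls.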
