Users have no easy way to check how much disk space the OpenML cache is using, and the cache can grow large once many datasets have been downloaded. A possible solution is to add a utility function that calculates the current cache size.
Much of the relevant code is in openml/config.py and openml/utils.py. We could put the function into openml/utils.py and expose it to the public API from there.
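
As a starting point for discussion, here is a minimal sketch of what such a helper could look like. The name `get_cache_size` and its `cache_dir` parameter are hypothetical and not part of the existing API. The sketch takes the directory as an explicit argument instead of reading it from the configuration, so it makes no assumptions about how `openml/config.py` stores the cache location.

```python
from pathlib import Path


def get_cache_size(cache_dir):
    """Return the total size in bytes of all regular files under cache_dir.

    Hypothetical helper: the name and signature are assumptions for
    illustration, not an existing openml function.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        # A missing cache directory simply means nothing is cached yet.
        return 0

    total = 0
    for path in root.rglob("*"):
        # Count regular files only; skip symlinks so that linked
        # files are not counted a second time.
        if path.is_file() and not path.is_symlink():
            total += path.stat().st_size
    return total
```

In the real implementation the directory would presumably come from the configured cache location rather than from an argument, and the result could be formatted in MB or GB for display. Returning raw bytes keeps the helper easy to test and leaves formatting to the caller.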