float-data — Diverse Floating-Point Datasets

This repository provides a diverse suite of real-world and synthetic floating-point datasets designed for benchmarking numeric parsing and string-conversion algorithms.

All datasets are stored as plain text, one numeric value per line, making them easy to inspect and reproducible across systems.

The goal of this dataset collection is to provide a representative, diverse, and challenging benchmark corpus mirroring the numeric values commonly encountered in practice—ranging from geospatial data, scientific simulations, astronomy catalogs, financial time series, and machine-learning model weights, up to pathological IEEE-754 edge cases. Only two datasets (numbers.txt and hellfloat64.txt) are synthetic, serving well-defined roles: a simple uniform baseline and a comprehensive stress-test for string-conversion algorithms.

📦 Dataset Overview

Below is a detailed description of each dataset, including the type of numeric values they contain and typical real-world scenarios they represent.

🌎 `canada.txt` — Geospatial Coordinates (FP64)

Extracted from the GeoJSON dataset of geographic features.
Contains latitude, longitude, elevations, and associated attributes.
Representative of GIS pipelines, navigation systems, and open-data APIs.

🐟 `marine_ik.txt` — Inverse Kinematics (FP32)

Numeric values from a marine robotics inverse-kinematics example.
Contains small real numbers (between -1 and 4.4) from a physical simulation.
Typical of control systems and scientific computing workloads.

🧱 `mesh.txt` — 3D Mesh Geometry (FP64)

Vertex coordinates and related mesh data from a triangulated 3D surface.
Similar to formats used in CAD, graphics engines, and scientific visualization.
Heavy on small numbers (between -1 and 3), but realistic and widespread.

🪙 `bitcoin.txt` — Historical Market Data (FP64)

Daily closing prices of Bitcoin (USD), from 2020-01-01 to 2022-07-31.
Representative of financial APIs, trading systems, and real-time dashboards.

🎲 `numbers.txt` — Uniform Random in [0,1] (FP64)

Synthetic baseline dataset.
Useful for comparisons but not intended to represent real-world patterns.

🤖 `mobilenetv3_large.txt` — Deep Learning Model Weights (FP32)

Serialized weights of the MobileNetV3-Large ImageNet model.
Contains millions of FP32 values (both small and moderately large), typical of:
- neural networks,
- gradient updates,
- machine-learning pipelines.

✨ `gaia.txt` — Astrometric & Photometric Data (FP64)

Extracted from ESA Gaia DR3.
Includes:
- right ascension / declination,
- parallax, proper motions,
- photometric fluxes,
- galactic & ecliptic coordinates.
True scientific dataset with large dynamic range, typical of astronomy and big-science archives.

🌤️ `noaa_global_hourly_2023.txt` — Weather Station Measurements (FP32)

NOAA NCEI “Global Hourly” dataset (temperature, dew point, visibility, pressure).
Extremely common real-world data format: noisy, irregular, and API-like.

🌍 `noaa_gfs_1p00.txt` — Global Forecast System Model Output (FP32)

Extracted from NOAA GFS model GRIB2 files.
Contains fields such as geopotential height, temperature, humidity, pressure, wind components.
True scientific FP32 with meaningful numerical variety and scaling.

🔥 `hellfloat64.txt` — Synthetic IEEE-754 Stress-Test (FP64)

A custom-built dataset designed to stress special values.
Contains:

subnormals,
powers of two across the full range,
powers of ten in ±308,
values near rounding boundaries,
extreme magnitudes,
structured edge cases (±0, EPS, min/max normal/subnormal),
log-distributed extreme values.

This dataset is purely synthetic and intended as a worst-case test harness.

⚙️ Reproducibility and Regeneration

Several datasets can be regenerated automatically using the scripts found in the scripts/ directory. All Python dependencies are managed using uv.

Example commands:

cd scripts
uv sync
./noaa_gfs_1p00.sh
uv run ./noaa_global_hourly_f32.py
uv run ./gaia.py
uv run ./mobilenetv3.py
uv run ./hellfloat64.py

📖 Quick usage

You can list or inspect the files with these shell commands:

ls -l number_files/
head -n 30 number_files/canada.txt
wc -l number_files/gaia.txt

📚 Citation

If you use this dataset in research or a publication, please cite it.

Example BibTeX entry:

@misc{float-data,
  title = {float-data: A collection of floating-point numbers},
  author = {Jaël Champagne Gareau and Daniel Lemire},
  year = {2025},
  howpublished = {\url{https://github.com/fastfloat/float-data}}
}

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
number_files		number_files
scripts		scripts
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

float-data — Diverse Floating-Point Datasets

📦 Dataset Overview

🌎 `canada.txt` — Geospatial Coordinates (FP64)

🐟 `marine_ik.txt` — Inverse Kinematics (FP32)

🧱 `mesh.txt` — 3D Mesh Geometry (FP64)

🪙 `bitcoin.txt` — Historical Market Data (FP64)

🎲 `numbers.txt` — Uniform Random in [0,1] (FP64)

🤖 `mobilenetv3_large.txt` — Deep Learning Model Weights (FP32)

✨ `gaia.txt` — Astrometric & Photometric Data (FP64)

🌤️ `noaa_global_hourly_2023.txt` — Weather Station Measurements (FP32)

🌍 `noaa_gfs_1p00.txt` — Global Forecast System Model Output (FP32)

🔥 `hellfloat64.txt` — Synthetic IEEE-754 Stress-Test (FP64)

⚙️ Reproducibility and Regeneration

📖 Quick usage

📚 Citation

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

fastfloat/float-data

Folders and files

Latest commit

History

Repository files navigation

float-data — Diverse Floating-Point Datasets

📦 Dataset Overview

🌎 canada.txt — Geospatial Coordinates (FP64)

🐟 marine_ik.txt — Inverse Kinematics (FP32)

🧱 mesh.txt — 3D Mesh Geometry (FP64)

🪙 bitcoin.txt — Historical Market Data (FP64)

🎲 numbers.txt — Uniform Random in [0,1] (FP64)

🤖 mobilenetv3_large.txt — Deep Learning Model Weights (FP32)

✨ gaia.txt — Astrometric & Photometric Data (FP64)

🌤️ noaa_global_hourly_2023.txt — Weather Station Measurements (FP32)

🌍 noaa_gfs_1p00.txt — Global Forecast System Model Output (FP32)

🔥 hellfloat64.txt — Synthetic IEEE-754 Stress-Test (FP64)

⚙️ Reproducibility and Regeneration

📖 Quick usage

📚 Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

🌎 `canada.txt` — Geospatial Coordinates (FP64)

🐟 `marine_ik.txt` — Inverse Kinematics (FP32)

🧱 `mesh.txt` — 3D Mesh Geometry (FP64)

🪙 `bitcoin.txt` — Historical Market Data (FP64)

🎲 `numbers.txt` — Uniform Random in [0,1] (FP64)

🤖 `mobilenetv3_large.txt` — Deep Learning Model Weights (FP32)

✨ `gaia.txt` — Astrometric & Photometric Data (FP64)

🌤️ `noaa_global_hourly_2023.txt` — Weather Station Measurements (FP32)

🌍 `noaa_gfs_1p00.txt` — Global Forecast System Model Output (FP32)

🔥 `hellfloat64.txt` — Synthetic IEEE-754 Stress-Test (FP64)

Packages