feat(yahoo): add GB (London Stock Exchange) support to Yahoo Finance data collector #2155

Open
kaish114 wants to merge 8 commits into microsoft:main from kaish114:feat/add-gb-lse-yahoo-collector
@kaish114

Summary

This PR adds GB (London Stock Exchange) support to the Yahoo Finance data collector, following the same pattern as the existing IN (India) and BR (Brazil) implementations.

Changes

  • scripts/data_collector/utils.py

    • Add "GB_ALL": "^FTSE" to CALENDAR_BENCH_URL_MAP — uses FTSE 100 index as the trading-calendar benchmark, consistent with ^NSEI (India) and ^BVSP (Brazil)
    • Add _GB_SYMBOLS module-level cache
    • Extend get_calendar_list() to handle GB_ prefixed bench codes
    • Add get_gb_stock_symbols() — fetches symbols from the Yahoo Finance predefined most_actives_gb screener API with pagination (250 per page), filtering to .L-suffixed symbols (~960 LSE-listed equities)
  • scripts/data_collector/yahoo/collector.py

    • Add YahooCollectorGB, YahooCollectorGB1d, YahooCollectorGB1min
    • Add YahooNormalizeGB, YahooNormalizeGB1d, YahooNormalizeGB1dExtend, YahooNormalizeGB1min
    • Timezone: Europe/London; symbol format: .L suffix (e.g. HSBA.L, AZN.L)
    • YahooNormalizeGB1min sets AM_RANGE = ("08:00:00", "16:29:00") to reflect the LSE continuous session (no midday break)
  • scripts/data_collector/yahoo/README.md — add GB download/normalize examples and a note on GBp (pence) pricing

  • scripts/data_collector/README.md — update supported regions list (also fixes existing omission of IN and BR)

  • tests/test_yahoo_collector_gb.py — 21 unit tests covering symbol fetching, pagination, caching, calendar routing, class existence, timezone, and Run class-name resolution
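
As a rough illustration of the AM_RANGE value above, the sketch below enumerates the one-minute bars of a single continuous LSE session in London wall-clock time. The helper name `session_minutes` is hypothetical and is not part of this PR; how YahooNormalizeGB1min actually consumes AM_RANGE is not shown here, so treat this only as a sanity check on the range itself.

```python
from datetime import datetime, timedelta

# Value taken from the PR description: continuous session, no midday break.
AM_RANGE = ("08:00:00", "16:29:00")


def session_minutes(day: str, am_range=AM_RANGE):
    """Return naive London wall-clock timestamps, one per minute, both ends inclusive.

    Hypothetical helper for illustration only.
    """
    start = datetime.fromisoformat(f"{day}T{am_range[0]}")
    end = datetime.fromisoformat(f"{day}T{am_range[1]}")
    minutes = []
    current = start
    while current <= end:
        minutes.append(current)
        current += timedelta(minutes=1)
    return minutes


bars = session_minutes("2025-01-06")
print(len(bars), bars[0].time(), bars[-1].time())
```

With both endpoints included, 08:00 through 16:29 gives 510 one-minute bars per trading day.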

Symbol source

Symbols are fetched from the Yahoo Finance predefined screener endpoint:

GET https://query1.finance.yahoo.com/v1/finance/screener/predefined/saved?scrIds=most_actives_gb&count=250&start={offset}

This endpoint requires no authentication and supports pagination via the start offset. Because every returned symbol is already listed by Yahoo Finance, collected symbols should have price data available. The full GB market universe (~1,432 securities, of which ~960 are .L equities) is covered in 6 paginated requests.
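
The pagination and filtering described above can be sketched roughly as follows. This is not the PR's `get_gb_stock_symbols()` implementation: the function name `collect_lse_symbols`, the injected `fetch_page` callable, and the assumed response shape (`finance.result[0].quotes[*].symbol`) are illustrative assumptions.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 250  # page size stated in the PR description


def collect_lse_symbols(
    fetch_page: Callable[[int, int], Dict], page_size: int = PAGE_SIZE
) -> List[str]:
    """Walk screener pages by offset and keep only .L-suffixed symbols.

    `fetch_page(offset, count)` is a hypothetical callable returning the
    decoded JSON payload for one page.
    """
    symbols = set()
    offset = 0
    while True:
        payload = fetch_page(offset, page_size)
        # Assumed payload shape: {"finance": {"result": [{"quotes": [...]}]}}
        results = payload.get("finance", {}).get("result") or [{}]
        quotes = results[0].get("quotes", [])
        symbols.update(
            q["symbol"] for q in quotes if q.get("symbol", "").endswith(".L")
        )
        # A short (or empty) page means the listing is exhausted.
        if len(quotes) < page_size:
            break
        offset += page_size
    return sorted(symbols)
```

Passing the page fetcher in as a callable keeps the loop testable without network access; a real fetcher would issue the GET request shown above and decode the JSON body.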

Testing

pytest tests/test_yahoo_collector_gb.py -v
# 21 passed

All changed files formatted with black -l 120.

Usage

# Download 1d data
python scripts/data_collector/yahoo/collector.py download_data \
  --source_dir ~/.qlib/stock_data/source/gb_data \
  --start 2000-01-04 --end 2025-12-31 --delay 1 --interval 1d --region GB

# Normalize
python scripts/data_collector/yahoo/collector.py normalize_data \
  --source_dir ~/.qlib/stock_data/source/gb_data \
  --normalize_dir ~/.qlib/stock_data/source/gb_1d_nor \
  --region GB --interval 1d

@kaish114

@kaish114 please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

The GB symbol fetch paginates the Yahoo Finance screener in a tight loop (up to 6 requests for ~1,400 symbols). Without a delay between pages, Yahoo returns 429 Too Many Requests on the second request. Add a 1-second sleep after each full page to stay within rate limits.

Yahoo Finance blocks the Chrome User-Agent string but accepts a plain Mozilla/5.0 agent, so use the latter, consistent with the existing BR collector.
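
A minimal sketch of the two fixes described in these commit messages, assuming the screener URL and query parameters from the PR description. The helper names `build_page_request` and `pause_if_full_page` are hypothetical and do not come from the PR; the actual code in utils.py may be structured differently.

```python
import time
import urllib.request
from urllib.parse import urlencode

SCREENER_URL = "https://query1.finance.yahoo.com/v1/finance/screener/predefined/saved"
PAGE_SIZE = 250
PAGE_DELAY_SECONDS = 1.0  # delay stated in the commit message
HEADERS = {"User-Agent": "Mozilla/5.0"}  # plain agent, per the commit message


def build_page_request(offset: int, page_size: int = PAGE_SIZE) -> urllib.request.Request:
    """Build (but do not send) the request for one screener page."""
    query = urlencode({"scrIds": "most_actives_gb", "count": page_size, "start": offset})
    return urllib.request.Request(f"{SCREENER_URL}?{query}", headers=HEADERS)


def pause_if_full_page(n_quotes: int, page_size: int = PAGE_SIZE, sleep=time.sleep) -> bool:
    """Sleep only after a full page, i.e. when another request will follow.

    Returns True if the caller should fetch the next page.
    """
    if n_quotes < page_size:
        return False
    sleep(PAGE_DELAY_SECONDS)
    return True
```

Sleeping only after full pages avoids a pointless delay after the final, short page. The `sleep` parameter is injectable so unit tests can verify the delay without actually waiting.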