[ENH] Remove yfinance as a dependency and implement data_loader by Shuvam586 · Pull Request #721 · PyPortfolio/PyPortfolioOpt

Shuvam586 · 2026-03-03T12:15:07Z

Closes #716

Replaces yfinance usage in notebooks with bundled static example data (stock_prices.csv and market_caps.csv).
Adds pypfopt.data loaders for stock prices and market caps.
Removes yfinance dependency entirely.

Shuvam586 · 2026-03-03T12:18:39Z

instead of editing the tickers mentioned in the cookbook notebooks, i added 2 csv files to pypfopt/data: stock_prices.csv and market_caps.csv, with data from 2023 onwards only. stock_prices.csv is around 450 KB.

also removed yfinance related statements and functions from all notebooks

fkiraly

Nice, thanks!

May I request to not use data downloaded via yfinance from Yahoo services at all? This is due to terms of use, we should not distribute data from Yahoo services at all in the repository or package.

Could you instead use similar data? Either completely randomly generated (Brownian motion random walk or similar, with same column names and time index), or taking some inspiration from the actual data in how you randomize - but it cannot be the exact values.

Shuvam586 · 2026-03-05T14:32:36Z

@fkiraly

market_caps.csv and stock_prices.csv now contain data produced synthetically with Geometric Brownian Motion.

fkiraly · 2026-03-05T17:49:58Z

Thanks! Could you kindly post here the plots in the notebooks before/after, just to check if they look similar?

fkiraly · 2026-03-05T18:39:46Z

also, code formatting tests are failing, please look at pre-commit

Shuvam586 · 2026-03-07T18:46:46Z

i have run pre-commit and pushed the changes.

Shuvam586 · 2026-03-07T18:52:50Z

plots you asked for:

previous data plot with data from yfinance:

synthetic data plot from brownian motion:

fkiraly · 2026-03-07T20:09:56Z

Thanks!

I suspect the actual data would be closer to exponential Brownian motion - that should be achieved by simply taking np.exp of the Brownian motion.

fkiraly · 2026-03-07T20:16:36Z

+        return pd.read_csv(f, **read_csv_kwargs)
+
+
+def load_stockdata(tickers: list = None, start: str = None, end: str = None):


please add docstrings (numpydoc format)

added docstrings
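
For reference, here is a minimal sketch of what a numpydoc-documented loader with this signature could look like. It is not the code in this PR: the name `load_stockdata_sketch`, the extra `source` parameter, the inline sample CSV, and the wide layout (a `date` column plus one column per ticker) are all assumptions made so the example runs on its own.

```python
import io
from typing import Optional, Sequence

import pandas as pd

# Tiny stand-in for the bundled stock_prices.csv; the values are made up.
_EXAMPLE_CSV = """date,AAA,BBB,CCC
2023-01-02,100.0,50.0,20.0
2023-01-03,101.0,49.5,20.4
2023-01-04,102.5,49.0,20.1
2023-01-05,101.8,50.2,20.9
"""


def load_stockdata_sketch(
    tickers: Optional[Sequence[str]] = None,
    start: Optional[str] = None,
    end: Optional[str] = None,
    source=None,
) -> pd.DataFrame:
    """Load example prices, optionally filtered by ticker and date range.

    Parameters
    ----------
    tickers : sequence of str, optional
        Columns to keep, in the given order. All columns if None.
    start : str, optional
        First date to include (inclusive), e.g. ``"2023-01-03"``.
    end : str, optional
        Last date to include (inclusive).
    source : file-like or path, optional
        CSV to read (hypothetical parameter). Defaults to the inline sample.

    Returns
    -------
    pandas.DataFrame
        Prices indexed by date, one column per ticker.

    Raises
    ------
    ValueError
        If any requested ticker is not a column of the data.
    """
    buffer = source if source is not None else io.StringIO(_EXAMPLE_CSV)
    prices = pd.read_csv(buffer, parse_dates=["date"], index_col="date")
    prices = prices.sort_index()
    if tickers is not None:
        missing = [t for t in tickers if t not in prices.columns]
        if missing:
            raise ValueError(f"Unknown tickers: {missing}")
        prices = prices[list(tickers)]
    # Label slicing on a sorted DatetimeIndex includes both endpoints.
    return prices.loc[start:end]
```

Calling `load_stockdata_sketch(tickers=["BBB", "AAA"], start="2023-01-03", end="2023-01-04")` keeps two columns in the requested order and the rows inside the date window.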

fkiraly · 2026-03-07T20:17:08Z

+
+
+def available_tickers():
+    df = _load_raw_data("stock_prices.csv", parse_dates=["date"])


is this not known in advance? Instead of loading the csv, you could simply load the header, or return the known list

okay, i will rewrite it to return only the list of the tickers in the csv file.
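
The "load only the header" option mentioned above can be sketched as follows. This assumes a wide CSV with a `date` column followed by one column per ticker; `tickers_from_header` is a hypothetical helper name, not a function from this PR.

```python
import io

import pandas as pd


def tickers_from_header(source, date_column="date"):
    """Return ticker names by parsing only the CSV header row."""
    # nrows=0 makes pandas read the column names and no data rows.
    header = pd.read_csv(source, nrows=0)
    return [col for col in header.columns if col != date_column]


# Made-up sample standing in for the bundled file.
sample = io.StringIO("date,AAA,BBB\n2023-01-02,100.0,50.0\n")
print(tickers_from_header(sample))
```

Because no data rows are parsed, the cost does not grow with the length of the price history.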

fkiraly

Thanks, looks good!

can you re-execute the notebooks after a clean reset?
I think the simulated data should be exponential brownian motion to resemble the actual data
please add numpydoc docstrings to the data loaders
further comments above

fkiraly · 2026-03-08T16:31:32Z

non-blocking - would it be possible to include the simulation code somewhere in the utils module?

Shuvam586 · 2026-03-12T06:50:52Z

> I suspect the actual data would be closer to exponential Brownian motion - that should be achieved by simply taking np.exp of the Brownian motion.

i am confused. the code i used to generate the synthetic data uses exponential brownian motion.

for t in range(1, n_days):
    z = np.random.normal(size=len(tickers))
    prices[t] = prices[t-1] * np.exp(
        (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    )
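
To make the loop above reproducible outside the PR, here is a self-contained sketch built around the same update rule. The function name, the seeded generator, the business-day index starting 2023-01-02, and the default `dt = 1/252` (annualised `mu` and `sigma`) are assumptions, not details taken from the actual generation script.

```python
import numpy as np
import pandas as pd


def simulate_gbm_prices(tickers, n_days, start_prices, mu, sigma, dt=1 / 252, seed=0):
    """Simulate independent geometric Brownian motion price paths."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    prices = np.empty((n_days, len(tickers)))
    prices[0] = start_prices
    for t in range(1, n_days):
        z = rng.standard_normal(len(tickers))
        # Same multiplicative step as in the comment above: the log-return
        # is normal with mean (mu - sigma^2 / 2) * dt and std sigma * sqrt(dt).
        prices[t] = prices[t - 1] * np.exp(
            (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
        )

    index = pd.bdate_range("2023-01-02", periods=n_days, name="date")
    return pd.DataFrame(prices, index=index, columns=list(tickers))
```

Since each step multiplies the previous price by an exponential, the simulated prices stay strictly positive, which is the property that distinguishes this from a plain (arithmetic) Brownian random walk.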

Shuvam586 · 2026-03-12T07:00:45Z

> can you re-execute the notebooks after a clean reset?

yes. i re-executed the cookbook notebooks before making the first commit.

fkiraly · 2026-03-14T19:34:07Z

> i am confused. the code i used to generate the synthetic data uses exponential brownian motion.

oh, ok - you are right. Then we do not need to change that.

Can you check the pyproject.toml and resolve the merge conflict?

Shuvam586 · 2026-03-21T11:12:43Z

> non-blocking - would it be possible to include the simulation code somewhere in the utils module?

i would be happy to, but can't find a utils folder anywhere

Shuvam586 · 2026-03-21T11:17:38Z

@fkiraly

i have made the docstring changes and resolved the conflict. i had exams going on, sorry for the hold-up.

Shuvam586 · 2026-03-29T19:48:49Z

@fkiraly not to bother you, but can this be merged?

amishverma05 · 2026-04-23T14:28:48Z

Hi @Shuvam586,

I was testing out your code and noticed a couple of things that need to be addressed:

1. Notebooks Crash on Import (ModuleNotFoundError)

In the updated cookbook/*.ipynb notebooks, the first cell still has !pip install PyPortfolioOpt. Running this forces Jupyter to download v1.6.0 from PyPI. Because the notebook is inside cookbook/ and pypfopt is outside in the parent directory, Python ignores your new local code, defaults to the PyPI version, and crashes because it can't find pypfopt.data.

I feel the !pip install command can be removed, and instead add the parent directory to sys.path so Python checks the local files first:

import sys
import os
sys.path.insert(0, os.path.abspath('..'))

import matplotlib.pyplot as plt
from pypfopt.data.data_loader import load_stockdata
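
One way to confirm which copy of a package Python would pick up, before importing from it, is to ask the import system for the module's location. The helper below is a generic sketch using only the standard library; `module_origin` is a hypothetical name and is not part of pypfopt.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be imported.
        return None
    return None if spec is None else spec.origin


# Demonstrated on a standard-library module so the snippet runs anywhere.
print(module_origin("json"))
```

In a cookbook notebook, `module_origin("pypfopt.data")` returning `None`, or a path inside site-packages rather than the repository checkout, would suggest that the PyPI install is shadowing the local code.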

2. Make available_tickers() Dynamic

In data_loader.py, available_tickers() returns a hardcoded list of the 33 companies. If the CSV files ever get updated with new datasets, this function will easily fall out of sync.

To keep this robust, you should dynamically read the column straight from the local CSV:

def available_tickers():
    """
    Return the list of available ticker symbols dynamically from the bundled dataset.
    """
    df = _load_raw_data("market_caps.csv")
    return sorted(df["ticker"].tolist())

[ENH] Remove yfinance as a dependancy and implement data_loader

bbc653a

fkiraly changed the title ~~[ENH] Remove yfinance as a dependancy and implement data_loader~~ [ENH] Remove yfinance as a dependency and implement data_loader Mar 3, 2026

fkiraly requested changes Mar 3, 2026

fkiraly added the documentation Documentation & tutorials label Mar 3, 2026

replaced yfinance data with synthetic data

6486710

pre-commit fixes

67a54db

fkiraly reviewed Mar 7, 2026

fkiraly requested changes Mar 7, 2026

fkiraly mentioned this pull request Mar 7, 2026

[MNT] homogenize CI workflows with GC.OS repositories #702

Merged

Shuvam586 and others added 2 commits March 21, 2026 16:25

Merge branch 'main' into remove-yfinance

aeec39d

added docstrings to data_loader

6f254b6

Shuvam586 mentioned this pull request Apr 23, 2026

[ENH] Remove yfinance from notebooks and examples #716

Open

3 participants