fix(utils): strip the _qlib_ prefix in fname_to_code, not characters #2262

Open
he-yufeng wants to merge 1 commit into microsoft:main from he-yufeng:fix/fname-to-code-strip-prefix

@he-yufeng

Description

fname_to_code() is meant to reverse code_to_fname(), which prefixes reserved Windows device names (CON, PRN, AUX, NUL, COM0 to COM9, LPT0 to LPT9) with _qlib_ so they can be used as directory names. The reverse used fname.lstrip("_qlib_"), but str.lstrip removes any leading characters contained in its argument, not the prefix as a whole.
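
To make the difference concrete, here is a minimal standalone comparison of str.lstrip against removing the prefix as a unit. The helper drop_prefix is a hypothetical name used only for illustration; str.removeprefix is the standard-library method available from Python 3.9.

```python
PREFIX = "_qlib_"

# str.lstrip treats its argument as a set of characters, not a substring:
# it keeps removing leading characters while each one is _, q, l, i or b.
stripped_lpt1 = "_qlib_lpt1".lstrip(PREFIX)  # "pt1": the body's leading "l" goes too
stripped_lll = "_qlib_lll".lstrip(PREFIX)  # "": every character is in the set


def drop_prefix(name: str, prefix: str = PREFIX) -> str:
    """Remove `prefix` once, as a whole, if `name` starts with it (hypothetical helper)."""
    return name[len(prefix):] if name.startswith(prefix) else name


sliced_lpt1 = drop_prefix("_qlib_lpt1")  # "lpt1"
sliced_lll = drop_prefix("_qlib_lll")  # "lll"

# Python 3.9+ offers the same whole-prefix removal as str.removeprefix.
removed_lll = "_qlib_lll".removeprefix(PREFIX)  # "lll"
```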

exists_qlib_data() lowercases each feature directory name before calling fname_to_code():

```python
code_names = set(map(lambda x: fname_to_code(x.name.lower()), features_dir.iterdir()))
```

So LPT1 is stored as _qlib_LPT1 and lowercased to _qlib_lpt1; lstrip("_qlib_") then strips the leading l as well (it is one of the prefix characters), yielding pt1 instead of lpt1, and the data-existence check fails to match that stock. More generally, any prefixed name whose body begins with one of the characters _, q, l, i, or b loses those characters (e.g. fname_to_code("_qlib_lll") returns "").
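
The failure can be reproduced without qlib installed. The sketch below uses simplified, hypothetical stand-ins for code_to_fname and the pre-fix fname_to_code (the case-insensitive reserved-name check and the startswith guard are assumptions of this sketch, not copied from qlib), then checks which reserved names survive the lowercased round trip that exists_qlib_data() performs.

```python
# Simplified stand-ins for qlib's helpers (assumptions, not the library code).
PREFIX = "_qlib_"
RESERVED = {"CON", "PRN", "AUX", "NUL"}
RESERVED |= {f"COM{i}" for i in range(10)}
RESERVED |= {f"LPT{i}" for i in range(10)}


def code_to_fname(code: str) -> str:
    # Windows device names are case-insensitive, so compare in upper case.
    return PREFIX + code if code.upper() in RESERVED else code


def fname_to_code_old(fname: str) -> str:
    # Pre-fix behaviour: lstrip removes a *set* of characters, not the prefix.
    return fname.lstrip(PREFIX) if fname.startswith(PREFIX) else fname


# exists_qlib_data() lowercases directory names before decoding them.
dir_name = code_to_fname("LPT1")  # "_qlib_LPT1"
decoded = fname_to_code_old(dir_name.lower())  # "pt1", not "lpt1"

# Without lowercasing the name survives, because "L" is not in the set.
decoded_upper = fname_to_code_old(dir_name)  # "LPT1"

# Collect every reserved code that fails the lowercased round trip.
broken = sorted(
    code
    for code in RESERVED
    if fname_to_code_old(code_to_fname(code).lower()) != code.lower()
)
```

Under these assumptions only the LPT codes break on this path: they are the only reserved names whose lowercase form starts with a prefix character, and the uppercase directory name decodes correctly only because str.lstrip is case-sensitive.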

The fix removes the prefix by slicing, so the original code is recovered intact.
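
A minimal sketch of what a sliced version could look like, assuming the function keeps a startswith check before removing the prefix; the exact code in the diff may differ.

```python
PREFIX = "_qlib_"


def fname_to_code(fname: str) -> str:
    """Undo the reserved-name prefix added when a code is turned into a file name."""
    if fname.startswith(PREFIX):
        # Drop exactly len(PREFIX) characters; the body is left untouched.
        fname = fname[len(PREFIX):]
    return fname
```

On Python 3.9 and later, fname.removeprefix(PREFIX) would be equivalent; slicing also works on older interpreters.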

Verification

Before:

```python
>>> fname_to_code("_qlib_lll")
''
>>> fname_to_code(code_to_fname("LPT1").lower())
'pt1'
```

After:

```python
>>> fname_to_code("_qlib_lll")
'lll'
>>> fname_to_code(code_to_fname("LPT1").lower())
'lpt1'
```

Added FileNameUtils.test_fname_code_round_trip in tests/misc/test_utils.py, covering the reserved device names (via the same lowercased path that exists_qlib_data uses), a body made up entirely of prefix characters, and a plain code. black -l 120 --check passes on the changed files.
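
A self-contained sketch of a round-trip test in the same spirit, written with unittest and local stand-ins for the two helpers. The class name FnameRoundTripSketch, the helper copies, and the sample code SH600000 are hypothetical; this is not the test added in tests/misc/test_utils.py.

```python
import unittest

# Hypothetical local copies of the helpers, with the fixed (sliced) decoder.
PREFIX = "_qlib_"
RESERVED = (
    ["CON", "PRN", "AUX", "NUL"]
    + [f"COM{i}" for i in range(10)]
    + [f"LPT{i}" for i in range(10)]
)


def code_to_fname(code: str) -> str:
    return PREFIX + code if code.upper() in RESERVED else code


def fname_to_code(fname: str) -> str:
    return fname[len(PREFIX):] if fname.startswith(PREFIX) else fname


class FnameRoundTripSketch(unittest.TestCase):
    def test_reserved_names_lowercased(self):
        # Mirrors the lowercased lookup done by exists_qlib_data().
        for code in RESERVED:
            with self.subTest(code=code):
                self.assertEqual(fname_to_code(code_to_fname(code).lower()), code.lower())

    def test_body_of_prefix_characters(self):
        self.assertEqual(fname_to_code("_qlib_lll"), "lll")

    def test_plain_code_unchanged(self):
        self.assertEqual(fname_to_code(code_to_fname("SH600000")), "SH600000")
```

In the repository the stand-ins would presumably be replaced by the real helpers from qlib.utils; the sketch runs with python -m unittest.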

fname_to_code used fname.lstrip("_qlib_") to undo code_to_fname's prefix, but
str.lstrip removes any leading characters that appear in its argument, not the
prefix as a whole. exists_qlib_data() lowercases the feature directory name
before calling fname_to_code, so a reserved device code such as LPT1 (stored as
"_qlib_LPT1") becomes "_qlib_lpt1" and then "pt1" instead of "lpt1" -- the
leading "l" is wrongly stripped because it is one of the prefix characters.
Remove the prefix by slicing so the original code is recovered intact.