CLI output path is hardcoded, making scripted usage difficult #268

Open
mustafagoktugibolar wants to merge 2 commits into VectifyAI:main from mustafagoktugibolar:feat/cli-output-path

@mustafagoktugibolar

Problem

I've been using run_pageindex.py as part of a larger script that processes
multiple documents. The issue is that the output path is hardcoded to
./results/{name}_structure.json with no way to override it.

This becomes a problem when:

  • Running the script from a different working directory
  • Processing multiple documents into separate output folders
  • Integrating into a pipeline where the output location needs to be controlled externally

Expected Behavior

Users should be able to specify where the output file is written via CLI arguments.

Proposed Solution

Add --output-dir and --output-file arguments:

# Write to a custom directory
python run_pageindex.py --pdf_path doc.pdf --output-dir /path/to/output

# Write to a specific file
python run_pageindex.py --pdf_path doc.pdf --output-file /path/to/result.json
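One way the two flags could be declared is with an argparse mutually exclusive group, so that passing both is rejected at parse time. This is a minimal sketch rather than the code in this PR: `build_parser` is a hypothetical name, and only the flag names and help strings are taken from the proposal.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the proposed output flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Sketch of the output-path options")
    parser.add_argument("--pdf_path", type=str, help="Path to the input PDF")

    # The two output flags conflict, so argparse is asked to reject
    # any invocation that supplies both of them.
    group = parser.add_mutually_exclusive_group()
    group.add_argument(
        "--output-dir",
        type=str,
        default=None,
        help="Directory to write the output JSON (default: ./results)",
    )
    group.add_argument(
        "--output-file",
        type=str,
        default=None,
        help="Full output file path, e.g. /tmp/my_doc.json",
    )
    return parser
```

argparse converts the dashes to underscores, so the values are read as `args.output_dir` and `args.output_file`. Supplying both flags makes `parse_args` print a usage error and exit.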

Backward Compatibility

No breaking changes. If neither argument is provided, the output is still written
to ./results/{name}_structure.json — exactly the same as before.
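The fallback can be expressed as a small pure function. The sketch below is illustrative only: `resolve_output_path` and `DEFAULT_OUTPUT_DIR` are hypothetical names, and the function only computes the path without creating any directories.

```python
import os
from typing import Optional

DEFAULT_OUTPUT_DIR = "./results"  # the location used before this change


def resolve_output_path(
    input_path: str,
    output_dir: Optional[str] = None,
    output_file: Optional[str] = None,
) -> str:
    """Return the path the JSON would be written to (nothing is created)."""
    if output_file:
        # An explicit file path is used as given.
        return output_file
    # Otherwise derive {name}_structure.json from the input file name.
    name = os.path.splitext(os.path.basename(input_path))[0]
    return os.path.join(output_dir or DEFAULT_OUTPUT_DIR, f"{name}_structure.json")
```

With no output arguments, `resolve_output_path("doc.pdf")` resolves into `./results`, which is the previous behavior.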

Copilot AI review requested due to automatic review settings May 9, 2026 22:04

Copilot AI left a comment

Pull request overview

Adds CLI configurability for where run_pageindex.py writes its JSON output, addressing scripted/pipeline usage where a hardcoded ./results/... path is inconvenient.

Changes:

  • Added --output-dir and --output-file CLI flags (mutually exclusive) to control output location.
  • Updated PDF and Markdown flows to write JSON to either the specified directory or explicit file path.
  • Documented the new flags in README.md.
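For the scripted use case this PR targets, a driver might look roughly like the following. It is a sketch under assumptions: `build_command` and `run_batch` are hypothetical helpers, the one-subfolder-per-document layout is just an example, and `run_pageindex.py` is assumed to be in the current working directory.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def build_command(pdf: Path, output_root: Path) -> List[str]:
    """Build one run_pageindex.py invocation with a per-document output folder."""
    out_dir = output_root / pdf.stem  # example layout: <output_root>/<document name>/
    return [
        sys.executable,
        "run_pageindex.py",
        "--pdf_path", str(pdf),
        "--output-dir", str(out_dir),
    ]


def run_batch(pdfs: Iterable[str], output_root: str, dry_run: bool = True) -> List[List[str]]:
    """Build the commands for every document and run them unless dry_run is set."""
    commands = [build_command(Path(p), Path(output_root)) for p in pdfs]
    if not dry_run:
        for cmd in commands:
            # check=True stops the batch on the first failing document.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_batch(["a/report.pdf", "b/spec.pdf"], "out")` returns the two command lines without executing them; passing `dry_run=False` runs them in order.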

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File               Description
run_pageindex.py   Adds mutually exclusive output path CLI args and updates both PDF/MD save logic to respect them.
README.md          Documents the new --output-dir / --output-file options in the optional parameters list.

Comment thread run_pageindex.py Outdated
Comment on lines +79 to +88
pdf_name = os.path.splitext(os.path.basename(args.pdf_path))[0]
if args.output_file:
    output_file = args.output_file
    parent = os.path.dirname(os.path.abspath(output_file))
    if parent:
        os.makedirs(parent, exist_ok=True)
else:
    output_dir = args.output_dir if args.output_dir else './results'
    os.makedirs(output_dir, exist_ok=True)
    output_file = os.path.join(output_dir, f'{pdf_name}_structure.json')
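A note on the `parent` handling in the excerpt above: `os.path.dirname` returns an empty string for a bare file name, but once the path has gone through `os.path.abspath` the directory part is never empty, so the `if parent:` guard always passes. The snippet below only illustrates that standard-library behavior; `parent_dir` is a hypothetical helper, not part of the PR.

```python
import os


def parent_dir(path: str) -> str:
    """Directory that would be created for an --output-file value."""
    return os.path.dirname(os.path.abspath(path))


# A bare file name has no directory component of its own...
print(repr(os.path.dirname("result.json")))  # ''

# ...but after abspath the parent is the current working directory.
print(parent_dir("result.json") == os.path.normpath(os.getcwd()))
```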
Comment thread README.md Outdated
Comment on lines +183 to +184
--output-dir Directory to write the output JSON (default: ./results)
--output-file Full output file path, e.g. /tmp/my_doc.json
Comment thread run_pageindex.py Outdated
Comment on lines +81 to +86
    output_file = args.output_file
    parent = os.path.dirname(os.path.abspath(output_file))
    if parent:
        os.makedirs(parent, exist_ok=True)
else:
    output_dir = args.output_dir if args.output_dir else './results'

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
