CSV with a blank first line converts to an all-empty table (data silently lost) #2136

@wajih-rathore

Ran a few CSV files through markitdown and one of them came out as nothing but empty table cells, even though the file obviously had data in it. Spent a while confused before I noticed the only thing special about that file was a blank line at the very top.

Two files, identical except the first one has an empty leading line:

blank_first.csv

(blank line here)
name,age
bob,3
alice,7

Converting it:

markitdown blank_first.csv

gives an empty table, one | | per line, all data gone:

|  |
|  |
|  |
|  |

The same file with the blank line removed converts fine:

| name | age |
| --- | --- |
| bob | 3 |
| alice | 7 |

What's going on: csv.reader parses the blank first line as an empty row [], so rows[0] ends up with zero columns. The converter uses rows[0] as the header, then for every data row does row[: len(rows[0])], which is row[:0] and truncates every row to nothing. No error and no warning; the whole table just silently empties out.
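
To make the mechanism concrete, here is a minimal standalone sketch of the slicing described above. It is not markitdown's actual converter code; `truncate_to_header` is a made-up name, and the only real API used is the standard library's `csv.reader`.

```python
import csv
import io


def truncate_to_header(text):
    """Mimic the described behaviour: treat rows[0] as the header and
    cut every row down to the header's width. Hypothetical helper,
    not taken from markitdown."""
    rows = list(csv.reader(io.StringIO(text)))
    header_width = len(rows[0])  # 0 when the first line is blank
    return [row[:header_width] for row in rows]


blank_first = "\nname,age\nbob,3\nalice,7\n"
print(truncate_to_header(blank_first))  # [[], [], [], []]
```

Four empty rows come back, which lines up with the four empty table lines in the output above.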

Easy to hit in practice since plenty of exports put a blank line (or a blank leading row) at the top, and because it fails silently you don't notice until you look at the output.

markitdown 0.1.6, Python 3.12.

Happy to send a PR. Skipping leading empty rows before picking the header row fixes it cleanly.
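
Roughly what I have in mind, as a sketch rather than the final patch. The function names are placeholders, the table rendering is simplified, and treating a row whose cells are all empty (e.g. `,,`) as blank is my assumption about what should count as a leading empty row.

```python
import csv
import io


def rows_without_leading_blanks(text):
    """Parse CSV text and drop empty rows that come before the first
    row with any content. Placeholder name, not markitdown's API."""
    rows = list(csv.reader(io.StringIO(text)))
    start = 0
    # A row counts as blank if it is [] or every cell is whitespace-only.
    while start < len(rows) and not any(cell.strip() for cell in rows[start]):
        start += 1
    return rows[start:]


def to_markdown_table(text):
    """Render the remaining rows as a Markdown table (simplified)."""
    rows = rows_without_leading_blanks(text)
    if not rows:
        return ""
    header = rows[0]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows[1:]:
        # Pad short rows and truncate long ones to the header width.
        cells = (row + [""] * len(header))[: len(header)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(to_markdown_table("\nname,age\nbob,3\nalice,7\n"))
```

With this, blank_first.csv produces the same table as the file without the blank line, and an input that is nothing but blank lines gives an empty string instead of a table of empty cells.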