Ran a few CSV files through markitdown and one of them came out as nothing but empty table cells, even though the file obviously had data in it. Spent a while confused before I noticed the only thing special about that file was a blank line at the very top.
Two files, identical except the first one has an empty leading line:
`blank_first.csv` (the first line is empty):

```

name,age
bob,3
alice,7
```

Converting it:

```
markitdown blank_first.csv
```
gives an empty table: one `| |` per line, with all of the data gone.
The same file with the blank line removed converts fine:
| name | age |
| --- | --- |
| bob | 3 |
| alice | 7 |
What's going on: `csv.reader` parses the blank first line as an empty row `[]`, so `rows[0]` ends up with zero columns. The converter uses `rows[0]` as the header, then for every data row does `row[: len(rows[0])]`, which is `row[:0]` and truncates every row to nothing. No error, no warning, the whole table just silently empties out.
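The mechanism can be reproduced with the standard library alone. This is a standalone sketch of the header/truncation logic described above, not markitdown's actual converter code:

```python
import csv
import io

# Same contents as blank_first.csv: an empty first line, then header and data.
content = "\nname,age\nbob,3\nalice,7\n"

rows = list(csv.reader(io.StringIO(content)))
print(rows)  # [[], ['name', 'age'], ['bob', '3'], ['alice', '7']]

header = rows[0]  # the blank line became [], so the "header" has zero columns
# Truncating every data row to the header width keeps nothing.
truncated = [row[: len(header)] for row in rows[1:]]
print(truncated)  # [[], [], []]
```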
It's easy to hit in practice, since plenty of exports put a blank line (or a blank leading row) at the top, and because it fails silently you don't notice until you look at the output.
markitdown 0.1.6, Python 3.12.
Happy to send a PR. Skipping leading empty rows before picking the header row fixes it cleanly.
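Roughly what the fix could look like, as a minimal sketch. `csv_to_markdown_table` is a hypothetical standalone helper, not markitdown's real function; it only skips rows that `csv.reader` returns as `[]` before choosing the header. Padding short rows to the header width is my own choice here, not something taken from markitdown:

```python
import csv
import io
from itertools import dropwhile


def csv_to_markdown_table(text: str) -> str:
    """Hypothetical helper: render CSV text as a Markdown table."""
    # Skip leading empty rows ([] from blank lines) so the header is
    # the first row that actually has cells.
    rows = list(dropwhile(lambda r: not r, csv.reader(io.StringIO(text))))
    if not rows:
        return ""

    header = rows[0]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows[1:]:
        # Pad short rows, then truncate long ones, to the header width.
        cells = (row + [""] * len(header))[: len(header)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(csv_to_markdown_table("\nname,age\nbob,3\nalice,7\n"))
```

With the leading blank line skipped, `blank_first.csv` produces the same four-line table as the file without the blank line.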