combine_first not unionising the columns of empty dataframes #29562

@Afrozodiacc

Code Sample, a copy-pastable example if possible

import pandas

df = pandas.DataFrame(columns=['a']).combine_first(pandas.DataFrame(columns=['b']))
df.columns
> Index(['a'], dtype='object')

Problem description

When both dataframes are empty, the columns of the resulting dataframe are not the union of the columns of the two inputs; only the columns of the calling dataframe are kept. I would expect df.columns to return Index(['a', 'b'], dtype='object'). This matters because the union of columns can carry useful information even when neither dataframe has any rows.

I am using version 0.25.3, for which the documentation states: "The row and column indexes of the resulting DataFrame will be the union of the two". (https://github.com/pandas-dev/pandas/blob/v0.25.3/pandas/core/frame.py#L5587-L5661)

I've checked the issues page but couldn't find anything on this case.

Expected Output

Index(['a', 'b'], dtype='object')
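
A possible workaround, sketched under the assumption that reindexing the result onto the union of both column indexes is acceptable. The helper name combine_first_with_column_union is hypothetical and not part of pandas; it only uses combine_first, Index.union and reindex:

```python
import pandas as pd


def combine_first_with_column_union(left, right):
    """Hypothetical helper: combine_first, then force the column union."""
    combined = left.combine_first(right)
    # Index.union returns the (sorted) union of both column indexes.
    all_columns = left.columns.union(right.columns)
    # reindex adds any missing columns; it is a no-op if they already exist.
    return combined.reindex(columns=all_columns)


left = pd.DataFrame(columns=["a"])
right = pd.DataFrame(columns=["b"])

result = combine_first_with_column_union(left, right)
print(list(result.columns))  # ['a', 'b']
```

Because the reindex step is a no-op when the columns are already present, the helper should behave the same on versions where combine_first already returns the union.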

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : None.None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
