BUG: loc setitem with duplicate columns and new columns corrupts Data…#65208

Merged
jbrockmendel merged 2 commits into pandas-dev:main from
roeimed0:fix-issue-58317
Apr 14, 2026

@roeimed0
Contributor

…Frame (#58317)

Summary

Fixing stale PR #64079.
When df.loc was used to assign a DataFrame with duplicate column names and new columns, unrelated columns were corrupted.

The indexer in _ensure_listlike_indexer assumed the expanded columns mapped 1-to-1
to the original columns in order, which broke when union inserted duplicates in the
middle. Fixed by using get_indexer to correctly map each column position.
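
As a rough illustration of the mapping described above, the snippet below calls `Index.get_indexer` on the column labels mentioned in the review of this PR (`["D", "B", "C", "A"]` as existing columns, `["B", "E", "B"]` as the assigned keys). It is a standalone sketch of the public API, not the code inside `_ensure_listlike_indexer`; the variable names are chosen for illustration only.

```python
import pandas as pd

# Column labels as described in this PR's review; names are illustrative.
existing = pd.Index(["D", "B", "C", "A"])
keys = ["B", "E", "B"]

# One entry per key: its position in `existing`, or -1 if the label is new.
indexer = existing.get_indexer(keys)

print(indexer.tolist())  # [1, -1, 1]
```

Each key is resolved independently, so a repeated label such as "B" maps to the same position both times, and the new label "E" is marked with -1 wherever it appears.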

AI was used to explore the code path and trace the root cause.

@roeimed0 roeimed0 changed the title from "Fix issue 58317" to "BUG: loc setitem with duplicate columns and new columns corrupts Data…" Apr 13, 2026
@afurm

afurm commented Apr 13, 2026

The test creates a DataFrame with columns ["D", "B", "C", "A"] and assigns with ["B", "E", "B"]. When there's a mismatch between the column counts and the assignment target, the get_indexer approach needs to guarantee it produces the same indexer array length as len(keys) — otherwise reindex_indexer may misalign columns silently. Can you confirm the indexer length matches len(keys) in all paths?

@roeimed0
Contributor Author

roeimed0 commented Apr 14, 2026

I can confirm the indexer length always matches len(keys). get_indexer is guaranteed to return one value per element in its input — either the position of that key in the existing columns, or -1 for new ones. So the length is always correct.

The problem with the old code was that it assumed new columns always end up at the tail of keys, which isn't true when there are duplicate columns.

get_indexer handles duplicates and any ordering correctly without making that assumption.
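
To make the difference concrete, here is a small pure-Python sketch. Both helpers are hypothetical and written only for this illustration: `new_mask_assuming_tail` mimics the general idea of assuming new labels sit at the end of keys (it is not the old pandas code), and `positions_by_lookup` mimics what a per-key lookup returns, assuming the existing labels are unique.

```python
def new_mask_assuming_tail(existing, keys):
    """Hypothetical: flag the last n_new entries of keys as the new labels."""
    n_new = len(set(keys) - set(existing))
    return [i >= len(keys) - n_new for i in range(len(keys))]


def positions_by_lookup(existing, keys):
    """Hypothetical: look up each key; -1 marks a label not in existing."""
    position = {label: i for i, label in enumerate(existing)}
    return [position.get(key, -1) for key in keys]


existing = ["D", "B", "C", "A"]
keys = ["B", "E", "B"]

tail_mask = new_mask_assuming_tail(existing, keys)
positions = positions_by_lookup(existing, keys)

print(tail_mask)  # [False, False, True]: wrongly flags the second "B" as new
print(positions)  # [1, -1, 1]: "E" is the new label, both "B" map to position 1
```

The tail-based guess marks the wrong entry as new as soon as a new label is followed by a repeated existing one, while the per-key lookup does not depend on ordering.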

@jbrockmendel jbrockmendel merged commit 4d6f97f into pandas-dev:main Apr 14, 2026
43 of 45 checks passed
@jbrockmendel
Member

thanks @roeimed0

Development

Successfully merging this pull request may close these issues.

BUG: loc __setitem__ has incorrect behavior when assigned a DataFrame and new columns and duplicated columns are added.

3 participants