Power Query: Fix Wrong Remove Duplicates

von HubSite 365 über Leila Gharani [MVP]

Power Query in Excel: stop lazy evaluation on remove duplicates, fix with Table.Buffer or Group By to keep latest record

Key insights

Remove Duplicates: Power Query's Remove Duplicates can keep the wrong row after a visible sort, so your report may show an incorrect count without any error or warning.
lazy evaluation and query folding: Power Query often delays or pushes work back to the source, so a sort in the editor may not be materialized before deduplication and an older row can survive.
verify counts: Always compare deduplicated results to the source file and sample rows; mismatches usually show up as count differences or unexpected retained records.
Table.Buffer: Fix this by sorting so the desired row comes first, then force materialization with Table.Buffer (or break folding with Table.StopFolding), and finally run Remove Duplicates or Table.Distinct.
Pattern: Table.Sort → Table.Buffer → Table.Distinct (or Remove Duplicates).
Group By: As an alternative, use Group By to aggregate and pick the max date or max index per key; this method does not rely on sort order and avoids folding issues.
normalize: Normalize key columns (Trim, Uppercase/Lowercase, Clean) before deduplication, deduplicate on the correct business key, and always test the full dataset to confirm results.

Leila Gharani explains Power Query deduplication pitfall

Leila Gharani [MVP] explains why Power Query can keep the wrong record

In a recent YouTube video, trainer Leila Gharani [MVP] demonstrates a subtle but important pitfall when using Power Query Remove Duplicates to keep the latest record. The video shows that a straightforward approach — sorting newest-first and removing duplicates — can silently return incorrect results on larger datasets. Consequently, reports may publish bad numbers without any error or warning. The demonstration highlights how common tools can fail in production unless users understand internal behavior.

The problem demonstrated

Gharani walks through a typical scenario: sort by date so the newest entry appears first, then use Remove Duplicates to keep that top row for each key. At small scale the output looks correct, but when the same steps run on a large CSV the count is wrong and some keys keep older rows. In other words, the visible editor order does not always guarantee the row order used by the deduplication step. As a result, teams who trust the editor preview can be surprised by quietly incorrect data feeding downstream reports.

Why Power Query keeps the wrong row

The video explains that the root cause is how Power Query optimizes work, often by using query folding and lazy evaluation. Query folding can push operations back to the data source or reorder steps to improve efficiency, so the editor’s sort may not materialize before duplicates are removed. Therefore, duplicate-removal operates on the table’s current internal state rather than the visually sorted state. This behavior is efficient, but when you need a stable ordering for correct deduplication, it becomes a hazard.

The simple fix and an alternative

Gharani proposes a one-line fix: wrap the sorted table in Table.Buffer so Power Query materializes the sorted data in memory before deduplication. With buffering, the sort is fully realized and Table.Distinct or the UI Remove Duplicates step will keep the intended row. She also presents a second approach that avoids relying on order: use the Group By method to compute a maximum date or an index per key, then merge or filter to keep the correct record. Both solutions work, but they come with tradeoffs that users should weigh.

Tradeoffs: performance, memory, and folding

Buffering enforces order, yet it increases memory usage because the table is materialized in Power Query’s process. Thus, for very large datasets buffering can slow queries or even exhaust available memory. On the other hand, the Group By pattern often avoids buffering and preserves query folding, which can be faster when the source supports it. However, grouping can be more complex to implement and may change how transformations are structured, which affects maintainability.

Practical verification and safe practices

The video emphasizes verifying results against the original source. For example, compare deduplicated output to the CSV or source query counts before publishing, and test on real dataset sizes. Additionally, normalizing text with Trim, Clean, or consistent casing helps avoid hidden mismatches that look identical in the editor. Finally, deduplicate on the correct business key rather than the entire row so you control which fields determine uniqueness.

Challenges in production use

Teams must balance robustness, speed, and resource use when choosing an approach. Buffering is simple and reliable for moderate datasets, but it can raise costs when run at scale. Conversely, preserving query folding can deliver faster results and lower memory demand, yet it may hide ordering assumptions and require more careful step design. Therefore, developers should document the chosen pattern and include tests to detect regressions when data grows or sources change.

Conclusion

Gharani’s video offers a clear and practical lesson: the visible order in Power Query’s editor is not always the order used by downstream steps, and that mismatch can silently corrupt results. By either materializing the sort with Table.Buffer or adopting a grouping strategy, analysts can ensure they keep the correct record. Ultimately, the right choice depends on dataset size, source capabilities, and operational constraints, so validate and monitor any deduplication pattern before it reaches production.

Excel - Power Query: Fix Wrong Remove Duplicates

Keywords

Power Query remove duplicates keeps wrong record, Power Query remove duplicates fix, Power Query wrong record after removing duplicates, Power BI remove duplicates wrong record, Excel Power Query remove duplicates keeps wrong row, Power Query dedupe keep correct record, M code remove duplicates keep latest, Power Query remove duplicates by column order