Rewriting the MIM Pending Export Script for 2026

There are tools you use once… and forget. And then there are tools that quietly become part of how you survive complex systems. For me, one of those tools has always been the Pending Export script originally written by Carol Wapshere. If you’ve worked in Microsoft Identity Manager long enough, you know the moment when you are ready to run a big export. You think you know what’s going to happen, but there is so much data it is hard to validate in the console.

And in identity systems, “not really knowing” is where things have gone wrong for me in the past.

The Problem MIM Never Really Solved

MIM gives you power, but it does not give your stakeholders visibility. Before an export, they are often left asking:

  • What exactly is going to change?
  • Which attributes are being updated?
  • What values are being added, removed, or overwritten?
  • How many objects are actually affected?

Yes, you can inspect connector space objects and do previews – but at scale, this becomes impractical. That’s where Carol’s script came in. It took the output of csexport.exe, turned it into something human-readable, and gave you a pre-flight view of identity changes. This allowed you to send data to the business for validation before running big exports (and to effectively follow change processes).

Why I Had to Touch It

I recently had to dust this tool off again for a project. In this project I had a sizable dataset that I wanted to validate with the customer. I got the script and ran it; that is where things broke.

The original script was brilliant for its time — but time and modern scale caught up with it:

  • Connector spaces with 95,000+ objects
  • Large XML exports
  • Real-world attribute complexity

Two issues became obvious:

  • Memory: The script split XML into thousands of files and loaded each into memory. That works… until it doesn’t.
  • Performance: Each object was parsed twice, which at scale becomes painfully slow.

This Was Not a Patch

I started with the intention of fixing a few things but ended up rewriting. And I want to be very clear here: the rewrite is not intended to minimize Carol’s original work. This version is simply an attempt to make that idea survive modern environments.

The Core Change — From DOM to Streaming

The biggest shift was architectural.

Instead of:

  • Splitting XML into per-object files
  • Loading full XML DOMs into memory
  • Re-parsing strings repeatedly

The script now uses:

  • A single-pass streaming XML reader (XmlReader)
  • Forward-only processing
  • Depth-aware navigation

What this means in practice:

  • No temporary files
  • No full XML loads
  • Memory usage scales with file size, not object count
  • Processing time becomes predictable
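The rewritten script does this in PowerShell with .NET’s System.Xml.XmlReader, but the pattern itself is language-agnostic. Here is a minimal sketch of the single-pass, forward-only idea in Python using the standard library’s iterparse — note that the element name `cs-object` and its `operation` attribute are assumptions for illustration, not the real csexport schema:

```python
import io
import xml.etree.ElementTree as ET

def count_pending_changes(xml_source):
    """Single forward-only pass: tally objects without loading the full DOM."""
    counts = {}
    # iterparse streams events; we only ever hold one object's subtree.
    for event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "cs-object":  # assumed element name for illustration
            op = elem.get("operation", "unknown")
            counts[op] = counts.get(op, 0) + 1
            elem.clear()  # release the subtree so memory stays flat
    return counts

sample = io.StringIO(
    "<export>"
    "<cs-object operation='add'/>"
    "<cs-object operation='update'/>"
    "<cs-object operation='update'/>"
    "</export>"
)
print(count_pending_changes(sample))  # {'add': 1, 'update': 2}
```

Because the reader never materialises the whole document, peak memory stays roughly constant no matter how many objects the export contains — which is exactly the property the rewrite needed for 95,000+ object connector spaces.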

This one change not only fundamentally altered how the tool behaves under load, it also exposed a lot of data handling issues.

What You Get Out of It

The goal of the tool has not changed: clarity before execution. But the output is now more usable at scale, with a few more features and validation built in.

Reports generated

  • Single-value attribute changes (per object)
  • Multi-value attribute changes (adds/removes)
  • Change validation (counts)
  • Optional per-attribute breakdowns: Something I always used to do with the original was extract and then start filtering changes per attribute, so this time I just added a flag to output straight to “per-attribute.csv” files.
  • HTML summary with totals and timing
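The per-attribute breakdown is just a grouping step over the flat change rows. A minimal sketch of that idea in Python (the real script is PowerShell; the column names `dn`, `attribute`, `old`, `new` here are illustrative, not the script’s actual headers):

```python
import csv
from collections import defaultdict
from pathlib import Path

def write_per_attribute_csvs(rows, out_dir):
    """Split flat change rows into one CSV per attribute name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    by_attr = defaultdict(list)
    for row in rows:
        by_attr[row["attribute"]].append(row)
    for attr, attr_rows in by_attr.items():
        # One file per attribute, e.g. department.csv
        with (out_dir / f"{attr}.csv").open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["dn", "attribute", "old", "new"])
            writer.writeheader()
            writer.writerows(attr_rows)
    return sorted(by_attr)

changes = [
    {"dn": "CN=Ann", "attribute": "department", "old": "HR", "new": "IT"},
    {"dn": "CN=Bob", "attribute": "title", "old": "", "new": "Manager"},
]
print(write_per_attribute_csvs(changes, "per-attribute"))  # ['department', 'title']
```

This is the filtering I used to do by hand with the original script’s output; baking it in as a flag just removes that manual step.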

You can take these outputs and:

  • Validate changes before export
  • Get stakeholder approval
  • Audit what is about to happen
  • Troubleshoot unexpected behaviour

The Hidden Problem — Data Quality

The rewrite exposed something else that is not performance related: data quality. Some target system data is “dirty”, which means things like stray spaces, embedded new lines and other artefacts create issues for the script. When you start processing large datasets, you begin to see:

  • Broken values
  • Unexpected formatting
  • Embedded line breaks
  • Edge-case attribute behaviour

Some of the hardest bugs I hit were not performance-related; they were data integrity issues caused by real-world directory data. In the end, with all the changes I had made to the processing, I was second-guessing the data all the time. This is why I introduced a second script:

Test-PendingExports.ps1

A validation tool that:

  • Re-reads the original XML
  • Compares it to the generated CSVs
  • Verifies counts, values, and summaries

Because if you are going to trust a report… You need to be able to verify the report.
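The core of that verification is an independent re-count: parse the XML again, count the rows in the generated CSVs, and flag any mismatch. A minimal sketch of the comparison in Python (the real Test-PendingExports.ps1 is PowerShell, and `cs-object` is an assumed element name, not the real csexport schema):

```python
import csv
import io
import xml.etree.ElementTree as ET

def verify_report(xml_source, csv_source):
    """Independently re-count objects in the XML and rows in the CSV, then compare."""
    xml_count = sum(
        1 for _, elem in ET.iterparse(xml_source, events=("end",))
        if elem.tag == "cs-object"  # assumed element name for illustration
    )
    csv_count = sum(1 for _ in csv.DictReader(csv_source))
    return {"xml": xml_count, "csv": csv_count, "match": xml_count == csv_count}

xml_data = io.StringIO("<export><cs-object/><cs-object/></export>")
csv_data = io.StringIO("dn,attribute,new\nCN=Ann,department,IT\nCN=Bob,title,Manager\n")
print(verify_report(xml_data, csv_data))  # {'xml': 2, 'csv': 2, 'match': True}
```

The point is that the checker shares no parsing code with the report generator, so a bug in one cannot silently hide in the other.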

Sidenote: The CSV Problem

One of the more interesting issues was sometimes the CSV format itself. Embedded newlines inside attribute values caused:

  • Broken rows
  • Phantom records
  • Silent data corruption in Excel pipelines

Ultimately the only way to fix this in the reporting (after trying simple quoting) was to sanitise the data at export time to prevent downstream parsing failures.
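A minimal sketch of that export-time sanitisation in Python (the real script is PowerShell; the idea is just to collapse embedded CR/LF before the value ever reaches the CSV writer):

```python
import csv
import io

def sanitize(value):
    """Collapse embedded CR/LF and repeated whitespace so one logical record
    stays one physical CSV row."""
    return " ".join(str(value).split())

def write_sanitized_csv(rows, stream):
    writer = csv.writer(stream)
    writer.writerow(["dn", "attribute", "value"])  # illustrative columns
    for row in rows:
        writer.writerow([sanitize(cell) for cell in row])

buf = io.StringIO()
write_sanitized_csv([["CN=Ann", "info", "line one\r\nline two"]], buf)
print(buf.getvalue())
```

Quoting keeps the CSV technically valid, but not every downstream consumer honours quoted newlines; flattening the value is the only option that survives every pipeline.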

Now, and Next Steps

For now, thank you Carol – hope you do not mind.

As for the project – it is not finished. I’ve tested heavily on large Active Directory user exports (100k+ objects).

But I still want to validate:

  • Moves
  • Deletes
  • Different Management Agents
  • More edge cases
