Gold-Plating Analysis Is Not a Prompt — It Is an Engineering Problem

Everyone is talking about Gold-Plating again. The German federal government, the Normenkontrollrat, industry associations. They all agree that the over-implementation of EU directives into national law can create a real burden for businesses. The Normenkontrollrat's 2024 reporting put the additional compliance cost from directive transposition at roughly one billion euros.

But here is the question behind the slogan:

How do you actually find it?

Identifying Gold-Plating is not a matter of reading two texts side by side. It means comparing the normative intent of an EU directive, often deliberately open and full of member-state discretion, against the specific choices a national legislator made. Some of those choices are legitimate. Some are political additions. Some are inherited from pre-existing national law that was never harmonised. And some are genuine, unnecessary over-regulation.

Telling these apart is the hard part.

Why naive AI approaches fail

The temptation is obvious: feed an LLM the directive and the transposition law, ask where the national law goes beyond the directive, and publish the result.

That approach fails in predictable ways.

Hallucinated deviations. Large language models are pattern-completion systems. If you present them with two long legal texts and ask them to find differences, they will produce differences, whether meaningful ones exist or not. In regulatory analysis, a false positive is not harmless. It can enter political discourse as if it were fact.

Structural asymmetry. Directives and national transposition acts are rarely organised in parallel. A directive may express a requirement in one article and qualify it in a recital, while the national implementation distributes the same issue across several statutes, definitions or procedural clauses. Naive text comparison misses precisely the kind of structural extension that matters most.

Blindness in one direction. Germany can also transpose too weakly, or leave parts under-implemented. If the system is framed only to find where Germany exceeds EU requirements, it will systematically miss the inverse case. A robust method has to inspect both directions.

Context collapse. There is a decisive difference between “member states shall ensure” and “member states may require”. The distinction between obligation and discretion is the difference between legitimate national policy and Gold-Plating. Models regularly flatten that distinction unless the analysis architecture forces them not to.

A simplified example

Take a simplified case. A directive says that member states may require additional reporting from operators above a certain risk threshold. The national law then makes reporting mandatory for a much broader category of organisations and adds shorter deadlines plus extra documentation duties.

That might be Gold-Plating. But not automatically.

You still have to determine:

  • whether the directive explicitly left this discretion to member states
  • whether the national obligation already existed in domestic law before the directive
  • whether the broader scope is created by a definition, a threshold, a procedural clause or a separate implementing act
  • whether the change is politically deliberate, legally required elsewhere, or analytically unjustified
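The checklist above is, at its core, a classification over structured facts. The following sketch makes that structure explicit; the field names, categories and decision order are illustrative assumptions, not a legal test or PolicyMonitor's actual schema, and every branch still ends in human review.

```python
from dataclasses import dataclass

@dataclass
class DeviationFacts:
    """Facts established for one candidate deviation (illustrative fields)."""
    directive_grants_discretion: bool  # does the directive explicitly leave this choice open?
    predates_directive: bool           # did the national obligation exist before transposition?
    introduced_via: str                # "definition" | "threshold" | "procedure" | "implementing_act"
    justification: str                 # "political" | "other_legal_requirement" | "none"

def classify(facts: DeviationFacts) -> str:
    """Map established facts to a deviation category.

    A sketch of the decision structure, not a verdict: "candidate
    gold-plating" means "flag for legal review", nothing more.
    """
    if facts.predates_directive:
        return "pre-existing national law"      # not created by transposition
    if facts.directive_grants_discretion and facts.justification == "political":
        return "deliberate national policy"     # legitimate use of discretion
    if facts.justification == "other_legal_requirement":
        return "required elsewhere"
    if facts.justification == "none":
        return "candidate gold-plating"         # flagged, not concluded
    return "needs review"

# The reporting example above: discretion exists, the broader national
# obligation is new, and no justification is on record.
example = DeviationFacts(True, False, "threshold", "none")
```

Note that the broader scope in the example enters through a threshold, not a headline provision; that is exactly the kind of structural detail the `introduced_via` field is meant to carry.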

That is not a prompting trick. It is a classification problem with legal and structural dependencies.

What a robust analysis architecture looks like

PolicyMonitor treats Gold-Plating detection as an engineering challenge, not a prompting exercise. The analysis follows a multi-stage architecture designed to control exactly the failure modes above.

Stage 1: Case selection

Not every directive is worth analysing. The first filter is strategic: is there a clear transposition act, is the directive recent enough to matter politically, does the sector carry measurable economic weight, and is there enough textual material for a structured comparison? Bad case selection is expensive because it creates the appearance of output without producing useful findings.
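The strategic filter can be expressed as a simple predicate over a candidate case. The thresholds below are placeholders chosen for illustration, not calibrated values from the actual selection process.

```python
from dataclasses import dataclass

@dataclass
class DirectiveCase:
    """One candidate directive/transposition pair (illustrative fields)."""
    has_clear_transposition_act: bool
    transposed_year: int
    sector_economic_weight: float  # e.g. estimated annual compliance cost, EUR millions
    comparable_text_units: int     # normative units available on both sides

def worth_analysing(case: DirectiveCase, current_year: int = 2025) -> bool:
    """Pre-filter: all four strategic criteria must hold (placeholder thresholds)."""
    return (
        case.has_clear_transposition_act
        and current_year - case.transposed_year <= 10  # recent enough to matter politically
        and case.sector_economic_weight >= 50.0        # measurable economic weight
        and case.comparable_text_units >= 20           # enough material for comparison
    )
```

The point of making the filter explicit is the one stated above: a case that fails it produces the appearance of output without useful findings, so it should be rejected before any analysis runs.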

Stage 2: Structured decomposition

Instead of comparing entire documents, the texts are broken into normative units: obligations, permissions, definitions, thresholds and procedural requirements. Each unit is tagged by regulatory character, for example mandatory minimum, optional provision or member-state discretion. The national provisions are then mapped against those units, so the analysis survives the asymmetry between EU and national legal architecture.
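A minimal data model for this decomposition might look as follows. The taxonomy and the citation scheme in `unit_id` are illustrative assumptions; the important property is that each directive unit can map to several national provisions, which is what lets the analysis survive the structural asymmetry.

```python
from dataclasses import dataclass, field
from enum import Enum

class RegChar(Enum):
    """Regulatory character of a normative unit (illustrative taxonomy)."""
    MANDATORY_MINIMUM = "mandatory minimum"
    OPTIONAL = "optional provision"
    MS_DISCRETION = "member-state discretion"

@dataclass
class NormativeUnit:
    """One obligation, permission, definition, threshold or procedural requirement."""
    unit_id: str       # e.g. "dir:art5(2)" -- hypothetical citation scheme
    kind: str          # "obligation" | "permission" | "definition" | "threshold" | "procedure"
    character: RegChar
    text: str

@dataclass
class Mapping:
    """Links one directive unit to the national provisions implementing it."""
    directive_unit: NormativeUnit
    national_provisions: list = field(default_factory=list)  # may span several statutes

# Example: a discretionary directive permission implemented across two national texts
unit = NormativeUnit(
    "dir:art5(2)", "permission", RegChar.MS_DISCRETION,
    "Member states may require additional reporting above the risk threshold.",
)
m = Mapping(unit, ["national act, section 12", "implementing ordinance, section 3"])
```

Because the comparison runs unit by unit rather than document by document, a national requirement hidden in a definition or an ordinance still lands next to the directive unit it extends.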

Stage 3: AI inside a deterministic spine

This is where AI becomes useful, but only as a bounded component. I have described that broader pattern elsewhere as the deterministic spine: the process flow, validation gates and output structure stay fixed, while the AI performs constrained analytical tasks inside those boundaries.

Every candidate deviation passes through concrete checks:

  • source verification: does the cited article actually exist and say what the model claims
  • cross-validation: does an independent pass reach the same conclusion
  • classification: is this a genuine addition, a scope extension, a stricter threshold, or a legitimate use of discretion
  • confidence control: low-confidence cases are flagged for review, not promoted as findings
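The deterministic-spine pattern behind these checks can be sketched as a fixed sequence of gates. The gate functions below are stubs and the confidence threshold is an assumption; what the sketch shows is the architectural point that the gate order, the rejection paths and the output structure are fixed code, while only the candidate analyses inside are model-generated.

```python
def source_verified(finding: dict) -> bool:
    """Gate 1: the cited passage must actually appear in the source text.
    Stub: a real check would resolve the citation against the full corpus."""
    return finding.get("cited_text", "") in finding.get("source_text", "")

def cross_validated(finding: dict) -> bool:
    """Gate 2: an independent analysis pass must agree (stubbed as a stored flag)."""
    return finding.get("independent_pass_agrees", False)

def run_spine(candidates: list) -> dict:
    """Deterministic spine: fixed gates, fixed output structure."""
    findings, review_queue, rejected = [], [], []
    for c in candidates:
        if not source_verified(c):
            rejected.append(c)                # hallucinated citation: dropped
        elif not cross_validated(c):
            review_queue.append(c)            # passes disagree: human review
        elif c.get("confidence", 0.0) < 0.8:  # illustrative threshold
            review_queue.append(c)            # low confidence: never promoted silently
        else:
            findings.append(c)
    return {"findings": findings, "review": review_queue, "rejected": rejected}
```

A hallucinated deviation never reaches the findings list, and an uncertain one reaches a reviewer rather than a report, which is the behaviour the bullet points above demand.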

This is much closer to a safety-critical pipeline than to the popular fantasy of “just ask the model”.

Stage 4: Traceable output

Every usable finding has to point back to specific provisions in both the directive and the national law, with extracted passages and a clear deviation type. Without traceability, the output is not analysis. It is opinion with formatting.
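One way to enforce traceability structurally is to make every reference and passage a required field of the finding type, so that an untraceable finding simply cannot be constructed. The schema below is an illustrative sketch, not PolicyMonitor's actual output format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """A single deviation finding. All fields are required: a finding
    without full source traceability cannot be built (illustrative schema)."""
    deviation_type: str     # e.g. "scope extension", "stricter threshold"
    directive_ref: str      # provision in the directive
    directive_passage: str  # extracted directive text
    national_ref: str       # provision in the national law
    national_passage: str   # extracted national text

f = Finding(
    deviation_type="scope extension",
    directive_ref="directive, Art. 5(2)",
    directive_passage="Member states may require additional reporting ...",
    national_ref="national act, section 12(1)",
    national_passage="All operators shall report within the shortened deadline ...",
)
```

With `frozen=True`, a finding is also immutable once constructed, so downstream formatting steps cannot quietly detach a claim from its sources.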

The selection problem nobody talks about

Current political discourse often treats Gold-Plating as if all directives could simply be run through a filter. In reality, the choice of what to analyse already shapes the outcome.

Look at labour law and you will often find Germany exceeding minimum standards for deliberate political reasons. Look at environmental law and you will find a mixture of genuine additions and older domestic standards that were never harmonised away. Look at financial regulation and you will find that supervisory interpretation can matter as much as statutory wording.

A credible Gold-Plating methodology therefore has to be explicit about selection criteria, exclusions and analytical limits. That is basic methodological hygiene. Most political debate skips it. A well-architected system can enforce it consistently.

From analysis to action

A report that merely lists deviations is an academic artefact. The real value begins with prioritisation: which deviations have measurable economic impact, which can be removed without undermining legitimate policy goals, and which require legislative change rather than administrative adjustment?

That is where technical screening ends and human judgement begins. The system can identify, structure and verify. Legal review and political strategy remain human responsibilities.


PolicyMonitor offers a free Gold-Plating analysis for an initial deviation scan. If the signal is relevant, the paid technical screening delivers a structured, source-linked report for legal review and political strategy. See PolicyMonitor or get in touch.

Source note

  • The reference to roughly EUR 1 billion in additional compliance costs is based on 2024 reporting by the German National Regulatory Control Council (Normenkontrollrat / NKR) on the burden created by EU directive transposition.