<- / New Developments in Control Flow Graphs in Lean

New Developments in Control Flow Graphs in Lean

quartztz · 23/04/2026

Table of Contents:

  1. Brief excursion
  2. Main lesson

This is a small part 2/procedural update on the original article. Since then, I’ve been doing a lot of reading, and I have some notes to share! Or rather: there’s stuff I want to think through, so I’ll put it in writing, and I’ll publish that writing in case it can help someone else/attract the attention of someone that knows what they’re doing.

Brief excursion: CompCert

I had a call with one of my professors about the problem, where he said something that stuck with me:

Whenever there’s problems like these, my question is : “How would CompCert do this”. Did you have a look at that?

To which I had to admit that no, indeed, I didn’t think that CompCert was chill like that. Turns out that it is! Kildall’s algorithm is implemented and properties on it are verified, and a few dataflow optimization passes are proven on it. It seems Xavier Leroy et al were onto someting.

On the one hand, the fact that someone else had attempted to do this meant that I had to actually sit down to read and understand real, industry Rocq [1], but on the other, it given the cred and reputation of the project, if anyone did it right, it’s probably them.

So armed with confidence and a few hours, I started digging through the proofs, slightly souring the sweet taste of anticipation. CompCert’s soundness proof proceeds over two fronts:

This means that this proof approach in particular relies on the relation between two execution states, and not between an execution state and a CFG node. This makes a lot of sense for the purposes of CompCert, as they’ll be uniquely focused on the dataflow as a means of optimization, so the dataflow can be assumed to be correct insofar as its effects on the executable code are sound. However, this means that for Uniqueness analysis, which is a static analysis study without application on the code, this approach loses some relevance.

Summary of the brief excursion:

Main lesson

Another interesting thing my professor said [slightly paraphrased]:

Your approach to showing correspondence cannot possibly work ‘cause you’re asking too much of it.

He didn’t say it like that, but that’s what it felt like. And it made a lot of sense: the CEK machine contains a lot of information that’s lost in translation, so while it’s easy to write a forgetful map from a state to any node, doing the opposite cannot happen with just a flat view of the program point. The professor’s suggestion was therefore to instrument the Control Flow Graphs with semantics, defining what it means to evaluate a CFG.

For a very high-level, theoretical overview: the idea is to consider the following execution state

structure CFGState where
  n : CFGNode
  E : Environment
  K : Continuations
  -- _ : Other Metadata?

and to define an execution relation of type CFGEval : CFGState -> CFGState -> Prop. Then, we define correspondence between a CEK state and a CFG state, using the bisimulation framework defined in the previous post but considering states instead of nodes, and we’re good! As long as we’re careful, the analysis half of the code shouldn’t need to be touched, nor should the structure of the CFG itself require any sort of particular handling.

In a way, this approach refines the one used by the CompCert developers, considering the CFG as the bytecode for a new abstract machine, and in some sense as a compilation target. There are some questions still lingering:

A first bit of research into CFG semantics didn’t yield a lot of information, but this approach shows a lot of promise and I’m confident something good will come of it eventually.


[1] Nothing against it per se, but after a few months of Rocq inactivity, it did require some mental fortitude and prior prep.

[2] From the CompCert documentation, Kildall.v

[3] From the CompCert documentation, ConstPropproofs.v