Getting into the Flow: Towards Better Type Error Messages for Constraint-Based Type Inference

Creating good type error messages for constraint-based type inference systems is difficult. Typical type error messages reflect implementation details of the underlying constraint-solving algorithms rather than the specific factors leading to type mismatches. We propose using subtyping constraints that capture data flow to classify and explain type errors. Our algorithm explains type errors as faulty data flows, which programmers are already used to reasoning about, and illustrates these data flows as sequences of relevant program locations. We show that our ideas and algorithm are not limited to languages with subtyping, as they can be readily integrated with Hindley-Milner type inference. In addition to these core contributions, we present the results of a user study to evaluate the quality of our messages compared to other implementations. While the quantitative evaluation does not show that flow-based messages improve the localization or understanding of the causes of type errors, the qualitative evaluation suggests a real need and demand for flow-based messages.


INTRODUCTION
Much academic research has gone into producing better type error messages for functional programming languages, dating back at least to Wand [1986]. Yet, one would be none the wiser by looking at the error messages produced by existing compilers, including those compilers designed specifically with learning in mind, such as Helium [Heeren et al. 2003]. For example, consider the following OCaml program, where the operator (^) stands for string concatenation:

    let appInfo = ("My Application", 1.5)
    let process (name, vers) = name ^ show_major (parse_version vers)
    let main () = process appInfo

For this program, OCaml produces the following report, pointing at the last line:

    let main () = process appInfo
    Error: This expression has type string * float
           but an expression was expected of type string * string
           Type float is not compatible with type string

No indication of where the offending float value comes from is provided, which could have been helpful. Error reports produced by most other existing compilers for functional programming languages are not significantly different from this.

Flow-Based Error Messages
How come the wealth of previous ideas for improving ML type errors has not yet permeated modern compiler design practice? This could be for a number of reasons. Perhaps the previously proposed approaches were too difficult to implement or to integrate into existing type systems; or they were too unreliable and their heuristics too difficult to tune; or perhaps the corresponding explanations were not actually helpful to real programmers.
In this paper, we set out to start addressing these questions by:
• proposing a straightforward, heuristics-free approach to recording and reproducing the information relevant to ML type errors in terms of data flows, a concept that we expect users can get used to quickly, because it relates to how programs evaluate; and by
• performing a randomized quasi-experimental study to evaluate whether our approach does help programmers understand the type errors found in actual ML programs. We compare the error messages we produce to those generated by ocamlc and Helium [Heeren et al. 2003].
The approach we propose, which we dub HMℓ (to be read as "H-M-loc"), produces the error message in Figure 2 when given the same program as above. This report adds several helpful bits of context to the error. Most importantly, instead of displaying a single erroneous location, it presents, in a logical order, each location involved in the erroneous data flow. It illustrates the flow of data from the right-hand-side component of the appInfo pair into the vers parameter of the function process, and the flow of this vers parameter into the function parse_version, which is imported from some library. Our type error is caused by this specific erroneous data flow.
Note that if the abbreviated report above is still not enough for the user to resolve the issue at hand, a more detailed explanation can be obtained. HMℓ supports a verbose mode that lists the entire sequence of locations involved in a type mismatch, including those locations that go through nested type constructors. The verbose message for the current example is shown in Appendix C.1.
Verbose error messages can become very long and unwieldy. A promising future direction for our work is to enable interactive type error debugging in integrated development environments (IDEs), whereby users will be able to explore the data flows involved in a type error interactively.

Complex Type Errors
The above program is simplistic and the erroneous data flow is easy to understand. In more complex scenarios, errors can arise from two types flowing into or from the same location, which we refer to as confluence errors. Data flows are further complicated when types flow through constructors. Later in the paper, we detail how we propose to handle these more advanced typing errors in our error reporting system (Section 4). As an early example, consider the following linear algebra program:

    let move (x, y) = (x / 2, y / 2)
    let dist (x, y) = x *. x +. y *. y
    let move_closer pos =

Figure 3 shows HMℓ's verbose error report, where the flows of all values, and how they pass through constructors, are precisely described. This error report introduces new syntax that we explain below.
• The indented parts are those corresponding to nested constructor flows: here, the float value does not immediately flow into an int position, but rather flows into the left-hand side of a pair, which itself has its own flow. This nested flow is important for understanding the entire context of the error and is shown in the verbose error reports.
• The labels ?a, ?b, and ?c are placeholders for types. ?b is the label for the argument of the move_closer function, and it flows into two use sites. It is used as the argument to the dist function, which expects a pair; the left type argument of the pair is labelled ?a and flows into a location expecting a float. Label ?b is also used as the argument to the move function, which also expects a pair, with the left type argument labelled ?c. Type ?c flows into a location expecting an int. The data flow, presented like this, shows how two incompatible types are being unified, causing a type error.
This program represents a more realistic example where composing different functions together can lead to erroneous data flows.

Contributions
Our new approach is based on a theory of type provenance tracking. A key observation is that we have to treat type equality constraints τ₁ = τ₂ as asymmetric, since such type constraints are read as information about a value flowing from a source where it has been introduced with type τ₁ to a usage site where it will be used at type τ₂. This asymmetry suggests looking for inspiration in constraint-based inference algorithms for subtyping constraints of the form τ₁ <: τ₂, which are naturally asymmetric. In this paper, we use the algebraic subtyping approach and algorithms developed by Dolan [2017]; Dolan and Mycroft [2017] and simplified by Parreaux [2020a], and combine it with the idea of Gast [2005] to use data flow for explaining error messages. While we use subtype inference to improve the quality of error messages, our approach targets the familiar type theory of Hindley, Damas and Milner as the user-facing type system.
Specifically, we make the following contributions:
• A classification system for unification errors based on data flow, where each unification error is assigned a numeric level (Section 2). The classification allows us to speak about Level-n errors, and to craft error messages specific to each level. We suggest that it is crucially important to use different textual explanations when explaining type errors of different levels.
• A subtyping constraint solving algorithm which reports data-flow-based error messages for Level-0 errors, and which is very close to the one used in algebraic subtyping but additionally tracks the provenances of types and flows in the program (Section 3). This substantiates an observation of the algebraic subtyping community that it is easier to create helpful error messages from a subtype-inference-based system than from a unification-based one.
• An equality-constraint solving algorithm which reports data-flow-based error messages for Level-n errors (where n ≥ 1). This algorithm is close to unification-based algorithms, but also tracks the provenances of types and flows (Section 4).
• A user study to empirically evaluate our error messages and to help guide further research into improving their quality (Section 6). The experiment compares the effects of HMℓ, ocamlc, and Helium on programmers' ability to understand and localize type errors. While the quantitative evaluation does not show that HMℓ provides any measurable improvement over the state of the art, a qualitative analysis suggests a demand for flow-based errors in situations with complex type errors.
We provide an implementation of HMℓ as an extension of Simple-sub [Parreaux 2020a]. Our system type-checks a reasonable subset of OCaml features while providing high-quality error messages.

CLASSIFYING TYPE ERRORS
Not all unification errors are created equal. By treating them as equal, compiler engineers pass up an opportunity to improve the quality of the error messages that we can generate. Independent of the constraint algorithm we use, eventually the type checker might come to a point where it has discovered enough information to conclude that two incompatible types, such as Bool and Int, should be equal, at which point it emits an error. Following this line of thought, we might conclude there is only one essential kind of unification error, namely that two types are incompatible.
Accordingly, most type checkers only use one textual template to display these errors to the user. The error messages might be enriched with additional information about the typing context in which the error arose, or about the source code region for which it was generated, but the underlying textual template often stays the same.
In this section, we argue that this uniform view holds us back if we want to create great error messages for the user.
To improve error messages that arise from type unification, as a first step, we recognize that not all unification errors are the same and introduce a precise classification of unification errors. Based on this classification, as a second step, it is then possible to craft a textual error message for each kind of unification error, instead of using a one-size-fits-all template. As we will see, we classify different constraint solving errors using the direction of data flow in the program.

Flow of Types
Let us assume that we typecheck the faulty expression not 1. Traditionally, one would generate a type constraint expressing that Int (the type of the literal 1) has to be equal to Bool (the argument type of the not function). However, we can observe that this information is directed and closely corresponds to the data flow. In analogy to the well-known concept of data flow, we argue that programmers can reason about the flow of types to understand faulty programs. In the above example, we say that the argument type Int flows into the parameter type Bool. It would be incorrect to generate a constraint for this expression which says that Bool flows into Int. Most standard unification algorithms discard this directionality information, since they make implicit use of the rule of symmetry to solve constraints. In these algorithms, the type equality constraint τ₁ = τ₂ is considered equivalent to the constraint τ₂ = τ₁. As a first technical insight, we thus recognize that we have to use non-symmetric constraints if we want to preserve directionality information during constraint solving. Luckily, there already is a ready-made notion of non-symmetric constraints: subtyping constraints τ₁ <: τ₂, which express that τ₁ has to be a subtype of τ₂. We will see that we can equivalently interpret these subtyping constraints as expressing that a value of type τ₁ flows into a context which expects a value of type τ₂, and that this reading is independent of whether we consider a system with subtyping or without.
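The directed reading above can be sketched concretely. The following Python snippet is an illustration, not the paper's implementation; the class and function names are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Flow:
    """A directed constraint: a value of type `source` flows into a
    context expecting type `target`, i.e. source <: target."""
    source: str
    target: str

def constraints_for_not_1():
    # Type-checking `not 1`: the argument type Int flows into the
    # parameter type Bool. We never generate the reversed
    # Flow("Bool", "Int") for this expression.
    return [Flow("Int", "Bool")]

assert constraints_for_not_1() == [Flow("Int", "Bool")]
```

Keeping the pair ordered, rather than treating it as a symmetric equation, is exactly what preserves the directionality that unification discards.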

Change of Direction
The flow of types provides us with a different explanation model for type errors. In our example above, the flow was extremely short. In realistic programs, the distance between the point where a type is introduced (like type Int) and the point where it collides with a different expected type (type Bool in our example) can be arbitrarily large.
Furthermore, type errors are not only introduced when one type flows directly into another, incompatible one, but also when two incompatible types flow into a single location. This is the case in the following example, where both Int and Str flow into the result type of the conditional expression:

    if true then 5 else "hi"

We refer to these type errors as confluence errors. When type checking a program like the one above, we would gather the two constraints Str <: α_ret and Int <: α_ret, where α_ret is a unification variable corresponding to the result of the conditional. While there is an obvious type error in the program, given just the constraints we cannot immediately derive an inconsistency. In order to do so, we would have to invoke the rule of symmetry. As a second technical insight, we observe that invoking symmetry corresponds to a change of direction in the flow of types: in the chain Str <: α_ret :> Int, following the type flow from Str to Int, we notice that it reverses direction once. Generalizing this observation, we present the following classification of type errors.
Definition 2.1. In a Level-n unification error, the derivation of the contradiction has the form of a chain of subtyping constraints τ <:> … <:> τ′ (with τ ≠ τ′ and <:> denoting either <: or :>), where the direction of the subtyping constraints changes n times. Each change of direction corresponds to a reversal of the data flow which has to be explained to the user. While the rule of symmetry allows HM type inference algorithms to ignore this information about data flow, retaining it is important to properly explain the cause of the type error.
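Definition 2.1 can be sketched in a few lines: given the chain of constraint directions, the error level is the number of adjacent direction switches. The representation of chains as lists of direction strings is our own illustrative assumption:

```python
def error_level(directions):
    """Number of direction changes in a chain of <: / :> steps."""
    return sum(1 for prev, cur in zip(directions, directions[1:])
               if prev != cur)

# Level-0: Int <: a_x <: Bool    (no reversal)
assert error_level(["<:", "<:"]) == 0
# Level-1: Int <: a_ret :> Str   (one reversal: a confluence error)
assert error_level(["<:", ":>"]) == 1
# Level-2: two reversals, as in the combined example discussed below
assert error_level(["<:", ":>", "<:"]) == 2
```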
To see why this classification is useful, in the remainder of this section, we will consider concrete examples containing errors of various levels in Figure 4 and the corresponding error messages that we generate.

HMℓ

    [ERROR] Type `int` does not match `bool`
    (int) comes from
      | l.1  let x = 2;
             ▼       ^
    (bool) comes from
      | l.2  let y x = if x then true else false
                          ^

Fig. 5. Level-0 error.

2.2.1
Level-0 Errors. The snippet in Figure 4a contains a Level-0 error. In this example, we have to introduce one unification variable α_x for the let-bound program variable x, and two constraints Int <: α_x and α_x <: Bool. The first constraint expresses that the type Int, introduced by the literal 2, flows into the variable x, while the second constraint expresses that the type of the variable x flows into the condition of the if-then-else expression, which expects a boolean.
These constraints are presented in the corresponding graph as arrows, with the direction of each arrow corresponding to the flow of data through the program. From these two constraints we can deduce the inconsistency Int <: Bool without having to reverse the data flow in the constraints. This means that we can directly explain the error as the flow from one type to the other, as shown in Figure 5. Level-0 errors allow for a good textual explanation in error messages, since we can point to a location in the program where a value of a certain type was introduced, follow it through intermediate bindings, and point to a position in the code where a different type was expected. Programs containing such Level-0 errors should always be rejected by a typechecker, since executing them would result in type mismatch errors at runtime (i.e., non-value terms getting stuck and not reducing further). Such programs are therefore rejected both by systems which support subtyping and by systems which do not.
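The deduction described above can be sketched as a transitive composition of directed flows, with a Level-0 error detected whenever two distinct primitive types end up related without any reversal. This is a toy model; the function names and the set-based representation are ours, not the paper's:

```python
PRIMITIVES = {"Int", "Bool"}

def compose(flows):
    """Transitively compose directed (source, target) flows."""
    flows = set(flows)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(flows):
            for (c, d) in list(flows):
                # Chain a -> b and b -> d into a -> d (same direction).
                if b == c and (a, d) not in flows:
                    flows.add((a, d))
                    changed = True
    return flows

def level0_errors(flows):
    """Flows relating two distinct primitive types are inconsistent."""
    return {(a, b) for (a, b) in compose(flows)
            if a in PRIMITIVES and b in PRIMITIVES and a != b}

# Int <: a_x and a_x <: Bool compose into the inconsistency Int <: Bool.
assert level0_errors([("Int", "a_x"), ("a_x", "Bool")]) == {("Int", "Bool")}
```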

2.2.2
Level-1 Errors. In Figure 4b we have two snippets which both exhibit Level-1 errors. In the first of these snippets we have an if-then-else expression with incompatible then and else branches. During constraint generation we would generate a unification variable α_ret for the return type of the if-then-else expression, and two constraints: the constraint Int <: α_ret for the then branch and the constraint Str <: α_ret for the else branch. Taken together, these constraints are only contradictory if we reverse the data flow once in the chain Int <: α_ret :> Str. Figure 6 shows the error message.
In a system with subtyping and union and intersection types, it would be possible to assign the type Int ⊔ Str to the expression. This shows that it is not strictly necessary to reject this expression, since its evaluation cannot lead to type unsoundness in itself, as long as its context can handle both an integer and a string.
The second example in Figure 4b exhibits a different Level-1 error. Here we have to generate a unification variable α_x for the lambda-bound variable x, and two constraints for the two different uses of x on either side of the tuple. The error is presented in Figure 7. A system with support for subtyping could assign the type (Int ⊓ Bool) → (Bool, Int) to the expression, i.e., a function which can only be called with a value that can act as both an integer and a boolean.

2.2.3
Level-2 Errors. If we combine the two snippets from Figure 4b we obtain the example from Figure 4c, which exhibits a Level-2 error. Here two unification variables, α_x and α_ret, have to be generated, together with three constraints, and we have to change the direction of the data flow twice to obtain the inconsistency between Bool and Int. We conjecture that these kinds of errors will already be quite rare in practice, and errors of even higher levels rarer still; even for a human it is no longer clear what the best error message should look like in this case. But the algorithm we present is still able to provide an explanation, mentions all the essential information, and is much more informative than what other implementations provide. We show the error message for this example in Figure 8.

Reporting Different Levels
How do different type inference algorithms deal with these different errors?
Algebraic subtyping algorithms like MLsub usually only report Level-0 errors, and not any of the higher-level errors (n > 0), as these are not considered errors in this typing discipline. This is why it has been remarked, in the subtyping literature, that it should be easier to generate good error messages for a system based on subtyping: the error messages only have to explain a linear and obviously problematic flow of information through the program.
Standard unification algorithms, on the other hand, have to account for Level-n errors for arbitrary n, since symmetry is always valid for equality constraints. However, since these algorithms usually do not track the reversal of the direction of data flow in the constraint solving process, the same textual explanation is used for all unification errors, regardless of their level.
Our classification now allows us to design detailed and specific error messages both for systems with and for systems without subtyping. If we have a system with subtyping, we only recognize Level-0 errors as proper errors, and display and explain them accordingly. We describe how to do this in Section 3. If, on the other hand, we are interested in a system which recognizes the same errors as a standard unification algorithm, then we have to recognize and explain errors of all levels. In Section 4, we extend the algorithm from Section 3 to emulate a standard unification algorithm and recognize errors of all levels. However, unlike standard unification algorithms, we keep track of when the direction of data flow is reversed and report the full data flow for an error.

FORMALIZATION
In this section, we make the intuitions described in the previous sections formally precise. There are two important properties of type inference in ML-style languages that we omit in this section: let-generalization and the occurs-check. We expect that both features will integrate well with the approach we have described in this paper, but leave the details for future work.

Terms and Locations
Figure 9 defines the syntax of terms t. The presentation is fairly standard, but since we want to track the flow of information through the program, we need a way to refer to the locations of subexpressions within the program. For this reason, every subexpression and every binding site of a variable is annotated with a program location ℓ. In this article, we do not commit to any particular representation of program locations, but in our implementation we choose one based on line and column number ranges. We omit these locations in examples and explanations whenever they are not necessary.
Syntax of Terms. Terms themselves consist of variables x, the unit constructor unit, integer literals n, and integer addition t + t. Booleans are constructed with the literals true and false, and eliminated using the conditional if t then t else t. Functions are introduced using lambda abstraction λx. t and eliminated using function application t t. Pairs are constructed using the pairing constructor [t, t] and deconstructed using the projections π₁(t) and π₂(t). Sums are constructed using the injections inj₁(t) and inj₂(t), and deconstructed using the pattern-matching construct case t of { inj₁(x) ⇒ t; inj₂(x) ⇒ t }.
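As an illustration, the location-annotated term syntax might be modeled as follows in Python (a hypothetical encoding of a subset of Figure 9; unit, addition, pairs, and sums are omitted for brevity):

```python
from dataclasses import dataclass
from typing import Union

Loc = str  # a program location; the concrete representation is left open

@dataclass
class Var:
    name: str
    loc: Loc

@dataclass
class Lit:
    value: int
    loc: Loc

@dataclass
class If:
    cond: "Term"
    then: "Term"
    orelse: "Term"
    loc: Loc

@dataclass
class Lam:
    param: str
    param_loc: Loc  # binding sites carry locations too
    body: "Term"
    loc: Loc

@dataclass
class App:
    fn: "Term"
    arg: "Term"
    loc: Loc

Term = Union[Var, Lit, If, Lam, App]

# The expression `not 1`, with hypothetical locations l1..l3:
ex = App(Var("not", "l1"), Lit(1, "l2"), "l3")
```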

Types and Provenances
Where terms are annotated with locations, types are annotated with provenances (defined in Figure 9). These provenances p explain why a certain type is used at a specific point in the program, and they are recorded and recombined during the type inference process. Provenances are also used to report errors; in that case they explain the flow of information through the program that led to the mismatch of two types. A provenance records a linear path through the program, so we have an operation • to concatenate two paths, and its unit ε standing for the empty path. Provenance concatenation is taken to be an associative operation with the empty provenance ε as its unit. Therefore, for example, (p₀ • p₁) • p₂ is the same as p₀ • (p₁ • p₂), and both are simply written p₀ • p₁ • p₂. We also use locations ℓ in provenances, to record specific points in the flow of information through the program. We will introduce and motivate the remaining syntactic forms of provenances in Section 3.3.3, where they are used in the constraint solving process.
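The algebraic laws of provenance concatenation can be checked directly on a small model; here provenances are tuples of locations, which is our own illustrative choice rather than the paper's representation:

```python
# The empty provenance (the unit of concatenation):
EMPTY = ()

def concat(p, q):
    """Concatenation of two linear paths through the program."""
    return tuple(p) + tuple(q)

p0, p1, p2 = ("l1",), ("l2",), ("l3",)
# Associativity: (p0 . p1) . p2 == p0 . (p1 . p2)
assert concat(concat(p0, p1), p2) == concat(p0, concat(p1, p2))
# Unit laws: the empty provenance is neutral on both sides.
assert concat(EMPTY, p0) == p0 == concat(p0, EMPTY)
```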
Syntax of Types. The type forms are standard. We have type variables α, the unit type 1, and the primitive types Int and Bool. We have three binary type constructors, referred to collectively as ⊙: the function type →, the product type ⊗, and the sum type ⊕. As mentioned above, these are all annotated with provenances p. Just as with terms, we will sometimes omit these provenances in examples and explanations. In order to show how the different parts of an annotated type correspond to different parts of the information flow, we consider a very simple example.
Example 3.1. The inferred type of the term [1^ℓ1, unit^ℓ2]^ℓ3 is (Int^ℓ1 ⊗ 1^ℓ2)^ℓ3. The above example shows that in the inferred type, the top-level provenance ℓ3 only contains the information of the flow explaining the outermost type constructor _ ⊗ _. It does not contain information about the provenances of its arguments. These provenances are annotated on the arguments to _ ⊗ _, namely Int^ℓ1 and 1^ℓ2.
Fig. 9. Syntax of terms and types.

Type Inference
Our type inference algorithm is an extension of a particular implementation of algebraic subtyping, due to Parreaux [2020a]. For readers who want to grasp the full extent of the algebraic subtyping technique and the associated proofs of correctness, we suggest referring to the relevant literature [Dolan 2017; Dolan and Mycroft 2017]. Note that locations and provenances do not affect type inference; in the algorithm presented in the following sections, it is easy to see that erasing all mentions of locations and type provenance tracking from our algorithm results in the algorithm by Parreaux. Since algebraic subtyping accepts strictly more programs than Hindley-Milner type inference, it follows that all programs rejected by the algorithm in this section are also rejected by Hindley-Milner. However, programs which only exhibit Level-n errors (with n ≥ 1) are accepted by the algorithm in this section, but rejected by Hindley-Milner type inference. Since the goal of our approach is to improve the quality of error messages, and not to change the set of accepted programs, we add an additional phase, described in Section 4, so that our algorithm accepts the same programs as standard unification-based algorithms.
The type inference algorithm consists of three parts: the algorithmic inference rules, discussed in Section 3.3.1; the constraint solving algorithm, discussed in Section 3.3.2; and the computation of subconstraints, discussed in Section 3.3.3. All three parts operate on a type inference state Ξ, which consists of the lower and upper bounds for each type variable, and a list of errors that were generated during type inference. To focus on the essential aspects of a rule, we fade out the state Ξ when it is only threaded through.

Algorithmic Inference
The top-level judgement, also written ⊢ t : τ, tells us whether a term t is well-typed: if the final type inference state Ξ contains an error err p, then we say that t is ill-typed and p is a type provenance chain highlighting one of its type collision errors; otherwise, we say that t is well-typed.
We rely on the usual informal notion of freshness for type variables: α is "fresh" if it does not appear anywhere in the previous values of Ξ, nor in the previous parts of the premise of a typing rule. We introduce two helper functions 'add-lb' and 'add-ub' to add a type to the lower and upper bounds, respectively, of a type variable in the type inference state. Dually, we introduce the helper functions 'lb' and 'ub' to look up the lower and upper bounds, respectively, of a type variable in the type inference state.

C-Cache This rule allows us to immediately solve a constraint if it has already been encountered in the constraint solving process, and is therefore contained in the set of hypotheses H. This is necessary to avoid divergence in the presence of recursive types. We define reset(τ₀, τ₁) as the substitution, in (τ₀, τ₁), of all type provenances by ε. This is used to ensure that type provenances do not affect the memoization of the constraining function.
C-Refl A constraint between two equal types can be solved immediately. When we check two types for equality, we do not care about provenances. For this reason, we apply the reset function before we compare them.
C-Var-L (and similarly C-Var-R) When we encounter a constraint α <: τ between a unification variable α and a type τ (which is not a unification variable), we have to do two things. We first add τ to the set of upper bounds of α in the state Ξ. Then we generate and solve one additional constraint between τ and each existing lower bound of α in Ξ.
C-Var-LR When we encounter a constraint α <: α′ between two unification variables, we add α to the lower bounds of α′ and α′ to the upper bounds of α before we generate the subconstraints to verify that the bounds are still consistent.

C-Sub
When the constraint we have to solve is complex, i.e., neither of the two types is a unification variable, we invoke the function sub in order to compute the subconstraints of the constraint. If this function returns a set of new constraints, we solve them in turn.

C-Error
If the function sub returns with an error, the returned state Ξ′ is populated with err p elements containing the provenance chains p corresponding to the error.
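The interplay of the rules above can be sketched as a small solver over primitive types only (constructor types and all provenance bookkeeping are elided; the names and representations here are hypothetical, not the paper's):

```python
from collections import defaultdict

def is_var(t):
    """Type variables are strings starting with a tick, e.g. "'x"."""
    return isinstance(t, str) and t.startswith("'")

def solve(c, state, seen):
    """Solve the constraint c = (lhs, rhs), read as lhs <: rhs.
    state maps each variable to its lower/upper bounds; seen plays the
    role of the hypothesis set H of the C-Cache rule. Returns a list
    of errors, each an unsolvable primitive-type flow (lhs, rhs)."""
    lhs, rhs = c
    if c in seen:                       # C-Cache
        return []
    seen.add(c)
    if lhs == rhs:                      # C-Refl
        return []
    errors = []
    if is_var(lhs) and is_var(rhs):     # C-Var-LR
        state[lhs]['ub'].add(rhs)
        state[rhs]['lb'].add(lhs)
        for lo in list(state[lhs]['lb']):
            errors += solve((lo, rhs), state, seen)
        for hi in list(state[rhs]['ub']):
            errors += solve((lhs, hi), state, seen)
    elif is_var(lhs):                   # C-Var-L
        state[lhs]['ub'].add(rhs)
        for lo in list(state[lhs]['lb']):
            errors += solve((lo, rhs), state, seen)
    elif is_var(rhs):                   # C-Var-R
        state[rhs]['lb'].add(lhs)
        for hi in list(state[rhs]['ub']):
            errors += solve((lhs, hi), state, seen)
    else:                               # C-Sub / C-Error: with only
        errors.append(c)                # primitives, any mismatch errs
    return errors

state = defaultdict(lambda: {'lb': set(), 'ub': set()})
seen = set()
# Int <: 'x, then 'x <: Bool: solving surfaces the flow Int <: Bool.
errs = solve(('Int', "'x"), state, seen) + solve(("'x", 'Bool'), state, seen)
assert errs == [('Int', 'Bool')]
```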

Computation of Subconstraints.
The computation of subconstraints is defined in Figure 12. The function sub(c) takes a constraint c as input, and either computes a new list of constraints to be solved, or otherwise returns an error err p containing a provenance if the constraint cannot be solved. We return new subconstraints if the types in the constraint are either both function types, both product types, or both sum types. In that case, we also have to recombine the provenances of the types involved in the constraint, in order to track how a data flow passes through a constructor. This is where the additional provenances which we introduced, but did not explain, in Section 3.2 come into play. We write ⟨p⟩⊙L and ⟨p⟩⊙R, where the L and R indicate whether the provenance comes from the left- or right-hand side of a constraint on a constructor type ⊙. We use the notations ⟨p₀⟩⊙L • p₁ and p₀ • ⟨p₁⟩⊙R as shorthands for ⟨p₀ • p₁⟩⊙L and ⟨p₀ • p₁⟩⊙R, respectively. In every other case, that is, if the outermost type constructors of the two sides of a subtyping constraint are not identical, the constraint is not solvable.
When we compute the subconstraints of two function types, we use the function rev(p) on provenances, which yields a type provenance with the same contents but in reverse order. Reversal applies recursively, meaning that it also reverses the order of provenances nested inside constructors like ⟨p⟩→L. The use of this function spells out a curious phenomenon: passing through a function parameter reverses the direction of a type flow, switching from flowing into to flowing from, or vice versa. To illustrate this subtlety, consider a program annotated with the relevant locations ℓ1..8, in which a string literal at ℓ5 is passed to a function whose parameter is ultimately used as an int. The problematic flow is ℓ5 • ⟨ℓ4 • ℓ2 • ℓ3 • ℓ1 • ℓ8⟩→L • ℓ6 • ℓ7. Here is how to understand it: the string literal at ℓ5 has type string; it flows into the parameter of function g at ℓ4 (hence the ⟨…⟩→L, which denotes the left-hand side of a function type, so we reverse the flow direction); where g itself flows from the let-bound identifier g at ℓ2; from the parameter reference f at ℓ3; from the parameter f at ℓ1; from the argument function at ℓ8; and (leaving the function type and reverting back to a forward flow) into the parameter x at ℓ6; into the reference x at ℓ7; where type int is expected.
Notice how this flow starts as a normal forward flow but reverses to a backward one upon entering the left-hand side of a function type, before going back to a forward flow as it leaves the function type. Naturally, the complete flow information described above is far too verbose to report directly to users. We found that a good tradeoff (which we use in our tool) is to only report the outer flow ℓ5 • … • ℓ6 • ℓ7 and reserve the full flow for the verbose mode and (in the future) for interactive type error exploration and IDE integration.
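The reversal performed by rev, and the contravariant treatment of function parameters it supports, can be sketched as follows (an illustration with our own representation of provenances as nested tuples of locations):

```python
def rev(p):
    """Reverse a provenance; the recursion also reverses provenances
    nested inside constructors (modeled here as nested tuples)."""
    return tuple(rev(x) if isinstance(x, tuple) else x
                 for x in reversed(p))

def sub_fun(lhs, rhs):
    """Subconstraints of (s1 -> t1) <: (s2 -> t2): contravariant in
    the parameter (note the swapped order, i.e. a reversed flow),
    covariant in the result."""
    (s1, t1), (s2, t2) = lhs, rhs
    return [(s2, s1), (t1, t2)]

# rev flips the outer path and every nested one:
assert rev(("l1", ("l2", "l3"), "l4")) == ("l4", ("l3", "l2"), "l1")
# The parameter flow is reversed; the result flow is preserved:
assert sub_fun(("Str", "Bool"), ("Int", "Bool")) == [("Int", "Str"), ("Bool", "Bool")]
```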
Finally, notice that all the type constructors used in this section are variant: product and sum types are covariant in their components, and functions are contravariant in their parameters and covariant in their results. In a system with subtyping, it is always possible to separate the covariant and contravariant uses of type parameters, so this pervasive variance is generally feasible. However, in ML languages like OCaml, some type constructors are defined as invariant, such as mutable references. To handle these, we need a notion of non-directional unification, which is studied in the next section.

TYPE CONFLUENCE ERRORS
The algorithm presented in Section 3 only recognizes and reports Level-0 errors. We now extend it to also report Level-n errors for n ≥ 1, by tracking the data flows described by constraints, as explained in Section 2.1. A data flow chains constraints together through the relations <: and :>; we abstract over these relations with the symbol •. We already discussed in Section 2.1 that constraints represent data flows; we can therefore embed constraints into data flows. If a data flow has the form τ₁ • … • τₙ, unification requires its endpoint types τ₁ and τₙ to be equal.

Unification Algorithm
The unification algorithm is specified in Figure 14. We start with the function uni(Ξ), which takes a type inference state Ξ and recurses over its bounds through the unification function uni(d), parameterized by Ξ and the current set of hypotheses H, where d is a data flow. This function equates the first and last types of a data flow and terminates with an error if they are not equal. A piece of global state could be threaded through the inference rules describing this function to collect all incorrect data flows; however, we omit this to keep the algorithm's specification concise. It is enough to see that, given a derivation of this function, we can gather all unification errors by collecting all uses of the U-Error rule.
We say that a type inference state S is saturated when for all τ and τ′ we have τ ∈ lb(S, τ′) ⟺ τ′ ∈ ub(S, τ).
Helper function 'saturate' saturates its input state in the obvious way.
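As an illustrative sketch (not the paper's implementation), saturation can be computed by mirroring every recorded bound until a fixpoint is reached, assuming a state represented as two maps from types to sets of lower and upper bounds:

```python
from collections import defaultdict

def saturate(lb, ub):
    """Return saturated copies of the lower/upper bound maps, so that
    t in lb[v] iff v in ub[t]. Names and representation are illustrative."""
    lb2 = defaultdict(set, {v: set(bs) for v, bs in lb.items()})
    ub2 = defaultdict(set, {v: set(bs) for v, bs in ub.items()})
    changed = True
    while changed:
        changed = False
        for v, bounds in list(lb2.items()):
            for t in list(bounds):
                if v not in ub2[t]:  # mirror t <: v as an upper bound on t
                    ub2[t].add(v)
                    changed = True
        for v, bounds in list(ub2.items()):
            for t in list(bounds):
                if v not in lb2[t]:  # mirror v <: t as a lower bound on t
                    lb2[t].add(v)
                    changed = True
    return lb2, ub2
```

The fixpoint loop terminates because the sets only grow within the finite universe of types mentioned in the input state.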
U-State This rule is the entry point for unification. We first saturate the type inference state. Then, for each type variable in the state, we unify it with its upper and lower bound types.
U-Cache A unification is solved trivially if it is already cached. We use 'reset' to erase the provenance of the types before looking up the cache, where reset(τ, τ′) = (reset(τ), reset(τ′)). This cache is necessary for the same reason as the cache in Section 3.3.2, i.e., to prevent divergence in the case of cyclic bounds. However, since unifying two types is a symmetric operation, we now look up both subtyping directions in the cache (i.e., both reset(τ, τ′) and reset(τ′, τ)).
U-Refl A unification between two types that are equal modulo their provenance is solved immediately. We use reset to erase the provenance information before comparing the types.
U-Error If U-Sub returns an error for an incorrect data flow, this rule terminates the algorithm. The actual algorithm can be implemented by threading through a state and collecting all such errors before reporting them.
Notice that because we look up both pairs (τ, τ′) and (τ′, τ) in the cache, we only ever traverse a given data flow between two variables in a single direction, instead of potentially traversing the data flow once in each direction, which could otherwise happen in the presence of type variable cycles such as α <: β, β <: α. Long cyclic chains could lead to a potentially exponential number of erroneous data flows, which we avoid this way. Moreover, our concrete implementation performs a breadth-first search while following the algorithm of Figure 14 in order to find the shortest erroneous paths, and then stops the search without traversing further constraints. From our own observations, this seems to make inference fast in practice.⁶ We only report one erroneous path to the programmer to explain a given error, even when many possible paths (including several shortest ones) are available. This choice can be considered arbitrary, as there is no reason to believe that one particular path explains the error better than the others. In the future, different approaches to reporting such equivalent paths could be studied: we could report all paths, explain cyclic data flows separately to programmers, or use heuristics to determine which path will be more understandable.
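The breadth-first strategy can be illustrated with a toy model that ignores the direction and cache machinery of Figure 14: constraints form an undirected graph over types, and an error is reported as the shortest chain of constraints connecting two distinct concrete (non-variable) types. All names here are hypothetical.

```python
from collections import deque

def shortest_error_path(constraints, is_var):
    """Toy BFS over a constraint graph: return the shortest path between
    two distinct non-variable types, or None if there is none."""
    graph = {}
    for a, b in constraints:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    best = None
    for start in (t for t in graph if not is_var(t)):
        parent = {start: None}
        queue = deque([start])
        while queue:
            t = queue.popleft()
            if not is_var(t) and t != start:
                path = []
                while t is not None:  # rebuild the path back to `start`
                    path.append(t)
                    t = parent[t]
                if best is None or len(path) < len(best):
                    best = path[::-1]
                break  # stop at the first (closest) conflicting type
            for n in graph[t]:
                if n not in parent:
                    parent[n] = t
                    queue.append(n)
    return best
```

Because BFS visits constraints in order of distance, the first concrete type reached from each starting point closes off a shortest erroneous chain, mirroring the "stop without traversing further constraints" behavior described above.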

Detailed Error Reports
We create error reports from the data flows of failed unifications. Each data flow is transformed into a sequence of source locations separated by alternating data flow directions, encoding the back-and-forth nature of higher-level errors, and possibly containing nested flows when the problematic unification is indirect and goes through type constructor arguments. Additional type information in the data flow is used to add helpful details to the report. The data flow information can also be used for interactive error reporting, interactive code exploration, and code hints in IDEs, to name a few applications.
Our current implementation only reports errors; we share examples in Appendix B.
Example 4.1 (Level-0 Data Flow). We can visualize a data flow as a type flowing through a sequence of program locations. A constraint is the most basic data flow: we can translate a subtyping constraint τ^ℓ0 <: α^ℓ1 into a data flow in which the data represented by the type passes through the locations ℓ0 and ℓ1, since after unification τ and α must be the same type. This can be visualized as a simple two-node diagram.

Example 4.2 (Level-1 Data Flow). We annotate the second program from Figure 4b and generate its constraints, which yield a unification error for Int <: Str. Notice that invoking symmetry for the Str relation creates a linear data flow.

Example 4.3 (Level-2 Data Flow). Similarly, we annotate the locations of the program in Figure 4c and generate its constraints, which yield a unification error between the type of f and the type of the if-then-else expression.⁷ Among the constraints are Int^ℓ5 <: … and …^ℓ1 <: Bool^ℓ2; we visualize the resulting unification error Int <: Bool as a back-and-forth flow.

Example 4.4 (Constructor Data Flow). We introduce a small conditional program producing pairs of strings to demonstrate a nested data flow. Unification gives an error for a Bool flowing into a constructor argument; we elide the right type argument and irrelevant provenances for clarity, and visualize the error as a nested diagram. Notice that in this section, we now unify constructor arguments using the non-directional symbol ∼ instead of the directional <:. This is appropriate because some type constructors can be invariant in OCaml, and even variant constructors use unification semantics during type inference anyway. But since product, sum, function, and other types can still be considered variant even in ML (because their type parameters are used only at one polarity in their definitions), this can still be used to construct properly directional flows in type error explanations. For instance, in the diagram above, we use directional arrows on all the edges of the graph, because ⊗ is covariant in its first argument. We would use a non-directional edge if the type constructor were invariant, for example, if it had been a mutable reference. We can reconstruct the directionality of these variant flows by inspecting the nature of the type constructor in the nested ∼ unification forms for ⊗, ⊕, and →.
We also include a one-line flow summary (or outline) in the survey error messages. It was omitted from the introductory examples as it is not essential. At a glance, it shows the user a high-level overview of the erroneous flow by stripping away the location information and using ASCII symbols to show the flow direction. Figure 15 shows flow summaries for the examples discussed above.
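To illustrate the idea (HMℓ's concrete notation may differ), a flow summary can be produced by dropping the location information and joining the types with ASCII arrows for the flow direction; the step encoding below is an assumption of this sketch:

```python
# Each step of the flow is either a type string or a direction tag:
# 'fwd' (forward), 'bwd' (backward), or 'nondir' for invariant constructors.
ARROWS = {'fwd': ' --> ', 'bwd': ' <-- ', 'nondir': ' ~~~ '}

def flow_summary(steps):
    """Render a one-line summary, e.g. ['int','fwd','?a','bwd','bool']."""
    return ''.join(ARROWS.get(s, s) for s in steps)
```

For example, `flow_summary(['int', 'fwd', '?a', 'bwd', 'bool'])` renders the kind of `int --> ?a <-- bool` outline that participants in the user study saw.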
⁷ The second type variable is the type of the if-then-else expression.

LET POLYMORPHISM
In this section, we introduce the notion of polymorphism level to support let-polymorphism in HMℓ, following Rémy [1992], who created this technique for the Caml/OCaml compilers [Kiselyov 2013].⁸ We extend the syntax in Figure 16 to include let bindings and substitutions, which map type variables to type variables.
Polymorphism levels are natural numbers, and the levels field in the typing state maps each type variable to a level. We extend the typing context to store a type τ at polymorphism level n as ∀n. τ. We also introduce a few helper functions on the typing state, namely:
• new-lvl(α ↦ n, S) = S′: the new typing state S′ has α ↦ n added to it, or updated if S already associated a level with α.
• ty-subst(σ, S) = S′: the new state S′ has all substitutions of σ applied to the old state S, as described in Figure 17. For each substitution α ↦ α′, all the constraints on α are duplicated with α substituted by α′ while preserving provenance information. A level mapping α′ ↦ n is also added to the new state.
• lvl(τ, S) = n: the polymorphism level of the type τ is n in the given state S. The semantics is described in Figure 18.
A primitive type has level 0, and a constructor type's level is the maximum level of its constituent types. Type variables are associated with their own polymorphism levels. We use record-dot style syntax like S.bounds to access the bounds associated with the state.
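As a sketch of this semantics (Figure 18 gives the authoritative rules), level computation recurses over the type structure. The tuple encoding of types used here is an assumption of the sketch, not the paper's representation:

```python
def lvl(ty, levels):
    """Compute the polymorphism level of a type.
    Types are encoded as ('prim', name), ('var', name),
    or ('ctor', name, [args]); `levels` maps variable names to levels."""
    kind = ty[0]
    if kind == 'prim':   # primitive types have level 0
        return 0
    if kind == 'var':    # variables carry their own level in the state
        return levels[ty[1]]
    if kind == 'ctor':   # a constructor takes the max of its arguments
        return max((lvl(a, levels) for a in ty[2]), default=0)
    raise ValueError(f'unknown type form: {kind}')
```

For instance, a function type containing a level-2 variable and a primitive is itself at level 2.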
We extend the typing rules with the notion of polymorphism level. All top-level let bindings and terms start with polymorphism level 0. Figure 19 shows only the relevant rules.
T-PolyType This rule looks up a polymorphic type from the typing context and freshens it. freshen collects a substitution σ which is applied to the typing state. In practice, this operation only affects types whose polymorphism level is less than or equal to the current level.
T-Let This is a new rule that types let expressions. It increments the polymorphism level before typing the let-bound expression.
T-Let-Rec This rule types let-rec expressions and is similar to the T-Let rule. The key difference is that it creates a fresh type variable for the binding.
Fig. 19. Extended type inference rules for let-polymorphism.
The semantics of freshen is described in Figure 20. Freshening a type means traversing it and mapping all the type variables of a less-than-or-equal polymorphism level to fresh type variables. This creates a substitution σ which, when used as ty-subst(σ, S), specializes the type up to polymorphism level n. Most constraining rules are unaffected by the polymorphism level. However, we add a refinement in Figure 21.
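Freshening can be sketched as follows, assuming a tuple encoding of types (('var', name), ('prim', name), ('ctor', name, args)) and following the condition stated above that variables at a level less than or equal to the given one are replaced; Figure 20 remains the authoritative definition:

```python
import itertools

_fresh = itertools.count()

def freshen(ty, levels, cur_lvl, subst=None):
    """Return (freshened type, substitution). Variables whose level is
    <= cur_lvl are replaced by fresh variables; the collected substitution
    can then be applied to the state (cf. ty-subst). Illustrative only."""
    if subst is None:
        subst = {}
    kind = ty[0]
    if kind == 'var' and levels[ty[1]] <= cur_lvl:
        if ty[1] not in subst:
            fresh = f'?f{next(_fresh)}'
            levels[fresh] = levels[ty[1]]  # the fresh variable keeps the level
            subst[ty[1]] = fresh
        return ('var', subst[ty[1]]), subst
    if kind == 'ctor':
        args = [freshen(a, levels, cur_lvl, subst)[0] for a in ty[2]]
        return (ty[0], ty[1], args), subst
    return ty, subst  # primitives and out-of-range variables are unchanged
```

Sharing one substitution across the traversal ensures that repeated occurrences of the same variable map to the same fresh variable.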

C-R-Extr
The type variable is constrained with a type of a higher polymorphism level as its upper bound. The type is extruded to the level of the type variable before proceeding with the constraint.

C-L-Extr
This rule is similar to C-R-Extr but for type variable lower bounds instead of upper bounds.
A type behaves polymorphically only when its polymorphism level is higher than that of the typing context. In such cases, freshen clones the type up to the level of the typing context. However, we do not want it to behave polymorphically below its level. To ensure this, we use extrusion, described in Figure 22. It reduces the polymorphism of a type variable down to a given level, and does the same for all its upper and lower bound types, including those reachable transitively. The extrude function also maintains a cache of visited type variables to break cycles.
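Extrusion can be sketched as a traversal that lowers a variable's level and recursively extrudes the variables reachable through its bounds, with a visited set breaking cycles. This is a toy under assumed data structures, not HMℓ's implementation (Figure 22):

```python
def extrude(var, level, levels, lb, ub, seen=None):
    """Lower `var`, and every variable reachable through its bounds,
    to at most `level`. `levels` maps variables to levels; `lb`/`ub`
    map variables to sets of bounding variables."""
    if seen is None:
        seen = set()
    if var in seen:          # the visited cache breaks bound cycles
        return
    seen.add(var)
    if levels.get(var, 0) > level:
        levels[var] = level  # reduce polymorphism down to `level`
    for neighbor in lb.get(var, set()) | ub.get(var, set()):
        extrude(neighbor, level, levels, lb, ub, seen)
```

Without the `seen` set, cyclic bounds such as α <: β, β <: α would make the traversal diverge.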
The function g initially appears polymorphic, and the program appears valid. However, in the body, the parameter g is applied to an argument introduced at the inner polymorphism level. When typing that application, we encounter a constraint like cons(α_g <: (α → α′)), where α_g is at level 0, α is at level 1, α′ is some fresh type variable, and all other symbols have their usual meanings. This constraining rule extrudes α_g to polymorphism level 0 with the C-R-Extr rule. Now α_g is no longer polymorphic at level 1, and the program is no longer valid. In Figure 24, we see the HMℓ error message for the program.

EXPERIMENTAL EVALUATION
In this paper, we hypothesize that flow-based type error messages are more effective than location-based type error messages in helping programmers understand errors. Traditional error messages only provide one of the possible locations related to each error, which can be useful to understand where a program goes wrong but is often insufficient to understand the full context of the error. In contrast, our proposed system presents the flow of types, which is intended to help understand how a program goes wrong. Traditional systems thus focus on locality, while we attempt to facilitate causal reasoning, which we believe is essential to effectively understand and repair errors.
Fig. 24. Error message for incorrect usage of the function g (for the program let k = g 1 in g "hi").
We conduct a randomized quasi-experimental study to validate our hypothesis.The experiment compares the understanding of programmers using flow-based type error messages with those using traditional location-based error messages.
In the user study, we ask participants to understand and describe program errors, measuring a) the perceived satisfaction of the participants with the provided error messages and b) whether the participant sufficiently understood the error.

Experiment Design
The experiment was conducted in the form of an online survey (using lab.js [Henninger et al. 2021] and hosted on Open Lab [Shevchenko 2022]). After a demographic questionnaire and a short introduction to OCaml, participants were presented with a series of erroneous OCaml programs. Provided with the program and an error message (side by side), participants were invited to understand the error using the provided error message, followed by a series of questions:
Q1 "In your own words: what is the problem in the program above?"
Q2 "How much did the error message help you to locate the problem?"
Q3 "How much did the error message help you to understand the problem?"
The first question (Q1) was asked as free-form text, asking participants to keep the answer short. The remaining two questions (Q2 and Q3) were answered on a five-point Likert scale ranging from "Not helpful" to "Very helpful". The survey ended with a single optional free-form text field asking users "Is there anything you want to tell us?".
Conditions. When starting the survey, each participant was randomly assigned (drawn without replacement) to one of three conditions:
(A) HMℓ: our implementation of HMℓ, extending SystemSub with flow-based type errors.
(B) OCaml: as a first control group, we compare against the standard OCaml compiler.
(C) Helium: as a second control group, we compare against Helium [Heeren et al. 2003].
Helium is a compiler for Haskell, not OCaml, but the subset of programs we consider can easily be translated from OCaml to Haskell. We therefore translated the OCaml examples to Haskell by hand, used Helium to generate an error message, and translated the types contained in the resulting error message back to OCaml syntax.
Selection of programs. We prepared ten different ill-typed example programs that we manually ranked as "easy" (3), "medium" (4), or "hard" (3); see Appendix B for full details of the example programs. Programs labeled "easy" were constructed specifically for the study, while examples labeled "medium" or "hard" were selected from the datasets shared by Seidel et al. [2017]. From each of the three categories, each participant was presented with two randomly sampled example programs, that is, six programs in total. While the order of examples within each category was random, the categories themselves were always presented in ascending complexity.

Participants
We shared the online survey with professionals and researchers by posting it on relevant online platforms (such as Reddit and Twitter). Participants were asked to self-estimate their experience in programming in general, functional programming, statically typed programming, and programming in OCaml on a 5-point Likert scale [Feigenspan et al. 2012]. From a total of 455 participants, 318 started and 119 concluded the survey (40 HMℓ, 39 OCaml, 40 Helium). We manually excluded two participants (both in the OCaml condition) since they visibly did not invest enough effort to answer the questions. Of the 117 remaining participants, 70 assigned themselves an expertise ≥ 4 for "functional programming" or "OCaml". Only 14 participants assigned themselves an expertise ≤ 3 for all categories.

Evaluation
The study sets out to analyse whether the error-message condition (HM ℓ , OCaml, or Helium) significantly influences a) the perceived usefulness of the error messages, as well as b) the understanding of the programming error.We first discuss the evaluation of perceived usefulness (Q2 and Q3) before discussing the evaluation of the open question (Q1).
After data collection, we realized that we had made a mistake while hand-translating the Helium error message for hard2 back to OCaml types: it left a Haskell-formatted list type, which might have confused participants and led to poor responses. We therefore exclude example hard2 from our evaluation.
6.3.1 Perceived Usefulness (Q2 and Q3). For each of the ten individual example programs and each of the three conditions (A), (B), and (C), Figure 25 presents the results for the perceived usefulness reported by participants. The five-point Likert-scaled data is visualized as stacked bar charts, where the lowest (dark-blue) component corresponds to the answer "Very helpful" and the highest (dark-red) component corresponds to the answer "Not helpful".
Q2: "How much did the error message help you to locate the problem?" Q3: "How much did the error message help you to understand the problem?" Fig. 25. Participants answering the respective question on a five-point Likert scale from "Not helpful" (top, red) to "Very helpful" (bottom, blue). We compare conditions HMℓ (A), OCaml (B), and Helium (C).
Locating the problem (Q2). A Kruskal-Wallis test indicates significant differences for some of the example programs. To determine which groups are different, we perform a post-hoc Dunn test [Dunn 1964] for each of the significant example programs. A Bonferroni adjustment of the p-value is used to account for the error rate introduced by the pairwise comparisons. For easy2, we have a significant difference for A−B with −2.934363 (p = 0.0050); that is, participants reported messages of HMℓ to help less with locating than those of OCaml. For easy3, we have significant differences for A−B with −3.2893 (p = 0.0015) and A−C with −2.7449 (p = 0.0091); that is, participants reported messages of HMℓ to help less with locating than those of OCaml or Helium. For medium2, we have a significant difference for A−C with −3.018982 (p = 0.0038); that is, participants reported messages of HMℓ to help less with locating than those of Helium.
Understanding the problem (Q3). Similar to Q2, we perform a Kruskal-Wallis test; however, there is no significant difference between the results of the three systems.

Understanding Errors (Q1).
To evaluate the open question Q1, we manually assigned a binary grade to the provided textual answers, judging whether the participant understood the underlying problem or not. Emphasis was given to participants' understanding of the erroneous program expressions and the fixes they suggested. To avoid bias, the manual coding was performed blind; that is, the condition of a participant was hidden from the grader during evaluation. The same grader graded the responses from all participants.
Q1: "In your own words: what is the problem in the program above?" Fig. 26. Percentage of participants that correctly described the problem. We compare conditions HMℓ (A), OCaml (B), and Helium (C).
Figure 26 reports the results for the three conditions, again grouped per example program.

Interpretation
Our experiment yielded little statistically significant difference between respondents using the different tools to locate and understand errors. In general, the study cannot show that HMℓ improves the understanding or localization of problems.
In a few cases, other systems even performed significantly better than HMℓ. For Q2, which measured the perceived value of the error message in locating the error, easy2, easy3, and medium2 showed significant results. In all three cases, participants reported that HMℓ error messages helped less in locating the problem. All three programs (especially easy2 and easy3) are comparatively small; constructing (and consuming) a detailed data flow explanation that is longer than the program itself might not pay off. For Q3, which measured the perceived value of the error message in understanding the error, no results were significant.
A qualitative analysis of the freeform feedback provided by participants reveals interesting insight into why HM ℓ might have performed worse than OCaml and Helium in locating errors.
Errors are too verbose or unnecessary for small programs.Many participants reported the HM ℓ errors to be excessively verbose, which was detrimental to understanding errors for small programs.
The earliest examples were so trivial I could've figured them out without error messages.
However, some participants also reported that they were indeed useful for longer problems.
I felt that the error messages were more helpful on the longer problems, only because I didn't need to look at them on the shorter problems.
The problem of overly verbose errors could be remedied by designing better layouts or using heuristics to hide some of the extra locations for simple programs. Furthering the previous point, in many of the simple cases (in easy and medium programs), respondents were frustrated by the comparatively large number of locations reported by HMℓ and instead just wanted to see "the" location where the error happened. We conjecture that it would be helpful to combine our work with previous research on identifying single error locations and provide an "incipit" of sorts before the precise data flow, reporting a single location, which is often enough to understand simple errors. This way, programmers would only look at the precise data flow when they need more context and understanding of the root causes of the error. In an interactive setting, the additional information provided by flow errors could be presented at the user's request. While the participants' concerns are important and it would be interesting to investigate more compact representations, real-world programs are almost always larger than the programs surveyed in our study.
Error message layout and notation. Respondents also complained about the error message layout as well as other aspects of the notation used, including the data-flow arrows, line number formatting, ellipses for large code, and so on.
Many of the error messages I was shown seemed upside-down, in that they showed what I believe to be the source of the error at the bottom, with a stack of more-removed sources of incompatible constraints extending upwards.This might look better coming from a CLI, but in this format it was weird and unhelpful.
However, other participants appreciated the structure of HMℓ's error messages: In general, I really like the detail and consistency in the error messages.This really helps with solving more subtle errors, but it also adds a lot of noise for more simple issues.I feel like this isn't a real problem since even a little experience will allow you to immediately identify the issues at a glance of the error message by looking at the right things.One possible improvement could be to mention (and preview) the offending statement with the relevant parts marked before going in depth on the breakdown of the type interpretation.

As it is now, one part of the breakdown hides the other conflicting part of the statement
There is ample opportunity to improve all aspects of the presentation of error messages, including the data-flow summaries, the textual explanation, and the graphical layout. For each of the components, the usefulness is still unclear, the concrete design can be improved, and the interplay between the different aspects needs to be studied. More research and testing is needed to develop effective error message layouts for data-flow style messages. One factor that might have contributed to the confusion of participants is that many of our respondents were experienced practitioners who were accustomed to the message layout of standard tools. We also conjecture that experienced programmers have already developed a deep understanding of how type-checking in OCaml proceeds and have thus learnt to infer information from existing error messages. It naturally requires some time to adapt to the new format of error messages, as one participant described: The error message has all the information, it takes some time to get used to it though.
Similarly, many respondents found the unification variables shown in HMℓ messages unhelpful. The unexplained "?a" notation used for unification variables, combined with the new concept of data flows, was confusing to these participants.
We conjecture that a gentle introduction to HM ℓ 's notation, as well as familiarity built over time, could potentially remedy these issues.
Flow-based reasoning.The work presented in this paper builds upon the assumption that the underlying mental model (also called "notional machine" by du Boulay et al. [1981]) of flow-based reasoning is natural and can help programmers to understand and locate error messages.A possible interpretation of the below user feedback is that this assumption is false.
I honestly find the 'int --> ?a <--bool' notation quite confusing.It is useful in some cases where there is no obvious expected or actual type, but in cases where the unification variable is unnecessary, it adds quite a bit of unnecessary mental overhead.
Maybe understanding how values (and types) flow through a program does not contribute to the understanding of type errors. Maybe guiding users along the data flow is not helpful after all, since they could also follow the data flow themselves without the overhead of processing verbose error messages with positions marked by unification variables that are not part of the original program text. Our work aims to support users in this process, which only makes sense if the process itself is practically useful.
No specialized messages. OCaml and Helium had specialized messages for certain errors. Messages like "function applied to too many arguments" (hard1) and "expected type . . . " seem more helpful than raw data flow information. Of course, HMℓ could be specialized to also add such helpful text to its error messages; this is orthogonal to the idea of presenting data flows.
6.4.1 The Need for Detailed Explanations.While some participants remarked that the messages of HM ℓ are too verbose, those in the control groups often remarked the opposite about messages generated by OCaml and Helium.
I think in many of the examples it would be helpful for the errors to explain where the constraint was introduced that we are hitting [...].

Another participant recognized the difficulty of extensive inference:
Presumably some of the harder ones were caused by extensive inference throughout the code.It might help to show and look at multiple errors in such cases.Perhaps there's a way to group them, but this is hypothetical.
Yet another participant would like to see more type information of the different components that constitute a problematic call.
The error messages are not at all clear about what the expected and what the found type is.Also, it is not clear why it believes the found type is the found type.Maybe it could show the different elements that are applied to the function separately as well?
Participants were also aware of the difficulty of trading off concise error messages against sufficiently detailed information.
Almost every type error in a program is an accumulation of multiple (smaller) erroneous parts.Any such error that states "something happened exactly here and here" is incomplete, because there's far more context that should be included to fully understand the error.Some languages show the entire chain of type unification that led to the error, but that's rather verbose.I hope one can find a solution that shows just enough context to be perfectly helpful.
The above quotes are only a few examples of the feedback that we received from the control groups. In many cases, respondents want to know the source expressions for the two conflicting types, instead of the singular location where the error was triggered. They also want to see surrounding source code in the error message to gain context. HMℓ's error messages address exactly these two concerns. We interpret this as support for our hypothesis that data-flow-style error messages could be useful for programmers.
Overall, the quantitative analysis cannot show that HMℓ improves over ocamlc or Helium. The qualitative analysis suggests that this might be due to the error message layout and notation, or the verbosity. However, some respondents found the verbosity and context helpful as well, particularly for large programs and subtle errors. HMℓ's contribution is the detailed data flow information, not the specific error message layout. HMℓ in its current form is certainly not perfect, and respondents point out a few shortcomings, such as the exact textual representation of errors. We believe future work can utilize the data flow information to design better error message layouts, code exploration tools, and IDE intellisense features.

Threats to Validity
Internal validity. For most of the tasks, the results were insignificant. Potentially, the programs in the used corpus were not complex enough to measure differences between the systems. The study was conducted remotely with no control over the context: participants might have been distracted while answering, spent different amounts of time (mean = 32 min, sd = 22 min), and used different devices. We used the browser's user agent to identify mobile devices. We were initially concerned that mobile users might have a harder time taking the survey and might subsequently provide responses of worse quality, but all open-text answers provided by the 14 participants on mobile devices were of high quality, so we decided to keep them. Participants might not know enough OCaml to understand the errors, which we tried to address by presenting each participant with a one-page introduction to the relevant OCaml concepts. Participants might recognize the different styles of error messages from different systems. We tried to limit the influence of this bias by fixing each participant to one specific condition. However, experienced OCaml programmers reportedly recognized the errors generated by Helium and our system to be non-standard. Participants might be biased in favour of our system, since it is socially desirable: firstly, it is always interesting to see a new tool that could turn out to be an improvement, and secondly, participants might be fellow researchers and practitioners who want to support research in the field. Participants were not trained on how to read HMℓ error messages; the unfamiliar notation and unconventional layout of error messages might have confused participants.
External validity. Our example programs were sourced from university students solving programming assignments [Seidel et al. 2017]. These may not be indicative of the broader range of programming practices prevalent at large. The results of the study might not carry over to languages other than OCaml. Our approach can be used with all languages that implement Hindley-Milner-style type inference. However, we chose OCaml because of the extensive empirical analysis already done on OCaml error localization by other researchers [Geng et al. 2022; Zhang and Myers 2014] and because of the relative simplicity of the language's base constructs. For instance, compared to Haskell, the lack of type classes in OCaml makes it easier to quickly introduce the language to beginner and novice programmers before starting the survey. Conveniently, previous research made available large datasets of ill-typed OCaml programs ranging from low to high complexity.

RELATED WORK
Algebraic subtyping.Type inference for systems with subtyping and parametric polymorphism is a known hard problem.We build upon the algebraic subtyping approach developed in [Dolan 2017] and [Dolan and Mycroft 2017].
More precisely, we build upon the more recent publications [Parreaux 2020a] and [Parreaux and Chau 2022].
Explaining type errors with data flow. Our approach to explaining type errors using the data flow of the program is reminiscent of the approach of Gast [2005], which describes an algebra based on subtyping constraints and defines a "consistency" relation between types. This is similar to how we unify the bounds of type variables to find flow errors where incompatible types flow into or out of the same type variable. Neubauer and Thiemann [2003] use sum types to encode all the types an expression can have, and flow sets to track the locations each type can flow through.
Our work builds on this by formally categorizing different kinds of data flows and describing a systematic approach to display error reports for them.We also provide an implementation of our algorithm that integrates with existing type systems and supports let polymorphism.
Algorithmic error localization. Previous work on type errors focuses on finding the program expression most likely causing the error. Zhang and Myers [2014] demonstrate a constraint-based system to identify expressions that create unsatisfiable constraints; using heuristics, they pick the simplest explanation for the error. Loncaric et al. [2016] also demonstrate a constraint-based system that can integrate with existing type systems to produce error reports efficiently.
Our system directly integrates provenance tracking with constraint solving, allowing us to track detailed information. Heuristics-based error localization can complement detailed data flow errors.
Data-driven error localization. More recent work by Geng et al. [2022] and Seidel et al. [2017] leverages language models and supervised learning techniques to localize errors. They train models on large datasets of pairs of ill-typed and fixed programs, which can then predict the likely location of the fix with high accuracy. However, these techniques are limited to identifying a program expression and cannot create error messages that explain the flow causing the error.
Improving compiler error messages. There has been considerable research on type error messages [Heeren 2005] and their role in programmer experience. Becker et al. [2019] mention that type error messages play an important role in helping the programmer fix the error. Marceau et al. [2011a] argue that reporting all type errors and mapping error messages back to source code are crucial for effective error messages. Furthermore, Marceau et al. [2011b] recommend not highlighting specific fixes, as they may be incorrect. Techniques from Wrenn and Krishnamurthi [2017] can be used to evaluate and improve data-flow-style error messages. Finally, Kochhar et al. [2016] surveyed software engineering practitioners and found that respondents prefer general solutions that integrate with existing tooling and IDEs and that scale to large codebases. Our system addresses the key challenge of mapping an error back to source code locations, and existing tools and IDEs can be instrumented with the detailed data provenance information for interactive debugging.

CONCLUSION AND FUTURE WORK
We now conclude and suggest directions for future work.

Conclusion
If we want powerful type inference techniques to become broadly accepted in mainstream programming languages, we have to generate excellent error messages when type inference goes wrong. In this article, we laid some foundations for improving one important class of error messages: type error messages arising from constraint solving, for both subtyping and equality constraints. Our main insight is that these constraints contain information about the data flow that led to the error, and that we can use this information to generate more informative error messages. We carried out a user study comparing our error messages to those of ocamlc and Helium. The study suggests that the additional information can potentially be overwhelming, so we have to carefully consider under what circumstances we use it and how much of it we present to the user. While the empirical part of the study could not quantitatively show that HM ℓ improved over the state of the art, we received encouraging feedback from participants suggesting that the general approach of flow-based error messages is worthwhile.

Future Work
We see two important directions for future work, both related to extending our method to model additional features of common type systems.
Occurs check. We have not yet implemented the occurs check in our prototype. The occurs check is a standard feature of Hindley-Milner type inference that catches cycles in constraint graphs. One of the most significant results of our study is that occurs-check failures are much harder for users to understand than unification failures, so there is much room for improvement. We hope to significantly improve these error messages using our approach based on data flows. Running the occurs check separately also has algorithmic complexity advantages [Rémy 1992].
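For reference, the following is a minimal sketch of what an occurs check does in a unification-based setting. This is our own illustration over a toy type representation, not the prototype discussed above: before binding a variable to a type, we check that the variable does not occur inside that type, which would create a cyclic constraint.

```python
# A minimal occurs check over a toy type representation (a sketch, ours).
# Types are a base type name ("int"), a type variable, or ("->", arg, res).

class TVar:
    def __init__(self, name):
        self.name = name
        self.bound = None  # what the variable has been unified with, if any

def resolve(t):
    """Follow variable bindings to the representative type."""
    while isinstance(t, TVar) and t.bound is not None:
        t = t.bound
    return t

def occurs(v, t):
    """Does variable v occur inside type t? Detects cyclic constraints."""
    t = resolve(t)
    if t is v:
        return True
    if isinstance(t, tuple) and t[0] == "->":
        return occurs(v, t[1]) or occurs(v, t[2])
    return False

def bind(v, t):
    if occurs(v, t):
        # e.g. 'a ~ 'a -> 'b, the classic family of cyclic-type errors
        raise TypeError(f"occurs-check failed: '{v.name} appears in its own type")
    v.bound = t

a, b = TVar("a"), TVar("b")
bind(b, "int")                 # fine: 'b := int
try:
    bind(a, ("->", a, b))      # cyclic: 'a := 'a -> 'b
except TypeError as e:
    print(e)                   # occurs-check failed: 'a appears in its own type
```

A data-flow-based report for such a failure would additionally show the locations at which 'a flowed into its own bound, rather than only stating that a cycle exists.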
More advanced type system features. We would like to investigate how flow-based reasoning scales to more advanced type system features, where type error messages can often become even more confusing than traditional unification errors. For example, we are particularly interested in studying support for higher-rank and first-class polymorphism [Le Botlan and Rémy 2014].
Fig. 4. Examples of faulty programs and their corresponding constraint graphs (e.g. let x = 2; let y = if x then true else false, a program with a Level-2 error).
Fig. 6. Level-1 "confluence" error with convergent flows.
Fig. 7. Level-1 error with divergent flows.
Rules. The type inference judgement Σ; Γ ⊢ t : τ ⊣ Σ′ is specified in Figure 10. It takes a type inference state Σ, a typing context Γ, and a term t, and returns a type τ along with a new type inference state Σ′.

Fig. 10. Algorithmic type inference rules.

3.3.2 Constraint Solving Algorithm. The constraint solving algorithm is specified in Figure 11. The type constraining function Σ; H ⊢ cons(C) ⊣ Σ′ takes a type inference state Σ, a constraint C, and a set of current hypotheses H, and returns a new state Σ′.
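To illustrate the role of the hypothesis set, here is a minimal sketch of ours (not the paper's algorithm, and with a toy subtyping relation of our own invention): each constraint is added to the current hypotheses before being decomposed, so that a constraint that comes up again is assumed to hold instead of being re-processed forever.

```python
# Sketch (ours) of hypothesis-guarded constraint solving.
# Toy type grammar: a base type name, or ("->", arg, res).

PRIM = {("int", "num")}  # toy primitive subtyping: int <: num

def solve(lhs, rhs, hyps=None):
    """Check lhs <: rhs, assuming already-seen constraints hold."""
    hyps = set() if hyps is None else hyps
    goal = (lhs, rhs)
    if goal in hyps:                 # already assumed: succeed immediately
        return True
    hyps = hyps | {goal}             # add the goal to the hypotheses
    if lhs == rhs or goal in PRIM:
        return True
    if isinstance(lhs, tuple) and isinstance(rhs, tuple):
        (_, a1, r1), (_, a2, r2) = lhs, rhs
        # function subtyping: contravariant argument, covariant result
        return solve(a2, a1, hyps) and solve(r1, r2, hyps)
    return False

print(solve(("->", "num", "int"), ("->", "int", "num")))  # True
print(solve("num", "int"))                                # False
```

The hypothesis set is what makes such solvers terminate in the presence of recursive bounds; in this toy version it merely avoids redundant work.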

Fig. 13. Extended syntax with unification.

Data flows. We extend the syntax from Figure 9 with the notion of data flows. A data flow Z is a sequence of types related by the subtyping relations <:ℓ and :>ℓ, or by a constructor relation ∼ for the type constructors ⊗, ⊕, and →.

U-Var-L. This rule unifies a type variable α with a type τ. It produces a set of data flows from the upper and lower bounds of α to τ. Since α is on the left of the relation, the new relations are concatenated to the left of the existing data flow Z, which preserves the left-to-right continuity of the data flow.
U-Var-R. This is similar to U-Var-L, except that the type variable α is on the right of the relation, so the new relation is concatenated on the right.
U-Sub. When the unification involves constructed types (product, sum, and function types), we invoke ctor-uni to compute sub-unifications for their type arguments. If the function cannot equate the constructor types, it returns an error.
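The flow bookkeeping in U-Var-L and U-Var-R can be sketched as list concatenation (our own simplification, with flows encoded as flat lists of types and locations):

```python
# Sketch (ours): when a type variable is unified away, the data flow from
# its bound is concatenated onto the existing flow Z on the side that
# keeps the flow reading left to right.

def u_var_left(bound_flow, z):
    """Variable on the LEFT of the relation: its bound's flow is prepended."""
    return bound_flow + z

def u_var_right(z, bound_flow):
    """Variable on the RIGHT of the relation: its bound's flow is appended."""
    return z + bound_flow

# 'a has lower bound string (recorded at loc2) and sits at the left end
# of the existing flow 'a -> loc1 -> int:
flow = u_var_left(["string", "loc2"], ["'a", "loc1", "int"])
print(" -> ".join(flow))   # string -> loc2 -> 'a -> loc1 -> int
```

Concatenating on the correct side is what lets the final report read as one continuous path from the source of the data to its eventual use.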
Converting a flow summary into a textual error message is straightforward. The sequence of program locations is shown vertically between lines describing the intermediate types. The constructor data flow is shown with a horizontal offset corresponding to the height of the nested data flow in the diagram. Our layout is not prescriptive, and future implementations can experiment with other layouts.
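As a concrete illustration of this vertical layout, here is a small sketch of ours (the function name and flow encoding are hypothetical): a flow alternating between types and the locations it passes through is rendered with the locations shown between the type lines.

```python
# Sketch (ours) of rendering a data flow as a vertical error message:
# program locations are shown between lines describing intermediate types.

def render_flow(flow):
    """flow: alternating list of type descriptions and source locations."""
    lines = []
    for i, step in enumerate(flow):
        if i % 2 == 0:                      # a type in the flow
            lines.append(f"({step})")
        else:                               # the location it flows through
            lines.append(f"  | {step}")
            lines.append("  v")
    return "\n".join(lines)

# Using the names from the paper's running example:
print(render_flow(["float", "appInfo at line 4", "'vers",
                   "parse_version vers at line 5", "string"]))
```

This produces a message that reads top to bottom as "a float, defined here, flows into 'vers, which flows here, where a string is expected".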

Fig. 15. Flow summary for data flows shown in Section 4.2.
Fig. 17.Application of type substitution to typing state.
Fig. 20.Freshening of types above a polymorphism level.

Table 1. Results of all Kruskal-Wallis tests per program.

These approaches could especially benefit from subtyping [Le Botlan and Rémy 2014]. Other important and tricky type system features include generalized algebraic data types, constrained types, modalities, and linear types.