A pred-LL(*) Parsable Typed Higher-Order Macro System for Architecture Description Languages

Macro systems are powerful language extension tools for Architecture Description Languages (ADLs). Their generative power in combination with the simplicity of specification languages allows for a substantial reduction of repetitive specification sections. This paper explores how the introduction of function- and record types in a template-based macro system impacts the specification of ADLs. We present design and implementation of a pattern-based syntax macro system for the Vienna Architecture Description Language (VADL). The macro system is directly integrated into the language and is analyzed at parse time using a context-sensitive pred-LL(*) parser. The usefulness of the macro system is illustrated by some typical macro application design patterns. The effectiveness is shown by a detailed evaluation of the Instruction Set Architecture (ISA) specification of five different processor architectures. The observed specification reduction can be up to 90 times, leading to improved maintainability, readability and runtime performance of the specifications.


Introduction
Macros and Domain-Specific Languages (DSLs) are two programming concepts that contribute to faster development of artifacts and to code quality. Macros are used to simplify repetitive code patterns by providing a shorter, more concise expression. Domain-Specific Languages offer a higher level of abstraction than conventional General-Purpose Languages (GPLs) for a specific domain. DSLs allow for a wide variety of applications, implementation techniques and design choices [18]. Macros and DSLs are strongly coupled. Many DSLs are implemented as a collection of macro definitions, while on the other hand, macros can contribute to the language extensibility of existing DSLs. A special form of DSLs are Architecture Description Languages (ADLs). In this article we present our experience with the development of a macro system with special focus on ADLs. Through the development of our Vienna Architecture Description Language (VADL), see Section 2, we gathered valuable insights regarding language extensibility for ADLs.

Architecture Description Languages
ADLs are computer languages used to describe the architecture of hardware and software systems. Particularly interesting for this article is the ADL subgroup of Processor Description Languages (PDLs). PDLs allow hardware designers to describe the instruction set, register set, memory hierarchy, and other aspects of a microprocessor. We identified a particular need for macros regarding PDLs, especially when it comes to the specification of an instruction set architecture (ISA). Section 2.2 will provide an overview of the fragment of VADL used to describe ISAs and explain in more depth where the repetitiveness comes from and how it influenced our macro system design. Of course, these observations are not limited to us and can also be found in other description languages like LISA [21], ISDL [10] or ArchC [2]. Additionally, we want to clarify some key properties of VADL and PDLs in general. A PDL is not, and should not be, an executable program. It can be thought of as a complex configuration for artifacts like hardware, simulator or compiler. This is an important concept, as an error is no longer a programming error that can be debugged with the specification alone. Debugging a specification requires specially generated tools and techniques like co-simulation. Hence, it is most important to reduce any other sources of errors, e.g. semantic errors, to a minimum. In Section 1.2, we describe how specific macro designs contribute to this desired property.

Macro Systems
Macro systems are one of the oldest forms of language extensions. In general, macros are user-defined procedures, transforming one program sequence to another program sequence. This transformation is called macro expansion. Based on the technique used, a macro system is categorized as lexical or syntactical, and as procedural or pattern-based [16]. Lexical macro systems, such as the C preprocessor (CPP) [23] and Unix M4 [12], are language agnostic and work on a lexical level, for example a token stream. In contrast to lexical macro systems, syntax macros are aware of syntactic structures. They are integrated into the core language and usually perform AST (Abstract Syntax Tree) to AST transformations. Representatives are, for example, LISP [24], Scheme [1] or Racket [9]. If a macro system supports algorithmic computations on its inputs, it is classified as a procedural macro system. On the other hand, macro systems that rely on pattern matching and substitution are called pattern-based. The presented concepts are not mutually exclusive and may be present in all combinations. The Rust programming language [13] incorporates both procedural and pattern-based techniques within its macro system. Another interesting example is the Java Syntactic Extender (JSE) [3], which supports full procedural macros and an extendable pattern-matching engine. Finally, a concept often considered when talking about macro systems is hygienic macros [4,8,14]. The main idea of hygienic macros is to prevent accidental capture of identifiers during expansion. When we started the design of our macro system, we anticipated that macro hygiene was of secondary importance for us, as we either want to capture identifiers or we pass the identifiers as arguments, providing us with more control over the used names. We will address hygienic macros again in Section 3.6 together with our lexical macros. For now, hygienic macros are part of our future work.

Macros for DSLs
When we were considering language extensibility for our DSL, syntax type safety and termination were the top priorities. We use the term syntax type safety in the sense that a syntax type safe macro system is able to detect syntax type errors. Hence, the system prevents the generation of syntactically incorrect code. Furthermore, many macro systems designed for DSLs have a feature-rich host language or environment they can exploit [5].
VADL, on the other hand, is a standalone DSL/ADL with no meta- or host language available. This decision helps us to develop the VADL syntax more freely and explore different design possibilities for PDLs without syntactical restrictions or superfluous features of a host language. The drive to keep the specification simple led us to investigate a lightweight and language-dependent implementation, i.e. a syntactical pattern-based macro system. We also considered a language-agnostic approach, but decided against it due to the lack of safety, available debug information and IDE support.
While we were satisfied with the choice of syntactical type safety, the pattern-based templates felt very limiting in expressiveness. Switching to procedural macros is for us (and we believe also for many other DSLs with a non-Turing-complete specification language as host language) not beneficial, as it compromises the simplicity of the host language. This inspired us to develop the higher-order models for our macro system discussed in Section 3.
A final aspect worth considering is computation time for DSL macro systems. Macro expansion becomes a prerequisite for any DSL-related analysis and task. Therefore, a main goal should be to make sure that the macro system's execution time is as short as possible. We incorporated our macro system directly into the language grammar without requiring any preprocessor. This helped us to reduce unnecessary precomputations. Additionally, our LL(k) parsable host language encouraged us to preserve the top-down parsing fashion. We designed the built-in macro system to be pred-LL(*) parsable.

Contribution.
• A simple pred-LL(*) parsable syntactical pattern-based macro system for specification languages
• Syntax type safe higher-order macro templates using models
• Composable syntax types using records and type aliases
• Demonstration of the presented macro system using the Vienna Architecture Description Language

Additionally, we present a variety of smaller macro features supporting a high configurability and usability in the context of specification languages. We found the following implemented features particularly useful for our exploratory language design of VADL.

• Inheritance of macro definitions across language definitions
• Lexical manipulation of identifiers and strings
• Configurable and conditional macro expansions using match and command line arguments

Overview
In this section we give an overview of the Vienna Architecture Description Language (VADL) with special focus on the instruction set architecture (ISA) section.

Vienna Architecture Description Language
VADL is a Domain-Specific Language in the domain of computer architecture and compiler construction. It permits the complete formal specification of a processor architecture.
Additionally, it is possible to specify the behavior of generators which produce different artifacts from a processor specification, like a compiler or an instruction set simulator. VADL strictly separates the specification of the instruction set architecture (ISA), the micro architecture (MiA) and the application binary interface (ABI). To provide a proof of concept, we only implemented and evaluated our macro language for the instruction set architecture specification section. However, the ideas and techniques presented can be applied to the other sections as well as to any similarly structured DSL.

ISA Syntax Elements
Presenting the whole syntax and semantics of VADL used to describe ISAs, let alone the VADL language as a whole, is out of the scope of this article. Therefore, we will focus only on the relevant portion of the instruction set architecture definition. First, we have to establish how instructions are defined. VADL separates the abstract concept of an instruction into three parts: the instruction definition, the instruction encoding and the textual representation, i.e. assembly. The instruction definition is the core part of the three definitions, holding information on the name, the used encoding format and the instruction semantics. The instruction encoding specifies the values of the statically encoded fields. The instruction assembly definition specifies a pattern on how the assembly string is computed. Figure 1 shows a specification of an ADD instruction, which adds two registers together and stores the result in a third. The format fields of the format definition F, which are not assigned to a static value inside the encoding definition, become dynamic fields or operands. Inside the instruction semantics, we can observe that rd, rs1 and rs2 are indeed used as operands. The call expressions to X(.) represent indexing of a register bank X, defined somewhere else in the ISA.
If we define a new instruction, e.g. AND, that differs from ADD in a single encoding bit and the binary operator, we would need to create a completely new instruction definition, encoding and assembly. This brings us to the downside of element- or block-based specification languages like VADL. We designed VADL to be descriptive and simple, which led us to a very small core language for the ISA section. While we support functions, our core type system is very simple and does not support these definitions as first-class citizens. During our language development phase we also experimented with different built-in language features that could reduce code duplication, but we came to the conclusion that they only introduce a lot of complexity and obfuscate the original code. This led us to the idea of designing a template-based macro system specifically directed towards specification languages.

Figure 1. ISA Example Specification for an ADD instruction

VADL's Macro System
In this section we give a detailed description of the syntax and techniques implemented for VADL's macro system.

Syntax Models
At the core of our macro system are the so-called syntax models. A syntax model can be seen as a parameterized and well-typed template. Figure 2 shows how such a syntax model can be defined. Every model has a name, a typed parameter list, a result type and a body. Note how the use of a parameter is indicated by a leading "$". This design decision has two advantages. First, it simplifies parsing as it explicitly marks the use of a macro element. Second, the "$" captures the model parameter names, preventing name collisions with ISA definitions, which strengthens the hygiene of the macros. Similarly to the parameters, we use the "$" for the instantiation of defined syntax models. Figure 3 shows how the model from Figure 2 can be instantiated. To separate the syntax elements from each other we use ";" as separator inside an instantiation. Recall the dilemma of Section 2.2: the introduction of models now provides us with a mechanism to efficiently specify both instructions without code repetition. In the example of Figure 2 we only used the identifier (Id), binary operator (BinOp) and ISA element (IsaDefs) syntax types. However, the syntax model definition supports a variety of types discussed in the following sections.
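As a rough illustration of the idea (not VADL's implementation, which works on syntax trees rather than on text), a syntax model can be thought of as a typed template whose "$"-marked parameters are replaced on instantiation. The Python sketch below is a minimal text-based approximation. The class Model, the model name BinInstr and the instruction text inside the body are illustrative assumptions and are not taken from Figure 2; parameter types are recorded but not checked here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    """Hypothetical stand-in for a syntax model: name, typed parameters,
    result type and a body in which parameter uses are written as $name."""
    name: str
    params: dict       # parameter name -> syntax type name, in declaration order
    result_type: str
    body: str

    def instantiate(self, *args: str) -> str:
        if len(args) != len(self.params):
            raise TypeError(f"{self.name} expects {len(self.params)} arguments")
        binding = dict(zip(self.params, args))
        # Only "$"-prefixed names are replaced, so an ordinary identifier that
        # happens to share a parameter's name is left untouched.
        return re.sub(r"\$(\w+)", lambda m: binding[m.group(1)], self.body)


# Illustrative body text only; this is not real VADL syntax.
bin_instr = Model(
    name="BinInstr",
    params={"name": "Id", "op": "BinOp"},
    result_type="IsaDefs",
    body="instruction $name : F = { X(rd) := X(rs1) $op X(rs2) }",
)

add_text = bin_instr.instantiate("ADD", "+")
and_text = bin_instr.instantiate("AND", "&")
```

Both instructions from the dilemma in Section 2.2 are then produced from one template, differing only in the arguments passed.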

Syntax Types
This section introduces all the available core syntax types. We designed our syntax types to have a one-to-one relation to parser rules. This already provides us with a partial order in the form of a subtype relation. Table 1 gives an overview of all the available base types with a short description and examples. Additionally, it is important to note that the presented base types, function types (Section 3.4), record types and type aliases (Section 3.3) can be arbitrarily nested. The resulting types can be used everywhere a syntax type is expected, with the only exception being result types of models and function types. Figure 4 displays the subtype relation between the presented core types. The macro type system provides an implicit upcasting of the value types. For example, if a model expects a value of type Val, any subtype, i.e. Bool, Int or Bin, will be accepted as argument.
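The implicit upcast described above amounts to a reachability check in the subtype relation. The following Python sketch assumes a parent map that contains only the Val subtypes named in the text; the full relation of Figure 4 is not reproduced, and the names PARENT and is_subtype are hypothetical.

```python
# Assumed fragment of the subtype relation: only the subtypes of Val
# mentioned in the text. Each type has at most one direct supertype here.
PARENT = {"Bool": "Val", "Int": "Val", "Bin": "Val"}


def is_subtype(actual: str, expected: str) -> bool:
    """True if `actual` equals `expected` or reaches it by walking up PARENT."""
    current = actual
    while current is not None:
        if current == expected:
            return True
        current = PARENT.get(current)
    return False


def check_argument(actual: str, expected: str) -> None:
    """Reject an argument whose syntax type cannot be upcast to the parameter type."""
    if not is_subtype(actual, expected):
        raise TypeError(f"expected {expected}, got {actual}")
```

An Int argument is accepted where Val is expected, while the reverse direction is rejected because upcasting only moves towards the supertype.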

Type Alias and Composition
The VADL macro system provides a feature-rich type interface. Besides the basic types mentioned in Section 3.2, the macro system also supports type aliasing and a form of type composition to make the typed templates more readable. Figure 5 shows a type alias definition BinExprType, which from now on can be used instead of the function type (Ex, Ex) -> Ex. An application of BinExprType can be seen in Figure 8.
Figure 6 shows a record definition used for type compositions. In this particular case the record definition composes an Id and a BinOp type to the new type BinInstRec. The body of a record consists of a parameter list providing typed fields. Figure 7 shows how the record is initialized and how the fields name and op are accessed. Passing a record type argument can be done either by reference or by creating a syntax tuple. A syntax tuple is specified the same way a model argument list is provided, i.e. syntax elements are separated by ";" and enclosed inside brackets. Accessing the passed elements is done using the record's name followed by a "." and the desired field. Accesses of sub-records can be arbitrarily chained together. The whole access may be wrapped inside brackets, i.e. "$(...)", to better indicate what is part of the access and what belongs to the VADL specification. Furthermore, it is important to note that records are treated as type tuples. Their field names do not affect the type and are only used to access the internal elements.
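The statement that records are type tuples can be made concrete with a small Python sketch. It assumes a record is a list of (field name, field type) pairs and a syntax tuple is a plain Python tuple; the helper names record_type and access, as well as the records OtherRec and EncodedInst, are hypothetical.

```python
def record_type(fields):
    """Type of a record: the tuple of its field types. Field names are dropped,
    and nested records contribute their own type tuple."""
    return tuple(record_type(t) if isinstance(t, list) else t for _, t in fields)


def access(fields, value, path):
    """Resolve a chained field access such as 'inst.name' on a syntax tuple."""
    for field in path.split("."):
        names = [name for name, _ in fields]
        index = names.index(field)          # raises ValueError for unknown fields
        fields, value = fields[index][1], value[index]
    return value


BIN_INST_REC = [("name", "Id"), ("op", "BinOp")]
OTHER_REC = [("id", "Id"), ("operator", "BinOp")]      # same types, other names
ENCODED_INST = [("inst", BIN_INST_REC), ("enc", "Bin")]  # record with a sub-record

same_type = record_type(BIN_INST_REC) == record_type(OTHER_REC)
op = access(BIN_INST_REC, ("ADD", "+"), "op")
nested_name = access(ENCODED_INST, (("ADD", "+"), "0b01"), "inst.name")
```

Two records with different field names but the same field types compare equal as types, while the names remain usable for chained access.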

Higher-Order Macros
To the best of our knowledge, typed higher-order macros have not been presented before in a pattern-based syntax macro system. In this section we will briefly describe how they are used in the context of our macro system. In Section 3.8, we will further discuss why they are important for ADLs and how we use them in VADL. A higher-order macro is a statically evaluated function, mapping a list of syntax types to a result syntax type. We chose the term higher-order to underline the capability of providing model references as arguments. In Figure 5 we have already defined the type signature of a model in the form of a function type. We will reuse this type for the higher-order model BinExStat in Figure 8. The instantiation of BinExStat in the presented figure produces an assignment statement of X(rd), taking the addition of X(rs1) and X(rs2) as argument.
A valid argument for a parameter with a function type is either a model reference, as seen in the example, or a parameter of function type from an outer model. In both cases the types are evaluated and checked during parse time.
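A minimal Python sketch of this mechanism is given below. It models a function type as a value that is compared before the model reference is applied, which loosely mirrors the parse-time check. The classes FnType and Model and the model names AddExp and NameOf are assumptions for illustration; only BinExStat and the type (Ex, Ex) -> Ex come from the text, and the generated statement text is not real VADL syntax.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class FnType:
    params: Tuple[str, ...]
    result: str


@dataclass(frozen=True)
class Model:
    name: str
    sig: FnType
    expand: Callable[..., str]


BIN_EXPR_TYPE = FnType(("Ex", "Ex"), "Ex")   # alias for (Ex, Ex) -> Ex

# Hypothetical first-order models.
add_exp = Model("AddExp", BIN_EXPR_TYPE, lambda lhs, rhs: f"{lhs} + {rhs}")
name_of = Model("NameOf", FnType(("Id",), "Str"), lambda ident: f'"{ident}"')


def bin_ex_stat(op: Model) -> str:
    """Higher-order model: takes a model reference of type BinExprType
    and produces an assignment statement."""
    if op.sig != BIN_EXPR_TYPE:
        raise TypeError(f"{op.name} does not have type (Ex, Ex) -> Ex")
    return f"X(rd) := {op.expand('X(rs1)', 'X(rs2)')}"


statement = bin_ex_stat(add_exp)
```

Passing name_of instead of add_exp is rejected before any expansion takes place, since its signature differs from the expected function type.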

Conditional Expansion
A minor difference to some pattern-based approaches is our conditional expansion. VADL macros provide an explicitly typed match statement, shown in Figure 9. The entries are processed from top to bottom, and the right-hand side of the first satisfied left-hand side is used for expansion. The match statement requires a default case at the last position, indicated by the "_". Besides the default case, each entry contains a condition that matches either equality ("=") or inequality ("!=") of a parameter and a syntax element matching the type of the parameter. The comparison is done on a lexical, i.e. token-based, level and performed

Lexical Macro Functions
While most of the needs are covered by syntactical macros, we came to the conclusion that string and identifier manipulation is best done using lexical macros. A lexical macro acts on the abstraction level of token streams, in contrast to an already parsed AST. Through language exploration, we narrowed the lexical macros down to two use cases. These use cases are safely implemented using special macro functions. Firstly, templates generating instruction behavior and assembly used to require the instruction name once in the form of an identifier (Id) and again in the form of a string (Str). We solved this issue by introducing the IdToStr function. This function takes an Id typed syntax element and converts it to a Str typed syntax element. Secondly, we encountered the problem of not being able to efficiently manipulate our identifiers. This is especially tedious when dealing with different configurations. To provide a type safe identifier manipulation, we introduced the ExtendId function. This function takes an Id typed identifier and an arbitrary number of Str typed syntax elements, concatenates them and returns a single Id typed syntax element. Figure 10 shows a small example of both functions with their typed result as comment. It is important to note that the context of the identifiers generated by lexical macros is strictly separated from the context of the syntactical macros. Therefore, it is not possible to define or refer to a model name or parameter using a generated identifier.
For VADL, the lexical macro functions are the "alternative" to hygienic macros. Consider the ISA elements described in Section 2.2. When defining a new instruction we either want to capture identifiers, e.g. registers or memories, or we want full control over the identifier name. From our experience, VADL's IdToStr and ExtendId are sufficient for this use case.
ExtendId(I, "Am", "An", "Identifier")
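The typing discipline of the two lexical macro functions can be sketched in Python as follows. The wrapper classes Id and Str and the function names id_to_str and extend_id are assumptions standing in for VADL's Id and Str syntax types and for IdToStr and ExtendId; the sketch only mirrors the behavior described in the text, namely conversion and concatenation with typed inputs and results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Id:
    text: str


@dataclass(frozen=True)
class Str:
    text: str


def id_to_str(ident: Id) -> Str:
    """Convert an Id typed element into a Str typed element."""
    if not isinstance(ident, Id):
        raise TypeError("IdToStr expects an Id")
    return Str(ident.text)


def extend_id(ident: Id, *parts: Str) -> Id:
    """Concatenate an Id with any number of Str elements into a single Id."""
    if not isinstance(ident, Id):
        raise TypeError("ExtendId expects an Id as first argument")
    if not all(isinstance(part, Str) for part in parts):
        raise TypeError("ExtendId expects Str elements after the Id")
    return Id(ident.text + "".join(part.text for part in parts))


extended = extend_id(Id("I"), Str("Am"), Str("An"), Str("Identifier"))
as_string = id_to_str(Id("ADD"))
```

Keeping Id and Str as distinct types means that a string cannot silently be used where an identifier is expected, which is the type safety the two functions are meant to preserve.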

Configuration and Inheritance
VADL provides the possibility of passing configuration information to the macro system using the command line. Currently, this mechanism is kept very simple and is based on Id elements. To prepare a configurable macro variable, one has to create a simple model of type Id containing a default value. Figure 11 shows such a variable of name Arch, with the default setting Aarch32. Without any passed configurations the instantiation of Arch results in the identifier Aarch32. However, if VADL receives the command-line option -m or -model, followed by the string "Arch=Aarch64", the value of Arch is overridden. If Arch is instantiated given the previous command-line option, it results in Aarch64. In combination with conditional expansion, see Section 3.5 and Figure 9, this simple mechanism already provides powerful configuration capabilities. Additionally, we want to briefly mention VADL's inheritance in this section. Every ISA component in VADL can inherit from an arbitrary other ISA component using the keyword extending. Since we tightly coupled the macro system into our language, the inheritance and visibility also affect the macro system. Figure 12 shows this mechanism in action and provides additional information in the comments.
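The interplay of a configurable variable and a conditional expansion can be sketched in a few lines of Python. The function names and the expansion results "Bits<32>" and "Bits<64>" are hypothetical; only the variable Arch, its default Aarch32, the override string "Arch=Aarch64" and the top-to-bottom match with a mandatory default come from the text.

```python
def parse_overrides(options):
    """Turn strings such as 'Arch=Aarch64' into a {name: value} mapping."""
    return dict(option.split("=", 1) for option in options)


def instantiate_variable(name, default, overrides):
    """A configurable variable expands to its override if present, else its default."""
    return overrides.get(name, default)


def match(value, entries, default):
    """entries: list of (operator, pattern, result) with operator '=' or '!='.
    Entries are tried top to bottom; the default plays the role of the '_' case."""
    for operator, pattern, result in entries:
        if operator not in ("=", "!="):
            raise ValueError(f"unknown operator {operator}")
        if (value == pattern) == (operator == "="):
            return result
    return default


entries = [("=", "Aarch64", "Bits<64>")]

default_arch = instantiate_variable("Arch", "Aarch32", parse_overrides([]))
override_arch = instantiate_variable("Arch", "Aarch32", parse_overrides(["Arch=Aarch64"]))

default_width = match(default_arch, entries, "Bits<32>")
override_width = match(override_arch, entries, "Bits<64>" if False else "Bits<32>")
```

With no override the default case is selected; with the override, the first entry is satisfied and its right-hand side is used instead.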

Macro Application in Processor Specifications
The following macro application design patterns demonstrate, with simplified examples, the usage of macros for instruction set architecture specifications. The simplified examples are based on a real specification of the AArch32 instruction set architecture from ARM. The most common case is a simple argument substitution pattern, shown in Figure 13.
AArch32 has a register file called R consisting of 16 registers which are 32 bits wide. Conditions are specified by boolean expressions on flags of the status register APSR, e.g. the zero flag Z. Every instruction can be executed conditionally. There are 15 different conditions, which are described by an enumeration in the specification and encoded by the cc field in an instruction word which is 32 bits wide. Arithmetic/logic instructions which have an immediate value as second source operand share a common instruction encoding specified in the ArLoImm instruction format. The ALImmInstr instructions themselves differ from each other only in the unique instruction identifier, the assembly instruction name, the binary operation to be executed and the instruction encoding. Therefore, a model with these four parameters is defined, which substitutes these four parts in the instruction specification. Then an arithmetic/logic immediate instruction can be specified with a single-line macro call. This leads to concise instruction set architecture specifications.

[Figure 13, listing fragment]
register file R : Bits<4> -> Bits<32>

enumeration cond : Bits<4> =
  { EQ // equal Z == 1

Most instruction set architectures are too complex to get by with the substitution pattern. As every instruction in the AArch32 architecture can be executed conditionally, a basic instruction exists in 15 variants for 15 different conditions. This problem can be solved by an extension macro pattern using higher-order macros, as demonstrated in Figure 14.
To reduce the number of macro arguments, record types are defined for an instruction and a condition. The Instr record type definition groups together the four arguments describing an instruction from Figure 13. The Cond record type definition consists of a string representing the extension of the assembly name, the identifier of the enumeration of the condition encoding and a boolean expression for condition evaluation.
In contrast to the previous example, 15 different instructions with unique identifiers now have to be created in Figure 14. This can be handled with the lexical macro function ExtendId by appending the extension string of the condition to the identifier.
The final problem is that there is a set of models which describe different kinds of conditional instructions, and all these models should be called 15 times for the 15 different conditions. This can be solved by the higher-order model CondInstr, which takes the instruction model as first argument. The instruction model is then called 15 times with an argument list which has been extended by the conditions. In the above example the 4 macro calls expand to 60 different instructions. The AArch32 architecture has instructions with a lot of additional variants like setting the status register, shifted operands or complex addressing modes. This leads to a specification with multiple higher-order macro arguments.
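The multiplication of 4 macro calls into 60 instructions can be reproduced with a short Python sketch of the extension pattern. The record fields follow the surviving fragment of the listing, except that the Cond field str is renamed suffix to avoid shadowing Python's built-in. The placeholder condition expressions, the opcode strings, the four chosen instructions and the generated text are illustrative assumptions, not the content of Figure 14.

```python
from typing import Callable, List, NamedTuple


class Instr(NamedTuple):
    id: str
    ass: str
    op: str
    opcode: str


class Cond(NamedTuple):
    suffix: str   # extension of the assembly name
    code: str     # identifier of the condition in the enumeration
    ex: str       # boolean expression for the condition (placeholder text here)


CONDITION_CODES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
                   "HI", "LS", "GE", "LT", "GT", "LE", "AL"]
CONDS = [Cond(code.lower(), code, f"holds({code})") for code in CONDITION_CODES]

# Opcode strings are illustrative values only.
INSTRS = [
    Instr("ADD", "add", "+", "0b0100"),
    Instr("SUB", "sub", "-", "0b0010"),
    Instr("AND", "and", "&", "0b0000"),
    Instr("ORR", "orr", "|", "0b1100"),
]


def al_imm_instr(instr: Instr, cond: Cond) -> str:
    """First-order model: one conditional arithmetic/logic immediate instruction."""
    name = instr.id + cond.suffix          # plays the role of ExtendId
    return (f"instruction {name} ({instr.ass}{cond.suffix}, opcode {instr.opcode}, "
            f"cc {cond.code}): if {cond.ex} then R(rd) := R(rn) {instr.op} imm12")


def cond_instr(model: Callable[[Instr, Cond], str], instr: Instr,
               conds: List[Cond]) -> List[str]:
    """Higher-order model: call the given instruction model once per condition."""
    return [model(instr, cond) for cond in conds]


expanded = [text for instr in INSTRS for text in cond_instr(al_imm_instr, instr, CONDS)]
```

Because cond_instr receives the instruction model as an argument, the same 15-fold expansion can be reused for other kinds of conditional instructions by passing a different model.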

Implementation
In this section we give an overview of our macro system implementation for VADL. VADL manages the macros in two separate phases: parsing and expansion. It is important to note that the parsing phase only analyzes the macros. It guides the parser through the different kinds of macro actions while asserting their syntactical and partly semantical correctness. Applications of the macro actions are done in the expansion phase.

Parsing
Parser. The VADL frontend uses a modified version of the Xtext framework [6]. The Xtext framework is a Java-based DSL development tool. It takes an Xtext grammar file as input and generates a variety of useful artifacts, e.g. IDE integration, metamodel classes for the syntax tree or a parser. We have refrained from using any non-LL(k) Xtext grammar functionalities and disabled the backtracking feature of the generated parsers to start our implementation from a true LL(k) parser. An LL(k) parser is a top-down parser processing the language from left to right. The k indicates a constant lookahead, which may be performed by the parser. Additionally, we extended the implementation to allow semantic predicates [20] and code actions. This grammar extension lifts the parser to pred-LL(*). The pred prefix indicates the use of predicates in combination with grammar rules, and the star (*) lifts the lookahead requirement of a fixed constant k to an arbitrary constant. The Xtext framework targets ANTLR [19], which already supports code actions and semantic predicates. Therefore, extending the grammar was a quite straightforward task for our simple purposes. In Sections 4.1 and 5.3 we will further comment on the context-sensitivity.
Concrete Syntax Tree. The Xtext framework automatically creates classes for each non-terminal grammar rule that has at least one labeled rule field. While parsing a source file, it uses these classes to build a concrete syntax tree (CST). To keep the implementation effort manageable we kept this

[Figure 14, listing fragment]
record Instr (id : Id, ass : Str, op : BinOp, opcode : Bin)
record Cond (str : Str, code : Id, ex : Ex)
($instr.ass, $cond.str, ' ', register(rd), ',', register(rn), ',', decimal(imm12))

Wrapper Rules. Since the CST consists of Java classes based on grammar rules, we decided to introduce additional wrapper rules. Each rule which we would like to use as macro type is enclosed in an additional rule to create a clear location for replacement later on. In most cases, such a wrapper rule contains two alternatives: a generic macro replacement rule guarded by a semantic predicate, and the concrete value rule. The semantic predicate is used to perform a minor lookahead to see if the next tokens are part of a concrete value or a macro action. Unfortunately, this introduces a slight overhead, as we have to create an additional wrapper rule for each syntax type. Figure 15 shows a wrapper rule StringRule and a concrete rule ConcreteStringRule to express strings. Retrieving the current context with reflection or the parser itself was quite tedious, so we decided to simply parameterize our isMacroAction predicate with the current syntax type context (Str). The StringRule rule can now be used anywhere as if it were a normal grammar rule for strings. Inside the StringRule the value field is used as location for replacement. We implemented all non-terminal rules used as syntax types (see Section 3.2) in the same fashion.

Figure 15. Wrapper Rule Example
Context Sensitivity. By using a context-sensitive parse approach, we guide the parser in such a way that only syntactically and semantically correct macro occurrences are parsable. To manage the context sensitivity, we implemented a parse state specifically for macros. It provides an API used by semantic predicates and code actions to compare and update symbols and macro information. The core of the parse state itself consists of a symbol table containing information on macro-related definitions in the current scope. Moreover, it holds a variety of type information on actively parsed macro constructs. In the initial parse state only the core syntax types listed in Section 3.2 are registered. During parsing, ISA namespaces, models, their parameters, records and syntax type aliases are added. Figure 16 shows a simplified version of our grammar rules handling a model definition. The start of the ModelRule is straightforward, expecting the model keyword, a name, a typed parameter list and a syntax type. Before the parser enters the model body, the parse state has to be updated. In the example this is done by the code actions, indicated with an opening and closing "$$". We feed the parse state the name, the syntax type and the parameters. Note that the passed values are CST nodes and therefore already Java classes, making it possible to access type and name of the parameters. Inside the parse state, the symbol table is extended by the model name and the information on the parameters. The passed type is used to select the correct rule to parse the model body. This is done using syntax predicates, which can be seen in Figure 16 inside the ModelBodyRule. They are similar to code actions but with an additional "?=>" after the enclosing "$$". The predicates are tested in order from top to bottom. If a predicate is satisfied, the parser tries to apply the rule(s) on the right-hand side of the current alternative. The ModelBodyRule reveals another slight overhead, as we have to manually implement the relation between syntax type and desired rule. The presented example is of course just a simplified version of the actual implementation and should help to understand the main idea. A similar approach was applied for all the other context-sensitive tasks, e.g. managing ISA namespaces or checking the correctness of syntax types of passed arguments.
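The selection of the body rule by the declared result type can be imitated outside a parser generator. The Python sketch below is a hand-written approximation, not the Xtext/ANTLR grammar of Figure 16: the class ParseState, the two toy rules parse_id and parse_str, and the table BODY_ALTERNATIVES are assumptions that stand in for the parse state, the concrete rules and the predicate-guarded alternatives of the ModelBodyRule.

```python
class ParseState:
    """Tiny stand-in for the macro parse state: a symbol table plus the
    result type of the model whose body is currently being parsed."""

    def __init__(self):
        self.symbols = {}
        self.current_result = None

    def enter_model(self, name, params, result_type):
        # Plays the role of the code action that runs before the model body.
        self.symbols[name] = {"params": dict(params), "result": result_type}
        self.current_result = result_type

    def expects(self, syntax_type):
        # Plays the role of a predicate guarding one alternative.
        return self.current_result == syntax_type


def parse_id(text):
    if not text.isidentifier():
        raise SyntaxError(f"not an identifier: {text}")
    return ("Id", text)


def parse_str(text):
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise SyntaxError(f"not a string literal: {text}")
    return ("Str", text[1:-1])


# The manual relation between syntax type and rule, tested top to bottom.
BODY_ALTERNATIVES = [("Id", parse_id), ("Str", parse_str)]


def parse_model_body(state, text):
    for syntax_type, rule in BODY_ALTERNATIVES:
        if state.expects(syntax_type):
            return rule(text)
    raise SyntaxError(f"no body rule for result type {state.current_result}")


state = ParseState()
state.enter_model("Arch", [], "Id")
arch_body = parse_model_body(state, "Aarch32")
```

A body that does not fit the declared result type fails to parse, which is how this style of guidance rejects ill-typed macro definitions at parse time.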

Expansion
The macro expansion is done in a separate pass after parsing and works solely on the CST representation. This pass is responsible for performing all macro-related CST manipulations. Note that at this stage the correctness of the macros has already been checked by the parser. The examples in Figure 17 and Figure 18 show a textual representation of the CST before and after the expansion pass. The expansion is executed in two steps.
Setup. The first step is to remove all the macro nodes of the CST that do not produce any new nodes. This includes model, syntax type alias and record definitions. During the removal, each definition is stored in a symbol table to preserve its information. In addition to the parsed definitions, the command line configurations are added to the symbol table.
Execution. The second step is the actual execution and expansion of macro code. The expansion is done in a top-down fashion and performs iterative replacements on wrapper nodes. The first top-level construct, or root node, that the macro expander encounters is by design a model instantiation node. The instantiation contains information on the model and the passed parameters. Similar to function inlining, the expander creates a copy of the referenced model body and replaces each parameter occurrence with the respective argument. The new body is then passed again to the expander to resolve nested macro actions. Afterwards, the macro node is replaced by the newly created subtree of the updated copy of the model body. Conditional macro expansions are also done during the model instantiation. The implementation is very simple and currently only uses a unified symbolic equality comparison. The match statement is symbolically executed and replaced by the right-hand side of the first satisfied left-hand side condition. If no condition evaluates to true, the mandatory last wildcard statement is used. Similar to the model instantiation, the resulting node is again iteratively expanded. The implementation of the lexical macro actions (IdToStr, ExtendId) is also straightforward. We simply create a new identifier or string CST node and replace the old macro node occurrence with the newly created node. Recall that the newly created identifier is only part of the expanded program and not available during expansion. This means that it cannot be used to refer to macro models or macro parameters.
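The inlining-style expansion can be sketched on a toy tree in Python. The node encoding is an assumption made for this example: ("inst", model, args) is a model instantiation, ("param", name) a parameter use, ("node", label, children) an ordinary node and a plain string a leaf. The real implementation manipulates Xtext CST classes and wrapper nodes, which are not modelled here, and conditional expansion and lexical functions are left out.

```python
def expand(node, models, env=None):
    """Return a fresh tree in which every instantiation is replaced by the
    expanded body of the referenced model.

    models maps a model name to (parameter names, body tree).
    env maps the parameters of the model currently being expanded to arguments.
    """
    env = env or {}
    if isinstance(node, str):
        return node
    kind = node[0]
    if kind == "param":
        return env[node[1]]
    if kind == "inst":
        params, body = models[node[1]]
        # Arguments are resolved in the caller's environment first.
        args = [expand(arg, models, env) for arg in node[2]]
        # The body is expanded again so that nested macro actions are resolved.
        return expand(body, models, dict(zip(params, args)))
    label, children = node[1], node[2]
    return ("node", label, [expand(child, models, env) for child in children])


MODELS = {
    "BinEx": (["op"], ("node", "assign",
                       ["X(rd)", ("node", "binary",
                                  ["X(rs1)", ("param", "op"), "X(rs2)"])])),
}

tree = ("node", "isa", [("inst", "BinEx", ["+"]), ("inst", "BinEx", ["&"])])
expanded_tree = expand(tree, MODELS)
```

Since expand builds new nodes instead of mutating the model body, the stored definition can be instantiated any number of times, which corresponds to working on a copy of the body.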
Termination. Finally, we want to emphasize again that the termination of our expansion is always guaranteed. Although we allow multiple nested model instantiations and the invocation of higher-order model parameters, the top-down parsing and our deliberate rejection of use-before-define make recursive model calls impossible. Note that a model is only considered defined after it has been fully parsed, which prevents it from calling itself recursively or passing itself as an argument inside its own body. Therefore, our iterative expansion is bounded by a finite number of steps.
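The termination argument can be illustrated with a small Python sketch. It assumes a simplified view in which each model definition is reduced to its name and the list of model names it references (either by instantiation or as an argument); the function name `register_models` and this input format are hypothetical. Because a model is registered only after its references have been checked, the reference graph is acyclic and every model has a finite maximal nesting depth.

```python
def register_models(definitions):
    """definitions: list of (name, referenced_model_names) in source order.

    Returns a dict mapping each model to the length of the longest
    instantiation chain starting at it. Raises NameError on any
    use-before-define, which also covers self-reference.
    """
    defined = {}
    for name, refs in definitions:
        for ref in refs:
            if ref not in defined:
                raise NameError(
                    f"model '{ref}' used before its definition in '{name}'"
                )
        # The model becomes visible only after its whole body was checked.
        defined[name] = 1 + max((defined[r] for r in refs), default=0)
    return defined
```

A model that refers to itself is rejected because its own name is not yet in the table while its body is checked, so no cycle can ever be registered.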

Evaluation
For the evaluation of the macro system's efficiency and expressiveness we used VADL ISA specifications of AArch64, AArch32, MIPS IV, RISCV and our RISCV-like toy architecture TriLen. The following section is separated into a qualitative evaluation and a runtime evaluation. The qualitative evaluation investigates a variety of source code properties, such as the number of models, records or type aliases. The runtime section provides an overview of the execution time.

Qualitative Evaluation
This section provides an overview of the macro system's expressiveness, with a particular focus on the presented macro concepts, e.g. higher-order models or type compositions. For the evaluation, we implemented data collection passes before and after the macro expansion pass. Lines of code and lines of comments were collected manually. Lines of code exclude any trailing empty lines in a file. To determine the comment-only lines, we used a regular expression. First, we present the overall expressiveness by comparing the original specifications with their expanded, pretty-printed results. Table 2 contains data on the lines of code, lines of comments, CST nodes and instruction definitions before and after expansion for each architecture. The instruction definitions value includes all instruction language elements (see Section 2.2), no matter whether they contain placeholders or are located inside a model template. The concept of CST nodes is described in Section 4.1. We believe that instruction definitions and CST nodes are more accurate metrics for demonstrating the actual generative capabilities of our system. Lines of code (including comment lines) is a more tangible metric, which is why we provide it as well.
It is important to note that our pretty printer does not insert line breaks inside an expression, which in the case of nested if-else expressions could result in even more output lines. Additionally, comments are not preserved during the parse step and are therefore missing in the output. Our prime examples are the ARM specifications AArch64 and especially AArch32. Their specifications heavily use the macro system to model the different instruction variations. This can be seen by their 52 and 53 initial instruction definitions, which expand to 799 and a tremendous 8865 instruction definitions. These big numbers can be explained by the fact that the final specification explicitly models each hardware mode and conditional execution combination as a separate instruction definition. Although the other examples are not as impressive, MIPS IV, RISCV and TriLen still show improvements regarding code size and especially instruction definition abstraction. RISCV's lines of code are a perfect example of why we chose to provide additional metrics to compare the expanded specifications to the originals. The lines of code decrease after the expansion, indicating that the macro system introduced more verbosity. However, when we look at the CST nodes and the instruction definitions, one can see that there actually was an increase in syntax elements after the expansion. Additionally, the number of instruction definitions doubled, which is an indication that the system provides a good abstraction. Overall, we are satisfied with the expressiveness of the macro system, as the code reduction measured by the CST nodes of our examples ranges between factors of 1.2 and 90.
Furthermore, we were interested in the absolute frequency of each macro element for each architecture specification. Table 3 displays the overall number of model definitions, their placeholders inside the template, model instantiations (macro invocations), record definitions, type-alias definitions and the uses of our macro conditional (match). An interesting discovery when looking at Table 2 and Table 3 is that more model definitions and model instantiations do not necessarily mean a higher increase in CST nodes. This can be seen by comparing the properties of AArch32 and AArch64. In general, complex architectures like AArch32, AArch64 and even MIPS IV seem to use the set of features more than simpler ones like RISCV and TriLen. Especially when it comes to the use of type aliases and records, the complex architectures benefit more. The reason is that their model templates are more complex. For the given examples, the ratio of placeholders to models seems to be a good indicator of this complexity. Furthermore, it is interesting to see that records, type aliases and match statements have been used sparsely.
Finally, we provide an overview of several parameter-related statistics for model definitions. Table 4 shows the minimum, maximum, absolute and average number of arguments for:
• Normal Parameters: This includes all model parameters.
• Flattened Parameters: This includes all model parameters with a slight manipulation. All parameters that resolve to a record or tuple type are flattened. This means that each element inside a record or tuple is iteratively unpacked and moved to the outer parameter list. The original type container is removed.
• Higher-Order Parameters: This includes all parameters that are of higher order, i.e. that are of the function type or contain it as a subtype.
Table 4 presents statistics regarding the number of macro arguments. It shows that for the two complex architectures the usage of record types halves the maximum and average number of arguments. Higher-order and record arguments are only used by the complex architectures.
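The flattening used for the "Flattened Parameters" statistic can be sketched as follows. This is a minimal Python illustration under assumed data structures: a type is either a string (a syntax type or a record name) or a Python tuple of types, and `records` maps record names to their field lists. The record field names mirror the `cond` and `instr` parameters of the ALImmCondInstr listing, while the field types (`Str`, `Bin`, `Ex`, `Id`, `BinOp`) are hypothetical.

```python
def flatten_type(ty, records):
    """Return the list of leaf types contained in `ty`."""
    if isinstance(ty, tuple):  # tuple type: unpack each element
        return [leaf for t in ty for leaf in flatten_type(t, records)]
    if ty in records:  # record type: unpack each field
        return [leaf for _, t in records[ty] for leaf in flatten_type(t, records)]
    return [ty]  # plain syntax type

def flatten_params(params, records):
    """params: list of (name, type). The type containers are removed."""
    return [leaf for _, ty in params for leaf in flatten_type(ty, records)]

# Hypothetical record definitions; only the field names come from the listing.
records = {
    "Cond": [("str", "Str"), ("code", "Bin"), ("ex", "Ex")],
    "Instr": [("id", "Id"), ("op", "BinOp"), ("opcode", "Bin")],
}
params = [("cond", "Cond"), ("instr", "Instr")]
flat = flatten_params(params, records)
```

In this example the two record-typed parameters unpack into six flattened parameters, which illustrates how grouping related parameters in records reduces the number of arguments a model has to declare.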

Runtime Evaluation
The runtime evaluation was done on an Apple Mac mini M2 Pro with 32 GB memory under macOS Ventura 13.4 using OpenJDK 64-Bit Server VM Temurin-17.0.6 and the newest version of the VADL tool. Table 5 shows the runtime of the source parsing and macro expansion passes in milliseconds, both on the original source specification and on the expanded source code that results from expanding all macros. Each specification was executed 3 times and the minimal time was selected. The variance between the 3 runs was very low. The parse pass scans the text and generates the CST. The expansion pass traverses the CST, performs the macro expansion and generates an expanded CST. Additionally, we applied the same approach to the already expanded specifications to have a direct comparison between parsing with and without macros. This gives an idea of how much time is actually taken up by the macro system. Table 5 shows that the macro expansion adds almost no additional runtime for specifications with sparse use of macros. More surprising are the runtime savings for AArch32. They can be explained by the increased I/O and parsing effort for the expanded specification, compared to the much faster in-memory expansion operations.

Reflection
In this section we give a brief overview of the interesting experiences we gained while investigating a macro system design for VADL.
To use VADL's existing IDE integration feature, syntax errors must be detected by the parser. While most definitions, e.g. models, did not pose a real problem, we soon realized that model instantiations and parameter uses are not safely parsable in LL(k) without some form of syntax type hints. These hints were essentially the syntax types of parameters or models written before an identifier, e.g. "$( )" or "$( )". While this helped in the context of grammar ambiguities, the identifiers could still fail in the later expansion pass, as they were never checked. Furthermore, we ran into similar issues when it came to parsing instantiation arguments, as they potentially allow a huge set of syntax rules. With the small extension to pred-LL(*) we were able not only to remove the unwanted syntax, but also to guide the parser in a much cleaner manner. Now the parser guarantees syntactic correctness even before the macro expansion, all while preserving the IDE functionalities.
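How a semantic predicate can guide such a parsing decision is sketched below in Python. The sketch is loosely modeled on the StringRule grammar fragment with its isMacroAction(Str) predicate, but the token format, the `symbols` table mapping parameter names to syntax types, and all function names are assumptions for illustration, not the VADL parser.

```python
def is_macro_action(tokens, pos, symbols, expected):
    """Semantic predicate: is the next token a `$name` placeholder whose
    declared syntax type equals `expected`?"""
    if pos >= len(tokens):
        return False
    tok = tokens[pos]
    return tok.startswith("$") and symbols.get(tok[1:]) == expected

def parse_string_rule(tokens, pos, symbols):
    """StringRule: MacroActionRule (if the predicate holds) | ConcreteStringRule.

    Returns (node, new_pos) or raises SyntaxError.
    """
    if is_macro_action(tokens, pos, symbols, "Str"):
        return ("macro_action", tokens[pos][1:]), pos + 1
    if pos < len(tokens):
        tok = tokens[pos]
        if len(tok) >= 2 and tok.startswith('"') and tok.endswith('"'):
            return ("string", tok[1:-1]), pos + 1
    raise SyntaxError(f"expected a string at token {pos}")
```

Because the predicate consults the symbol table, a placeholder of the wrong syntax type is rejected while parsing instead of surfacing later in the expansion pass, and no explicit type hint is needed in front of the identifier.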
Another interesting realization was that a macro system for PDLs does not necessarily require hygienic macros. Most of the time, capturing identifiers is a desired behavior. For the remaining cases we preferred the control over the names provided by our lexical macro functions.
In contrast to GPLs, VADL macros are mainly used for code generation instead of language extension. Lexical macros have proven to be error-prone, especially for larger specifications. Procedural macros are very powerful, but we have not yet discovered a use case where the resulting increase in complexity would pay off. This is why we found the pattern-based macro approach, extended with our higher-order macros, a sensible compromise between complexity and implementation effort for PDL specification.
Finally, we would like to make a few comments about the uses of our macro elements. The record element greatly improves the readability of our specifications, as it enables us to group related parameters together. The models were mostly used to express variance in instruction semantics. The match statement, together with the command line configurability, was mainly used to model the variance of processors.

Related Work
Most of the related work we found that fits our class of macro system, i.e. syntactical and pattern-based, focuses on syntactically rich GPLs rather than simple DSLs. Our work took inspiration from <bigwig> [7] and programmable syntax macros [26]. We tried to capture some of their core ideas and incorporate them into our higher-order, composable syntax types in the form of the presented VADL macro system.
One of the first approaches to bring syntax macros to syntactically rich languages, or even to languages outside the LISP family, is programmable syntax macros for C [26]. The pattern-based macro system by Weise and Crew uses an extended version of the C language as macro language. Similar to our approach, a macro is defined by a meta construct providing the resulting syntax type, typed parameters and a template body with placeholders. Weise and Crew's macro headers enable parameter parsing in the form of patterns, providing more syntactic freedom than our approach. However, this freedom may lead to unparsable macros if a pattern cannot be parsed deterministically, whereas our approach always guarantees deterministic parsing. Both approaches rely on a context-sensitive parser to manage macros. Weise and Crew's macro system does not support a type system as rich as ours (e.g. higher-order types, records, type aliases). Furthermore, it does not support alternatives in templates.
<bigwig> [7] is an extensible system for interactive web services. Similarities to our approach can be found in the usage of non-terminal grammar rules as syntax types and in the way their macros and our models are defined. However, our approaches differ in fundamental design decisions concerning the metamorphisms. Morphing new keywords into the host language's grammar would require costly rebuilds of our parser. As a result, we decided to increase the initial complexity by making our parser context-sensitive, but save ourselves the execution of a preprocessor. The <bigwig> macro system has a similar approach to our higher-order models by allowing metamorph rules as arguments. However, these rules are translated into grammar rules. They are not instantiated with syntax-typed arguments, but instead guide the parsing of the arguments. This allows for more syntactic freedom, for example arbitrary arity.
Similar to the previously mentioned techniques, ExJS [25], a macro system for JavaScript, also splits its macros into a pattern and a template part. Instead of metamorph nonterminals, it uses so-called phantom patterns to establish arbitrary-length repetition. Compared to ours, the template syntax types are rather poor, as it only supports expressions or statements. The biggest difference to our approach and the other approaches is the implementation. ExJS uses a first-stage parser to build a second-stage macro-aware parser to retrieve the AST. While similar implementations exist [4,7], ExJS further converts the AST to S-expressions, feeds them to a Scheme macro expander and converts the resulting S-expressions back to macro-free JavaScript.
The Honu [22] macro system follows a simpler approach when it comes to macro patterns and templates. Although the patterns are still syntactically more expressive than our simple parameter list, they follow a much simpler, more LISP-like convention. The main idea presented is the enforestation parsing step, which converts a flat stream of tokens into an S-expression-like tree. It allows for LISP-style extensibility while still providing enough syntactic freedom to support infix macro operators.
One of the newer macro systems, used in various fields ranging from DSLs to symbolic verification language extensions, is that of the Rust programming language [11,13,15,17]. Specifically, the pattern-based macro_rules! is of interest compared to our work. It shares the technique of specifying a pattern with typed parameters that is rewritten based on a template. What stands out is that it supports more general and meta types, e.g. token tree, pattern or item. Furthermore, the macros are invoked using their identifier and a trailing "!". The defined pattern is then provided inside parentheses. This is more closely related to our LISP-like macro invocation than the previously mentioned approaches. As with the previous macro systems, Rust does not support higher-order macros.

Conclusion and Future Work
We have designed and implemented a type-safe higher-order macro system, specifically designed for (computationally weak) architecture description languages. Our pattern-based syntax macros are pred-LL(*) parsable and require no preprocessor or complex host language. The implementation is based on a context-sensitive top-down parser and an iterative expansion algorithm with a termination guarantee. We demonstrated our approach using our work-in-progress specification language VADL. The evaluation of our macro system was conducted with the help of VADL-based ISA specifications for AArch64, AArch32, MIPS IV, RISCV and TriLen, our RISCV-like toy architecture.
Our macro system still has shortcomings that we want to address in future work. First, the VADL module management and import system currently runs after the macro expansion, which limits the scope of a macro definition to a single file. Second, we would like to increase the syntactical freedom, for example with variable arguments or arbitrary patterns, to see whether they work well with our type system. Third, we want to extend our work with some form of hygienic macros. Finally, we want to continue our exploration of design possibilities for ADLs using macros.

    : IsaDefs = {
      instruction $name : F = {
        X
      }
    }
Figure 2. Syntax Model Definition

    instruction set architecture A = {
      model ModelA() : IsaDefs = // ...

    model ALImmCondInstr(cond : Cond, instr : Instr) : IsaDefs = {
      instruction ExtendId($instr.id, $cond.str) : ArLoImm = {
        if ($cond.ex) then
          R(rd) := R(rn) $instr.op imm12
      }
      encoding ExtendId($instr.id, $cond.str) =
        { cc = cond::$cond.code, op = $instr.opcode, flags = 0 }
      assembly ExtendId($instr.id, $cond.str) =

    StringRule :
      $$isMacroAction(Str)$$ ?=> // sem-pred
        value = MacroActionRule
      | value = ConcreteStringRule
      ;
    ConcreteStringRule : // ...

Table 2. Expansion Statistics

Table 3. Macro Element Statistics

Table 4. Model Parameter Statistics

Table 5. Runtime of the Macro System in milliseconds