Create a Transpiler: From VBA to VB.NET

Create your own custom transpiler in a few simple steps.

Updated Sep. 03, 19 · Tutorial

In this tutorial, we show how to create a transpiler, a piece of software that can be useful in a few scenarios: you have a large codebase in a language that no longer fits your needs, or you want to keep developing in one language but need better performance or need to run in a specific environment.

The example transpiler will convert Visual Basic for Applications (VBA) to Visual Basic.NET (VB.NET) code. There are companies that need to keep using VBA because it is a simple language used by non-developers inside Excel. However, there could also be a need for increased performance or to mix calculations made by analysts with parts developed by professional developers. In such cases, individuals can transpile from VBA to VB.NET and let everything run in a single environment.

What Is a Transpiler

A transpiler is software that translates one language into another. It allows developers to preserve the investment made to create a codebase and use that codebase in another context.

In short, a transpiler is a special kind of compiler. A compiler usually transforms source code into assembly or bytecode (to be run on the JVM or CLR, for example). So, you might say that a compiler transforms source code into any language, even one that is not human-readable. A transpiler, on the other hand, converts source code to another language that is still human-readable. Then, a compiler transforms the new source code into an executable format.

Figure: Compiler vs. transpiler

Creating a transpiler seems like a daunting exercise, but in this article, we are going to see that it can be easily done. In this tutorial, although we are creating a transpiler that will translate VBA to VB.NET, the principles will apply to any transpiler.

How to Write a Transpiler

To build a transpiler, there are usually two or three steps that need to be performed:

1. Parsing the original source code.
2. Performing partial symbol resolution.
3. Converting the AST into the destination language.

You need step one to transform the original source into an Abstract Syntax Tree (AST), a logical representation of the source code that can be managed programmatically. Step three is always necessary, as it transforms the AST into the destination language. You only need to perform step two if, in order to transpile an instruction, you need to understand more than simple parsing can tell you.

Symbol Resolution in a Transpiler

Symbol resolution consists of linking a piece of text to the corresponding entity. For example, you might need to link all the times a variable name/identifier is used to that specific variable-entity.

Dim text As String
text = "Look at me!"

In this code, the identifier/name text refers both times to the same local variable.

Symbol resolution is absolutely necessary for a compiler. It is also usually necessary to perform it partially when you are building a transpiler. You do not need to perform any symbol resolution only when two languages are very similar — when you can translate each individual statement without looking up another statement. In this case, we can simply trust the final compiler that will check the correctness of the code we produce.

Why might you need to do that for a transpiler? Imagine the following piece of VBA code:

data(0)

Are you trying to access the element at index zero of the array data, or are you trying to call the function data with argument 0? That is not immediately obvious from this instruction because in VBA you use the same syntax for both operations. So, you have to keep track of the declaration of data and whether it is a variable or a function.

This might also be needed if you are transpiling from a language that allows top-level (free) functions (e.g., C++) to one that does not allow them (e.g., Java). You need to put all top-level functions in a custom class and then change all calls to top-level functions in the original C++ code into calls to methods of the custom Java class.

So, the kind and extent of symbol resolution that you have to do depends on both the source language and the destination language. This will vary from project to project.
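To make the data(0) example concrete, here is a minimal sketch of the bookkeeping involved. It is not part of the transpiler built in this article: SymbolKind, SymbolTable, and Classify are hypothetical names, and a real implementation would also have to deal with scopes and with declarations spread across several files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names: none of these types exist in the article's transpiler.
public enum SymbolKind { Array, Function }

public class SymbolTable
{
    // VBA identifiers are case-insensitive, so the lookup is too
    private readonly Dictionary<string, SymbolKind> symbols =
        new Dictionary<string, SymbolKind>(StringComparer.OrdinalIgnoreCase);

    // called whenever a declaration is found
    public void Declare(string name, SymbolKind kind) => symbols[name] = kind;

    // decides how an expression of the form name(args) must be read
    public string Classify(string name)
    {
        if (!symbols.TryGetValue(name, out var kind))
            return "unknown";

        return kind == SymbolKind.Array ? "array access" : "function call";
    }
}

public static class SymbolResolutionDemo
{
    public static void Main()
    {
        var table = new SymbolTable();
        table.Declare("data", SymbolKind.Array);        // e.g. Dim data(10) As Integer
        table.Declare("compute", SymbolKind.Function);  // e.g. Function compute(x)

        Console.WriteLine(table.Classify("data"));      // array access
        Console.WriteLine(table.Classify("Compute"));   // function call
        Console.WriteLine(table.Classify("other"));     // unknown
    }
}
```

The table is filled while visiting declarations and queried while visiting expressions, which is why this kind of work usually needs more than one traversal.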

Our Transpiler

In this article, we are going to build only a partial transpiler from VBA to VB.NET, but we are going to show the whole process that could be used for a complete transpiler.

Since the two languages are quite similar, we can also avoid doing a complete conversion of the original AST from VBA to VB.NET. Instead, we are going to rewrite only the parts of the input that must be adapted for VB.NET. This approach requires less work, but each change requires finding the exact place where to make it. In contrast, rewriting every instruction would take longer, but each individual rewrite would be easier to write.

In both cases, the process is done the same way: we traverse the AST and perform the operations we need.

Since we are transpiling VBA to VB.NET, we do not really care about making a distinction between array indexing and function calls because VB.NET behaves in the same way. So, we will not need to change or keep track of instructions like the previous example data(0). However, our transpiler will need to perform an operation that needs the same approach: two passes. This simply means that we have to traverse the AST two times to accomplish a transformation. Usually, the first pass is used to collect some information from multiple places, while the second pass uses that information to complete the transpilation.

The Structure of the Transpiler

The workflow of our transpiler is:

1. Parsing the original source with an ANTLR-based parser.
2. Making a first-pass of the code on the AST produced by ANTLR.
3. Taking the temporary and partially-transpiled code and parsing it with Roslyn.
4. Performing a second-pass to complete the transpilation with the Roslyn visitor.
5. Compiling the transpiled code with Roslyn.

Technically, step five is not part of the transpiler, but you will probably always want to compile the transpiled code, so we also perform this step. You generally want to do that to discover and report any error to the user. Hopefully, these errors are in the original source code and not due to your transpiler. You need this step because you are transpiling, not compiling, the original source code: you cannot know whether the original source code is actually correct.

Usually, you cannot easily programmatically compile the transpiled code and report any error because the compiler is a completely separate program. However, thanks to Roslyn, the compiler-as-a-service, we can do that quite easily and report the results to the user. This could also offer the chance to automatically fix errors in the source code, in case you wanted to do that.

We do something unusual with our transpiler. We make the second-pass on the AST produced by Roslyn instead of doing that on the AST produced by the first-pass (i.e., ANTLR). This, of course, requires you to parse the text twice, so it is not the most efficient process. However, we choose to do that because Roslyn has a very advanced parser for VB.NET. This makes some changes easier to do on the AST produced by Roslyn than on the one we have from the start.

The Main Program

The general structure of the transpiler is reflected by the main function of our program.

public static void Main(string[] args)
{
    string dir = $"{BasePath}/sampleCode/";

    Directory.CreateDirectory($"{BasePath}/transpilation_output/");

    SourceCode = new List<(string file, string text)>();

    // delete any previous transpilation artifact
    if (Directory.Exists($"{BasePath}/transpilation_output/transpiled_files/"))
        Directory.Delete($"{BasePath}/transpilation_output/transpiled_files/", true);

    List<Error> errors = new List<Error>();

    // parallel parsing
    Parallel.ForEach(Directory.EnumerateFiles(dir), basFile =>
    {
        // parsing and first-pass
        ParseFile(basFile, ref errors);
    });

    // parsing errors
    if (errors.Count > 0)
    {
        Console.WriteLine("The following errors were found: ");
        foreach (var error in errors)
        {
            Console.WriteLine(error.Msg);
        }
    }

    // second-pass transpilation
    TranspileCode(FixerListener.StructuresWithInitializer);

    // compilation of the transpiled files
    CompileCode(NameProgram);

    Console.WriteLine("=== Process of Transpilation completed ===");
}

After initialization and cleanup (lines three to 13), we parse all input files in parallel (lines 16 to 20) and list all errors we find (lines 23 to 30). Notice that we pass the list of errors (line 19) by reference so that we can modify it inside the function and get one list of errors.

We directly show the errors reported by the parser. We show parsing errors because we assume they are due to errors made by the user (i.e., our parser is not faulty). In other words, if we cannot parse the input, this means that the original source code is syntactically incorrect.

Inside the ParseFile function, we parse and also make the first-pass of transpilation. Usually, this means that we gather information necessary to transpile some instructions. However, in this case, we do that and also do a partial transpilation. We can do that here because, as we said, VBA and VB.NET are similar enough that we can change a VBA source code file piece-by-piece until it becomes a VB.NET source code file.

After we have finished that, we complete the transpilation (the call to TranspileCode) and compile the resulting transpiled files (the call to CompileCode).

Error!

Inside the function CompileCode, we also communicate to the user any compilation error. This is a bit tricky because the errors might be due to semantic errors in the original source code or mistakes/limitations of our transpiler. So, during development, we have to find a way to discriminate between types of errors and hopefully reduce limitations in our code to zero.

Furthermore, we have to make sure to communicate to the user all errors in relation to the original source code. For instance, imagine that the user tried to add a string and an int. This is a user error, and the compilation will fail. So, we have to communicate this to the user. However, the transpiled file seen by the compiler might have more lines than the original source file, so we have to keep track of which line in the original file caused the error for any affected instruction in the transpiled file.
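One possible way to keep track of the lines is to record, for every line written to the transpiled file, the original line that is responsible for it. The following is only a sketch under that assumption: LineMap, Emit, and ToOriginal are hypothetical names and do not come from the companion repository.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not taken from the companion repository.
public class LineMap
{
    // entry i holds the original line behind transpiled line i + 1
    private readonly List<int> originalLines = new List<int>();

    // record one emitted line and the original line that caused it
    public void Emit(int originalLine) => originalLines.Add(originalLine);

    // translate a 1-based line number of the transpiled file
    public int ToOriginal(int transpiledLine) => originalLines[transpiledLine - 1];
}

public static class LineMapDemo
{
    public static LineMap Build()
    {
        var map = new LineMap();
        map.Emit(1); // Imports System  (generated, attributed to original line 1)
        map.Emit(1); // Module Program  (generated, attributed to original line 1)
        map.Emit(1); // original line 1
        map.Emit(2); // original line 2
        map.Emit(3); // original line 3
        map.Emit(3); // End Module      (generated, attributed to original line 3)
        return map;
    }

    public static void Main()
    {
        // an error reported on transpiled line 4 belongs to original line 2
        Console.WriteLine(Build().ToOriginal(4)); // 2
    }
}
```

With such a map, an error that the compiler reports for a line of the transpiled file can be shown to the user with the line number of the VBA file.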

Parsing VBA Source Files

There is not much to say about parsing. We simply use our trusted ANTLR and a ready-to-use grammar to get a parser with just a few lines. We load the input from the file, pass it to the lexer, and then pass it to the parser (lines five to eight).

private static void ParseFile(string basFile, ref List<Error> errors)
{            
    Console.WriteLine($"Parsing {Path.GetFileName(basFile)}");

    ICharStream inputStream = CharStreams.fromPath(basFile);
    VBALexer lexer = new VBALexer(inputStream);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    VBAParser parser = new VBAParser(tokens);

    // we remove the standard error listeners and add our own
    parser.RemoveErrorListeners();
    VBAErrorListener errorListener = new VBAErrorListener(ref errors);
    parser.AddErrorListener(errorListener);

    lexer.RemoveErrorListeners();
    lexer.AddErrorListener(errorListener);

    var tree = parser.startRule();

    [..]
}

Readers of our website should be familiar with ANTLR. In case you need more information, you can read our tutorial. In the companion repository for this article, the parser is already generated, so you do not need to do anything special to take advantage of ANTLR. You might use it like any other library.

In the following lines (11 to 16), we replace the default error listener (which would output to the console) with our custom error listener. This listener will record all errors, which we will show later to the user.

Finally, on line 18, we take the root of the AST from the parser, so that we can start visiting the tree.

First-pass of Transpilation

    FixerListener listener = new FixerListener(tokens, Path.GetFileName(basFile));
    ParseTreeWalker.Default.Walk(listener, tree);

    if (listener.MainFile == true)
        NameProgram = listener.FileName;

    // we ensure that additions to the list are parallel-safe
    lock (Locker)
    {
        SourceCode.Add((file: basFile, text: listener.GetText()));
    }
}

We use a listener to perform the first-pass of our transpilation. We pass to the listener the stream of tokens (i.e., the output of the lexer) so that we can alter the input. The listener will also save the information needed for the second-pass.

In lines four to five, we check whether the current file is the one that contains the main function, the one that is called at the beginning of the program. We do that to use the name of the file as the name of the program (i.e., the executable produced by the compiler).

We add the code modified by the first-pass/listener to a list of source code representing our program. These source code files will be given to Roslyn for the second-pass and compilation. We add the code to the list inside a lock-protected section to avoid any issue, because the parsing of the input files happens in parallel. Alternatively, we could have saved the output of all partially transpiled files and then added them together to the list after the parallel parsing. Given that we are parsing only a few files and contention on the lock is unlikely, the lock is the simplest solution and its cost is negligible.
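As an illustration of the alternative to an explicit lock, the following sketch collects the results in a thread-safe collection from the .NET base class library. It is not what the article's code does: FirstPass is a hypothetical stand-in for the parsing and first-pass of one file.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelCollectDemo
{
    // hypothetical stand-in for parsing one file and running the first-pass
    private static string FirstPass(string file) => $"' transpiled from {file}";

    public static (string file, string text)[] Run(string[] files)
    {
        // ConcurrentBag accepts additions from several threads without a lock
        var results = new ConcurrentBag<(string file, string text)>();

        Parallel.ForEach(files, file => results.Add((file, FirstPass(file))));

        // the order inside a ConcurrentBag is unspecified, so we sort the output
        return results.OrderBy(r => r.file, StringComparer.Ordinal).ToArray();
    }

    public static void Main()
    {
        foreach (var result in Run(new[] { "b.bas", "a.bas", "c.bas" }))
            Console.WriteLine($"{result.file}: {result.text}");
    }
}
```

Both approaches give the same result; the lock keeps the original List, while the concurrent collection moves the synchronization into the library.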

The FixerListener Class

Now that the whole organization is clear, let’s see what the first-pass looks like. In other words, let’s see the FixerListener class that gathers information for the second-pass and performs part of the transpilation.

It is a listener, so remember that the walker will automatically call the proper Enter and Exit methods whenever it finds the corresponding node.

public class FixerListener : VBABaseListener
{
    private CommonTokenStream Stream { get; set; }
    private TokenStreamRewriter Rewriter { get; set; }
    private IToken TokenImport { get; set; } = null;        
    private Stack<string> Types { get; set; } = new Stack<string>();
    private Dictionary<string, StructureInitializer> InitStructures { get; set; } = new Dictionary<string, StructureInitializer>();

    public string FileName { get; private set; }
    public bool MainFile { get; set; } = false;

    public static List<String> StructuresWithInitializer = new List<string>();

    public FixerListener(CommonTokenStream stream, string fileName)
    {
        Stream = stream;
        FileName = fileName.LastIndexOf(".") != -1 ? fileName.Substring(0, fileName.LastIndexOf(".")) : fileName;
        Rewriter = new TokenStreamRewriter(stream);
    } 

    [..]

Our listener inherits from the base listener that is generated by ANTLR. Inheriting from this listener allows us to implement only the methods we need. The rest remain empty, as in the following example:

public virtual void ExitStartRule([NotNull] VBAParser.StartRuleContext context) { }

The first two lines of the class (lines three to four) define a CommonTokenStream and a TokenStreamRewriter. They are both needed to rewrite parts of the input. The variable TokenImport (line 5) holds the position of an Option statement that, if present, must be put before any other instruction in the transpiled file.

The lines 6-8 contain definitions for the transpilation of VBA Type elements. These are the elements that are transpiled to a VB.NET structure. We need a two-pass solution to transpile them.

The rest is self-explanatory. Inside the constructor, on line 17, we check whether the filename given to us ends with an extension. If so, we eliminate the extension and save the result as the FileName that the transpiled file will have.

The First Steps

Now we can start seeing the first operations of a transpilation. For reference, this is a simple file created by exporting some VBA code from Excel.

VERSION 1.0 CLASS
BEGIN
  MultiUse = -1  'True
END
Attribute VB_Name = "ThisWorkbook"
Attribute VB_GlobalNameSpace = False
Attribute VB_Creatable = False
Attribute VB_PredeclaredId = True
Attribute VB_Exposed = True

Option Explicit

Public Sub Main_Sub()
   Dim message As String   
End Sub

To transpile this code to VB.NET we want to eliminate the header (VERSION etc.) and the config parts (Attribute etc.), but we need to keep the Option statement.

// get the name of the file
public override void EnterAttributeStmt([NotNull] VBAParser.AttributeStmtContext context)
{            
    if (context.implicitCallStmt_InStmt().GetText() == "VB_Name")
    {
        FileName = context.literal()[0].GetText().Trim('"');
    }

    // remove all attributes
    Rewriter.Replace(context.Start, context.Stop, "");
}

// determine whether an option is present
public override void EnterModuleDeclarationsElement([NotNull] VBAParser.ModuleDeclarationsElementContext context)
{
    // check whether an option is present
    if (context?.moduleOption() != null && !context.moduleOption().IsEmpty)
    {
        TokenImport = context.moduleOption().Stop;                
    }
}

In the function EnterAttributeStmt, we first check whether the attribute VB_Name is present. If so, we use it to get the definitive name of the transpiled file. We then delete all attributes, including VB_Name on line 10. This is the same instruction that we also use to delete all header and config parts in the proper functions. These functions are not shown here, but you can see them in the companion repository.

In EnterModuleDeclarationsElement, we check whether any Option is present. If so, we save the position where the last Option ends. We do this because we have to add some Imports to the VB.NET files. Usually, the Imports are at the beginning of the file, but an Option statement must be put before anything else. So, if an Option is present, it must be put first.

Wrapping Everything With a Module

Before looking at some examples of how to transpile specific statements, let’s conclude the overall view. Most VBA code consists of loose functions and variables that do not belong to any container. This is not permissible in VB.NET, so we have to put everything inside a module. We do this at the end of our traversal of the tree. This way, we are sure that the elements we add here end up exactly where they should be and are not accidentally moved by some other change.

// wrap the VBA code with a Module
public override void ExitStartRule(VBAParser.StartRuleContext context)
{
    // if an option is present, it must come before everything else
    // therefore we have to check where to put the module start

    var baseText = $"Imports System {Environment.NewLine}{Environment.NewLine}Imports Microsoft.VisualBasic{Environment.NewLine}Imports System.Math{Environment.NewLine}Imports System.Linq{Environment.NewLine}Imports System.Collections.Generic{Environment.NewLine}{Environment.NewLine}Module {FileName}{Environment.NewLine}";

    if (TokenImport != null)
    {
        // add imports and module
        Rewriter.InsertAfter(TokenImport, $"{Environment.NewLine}{baseText}");
    }
    else
    {
        // add imports and module
        Rewriter.InsertBefore(context.Start, baseText);
    }

    Rewriter.InsertAfter(context.Stop.StopIndex, $"{Environment.NewLine}End Module");
}

On line seven, we gather all the imports that we need for our specific project. In this simple case, we just store them in a string. We also add the start of the module declaration.

Then we do what we anticipated in the previous section: if there is an Option, we put the imports after it; otherwise, we put the imports at the beginning of the file.

Finally, we put the End Module statement right at the end of the file.
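The placement rule can also be seen in isolation with a simplified sketch that works on plain lines of text instead of ANTLR tokens. ModuleWrapperDemo and Wrap are hypothetical names, and the header is shortened to a single Imports statement.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, text-based illustration; the article's code works on tokens.
public static class ModuleWrapperDemo
{
    public static string Wrap(string source, string moduleName)
    {
        List<string> lines = source.Replace("\r\n", "\n").Split('\n').ToList();

        // index of the last Option statement, or -1 if there is none
        int lastOption = lines.FindLastIndex(l =>
            l.TrimStart().StartsWith("Option ", StringComparison.OrdinalIgnoreCase));

        // the header goes right after the last Option, or at the top of the file
        lines.InsertRange(lastOption + 1, new[] { "Imports System", $"Module {moduleName}" });

        // the module is closed at the very end of the file
        lines.Add("End Module");

        return string.Join("\n", lines);
    }

    public static void Main()
    {
        Console.WriteLine(Wrap("Option Explicit\nSub Main()\nEnd Sub", "Program"));
    }
}
```

For the input used in Main, the Option Explicit line stays first, followed by the Imports, the Module line, the original Sub, and End Module.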

Mind the Distance

As we said, VBA and VB.NET are quite similar, but there are some differences that we have to consider. In this section, we see one of the common ones. We will see another one in the second-pass.

// the Erase statement works differently in VBA and VB.Net
// in VBA, it deletes the array and re-initializes it
// in VB.Net, it only deletes it
public override void EnterEraseStmt([NotNull] VBAParser.EraseStmtContext context)
{
    Rewriter.Replace(context.Start, context.Stop, $"Array.Clear({context.valueStmt()[0].GetText()}, 0, {context.valueStmt()[0].GetText()}.Length)");
}

One difference concerns the Erase statement. In VBA, this statement deletes and re-initializes the array. However, in VB.NET it basically assigns Nothing to the array. Therefore, after an Erase statement, we would have a variable that is invalid.

The solution that we use on line six is to call the Array.Clear procedure to re-initialize every element of the array. However, there is a problem: this works for fixed-size arrays, but it does not work for dynamic arrays. For these arrays, we could leave the Erase statement as it is to obtain the desired behavior.

The problem is that we have to understand whether we are dealing with fixed-size arrays or dynamic arrays. How can we do that? That would be a job for a two-pass solution. In the first-pass, we record the kind of array declared in any declaration, and in the second-pass, we transpile the Erase statements. We need two passes because there are global declarations that can be in one file and used in other files. However, we do not do any of this here to keep the article at a manageable length. Instead, we see a more complex example of a problem that requires a two-pass solution.
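Although the article does not implement it, the two-pass idea for Erase can be sketched on plain lines of text. This is a deliberately naive illustration: it uses regular expressions instead of a parse tree, it only recognizes Dim declarations, and EraseTwoPassDemo, CollectFixedArrays, and RewriteErase are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class EraseTwoPassDemo
{
    // Dim name(bounds): the bounds are empty for a dynamic array
    private static readonly Regex ArrayDecl =
        new Regex(@"^\s*Dim\s+(\w+)\(([^)]*)\)", RegexOptions.IgnoreCase);

    private static readonly Regex EraseStmt =
        new Regex(@"^(\s*)Erase\s+(\w+)\s*$", RegexOptions.IgnoreCase);

    // first pass: collect the names of the arrays declared with explicit bounds
    public static HashSet<string> CollectFixedArrays(IEnumerable<string> lines)
    {
        var fixedArrays = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var line in lines)
        {
            var match = ArrayDecl.Match(line);
            if (match.Success && match.Groups[2].Value.Trim().Length > 0)
                fixedArrays.Add(match.Groups[1].Value);
        }
        return fixedArrays;
    }

    // second pass: rewrite Erase only when it targets a fixed-size array
    public static List<string> RewriteErase(IEnumerable<string> lines, HashSet<string> fixedArrays)
    {
        return lines.Select(line =>
        {
            var match = EraseStmt.Match(line);
            if (!match.Success || !fixedArrays.Contains(match.Groups[2].Value))
                return line;

            string name = match.Groups[2].Value;
            return $"{match.Groups[1].Value}Array.Clear({name}, 0, {name}.Length)";
        }).ToList();
    }

    public static void Main()
    {
        var lines = new[] { "Dim a(1 To 10) As Integer", "Dim b() As Integer", "Erase a", "Erase b" };
        foreach (var line in RewriteErase(lines, CollectFixedArrays(lines)))
            Console.WriteLine(line);
    }
}
```

In a real transpiler the first pass would have to run over every file before the second pass starts, since a global array can be declared in one file and erased in another.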

Transpiling Types Into Structures

A Type in VBA is a custom data type that can be created by the user. Looking at the VB.NET documentation, we can see that a Structure is the corresponding element in VB.NET:

A structure is a generalization of the user-defined type (UDT) supported by previous versions of Visual Basic. In addition to fields, structures can expose properties, methods, and events. A structure can implement one or more interfaces, and you can declare individual access levels for each field.

Sounds good, right? There is a problem, though. Try to transpile the following Type.

Public Type Data
Num As Integer
Text(1 To 10) As String
End Type

Into a corresponding Structure.

Public Structure Data
Num As Integer
Text(1 To 10) As String
End Structure

You will get an error from the compiler:

Arrays declared as structure members cannot be declared with an initial size

So, we can only transpile the array field Text in this way.

Public Structure Data
Num As Integer
Text() As String
End Structure

The problem, of course, is that the two pieces of code are obviously quite different. So, how do we solve this issue? We have to transpile the array fields in the only way we can. Then, we have to initialize them in some way.

Changing a Type Into a Structure

We begin with changing a Type into a Structure.

// transform a Type in a Structure
public override void EnterTypeStmt(VBAParser.TypeStmtContext context)
{
    var typeName = context.ambiguousIdentifier().GetText();
    Types.Push(typeName);
    InitStructures.Add(typeName, new StructureInitializer(typeName));

    Rewriter.Replace(context.TYPE().Symbol, "Structure");
    Rewriter.Replace(context.END_TYPE().Symbol, "End Structure");

    string visibility = context.visibility()?.GetText() ?? "Public"; // a Type without a modifier is Public in VBA

    foreach(var st in context.typeStmt_Element())
    {
        Rewriter.InsertBefore(st.Start, $"{visibility} ");
    }
}

Inside the function EnterTypeStmt, we have to do a few things. First, we need to set the current type name and set up the element that will contain the initializer for the Structure (lines four to six). The current type name will be used when visiting the declarations of the individual type fields (line 13).

Then we change the keyword Type into the keyword Structure (lines 8 to 9).

Finally, in lines 11 to 16, we take the visibility of the original Type and apply it to every element of the Structure.

// you cannot initialize elements inside a Structure
// since VBA Type(s) are transformed in VB.NET Structure(s)
// we remove the initialization of array
public override void ExitTypeStmt_Element([NotNull] VBAParser.TypeStmt_ElementContext context)
{
    var currentType = Types.Peek();

    if (context.subscripts() != null && !context.subscripts().IsEmpty)
    {
        InitStructures[currentType].Add($"ReDim {context.ambiguousIdentifier().GetText()}({context.subscripts().GetText()})");

        StringBuilder commas = new StringBuilder();
        Enumerable.Range(0, context.subscripts().subscript().Length - 1).ToList().ForEach(x => commas.Append(","));
        Rewriter.Replace(context.subscripts().Start, context.subscripts().Stop, $"{commas.ToString()}");
    }
}

While we are checking every element of the type, we search for array declarations. If we find one, we copy the original array declaration from the VBA file (line 10) and save it for the initializer. Then, we replace the original declaration with one for a dynamic array with the same number of dimensions.
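The effect on a single field can be reproduced with plain strings. In this sketch, ArrayFieldDemo and SplitArrayField are hypothetical names, and the number of dimensions is simply counted from the commas in the subscripts, which is a simplification of what the parse tree provides.

```csharp
using System;

public static class ArrayFieldDemo
{
    // returns the declaration to leave in the Structure and the statement
    // to save for the initializer
    public static (string declaration, string initializer) SplitArrayField(string name, string subscripts)
    {
        int dimensions = subscripts.Split(',').Length;

        // a dynamic array with n dimensions is declared with n - 1 commas
        string commas = new string(',', dimensions - 1);

        return ($"{name}({commas})", $"ReDim {name}({subscripts})");
    }

    public static void Main()
    {
        var field = SplitArrayField("Text", "1 To 10");
        Console.WriteLine(field.declaration);  // Text()
        Console.WriteLine(field.initializer);  // ReDim Text(1 To 10)

        var grid = SplitArrayField("Grid", "1 To 3, 1 To 3");
        Console.WriteLine(grid.declaration);   // Grid(,)
    }
}
```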

What Do We Do With the Initializer?

Now, we still have to understand where and how to use the initializer.

// we add initialization Sub for the current Structure
public override void ExitTypeStmt([NotNull] VBAParser.TypeStmtContext context)
{
    var currentType = Types.Pop();

    if (InitStructures.ContainsKey(currentType) && InitStructures[currentType].Text.Length > 0)
    {
        Rewriter.InsertBefore(context.Stop, InitStructures[currentType].Text);

        StructuresWithInitializer.Add(currentType);
    }
    else
    {
        InitStructures.Remove(currentType);
    }
}

The first step is to write the code of the initializer just before the end of the Structure. Basically, we have now added a Sub called Init{name_of_Structure} with all the initialization code for the arrays. We have not shown it here, but the StructureInitializer class takes care of adding the beginning and end of the aforementioned Sub.

However, we still have to call the initializer whenever necessary. This will be done during the second-pass.
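Since the class that builds the initializer is not shown, here is a guess at what a minimal version could look like. It is only an assumption based on how the class is used above (a constructor taking the type name, an Add method, and a Text property that is empty when there is nothing to initialize); the actual class in the companion repository may differ, which is why the sketch has a different name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical reconstruction, not the class from the companion repository.
public class StructureInitializerSketch
{
    private readonly string name;
    private readonly List<string> statements = new List<string>();

    public StructureInitializerSketch(string name) => this.name = name;

    // stores one initialization statement, e.g. a ReDim
    public void Add(string statement) => statements.Add(statement);

    // the complete Sub, or an empty string if no statement was added
    public string Text
    {
        get
        {
            if (statements.Count == 0)
                return string.Empty;

            var builder = new StringBuilder();
            builder.Append($"Public Sub Init{name}()\n");
            foreach (var statement in statements)
                builder.Append($"    {statement}\n");
            builder.Append("End Sub\n");
            return builder.ToString();
        }
    }
}

public static class StructureInitializerDemo
{
    public static void Main()
    {
        var initializer = new StructureInitializerSketch("Data");
        initializer.Add("ReDim Text(1 To 10)");
        Console.Write(initializer.Text);
    }
}
```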

We Need an Entry Point

Now, we have to take care of the main Sub of our program. The problem is that a VB.NET program requires a Main procedure, so we must ensure that there is one.

// we search for the Main Sub
public override void EnterSubStmt([NotNull] VBAParser.SubStmtContext context)
{            
    if (context.ambiguousIdentifier().GetText().Trim() == "Main_Run" ||
        context.ambiguousIdentifier().GetText().Trim() == "Main_Sub" ||
        context.ambiguousIdentifier().GetText().Trim() == "Main")
    {
        MainFile = true;

        Rewriter.Replace(context.ambiguousIdentifier().Start, "Main");
        // Some functions of VB.NET are culture-aware:
        // for instance, when parsing a double from a string, they look for
        // the decimal separator of the current culture (e.g., ',' or '.').
        // So, we set a culture that ensures VB.NET uses '.' as decimal separator
        Rewriter.InsertBefore(context.block().Start, $"{Environment.NewLine}Dim sw As System.Diagnostics.Stopwatch = System.Diagnostics.Stopwatch.StartNew(){Environment.NewLine}");
        Rewriter.InsertBefore(context.block().Start, $"{Environment.NewLine}System.Globalization.CultureInfo.CurrentCulture = System.Globalization.CultureInfo.InvariantCulture{Environment.NewLine}");
        // make the program wait at the end                
        Rewriter.InsertBefore(context.block().Stop, $"{Environment.NewLine}Console.WriteLine(\"Press any key to exit the program\"){Environment.NewLine}Console.ReadKey(){Environment.NewLine}");
        Rewriter.InsertBefore(context.block().Stop, $"{Environment.NewLine}sw.Stop(){Environment.NewLine}Console.WriteLine($\"Time elapsed {{sw.Elapsed}}\"){Environment.NewLine}");
    }
}

The implementation of this function varies depending on your needs. Here, we show an example of what you can do. The main issue is that your initial VBA code might not have a proper Main procedure, but VB.NET needs one. In our example, we search for any Sub called Main_Run, Main_Sub or Main. Then, we ensure that the name of the Sub becomes Main. We also set the MainFile to true, so that we can use the name of this file as the name of the program.

The rest of the function is a bit arbitrary. We just do a bunch of things that we found useful. For instance, since VB.NET is culture-aware, it can change how it parses and outputs a decimal number. A number like 5.6 might be outputted as 5,6, depending on who executes the program. On line 16, we set the current culture as an InvariantCulture, so that we always see decimal numbers with the dot (.) as decimal separator. Whether you want to do this obviously depends on your needs.

We also set two calls to measure how much time it takes to complete the program (lines 15 and 19). Finally, we force the console to not close until we press a key (line 18).

Notice that since we use the method InsertBefore, the actual position in the transpiled files will be reversed. So, for instance, what we add on line 15 will appear after what we add on line 16.

As we said, these are mostly arbitrary additions, but they are a simple demonstration that you can add whatever you want while you are transpiling. In a real scenario, you might want to add memoization or perform some other optimizations to your code.
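The culture issue mentioned above is easy to reproduce in any .NET language. The sketch below builds a number format with a comma as decimal separator by hand, so that it does not depend on the cultures installed on the machine; CultureDemo and CommaFormat are hypothetical names.

```csharp
using System;
using System.Globalization;

public static class CultureDemo
{
    // a number format that uses ',' as decimal separator, as many cultures do
    public static NumberFormatInfo CommaFormat()
    {
        var format = (NumberFormatInfo)NumberFormatInfo.InvariantInfo.Clone();
        format.NumberDecimalSeparator = ",";
        format.NumberGroupSeparator = ".";
        return format;
    }

    public static void Main()
    {
        double value = 5.6;

        // the same number, formatted with two different conventions
        Console.WriteLine(value.ToString(CommaFormat()));                 // 5,6
        Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));  // 5.6
    }
}
```

Setting the current culture once at the start of Main, as the transpiler does, has the same effect as passing the invariant culture to every call.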

How We Can Get the Transpiled Code

The last method to see in the FixerListener class is the method that returns the modified text, i.e., the transpiled text.

// returns the changed text
public string GetText()
{
    return Rewriter.GetText();
}

This is very simple, thanks to the method GetText of the TokenStreamRewriter. Note that the rewriter does not actually change the input; it just records all the changes that you add to the rewriter, and then it plays them out when you ask for the rewriter's text. This is very handy if you need to transform the input.
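To see why recording changes and replaying them later is convenient, consider this toy rewriter. It is a hypothetical and much simpler stand-in for ANTLR's TokenStreamRewriter: it supports only single-token replacements and insertions, and it places a later insertion at the same position in front of the earlier ones, to mirror the reversed order described in the previous section.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Toy stand-in, not ANTLR's TokenStreamRewriter.
public class LazyRewriter
{
    private readonly IReadOnlyList<string> tokens;
    private readonly Dictionary<int, string> insertsBefore = new Dictionary<int, string>();
    private readonly Dictionary<int, string> replacements = new Dictionary<int, string>();

    public LazyRewriter(IReadOnlyList<string> tokens) => this.tokens = tokens;

    // a later insertion at the same index ends up before the earlier ones
    public void InsertBefore(int index, string text)
    {
        insertsBefore.TryGetValue(index, out var previous);
        insertsBefore[index] = text + (previous ?? string.Empty);
    }

    public void Replace(int index, string text) => replacements[index] = text;

    // the recorded changes are applied only now; the tokens are never modified
    public string GetText()
    {
        var builder = new StringBuilder();
        for (int i = 0; i < tokens.Count; i++)
        {
            if (insertsBefore.TryGetValue(i, out var inserted))
                builder.Append(inserted);

            builder.Append(replacements.TryGetValue(i, out var replaced) ? replaced : tokens[i]);
        }
        return builder.ToString();
    }
}

public static class LazyRewriterDemo
{
    public static void Main()
    {
        var tokens = new[] { "Type", " ", "Data" };
        var rewriter = new LazyRewriter(tokens);

        rewriter.Replace(0, "Structure");
        rewriter.InsertBefore(2, "A");
        rewriter.InsertBefore(2, "B");

        Console.WriteLine(rewriter.GetText()); // Structure BAData
        Console.WriteLine(tokens[0]);          // Type
    }
}
```

Because the original tokens stay untouched, every change can be expressed in terms of positions in the original input, no matter how many other changes have been recorded before it.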

The TranspileCode Method

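Since we have already seen the general structure of the program, we can go directly to the TranspileCode method. This is where we finish the transpilation. The method receives the list of structures that need an initializer, which was collected during the first-pass; this data will be used in the second-pass.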

private static void TranspileCode(List<String> structuresWithInitializer)
{
    List<SyntaxTree> vbTrees = new List<SyntaxTree>();
    VBARewriter rewriter = new VBARewriter(structuresWithInitializer);  

    Parallel.ForEach(SourceCode, sc => {
        Console.WriteLine($"Completing transpilation of {Path.GetFileName(sc.file)}");
        var tree = rewriter.Visit(VisualBasicSyntaxTree.ParseText(sc.text).GetCompilationUnitRoot()).SyntaxTree.WithFilePath(sc.file);
        lock (Locker) { vbTrees.Add(tree); } // List<T>.Add is not thread-safe
    });           

    vbTrees.Add(VisualBasicSyntaxTree.ParseText(File.ReadAllText($"{BasePath}/Libs/Runtime.vb")).WithFilePath($"{BasePath}/Libs/Runtime.vb"));

    // create the necessary directories            
    Directory.CreateDirectory($"{BasePath}/transpilation_output/transpiled_files/");    

    foreach (var vt in vbTrees)
    {
        string fileName = Path.GetFileName(vt.FilePath);

        if (fileName.LastIndexOf(".") != -1)
            fileName = fileName.Substring(0, fileName.LastIndexOf("."));

        fileName = fileName + ".vb";

        Console.WriteLine($"Writing on disk VB.NET version of {Path.GetFileName(vt.FilePath)}");
        File.WriteAllText($"{BasePath}/transpilation_output/transpiled_files/{fileName}", vt.ToString());
    }
}

What happens in the method is quite simple. We parse the source code as changed by the FixerListener with the Roslyn parser, let the VBARewriter change the resulting tree again, and add the final tree to a list (lines 6-10). The VBARewriter is what performs the second-pass of our transpilation process.

After that, we add the Runtime (line 12), which contains any additional code that we need. For instance, it may contain Excel methods or any other library function that you use in your code. If you can run the transpiled program on the .NET Framework, this will save you some time. That is because the .NET Framework already contains some utility functions, like Strings.Right, that are not part of .NET Core.

Finally, we write the final transpiled files onto the disk with the correct extension. We will compile them later.
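
The file name handling in the loop can also be expressed with the standard Path helpers. The following is only a minimal sketch: the class OutputNames and the method ToVbFileName are hypothetical names that are not part of the original project.

```csharp
using System;
using System.IO;

public static class OutputNames
{
    // Hypothetical helper: maps the path of a source file
    // to the name of the transpiled VB.NET file.
    public static string ToVbFileName(string sourcePath)
    {
        // GetFileName drops the directory part.
        // ChangeExtension replaces the last extension,
        // or appends one if the name has none.
        return Path.ChangeExtension(Path.GetFileName(sourcePath), ".vb");
    }
}
```

Like the original code, this only touches the text after the last dot, so a name with more than one dot keeps everything before the last one.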

How to Initialize a Variable

Most of the code in our VBARewriter is needed for the proper initialization of variables. Despite the difference in terminology between ANTLR and Roslyn, a rewriter is like a visitor. Actually, there is a class in Roslyn called VisualBasicSyntaxVisitor. The difference is that that class cannot alter the nodes it visits, while a rewriter can. Since this is a visitor, you can govern the path of the visit. For instance, you can stop the visit while visiting any node.

The overall strategy is quite simple:

We have to find all declarations of variables whose type is one of the Structures that require an initializer.
We have to add a call to the initializer after every such declaration.

The execution is a bit complicated because there are two cases: the variable can be at the module level or inside a method. If the variable is declared inside a method, we can simply add an initializer after the declaration. However, if the variable is declared at the module level we have to add a module constructor (a Sub New()) and add the initializer there. That is because the variable could be used anywhere else in the program. So, we cannot simply add an initializer before the first use of the variable.

Adding an Initializer for a Module Variable

As we said, the method to initialize a variable declared directly inside a module is the most complicated. So, we are going to see that one.

Now, we are going to see the first part, which collects the variables to initialize and creates the statements that initialize them.

public override SyntaxNode VisitModuleBlock(ModuleBlockSyntax node)
{            
    var initInvocations = new SyntaxList<StatementSyntax>();

    var space = SyntaxFactory.SyntaxTrivia(SyntaxKind.WhitespaceTrivia, " ");

    var newline = SyntaxFactory.SyntaxTrivia(SyntaxKind.EndOfLineTrivia, Environment.NewLine);            

    for (int a = 0; a < node.Members.Count; a++)
    {                
        if (node.Members[a].IsKind(SyntaxKind.FieldDeclaration))
        {                                     
            foreach (var d in (node.Members[a] as FieldDeclarationSyntax).Declarators)
            {                        
                if (StructuresWithInitializer.Contains(
        d?.AsClause?.Type().WithoutTrivia().ToString()))
                {                            
                    foreach (var name in d.Names)
                    {
                        if (name.ArrayBounds != null)
                        {
                            initInvocations = initInvocations.Add(CreateInitializer(name.Identifier.ToFullString().Trim(), d.AsClause.Type().WithoutTrivia().ToString(), true).WithTrailingTrivia(newline));
                        }
                        else
                        {
                            initInvocations = initInvocations.Add(CreateInitializer(name.Identifier.ToFullString().Trim(), d.AsClause.Type().WithoutTrivia().ToString(), false).WithTrailingTrivia(newline));
                        }                                
                    }
                }
            }
        }
    }

    [..]

Logically, the code is quite simple: we search for declarations of variables and check their type. If the type of a variable is a structure that requires an initializer, we create a statement to initialize it. The statements are stored in the list initInvocations. We do not add them to the module right away.

The actual code seems a bit more complicated because of the Roslyn syntax. Roslyn is very useful and powerful, but it makes editing source code difficult. That is because, by default, everything is immutable. It is not hard to learn the terminology of the different parts, but it requires a bit of time. However, the code is easy to understand when you read it in context.

For example, everything that has trivia in its name refers to parts that are unnecessary for the code, like whitespace or end-of-line characters. So, the code on line 16 means that we want the name of the type without any leading or trailing trivia (WithoutTrivia), such as spaces.
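
To make the idea of trivia more concrete, here is a small sketch that parses a type name surrounded by spaces. It assumes that the Microsoft.CodeAnalysis.VisualBasic package is referenced. The class TriviaDemo and its methods are hypothetical names used only for illustration.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.VisualBasic;

public static class TriviaDemo
{
    // Returns the text of the type name with its trivia removed.
    public static string NameOnly(string typeText)
    {
        var type = SyntaxFactory.ParseTypeName(typeText);

        // WithoutTrivia returns a new node: the original is immutable.
        return type.WithoutTrivia().ToString();
    }

    // Returns the text of the type name including its trivia.
    public static string NameWithTrivia(string typeText)
    {
        // ToFullString keeps leading and trailing trivia,
        // while ToString leaves them out.
        return SyntaxFactory.ParseTypeName(typeText).ToFullString();
    }
}
```

Calling NameOnly("  Data  ") gives back just the identifier, which is the form that can be compared with the names stored in StructuresWithInitializer.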

In case you need a deeper introduction to Roslyn, you can see our article on Getting Started with Roslyn.

Initializing an Array of Structures

So, in the end, the only unclear part of this piece of code might be on lines 20-27. We distinguish between the case of an array variable and the case of a simple variable. Then, we call the function CreateInitializer with the proper arguments. The reason becomes obvious once you see the transpiled code.

Public Structure Data
    Public Num As Integer
    Public Text() As String
    Sub InitData()
        ReDim Text(0 To 10)
    End Sub
End Structure

Public Example As Data
Public Examples(10) As Data

We obviously cannot call the method InitData directly on Examples because it is an array. Instead, we can call it only on its individual elements.

We are going to see the function CreateInitializer later. Now, we are going to see the rest of the method, that is how to create a constructor for the module.

The Module Constructor

The constructor for the module will contain all the initializations for the variables.

if (initInvocations.Count > 0)
    {           
        var subStart = SyntaxFactory.SubStatement(SyntaxFactory.Identifier("New()").WithLeadingTrivia(space)).WithLeadingTrivia(newline).WithTrailingTrivia(newline);                
        var subEnd = SyntaxFactory.EndSubStatement(
            SyntaxFactory.Token(SyntaxKind.EndKeyword, "End ").WithLeadingTrivia(newline), SyntaxFactory.Token(SyntaxKind.SubKeyword, "Sub")).WithTrailingTrivia(newline);

        var moduleConstructor = SyntaxFactory.SubBlock(subStart, initInvocations, subEnd);

        node = node.WithMembers(node.Members.Add(moduleConstructor));
    }

    return base.VisitModuleBlock(node);
}

We create the delimiting instructions separately for the constructor (i.e., Sub New() and End Sub). Then we add to the block of the Sub the statements for the initializations. We create the complete constructor on line 7.

The statements on lines 3 to 4 might seem a bit confusing, but they are actually quite simple. You can see this more clearly with special formatting.

SyntaxFactory.SubStatement(
SyntaxFactory.Identifier("New()").WithLeadingTrivia(space)
)
.WithLeadingTrivia(newline)
.WithTrailingTrivia(newline);

We are creating a Sub statement with the identifier New() and a space before the identifier, which separates it from the Sub keyword. Then, we add a newline both before and after the Sub statement.

Using the identifier to also add the empty parameter list might not be the cleanest way to do it. But it works, and the proper way is quite long.

The proper method to add a list of parameters
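
As a rough idea of what a version built from dedicated nodes can look like, here is a sketch. It is an alternative built on assumptions, not the code used in the project: it relies on the Roslyn factory methods SubNewStatement, ParameterList and ConstructorBlock instead of SubStatement, and the class ModuleConstructorSketch is a hypothetical name.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.VisualBasic;
using Microsoft.CodeAnalysis.VisualBasic.Syntax;

public static class ModuleConstructorSketch
{
    public static ConstructorBlockSyntax Create(SyntaxList<StatementSyntax> initInvocations)
    {
        // "Sub New" followed by a real, empty parameter list node.
        var subNew = SyntaxFactory.SubNewStatement()
                                  .WithParameterList(SyntaxFactory.ParameterList());

        // ConstructorBlock supplies the closing "End Sub" itself.
        // NormalizeWhitespace adds the spaces and line breaks inside the block.
        // The explicit line breaks around the block separate it from
        // the members that come before and after it in the module.
        return SyntaxFactory.ConstructorBlock(subNew, initInvocations)
                            .NormalizeWhitespace()
                            .WithLeadingTrivia(SyntaxFactory.CarriageReturnLineFeed)
                            .WithTrailingTrivia(SyntaxFactory.CarriageReturnLineFeed);
    }
}
```

The result is a statement node, so it could be added to the Members of the module in the same way as the block created with SubBlock.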

Finally (line 9) we add the module constructor to the current module. Given that in Roslyn everything is immutable, we actually create a new module. We take the current module and add the module constructor to its Members field. Then, we assign the resulting new module to the current module.

Since we want to continue the visit, we call the method of the base class (line 12).

The CreateInitializer Method

The method that actually creates the initializer is quite short.

private StatementSyntax CreateInitializer(string identifier, string type, bool isArray)
{
    if(isArray)
    {                
        return SyntaxFactory.ParseExecutableStatement($"For Index = 0 To {identifier}.Length - 1{Environment.NewLine}" +
            $"Call {identifier}(Index).Init{type}(){Environment.NewLine}" +
            $"Next{Environment.NewLine}");
    }
    else
    {
        return SyntaxFactory.ParseExecutableStatement($"Call {identifier}.Init{type}()");
    }
}

To initialize a simple field, we just need to call the Init{name_of_Structure} method on the Structure (line 11). To initialize an array, instead, we have to walk through all the elements of the array and call the Init{name_of_Structure} method on each of them.

Our example just works on arrays with one dimension (e.g., Data(10)). However, it is not hard to generalize the method to arrays with any number of dimensions. We would just have to add a nested For loop for each dimension of the array.
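
The generalization can be sketched as a function that builds the text of the nested loops. This is only an illustration under a few assumptions: the class InitializerText, the method ForArray and the rank parameter are hypothetical, and the loops use GetUpperBound for each dimension instead of the Length - 1 used in CreateInitializer.

```csharp
using System;
using System.Linq;
using System.Text;

public static class InitializerText
{
    // Builds the VB.NET text that initializes every element
    // of an array of structures with the given number of dimensions.
    public static string ForArray(string identifier, string type, int rank)
    {
        if (rank < 1)
            throw new ArgumentOutOfRangeException(nameof(rank));

        var text = new StringBuilder();

        // One For loop per dimension, from the outermost to the innermost.
        for (int d = 0; d < rank; d++)
            text.AppendLine($"For Index{d} = 0 To {identifier}.GetUpperBound({d})");

        // The call uses one index per dimension, e.g. Examples(Index0, Index1).
        var indices = string.Join(", ", Enumerable.Range(0, rank).Select(d => $"Index{d}"));
        text.AppendLine($"Call {identifier}({indices}).Init{type}()");

        // Close every loop that was opened.
        for (int d = 0; d < rank; d++)
            text.AppendLine("Next");

        return text.ToString();
    }
}
```

The returned text is meant to be passed to SyntaxFactory.ParseExecutableStatement, as CreateInitializer does for the one-dimensional case.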

Fixing a Subtle Difference

Now, we are going to solve another issue due to a difference between VBA and VB.NET. In VBA, you can start an array from the index one, while in VB.NET, you cannot. All arrays must start at index zero. All we have to do to solve this issue is check whether we have a fixed-size array that is initialized in a manner similar to the following example.

Dim strings(1 To 10) As String

If that is the case, we check whether the first number is a literal that corresponds to 1. In case that is true, we change it to zero.

The VBARewriter class also contains the method VisitArgumentList, which fixes this problem. We fix it here, instead of during the first pass, because an array can also be declared as a field of a Type. If we solved this issue during the first pass, we could leave some Type fields incorrect. That is because we copy the original declaration of any array field of a Type into its initializer without fixing it, so the initializer could still contain an array that starts from one.

For example, something like the following could happen.

'In VBA was originally
Public Nope(1 To 10) As String
Public Type Data
    Public Text(1 To 10) As String
End Type

'In VB.NET became
'This is fixed during the first-pass
Public Nope(0 To 10) As String
Public Structure Data
    Public Text() As String
    Sub InitData()
       'We have yet to fix this statement
        ReDim Text(1 To 10)
    End Sub
End Structure

All we have to do is to find any 1 as the first element in a range used inside an array declaration. Since the code is quite simple, you can see this method in the repository.
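
To give an idea of the shape of such a fix, here is a sketch of a separate rewriter. It is not the VisitArgumentList method from the repository: it overrides VisitRangeArgument instead, and the names LowerBoundRewriter and Fix are hypothetical. It assumes that the Microsoft.CodeAnalysis.VisualBasic package is referenced.

```csharp
using System;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.VisualBasic;
using Microsoft.CodeAnalysis.VisualBasic.Syntax;

public class LowerBoundRewriter : VisualBasicSyntaxRewriter
{
    // A range argument is the "1 To 10" part of an array declaration or ReDim.
    public override SyntaxNode VisitRangeArgument(RangeArgumentSyntax node)
    {
        // We only act when the lower bound is the numeric literal 1.
        if (node.LowerBound is LiteralExpressionSyntax literal &&
            literal.IsKind(SyntaxKind.NumericLiteralExpression) &&
            literal.Token.ValueText == "1")
        {
            // Nodes are immutable, so we build a new literal 0
            // and keep the whitespace that surrounded the old one.
            var zero = SyntaxFactory.LiteralExpression(
                           SyntaxKind.NumericLiteralExpression,
                           SyntaxFactory.Literal(0))
                       .WithTriviaFrom(literal);

            node = node.WithLowerBound(zero);
        }

        return base.VisitRangeArgument(node);
    }

    // Convenience method: parses the source, applies the fix
    // and returns the changed source text.
    public static string Fix(string source)
    {
        var root = VisualBasicSyntaxTree.ParseText(source).GetRoot();
        return new LowerBoundRewriter().Visit(root).ToFullString();
    }
}
```

With this sketch, a statement like ReDim Text(1 To 10) would be returned as ReDim Text(0 To 10), while ranges that already start from zero are left untouched.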

Compiling the Transpiled Code

Roslyn might make editing the original code a bit hard, but it makes compiling it very easy. We can see it by looking at the method CompileCode.

private static void CompileCode(string projectName)
{   
    List<SyntaxTree> vbTrees = new List<SyntaxTree>();

    var files = Directory.EnumerateFiles($"{BasePath}\\transpilation_output\\transpiled_files");

    foreach(var f in files)
    {
        vbTrees.Add(VisualBasicSyntaxTree.ParseText(File.ReadAllText(f)).WithFilePath(f));
    }

    // gathering the assemblies
    HashSet<MetadataReference> references = new HashSet<MetadataReference>{
        // load essential libraries
        MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")).Location),
        MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")).Location),
        MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("Microsoft.VisualBasic, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a")).Location),
        MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("System, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")).Location),
        // load the assemblies needed for the runtime
         MetadataReference.CreateFromFile(Assembly.Load(new AssemblyName("System.Data, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")).Location),
        MetadataReference.CreateFromFile($"{BasePath}\\Dlls\\System.Data.SQLite.dll")
    };                     

    var options = new VisualBasicCompilationOptions(outputKind: OutputKind.ConsoleApplication, optimizationLevel: OptimizationLevel.Release, platform: Platform.X64, optionInfer: true, optionStrict: OptionStrict.Off, optionExplicit: true,
                concurrentBuild: true, checkOverflow: false, deterministic: true, rootNamespace: projectName, parseOptions: VisualBasicParseOptions.Default);

    // compilation
    var compilation = VisualBasicCompilation.Create(projectName,
                         vbTrees,
                         references,
                         options
                        );   

    Directory.CreateDirectory($"{BasePath}/transpilation_output/compilation/");

    var emit = compilation.Emit($"{BasePath}/transpilation_output/compilation/{projectName}.exe");

The steps are easy. We gather the source and assemblies, and then we compile the code. To find the complete name of the assemblies, we simply had to look manually in an existing project.

The constructor for the options variable (line 24) is more complicated than necessary. You do not have to list all the options we picked. The only required one is the first: outputKind. However, when you programmatically compile the output of a transpiler, you often want to ensure that all options are set the way you want them. Do this to avoid any false positives or unexpected changes in the results. They are all self-explanatory options, except for deterministic, which asks the compiler to produce identical output whenever it is given identical input.

The variable emit, assigned on the last line of the listing, contains the results of the compilation. The second part of the method deals with them.

    if (!emit.Success)
    {
        Console.WriteLine("Compilation unsuccessful");
        Console.WriteLine("The following errors were found:");

        foreach (var d in emit.Diagnostics)
        {                    
            if (d.Severity == DiagnosticSeverity.Error)
            {
                Console.WriteLine(d.GetMessage());
            }
        }

        // we also write the errors in a file
        using (StreamWriter errors = new StreamWriter($"{BasePath}/transpilation_output/compilation/errors.txt"))
        {
            foreach (var d in emit.Diagnostics)
            {
                if (d.Severity == DiagnosticSeverity.Error)
                {
                    errors.WriteLine($"{{{Path.GetFileName(d.Location?.SourceTree?.FilePath)}}} {d.Location.GetLineSpan().StartLinePosition} {d.GetMessage()}");
                }
            }
        }
    }
    else
    {
        Directory.CreateDirectory($"{BasePath}/transpilation_output/compilation/x86");
        Directory.CreateDirectory($"{BasePath}/transpilation_output/compilation/x64");

        // we have to copy the Dlls for the runtime
        foreach (var libFile in Directory.EnumerateFiles($"{BasePath}\\Dlls\\"))
        {
            File.Copy(libFile, $"{BasePath}\\transpilation_output\\compilation\\{Path.GetFileName(libFile)}", true);
        }
        foreach (var libFile in Directory.EnumerateFiles($"{BasePath}\\Dlls\\x86\\"))
        {
            File.Copy(libFile, $"{BasePath}\\transpilation_output\\compilation\\x86\\{Path.GetFileName(libFile)}", true);
        }
        foreach (var libFile in Directory.EnumerateFiles($"{BasePath}\\Dlls\\x64\\"))
        {
            File.Copy(libFile, $"{BasePath}\\transpilation_output\\compilation\\x64\\{Path.GetFileName(libFile)}", true);
        } 

        // we copy a SQLite Db           
        File.Copy($"{BasePath}\\Data\\data.db", $"{BasePath}\\transpilation_output\\compilation\\{Path.GetFileName($"{BasePath}\\Data\\data.db")}", true);           
    }
}

If the compilation fails, we have to show errors to the user. We also save them to a file for logging purposes. In a real application, we would send them to a logging system that would help us understand when our transpiler fails. Collecting errors could also be useful to perform automatic fixes in case the users themselves made some common mistake.

If the compilation succeeds, the output directory contains the finished executable. However, we might have to add any required DLL file manually. For instance, if all our code uses a specific library, we can add it this way. This is needed for our Runtime code. That is also the reason why we copy a model SQLite database.

The Runtime

The Runtime we need is quite simple. Obviously, your case might be different.

Imports System
Imports System.Data.SQLite
Imports Microsoft.VisualBasic

Public Module Runtime
    Public db As SQLiteConnection = New SQLiteConnection()

    Public Sub OpenSqlite()
        db.ConnectionString = "Data Source=data.db;Version=3;"
        db.Open()
    End Sub

    Public Sub CloseSqlite()
        db.Close()
    End Sub

    Public Sub SaveData(message As String)
        OpenSqlite()

        ' Parameters keep quotes inside the message from breaking the statement
        Dim stmt As New SQLiteCommand("INSERT INTO Messages VALUES (@time, @message)", db)
        stmt.Parameters.AddWithValue("@time", DateTime.Now.ToString())
        stmt.Parameters.AddWithValue("@message", message)
        stmt.ExecuteNonQuery()

        CloseSqlite()
    End Sub
End Module

Public Class Debug
    Public Shared Sub Print(message As String)
        Console.WriteLine(message)
    End Sub

    Public Shared Sub Print(format As String, ByVal ParamArray args() As Object)
        Dim arg As String = format
        For i As Integer = 0 To UBound(args, 1)
            arg = arg & " " & args(i)
        Next i

        Console.WriteLine(arg)
    End Sub
End Class

We add a couple of methods to the runtime to save data and to print to the console. The Debug class is usually available in VBA programs, while SaveData is a simple custom method that we have added to our runtime.

In the repository, we have also included a simple example VBA file.


Published at DZone with permission of Gabriele Tomassetti, DZone MVB. See the original article here.
