tech

[English] [Japanese]

X. The Importance of Self-Hosting and Test Design in Language Development Quality Control

If you are looking at this document, I think most of you are coding in a "programming language" every day. And there are probably many people who are busy with bug squashing work that someone else has put in place.

The "programming language" that we are indebted to on a daily basis is also software created by someone else, so quality control is important to ensure that no bugs occur.

Maybe I'm lucky (?), but I've never encountered a language bug in my work with programming languages.

There are language standards that seem like bugs. . .

If you encounter a programming language bug, I think it's one of the most annoying software bugs to deal with.

It's difficult to get to the point that it's a programming language bug in the first place.

Here, I would like to introduce the test method for the original language that I am developing.

Programming languages can be classified into the following two types.

  • Compiler type
  • interpreted

Since the language I'm developing is compiled, I'll be talking about compiler-style tests from now on.

By the way, I will introduce LuneScript, the original language I am developing, in the next article.

https://qiita.com/dwarfJP/items/21d4d4099ab0feb68eaf

In the future, I would appreciate it if you could help me with the test design examination of those who are thinking about developing their own language.

the compiler is a function

The job of a compiled programming language is to convert the code written in that programming language into machine language.

for example,

  • C language compiler converts to native code
  • Java's compiler translates to JVM code
  • C# translates to CIL
  • Clang converts to LLVM-IR, LLVM converts to various codes

In other words, you can think of a compiler as "one big function that takes an input and returns an output based on that input."

If you think of the compiler as a function, testing it is very simple. You can test it by giving various inputs and comparing the output with the expected value.

The concept is exactly the same as the UNIT TEST of the functions that we create on a daily basis.

Compiler test

In our own language, LuneScript, we perform the following tests:

  • Self-hosted build of LuneScript itself
  • Normal form of all syntaxes supported by the language
  • Anomalies in all syntaxes supported by the language

Self-hosting is especially important here.

By self-hosting, even if you do not write test code, your own code will be the test code as it is.

Writing meaningful test code at a certain scale is quite a painstaking task.

Especially in the case of a proprietary language that no one else uses, it is not easy to find proper working code on github instead of test code. Test code at scale is priceless.

When you're self-hosting, your own code becomes that precious test code.

However, even if you say that your own code becomes test code as it is, that alone does not make it a sufficient test. The syntax and design patterns used are biased, and in terms of comprehensiveness, the test is not good enough. Also, abnormal code that causes compilation errors cannot be placed in your own self-hosted code. Therefore, the self-hosted code is not enough as a test case, and a test to comprehensively check the normal system and an abnormal system test to detect compilation errors are required separately.

By preparing expected values in advance for the normal and abnormal tests, you can check the success or failure of the test.

On the other hand, how should I judge "whether the result of compiling my own code is correct"? is the problem.

Since test case code is generally immutable, its compilation results are also immutable. In other words, once you create a test case/expectation pair, you can keep using the same expectation as long as you don't change the test case.

On the other hand, your own self-hosted code will naturally change. In other words, the expected value is always changing, so it is impossible to prepare the expected value in advance.

Then, how to judge whether the compilation result of your self-hosted code is correct or not is judged by whether or not the following holds true.

「使用中のコンパイラでのテストケースの結果」 == 「新しくコンパイルしたコンパイラでのテストケースの結果」

Assuming your compiler is working correctly, this is the result of a test case run with your working compiler that is working correctly, and the result of a test case running with a newly compiled compiler. The logic is that if are identical, then the newly compiled compiler is also correct.

Additionally, it is compiling itself again with a newly compiled compiler. This is done to ensure that the output results are exactly the same when compiling the same code.

In summary, the LuneScript test does the following:

step1
Compile your self-hosted code using your current compiler A to generate compiler B
step2
Compile your code again using compiler B to generate compiler C
step3
Compile your code again using compiler C to generate compiler D
step4
Ensure Compiler C and Compiler D are Identical
step5
Executes normal and abnormal tests of compiler A and saves test results in result A
step6
Executes normal and abnormal tests of compiler D and saves the test results in result D
step7
Check that result A and result D are identical

After passing the above test, use compiler D as the latest compiler A from the next time. In addition, normal and abnormal tests of the extended language specification will be added as needed.

In the case of self-hosting, if there is a problem, it will not be possible to compile itself, and there is a possibility that development will not proceed. By doing this testing, you can be sure that your newly built compiler will work correctly, and you can safely proceed with language enhancements.

In the case of proprietary languages, I think the timing of transition to self-hosting is extremely important.

The larger the code size of the compiler, the longer the time it takes to port. We recommend that you migrate.

still the bug remains

I introduced the tests that are being conducted in the original language development, but unfortunately bugs remain even after testing.

The cause of the bug can be classified into the following two.

  • Cases in which an abnormal system cannot be detected
  • A case where it does not work even though it should work normally

Among the above two cases, there are overwhelmingly many cases in which abnormal systems cannot be detected.

For the normal path, it is sufficient to write code that conforms to the language specifications and check that it works, but for the abnormal path, it is necessary to write code that deviates from the language specifications and detect errors.

It is quite difficult to "deviate from the language specification", and there are many holes.

Rather than seeking a perfect test from the beginning, it is important in testing to find such a "hole", add test cases to close it, and respond so that when the same "hole" opens again, it can be detected. I think.

lastly

I believe that the reason why we have been able to continue developing our own language is that we have been able to proceed with the following test policy.

  • An early move to self-hosting

    • Moving to self-hosting will inevitably require a certain level of quality assurance

      • Self-hosting is hindered if the quality is poor, so the quality is maintained naturally.
  • Don't aim for 100% testing from the start

    • The goal is to develop your own language, not to develop tests.
    • You can concentrate on your own language.
  • Simultaneous expansion of language specifications and expansion of test code

    • Prevent omissions and degradation of tests
  • Tested at the compiler input/output level, not at the function level

    • Function-level tests require test case changes every time the design changes, but compiler input/output level tests do not require test case changes unless the compiler specifications change.

When developing your own language, I think it's more efficient to first aim for self-hosting in terms of ensuring quality.

that's all.