Using tools to auto-generate code from high-level models or specifications looks like a cool idea. Though generating code instead of hand-writing it from scratch has many advantages, one needs to also be aware of the disadvantages of this approach. This article takes a closer look at auto-generating code.
When I first got exposed to auto-code generators, I was quite impressed with what they could achieve. For example, YACC (Yet Another Compiler Compiler) is a compiler generator. In layman’s terms, you provide a specification (i.e., the language grammar) interspersed with hand-coded actions, and the YACC tool generates parser code corresponding to that grammar. In other words, YACC is a compiler-compiler: a program that takes a grammar as input and generates as output the code for a compiler that recognises that grammar. So, instead of hand-writing a compiler (the parser component of a compiler, to be specific), which can be a really complex task, you specify the grammar in the format required by YACC and get the parser code auto-generated. You can then use that parser to parse the input source files.
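To give a feel for the kind of code a parser generator saves you from writing, here is a minimal hand-written sketch of a parser for arithmetic expressions. It is only an illustration: the grammar in the comment is written in a loose YACC-like notation, the function name parse is my own, and the sketch uses recursive descent, whereas YACC itself generates a table-driven LALR parser.

```python
import re

# Grammar being recognised (loose YACC-like notation, for illustration only):
#   expr   : term   (('+' | '-') term)*
#   term   : factor (('*' | '/') factor)*
#   factor : NUMBER | '(' expr ')'
# The "actions" here simply evaluate the expression with integer arithmetic.


def parse(text):
    """Parse and evaluate an arithmetic expression (hypothetical helper)."""
    # Crude tokenisation; characters outside the token set are ignored in this sketch.
    tokens = re.findall(r"\d+|[-+*/()]", text)
    position = 0

    def peek():
        return tokens[position] if position < len(tokens) else None

    def advance():
        nonlocal position
        token = peek()
        position += 1
        return token

    def expr():
        value = term()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += term()
            else:
                value -= term()
        return value

    def term():
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value //= factor()  # integer division keeps the sketch simple
        return value

    def factor():
        token = advance()
        if token == "(":
            value = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token is None or not token.isdigit():
            raise SyntaxError(f"unexpected token {token!r}")
        return int(token)

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

Every grammar rule becomes a function that has to be written and kept in step with the grammar by hand. With a generator, you would maintain only the three grammar rules and their actions.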
Similarly, you can generate a lexical analyser using the Lex tool. A lexical analyser takes source code, which is a stream of characters, and generates a stream of tokens (for this reason, a lexical analyser is also known as a tokeniser); these tokens are then fed to the parser. You specify the tokens you need in the format required by Lex, and the Lex tool generates the lexical analyser code for you. Looks cool, doesn’t it?
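As a rough illustration of what a lexical analyser does, the following sketch turns a string into tokens using a small table of token rules. The rule names (NUMBER, IDENT, OP, SKIP) and the tokenise function are assumptions made for this example; Lex has its own specification format and generates C code, so treat this only as an analogy.

```python
import re

# Hypothetical token specification: (token name, regular expression).
# A tool such as Lex takes rules of this kind and generates the scanner for you.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]

# Combine all rules into one pattern with a named group per token kind.
MASTER_PATTERN = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenise(source):
    """Turn a stream of characters into a list of (kind, text) tokens."""
    tokens = []
    position = 0
    while position < len(source):
        match = MASTER_PATTERN.match(source, position)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[position]!r} at offset {position}"
            )
        if match.lastgroup != "SKIP":  # whitespace is matched but not emitted
            tokens.append((match.lastgroup, match.group()))
        position = match.end()
    return tokens
```

For instance, tokenise("x = 42 + y") yields an IDENT, an OP, a NUMBER, an OP and another IDENT, which is the stream a parser would then consume.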
The input languages of tools like Lex and YACC can be considered domain-specific languages (DSLs). DSLs are specialised languages designed for performing tasks in a specific domain. You can contrast them with general-purpose languages such as C++ and Java, which are pretty much domain-neutral and can be used for a vast range of programming tasks. Not all DSLs generate compilable code. For example, SQL (Structured Query Language) is a DSL meant for interacting with a database. DSLs can, in turn, be considered Application Oriented Languages (AOLs). AOLs are specialised languages designed for programming in a specific application or problem domain. They allow us to concentrate on the problem domain, and help us focus on specifying the problem instead of the implementation details. AOLs are a good target for auto-generating code. Well-known examples are code generators for UML diagrams: you specify your high-level design in a UML diagramming tool (such as ArgoUML, Enterprise Architect or Rational Rose), and get code automatically generated from those diagrams. Other examples are MATLAB and Simulink, used for mathematical analysis and modelling.
Another well-known kind of auto-code generator is the GUI builder, which most of you may already be familiar with. For example, you can use tools such as NetBeans (for Java) or Visual Studio (for C#/VB) for UI design, and these tools will automatically generate compilable source code. Of course, you need to add things such as event-handling code and business-logic code to the generated code, but the tools let you focus on the UI design by just dragging and dropping UI controls instead of writing all the boilerplate code in a text editor.
At some point you may have used wizards that generate code for your specific needs. These are quite popular for simplifying repetitive programming tasks. They are widely used in enterprise projects, since they help software companies make their programmers more productive. Also, product companies that sell software packages targeting programmers attract potential customers by showing how much programming effort developers can save by buying their package. As a programmer, you can now simply click or drag and drop to create programs instead of typing at the keyboard for long hours. Looks quite attractive, doesn’t it?
So, what are the advantages of generating code?
Improved productivity. Automated code generators allow you to work at the level of specifications or models, and communicate in the terms used in the application domain. With general-purpose languages, you need to work in terms of low-level constructs such as for and while loops, which is quite tedious and time-consuming.
Reduced complexity. When programmers work at higher levels of abstraction, they can perform complex tasks with ease, compared to working with low-level constructs in general-purpose languages.
Less buggy code. When you hand-code something complex, it’s likely to have numerous bugs. Worse, with high complexity, you probably won’t know what bugs there are in the hand-written code. The best way to overcome this problem is not to write the code at all! The code is auto-generated from the specification, so if the specification is correct (and the generator itself is sound), you can expect the generated code to be correct.
These advantages are well known. That’s why auto-code generators appear to be a great idea. But that’s not the end of the story. If you have considerable experience of using such generators, you are probably aware of their numerous problems and disadvantages. We’ll discuss them in greater detail now.
Developers understand the code they have written, and are usually able to understand code written by fellow programmers. Generated code, on the other hand, is produced by a program; it is often quite complex and unreadable, and it’s not an exaggeration to say that reading it is sometimes a nightmare! So, maintainability and understandability suffer with auto-generated code.
The auto-generated code often needs to be customised to make it usable for your requirements or to get it working. For example, if you create screens using GUI builders, the generated code will contain empty event handlers with TODO entries for you to enter code for specific events such as mouse clicks and mouse moves. You will have to figure out what these methods are, and understand what to write in them. The generated code also contains hook methods for inserting your own logic or adding business logic. Often, developers add only the bare minimum of code necessary to get the program working. If the code segments with default behaviour are not exercised during testing, the tests will pass and the software will get released. But bugs will be found in actual usage, and get filed as such. Forgetting to customise code is the cause of numerous bugs related to auto-generated code.
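The failure mode described above can be sketched in a few lines. The class and handler names below are hypothetical, and the code only imitates the shape of GUI-builder output; it is not taken from any particular tool.

```python
class LoginForm:
    """Hypothetical GUI-builder output after a developer added the bare minimum."""

    def __init__(self):
        self.status = "idle"

    def on_login_clicked(self, username, password):
        # Filled in by the developer.
        self.status = "logged in" if username and password else "rejected"

    def on_cancel_clicked(self):
        # TODO: add your handling code here (left exactly as generated)
        pass


def test_login():
    """The only test in the suite; it never exercises the cancel handler."""
    form = LoginForm()
    form.on_login_clicked("asha", "s3cret")
    return form.status == "logged in"
```

The test passes, yet clicking Cancel silently does nothing: the status stays at "idle" and no error is raised. Nothing in the code or the test run points at the forgotten handler, so the bug only surfaces when a user presses the button.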
The major difficulty with auto-generated code is integrating programmer-written code into it. Consider the following scenario. You create large UML diagrams in a tool that can generate Java code. You then add lots of business logic to the generated code and modify parts of it. For the next release of the software, you get new requirements and need to change the UML diagrams. However, you cannot simply change the UML diagrams and regenerate the code, because the newly generated code would not contain your modifications: the existing modified code is out of sync with the diagrams. This assumes that the UML design tool does not support syncing changes from the code back to the diagrams, which is the case with most of the tools available today. Hence, you need to make all the changes directly in the code from the earlier version, without using the UML design tool, which defeats the very purpose of auto-generating code. This is the most significant problem with it.
One practical way to integrate generated code with the programmers’ code is to maintain strict separation between them by keeping them in physically separate files. In this approach, programmers don’t directly make changes to the file containing generated code. Rather, they make changes in a separate file, for example, by overriding the hook methods. The argument for this approach is that if the higher-level specification or model changes, the code can be regenerated without affecting the programmer-written code. Yes, this solves some of the problems of mixing generated code and programmer-written code. However, it still does not address the case where the generated code itself needs to be modified, particularly in the context of supporting Non-Functional Requirements (NFRs).
Consider, for example, that you want to improve the performance of your software. Generated code, because it is generic and has numerous hook methods, is often inefficient. Similarly, consider that you want to make your code thread-safe, and assume that the auto-generated code is not. In both these cases, you cannot directly modify the generated code, because those changes will be lost the next time you regenerate the code from the generator tool. So, separating generated code from the programmers’ code solves some problems, but not all.
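The separation between generated and hand-written code can be sketched as follows. The file names, classes and the discount rule are all invented for illustration; the two classes are shown in one block only so that the sketch runs as is, whereas in a real project they would live in separate files.

```python
# order_base.py: GENERATED FILE, overwritten on every regeneration. Do not edit.
class OrderBase:
    def __init__(self, quantity, unit_price):
        self.quantity = quantity
        self.unit_price = unit_price

    def total(self):
        subtotal = self.quantity * self.unit_price
        return subtotal - self.discount(subtotal)

    def discount(self, subtotal):
        """Hook method: the generated default applies no discount."""
        return 0


# order.py: HAND-WRITTEN FILE, never touched by the generator.
class Order(OrderBase):
    def discount(self, subtotal):
        # Business rule added by the programmer: 10% off orders of 100 or more.
        return subtotal // 10 if subtotal >= 100 else 0
```

Regenerating OrderBase leaves Order untouched, which is the benefit of the approach. The limitation is also visible: if total() itself turns out to be too slow or not thread-safe, the only place to fix it is the generated file, and that fix disappears on the next regeneration.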
In practice, the evolution of the auto-generator tool can also cause problems. Assume that you are working on a maintenance project that made use of auto-generated code. Now you want to regenerate the code with new modifications, but you have access only to the new version of the auto-generator tool. This new version generates code that is incompatible with the code generated by the older version of the tool; so if you use the code generated from the new version of the tool, your existing code (written to be compatible with the code in the older format) will not work. You will then be forced to modify the generated code by hand, and not use the newer version of the tool.
There are other problems as well that I won’t delve into deeply, but leave for you to think about. Assuming that you use auto-generators in your project, how will you handle the following situations:
The generated code has bugs, which means that the generator tool itself produces a buggy program (this is not an uncommon problem).
You have coding guidelines for your project that need to be strictly adhered to, and the auto-generated code does not conform to your coding guidelines.
Yours is an embedded systems project, which does not allow the use of recursion or allocation of dynamic memory, but the auto-generated code makes liberal use of them.
Yours is a complex project, and you make use of auto-generators to specify the models. But the auto-generator does not support modelling a specific feature that is required for your project.
If you discover some of the limitations of the auto-generators late in the software development life-cycle, it can cause serious problems.
Now, after looking at the disadvantages and potential problems with auto-generating code, let’s reflect on its advantages. Are there any? For example, I mentioned improved productivity as an advantage. Is it really so? If you think about it, things like GUI builders help reduce the tedium involved in writing UI code. But we still have to take care of the harder part, which is design. For example, we need to separate the UI logic from the business logic, and again maintain clean separation from the underlying database (or any data source). Design is hard work and takes time. In a large software development project, the time required for creating the UI is negligible; it’s the design during development, and the refactoring during maintenance, that take most of the time. So, the improvement in productivity may not be as significant as you would expect.
This leads to a neutral view on auto-generating code, which boils down to a few fundamentally simple ideas:
Auto-generating code helps improve productivity by simplifying the implementation of functionality, but it does not deal with the problem of meeting non-functional requirements.
Auto-generating code, as a computer science problem, works well; but from a software engineering perspective of not just developing, but also maintaining the software, it has its limitations. Though it solves many problems, it also introduces new problems to solve.
Auto-generators help programmers like us to work on high-level abstractions; but any auto-generator tool will have specific limitations, and working around them will expose you to implementation details and may force you to work on the actual generated code.
So, if you are given a choice to either auto-generate the code or write it yourself from scratch, take a holistic look at auto-generating code, instead of blindly auto-generating the code.
Before I end this column, one final thought. Auto-generation is not just for source code; it can be used for generating test code as well. For example, with Model-Based Testing (MBT), you can specify the conditions to check in a diagrammatic form (i.e., a model), and the MBT tools generate test cases for those conditions. It’s an effective way to reduce unit testing effort and create effective test cases. Why? When writing tests by hand, you rarely cover all the possibilities systematically, and even for a handful of conditions, the number of tests needed to cover every combination can be huge. Auto-generating test code from models addresses this problem. However, MBT isn’t widely used because it takes time to learn how to build models that generate test cases. Also, the generated test cases often check for conditions that cannot occur in practice, so it takes time to find out which tests are legitimate and which are invalid. Even with these disadvantages, MBT is a good approach, and we should try to use it.
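To make the idea of generating tests from a model a little more concrete, here is a toy sketch. The door model, the action names and the generate_tests function are all made up for this example; real MBT tools work from much richer models and emit executable test code.

```python
from itertools import product

# Hypothetical model of a door: (current state, action) -> next state.
MODEL = {
    ("closed", "open"): "opened",
    ("opened", "close"): "closed",
    ("closed", "lock"): "locked",
    ("locked", "unlock"): "closed",
}
ACTIONS = ["open", "close", "lock", "unlock"]


def generate_tests(start, length):
    """Generate test cases from the model.

    Each test case is a pair: the sequence of actions to perform, and the
    state the system is expected to be in after each action. Sequences that
    the model does not allow are dropped.
    """
    tests = []
    for actions in product(ACTIONS, repeat=length):
        state = start
        expected = []
        for action in actions:
            state = MODEL.get((state, action))
            if state is None:  # the model has no such transition
                break
            expected.append(state)
        else:
            tests.append((actions, expected))
    return tests
```

For a closed door and sequences of two actions, the sketch considers 16 candidate sequences and keeps the two that the model permits: open then close, and lock then unlock. The number of candidates grows as 4 to the power of the sequence length, which hints at why writing such tests by hand does not scale, and also at why weeding out meaningless generated cases takes effort.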