We bring you an interesting article in which three final-year students from Thiagarajar College of Engineering (TCE), Thiruparankundram, Tamil Nadu describe ‘Code Validator’ (CV), a web-based tool they have designed. Read on to learn more about the tool and how they developed it, in their own words!
We have designed and implemented a web-based tool called “Code Validator” (CV) to automate the validation of code submitted by students for lab assignments and course projects. Because submitted code could contain malicious segments, we validate each submission inside a sandboxed process running on a virtual machine (VM), which is deployed on an OpenStack-based cloud in the Thiagarajar College of Engineering (TCE) data centre.
The tool validates programs written in C, C++ and Java. Users upload their programs, and CV validates them against test cases specified by the administrator. CV has a cyclomatic complexity component that guides the administrator in framing test cases for validating the submissions. The user’s program is executed against each test case, and the output of each run is compared with the expected output specified in that test case. CV also has an automated memory and time limit enforcer, which validates submissions against the asymptotic complexity of the best known solution. Finally, CV is integrated with code-style checking tools to validate the programming style adopted.
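The run-and-compare step described above can be sketched in a few lines of Python. This is only an illustration and not CV's actual implementation: the names TestCase, normalize and run_test_cases are hypothetical, and treating trailing whitespace as insignificant when comparing outputs is an assumption.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class TestCase:
    stdin_text: str       # input fed to the program on standard input
    expected_output: str  # output the administrator expects


def normalize(text: str) -> str:
    """Drop trailing whitespace on each line and trailing blank lines
    (an assumed comparison policy, not necessarily the one CV uses)."""
    lines = [line.rstrip() for line in text.splitlines()]
    while lines and lines[-1] == "":
        lines.pop()
    return "\n".join(lines)


def run_test_cases(command: list[str], cases: list[TestCase]) -> list[bool]:
    """Run `command` once per test case and report pass/fail for each."""
    results = []
    for case in cases:
        completed = subprocess.run(
            command,
            input=case.stdin_text,
            capture_output=True,
            text=True,
        )
        passed = (
            completed.returncode == 0
            and normalize(completed.stdout) == normalize(case.expected_output)
        )
        results.append(passed)
    return results


if __name__ == "__main__":
    # Stand-in for a compiled student submission: doubles the integer it reads.
    submission = [sys.executable, "-c", "print(int(input()) * 2)"]
    cases = [TestCase("2\n", "4\n"), TestCase("5\n", "11\n")]
    print(run_test_cases(submission, cases))  # prints [True, False]
```

In a real deployment the command would be the compiled C, C++ or Java program, and the run would additionally be wrapped in the memory and time limits discussed later in this article.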
Design of the Code Validator
The “Code Validator” tool is deployed in a cloud environment. Using OpenStack, a couple of virtual machines with the required resources were orchestrated, and “Code Validator” was launched on one of the virtual machines in the cluster. The main purpose of hosting the tool in a cloud is system security: some students may intentionally or unintentionally upload harmful code to the tool.
If such code were executed directly on a physical machine, it could cause the machine to crash. A virtual machine is used instead, so that no damage is done to the physical machine. If the VM crashes, its state is preserved by built-in checkpointing mechanisms, and the VM can resume execution from where it left off.
Requests directed to CV’s URL are forwarded to the port on the virtual machine that actually hosts the “Code Validator” tool. We have created this cloud environment on top of a 32-node cluster in our college data centre.
Design of the Cyclomatic complexity component
When testing code submitted for a given problem, the choice of test cases is very important: the administrator must supply test cases that exercise the entire program. To guide test case generation, the tool computes the cyclomatic complexity. The administrator enters a sample (correct) solution to the problem. From this sample program, a control flow graph (CFG) is generated, and the cyclomatic complexity is computed from it. The nodes of the CFG represent pieces of code, and each branching statement forms a separate node.
Once the CFG has been constructed, any of the following formulae can be used to determine the cyclomatic complexity (CC) of the component.
1. CC = P + 1 (P = number of predicate nodes, i.e. nodes at which a two-way branching decision is made)
2. CC = E - N + 2 (E = number of edges, N = number of nodes)
3. CC = R (R = number of regions in the planar CFG, counting the outer region; equivalently, the number of closed regions plus one)
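As a quick check that the first two formulae agree, here is a minimal Python sketch that evaluates them on two tiny CFGs. The adjacency-list representation, the node names and the function names are assumptions made for illustration; the article does not describe how CV stores its CFGs. The sketch assumes a single connected CFG with one exit node.

```python
def cyclomatic_from_edges(cfg):
    """CC = E - N + 2 for one connected CFG given as an adjacency list."""
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2


def cyclomatic_from_predicates(cfg):
    """CC = P + 1, counting a node with k outgoing edges as k - 1
    two-way decisions (so an ordinary if or while counts once)."""
    decisions = sum(len(s) - 1 for s in cfg.values() if len(s) > 1)
    return decisions + 1


# Hypothetical CFG of a simple if/else construct.
IF_ELSE = {
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": [],
}

# Hypothetical CFG of a while loop.
WHILE_LOOP = {
    "entry": ["test"],
    "test": ["body", "exit"],
    "body": ["test"],
    "exit": [],
}

if __name__ == "__main__":
    for name, cfg in (("if/else", IF_ELSE), ("while", WHILE_LOOP)):
        print(name, cyclomatic_from_edges(cfg), cyclomatic_from_predicates(cfg))
```

Both example graphs have 4 nodes, 4 edges and 1 predicate node, so each formula gives a cyclomatic complexity of 2.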
The cyclomatic complexity gives the number of linearly independent execution paths through a program. From this value we can determine how many test cases are needed to exercise every independent path. The administrator is therefore prompted to enter enough test cases to traverse all of these paths. For each test case entered by the administrator, the portion of the CFG that it traverses is displayed. In this way the administrator can build a set of test cases that truly tests the strength of the program. This typically has to be repeated by the administrator several times, depending on the number of distinct correct CFGs possible for a given problem. Please refer below for a sample CFG (Fig – 2) generated for a simple branch construct (Fig – 1).
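The idea of showing which portion of the CFG the entered test cases traverse can be sketched as follows. This is an illustration under stated assumptions: each test case is assumed to yield a trace, i.e. the ordered list of CFG nodes it visited, and how CV actually records such traces is not described here. The function names are hypothetical.

```python
def edges_in_trace(trace):
    """Turn an ordered list of visited nodes into the set of edges taken."""
    return set(zip(trace, trace[1:]))


def coverage_report(cfg, traces):
    """Return (covered_edges, uncovered_edges) for a set of test-case traces."""
    all_edges = {(src, dst) for src, succs in cfg.items() for dst in succs}
    covered = set()
    for trace in traces:
        # Ignore any step that is not an edge of the CFG.
        covered |= edges_in_trace(trace) & all_edges
    return covered, all_edges - covered


# Hypothetical CFG of a simple if/else construct.
IF_ELSE = {
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": [],
}

if __name__ == "__main__":
    # Only the "then" branch has been exercised so far.
    covered, uncovered = coverage_report(IF_ELSE, [["cond", "then", "join"]])
    print("covered:", sorted(covered))
    print("still needed:", sorted(uncovered))
```

With a single test case that takes the "then" branch, the report lists the two edges through "else" as still uncovered, which is the cue for the administrator to add a second test case.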
Integration of the tool with style validators
The programming style adopted by a programmer often determines the maintainability and readability of the code. To check programming style, Code Validator has been integrated with Checkstyle, a code analysis tool that checks whether submitted Java code follows a coding standard.
It reviews the user’s source code and displays comments to the user. This helps users adhere to coding standards and develop readable, good-quality programs. For validating the style of C and C++ programs, a tool called Uncrustify is used.
Upper bound enforcement on the memory and time
For a submission to be accepted by CV, the code must not only be correct; the memory used by the program must also stay within the specified limit. When adding a problem to CV, the administrator must specify the maximum memory needed by a solution in asymptotic notation (e.g. O(n), O(n log2 n) …).
Depending on the number and data type of the inputs specified in the test case, CV automatically calculates the upper bound on permissible memory usage. CV calculates the heap size and the environment stack size for every submitted program. If the total memory used by the program is less than or equal to the calculated upper bound, the program is accepted. The memory constraint is enforced on the machine using “ulimit”, a shell built-in (backed by the setrlimit system call on Linux) that limits the resources allocated to a process. We use it to limit the memory available to the program during execution.
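A rough Python sketch of this calculation and its enforcement is given below. It is not CV's implementation: the type sizes, the fixed base allowance, the slack factor and all function names are assumptions chosen for illustration. The limit is applied with resource.setrlimit on RLIMIT_AS, which is the limit that "ulimit -v" sets in common shells; the resource module is available on Unix-like systems only.

```python
import math
import resource  # Unix-only standard library module
import subprocess

# Assumed element sizes in bytes for a typical 64-bit Linux C compiler.
TYPE_SIZES = {"char": 1, "int": 4, "long": 8, "double": 8}

# Growth functions for the complexities an administrator might enter.
GROWTH = {
    "O(1)": lambda n: 1,
    "O(n)": lambda n: n,
    "O(n log2 n)": lambda n: n * max(1, math.ceil(math.log2(n))),
    "O(n^2)": lambda n: n * n,
}


def memory_upper_bound(complexity, n, data_type,
                       base_bytes=8 * 1024 * 1024, slack=2):
    """Permissible bytes = fixed allowance + slack * f(n) * element size.
    base_bytes and slack are illustrative tuning knobs, not CV's values."""
    n = max(n, 1)
    return base_bytes + slack * GROWTH[complexity](n) * TYPE_SIZES[data_type]


def run_with_memory_limit(command, limit_bytes, stdin_text=""):
    """Run `command` with its address space capped at `limit_bytes`."""
    def apply_limit():
        # Runs in the child process just before the program starts.
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return subprocess.run(command, input=stdin_text, capture_output=True,
                          text=True, preexec_fn=apply_limit)


if __name__ == "__main__":
    # 1000 ints under an O(n) memory budget.
    print(memory_upper_bound("O(n)", 1000, "int"))  # prints 8396608
```

A program that tries to allocate beyond the cap sees its allocations fail (for example, malloc returns NULL in C), so the run fails and the submission can be rejected.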
Execution time is one of the important factors that determine the efficiency of a program. Moreover, students may write programs that never terminate (for example, because of an inadvertently coded infinite loop) and that could eventually crash the server by depleting its resources. To avoid this, CV measures the time the program takes to complete; if it exceeds the specified time limit, CV kills the program and reports an error.
The program is not allowed to execute beyond the time limit. The administrator specifies the average number of lines in the program and its time complexity in asymptotic (Big-O) notation. Depending on the processing speed and the instruction set architecture of the machine on which the tool is deployed, CV automatically calculates an upper limit on the time an optimal program needs to execute successfully. The time limit check is implemented using the “timeout” command (a command-line utility rather than a system call). It does not limit the CPU time of the process; instead, it terminates the program once the allotted wall-clock time has elapsed.
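The following Python sketch mirrors this behaviour. It is an illustration only: the assumed throughput of 10^8 simple operations per second, the slack factor, the one-second floor and the function names are all hypothetical stand-ins for whatever calibration CV performs on its own machine. Like the “timeout” command, the timeout argument of subprocess.run bounds wall-clock time rather than CPU time.

```python
import math
import subprocess
import sys

# Growth functions for the complexities an administrator might enter.
GROWTH = {
    "O(n)": lambda n: n,
    "O(n log2 n)": lambda n: n * max(1, math.ceil(math.log2(max(n, 1)))),
    "O(n^2)": lambda n: n * n,
}


def time_limit_seconds(complexity, n, ops_per_second=1e8,
                       slack=3.0, floor=1.0):
    """Estimated run time of an optimal solution, padded by `slack`.
    ops_per_second, slack and floor are assumed calibration values."""
    estimated = GROWTH[complexity](n) / ops_per_second
    return max(floor, slack * estimated)


def run_with_time_limit(command, limit_seconds, stdin_text=""):
    """Run `command`; kill it and report an error if it runs too long."""
    try:
        completed = subprocess.run(command, input=stdin_text,
                                   capture_output=True, text=True,
                                   timeout=limit_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return "TIME LIMIT EXCEEDED"
    return completed.stdout


if __name__ == "__main__":
    # Stand-in for a submission stuck in an infinite loop.
    looping = [sys.executable, "-c", "while True: pass"]
    print(run_with_time_limit(looping, 0.5))  # prints TIME LIMIT EXCEEDED
```

Because the limit is on elapsed time, a program that merely sleeps or waits for input is stopped just like one that spins in a loop.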
Future Scope
In the future, we intend to extend this tool to web programming laboratories. This can be implemented using existing frameworks such as Cucumber and Capybara, which are designed for testing web-based applications and can be integrated with our tool.
Code validator – rolled out to TCE students
CV has already been rolled out to the students at TCE, and most of them have given positive feedback about the tangible improvement in their programming style. The students have also seen an improvement in their algorithmic thinking, since the tool enforces strict upper bounds on time and memory.
Developers of CV
This tool has been developed by three pre-final year UG Computer Science students (S. Aswin Karthik, R. Harivignesh and M. Karthikeyan) under the guidance of the authors, Karthick Seshadri (Assistant Professor in the CSE department at TCE) and Dr. S. Mercy Shalinie (Professor and Head of the CSE department at TCE).