Trouble-Maker, as the name suggests, is a Linux tool that, when installed in a system, randomly selects a problem from its list and actually creates it on the system. The idea behind using this tool is to sharpen your systems administrator skills, enabling you to solve boot problems, system configuration problems, hardware problems, etc.
Why do we use tools? To simplify tasks, obviously. But wait a minute; in this article I’m going to introduce a tool that has been created to give you a hard time. Before you conclude that I’ve gone insane for discussing a tool that’s designed to drive people nuts instead of making their lives easier, relax; it will be clear to you soon enough.
This article is all about Trouble-Maker, a handy tool to test your Linux troubleshooting skills. This tool creates an array of problems that you need to identify and rectify. Initially, it sounds like a pain but once you are done with it, you’ll have some great troubleshooting skills under your belt. So let’s proceed without further delay.
You can download Trouble-Maker from http://sourceforge.net/projects/trouble-maker/files/ and get installation and running instructions at http://trouble-maker.sourceforge.net/doc.html. Once you’ve installed it, the real action begins. Trouble-Maker has 16 modules (15 actually, because one module is just a dummy and does nothing), each creating a unique problem for your system, which has to be rectified. There are two ways to launch Trouble-Maker:
/usr/local/trouble-maker/bin/trouble-maker.pl --version=RHEL_5
will run a random module.
/usr/local/trouble-maker/bin/trouble-maker.pl --version=RHEL_5 --selection=name_of_the_module_file
will run a specified module. You’ll find 16 .tar files in the /usr/local/trouble-maker/kitbag directory. Pass the name of any of these files (including the .tar extension) to --selection to run that module. For example, there is a file called do_nothing.tar, which is the dummy module and does nothing. You can run it with the following command:
/usr/local/trouble-maker/bin/trouble-maker.pl --version=RHEL_5 --selection=do_nothing.tar
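To see which names can be passed to --selection, you can list the .tar files in the kitbag directory. The following is only a sketch: list_modules is a hypothetical helper name, not something shipped with Trouble-Maker, and the directory is a parameter that defaults to the path given above.

```shell
#!/bin/sh
# Print the file name of every .tar module in the kitbag directory.
# list_modules is a hypothetical helper; the default path is the one
# quoted in the article.
list_modules() {
    dir="${1:-/usr/local/trouble-maker/kitbag}"
    for f in "$dir"/*.tar; do
        # Skip the unexpanded pattern when no .tar file exists.
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
}

# Typical use:
#   list_modules
```

Any name it prints can be used as the value of --selection.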
Now run the command shown below:
/usr/local/trouble-maker/bin/trouble-maker.pl --version=RHEL_5
and wait for things to explode. Once you run it, you’ll get a description of what has gone wrong. Your job is to track down the problem and fix it. Run the command and restart the system to see what happens. You’ll come across various cases on RHEL 5 while playing with this tool. Twelve such scenarios are discussed here.
Scenario 1: You have logged into the command line mode instead of GUI mode
Here you see that once you’ve restarted your machine, it starts in command line mode instead of the default GUI mode. This is a clear indication of a problem in the inittab file. Your default runlevel is specified in the /etc/inittab file. If you open this file, you’ll see the line id:3:initdefault: which is the root cause of this problem. This tells Linux to boot into runlevel 3 (command line mode). Change it to runlevel 5 (GUI mode), save the file and restart your machine. Next time you’ll be back in GUI mode. Before you run Trouble-Maker again, don’t forget to delete the /tmp/trouble-maker/backup and /tmp/trouble-maker/rescue directories.
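The check and the fix for this scenario can be sketched as two small shell functions. The function names are hypothetical, and the file path is a parameter so that you can try them on a copy of inittab before touching the real one.

```shell
#!/bin/sh
# Print the default runlevel found in an inittab-style file.
# The entry has the form id:<runlevel>:initdefault:
default_runlevel() {
    file="${1:-/etc/inittab}"
    sed -n 's/^id:\([0-6]\):initdefault:.*/\1/p' "$file"
}

# Rewrite the initdefault entry to the requested runlevel.
# sed keeps the previous version of the file as <file>.bak.
set_default_runlevel() {
    level="$1"
    file="${2:-/etc/inittab}"
    sed -i.bak "s/^id:[0-6]:initdefault:/id:${level}:initdefault:/" "$file"
}

# Typical use, as root:
#   default_runlevel
#   set_default_runlevel 5
#   rm -rf /tmp/trouble-maker/backup /tmp/trouble-maker/rescue
```

The last commented line is the clean-up the article asks for before the next Trouble-Maker run.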
Scenario 2: INIT: no more processes left in this run-level
This is again a problem with your inittab file. This file contains information about the processes to be run at each runlevel. You can find the solution for this problem in one of my previous articles at http://www.linuxforu.com/2013/04/learn-the-art-of-linux-troubleshooting/. Here, just look at the section that describes what to do when your inittab file is deleted. Although our inittab file has not been deleted but is merely misconfigured, we’ll still need to reinstall the initscripts package.
Scenario 3: The GDM user gdm doesn’t exist
User information is stored in the /etc/passwd file and this message shows that there is something wrong with that file. When you click on OK and you try to log in at the command line, you will still find problems. This is the time to check your /etc/passwd file and you can find the solution in the article mentioned earlier (link given). In fact, I suggest that you read that article along with http://www.linuxforu.com/2013/03/playing-hide-and-seek-with-passwords/ for some valuable solutions to such situations.
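A quick way to confirm that the gdm entry is what’s missing is to search the passwd file for it. This is a minimal sketch; has_user is a hypothetical helper name, and the file path is a parameter so that it can also be pointed at a backup copy.

```shell
#!/bin/sh
# Succeed if a passwd-style file contains an entry for the given user.
# Each entry starts with the user name followed by a colon.
has_user() {
    user="$1"
    file="${2:-/etc/passwd}"
    grep -q "^${user}:" "$file"
}

# Typical use:
#   has_user gdm || echo "gdm entry is missing from /etc/passwd"
```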
Scenario 4: Can’t log in as the root
When you run Trouble-Maker and get this message as the problem description, restart your machine. If you can normally log in to your machine, it means your passwd file is intact. It’s time to switch to another terminal and check whether you can log in there or not. If, when you enter your username, instead of asking for your password, it repeatedly shows the localhost login message, look for the /etc/pam.d/login file. If it’s missing, then refer to the two articles I’ve mentioned earlier to get the solution.
Scenario 5: switchroot: mount failed: No such file or directory
If you get this message after rebooting your machine, it shows that the root location you have specified is invalid. To solve this problem, you’ll have to go to rescue mode and open the /boot/grub/grub.conf file. Here, edit the kernel line so that it points to your valid root location. Again, the articles mentioned earlier will be handy as they give a step-by-step explanation of troubleshooting in single user mode.
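To see what root location each kernel line currently points to, you can pull out the root= argument. The sketch below uses a hypothetical function name, kernel_root_args. In rescue mode the installed system is usually mounted under /mnt/sysimage, so you may need to pass /mnt/sysimage/boot/grub/grub.conf as the argument unless you have already chrooted.

```shell
#!/bin/sh
# Print the root= argument of every kernel line in a GRUB config file.
kernel_root_args() {
    conf="${1:-/boot/grub/grub.conf}"
    awk '$1 == "kernel" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^root=/) print $i
    }' "$conf"
}

# Typical use:
#   kernel_root_args
#   kernel_root_args /mnt/sysimage/boot/grub/grub.conf
```

Compare the printed value with the device or label that really holds your root filesystem, then correct the kernel line in an editor.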
Scenario 6: initrd file not found
Earlier it was the root location and now it’s the initrd file name. The process of rectifying this is quite similar to what was done in Scenario 5. You’ll have to enter rescue mode and then use the following command:
ls /boot
Here you’ll get your initrd file name. Now open the grub.conf file and correct the corresponding initrd entries. The next time you boot, everything will be fine.
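The comparison between grub.conf and the contents of /boot can also be done in one pass. This is a sketch under stated assumptions: check_initrd is a hypothetical helper, and both the config path and the boot directory are parameters because in rescue mode they normally live under /mnt/sysimage.

```shell
#!/bin/sh
# For every initrd line in a GRUB config, report whether a file of
# that name exists in the boot directory.
check_initrd() {
    conf="${1:-/boot/grub/grub.conf}"
    bootdir="${2:-/boot}"
    awk '$1 == "initrd" { print $2 }' "$conf" | while read -r entry; do
        # Only the file name matters; the directory part depends on
        # whether /boot is a separate partition.
        name=$(basename "$entry")
        if [ -e "$bootdir/$name" ]; then
            echo "ok: $name"
        else
            echo "missing: $name"
        fi
    done
}

# Typical use:
#   check_initrd
#   check_initrd /mnt/sysimage/boot/grub/grub.conf /mnt/sysimage/boot
```

Any line reported as missing is an initrd entry whose name needs to be corrected in grub.conf.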
Scenario 7: Can’t log in to the root via a console
When you get this message as the Trouble-Maker problem description, you won’t be able to log in as root from another terminal. Check your /etc/pam.d/login file as we did earlier and, if it’s intact, then there’s a good chance that your /etc/securetty file has some issues. If so, make the necessary amendments, save the file, restart the machine and check whether the problem has been solved.
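The /etc/securetty file lists the terminals on which root is allowed to log in, one per line, so a quick check is to see which consoles are absent from it. In this sketch, missing_ttys is a hypothetical helper, and the list tty1 to tty6 is an assumption about which virtual consoles you care about.

```shell
#!/bin/sh
# Print each of tty1..tty6 that is NOT listed in a securetty-style file.
missing_ttys() {
    file="${1:-/etc/securetty}"
    for tty in tty1 tty2 tty3 tty4 tty5 tty6; do
        # -x matches the whole line, so tty1 does not match tty10.
        grep -qx "$tty" "$file" || echo "$tty"
    done
}

# Typical use:
#   missing_ttys
```

Every name it prints is a console from which root logins will be refused until the entry is added back.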
Scenario 8: Network is not working
If you get this problem description after running Trouble-Maker, the stage is all set for some network related issue. After restarting, fire the ifconfig command to check whether you’ve got an IP address. Here, you won’t see any IP address. Then check the status of the network service by issuing the service network status command. If you don’t get any output, then networking is not enabled on your system. This is controlled by the /etc/sysconfig/network file. Open it and, if you see NETWORKING=no, set it to yes. Save the file and restart the network service. Now, if you issue ifconfig, you’ll get your IP address.
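The NETWORKING setting can be inspected and corrected from the shell as well. The two function names below are hypothetical, and the file path is a parameter so that the edit can be rehearsed on a copy.

```shell
#!/bin/sh
# Print the current value of NETWORKING from a sysconfig-style file.
networking_flag() {
    file="${1:-/etc/sysconfig/network}"
    sed -n 's/^NETWORKING=//p' "$file"
}

# Set NETWORKING=yes, keeping the previous file as <file>.bak.
enable_networking() {
    file="${1:-/etc/sysconfig/network}"
    sed -i.bak 's/^NETWORKING=.*/NETWORKING=yes/' "$file"
}

# Typical use, as root:
#   networking_flag
#   enable_networking
#   service network restart
#   ifconfig
```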
Scenario 9-10: FTP is not working
You’ll get this message for two problem sets. This problem description means that you need to check the FTP setup, but before you nosedive into your FTP configuration file, make sure your network, iptables and SELinux are not getting in the way. Trouble-Maker doesn’t deal with SELinux, so you can eliminate this possibility by switching SELinux to permissive mode with the setenforce 0 command. Check the IP address, the network service status and the iptables configuration. If all is well, then check the TCP wrappers files, /etc/hosts.allow and /etc/hosts.deny. If you find all:all in the /etc/hosts.deny file, then that’s the problem, as it’s denying every connection to your system. Delete this line and try to run the ftp command again. It should work fine. If you are now able to run ftp from this system but face issues in running the same command from another system, check the /etc/hosts.allow file.
You’ll find the all:local line there, which is what has caused the connection error from the remote system. Delete it or make it all:all and the problem is solved.
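When inspecting the TCP wrappers files, it helps to hide comments and blank lines so that only the rules in force remain. The sketch below assumes nothing beyond the two files named above; active_rules and has_blanket_deny are hypothetical helper names.

```shell
#!/bin/sh
# Print only the non-comment, non-blank lines of a file.
active_rules() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

# Succeed if the file contains a deny-everything rule such as
# all:all or ALL: ALL (keywords are matched case-insensitively).
has_blanket_deny() {
    file="${1:-/etc/hosts.deny}"
    grep -Eiq '^[[:space:]]*all[[:space:]]*:[[:space:]]*all[[:space:]]*$' "$file"
}

# Typical use:
#   active_rules /etc/hosts.deny
#   active_rules /etc/hosts.allow
#   has_blanket_deny && echo "hosts.deny blocks every connection"
```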
Scenario 11: Troubleshooting SSH
First, try to ssh from the same machine; you won’t be able to do it. It’s always good to check network related issues from the same machine first and then go for a remote system. As in the previous case, don’t jump to the SSH configuration hastily. Instead, follow the same procedure as before. This time you’ll see a REJECT rule in iptables. You can verify that it is the culprit by stopping the iptables service and then trying to ssh again; this time you’ll be able to do so. Now that we’ve found the cause of the problem, all you need to do is delete this rule, and everything will be fine after that.
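Finding and deleting the offending rule is easiest by rule number. The sketch below assumes the rule sits in the INPUT chain, which may not be the case on your system; reject_rule_numbers is a hypothetical helper that simply filters the numbered listing printed by iptables.

```shell
#!/bin/sh
# Read the output of "iptables -L <chain> -n --line-numbers" on stdin
# and print the number of every rule whose target is REJECT.
reject_rule_numbers() {
    awk '$2 == "REJECT" { print $1 }'
}

# Typical use, as root (INPUT is an assumed chain name):
#   iptables -L INPUT -n --line-numbers | reject_rule_numbers
#   iptables -D INPUT <number>
#   service iptables save
```

If more than one number is printed, delete the highest one first, because the remaining rules are renumbered after each deletion.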
Scenario 12: Enabling /etc/fstab swaps: FAILED
If you get this error message after rebooting, then there is a problem with your fstab file. Log in to single user mode and check just what went wrong with this file. In our case, the file has been deleted and you’ll have to re-create it. You can refer to http://www.linuxfromscratch.org/lfs/view/development/chapter08/fstab.html for the format of the fstab entries and get the task completed.
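Once the file has been re-created, a quick sanity check is to confirm that it exists and contains at least one swap entry, since that is what the boot message complained about. This is a minimal sketch; swap_entries is a hypothetical helper, and it relies only on the standard fstab layout in which the third field is the filesystem type.

```shell
#!/bin/sh
# Print the device (first field) of every swap entry in an fstab-style
# file; fail with a message if the file does not exist.
swap_entries() {
    file="${1:-/etc/fstab}"
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    # Skip comment lines; the third field holds the filesystem type.
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "$file"
}

# Typical use, as root:
#   swap_entries
#   swapon -a
```

If nothing is printed, the swap line is still missing from the file you wrote.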
So far, we’ve examined 12 out of the 15 test cases, which is exactly 80 per cent of the given problem set. This reminds me of the Pareto Principle, according to which 80 per cent of the effects come from 20 per cent of the causes. What I am going to do right now is leave the remaining 20 per cent (or three problems) for you. Now it’s time for you to raise the bar and address these problems. Believe me, 80 per cent of your learning is going to occur in the process of solving these problems on your own. So the stage is all set and now it’s up to you to demonstrate your analysis and problem-solving abilities along with some serious Linux skills.