Handling large files using open source tools

Handling large log files is problematic. Systems administrators must be able to open them to view and act upon the vital information they contain. Text editors are the best option for this task. Here is a review of a selection of these editors for our readers.

In almost every project we work on, we need to deal with files. Consider the following scenarios:

  • As a systems administrator, you need to deal with large system log files every day. These can run into gigabytes and are very difficult to handle.
  • As a developer, you may have to deal with huge application log files or application source code files.
  • As a tester, you may have to deal with large data files generated as the output of some process, and need to validate a particular attribute present in those files.

There may be various other scenarios as well but, generally, all these data and log files are text files. The difference between a log file, a text file and a data file is that a log file is generated automatically to keep track of some specific application, a text file is created by the user in a text editor, and a data file stores data pertaining to a specific application for later use. Data files can be stored as text files or as binary files, and are particularly helpful when debugging a program.

Figure 1: Error with Notepad for a file size greater than 1GB
Figure 2: Notepad++ text editor logo

The need to handle large files carefully
The files generated in the situations mentioned earlier could be of any size, format or encoding. Besides, they may or may not contain text in specific formats, such as comma-separated values (CSV), log data files (LDF) and master data files (MDF). These large text files may also vary in the number of lines they contain or the reason for which they need to be opened. For example, the event log files for some common applications that generate a large amount of data could be more than 1GB in size, and we may need to read them. As the file size increases, handling the file becomes a tedious task. Generally, we come across the following issues while attempting to handle large files:

  • The system may hang while attempting to open large log files (of the order of gigabytes).
  • The system may slow down while the editor is in use, because the opened file takes up all the available RAM.
  • While attempting to open a text file of around 1GB, Notepad, the built-in Windows text editor, can show an error message like ‘The file is too large for Notepad’. In such cases, we need to use another editor to edit the file.

To work with the information in these files without facing such issues, we need good editors that allow us to open, read and edit them. So let’s discuss some editors that could be handy for these tasks.

Open source tools to handle large files

Notepad++
Notepad++ is an advanced and feature-rich sibling of the Notepad text editor that you find in the Windows OS. It is for those who want a simple, clean UI with a great feature set, and it is fast. It is a lightweight replacement for Microsoft Notepad that offers far more functionality. It supports multiple tabs to open different files in a single window. It also supports multiple programming languages, including C, C++, Java, C#, XML, HTML, PHP, JavaScript, VB/VBS, SQL, Perl, Python and UNIX shell script.
Notepad++ is widely used as an HTML editor. It offers syntax colouring, and error checking and other functionality can be added through the hundreds of available plugins.

Key features

  • WYSIWYG (What You See Is What You Get) printing.
  • User defined syntax highlighting and auto-completion.
  • Has the multi-document and multi-view feature.
  • Regular expressions and the ‘multiple file search, mark and replace’ feature are supported.
  • Full drag-and-drop support.
  • Support for a multi-language environment.
  • Although built for the Windows platform, it can also run on Linux, UNIX and Mac OS X (using Wine).
  • Macro recording and playback.
  • Zooming for those afflicted with sore eyes from staring at computer screens.
  • Comes in both ‘installer’ and ‘zipped’ versions for people who don’t have admin rights on their work computers.

Disadvantages

  • A third-party program (Wine) is needed to run the application on Mac OS X.
  • It does not support remote file editing over HTTP, SSH or WebDAV.
Figure 3: Log file opened with Glogg
Figure 4: Vim text editor

Glogg
Glogg is a multi-platform, open source GUI application for viewing and searching large, complex log and data files. It is very quick because it reads data directly from disk and does not load it entirely into memory, which enables it to open very large files for viewing. Glogg is a read-only viewer and does not come with a lot of features. However, it is very useful in the overall handling of large files. It can be thought of as a graphical, interactive combination of grep and less.
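
The idea of reading from disk on demand can be illustrated with a short Python sketch. This is not Glogg's actual implementation; the function names build_line_index and read_line are hypothetical and chosen only for this example. The file is scanned once to record where each line starts, after which any single line can be fetched with a seek, without holding the file in memory.

```python
def build_line_index(path):
    """Return a list with the byte offset at which each line starts."""
    offsets = []
    position = 0
    with open(path, "rb") as handle:
        for line in handle:  # streams the file; never holds it all in memory
            offsets.append(position)
            position += len(line)
    return offsets


def read_line(path, offsets, line_number):
    """Fetch one line (0-based) by seeking straight to its recorded offset."""
    with open(path, "rb") as handle:
        handle.seek(offsets[line_number])
        raw = handle.readline()
    # Assumption: the file is UTF-8; undecodable bytes are replaced.
    return raw.decode("utf-8", errors="replace").rstrip("\r\n")
```

The index costs one integer per line, which is usually far smaller than the file itself, so jumping to line 5,000,000 of a multi-gigabyte log needs only one seek and one short read.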

Key features

  • It is very quick while handling large files as it reads directly from the disk without loading the whole file into memory.
  • It runs on UNIX-like systems, Windows and Mac, as it is built on the Qt software development framework.
  • It follows the log in real time, automatically refreshing the view as the file grows.
  • Lines in a file can be marked while reading, and the marked lines can be viewed together later.
  • It supports vim- and less-style keyboard commands to move around the file.

Disadvantages

  • You can’t edit files with it, as it is a read-only viewer.
Figure 5: Tool options with ConTEXT editor

Vim
vi (short for ‘visual’) is one of the main editors for UNIX systems, and Vim is a multi-platform clone of this text editor. It was written by the Dutch programmer Bram Moolenaar, a prominent member of the open source software community. Vim stands for Vi Improved. It is considered one of the most popular text editors among developers and is highly customisable. It is famous for two reasons. First, it can be operated entirely from the keyboard without any need for the mouse and, second, Vim is present on almost every UNIX-based machine. It is a free text editor that runs in a terminal, and it is also available as GVim, which is Vim with a built-in GUI.
For new users, it is sometimes hard to interact with Vim because this editor does not prompt the user for the next instruction. To learn the basic commands through a game, you can refer to http://vim-adventures.com/.

Key features

  • It fully supports keyboard-based operation.
  • Its performance when handling large files is very good.
  • Portability and ubiquity are key features of Vim.
  • Vim is customisable. You can search for .vimrc, dotfiles and Vimscript to find examples of preconfigured Vim configuration files.
  • It works well for editing files on a remote server from a terminal over SSH.
  • Vim is open source and free to use.
  • Its core functionality can be extended using the many plugins available for Vim.

Disadvantages

  • Vim is generally for advanced users. It is not easy to master quickly but, once you have done so, it gives you a level of power that few other text editors can match.

ConTEXT
ConTEXT is a small, fast and very useful open source text editor for Microsoft Windows. It has been developed to serve as a powerful tool for software developers.

Key features

  • It is very good at handling large files.
  • It has powerful syntax highlighting for multiple languages like C#, C/C++, Java, etc.
  • It can open multiple files at once.
  • Its interface is available in more than 20 languages like English, Spanish, French, etc.
  • It can compare files.
  • Search and replace with regular expressions.
  • It has user definable execution keys, depending on the file type.
  • Powerful command line handler.
  • Customisable with syntax highlighter colours, cursors, margins, gutter, line spacing, etc.
  • Search and replace text in all open files at one go.

EditPad Lite
EditPad Lite is a general-purpose text editor. It can be used to easily edit any kind of plain text file. It displays content instantly because it uses pointers to access the file directly, rather than reading the entire file into memory at once. EditPad Lite has all the essential features to make text editing easy.
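
Accessing a file in place rather than reading it in full can be sketched in Python with a memory-mapped file. This is only an illustration of the general technique, not a description of how EditPad Lite works internally; find_first and read_slice are hypothetical names used for this example.

```python
import mmap


def find_first(path, needle):
    """Return the byte offset of the first match of `needle` (bytes), or -1."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; the OS loads pages only when touched.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view.find(needle)


def read_slice(path, start, length):
    """Return `length` bytes starting at byte `start` without reading the rest."""
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[start:start + length]
```

For example, find_first("app.log", b"ERROR") followed by read_slice on the returned offset shows the first error without loading the rest of the log. Note that mapping an empty file raises ValueError, so a real tool would check the file size first.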

Key features

  • Full Unicode support, including complex scripts and right-to-left scripts.
  • It directly edits text files that use Windows, UNIX and Mac text encodings (code pages) and line breaks.
  • Working with multiple files is very easy using the tabbed interface.
  • Unlimited undo and redo for all open files, even after you save a file.
  • It supports large files and long lines very well.
  • It prevents any data loss by automatic backup and saves working copies of files.
  • Powerful search-and-replace with literal search terms.
  • Regular expressions that can span multiple lines.

Disadvantages

  • It supports Windows only.
  • It is only free for personal use.
Figure 6: EditPad is very good with regular expressions
Figure 7: EmEditor tool logo

EmEditor
EmEditor is a powerful editor with features such as Unicode support and coloured syntax highlighting. It supports multi-platform files, and many files can be opened concurrently. It takes an effective approach to reading large files, as it keeps the content in temporary files on disk rather than loading it all into memory. This helps it to open very large files with ease. One major feature of EmEditor is that it auto-detects whether a CSV file has an uneven number of columns, and it tries to fix this at the same time. Its Large File Controller lets you open just a specified portion of a large file.
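
A check for uneven CSV columns is easy to script for files that are too large to inspect by eye. The sketch below is not EmEditor's algorithm; it assumes the first row is a header that defines the expected column count, and find_uneven_rows is a hypothetical name. Rows are streamed one at a time, so memory use does not depend on the file size.

```python
import csv


def find_uneven_rows(path, encoding="utf-8"):
    """Return [(line_number, column_count), ...] for rows that differ from the header."""
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:  # empty file: nothing to compare against
            return []
        expected = len(header)
        # reader.line_num is the source line on which the current row ended.
        return [
            (reader.line_num, len(row))
            for row in reader
            if len(row) != expected
        ]
```

Blank lines are reported as rows with zero columns, which is usually what a tester validating a data file wants to know about.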

Key features

  • Easily handles files up to 248 GB.
  • It can split large files and can also combine files into one.
  • Syntax highlighting and regular expressions are some of its main coding features.
  • A customisable interface and quick launch make the user experience better.
  • Various free plugins make this tool more productive.
  • Powerful and scriptable macros.
  • The word count plug-in allows you to count specific terms or characters.
  • Split window (up to four panes).
  • Has the auto-save and auto-indent features.
  • Support for external tools includes launching external programs via keyboard shortcuts or toolbar buttons.

Disadvantages

  • Some of its features are available only in the pro edition, which is not free.

LogExpert
This is a Windows tail program, which serves as a GUI replacement for the UNIX tail command. It is very good at reading large files, and is free for commercial as well as non-commercial use.
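
What the tail command does can be sketched in a few lines of Python. This is an illustration of the technique only, not LogExpert's code; the function name tail and the block_size parameter are assumptions made for this example. The file is read backwards in blocks from the end, so the cost depends on the number of lines requested and not on the size of the file.

```python
import os


def tail(path, lines=10, block_size=4096):
    """Return the last `lines` lines of a file by reading backwards in blocks."""
    if lines <= 0:
        return []
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        data = b""
        # Read blocks from the end until more than `lines` newlines are held,
        # which guarantees the last `lines` lines are complete.
        while position > 0 and data.count(b"\n") <= lines:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            data = handle.read(step) + data
    # Assumption: the log is UTF-8; undecodable bytes are replaced.
    return [
        line.decode("utf-8", errors="replace")
        for line in data.splitlines()[-lines:]
    ]
```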

Key features

  • It has a plugin API for log file data sources.
  • It supports tail mode and Unicode.
  • Third-party plugins are supported.
  • MDI interface with tabs.
  • It has a search function including regular expressions.
  • We can bookmark search results and add comments to bookmarks.
  • Any change to the log file is automatically reflected in the view of that file open in LogExpert.
  • It has a very flexible filter view.
  • We can highlight lines that match search criteria.
  • Columnizer is its key feature, which splits log file lines into several columns for some well-defined log file formats.
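
The idea behind a columnizer can be sketched with a regular expression in Python. The log layout assumed here (a ‘YYYY-MM-DD HH:MM:SS’ timestamp, an upper-case level, then the message) is a made-up example, and LOG_PATTERN and columnize are hypothetical names; LogExpert's own columnizers are plugins and are not reproduced here.

```python
import re

# Assumed line layout: "2024-01-05 10:15:30 ERROR Disk full"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)


def columnize(line):
    """Split one log line into (timestamp, level, message), or None if it does not fit."""
    match = LOG_PATTERN.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group("timestamp"), match.group("level"), match.group("message")
```

Once lines are split this way, filtering on a single column (for example, only rows whose level is ERROR) becomes a simple comparison instead of a free-text search.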

Other alternatives to consider

  • Atom: This is a free, open source editor that supports C++, HTML, CSS and JavaScript.
  • Visual Studio Code: This is another free text editor created by Microsoft, with its source code released under the MIT licence.
  • GNU Emacs: This is a very popular member of the Emacs family of text editors, created by Richard Stallman for the GNU Project.

Tips to handle large files

  • Do not close large files immediately after viewing them. You will be dealing with a lot of information and you wouldn’t want to close a file only to realise that you need it open again. Large files take time to open but, once open, they are quick to work with.
  • It is good to document all your findings from large log files, as you may forget them later and end up wasting time reopening the same large file.
  • It is a good practice to keep your RAM as free as possible, as most text editors load files into RAM to read them.
  • Opening large files is often more a RAM issue than a text editor issue. If you want to edit a 5GB file and you do not have enough free RAM, close other open applications before opening the large file.
  • Instead of using a text editor, you may go ahead with the good old grep command, or you could use the split command to split the large file into multiple smaller files, which are easy to handle.
  • You can use cmd to open large files in Windows. First, open cmd with Start>Run. Navigate to your file location and then type the following command:
type <filename.extension> | more

Now the cmd window will show you the contents of the file, one screen at a time. This happens quickly even for large files, without any time lag. You can copy the details from there as well.
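
On systems where grep and split are not available, the same two tips can be approximated with a short Python script. This is a minimal sketch rather than a replacement for the real tools; the function names grep and split_by_lines, the .partNNNN naming scheme and the default of 100,000 lines per part are all assumptions made for this example. Both functions stream the file line by line, so memory use stays small regardless of file size.

```python
import re


def grep(path, pattern, encoding="utf-8"):
    """Yield (line_number, line) for each line matching the regular expression."""
    regex = re.compile(pattern)
    with open(path, encoding=encoding, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if regex.search(line):
                yield number, line.rstrip("\n")


def split_by_lines(path, lines_per_part=100_000):
    """Write <path>.part0000, <path>.part0001, ... and return the part names."""
    parts, out = [], None
    try:
        with open(path, "rb") as source:
            for count, line in enumerate(source):
                if count % lines_per_part == 0:  # time to start a new part
                    if out is not None:
                        out.close()
                    name = f"{path}.part{len(parts):04d}"
                    out = open(name, "wb")
                    parts.append(name)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

For example, list(grep("server.log", r"ERROR|FATAL")) collects only the matching lines, and split_by_lines("server.log", 500_000) produces parts small enough to open in any of the editors discussed above.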
