The Complete Magazine on Open Source

Faster file search with Python

This article presents a file search utility created by using the power of the versatile Python programming language. Read on to discover how it works and how it can be used in Windows systems.

Computer users often struggle to find files because they forget a file’s location or path, even though Windows provides a file search utility. The Explorer in Windows 7 offers a search facility, but it takes around two to three minutes to find a file. In this article, I present a Python program that searches for a file on your computer’s hard disk within about one second.
Let us first understand the program’s logic, which Figure 1 illustrates. The first step is indexing or, in Python terms, constructing a dictionary in which the file name is the key and the path of the file is the value. The dictionary is then dumped into a pickle file. From then on, each search looks up the file in the dictionary loaded from the pickle file instead of scanning the disk.
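
The indexing idea can be tried out in a few lines. This is only a minimal sketch, not the program itself: it is written for Python 3 (so it uses pickle rather than cPickle), and the helper names build_index, save_index and load_index, as well as the sample paths, are assumptions made for illustration.

```python
import os
import pickle
import tempfile

def build_index(paths):
    """Map each lower-cased file name to the folder that contains it."""
    index = {}
    for path in paths:
        folder, name = os.path.split(path)
        index[name.lower()] = folder
    return index

def save_index(index, db_path):
    # Pickle data is binary, so the file is opened in "wb" mode
    with open(db_path, "wb") as fh:
        pickle.dump(index, fh)

def load_index(db_path):
    with open(db_path, "rb") as fh:
        return pickle.load(fh)

# Hypothetical sample paths, used only for this demonstration
sample = [os.path.join("music", "Waada.mp3"), os.path.join("docs", "notes.txt")]
db_path = os.path.join(tempfile.mkdtemp(), "finder_data")
save_index(build_index(sample), db_path)

index = load_index(db_path)
print(index["waada.mp3"])
```

Once the dictionary has been loaded, a lookup is a single dictionary access, which is why the later searches do not need to touch the disk again.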

Now that you have understood the logic of the program, let us look at the program in detail. I have broken it into different functions. Let’s see what each function does.

#program created by mohit
# official website
# email-id  [email protected]

The block of code below imports the essential modules:

import os
import re
import sys
from threading import  Thread
from datetime import datetime
import subprocess
import cPickle
dict1 = {}

Figure 1: Program logic

Next, let’s write a function that lists the drives on your Windows machine. If an external USB pen drive or hard disk is plugged in, the function picks that up as well.

def get_drives():
    # Ask Windows for the caption (drive letter) of every logical disk
    response = os.popen("wmic logicaldisk get caption")
    list1 = []
    for line in response.readlines():
        # Remove the line ending and the padding spaces
        line = line.strip()
        # Skip the header row and blank lines
        if line == "Caption" or line == "":
            continue
        list1.append(line)
    return list1
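
The filtering step can be tried on any platform by feeding it a captured string instead of calling wmic. The sketch below is written for Python 3; the function name parse_drives and the sample output are assumptions for illustration, and real wmic output may differ in spacing and line endings.

```python
def parse_drives(wmic_output):
    """Return drive captions such as 'C:' from 'wmic logicaldisk get caption' output."""
    drives = []
    for line in wmic_output.splitlines():
        line = line.strip()
        # Skip the header row and blank lines
        if line == "Caption" or line == "":
            continue
        drives.append(line)
    return drives

# Hypothetical output for a machine with two drives
sample_output = "Caption  \r\nC:       \r\nD:       \r\n\r\n"
print(parse_drives(sample_output))
```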

Figure 2: Creating exe file of the Python program

Figure 3: Creating a database of all files

Our next function is search1, which walks a drive and fills the dictionary with the file name as the key and the folder path as the value. If a file name is already present, the new entry is stored under the name with _1 appended.

def search1(drive):
    for root, dirs, files in os.walk(drive, topdown=True):
        for file in files:
            file = file.lower()
            if file in dict1:
                # Name already indexed: store this one under a suffixed key.
                # A third file with the same name replaces the "_1" entry.
                file = file + "_1"
            dict1[file] = root
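
Because only one suffixed key exists per name, a third file with the same name replaces the second one in the index. One possible variation, offered as a sketch rather than as the author's method, keeps a list of folders for every file name. It is written for Python 3, and index_tree is a hypothetical helper.

```python
import os
import tempfile

def index_tree(top, index=None):
    """Walk `top` and record every folder in which each file name occurs."""
    if index is None:
        index = {}
    for root, dirs, files in os.walk(top, topdown=True):
        for name in files:
            # setdefault creates the list the first time a name is seen
            index.setdefault(name.lower(), []).append(root)
    return index

# Build a small throwaway tree with the same file name in two folders
top = tempfile.mkdtemp()
for sub in ("a", "b"):
    os.makedirs(os.path.join(top, sub))
    with open(os.path.join(top, sub, "Song.mp3"), "w") as fh:
        fh.write("demo")

index = index_tree(top)
print(sorted(index["song.mp3"]))
```

With this layout the search code would have to loop over the list of folders for each matching name, so it is a trade-off between completeness and simplicity.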

The create function starts one thread per drive, and each thread calls the search1 function.

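def create():
    t1 = datetime.now()   # start time, used to report how long indexing took
    list2 = []   # threads that have been started
    list1 = get_drives()
    print list1
    for each in list1:
        process1 = Thread(target=search1, args=(each,))
        process1.start()
        list2.append(process1)
    for t in list2:
        t.join()   # wait for the thread to finish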
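
The start-then-join pattern can be seen in isolation in the following Python 3 sketch. The scan function is a stand-in for search1 that records one fake entry per drive, and the lock is a precaution added here; neither is part of the original program.

```python
from threading import Thread, Lock

results = {}
results_lock = Lock()

def scan(drive):
    """Stand-in for search1: record one fake entry per drive."""
    with results_lock:
        results["file_on_" + drive.lower()] = drive

def scan_all(drives):
    threads = []
    for drive in drives:
        worker = Thread(target=scan, args=(drive,))
        worker.start()           # begin scanning this drive
        threads.append(worker)   # remember the thread so it can be joined
    for worker in threads:
        worker.join()            # block until this thread has finished
    return results

print(sorted(scan_all(["C:", "D:"])))
```

Note that join() does not stop a thread; it only waits for it, which guarantees that the dictionary is complete before it is written to disk.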

Once the dictionary is complete, the following code dumps it to the hard disk as a pickle file.

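    # Still inside create(): write the index to disk and report the elapsed time
    pickle_file = open("finder_data", "w")
    cPickle.dump(dict1, pickle_file)
    pickle_file.close()
    t2 = datetime.now()
    total = t2 - t1   # t1 is the start time taken at the beginning of create()
    print "Time taken to create ", total
    print "Thanks for using"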

The next time you search for a file, the program looks it up in the dumped dictionary, as follows:

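if len(sys.argv) != 2:
    print "Please use proper format"
    print "Use <finder -c> to create database file"
    print "Use <finder file-name> to search file"
    print "Thanks for using"
elif sys.argv[1] == '-c':
    create()   # build the database
else:
    t1 = datetime.now()   # start time of the search
    try:
        pickle_file = open("finder_data", "r")
        file_dict = cPickle.load(pickle_file)
        pickle_file.close()
    except IOError:
        # The handler body is missing from this listing; as an assumption,
        # point the user to the -c option and stop.
        print "Database file not found. Use <finder -c> to create it"
        sys.exit(1)
    except Exception as e:
        print e
        sys.exit(1)
    file_to_be_searched = sys.argv[1].lower()
    list1 = []
    print "Path \t\t: File-name"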

Figure 4: File searching

Figure 5: File searching using regular expressions

Here, we use the search method of the re module, so the file name given on the command line can be a regular expression.

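    # Continues inside the else branch of the search code
    for key in file_dict:
        # Treat the command-line argument as a regular expression
        if re.search(file_to_be_searched, key):
            str1 = file_dict[key] + " : " + key
            list1.append(str1)
    for each in list1:
        print each
        print "-----------------------"
    t2 = datetime.now()
    total = t2 - t1
    print "Total files are", len(list1)
    print "Time taken to search ", total
    print "Thanks for using"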
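
The search loop can also be written as a small reusable function. The Python 3 sketch below is an illustration under stated assumptions: find_files and the index contents are hypothetical, and it uses re.IGNORECASE instead of lower-casing the pattern, because lower-casing a regular expression also changes escapes such as \D into \d.

```python
import re

def find_files(index, pattern):
    """Return sorted 'folder : name' lines for names matching the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for name, folder in index.items():
        if regex.search(name):
            hits.append(folder + " : " + name)
    return sorted(hits)

# Hypothetical index contents
index = {
    "waada.mp3": r"D:\songs",
    "wada raha.mp3": r"D:\songs\old",
    "report.doc": r"C:\work",
}
for line in find_files(index, "wa+da"):
    print(line)
```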

Figure 6: Searching using the power of regular expressions

The rest of the code is very easy to understand.
Let us save the complete code (it is also available for download) and make it a Windows executable (exe) file using the PyInstaller module, which can be downloaded as well. Run the command shown in Figure 2. After it completes successfully, you can find finder.exe in the folder C:\PyInstaller-2.1\finder\dist .

You can put the finder.exe file in the Windows folder; if you place it in a different folder, you will have to add that folder to the PATH environment variable. Let us run the program. Figure 3 shows that creating the database takes just 33 seconds. Now search for a file and see the power of the program.

I am going to search for songs whose names contain the string waada.
Look at Figure 4. You can see that the two searches took approximately half a second. The program is case insensitive, so it doesn’t matter whether you type upper case or lower case. The program also has the power of regular expressions. Let’s assume that you want to find files whose names contain either wada or waada, as in Figure 5. In the regular expression, ‘a+’ means that the letter ‘a’ can appear one or more times. Again, you get the result in less than one second.
Let us consider one more example of a regular expression search. Let’s assume that you want to find the files whose names contain wa+da along with digits (see the first search in Figure 6), or the files whose names start with wa+da (see the results of the second search in Figure 6).
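
The behaviour of these patterns can be checked directly with the re module. In this Python 3 sketch the file names are made up, and the second and third patterns are assumptions about what such searches could look like; the exact patterns used in Figure 6 may differ.

```python
import re

# Hypothetical file names
names = ["waada.mp3", "wada raha.mp3", "old waada 2.mp3", "waada123.mp3"]

def matching(pattern):
    """Return the names in which the pattern is found anywhere."""
    return [n for n in names if re.search(pattern, n)]

print(matching("wa+da"))       # 'a+' accepts both wada and waada
print(matching(r"wa+da\d+"))   # wa+da immediately followed by digits
print(matching("^wa+da"))      # '^' anchors the match at the start of the name
```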
The program is indeed very useful. Suppose, for instance, you have forgotten the file path but have a vague idea of the file name. You can find the file within one second by using regular expressions. The best part is the speed of the search: you can repeat the search with different file names as often as you like, whereas each search in Windows Explorer takes around two to four minutes.
