The Complete Magazine on Open Source

Regular Expressions in Programming Languages: PHP and the Web

This is the fourth article in the series on regular expressions. In the past three articles, we have discussed regular expression styles in Python, Perl and C++. Now, we will explore regular expressions in PHP.

Let me first introduce the PHP environment before discussing regular expressions in it. This basic introduction to PHP will be sufficient even for non-practitioners to try out the regular expressions discussed here. Even if you’re not interested in PHP itself, the regular expressions should still interest you.

Let us start with what the name PHP stands for. Originally it stood for ‘Personal Home Page’, but this has since been replaced with the recursive backronym ‘PHP: Hypertext Preprocessor’. PHP was created by Rasmus Lerdorf in 1994, and the PHP development team is now responsible for producing the PHP reference implementation. The standard PHP interpreter is free software released under the PHP License. PHP can be called a general-purpose programming language, but it is mostly used for Web development as a server-side scripting language. At the time of writing, the latest version is PHP 7.1, which was released in December 2016.

Standalone PHP scripts

PHP is mostly used for server-side scripting. Hence, most of the time you will see PHP scripts embedded inside HTML (Hypertext Markup Language), the markup language used for creating Web pages. Even if you are an absolute beginner in HTML, there’s no need to worry. You won’t need any specific HTML skills to understand the regular expressions in this article. Although PHP is almost always paired with HTML, you can still run standalone PHP scripts offline on your machine. It is a bit unusual to use PHP for an application that only works offline, though, and you may find other programming languages better suited to that purpose.

The first PHP script we are going to run is a standalone PHP script called first.php shown below.

<?php
echo 'I don\'t depend on HTML always';
?>

Execute the command php -f first.php in a terminal to run the script first.php. This and all the other PHP scripts and HTML files discussed in this article can be downloaded from opensourceforu.com/article_source_code/October17PHP.zip. It is also possible to make PHP scripts executable. Consider the slightly modified PHP script called second.php shown below.

#!/usr/bin/php
<?php
echo 'I don\'t depend on HTML always';
?>

Execute the command ./second.php in a terminal to run the script second.php. But before doing this, make sure that you have a PHP executable in your system. Sometimes this executable named ‘php’ may not be present in the directory /usr/bin. In that case, find the path to the ‘php’ executable and replace the line of code #!/usr/bin/php with the line of code #!/YOUR_PATH_TO_PHP/php. Also, make sure that you have the execute permission for the file second.php. Figure 1 shows the outputs of the two PHP scripts first.php and second.php.

Figure 1: Output of standalone PHP scripts

The ‘Hello World’ script in PHP

In each of the articles in this series, I have discussed a different programming language but I never had the chance to discuss a ‘Hello World’ program. So here it is — the ‘Hello World’ script in PHP embedded inside HTML, called hello.php, is shown below:

<html>
<head>
<title>Hello World Script PHP</title>
</head>
<body>
<?php
echo '<b> Hello World </b>';
?>
</body>
</html>

But to run this PHP script, you need a Web server like Apache. My system has XAMPP, which is a free and open source Web server solution stack that provides Apache HTTP Server and MariaDB, a database. XAMPP also bundles the interpreters needed to run PHP and Perl scripts. Make sure you have Apache HTTP Server available in your system by using XAMPP or a similar LAMP based Web server solution stack. From this point onwards, I assume all of you have XAMPP in your system. Even if you are using a different Web server, it will not affect the output of the PHP scripts in this article. Just make sure that you know how to run PHP scripts with your Web server.

Now if you have XAMPP, use the command sudo /opt/lampp/lampp start in a terminal to start the XAMPP service. Of course, you will need root privileges to do this. After this, open a Web browser and type ‘localhost’ on the address bar. If the XAMPP service is running, you will see the welcome page of XAMPP. To run the PHP script hello.php, copy it into the directory /opt/lampp/htdocs. All the PHP scripts discussed in this article, except first.php and second.php, should be copied into this directory because we need a Web server to process them. But in the case of first.php and second.php, this is not necessary because they are standalone PHP scripts and can be executed from anywhere. Now, on the address bar of the Web browser, type localhost/hello.php. You will see the Web browser displaying the message ‘Hello World’ in bold. Figure 2 shows the output of the PHP script hello.php in the Mozilla Firefox Web browser.

Now let us examine the script hello.php in detail. Most of the HTML tags used in the script, like <html>, <head>, <title>, <body>, etc, are self-explanatory; so let us not waste time worrying about them. The PHP interpreter processes the PHP part of the script, which starts with the opening tag <?php and ends with the closing tag ?>. Between these tags you can have PHP statements, each terminated by a semicolon. The statement echo '<b> Hello World </b>'; writes the text <b> Hello World </b> into the body of the HTML page. The Web browser then processes this further by interpreting the HTML tag <b>, which specifies bold text. This is why bold text is displayed on the Web browser as shown in Figure 2.

Figure 2: ‘Hello World’ in PHP

Regular expressions in PHP

Now that we know how to set up a server and run PHP scripts, it is time for us to discuss regular expressions in PHP. There are three sets of regular expression functions in PHP to choose from. These are the preg functions, mb_ereg functions and ereg functions. Out of these three, we will be discussing just one set of functions used for regular expression processing, the preg functions.

There are some good reasons to choose preg functions over the other two. First of all, preg is PCRE based. PCRE (Perl Compatible Regular Expressions) imitates the regular expression syntax of Perl, and we have already discussed this style in detail in the first two articles in this series. Those articles covered Python and Perl; Perl is the origin of the syntax, and Python uses a very similar one. So, it is wise to use this style because then it is not necessary to discuss the syntax of the regular expressions used in PHP. All you have to do is refresh the syntax you learned while studying regular expressions in Python and Perl. This is one point in favour of preg functions, while there are some drawbacks with the other two sets of functions.

The mb_ereg functions are more complicated and are mainly useful when processing multi-byte character sets. Such character sets are typically needed for languages like Chinese, Japanese or Korean that have a huge number of characters. As an aside, unlike most other languages, which use an alphabet with a few dozen letters, Chinese and Japanese are written with thousands of logograms, and Korean text is encoded as thousands of distinct syllable blocks.

Coming back to the topic, it would be unnecessary to burden learners by discussing the mb_ereg set of functions with no real benefit in sight. And what disqualifies the ereg set of functions? They are the oldest of the three sets, but they were officially deprecated in PHP 5.3.0 and removed altogether in PHP 7.0.0. Since we have decided to stick with the preg set of functions in PHP to handle regular expressions, we don’t need any further discussion regarding the syntax, because we are already familiar with the PCRE syntax.
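As a side note to the choice made above, the preg functions are not limited to single-byte text. The following is a minimal sketch, assuming PHP 7.0 or later (for the "\u{...}" escape) and a PCRE build with UTF-8 support, of how the u pattern modifier changes what a dot matches. The variable names are only for illustration.

```php
<?php
// "\u{00E9}" is the letter e with an acute accent; in UTF-8 it takes two bytes.
$str = "\u{00E9}";

// Without the u modifier, '.' matches a single byte, so a two-byte
// character does not fit the pattern ^.$
$byteMode = preg_match('/^.$/', $str);

// With the u modifier, pattern and subject are treated as UTF-8,
// so '.' matches one whole character.
$utf8Mode = preg_match('/^.$/u', $str);

echo "Byte mode: $byteMode, UTF-8 mode: $utf8Mode\n";
```

For simple matching of UTF-8 text, this modifier is often enough, which is one more reason the mb_ereg functions are left out here.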

The main functions offered by the preg regular expression engine include preg_match(), preg_match_all(), preg_replace(), preg_split() and preg_quote(). The function preg_match() can give different results based on the number of parameters used in it. In its simplest form, the function can be used with just two parameters as preg_match($pat, $str). Here, the regular expression pattern is stored in the variable $pat and the string to be searched is stored in the variable $str. This function returns 1 if the given pattern is present in the string, 0 if no match is found, and false if an error occurs. Since PHP treats 1 as true and 0 as false in a condition, the return value can be used directly in an if statement.
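Only preg_match() and preg_replace() are discussed in detail in this article, so here is a brief, hedged sketch of three of the other functions named above. The sample strings and variable names are assumptions chosen for illustration.

```php
<?php
$str = 'Open Source For You';

// preg_match_all() returns the number of matches and collects them in $all[0].
// The i modifier makes the match case-insensitive.
$count = preg_match_all('/o/i', $str, $all);

// preg_split() breaks the string wherever the pattern matches.
$words = preg_split('/\s+/', $str);

// preg_quote() escapes characters that are special in a pattern, so that
// arbitrary text can be matched literally. The second argument names the
// delimiter, which is escaped as well.
$literal = preg_quote('1+1=2', '/');
$found = preg_match('/^' . $literal . '$/', '1+1=2');

echo "$count matches of the letter o\n";
print_r($words);
echo "Literal match found: $found\n";
```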

A simple PHP script using regular expressions

Now that we have some idea about the regular expression syntax and the working of one function in the preg set of functions, let us consider the simple PHP script called regex1.php shown below:

<html>
<body>
<?php
$pat = '/You/';
$str = 'Open Source For You';
if(preg_match($pat,$str))
{
    echo '<b> Match Found </b>';
}
else
{
    echo 'No Match Found';
}
?>
</body>
</html>

To view the output of this script, open a Web browser and type localhost/regex1.php on the address bar. The message ‘Match Found’ will be displayed on the Web browser in bold text. This script also tells us how the function preg_match() searches for a match: it scans the entire string, so the pattern can match anywhere in it. Let us analyse the script regex1.php line by line. The HTML part of the code is straightforward and doesn’t need any explanation. In the PHP part of the script, we have used two variables, $pat and $str. The pattern to be matched is stored in the variable $pat by the statement $pat = '/You/'; and here we are going for a direct match for the word ‘You’. As you might have observed, the delimiters of the regular expression are a pair of forward slashes (/). The variable $str contains the string which is searched for a possible match, and it is set by the statement $str = 'Open Source For You'; in the next line. The following few lines of code have an if-else block to print some messages depending on the condition of the if statement.

In the line of code if(preg_match($pat,$str)) the function preg_match() returns 1 if there is a match and 0 if there is no match, and the if statement treats these values as true and false, respectively. In case of a match, the statement echo '<b> Match Found </b>'; inside the if block will print the message ‘Match Found’ in bold text. In case there is no match, the statement echo 'No Match Found'; in the else block will print the message ‘No Match Found’.
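Strictly speaking, preg_match() has three possible results: the integer 1 for a match, the integer 0 for no match, and false when the pattern itself is faulty. The sketch below tells them apart with the strict comparison operator ===. The helper describe_match() is a hypothetical name introduced only for this illustration.

```php
<?php
// Hypothetical helper: turns the result of preg_match() into a word.
function describe_match(string $pat, string $str): string
{
    // The @ operator hides the warning PHP raises for a malformed pattern.
    $result = @preg_match($pat, $str);

    if ($result === false) {
        return 'error';      // the pattern could not be compiled or run
    }
    return $result === 1 ? 'match' : 'no match';
}

echo describe_match('/You/', 'Open Source For You'), "\n";
echo describe_match('/Me/', 'Open Source For You'), "\n";
// The closing delimiter is missing, so this pattern is invalid.
echo describe_match('/You', 'Open Source For You'), "\n";
```

In a plain if statement, both 0 and false lead to the else branch, so a typing mistake in the pattern can look like ‘No Match Found’. Checking for false separately avoids that confusion.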

It is also possible to call the function preg_match() with three parameters as preg_match($pat, $str, $val), where the array variable $val receives the matched text. Consider the PHP script regex2.php shown below:

<?php
$pat = '/b+/';
$str = 'aaaabbbbaaaa';
if(preg_match($pat,$str,$val))
{
    $temp = $val[0];
    echo "<b> Matched string is $temp </b>";
}
else
{
    echo 'No Match Found';
}
?>

To view the output of this script, open a Web browser and type localhost/regex2.php on the address bar. The message ‘Matched string is bbbb’ will be displayed on the Web browser in bold text. This also tells us that the quantifier + in the pattern is greedy, which means it takes the longest possible match. Thus, the function does not match the strings b, bb, or bbb; instead bbbb is the matched string. The array element $val[0] contains the entire text matched by the regular expression pattern. At this point, I should also mention the difference between strings inside single quotes and double quotes in PHP. The former are treated literally, whereas inside double quotes, the name of a variable is replaced with its value. This is why the value of $temp, and not the text $temp, appears in the output.
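The two points made above, greedy matching and the contents of the match array, can be tried out with a short standalone script. This is only a sketch: the lazy quantifier +? and the bracketed groups are additions for comparison and are not part of regex2.php.

```php
<?php
$str = 'aaaabbbbaaaa';

preg_match('/b+/', $str, $greedy);        // greedy: takes as many b's as possible
preg_match('/b+?/', $str, $lazy);         // lazy: stops after the first b
preg_match('/(a+)(b+)/', $str, $groups);  // two bracketed groups

$len = strlen($greedy[0]);

// Double quotes: variables are replaced with their values.
echo "Greedy: $greedy[0] ($len characters)\n";
echo "Lazy: $lazy[0]\n";
// Index 0 holds the whole match; indexes 1 and 2 hold the bracketed groups.
echo "Groups: {$groups[1]} and {$groups[2]}\n";
// Single quotes: the text is printed exactly as written.
echo 'Single quotes: $len', "\n";
```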

Figure 3: HTML page from number.html

Other functions in preg

There are many other useful functions offered by the preg class of functions in PHP for regular expression processing other than the function preg_match(). But we will only discuss a very useful function called preg_replace() which replaces the matched string with another string. The function can be used with three parameters as follows: preg_replace($pat, $rep, $str) where $pat contains the regular expression pattern, $rep contains the replacement string, and $str contains the string to be searched for a pattern. Consider the PHP script regex3.php shown below:

<?php
$pat = '/World/';
$rep = 'Friends';
$str = 'Hello World';
if(preg_match($pat,$str))
{
    $str = preg_replace($pat,$rep,$str);
    echo "<b> The modified string: $str </b>";
}
else
{
    echo 'No Match Found';
}
?>

The function preg_replace() will not modify the contents of the variable $str as such. Instead the function will only return the modified string. In this example, the line of code ‘$str = preg_replace($pat,$rep,$str);’ replaces the word ‘World’ with the word ‘Friends’, and this modified string is explicitly stored in the variable $str. To view the output of this script, open a Web browser and type localhost/regex3.php on the address bar. The message ‘The modified string: Hello Friends’ will be displayed on the Web browser in bold text. In case of both regex2.php and regex3.php, I have only shown the PHP portion of the scripts for want of space, but the complete scripts are available for download.
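The function preg_replace() also accepts optional parameters beyond the three used in regex3.php. The following sketch, with sample strings chosen purely for illustration, shows that every occurrence is replaced by default, that the number of replacements can be limited and counted, and that the replacement string can refer to bracketed groups.

```php
<?php
$str = 'World, hello World';

// By default every occurrence of the pattern is replaced.
$all = preg_replace('/World/', 'Friends', $str);

// The fourth parameter limits the number of replacements;
// the fifth receives the number actually made.
$first = preg_replace('/World/', 'Friends', $str, 1, $count);

// $1 and $2 in the replacement refer to the bracketed groups in the pattern.
$swapped = preg_replace('/(\w+) (\w+)/', '$2 $1', 'Hello World');

echo "$all\n";
echo "$first ($count replacement)\n";
echo "$swapped\n";
```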

A regular expression for validating numbers

Now we are going to look at how our knowledge of regular expressions will help us validate numbers using PHP. The aim is to check whether the text entered through a text box in an HTML page is an integer or a real number, and print the result on the Web page in bold text. If the input text is neither an integer nor a real number, then the message ‘Not a number’ is displayed on the Web page in bold text. But remember, this message is not strictly correct. Mathematicians will be eager to point out that the input text could still be a number: a real number written symbolically, like π (pi), or a complex number like 5 + 10i. It could even be a quaternion or an octonion, which belong to even more bizarre number systems. But as far as practising computer science people are concerned, integers and real numbers in decimal notation are sufficient most of the time. To achieve this, we have two scripts called number.html and number.php. The script number.html is shown below:

<html>
<body>
<form action="number.php" method="post">
Enter a Number:
<input type="text" name="number">
<input type="submit" value="CLICK">
</form>
</body>
</html>

The script number.html reads the number in a text field, and when the submit button (labelled CLICK) is pressed, the script number.php is invoked. The input data is passed to the script number.php by using the POST method for further processing. Before looking at number.php, also remember the naming convention for these files. If an HTML file contains embedded PHP script, then the extension of the file is .php, and if there is no embedded PHP script inside it, then the extension is .html. The script number.php is shown below.

<html>
<body>
<?php
$pat1 = '/(^[+-]?\d+$)/';
$pat2 = '/(^[+-]?\d*\.\d+$)/';
$str = $_POST["number"];
if(preg_match($pat1,$str))
{
    echo '<b> Integer </b>';
}
elseif(preg_match($pat2,$str))
{
    echo '<b> Real Number </b>';
}
else
{
    echo '<b> Not a number </b>';
}
?>
</body>
</html>

The HTML section of the file only contains the tags <html> and <body> and their meaning is obvious. But the PHP script in the file requires some explaining. There are two regular expression patterns defined by the PHP script stored in the variables $pat1 and $pat2. If you examine the two regular expression patterns carefully, you will understand the benefits of using preg which is based on PCRE. I have reused the same regular expression patterns we have discussed in the earlier article dealing with Perl. The line of code ‘$pat1 = ‘/(^[+-]?\d+$)/’;’ defines a regular expression pattern that matches any integer. Even integers like +111, -222, etc, will be matched by this regular expression.

The next line of code ‘$pat2 = ‘/(^[+-]?\d*\.\d+$)/’;’ defines a regular expression pattern that matches real numbers. Here again, we are only identifying a subset of real numbers called rational numbers. But then again, let us not be too mathematical. For a detailed discussion of these regular expressions, refer to the earlier article on Perl, in this series. The best part is that any regular expression that we have developed there can be used in PHP without making changes. I have made a slight change in the second regular expression pattern /(^[+-]?\d*\.\d+$)/ to accommodate real numbers of the form .333 also. The original Perl regular expression was /(^[+-]?\d+\.\d+$)/ which will only validate real numbers like 0.333 and not .333.
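The behaviour of the two patterns can be checked without a Web server by running a standalone script with php -f. The sketch below wraps the patterns from number.php in a helper called classify_number(), a hypothetical name used only here, and tries it on a few sample inputs chosen for illustration.

```php
<?php
// Hypothetical helper wrapping the two patterns from number.php.
function classify_number(string $str): string
{
    if (preg_match('/(^[+-]?\d+$)/', $str)) {
        return 'Integer';
    }
    if (preg_match('/(^[+-]?\d*\.\d+$)/', $str)) {
        return 'Real Number';
    }
    return 'Not a number';
}

// Print each sample input next to its classification.
foreach (['42', '+111', '-222.333', '.333', '5.', '1e10', 'abc'] as $input) {
    printf("%-10s %s\n", $input, classify_number($input));
}
```

Inputs such as 5. and 1e10 are reported as ‘Not a number’ because the second pattern insists on at least one digit after the decimal point and has no provision for an exponent.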

Figure 4: Output of number.php

The next line of code, $str = $_POST["number"]; reads the input data sent by the form in number.html and stores it in the variable $str. The next few lines of code contain an if-else block which matches the input text with the two regular expression patterns. The function preg_match() is used in the if statement and the elseif statement to search for a match. Depending on the results of these matches, the PHP script prints the suitable message in bold text in the Web browser. To view the output of the HTML script, open a Web browser and on the address bar, type localhost/number.html. The resulting HTML page is shown in Figure 3. Enter a number in the text field and press the button labelled CLICK. You will see one of the three possible output messages on the Web page: ‘Integer’, ‘Real Number’, or ‘Not a number’. Figure 4 shows the output obtained when the number -222.333 is given as input.
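One practical detail about reading the form data: if number.php is opened directly instead of through the form, the field ‘number’ is missing from $_POST. The following is a minimal sketch of a more defensive way to read it. The helper read_number_field() is a hypothetical name, and its array parameter stands in for $_POST so that the script can also be tried from a terminal.

```php
<?php
// Hypothetical helper: $post stands in for the $_POST superglobal.
function read_number_field(array $post): string
{
    // ?? supplies an empty string when the field is missing, for example
    // when number.php is opened directly instead of through the form.
    $raw = $post['number'] ?? '';

    // Accept only plain strings, and drop spaces typed around the number.
    return is_string($raw) ? trim($raw) : '';
}

$str = read_number_field(['number' => '  -222.333 ']);

// htmlspecialchars() makes the text safe to echo back into an HTML page.
echo 'You entered: <b>' . htmlspecialchars($str) . "</b>\n";
```

Inside number.php, the call would be read_number_field($_POST), and the result could then be passed to preg_match() as before.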

Now that we have discussed a useful regular expression, it is time to wind up the article. Here, I have discussed the programming language PHP almost as much as the regular expressions in it. I believe the whole point of this series is to explore how regular expressions work in different programming languages by analysing the features of those programming languages rather than discussing regular expressions in a language-agnostic way. And now that we have covered PHP regular expressions, I am sure you will have some idea about using regular expressions on the server side. But what about regular expressions on the client side? In the last example, the validation could have been done on the client side itself rather than sending the data all the way to the server. So, in the next article in this series, we will discuss the use of regular expressions in JavaScript – a client-side scripting language.