Joy of Programming: Liskov’s Substitution Principle

8
7471
Substitution...

Substitution...

LSP is a cardinal rule to follow in object-oriented designs. In this column, we’ll introduce LSP to those new to OOP (Object Oriented Programming), and discuss a couple of examples from JDK that violate this principle.

The whole of object-oriented design boils down to a few principles such as abstraction, encapsulation, modularisation, hierarchy, and regularity.

Abstraction allows us to model in terms of commonality between objects, and express the design in terms of the problem domain. Encapsulation allows bundling data and the functions operating on it as a single unit; further encapsulation enables us to hide implementation details, and expose only the interface to the users. Modularisation allows separation of concerns and definition of crisp boundaries between abstractions.

With hierarchy, we can organise or arrange abstractions at multiple levels. With regularity, we can create uniform solutions that are easier to grasp and understand. It takes years of experience to understand that these are the essence of OOP (and not the favourite features that your programming language provides), and following these generic principles gives us incredible power in problem solving.

Let us take the hierarchy principle, for example. One way to realise hierarchy is to create a relationship between abstractions, using a language feature named inheritance. For example, in an image-processing application, instead of writing code in terms of different kinds of image files such as GIF, JPEG, EPS, SVG, PNS, etc., one can exploit the commonalities between the image file formats.

For example, we can classify image types at a higher level as raster (storing images as bitmaps, or in terms of pixels), or vector (geometric description of images) formats. This allows for abstracting commonalities of the specific file format, and moving the specific details of a file format to that class.

In other words, we can have a base class named ImageFile, and two derived classes, namely, RasterFile and VectorFile. Further, RasterFormat can have derived classes such as JPEGFile, TIFFFile, PNGFile, etc., and VectorFile can have derived classes such as CGMFile, SVGFile, etc. With this design, when we want to write high-level code — for example, reading the file from the disk — we can write code in terms of the generic type ImageFile.

Now, if there is any need to refer to specific types — for example, when converting from one image type to another — we can use specific file types such as JPEGFile, SVGFile, etc. With this design approach, it is possible to reuse the code — the general code applicable to all ImageFiles can be moved to that class. More specific code relating to RasterFile and VectorFiles can move to those classes, and concrete details on image formats such as JPEG or SVG can go into corresponding classes such as JPEGFile, SVGFile, etc.

To summarise: with inheritance, we are exploiting commonalities between implementations by abstracting the interface. Now, code can be written in terms of the common base interface. When specific derived implementations need to be assigned to the base interface references, the user code need not change. Hence, this approach leads to reusability and flexibility in design.

However, this fundamental benefit of inheritance is broken if the derived types cannot be assigned to the base types. Liskov’s Substitution Principle (LSP) describes such a situation, and hence is a cardinal rule to follow in OOP.

The informal definition of LSP is this: “Derived classes must be usable through the base class interface without the need for the user to know the difference.” I know this description is difficult to understand, and I’ll explain LSP using two examples from the Java library, to illustrate what it means to violate this principle.

Every computer science student who has taken a data structure course knows that a stack is not a vector. A vector is just like an array, only that it can grow in size. So we can insert or delete elements from anywhere in the vector. However, stack is a LIFO (Last In First Out) data structure: we can insert and remove only from one end of the data structure.

Hence, a stack is not a vector; maybe a stack can be implemented using a vector. In JDK, these two container classes share an inheritance relationship:  Stack extends Vector. For this reason, we can add or remove elements anywhere from the Stack! Here is a code example that illustrates this problem:

Vector<String> vectorStack = new Stack<String>();
vectorStack.addElement("one");
vectorStack.addElement("two");
vectorStack.addElement("three");
vectorStack.removeElementAt(1);
System.out.println(vectorStack.size());
// prints: 2

As you can see, in this program, we can remove the element from the middle of the Stack (with the call removeElementAt(1)), and treat Stack as if it were a Vector! So, how do we treat a Stack as a stack when we have a Vector reference?

One way is for the user to check the dynamic type (i.e., what class type the object reference points to at runtime) of the object — and if that is a Stack, do a downcast to Stack and apply operations such as push and pop.

This is too much of a workaround because of a design mistake in the Java library. In other words, Stack could have been declared as an interface, and different Stack implementations could have been derived from it. Or else, Stack could have been implemented using Vector as the data container, i.e., using containment instead of inheritance.

Figure 1: Two LSP violation examples from JDK
Figure 1: Two LSP violation examples from JDK

Another example of LSP violation is in the case of the Property class, which extends Hashtable. A Hashtable can take any non-null values as key and values. A Property object can take only a String as key or value. The Property class has methods like getProperty and setProperty to get and set property values.

However, we still have access to methods such as put and putAll from Hashtable, which we can use to put keys/values of any type. When we attempt to put non-String keys or values in a Property, it appears to work fine:

Hashtable extnNos = new Properties();
extnNos.put("Kathy", "3542");
extnNos.put("Joel", "4433");
// mistake in the following statement:
//            3224 typed instead of "3224"
extnNos.put("Joshua", 3224);
System.out.println(extnNos);

This code segment prints:

{Kathy=3542, Joshua=3224, Joel=4433}

However, Java documentation calls such Property objects “compromised”. Method calls such as store, save or list fail when called on “compromised” Property objects.

Here is a slightly modified version of the same code using Properties.list method instead of System.out.println:

Hashtable extnNos = new Properties();
extnNos.put("Kathy", "3542");
extnNos.put("Joel", "4433");
// mistake in the following statement:
//            3224 typed instead of "3224"
extnNos.put("Joshua", 3224);
((Properties)extnNos).list(System.out);

This code results in a crash:

-- listing properties --
Kathy=3542
Exception in thread "main" java.lang.ClassCastException:
                java.lang.Integer cannot be cast to java.lang.String
        at java.util.Properties.list(Unknown Source)
        at UseProperties.main(UseProperties.java:9)

This design — Property extending Hashtable — violates LSP. In this case, a Property would have been better implemented using a Hashtable, i.e., without sharing the inheritance relationship with each other. Because of this design mistake, users of the Property class need to take more care in using the class correctly.

Since JDK is a library — a published API — it is not possible to correct these design mistakes. As users, we cannot do anything about these design mistakes in JDK, but we can learn from them; following established design principles or rules can help create better designs, and not following them can be costly.

Featured image courtesy: mseery. Reused under the terms of CC-BY-NC 2.0 License.

8 COMMENTS

  1. Great article.What inheritance means by “sub-classing” is not what most people naturally mean, which is to partition a class in to subsets. This terminology single-handedly contradicts Liskov’s Substitution Principle.

  2. […] shouldn’t go around calling it a car. This principle is always explained by giving examples of where it isn’t […]

  3. Need a clarification, here is property file really breaking LSP. The reason if you substitute property file with hashtable it will work. The list function is only to property file and not to hashtable. And as per LSP, it says derived class should substitute parent class and not opposite.
    Correct me if I am wrong.

LEAVE A REPLY

Please enter your comment!
Please enter your name here