The Rest of Rust: A Beginner’s Gateway

Explore what Rust can do and find out that it is not as tough to learn as it is made out to be.

The urge to replace C for systems programming is stronger than ever due to the serious kinds of bugs that arise from its design. Although several alternatives have been proposed, Rust seems to be the only promising one so far, welcomed even by Linus Torvalds, notorious for his hostility towards any language other than C for kernel development.

However, it is widely observed that the adoption rate of Rust does not match the excitement for it. The primary reason cited is the steep learning curve. But is that the whole story?

Being memory safe while matching the performance of C and C++ is a primary goal of Rust. Naturally, almost all learning material on Rust has a heavy focus on the underlying memory model. Unfortunately, this can easily ward off beginners. But what if we set memory management aside for a while? Yes, one has to face the borrow checker at some point, but there’s a lot a beginner can do without it. That includes features comparable to the ones we’re already familiar with from other programming languages. An encouraging entry like this would make the real deal more enjoyable.

Getting Rust

Setting up a Rust environment depends on the platform you use, and everything you need to know and get is available on rust-lang.org. But for the purpose of this article, the simple online playground available at play.rust-lang.org will suffice.

By the way, I recommend you not install Rust from your distro’s repository, especially if it prefers stability over being up-to-date (like Debian). Rust is still evolving, and there is no guarantee that you’ll get even these simple examples working with your distro’s version of Rust.

Enums that are unions

People who are familiar with functional programming languages won’t need any introduction to an enum in Rust. They’ll instantly get it and love it. However, a C programmer, even if they manage to get it, may find it unnecessarily overloaded. Let’s demystify Rust enums.

Have a look at the following sample program:

enum Vehicle {
    Bicycle,
    Car(String),
}

fn print_vehicle(v: Vehicle) {
    match v {
        Vehicle::Bicycle =>
            println!("A bicycle."),
        Vehicle::Car(regno) =>
            println!("A car with reg. no.: {}.", regno),
    };
}

fn main() {
    let v1 = Vehicle::Car("KL 00 X 0000".to_string());
    let v2 = Vehicle::Bicycle;
    let v3 = Vehicle::Car("KL 00 Y 1111".to_string());

    print_vehicle(v1);
    print_vehicle(v2);
    print_vehicle(v3);
}

There is an enum called ‘Vehicle’ with two variants, ‘Bicycle’ and ‘Car’. We create three Vehicle values in main() and print them using a helper function called print_vehicle(). There is nothing surprising about the output:

A car with reg. no.: KL 00 X 0000.
A bicycle.
A car with reg. no.: KL 00 Y 1111.

As you can guess, the match construct used inside print_vehicle() is the Rust counterpart of C’s switch (simply speaking). However, what confuses a C programmer is the fact that the variant Car has some data associated with it.

Stop focusing on the code for a while and think about what the program does. It stores and displays vehicle objects, of which cars can (in fact, must) have a registration number and bicycles cannot. How would you code this in C? You’ll have a struct named Vehicle that includes all the parameters related to all kinds of vehicles and set or retrieve them based on the kind of the vehicle stored in a field called, say, ‘type’. What is the problem here? There is nothing that prevents you from accessing the (invalid) data stored in the regno field of a Vehicle object even if its type is set to Bicycle:

typedef enum {
    BICYCLE,
    CAR,
    ...
} VehicleType;

typedef struct {
    VehicleType type;
    char regno[MAX_REGNO];
    ...
} Vehicle;

// Somewhere in the middle, where vehicle.type can be BICYCLE:
add_to_serviced_vehicles(vehicle.regno); // no compile-time error

What about unions? Yes, unions let us have some separation, but it’s still up to you to keep track of which member is valid and how to interpret the data. This bookkeeping is exactly what Rust does on its own. The enum we saw in the Rust code is technically called a tagged union: it lets multiple kinds of data coexist in one type, settable or gettable only in accordance with the kind (tag) of the object. We have to match on the value to extract its data, which makes incorrect interpretations impossible. This comparison is summarised in Table 1.

                         | enum + struct (C)          | enum + union (C)           | Tagged union (Rust)
Valid tag at a time      | Only one                   | Only one                   | Only one
Valid fields at a time   | Related to any one variant | Related to any one variant | Related to the current variant
Storage                  | Sum of all variants        | Maximum of all variants    | Maximum of all variants
Incorrect access         | Possible                   | Possible                   | Not possible
Data when misinterpreted | Junk                       | Junk                       | Not allowed

Table 1: C workarounds vs proper tagged unions
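To make the ‘Incorrect access’ row concrete, here is a minimal sketch of a Rust counterpart to the C snippet above. The function name serviced_regnos() is made up for this illustration. The point is that the registration number can only be reached inside the match arm for Car.

```rust
enum Vehicle {
    Bicycle,
    Car(String),
}

// Hypothetical helper: collects the registration numbers of all cars.
fn serviced_regnos(vehicles: Vec<Vehicle>) -> Vec<String> {
    let mut serviced = Vec::new();
    for v in vehicles {
        match v {
            // The registration number is reachable only inside this arm.
            Vehicle::Car(regno) => serviced.push(regno),
            // A bicycle carries no data, so there is nothing to misread here.
            Vehicle::Bicycle => {}
        }
    }
    serviced
}

fn main() {
    let vehicles = vec![
        Vehicle::Car("KL 00 X 0000".to_string()),
        Vehicle::Bicycle,
        Vehicle::Car("KL 00 Y 1111".to_string()),
    ];
    for regno in serviced_regnos(vehicles) {
        println!("Serviced: {}", regno);
    }
}
```

Writing something like v.regno outside a match is rejected by the compiler, because the enum itself has no such field.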

Tip: In the Rust code we saw, Bicycle is a simple variant while Car is a tuple variant. There is one more kind called struct variant, in which you can label the fields like in a struct.
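Here is a small sketch of all three kinds of variants side by side. The Truck variant and its axles field are assumptions added only for this illustration; they are not part of the earlier program.

```rust
enum Vehicle {
    Bicycle,                            // simple variant: no data
    Car(String),                        // tuple variant: unnamed field
    Truck { regno: String, axles: u8 }, // struct variant: labelled fields (hypothetical)
}

fn describe(v: Vehicle) -> String {
    match v {
        Vehicle::Bicycle => String::from("A bicycle."),
        Vehicle::Car(regno) => format!("A car with reg. no.: {}.", regno),
        // Fields of a struct variant are matched by name.
        Vehicle::Truck { regno, axles } => {
            format!("A truck with reg. no.: {} and {} axles.", regno, axles)
        }
    }
}

fn main() {
    let truck = Vehicle::Truck { regno: "KL 00 Z 2222".to_string(), axles: 3 };
    println!("{}", describe(truck));
    println!("{}", describe(Vehicle::Car("KL 00 X 0000".to_string())));
    println!("{}", describe(Vehicle::Bicycle));
}
```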

Option type

You’ve probably just started liking the concept of tagged unions and would like to try it out sometime in the future. Guess what — you cannot avoid them even if you wanted to! One tagged union called Option<T>, defined by the std::option module, is heavily used by the Rust library.

What is an option type? It is the return type of functions that cannot guarantee there will always be a valid value to return. A generic example is a function that searches for a substring in a string, returning the position of the occurrence. Another example is a function that pops the top element off a stack. In the first example, what if no such substring was there? In the second example, what if the stack is already empty?

The C-style workarounds include returning special constants (‘magic numbers’) like -1, or even aborting when called without sufficient checks (on stack underflow, for example). What about the option type? Option<T> (where T can be any type, like i32 or String) has two variants, Some(T) and None. Some(T) holds a value of type T. None means there is no value.
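The stack case can be sketched with the standard Vec type, whose pop() method returns an Option. The helper describe_pop() is a made-up name for this illustration.

```rust
// Hypothetical helper: turns the result of a pop into a message.
fn describe_pop(popped: Option<i32>) -> String {
    match popped {
        Some(top) => format!("popped {}", top),
        None => String::from("stack is empty"),
    }
}

fn main() {
    // A Vec used as a stack; pop() returns Option<i32> here.
    let mut stack = vec![10, 20];

    println!("{}", describe_pop(stack.pop()));
    println!("{}", describe_pop(stack.pop()));
    // The third pop finds nothing, and we get None instead of an underflow.
    println!("{}", describe_pop(stack.pop()));
}
```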

Let’s see a simple example that shows Option<T> in action:

fn main() {
    let haystack = "the quick brown fox";

    for needle in ["fox", "rabbit", "brown"] {
        let pos = haystack.find(needle);

        match pos {
            Some(pos) => println!("'{}' found at: {}", needle, pos),
            None => println!("'{}' not found.", needle),
        }
    }
}

As you might’ve guessed, the output is:

'fox' found at: 16
'rabbit' not found.
'brown' found at: 10

A packed example

Alright, we’re out of space and still not past tagged unions. So let’s make a leap and have a look at a fresh example packed with many Rust features that we didn’t discuss:

struct AltUnder10(i32);

impl Iterator for AltUnder10 {
    type Item = i32;

    // Advances the state by 2 and yields it while it stays under 10.
    fn next(&mut self) -> Option<Self::Item> {
        self.0 += 2;

        if self.0 < 10 {
            Some(self.0)
        } else {
            None
        }
    }
}

fn main() {
    let alt_numbers = AltUnder10(1);

    for num in alt_numbers {
        println!("{}", num);
    }
}

That looks like a giant leap. All it does is print the numbers 3, 5, 7, and 9 on four lines (alternating numbers under 10, seeded with 1). Although the code showcases little that makes Rust unique, there is still a lot to explain that will help us get familiar with Rust.

Let’s start with the for loop in main(), where all the action happens. It iterates through an object called alt_numbers, which we created just above the loop. In the previous example, we iterated over an array. But here the variable alt_numbers is a value of the struct AltUnder10, which we defined ourselves. How did it become iterable? Because we implemented the trait (think of interfaces in Java) called std::iter::Iterator for the struct. This is what happens in the block that starts with impl Iterator for AltUnder10.

Any implementation of Iterator is required to have a method next() and an associated type Item. Each time it is called, next() should return Some(value) with the next value of type Item, or None if the iteration needs to stop.

Does our next() really return anything? Yes. In Rust, when the last expression in a function body has no trailing semicolon, its value becomes the function’s return value. Another interesting thing is that certain control flow structures like if are themselves expressions in Rust (similar to Lisp), so the whole if/else in next() evaluates to either Some(self.0) or None.
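Both ideas fit in a few lines. The function names double() and sign() below are invented for this sketch.

```rust
// The last expression has no semicolon, so its value is returned.
fn double(n: i32) -> i32 {
    n * 2
}

// The whole if/else chain is one expression; each branch yields an i32.
fn sign(n: i32) -> i32 {
    if n < 0 {
        -1
    } else if n == 0 {
        0
    } else {
        1
    }
}

fn main() {
    // An if expression can also be used on the right-hand side of let.
    let label = if double(3) > 5 { "big" } else { "small" };
    println!("{} {} {}", double(3), sign(-4), label);
}
```

Adding a semicolon after n * 2 would turn it into a statement, and the compiler would complain that the function does not return an i32.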

Of course, our iterator needs to keep track of the state, and that’s what the only field inside the struct AltUnder10 is for. Note that the struct is declared as a tuple struct, meaning it has unnamed fields. This is why the field is referred to as self.0. If you declared the struct as struct AltUnder10 { curno: i32 } instead, you would refer to the field as self.curno inside the implementation. Creating the value inside main() would also change, to AltUnder10 { curno: 1 }.
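For comparison, here is how the same iterator might look with a named field. Only the field access and the construction in main() differ from the tuple version.

```rust
struct AltUnder10 {
    curno: i32, // named field instead of the unnamed .0
}

impl Iterator for AltUnder10 {
    type Item = i32;

    fn next(&mut self) -> Option<Self::Item> {
        self.curno += 2;

        if self.curno < 10 {
            Some(self.curno)
        } else {
            None
        }
    }
}

fn main() {
    // Struct literal syntax replaces AltUnder10(1).
    let alt_numbers = AltUnder10 { curno: 1 };

    for num in alt_numbers {
        println!("{}", num);
    }
}
```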

One last thing is about the input argument to next(). The name self looks similar to self in Python and the implicit this in C++. But what is &mut? There are two concepts here. The ampersand stands for a reference, which does not transfer ownership. Whoever calls alt_numbers.next() passes a reference to alt_numbers while still retaining the ownership. mut makes the reference mutable, without which next() won’t be able to update the state of the object. So, finally, we’re on to something that is related to ownership and memory management!
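Calling next() by hand, instead of through a for loop, shows this in action. This is a sketch with a seed of 5 chosen only to keep it short. Note that the variable itself must be declared with let mut before a mutable reference to it can be taken.

```rust
struct AltUnder10(i32);

impl Iterator for AltUnder10 {
    type Item = i32;

    fn next(&mut self) -> Option<Self::Item> {
        self.0 += 2;
        if self.0 < 10 { Some(self.0) } else { None }
    }
}

fn main() {
    // `mut` is required: each call to next() borrows alt_numbers mutably.
    let mut alt_numbers = AltUnder10(5);

    let first = alt_numbers.next();
    let second = alt_numbers.next();
    let third = alt_numbers.next();
    println!("{:?} {:?} {:?}", first, second, third);

    // main() still owns alt_numbers, so its updated state is visible here.
    println!("state is now {}", alt_numbers.0);
}
```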

Syntax ‘sugar’ and zero-cost abstractions

When you are sure that an option value cannot be None, you can avoid matching on it and directly use the value by calling unwrap() on it (for example, haystack.find(needle).unwrap() gives you the position directly). But never misuse this feature: calling unwrap() on None makes the program panic, which by default terminates it. unwrap() should be used only when you are really sure, like inside an if block that already performs the necessary checks.
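A guarded unwrap() might look like the sketch below. The function describe() is a hypothetical helper, and it takes &str arguments purely for convenience.

```rust
// Hypothetical helper: reports where needle occurs in haystack.
fn describe(haystack: &str, needle: &str) -> String {
    let pos = haystack.find(needle);

    if pos.is_some() {
        // Safe: we have just checked that pos is not None.
        format!("found at {}", pos.unwrap())
    } else {
        String::from("not found")
    }
}

fn main() {
    let haystack = "the quick brown fox";
    println!("{}", describe(haystack, "fox"));
    println!("{}", describe(haystack, "rabbit"));

    // Calling haystack.find("rabbit").unwrap() here would panic at run time.
}
```

In everyday code a match, as in the earlier example, is usually preferable because the compiler checks that both cases are handled.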

There is another syntax ‘sugar’ related to the option type that is safe but serves a different purpose. It is the question mark operator, which you can append to any expression that evaluates to an option value. If the value is None, the enclosing function returns None immediately; otherwise the contained value is extracted. As you might’ve guessed, on option values this can only be used inside functions that themselves return an option type.
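As a rough sketch, consider a function that needs two successful searches before it can compute anything. The name gap_between() is invented for this example.

```rust
// Hypothetical helper: distance between the first occurrences of two needles.
// Returns None as soon as either needle is missing.
fn gap_between(haystack: &str, first: &str, second: &str) -> Option<usize> {
    let a = haystack.find(first)?;  // returns None from gap_between if not found
    let b = haystack.find(second)?; // same here

    // Both positions are plain usize values at this point.
    Some(if a > b { a - b } else { b - a })
}

fn main() {
    let haystack = "the quick brown fox";
    println!("{:?}", gap_between(haystack, "brown", "fox"));
    println!("{:?}", gap_between(haystack, "brown", "rabbit"));
}
```

Without the question marks, each find() would need its own match with an early return in the None arm.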

One last word is about performance. Hardcore C programmers, even if they appreciate the beauty of these high-level features, will be reluctant to use them for fear of a performance penalty. That’s where Rust’s promise of zero-cost abstractions comes in. For example, if T is a reference type in Option<T>, Rust doesn’t add a separate tag field; since a reference can never be null, the null pointer value is used to represent None. Yes, this is exactly what a C programmer does with pointers, but the point is, Rust keeps you away from the dangers of using NULL directly.
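You can check this yourself with std::mem::size_of. The sketch below only compares sizes; the exact byte counts depend on the platform, so none are assumed.

```rust
use std::mem::size_of;

fn main() {
    println!("&i32:         {} bytes", size_of::<&i32>());
    println!("Option<&i32>: {} bytes", size_of::<Option<&i32>>());
    println!("i32:          {} bytes", size_of::<i32>());
    println!("Option<i32>:  {} bytes", size_of::<Option<i32>>());

    // An optional reference costs nothing extra: None is the null pointer.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());

    // A plain i32 has no spare bit pattern, so Option<i32> needs room for a tag.
    assert!(size_of::<Option<i32>>() > size_of::<i32>());
}
```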

The official Rust documentation includes a benchmark result in which an iterator-based version of a function is slightly faster than the equivalent hand-written loop. That’s surprising and impressive. So you don’t need to go back to low-level constructs and concepts unless you’ve proven the high-level ones slow or incompatible with the systems you want to deal with.
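To get a feel for the two styles being compared, here is the same computation written both ways. This is not the benchmark from the documentation, just an illustrative pair of functions with made-up names, and it makes no claim about which one is faster.

```rust
// Low-level style: manual index and accumulator.
fn sum_of_squares_loop(values: &[i64]) -> i64 {
    let mut total = 0;
    let mut i = 0;
    while i < values.len() {
        total += values[i] * values[i];
        i += 1;
    }
    total
}

// High-level style: an iterator chain with a closure.
fn sum_of_squares_iter(values: &[i64]) -> i64 {
    values.iter().map(|v| v * v).sum()
}

fn main() {
    let values = [1, 2, 3, 4];
    println!("{}", sum_of_squares_loop(&values));
    println!("{}", sum_of_squares_iter(&values));
}
```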
