sh$ zig version 0.9.1
Zig is a relatively new (2016) low-level programming language that could compete with C for embedded applications and library writing. Zig has excellent interoperability with C, but can also be used as a stand-alone programming language and even bare metal.
At the date of this writing (Q2 2022), the Zig language hasn’t still reached its 1.0 release, so it is yet subject to many changes and evolutions. It is nevertheless functional and is shipped with its (still experimental) standard library written 100% in Zig.
While writing this article, I used Zig 9.1.0 installed following the instruction given on the ziglearn.org website.
sh$ zig version 0.9.1
I also assume a previous experience with an imperative programming language, ideally C or C++, since you may be more familiar with the notions of data types and pointers.
For the rest of this article, I will guide you step by step in writing in Zig a command-line utility to count the characters, and lines contained in a file, very much like the POSIX wc
utility.
For now, copy the code above into a new text file named wc-v1.zig
.
Then you will compile that program using the zig build-exe
command:
sh$ zig build-exe wc-v1.zig
This step aims to turn the Zig source file into an executable program.
It should produce a new file named wc-v1
in the current directory:
sh$ ls -l total 588 -rwxrwxr-x 1 sylvain sylvain 589840 mai 2 11:25 wc-v1 -rw-rw-r-- 1 sylvain sylvain 618 mai 2 11:28 wc-v1.zig drwxr-xr-x 5 sylvain sylvain 4096 mai 2 11:25 zig-cache
Note
|
Zig has also created the |
As you can see, the wc-v1
program has the x
bit set, meaning it’s an executable program.
Let’s try it immediately:
sh$ ./wc-v1 0 chars, 0 words, 0 lines
For now, it’s not very exciting, but let’s examine the program in more detail to discover a little bit more about the Zig syntax.
In the source file above, you can see many lines starting with //
.
Those are comments ignored by the compiler but helpful to explain how a program works to other programmers (or to the future you).
A comment in Zig starts with //
and continues up to the line’s end.
There are no multi-line comments in Zig.
A Zig program can be divided into packages to improve reusability or manage complexity.
The standard and third-party libraries are shipped as packages.
Large scope packages can be subdivided into sub-packages.
It is the case for the standard library, whose root package is accessible under the name "std"
(with quotes).
To use a package, you need to import it using the @import
internal function.
But you also need to bind that package to a symbol to make it accessible to the rest of the code.
Here, for consistency, I bind it to the (newly declared) constant std
(no quotes).
As a best practice, Zig encourages you to use constants (const
) whenever you can.
But sometimes, you have to store data you will update during the program life’s time.
You will use a variable (var
) declaration in that case.
The var
keyword is followed by a user-defined symbol, which becomes the variable’s name.
After the variable’s name, you see the :u32
declaration.
It specifies the type of the variable.
Zig is a statically typed programming language.
The compiler has to know how much space to reserve in memory to store your data and how to interpret those data.
The u32
notation means our data are 32-bits unsigned integers.
In other words, each of our variables can store a positive number in the range 0~232-1.
I assume it should be sufficient to count the number of characters in a document.
If it’s not enough for you, you may change the variable’s type to u48
or u64
if you prefer.
On the other hand, if you think it’s too much, why not reduce our program’s memory consumption by using u24
or even u16
?
Zig allows arbitrary bit-width for integer variables, so you may choose the number of bits that suits you the most.
Aside from the const
and var
declarations, function definitions are the most used kind of statement in Zig.
In the purest tradition of imperative programming language, a Zig program is divided into several functions that are individual units of execution.
Functions definition will likely change before the Zig 1.0 release to make them more consistent with the rest of the syntax.
For now, they are introduced by the keyword fn
followed by the developer-assigned function’s name, a pair of parenthesis, and the function’s return type.
Like in C, when the return type is void
, the function returns nothing (in other programming languages, we talk about procedures in that case).
A typical program is made of hundreds of functions.
Among them, the one named main
is special since your program’s execution starts with its body’s first instruction and continues up to its end.
Here, the body of the function, defined in curly braces, contains just one statement:
a call to the std.debug.print
function to write a message on the console.
Remember, The std.
prefix means the standard library provides this function.
Like the C printf()
function or the Python format()
function, std.debug.print
is used for formatted output.
The message to display contains placeholders that will be replaced by actual values when the function is executed.
In Zig, the placeholders are {}
.
They will be replaced by the values given in the .{ … }
.
Since our counter variables were initialized to 0 and never touched again, each {}
is replaced by 0
in the output.
You may try to recompile the program by changing the initialization value of the chars
variable, for example.
The new value should appear in the output (don’t forget to save your changes and to recompile the program).
Note
|
Character strings are always enclosed by double quotes in Zig.
You may also notice the |
To make our program useful, it needs to read data from somewhere.
For the sake of simplicity, the program will read data from the standard input stdin
.
The standard Zig package std.io
provides a function to access the standard input:
Once you have access to stdin, you can read bytes from that stream into a buffer using the following piece of code:
stdin.readAll(&buffer);
But we won’t use that directly from the main
function. Instead, to keep things organized, we will create a new function to perform the heavy work of reading the file and counting the number of characters read.
I introduced several new language constructs in the code above.
First, consider the buffer
variable declaration: The buffer is a sequence of consecutive bytes in memory.
This is called an array in Zig.
To declare an array, you prefix the individual items' data type by the array length.
A byte is an 8-bits integer (u8
), so an array of, say, 256 bytes is declared with the type [256]u8
.
Then look at the buffer’s initialization.
Zig makes variable initialization mandatory.
You use the special undefined
value when you have nothing meaningful to put in a variable at its declaration site.
It is the case here, since the actual content of the buffer will be read from the input stream on the next line.
The actual work of reading from the file is done by the readAll
function of the stdin
structure.
If you have familiarity with object-oriented programming, it would be tempting to use the word object here.
But Zig is not object-oriented.
It does not have inheritance, polymorphism, or dynamic dispatch.
The &buffer
notation allows passing the location in memory of the array (its address, we also say "a pointer" to the array).
The readAll
function will populate the bytes starting at that location with the data coming from the input stream.
But not all files are exactly 256 bytes long.
So, the readAll
function returns the number of bytes actually read.
We will display that result to the user.
That solves the case for files shorter than 256 bytes.
But what will append if the file is longer?
Well, contrary to what append in C, for example, when passing an array to a function in Zig, the receiver knows the size of that array.
So the readAll
function knows it must not read more than 256 bytes.
Take a closer look at the definition of the function count
.
Did you notice the exclamation mark?
And look at the call to stdin.readCall
.
Did you see the try
keyword?
Both are related to error handling.
In Zig, errors are just integers. There is no such thing as an error structure (at the time of this writing, at least). However, and even if they just are numbers, error codes are first-class citizens in Zig. The language has extensive features to handle them. And, by design, you cannot ignore an error returned by a function.
When a function may fail with an error, its return type is prefixed by the exclamation mark.
And in the body of a function, if you call a function that may fail, you either have to handle the error locally (we will see that in a moment) or propagate ("return") the error to the parent function in the call stack.
It’s the purpose of the try
keyword. Let me summarize the sequence of events:
If there is an I/O error while reading the input stream, the readAll
function will abort processing by returning an error code.
The count
function will receive that error code on the line containing the call to readAll
Thanks to the try keyword, our count
function will, in its turn, stop processing, returning that error to its caller, which is… well, I haven’t talked about that yet. So fix that immediately.
We call the count
function from the main program.
You may have noticed the return type of the main program is void
, not !void
.
That means the main program cannot return an error.
So if any sub-function call may return an error, it has to be handled by the main.
It’s the purpose of the catch
construct.
It will capture the error’s value in the new variable err
, then introduces a new block containing the developer-defined error handler.
In this program, I simply inform the user an error has occurred.
I let you compile and run the program to see how it works now:
sh$ zig build-exe wc-v2.zig sh$ echo "hello world" | ./wc-v2 12 byte(s) read 0 lines, 0 chars
However, it is not easy to test error handling in this program unless you have some faulty USB stick at hand. The best we can do for now is to add an instruction to simulate an error:
Here, if the buffer contains an X
as the starting byte, it will trigger an error.
Please notice the character X
is enclosed in simple quotes.
That’s how you write character constants in Zig.
Please do not confuse them with strings of characters enclosed by double-quotes.
Let’s test that:
sh$ echo "hello world" | ./wc-v3 12 byte(s) read 0 lines, 0 chars sh$ echo "Xhello world" | ./wc-v3 Error error.Unexpected while reading file 0 lines, 0 chars
We have made lots of progress in the preceding section.
But the code required to update the chars
and lines
variables is still missing.
In addition, we somehow have hard-coded the maximum file length to 256 bytes.
Let’s take a more realistic approach by looping over stdin.readAll
to read all data from the input file regardless of its length:
At this point, you should be able to understand most of the changes. Two remarks, though: first, I reduced the buffer’s size to one byte. It makes counting chars and detecting the end of the file trivial, even if that does not seem very efficient.
Second, look at the done
variable definition.
It does not contain a type declaration because Zig can infer the variable’s type from its initialization value.
true
and false
are two predefined constants representing the possible value of a boolean (bool
) variable.
Initializing done
to false
implicitly makes it a variable of the bool
type.
sh$ zig build-exe wc-v4.zig sh$ ./wc-v4 < /etc/passwd 0 lines, 2645 chars
As of now, the program compiles and counts the number of characters read from the input stream.
But not the number of lines.
As an exercise, I suggest you modify the code to detect the newline character ('\n'
) and update the lines
variable accordingly.
I won’t give you the solution, but if you are really stuck, you may try to look at the next section to find some inspiration.
Note
|
Do you remember we saw Zig can store integers using an arbitrary number of bits?
Try to reduce the |
I left you in the last part with a program that counts the number of characters in a file while reading it byte per byte. A better option would probably require reading a few kibibytes block of data, then iterating from memory through that block to count the characters and detect the new lines. Repeating these operations until the entire file has been read. Let' start with the easiest part: increasing the buffer size.
But we also need to loop over the bytes read:
I briefly introduce two new notations here.
The buffer[0..nb]
syntax takes a slice from an array.
Think of that as a sub-array.
The goal here is to consider only the part of the array whose bytes were updated in the last read operation.
Slicing does not copy the data.
I merely store the slice length and a pointer to the slice’s start in the existing array.
So it’s a very fast operation.
Lastly, you can iterate over the items of a slice or array using a for
loop with capturing syntax.
It will execute its body once for each item of the slice or array, binding at each iteration a symbol (here c
) to the currently examined item.
All that put together leads us to that final version of the program:
And we’re done.
Well, I’m done.
But the program can be improved in many ways.
For example, you may detect word boundaries to also count the number of words in the file.
Or, for the bravest among you, you could investigate the `std.unicode.utf8ByteSequenceLen
function to count the number of UTF-8 characters in the input files, rather than simply (and erroneously) considering than one byte is one character as we did here.
As always, don’t hesitate to experiment and share your finding on social networks!