Yes, I know
How to write my first Zig program

Discover Zig and write your first command-line utility.

Share 

What is Zig?

Zig is a relatively new (2016) low-level programming language that could compete with C for embedded applications and library writing. Zig has excellent interoperability with C, but can also be used as a stand-alone programming language and even bare metal.

At the date of this writing (Q2 2022), the Zig language hasn’t still reached its 1.0 release, so it is yet subject to many changes and evolutions. It is nevertheless functional and is shipped with its (still experimental) standard library written 100% in Zig.

Prerequisites

While writing this article, I used Zig 9.1.0 installed following the instruction given on the ziglearn.org website.

sh$ zig version
0.9.1

I also assume a previous experience with an imperative programming language, ideally C or C++, since you may be more familiar with the notions of data types and pointers.

First steps

For the rest of this article, I will guide you step by step in writing in Zig a command-line utility to count the characters, and lines contained in a file, very much like the POSIX wc utility.

For now, copy the code above into a new text file named wc-v1.zig. Then you will compile that program using the zig build-exe command:

sh$ zig build-exe wc-v1.zig

This step aims to turn the Zig source file into an executable program. It should produce a new file named wc-v1 in the current directory:

sh$ ls -l
total 588
-rwxrwxr-x 1 sylvain sylvain 589840 mai    2 11:25 wc-v1
-rw-rw-r-- 1 sylvain sylvain    618 mai    2 11:28 wc-v1.zig
drwxr-xr-x 5 sylvain sylvain   4096 mai    2 11:25 zig-cache
Note

Zig has also created the zig-cache directory. Zig uses it to cache some compilation artifacts to speed up executable building. We won’t talk about it in this article.

As you can see, the wc-v1 program has the x bit set, meaning it’s an executable program. Let’s try it immediately:

sh$ ./wc-v1
 0 chars, 0 words, 0 lines

For now, it’s not very exciting, but let’s examine the program in more detail to discover a little bit more about the Zig syntax.

Comments

In the source file above, you can see many lines starting with //. Those are comments ignored by the compiler but helpful to explain how a program works to other programmers (or to the future you).

A comment in Zig starts with // and continues up to the line’s end. There are no multi-line comments in Zig.

Packages

A Zig program can be divided into packages to improve reusability or manage complexity. The standard and third-party libraries are shipped as packages. Large scope packages can be subdivided into sub-packages. It is the case for the standard library, whose root package is accessible under the name "std" (with quotes).

To use a package, you need to import it using the @import internal function. But you also need to bind that package to a symbol to make it accessible to the rest of the code. Here, for consistency, I bind it to the (newly declared) constant std (no quotes).

Variables

As a best practice, Zig encourages you to use constants (const) whenever you can. But sometimes, you have to store data you will update during the program life’s time. You will use a variable (var) declaration in that case.

The var keyword is followed by a user-defined symbol, which becomes the variable’s name. After the variable’s name, you see the :u32 declaration. It specifies the type of the variable.

Zig is a statically typed programming language. The compiler has to know how much space to reserve in memory to store your data and how to interpret those data. The u32 notation means our data are 32-bits unsigned integers. In other words, each of our variables can store a positive number in the range 0~232-1. I assume it should be sufficient to count the number of characters in a document. If it’s not enough for you, you may change the variable’s type to u48 or u64 if you prefer. On the other hand, if you think it’s too much, why not reduce our program’s memory consumption by using u24 or even u16? Zig allows arbitrary bit-width for integer variables, so you may choose the number of bits that suits you the most.

The main program

Aside from the const and var declarations, function definitions are the most used kind of statement in Zig. In the purest tradition of imperative programming language, a Zig program is divided into several functions that are individual units of execution.

Functions definition will likely change before the Zig 1.0 release to make them more consistent with the rest of the syntax. For now, they are introduced by the keyword fn followed by the developer-assigned function’s name, a pair of parenthesis, and the function’s return type. Like in C, when the return type is void, the function returns nothing (in other programming languages, we talk about procedures in that case).

A typical program is made of hundreds of functions. Among them, the one named main is special since your program’s execution starts with its body’s first instruction and continues up to its end.

Here, the body of the function, defined in curly braces, contains just one statement: a call to the std.debug.print function to write a message on the console. Remember, The std. prefix means the standard library provides this function. Like the C printf() function or the Python format() function, std.debug.print is used for formatted output. The message to display contains placeholders that will be replaced by actual values when the function is executed. In Zig, the placeholders are {}. They will be replaced by the values given in the .{ …​ }. Since our counter variables were initialized to 0 and never touched again, each {} is replaced by 0 in the output. You may try to recompile the program by changing the initialization value of the chars variable, for example. The new value should appear in the output (don’t forget to save your changes and to recompile the program).

Note

Character strings are always enclosed by double quotes in Zig. You may also notice the \n metacharacter in our sample program. It’s used to represent the newline character.

Reading an input stream

Using stdin

To make our program useful, it needs to read data from somewhere. For the sake of simplicity, the program will read data from the standard input stdin. The standard Zig package std.io provides a function to access the standard input:

Once you have access to stdin, you can read bytes from that stream into a buffer using the following piece of code:

stdin.readAll(&buffer);

But we won’t use that directly from the main function. Instead, to keep things organized, we will create a new function to perform the heavy work of reading the file and counting the number of characters read.

Working with arrays

I introduced several new language constructs in the code above. First, consider the buffer variable declaration: The buffer is a sequence of consecutive bytes in memory. This is called an array in Zig. To declare an array, you prefix the individual items' data type by the array length. A byte is an 8-bits integer (u8), so an array of, say, 256 bytes is declared with the type [256]u8.

Then look at the buffer’s initialization. Zig makes variable initialization mandatory. You use the special undefined value when you have nothing meaningful to put in a variable at its declaration site. It is the case here, since the actual content of the buffer will be read from the input stream on the next line.

The actual work of reading from the file is done by the readAll function of the stdin structure. If you have familiarity with object-oriented programming, it would be tempting to use the word object here. But Zig is not object-oriented. It does not have inheritance, polymorphism, or dynamic dispatch.

The &buffer notation allows passing the location in memory of the array (its address, we also say "a pointer" to the array). The readAll function will populate the bytes starting at that location with the data coming from the input stream.

But not all files are exactly 256 bytes long. So, the readAll function returns the number of bytes actually read. We will display that result to the user. That solves the case for files shorter than 256 bytes. But what will append if the file is longer? Well, contrary to what append in C, for example, when passing an array to a function in Zig, the receiver knows the size of that array. So the readAll function knows it must not read more than 256 bytes.

Handling errors

Take a closer look at the definition of the function count. Did you notice the exclamation mark? And look at the call to stdin.readCall. Did you see the try keyword? Both are related to error handling.

In Zig, errors are just integers. There is no such thing as an error structure (at the time of this writing, at least). However, and even if they just are numbers, error codes are first-class citizens in Zig. The language has extensive features to handle them. And, by design, you cannot ignore an error returned by a function.

When a function may fail with an error, its return type is prefixed by the exclamation mark. And in the body of a function, if you call a function that may fail, you either have to handle the error locally (we will see that in a moment) or propagate ("return") the error to the parent function in the call stack. It’s the purpose of the try keyword. Let me summarize the sequence of events:

  1. If there is an I/O error while reading the input stream, the readAll function will abort processing by returning an error code.

  2. The count function will receive that error code on the line containing the call to readAll

  3. Thanks to the try keyword, our count function will, in its turn, stop processing, returning that error to its caller, which is…​ well, I haven’t talked about that yet. So fix that immediately.

We call the count function from the main program. You may have noticed the return type of the main program is void, not !void. That means the main program cannot return an error. So if any sub-function call may return an error, it has to be handled by the main. It’s the purpose of the catch construct. It will capture the error’s value in the new variable err, then introduces a new block containing the developer-defined error handler. In this program, I simply inform the user an error has occurred.

I let you compile and run the program to see how it works now:

sh$ zig build-exe wc-v2.zig
sh$ echo "hello world" | ./wc-v2
 12 byte(s) read
 0 lines, 0 chars

However, it is not easy to test error handling in this program unless you have some faulty USB stick at hand. The best we can do for now is to add an instruction to simulate an error:

Here, if the buffer contains an X as the starting byte, it will trigger an error. Please notice the character X is enclosed in simple quotes. That’s how you write character constants in Zig. Please do not confuse them with strings of characters enclosed by double-quotes. Let’s test that:

sh$ echo "hello world" | ./wc-v3
 12 byte(s) read
 0 lines, 0 chars
sh$ echo "Xhello world" | ./wc-v3
Error error.Unexpected while reading file
 0 lines, 0 chars

Counting things

We have made lots of progress in the preceding section. But the code required to update the chars and lines variables is still missing. In addition, we somehow have hard-coded the maximum file length to 256 bytes. Let’s take a more realistic approach by looping over stdin.readAll to read all data from the input file regardless of its length:

At this point, you should be able to understand most of the changes. Two remarks, though: first, I reduced the buffer’s size to one byte. It makes counting chars and detecting the end of the file trivial, even if that does not seem very efficient.

Second, look at the done variable definition. It does not contain a type declaration because Zig can infer the variable’s type from its initialization value. true and false are two predefined constants representing the possible value of a boolean (bool) variable. Initializing done to false implicitly makes it a variable of the bool type.

sh$ zig build-exe wc-v4.zig
sh$ ./wc-v4 < /etc/passwd
 0 lines, 2645 chars

As of now, the program compiles and counts the number of characters read from the input stream. But not the number of lines. As an exercise, I suggest you modify the code to detect the newline character ('\n') and update the lines variable accordingly. I won’t give you the solution, but if you are really stuck, you may try to look at the next section to find some inspiration.

Note

Do you remember we saw Zig can store integers using an arbitrary number of bits? Try to reduce the u32 type used for the lines and chars variables to something ridiculously small like u3 or u4. What happens when you feed the program with a file large enough to overflow the allocated storage capabilities?

A final (?) version

I left you in the last part with a program that counts the number of characters in a file while reading it byte per byte. A better option would probably require reading a few kibibytes block of data, then iterating from memory through that block to count the characters and detect the new lines. Repeating these operations until the entire file has been read. Let' start with the easiest part: increasing the buffer size.

But we also need to loop over the bytes read:

I briefly introduce two new notations here. The buffer[0..nb] syntax takes a slice from an array. Think of that as a sub-array. The goal here is to consider only the part of the array whose bytes were updated in the last read operation. Slicing does not copy the data. I merely store the slice length and a pointer to the slice’s start in the existing array. So it’s a very fast operation.

Lastly, you can iterate over the items of a slice or array using a for loop with capturing syntax. It will execute its body once for each item of the slice or array, binding at each iteration a symbol (here c) to the currently examined item.

All that put together leads us to that final version of the program:

Done?

And we’re done. Well, I’m done. But the program can be improved in many ways. For example, you may detect word boundaries to also count the number of words in the file. Or, for the bravest among you, you could investigate the `std.unicode.utf8ByteSequenceLen function to count the number of UTF-8 characters in the input files, rather than simply (and erroneously) considering than one byte is one character as we did here. As always, don’t hesitate to experiment and share your finding on social networks!