Darrien's technical blog

Documenting the technical stuff I do in my spare time

My Rust experiences over a year

I had a whole lot of fun with the Typeracer post I made showing how the program itself had changed throughout its lifetime.

What I didn’t mention (although it’s fairly obvious if you look at the code) is that Typeracer was my first real exposure to Rust.

I went through a bit of the Rust book and wrote a tiny compression tool that did a sort of run-length encoding, but both were super simple, and I really wanted something I could work on for longer to understand the language.

While working on those two, I almost never interacted with the borrow checker. I had no idea what I was getting myself into. But there had to be a reason for the, uh, super vocal subreddits about Rust, so I figured it was time to see what all the hype was about.

Some background about me

When I started Typeracer, I was a year out of college (UMass Lowell) with a major in Computer Science. UMass Lowell (UML) teaches C like it’s 1990, with some Computing I courses enforcing ANSI C. You learn a little C++ the following year, but none of the follow-up classes really focus on it, and of course you end up writing C again for the OS-type courses.

Otherwise, I was quite fluent in Java and Kotlin (from a previous job), knew Python quite well, and was regretfully getting acclimated to Go at my current employer.

In other words, I wasn’t new to programming and had a systems background. With all that in mind, let’s jump in.

Before we get into the content

This post roughly follows the structure of the previous Typeracer post, comparing releases over time. For each release, I’ll talk about what I was struggling with and what I learned along the way.

The beginning

commit: 3476353d45c7dd330d33cac40567587d7b6d90b6

The first thing that kinda looked like a Typeracer. To be honest, I didn’t really know what was going on here.

I’d been experimenting for a little bit, copying stuff from some tui demos and some Termion demos, and I used to_owned on strings with no idea why I needed to own the string.
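For anyone else confused by to_owned: a string literal is a borrowed &str, while a String owns its heap buffer. Anything that needs to keep text around after the original goes away has to own it, and to_owned (like to_string or String::from) makes that owned copy. A minimal sketch, where Passage and load_passage are hypothetical names for illustration, not Typeracer’s actual code:

```rust
// A hypothetical struct that owns its text, so it does not borrow
// from whatever buffer the text originally came from.
struct Passage {
    text: String,
}

// Build a Passage from borrowed input. `to_owned` copies the bytes
// of the `&str` into a new heap-allocated `String` the struct owns.
fn load_passage(raw: &str) -> Passage {
    Passage {
        text: raw.trim().to_owned(),
    }
}

fn main() {
    let passage;
    {
        // `buffer` is dropped at the end of this block...
        let buffer = String::from("  The quick brown fox  ");
        passage = load_passage(&buffer);
    }
    // ...but `passage` is still usable, because it owns a copy of the text.
    println!("{}", passage.text);
}
```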

Speaking of which, I barely had any idea what the borrow checker was. I had read the book, but I still hadn’t really run into it. It kind of just felt like I was programming in a weird C. And for some reason I needed to put this Result thing on my main.
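The “Result thing” is main’s return type. main is allowed to return a Result, which lets you use the ? operator right in main; if an Err makes it all the way out, Rust prints it in Debug form and the process exits with a failure code. A minimal sketch, assuming a hypothetical parse_width helper:

```rust
use std::error::Error;

// Parse a hypothetical width setting. `?` converts the ParseIntError
// into a Box<dyn Error> and returns it early on failure.
fn parse_width(input: &str) -> Result<u16, Box<dyn Error>> {
    let width: u16 = input.trim().parse()?;
    Ok(width)
}

// Returning a Result from main lets us use `?` here too.
fn main() -> Result<(), Box<dyn Error>> {
    let width = parse_width("80")?;
    println!("layout width: {}", width);
    Ok(())
}
```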

I was flailing a bit, but it seemed to work, and so I was happy.


Pre-First release - What is rust?

commit: 4e6f7e711c560d0df0a463c01063f4d349d93106

At this point I’d done my first real bit of coding: writing the indexer algorithm that highlights text as you type. I had my first battle with the borrow checker, and yet I’m proud to say there wasn’t much cloning along the way.
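The original indexer isn’t reproduced here, but the core idea of highlighting as you type can be sketched roughly like this: compare the typed input to the passage character by character and split the passage into a correctly typed part, the first mistake (if any), and the rest. split_for_highlight and its three-way split are hypothetical simplifications for illustration:

```rust
// Split `passage` into (correctly typed, first mistake, untyped remainder),
// based on what the user has typed so far. Works on chars, not bytes.
fn split_for_highlight<'a>(passage: &'a str, typed: &str) -> (&'a str, &'a str, &'a str) {
    // Count how many leading characters match.
    let matched = passage
        .chars()
        .zip(typed.chars())
        .take_while(|(expected, got)| expected == got)
        .count();

    // Convert a character count into a byte offset into `passage`.
    let byte_at = |n: usize| {
        passage
            .char_indices()
            .nth(n)
            .map(|(i, _)| i)
            .unwrap_or(passage.len())
    };

    let correct_end = byte_at(matched);
    // If the user typed past the matching prefix, the next char is an error.
    let error_end = if typed.chars().count() > matched {
        byte_at(matched + 1)
    } else {
        correct_end
    };

    (
        &passage[..correct_end],
        &passage[correct_end..error_end],
        &passage[error_end..],
    )
}

fn main() {
    let (good, bad, rest) = split_for_highlight("hello world", "helo");
    println!("[{}] [{}] [{}]", good, bad, rest);
}
```

Each piece can then be styled differently (for example green, red, and plain) when drawing the passage.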

I was still a little scared of it, though: my global layout-width constants were functions rather than const items, because I thought the borrow checker would yell at me if I made them variables.
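There was nothing to fear, as it turns out: a const is a compile-time value that gets inlined wherever it’s used, not a variable the borrow checker has to track. A quick comparison (names and values are made up):

```rust
// What I did at first: a function that just returns a number.
fn layout_width() -> u16 {
    80
}

// What I could have done: a compile-time constant. Constants are inlined
// wherever they are used, so there is nothing for the borrow checker to track.
const LAYOUT_WIDTH: u16 = 80;

// Constants can even be computed from other constants at compile time.
const HALF_WIDTH: u16 = LAYOUT_WIDTH / 2;

fn main() {
    assert_eq!(layout_width(), LAYOUT_WIDTH);
    println!("full: {}, half: {}", LAYOUT_WIDTH, HALF_WIDTH);
}
```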

Otherwise it was going pretty smoothly.


version-0.9 The first release - The hell is a module?

version-0.9

There were a couple of cool stability and feature fixes here, but the biggest “technical” accomplishment was learning how the Rust module system worked.

I’ve worked in a number of programming languages, but the Rust module system was something else. When I started, the Rust 2018 edition was still fairly new, so examples online were a mishmash of old and new styles.

As a newcomer, how was I supposed to know that this:

$ ls -T
.
├── main.rs
└── submod
   └── mod.rs

And this:

$ ls -T
.
├── main.rs
└── submod.rs

Were the same?

And that for either of them to actually be compiled, you couldn’t just drop the file in place; you had to declare it:

// main.rs
mod submod;

And all submodules of submod must go in the submod folder, unless you declare them inline with a mod block:

mod submod {
    // some cool Rust code
}
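The inline form behaves just like the file-based one, which makes it handy for experimenting in a single file. A small sketch with made-up names, also showing two related rules: items are private by default, so anything called from outside its module needs pub, and a nested module reaches its parent through super::.

```rust
mod submod {
    // `pub` so that code outside `submod` (here, main) can call it.
    pub fn greeting() -> &'static str {
        "hello from submod"
    }

    // A nested module. As a file, this would live at submod/inner.rs,
    // with a `pub mod inner;` line in submod.rs (or submod/mod.rs).
    pub mod inner {
        // `super` refers to the parent module, `submod`.
        pub fn shout() -> String {
            super::greeting().to_uppercase()
        }
    }
}

fn main() {
    println!("{}", submod::greeting());
    println!("{}", submod::inner::shout());
}
```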

And if I want to use those modules in other modules:

$ ls -T
.
├── main.rs
├── submod_a.rs  # I want to use this in submod_b.rs
└── submod_b.rs

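You need to declare it in main.rs (I used pub mod, though a plain mod works too, since child modules can see their parent’s private items; what does need to be pub is anything inside submod_a that submod_b calls):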

// main.rs
pub mod submod_a;

And then bring it in through the crate path (crate refers to the root of your current crate, AKA your current project) in submod_b:

// submod_b.rs
use crate::submod_a;
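Here’s the same two-file layout squeezed into one runnable file using inline modules, assuming a binary crate (greet and run are hypothetical functions). Within a single crate a plain mod is enough for siblings to see each other; the part that must be pub is the function submod_b calls:

```rust
// Stand-ins for submod_a.rs and submod_b.rs, declared inline so this
// sketch fits in one file.
mod submod_a {
    // Must be `pub`, or submod_b could not call it.
    pub fn greet(name: &str) -> String {
        format!("hello, {}", name)
    }
}

mod submod_b {
    // `crate` is the root of the current crate (main.rs here).
    use crate::submod_a;

    pub fn run() -> String {
        submod_a::greet("submod_b")
    }
}

fn main() {
    println!("{}", submod_b::run());
}
```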

This all seems easy now, and I understand a lot of the design decisions behind it, but I have yet to find a tutorial that explains it this succinctly.

Even the Rust book goes into super and self, and single-file modules, and while it eventually gets to the point, I was so confused by the time I reached the part I wanted that I had no idea what was going on (at least initially).

Anyway, I figured it all out while getting ready for an interview with American Well. They took ages to call, which gave me plenty of time.


v1.0.1 - RustFormat 🙏

v1.0.1

The most exciting thing about this release was that I learned about rustfmt (run through cargo fmt).

I will never voluntarily, manually format code again.


v1.0.5 - slice it up

v1.0.5

Thanks to some trial runs of Clippy on my codebase, I found out that slices could stand in for vectors in a lot of places.

Ever take a reference to a Vec, say as a function argument? Just make it a slice!

&Vec<&str> == &[&str]

They don’t have exactly the same properties (a slice can’t grow, for one), but a slice fills the role of a “more generic version of a vector”, kind of like how java.util.List is the more general interface behind java.util.ArrayList.
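Roughly what that Clippy suggestion (the ptr_arg lint) looks like in practice, with a made-up count_chars function: taking &[&str] instead of &Vec<&str> costs the caller nothing, because a &Vec<T> coerces to &[T] automatically, and the function now accepts arrays and sub-slices too.

```rust
// Accepts any contiguous run of &str: a Vec, an array, or part of either.
fn count_chars(words: &[&str]) -> usize {
    words.iter().map(|w| w.chars().count()).sum()
}

fn main() {
    let words: Vec<&str> = vec!["the", "quick", "brown", "fox"];
    let array = ["jumps", "over"];

    // &Vec<&str> coerces to &[&str] automatically (deref coercion).
    println!("{}", count_chars(&words));
    // Arrays work too, which a &Vec<&str> parameter would not allow.
    println!("{}", count_chars(&array));
    // So do sub-slices: just the first two words.
    println!("{}", count_chars(&words[..2]));
}
```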


v1.10.0 - lifetimes and pals

v1.10.0

It had now been almost a month since I started the project, and at this point I FINALLY figured out what a lifetime was. You can read the Rust book’s chapter on lifetimes as many times as you want, but there’s nothing quite like getting yelled at by the borrow checker.

At this point I had replaced all my usages of 'static with explicitly named lifetimes like 'a and pals. Goodbye memory leaks :)

I won’t explain them the way I did the module system, because there are a lot of good tutorials on lifetimes and, more importantly, you understand them best by experiencing them.
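Still, for a taste of what an explicitly named lifetime buys you, here’s a sketch using a hypothetical Passage struct (not Typeracer’s real type) that borrows its words from the source text instead of copying them. The 'a says the struct can’t outlive the string it borrows from:

```rust
// `'a` ties every borrowed word to the lifetime of the source text.
struct Passage<'a> {
    words: Vec<&'a str>,
}

impl<'a> Passage<'a> {
    // Split the text into words without allocating new strings.
    fn new(text: &'a str) -> Self {
        Passage {
            words: text.split_whitespace().collect(),
        }
    }

    // The returned &str borrows from the original text, not from `self`,
    // so it can outlive this particular Passage value.
    fn longest_word(&self) -> Option<&'a str> {
        self.words.iter().copied().max_by_key(|w| w.chars().count())
    }
}

fn main() {
    let text = String::from("a lifetime is just a region of code");
    let passage = Passage::new(&text);
    println!("{:?}", passage.longest_word());
}
```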


v1.1.1 - libgit

v1.1.1

This isn’t exactly a Rust thing, but I figured it was worth putting here. If you think you’re fluent in git and that you’ll be able to pick up libgit2 with exactly the same knowledge, boy do I have a surprise for you.

Ever heard of a ref? No? Do you like your git checkout command? Well, libgit2’s checkout works almost nothing like it. It turns out checkout on the CLI is a heavily overloaded command that does several different jobs (switching branches, restoring files, detaching HEAD), and libgit2 doesn’t bundle them the same way.

Most of my time here was spent cross-referencing the C API docs for libgit2 with the docs for the Rust git2 crate.


v1.2.1 - Errors and options

v1.2.1

It had been a few months since I started, and at this point I finally understood the point of Result and Option. Before, I would just call expect on every Option and Result and try to make sure none of them ever failed.

Writing the new config parser and validator is what finally made them click. I made my own, and really wished they were available in my day job writing Go (and Python).

I write Java now and my current company has mercifully ripped the Rust Result type and shoved it into Java.

In short: Option is for when something might not be there, and Result is for when something might fail and you need a reason why. I miss algebraic data types (ADTs) whenever I use other languages.
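To make that concrete, here’s a sketch in the spirit of a config parser and validator. The key=value format, the repo_path key, and the function names are hypothetical, not Typeracer’s actual config code. Option models “this setting may be absent”, Result carries the reason when parsing fails, and ? passes errors up instead of sprinkling expect everywhere:

```rust
use std::collections::HashMap;

// Parse lines like "key = value"; blank lines are skipped.
// Result: on failure we get a message explaining *why*.
fn parse_config(text: &str) -> Result<HashMap<String, String>, String> {
    let mut map = HashMap::new();
    for (number, line) in text.lines().enumerate() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        // split_once returns an Option; ok_or_else turns None into an Err,
        // and `?` returns that Err from parse_config immediately.
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected key=value", number + 1))?;
        map.insert(key.trim().to_owned(), value.trim().to_owned());
    }
    Ok(map)
}

// Option: the setting might simply not be there, and that's fine.
fn repo_path(config: &HashMap<String, String>) -> Option<&str> {
    config.get("repo_path").map(|s| s.as_str())
}

fn main() {
    match parse_config("repo_path = ~/quotes\ntheme = dark") {
        Ok(config) => println!("repo: {:?}", repo_path(&config)),
        Err(reason) => eprintln!("bad config: {}", reason),
    }
}
```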


v1.6.0 - Encoding hell

v1.6.0

I knew about different encodings, and I knew Rust used UTF-8 (all strings are UTF-8 no matter what), so when someone asked for Chinese support in Typeracer (or rather, non-Latin support), I figured either:

  • It would be a breeze
  • Surely the bug report must be wrong; Rust must handle this already

WRONG

While all strings in Rust are UTF-8 and cannot be anything else without using something like bstr as an alternative string type, the Unicode standard is… interesting, and you can still do terrible things with strings.

Take, for instance, 你好, and say you want the length of the string:

fn main() {
    let interest_string = "你好";
    println!("String len: {}", interest_string.len());
}

You’ll get…

unicode-pain [master●] cargo run
   Compiling unicode-pain v0.1.0 (/tmp/unicode-pain)
    Finished dev [unoptimized + debuginfo] target(s) in 0.12s
     Running `target/debug/unicode-pain`
String len: 6

6? Huh?

There are only two characters here. In fact if you ask for char count, you’ll get 2.

println!("char count: {}", interest_string.chars().count());
// char count: 2

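I’m not going to go too deep into why Unicode is like this. The short version: len() returns the length in bytes, each of these characters takes three bytes in UTF-8, and chars() iterates over Unicode scalar values instead. If you’re really interested in Unicode, here’s a pretty solid post on how Unicode and text encoding probably don’t work the way you think they do.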

Anyway, the problem was that in a number of places we:

  • split on spaces and called each piece a word (Chinese doesn’t put spaces between words)
  • used the string’s len() (a byte count) to index into our higher-level formatted word types

The first issue meant WPM would be way off for Chinese and other languages written without spaces.

The second issue was worse: a byte offset that lands inside a multibyte character makes string slicing panic, so it caused crashes literally everywhere.
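Both problems reproduce with the standard library alone. A minimal sketch (naive_word_count and first_chars are hypothetical helpers): split_whitespace finds no word boundaries in 你好世界, len() reports 12 bytes for 4 characters, and char_indices gives byte offsets that are always safe to slice at.

```rust
// Counting words by splitting on whitespace.
fn naive_word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

// Take the first `n` characters (not bytes) of `text`.
fn first_chars(text: &str, n: usize) -> &str {
    // char_indices yields the byte offset where each character starts,
    // so slicing at one of those offsets can never split a character.
    match text.char_indices().nth(n) {
        Some((byte_offset, _)) => &text[..byte_offset],
        None => text, // fewer than n characters: take everything
    }
}

fn main() {
    let passage = "你好世界";
    // Problem 1: no spaces, so the whole passage counts as one "word".
    println!("words: {}", naive_word_count(passage));
    // Problem 2: each of these characters is 3 bytes in UTF-8.
    println!("bytes: {}, chars: {}", passage.len(), passage.chars().count());
    // Byte 1 is inside the first character, so `&passage[..1]` would panic.
    println!("boundary at byte 1? {}", passage.is_char_boundary(1));
    println!("first two chars: {}", first_chars(passage, 2));
}
```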

For reference, to highlight words as they’re typed, we use tui’s Text type for formatting. While a user types, we keep a copy of the full passage (split on word boundaries) along with the formatted text as a Vec<Text>. The Vec<Text> is updated throughout the life of the program, while the copy of the raw string that makes up the passage is immutable.

Much of the hard work here was replacing all word len checks with grapheme length checks:

use unicode_segmentation::UnicodeSegmentation;
let graphemes = UnicodeSegmentation::graphemes(s, true).collect::<Vec<&str>>();

via the great unicode-segmentation crate (which I did here), and then the much harder part: adding a second mode for properly counting and highlighting multibyte characters (done here).
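Why graphemes rather than chars? A Rust char is one Unicode scalar value, but a single character on screen can be made of several of them, such as an “e” followed by a combining acute accent. A std-only sketch of the mismatch; the grapheme count itself needs the unicode-segmentation crate, so it only appears in a comment:

```rust
// "é" spelled as two code points: 'e' + U+0301 COMBINING ACUTE ACCENT.
const DECOMPOSED: &str = "e\u{301}";
// "é" spelled as one precomposed code point, U+00E9.
const PRECOMPOSED: &str = "\u{e9}";

fn main() {
    // Both render as a single "é" on screen, but they differ underneath.
    for (name, s) in [("decomposed", DECOMPOSED), ("precomposed", PRECOMPOSED)].iter() {
        println!("{}: {} bytes, {} chars", name, s.len(), s.chars().count());
    }
    // With the unicode-segmentation crate's trait in scope,
    // `s.graphemes(true).count()` is 1 for both spellings.
}
```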

Once that was done, I had a much better understanding of Unicode, and Typeracer could now handle multibyte characters!

Throw some emojis in there if you really want. I won’t stop you 💃


v1.7.0 Graphs and friends

v1.7.0

I didn’t really learn a whole lot about Rust here, but I did learn that if you’re making a lot of writes to a SQLite database, you should make sure they’re in a single transaction.

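You might say: well yeah, I do that normally with MySQL or whatever. But with SQLite, skipping the transaction is several orders of magnitude slower, because outside an explicit transaction SQLite wraps each statement in its own transaction and waits for it to reach the disk before moving on.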

If you do, say, 10,000 trivial writes to a SQLite DB outside of a transaction, it can take upwards of 30 minutes (at least on my hardware). Put them all in one transaction, and it barely takes a second.


And that’s been my Rust journey so far. It’s easily become my favorite language. The growing pains were real, but I wish I had the safety it guarantees at my Java day job.

The takeaway, if there is one: if you’re trying to learn Rust and finding it a little difficult, or aren’t sure whether you want to learn it, I promise the payoff is worth it.

Perhaps the other takeaway is that you should try out terminal-typeracer for yourself as well :)
