In my last post I talked about how I made an enjoyable little display of fireworks for the terminal. It was fun to make and it’s fun to watch when meetings get boring (always).

Fireworks can be found here.

However, after running my fireworks for a while, I started noticing some funky details. The fireworks could be a little flickery at times. And not in a good way.

Well we couldn’t have that! So it was time to investigate.

What do the flickers mean, Mason?

The first thing I noticed was that this wasn’t a problem on Linux; it only happened on macOS.

I do most of my development on a beefy Linux machine with 6 real cores (12 total threads) and figured perhaps this was just a specs thing, so I fired up my 7-year-old ThinkPad running CentOS and still didn’t have flickering. Compared to my 2019 work MacBook, it was smooth as butter. Could this be a Mac-only bug?

Then I remembered that iTerm is not the fastest terminal in the world and, on a whim, decided to try my fireworks on the slowest terminal I could think of: the WSL terminal.

I broke out my old Windows machine and, after half an hour, got WSL and fireworks up and running. The result:

flickering in action

The Windows terminal running WSL exhibited the same behavior as macOS, but much more frequently and much more consistently. This led me to believe that the problem was in drawing. I must not be doing it efficiently. Other terminal applications can run flicker-free in the Windows terminal, so why not mine too?

The old firework rendering pipeline

The old rendering pipeline was very simple. It used two threads in total: one for input and one for all rendering work. This is a simple approximation of it:

renderer-v1

If you look at this for long enough, you’re going to see some low-hanging fruit to optimize away.

  • we’re making fireworks on the same thread we draw points
  • we’re clearing way more points than we need to
  • we’re drawing way more points than we need to

Let’s dig into each one individually.

we’re making fireworks on the same thread we draw points

The “maybe generate new firework” and “advance fireworks one step” stages both happen before drawing, every single time. To keep a consistent framerate, they have to finish in a fraction of a fraction of a second. I’ve done my best to optimize them, but with the more exciting fireworks I plan on adding later, that won’t stay possible.

Firework generation and advancing should happen on another thread.

we’re [clearing|drawing] more points than we need to

The painting implementation is dumb as rocks, and simply:

  • creates a list of every single point to be drawn
  • creates a list of every single point to be cleared

This means on each loop, we draw points that were already on the screen, and clear points just to have them repainted shortly after.

Drawing is easily the slowest part of the application, so that’s not good at all.

How do we deal with this?

Given our two problems, I felt it would make the most sense to move as much of the actual firework-building work as possible to another thread. That’s overkill for how we’re currently figuring out the points we need to draw, but once the work lives on another thread we’ll have time for all of the set logic required to only draw and clear the points that have changed.

The new architecture

The new architecture revamps a few things, but mostly just moves them around with the addition of a new compositor component.

renderer-v2

The compositor runs in a separate thread and sets up all the points required for all of the fireworks ahead of time.

Originally, when planning this v2 architecture, I was only going to prep one frame of fireworks at a time, but then I figured, why do just one? Why not get a few ready ahead of time?

Rust’s sync_channel makes this easy. Set the max buffer size and it won’t hold more than that; the sender blocks until there’s room. For my Java folks out there, it’s conceptually similar to a bounded LinkedBlockingQueue for sharing data across threads.
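Here’s a minimal sketch of that bounded behavior using only the standard library. The `Frame` type and both function names are made up for illustration; they are not part of fireworks.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical stand-in for one frame of points; not the real Drawables type.
type Frame = u32;

/// Produces `total` frames on a worker thread through a channel bounded at
/// `capacity`, and returns them in the order the consumer received them.
fn produce_and_consume(capacity: usize, total: u32) -> Vec<Frame> {
  let (filler, buffer) = sync_channel::<Frame>(capacity);

  let producer = thread::spawn(move || {
    for frame in 0..total {
      // `send` blocks while the buffer already holds `capacity` frames.
      filler.send(frame).expect("consumer hung up");
    }
    // `filler` is dropped here, which ends the consumer's iteration below.
  });

  let received: Vec<Frame> = buffer.iter().collect();
  producer.join().expect("producer panicked");
  received
}

/// Shows the bound directly: with nobody receiving, only `capacity`
/// non-blocking sends succeed before the channel reports it is full.
fn sends_before_full(capacity: usize) -> usize {
  // `_buffer` keeps the receiving end alive so sends fail only when full.
  let (filler, _buffer) = sync_channel::<Frame>(capacity);
  let mut sent = 0;
  while filler.try_send(0).is_ok() {
    sent += 1;
  }
  sent
}

fn main() {
  println!("{:?}", produce_and_consume(5, 12));
  println!("{}", sends_before_full(5));
}
```

The producer can never run more than `capacity` frames ahead of the consumer, which is exactly the “get a few ready ahead of time” behavior, with no manual locking.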

It’s just a regen buffer

At this point I was thinking myself very clever for coming up with such a new innovative idea. Imagine what others will think when I tell them about this cool new concept of async rendering!

I put my head down and tried to think of all of the other applications that might find this technique useful, only to realize… a lot of other applications already did something like this.

Distraught but still hopeful I had “really done something”, I Googled around, only for my hope to meet its demise on the framebuffer Wikipedia page.

In computing, a screen buffer is a part of computer memory used by a computer application for the representation of the content to be shown on the computer display. The screen buffer may also be called the video buffer, the regeneration buffer, or regen buffer for short. Screen buffers should be distinguished from video memory. To this end, the term off-screen buffer is also used.

Well so much for that.

Implementation

This post would be no fun without a bit of discussion about how it was made. We can spend all day in the ivory tower talking about architecture, but entering the trenches and actually writing the code is a little different.

Channel setup

If you look at the v2 architecture diagram you’ll see there are 3 components now. I wanted to give them all a chance to shut down cleanly, so the component accepting input now takes a list of senders and sends a kill signal to each of them.

Likewise, we need to set up the buffer through which fireworks are sent to the renderer. This is all done like so:

fn main() {
  // One kill-signal channel per component that has to shut down.
  let (input_sender_1, input_receiver_1): (Sender<bool>, Receiver<bool>) = mpsc::channel();
  let (input_sender_2, input_receiver_2): (Sender<bool>, Receiver<bool>) = mpsc::channel();

  // The regen buffer: a bounded channel that holds at most 5 sets of Drawables.
  let (regen_buffer_filler, regen_buffer): (SyncSender<Drawables>, Receiver<Drawables>) =
    mpsc::sync_channel(5);

  let mut stdout = stdout()
    .into_raw_mode()
    .expect("Unable to capture stdout. Exiting.");

  // all internal modules in fireworks
  input::capture(vec![input_sender_1, input_sender_2]);
  compositor::start(seed, regen_buffer_filler, input_receiver_2);
  renderer::start(&mut stdout, regen_buffer, input_receiver_1);
}

A design decision I made here was to not have the compositor be spawned by the renderer. I didn’t want the renderer to know anything about the compositor, just that it will receive Drawables from some magic buffer. Who knows what’s filling it!

Input capturer

This barely changes and really isn’t worth talking about. It has now been upgraded from an 18-line file to a 20-line file.
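For the curious, fanning a kill signal out to a list of senders could look roughly like this. It’s a sketch under my own assumptions: `broadcast_kill` is a hypothetical name, and the real capturer may handle dead listeners differently.

```rust
use std::sync::mpsc::{channel, Sender};

/// Sends `true` to every listener and returns how many were still listening.
/// A listener whose receiver was already dropped is skipped, not an error.
fn broadcast_kill(listeners: &[Sender<bool>]) -> usize {
  listeners
    .iter()
    .filter(|listener| listener.send(true).is_ok())
    .count()
}

fn main() {
  let (renderer_tx, renderer_rx) = channel::<bool>();
  let (compositor_tx, compositor_rx) = channel::<bool>();

  let delivered = broadcast_kill(&[renderer_tx, compositor_tx]);

  // Both components find the signal waiting the next time they check.
  println!("{} {:?} {:?}", delivered, renderer_rx.recv(), compositor_rx.recv());
}
```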

Poor input capturer; perhaps you’ll get a longer section in the future.

Renderer

The renderer changes in a few exciting ways, the most exciting being that it has been gutted. No longer does it spin up a firework state machine to get Drawables; now it just waits at a channel, lonely and blocking, until it gets some input.

let to_draw = regen_buffer.recv().expect("Compositor unexpectedly died!");

Otherwise the renderer is the same. The goal of the renderer is to be as dumb as possible and it is certainly achieving that.
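To give a feel for how little is left, here is a rough sketch of a renderer loop built around that blocking `recv`. Everything in it is an assumption for illustration: `Frame` is a plain string instead of Drawables, `render_loop` and `paint_all` are hypothetical names, and unlike the `expect` above, it treats a closed buffer as a normal stop so it is easy to exercise.

```rust
use std::io::Write;
use std::sync::mpsc::{channel, sync_channel, Receiver};

// Hypothetical frame type: the text to paint for one frame.
type Frame = String;

/// Paints frames from `regen_buffer` into `out` until a kill signal arrives
/// or the sending side hangs up. Returns how many frames were painted.
fn render_loop<W: Write>(
  out: &mut W,
  regen_buffer: Receiver<Frame>,
  end_signal: Receiver<bool>,
) -> std::io::Result<usize> {
  let mut painted = 0;
  loop {
    // Non-blocking check so a quit request is noticed between frames.
    if let Ok(true) = end_signal.try_recv() {
      return Ok(painted);
    }
    // Blocking wait: the renderer does nothing until a frame is ready.
    match regen_buffer.recv() {
      Ok(frame) => {
        out.write_all(frame.as_bytes())?;
        out.flush()?;
        painted += 1;
      }
      // All senders are gone, so there is nothing left to paint.
      Err(_) => return Ok(painted),
    }
  }
}

/// Helper for trying the loop out: queues `frames`, hangs up, and renders
/// into an in-memory buffer instead of a terminal.
fn paint_all(frames: &[&str]) -> (usize, String) {
  let (filler, regen_buffer) = sync_channel::<Frame>(frames.len());
  // `_kill` stays alive and silent, so no kill signal is ever seen.
  let (_kill, end_signal) = channel::<bool>();
  for frame in frames {
    filler.send(frame.to_string()).expect("buffer closed early");
  }
  drop(filler);

  let mut out = Vec::new();
  let painted = render_loop(&mut out, regen_buffer, end_signal).expect("write failed");
  (painted, String::from_utf8(out).expect("frames are valid UTF-8"))
}

fn main() {
  let (painted, screen) = paint_all(&["*", ".", " "]);
  println!("{} frames: {:?}", painted, screen);
}
```

Because the loop is generic over `Write`, the same code can target a raw-mode stdout or a `Vec<u8>`.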

Compositor

Finally the compositor, which is 30 or so lines that really come down to: ticking the state machine and sending the points it made somewhere.

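pub fn compositor_loop(
  seed: u64,
  regen_buffer_filler: SyncSender<Drawables>,
  end_signal: Receiver<bool>,
) {
  let mut state_machine = state_machine::StateMachine::new(seed, terminal_width());

  loop {
    // Advance every firework one step and collect the points for this frame.
    let drawables = state_machine.tick(terminal_width());
    // `send` blocks while the regen buffer is full, which throttles this
    // loop to the renderer's pace.
    if regen_buffer_filler.send(drawables).is_err() {
      panic!("Renderer unexpectedly died!");
    }
    // Non-blocking check for a kill signal from the input capturer.
    if let Ok(true) = end_signal.try_recv() {
      return;
    }
  }
}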

And that’s it for the new architecture components!

So then where are all the code changes??

There are only a few. Really, the bulk of the work was in architectural changes.

The final changes, the ones that affect drawing, are in the firework itself. We can do a whole lot more work now that it happens on a separate thread, and so the Drawable trait we worked with in the previous post goes from:

impl Drawable for TailPoints {
  fn draw(&self) -> &[Point] {
    &self.tail
  }
  fn clear(&self) -> &[Point] {
    &self.old_tail
  }
}

to:

use std::collections::HashSet;

impl Drawable for TailPoints {
  // Only the points that are new this frame.
  fn draw(&self) -> Vec<Point> {
    difference(&self.tail, &self.old_tail)
  }
  // Only the points that were on screen last frame and are gone now.
  fn clear(&self) -> Vec<Point> {
    difference(&self.old_tail, &self.tail)
  }
}

// Returns the points in points_1 that are not in points_2.
fn difference(points_1: &[Point], points_2: &[Point]) -> Vec<Point> {
  let set_1 = points_1.iter().cloned().collect::<HashSet<Point>>();
  let set_2 = points_2.iter().cloned().collect::<HashSet<Point>>();
  set_1.difference(&set_2).cloned().collect()
}

Looking at the two, you’ll see the first is far faster and allocates nothing, since it only hands back borrowed slices. And that’s how it was meant to be! Back when Drawables had to be ready in real time, the implementation had to be fast as lightning. But now they can take a little longer and do a lot more work. With async rendering, they can do (roughly) as much work as they like and still be real time!

All this extra work affords us some really slick optimizations. After all that work, we:

  • paint exactly what has changed and nothing else
  • clear exactly what needs to disappear and nothing else

These two optimizations vastly increase the speed of drawing fireworks.
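To make the saving concrete, here’s a small worked example of the set difference on a three-point tail that moves up one row. The `Point` struct is a hypothetical stand-in (the real one may carry more data), and the result is sorted only so the output is deterministic.

```rust
use std::collections::HashSet;

// Hypothetical point type for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash, PartialOrd, Ord)]
struct Point {
  x: u16,
  y: u16,
}

/// Points in `points_1` that are absent from `points_2`, sorted because
/// HashSet iteration order is not stable.
fn difference(points_1: &[Point], points_2: &[Point]) -> Vec<Point> {
  let set_1: HashSet<Point> = points_1.iter().cloned().collect();
  let set_2: HashSet<Point> = points_2.iter().cloned().collect();
  let mut changed: Vec<Point> = set_1.difference(&set_2).cloned().collect();
  changed.sort();
  changed
}

fn main() {
  let old_tail = [
    Point { x: 5, y: 10 },
    Point { x: 5, y: 9 },
    Point { x: 5, y: 8 },
  ];
  let tail = [
    Point { x: 5, y: 9 },
    Point { x: 5, y: 8 },
    Point { x: 5, y: 7 },
  ];

  let to_draw = difference(&tail, &old_tail);
  let to_clear = difference(&old_tail, &tail);

  // The old approach would touch old_tail.len() + tail.len() cells.
  println!(
    "draw {:?}, clear {:?}: {} terminal writes instead of {}",
    to_draw,
    to_clear,
    to_draw.len() + to_clear.len(),
    old_tail.len() + tail.len()
  );
}
```

The two shared points in the middle of the tail are never touched, so this frame costs two writes rather than six.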

Downsides

Well, there is one: if a point is ever drawn over and then cleared, it will never be redrawn, meaning there can occasionally be small holes in the fireworks. Something like this happens when two fireworks collide.

Solving this wouldn’t be too hard, but the fireworks still look pretty nice and I think the asymmetrical nature of ones with little holes gives them some character.
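For anyone who wants the holes gone, here is one way it could be approached. This is a sketch, not what fireworks does: it models the screen as a set of lit cells (ignoring characters and colors), and `Shape`, `apply_naive`, and `apply_hole_free` are hypothetical names. The idea is to skip clearing any point that some firework still occupies.

```rust
use std::collections::HashSet;

type Point = (u16, u16);

// Hypothetical stand-in for one firework's points across two frames.
struct Shape {
  old: Vec<Point>,
  new: Vec<Point>,
}

/// Points in `a` that are not in `b`.
fn difference(a: &[Point], b: &[Point]) -> HashSet<Point> {
  let b: HashSet<Point> = b.iter().copied().collect();
  a.iter().copied().filter(|p| !b.contains(p)).collect()
}

/// Per-shape diffing: each shape clears and draws only its own changes,
/// so it can erase a cell that another shape still needs.
fn apply_naive(screen: &mut HashSet<Point>, shapes: &[Shape]) {
  for shape in shapes {
    for p in difference(&shape.old, &shape.new) {
      screen.remove(&p);
    }
    for p in difference(&shape.new, &shape.old) {
      screen.insert(p);
    }
  }
}

/// Possible fix: never clear a point that any shape occupies this frame.
fn apply_hole_free(screen: &mut HashSet<Point>, shapes: &[Shape]) {
  let still_lit: HashSet<Point> = shapes
    .iter()
    .flat_map(|shape| shape.new.iter().copied())
    .collect();
  for shape in shapes {
    for p in difference(&shape.old, &shape.new) {
      if !still_lit.contains(&p) {
        screen.remove(&p);
      }
    }
    for p in difference(&shape.new, &shape.old) {
      screen.insert(p);
    }
  }
}

/// Two shapes overlap at (3, 3); one stays put and the other moves away.
/// Returns whether (3, 3) is still lit after the frame is applied.
fn shared_point_survives(hole_free: bool) -> bool {
  let shapes = [
    Shape { old: vec![(3, 3), (3, 4)], new: vec![(3, 3), (3, 4)] },
    Shape { old: vec![(3, 3), (2, 3)], new: vec![(2, 3), (1, 3)] },
  ];
  let mut screen: HashSet<Point> = [(3, 3), (3, 4), (2, 3)].into_iter().collect();
  if hole_free {
    apply_hole_free(&mut screen, &shapes);
  } else {
    apply_naive(&mut screen, &shapes);
  }
  screen.contains(&(3, 3))
}

fn main() {
  println!("naive keeps shared point: {}", shared_point_survives(false));
  println!("hole-free keeps shared point: {}", shared_point_survives(true));
}
```

The cost is one extra set built per frame, which is the kind of work the compositor thread now has room for.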

So was it all worth it?

Well, I’ll let the results speak for themselves.

results

Thanks for reading! If you liked this post, feel free to check out my blog posts about the silly CLIs I made using Rust :)

Fireworks source and binaries can be found here.