Darrien's technical blog

Documenting the technical stuff I do in my spare time

Error handling in Java is error prone

There have been a number of debates over the years about the merits of checked versus unchecked exceptions. Kotlin’s approach when doing interop with Java is to ignore checked exceptions altogether, effectively treating every exception as unchecked. You may have your opinions about whether you prefer checked or unchecked exceptions, but I’m here to tell you both are problematic.

If you look deeply enough into this, you may come to the conclusion that I think all exceptions are problematic. You would be right. However, I work in Java every day, so this post will be from the Java perspective. With that said, you can draw the same conclusions for any other language that uses exceptions over other error handling methods.

Because this is in Java, and I don’t feel like adding all the verbosity: all examples will assume everything is in a class unless otherwise specified. Now let’s jump in.

Checked and unchecked exceptions

As a quick reminder, here is an example of a checked and an unchecked exception.

/**
 * This is an example of a checked exception.
 * Callers must either catch it or declare it in their own throws clause,
 * otherwise there will be a compile error.
 *
 * @throws IOException if something bad happens.
 */
private SomeDataClass getFromDb() throws IOException {
  throw new IOException("trouble reading from DB");
}

/**
 * This is an example of an unchecked exception.
 * You do not have to catch it when the method is called. It may be thrown even
 * if you are not handling the error.
 *
 * @throws RuntimeException if something bad happens.
 */
private String doesSomething() {
  try {
    var data = getFromDb();
    return data.getString();
  } catch (IOException e) {
    throw new RuntimeException(e);
  }
}

The scenario

You are an enterprise-grade Java developer working on a new feature in an already large codebase. You must integrate code from a number of different teams. Let’s play out how this might work.

This new feature is expected to gracefully handle failures. As such, you’d like to make sure you handle the errors ahead of time appropriately.

Your application must download a file from a URL provided, CDN it to your own network, and send off an event to Kafka so other folks can know it was CDN’d. Teams maintain code for all of these so you don’t have to write them yourself. Libraries with the core functionality are already provided.

Let’s start writing some code.

First you write a PoC that downloads the data. You set up a method to do just that.

private int cdnDataAndAudit(String urlToCdn) {
  HttpClient client = HttpClient.builder().someParams().build();
  ByteBuffer buffer = client.get(urlToCdn);
  return 0;
}

As soon as you go to compile, you see an error. Your HTTP client throws a checked IOException if it isn’t able to GET the data, and a BadRequestException if the URL is malformed. You catch each exception and rethrow it as a company-proprietary exception that tells your code either to retry or to fail forever.

private int cdnDataAndAudit(String urlToCdn) {
  try {
    HttpClient client = HttpClient.builder().someParams().build();
    ByteBuffer buffer = client.get(urlToCdn);
    return 0;
  } catch (IOException e) {
    throw new RetryException("Problem downloading the file", e);
  } catch (BadRequestException e) {
    throw new PermanentFailException("Given bad URL, can't download", e);
  }
}

With your errors caught, now it’s time to upload the file to your own servers and send off the Kafka audit. You use the clients provided by the two teams maintaining them.

@Inject
private AuditKafkaClient auditKafkaClient;

@Inject
private CdnClient cdnClient;

private int cdnDataAndAudit(String urlToCdn, int userId) {
  try {
    HttpClient client = HttpClient.builder().someParams().build();
    ByteBuffer buffer = client.get(urlToCdn);
    int fileId = cdnClient.cdnByteBuffer(buffer);
    auditKafkaClient.send(userId, fileId);
    return fileId;
  } catch (IOException e) {
    throw new RetryException("Problem downloading the file", e);
  } catch (BadRequestException e) {
    throw new PermanentFailException("Given bad URL, can't download", e);
  }
}

You manually test it and everything seems fine. You write some unit tests, get it reviewed, and deploy it to prod. You feel good about your code and call it quits for the day.

The next day

You come in and hear the CdnService your CdnClient uses was having trouble last night. You don’t worry too much initially because you handle errors well, but being a good engineer, you check logs and see there are an abnormal number of PermanentFailExceptions with the error "Given bad URL, can't download" around the time the CdnService was down. There are a number of customer complaints about flaky uploads too, and so you investigate.

After a little bit of digging, you find out the CdnClient throws a BadRequestException if the service is down. Because HttpClient throws the same exception and you were already handling it, you didn’t even realize CdnClient was also using the same checked exception to indicate an entirely different error!
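
To make the collision concrete, here is a minimal, self-contained sketch. Everything in it is a hypothetical stand-in (BadRequestException, download, upload, and classify are not the real client APIs); it only shows how a single catch block cannot tell the two failures apart.

```java
// Hypothetical stand-in for the exception both clients share.
class BadRequestException extends Exception {
  BadRequestException(String message) {
    super(message);
  }
}

class SharedExceptionDemo {
  // Pretend download: fails only when the URL is malformed.
  static String download(String url) throws BadRequestException {
    if (!url.startsWith("http")) {
      throw new BadRequestException("malformed URL: " + url);
    }
    return "bytes-of-" + url;
  }

  // Pretend CDN upload: reuses the same exception type for an outage.
  static int upload(String data, boolean cdnIsUp) throws BadRequestException {
    if (!cdnIsUp) {
      throw new BadRequestException("CdnService unavailable");
    }
    return data.length();
  }

  // One catch block covers both calls, so an outage is reported as a bad URL.
  static String classify(String url, boolean cdnIsUp) {
    try {
      upload(download(url), cdnIsUp);
      return "OK";
    } catch (BadRequestException e) {
      return "PERMANENT_FAIL: bad URL";
    }
  }

  public static void main(String[] args) {
    System.out.println(classify("http://example.com/a.png", true));  // OK
    System.out.println(classify("not-a-url", true));                 // PERMANENT_FAIL: bad URL
    System.out.println(classify("http://example.com/a.png", false)); // PERMANENT_FAIL: bad URL
  }
}
```

The third call is the bug from the story: the URL was fine, but the outage is classified as a permanent failure because the compiler was satisfied by the handler that already existed.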

You fix the error and put up a PR:

@Inject
private AuditKafkaClient auditKafkaClient;

@Inject
private CdnClient cdnClient;

private int cdnDataAndAudit(String urlToCdn, int userId) {
  try {
    HttpClient client = HttpClient.builder().someParams().build();
    ByteBuffer buffer = client.get(urlToCdn);
    int fileId;
    try {
      // Only the CDN call sits in the inner try, so a bad URL from
      // client.get still reaches the outer BadRequestException handler.
      fileId = cdnClient.cdnByteBuffer(buffer);
    } catch (BadRequestException e) {
      throw new RetryException("Temporary failure, CdnService is down", e);
    }
    auditKafkaClient.send(userId, fileId);
    return fileId;
  } catch (IOException e) {
    throw new RetryException("Problem downloading the file", e);
  } catch (BadRequestException e) {
    throw new PermanentFailException("Given bad URL, can't download", e);
  }
}

Your code looks less nice than before, but at least you’re handling all errors. You deploy the updated code to prod and call it a day.

The next day

You come in the next day and get an angry email from your boss. Customers were complaining again about the service you just made. They say it was giving them unspecified errors and throwing 500s.

Well that’s not good. You thought you handled all errors gracefully. Perturbed, you investigate.

Apparently during the night, the Kafka team had some issues and their client was throwing RuntimeExceptions. You look at the logs further and see it’s more like:

new RuntimeException(new KafkaErrorException(new KafkaIsDownException()));

This is called exception chaining, and it is meant to avoid exposing the lower level workings of the Kafka pipeline to users. It also makes it convenient for callers to catch all types of exceptions Kafka may throw. Unfortunately it means you must go through a level of indirection to get the exception you want. You are using Guava though, so at least it’s less painful than it has to be.
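
For a feel of what that unwrapping involves, here is a rough sketch of the walk in plain JDK code. The exception classes are hypothetical stand-ins for the ones in the logs, rootCause is my own illustrative helper rather than Guava's implementation, and this version does not guard against cyclic cause chains.

```java
class RootCauseDemo {
  // Hypothetical exception types mirroring the names seen in the logs.
  static class KafkaIsDownException extends RuntimeException {}

  static class KafkaErrorException extends RuntimeException {
    KafkaErrorException(Throwable cause) {
      super(cause);
    }
  }

  // Follows getCause() until it reaches a throwable with no cause.
  static Throwable rootCause(Throwable t) {
    Throwable current = t;
    while (current.getCause() != null) {
      current = current.getCause();
    }
    return current;
  }

  public static void main(String[] args) {
    Throwable chained =
        new RuntimeException(new KafkaErrorException(new KafkaIsDownException()));
    // The caller only sees RuntimeException; the useful type is two levels down.
    System.out.println(chained.getClass().getSimpleName());            // RuntimeException
    System.out.println(rootCause(chained).getClass().getSimpleName()); // KafkaIsDownException
  }
}
```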

You write up this PR:

@Inject
private AuditKafkaClient auditKafkaClient;

@Inject
private CdnClient cdnClient;

private int cdnDataAndAudit(String urlToCdn, int userId) {
  try {
    HttpClient client = HttpClient.builder().someParams().build();
    ByteBuffer buffer = client.get(urlToCdn);
    int fileId;
    try {
      // Only the CDN call sits in the inner try, so a bad URL from
      // client.get still reaches the outer BadRequestException handler.
      fileId = cdnClient.cdnByteBuffer(buffer);
    } catch (BadRequestException e) {
      throw new RetryException("Temporary failure, CdnService is down", e);
    }
    auditKafkaClient.send(userId, fileId);
    return fileId;
  } catch (IOException e) {
    throw new RetryException("Problem downloading the file", e);
  } catch (BadRequestException e) {
    throw new PermanentFailException("Given bad URL, can't download", e);
  } catch (RuntimeException e) {
    Throwable t = Throwables.getRootCause(e);
    if (t instanceof KafkaIsDownException) {
      throw new RetryException(
        "Unable to audit CdnService, retrying in a short bit",
        e
      );
    }

    LOGGER.error("Unhandled exception, rethrowing", e);
    throw e;
  }
}

Realizing there are more possible exceptions the Kafka client could throw, you scan through the client codebase and look for all possible exceptions. The day ends and you have code that looks like this:

Your final code

@Inject
private AuditKafkaClient auditKafkaClient;

@Inject
private CdnClient cdnClient;

private int cdnDataAndAudit(String urlToCdn, int userId) {
  try {
    HttpClient client = HttpClient.builder().someParams().build();
    ByteBuffer buffer = client.get(urlToCdn);
    int fileId;
    try {
      // Only the CDN call sits in the inner try, so a bad URL from
      // client.get still reaches the outer BadRequestException handler.
      fileId = cdnClient.cdnByteBuffer(buffer);
    } catch (BadRequestException e) {
      throw new RetryException("Temporary failure, CdnService is down", e);
    }
    auditKafkaClient.send(userId, fileId);
    return fileId;
  } catch (IOException e) {
    throw new RetryException("Problem downloading the file", e);
  } catch (BadRequestException e) {
    throw new PermanentFailException("Given bad URL, can't download", e);
  } catch (RuntimeException e) {
    Throwable t = Throwables.getRootCause(e);
    if (t instanceof KafkaIsDownException) {
      throw new RetryException(
        "Unable to audit CdnService, retrying in a short bit",
        e
      );
    } else if (t instanceof KafkaIsOverloadedException) {
      // impl hidden
    } else if (...) {
      // impl hidden
    }
    // etc.
    LOGGER.error("Unhandled exception, rethrowing", e);
    throw e;
  }
}

Satisfied you are handling all errors, you put the PR up for review, push it to prod, and call it a day.

The issues

As you can see, the troubles become obvious when working with real world codebases spread across a number of different folks. Issues hit along the way are:

  • Accidentally catching the same exception for two blocks of code you want to handle differently
  • Missing a whole class of unchecked exceptions
  • Turning a few lines of functionality into mostly error handling code that obfuscates the original work’s intent

Breaking down the 3 problems

Two methods using the same set of exceptions in different ways is not abnormal. To use checked exceptions properly, assuming you want to handle every error differently, this would be the proper solution:

HttpClient client = HttpClient.builder().someParams().build();
final ByteBuffer buffer;
try {
  buffer = client.get(urlToCdn);
} catch (IOException e) {
  // impl hidden
} catch (BadRequestException e) {
  // impl hidden
}

final int fileId;
try {
  fileId = cdnClient.cdnByteBuffer(buffer);
} catch (BadRequestException e) {
  // impl hidden
}

// etc...

All variables are declared outside the try blocks, and each try block contains only the call that can throw.

In practice I almost never see this because of how inconvenient it is. Wrapping too much code in a single try block is a very easy and common mistake to make, and keeping each try/catch scoped to exactly the lines it must guard is the only proper way to avoid it.


Missing a whole class of unchecked exceptions makes it incredibly difficult to properly handle all errors. In my copy of Effective Java (second edition), I see that unchecked exceptions should only be used for programming errors and checked exceptions for recoverable errors. However, many folks despise checked exceptions and forgo these standards. For large codebases that span decades, it is impossible to ensure these standards are upheld even when folks do agree.

Even in the Java standard library, if you use CompletableFutures, you have the option of CompletableFuture::join or CompletableFuture::get; one unchecked and one checked, depending on your preference.
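
Here is a small sketch of that difference. The failing() helper and the class name are my own illustrative choices; the CompletableFuture calls are standard JDK. Note how get() forces two catch clauses, while the catch around join() is entirely optional and easy to leave out.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutionException;

class JoinVersusGet {
  // Illustrative helper: a future that has already failed.
  static CompletableFuture<String> failing() {
    CompletableFuture<String> future = new CompletableFuture<>();
    future.completeExceptionally(new IllegalStateException("boom"));
    return future;
  }

  // get() declares checked exceptions, so the compiler forces a decision here.
  static String viaGet() {
    try {
      return failing().get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    } catch (ExecutionException e) {
      return "get: " + e.getCause().getMessage();
    }
  }

  // join() declares nothing. Deleting this try/catch still compiles, and the
  // CompletionException would simply propagate to the caller.
  static String viaJoin() {
    try {
      return failing().join();
    } catch (CompletionException e) {
      return "join: " + e.getCause().getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(viaGet());  // get: boom
    System.out.println(viaJoin()); // join: boom
  }
}
```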

Likewise, Effective Java recommends using @throws when working with unchecked exceptions, but this is not an enforceable contract and it is easy to forget. If you really want to handle all errors, you must look through the codebase of the client code you’re using.


And finally, the most egregious of the three: there is way more error handling code than there is real code. This isn’t necessarily bad, and is almost expected, but the way it’s written, the actual functionality is almost obfuscated behind the error handling code.

In the final example, it takes real effort to see what the code is actually doing. This is a trivial example, but in a real codebase where these examples are much larger, it becomes harder and harder to see what the code is actually doing. This impedes the writer, reader, reviewer, and everyone else in-between when working with the code.

Alternative error handling strategies are better

This is a problem other folks have figured out. Not every language uses exceptions, and using their error types in Java is not impossible. Where I work, some folks share this opinion. We use two libraries for what we would argue is better error handling: Algebra and derive4j.

These two libraries bring the concepts of algebraic data types and generalized pattern matching to Java, vastly improving the error strategy. We will mostly only be talking about the Algebra library in this blog post, but if you like the section on Algebra, I would strongly recommend derive4j as well.

ADTs

Algebraic data types (ADTs) are types that can hold one of several variants. Java’s Optional is an ADT that in functional programming terms would contain Some(v) (Optional::of) or None (Optional::empty). The Algebra library adds the common Result<V, E> ADT, which contains either Ok(value) or Err(e), along with common methods for using them.

Here’s a quick example of how Result works. Say I have a function that returns a Result:

enum Error {
  BROKEN,
  REALLY_BROKEN,
}

Result<String, Error> doWork() {
  if (success) {
    return Result.ok("It works!");
  } else if (thing1Broken) {
    return Result.err(Error.BROKEN);
  }

  return Result.err(Error.REALLY_BROKEN);
}

If I call this function and want to get data out of it, this is one way I could do it:

void someFunction() {
  Result<String, Error> workResult = doWork();
  // Returns the Ok value, or throws if workResult holds an error.
  String value = workResult.unwrapOrElseThrow();
}

In the simplest strategy, you are explicitly required to observe each error even if you ignore it. You must knowingly ignore the error. This is already a step up over exceptions and it isn’t even the “right” way to use Results. [1]

The Algebra library provides us with a lot of conveniences. Here are two more that will help us rewrite our example: Result::mapErr for changing error types, and Result::flatMapOk, which uses a standard monad concept: if it’s ok, continue to the next step, otherwise stop and return the error. Together these are very powerful. [2]
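
To show how those two operations compose, here is a deliberately tiny Result written from scratch. It is a hypothetical sketch for illustration only, not the Algebra library's implementation; the real library's API is richer and may differ in its details. The parse and reciprocalPercent helpers are also made up for the demo.

```java
import java.util.function.Function;

// Hypothetical minimal Result: exactly one of Ok(value) or Err(error).
abstract class Result<T, E> {
  private Result() {}

  static <T, E> Result<T, E> ok(T value) { return new Ok<>(value); }
  static <T, E> Result<T, E> err(E error) { return new Err<>(error); }

  // The only way to look inside: the caller must supply both branches.
  abstract <R> R match(Function<? super T, ? extends R> onOk,
                       Function<? super E, ? extends R> onErr);

  // Changes the error type and leaves an Ok untouched.
  <F> Result<T, F> mapErr(Function<? super E, ? extends F> f) {
    return match(v -> Result.<T, F>ok(v), e -> Result.<T, F>err(f.apply(e)));
  }

  // Runs the next step only if this is Ok; an Err short-circuits.
  <U> Result<U, E> flatMapOk(Function<? super T, Result<U, E>> f) {
    return match(v -> f.apply(v), e -> Result.<U, E>err(e));
  }

  private static final class Ok<T, E> extends Result<T, E> {
    private final T value;
    Ok(T value) { this.value = value; }
    @Override
    <R> R match(Function<? super T, ? extends R> onOk,
                Function<? super E, ? extends R> onErr) {
      return onOk.apply(value);
    }
  }

  private static final class Err<T, E> extends Result<T, E> {
    private final E error;
    Err(E error) { this.error = error; }
    @Override
    <R> R match(Function<? super T, ? extends R> onOk,
                Function<? super E, ? extends R> onErr) {
      return onErr.apply(error);
    }
  }
}

class ResultDemo {
  static Result<Integer, String> parse(String raw) {
    try {
      return Result.ok(Integer.parseInt(raw));
    } catch (NumberFormatException e) {
      return Result.err("not a number: " + raw);
    }
  }

  static Result<Integer, String> reciprocalPercent(int n) {
    if (n == 0) {
      return Result.err("division by zero");
    }
    return Result.ok(100 / n);
  }

  // Two fallible steps chained with no try/catch in sight.
  static String run(String raw) {
    return parse(raw)
        .flatMapOk(ResultDemo::reciprocalPercent)
        .mapErr(msg -> "FAILED(" + msg + ")")
        .match(v -> "ok: " + v, e -> e);
  }

  public static void main(String[] args) {
    System.out.println(run("4")); // ok: 25
    System.out.println(run("0")); // FAILED(division by zero)
    System.out.println(run("x")); // FAILED(not a number: x)
  }
}
```

The design point is that match is the only exit: there is no way to read the value without also saying what happens on error.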

Adding in pattern matching for the full experience

Results are good, but using an enum wrapper isn’t enough for the full exception experience. In order to really bring out the rest, we must add pattern matching as well, which the derive4j library gives us.

Pattern matching lets us have a singular error class, with each of the error types containing a bit of data. Think Java enums, but the enum is not statically declared, and each enum in a set is allowed to store different data.

For instance, if you see this example:

@Data
public abstract class Error {
  interface Cases<T> {
      T retry(GenericError error);
      T permanentError(GenericError error);
  }
  public abstract <T> T match(Cases<T> cases);
}

What we have done is taken our exceptions from before and turned them into this wrapper class. When we would like to use them, we end up calling Errors.retry(${SOME_ERROR}) and can match over them to get the data out later (see the second code block here for getting data out).

The rewrite

Say our example code were rewritten using the concept of Results, and every library I was using also used Results. How would this look? [3]

@Inject
private AuditKafkaClient auditKafkaClient;

@Inject
private CdnClient cdnClient;

// derive4j generates a TaskQueueErrors class from this definition, with
// retry(...) and permanentError(...) factory methods.
@Data
public abstract class TaskQueueError {
  interface Cases<T> {
    T retry(GenericError error);
    T permanentError(GenericError error);
  }
  public abstract <T> T match(Cases<T> cases);
}

private Result<Integer, TaskQueueError> cdnDataAndAudit(String urlToCdn, int userId) {
  HttpClient client = HttpClient.builder().someParams().build();
  return client.get(urlToCdn)
    .mapErr(e -> {
      switch (e) {
        case DownloadFileError:
          return TaskQueueErrors.retry(e);
        case BadUrlError:
          return TaskQueueErrors.permanentError(e);
      }
    })
    .flatMapOk(buf -> cdnClient.cdnByteBuffer(buf)
      .mapErr(e -> TaskQueueErrors.retry(e))
    )
    .flatMapOk(fileId -> auditKafkaClient.send(userId, fileId)
      .mapErr(e -> {
        switch (e) {
          case KafkaIsDownException:
            return TaskQueueErrors.retry(e);
          // etc...
        }
      })
    );
}

The downside is that this code does not look like typical Java code, but Java has slowly been getting more functional since Java 8, and I don’t think this will look foreign to folks as it continues to get more functional.

On the flip side, this code has none of the faults of the previous code and is easier to read once you get an understanding of the Algebra library. Every error is explicitly handled, the way we need it to be.

And for folks more used to functional languages, this way of handling errors will feel natural.

The end

Exceptions are fraught with the potential to make mistakes. Java is not alone here. However, it is arguably one of the most common users of exceptions at corporate jobs. When working with exceptions, it’s important to know the issues you may run into and the easy mistakes to make along the way.

Even the Java language designers are aware of this strategy: sealed classes arrived as a preview feature in Java 15 and were finalized in Java 17, giving Java a rough equivalent of algebraic data types, with pattern matching over them following in later releases.
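
As a rough sketch of what that looks like, assuming Java 17 or newer (where sealed types and records are final features), here is the retry/fail-forever error from the rewrite modeled as a sealed interface. The type and method names are illustrative, and the example uses instanceof patterns rather than pattern matching in switch so that it stays within Java 17.

```java
// Assumes Java 17+. All names here are illustrative, not from any library.
class SealedErrorDemo {
  // Only the two listed records may implement this interface.
  sealed interface TaskQueueError permits Retry, FailForever {}

  // Each variant can carry its own data, unlike a plain enum constant.
  record Retry(String reason) implements TaskQueueError {}

  record FailForever(String reason) implements TaskQueueError {}

  static String describe(TaskQueueError error) {
    if (error instanceof Retry retry) {
      return "retry later: " + retry.reason();
    }
    if (error instanceof FailForever fail) {
      return "give up: " + fail.reason();
    }
    // Sealing guarantees no other implementations exist.
    throw new IllegalStateException("unreachable: TaskQueueError is sealed");
  }

  public static void main(String[] args) {
    System.out.println(describe(new Retry("kafka down")));  // retry later: kafka down
    System.out.println(describe(new FailForever("bad url"))); // give up: bad url
  }
}
```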

If you’d like to get started early though, I can’t recommend Algebra and derive4j enough. The code is battle tested and powers sending millions of emails every day at HubSpot. Hopefully they too will help you write less error prone code.


  1. If you think this looks similar to Go’s returned (type, err) tuple, you would be right. However, Go too fails to deliver the safety that is required in these situations, as you may go ahead and use the data from the result while ignoring the error. This is especially problematic if you pass the variable around. As the bad data is moved further away from where the error happened, the cause and the fix are both obfuscated. ↩︎

  2. Because this is a very rusty blog, I am obligated to say: this is the default method of handling errors in Rust, and the two equivalent methods in Rust are Result::map_err and Result::and_then respectively. ↩︎

  3. It’s worth noting that even in the best of codebases, not everyone will be using Results and you will end up doing a hybrid approach often mapping exceptions to Results. I have found this hybrid approach is still vastly superior to just exceptions. ↩︎
