Error handling with Rust using delta-rs as an example

In this blog, I will explore various ways to handle and/or propagate errors in Rust. Qxf2 is transitioning to using Rust as our primary programming language. The Rust documentation is great and has a nice section on error handling and propagation. However, when we tried to apply those ideas in our day-to-day work, we frequently tripped up. So here is a practical walkthrough of five common ways to handle and/or propagate errors. If you are like us, you can use this post to reinforce the concepts learnt in the book.


I recently worked on implementing a Delta Lake architecture for our Qxf2 Trello data, where I created Delta tables to store and analyze the data. As we started moving to Rust, I wanted to try a couple of Delta table functions using Rust. I came across delta-rs, a native Delta Lake implementation in Rust. While working with it, and especially while trying to handle errors, I realized I didn’t quite understand which technique works best where. I decided to try the different techniques on a single function to get a better hang of them. In this blog, I will show what I tried. The function I will be using is open_table, which opens a Delta table; I then use the result to print some info about the table.


Pre-requisites

I already have a Delta Lake table (currently empty). If you want to play along, use the create table function I have here, which creates an empty Delta Lake table, and then follow the rest of this post.

I created a basic Rust project and under src created lib.rs and a file named error_handling_example.rs where I will be working on my function. I also created a folder called data_tables which has my Delta Lake table called gold_table.

In Cargo.toml, add the deltalake dependency. At the time of writing, I did:
deltalake = { version = "0.7.0", features = ["datafusion-ext"] }


Error handling techniques

Let us look at the various ways we can do error handling in Rust using the open_table function of delta-rs. The code snippets presented in this blog are available here.

1. unwrap – panics on error (don’t do this!)

To start with, I will be using the function as described here – deltalake crate.

I am going to wrap it in a function (you could always just use the snippet inside the function directly):

pub async fn open_delta_table(path: &str) {
    let table = deltalake::open_table(path).await.unwrap();
    println!("The table info is: {}", table);
}

In main.rs, I have defined the path of my Delta table and called the open_delta_table function I wrote earlier.

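// Assumes the crate is named deltalake_trello (as in the output below) and
// that lib.rs declares `pub mod error_handling_example;`.
use deltalake_trello::error_handling_example;

#[tokio::main]
async fn main() {
    let table_path = "data_tables/gold_table";
    // open_delta_table prints the table info itself and returns nothing.
    error_handling_example::open_delta_table(table_path).await;
}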

When I run this, I get:

The table info is: DeltaTable(deltalake_trello/data_tables/gold_table)
        version: 0
        metadata: GUID=25ed9d5b-0120-4b9e-82c5-77c024e212ed, name=None, description=None, partitionColumns=[], createdTime=Some(1676443662054), configuration={}
        min_version: read=1, write=1
        files count: 0

All is good since the table exists at the provided path.

But what happens if the path of the table is incorrect or there is a wrong table name?

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ObjectStore { source: Generic { store: "DeltaObjectStore", source: Os { code: 2, kind: NotFound, message: "No such file or directory" } } }',

We see that when it encounters an error, unwrap prints an error message and panics. There might be some legitimate cases where panicking is acceptable, but here we are not handling the error at all!

As beginners, we recommend staying away from unwrap(). Why bother using Rust if the code panics easily? The Rust book presents unwrap() as appropriate mainly in examples, prototype code, and tests, where it keeps the code readable. Unfortunately, a lot of beginners see unwrap() in examples online and copy it blindly.

So, if not unwrap(), what other options do we have?

2. expect – convey a custom message on error and panic

A slight improvement over unwrap() is expect(). If we do want to terminate our program, we might as well provide a meaningful message before we do that. We can do that using expect.

pub async fn open_delta_table(path: &str) {
    let table = deltalake::open_table(path)
        .await
        .expect("Sorry there, but we really need that Delta table before you go ahead");
    println!("The table info is: {}", table);
}

It behaves the same as unwrap, but the panic message starts with the custom text that we have defined.

thread 'main' panicked at 'Sorry there, but we really need that Delta table before you go ahead: NotATable("No snapshot or version 0 found, perhaps /home/deltalake_trello/data_tables/non_existing_gold_table/ is an empty dir?")'

The use of expect or unwrap like above is discouraged. Instead, we should explicitly handle the error.
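
If you do not have a Delta table handy, the difference between the two can be seen with a small standard-library-only sketch. The function names and the port-parsing scenario are made up for illustration; they are not part of delta-rs. The catch_unwind helper is there only to make the panic message observable, and is not something I would recommend as an error handling technique.

```rust
use std::panic;

/// Parses a port number, panicking with the default message on bad input.
fn port_with_unwrap(raw: &str) -> u16 {
    raw.parse::<u16>().unwrap()
}

/// Same as above, but the panic message says what we needed.
fn port_with_expect(raw: &str) -> u16 {
    raw.parse::<u16>()
        .expect("the port must be a number between 0 and 65535")
}

/// Runs `f` and, if it panics, returns the panic message as a String.
fn panic_message<F>(f: F) -> Option<String>
where
    F: FnOnce() -> u16 + panic::UnwindSafe,
{
    match panic::catch_unwind(f) {
        Ok(_) => None,
        Err(payload) => {
            // Panic payloads are usually a String or a &'static str.
            if let Some(msg) = payload.downcast_ref::<String>() {
                Some(msg.clone())
            } else {
                payload.downcast_ref::<&str>().map(|msg| msg.to_string())
            }
        }
    }
}
```

Calling panic_message(|| port_with_expect("not-a-port")) gives back a message that begins with the custom text, followed by the debug form of the underlying error. With port_with_unwrap, only the generic Result::unwrap() message is available.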

3. match – use pattern matching with error propagation

To express the possibility of an error more explicitly, we can use the Result enum, which represents the two possible outcomes of a computation. The Ok variant holds the value returned on success, and the Err variant holds the error returned on failure.

So, instead of letting the program terminate (as we saw with unwrap), we will return the outcome from our function and let the caller decide what to do. This technique of propagating errors to the calling function is a commonly used practice.

In our example, we will make the open_delta_table function tell us whether it succeeded: it returns a handle to the Delta table on success, or some error information on failure.

use deltalake::{DeltaTable, DeltaTableError};

pub async fn open_delta_table(path: &str) -> Result<DeltaTable, DeltaTableError> {
    let table_open_result = deltalake::open_table(path).await;

    // Pass both outcomes back to the caller unchanged.
    let delta_table = match table_open_result {
        Ok(table) => Ok(table),
        Err(error) => Err(error),
    };
    delta_table
}

Here, I have stored the result of open_table in a variable called table_open_result and then used a match expression on it. The function returns delta_table, which is either a DeltaTable wrapped in Result::Ok or a DeltaTableError wrapped in Result::Err.

In main (which is the calling function), we can take some action based on the outcome of the open_delta_table function.

#[tokio::main]
async fn main() {
    let table_path = "data_tables/non_existing_gold_table";
    let table = error_handling_example::open_delta_table(table_path).await;
    match table {
        Ok(table) => println!("The table version is: {}", table.version()),
        Err(error) => println!("Oh uh, couldn't get to that table: {}", error),
    }
}

Now, when the table does not exist, the output will be:

Oh uh, couldn't get to that table: No snapshot or version 0 found, perhaps deltalake_trello/data_tables/non_existing_gold_table/ is an empty dir?

Here, I am printing an error message. But we can take a further action like creating a delta table instead. I have shown that in another technique below.
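
A match can also react differently to different kinds of errors, which is what makes it more flexible than simply passing the error along. Here is a minimal standard-library-only sketch of that idea. The function name, the file scenario and the default value are hypothetical; the delta-rs analogue would be a match arm for a specific error variant such as the NotATable one seen in the output earlier.

```rust
use std::fs;
use std::io::ErrorKind;

/// Reads a config file, falling back to a default only when the file is missing.
fn read_config_or_default(path: &str) -> Result<String, std::io::Error> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        // A missing file is recoverable here: use a built-in default instead.
        Err(error) if error.kind() == ErrorKind::NotFound => Ok(String::from("retries=3")),
        // Anything else (for example, the path is a directory) goes back to the caller.
        Err(error) => Err(error),
    }
}
```

The caller still gets a Result, so it can use any of the techniques in this post on the errors that were not recoverable.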

4. ? – propagate error

As beginners, we prefer the match syntax, but it is a bit too verbose. There is a shorter way to propagate the error that is especially useful when there would otherwise be nested match expressions: the ? operator. As above, the value inside Ok is returned on success, and on error the Err is propagated to the calling code.

Our function will now become:

pub async fn open_delta_table(path: &str) -> Result<DeltaTable, DeltaTableError> {
    let delta_table = deltalake::open_table(path).await?;
    Ok(delta_table)
}
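
To see how much the ? operator saves once there is more than one fallible step, here is a small standard-library-only sketch that compares the two styles side by side. The function names and the number-parsing scenario are invented for illustration.

```rust
use std::num::ParseIntError;

/// Verbose version: every fallible step gets its own match.
fn total_with_match(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let first = match a.trim().parse::<i32>() {
        Ok(n) => n,
        Err(e) => return Err(e),
    };
    let second = match b.trim().parse::<i32>() {
        Ok(n) => n,
        Err(e) => return Err(e),
    };
    Ok(first + second)
}

/// Same behaviour with `?`: return early with the Err, otherwise keep the value.
fn total_with_question_mark(a: &str, b: &str) -> Result<i32, ParseIntError> {
    let first = a.trim().parse::<i32>()?;
    let second = b.trim().parse::<i32>()?;
    Ok(first + second)
}

/// `?` also works on Option inside a function that returns Option.
fn first_char_upper(text: &str) -> Option<char> {
    let first = text.chars().next()?;
    Some(first.to_ascii_uppercase())
}
```

Both total functions return Ok(42) for the inputs "2" and " 40 ", and both return the same parse error when either input is not a number.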

Note that the ? operator can only be used inside functions whose return type is compatible with the value it is applied to, such as Result or Option. But coming from Python, a language that loves defaults, there is yet another way you can handle errors.

5. unwrap_or_else – use a default value in case of error

Another way to avoid nested or multiple match expressions is to use one of the variations of unwrap. One of them is unwrap_or_else, which falls back to a value computed by a closure in case of an error.

Applying this to our example, we need a Delta table. So, even when we encounter an error, we still need to return a Delta table. Accordingly, in the error block we will call a function that can create a Delta table. To do that, I am using an existing create_table function that I have for creating an empty Delta table with 2 columns. The code for that is here.

use futures::executor; // assumes the futures crate is listed in Cargo.toml

pub async fn open_delta_table(path: &str) {
    let table_open_result = deltalake::open_table(path).await;

    let delta_table = table_open_result.unwrap_or_else(|_error| {
        println!("uh oh, could not find that table. Never mind, let's create one!");
        // The closure is not async, so block until create_table finishes.
        let empty_delta_table = executor::block_on(create_table(path));
        match empty_delta_table {
            Ok(val) => val,
            Err(_) => panic!("no way can go ahead now :["),
        }
    });
    println!("The table info is: {}", delta_table);
}

So now, in the event of an error, a new Delta table will be created.

uh oh, could not find that table. Never mind, let's create one!
The table info is: DeltaTable(deltalake_trello/data_tables/default_gold_table/)
        version: 0
        metadata: GUID=eeaaf9aa-ed6b-4054-a844-032edf749673, name=None, description=None, partitionColumns=[], createdTime=Some(1676631354014), configuration={}
        min_version: read=1, write=1
        files count: 0
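
The same fallback idea can be tried without a Delta table. Below is a minimal standard-library-only sketch of unwrap_or_else next to two of its siblings, unwrap_or and unwrap_or_default. The function names and the retry-count scenario are hypothetical.

```rust
/// Parses a retry count, falling back to 3 when the input is invalid.
fn retries_or_default(raw: &str) -> u32 {
    raw.parse::<u32>().unwrap_or_else(|error| {
        // The closure receives the error and only runs on the Err path.
        eprintln!("could not parse {:?} ({}), using 3 retries", raw, error);
        3
    })
}

/// `unwrap_or` takes a ready-made value instead of a closure.
fn retries_or(raw: &str, fallback: u32) -> u32 {
    raw.parse::<u32>().unwrap_or(fallback)
}

/// `unwrap_or_default` uses the type's Default value (0 for u32).
fn retries_or_zero(raw: &str) -> u32 {
    raw.parse::<u32>().unwrap_or_default()
}
```

I would reach for unwrap_or_else when computing the fallback is expensive or has side effects (like creating a table), since the closure is skipped entirely on success. For a cheap constant, unwrap_or reads better.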

There are several other ways to handle errors, such as boxing errors (useful when we expect different types of errors) and implementing custom error types. In this blog, I have explored the few that I found would fit my use case. I hope you find them useful too.
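
For the curious, here is a rough standard-library-only sketch of what boxed errors and a custom error type can look like. Everything in it (VersionError, parse_version, next_version) is a made-up example and not taken from delta-rs.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// A custom error type with one variant per failure we care about.
#[derive(Debug)]
enum VersionError {
    Empty,
    NotANumber(ParseIntError),
}

impl fmt::Display for VersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VersionError::Empty => write!(f, "version string is empty"),
            VersionError::NotANumber(e) => write!(f, "version is not a number: {}", e),
        }
    }
}

impl Error for VersionError {}

// Lets `?` convert a ParseIntError into a VersionError automatically.
impl From<ParseIntError> for VersionError {
    fn from(e: ParseIntError) -> Self {
        VersionError::NotANumber(e)
    }
}

/// Custom error: callers can match on the exact variant.
fn parse_version(raw: &str) -> Result<u64, VersionError> {
    let trimmed = raw.trim();
    if trimmed.is_empty() {
        return Err(VersionError::Empty);
    }
    Ok(trimmed.parse::<u64>()?)
}

/// Boxed error: any error type can flow through `?`, but the caller
/// no longer sees the concrete type in the signature.
fn next_version(raw: &str) -> Result<u64, Box<dyn Error>> {
    let current = parse_version(raw)?;
    Ok(current + 1)
}
```

The custom type is more work up front but lets the caller match on each variant, while the boxed version is handy when a function can fail in several unrelated ways and the caller only needs to report the error.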


Hire Qxf2!

This post was written by a tester. As of 2023, how many software testing firms do you know who work with Rust? If you are a company working with Rust and looking for technically inclined testers, reach out to Qxf2. You can fill out a short and simple form to get in touch with us.

