A guide to Regex Crate – 2

Qxf2 is exploring commonly used Rust crates and sharing our learning. We are focusing on crates that testers will end up using quite often, and we have tried our best to include illustrative examples to help testers understand how to use them. This blog is a continuation of our previous blog on Regex crate modules. In this blog we cover the following methods of the Regex type, with examples:

  • Regex::is_match
  • Regex::captures
  • Regex::captures_iter
  • Regex::captures_len

Regex::is_match:

The is_match() method of the Regex type lets you check whether a regular expression matches any part of a text string. It returns true if there is at least one match and false otherwise.

For example, if the pattern is r"(\d+)" and the text is "The answer is 42", then is_match() returns true because the text contains digits.

The following example uses is_match() to validate a mobile number entered by the user.

/*
is_match: returns true if the regex pattern matches anywhere in the text.
The pattern below is anchored with ^ and $, so the whole input must be a valid mobile number.
*/
use regex::Regex;
use std::io;
 
fn prompt_user_input(prompt: &str) -> String {
    println!("{}", prompt);
    let mut input = String::new();
    match io::stdin().read_line(&mut input) {
        Ok(_) => input.trim().to_string(),
        Err(error) => {
            eprintln!("Failed to read line: {}", error);
            std::process::exit(1);
        }
    }
}
 
fn mobile_number_validation(s: &str) -> Result<String, regex::Error> {
    // +91, an optional '-' or whitespace, then exactly 10 digits and nothing else
    let pattern = r"^\+91[-\s]?\d{10}$";
    let re = Regex::new(pattern)?;
    if re.is_match(s) {
        Ok("Valid Mobile Number".to_string())
    } else {
        Ok("Invalid Mobile Number".to_string())
    }
}
 
fn main() {
    let mobile_number = prompt_user_input("Please Input Mobile Number (like +91-XXXXXXXXXX):");
    println!("Mobile Number : {}", mobile_number);
 
    match mobile_number_validation(&mobile_number) {
        Ok(result) => println!("{}", result),
        Err(error) => eprintln!("Invalid regex pattern: {}", error),
    }
}
Output of the program:


Regex::captures:

The captures() method of the Regex type returns an Option<Captures>. The value is None if the regex does not match. Otherwise it is Some(Captures) for the first (leftmost) match. A Captures value gives access to the whole match and to each capture group, including their start and end positions in the string. You can use captures() to extract the parts of a string that match a regex pattern.

For example, with the regex r"(\d+)-(\w+)" and the string "123-abc", you can use captures() to get the digits and the letters separately. The Captures value holds three groups:
- group 0 is the whole match "123-abc"
- group 1 is the first capture group "123"
- group 2 is the second capture group "abc"

The following example uses captures() to extract the day, month and year from a date string and then validate them.

/* 
captures: returns Some(Captures) with the capture groups if the pattern matches, otherwise None.
The regex pattern below captures a date in dd/mm/yyyy (or d/m/yyyy) format.
*/
 
use regex::Regex;
use std::error::Error;
 
// Leap year validation function.
fn is_leap_year(year: u32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0)
}
 
fn validate_date(date: &str) -> Result<String, Box<dyn Error>> {
    // Declaring the pattern and validating the pattern
    let pattern = r"^(\d{1,2})/(\d{1,2})/(\d{4})$"; // ^ and $ require the whole string to be a date
    let re = match Regex::new(pattern) {
        Ok(re) => re,
        Err(_) => return Err("Invalid regex pattern".into()),
    };
 
    if let Some(captures) = re.captures(date) {
        if let (Some(day), Some(month), Some(year)) = (
            captures.get(1).and_then(|m| m.as_str().parse::<u32>().ok()),
            captures.get(2).and_then(|m| m.as_str().parse::<u32>().ok()),
            captures.get(3).and_then(|m| m.as_str().parse::<u32>().ok()),
        ) {
            let days_in_month = match month {
                1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
                4 | 6 | 9 | 11 => 30,
                2 if is_leap_year(year) => 29,
                2 => 28,
                _ => {
                    return Err("Invalid month".into());
                }
            };
 
            if day > 0 && day <= days_in_month {
                return Ok(format!("Day: {}, Month: {}, Year: {}", day, month, year));
            } else {
                return Err("Invalid day for the given month and year".into());
            }
        }
    }
 
    Err("Invalid date format, provide correct input".into())
}
 
fn main() {
    // Change the date to check different date validations. Valid pattern dd/mm/yyyy or d/m/yyyy
    let date = "28/15/2025";
    match validate_date(date) {
        Ok(validated_date) => println!("{}", validated_date),
        Err(err) => println!("Error: {}\nDate: {}", err, date),
    }
}
Output of the program:



Regex::captures_iter:

The captures_iter() method of the Regex type returns an iterator that yields one Captures value for each successive non-overlapping match in a string. An iterator lets you loop over a collection of items one at a time. Use captures_iter() when a pattern can match several times in the same string and you need the capture groups of every match.

For example, with the regex r"(\d+)-(\w+)" and the string "123-abc 456-def 789-ghi", captures_iter() yields three Captures values, one per match. In each one, group 0 is the whole match, group 1 the digits and group 2 the letters:

  • "123-abc": group 1 is "123", group 2 is "abc"
  • "456-def": group 1 is "456", group 2 is "def"
  • "789-ghi": group 1 is "789", group 2 is "ghi"

The following example uses captures_iter() to find candidate IP addresses in each string of a vector. It then checks whether each candidate is a valid IPv4 address.

/*
captures_iter: allows you to iterate over multiple captures of a 
regular expression pattern in a text.
The below example captures the valid IP address from the provided 
text and prints it.
*/
 
use regex::{Error, Regex};
 
fn validate_ips(texts: Vec<&str>) -> Result<Vec<String>, Error> {
    let mut valid_ips = Vec::new();
    let pattern = match Regex::new(r"\b(?:\d{1,3}\.){0,9}\d{1,3}\b") {
        Ok(pattern) => pattern,
        Err(err) => return Err(err),
    };
 
    let mut ip_found = false; // Flag to check if any IP is found in the texts
 
    for text in texts {
        let text_has_valid_ip = process_text(&mut valid_ips, &pattern, text);
        if !text_has_valid_ip {
            println!("No valid IP address found in the text: '{}'\n", text);
        } else {
            ip_found = true;
        }
    }
 
    if !ip_found {
        println!("No valid IP addresses found in the texts vector.");
    }
 
    Ok(valid_ips)
}
 
fn process_text(valid_ips: &mut Vec<String>, pattern: &Regex, text: &str) -> bool {
    let mut text_has_valid_ip = false; // Flag to check if any valid IP is found in the current text
    for captures in pattern.captures_iter(text) {
        if let Some(ip) = captures.get(0).map(|m| m.as_str()) {
            if is_valid_ip(ip) {
                if !valid_ips.contains(&ip.to_string()) {
                    valid_ips.push(ip.to_string());
                }
                text_has_valid_ip = true;
            }
        }
    }
    text_has_valid_ip
}
 
fn is_valid_ip(ip: &str) -> bool {
    let octets: Vec<&str> = ip.split('.').collect();
    if octets.len() != 4 {
        println!("Invalid IP: {} (Wrong number of octets)", ip);
        return false;
    }
 
    for octet in &octets {
        // Each octet must fit in a u8 (0-255); larger values fail to parse.
        if octet.parse::<u8>().is_err() {
            println!("Invalid IP: {} (Octet value out of range: {})", ip, octet);
            return false;
        }
    }
    true
}
 
fn main() {
    let texts = vec![
        "192.168.0.1 is the router's IP address.",
        "The server's IP is 10.0.0.11.12", // No valid IP in this text
        "300.200.100.1",
        "Another valid IP is 172.16.254.1",
        "10.11",
        "11",
        "ab.bc.da.xy",
        "190.350.10.11",
        "The IP: 10.15.20.21",
    ];
 
    match validate_ips(texts) {
        Ok(valid_ips) => {
            if valid_ips.is_empty() {
                println!("No valid IPs found");
            } else {
                println!("\nValid IPs:");
                for ip in valid_ips {
                    println!("{}", ip);
                }
            }
        }
        Err(err) => {
            println!("Regex error: {}", err);
        }
    }
}
Output of the program:



Regex::captures_len:

The captures_len() method of the Regex type returns the number of capture groups in a regex. A capture group is a part of a regex pattern whose matched text can be extracted separately. Capture groups are usually written with parentheses in the regex syntax.

For example, the regex r"(\d+)-(\w+)" has two explicit capture groups, one for the digits and one for the letters. Because the whole match also counts as a group, captures_len() returns 3 for it.

The captures_len() method is useful for checking how many capture groups a regex has before calling methods like captures() or captures_iter(). It can also be used to iterate over all the capture groups in a match. It always returns at least 1, because the whole match counts as a capture group.

The following example uses captures_len() together with captures() to extract the timestamp, log level and message from each log entry.

/* 
captures_len: returns the number of capture groups in a regular expression pattern. 
The program below uses captures_len() to verify that the pattern has the expected 
structure before extracting the captured data.
 */
 
use regex::Regex;
 
fn capture_info_from_log_entry(log_entries: Vec<&str>) -> Vec<Result<(String, String, String), String>> {
    // Define a regex pattern to capture timestamp, log level, and message
    let pattern = r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[([A-Za-z]+)\] (.+)";
    let re = match Regex::new(pattern) {
        Ok(re) => re,
        Err(_) => return vec![Err("Invalid regex pattern".to_string()); log_entries.len()],
    };
 
    // captures_len() counts group 0 (the whole match) plus the 3 explicit groups, so the
    // expected value is 4. It depends only on the pattern, so it is checked once.
    if re.captures_len() != 4 {
        return vec![Err("Invalid regex pattern. Expected 3 capture groups.".to_string()); log_entries.len()];
    }
 
    log_entries
        .iter()
        .map(|log_entry| match re.captures(log_entry) {
            Some(captures) => {
                let timestamp = captures.get(1).map(|m| m.as_str().to_string());
                let log_level = captures.get(2).map(|m| m.as_str().to_string());
                let message = captures.get(3).map(|m| m.as_str().to_string());
 
                match (timestamp, log_level, message) {
                    (Some(timestamp), Some(log_level), Some(message)) => {
                        Ok((timestamp, log_level, message))
                    }
                    _ => Err("Missing capture group".to_string()),
                }
            }
            None => Err("Log entry does not match the expected format".to_string()),
        })
        .collect()
}
 
fn main() {
    // Sample log entries
    let log_entries = vec![
        "[2023-10-31 15:23:45] [ERROR] *370 connect() failed (111: Unknown error) while connecting to upstream, client: 135.125.246.189, server: _, request: \"GET /.env HTTP/1.1\", upstream: \"http://127.0.0.1:5000/.env\", host: \"18.118.196.200\"",
        "[2023-10-31 15:30:00] [INFO] Application started successfully",
        "[2023-10-31 15:40:22] [WARNING] Unrecognized log entry format",
        "[2023-01-01 00:00:00] [info] ",
        " [error] some text",
        " hi"
        // Add more log entries for testing
    ];
 
    let results = capture_info_from_log_entry(log_entries);
 
    for (index, result) in results.iter().enumerate() {
        match result {
            Ok((timestamp, log_level, message)) => {
                println!("Log Entry {}: ", index + 1);
                println!("Timestamp: {}", timestamp);
                println!("Log Level: {}", log_level);
                println!("Message: {}", message);
                println!();
            },
            Err(err) => {
                println!("Error in Log Entry {}: {}", index + 1, err);
                println!();
            }
        }
    }
}
Output of the program:



In this blog we covered a few more methods of the regex crate's Regex type: is_match(), captures(), captures_iter() and captures_len(). Each was shown with examples of pattern matching and extraction. These methods enable efficient text processing, allowing validation, extraction and manipulation of data from strings.


Hire technical testers from Qxf2

Qxf2 is the home for technical testers. We employ experienced testers with a technical bent of mind. Our testers are naturally inclined towards learning new things and take the time to share their learnings on this blog. Additionally, as a company we invest in the practical development of all our employees. For example, since 2023, we have tried to get everyone to learn and use Rust. This post is an outcome of a couple of our testers having used Rust’s regex crate in their daily activities. If you want to work with technical test engineers in your project, please get in touch with us.

