{"id":21264,"date":"2024-03-07T23:26:41","date_gmt":"2024-03-08T04:26:41","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=21264"},"modified":"2024-03-07T23:26:41","modified_gmt":"2024-03-08T04:26:41","slug":"a-guide-to-regex-crate-2","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/a-guide-to-regex-crate-2\/","title":{"rendered":"A guide to Regex Crate &#8211; 2"},"content":{"rendered":"<p><a href=\"https:\/\/qxf2.com\/?utm_source=rust_regex_crate_2&#038;utm_medium=click&#038;utm_campaign=From%20blog\" rel=\"noopener\" target=\"_blank\">Qxf2<\/a> is exploring commonly used Rust crates and sharing what we learn. We are focused on crates that testers will end up using quite often. We have tried our best to include illustrative examples to help testers understand how to use these crates. This post is a continuation of our previous post on the <a href=\"https:\/\/qxf2.com\/blog\/rust-regex-crate\" rel=\"noopener\" target=\"_blank\">Regex Crate<\/a>. In this post we cover the following Regex methods, with examples:<\/p>\n<ul>\n<li>Regex::is_match<\/li>\n<li>Regex::captures<\/li>\n<li>Regex::captures_iter<\/li>\n<li>Regex::captures_len<\/li>\n<\/ul>\n<hr>\n<h4>Regex::is_match:<\/h4>\n<p>The is_match() method of the Regex type lets you check whether a given regular expression pattern matches any part of a text string. 
The method returns a boolean value: true if there is at least one match, and false otherwise.<\/p>\n<p>For example, if the pattern is r&#8221;(\\d+)&#8221; and the text is &#8220;The answer is 42&#8221;, then is_match() will return true, because the text contains digits.<\/p>\n<p>The following example uses is_match() to validate a mobile number entered by the user.<\/p>\n<pre lang=\"rust\">\r\n\/*\r\nis_match: Checks if the regex pattern matches anywhere in the text.\r\nThe pattern below is anchored with ^ and $, so the whole input must be a mobile number.\r\n*\/\r\nuse regex::Regex;\r\nuse std::io;\r\n\r\nfn prompt_user_input(prompt: &str) -> String {\r\n    println!(\"{}\", prompt);\r\n    let mut input = String::new();\r\n    match io::stdin().read_line(&mut input) {\r\n        Ok(_) => input.trim().to_string(),\r\n        Err(error) => {\r\n            eprintln!(\"Failed to read line: {}\", error);\r\n            std::process::exit(1);\r\n        }\r\n    }\r\n}\r\n\r\nfn mobile_number_validation(s: String) -> Result<String, regex::Error> {\r\n    let pattern = r\"^\\+91[-\\s]?\\d{10}$\"; \/\/ Pattern to match mobile number\r\n    match Regex::new(pattern) {\r\n        Ok(re) => {\r\n            if re.is_match(&s) {\r\n                Ok(\"Valid Mobile Number\".to_string())\r\n            } else {\r\n                Ok(\"Invalid Mobile Number\".to_string())\r\n            }\r\n        }\r\n        Err(err) => Err(err),\r\n    }\r\n}\r\n\r\nfn main() {\r\n    let mobile_number = prompt_user_input(\"Please Input Mobile Number (like +91-XXXXXXXXXX):\");\r\n    println!(\"Mobile Number : {}\", mobile_number);\r\n\r\n    match mobile_number_validation(mobile_number) {\r\n        Ok(message) => {\r\n            println!(\"{}\", message);\r\n        }\r\n        Err(error) => {\r\n            \/\/ The only error this function can return is a regex compilation error\r\n            eprintln!(\"Invalid regex pattern: {}\", error);\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<h6>Output of the program:<\/h6>\n<p><img 
decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2024\/02\/is_match_regex-1.jpg\" alt=\"is_match_regex\" \/><\/p>\n<hr>\n<h4>Regex::captures:<\/h4>\n<p>The captures() method of the Regex type returns an Option&lt;Captures&gt;: Some(Captures) for the first match in the string, or None if there is no match. A Captures object contains information about the match of a regex in a string, such as the start and end positions of the match and of each capture group. You can use the captures() method to extract parts of a string that match a regex pattern.<\/p>\n<p>For example, if you have a regex like r&#8221;(\\d+)-(\\w+)&#8221; and a string like &#8220;123-abc&#8221;, you can use the captures() method to get the numbers and the letters separately. The Captures object will have three elements: the whole match &#8220;123-abc&#8221;, the first capture group &#8220;123&#8221;, and the second capture group &#8220;abc&#8221;.<\/p>\n<p>The following example uses the captures() method to extract the day, month, and year from the given date string and validate them.<\/p>\n<pre lang=\"rust\">\r\n\/* \r\ncaptures: returns an Option containing the capture groups if the pattern matches the text.\r\nThe regex pattern below captures a date in dd\/mm\/yyyy format.\r\n*\/\r\n\r\nuse regex::Regex;\r\nuse std::error::Error;\r\n\r\n\/\/ Leap year validation function.\r\nfn is_leap_year(year: u32) -> bool {\r\n    (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0)\r\n}\r\n\r\nfn validate_date(date: &str) -> Result<String, Box<dyn Error>> {\r\n    \/\/ Declaring the pattern and validating the pattern\r\n    let pattern = r\"(\\d{1,2})\/(\\d{1,2})\/(\\d{4})\"; \/\/ Pattern to match dates\r\n    let re = match Regex::new(pattern) {\r\n        Ok(re) => re,\r\n        Err(_) => return Err(\"Invalid regex pattern\".into()),\r\n    };\r\n\r\n    if let Some(captures) = re.captures(date) {\r\n        \/\/ Group 0 is the whole match; groups 1 to 3 are day, month, and year\r\n        if let (Some(day), Some(month), Some(year)) = (\r\n            captures.get(1).and_then(|m| m.as_str().parse::<u32>().ok()),\r\n            captures.get(2).and_then(|m| 
m.as_str().parse::<u32>().ok()),\r\n            captures.get(3).and_then(|m| m.as_str().parse::<u32>().ok()),\r\n        ) {\r\n            let days_in_month = match month {\r\n                1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,\r\n                4 | 6 | 9 | 11 => 30,\r\n                2 if is_leap_year(year) => 29,\r\n                2 => 28,\r\n                _ => {\r\n                    return Err(\"Invalid month\".into());\r\n                }\r\n            };\r\n\r\n            if day > 0 && day <= days_in_month {\r\n                return Ok(format!(\"Day: {}, Month: {}, Year: {}\", day, month, year));\r\n            } else {\r\n                return Err(\"Invalid day for the given month and year\".into());\r\n            }\r\n        }\r\n    }\r\n\r\n    Err(\"Invalid date format, provide correct input\".into())\r\n}\r\n\r\nfn main() {\r\n    \/\/ Change the date to check different date validations. Valid pattern dd\/mm\/yyyy or d\/m\/yyyy\r\n    let date = \"28\/15\/2025\";\r\n    match validate_date(date) {\r\n        Ok(validated_date) => println!(\"{}\", validated_date),\r\n        Err(err) => println!(\"Error: {} \\nDate: {}\", err, date),\r\n    }\r\n}\r\n<\/pre>\n<h6>Output of the program:<\/h6>\n<p><img decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2024\/02\/capture_regex-1.jpg\" alt=\"capture_regex\" \/><\/p>\n<hr>\n<h4>Regex::captures_iter:<\/h4>\n<p>The captures_iter() method of the Regex type returns an iterator over all the non-overlapping matches in a string, yielding one Captures object per match. An iterator is a way of looping over a collection of items one by one. A Captures object contains information about the match of a regex in a string, such as the start and end positions of the match and of each capture group. 
You can use the captures_iter() method to extract every part of a string that matches a regex pattern, not just the first one.<\/p>\n<p>For example, if you have a regex like r&#8221;(\\d+)-(\\w+)&#8221; and a string like &#8220;123-abc 456-def 789-ghi&#8221;, you can use the captures_iter() method to get the numbers and the letters separately for each match. The iterator will yield three items, each one a Captures object. The first Captures object will have three elements: the whole match &#8220;123-abc&#8221;, the first capture group &#8220;123&#8221;, and the second capture group &#8220;abc&#8221;. The second Captures object will have three elements: the whole match &#8220;456-def&#8221;, the first capture group &#8220;456&#8221;, and the second capture group &#8220;def&#8221;. The third Captures object will have three elements: the whole match &#8220;789-ghi&#8221;, the first capture group &#8220;789&#8221;, and the second capture group &#8220;ghi&#8221;.<\/p>\n<p>The following example uses the captures_iter() method to extract candidate IP addresses from each string in the provided vector and validate them.<\/p>\n<pre lang=\"rust\">\r\n\/*\r\ncaptures_iter: allows you to iterate over multiple captures of a \r\nregular expression pattern in a text.\r\nThe example below captures candidate IP addresses from each provided \r\ntext, validates them, and prints the valid ones.\r\n*\/\r\n\r\nuse regex::{Error, Regex};\r\n\r\nfn validate_ips(texts: Vec<&str>) -> Result<Vec<String>, Error> {\r\n    let mut valid_ips = Vec::new();\r\n    \/\/ Deliberately loose pattern: it matches any dotted group of numbers,\r\n    \/\/ and is_valid_ip() then performs the strict check\r\n    let pattern = Regex::new(r\"\\b(?:\\d{1,3}\\.){0,9}\\d{1,3}\\b\")?;\r\n\r\n    let mut ip_found = false; \/\/ Flag to check if any IP is found in the texts\r\n\r\n    for text in texts {\r\n        let text_has_valid_ip = process_text(&mut valid_ips, &pattern, text);\r\n        if !text_has_valid_ip {\r\n            println!(\"No valid IP address found in the text: '{}'\\n\", text);\r\n        } else {\r\n            
ip_found = true;\r\n        }\r\n    }\r\n\r\n    if !ip_found {\r\n        println!(\"No valid IP addresses found in the texts vector.\");\r\n    }\r\n\r\n    Ok(valid_ips)\r\n}\r\n\r\nfn process_text(valid_ips: &mut Vec<String>, pattern: &Regex, text: &str) -> bool {\r\n    let mut text_has_valid_ip = false; \/\/ Flag to check if any valid IP is found in the current text\r\n    for captures in pattern.captures_iter(text) {\r\n        \/\/ Group 0 is the whole match, i.e. the candidate IP address\r\n        if let Some(ip) = captures.get(0).map(|m| m.as_str()) {\r\n            if is_valid_ip(ip) {\r\n                if !valid_ips.contains(&ip.to_string()) {\r\n                    valid_ips.push(ip.to_string());\r\n                }\r\n                text_has_valid_ip = true;\r\n            }\r\n        }\r\n    }\r\n    text_has_valid_ip\r\n}\r\n\r\nfn is_valid_ip(ip: &str) -> bool {\r\n    let octets: Vec<&str> = ip.split('.').collect();\r\n    if octets.len() != 4 {\r\n        println!(\"Invalid IP: {} (Wrong number of octets)\", ip);\r\n        return false;\r\n    }\r\n\r\n    for octet in &octets {\r\n        \/\/ Each octet must parse as a u8, i.e. fall in the range 0 to 255\r\n        if octet.parse::<u8>().is_err() {\r\n            println!(\"Invalid IP: {} (Octet value out of range: {})\", ip, octet);\r\n            return false;\r\n        }\r\n    }\r\n    true\r\n}\r\n\r\nfn main() {\r\n    let texts = vec![\r\n        \"192.168.0.1 is the router's IP address.\",\r\n        \"The server's IP is 10.0.0.11.12\", \/\/ No valid IP in this text\r\n        \"300.200.100.1\",\r\n        \"Another valid IP is 172.16.254.1\",\r\n        \"10.11\",\r\n        \"11\",\r\n        \"ab.bc.da.xy\",\r\n        \"190.350.10.11\",\r\n        \"The IP: 10.15.20.21\",\r\n    ];\r\n\r\n    match validate_ips(texts) {\r\n        Ok(valid_ips) => {\r\n            if valid_ips.is_empty() {\r\n        
        println!(\"No valid IPs found\");\r\n            } else {\r\n                println!(\"\\nValid IPs:\");\r\n                for ip in valid_ips {\r\n                    println!(\"{}\", ip);\r\n                }\r\n            }\r\n        }\r\n        Err(err) => {\r\n            println!(\"Regex error: {}\", err);\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<h6>Output of the program:<\/h6>\n<p><img decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2024\/02\/regex_capture_iter.jpg\" alt=\"capture_iter_regex\" \/><\/p>\n<hr>\n<h4>Regex::captures_len:<\/h4>\n<p>The captures_len() method of the Regex type returns the number of capture groups in a regex, including the implicit group 0 that stands for the whole match. A capture group is a part of a regex pattern that can be extracted from a match. Capture groups are usually marked by parentheses in the regex syntax.<\/p>\n<p>For example, the regex r&#8221;(\\d+)-(\\w+)&#8221; has two explicit capture groups: one for the digits and one for the letters. For this regex, captures_len() returns 3.<\/p>\n<p>The captures_len() method can be useful to check how many capture groups a regex has before using other methods like captures() or captures_iter(). It can also be used to iterate over all the capture groups in a match. The captures_len() method always returns at least 1, because the whole match is counted as a capture group at index 0.<\/p>\n<p>The following example uses captures_len() along with captures() to extract the timestamp, log level, and message from each log entry.<\/p>\n<pre lang=\"rust\">\r\n\/* \r\ncaptures_len: returns the number of capturing groups in a regular expression pattern, including the implicit group 0. 
\r\nThe program below uses captures_len() to verify that the pattern has the expected \r\nnumber of groups before extracting the captured data.\r\n *\/\r\n\r\nuse regex::Regex;\r\n\r\nfn capture_info_from_log_entry(log_entries: Vec<&str>) -> Vec<Result<(String, String, String), String>> {\r\n    \/\/ Define a regex pattern to capture timestamp, log level, and message\r\n    let pattern = r\"\\[(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\] \\[([A-Za-z]+)\\] (.+)\";\r\n    let re = match Regex::new(pattern) {\r\n        Ok(re) => re,\r\n        Err(_) => return vec![Err(\"Invalid regex pattern\".to_string()); log_entries.len()],\r\n    };\r\n\r\n    log_entries\r\n        .iter()\r\n        .map(|log_entry| {\r\n            \/\/ Count the capture groups: 3 explicit groups plus the implicit group 0\r\n            let captures_len = re.captures_len();\r\n            match captures_len {\r\n                4 => {\r\n                    if let Some(captures) = re.captures(log_entry) {\r\n                        let timestamp = captures.get(1).map(|m| m.as_str().to_string());\r\n                        let log_level = captures.get(2).map(|m| m.as_str().to_string());\r\n                        let message = captures.get(3).map(|m| m.as_str().to_string());\r\n\r\n                        match (timestamp, log_level, message) {\r\n                            (Some(timestamp), Some(log_level), Some(message)) => {\r\n                                Ok((timestamp, log_level, message))\r\n                            },\r\n                            \/\/ The entry matched, but a group is missing\r\n                            _ => Err(\"Expected 3 capture groups\".to_string())\r\n                        }\r\n                    } else {\r\n                        \/\/ The entry does not match the pattern at all\r\n                        Err(\"No match found\".to_string())\r\n                    }\r\n                },\r\n                _ => Err(\"Invalid regex pattern. 
Expected 3 capture groups.\".to_string()),\r\n            }\r\n        })\r\n        .collect()\r\n}\r\n\r\nfn main() {\r\n    \/\/ Sample log entries\r\n    let log_entries = vec![\r\n        \"[2023-10-31 15:23:45] [ERROR] *370 connect() failed (111: Unknown error) while connecting to upstream, client: 135.125.246.189, server: _, request: \\\"GET \/.env HTTP\/1.1\\\", upstream: \\\"http:\/\/127.0.0.1:5000\/.env\\\", host: \\\"18.118.196.200\\\"\",\r\n        \"[2023-10-31 15:30:00] [INFO] Application started successfully\",\r\n        \"[2023-10-31 15:40:22] [WARNING] Unrecognized log entry format\",\r\n        \"[2023-01-01 00:00:00] [info] \",\r\n        \" [error] some text\",\r\n        \" hi\"\r\n        \/\/ Add more log entries for testing\r\n    ];\r\n\r\n    let results = capture_info_from_log_entry(log_entries);\r\n\r\n    for (index, result) in results.iter().enumerate() {\r\n        match result {\r\n            Ok((timestamp, log_level, message)) => {\r\n                println!(\"Log Entry {}: \", index + 1);\r\n                println!(\"Timestamp: {}\", timestamp);\r\n                println!(\"Log Level: {}\", log_level);\r\n                println!(\"Message: {}\", message);\r\n                println!();\r\n            },\r\n            Err(err) => {\r\n                println!(\"Error in Log Entry {}: {}\", index + 1, err);\r\n                println!();\r\n            }\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<h6>Output of the program:<\/h6>\n<p><img decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2024\/03\/capture_len_regex.jpg\" alt=\"capture_len_iter\" \/><\/p>\n<hr>\n<p>In this post we covered a few more Regex Crate methods, namely is_match(), captures(), captures_iter(), and captures_len(), for pattern matching and extraction, with examples. 
These methods enable efficient text processing, allowing validation, extraction, and manipulation of data from strings.<\/p>\n<hr>\n<h4>Hire technical testers from Qxf2<\/h4>\n<p>Qxf2 is the home for technical testers. We employ experienced testers with a technical bent of mind. Our testers are naturally inclined towards learning new things and take the time to share their learnings on this blog. Additionally, as a company we invest in the practical development of all our employees. For example, since 2023, we have tried to get everyone to learn and use Rust. This post is an outcome of a couple of our testers having used Rust&#8217;s regex crate in their daily activities. If you want to work with technical test engineers in your project, <a href=\"https:\/\/qxf2.com\/contact?utm_source=rust_regex_crate_2&#038;utm_medium=click&#038;utm_campaign=From%20blog\">please get in touch with us<\/a>.<\/p>\n<hr>\n","protected":false},"excerpt":{"rendered":"<p>Qxf2 is exploring commonly used Rust crates and sharing our learning. We are focused on crates that testers will end up using quite often. We have tried our best to include illustrative examples in order to help testers understand how to use these crates. This blog is continuation to our previous blog on Regex Crate Modules. 
In this blog we [&hellip;]<\/p>\n","protected":false},"author":12,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[306],"tags":[],"class_list":["post-21264","post","type-post","status-publish","format-standard","hentry","category-rust"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/21264","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=21264"}],"version-history":[{"count":45,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/21264\/revisions"}],"predecessor-version":[{"id":21604,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/21264\/revisions\/21604"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=21264"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=21264"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=21264"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}