{"id":19096,"date":"2023-08-04T07:37:44","date_gmt":"2023-08-04T11:37:44","guid":{"rendered":"https:\/\/qxf2.com\/blog\/?p=19096"},"modified":"2023-08-04T07:37:44","modified_gmt":"2023-08-04T11:37:44","slug":"validating-data-easy-soda-core","status":"publish","type":"post","link":"https:\/\/qxf2.com\/blog\/validating-data-easy-soda-core\/","title":{"rendered":"Validating Data Made Easy: A Dive into Soda Core"},"content":{"rendered":"<p>This post is a hands-on example to help you start writing data quality checks with <a href=\"https:\/\/github.com\/sodadata\/soda-core\">Soda Core<\/a>. We realize a lot of our readers are fairly new to the topic of writing data quality checks. So, in this post, we will also go over some of the most common data checks you might want to implement on your structured data. We hope the ideas and the practical guide in this post help you get started with writing data quality checks at your company.<\/p>\n<p><a href=\"https:\/\/qxf2.com\/?utm_source=sodaIntro&#038;utm_medium=click&#038;utm_campaign=From%20blog\">Qxf2<\/a> has noticed that writing data validation checks is becoming important skill for testers. Our experience with <a href=\"https:\/\/greatexpectations.io\/\">Great Expectations<\/a> has been phenomenal, and we&#8217;ve shared our insights through <a href=\"https:\/\/qxf2.com\/blog\/data-validation-great-expectations-real-example\/\">blog posts<\/a>. In order to expand our testing capabilities, we ventured into using Soda Core (previously known as Soda SQL) with another client in mid-2022. We successfully utilized Soda Core to validate data in a RedShift Database, which served as a crucial entry point into a complex data pipeline. <\/p>\n<h3>About Soda Core<\/h3>\n<p><a href=\"https:\/\/github.com\/sodadata\/soda-core\">Soda Core<\/a> is an open-source SQL-based data auditing tool that helps users to analyze and audit their data. It is designed to be used with various databases, data warehouses, and data lakes, such as PostgreSQL, MySQL, Redshift, BigQuery, and Snowflake.<\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/08\/soda.jpg\" data-rel=\"lightbox-image-0\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/08\/soda.jpg\" alt=\"this image is to describe the data flow of the Soda Core\" width=\"941\" height=\"391\" class=\"alignnone size-full wp-image-19279\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/08\/soda.jpg 941w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/08\/soda-300x125.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/08\/soda-768x319.jpg 768w\" sizes=\"auto, (max-width: 941px) 100vw, 941px\" \/><\/a><\/p>\n<p>Some of the features of Soda Core include:<br \/>\na) Custom tests: You can write custom tests in SQL or Python to validate the quality of our data.<\/p>\n<p>b) Data profiling: Soda can generate data profiles for our tables, columns, and schemas to give you an overview of the data.<\/p>\n<p>c) Automated monitoring: You can set up soda to run tests automatically on a schedule and get alerts when a test fails.<\/p>\n<p>d) Data lineage tracking: Soda can track changes to our data and help you understand the impact of those changes on downstream processes.<\/p>\n<h3>Setup to play along with this post<\/h3>\n<p>We created some synthetic data for you to play long. You can find the schema and the data in the form of a SQL dump file <a href=\"https:\/\/gist.github.com\/qxf2\/f5fb085275508f6e84e8ba24d26aeffa\">here<\/a>. Make sure that you import our data into your MySQL instance. The provided schema and data can be used to carry out practical exercises and gain experience. This will improve your understanding and abilities by enabling you to apply the ideas you have learned in a practical setting.<\/p>\n<p>We are using MySQL as the data source in this post. However, if you prefer a different data source, modify this section to suit your needs. You can find easily configuration instructions in the <a href=\"https:\/\/github.com\/sodadata\/soda-core\/blob\/main\/docs\/installation.md\">library documentation<\/a>. Additionally, make sure to set up the necessary credentials and connection parameters for your chosen data source. <\/p>\n<p>Run the following command to get the Python module for soda-core that connects to MySQL. <\/p>\n<pre lang='python'> pip install soda-core-mysql <\/pre>\n<p>You are now set to play along with this post.<\/p>\n<h3>Getting connected to a data source<\/h3>\n<p>After installing Soda Core, you must make a <strong>configuration.yml <\/strong>file with the necessary information so that Soda Core can connect to your data source (except Apache Spark DataFrames, which does not use a configuration YAML file).<\/p>\n<p>Set the value of a property that the configuration YAML file uses in a system variable that you can access from your command-line interface.<\/p>\n<pre lang='python'>\r\nexport MYSQL_USERNAME=<replace your mysql_username>\r\nexport MYSQL_PASSWORD=<replace your mysql_password>\r\n<\/pre>\n<p>By using the echo command, you can verify that the system can retrieve the value you entered.<\/p>\n<pre lang='python'>\r\necho $MYSQL_USERNAME\r\necho $MYSQL_PASSWORD\r\n<\/pre>\n<p>Set environment variable value in YAML configuration.<\/p>\n<pre lang='python'> data_source movie_rentals:\r\n  type: mysql\r\n  host: 127.0.0.1\r\n  username: ${MYSQL_USERNAME}\r\n  password: ${MYSQL_PASSWORD}\r\n  port: 3306\r\n  database: movie_rentals <\/pre>\n<p>After saving the configuration YAML file, run a scan to verify that Soda Core successfully connects to your data source.<\/p>\n<pre lang='python'>\r\nsoda scan -d movie_rentals -c configuration.yml\r\n<\/pre>\n<h3>Getting started with basic validations <\/h3>\n<p>Soda performs tests called &#8220;checks&#8221; when it scans a dataset in your data source. These checks are written using the Soda Checks Language (SodaCL) and are stored in the checks.yml file. <\/p>\n<p>As you work through this section, please pay attention to the kind of data quality checks we are writing. We took some pains to make sure that we show cover of the most common kind of checks. For example, almost everyone will have at least some checks for schema, duplicate data, invalid data, null data in certain columns, etc. You can consider using similar checks on your data sources. <\/p>\n<h4> 1. Schema Validations <\/h4>\n<p><strong>a). Validate for missing column<\/strong><\/p>\n<pre lang='python'>\r\n# Checks for schema validations\r\nchecks for dim_customer:\r\n  - schema:\r\n      fail:\r\n        when required column missing:\r\n          - first_name\r\n          - last_name\r\n          - email_id  \r\n<\/pre>\n<p>Now execute the schema validation check that was prepared. <\/p>\n<pre lang='python'>\r\nsoda scan -d movie_rentals -c configuration.yml check_schema_validations.yml  \r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_validation_missing_column-1.jpg\" data-rel=\"lightbox-image-1\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_validation_missing_column-1.jpg\" alt=\"this image is to describe the schema validation for missing column\" width=\"1787\" height=\"159\" class=\"alignnone size-full wp-image-19160\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_validation_missing_column-1.jpg 1787w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_validation_missing_column-1-300x27.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_validation_missing_column-1-1024x91.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_validation_missing_column-1-768x68.jpg 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_validation_missing_column-1-1536x137.jpg 1536w\" sizes=\"auto, (max-width: 1787px) 100vw, 1787px\" \/><\/a><\/p>\n<p>Let us consider dim_customer table schema lacks an address column. We&#8217;ll look at how to validate this. We&#8217;ve included the column in the code. <\/p>\n<pre lang='python'>\r\n# Checks for schema validations\r\nchecks for dim_customer:\r\n  - schema:\r\n      fail:\r\n        when required column missing:\r\n          - first_name\r\n          - last_name\r\n          - email_id  \r\n          - address\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_missing_column_failed.jpg\" data-rel=\"lightbox-image-2\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_missing_column_failed.jpg\" alt=\"this image describe soda run if any missing column with failed message\" width=\"1901\" height=\"191\" class=\"alignnone size-full wp-image-19159\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_missing_column_failed.jpg 1901w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_missing_column_failed-300x30.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_missing_column_failed-1024x103.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_missing_column_failed-768x77.jpg 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_missing_column_failed-1536x154.jpg 1536w\" sizes=\"auto, (max-width: 1901px) 100vw, 1901px\" \/><\/a><\/p>\n<p><strong>b) Validate Forbidden column <\/strong><br \/>\nWe can now verify if any restricted column exists in the schema. Instead of marking a test case as failed, we might issue a warning that the schema contains a banned column.<\/p>\n<pre lang='python'>\r\n# Checks for schema validations\r\nchecks for dim_customer:\r\n  - schema:\r\n      warn:\r\n        when forbidden column present: [credit_card]\r\n      fail:\r\n        when required column missing:\r\n          - first_name\r\n          - last_name\r\n          - email_id  \r\n<\/pre>\n<p>The test will succeed since the schema no longer has a column with that name. Therefore, we will change the schema to ensure that the test will issue a caution rather than a failure message. We will add the column &#8220;credit_card&#8221; to the schema . <\/p>\n<pre lang='sql'>\r\nALTER TABLE movie_rentals.dim_customer\r\nADD COLUMN credit_card VARCHAR(200);\r\n<\/pre>\n<p>Run the test now, and see what happens.<br \/>\n<a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_forbidden_warning.jpg\" data-rel=\"lightbox-image-3\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_forbidden_warning.jpg\" alt=\"This image shows the validation run after the forbidden column is present\" width=\"1908\" height=\"189\" class=\"alignnone size-full wp-image-19151\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_forbidden_warning.jpg 1908w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_forbidden_warning-300x30.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_forbidden_warning-1024x101.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_forbidden_warning-768x76.jpg 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_forbidden_warning-1536x152.jpg 1536w\" sizes=\"auto, (max-width: 1908px) 100vw, 1908px\" \/><\/a><\/p>\n<p>Revert the change , If you prefer not to receive this warning again.<\/p>\n<pre lang='sql'>\r\nALTER TABLE movie_rentals.dim_customer\r\nDROP COLUMN credit_card;\r\n<\/pre>\n<p><strong>c) Validate column types <\/strong><br \/>\nNext, we&#8217;ll proceed to validate the column type within the schema.<\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/describe_schema_dim_customer.jpg\" data-rel=\"lightbox-image-4\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/describe_schema_dim_customer.jpg\" alt=\"\" width=\"629\" height=\"340\" class=\"alignnone size-full wp-image-19154\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/describe_schema_dim_customer.jpg 629w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/describe_schema_dim_customer-300x162.jpg 300w\" sizes=\"auto, (max-width: 629px) 100vw, 629px\" \/><\/a><\/p>\n<p>Add to checks yml for the column type validation . <\/p>\n<pre lang='python'>\r\n# Checks for schema validations\r\nchecks for dim_customer:\r\n  - schema:\r\n      warn:\r\n        when forbidden column present: [credit_card]\r\n\r\n      fail:\r\n        when required column missing:\r\n          - first_name\r\n          - last_name\r\n          - email_id                                      \r\n        when wrong column type:\r\n          active: boolean\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_column_type.jpg\" data-rel=\"lightbox-image-5\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_column_type.jpg\" alt=\"This image shows the validation for the column type\" width=\"1775\" height=\"170\" class=\"alignnone size-full wp-image-19156\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_column_type.jpg 1775w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_column_type-300x29.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_column_type-1024x98.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_column_type-768x74.jpg 768w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/schema_column_type-1536x147.jpg 1536w\" sizes=\"auto, (max-width: 1775px) 100vw, 1775px\" \/><\/a><\/p>\n<h4>2. Missing metrics <\/h4>\n<p>To identify missing values in your dataset, use a missing metric in a check.<br \/>\nLet&#8217;s create a file named &#8220;check_missing_count_dim_actor.yml&#8221; where we will perform validation to identify any occurrences of null values in both the &#8220;first_name&#8221; and &#8220;last_name&#8221; columns.<\/p>\n<pre lang='python'>\r\n# Checks for missing metrics \r\nchecks for dim_actor:\r\n   - missing_count(first_name) = 0\r\n   - missing_count(last_name) = 0\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/validate_missing_count.jpg\" data-rel=\"lightbox-image-6\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/validate_missing_count.jpg\" alt=\"This image shows the result set of soda after running the missing metric check for null values \" width=\"1348\" \nheight=\"226\" class=\"alignnone size-full wp-image-19183\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/validate_missing_count.jpg 1348w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/validate_missing_count-300x50.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/validate_missing_count-1024x172.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/validate_missing_count-768x129.jpg 768w\" sizes=\"(max-width: 1348px) 100vw, 1348px\" \/><\/a><\/p>\n<p>Based on the result set, it appears that there is a row that failed in each category. Let&#8217;s proceed to perform the query and confirm.<\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/db_query_missing_values.jpg\" data-rel=\"lightbox-image-7\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/db_query_missing_values.jpg\" alt=\"This image shows the result set if we run using the sql query\" width=\"737\" height=\"274\" class=\"alignnone size-full wp-image-19187\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/db_query_missing_values.jpg 737w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/db_query_missing_values-300x112.jpg 300w\" sizes=\"auto, (max-width: 737px) 100vw, 737px\" \/><\/a><\/p>\n<h4>3. Missing percent<\/h4>\n<p>&#8220;missing percent&#8221; refers to the percentage of missing or null values in a specific column of a dataset. We are validating against the address column here . A high missing percent in a column may indicate data quality issues, which could impact data analysis and decision-making.<\/p>\n<pre lang='python'>\r\n# Checks for missing_percent validations\r\nchecks for dim_address:\r\n  - missing_percent(address) > 5%\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent.jpg\" data-rel=\"lightbox-image-8\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent.jpg\" alt=\"This iamge shows the result set with the missing percent of address column\" width=\"1276\" height=\"202\" class=\"alignnone size-full wp-image-19207\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent.jpg 1276w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent-300x47.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent-1024x162.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent-768x122.jpg 768w\" sizes=\"auto, (max-width: 1276px) 100vw, 1276px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent_dim_address.jpg\" data-rel=\"lightbox-image-9\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent_dim_address.jpg\" alt=\"This image shows the sqlquery result set of the missing percent in address column\" width=\"676\" height=\"165\" class=\"alignnone size-full wp-image-19208\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent_dim_address.jpg 676w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/missing_percent_dim_address-300x73.jpg 300w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/a><\/p>\n<h4>4. Data Freshness <\/h4>\n<p>By performing this check, you may determine how recent the data in a dataset is by looking at the age of the row that was most recently added to a table. &#8220;freshness&#8221; will check with the current date and time . &#8220;Row count&#8221; will check the number of rows in the table. We have around 2000 rows in the table. Create a file check_data_freshness_fact_rental.yml<\/p>\n<pre lang='python'>\r\n# Checks for data freshness \r\nchecks for fact_rental:\r\n- row_count > 1000\r\n- freshness(last_update) < 5d\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/data_freshness.jpg\" data-rel=\"lightbox-image-10\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/data_freshness.jpg\" alt=\"This image shows the resultset with check of datafreshness \" width=\"1436\" height=\"163\" class=\"alignnone size-full wp-image-19222\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/data_freshness.jpg 1436w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/data_freshness-300x34.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/data_freshness-1024x116.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/data_freshness-768x87.jpg 768w\" sizes=\"auto, (max-width: 1436px) 100vw, 1436px\" \/><\/a><\/p>\n<h4>5. Duplicate count<\/h4>\n<p>To check the duplicate count for the film_id column, add the metric in the check_missing_count_dim_actor.yml file, which was created step 2.<\/p>\n<pre lang='python'>\r\n# Checks for missing metrics and duplicates\r\nchecks for dim_actor:\r\n   - missing_count(first_name) = 0\r\n   - missing_count(last_name) = 0:\r\n      missing values: [NA, n\/a, '0']\r\n   \r\n    - duplicate_count(film_id):\r\n      warn: \r\n        when between 1 and 99\r\n      fail: when > 100\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/duplicate_warning.jpg\" data-rel=\"lightbox-image-11\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/duplicate_warning.jpg\" alt=\"This image shows the resultset with the duplicate count while running the resultset\" width=\"1469\" height=\"323\" class=\"alignnone size-full wp-image-19225\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/duplicate_warning.jpg 1469w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/duplicate_warning-300x66.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/duplicate_warning-1024x225.jpg 1024w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/duplicate_warning-768x169.jpg 768w\" sizes=\"auto, (max-width: 1469px) 100vw, 1469px\" \/><\/a><\/p>\n<h4>6. Invalid counts <\/h4>\n<p>You will validate metrics that specify which data values or formats are acceptable or unacceptable in accordance with your own business requirements.<\/p>\n<pre lang='python'>\r\n# Checks for Invalid counts\r\nchecks for dim_customer:\r\n  - invalid_count(active) = 0:\r\n      valid values: ['0', '1']\r\n<\/pre>\n<p><a href=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/invalid_column_value.jpg\" data-rel=\"lightbox-image-12\" data-rl_title=\"\" data-rl_caption=\"\" title=\"\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/invalid_column_value.jpg\" alt=\"This image shows the result set for the invalid values in the resultset \" width=\"999\" height=\"337\" class=\"alignnone size-full wp-image-19226\" srcset=\"https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/invalid_column_value.jpg 999w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/invalid_column_value-300x101.jpg 300w, https:\/\/qxf2.com\/blog\/wp-content\/uploads\/2023\/07\/invalid_column_value-768x259.jpg 768w\" sizes=\"auto, (max-width: 999px) 100vw, 999px\" \/><\/a><\/p>\n<p>Soda Core can work together with data orchestration tools like <a href=\"https:\/\/airflow.apache.org\/\">Airflow<\/a> to set up and validate data checks on a schedule. By connecting to <a href=\"https:\/\/docs.soda.io\/soda-cloud\/overview.html\">Soda Cloud,<\/a> which has various additional features, we can enhance its functionality. <a href=\"https:\/\/docs.soda.io\/soda-cl\/soda-cl-overview.html\">SodaCL<\/a> also provides numerous ready-to-use checks that are easy to understand and allows us to create our own metrics and modify checks using SQL queries.<\/p>\n<hr>\n<h3>Hire technical testers from Qxf2<\/h3>\n<p><a href=\"https:\/\/qxf2.com\/contact?utm_source=sodaIntro&#038;utm_medium=click&#038;utm_campaign=From%20blog\">Hire Qxf2<\/a> for top-notch technical testers who excel in diverse testing challenges. We bring a wealth of experience to help you solve critical testing problems and provide expertise in testing modern technical stacks.<\/p>\n<hr>\n","protected":false},"excerpt":{"rendered":"<p>This post is a hands-on example to help you start writing data quality checks with Soda Core. We realize a lot of our readers are fairly new to the topic of writing data quality checks. So, in this post, we will also go over some of the most common data checks you might want to implement on your structured data. [&hellip;]<\/p>\n","protected":false},"author":32,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[314,186,374],"tags":[],"class_list":["post-19096","post","type-post","status-publish","format-standard","hentry","category-data-testing","category-database","category-soda-core"],"_links":{"self":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/19096","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/users\/32"}],"replies":[{"embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/comments?post=19096"}],"version-history":[{"count":52,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/19096\/revisions"}],"predecessor-version":[{"id":19402,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/posts\/19096\/revisions\/19402"}],"wp:attachment":[{"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/media?parent=19096"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/categories?post=19096"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/qxf2.com\/blog\/wp-json\/wp\/v2\/tags?post=19096"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}