Great Expectations – Data Quality Assurance

Bad data produces bad decisions. Analytics and data engineers need reliable, repeatable checks—catching unexpected nulls, out-of-range values, and violations of business rules—before data reaches reports or models.

Great Expectations (GE) turns data quality into readable, testable assertions called “expectations.” You can author expectations by hand, generate them from samples, and run validations in notebooks, CI, scheduled pipelines, or batch jobs.
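
To make the idea of a declarative expectation concrete, here is a minimal sketch using only the Python standard library. It is not how Great Expectations is implemented. `CHECKS`, `check_expectation`, and the sample rows are hypothetical names chosen for illustration, and the `expectation_type`/`kwargs` layout only loosely mirrors the shape GE has used in serialized (pre-1.0) expectation suites.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def values_not_null(values: List[Any]) -> List[Any]:
    """Return the values that violate the rule (here: the nulls)."""
    return [v for v in values if v is None]


def values_between(values: List[Any], min_value: float, max_value: float) -> List[Any]:
    """Return non-null values outside [min_value, max_value].

    Nulls are skipped, which is an assumption of this sketch.
    """
    return [v for v in values if v is not None and not (min_value <= v <= max_value)]


# Hypothetical registry mapping an expectation name to its check function.
CHECKS: Dict[str, Callable[..., List[Any]]] = {
    "expect_column_values_to_not_be_null": values_not_null,
    "expect_column_values_to_be_between": values_between,
}


def check_expectation(rows: List[Row], expectation: Dict[str, Any]) -> Dict[str, Any]:
    """Evaluate one declarative expectation against a list of row dicts."""
    kwargs = dict(expectation["kwargs"])
    column = kwargs.pop("column")
    values = [row.get(column) for row in rows]
    unexpected = CHECKS[expectation["expectation_type"]](values, **kwargs)
    return {
        "expectation_type": expectation["expectation_type"],
        "column": column,
        "success": len(unexpected) == 0,
        "unexpected_count": len(unexpected),
    }


rows = [
    {"age": 25, "score": 80},
    {"age": None, "score": 95},
    {"age": 40, "score": 102},
]

# The "suite" is plain data: a rule name plus its parameters.
suite = [
    {"expectation_type": "expect_column_values_to_not_be_null",
     "kwargs": {"column": "age"}},
    {"expectation_type": "expect_column_values_to_be_between",
     "kwargs": {"column": "score", "min_value": 0, "max_value": 100}},
]

results = [check_expectation(rows, e) for e in suite]
for r in results:
    print(r)
```

Because the rules are data rather than ad hoc `if` statements, the same suite can be stored, reviewed, and re-run against new batches, which is the property GE builds on.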

Key Features

Write human-readable expectations for data validation
Generate expectations automatically from existing datasets
Easily integrate with tools like Airflow and dbt
Build custom validation rules for specific domains

Quick examples (Python)

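Install: pip install great_expectations

Note: the examples below use the legacy (pre-1.0) API. `ge.from_pandas` and `DataContext` are not available in current releases, so to run them as written you will likely need to pin an older 0.x version (0.15.x is assumed here).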

Fast checks with pandas (good for notebooks and quick validation)

# Example: validate a pandas DataFrame with ge.from_pandas
import pandas as pd
import great_expectations as ge

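# Sample data with one null age and one out-of-range score
df = pd.DataFrame({"age": [25, None, 40], "score": [80, 95, 102]})
gdf = ge.from_pandas(df)

# Each call validates immediately and records the expectation on the dataset
gdf.expect_column_values_to_not_be_null("age")
gdf.expect_column_values_to_be_between("score", min_value=0, max_value=100)

# Re-run every recorded expectation and summarize the outcome
results = gdf.validate()
stats = results["statistics"]
print(stats["successful_expectations"], stats["success_percent"])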

Production-style validation with DataContext and Validator

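# Example: use a DataContext + Validator
# (assumes `great_expectations init` has been run or a GE project already exists)
import pandas as pd
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import DataContext

context = DataContext()  # reads the local Great Expectations project config
df = pd.DataFrame({"id": [1, 2, 3], "value": [10, None, 30]})

# Assumption: the project defines a pandas datasource with a runtime data connector.
# The three names below are placeholders; use the names from your great_expectations.yml.
# "default_identifier" must match the connector's configured batch_identifiers.
batch_request = RuntimeBatchRequest(
    datasource_name="my_pandas_datasource",
    data_connector_name="my_runtime_connector",
    data_asset_name="in_memory_df",
    runtime_parameters={"batch_data": df},
    batch_identifiers={"default_identifier": "batch_1"},
)

# The suite must exist before get_validator can load it.
# Note: overwrite_existing=True replaces any existing suite named "default".
context.create_expectation_suite("default", overwrite_existing=True)
validator = context.get_validator(batch_request=batch_request, expectation_suite_name="default")
validator.expect_column_values_to_not_be_null("id")
validator.expect_column_values_to_be_between("value", min_value=0, max_value=100)  # nulls are skipped
validator.save_expectation_suite(discard_failed_expectations=False)

result = validator.validate()
print("Validation success:", result["success"])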

Great Expectations can also generate expectations automatically with profilers and run suites through checkpoints in scheduled pipelines or CI/CD. See the Great Expectations docs on profilers, expectation suites, and checkpoints for details.
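
As a rough illustration of gating a CI job on validation, the sketch below turns a validation result into a process exit code. It uses only the standard library and works on a plain dict. The dict shape (`success`, `results`, `expectation_config`) is an assumption based on what a pre-1.0 validation result looks like after `result.to_json_dict()`, and `failed_expectations`, `exit_code`, and `sample_result` are hypothetical names, not part of GE.

```python
from typing import Any, Dict, List


def failed_expectations(result: Dict[str, Any]) -> List[str]:
    """List a short description of every expectation that did not pass."""
    failed = []
    for item in result.get("results", []):
        if not item.get("success", False):
            config = item.get("expectation_config", {})
            etype = config.get("expectation_type", "unknown")
            column = config.get("kwargs", {}).get("column")
            failed.append(f"{etype} (column={column!r})")
    return failed


def exit_code(result: Dict[str, Any]) -> int:
    """Return 0 when the whole validation succeeded, 1 otherwise."""
    return 0 if result.get("success") else 1


# Hand-written sample standing in for a real, serialized validation result.
sample_result = {
    "success": False,
    "results": [
        {
            "success": True,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_not_be_null",
                "kwargs": {"column": "id"},
            },
        },
        {
            "success": False,
            "expectation_config": {
                "expectation_type": "expect_column_values_to_be_between",
                "kwargs": {"column": "score", "min_value": 0, "max_value": 100},
            },
        },
    ],
}

for description in failed_expectations(sample_result):
    print("FAILED:", description)
print("exit code:", exit_code(sample_result))
```

In a CI script you would pass the returned code to `sys.exit(...)` so that a failed validation fails the job. Checkpoints offer a built-in way to run suites and trigger actions, so treat this as a minimal stand-in rather than a recommended setup.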