
Data Quality Assurance


Great Expectations – Data Quality Assurance

Bad data produces bad decisions. Analytics and data engineers need reliable, repeatable checks—catching unexpected nulls, out-of-range values, and violations of business rules—before data reaches reports or models.

Great Expectations (GE) turns data quality into readable, testable assertions called “expectations.” You can author expectations by hand, generate them from samples, and run validations in notebooks, CI, scheduled pipelines, or batch jobs.
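To make the idea of an expectation concrete, here is a minimal sketch in plain Python. It is not how Great Expectations is implemented; the helper names `expect_not_null` and `expect_between` and the result keys are hypothetical, chosen only to show a named check that returns a structured pass/fail result instead of raising an error.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def expect_not_null(rows: List[Row], column: str) -> Dict[str, Any]:
    """Hypothetical helper: succeeds only if no row has None in `column`."""
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "expectation": f"{column} is not null",
        "success": not unexpected,
        "unexpected_rows": unexpected,
    }


def expect_between(
    rows: List[Row], column: str, min_value: float, max_value: float
) -> Dict[str, Any]:
    """Hypothetical helper: non-null values must lie in [min_value, max_value]."""
    unexpected = [
        i
        for i, row in enumerate(rows)
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]
    return {
        "expectation": f"{column} is between {min_value} and {max_value}",
        "success": not unexpected,
        "unexpected_rows": unexpected,
    }


# Illustrative data: one missing age and one score above 100.
rows = [
    {"age": 25, "score": 80},
    {"age": None, "score": 95},
    {"age": 40, "score": 102},
]

results = [
    expect_not_null(rows, "age"),
    expect_between(rows, "score", 0, 100),
]
for r in results:
    status = "PASS" if r["success"] else "FAIL"
    print(r["expectation"], "->", status, r["unexpected_rows"])
```

Returning a result rather than raising lets a run report every failing check at once, which is the behavior the GE examples in this post rely on.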

Key Features

  • Write human-readable expectations for data validation
  • Generate expectations automatically from existing datasets
  • Easily integrate with tools like Airflow and dbt
  • Build custom validation rules for specific domains
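Generating expectations from an existing dataset can be pictured with a toy profiler. The sketch below is an illustration in plain Python, not GE's profiler: `profile_numeric_column` and the dictionary layout it returns are hypothetical. It proposes a not-null rule when the sample has no missing values, and a range rule from the observed minimum and maximum.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def profile_numeric_column(rows: List[Row], column: str) -> List[Dict[str, Any]]:
    """Hypothetical profiler: propose expectations from the values seen in a sample."""
    values = [row[column] for row in rows if row.get(column) is not None]
    suite: List[Dict[str, Any]] = []
    # Only propose a not-null rule if the sample has rows and none are missing.
    if rows and len(values) == len(rows):
        suite.append({"type": "not_null", "column": column})
    # Propose a range rule from the observed extremes.
    if values:
        suite.append(
            {
                "type": "between",
                "column": column,
                "min_value": min(values),
                "max_value": max(values),
            }
        )
    return suite


# Illustrative sample: "id" is complete, "value" has one missing entry.
sample = [
    {"id": 1, "value": 10},
    {"id": 2, "value": None},
    {"id": 3, "value": 30},
]

for column in ("id", "value"):
    for expectation in profile_numeric_column(sample, column):
        print(expectation)
```

Rules produced this way only describe the sample they came from, so they are best treated as a draft to review and loosen or tighten by hand.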

Quick examples (Python)

Install: pip install great_expectations (the examples below use the legacy pre-1.0 API, so a 0.x release is assumed)

  1. Fast checks with pandas (good for notebooks and quick validation)
# Example: validate a pandas DataFrame with ge.from_pandas
import pandas as pd
import great_expectations as ge

# Sample data with one null age and one out-of-range score
df = pd.DataFrame({"age": [25, None, 40], "score": [80, 95, 102]})
gdf = ge.from_pandas(df)  # wrap the DataFrame so expect_* methods are available

gdf.expect_column_values_to_not_be_null("age")
gdf.expect_column_values_to_be_between("score", min_value=0, max_value=100)

# Run the recorded expectations and summarize the outcome
results = gdf.validate()
stats = results["statistics"]
print(stats["successful_expectations"], stats["success_percent"])
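  2. Production-style validation with DataContext and Validator
# Example: use a DataContext + Validator (assumes `great_expectations init` has been run or a GE project already exists)
import pandas as pd
from great_expectations.core.batch import RuntimeBatchRequest
from great_expectations.data_context import DataContext

context = DataContext()  # reads the local Great Expectations project config
df = pd.DataFrame({"id": [1, 2, 3], "value": [10, None, 30]})

# Assumption: the project config defines a pandas datasource with a runtime data
# connector that accepts "default_identifier". The three names below are placeholders.
batch_request = RuntimeBatchRequest(
    datasource_name="my_pandas_datasource",
    data_connector_name="default_runtime_data_connector_name",
    data_asset_name="example_asset",
    runtime_parameters={"batch_data": df},
    batch_identifiers={"default_identifier": "batch_1"},
)

# Assumption: an expectation suite named "default" already exists in the project.
validator = context.get_validator(batch_request=batch_request, expectation_suite_name="default")
validator.expect_column_values_to_not_be_null("id")
validator.expect_column_values_to_be_between("value", min_value=0, max_value=100)
# Keep expectations in the saved suite even if they failed on this batch
validator.save_expectation_suite(discard_failed_expectations=False)

result = validator.validate()
print("Validation success:", result["success"])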

Great Expectations can also generate expectations automatically and run suites as part of checkpoints or CI/CD. See the Great Expectations docs for profilers, expectation suites, and checkpoints.
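As a rough illustration of wiring validation into CI, the sketch below turns a validation result into a process exit code. It uses only the standard library. The function `exit_code_for`, its `min_success_percent` parameter, and the stand-in result dictionary are hypothetical; the dictionary shape is an assumption modeled on the `success` and `statistics` fields printed in the examples above.

```python
from typing import Any, Dict


def exit_code_for(result: Dict[str, Any], min_success_percent: float = 100.0) -> int:
    """Hypothetical CI gate: 0 if the validation is acceptable, 1 otherwise."""
    stats = result.get("statistics", {})
    percent = stats.get("success_percent")
    if percent is None:
        # No statistics available: fall back to the overall success flag.
        return 0 if result.get("success") else 1
    return 0 if percent >= min_success_percent else 1


# Stand-in for a validation result (assumed shape, not captured from a real run).
example_result = {
    "success": False,
    "statistics": {
        "evaluated_expectations": 2,
        "successful_expectations": 1,
        "success_percent": 50.0,
    },
}

code = exit_code_for(example_result)
print("exit code:", code)
# In a real CI script, finish with: raise SystemExit(code)
```

A non-zero exit code is what most CI systems use to mark a step as failed, so a job would stop before bad data reaches reports or models.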


Also published on LinkedIn.
Author: Juan Pedro Bretti Mandarano