Could not find local file-backed GX project

I am using the latest version of GX. I'm unable to generate the report; the generated file is empty.

INFO:great_expectations.data_context.data_context.context_factory:Could not find local file-backed GX project
INFO:great_expectations.data_context.types.base:Created temporary directory 'C:\Users*****\AppData\Local\Temp\tmpv95zw91s' for ephemeral docs site

Hello @pythons! Without any environment information, code, or virtually anything else provided, I am not sure how I can help.

This is my code. May I know what other details are needed? Thank you.

import great_expectations as gx
import pandas as pd
from scripts.base import Base

# Initialize Base instance
base = Base()

# Define index value
index_value = 0

source = base.query_tables('Raw_Data', 'dev', 'DB')
destination = base.query_tables('master', 'dev', 'DB')

# Extract relevant data from source
taxsource = source['TaxIdentificationNumber'].iloc[index_value]
firstgroup = str(source['GroupName'].iloc[index_value]).upper()
firstlast = str(source['ProviderLastName'].iloc[index_value]).upper()
firstaddress = str(source['ProviderAddress1'].iloc[index_value]).upper()[:4]

# Debug output
print(firstaddress)
print(type(firstaddress))
print(taxsource)

# Extract relevant data from destination
tax = destination['TaxID']
totid = destination[
    (tax == taxsource) &
    (destination['Hospital_N'] == firstgroup) &
    (destination['Last'] == firstlast) &
    (destination['Address'].str.startswith(firstaddress))
]
firstID = totid['Tothpdpyid'].iloc[index_value]
firsttax = totid['TaxID'].iloc[index_value]
print(firsttax)

# Create a DataFrame to merge
merge_data = pd.DataFrame({'TaxIdentificationNumber': [taxsource], 'TaxID': [firsttax]})
print(merge_data)

# Set up Great Expectations context
context = gx.get_context()
print(type(context).__name__)

data_source = context.data_sources.add_pandas('totid')
data_asset = data_source.add_dataframe_asset(name="pd dataframe asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("batch definition")
batch = batch_definition.get_batch(batch_parameters={"dataframe": merge_data})

# Define and validate expectations
great_expectation = gx.expectations.ExpectColumnPairValuesToBeEqual(
    column_A="TaxIdentificationNumber",
    column_B="TaxID"
)
validation_results = batch.validate(great_expectation)
print(validation_results)

checkpoint_result = validation_results.run()
context.view_validation_result(checkpoint_result)
context.build_data_docs()

Are you successfully seeing the validation results? Data Docs requires a Checkpoint to be correctly set up, and I think that is why you aren't seeing anything in Data Docs. The quickstart you followed is more of a quick proof of concept; a Checkpoint is a more production-ready aspect of GX.

Here is how we define a Checkpoint: a Checkpoint is the primary means for validating data in a production deployment of GX. Checkpoints enable you to run a list of Validation Definitions with shared parameters. Checkpoints can be configured to run Actions, and can pass Validation Results to a list of predefined Actions for processing.

Adjusting your code to include a Checkpoint should result in Data Docs generating correctly.


Can you share a link I can follow for how to add the checkpoint if I'm using a DataFrame? I found the one below, but I believe it is for SQL?

create a checkpoint

run a checkpoint

You pass the dataframe as a batch parameter. Example:

validation_results = checkpoint.run(batch_parameters={"dataframe": pd.DataFrame({"a": [1, 2, 3]})})


I tried to follow it, but I'm stuck at the validation definition. I encountered an error while retrieving the batch definition.

import great_expectations as gx
import pandas as pd
from scripts.base import Base
from great_expectations import expectations as gxe

# Initialize Base instance
base = Base()

# Define index value
index_value = 0

# Query source and destination tables
source = base.query_tables('Raw_Data', '*', '*')
destination = base.query_tables('*', 'DEV-SQL02', '*')

# Extract relevant data from source
taxsource = source['TaxIdentificationNumber'].iloc[index_value]
firstgroup = str(source['GroupName'].iloc[index_value]).upper()
firstlast = str(source['ProviderLastName'].iloc[index_value]).upper()
firstaddress = str(source['ProviderAddress1'].iloc[index_value]).upper()[:4]

# Debug output
print(firstaddress)
print(type(firstaddress))
print(taxsource)

# Extract relevant data from destination
tax = destination['TaxID']
totid = destination[
    (tax == taxsource) &
    (destination['Hospital_N'] == firstgroup) &
    (destination['Last'] == firstlast) &
    (destination['Address'].str.startswith(firstaddress))
]
firstID = totid['Tothpdpyid'].iloc[index_value]
firsttax = totid['TaxID'].iloc[index_value]
print(firsttax)

# Create a DataFrame to merge
merge_data = pd.DataFrame({'TaxIdentificationNumber': [taxsource], 'TaxID': [firsttax]})
print(merge_data)

# Set up Great Expectations context
context = gx.get_context()
print(type(context).__name__)

data_source = context.data_sources.add_pandas('totid')
data_asset = data_source.add_dataframe_asset(name="pd dataframe asset")
batch_definition = data_asset.add_batch_definition_whole_dataframe("batch definition")
batch = batch_definition.get_batch(batch_parameters={"dataframe": merge_data})

# Create expectation suite
suite_name = "my_expectation_suite"
suite = gx.ExpectationSuite(name=suite_name)
suite = context.suites.add(suite)

existing_suite_name = (
    "my_expectation_suite"  # replace this with the name of your Expectation Suite
)
suite = context.suites.get(name=existing_suite_name)

# Define and validate expectations
great_expectation = gx.expectations.ExpectColumnPairValuesToBeEqual(
    column_A="TaxIdentificationNumber",
    column_B="TaxID"
)
suite.add_expectation(great_expectation)
great_expectation.save()
validation_results = batch.validate(great_expectation)
print(validation_results)

# Create validation definition

expectation_suite_name = "my_expectation_suite"
expectation_suite = context.suites.get(name=expectation_suite_name)

# Retrieve batch definition
data_source_name = "my_data_source"
data_asset_name = "my_data_asset"
batch_definition_name = "my_batch_definition"
batch_definition = (
    context.data_sources.get(data_source_name)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)

# Create validation definition
definition_name = "my_validation_definition"
validation_definition = gx.ValidationDefinition(
    data=batch_definition, suite=expectation_suite, name=definition_name
)

# Save the Validation Definition to your Data Context
validation_definition = context.validation_definitions.add(validation_definition)

validation_definition_name = "my_validation_definition"
validation_definition = context.validation_definitions.get(validation_definition_name)
validation_results = validation_definition.run()
print(validation_results)

# context.build_data_docs()


Do you know where the values below should come from?

data_source_name = "my_data_source"
data_asset_name = "my_data_asset"
batch_definition_name = "my_batch_definition"

When you are creating the validation definition, you are retrieving a data source named "my_data_source", but the data source you created earlier was named "totid".

Thank you for your response. I did replace my data_source_name with "totid", but now my issue is with the data_asset_name. In my code my data asset is 'data_asset', but that seems incorrect. I wonder what the right value should be?

Again, you added the asset:

data_asset = data_source.add_dataframe_asset(name="pd dataframe asset")

and then you are trying to fetch an asset with a different name. Above, where you have `batch_definition =`, you don't need to name the data source and asset again, as they already have names; you simply need to fetch them, as you are doing with:

batch_definition = (
    .....get())

So I need to update the data_asset part of my code? What am I going to get then? I just followed the instructions here: Try GX Core | Great Expectations

I was able to generate the report successfully, but I'm wondering why I still get this error.