The documentation says: "If you have a configured Data Source, Data Asset, and Batch Definition you can test your Expectation before adding it to your Expectation Suite. To do this see Test an Expectation." However, when I try to create batch data from my pandas DataFrame, I get an AttributeError on the EphemeralDataContext. What am I doing wrong?
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[57], line 2
      1 # Create a datasource from the pandas DataFrame
----> 2 datasource = context.sources.pandas_default.read_dataframe(
      3     df_sample,
      4     name="ms_marco_datasource"
      5 )

AttributeError: 'EphemeralDataContext' object has no attribute 'sources'
Here's the code to reproduce the problem:
import pandas as pd

# Create reproducible sample data
sample_data = {
    'query': [
        'what is the capital of france?',
        'how tall is mount everest?',
        'who wrote romeo and juliet?',
        'what year did world war 2 end?',
        'what is the speed of light?'
    ],
    'answers': [
        ['Paris'],
        ['29,029 feet (8,848 meters)'],
        ['William Shakespeare'],
        ['1945'],
        ['299,792,458 meters per second']
    ],
    'query_type': [
        'LOCATION',
        'NUMERIC',
        'PERSON',
        'NUMERIC',
        'NUMERIC'
    ],
    'query_id': [1, 2, 3, 4, 5],
    'passages': [
        {'is_selected': [1], 'passage_text': ['Paris is the capital of France.'], 'url': ['example.com/1']},
        {'is_selected': [1], 'passage_text': ['Mount Everest is 29,029 feet tall.'], 'url': ['example.com/2']},
        {'is_selected': [1], 'passage_text': ['Romeo and Juliet was written by William Shakespeare.'], 'url': ['example.com/3']},
        {'is_selected': [1], 'passage_text': ['World War 2 ended in 1945.'], 'url': ['example.com/4']},
        {'is_selected': [1], 'passage_text': ['The speed of light is 299,792,458 meters per second.'], 'url': ['example.com/5']}
    ],
    'wellFormedAnswers': [
        ['Paris is the capital of France.'],
        ['Mount Everest is 29,029 feet (8,848 meters) tall.'],
        ['William Shakespeare wrote Romeo and Juliet.'],
        ['World War 2 ended in 1945.'],
        ['The speed of light is 299,792,458 meters per second.']
    ]
}
df_sample = pd.DataFrame(sample_data)
print("\nSample dataset:")
df_sample
import great_expectations as gx
context = gx.get_context()
# Create a datasource from the pandas DataFrame
datasource = context.sources.pandas_default.read_dataframe(
    df_sample,
    name="ms_marco_datasource"
)
data_asset_name = "my_data_asset"
batch_definition_name = "my_batch_definition"
batch_definition = (
    context.data_sources.get(df_pd_ms_macro)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)
Without batched data everything runs fine, but from what I can see I need batch data to validate. Can someone help me find the solution?
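One thing that might help narrow this down: the traceback complains about `context.sources`, while the batch definition lookup further down uses `context.data_sources`, so the two snippets may have been written against different releases of the library. Below is a minimal sketch for checking which of the two attribute names an object actually exposes. The helper name `fluent_entry_point` is hypothetical (it is not part of Great Expectations), and the objects used in the demo are plain stand-ins, not real contexts.

```python
from types import SimpleNamespace
from typing import Optional


def fluent_entry_point(context: object) -> Optional[str]:
    """Return the name of the data source accessor this object exposes, if any."""
    # Both names come from the snippets in this post:
    # `data_sources` is used in the batch definition lookup,
    # `sources` is used in the call that raises the AttributeError.
    for name in ("data_sources", "sources"):
        if hasattr(context, name):
            return name
    return None


# Stand-in objects (not real Great Expectations contexts) to show the behaviour.
with_data_sources = SimpleNamespace(data_sources=object())
with_sources = SimpleNamespace(sources=object())

print(fluent_entry_point(with_data_sources))  # data_sources
print(fluent_entry_point(with_sources))       # sources
print(fluent_entry_point(object()))           # None
```

In my notebook I would call `fluent_entry_point(gx.get_context())`. If that returns `data_sources`, the `context.sources...` call above is presumably written for a different release than the one I have installed.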