'EphemeralDataContext' object has no attribute 'sources'

resolver101757 · December 29, 2024, 9:45pm

In the documentation, it says: "If you have a configured Data Source, Data Asset, and Batch Definition you can test your Expectation before adding it to your Expectation Suite. To do this see Test an Expectation." However, when I create batch data from my pandas DataFrame, I get an AttributeError saying the 'EphemeralDataContext' object has no attribute 'sources'. What am I doing wrong?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[57], line 2
      1 # Create a datasource from the pandas DataFrame
----> 2 datasource = context.sources.pandas_default.read_dataframe(
      3     df_sample,
      4     name="ms_marco_datasource"
      5 )

AttributeError: 'EphemeralDataContext' object has no attribute 'sources'

Here's the code to reproduce the problem:


import pandas as pd

# Create reproducible sample data
sample_data = {
    'query': [
        'what is the capital of france?',
        'how tall is mount everest?', 
        'who wrote romeo and juliet?',
        'what year did world war 2 end?',
        'what is the speed of light?'
    ],
    'answers': [
        ['Paris'],
        ['29,029 feet (8,848 meters)'],
        ['William Shakespeare'],
        ['1945'],
        ['299,792,458 meters per second']
    ],
    'query_type': [
        'LOCATION',
        'NUMERIC', 
        'PERSON',
        'NUMERIC',
        'NUMERIC'
    ],
    'query_id': [1, 2, 3, 4, 5],
    'passages': [
        {'is_selected': [1], 'passage_text': ['Paris is the capital of France.'], 'url': ['example.com/1']},
        {'is_selected': [1], 'passage_text': ['Mount Everest is 29,029 feet tall.'], 'url': ['example.com/2']},
        {'is_selected': [1], 'passage_text': ['Romeo and Juliet was written by William Shakespeare.'], 'url': ['example.com/3']},
        {'is_selected': [1], 'passage_text': ['World War 2 ended in 1945.'], 'url': ['example.com/4']},
        {'is_selected': [1], 'passage_text': ['The speed of light is 299,792,458 meters per second.'], 'url': ['example.com/5']}
    ],
    'wellFormedAnswers': [
        ['Paris is the capital of France.'],
        ['Mount Everest is 29,029 feet (8,848 meters) tall.'],
        ['William Shakespeare wrote Romeo and Juliet.'],
        ['World War 2 ended in 1945.'],
        ['The speed of light is 299,792,458 meters per second.']
    ]
}

df_sample = pd.DataFrame(sample_data)
print("\nSample dataset:")
df_sample


import great_expectations as gx

context = gx.get_context()
# Create a datasource from the pandas DataFrame
datasource = context.sources.pandas_default.read_dataframe(
    df_sample,
    name="ms_marco_datasource"
)
data_asset_name = "my_data_asset"
batch_definition_name = "my_batch_definition"
batch_definition = (
    context.data_sources.get(df_pd_ms_macro)
    .get_asset(data_asset_name)
    .get_batch_definition(batch_definition_name)
)

Without batch data everything runs fine, but from what I can see I need batch data to validate. Can someone help me figure out what the solution is?

joshstauffer · March 12, 2025, 2:08pm

Hey @resolver101757, there are a couple of issues with your code.

As the error says, there isn't a sources attribute on the DataContext; it should be data_sources.
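If you're not sure which API your installed version exposes (for example, because you're following a tutorial written for an older release), a quick check can help. This is a minimal sketch that assumes GX Core 1.x exposes the manager as data_sources while older 0.x releases with Fluent Data Sources used sources; the helper name report_data_source_attribute is hypothetical and not part of GX.

```python
from types import SimpleNamespace


def report_data_source_attribute(context) -> str:
    """Return which Data Source manager attribute a Data Context exposes.

    Hypothetical helper. Assumes GX Core 1.x uses `data_sources`, while
    older 0.x releases with Fluent Data Sources used `sources`.
    """
    if hasattr(context, "data_sources"):
        return "data_sources"
    if hasattr(context, "sources"):
        return "sources"
    raise AttributeError(
        f"{type(context).__name__} has neither 'data_sources' nor 'sources'"
    )


# Stand-in objects so the helper can be exercised without GX installed.
new_style = SimpleNamespace(data_sources=object())
old_style = SimpleNamespace(sources=object())
print(report_data_source_attribute(new_style))  # data_sources
print(report_data_source_attribute(old_style))  # sources
```

With GX installed, report_data_source_attribute(gx.get_context()) should return "data_sources" on a 1.x release, which matches the AttributeError above; printing gx.__version__ alongside it tells you which version of the docs to follow.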

Also, the pandas_default.read_dataframe() method returns a Batch, not a data source, and it accepts an asset_name parameter, not a name parameter. You can test Expectations directly against this Batch by calling batch.validate(expect). Here's an updated version of your code:

sample_data = {
    'query': [
        'what is the capital of france?',
        'how tall is mount everest?',
        'who wrote romeo and juliet?',
        'what year did world war 2 end?',
        'what is the speed of light?'
    ],
    'answers': [
        ['Paris'],
        ['29,029 feet (8,848 meters)'],
        ['William Shakespeare'],
        ['1945'],
        ['299,792,458 meters per second']
    ],
    'query_type': [
        'LOCATION',
        'NUMERIC',
        'PERSON',
        'NUMERIC',
        'NUMERIC'
    ],
    'query_id': [1, 2, 3, 4, 5],
    'passages': [
        {'is_selected': [1], 'passage_text': ['Paris is the capital of France.'], 'url': ['example.com/1']},
        {'is_selected': [1], 'passage_text': ['Mount Everest is 29,029 feet tall.'], 'url': ['example.com/2']},
        {'is_selected': [1], 'passage_text': ['Romeo and Juliet was written by William Shakespeare.'], 'url': ['example.com/3']},
        {'is_selected': [1], 'passage_text': ['World War 2 ended in 1945.'], 'url': ['example.com/4']},
        {'is_selected': [1], 'passage_text': ['The speed of light is 299,792,458 meters per second.'], 'url': ['example.com/5']}
    ],
    'wellFormedAnswers': [
        ['Paris is the capital of France.'],
        ['Mount Everest is 29,029 feet (8,848 meters) tall.'],
        ['William Shakespeare wrote Romeo and Juliet.'],
        ['World War 2 ended in 1945.'],
        ['The speed of light is 299,792,458 meters per second.']
    ]
}
import pandas as pd
df_sample = pd.DataFrame(sample_data)


import great_expectations as gx
import great_expectations.expectations as gxe

context = gx.get_context()
# Create a batch from the pandas DataFrame
batch = context.data_sources.pandas_default.read_dataframe(df_sample)

expect = gxe.ExpectColumnToExist(column="query_id")
result = batch.validate(expect=expect)
print(result.describe())
