Columns are missing in BatchData with Trino datasource

Hi everyone,

I'm trying to run validations over a Trino connection. However, GX does not seem to recognize the table's columns, because column-level expectations return:
“exception_message”: “Error: The column “marketing_id” in BatchData does not exist.”
I can assure you the column exists: validator.head() shows the “marketing_id” column.


On the other hand, table-level expectations such as ExpectTableRowCountToBeBetween work fine.

This is my code:

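```python
from great_expectations.datasource.fluent import SQLDatasource
from great_expectations.datasource.fluent.sql_datasource import TableAsset
from trino.auth import BasicAuthentication

# GXContextSingleton, get_suite, user, password, endpoint, port, catalog and
# schema are defined elsewhere in my project. The two functions below are
# methods of my validation class (class definition omitted).


def get_asset(self, **kwargs):
    context = GXContextSingleton.get_context()  # ephemeral context
    connection_string = f"trino://{user}@{endpoint}:{port}/{catalog}/{schema}"
    try:
        datasource = context.data_sources.get(name="my_trino_datasource")
    except Exception:  # datasource not registered yet, so create it
        connect_args = {
            "http_scheme": "https",
            "verify": False,
            "auth": BasicAuthentication(user, password),
        }
        datasource: SQLDatasource = context.data_sources.add_sql(
            name="my_trino_datasource",
            connection_string=connection_string,
            kwargs={"connect_args": connect_args},  # passed to SQLAlchemy create_engine()
        )

    # Note: the asset name and the physical table name are the same string here.
    table_name = f"{schema}.{self.raw_table_name}"
    try:
        table_asset = datasource.get_asset(table_name)
    except Exception as e:  # asset not registered yet, so create it
        print(str(e))
        table_asset = datasource.add_table_asset(
            name=table_name,
            table_name=table_name,
        )

    return table_asset


def validate(self, table_asset: TableAsset, raw_table_name: str, stage: str):
    context = GXContextSingleton.get_context()
    suite = get_suite(table_name=raw_table_name, stage=stage)
    try:
        batch_request = table_asset.build_batch_request()
        validator = context.get_validator(batch_request=batch_request, expectation_suite=suite)
        print(validator.head())  # the marketing_id column shows up here
        results = validator.validate()
        return results
    except Exception as e:
        print(str(e))
```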

Here is the expectation builder:

```python
import great_expectations as gx

# GXContextSingleton and pk (the primary-key column name) are defined elsewhere.
context = GXContextSingleton.get_context()

try:
    suite = context.suites.get("suite")
    if suite:
        suite.expectations.clear()  # reuse the existing suite without its old expectations
except Exception:  # suite does not exist yet, so create it
    suite = context.suites.add(gx.ExpectationSuite(name="suite"))

suite.add_expectation(gx.expectations.ExpectTableRowCountToBeBetween(min_value=0, meta={"name": "row_count_expectation"}))
suite.add_expectation(
    gx.expectations.ExpectTableColumnsToMatchSet(
        column_set=["created_at", "updated_at", "is_deleted", "inserted_date"],
        exact_match=False,  # require these columns but allow others (e.g. marketing_id)
        meta={"name": "mandatory_column_expectation"},
    )
)
suite.add_expectation(gx.expectations.ExpectColumnValuesToBeUnique(column=pk, meta={"name": "pk_unique_expectation"}))
suite.add_expectation(gx.expectations.ExpectColumnValuesToNotBeNull(column=pk, meta={"name": "pk_not_null_expectation"}))
```
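ExpectTableColumnsToMatchSet compares the table's columns against column_set, and as far as I know its exact_match parameter defaults to True in GX 1.x. With that default, a list containing only the mandatory columns fails on any table that also has other columns. The plain-Python sketch below mimics the documented pass/fail rule; it does not call GX, and every observed column name except marketing_id is assumed for illustration.

```python
def columns_match_set(observed, column_set, exact_match=True) -> bool:
    """Mimic the pass/fail rule described for ExpectTableColumnsToMatchSet."""
    observed_set, expected_set = set(observed), set(column_set)
    if exact_match:
        return observed_set == expected_set  # no missing and no extra columns allowed
    return expected_set <= observed_set  # every expected column present; extras are fine


# Hypothetical table layout: marketing_id comes from the post, the rest are assumed.
observed = ["marketing_id", "created_at", "updated_at", "is_deleted", "inserted_date"]
mandatory = ["created_at", "updated_at", "is_deleted", "inserted_date"]

print(columns_match_set(observed, mandatory))  # False
print(columns_match_set(observed, mandatory, exact_match=False))  # True
```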

I also called validator.columns(), and it returned an empty list.
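A symptom like this, where queries succeed but column reflection comes back empty, can happen when a schema-qualified string is used as a bare table name in a metadata lookup. The thread does not confirm that this was the cause here. The sketch is only a stdlib analogue that uses SQLite instead of Trino, and it assumes GX's column introspection works from table_name and schema_name in a broadly similar way.

```python
import sqlite3

# In-memory SQLite database; "main" is SQLite's name for the default schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marketing (marketing_id INTEGER, created_at TEXT)")
conn.execute("INSERT INTO marketing VALUES (1, '2024-01-01')")


def reflected_columns(table: str, schema: str = "main") -> list:
    """Column names SQLite's catalog reports for `table` inside `schema`.

    Names are interpolated into the PRAGMA, so only use trusted input.
    """
    rows = conn.execute(f"PRAGMA {schema}.table_info('{table}')").fetchall()
    return [row[1] for row in rows]  # row[1] holds the column name


# A query with a dotted name works: SQL parses main.marketing as schema.table.
print(conn.execute("SELECT marketing_id FROM main.marketing").fetchall())  # [(1,)]

# A metadata lookup that treats the dotted string as one table name finds nothing...
print(reflected_columns("main.marketing"))  # []
# ...while passing schema and table separately returns the columns.
print(reflected_columns("marketing", schema="main"))  # ['marketing_id', 'created_at']
```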

As a side note, I also had trouble building the connection to Trino: it requires a BasicAuthentication object, which is not JSON serializable and breaks the GX YAML file. That's why I'm running the code in an ephemeral context, which does not need to serialize BasicAuthentication to JSON. I'm not sure whether this is related to the issue.
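The serialization problem can be reproduced without GX or Trino. The sketch below uses a hypothetical stand-in class (not the real trino.auth.BasicAuthentication) to show that json.dumps rejects an auth object nested in connect_args, while a config made only of strings and booleans serializes fine. The string-only variant rests on two assumptions worth checking against the docs for your versions: that the Trino SQLAlchemy dialect turns a password in the URL into basic authentication over HTTPS, and that GX substitutes ${...} placeholders in connection_string from environment or config variables. TRINO_PASSWORD is a made-up variable name.

```python
import json


class BasicAuthenticationStandIn:
    """Hypothetical stand-in for trino.auth.BasicAuthentication (not the real class)."""

    def __init__(self, user: str, password: str) -> None:
        self.user = user
        self.password = password


def is_json_serializable(value) -> bool:
    """Return True if json.dumps accepts the value, False if it raises TypeError."""
    try:
        json.dumps(value)
    except TypeError:
        return False
    return True


# Shape of the original config: an auth object nested inside connect_args.
object_config = {
    "connection_string": "trino://user@host:443/catalog/schema",
    "kwargs": {"connect_args": {"http_scheme": "https",
                                "auth": BasicAuthenticationStandIn("user", "secret")}},
}

# Assumed alternative: strings only, with the password left as a ${...} placeholder
# for GX to substitute at runtime.
string_config = {
    "connection_string": "trino://user:${TRINO_PASSWORD}@host:443/catalog/schema",
    "kwargs": {"connect_args": {"http_scheme": "https", "verify": False}},
}

print(is_json_serializable(object_config))  # False
print(is_json_serializable(string_config))  # True
```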

Am I missing something? Any help would be appreciated. Thanks!

Hi @Jace, I have some additional questions so we can try to repro the issue.

What version of GX Core are you using?
What type of catalog are you using for Trino?
What table format are you using for Trino?

Thanks!

The GX version is 1.6.3. I'm not sure about the catalog and table format. The problem was solved by changing the table asset name: previously, the asset name in my code was the same as the actual table name in the database, which may somehow have caused confusion or ambiguity.
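The fix described here was giving the asset a name different from the table name. Below is a minimal sketch of one way to keep the two apart, assuming GX 1.x's add_table_asset(name=..., table_name=..., schema_name=...) signature. The helper names and the _asset suffix are hypothetical, and the functions only build keyword arguments, so the sketch runs without GX.

```python
from typing import Optional, Tuple


def split_qualified_name(qualified: str, default_schema: Optional[str] = None) -> Tuple[Optional[str], str]:
    """Split 'schema.table' into (schema, table); a bare name gets default_schema.

    Only one level of qualification is handled, since the Trino catalog is
    already part of the connection string in the original code.
    """
    schema, sep, table = qualified.rpartition(".")
    if not sep:
        return default_schema, qualified
    return schema, table


def asset_kwargs(qualified: str, default_schema: Optional[str] = None) -> dict:
    """Hypothetical helper producing keyword arguments for add_table_asset().

    The asset name is only a GX label, so it is deliberately different from
    the physical table name; schema and table are passed separately.
    """
    schema, table = split_qualified_name(qualified, default_schema)
    return {
        "name": f"{table}_asset",  # hypothetical naming convention
        "table_name": table,
        "schema_name": schema,
    }


print(asset_kwargs("raw.marketing"))
# {'name': 'marketing_asset', 'table_name': 'marketing', 'schema_name': 'raw'}
```

In get_asset, this could be used as `datasource.add_table_asset(**asset_kwargs(f"{schema}.{self.raw_table_name}"))`, with `datasource.get_asset` looking up the same `f"{self.raw_table_name}_asset"` name. The thread only confirms that renaming the asset fixed the problem; passing schema_name separately is an extra assumption.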

Anyway, thanks for the reply!