HIVE-29484: HMS.getFields() fails with 'SchemaReader not supported' for Avro and Hbase tables #6350

Open
rtrivedi12 wants to merge 4 commits into apache:master from rtrivedi12:HIVE-29484

@rtrivedi12
Contributor

What changes were proposed in this pull request?

org.apache.hadoop.hive.serde2.avro.AvroSerDe and org.apache.hadoop.hive.hbase.HBaseSerDe are added to the default value of MetastoreConf.ConfVars.SERDES_USING_METASTORE_FOR_SCHEMA.

Why are the changes needed?

HiveMetaStore.get_fields_with_environment_context() checks whether a table's SerDe is listed in metastore.serdes.using.metastore.for.schema. If it is, columns are returned directly from tbl.getSd().getCols(). If not, HMS delegates to StorageSchemaReader, whose default implementation (DefaultStorageSchemaReader) unconditionally throws:

MetaException: java.lang.UnsupportedOperationException: Storage schema reading not supported
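The lookup described above can be sketched roughly as follows. This is a simplified model, not the actual HiveMetaStore code: the class `GetFieldsSketch`, its constructor argument, and the use of plain strings for columns are all assumptions made for illustration, and the real code wraps the failure in a MetaException.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the SerDe check; not the real HMS implementation.
final class GetFieldsSketch {
    // Stand-in for the parsed value of metastore.serdes.using.metastore.for.schema.
    private final Set<String> serdesUsingMetastore;

    GetFieldsSketch(String commaSeparatedSerdes) {
        this.serdesUsingMetastore =
            new HashSet<>(Arrays.asList(commaSeparatedSerdes.split(",")));
    }

    // Returns the stored columns when the SerDe is listed; otherwise mimics the
    // default schema reader, which always throws.
    List<String> getFields(String serdeLib, List<String> storedCols) {
        if (serdesUsingMetastore.contains(serdeLib)) {
            return storedCols;
        }
        throw new UnsupportedOperationException("Storage schema reading not supported");
    }
}
```

In this model, adding a SerDe class name to the configured list is what moves it from the throwing branch to the branch that returns the stored columns.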

Does this PR introduce any user-facing change?

Yes. Previously, calling HMS.getFields() on a table using AvroSerDe or HBaseSerDe would fail with MetaException: Storage schema reading not supported. After this change, the call succeeds and returns the columns stored in the metastore, consistent with the behavior for all other built-in SerDes.

How was this patch tested?

A new test, testGetFieldsForStorageSerDes(), is added to TestHiveMetaStore.

mvn test -Dtest.groups= \
  -Dtest="TestEmbeddedHiveMetaStore#testGetFieldsForStorageSerDes+TestRemoteHiveMetaStore#testGetFieldsForStorageSerDes" \
  -pl standalone-metastore/metastore-server

Contributor

@soumyakanti3578 left a comment

Please divide the test for each SerDe, otherwise the PR looks good!

  "org.apache.hadoop.hive.serde2.OpenCSVSerde," +
- "org.apache.iceberg.mr.hive.HiveIcebergSerDe",
+ "org.apache.iceberg.mr.hive.HiveIcebergSerDe," +
+ "org.apache.hadoop.hive.hbase.HBaseSerDe",
Member

Some ideas:
This whitelist will keep growing over time. How about:

  • For native tables, we just get the columns from the Metastore, regardless of what the SerDe is;
  • For non-native tables, we try to get the columns from the SerDe first; if that fails, we get the columns from the Metastore.
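The two rules proposed above could look roughly like this. It is only a sketch under assumptions: `BestEffortFieldsSketch`, `resolveFields`, the `nativeTable` flag, and the `Supplier` standing in for a SerDe-based schema reader are hypothetical names, not existing Hive APIs.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the proposed best-effort lookup; not existing Hive code.
final class BestEffortFieldsSketch {
    static List<String> resolveFields(boolean nativeTable,
                                      List<String> metastoreCols,
                                      Supplier<List<String>> serdeReader) {
        // Native tables: the Metastore copy is used, whatever the SerDe is.
        if (nativeTable) {
            return metastoreCols;
        }
        // Non-native tables: ask the SerDe first, then fall back to the
        // last known columns stored in the Metastore.
        try {
            return serdeReader.get();
        } catch (RuntimeException e) {
            return metastoreCols;
        }
    }
}
```

With this shape no per-SerDe whitelist is consulted; the only input besides the stored columns is whether the table is native.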

Contributor Author

Thanks @dengzhhu653! To clarify, do you think it is acceptable for HMS to return the 'last known' schema as a fallback for non-native tables (like Avro)? If we agree on the 'best-effort' approach implemented in this PR, we can remove the whitelist logic in a follow-up PR.

Member

I opt for the 'best-effort' approach, so the Metastore can get rid of the serde lib.
For those tables which determine the columns from the serde, we can move StorageSchemaReader#readSchema to the client side.

Contributor

@soumyakanti3578 left a comment

Not sure what others think about this, but get_fields_with_environment_context_core could be refactored: it already has nested try blocks, and now we are adding another try inside an if-else block.

Maybe we should remove the nested try blocks altogether. We can have multiple catches for different exceptions, and we could also split this method to make it more readable and maintainable.
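As an illustration of that shape only, here is a minimal sketch with one flat try, one catch per failure type, and each step split into its own helper. Everything in it is hypothetical (`FlatCatchSketch`, `FieldsException`, `lookupTable`, `readSchema`, and the rule that an empty column list means the schema is unreadable); it does not reproduce get_fields_with_environment_context_core.

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical sketch: a single try with separate catches instead of nested try blocks.
final class FlatCatchSketch {
    // Hypothetical checked exception standing in for the metastore's own error types.
    static final class FieldsException extends Exception {
        FieldsException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static List<String> getFields(Map<String, List<String>> tables, String name)
            throws FieldsException {
        try {
            return readSchema(lookupTable(tables, name));
        } catch (NoSuchElementException e) {
            throw new FieldsException("Unknown table: " + name, e);
        } catch (UnsupportedOperationException e) {
            throw new FieldsException("Schema not readable: " + name, e);
        }
    }

    // Step 1: find the table, or fail with NoSuchElementException.
    private static List<String> lookupTable(Map<String, List<String>> tables, String name) {
        List<String> cols = tables.get(name);
        if (cols == null) {
            throw new NoSuchElementException(name);
        }
        return cols;
    }

    // Step 2: read the schema; an empty column list is treated as unreadable here.
    private static List<String> readSchema(List<String> cols) {
        if (cols.isEmpty()) {
            throw new UnsupportedOperationException("Storage schema reading not supported");
        }
        return cols;
    }
}
```

Each helper throws its own exception type, so the translation into a caller-facing error happens in one place rather than at several nesting levels.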

Also, please go through Sonar issues as several lines have length > 120.

@rtrivedi12
Contributor Author

Thanks for the suggestions, @soumyakanti3578! I have refactored get_fields_with_environment_context_core to make it more readable and cleaned up the exception handling.

@sonarqubecloud

sonarqubecloud bot commented Apr 8, 2026
