HIVE-29484: HMS.getFields() fails with 'SchemaReader not supported' for Avro and HBase tables #6350
rtrivedi12 wants to merge 4 commits into apache:master
Conversation
soumyakanti3578
left a comment
Please divide the test for each SerDe, otherwise the PR looks good!
...store/metastore-server/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
…eserve Avro schema evolution in HS2
...tore/metastore-common/src/main/java/org/apache/hadoop/hive/metastore/conf/MetastoreConf.java
  "org.apache.hadoop.hive.serde2.OpenCSVSerde," +
- "org.apache.iceberg.mr.hive.HiveIcebergSerDe",
+ "org.apache.iceberg.mr.hive.HiveIcebergSerDe," +
+ "org.apache.hadoop.hive.hbase.HBaseSerDe",
Some ideas:
This whitelist keeps expanding as we move on, so how about:
- For native tables, we just get the columns from the Metastore, regardless of what the serde is;
- For non-native tables, we try to get the columns from the serde first; if that fails, we get them from the Metastore.
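The fallback strategy suggested above can be sketched in plain Java. This is a hypothetical illustration, not the actual HMS code: `Table`, `Field`, and `readSchemaFromSerde` are stand-ins for the real metastore types and the serde-backed `StorageSchemaReader` path.

```java
import java.util.List;

public class SchemaFallbackSketch {
    // Simplified stand-ins for the real metastore Table/FieldSchema objects.
    record Field(String name, String type) {}
    record Table(boolean isNative, List<Field> metastoreCols) {}

    // Hypothetical serde-based reader; returns null when the serde cannot
    // supply a schema (mirroring a failing StorageSchemaReader).
    static List<Field> readSchemaFromSerde(Table t) {
        return null; // pretend the serde-backed reader is unavailable
    }

    static List<Field> getFields(Table t) {
        if (t.isNative()) {
            // Native tables: always trust the columns stored in the Metastore.
            return t.metastoreCols();
        }
        // Non-native tables: try the serde first, fall back to the Metastore.
        List<Field> fromSerde = readSchemaFromSerde(t);
        return fromSerde != null ? fromSerde : t.metastoreCols();
    }

    public static void main(String[] args) {
        Table avro = new Table(false, List.of(new Field("id", "int")));
        // Serde reader fails, so we fall back to the metastore columns.
        System.out.println(getFields(avro).get(0).name());
    }
}
```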
Thanks @dengzhhu653! To clarify, do you think it is acceptable for HMS to return the 'last known' schema as a fallback for non-native tables (like Avro)? If we agree on the 'best-effort' approach implemented in this PR, we can handle removing the whitelist logic in a follow-up PR.
I opt for the 'best-effort' approach, so the Metastore can get rid of the serde libs.
For tables that determine their columns from the serde, we can move StorageSchemaReader#readSchema to the client side.
Not sure what others think about this, but get_fields_with_environment_context_core could be refactored, as it already has nested try blocks and now we are adding another try within an if-else block.
Maybe we should remove the nested try blocks altogether: we can have multiple catches for different exceptions, and we could also split this method to make it more readable and maintainable.
Also, please go through the Sonar issues, as several lines are longer than 120 characters.
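For illustration, the multi-catch shape suggested above could look like this. The method names and exception types here are simplified stand-ins, not the real HMSHandler code:

```java
public class MultiCatchSketch {
    // Simplified stand-ins for the real metastore exception types.
    static class MetaException extends Exception {}
    static class UnknownTableException extends Exception {}

    // Hypothetical core method that may fail in more than one way.
    static String getFieldsCore(String table) throws MetaException, UnknownTableException {
        return "cols-of-" + table;
    }

    static String getFieldsSafely(String table) {
        try {
            return getFieldsCore(table);
        } catch (MetaException | UnknownTableException e) {
            // One flat multi-catch handler replaces the nested try/catch ladders.
            return "fallback";
        }
    }

    public static void main(String[] args) {
        System.out.println(getFieldsSafely("t1"));
    }
}
```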
...ne-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HMSHandler.java
Thanks for the suggestions @soumyakanti3578! I have refactored
What changes were proposed in this pull request?
org.apache.hadoop.hive.serde2.avro.AvroSerDe and org.apache.hadoop.hive.hbase.HBaseSerDe are added to the default value of MetastoreConf.ConfVars.SERDES_USING_METASTORE_FOR_SCHEMA.
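Users who need to override the whitelist can set the property themselves, e.g. in hive-site.xml. The value below is abbreviated ("..." stands for the other SerDes already in the default list); the full default is defined in MetastoreConf:

```xml
<property>
  <name>metastore.serdes.using.metastore.for.schema</name>
  <value>...,org.apache.hadoop.hive.serde2.avro.AvroSerDe,org.apache.hadoop.hive.hbase.HBaseSerDe</value>
</property>
```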
Why are the changes needed?
HiveMetaStore.get_fields_with_environment_context() checks whether a table's SerDe is listed in metastore.serdes.using.metastore.for.schema. If it is, columns are returned directly from tbl.getSd().getCols(). If not, HMS delegates to StorageSchemaReader, whose default implementation (DefaultStorageSchemaReader) unconditionally throws MetaException: Storage schema reading not supported.
Does this PR introduce any user-facing change?
Yes. Previously, calling HMS.getFields() on a table using AvroSerDe or HBaseSerDe would fail with MetaException: Storage schema reading not supported. After this change, the call succeeds and returns the columns stored in the metastore, consistent with the behavior of all other built-in SerDes.
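The whitelist check described under "Why are the changes needed?" can be sketched as follows. This is a simplified illustration of the control flow, not the real HMS code; the set below stands in for the default of metastore.serdes.using.metastore.for.schema after this PR:

```java
import java.util.Set;

public class WhitelistSketch {
    // Abbreviated stand-in for the whitelist (the real default contains
    // many more SerDes; this PR adds the Avro and HBase entries).
    static final Set<String> SERDES_USING_METASTORE = Set.of(
        "org.apache.hadoop.hive.serde2.avro.AvroSerDe",
        "org.apache.hadoop.hive.hbase.HBaseSerDe");

    static String getFields(String serdeLib) throws Exception {
        if (SERDES_USING_METASTORE.contains(serdeLib)) {
            // Whitelisted: columns come straight from the metastore
            // (the real code reads tbl.getSd().getCols()).
            return "cols from metastore";
        }
        // Otherwise HMS delegates to StorageSchemaReader, whose default
        // implementation throws unconditionally.
        throw new Exception("Storage schema reading not supported");
    }

    public static void main(String[] args) throws Exception {
        // After this PR, AvroSerDe hits the whitelisted path.
        System.out.println(getFields("org.apache.hadoop.hive.serde2.avro.AvroSerDe"));
    }
}
```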
How was this patch tested?
A new test, testGetFieldsForStorageSerDes(), is added to TestHiveMetaStore.