feat(index): Add Indexer abstraction and refactor metadata table init… #18348
cshuo wants to merge 5 commits into apache:master from
Conversation
 * including file-group initialization, commit, and partition state update.
 */
@Slf4j
public abstract class BaseIndexer implements Indexer {
To break the PR down, could we first add these abstraction classes to ease the reviews?
How about I split changes in the PR into 2 commits, since the indexer abstraction and its implementations involve some refactor of util class, which also affects metadata writer.
Force-pushed d2b4e7e to 116a487
Force-pushed 42e6f52 to ec26eb4
yihua
left a comment
Blocked on my review due to critical logic
Force-pushed bdfa722 to 3237e4d
Force-pushed 08eecef to db53699
Force-pushed db53699 to 8a03ea2
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Style & Readability Review
Style & readability pass — a few naming and consistency nits below, mostly around boolean accessor conventions (flag() → isDeleted()), a misleading name() accessor that returns a full path, an unused @Slf4j, a stale Javadoc reference, a parameter/field name mismatch, a mutability inconsistency across the new data-holder classes, and a comment that drifted away from the logic it describes after the refactor. Nothing blocking; the code is otherwise well-structured and readable.
      reader.close();
    }
  } catch (IOException e) {
    throw new HoodieIOException("Error reading total records from file " + path, e);
🤖 nit: small typo in the Javadoc — "context ot use" should be "context to use".
 */
@Slf4j
public class SecondaryIndexer extends BaseIndexer {
🤖 nit: RECORD_INDEX_AVERAGE_RECORD_SIZE = 48 is also defined in BaseRecordIndexer. Since SecondaryIndexer can't inherit it, could this constant live in a shared place (e.g. HoodieTableMetadataUtil or a RecordIndexConstants class) to avoid the duplication?
import static org.apache.hudi.metadata.HoodieTableMetadataUtil.createRecordIndexDefinition;
import static org.apache.hudi.metadata.MetadataPartitionType.RECORD_INDEX;

/**
🤖 nit: the Javadoc says "Implementation of the global RECORD_INDEX index" — but this class is the partitioned variant. RecordIndexer is the global one. Might be worth updating to avoid confusion.
Map<String, DataPartitionAndRecords> fileGroupCountAndRecordsPairMap = new HashMap<>(partitionFileSlicePairsMap.size());
int maxParallelismPerHudiPartition = partitionFileSlicePairsMap.isEmpty()
    ? 1 : Math.max(1, dataTableWriteConfig.getMetadataConfig().getRecordIndexMaxParallelism() / partitionFileSlicePairsMap.size());
int totalFileGroupCount = 0;
🤖 nit: iterating over keySet() and then calling partitionFileSlicePairsMap.get(partition) inside is a classic two-lookup pattern. The second loop in this same method already uses entrySet() — could you align the first loop to do the same?
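The two-lookup pattern the comment refers to can be illustrated in isolation. This is a standalone sketch with a hypothetical map (not the actual Hudi code); it contrasts keySet()-plus-get() against a single entrySet() traversal:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EntrySetIteration {
  public static void main(String[] args) {
    // Hypothetical stand-in for partitionFileSlicePairsMap
    Map<String, List<String>> partitionFileSlicePairsMap = new HashMap<>();
    partitionFileSlicePairsMap.put("2024/01/01", List.of("fs-1", "fs-2"));
    partitionFileSlicePairsMap.put("2024/01/02", List.of("fs-3"));

    // Two-lookup pattern: each key is hashed once for iteration and again
    // for the get() inside the loop body.
    int total = 0;
    for (String partition : partitionFileSlicePairsMap.keySet()) {
      total += partitionFileSlicePairsMap.get(partition).size();
    }

    // Single-lookup pattern: entrySet() yields key and value together.
    int totalViaEntries = 0;
    for (Map.Entry<String, List<String>> entry : partitionFileSlicePairsMap.entrySet()) {
      totalViaEntries += entry.getValue().size();
    }

    System.out.println(total + " " + totalViaEntries);
  }
}
```

Both loops compute the same result; entrySet() simply avoids the redundant hash lookup per key.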
DataPartitionAndRecords init = new DataPartitionAndRecords(2, Option.empty(), records);
ExposedRecordIndexer indexer = new ExposedRecordIndexer(engineContext, writeConfig, metaClient, init);

try (MockedStatic<org.apache.hudi.metadata.HoodieTableMetadataUtil> mockedUtil = mockStatic(org.apache.hudi.metadata.HoodieTableMetadataUtil.class)) {
🤖 nit: same fully-qualified org.apache.hudi.metadata.HoodieTableMetadataUtil usage as in TestPartitionedRecordIndexer — an import would clean this up and it repeats on lines 63, 111, and 116 too.
 * A class which represents a directory and the files and directories inside it.
 * <p>
 * A {@code PartitionFileInfo} object saves the name of the partition and various properties of each file
 * required for initializing the metadata table. Saving limited properties reduces the total memory footprint when
🤖 nit: the Javadoc references PartitionFileInfo but the class is DirectoryInfo — looks like a stale copy-paste. Could you update it to match?
private final String partitionPath;
private final String fileName;
private final boolean flag;
🤖 nit: flag is pretty opaque — it's unclear what this boolean represents at the call site. Something like isDeleted or isAdded (or whatever semantic it carries) would read more clearly.
}

public static FileInfoAndPartition of(String partition, String path, long size) {
  return new FileInfoAndPartition(partition, path, size);
🤖 nit: the factory parameter is named path but the backing field is name — could you align them so readers don't have to mentally translate at the call site?
 */
@AllArgsConstructor
@Getter
@Setter
🤖 nit: @Setter makes this mutable, which is inconsistent with FileInfo and FileInfoAndPartition in the same package — both use final fields and no setters. If mutation isn't needed here, have you considered dropping @Setter and making the fields final?
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Nice work on the Indexer abstraction refactor! This architectural improvement significantly simplifies the metadata initialization path and enhances test coverage, though there are several functional issues regarding initialization edge cases, unmodifiable maps, and serialization that require attention.
    String instantTime) {
  if (metaClient.getTableConfig().getTableVersion().lesserThan(HoodieTableVersion.EIGHT)) {
    throw new HoodieNotSupportedException("Table version 6 and below does not support expression index");
  }
🤖 The error message says "Table version 6 and below" but the check is lesserThan(HoodieTableVersion.EIGHT), which also covers version 7 (0.16.0). Should this say "Table version 7 and below" or "versions before 8"?
if (partitionToAllFilesMap.isEmpty()) {
  this.columnsToIndex = Lazy.lazily(Collections::emptyList);
  return Collections.singletonList(IndexPartitionInitialization.of(fileGroupCount, COLUMN_STATS.getPartitionPath(), engineContext.emptyHoodieData()));
}
🤖 Overwriting this.columnsToIndex with an empty list when partitionToAllFilesMap is empty means postInitialization will register the column_stats index definition with zero source fields. Is that intentional? If this indexer instance is reused for a subsequent call with non-empty partitions, the columns would also be lost.
Map<String, Long> fileNameToSizeMap = partitionInfo.getValue().stream()
    .collect(Collectors.toMap(FileInfo::fileName, FileInfo::size));
return HoodieMetadataPayload.createPartitionFilesRecord(
    partitionInfo.getKey(), fileNameToSizeMap, Collections.emptyList());
🤖 Calling fileListRecords.count() triggers a full Spark action just for validation. For tables with many partitions, this doubles the initialization cost of this step. Could you validate using partitionToAllFilesMap.size() against partitions.size() instead (which is always true by construction), or remove this check?
partitions is the key set of partitionToAllFilesMap, so partitions.size() is always equal to partitionToAllFilesMap.size(). The validation here is unnecessary for the current code base, and it should be safe to remove.
if (partitionFileSlicePairs.isEmpty()) {
  return Collections.singletonList(IndexPartitionInitialization.of(fileGroupCount, indexName, engineContext.emptyHoodieData()));
}
🤖 The Lazy wrapper here is created and immediately .get()-ed on the next line, so it doesn't actually provide any lazy evaluation benefit. Could simplify to a direct call: HoodieSchema tableSchema = HoodieTableMetadataUtil.tryResolveSchemaForTable(dataTableMetaClient).orElseThrow(...);
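The point about a lazy wrapper that is immediately dereferenced can be shown with a minimal standalone sketch (using java.util.function.Supplier and Optional as stand-ins for Hudi's Lazy and the real tryResolveSchemaForTable, both hypothetical here):

```java
import java.util.Optional;
import java.util.function.Supplier;

public class LazyVsDirect {
  // Hypothetical stand-in for HoodieTableMetadataUtil.tryResolveSchemaForTable
  static Optional<String> tryResolveSchemaForTable() {
    return Optional.of("table-schema");
  }

  public static void main(String[] args) {
    // Wrapping in a Supplier and calling get() on the next line adds no
    // laziness: the resolution runs immediately either way.
    Supplier<String> lazySchema = () -> tryResolveSchemaForTable()
        .orElseThrow(() -> new IllegalStateException("cannot resolve schema"));
    String viaLazy = lazySchema.get();

    // Equivalent direct call, as the review suggests:
    String direct = tryResolveSchemaForTable()
        .orElseThrow(() -> new IllegalStateException("cannot resolve schema"));

    System.out.println(viaLazy.equals(direct));
  }
}
```

A lazy wrapper only pays off when the value may never be consumed, or is consumed well after construction.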
List<String> filesToDelete = filesDeleted.getOrDefault(partition, Collections.emptyList());
fileDeleteCount[0] += filesToDelete.size();
HoodieRecord record = HoodieMetadataPayload.createPartitionFilesRecord(partition, filesToAdd, filesToDelete);
Map<String, Long> filesToAddMap = filesToAdd.stream().collect(Collectors.toMap(FileInfo::fileName, FileInfo::size));
🤖 The old signature used Map<String, Long> which inherently prevented duplicate file names. The new List<FileInfo> can potentially contain duplicates, and Collectors.toMap(FileInfo::fileName, FileInfo::size) will throw IllegalStateException on duplicate keys. Could you add a merge function (e.g., Long::max) to handle this defensively, or document why duplicates can't occur?
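The duplicate-key hazard and the suggested merge-function fix can be demonstrated in isolation. This sketch uses a minimal hypothetical FileInfo record, not the Hudi class:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ToMapMerge {
  // Hypothetical stand-in for the Hudi FileInfo POJO
  record FileInfo(String fileName, long size) {}

  public static void main(String[] args) {
    List<FileInfo> filesToAdd = List.of(
        new FileInfo("f1.parquet", 100L),
        new FileInfo("f2.parquet", 200L),
        new FileInfo("f1.parquet", 150L)); // duplicate file name

    // Without a merge function, Collectors.toMap throws IllegalStateException
    // on the duplicate key f1.parquet.
    boolean threw = false;
    try {
      filesToAdd.stream().collect(Collectors.toMap(FileInfo::fileName, FileInfo::size));
    } catch (IllegalStateException e) {
      threw = true;
    }

    // With Long::max as the merge function, duplicates are resolved defensively.
    Map<String, Long> merged = filesToAdd.stream()
        .collect(Collectors.toMap(FileInfo::fileName, FileInfo::size, Long::max));

    System.out.println(threw + " " + merged.get("f1.parquet"));
  }
}
```

Whether Long::max is the right merge policy depends on the invariant; documenting why duplicates cannot occur is an equally valid resolution.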
@EnumSource(classOf[HoodieTableType])
override def testRLIUpsertAndDropIndex(tableType: HoodieTableType): Unit = {
  val hudiOpts = commonOpts ++ Map(DataSourceWriteOptions.TABLE_TYPE.key -> tableType.name(),
    HoodieMetadataConfig.ENABLE_METADATA_INDEX_COLUMN_STATS.key -> "true")
🤖 The comment mentions that partition stats should be present, but the assertion checks for a size of 2. If only files and col stats are expected, the count is correct, but the comment might be misleading.
// FILES partition should always be initialized first if enabled
if (!filesPartitionAvailable) {
  initializeMetadataPartition(FILES, indexerMapForPartitionsToInit.get(FILES),
      dataTableInstantTime, partitionIdToAllFilesMap, lazyLatestMergedPartitionFileSliceList);
🤖 This will NPE if the metadata table directory exists but the FILES partition is not marked as available.
this.enabledIndexerMap.remove(RECORD_INDEX);
indexerMapForPartitionsToInit.remove(RECORD_INDEX);
}
🤖 This will throw UnsupportedOperationException because enabledIndexerMap is unmodifiable.
Fixed — the RECORD_INDEX partition is now filtered out when enabledIndexerMap is initialized.
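The failure mode the reviewer flags is easy to reproduce standalone: remove() on a Collections.unmodifiableMap view throws at runtime, not compile time. A minimal sketch with a hypothetical enum and indexer map:

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Map;

public class UnmodifiableMapRemove {
  // Hypothetical stand-in for MetadataPartitionType
  enum PartitionType { FILES, RECORD_INDEX }

  public static void main(String[] args) {
    Map<PartitionType, String> backing = new EnumMap<>(PartitionType.class);
    backing.put(PartitionType.FILES, "filesIndexer");
    backing.put(PartitionType.RECORD_INDEX, "recordIndexer");
    Map<PartitionType, String> enabledIndexerMap = Collections.unmodifiableMap(backing);

    // Any mutation on the unmodifiable view throws UnsupportedOperationException.
    boolean threw = false;
    try {
      enabledIndexerMap.remove(PartitionType.RECORD_INDEX);
    } catch (UnsupportedOperationException e) {
      threw = true;
    }
    System.out.println(threw);
  }
}
```

Filtering entries out while building the map, as the fix does, avoids ever needing to mutate the published view.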
protected final transient HoodieEngineContext engineContext;
@Getter
protected final List<MetadataPartitionType> enabledPartitionTypes;
protected final transient Map<MetadataPartitionType, Indexer> enabledIndexerMap;
🤖 Making these fields transient breaks the Serializable contract of HoodieTableMetadataWriter.
Marking these fields as transient is intentional. HoodieBackedTableMetadataWriter can be captured in Spark execution closures, so any field reachable from it must either be serializable or excluded from serialization. enabledIndexerMap / expressionIndexRecordGenerator and the existing field engineContext are runtime-only collaborators and are not serializable as part of the Spark closure. So the goal here is to avoid closure-serialization failures, not to make the writer meaningfully deserializable for later reuse.
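The trade-off the author describes — transient fields let closure serialization succeed but come back as null on deserialization — can be demonstrated with plain Java serialization. This is a hypothetical miniature of the writer, not the Hudi class:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldDemo {
  // Hypothetical stand-in for HoodieBackedTableMetadataWriter
  static class Writer implements Serializable {
    private static final long serialVersionUID = 1L;
    final String tableName;           // serialized into the closure
    transient Object engineContext;   // skipped during serialization, restored as null

    Writer(String tableName, Object engineContext) {
      this.tableName = tableName;
      this.engineContext = engineContext;
    }
  }

  public static void main(String[] args) throws Exception {
    // Object is not Serializable; without `transient` writeObject would fail.
    Writer original = new Writer("trips", new Object());

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(original);
    }
    Writer copy;
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      copy = (Writer) in.readObject();
    }
    System.out.println(copy.tableName + " " + (copy.engineContext == null));
  }
}
```

This matches the author's framing: the goal is surviving closure capture, and any code running after deserialization must not touch the transient collaborators.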
}
return Collections.unmodifiableMap(Arrays.stream(MetadataPartitionType.getValidValues(dataTableMetaClient.getTableConfig().getTableVersion()))
    .filter(partitionType ->
        (partitionType.isMetadataPartitionEnabled(dataTableWriteConfig.getMetadataConfig(), dataTableMetaClient.getTableConfig())
🤖 Is there a versioned getValidValues method added to MetadataPartitionType? This might cause a compilation error.
Yes, there is a getValidValues method accepting a parameter of type HoodieTableVersion.
Force-pushed 87975f4 to 397cf29
Force-pushed 397cf29 to e4ee525
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Style & Readability Review — The refactoring is clean and well-organized overall. Two minor nits: the parameter order of the factory method doesn't match the constructor's field order, which could confuse developers, and there is a minor style inconsistency in variable naming within a stream pipeline.
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Good progress on the Indexer abstraction — the decomposition into per-index-type indexers is clean and should make adding new index types much easier. A few items need attention before merging: FileAndPartitionFlag is missing Serializable (which will cause runtime failures on Spark), and there are edge cases to verify around empty-partition handling, HashMap iteration ordering for partition initialization, and ensuring hasPartitionsStateChanged is set correctly when initialization is skipped.
    dataTableWriteConfig.getMetadataConfig().getMaxReaderBufferSize(),
    columnsToIndex.get());
return Collections.singletonList(IndexPartitionInitialization.of(fileGroupCount, COLUMN_STATS.getPartitionPath(), records));
}
🤖 When columnsToIndex is empty and buildInitialization returns empty data, postInitialization will still be called and will register an index definition with empty source fields. Is that intentional? It seems like registering a column stats index with no source fields could confuse downstream readers.
I've checked the logic of the current master branch; the behavior is the same as in this PR. Not sure if it's intentional cc @yihua
There is a comment explaining this case:
log.info("Initializing partitioned record index from data partition {}", partition);
DataPartitionAndRecords dataPartitionAndRecords = initializeRecordIndexPartition(partition, fileSliceAndPartitions, maxParallelismPerHudiPartition);
fileGroupCountAndRecordsPairMap.put(partition, dataPartitionAndRecords);
totalFileGroupCount += dataPartitionAndRecords.numFileGroups();
🤖 When partitionFileSlicePairsMap is empty (e.g., empty table), totalFileGroupCount will be 0 and initializationList will be empty, but the code still returns an IndexPartitionInitialization with 0 file groups. Could this cause issues downstream? The non-partitioned RecordIndexer path goes through estimateFileGroupCount which guarantees at least 1 file group via Math.max(1, ...). Is an empty-table scenario expected to be gated earlier in the call chain?
The same logic exists in the master branch, and I think it's indeed an issue. If partitionFileSlicePairsMap is empty, buildInitialization() still returns a non-empty result with totalFileGroupCount = 0 and no DataPartitionAndRecords. That means the writer will not skip initialization; instead it can mark the record index partition as available without creating any file groups.
I think we should return an empty initialization result early when partitionFileSlicePairsMap is empty, so the metadata writer will skip initializing the partitioned RLI. WDYT? @yihua
@Override
public void postInitialization(HoodieTableMetaClient metadataMetaClient, HoodieData<HoodieRecord> records, int fileGroupCount, String relativePartitionPath) {
  super.postInitialization(metadataMetaClient, records, fileGroupCount, relativePartitionPath);
  // Validate record index after commit if validation is enabled
🤖 Worth noting that since records passed to postInitialization is the union of all per-partition records (built in initializeMetadataPartition), calling records.unpersist() on a Spark union RDD does not automatically unpersist the parent RDDs that were individually persisted inside estimateFileGroupCount. For non-partitioned RLI this is fine (single source), but for partitioned RLI, the individual per-partition records that were persisted may leak. The old code had the same pattern though, so it's not a regression.
@yihua all comments are resolved or replied to. Could you take another look?
yihua
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Nice updates — all the feedback from the previous review has been addressed cleanly:
- FileAndPartitionFlag Serializable (BUG): Fixed. All model POJOs (FileAndPartitionFlag, FileInfo, FileInfoAndPartition, FileSliceAndPartition, DirectoryInfo) now implement Serializable with explicit serialVersionUID.
- hasPartitionsStateChanged set unconditionally (BUG): Fixed. The flag is now set at the end of initializeMetadataPartition, after the early-return guard for empty initialization lists, so it's only set when work actually happens.
- Empty filesToIndex causing zero parallelism (BUG): Fixed. The emptiness check in ExpressionIndexer now correctly guards on filesToIndex.isEmpty() instead of partitionFileSlicePairs.isEmpty().
- records.unpersist() not recursive (QUESTION): Fixed. Changed to unpersistWithDependencies() to clean up per-partition persisted datasets.
- Empty columnsToIndex registering empty source fields (QUESTION): Fixed. postInitialization now checks records.isEmpty() and registers with an empty list when there are no records.
- SecondaryIndexer partition type and HashMap ordering questions were answered by the author with reasonable justifications.
…ialization logic
Describe the issue this Pull Request addresses
The metadata table initialization path had index-specific logic scattered across multiple metadata writer/util classes, which made the code hard to extend and maintain as new index types are added.
This PR addresses that by introducing a dedicated Indexer abstraction and refactoring initialization flows to use it consistently across engine-specific implementations. This PR is based on #12983.
Summary and Changelog
Commit 1: Add Indexer abstraction and related POJOs (Indexer, BaseIndexer, IndexerFactory, RecordIndexer, PartitionedRecordIndexer, BaseRecordIndexer)
Commit 2: Refactor metadata table initialization in HoodieBackedTableMetadataWriter to use the Indexer abstraction
Impact
Risk Level
low
Documentation Update
Contributor's checklist