Skip to content

feat: Add matillion data cloud support#26773

Merged
harshach merged 10 commits intomainfrom
matillion_dpc
Mar 30, 2026
Merged

feat: Add matillion data cloud support#26773
harshach merged 10 commits intomainfrom
matillion_dpc

Conversation

@ulixius9
Copy link
Copy Markdown
Member

@ulixius9 ulixius9 commented Mar 25, 2026

Describe your changes:

Fixes

I worked on ... because ...

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Summary by Gitar

  • New auth config:
    • Added MatillionDPCAuth class with OAuth2 client credentials and personal access token support
    • Added lineageLookbackDays configuration parameter (1-365 days, default 30) for OpenLineage API
  • Connector updates:
    • Updated MatillionConnectionClassConverter to support both MatillionETLAuth and MatillionDPCAuth
    • Added new test class MatillionConnectionClassConverterTest with ETL/DPC auth conversion tests
  • Schema and types:
    • Created matillionDPC.json schema with region enum (us1, eu1)
    • Updated matillionConnection.json to include DPC auth as oneOf option
    • Regenerated TypeScript types across multiple API and entity files

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings March 25, 2026 18:05
@ulixius9 ulixius9 self-assigned this Mar 25, 2026
@github-actions github-actions bot added Ingestion safe to test Add this label to run secure Github workflows on PRs labels Mar 25, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds schema and backend secrets-conversion support for connecting to Matillion Data Productivity Cloud (DPC) alongside the existing Matillion ETL connection type.

Changes:

  • Extend MatillionConnection to allow a new MatillionDPC auth config and introduce lineageLookbackDays.
  • Add a new JSON schema defining the Matillion DPC authentication configuration (OAuth2 client credentials or PAT).
  • Update the backend MatillionConnectionClassConverter to convert/mask/unmask secrets for the new DPC auth type.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
openmetadata-spec/src/main/resources/json/schema/entity/services/connections/pipeline/matillionConnection.json Adds MatillionDPC as a supported connection option and introduces lineageLookbackDays for DPC OpenLineage lookback.
openmetadata-spec/src/main/resources/json/schema/entity/services/connections/pipeline/matillion/matillionDPC.json New schema defining Matillion DPC auth fields (client credentials, PAT, region).
openmetadata-service/src/main/java/org/openmetadata/service/secrets/converter/MatillionConnectionClassConverter.java Ensures secrets conversion supports both MatillionETLAuth and the new MatillionDPCAuth.

"format": "password"
}
},
"required": [],
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MatillionDPC auth config currently has an empty required list, which means an empty object (or one with only region) can pass schema validation. This will allow invalid connections to be saved and then fail later at runtime. Consider enforcing that either personalAccessToken is provided, or both clientId and clientSecret are provided (and ideally disallow providing both auth methods at once).

Suggested change
"required": [],
"required": [],
"anyOf": [
{
"required": [
"personalAccessToken"
]
},
{
"required": [
"clientId",
"clientSecret"
]
}
],

Copilot uses AI. Check for mistakes.
Comment on lines 28 to +49
@@ -36,10 +39,18 @@
"$ref": "../../../../type/filterPattern.json#/definitions/filterPattern",
"title": "Default Pipeline Filter Pattern"
},
"lineageLookbackDays": {
"title": "Lineage Lookback Days",
"description": "Number of days to look back when fetching lineage events from Matillion DPC OpenLineage API.",
"type": "integer",
"default": 30,
"minimum": 1,
"maximum": 365
},
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New schema fields (MatillionDPC auth option and lineageLookbackDays) aren’t covered by the existing configuration parsing tests for Matillion. Adding a unit test that parses a valid DPC config (PAT and/or client credentials) and asserts validation failures for missing credentials / out-of-range lineageLookbackDays would prevent regressions in schema-to-model generation and validation behavior.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

@github-actions github-actions bot requested a review from a team as a code owner March 25, 2026 18:11
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 25, 2026

🟡 Playwright Results — all passed (18 flaky)

✅ 3421 passed · ❌ 0 failed · 🟡 18 flaky · ⏭️ 216 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 454 0 1 2
🟡 Shard 2 603 0 5 32
🟡 Shard 3 612 0 2 27
🟡 Shard 4 611 0 6 47
✅ Shard 5 587 0 0 67
🟡 Shard 6 554 0 4 41
🟡 18 flaky test(s) (passed on retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityFeed.spec.ts › Mention notification shows correct user details in Notification box (shard 2, 1 retry)
  • Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › Admin: Complete export-import-validate flow (shard 2, 1 retry)
  • Features/DataQuality/TestCaseIncidentPermissions.spec.ts › User with only VIEW cannot PATCH incidents (shard 2, 1 retry)
  • Features/DataQuality/TestCaseResultPermissions.spec.ts › User with TEST_CASE.EDIT_ALL can see edit action on test case (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/Permissions/GlossaryPermissions.spec.ts › Team-based permissions work correctly (shard 3, 1 retry)
  • Flow/ExploreDiscovery.spec.ts › Should display deleted assets when showDeleted is checked and deleted is not present in queryFilter (shard 3, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Table (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Topic (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Add and update Security and SLA tabs (shard 4, 1 retry)
  • Pages/DomainAdvanced.spec.ts › Admin can edit data product description (shard 4, 1 retry)
  • Pages/Domains.spec.ts › Multiple consecutive domain renames preserve all associations (shard 4, 1 retry)
  • Pages/HyperlinkCustomProperty.spec.ts › should accept valid http and https URLs (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)
  • VersionPages/EntityVersionPages.spec.ts › Directory (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

harshsoni2024
harshsoni2024 previously approved these changes Mar 26, 2026
Copilot AI review requested due to automatic review settings March 26, 2026 06:06
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 15 changed files in this pull request and generated 1 comment.

Comment on lines +35 to +36
matillionConnection.getConnection(),
List.of(MatillionETLAuth.class, MatillionDPCAuth.class))
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indentation in this multi-line tryToConvertOrFail(...) call doesn’t match the project’s standard Java formatting (and is likely to be rewritten by Spotless / google-java-format). Please reformat this block (e.g., by running mvn spotless:apply) so the continuation indentation is consistent and CI formatting checks don’t fail.

Suggested change
matillionConnection.getConnection(),
List.of(MatillionETLAuth.class, MatillionDPCAuth.class))
matillionConnection.getConnection(),
List.of(MatillionETLAuth.class, MatillionDPCAuth.class))

Copilot uses AI. Check for mistakes.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 26, 2026

OpenMetadata Service New-Code Coverage

PASS. Required changed-line coverage: 90.00% overall and per touched production file.

  • Overall executable changed lines: 3/3 covered (100.00%)
  • Missed executable changed lines: 0
  • Non-executable changed lines ignored by JaCoCo: 1
  • Changed production files: 1
File Covered Missed Executable Non-exec Coverage Uncovered lines
openmetadata-service/src/main/java/org/openmetadata/service/secrets/converter/MatillionConnectionClassConverter.java 3 0 3 1 100.00% -

Only changed executable lines under openmetadata-service/src/main/java are counted. Test files, comments, imports, and non-executable lines are excluded.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 26, 2026

Jest test Coverage

UI tests summary

Lines Statements Branches Functions
Coverage: 64%
65% (58732/90354) 44.75% (30898/69045) 47.89% (9318/19455)

Copilot AI review requested due to automatic review settings March 27, 2026 10:43
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 18 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings March 27, 2026 13:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 18 changed files in this pull request and generated 1 comment.

Comment on lines 18 to 36
"properties": {
"type": {
"title": "Service Type",
"description": "Service Type",
"$ref": "#/definitions/matillionType",
"default": "Matillion"
},
"connection": {
"title": "Matillion Connection",
"description": "Matillion Auth Configuration",
"oneOf": [
{
"$ref": "matillion/matillionETL.json"
},
{
"$ref": "matillion/matillionDPC.json"
}
]
},
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MatillionConnection does not declare any required fields (unlike most other pipeline connection schemas). With the new oneOf for connection, this means the top-level config can omit connection entirely and still validate, which is likely not a usable Matillion configuration. Consider adding a required list (at least connection, and possibly type if intended) to prevent silently-accepted invalid configs.

Copilot uses AI. Check for mistakes.
@gitar-bot
Copy link
Copy Markdown

gitar-bot bot commented Mar 30, 2026

Code Review ⚠️ Changes requested 1 resolved / 5 findings

Adds Matillion Data Cloud support but has three blocking issues: lineageLookbackDays is DPC-specific on a shared connection, applyCertification() passes null for audit fields appliedAt/appliedBy, and DataAssetsCoverage incorrectly parses FQN fields as integers instead of strings.

⚠️ Bug: applyCertification passes null for appliedAt/appliedBy audit fields

In applyCertification(), the applyTag call passes null for both appliedBy and appliedAt parameters (lines 4537-4538). However, getCertification() at line 4506 reads tagLabel.getAppliedAt() to populate AssetCertification.appliedDate. Since these are never set on create, the appliedDate will always be null for certifications applied via the normal (non-migration) code path. The migration code in MigrationUtil.java correctly sets appliedAt and appliedBy='admin', creating an inconsistency between migrated and newly created certifications.

Suggested fix
Pass the current timestamp and the updating user to applyTag:
    daoCollection
        .tagUsageDAO()
        .applyTag(
            TagLabel.TagSource.CLASSIFICATION.ordinal(),
            tagLabel.getTagFQN(),
            tagLabel.getTagFQN(),
            entity.getFullyQualifiedName(),
            TagLabel.LabelType.AUTOMATED.ordinal(),
            TagLabel.State.CONFIRMED.ordinal(),
            SecurityContext.getCurrentUser(),
            new Timestamp(System.currentTimeMillis()),
            metadata);
⚠️ Bug: DataAssetsCoverage parses FQN fields as integer counts

In DataAssetsCoveragePieChartWidget, lines 94-95 use parseInt(coverageData[0].originEntityFQN, 10) and parseInt(totalData[0].fullyQualifiedName, 10) to extract numeric counts. These are parsing entity FQN string fields as integers, which is fragile—if the API response shape changes or the fields contain actual FQN strings (e.g., "service.db.schema"), parseInt will return NaN. This suggests the API returns aggregation results where numeric values are placed in FQN-named fields, but there's no validation for NaN before using these values in arithmetic (line 99: total - covered).

Suggested fix
Add NaN guards after parsing:
  const covered = parseInt(coverageData[0].originEntityFQN, 10);
  let total = parseInt(totalData[0].fullyQualifiedName, 10);
  if (isNaN(covered) || isNaN(total)) {
    setDataAssetsCoverageStates(INITIAL_DATA_ASSETS_COVERAGE_STATES);
    return;
  }
💡 Quality: lineageLookbackDays is DPC-specific but on shared connection

📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/pipeline/matillionConnection.json:42-49

The lineageLookbackDays property is placed at the top-level matillionConnection.json schema, meaning it will be visible for both ETL and DPC connection types. However, its description explicitly says it's for the "Matillion DPC OpenLineage API." If ETL connections don't use lineage lookback, this could confuse users configuring an ETL connection. Consider either:

  1. Moving it inside the matillionDPC.json schema, or
  2. Updating the description to be generic if it applies to both types.
Suggested fix
Either move lineageLookbackDays into matillionDPC.json, or update the description to not mention DPC specifically if it applies to both connection types.
💡 Performance: getCertification queries tag_usage on every entity read

The new getCertification() method executes a SQL query (getCertTagsInternalBatch) per entity on every read. For list/search operations that return many entities, this adds N additional DB queries. The batch-oriented setFieldsInBulk pattern is used elsewhere (e.g., tags, owners) to prefetch in bulk, but certification reads don't appear to have a batch path yet. This may cause N+1 query performance issues on entity listing endpoints.

Suggested fix
Consider adding a batch certification fetch in `setFieldsInBulk` that collects all entity FQNs and calls `getCertTagsInternalBatch` once, similar to how tags are batch-loaded.
✅ 1 resolved
Edge Case: MatillionDPC schema has no required fields, allowing empty config

📄 openmetadata-spec/src/main/resources/json/schema/entity/services/connections/pipeline/matillion/matillionDPC.json:44
The matillionDPC.json schema defines "required": [], meaning a user can create a Matillion DPC connection with no credentials at all — no clientId/clientSecret and no personalAccessToken. This contrasts with the ETL schema which requires hostPort, username, and password.

Since DPC supports two auth methods (OAuth2 client credentials vs. personal access token), at minimum one set should be required. Without any validation, a user could save an empty DPC config and only discover the problem at runtime during ingestion.

Consider using a oneOf to enforce that either OAuth2 credentials or PAT are provided, or at minimum require the type field so the oneOf discriminator in the parent schema can reliably distinguish between ETL and DPC connections.

🤖 Prompt for agents
Code Review: Adds Matillion Data Cloud support but has three blocking issues: `lineageLookbackDays` is DPC-specific on a shared connection, `applyCertification()` passes null for audit fields `appliedAt`/`appliedBy`, and `DataAssetsCoverage` incorrectly parses FQN fields as integers instead of strings.

1. 💡 Quality: lineageLookbackDays is DPC-specific but on shared connection
   Files: openmetadata-spec/src/main/resources/json/schema/entity/services/connections/pipeline/matillionConnection.json:42-49

   The `lineageLookbackDays` property is placed at the top-level `matillionConnection.json` schema, meaning it will be visible for both ETL and DPC connection types. However, its description explicitly says it's for the "Matillion DPC OpenLineage API." If ETL connections don't use lineage lookback, this could confuse users configuring an ETL connection. Consider either:
   1. Moving it inside the `matillionDPC.json` schema, or
   2. Updating the description to be generic if it applies to both types.

   Suggested fix:
   Either move lineageLookbackDays into matillionDPC.json, or update the description to not mention DPC specifically if it applies to both connection types.

2. ⚠️ Bug: applyCertification passes null for appliedAt/appliedBy audit fields

   In `applyCertification()`, the `applyTag` call passes `null` for both `appliedBy` and `appliedAt` parameters (lines 4537-4538). However, `getCertification()` at line 4506 reads `tagLabel.getAppliedAt()` to populate `AssetCertification.appliedDate`. Since these are never set on create, the appliedDate will always be null for certifications applied via the normal (non-migration) code path. The migration code in MigrationUtil.java correctly sets `appliedAt` and `appliedBy='admin'`, creating an inconsistency between migrated and newly created certifications.

   Suggested fix:
   Pass the current timestamp and the updating user to applyTag:
       daoCollection
           .tagUsageDAO()
           .applyTag(
               TagLabel.TagSource.CLASSIFICATION.ordinal(),
               tagLabel.getTagFQN(),
               tagLabel.getTagFQN(),
               entity.getFullyQualifiedName(),
               TagLabel.LabelType.AUTOMATED.ordinal(),
               TagLabel.State.CONFIRMED.ordinal(),
               SecurityContext.getCurrentUser(),
               new Timestamp(System.currentTimeMillis()),
               metadata);

3. ⚠️ Bug: DataAssetsCoverage parses FQN fields as integer counts

   In `DataAssetsCoveragePieChartWidget`, lines 94-95 use `parseInt(coverageData[0].originEntityFQN, 10)` and `parseInt(totalData[0].fullyQualifiedName, 10)` to extract numeric counts. These are parsing entity FQN string fields as integers, which is fragile—if the API response shape changes or the fields contain actual FQN strings (e.g., `"service.db.schema"`), `parseInt` will return `NaN`. This suggests the API returns aggregation results where numeric values are placed in FQN-named fields, but there's no validation for `NaN` before using these values in arithmetic (line 99: `total - covered`).

   Suggested fix:
   Add NaN guards after parsing:
     const covered = parseInt(coverageData[0].originEntityFQN, 10);
     let total = parseInt(totalData[0].fullyQualifiedName, 10);
     if (isNaN(covered) || isNaN(total)) {
       setDataAssetsCoverageStates(INITIAL_DATA_ASSETS_COVERAGE_STATES);
       return;
     }

4. 💡 Performance: getCertification queries tag_usage on every entity read

   The new `getCertification()` method executes a SQL query (`getCertTagsInternalBatch`) per entity on every read. For list/search operations that return many entities, this adds N additional DB queries. The batch-oriented `setFieldsInBulk` pattern is used elsewhere (e.g., tags, owners) to prefetch in bulk, but certification reads don't appear to have a batch path yet. This may cause N+1 query performance issues on entity listing endpoints.

   Suggested fix:
   Consider adding a batch certification fetch in `setFieldsInBulk` that collects all entity FQNs and calls `getCertTagsInternalBatch` once, similar to how tags are batch-loaded.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 18 changed files in this pull request and generated 1 comment.

@github-actions
Copy link
Copy Markdown
Contributor

Failed to cherry-pick changes to the 1.12.4 branch.
Please cherry-pick the changes manually.
You can find more details here.

@sonarqubecloud
Copy link
Copy Markdown

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ingestion safe to test Add this label to run secure Github workflows on PRs To release Will cherry-pick this PR into the release branch

Projects

Status: Done ✅

Development

Successfully merging this pull request may close these issues.

7 participants