Skip to content

[FLINK-39368][connector-http] Add User-Agent configuration support for HTTP sink and lookup#34

Open
featzhang wants to merge 1 commit into
apache:mainfrom
featzhang:feature/add-user-agent-config
Open

[FLINK-39368][connector-http] Add User-Agent configuration support for HTTP sink and lookup#34
featzhang wants to merge 1 commit into
apache:mainfrom
featzhang:feature/add-user-agent-config

Conversation

@featzhang
Copy link
Copy Markdown
Member

@featzhang featzhang commented Mar 15, 2026

What is the purpose of the change

Add a dedicated http.user.agent option so that requests issued by the
HTTP connector carry an identifiable User-Agent header out of the box,
for both the lookup source and the async sink.

Without this option, the underlying Java HttpClient sends requests
without any User-Agent header, which makes server-side logging,
rate limiting and access auditing harder. Users could already override
the header via http.source.lookup.header.User-Agent /
http.sink.header.User-Agent, but there was no sensible default.

Brief change log

  • Add http.user.agent ConfigOption to HttpLookupConnectorOptions
    and HttpDynamicSinkConnectorOptions. Default value is
    flink-connector-http (matches the connector artifact identifier).
  • Apply the option in RequestFactoryBase (lookup) and
    JavaNetSinkHttpClient (sink).
  • Reject setting User-Agent simultaneously via http.user.agent and
    via the corresponding http.*.header.User-Agent entry with a clear
    IllegalArgumentException.
  • Register both SINK_HEADERS / SOURCE_LOOKUP_HEADERS (from
    FLINK-HTTP-3) and the new USER_AGENT option in the factory
    optionalOptions() lists.
  • Document the new option in the lookup and sink options tables and
    add a short User-Agent subsection with an example DDL.
  • Tests cover default value, custom value and the conflict path.

Verifying this change

  • JavaNetSinkHttpClientTest#shouldBuildClientWithoutHeaders now
    asserts that the default User-Agent: flink-connector-http is
    attached when no header configuration is given.
  • JavaNetSinkHttpClientTest#shouldBuildClientWithCustomUserAgent and
    JavaNetHttpPollingClientTest#shouldBuildClientWithCustomUserAgent
    assert that a user-provided value is honoured.
  • JavaNetSinkHttpClientTest#shouldThrowErrorWhenUserAgentConflict
    and the lookup counterpart assert that the conflict detection fires
    with a helpful message.

Does this pull request potentially affect one of the following parts:

  • Dependencies: no
  • The public API: yes — new http.user.agent ConfigOption
  • The runtime per-record code paths: only by attaching one extra
    header to outgoing requests
  • Anything that affects deployment or recovery: no

Documentation

  • Does this pull request introduce a new feature: yes
  • If yes, how is the feature documented: in
    docs/content/docs/connectors/table/http.md and the content.zh
    equivalent, under the lookup options table, sink options table and
    a dedicated User-Agent subsection

@featzhang featzhang force-pushed the feature/add-user-agent-config branch from a6f4c6a to 5e11e4b Compare March 15, 2026 03:15
@featzhang featzhang force-pushed the feature/add-user-agent-config branch from 5e11e4b to 0c95cb3 Compare March 31, 2026 11:22
@featzhang featzhang changed the title [connector-HTTP] Add User-Agent configuration support [FLINK-39368][connector-http] Add User-Agent configuration support for HTTP sink and lookup Mar 31, 2026
Copy link
Copy Markdown
Contributor

@davidradl davidradl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add documentation

@featzhang
Copy link
Copy Markdown
Member Author

Hi @davidradl,

I've added comprehensive documentation for the http.user.agent configuration option in commit dddfa5b.

What's Added:

  1. New "User-Agent Configuration" section - Added detailed explanation after the configuration options table in both English and Chinese documentation
  2. Usage scenarios - Explained when and why to use this option
  3. Configuration examples - Provided complete SQL examples for both lookup and sink tables
  4. Important notes - Documented the conflict detection behavior and format recommendations

The documentation now includes:

  • Clear explanation of default behavior (flink-connector-http)
  • Real-world configuration examples showing the option in context
  • Warning about not setting User-Agent in both http.user.agent and http.headers
  • HTTP User-Agent format best practices

Please let me know if you'd like any additional documentation or clarification!

@featzhang featzhang force-pushed the feature/add-user-agent-config branch from dddfa5b to e4c988d Compare May 3, 2026 10:09
@featzhang
Copy link
Copy Markdown
Member Author

Thanks for the review. Pushed a rebuilt commit that squashes the history into a single clean commit on top of current main and addresses the outstanding feedback:

  • Default value is flink-connector-http (per your earlier comment on the getindata identifier).
  • Conflict with http.sink.header.User-Agent / http.source.lookup.header.User-Agent is now rejected at table creation time with a clear IllegalArgumentException, instead of silently letting one value win.
  • Rebased onto main, so both the new USER_AGENT option and the FLINK-HTTP-3 map-type header options (SINK_HEADERS, SOURCE_LOOKUP_HEADERS) are registered in the factories side by side — no regression on FLINK-HTTP-3.
  • Fixed the Java 17 CI failure (missing assertThatThrownBy static import in the two new tests).
  • Added documentation: a row in the lookup and sink options tables plus a short User-Agent subsection with an example DDL in both content and content.zh.

Local mvn spotless:check and mvn test -pl flink-connector-http are green (396 tests, 0 failures, 0 errors).

Introduce a new `http.user.agent` option for both the HTTP lookup
source and the HTTP sink so that every outgoing request carries an
identifiable User-Agent header by default.

Behaviour:
* Default value is `flink-connector-http` (matches the connector
  artifact identifier).
* Setting User-Agent simultaneously via `http.user.agent` and via
  the existing per-prefix header option
  (`http.source.lookup.header.User-Agent` or
  `http.sink.header.User-Agent`) is rejected at table creation
  time with an IllegalArgumentException, to avoid silently losing
  one of the two values.

Documentation:
* Add the new option to the lookup and sink options tables.
* Add a dedicated `User-Agent` subsection with an example DDL.

Tests:
* `JavaNetSinkHttpClientTest` and `JavaNetHttpPollingClientTest`
  cover the default value, a custom value and the conflict path.
@featzhang featzhang force-pushed the feature/add-user-agent-config branch from e4c988d to 31fc409 Compare May 3, 2026 10:27
@featzhang
Copy link
Copy Markdown
Member Author

Correction on my previous comment: the force-pushed commit is now properly based on the current apache/flink-connector-http main (HEAD 31fc409, on top of 813fd5d [FLINK-39364] Add request timeout). The earlier push was accidentally rebased onto my own fork's main, which contained some FLINK-HTTP-3 work that is not yet merged upstream — that's what GitHub reported as CONFLICTING. PR is now MERGEABLE and CI has been triggered.

@featzhang
Copy link
Copy Markdown
Member Author

Hi @davidradl, friendly ping — all three of your review threads on this PR are now marked as resolved and the latest commit (31fc409) addresses each of them:

Thread Fix
JavaNetSinkHttpClient.java L76 — "why a dedicated config, prefix headers already work" Kept the dedicated ConfigOption so there is a named, documented entry point and a sensible default (requests no longer go out with no User-Agent at all).
HttpDynamicSinkConnectorOptions.java L51 — "default to flink-connector-http" Default changed to flink-connector-http on both sink and lookup sides.
JavaNetSinkHttpClientTest.java L143 — "config error on User-Agent clash" Setting User-Agent via both http.user.agent and `http.(sink

Plus: documentation rows + a User-Agent subsection added in docs/content and docs/content.zh, Java 17 compile fix, rebased on current main (813fd5d).

CI (GitHub Actions Java 11 + 17) is green. PR is MERGEABLE. Would you mind taking another look when you have a moment? Happy to iterate further if anything still looks off.


Setting `User-Agent` through both `http.user.agent` and `http.source.lookup.header.User-Agent`
(or `http.sink.header.User-Agent` for the sink) is ambiguous and is rejected at table creation time
with an `IllegalArgumentException`. Configure it through exactly one of the two options.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove the word exactly

| http.request.query-param-fields-with-key | optional | A map of column names to query parameter keys. See the [Query Parameter Mapping](#query-parameter-mapping) section for details and examples. |
| http.request.body-template | optional | Used for the `http-generic-json-url` query creator. A JSON template string for constructing the request body for PUT and POST operations. Use `{{fieldName}}` placeholders to reference top-level columns from the lookup table. Supports creating complex nested JSON structures with both placeholders and literal values. See the [Body Template](#body-template) section for details and examples. |
| http.request.url-map | optional | Used for the `http-generic-json-url` query creator. The map of insert names to column names used as url segments. Parses a string as a map of strings. For example if there are table columns called `customerId` and `orderId`, then specifying value `customerId:cid,orderID:oid` and a url of https://myendpoint/customers/{cid}/orders/{oid} will mean that the url used for the lookup query will dynamically pickup the values for `customerId`, `orderId` and use them in the url e.g. https://myendpoint/customers/cid1/orders/oid1. The expected format of the map is: `key1:value1,key2:value2`. |
| http.user.agent | optional | The value of the `User-Agent` HTTP header sent with every lookup request. Defaults to `flink-connector-http`. Setting this option together with `http.source.lookup.header.User-Agent` is not allowed and will fail with a configuration error. |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put the new option near the top around url, so we start with all the common configuration options then specialize to the sink or lookup appropriately when we list them

properties.getProperty(
HttpDynamicSinkConnectorOptions.USER_AGENT.key(),
HttpDynamicSinkConnectorOptions.USER_AGENT.defaultValue());
if (userAgent != null && !userAgent.isEmpty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

userAgent will always have a value here - as it will be the user specified one or the default. So there is no way to specify http.sink.header.User-Agent and have it work after this change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants