Skip to content

Properly handle bad characters in stream names#350

Merged
matteius merged 2 commits intoopensensor:mainfrom
dpw13:fix/stream-name-spaces
Apr 2, 2026
Merged

Properly handle bad characters in stream names#350
matteius merged 2 commits intoopensensor:mainfrom
dpw13:fix/stream-name-spaces

Conversation

@dpw13
Copy link
Copy Markdown
Contributor

@dpw13 dpw13 commented Apr 1, 2026

  • Factor out path_utils.c:sanitize_stream_name() for path name and MQTT topics
  • Factor out url_utils.c:simple_url_escape() for URLs
  • Call url_decode() in http_request_extract_path_param()
  • Remove duplicate calls to url_decode()
  • Remove duplicate url_decode() prototype in web/web_server.h
  • Use MAX_STREAM_NAME and MAX_PATH_LENGTH consistently instead of integer literals
  • Remove inconsistent MAX_STREAM_NAME_LENGTH from storage_manager.c
  • Add and use MAX_TOPIC_LENGTH in MQTT instead of integer literals

This is a general cleanup of string handling around paths and URLs in addition to specifically handling input validation and encoding of stream names.

I tested locally and was able to confirm that motion detection, stream editing, and live viewing all worked as expected and produced no additional errors. I verified that viewing previous recordings worked, though my previous streams did not include spaces. Users may need to move their recordings to the new recording directory (stream name lowercased with '_' instead of spaces), though I kind of doubt that stream names with spaces were working enough to make recordings at all previously.

Should fix #324.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens and standardizes stream-name handling across LightNVR by centralizing URL escaping, URL decoding, and filesystem-safe stream naming—aiming to make names with spaces/special characters work reliably across HTTP APIs, go2rtc integration, MQTT topics, and recording paths (related to issue #324).

Changes:

  • Centralizes URL escaping (simple_url_escape) and stream-name sanitization for path/topic use (sanitize_stream_name), and replaces many ad-hoc literals with MAX_STREAM_NAME / MAX_PATH_LENGTH.
  • Moves URL decoding responsibility into request helpers (http_request_get_query_param, http_request_extract_path_param) and removes duplicate decoding in handlers.
  • Updates go2rtc URL construction and recording path generation to better tolerate special characters in stream names.

Reviewed changes

Copilot reviewed 37 out of 37 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
tests/unit/test_db_streams.c Updates test buffer sizing to MAX_STREAM_NAME.
src/web/thumbnail_thread.c Replaces hard-coded path buffer sizes with MAX_PATH_LENGTH.
src/web/request_response.c Changes path-param extraction to URL-decode parameters.
src/web/api_handlers_timeline.c Escapes stream name when generating timeline playback URL in manifest.
src/web/api_handlers_system.c Uses MAX_PATH_LENGTH for backup path buffer.
src/web/api_handlers_streams_modify.c Removes per-handler decoding and uses decoded path param directly.
src/web/api_handlers_streams_get.c Removes duplicate decoding; uses decoded path param directly.
src/web/api_handlers_retention.c Uses MAX_STREAM_NAME and removes duplicate decoding.
src/web/api_handlers_recordings_thumbnail.c Uses MAX_PATH_LENGTH for thumbnail paths.
src/web/api_handlers_recordings_list.c Removes redundant query-param decoding (relies on helper).
src/web/api_handlers_recordings_files_backend_agnostic.c Removes redundant query-param decoding; uses MAX_PATH_LENGTH.
src/web/api_handlers_motion.c Uses MAX_PATH_LENGTH and removes duplicate stream-name decoding.
src/video/onvif_motion_recording.c Sanitizes stream name for filesystem recording path creation.
src/video/mp4_recording_core.c Sanitizes stream name for MP4 directory paths.
src/video/motion_storage_manager.c Uses MAX_PATH_LENGTH for recording path arrays.
src/video/go2rtc/go2rtc_stream.c Switches stream-id encoding to simple_url_escape for go2rtc URLs.
src/video/go2rtc/go2rtc_snapshot.c Escapes stream name when calling go2rtc snapshot API.
src/video/go2rtc/go2rtc_integration.c Escapes stream name in go2rtc integration URLs.
src/video/go2rtc/go2rtc_api.c Centralizes escaping for go2rtc API query/path construction.
src/video/buffer_strategy_mmap.c Uses MAX_PATH_LENGTH for mmap file paths.
src/video/buffer_strategy_hls_segment.c Uses MAX_PATH_LENGTH for segment paths.
src/video/buffer_strategy_go2rtc.c Escapes stream name in go2rtc HLS session init URL.
src/storage/storage_manager.c Uses MAX_STREAM_NAME / MAX_PATH_LENGTH; sanitizes directory names.
src/storage/storage_manager_streams.c Uses MAX_STREAM_NAME for stream-name arrays.
src/database/db_streams.c Updates get_all_stream_names signature to MAX_STREAM_NAME.
src/core/url_utils.c Adds simple_url_escape() helper.
src/core/path_utils.c Adds sanitize_stream_name() helper (new file).
src/core/mqtt_client.c Uses centralized sanitize_stream_name and MAX_TOPIC_LENGTH.
src/core/logger.c Uses MAX_PATH_LENGTH for log rotation paths.
src/core/logger_json.c Uses MAX_PATH_LENGTH for JSON log rotation paths.
include/web/web_server.h Removes duplicate url_decode prototype.
include/web/request_response.h Updates docs to reflect URL-decoding behavior in helpers.
include/video/pre_detection_buffer.h Uses MAX_PATH_LENGTH in segment path struct.
include/video/packet_buffer.h Uses MAX_PATH_LENGTH for disk buffer paths.
include/database/db_streams.h Updates get_all_stream_names signature to MAX_STREAM_NAME.
include/core/url_utils.h Declares simple_url_escape.
include/core/path_utils.h Declares sanitize_stream_name (new header).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +9 to +21
void sanitize_stream_name(const char *input, char *output, size_t output_size) {
size_t i = 0;
for (; i < output_size - 1 && input[i] != '\0'; i++) {
char c = input[i];
if (islower(c) || isdigit(c)) {
output[i] = c;
} else if (isupper(c)) {
output[i] = tolower(c);
} else {
output[i] = '_';
}
}
output[i] = '\0';
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitize_stream_name() is not injective (different names can map to the same sanitized value). Now that this output is used for filesystem paths, two streams can collide into the same directory and overwrite/mix recordings. Consider a reversible encoding for path components, or append a stable unique suffix (DB id / hash) to guarantee uniqueness.

Suggested change
void sanitize_stream_name(const char *input, char *output, size_t output_size) {
size_t i = 0;
for (; i < output_size - 1 && input[i] != '\0'; i++) {
char c = input[i];
if (islower(c) || isdigit(c)) {
output[i] = c;
} else if (isupper(c)) {
output[i] = tolower(c);
} else {
output[i] = '_';
}
}
output[i] = '\0';
static unsigned int sanitize_stream_name_hash(const char *input) {
/* FNV-1a 32-bit hash for a stable, deterministic suffix */
const unsigned int fnv_offset_basis = 2166136261u;
const unsigned int fnv_prime = 16777619u;
unsigned int hash = fnv_offset_basis;
for (const unsigned char *p = (const unsigned char *)input; *p != '\0'; ++p) {
hash ^= *p;
hash *= fnv_prime;
}
return hash;
}
void sanitize_stream_name(const char *input, char *output, size_t output_size) {
if (output_size == 0) {
return;
}
/* First, build the legacy sanitized base name into a temporary buffer. */
char *base = (char *)malloc(output_size);
if (base == NULL) {
/* Fallback: best effort directly into output using legacy behavior. */
size_t i = 0;
for (; i < output_size - 1 && input[i] != '\0'; i++) {
char c = input[i];
if (islower((unsigned char)c) || isdigit((unsigned char)c)) {
output[i] = c;
} else if (isupper((unsigned char)c)) {
output[i] = (char)tolower((unsigned char)c);
} else {
output[i] = '_';
}
}
output[i] = '\0';
return;
}
size_t i = 0;
for (; i < output_size - 1 && input[i] != '\0'; i++) {
char c = input[i];
if (islower((unsigned char)c) || isdigit((unsigned char)c)) {
base[i] = c;
} else if (isupper((unsigned char)c)) {
base[i] = (char)tolower((unsigned char)c);
} else {
base[i] = '_';
}
}
base[i] = '\0';
/* Append a stable hash suffix to make the mapping practically injective. */
unsigned int hash = sanitize_stream_name_hash(input);
(void)snprintf(output, output_size, "%s__%08x", base, hash);
free(base);

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matteius what do you think about this? I realized last night that the current implementation will change the recording path for any stream names with uppercase letters, and the change above will change the recording path for all streams going forward. I do think it's a good change (my comment above even pointed this out already), but I'll leave it up to you whether the change to the recording path is tolerable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should use the CaseSensitive name and try to be as backwards compatible as we can.

@matteius
Copy link
Copy Markdown
Contributor

matteius commented Apr 1, 2026

@dpw13 let me know if you can tackle any of the co-pilot PR feedback but this looks good, and Ill get it merged.

@dpw13 dpw13 force-pushed the fix/stream-name-spaces branch from af2f37c to fb0f5b1 Compare April 1, 2026 14:11
* Factor out path_utils.c:sanitize_stream_name() for path name and MQTT topics
* Factor out url_utils.c:simple_url_escape() for URLs
* Call url_decode() in http_request_extract_path_param()
* Remove duplicate calls to url_decode()
* Remove duplicate url_decode() prototype in web/web_server.h
* Use MAX_STREAM_NAME and MAX_PATH_LENGTH consistently instead of integer literals
* Remove inconsistent MAX_STREAM_NAME_LENGTH from storage_manager.c
* Add and use MAX_TOPIC_LENGTH in MQTT instead of integer literals

This is a general cleanup of string handling around paths and URLs in addition
to specifically handling input validation and encoding of stream names.
@dpw13 dpw13 force-pushed the fix/stream-name-spaces branch from fb0f5b1 to a23aef4 Compare April 1, 2026 23:20
@dpw13
Copy link
Copy Markdown
Contributor Author

dpw13 commented Apr 1, 2026

I noticed a number of additional functions where stream_id or stream_name wasn't being properly escaped. I think in particular some of the HLS code is now updated, and I can confirm that output folders are now safe.

I also modified sanitize_stream_name() to allow uppercase and dashes, both of which do appear to be commonly used in MQTT topics. This should also ensure that any old output directories for streams with CamelCase will remain the same.

I believe this PR is ready, but I'm fine if you want to run it by Copilot one more time to make sure I didn't miss anything.

@matteius
Copy link
Copy Markdown
Contributor

matteius commented Apr 1, 2026

This is grade A work, even if there is something we didn't catch -- thank you so much for putting this together. I'll watch for the updated copilot comments as well.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 46 out of 46 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (2)

src/video/go2rtc/go2rtc_stream.c:118

  • go2rtc_stream_register() now percent-escapes stream_id into encoded_stream_id and then passes encoded_stream_id into go2rtc_api_add_stream(_multi). However go2rtc_api_add_stream(_multi) already escapes the stream_id again for the HTTP request, so this results in double-encoding (e.g., space -> %20 -> %2520). This can register a different stream name than intended and also breaks the ffmpeg:%s#audio=aac self-reference. Pass the raw stream_id into go2rtc_api_add_stream(_multi) and only escape at the point of constructing HTTP URLs/paths.
    // Sanitize the stream name so it can be safely used in a URL. Note that URL-encoding
    // the spaces results in go2rtc complaining with "source with spaces may be insecure",
    // so we strip any problematic characters from the string.
    char encoded_stream_id[URL_BUFFER_SIZE * 3];
    simple_url_escape(stream_id, encoded_stream_id, URL_BUFFER_SIZE * 3);

    // Ensure go2rtc is running
    if (!go2rtc_stream_is_ready()) {
        log_info("go2rtc not running, starting service");
        if (!go2rtc_stream_start_service()) {
            log_error("Failed to start go2rtc service");
            return false;

src/video/go2rtc/go2rtc_stream.c:226

  • Related to the double-encoding issue: ffmpeg_aac_source is built with encoded_stream_id and then encoded again inside go2rtc_api_add_stream_multi (it escapes each source). That turns the internal stream reference into a different literal name and is likely to fail to resolve the source for transcoding/recording. Build the ffmpeg source using the raw stream_id (or otherwise ensure it’s encoded exactly once, consistently with how go2rtc interprets stream names).
        // Build the FFmpeg AAC transcoding source URL for recording.
        // Audio-only: FFmpeg reads from go2rtc's internal RTSP, transcodes to
        // AAC, and publishes back.  go2rtc will transcode to OPUS for WebRTC
        // on demand without a separate persistent ffmpeg process.
        char ffmpeg_aac_source[URL_BUFFER_SIZE];
        snprintf(ffmpeg_aac_source, URL_BUFFER_SIZE, "ffmpeg:%s#audio=aac", encoded_stream_id);

        const char *sources[2] = { modified_url, ffmpeg_aac_source };
        result = go2rtc_api_add_stream_multi(encoded_stream_id, sources, 2);

        if (result) {
            log_info("Successfully registered stream with go2rtc (with AAC audio for recording): %s", encoded_stream_id);
        } else {
            log_error("Failed to register stream with go2rtc (with AAC audio): %s", encoded_stream_id);
            // Fall back to primary RTSP source only
            log_warn("Falling back to single source registration without audio transcoding");
            result = go2rtc_api_add_stream(encoded_stream_id, modified_url);

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +6 to +14
/**
* Sanitize a stream name for use as a path or object name.
* Replaces non-alphanumeric characters with underscores and lowercases.
*
* @param input The input string
* @param output The output buffer
* @param output_size The size of the output buffer
*/
void sanitize_stream_name(const char *input, char *output, size_t output_size);
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitize_stream_name()’s doc/comments say it lowercases output, but the implementation preserves case. Either update the implementation to match (tolower on letters) or correct the documentation so callers (e.g., MQTT/HA discovery) don’t assume stable lowercase identifiers.

Copilot uses AI. Check for mistakes.
Comment on lines +9 to +21
void sanitize_stream_name(const char *input, char *output, size_t output_size) {
size_t i = 0;
for (; i < output_size - 1 && input[i] != '\0'; i++) {
// Make sure non-ASCII values are converted to positive inputs to ctype functions
unsigned char c = (unsigned char) input[i];
// Allowed characters: [A-Za-z0-9_\-]
if (isalpha(c) || isdigit(c) || c == '-') {
output[i] = c;
} else {
output[i] = '_';
}
}
output[i] = '\0';
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sanitize_stream_name() does not handle output_size==0: output_size - 1 underflows and the loop can write out of bounds. Add an early return when output_size is 0 (and ideally when output is NULL) to make the helper safe for all call sites.

Copilot uses AI. Check for mistakes.
Comment on lines +125 to +145
void test_extract_path_param_encoded(void) {
http_request_t req;
http_request_init(&req);
strncpy(req.path, "/api/streams/four%20score%20and%202", sizeof(req.path) - 1);

char param[32];
int rc = http_request_extract_path_param(&req, "/api/streams/", param, sizeof(param));
TEST_ASSERT_EQUAL_INT(0, rc);
TEST_ASSERT_EQUAL_STRING("four score and 2", param);
}

void test_extract_path_param_with_query(void) {
http_request_t req;
http_request_init(&req);
strncpy(req.path, "/api/streams/42?test=yes", sizeof(req.path) - 1);

char param[32];
int rc = http_request_extract_path_param(&req, "/api/streams/", param, sizeof(param));
TEST_ASSERT_EQUAL_INT(0, rc);
TEST_ASSERT_EQUAL_STRING("42", param);
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests cover (1) percent-encoded path params and (2) raw params with a query string, but they don’t cover the combination (encoded param followed by '?...'), which is the case that can leak query bytes into the decoded output. Add a test like /api/streams/four%20score?x=y expecting four score to prevent regressions in http_request_extract_path_param().

Copilot uses AI. Check for mistakes.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@matteius matteius merged commit 640b007 into opensensor:main Apr 2, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to get ONVIF working

3 participants