
Commit ffe96c3

feat: mentions handling on tinybird (#3614)
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
1 parent 8fdd59e commit ffe96c3

7 files changed

Lines changed: 127 additions & 18 deletions

services/libs/tinybird/datasources/insights_projects_populated_ds.datasource

Lines changed: 4 additions & 1 deletion
@@ -54,7 +54,10 @@ SCHEMA >
     `softwareValue` UInt64,
     `contributorCount` UInt64,
     `organizationCount` UInt64,
-    `healthScore` Float64
+    `healthScore` Float64,
+    `communityPlatforms` Array(String),
+    `communityKeywords` Array(String),
+    `communityLanguages` Array(String)
 
 ENGINE MergeTree
 ENGINE_PARTITION_KEY toYear(createdAt)
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+DESCRIPTION >
+    - `mentions` contains community mentions from various sources tracked via the Octolens integration.
+    - This raw datasource exists only in Tinybird; rows are pushed directly by the Octolens webhook processing.
+    - Tracks mentions across platforms like Reddit, HackerNews, Twitter, and other community sources.
+    - Includes sentiment analysis and relevance scoring for each mention.
+    - `sourceId` is the unique identifier from the source platform.
+    - `url` is the direct link to the mention on the source platform.
+    - `timestamp` is when the mention occurred on the source platform.
+    - `source` indicates the source platform (reddit, hackernews, twitter, etc.), stored as LowCardinality.
+    - `author` is the username/display name of the person who created the mention.
+    - `authorProfileLink` is the URL to the author's profile on the source platform.
+    - `title` contains the mention's title or subject line.
+    - `body` contains the full text content of the mention.
+    - `imageUrl` contains the URL to any associated image (empty string if not available).
+    - `relevanceScore` is the computed relevance score from Octolens (string representation).
+    - `relevanceComment` contains the explanation for the relevance score.
+    - `keyword` is the keyword that triggered this mention match.
+    - `sentimentLabel` provides the sentiment classification (positive, negative, neutral, mixed).
+    - `subreddit` contains the subreddit name for Reddit mentions (empty string for other sources).
+    - `viewId` is the Octolens view identifier that captured this mention.
+    - `viewName` is the human-readable name of the Octolens view.
+    - `projectSlug` identifies which project this mention belongs to.
+    - `createdAt` is the timestamp when the record was created in Tinybird.
+
+TAGS "Octolens integration", "Community", "Sentiment analysis"
+
+SCHEMA >
+    `sourceId` String `json:$.sourceId` DEFAULT '',
+    `url` String `json:$.url` DEFAULT '',
+    `timestamp` DateTime `json:$.timestamp`,
+    `source` LowCardinality(String) `json:$.source` DEFAULT '',
+    `author` String `json:$.author` DEFAULT '',
+    `authorProfileLink` String `json:$.authorProfileLink` DEFAULT '',
+    `title` String `json:$.title` DEFAULT '',
+    `body` String `json:$.body` DEFAULT '',
+    `imageUrl` String `json:$.imageUrl` DEFAULT '',
+    `relevanceScore` String `json:$.relevanceScore` DEFAULT '',
+    `relevanceComment` String `json:$.relevanceComment` DEFAULT '',
+    `keyword` String `json:$.keyword` DEFAULT '',
+    `sentimentLabel` LowCardinality(String) `json:$.sentimentLabel` DEFAULT '',
+    `subreddit` String `json:$.subreddit` DEFAULT '',
+    `viewId` Int64 `json:$.viewId` DEFAULT 0,
+    `viewName` String `json:$.viewName` DEFAULT '',
+    `language` String `json:$.language` DEFAULT '',
+    `projectSlug` LowCardinality(String) `json:$.projectSlug` DEFAULT '',
+    `createdAt` DateTime64(3) `json:$.createdAt` DEFAULT now64(3),
+    `bookmarked` UInt8 `json:$.bookmarked`,
+    `keywords` Array(String) `json:$.keywords[:]`
+
+ENGINE ReplacingMergeTree
+ENGINE_PARTITION_KEY toYear(timestamp)
+ENGINE_SORTING_KEY projectSlug, timestamp, sourceId
+ENGINE_VER createdAt
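The `mentions` datasource uses `ReplacingMergeTree` with `ENGINE_VER createdAt`. Rows that share the sorting key (`projectSlug`, `timestamp`, `sourceId`) are collapsed at merge time, and `FINAL` applies the same collapsing at query time. The Python sketch below models that rule on in-memory rows. It approximates ClickHouse behaviour for illustration only. `MentionRow` and `replacing_final` are hypothetical names and are not part of the codebase.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MentionRow:
    """Hypothetical subset of the `mentions` columns relevant to deduplication."""

    projectSlug: str
    timestamp: datetime
    sourceId: str
    createdAt: datetime
    sentimentLabel: str = ""


def replacing_final(rows):
    """Approximate `SELECT ... FROM mentions FINAL` on a ReplacingMergeTree.

    Rows sharing the sorting key (projectSlug, timestamp, sourceId) collapse to
    one. The row with the highest createdAt (the ENGINE_VER column) wins. On a
    tie, the later row in `rows` wins, standing in for "last inserted".
    """
    latest = {}
    for row in rows:
        key = (row.projectSlug, row.timestamp, row.sourceId)
        kept = latest.get(key)
        if kept is None or row.createdAt >= kept.createdAt:
            latest[key] = row
    # Sorted by the sorting key only to make the output deterministic.
    return sorted(latest.values(), key=lambda r: (r.projectSlug, r.timestamp, r.sourceId))
```

Because `timestamp` is part of the sorting key, a mention delivered twice is collapsed only if both deliveries carry the same `timestamp`. The yearly partition key `toYear(timestamp)` never separates such duplicates, since equal timestamps always fall in the same year.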
Lines changed: 11 additions & 6 deletions
@@ -1,12 +1,17 @@
 NODE health_score_select_fields
 SQL >
+    SELECT
+        id,
+        segmentId,
+        slug,
+        if(isNaN(overallScore), null, overallScore) as overallScore,
+        toStartOfDay(now()) as date
+    FROM health_score_copy_ds
 
-    SELECT id, segmentId, slug, if (isNaN(overallScore), null, overallScore) as overallScore, toStartOfDay(now()) as date FROM health_score_copy_ds
-
-TYPE sink
+TYPE SINK
 EXPORT_SERVICE kafka
 EXPORT_CONNECTION_NAME lfx-oracle-kafka-streaming
-EXPORT_KAFKA_TOPIC health_score_sink
 EXPORT_SCHEDULE 30 0 * * *
-
-
+EXPORT_FORMAT csv
+EXPORT_STRATEGY @new
+EXPORT_KAFKA_TOPIC health_score_sink
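The `health_score_select_fields` node replaces NaN scores with `null` and stamps every exported row with `toStartOfDay(now())`. The sketch below mirrors that projection in Python for in-memory rows. `select_health_score_fields` is a hypothetical helper. Computing "today" in UTC is an assumption: ClickHouse evaluates `now()` and `toStartOfDay` in the server time zone.

```python
import math
from datetime import datetime, timezone


def start_of_day(moment):
    # Counterpart of ClickHouse toStartOfDay(): truncate to midnight.
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def select_health_score_fields(rows, now=None):
    """Mirror the health_score_select_fields projection on in-memory dicts."""
    if now is None:
        now = datetime.now(timezone.utc)  # assumption: "today" is taken in UTC
    date = start_of_day(now)
    selected = []
    for row in rows:
        score = row["overallScore"]
        selected.append({
            "id": row["id"],
            "segmentId": row["segmentId"],
            "slug": row["slug"],
            # if(isNaN(overallScore), null, overallScore)
            "overallScore": None if math.isnan(score) else score,
            "date": date,
        })
    return selected
```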

services/libs/tinybird/pipes/insightsProjects_filtered.pipe

Lines changed: 8 additions & 5 deletions
@@ -1,9 +1,6 @@
 DESCRIPTION >
     Provides filters for projects. Merges collection slug from associated collections. Merges segment aggregates from segmentsAggregatedMV
 
-TOKEN "insights-app-token" READ
-TOKEN "insighsProjects_filtered_endpoint_read_2583" READ
-
 NODE insightsProjects_filtered_1
 SQL >
     %
@@ -31,7 +28,10 @@ SQL >
         insights_projects_populated_ds.connectedPlatforms,
         insights_projects_populated_ds.firstCommit,
         insights_projects_populated_ds.repoData,
-        insights_projects_populated_ds.healthScore
+        insights_projects_populated_ds.healthScore,
+        insights_projects_populated_ds.communityPlatforms,
+        insights_projects_populated_ds.communityKeywords,
+        insights_projects_populated_ds.communityLanguages
     FROM insights_projects_populated_ds
     where
         insights_projects_populated_ds.enabled = 1
@@ -92,4 +92,7 @@ SQL >
         insights_projects_populated_ds.connectedPlatforms,
         insights_projects_populated_ds.firstCommit,
         insights_projects_populated_ds.repoData,
-        insights_projects_populated_ds.healthScore
+        insights_projects_populated_ds.healthScore,
+        insights_projects_populated_ds.communityPlatforms,
+        insights_projects_populated_ds.communityKeywords,
+        insights_projects_populated_ds.communityLanguages
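This change drops the pipe-level `TOKEN` declarations and adds `communityPlatforms`, `communityKeywords` and `communityLanguages` to the rows the pipe returns. A client reading the pipe as a published endpoint would find these columns in the `data` array of Tinybird's JSON response. The sketch rests on three assumptions:

- The pipe is published under its file name.
- The workspace uses the default `api.tinybird.co` host.
- The caller holds a token with read access granted outside this file.

`build_request` and `community_fields` are hypothetical helpers.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the workspace lives on the default Tinybird API host.
API_HOST = "https://api.tinybird.co"


def build_request(pipe, token, params=None):
    """Build (but do not send) a GET request for a published pipe endpoint."""
    url = f"{API_HOST}/v0/pipes/{urllib.parse.quote(pipe)}.json"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def community_fields(response_body):
    """Pull the three new columns out of each row of a JSON endpoint response."""
    payload = json.loads(response_body)
    return [
        {
            "platforms": row.get("communityPlatforms", []),
            "keywords": row.get("communityKeywords", []),
            "languages": row.get("communityLanguages", []),
        }
        for row in payload["data"]
    ]
```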

services/libs/tinybird/pipes/insights_projects_populated_copy.pipe

Lines changed: 17 additions & 1 deletion
@@ -121,6 +121,16 @@ SQL >
     WHERE archived = true OR excluded = true
     GROUP BY segmentId, insightsProjectId
 
+NODE insights_projects_populated_copy_mentions
+SQL >
+    SELECT
+        projectSlug,
+        groupArrayIf(DISTINCT source, source != '') as communityPlatforms,
+        groupArrayIf(DISTINCT keyword, keyword != '') as communityKeywords,
+        groupArrayIf(DISTINCT language, language != '') as communityLanguages
+    FROM mentions FINAL
+    GROUP BY projectSlug
+
 NODE insights_projects_populated_copy_results
 DESCRIPTION >
     Join everything together
@@ -156,7 +166,10 @@ SQL >
         insights_projects_populated_copy_aggregates.organizationCount as organizationCount,
         insights_projects_populated_copy_health_score_deduplicated.healthScore as healthScore,
         archived_excluded_repositories.archivedRepositories as archivedRepositories,
-        archived_excluded_repositories.excludedRepositories as excludedRepositories
+        archived_excluded_repositories.excludedRepositories as excludedRepositories,
+        insights_projects_populated_copy_mentions.communityPlatforms as communityPlatforms,
+        insights_projects_populated_copy_mentions.communityKeywords as communityKeywords,
+        insights_projects_populated_copy_mentions.communityLanguages as communityLanguages
     FROM insightsProjects FINAL
     LEFT JOIN
         insights_projects_populated_copy_collections_slugs
@@ -179,6 +192,9 @@ SQL >
     LEFT JOIN
         archived_excluded_repositories
         ON archived_excluded_repositories.insightsProjectId = insightsProjects.id
+    LEFT JOIN
+        insights_projects_populated_copy_mentions
+        ON insights_projects_populated_copy_mentions.projectSlug = insightsProjects.slug
     WHERE isNull (insightsProjects.deletedAt)
 
 TYPE COPY
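The new `insights_projects_populated_copy_mentions` node collects the distinct non-empty `source`, `keyword` and `language` values from `mentions FINAL` for each `projectSlug`. The results node then `LEFT JOIN`s them on `insightsProjects.slug`. A Python model of both steps follows, with three caveats:

- The helper names are hypothetical.
- The output is sorted only to make it deterministic, because ClickHouse does not guarantee the order of `groupArray` elements.
- The empty-array fallback assumes ClickHouse's default `join_use_nulls = 0`.

```python
from collections import defaultdict

# (mentions column, aggregated output column) pairs from the node above.
COMMUNITY_COLUMNS = (
    ("source", "communityPlatforms"),
    ("keyword", "communityKeywords"),
    ("language", "communityLanguages"),
)


def aggregate_mentions(mentions):
    """Model groupArrayIf(DISTINCT col, col != '') ... GROUP BY projectSlug."""
    buckets = defaultdict(lambda: {target: set() for _, target in COMMUNITY_COLUMNS})
    for mention in mentions:
        bucket = buckets[mention["projectSlug"]]
        for column, target in COMMUNITY_COLUMNS:
            value = mention.get(column, "")
            if value != "":  # the `col != ''` condition of the -If combinator
                bucket[target].add(value)
    # Sorted only for deterministic output; ClickHouse gives no order guarantee.
    return {
        slug: {target: sorted(values) for target, values in bucket.items()}
        for slug, bucket in buckets.items()
    }


def join_projects(projects, aggregates):
    """Model LEFT JOIN ... ON mentions.projectSlug = insightsProjects.slug."""
    joined = []
    for project in projects:
        # Unmatched projects get empty arrays (type defaults with join_use_nulls = 0).
        empty = {target: [] for _, target in COMMUNITY_COLUMNS}
        joined.append({**project, **aggregates.get(project["slug"], empty)})
    return joined
```

The join compares `projectSlug` with `slug`. Mentions whose `projectSlug` matches no project slug therefore never reach `insights_projects_populated_ds`.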
Lines changed: 4 additions & 5 deletions
@@ -1,13 +1,12 @@
 NODE insights_projects_select_fields
 SQL >
-
     SELECT id, collectionsSlugs, name, slug, segmentId, softwareValue, toStartOfDay(now()) as date
     FROM insights_projects_populated_ds
 
-TYPE sink
+TYPE SINK
 EXPORT_SERVICE kafka
 EXPORT_CONNECTION_NAME lfx-oracle-kafka-streaming
-EXPORT_KAFKA_TOPIC insights_projects_populated_sink
 EXPORT_SCHEDULE 30 0 * * *
-
-
+EXPORT_FORMAT csv
+EXPORT_STRATEGY @new
+EXPORT_KAFKA_TOPIC insights_projects_populated_sink
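Both Kafka sink pipes keep `EXPORT_SCHEDULE 30 0 * * *`. This is a standard five-field cron expression (minute 30, hour 0, every day of every month), so each sink fires once a day at 00:30. The helper below computes the next firing time for such a daily expression. It is a hypothetical illustration, and evaluating the schedule in UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(after, minute=30, hour=0):
    """Next firing time of the cron expression `minute hour * * *` after `after`.

    Defaults match EXPORT_SCHEDULE 30 0 * * *. The result keeps the tzinfo of
    `after`; passing an aware UTC datetime assumes the schedule is in UTC.
    """
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= after:
        # Today's slot has already passed (or is exactly now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate


if __name__ == "__main__":
    print(next_daily_run(datetime.now(timezone.utc)))
```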
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+NODE mentions_list_results
+SQL >
+    %
+    SELECT *
+    FROM mentions FINAL
+    WHERE
+        1 = 1
+        {% if defined(projectSlug) %}
+            AND projectSlug
+            = {{ String(projectSlug, description="Filter by project slug", required=False) }}
+        {% end %}
+        {% if defined(platforms) %}
+            AND source
+            IN {{ Array(platforms, 'String', description="Filter by platforms", required=False) }}
+        {% end %}
+        {% if defined(keywords) %}
+            AND keyword
+            IN {{ Array(keywords, 'String', description="Filter by keywords", required=False) }}
+        {% end %}
+        {% if defined(sentiments) %}
+            AND sentimentLabel
+            IN {{ Array(sentiments, 'String', description="Filter by sentiments", required=False) }}
+        {% end %}
+        {% if defined(languages) %}
+            AND language
+            IN {{ Array(languages, 'String', description="Filter by languages", required=False) }}
+        {% end %}
+    ORDER BY timestamp DESC
+    LIMIT {{ Int32(pageSize, 20) }}
+    OFFSET {{ Int32(page, 0) * Int32(pageSize, 20) }}
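The new `mentions_list_results` node does three things:

- It applies each filter only when its parameter is defined.
- It orders mentions newest first.
- It pages with `LIMIT pageSize OFFSET page * pageSize`, so `page` is zero-based.

The following in-memory Python model reproduces that logic for illustration. `list_mentions` is a hypothetical helper and is not part of the codebase.

```python
def list_mentions(rows, projectSlug=None, platforms=None, keywords=None,
                  sentiments=None, languages=None, page=0, pageSize=20):
    """In-memory model of the mentions_list_results node.

    A filter set to None is skipped, like an undefined `{% if defined(...) %}`
    block. Results are newest first and paginated with
    LIMIT pageSize OFFSET page * pageSize (page is zero-based).
    """
    def keep(row):
        if projectSlug is not None and row["projectSlug"] != projectSlug:
            return False
        # Each Array parameter becomes an `IN` filter on its column.
        for column, allowed in (("source", platforms), ("keyword", keywords),
                                ("sentimentLabel", sentiments), ("language", languages)):
            if allowed is not None and row[column] not in allowed:
                return False
        return True

    matching = sorted((r for r in rows if keep(r)), key=lambda r: r["timestamp"], reverse=True)
    offset = page * pageSize
    return matching[offset:offset + pageSize]
```

With the defaults (`page=0`, `pageSize=20`), the query returns the 20 most recent matching mentions, and `page=1` skips those 20. When the pipe is called over HTTP, Tinybird `Array` parameters are passed as comma-separated values in the query string (for example `platforms=reddit,hackernews`).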
