Commit 6f9e65a

feat: add ai tracker endpoints (#3953)
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
1 parent 1b1ecbd commit 6f9e65a

6 files changed

Lines changed: 337 additions & 0 deletions
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
DESCRIPTION >
    - `ai_code_tracker_commits_ds` contains only authored-commit activities, pre-filtered from the full activities table.
    - Populated daily by `ai_code_tracker_commits_copy.pipe`.
    - Stores only the fields needed for AI tool detection: timestamp, title, body, attributes.
    - Reduces the dataset from ~1B rows to only commits, with sorting keys optimized for the AI pattern matching step.

TAGS "Report"

SCHEMA >
    `timestamp` DateTime,
    `title` String DEFAULT '',
    `body` String DEFAULT '',
    `attributes` String DEFAULT ''

ENGINE MergeTree
ENGINE_PARTITION_KEY toYear(timestamp)
ENGINE_SORTING_KEY timestamp
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
DESCRIPTION >
    - `ai_code_tracker_ds` contains pre-computed monthly aggregates of AI-assisted commits by tool.
    - Populated daily by `ai_code_tracker_copy.pipe`, which scans `ai_code_tracker_commits_ds` for AI tool signatures.
    - Each row represents one (month, toolKey) combination with commit counts.
    - Also stores total commits per month (toolKey = '__total__') for percentage calculations.
    - `monthStart` is the first day of the month (used for both monthly and yearly aggregation at query time).
    - `toolKey` identifies the AI tool (e.g., 'github-copilot', 'claude', 'cursor') or '__total__' for all commits.
    - `commitCount` is the number of commits for that tool in that month.

TAGS "Report"

SCHEMA >
    `monthStart` Date,
    `toolKey` LowCardinality(String),
    `commitCount` UInt64

ENGINE MergeTree
ENGINE_PARTITION_KEY toYear(monthStart)
ENGINE_SORTING_KEY monthStart, toolKey
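
The `__total__` rows exist so that a consumer can turn per-tool counts into shares of all commits. The following is a minimal Python sketch of that calculation, not code from this commit. It assumes rows shaped like the `ai_code_tracker_ds` schema; the helper name `tool_shares` and the sample counts are hypothetical.

```python
from collections import defaultdict
from datetime import date

TOTAL_KEY = "__total__"  # sentinel toolKey the datasource uses for all commits


def tool_shares(rows):
    """Return {(monthStart, toolKey): fraction of that month's commits}.

    rows: iterable of (monthStart, toolKey, commitCount) tuples.
    Months without a positive __total__ row are skipped.
    """
    totals = {}
    per_tool = defaultdict(int)
    for month_start, tool_key, commit_count in rows:
        if tool_key == TOTAL_KEY:
            totals[month_start] = commit_count
        else:
            per_tool[(month_start, tool_key)] += commit_count

    shares = {}
    for (month_start, tool_key), count in per_tool.items():
        total = totals.get(month_start, 0)
        if total > 0:
            shares[(month_start, tool_key)] = count / total
    return shares


# Hypothetical sample data, not real measurements.
sample_rows = [
    (date(2025, 1, 1), "__total__", 1000),
    (date(2025, 1, 1), "claude", 30),
    (date(2025, 1, 1), "cursor", 20),
]
shares = tool_shares(sample_rows)
```

Keeping the total in the same datasource under a reserved key means one copy job refreshes both the numerator and the denominator together.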
Lines changed: 90 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,90 @@
DESCRIPTION >
    - `ai_code_tracker.pipe` returns AI-assisted commit counts by tool and time period.
    - Reads from the pre-computed `ai_code_tracker_ds` datasource (materialized daily by `ai_code_tracker_copy.pipe`).
    - Parameters:
        - `granularity`: Required. Either 'monthly' or 'yearly'.
        - `startDate`: Optional DateTime. Keeps months whose `monthStart` falls on or after this date.
        - `endDate`: Optional DateTime. Keeps months whose `monthStart` falls strictly before this date.
    - Response: toolKey, toolName, startDate, endDate, commitCount

TAGS "Report"

NODE ai_code_tracker_result
DESCRIPTION >
    Aggregate pre-computed AI commit counts by tool and time period

SQL >
    %
    SELECT
        toolKey,
        multiIf(
            toolKey = 'github-copilot',
            'GitHub Copilot',
            toolKey = 'chatgpt',
            'ChatGPT',
            toolKey = 'claude',
            'Claude',
            toolKey = 'cursor',
            'Cursor',
            toolKey = 'codewhisperer',
            'CodeWhisperer',
            toolKey = 'gemini',
            'Gemini',
            toolKey = 'codeium',
            'Codeium',
            toolKey = 'aider',
            'Aider',
            toolKey = 'devin',
            'Devin',
            toolKey = 'tabnine',
            'Tabnine',
            toolKey = 'other',
            'Other AI',
            'Unknown'
        ) AS toolName,
        formatDateTime(
            CASE
                WHEN
                    {{
                        String(
                            granularity,
                            description="Time aggregation: monthly or yearly",
                            required=True,
                        )
                    }} = 'monthly'
                THEN monthStart
                ELSE toStartOfYear(monthStart)
            END,
            '%Y-%m-%d'
        ) AS startDate,
        formatDateTime(
            CASE
                WHEN
                    {{
                        String(
                            granularity,
                            description="Time aggregation: monthly or yearly",
                            required=True,
                        )
                    }} = 'monthly'
                THEN monthStart + INTERVAL 1 MONTH - INTERVAL 1 DAY
                ELSE toStartOfYear(monthStart) + INTERVAL 1 YEAR - INTERVAL 1 DAY
            END,
            '%Y-%m-%d'
        ) AS endDate,
        sum(commitCount) AS commitCount
    FROM ai_code_tracker_ds
    WHERE
        toolKey != '__total__'
        {% if defined(startDate) %}
            AND monthStart >= toDate(
                {{ DateTime(startDate, description="Filter commits after this date", required=False) }}
            )
        {% end %}
        {% if defined(endDate) %}
            AND monthStart < toDate(
                {{ DateTime(endDate, description="Filter commits before this date", required=False) }}
            )
        {% end %}
    GROUP BY toolKey, startDate, endDate
    ORDER BY startDate ASC, commitCount DESC
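
The two `CASE` expressions above derive the reported `startDate` and `endDate` from each `monthStart`. As a rough illustration, the same arithmetic can be modeled in Python. This is a sketch only: the function name `period_bounds` is hypothetical, and it assumes `monthStart` is always the first day of a month, as the datasource description states.

```python
from datetime import date, timedelta


def period_bounds(month_start, granularity):
    """Mirror the startDate/endDate CASE expressions for one monthStart value."""
    if granularity == "monthly":
        start = month_start
        # First day of the following month (monthStart + INTERVAL 1 MONTH).
        if month_start.month == 12:
            next_start = date(month_start.year + 1, 1, 1)
        else:
            next_start = date(month_start.year, month_start.month + 1, 1)
    else:
        # The SQL uses ELSE here, so any value other than 'monthly'
        # is treated as yearly.
        start = date(month_start.year, 1, 1)
        next_start = date(month_start.year + 1, 1, 1)
    # Subtract one day to get the inclusive end of the period.
    end = next_start - timedelta(days=1)
    return start.strftime("%Y-%m-%d"), end.strftime("%Y-%m-%d")


monthly = period_bounds(date(2024, 2, 1), "monthly")
yearly = period_bounds(date(2024, 2, 1), "yearly")
```

Because the yearly branch is the `ELSE` case, a misspelled `granularity` value would be aggregated by year rather than rejected.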
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
DESCRIPTION >
    - `ai_code_tracker_commits_copy.pipe` extracts only authored-commit rows from `activities_deduplicated_ds`.
    - Runs daily at 02:00 (`COPY_SCHEDULE 0 2 * * *`) to populate `ai_code_tracker_commits_ds` with the commits-only subset of the full 1B+ row activities table.
    - This intermediate datasource is then used by `ai_code_tracker_copy.pipe` for fast AI pattern matching.

TAGS "Report"

NODE ai_code_tracker_commits_copy_result
SQL >
    SELECT a.timestamp, a.title, a.body, a.attributes
    FROM activities_deduplicated_ds a
    WHERE a.type = 'authored-commit'

TYPE COPY
TARGET_DATASOURCE ai_code_tracker_commits_ds
COPY_MODE replace
COPY_SCHEDULE 0 2 * * *
Lines changed: 145 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,145 @@
DESCRIPTION >
    - `ai_code_tracker_copy.pipe` materializes AI-assisted commit counts into `ai_code_tracker_ds`.
    - Runs daily at 03:00 (`COPY_SCHEDULE 0 3 * * *`), one hour after the commits copy, reading from `ai_code_tracker_commits_ds` (only commits, not all 1B+ activities).
    - Classifies commits by AI tool based on title, body, and attributes content.
    - Stores monthly aggregates per tool plus total commits per month (toolKey = '__total__').
    - Uses `multiSearchAnyCaseInsensitive` for fast single-pass pre-filtering before classification.

TAGS "Report"

NODE ai_code_tracker_copy_totals
DESCRIPTION >
    Total commits per month - simple count, no string scanning

SQL >
    SELECT toStartOfMonth(timestamp) AS monthStart, '__total__' AS toolKey, count() AS commitCount
    FROM ai_code_tracker_commits_ds
    GROUP BY monthStart

NODE ai_code_tracker_copy_prefilter
DESCRIPTION >
    Fast pre-filter: only keep commits containing ANY AI keyword.
    multiSearchAnyCaseInsensitive does a single pass instead of dozens of separate checks.

SQL >
    SELECT toStartOfMonth(timestamp) AS monthStart, title, body, attributes
    FROM ai_code_tracker_commits_ds
    WHERE
        multiSearchAnyCaseInsensitive(
            title,
            [
                'copilot',
                'chatgpt',
                'claude',
                'cursor',
                'codewhisperer',
                'gemini',
                'codeium',
                'aider',
                'devin',
                'tabnine',
                'ai-generated',
                'ai generated'
            ]
        )
        != 0
        OR multiSearchAnyCaseInsensitive(
            body,
            [
                'copilot',
                'chatgpt',
                'claude',
                'cursor',
                'codewhisperer',
                'gemini',
                'codeium',
                'aider',
                'devin',
                'tabnine',
                'ai-generated',
                'ai generated',
                'co-authored-by'
            ]
        )
        != 0
        OR multiSearchAnyCaseInsensitive(attributes, ['copilot', 'ai-generated']) != 0

NODE ai_code_tracker_copy_classify
DESCRIPTION >
    Classify pre-filtered commits by AI tool

SQL >
    SELECT
        monthStart,
        multiIf(
            positionCaseInsensitive(title, 'github copilot') > 0
            OR (
                positionCaseInsensitive(body, 'co-authored-by') > 0
                AND positionCaseInsensitive(body, 'copilot') > 0
            )
            OR positionCaseInsensitive(attributes, 'copilot') > 0,
            'github-copilot',
            positionCaseInsensitive(title, 'cursor') > 0 OR positionCaseInsensitive(body, 'cursor') > 0,
            'cursor',
            positionCaseInsensitive(title, 'claude') > 0 OR positionCaseInsensitive(body, 'claude') > 0,
            'claude',
            positionCaseInsensitive(title, 'chatgpt') > 0
            OR positionCaseInsensitive(body, 'chatgpt') > 0,
            'chatgpt',
            positionCaseInsensitive(title, 'codewhisperer') > 0
            OR positionCaseInsensitive(body, 'codewhisperer') > 0,
            'codewhisperer',
            positionCaseInsensitive(title, 'gemini') > 0 OR positionCaseInsensitive(body, 'gemini') > 0,
            'gemini',
            positionCaseInsensitive(title, 'codeium') > 0
            OR positionCaseInsensitive(body, 'codeium') > 0,
            'codeium',
            positionCaseInsensitive(title, 'copilot') > 0
            OR positionCaseInsensitive(body, 'copilot') > 0,
            'github-copilot',
            positionCaseInsensitive(title, 'aider') > 0 OR positionCaseInsensitive(body, 'aider') > 0,
            'aider',
            positionCaseInsensitive(title, 'devin') > 0 OR positionCaseInsensitive(body, 'devin') > 0,
            'devin',
            positionCaseInsensitive(title, 'tabnine') > 0
            OR positionCaseInsensitive(body, 'tabnine') > 0,
            'tabnine',
            positionCaseInsensitive(title, 'ai-generated') > 0
            OR positionCaseInsensitive(title, 'ai generated') > 0
            OR positionCaseInsensitive(body, 'ai-generated') > 0
            OR positionCaseInsensitive(body, 'ai generated') > 0
            OR positionCaseInsensitive(attributes, 'ai-generated') > 0
            OR (
                positionCaseInsensitive(body, 'co-authored-by') > 0
                AND positionCaseInsensitive(body, 'bot') > 0
            ),
            'other',
            '__none__'
        ) AS toolKey
    FROM ai_code_tracker_copy_prefilter

NODE ai_code_tracker_copy_by_tool
DESCRIPTION >
    Aggregate AI commits by month and tool

SQL >
    SELECT monthStart, toolKey, count() AS commitCount
    FROM ai_code_tracker_copy_classify
    WHERE toolKey != '__none__'
    GROUP BY monthStart, toolKey

NODE ai_code_tracker_copy_result
DESCRIPTION >
    Union AI tool counts and total counts

SQL >
    SELECT monthStart, toolKey, commitCount
    FROM ai_code_tracker_copy_by_tool
    UNION ALL
    SELECT monthStart, toolKey, commitCount
    FROM ai_code_tracker_copy_totals

TYPE COPY
TARGET_DATASOURCE ai_code_tracker_ds
COPY_MODE replace
COPY_SCHEDULE 0 3 * * *
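
The prefilter and classify nodes rely on substring matching with first-match-wins ordering, which is easier to reason about with a small model. The Python sketch below approximates that logic for experimentation; it is not part of the commit. The function names are hypothetical, and `str.lower()` stands in for ClickHouse's case-insensitive search functions, which may treat non-ASCII text differently.

```python
TITLE_KEYWORDS = [
    "copilot", "chatgpt", "claude", "cursor", "codewhisperer", "gemini",
    "codeium", "aider", "devin", "tabnine", "ai-generated", "ai generated",
]
BODY_KEYWORDS = TITLE_KEYWORDS + ["co-authored-by"]
ATTRIBUTE_KEYWORDS = ["copilot", "ai-generated"]

# Tools matched by a plain substring test on title or body, in multiIf order.
TOOLS_BEFORE_BARE_COPILOT = ["cursor", "claude", "chatgpt", "codewhisperer", "gemini", "codeium"]
TOOLS_AFTER_BARE_COPILOT = ["aider", "devin", "tabnine"]


def passes_prefilter(title, body, attributes):
    """Model of ai_code_tracker_copy_prefilter: keep rows with any keyword."""
    t, b, a = title.lower(), body.lower(), attributes.lower()
    return (
        any(k in t for k in TITLE_KEYWORDS)
        or any(k in b for k in BODY_KEYWORDS)
        or any(k in a for k in ATTRIBUTE_KEYWORDS)
    )


def classify(title, body, attributes):
    """Model of the multiIf in ai_code_tracker_copy_classify (first match wins)."""
    t, b, a = title.lower(), body.lower(), attributes.lower()
    co_authored = "co-authored-by" in b
    # Strong Copilot signals are checked before any other tool.
    if "github copilot" in t or (co_authored and "copilot" in b) or "copilot" in a:
        return "github-copilot"
    for tool in TOOLS_BEFORE_BARE_COPILOT:
        if tool in t or tool in b:
            return tool
    # A bare 'copilot' mention is only checked after the tools above.
    if "copilot" in t or "copilot" in b:
        return "github-copilot"
    for tool in TOOLS_AFTER_BARE_COPILOT:
        if tool in t or tool in b:
            return tool
    generic = ("ai-generated", "ai generated")
    if (
        any(g in t or g in b for g in generic)
        or "ai-generated" in a
        or (co_authored and "bot" in b)
    ):
        return "other"
    return "__none__"
```

Under this model a body mentioning both Claude and Cursor is counted once, as `cursor`, and a title such as "Fix cursor position" is also classified as `cursor`, because the match is a plain substring test. A commit whose only signal is a human `Co-authored-by` trailer passes the prefilter but ends up as `__none__`, which the `ai_code_tracker_copy_by_tool` node then drops.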
Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,49 @@
DESCRIPTION >
    - `ai_code_tracker_total_commits.pipe` returns total commit counts per time period.
    - Reads from the pre-computed `ai_code_tracker_ds` datasource (materialized daily by `ai_code_tracker_copy.pipe`).
    - Parameters:
        - `granularity`: Required. Either 'monthly' or 'yearly'.
        - `startDate`: Optional DateTime. Keeps months whose `monthStart` falls on or after this date.
        - `endDate`: Optional DateTime. Keeps months whose `monthStart` falls strictly before this date.
    - Response: startDate, totalCommits

TAGS "Report"

NODE ai_code_tracker_total_commits_result
DESCRIPTION >
    Aggregate pre-computed total commit counts by time period

SQL >
    %
    SELECT
        formatDateTime(
            CASE
                WHEN
                    {{
                        String(
                            granularity,
                            description="Time aggregation: monthly or yearly",
                            required=True,
                        )
                    }} = 'monthly'
                THEN monthStart
                ELSE toStartOfYear(monthStart)
            END,
            '%Y-%m-%d'
        ) AS startDate,
        sum(commitCount) AS totalCommits
    FROM ai_code_tracker_ds
    WHERE
        toolKey = '__total__'
        {% if defined(startDate) %}
            AND monthStart >= toDate(
                {{ DateTime(startDate, description="Filter commits after this date", required=False) }}
            )
        {% end %}
        {% if defined(endDate) %}
            AND monthStart < toDate(
                {{ DateTime(endDate, description="Filter commits before this date", required=False) }}
            )
        {% end %}
    GROUP BY startDate
    ORDER BY startDate ASC
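
The date filters in this pipe compare against `monthStart` rather than individual commit timestamps, which changes what a mid-month `startDate` means. The Python sketch below models the filter and the monthly-to-yearly rollup; it is illustrative only. The function name `total_commits` and the sample rows are hypothetical, and plain `date` values stand in for the `toDate(...)` conversion of the DateTime parameters.

```python
from collections import defaultdict
from datetime import date


def total_commits(rows, granularity, start_date=None, end_date=None):
    """Model of ai_code_tracker_total_commits_result.

    rows: iterable of (monthStart, toolKey, commitCount) tuples.
    Returns a list of (startDate, totalCommits) sorted by startDate.
    """
    totals = defaultdict(int)
    for month_start, tool_key, commit_count in rows:
        if tool_key != "__total__":
            continue
        # Half-open range on monthStart: start inclusive, end exclusive.
        if start_date is not None and month_start < start_date:
            continue
        if end_date is not None and month_start >= end_date:
            continue
        if granularity == "monthly":
            period = month_start
        else:
            period = date(month_start.year, 1, 1)
        totals[period.strftime("%Y-%m-%d")] += commit_count
    # ISO-formatted dates sort chronologically as strings.
    return sorted(totals.items())


# Hypothetical sample data, not real measurements.
sample_rows = [
    (date(2024, 1, 1), "__total__", 100),
    (date(2024, 2, 1), "__total__", 150),
    (date(2025, 1, 1), "__total__", 200),
    (date(2025, 1, 1), "claude", 7),
]
yearly_totals = total_commits(sample_rows, "yearly")
from_mid_january = total_commits(sample_rows, "yearly", start_date=date(2024, 1, 15))
```

In this model a `startDate` of 2024-01-15 excludes all of January 2024, because that month's `monthStart` (2024-01-01) is earlier than the filter value.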
