Commit d3d4926

Merge pull request #11 from opensource-observer/carl/oso-1598-polish-datametric-definitions-notebooks
feat(notebooks): polish `data/metric-definitions` notebooks
2 parents dfec55f + 85b91eb commit d3d4926

14 files changed

Lines changed: 884 additions & 2003 deletions

Lines changed: 9 additions & 0 deletions

```tsx
import MarimoIframe from '@/components/MarimoIframe';

export default function Experience() {
  return (
    <div className="h-full w-full">
      <MarimoIframe notebookName="notebooks/data/metric-definitions/experience" />
    </div>
  );
}
```

app/components/Sidebar.tsx

Lines changed: 1 addition & 0 deletions

```diff
@@ -65,6 +65,7 @@ const navItems: NavItem[] = [
     children: [
       { label: 'Activity', href: '/data/metric-definitions/activity' },
       { label: 'Alignment', href: '/data/metric-definitions/alignment' },
+      { label: 'Experience', href: '/data/metric-definitions/experience' },
       { label: 'Lifecycle', href: '/data/metric-definitions/lifecycle' },
       { label: 'Retention', href: '/data/metric-definitions/retention' },
     ],
```

docs/metric-catalog.md

Lines changed: 176 additions & 0 deletions
# DDP Metric Catalog

Bottom-up inventory of every metric used across the 6 DDP insight notebooks, grouped by metric definition category.

---

## Activity (MAD)

Source: `oso.stg_opendevdata__eco_mads`, pre-calculated daily snapshots with rolling 28-day windows.

| Metric | Column | Definition | Used In |
|--------|--------|------------|---------|
| Monthly Active Developers | `all_devs` | Unique developers with >=1 commit in rolling 28d | developer-report-2025 |
| Full-Time Developers | `full_time_devs` | >=10 active days per 28d window | developer-report-2025, developer-lifecycle |
| Part-Time Developers | `part_time_devs` | <10 active days, regular pattern | developer-report-2025, developer-lifecycle |
| One-Time Developers | `one_time_devs` | Sporadic activity over 84d window | developer-report-2025 |
| Newcomers | `devs_0_1y` | <1 year contributing to crypto | developer-report-2025 |
| Emerging | `devs_1_2y` | 1-2 years contributing | developer-report-2025 |
| Established | `devs_2y_plus` | 2+ years contributing | developer-report-2025 |
| Exclusive | `exclusive_devs` | Active in only one ecosystem (28d) | developer-report-2025 |
| Multichain | `multichain_devs` | Active across multiple ecosystems (28d) | developer-report-2025 |
| Commits | `num_commits` | Total commits in 28d window | developer-report-2025 |

**Status:** Definition matches usage exactly. Primary consumer is 2025 Developer Trends.
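The MAD columns are read from pre-computed snapshots. The rolling-window logic behind `all_devs`, `full_time_devs`, and `num_commits` can still be sketched in TypeScript. This minimal sketch rests on three assumptions:

- The `CommitEvent` input shape and integer day numbers are hypothetical.
- The window is taken as the 28 days ending on `asOfDay`.
- The part-time "regular pattern" rule and the one-time 84-day rule are not spelled out here, so developers below the 10-day threshold are lumped into one hypothetical `underTenDayDevs` count.

```typescript
// Hypothetical input: one record per commit, day as an integer (e.g. days since epoch).
// This is not the schema of oso.stg_opendevdata__eco_mads.
interface CommitEvent {
  developer: string;
  day: number;
}

interface MadSnapshot {
  allDevs: number;         // >=1 commit in the 28-day window
  fullTimeDevs: number;    // >=10 distinct active days in the window
  underTenDayDevs: number; // 1-9 active days (part-time and one-time not separated here)
  numCommits: number;      // total commits in the window
}

const WINDOW_DAYS = 28;
const FULL_TIME_MIN_ACTIVE_DAYS = 10;

function madSnapshot(events: CommitEvent[], asOfDay: number): MadSnapshot {
  // Assumed window: the 28 days ending on asOfDay, inclusive.
  const windowStart = asOfDay - WINDOW_DAYS + 1;
  const activeDays = new Map<string, Set<number>>();
  let numCommits = 0;
  for (const e of events) {
    if (e.day < windowStart || e.day > asOfDay) continue;
    numCommits += 1;
    let days = activeDays.get(e.developer);
    if (!days) {
      days = new Set<number>();
      activeDays.set(e.developer, days);
    }
    days.add(e.day); // several commits on one day count as one active day
  }
  let fullTimeDevs = 0;
  activeDays.forEach((days) => {
    if (days.size >= FULL_TIME_MIN_ACTIVE_DAYS) fullTimeDevs += 1;
  });
  return {
    allDevs: activeDays.size,
    fullTimeDevs,
    underTenDayDevs: activeDays.size - fullTimeDevs,
    numCommits,
  };
}
```

Active days are counted as distinct days, so a burst of commits on one day does not push a developer over the full-time threshold.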

---

## Lifecycle

Source: `oso.int_crypto_ecosystems_developer_lifecycle_monthly_aggregated`, monthly state snapshots.

| Metric | Label(s) | Definition | Used In |
|--------|----------|------------|---------|
| First Time | `first time` | First-ever contribution to ecosystem | developer-lifecycle |
| Full Time (4 variants) | `full time`, `new full time`, `part time to full time`, `dormant to full time` | >=10 active days, various transitions | developer-lifecycle |
| Part Time (4 variants) | `part time`, `new part time`, `full time to part time`, `dormant to part time` | 1-9 active days, various transitions | developer-lifecycle |
| Churned/Dormant (7 variants) | `dormant`, `first time to dormant`, `part time to dormant`, `full time to dormant`, `churned (after first time)`, `churned (after reaching part time)`, `churned (after reaching full time)` | No activity (1-6 months: dormant; >6 months: churned) | developer-lifecycle |
| Monthly Churn Rate | Derived | `(churned + dormant) / active * 100` per month | developer-lifecycle |

**Status:** Definition matches usage exactly. The 16 granular states roll up to 4 categories.
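The roll-up from 16 labels to 4 categories, and the churn-rate formula on top of it, can be expressed as a lookup table plus a ratio. This is a sketch, and its key assumption is the denominator: "active" is taken to mean developers whose label rolls up to First Time, Full Time, or Part Time in the same month. The catalog does not say exactly which states count as active. `LIFECYCLE_ROLLUP` and `monthlyChurnRate` are hypothetical names.

```typescript
type LifecycleCategory = "First Time" | "Full Time" | "Part Time" | "Churned/Dormant";

// Roll-up of the 16 lifecycle labels listed in the table.
const LIFECYCLE_ROLLUP: Record<string, LifecycleCategory> = {
  "first time": "First Time",
  "full time": "Full Time",
  "new full time": "Full Time",
  "part time to full time": "Full Time",
  "dormant to full time": "Full Time",
  "part time": "Part Time",
  "new part time": "Part Time",
  "full time to part time": "Part Time",
  "dormant to part time": "Part Time",
  "dormant": "Churned/Dormant",
  "first time to dormant": "Churned/Dormant",
  "part time to dormant": "Churned/Dormant",
  "full time to dormant": "Churned/Dormant",
  "churned (after first time)": "Churned/Dormant",
  "churned (after reaching part time)": "Churned/Dormant",
  "churned (after reaching full time)": "Churned/Dormant",
};

// (churned + dormant) / active * 100 for one month of label counts.
// Assumption: "active" = every label outside the Churned/Dormant category.
function monthlyChurnRate(labelCounts: Record<string, number>): number {
  let churnedOrDormant = 0;
  let active = 0;
  for (const label of Object.keys(labelCounts)) {
    const category: LifecycleCategory | undefined = LIFECYCLE_ROLLUP[label];
    if (category === undefined) throw new Error(`Unknown lifecycle label: ${label}`);
    const n = labelCounts[label] ?? 0;
    if (category === "Churned/Dormant") churnedOrDormant += n;
    else active += n;
  }
  return active === 0 ? 0 : (churnedOrDormant * 100) / active;
}
```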

---

## Retention

Source: `oso.stg_opendevdata__repo_developer_28d_activities` joined to ecosystem mappings, computed via a CTE.

| Metric | Computation | Used In |
|--------|-------------|---------|
| Cohort assignment | Year of first contribution to ecosystem | developer-retention |
| Cohort size | Count of developers per cohort year per ecosystem | developer-retention |
| Retention rate | `active_in_year_N / cohort_size * 100` | developer-retention |
| 1-Year / 2-Year avg retention | Mean retention across cohorts | developer-retention (stat cards) |
| Cross-ecosystem retention | Same formula, compared across ETH/SOL/BTC | developer-retention |
| Quarterly cohort retention | Quarterly variant applied to project-level activity | defi-builder-journeys |

**Status:** Definition matches developer-retention exactly. DeFi Builder Journeys uses a quarterly, project-level variant.
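The cohort steps above can be sketched as follows. The sketch assumes "year N" means the cohort year plus N. It also assumes the input is already reduced to the calendar years in which each developer was active in one ecosystem; the real notebook derives this in a SQL CTE. `cohortRetention` and `averageRetention` are hypothetical helpers. The average skips cohorts whose year N has not been observed yet, which is an assumption about how the stat cards treat recent cohorts.

```typescript
interface CohortRetention {
  cohortYear: number;
  cohortSize: number;
  retentionPct: number[]; // index n = % of the cohort active in cohortYear + n
}

// Hypothetical input: developer -> calendar years with activity in one ecosystem.
function cohortRetention(activity: Map<string, number[]>, maxOffset: number): CohortRetention[] {
  // Cohort assignment: year of first contribution.
  const cohorts = new Map<number, Set<number>[]>();
  activity.forEach((years) => {
    if (years.length === 0) return;
    const cohortYear = Math.min(...years);
    const members = cohorts.get(cohortYear) ?? [];
    members.push(new Set<number>(years));
    cohorts.set(cohortYear, members);
  });

  const result: CohortRetention[] = [];
  cohorts.forEach((members, cohortYear) => {
    const retentionPct: number[] = [];
    for (let n = 0; n <= maxOffset; n++) {
      // active_in_year_N / cohort_size * 100
      const activeInYearN = members.filter((ys) => ys.has(cohortYear + n)).length;
      retentionPct.push((activeInYearN * 100) / members.length);
    }
    result.push({ cohortYear, cohortSize: members.length, retentionPct });
  });
  return result.sort((a, b) => a.cohortYear - b.cohortYear);
}

// Mean N-year retention across cohorts whose year N falls on or before latestYear.
function averageRetention(rows: CohortRetention[], offset: number, latestYear: number): number {
  const eligible = rows.filter(
    (r) => r.cohortYear + offset <= latestYear && offset < r.retentionPct.length,
  );
  if (eligible.length === 0) return NaN;
  return eligible.reduce((sum, r) => sum + r.retentionPct[offset], 0) / eligible.length;
}
```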

---

## Alignment

Two distinct implementations across insights:

### 5-Channel Activity Model (DeFi Builder Journeys)

Source: `ethereum.devpanels.mart_developer_alignment_monthly` (customer-scoped).

| Channel | Column | Definition |
|---------|--------|------------|
| Home Project | `home_project_repo_event_days` | Activity days on the builder's primary DeFi project |
| Crypto | `crypto_repo_event_days` | Activity days on other crypto repos |
| Personal | `personal_repo_event_days` | Activity days on personal/non-crypto repos |
| OSS | `oss_repo_event_days` | Activity days on open-source repos |
| Interest | `interest_repo_event_days` | Watch/fork events on repos of interest |

Used for: onboarding features, contribution features, current status classification, alluvial flows, balance of trade, feeder project analysis.

### Repo Ecosystem Classification (Speedrun Ethereum)

Source: `stg_opendevdata__ecosystems_repos_recursive` + `stg_opendevdata__ecosystems`.

| Category | Rule |
|----------|------|
| Ethereum | `ecosystem_name IN ('Ethereum', 'Celo')` |
| Other EVM Chain | `ecosystem_name = 'Ethereum Virtual Machine Stack'` |
| Non-EVM Chain | `is_chain = 1` (not EVM) |
| Other (Crypto-Related) | `is_crypto = 1` (not a chain) |
| Personal | `user_name = repo_owner` |
| Unknown | No ecosystem mapping |

Used for: classifying where SRE alumni contribute after the program.
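The classification rules can be read as an ordered decision list. This sketch makes two assumptions: the rules are checked in table order, and a repo mapped to several ecosystems takes the first rule that any of its mappings satisfies. The `EcosystemMapping` shape and the `classifyRepo` name are hypothetical.

```typescript
// Hypothetical shape for one repo -> ecosystem mapping row.
interface EcosystemMapping {
  ecosystemName: string;
  isChain: boolean;
  isCrypto: boolean;
}

type RepoCategory =
  | "Ethereum"
  | "Other EVM Chain"
  | "Non-EVM Chain"
  | "Other (Crypto-Related)"
  | "Personal"
  | "Unknown";

// Assumption: rules apply top to bottom, first match wins.
function classifyRepo(mappings: EcosystemMapping[], userName: string, repoOwner: string): RepoCategory {
  const names = mappings.map((m) => m.ecosystemName);
  if (names.some((n) => n === "Ethereum" || n === "Celo")) return "Ethereum";
  if (names.indexOf("Ethereum Virtual Machine Stack") !== -1) return "Other EVM Chain";
  // EVM repos were caught above, so any remaining chain mapping is non-EVM.
  if (mappings.some((m) => m.isChain)) return "Non-EVM Chain";
  if (mappings.some((m) => m.isCrypto)) return "Other (Crypto-Related)";
  if (userName === repoOwner) return "Personal";
  // No usable ecosystem mapping (assumption: also covers non-crypto mappings).
  return "Unknown";
}
```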

### Base Ecosystem Alignment (Public Tables)

Source: `oso.stg_opendevdata__repo_developer_28d_activities` joined to ecosystem mappings.

Formula: `Commits_to_ecosystem / Total_commits * 100`

Currently queryable but not directly used by any insight notebook. It serves as the building block for the customer-scoped models above.
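Applied per developer, the base formula is a ratio of commit counts. This sketch uses hypothetical inputs:

- per-repo commit totals for one developer;
- a repo-to-ecosystems map that is assumed to be already flattened from the recursive mapping.

A repo can belong to several ecosystems, so the per-ecosystem percentages can sum to more than 100.

```typescript
// Hypothetical input: commits by one developer to one repo over the period of interest.
interface DevCommit {
  repo: string;
  commits: number;
}

// Returns ecosystem -> Commits_to_ecosystem / Total_commits * 100.
// Total_commits includes commits to repos with no ecosystem mapping.
function alignmentByEcosystem(
  devCommits: DevCommit[],
  repoEcosystems: Map<string, string[]>,
): Map<string, number> {
  let total = 0;
  const perEcosystem = new Map<string, number>();
  for (const c of devCommits) {
    total += c.commits;
    // Assumption: each repo's ecosystem list has no duplicates.
    const ecosystems = repoEcosystems.get(c.repo) ?? [];
    for (const eco of ecosystems) {
      perEcosystem.set(eco, (perEcosystem.get(eco) ?? 0) + c.commits);
    }
  }
  const pct = new Map<string, number>();
  perEcosystem.forEach((n, eco) => pct.set(eco, total === 0 ? 0 : (n * 100) / total));
  return pct;
}
```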

**Status:** Definition needs rewrite. The current definition describes the base formula; the actual insights use richer models built on top of it.

---

## Uncategorized

### Engagement / Repo Popularity (Ethereum Repo Rank)

Source: `ethereum.dev_engagement_models.*` (customer-scoped) + GitHub API scrapes.

| Metric | Definition | Time Window |
|--------|------------|-------------|
| `global_engagers_30d` / `7d` | Unique stargazers + forkers | 30d / 7d rolling |
| `eth_devs_30d` / `7d` | Engagers who are Ethereum panel builders | 30d / 7d rolling |
| `eth_dev_pct` | `eth_devs / global_engagers * 100` (signal strength) | 30d |
| `momentum` | `(global_engagers_7d / 7) / (global_engagers_30d / 30)` | 7d vs 30d |
| Community label | "Crypto" if `eth_dev_pct` >= 1%, else "Mainstream" | 30d |
| Overlap % | `shared_engagers / min(repo_a_size, repo_b_size) * 100` | All-time |
| `alignment_score` | Sum of `eth_dev_pct` weights across repos a builder engages with | 30d |
| Cumulative engagers | Running count of unique stargazers/forkers per repo | All-time |

Note: These metrics are based on stars/forks (GitHub API), not commits (Open Dev Data), so they rest on an entirely different primitive from the Activity metrics.
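The engagement formulas in the table translate directly into small functions. In this sketch the `RepoEngagement` shape and all function names are hypothetical, and `eth_dev_pct` is computed from the 30-day counts. Division-by-zero cases return 0; that is an assumption, not documented behavior.

```typescript
// Hypothetical per-repo engagement counts.
interface RepoEngagement {
  globalEngagers30d: number;
  globalEngagers7d: number;
  ethDevs30d: number;
}

// eth_devs / global_engagers * 100 (signal strength), 30-day window.
function ethDevPct(r: RepoEngagement): number {
  return r.globalEngagers30d === 0 ? 0 : (r.ethDevs30d * 100) / r.globalEngagers30d;
}

// Daily engagement rate over 7 days divided by the daily rate over 30 days.
// A value above 1 means the last week ran hotter than the monthly average.
function momentum(r: RepoEngagement): number {
  if (r.globalEngagers30d === 0) return 0;
  return r.globalEngagers7d / 7 / (r.globalEngagers30d / 30);
}

function communityLabel(r: RepoEngagement): "Crypto" | "Mainstream" {
  return ethDevPct(r) >= 1 ? "Crypto" : "Mainstream";
}

// shared_engagers / min(repo_a_size, repo_b_size) * 100
function overlapPct(engagersA: Set<string>, engagersB: Set<string>): number {
  const smaller = Math.min(engagersA.size, engagersB.size);
  if (smaller === 0) return 0;
  let shared = 0;
  engagersA.forEach((u) => {
    if (engagersB.has(u)) shared += 1;
  });
  return (shared * 100) / smaller;
}

// Sum of eth_dev_pct across the repos a builder engages with.
function alignmentScore(engagedRepos: RepoEngagement[]): number {
  return engagedRepos.reduce((sum, r) => sum + ethDevPct(r), 0);
}
```

Dividing the overlap by the smaller repo's engager count means a small repo whose engagers all also engage with a large repo scores 100%, even though the large repo's audience is mostly distinct.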

### TVL / Protocol Metrics (DeFi Builder Journeys)

Source: `ethereum.devpanels.mart_defi_project_summary`, `mart_project_tvl_history` (customer-scoped).

| Metric | Definition |
|--------|------------|
| `current_tvl` | Total Value Locked (DefiLlama) |
| `ethereum_pct` | % of TVL on Ethereum L1 + L2s |
| `tvl_rank` | Rank by TVL |
| `total_repos` | Repos associated with project |
| `qualifying_developers` | Builders with 12+ months on home project |

### Developer Journey Metrics (DeFi Builder Journeys)

| Metric | Definition |
|--------|------------|
| Onboarding month | First month with home project activity |
| Offboarding month | Last activity month, if followed by 6+ months of inactivity |
| Tenure months | `offboard - onboard` (or to the latest month if still active) |
| Is still active | No offboard date |
| Pipeline category | Newcomer (<6mo pre-activity), Crypto-experienced, Non-crypto experienced |
| Contribution cluster | Frequent (>10d/mo), Regular (5-10d/mo), Occasional (<5d/mo) |
| Consistency % | `contrib_months / tenure_months * 100` |
| Feeder projects | Crypto/OSS projects a builder was active in before onboarding |
| Balance of trade | Annual imports/exports between ecosystem categories |
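Onboarding, offboarding, tenure, and contribution cluster can all be derived from a builder's monthly home-project activity. This sketch uses hypothetical names and rests on three assumptions:

- Months are integer indices.
- "6+ months inactive after" means at least 6 months between the last active month and the latest observed month.
- The cluster uses mean active days over the months with activity; the table does not say which months are averaged.

Pipeline category, consistency, and feeder projects are left out because they need pre-onboarding data or conventions the table does not pin down.

```typescript
// Hypothetical input: months with home-project activity, as integer indices
// (e.g. year * 12 + monthIndex), with active days in each month.
interface MonthActivity {
  month: number;
  activeDays: number;
}

interface Journey {
  onboardMonth: number;
  offboardMonth: number | null; // null = still active
  tenureMonths: number;
  isStillActive: boolean;
  cluster: "Frequent" | "Regular" | "Occasional";
}

const OFFBOARD_GAP_MONTHS = 6;

function builderJourney(activity: MonthActivity[], latestMonth: number): Journey {
  if (activity.length === 0) throw new Error("builder has no home-project activity");
  const months = activity.map((a) => a.month);
  const onboardMonth = Math.min(...months);
  const lastMonth = Math.max(...months);
  // Offboarded only if 6+ months of inactivity follow the last active month.
  const offboardMonth = latestMonth - lastMonth >= OFFBOARD_GAP_MONTHS ? lastMonth : null;
  // offboard - onboard, or latest - onboard while still active.
  const tenureMonths = (offboardMonth ?? latestMonth) - onboardMonth;
  // Assumption: cluster thresholds apply to mean active days per active month.
  const meanDays = activity.reduce((s, a) => s + a.activeDays, 0) / activity.length;
  const cluster: Journey["cluster"] =
    meanDays > 10 ? "Frequent" : meanDays >= 5 ? "Regular" : "Occasional";
  return { onboardMonth, offboardMonth, tenureMonths, isStillActive: offboardMonth === null, cluster };
}
```

With these thresholds, a mean of exactly 10 or exactly 5 days per month lands in Regular, matching the "5-10d/mo" range in the table.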

### SRE Program Metrics (Speedrun Ethereum)

| Metric | Definition |
|--------|------------|
| `challenges_completed` | SRE challenges finished |
| `batch_id` | SRE program batch |
| `cohort_year` | Year of SRE enrollment |
| Experience category | Experienced (>12mo), Learning (3-12mo), Newb (<3mo) pre-SRE |
| `velocity` | `SUM(1 + ln(event_count))` per month per user |
| Incremental Ethereum MAD | Active Ethereum devs attributable to SRE vs. baseline |
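This is a sketch of `velocity` and the experience category. It assumes the `SUM` runs over a user's days within the month, with `event_count` being that day's event count; the catalog does not say what is summed. Days with zero events are skipped because `ln(0)` is undefined. The `DailyEvents` shape, the function names, and the `monthsBeforeSre` parameter are hypothetical.

```typescript
// Hypothetical input: one user's event count per active day, tagged with its month.
interface DailyEvents {
  month: string; // e.g. "2024-01"
  eventCount: number;
}

// velocity = SUM(1 + ln(event_count)) per month, for one user.
function monthlyVelocity(days: DailyEvents[]): Map<string, number> {
  const velocity = new Map<string, number>();
  for (const d of days) {
    if (d.eventCount <= 0) continue; // ln(0) is undefined; such days add nothing
    velocity.set(d.month, (velocity.get(d.month) ?? 0) + 1 + Math.log(d.eventCount));
  }
  return velocity;
}

// Experienced (>12mo), Learning (3-12mo), Newb (<3mo) of pre-SRE activity.
function experienceCategory(monthsBeforeSre: number): "Experienced" | "Learning" | "Newb" {
  if (monthsBeforeSre > 12) return "Experienced";
  if (monthsBeforeSre >= 3) return "Learning";
  return "Newb";
}
```

The log damps bursts: a day with 10 events contributes about 3.3 rather than 10, so steady activity across many days outweighs a single busy day.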

---

## Observations

1. **Activity, Lifecycle, Retention**: clean 1:1 mapping between definitions and insights.
2. **Alignment**: the definition doesn't match the implementations. It needs a rewrite to describe the 5-channel model and the repo classification.
3. **Repo Rank** runs on an entirely different primitive (engagement events, not commits). No existing metric definition covers it.
4. **DeFi Builder Journeys** is the most metric-rich notebook (~95 measures). Many are project-level lifecycle concepts (onboarding, tenure, pipeline) that have no definition yet.
5. **Speedrun Ethereum** has metrics specific to the SRE program.

## Potential New Definitions

- **Engagement**: stars, forks, signal strength, momentum (covers Repo Rank)
- **Developer Journey**: onboarding, offboarding, tenure, pipeline categories (covers DeFi Builder Journeys project-level metrics)
- **Experience**: pre-existing activity classification, velocity, SRE attribution (covers Speedrun Ethereum)
