Commit 708f298
feat: replace forSession() scoring with FTS5 BM25 (#48)
## Phase 3 of search improvements (depends on #47)
Replaces the coarse bag-of-words term-overlap scoring in `forSession()`
with FTS5 BM25-based scoring.
### Problem
`forSession()` used manual term-overlap counting: extract the top 30 words
longer than 3 characters, then count how many appear in each entry via
`string.includes()`.
This ignored:
- Porter stemming ("configure" wouldn't match "configuration")
- TF-IDF weighting (all matching terms counted equally)
- Stopwords (common words inflated match counts)
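
To make the limitations concrete, here is a minimal sketch of what term-overlap scoring of this kind looks like. It is a reconstruction from the description above, not the removed code; `legacyTopWords` and `legacyOverlapScore` are hypothetical names, and ranking words by frequency is an assumption.

```typescript
// Hypothetical reconstruction of the old approach (names are illustrative).
// Keeps the most frequent words longer than 3 characters.
function legacyTopWords(context: string, limit = 30): string[] {
  const counts = new Map<string, number>();
  for (const word of context.toLowerCase().split(/[^a-z0-9]+/)) {
    if (word.length > 3) counts.set(word, (counts.get(word) || 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([word]) => word);
}

// Counts how many terms occur as plain substrings of the entry text.
// Every hit counts equally: no stemming, no weighting, no stopword filter.
function legacyOverlapScore(terms: string[], entryText: string): number {
  const haystack = entryText.toLowerCase();
  return terms.filter((term) => haystack.includes(term)).length;
}

const legacyTerms = legacyTopWords("Please configure the database with the new settings");
const legacyScore = legacyOverlapScore(legacyTerms, "Database configuration guide with examples");
```

In this example the entry scores 2, from "database" and the stopword "with", while "configure" contributes nothing because it is not a substring of "configuration".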
### Solution
**New `scoreEntriesFTS()`** in `ltm.ts`:
- Runs session context terms against `knowledge_fts` using BM25
- Uses **OR** semantics (not AND-then-OR) because we're scoring all
candidates for ranking, not searching for exact matches — an entry
matching 1 of 40 terms should get a low score, not be excluded
- BM25 naturally weights entries matching more terms higher
- Scores normalized to 0–1 and multiplied by entry confidence
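
The query construction and score normalization described above could be sketched roughly as follows. This is not the code from `ltm.ts`: `buildOrQuery`, `normalizeScores`, and the `RawHit` shape are hypothetical, and dividing by the best score is only one plausible way to map scores into 0–1. The sketch relies on the documented FTS5 behavior that `bm25()` returns numerically smaller (more negative) values for better matches.

```typescript
// Assumed shape of one row returned by a query such as:
//   SELECT rowid, bm25(knowledge_fts) AS rank
//   FROM knowledge_fts WHERE knowledge_fts MATCH ?
// joined with the entry's confidence. Field names are illustrative.
interface RawHit {
  id: string;
  bm25: number; // FTS5 bm25(): more negative means a better match
  confidence: number; // entry confidence in [0, 1]
}

// Quote each term as an FTS5 string and join with OR, so an entry that
// matches only one term is still returned (with a low score).
function buildOrQuery(terms: string[]): string {
  return terms
    .filter((term) => term.length > 0)
    .map((term) => `"${term.replace(/"/g, '""')}"`)
    .join(" OR ");
}

// Flip the sign so larger is better, scale by the best hit to get 0-1,
// then weight by confidence. The scaling choice is an assumption.
function normalizeScores(hits: RawHit[]): Map<string, number> {
  const relevance = hits.map((hit) => Math.max(0, -hit.bm25));
  const best = Math.max(0, ...relevance);
  const scores = new Map<string, number>();
  hits.forEach((hit, i) => {
    const normalized = best > 0 ? relevance[i] / best : 0;
    scores.set(hit.id, normalized * hit.confidence);
  });
  return scores;
}
```

With this scaling the best-matching entry gets a normalized score of 1 before the confidence weighting, so a highly relevant but low-confidence entry can still rank below a weaker match with high confidence.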

**Improved `extractTopTerms()`** moved to `search.ts`:
- Now uses the same `STOPWORDS` set introduced in Phase 1
- Drops only single-character tokens (replacing the old "longer than
3 characters" threshold), which preserves "DB", "CI", "IO"
- Increased limit from 30 to 40 terms
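
A minimal sketch of term extraction with these rules is shown below. It is not the implementation in `search.ts`: the function name `extractTopTermsSketch`, the tiny `SKETCH_STOPWORDS` set, the tokenization regex, and ranking by frequency are all assumptions made for illustration.

```typescript
// Illustrative stand-in for the shared STOPWORDS set; the real set from
// Phase 1 is not shown in this commit message.
const SKETCH_STOPWORDS = new Set([
  "the", "a", "an", "and", "or", "is", "in", "to", "of", "for", "with",
]);

// Returns up to `limit` terms, most frequent first.
// Single-character tokens and stopwords are dropped; 2-char tokens are kept.
function extractTopTermsSketch(text: string, limit = 40): string[] {
  const counts = new Map<string, number>();
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/)) {
    if (token.length < 2 || SKETCH_STOPWORDS.has(token)) continue;
    counts.set(token, (counts.get(token) || 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([term]) => term);
}

const sketchTerms = extractTopTermsSketch("The DB migration failed in CI; rerun the DB migration.");
```

Here `sketchTerms` keeps the short tokens "db" and "ci" while dropping "the" and "in", which is the behavior the 2-character rule is meant to preserve.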
### Safety net preserved
Top 5 project entries by confidence are always included regardless of
FTS match, preventing the scoring change from accidentally excluding
critical project knowledge.
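
The safety net can be pictured with the following sketch. It is an illustration under assumptions, not the code in `forSession()`: `selectWithSafetyNet` and the `ScoredEntry` shape are hypothetical, and treating `score > 0` as "matched by FTS" is a simplification.

```typescript
// Assumed entry shape: `score` is the FTS-derived score, 0 when no match.
interface ScoredEntry {
  id: string;
  confidence: number;
  score: number;
}

// Always keeps the `pinned` highest-confidence project entries, then adds
// every entry that matched via FTS. Deduplicates by id.
function selectWithSafetyNet(projectEntries: ScoredEntry[], pinned = 5): ScoredEntry[] {
  const alwaysIncluded = projectEntries
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, pinned);

  const selected = new Map<string, ScoredEntry>();
  for (const entry of alwaysIncluded) selected.set(entry.id, entry);
  for (const entry of projectEntries) {
    if (entry.score > 0) selected.set(entry.id, entry);
  }
  return Array.from(selected.values());
}
```

An entry with no FTS match and a confidence outside the top 5 is the only kind that gets excluded under this scheme.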
### Test coverage
- 8 new tests for `extractTopTerms()` (stopwords, 2-char tokens, limits,
punctuation)
- All 12 existing `forSession()` tests continue to pass

Parent commit: 332f7b2
3 files changed
Lines changed: 159 additions & 45 deletions