Cache validity of a locale name#1254
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1254 +/- ##
=======================================
Coverage 92.04% 92.04%
=======================================
Files 27 27
Lines 4715 4716 +1
=======================================
+ Hits 4340 4341 +1
Misses 375 375
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
babel/localedata.py
Outdated
| file_found = os.path.exists(resolve_locale_filename(name)) | ||
| return True if file_found else bool(normalize_locale(name)) | ||
| if file_found or normalize_locale(name): | ||
| _cache.setdefault(name, {}) |
There was a problem hiding this comment.
This is clever!
Clever enough that it would definitely at least need a comment here and also somewhere around _cache's declaration, and likely also in load(), so future readers can understand the semantics. In a "less type confusion" sense, None could be a better marker...
There was a problem hiding this comment.
(Aside, but just a thought... why do we guard all of load() in the RLock...)
There was a problem hiding this comment.
(Aside, but just a thought... why do we guard all of
load()in the RLock...)
Checking the blame the reason seems lost to time, the lock was already in the initial import.
I would assume either something like "we can change it if it becomes an issue", or (and?) that given the gil there should be essentially no contention on the lock once all the relevant locales have been initialised, it would only occur if a release occurred in the small handful of instructions of the "fast path". This might be an assumption to reconsider given GIL-less runtimes, but at the same time unless someone's complained it might not be worth the risk?
If you can get in contact with Edgewall they might still have the old history stashed away somewhere but that seems unlikely.
akx
left a comment
There was a problem hiding this comment.
On second thought, the additional cleverness is a little too clever. I think just functools.cache on def exists() is better.
After checking for safety, I simplified the return expression as well, afaik |
|
Ah looks like it's not happy because lru_cache blows up immediately on non-hashable content, and there's a test which passes a list to Should I just create a second separate manual cache instead then? Or update the test and |
I think the next version of Babel will be the last to support 3.8 (which is EOL), so please add a
Added in 1903a2b... Bit of a silly test.
Yeah, let's just change that test to pass in a tuple or something. 😅 Blowing up in |
There's a few other locations where it's used the same way so might make more sense to go in a task / todo list for modernization than a spot comment just for that one? Dunno if you already have a task / milestone for that.
I'll do that then, thanks 👍 |
Currently `_cache` is only populated on `load`, this is an issue with code which do a lot of locale parsing / instantiation but for one reason or an other rarely end up needing to actually load the locale data (e.g. date formatting with non-locale-dependent patterns) because every cache miss is an `os.path.exists`. Cache `exists` separately. Update test because `lru_cache` requires all parameters to be hashasble and a list is not that.
Currently
_cacheis only populated onload, this is an issue with code which do a lot of locale parsing / instantiation but for one reason or an other rarely end up needing to actually load the locale data (e.g. date formatting with non-locale-dependent patterns) because every cache miss is anos.path.exists.This change takes advantage of semantics differences between
existsandload:existsjust needs the entry to exist in the cache, whileloadneeds the entry to be truthy. Thusexistscan cache its lookup by setting an empty dict (or evenNonebut that seems a bit too dodgy).Alternatively we could remove the
os.path.exists"slow path" and callnormalize_localeevery time, aslocale_identifiersgets cached on first call, the initial call would be a lot more expensive but then no furtherexistswould need to touch the filesystem.For a third alternative,
existscould be decorated withfunctools.cache, in order to have its own cache completely independent ofload.loadcould also be migrated tofunctools.cacheas, as far as I can tell, thelocaledata._cacheis never invalidated anyway, with the drawback that a locale's data might be resolved multiple times if concurrent calls are tight enough (functools.cachesynchronises the cache but not the resolution of the value).