Cache validity of a locale name #1254

Open
xmo-odoo wants to merge 1 commit into python-babel:master from xmo-odoo:master

Conversation

@xmo-odoo
Contributor

Currently _cache is only populated on load. This is an issue for code that does a lot of locale parsing / instantiation but, for one reason or another, rarely ends up needing to actually load the locale data (e.g. date formatting with non-locale-dependent patterns), because every cache miss is an os.path.exists.

This change takes advantage of a semantic difference between exists and load: exists just needs the entry to exist in the cache, while load needs the entry to be truthy. Thus exists can cache its lookup by setting an empty dict (or even None, but that seems a bit too dodgy).

Alternatively, we could remove the os.path.exists "slow path" and call normalize_locale every time. Since locale_identifiers is cached on first call, the initial call would be a lot more expensive, but no further exists call would need to touch the filesystem.

As a third alternative, exists could be decorated with functools.cache, giving it its own cache completely independent of load. load could also be migrated to functools.cache since, as far as I can tell, localedata._cache is never invalidated anyway; the drawback is that a locale's data might be resolved multiple times if concurrent calls are tight enough (functools.cache synchronises the cache but not the resolution of the value).

@codecov

codecov bot commented Feb 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.04%. Comparing base (56c63ca) to head (a43df5d).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
    @@           Coverage Diff           @@
    ##           master    #1254   +/-   ##
    =======================================
      Coverage   92.04%   92.04%
    =======================================
      Files          27       27
      Lines        4715     4716    +1
    =======================================
    + Hits         4340     4341    +1
      Misses        375      375
| Flag | Coverage Δ |
| --- | --- |
| macos-14-3.10 | 91.09% <100.00%> (+<0.01%) ⬆️ |
| macos-14-3.11 | 91.03% <100.00%> (+<0.01%) ⬆️ |
| macos-14-3.12 | 91.24% <100.00%> (+<0.01%) ⬆️ |
| macos-14-3.13 | 91.24% <100.00%> (+<0.01%) ⬆️ |
| macos-14-3.14 | 91.22% <100.00%> (+<0.01%) ⬆️ |
| macos-14-3.8 | 90.96% <100.00%> (+<0.01%) ⬆️ |
| macos-14-3.9 | 91.02% <100.00%> (+<0.01%) ⬆️ |
| macos-14-pypy3.10 | 91.09% <100.00%> (+<0.01%) ⬆️ |
| ubuntu-24.04-3.10 | 91.11% <100.00%> (+<0.01%) ⬆️ |
| ubuntu-24.04-3.11 | 91.05% <100.00%> (+<0.01%) ⬆️ |
| ubuntu-24.04-3.12 | 91.26% <100.00%> (+<0.01%) ⬆️ |
| ubuntu-24.04-3.13 | 91.26% <100.00%> (+<0.01%) ⬆️ |
| ubuntu-24.04-3.14 | 91.24% <100.00%> (+<0.01%) ⬆️ |
| ubuntu-24.04-3.8 | 90.98% <100.00%> (+<0.01%) ⬆️ |
| ubuntu-24.04-3.9 | 91.04% <100.00%> (+<0.01%) ⬆️ |
| ubuntu-24.04-pypy3.10 | 91.11% <100.00%> (+<0.01%) ⬆️ |
| windows-2022-3.10 | 91.10% <100.00%> (+<0.01%) ⬆️ |
| windows-2022-3.11 | 91.04% <100.00%> (+<0.01%) ⬆️ |
| windows-2022-3.12 | 91.25% <100.00%> (+<0.01%) ⬆️ |
| windows-2022-3.13 | 91.25% <100.00%> (+<0.01%) ⬆️ |
| windows-2022-3.14 | 91.23% <100.00%> (+<0.01%) ⬆️ |
| windows-2022-3.8 | 91.07% <100.00%> (+<0.01%) ⬆️ |
| windows-2022-3.9 | 91.03% <100.00%> (+<0.01%) ⬆️ |
| windows-2022-pypy3.10 | 91.10% <100.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.


    file_found = os.path.exists(resolve_locale_filename(name))
    return True if file_found else bool(normalize_locale(name))
    if file_found or normalize_locale(name):
        _cache.setdefault(name, {})
Member


This is clever!

Clever enough that it would definitely at least need a comment here and also somewhere around _cache's declaration, and likely also in load(), so future readers can understand the semantics. In a "less type confusion" sense, None could be a better marker...

Member


(Aside, but just a thought... why do we guard all of load() in the RLock...)

Contributor Author

@xmo-odoo xmo-odoo Apr 10, 2026


> (Aside, but just a thought... why do we guard all of load() in the RLock...)

Checking the blame, the reason seems lost to time; the lock was already there in the initial import.

I would assume it's either "we can change it if it becomes an issue", or (and?) that, given the GIL, there should be essentially no contention on the lock once all the relevant locales have been initialised: contention would only occur if a release happened within the small handful of instructions of the "fast path". That assumption might need reconsidering for GIL-less runtimes, but at the same time, unless someone has complained, it might not be worth the risk?

If you can get in contact with Edgewall they might still have the old history stashed away somewhere but that seems unlikely.
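As a rough illustration of why a reentrant lock fits this shape of loader (a sketch under assumed semantics, not Babel's actual load()): the loader recurses to resolve a parent locale while already holding the lock, which a plain Lock would deadlock on.

```python
import threading

_cache = {}
_cache_lock = threading.RLock()


def load(name):
    # An RLock rather than a Lock: the recursive call below re-enters the
    # same lock on the same thread without deadlocking.
    with _cache_lock:
        data = _cache.get(name)
        if not data:
            if name == "root":
                data = {"source": "root"}  # stand-in for reading the data file
            else:
                # resolve the parent locale while the lock is still held
                data = dict(load("root"))
                data["source"] = name
            _cache[name] = data
        return data
```

Once a locale is cached, the "fast path" is just the lock acquisition plus a dict lookup, which is consistent with the low-contention assumption above.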

Member

@akx akx left a comment


On second thought, the additional cleverness is a little too clever. I think just functools.cache on def exists() is better.

@xmo-odoo
Contributor Author

> On second thought, the additional cleverness is a little too clever. I think just functools.cache on def exists() is better.

After checking for safety: functools.cache was added in 3.9, but python_requires and the classifiers specify support for 3.8, so I used lru_cache(maxsize=None). It's already used by the function right below, as well as in extract, so it's consistent.

I simplified the return expression as well; afaik os.path.exists already returns a genuine bool, so there's no reason for a ternary. It looks like a leftover from previous changes.
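The lru_cache variant described here might look roughly like this (the data directory is a made-up stand-in, not Babel's real layout):

```python
import os
import tempfile
from functools import lru_cache

# Hypothetical empty data directory standing in for Babel's locale-data dir.
_DATA_DIR = tempfile.mkdtemp()


# TODO: Use @cache when dropping Python 3.8 support
@lru_cache(maxsize=None)
def exists(name: str) -> bool:
    # os.path.exists already returns a genuine bool, so no ternary is needed
    return os.path.exists(os.path.join(_DATA_DIR, f"{name}.dat"))
```

`lru_cache(maxsize=None)` is functionally equivalent to `functools.cache` (the latter is literally defined as the former in CPython), so this keeps 3.8 compatibility at no behavioural cost.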

@xmo-odoo
Contributor Author

xmo-odoo commented Apr 10, 2026

Ah, looks like it's not happy: lru_cache blows up immediately on non-hashable arguments, and there's a test which passes a list to exists...

Should I just create a second, separate manual cache instead, then? Or update the test so that exists can no longer be called with non-hashable content? (technically it is typed as taking a str)
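The failure mode is easy to reproduce: lru_cache hashes its arguments to build the cache key, so any unhashable argument raises TypeError before the wrapped function even runs (hypothetical stand-in for the cached exists()):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def exists(name):
    # stand-in body; the wrapper fails before this runs for unhashable input
    return isinstance(name, str)


try:
    exists(["en", "US"])  # a list is unhashable, so the wrapper raises
    raised = False
except TypeError:
    raised = True
```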

@akx
Member

akx commented Apr 10, 2026

> After checking for safety: functools.cache was added in 3.9, but python_requires and the classifiers specify support for 3.8, so I used lru_cache(maxsize=None). It's already used by the function right below, as well as in extract, so it's consistent.

I think the next version of Babel will be the last to support 3.8 (which is EOL), so please add a # TODO: Use @cache when dropping Python 3.8 support style comment?

> Ah, looks like it's not happy: lru_cache blows up immediately on non-hashable arguments, and there's a test which passes a list to exists...

Added in 1903a2b... Bit of a silly test.

> Or update the test so that exists can no longer be called with non-hashable content? (technically it is typed as taking a str)

Yeah, let's just change that test to pass in a tuple or something. 😅 Blowing up in lru_cache when the input isn't hashable is fine, no one should be doing that anyway (and they'll find out the hard way here!)
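The test fix could then be as simple as swapping the list for a tuple; a sketch with a made-up test name and a stand-in for the cached exists():

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def exists(name):
    # stand-in for the cached babel.localedata.exists
    return name in ("en", "fr")


def test_exists_with_unknown_hashable_input():
    # a tuple is hashable, so it reaches the function body instead of
    # blowing up inside lru_cache the way a list would
    assert exists(("en", "US")) is False


test_exists_with_unknown_hashable_input()
```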

@xmo-odoo
Contributor Author

xmo-odoo commented Apr 10, 2026

> After checking for safety: functools.cache was added in 3.9, but python_requires and the classifiers specify support for 3.8, so I used lru_cache(maxsize=None). It's already used by the function right below, as well as in extract, so it's consistent.

> I think the next version of Babel will be the last to support 3.8 (which is EOL), so please add a # TODO: Use @cache when dropping Python 3.8 support style comment?

There are a few other locations where it's used the same way, so it might make more sense to put this in a modernization task / todo list than a spot comment just for that one? Dunno if you already have a task / milestone for that.

> Or update the test so that exists can no longer be called with non-hashable content? (technically it is typed as taking a str)

> Yeah, let's just change that test to pass in a tuple or something. 😅 Blowing up in lru_cache when the input isn't hashable is fine, no one should be doing that anyway (and they'll find out the hard way here!)

I'll do that then, thanks 👍

Currently `_cache` is only populated on `load`. This is an issue for
code that does a lot of locale parsing / instantiation but, for one
reason or another, rarely ends up needing to actually load the locale
data (e.g. date formatting with non-locale-dependent patterns), because
every cache miss is an `os.path.exists`.

Cache `exists` separately.

Update the test because `lru_cache` requires all parameters to be
hashable, and a list is not.
