Bug report
Bug description:
in PEP 814, the provided pseudocode for hash(frozendict) is hash(frozenset(frozendict.items())), which makes sense, but the current implementation severely mismatches it in semantics:
|
while (PyDict_Next(op, &pos, &key, &value)) { |
|
Py_hash_t key_hash = PyObject_Hash(key); |
|
if (key_hash == -1) { |
|
return -1; |
|
} |
|
hash ^= _shuffle_bits(key_hash); |
|
|
|
Py_hash_t value_hash = PyObject_Hash(value); |
|
if (value_hash == -1) { |
|
return -1; |
|
} |
|
hash ^= _shuffle_bits(value_hash); |
|
} |
just xoring the key and value hashes is very different from xoring the proper hashes of (key, value) pairs
the current implementation causes collisions to arise on permutations of keys and values separately rather than treating each entry as a single unit:
cases= [
{"a": False, "b": True, "c": True},
{"a": True, "b": False, "c": True},
{"a": True, "b": True, "c": False},
{False: "a", "b": True, "c": True},
{"a": "b", False: True, True: "c"}
]
assert len({hash(frozendict(case)) for case in cases}) == 1
which is definitely not desirable, and potentially even exploitable for DOS attacks like string hashes were before randomization was added.
CPython versions tested on:
3.15
Operating systems tested on:
Linux
Linked PRs
Bug report
Bug description:
in PEP 814, the provided pseudocode for
hash(frozendict)ishash(frozenset(frozendict.items())), which makes sense, but the current implementation severely mismatches it in semantics:cpython/Objects/dictobject.c
Lines 8246 to 8258 in b6854c5
just xoring the key and value hashes is very different from xoring the proper hashes of (key, value) pairs
the current implementation causes collisions to arise on permutations of keys and values separately rather than treating each entry as a single unit:
which is definitely not desirable, and potentially even exploitable for DOS attacks like string hashes were before randomization was added.
CPython versions tested on:
3.15
Operating systems tested on:
Linux
Linked PRs
frozendict_hash: hash (key, value) pairs rather than keys and values separately #149808