How come regex match objects aren't iterable even though they implement __getitem__?
As you may know, implementing a __getitem__ method makes a class iterable:
class IterableDemo:
    def __getitem__(self, index):
        if index > 3:
            raise IndexError
        return index
demo = IterableDemo()
print(demo[2]) # 2
print(list(demo)) # [0, 1, 2, 3]
print(hasattr(demo, '__iter__')) # False
However, this doesn't hold true for regex match objects:
>>> import re
>>> match = re.match('(ab)c', 'abc')
>>> match[0]
'abc'
>>> match[1]
'ab'
>>> list(match)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '_sre.SRE_Match' object is not iterable
It's worth noting that this exception isn't raised by an __iter__ method, because that method isn't even implemented:
>>> hasattr(match, '__iter__')
False
So, how is it possible to implement __getitem__ without making the class iterable?
There are lies, damned lies and then there is Python documentation.
For a class implemented in C, having a __getitem__ is not enough to make it iterable. That is because there are actually two places in the PyTypeObject that __getitem__ can be mapped to: tp_as_sequence and tp_as_mapping. Both have a slot for __getitem__ (sq_item and mp_subscript, respectively) ([1], [2]).
In the source of SRE_Match, tp_as_sequence is initialized to NULL, whereas tp_as_mapping is defined.
When called with one argument, the iter() built-in function calls PyObject_GetIter, which contains the following code:
f = t->tp_iter;
if (f == NULL) {
if (PySequence_Check(o))
return PySeqIter_New(o);
return type_error("'%.200s' object is not iterable", o);
}
It first checks the tp_iter slot (obviously NULL for _SRE_Match objects). Failing that, if PySequence_Check returns true, a new sequence iterator is returned; otherwise a TypeError is raised.
PySequence_Check first checks whether the object is a dict or a dict subclass, and returns false in that case. Otherwise it returns the value of
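The sequence iterator that PySeqIter_New creates can be modelled roughly in Python. This is only a sketch: the helper name sequence_iter is made up for illustration, and the match[0] subscripting assumes Python 3.6 or later. It suggests that a match object would iterate without trouble if the fallback were ever reached, so the slot check is the only thing standing in the way.

```python
import re


class IterableDemo:
    def __getitem__(self, index):
        if index > 3:
            raise IndexError
        return index


def sequence_iter(obj):
    """Hypothetical stand-in for the C sequence iterator:
    yield obj[0], obj[1], ... until IndexError is raised."""
    index = 0
    while True:
        try:
            item = obj[index]
        except IndexError:
            return  # IndexError marks the end of the sequence
        yield item
        index += 1


demo_items = list(sequence_iter(IterableDemo()))

# A match object raises IndexError for a non-existent group,
# so the same loop terminates cleanly on it as well.
match = re.match(r"(ab)c", "abc")
match_items = list(sequence_iter(match))

print(demo_items)
print(match_items)
```

The loop only relies on obj[index] and IndexError, which is all the old-style iteration protocol asks of a sequence.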
s->ob_type->tp_as_sequence &&
s->ob_type->tp_as_sequence->sq_item != NULL;
and since s->ob_type->tp_as_sequence was NULL for a _SRE_Match instance, 0 will be returned, and PyObject_GetIter raises TypeError: '_sre.SRE_Match' object is not iterable.
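If the goal is simply to loop over the groups, one possible workaround is to index the match explicitly. The sketch below assumes that the number of capturing groups can be read from match.re.groups; the variable names are illustrative only.

```python
import re

match = re.match(r"(ab)c", "abc")

# iter() still refuses the match object; the exact type name in the
# message differs between Python versions, so only the type is checked.
try:
    iter(match)
    is_iterable = True
except TypeError:
    is_iterable = False

# Group 0 is the whole match; groups 1..n are the capturing groups.
all_groups = [match.group(i) for i in range(match.re.groups + 1)]

# groups() returns only the capturing groups, without group 0.
captured = match.groups()

print(is_iterable)
print(all_groups)
print(captured)
```

Using match.group(i) rather than match[i] keeps the snippet working on versions that predate subscripting support for match objects.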