
gh-144995: Optimize memoryview == memoryview#144996

Open
vstinner wants to merge 8 commits into python:main from vstinner:memoryview_equal

@vstinner (Member) commented Feb 19, 2026

@vstinner (Member Author)

Results of the benchmark from the issue:

bytes 0.000122 seconds
mview 0.000146 seconds
⇒ 1.197965 times slower

With this change, comparing a memoryview to itself is O(1) instead of O(n): individual values are no longer compared.
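The benchmark script from the issue is not reproduced here. As a rough sketch, assuming it times an object being compared to itself, a timeit measurement in the same output format could look like this (the 64 KiB buffer size and the loop count are arbitrary choices, not taken from the issue):

```python
import timeit

# Assumed setup, not the issue's exact benchmark: a 64 KiB buffer
# compared to itself, once as bytes and once through a memoryview.
data = bytes(64 * 1024)
view = memoryview(data)
loops = 200

t_bytes = timeit.timeit("data == data", globals={"data": data}, number=loops)
t_view = timeit.timeit("view == view", globals={"view": view}, number=loops)

# Report the average time per comparison.
print(f"bytes {t_bytes / loops:.6f} seconds")
print(f"mview {t_view / loops:.6f} seconds")
print(f"=> {t_view / t_bytes:f} times slower")
```

Without the optimization, the mview time should grow with the buffer size; with it, the time should not depend on the size.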

    static int
    is_float_format(const char *format)

Contributor (review comment on the code above)

Does this cover the complex types?

import numpy as np
a = np.array([1+2j, 3+4j, float('nan')], dtype=np.complex128)
mv = memoryview(a)
mv == mv # False

Member Author

The format of this memoryview is Zd. Oh, my change doesn't work for it: I should replace the blocklist with an allowlist. I'm not a memoryview/buffer expert, and I didn't know that third-party projects can use their own formats.
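Why float-like formats must be excluded can be shown with the standard library alone, without NumPy: a NaN makes a float memoryview unequal to itself, so a shortcut returning True for an identical view would change the result. A short sketch:

```python
import array
import math

# Float items: NaN != NaN, so the view is not equal to itself.
floats = memoryview(array.array('d', [1.0, math.nan]))
# Integer items have no NaN-like value: the view is always equal to itself.
ints = memoryview(array.array('i', [1, 2]))

print(floats.format, floats == floats)
print(ints.format, ints == ints)
```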

@vstinner (Member Author)
@eendebakpt: I updated the PR to allow formats known to be safe for pointer comparison (integer types), instead of blocking formats known to use floats.

I excluded the P format since I don't know it well. Can we allow it?

@eendebakpt (Contributor)

I think adding P is fine (but I am no expert either). Leaving it out is the safe option; we can reconsider if it turns out to be a performance bottleneck.

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>
    # A memoryview is equal to itself: there is no need to compare
    # individual values. This is not true for float values since they can
    # be NaN, and NaN is not equal to itself.
    for int_format in 'bBhHiIlLqQ':

Member

Can the "?" format be tested? Can formats starting with "@" be tested? Can the null format be tested?

Member Author

I don't know how to test these formats. array.array supports neither the "P" and "?" formats nor the "@" byte order. Do you have an idea how to test these cases?

Member

memoryview.cast() supports them.
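For example, starting from a bytes object, memoryview.cast() can produce "?", "@B" and "P" views that array.array cannot create. A minimal sketch; the buffer contents are arbitrary, and the "P" view only holds zero bytes, so no arbitrary value is reinterpreted as a pointer:

```python
import struct

# Two pointer-sized items, all zero bytes.
raw = bytes(2 * struct.calcsize('P'))

views = {
    '?': memoryview(b'\x00\x01').cast('?'),    # bool items: False, True
    '@B': memoryview(b'\x00\x01').cast('@B'),  # native-prefixed unsigned char
    'P': memoryview(raw).cast('P'),            # void * items
}
for name, view in views.items():
    print(name, view.tolist(), view == view, view != view)
```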

Member

Surprisingly:

>>> memoryview(b'\0\1').cast('?') == memoryview(b'\0\2').cast('?')
False

even if

>>> list(memoryview(b'\0\1').cast('?')) == list(memoryview(b'\0\2').cast('?'))
True

But this may be platform-dependent, so I would not test values other than 0 and 1. Or is 1 also unsafe?

It may be undefined behavior to interpret arbitrary values other than 0 as void* (even if it works on x86). Maybe there is a way to create an array of pointers with ctypes? Or is it not worth the bother?
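One possible way to avoid reinterpreting arbitrary integers as void*, sketched here as an assumption rather than what the PR does: take real addresses from live ctypes objects with ctypes.addressof(), pack them with struct, and cast the bytes to "P". The helper name pointer_view() is hypothetical.

```python
import ctypes
import struct


def pointer_view(objects):
    """Hypothetical helper: return a 'P' memoryview holding the addresses
    of the given ctypes objects, plus the list of those addresses."""
    addresses = [ctypes.addressof(obj) for obj in objects]
    packed = struct.pack(f'{len(addresses)}P', *addresses)
    return memoryview(packed).cast('P'), addresses


# Keep the targets alive while their addresses are in use.
targets = [ctypes.c_int(1), ctypes.c_int(2)]
view, addresses = pointer_view(targets)
print(view.tolist() == addresses, view == view, view == memoryview(view))
```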

@vstinner (Member Author)

I updated the PR to address @serhiy-storchaka's review:

  • Also optimize the "P" format
  • Also test "m != m"
  • Handle native formats such as "@B"
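The updated check can be modelled in Python as below. This is only a sketch of the idea: the real check is C code in CPython. can_compare_ptr() borrows the name mentioned later in the discussion, but its signature here is an assumption, memoryview_equal() is hypothetical, and the allowlist (the integer formats from the test, plus "P" and "?") is assumed.

```python
# Assumed allowlist: formats whose items have no NaN-like value.
SAFE_FORMATS = frozenset('bBhHiIlLqQP?')


def can_compare_ptr(fmt):
    """Return True if a view with struct format *fmt* may skip the
    element-wise comparison when compared with itself."""
    if fmt is None:
        # A NULL buffer format means unsigned bytes ('B').
        return True
    if fmt.startswith('@'):
        # '@' is native byte order, size and alignment: same as no prefix.
        fmt = fmt[1:]
    return len(fmt) == 1 and fmt in SAFE_FORMATS


def memoryview_equal(v, w):
    """Hypothetical model of memoryview == memoryview with the fast path."""
    if v is w and can_compare_ptr(v.format):
        return True  # O(1): same view, and no item can be NaN
    return v == w    # regular O(n) comparison
```

Unknown or third-party formats such as "Zd" fall through to the regular comparison, which is the point of switching from a blocklist to an allowlist.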

@vstinner (Member Author)

I added tests for 4 more formats: @b, @B, P and ?.

@serhiy-storchaka (Member) left a comment

I have some doubts about the 'P' test. It may be an operation with undefined behavior (although CPython may never run on platforms where this does not work, but I am not sure). It would be safer to omit that test. There are no other tests for the 'P' format. But the optimization should work for it (if we exclude undefined behavior).

@vstinner (Member Author)

I modified the tests to check that the result with the optimization is the same as the result without it:

        def check_equal(view, is_equal):
            self.assertEqual(view == view, is_equal)
            self.assertEqual(view != view, not is_equal)

            # Comparison with a different memoryview doesn't use
            # the optimization and should give the same result.
            view2 = memoryview(view)
            self.assertEqual(view2 == view, is_equal)
            self.assertEqual(view2 != view, not is_equal)
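A self-contained variant of this helper, written as a standalone unittest.TestCase, could look like the sketch below. The class name and the chosen buffers are assumptions; it only checks behavior that should hold both with and without the optimization.

```python
import array
import math
import struct
import unittest


class MemoryViewSelfCompareTests(unittest.TestCase):
    """Hypothetical standalone version of the check_equal() helper."""

    def check_equal(self, view, is_equal):
        # Comparing a view with itself (the case targeted by the optimization).
        self.assertEqual(view == view, is_equal)
        self.assertEqual(view != view, not is_equal)
        # Per the PR discussion, a distinct memoryview takes the regular path.
        view2 = memoryview(view)
        self.assertEqual(view2 == view, is_equal)
        self.assertEqual(view2 != view, not is_equal)

    def test_int_formats(self):
        for int_format in 'bBhHiIlLqQ':
            with self.subTest(format=int_format):
                view = memoryview(array.array(int_format, [1, 2, 3]))
                self.check_equal(view, True)

    def test_native_and_bool_formats(self):
        self.check_equal(memoryview(b'\x00\x01').cast('@B'), True)
        self.check_equal(memoryview(b'\x00\x01').cast('?'), True)

    def test_float_nan(self):
        # NaN != NaN: the view must not compare equal to itself.
        view = memoryview(struct.pack('d', math.nan)).cast('d')
        self.check_equal(view, False)
```

Saved to a file, it can be run with python -m unittest.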

For the boolean (?) format, I use memoryview(b'\0\1\2').cast('?') in the test. While m.tolist() == m.tolist() gives a different result than m == m, what matters here is that the can_compare_ptr optimization doesn't change the result of m == m.

If you are not confident in my P test and would prefer to remove it, I would prefer to also remove the optimization for this format.
