[SPARK-55625][PS] Fix StringOps to make str dtype work properly #54413

Open
ueshin wants to merge 1 commit into apache:master from ueshin:issues/SPARK-55625/string_ops

ueshin commented Feb 21, 2026

What changes were proposed in this pull request?

Fix StringOps to make str dtype work properly.

Why are the changes needed?

In pandas 3, the default dtype for strings is now str, that is, StringDtype(na_value=np.nan).
This is an extension dtype, but it is actually handled as if it were a non-extension dtype.

>>> import pandas as pd
>>> pser = pd.Series(["x", "y", "z", None])
>>> other_pser = pd.Series([None, "z", "y", "x"])
  • pandas 2
>>> pser
0       x
1       y
2       z
3    None
dtype: object
>>> other_pser
0    None
1       z
2       y
3       x
dtype: object
>>> pser == other_pser
0    False
1    False
2    False
3    False
dtype: bool
  • pandas 3
>>> pser
0      x
1      y
2      z
3    NaN
dtype: str
>>> other_pser
0    NaN
1      z
2      y
3      x
dtype: str
>>> pser == other_pser
0    False
1    False
2    False
3    False
dtype: bool
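
For contrast, here is a minimal sketch of how the default string dtype differs from the explicitly NA-backed "string" dtype when compared. It is only an illustration and not code from this patch: compare_summary is a hypothetical helper, and the sketch assumes pandas 2 or 3 is installed. With default inference the comparison yields a plain bool result with no missing values, as in the transcripts above, while the "string" dtype yields a nullable boolean result that carries missing values.

```python
import pandas as pd
from pandas.api.types import is_extension_array_dtype


def compare_summary(left, right):
    """Hypothetical helper: summarize an elementwise == between two Series."""
    result = left == right
    return {
        "input_dtype": str(left.dtype),
        "input_is_extension": is_extension_array_dtype(left.dtype),
        "result_dtype": str(result.dtype),
        "result_has_missing": bool(result.isna().any()),
    }


values = ["x", "y", "z", None]
other = [None, "z", "y", "x"]

# Default inference: object dtype on pandas 2, str dtype on pandas 3.
default = compare_summary(pd.Series(values), pd.Series(other))

# Explicit NA-backed extension dtype, which keeps pd.NA semantics.
na_backed = compare_summary(
    pd.Series(values, dtype="string"),
    pd.Series(other, dtype="string"),
)

print(default)
print(na_backed)
```

The value of input_is_extension for the default case depends on the installed pandas version, so it is worth printing rather than assuming.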

Does this PR introduce any user-facing change?

Yes, string operations in pandas API on Spark will behave more like pandas 3.

How was this patch tested?

The existing tests should pass.

Was this patch authored or co-authored using generative AI tooling?

No.

ueshin commented Feb 21, 2026
