Description
What happened?
Some test suites are currently failing with errors like:
RuntimeError: Pipeline construction environment and pipeline runtime environment are not compatible. If you use a custom container image, check that the Python interpreter minor version and the Apache Beam version in your image match the versions used at pipeline construction time. Submission environment: beam:version:sdk_base:apache/beam_python3.13_sdk:2.73.0.dev. Runtime environment: beam:version:sdk_base:apache/beam_python3.13_sdk:2.72.0.dev.
The root cause is that the py3.13 wheel filename, apache_beam-2.73.0.dev0-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl, doesn't match the format expected here:
beam/sdks/python/container/boot.go
Line 422 in 9ff96df
wheelName := fmt.Sprintf("cp%s-cp%s-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", pyVersion, pyVersion)
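Because the actual filename lists the platform tags in a different order and adds a manylinux_2_28_x86_64 tag, the staged wheel is ignored, which causes the mismatch between the SDK version at submission (2.73.0.dev) and at runtime (2.72.0.dev).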
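To make the mismatch concrete, here is a minimal sketch that rebuilds the expected name from the format string quoted above and compares it against the staged wheel. The helper names `expectedWheelSuffix` and `matchesExpected` are hypothetical, and both the suffix comparison and the `"313"` value for `pyVersion` are assumptions for illustration, not a copy of the logic in boot.go.

```go
package main

import (
	"fmt"
	"strings"
)

// expectedWheelSuffix rebuilds the name from the format string quoted in the
// issue. pyVersion is assumed to look like "313" (hypothetical input format).
func expectedWheelSuffix(pyVersion string) string {
	return fmt.Sprintf("cp%s-cp%s-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", pyVersion, pyVersion)
}

// matchesExpected reports whether a staged wheel filename ends with the
// hardcoded suffix. Comparing by suffix is an assumption made for this sketch.
func matchesExpected(wheel, pyVersion string) bool {
	return strings.HasSuffix(wheel, expectedWheelSuffix(pyVersion))
}
```

With the py3.13 wheel name from this issue, `matchesExpected(name, "313")` returns false, because the platform tags are ordered differently and include `manylinux_2_28_x86_64`.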
The logic in boot.go was influenced by the old assumptions that:
- The SDK is always staged to the Dataflow worker (still the case for tests, no longer the case for prod)
- When the SDK package is staged, its name must be predetermined, as in:
beam/sdks/python/container/boot.go
Line 72 in 9ff96df
sdkSrcFile = "dataflow_python_sdk.tar"
To fix, we should not assume that the package name is predetermined or do any sort of wheel name validation: we should trust that the wheel name passed via --sdk_location will be installable.
We need to fix the Go code in /sdks/python/container, then release a new beam-master container to roll out the fix.
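One possible shape for the name-agnostic lookup is sketched below. It assumes the boot code can see both the list of staged files and the value passed via --sdk_location; the function `pickStagedSdk` and its parameters are hypothetical and do not exist in boot.go. The sketch matches on the base filename only and performs no wheel tag validation.

```go
package main

import "path"

// pickStagedSdk returns the staged artifact whose base name equals the base
// name of sdkLocation. It does not inspect wheel tags: whatever was passed via
// --sdk_location is trusted to be installable. (Hypothetical helper.)
func pickStagedSdk(stagedFiles []string, sdkLocation string) (string, bool) {
	if sdkLocation == "" {
		return "", false
	}
	want := path.Base(sdkLocation) // works for local paths and slash-separated URLs
	for _, f := range stagedFiles {
		if path.Base(f) == want {
			return f, true
		}
	}
	return "", false
}
```

Matching on the base name keeps the lookup independent of Python version and platform tags, so a new manylinux tag in the wheel name would not require a container change.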
Note that internal Dataflow tests still pass the SDK as a source tarball, and the source artifact will still be renamed by the stager:
setup_options.sdk_location, names.STAGED_SDK_SOURCES_FILENAME)
We can either leave the logic for staging sources as is, or also get rid of the renaming in the SDK and then always use the --sdk_location artifact filename in boot.go.
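The first option (keep renaming sources, stop validating wheel names) could look roughly like the sketch below. The function `resolveSdkArtifact` is hypothetical, and the assumption that the boot code knows the --sdk_location value is mine. The constant reuses the predetermined name quoted from boot.go above.

```go
package main

import "path"

// legacySdkSrcFile is the predetermined source tarball name quoted from boot.go.
const legacySdkSrcFile = "dataflow_python_sdk.tar"

// resolveSdkArtifact prefers the staged file whose base name matches the
// --sdk_location artifact, and falls back to the renamed source tarball when
// no such file was staged. (Hypothetical helper, for illustration only.)
func resolveSdkArtifact(stagedFiles []string, sdkLocation string) (string, bool) {
	want := ""
	if sdkLocation != "" {
		want = path.Base(sdkLocation)
	}
	legacy := ""
	for _, f := range stagedFiles {
		base := path.Base(f)
		if want != "" && base == want {
			return f, true // exact artifact name wins
		}
		if base == legacySdkSrcFile {
			legacy = f // remember the renamed source tarball as a fallback
		}
	}
	if legacy != "" {
		return legacy, true
	}
	return "", false
}
```

If the renaming in the stager were removed instead, the fallback branch and the constant could be dropped, leaving only the base-name match.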
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner