Skip to content

Comments

Add env variables to configure manifest cache#2993

Open
kris-gaudel wants to merge 4 commits intoapache:mainfrom
kris-gaudel:kris-gaudel/configurable-manifest-cache
Open

Add env variables to configure manifest cache#2993
kris-gaudel wants to merge 4 commits intoapache:mainfrom
kris-gaudel:kris-gaudel/configurable-manifest-cache

Conversation

@kris-gaudel
Copy link
Contributor

@kris-gaudel kris-gaudel commented Jan 31, 2026

Closes #2952

Rationale for this change

Through discussion in issue #2325, we realized that there was a memory leak in the manifest cache. PR #2951 fixed this memory leak, but we decided that it would be best for developer experience if we developers could configure the cache for their needs.

This includes the ability to disable/enable the manifest cache, and configure its size

Are these changes tested?

Yes - unit tests are included to verify these changes

Are there any user-facing changes?

Yes:

  • PYICEBERG_MANIFEST__CACHE__ENABLED can be used to enable/disable the cache, and
  • PYICEBERG_MANIFEST__CACHE__SIZE can be used to configure the size of the manifest cache

@kris-gaudel
Copy link
Contributor Author

kris-gaudel commented Jan 31, 2026

@kevinjqliu @Fokko @rambleraptor Please lmk what you think!

@thomas-pfeiffer
Copy link

@kris-gaudel I have a few questions:

  • Would it make sense to list the new configuration parameters somewhere in mkdocs/docs/configuration.md, so that it shows up in the documentation as well?
  • Would it make sense to allow PYICEBERG_MANIFEST__CACHE__SIZE to be set to 0 to effectively disable the cache?
  • What's the added benefit of having a _ManifestCacheManager class?
    • Wouldn't _manifest_cache: LRUCache[str, ManifestFile] = LRUCache(maxsize=int(environ.get("PYICEBERG_MANIFEST__CACHE__SIZE", 128))) do the trick as well? (obviously an oversimplified example)

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kris-gaudel for raising this. I think it is reasonable to make this configurable. But I think we should reduce the number of changes and simplify where possible. For example, do we need PYICEBERG_MANIFEST__CACHE__ENABLED or can we set PYICEBERG_MANIFEST__CACHE__SIZE to 0 instead.

@kris-gaudel kris-gaudel force-pushed the kris-gaudel/configurable-manifest-cache branch from 9ceb651 to 4cad1c3 Compare February 21, 2026 23:18
@kris-gaudel kris-gaudel force-pushed the kris-gaudel/configurable-manifest-cache branch from f3b9f3d to f411547 Compare February 22, 2026 00:34

result = []
with _manifest_cache_lock:
cache = _manifest_cache
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: do we need this variable here, adds a bit of noise with the change

def reset_global_manifests_cache() -> None:
with manifest_module._manifest_cache_lock:
manifest_module._manifest_cache = manifest_module._init_manifest_cache()
clear_manifest_cache()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the init functionality. We can probably remove the clear since it's a no-op after calling your current init function.

_manifest_cache_lock = threading.RLock()


def _init_manifest_cache() -> LRUCache[str, ManifestFile] | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can still simplify this to:

_manifest_cache_size = Config().get_int("manifest-cache-size") or _DEFAULT_MANIFEST_CACHE_SIZE
_manifest_cache: LRUCache[str, ManifestFile] = LRUCache(maxsize=_manifest_cache_size)

It someone really wants to disable caching. All I gotta do is just set this value to something like 1. The overhead here is pretty negligible. Unless there is a strong opinion for a case in which we never allocate a lrucache instance...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make manifest cache size configurable and allow for disabling

4 participants