Memoize ruby-macho's internals #942

Open
woodruffw wants to merge 5 commits into main from ww/perf

@woodruffw (Member) commented Feb 11, 2026

This is an experiment in using Claude to squeeze more performance out of ruby-macho's internals. The first commits have the raw outcomes (including a mountain of Markdown from Claude); the last commit is me cleaning things up.

The two key constraints here were (1) no public API changes (it kind of failed at this, as Copilot notices, but I think these changes are actually fine), and (2) all existing tests must pass. Other than that, I gave Claude free rein to generate its own benchmarks and evaluate them.

The combination of the changes here makes us 50-70% faster on common (read-heavy) workloads. The actual impact on Homebrew will probably be slightly smaller, since Homebrew doesn't re-read the computed properties much. Still, not bad for ~80 lines of diff 🙂

Signed-off-by: William Woodruff <william@yossarian.net>

@woodruffw woodruffw self-assigned this Feb 11, 2026
@woodruffw woodruffw marked this pull request as ready for review February 11, 2026 04:06
Copilot AI review requested due to automatic review settings February 11, 2026 04:06
@woodruffw (Member, Author)

(The codecov/patch failure seems to be bogus; the actual coverage check didn't decline at all.)

Copilot AI left a comment

Pull request overview

This PR experiments with improving ruby-macho’s internal performance by memoizing frequently accessed derived values and adding a load-command type index, while aiming to keep the public API unchanged and maintain test compatibility.

Changes:

  • Added memoization for several derived query methods in MachOFile (e.g., rpaths, linked dylibs, segments, segment alignment).
  • Built an indexed @load_commands_by_type structure during load command parsing for faster command(...) lookups.
  • Simplified aggregation in FatFile using flat_map, and updated local tooling metadata (mise + Bundler lockfile version).
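
To illustrate the type-index idea from the list above, here is a minimal sketch. It is not ruby-macho's code: `LoadCommand` and `IndexedCommands` are hypothetical stand-ins, and only the `@load_commands_by_type` name is taken from the summary. It assumes the index is a hash from command type to an array of commands, built once at parse time.

```ruby
# Hypothetical stand-in for a parsed load command; not ruby-macho's real class.
LoadCommand = Struct.new(:type, :payload)

# Hypothetical container showing a linear scan next to an indexed lookup.
class IndexedCommands
  def initialize(commands)
    @load_commands = commands
    # Build the index once: type => [commands of that type], in file order.
    @load_commands_by_type = commands.group_by(&:type)
  end

  # Linear scan: walks every load command on each call.
  def command_scan(type)
    @load_commands.select { |lc| lc.type == type }
  end

  # Indexed lookup: one hash fetch, then a shallow copy so callers
  # cannot mutate the index itself.
  def command(type)
    @load_commands_by_type.fetch(type, []).dup
  end
end

cmds = [
  LoadCommand.new(:LC_SEGMENT_64, "__TEXT"),
  LoadCommand.new(:LC_RPATH, "@loader_path/../lib"),
  LoadCommand.new(:LC_SEGMENT_64, "__DATA"),
]
file = IndexedCommands.new(cmds)

# Both lookups return the same commands in the same order.
file.command(:LC_SEGMENT_64) == file.command_scan(:LC_SEGMENT_64)
```

The trade-off in this sketch is a one-time grouping pass at construction in exchange for lookups that no longer scan the full command list.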

Reviewed changes

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| mise.toml | Pins a Ruby toolchain version for mise-based development. |
| lib/macho/macho_file.rb | Adds memoization and a type index for load commands to reduce repeated scans. |
| lib/macho/fat_file.rb | Uses flat_map for cleaner/faster list aggregation across slices. |
| Gemfile.lock | Updates the Bundler version recorded in the lockfile. |

Comments suppressed due to low confidence (1)

lib/macho/macho_file.rb:268

  • segments is now memoized and returns a cached array. Since this method is public, returning the same mutable array instance can be an observable API change if callers mutate it (previously it returned a fresh array each time). Consider returning a defensive copy of the cached value to keep memoization without changing external behavior.
    # All segment load commands in the Mach-O.
    # @return [Array<LoadCommands::SegmentCommand>] if the Mach-O is 32-bit
    # @return [Array<LoadCommands::SegmentCommand64>] if the Mach-O is 64-bit
    def segments
      @segments ||= if magic32?
        command(:LC_SEGMENT)
      else
        command(:LC_SEGMENT_64)
      end
    end
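
The aliasing concern can be reproduced outside ruby-macho with a small sketch. The `Cached` class and its `shared`/`copied` methods are hypothetical names used only for illustration; they contrast a memoized method that returns the cached array itself with one that returns a shallow copy.

```ruby
# Hypothetical class; not part of ruby-macho.
class Cached
  def initialize(items)
    @items = items
  end

  # Memoized, and returns the cached array itself.
  def shared
    @shared ||= @items.select(&:even?)
  end

  # Memoized, but hands out a shallow copy on every call.
  def copied
    (@copied ||= @items.select(&:even?)).dup
  end
end

c = Cached.new([1, 2, 3, 4])

c.shared << 99   # the caller mutates the returned array...
c.shared         # => [2, 4, 99]  (...and the cache now contains 99)

c.copied << 99   # the mutation lands on the copy only
c.copied         # => [2, 4]      (cache unchanged)
```

Note that `dup` is shallow: the array is copied, but its elements are still the same objects, so this protects the cached list and not the individual entries.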

Signed-off-by: William Woodruff <william@yossarian.net>

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 4 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

lib/macho/macho_file.rb:269

  • The segments method stores a memoized value but doesn't return a duplicate like the other memoized methods (dylib_load_commands, linked_dylibs, rpaths). Although command already returns a duplicate, that duplicate gets stored in @segments, and then this method returns @segments directly without an additional .dup. This allows callers to mutate the cached value. For consistency and safety, add .dup before returning.
    def segments
      @segments ||= if magic32?
        command(:LC_SEGMENT)
      else
        command(:LC_SEGMENT_64)
      end
    end

      # @return [Array<LoadCommands::SegmentCommand64>] if the Mach-O is 64-bit
      def segments
    -   if magic32?
    +   @segments ||= if magic32?
Member

need to dup this I think.

Or alternatively there's not too much to gain here from memoization given command(...) is memoized
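
The two options raised in this thread can be sketched side by side. This is a hedged illustration, not the PR's code: `SketchFile` is a hypothetical class, and it assumes (per the review summary) that `command(...)` is backed by a per-type index and already returns a copy.

```ruby
# Hypothetical class; mirrors the shape of the discussion, not ruby-macho itself.
class SketchFile
  def initialize(commands_by_type, magic32)
    @load_commands_by_type = commands_by_type
    @magic32 = magic32
  end

  def magic32?
    @magic32
  end

  # Assumed behavior: indexed lookup that returns a shallow copy.
  def command(type)
    @load_commands_by_type.fetch(type, []).dup
  end

  # Option A: keep the extra cache, and copy on the way out.
  def segments_memoized
    @segments ||= command(magic32? ? :LC_SEGMENT : :LC_SEGMENT_64)
    @segments.dup
  end

  # Option B: drop the extra cache; the indexed lookup is already cheap.
  def segments
    command(magic32? ? :LC_SEGMENT : :LC_SEGMENT_64)
  end
end

file = SketchFile.new({ LC_SEGMENT_64: %w[__TEXT __DATA] }, false)

# Mutating either return value leaves later calls unaffected.
file.segments.clear
file.segments_memoized.clear
```

Under these assumptions, Option B does one hash fetch and one copy per call, while Option A does one copy per call after the first, so the saving from the extra cache is small.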
