Vigorous Caching in Pharo or: "How I used Meta Links to Program my Program's Execution"

Foreword

This story is about the implementation of a super simplistic caching framework in Pharo. But I believe that the main value of this post is in the explanation of how I used MetaLinks to alter the functionality of the existing methods without changing their code.

Prolog

From time to time I need to process some data in Pharo. Read it from files, perform some transformations, some filtering, and cleaning; after that, I can easily analyze my data. I keep the scripts for data processing as methods of some class, usually, the path to a file is an instance variable and then the methods call each other gradually refining the result.

But what if you add one more method to refine the result even better? Usually, this means that you have to lose time by re-running everything although it was already run before. Also what if you refer to the same method a couple of times? Do you have to calculate the result more than once? Usually, you solve this by caching: you have an instance variable, if it's nil then you do a calculation and assign the result to the variable, otherwise you just return the variable's value. But I'm too lazy to rewrite each method… you know :). And in the end, it is Pharo, which means that you should be able to somehow make methods work in a slightly different way. That's why I thought: "how about implementing a thing that you can plug into your existing class and your methods will be automatically cached".

The Implementation

(I want to thank Max Leske who helped me by providing an answer on StackOverflow and Marcus Denker who answered plenty of my questions on Pharo Slack chat).

What can you plug into an existing class? Right, a trait. So all implementation is on the class-side of a trait. We will use MetaLinks, these are objects that can be attached to AST nodes and perform some actions before, after, or even instead of them. Whenever you add a MetaLink to a compiled method, the later one is replaced by an instance of a ReflectiveMethod. You can send #compiledMethod and #reflectiveMethod messages to either of the methods in order to obtain exactly the one you are looking for. Also sending #destroyTwin (to either of the methods) will destroy the reflective method and put the compiled method in its original place. Consider this method that attaches a link to a given method:

TVigorousCache class>>installLinkOn: aCompiledMethod
    | link |

    link := MetaLink new
        metaObject: aCompiledMethod;
        selector: #cachedValueWithReceiver:arguments:;
        arguments: #(receiver arguments);
        control: #instead.
    aCompiledMethod ast link: link

Here we define a MetaLink which will run instead of the node it is attached to, and we attach the link to the root AST node of the method. This means that the link will run instead of the whole method. But let's take a closer look at the link itself. First of all the link has a meta object. This can be any object and it will receive a message defined with #selector: at the time when the link should perform and action. In our case, the original compiled method will serve as a meta-object and we will send it #cachedValueWithReceiver:arguments: which is my custom method similar to the original #valueWithReceiver:arguments: (that simply evaluates the compiled method). Pay attention: the arguments that will be passed with the message are special keywords. We will pass the original receiver and parameter for the method that had to be executed if there was no MetaLink. If you want to see the other possible keywords take a look at the RFReification class hierarchy, especially comments and class-side methods. To wrap up, whenever a linked method has to be executed we send it #cachedValueWithReceiver:arguments: with the original receiver and arguments. And here is how cached evaluation works:

CompiledMethod>>cachedValueWithReceiver: aReceiver arguments: anArray

    anArray size isZero ifFalse: [ 
        ^ self valueWithReceiver: aReceiver arguments: anArray ].

    ^ self vigorousCache at: aReceiver ifAbsentPut: [ 
        self valueWithReceiver: aReceiver arguments: anArray ]

First of all, I check if the method has no arguments (if it has some you have to do way smarter caching). Secondly, I check if the method has a cache for a given receiver, and do the classic caching operations… Here is the #vigorousCache implementation:

CompiledMethod>>vigorousCache

    ^ self
        propertyValueAt: #vigorousCache
        ifAbsent: [ 
            self
                propertyValueAt: #vigorousCache
                put: WeakIdentityKeyDictionary new ]

It would be nice to store cache in the instance of a cacheable class, but as I wanted to make my cache completely independent I had to store the data in method properties. I use a weak dictionary, so if the object is garbage-collected its cache is removed as well, and I use identity dictionary for a fast lookup, as I want to store cache per each individual object anyway. When a method gets recompiled its properties are reset and so is the cache.

That's all of the hard stuff. Instead of the original method, the reflective one is called. The reflective one sends the cached version of the evaluation message to the original compiled method. The method is evaluated and the cache is stored and will be returned in all future cases.

You can also benefit from a few other methods like:

TVigorousCache class>>noteCompilationOfMethod: aCompiledMethod meta: isMeta

    isMeta ifTrue: [ ^self ].
    self installLinkOn: aCompiledMethod

This one will install the link on every newly compiled method on the instance side.

You can download the project by running:

Metacello new
    baseline: #VigorousCache;
    repository: 'github://Uko/PharoVigorousCache:v1.0.0';
    load

Then just use the TVigorousCache trait in your class to have the automatic caching. Also, make sure to check the class-side script methods.

Going Super Crazy

In the beginning, I've implemented caching in a different way that had only one cache per method and didn't distinguish between receivers. I will provide a description about it here because it is super cool.

Consider subclassing MetaLink and defining the following initialization:

VigorousCacheMetaLink>>initialize

    super initialize.
    self
        control: #instead;
        metaObject: self;
        selector: #cache:with:arguments:and:;
        arguments: #(node receiver arguments link)

Here instead of executing the node, we send ourselves the node to which we are attached, the receiver, the arguments (as in the previous approach), and the link itself. We could omit the last argument as we are sending the message to ourselves and could simply use the self pseudo-variable. But for the sake of the demo let's have it this way. All the magic happens in the next method:

VigorousCacheMetaLink>>cache: methodNode with: aReceiver arguments: args and: aLink
    | newLink result |

    methodNode removeLink: aLink.
    result := methodNode compiledMethod
        valueWithReceiver: aReceiver
        arguments: args.

    newLink := MetaLink new
        metaObject: result;
        control: #instead.

    methodNode link: newLink.
    ^ result

First of all, we remove this link for the node. Then we calculate the actual result of the method and create a new link. It will also run instead of the method and will simply return the meta-object which is the calculated result. In the end, we return the result to ensure the proper first execution. Each time a method is recompiled the reflective method is destroyed and so is the cache. How cool is this?

Afterword

Maybe the cache framework itself does not make that much sense, I made it for my particular case. Maybe meta links can be used in an even better way, this is just a first try by me. But I hope that I will help someone with this blogpost when he/she will try to understand what can be done by meta links.

Spread the word if you like it.

Special thanks to my beloved wife Natalia for helping me to write this blogpost.