Gissues: Improve reactive collections performance

jods4

Aug 11th 2021

This some stuff I had on the back of my head for months, but just can't find any time to PR. I thought maybe @basvanmeurs or @RobbinBaauw may be interested in having a go at it, as they did awesome work on ref performance before.

I'll write this issue focusing on arrays, but some of it could be applied unchanged to other collections (Maps, Sets).

Arrays are important

In vue, arrays are wrapped behind a reactive proxy that traps every read/write to every array indice.

Arrays can be big. In fact, as big as you can imagine. Data viz is a prime example of applications that manipulate lots of large arrays.

Let's imagine a dashboard that monitors 100 VM. For each it displays metrics (maybe a sparkline) for CPU and memory, based on the last hour, 1 data point per minute. That's 100 x (60 + 60) = 12'000 array elements.

A (raw) indexed array access is crazy fast in modern JS VM. But each reactive array access incurs:

A proxy trap (not so fast)
Tracking (not so fast)
Dependency data structures (memory usage++, perf--).

Key observations

Most applications use arrays as lists, not tuples. By this I mean: they iterate through arrays completely, rather than access specific random indices.

For this usage, arrays are read entirely (e.g. using for-of, map, filter, sort, v-for) and everything in the array is a dependency.

It could all be tracked with a single dependency, no matter the size of array (aka O(1) vs O(n), if not O(n lg n) for sorts). Instead a naive approach tracks every single indice access.

Notice that Vue actually already performs this optimization for keys, values, entries, @@Iterator and forEach, for collections -- which does not include arrays: https://github.com/vuejs/vue-next/blob/7ffa225aa334f0fd7da6ba30bee9109de3597643/packages/reactivity/src/collectionHandlers.ts#L182-L224

Optimization idea

Much like collections above, arrays could often take a single ITERATE dependency and process the native array, bypassing proxies and tracking altogether.

shallowReactive is especially easy because it doesn't modify its contents. Deep and readonly reactives are designed to modify their contents on read, which is probably not the best decision perf-wise but it's too late to change. This makes their optimization a bit more tricky.

Map, filter, reduce -- and the likes

Those methods take a callback and apply it to the complete array. We must ensure the items in callback are wrapped. Callbacks also take the entire array, I don't think we have a choice here but pass the reactive array and incur any tracking work if it is used. In my experience this parameter is rarely used in practice, though.

Here's a sketch of an implementation, using map as an example but the code can be shared:

// Assume we captured map on the target when returning the function below.
// This is required in case user has its own `map` implementations and calls
// the function on a different instance than the one it obtained it from.
// In other words stuff like:
// const aMap = reactive1.map;
// aMap.call(unrelated, ...);
const targetMap = target.map;

// This would be the map function returned by the proxy for `reactiveArray.map`,
// with targetMap above captured.
function map(cb, thisArg) {
  // Native arrays or whatever
  if (!isReactive(this)) return targetMap.call(this, cb, thisArg);
  
  // Bail out to unoptimized for weird usage on non-array?
  // This is so that dependencies are properly tracked
  if (!isArray(this)) return targetMap.call(this, cb, thisArg);
  
  // Register an ITERATE (i.e. whole array dependency) 
  track(this, ITERATE);
  const raw = getRaw(this);

  // shallow -> nothing to do, except ensure the `array` argument is the reactive one
  if (isShallow(this)) 
    return targetMap.call(raw, (item, idx) => cb.call(thisArg ?? window, item, idx, this));
  
  // readonly and deep: must wrap value
  const wrap = isReadonly(this) ? toReadonly : toReactive;
  return targetMap.call(raw, (item, idx) => cb.call(thisArg ?? window, wrap(item), idx, this));
}

entries, values, keys, Iterator

Same idea as map & co., but simpler because there's no callback. Vue actually does it for Map and Sets, as indicated previously.

some, every, find, findIndex

These are basically the same as map & co.

They are special because iteration doesn't go to the end (notice: for-of on an iterator could break as well) but can stop early. In a first approach, I wouldn't care about that. It's theoretically possible to create range dependencies if we really wanted to.

This mean an effect can run again although no impacting change was made, but it's already the case today, so I wouldn't call that a breaking change. Today some has a dependency on length, and even if it stops at index 0, if you push a new value it would run again (although it doesn't have to).

sort

This one is a bit special because it reads values more than once, n lg n times on average. I think it would be worth wrapping all items into a new native array, sort it natively, and apply the result to original array. Maybe less efficient for small arrays, but small is fast anyway. The larger the array, the more efficient it should be.

A first approach could be like above and wrap every item on-demand.

readAll

I think for advanced uses it'd be nice to have a function readAll(reactiveArray) that performs a track(array, ITERATE) and returns the raw, non-reactive array. That would allow users to efficiently use this array in an index-based for loop for example, or perform any other kind of custom analytics with random access (think moving average, etc.).

For non-shallow arrays, instead of returning the raw array, we'd need to return a mapped copy with every element wrapped.

v-for

Let's not forget v-for, one of the main use cases! Currently it iterates with an index-based loop. https://github.com/vuejs/vue-next/blob/master/packages/runtime-core/src/helpers/renderList.ts#L62-L66

To benefit from work above, it should switch to a for-of (using iterator tracking) or map.

[...spread] and Array.from

I believe those are based on the iterator protocol, so they would also benefit from changes above?

basvanmeurs

Aug 12th 2021

Hi jods, thanks for sharing your thoughts again!

I think that you have a valid point regarding array iteration performance.

As far as I can see, only in case of iteration on the shallowRef, we could omit the get trap. That makes it a bit of a niche optimization. Besides, it would require all the mutation traps to trigger the ITERATE as well, which it currently doesn't. I don't know if this has side effects. It might cost performance in some cases.. Alternatively we could introduce a new type of trigger (ALL_VALUES) but that would cost memory and performance.

Like you, I only see a solution by providing work arounds for each individual possible function. To me, the optimization just seems a bit too specific. Especially when using reactivity stand alone. It might be helpful for v-for template cases though.

Personally, in my vue projects, I am aware of reactivity performance considerations. You can generally work around it when using a shallowRef. and triggering it manually. The problem is more in shared state plugins like vuex and pina that just reactify all state. I think in general those shared state plugins are a bad idea, but that is a different discussion..

Having said that, I would also like to have a better work around for this case. Just can't think of one right now..

jods4

Aug 12th 2021

As far as I can see, only in case of iteration on the shallowRef, we could omit the get trap. That makes it a bit of a niche optimization.

My thoughts:

Shallow array are still an important case, esp. for performance conscious users. If you present a large list of readonly rows, shallow array is the right choice, and cutting practically all reactivity cost would be welcome.
If you look closely, there's a "wrapper" function call but no proxy trap for deep and readonly arrays as well. It's a big win for perf as well.
It still cuts all the tracking calls and associated data structures to one single entry, instead of hundreds or more. That looks like a big win to me.

Benchmarks are the only way to do perf seriously, but I think it could be more than a niche optimization.

it would require all the mutation traps to trigger the ITERATE as well, which it currently doesn't. I don't know if this has side effects. It might cost performance in some cases.. Alternatively we could introduce a new type of trigger (ALL_VALUES) but that would cost memory and performance.

Indeed. We would need a benchmark... I believe that the big reduction in how many things are tracked would largely outweight the additional "all_values" or "iterate" track/trigger calls.

To me, the optimization just seems a bit too specific. Especially when using reactivity stand alone. It might be helpful for v-for template cases though.

v-for is a huge use-case, it could also help Vue perform better in benchmarks. filter, reduce (e.g. for a sum) or for-of in computed code are also common cases. Sadly indexed-based for would not benefit from optimization unless we provide additional methods and users are aware of them 😕

Personally, in my vue projects, I am aware of reactivity performance considerations. You can generally work around it when using a shallowRef. and triggering it manually. The problem is more in shared state plugins like vuex and pina that just reactify all state. I think in general those shared state plugins are a bad idea, but that is a different discussion..

I'm with you here. You can actually build your own optimized reactive arrays, that work as described here and even better than the built-in ones thanks to more optimized code in proxy, although that is not really supported by Vue. (I think Evan said track and trigger primitives have no support guarantees, although they open very powerful and interesting opportunities. I use them myself.)

The remark about about VueX and Pina is important. It'd be nice to get the best performance possible without being too performance-conscious and doing weird hacks such as using signaling patterns or custom reactive arrays.

basvanmeurs

Aug 13th 2021

I agree. Let's see if I can create some performance tests. I have another solution in mind that could be nearly as fast, for both shallow and deep reactives and even for arrays, objects and collections, but with less code. If it works.. I'll get back on this next week

jods4

Aug 13th 2021

Awesome that you can take a look at that! Thanks a lot! 🙇‍♂️

For the perf test, I think it'd need to be run at a few different array sizes, to find the sweet spot. As I said in intro, we can make this look as dramatic as we want simply by picking larger and larger arrays.

When a computed runs again, there's book-keeping for all its dependencies. If that computed performs a reduce to compute the sum of 1000 elements, what we're proposing here would turn 1000 dependencies into a single one. How noticeable (in perf and memory) is that improvement for a 20 elements, array? 100, 400, 1000?

basvanmeurs

Aug 16th 2021

I added some benchmarks. As expected, reactive arrays are much slower than raw. See https://github.com/basvanmeurs/vue-next-benchmarks/blob/master/src/reactiveArray.js for the benchmarks, feel free to run them on your local machine or just view the results below.

reactiveArray-benchmarks

Raw is ~ 100x faster than reactive. This is not surprising. To be honest, when using a reactive data model these problems are easily solved by just using a shallowRef and updating the raw array completely (or using triggerRef to do this manually). This is what I've always been doing so far.

Unfortunately, for shared state plugins, marking the array as 'readonly' doesn't help a lot. This test makes me more convinced to stay away from shared state plugins like vuex.

We can try to optimize this, but imho this is just a problem of how the reactivity module is used. No optimization is ever going to get close to using the optimal solution, which is to just use a shallowRef. So I'm not going to spend time on this one, but if you want to give it a go, feel free and I can think along.

As for the solution I was thinking about, which could work for both arrays, collections and objects:

When iterating (calling forEach, sort, map, etc), track the (new) ALL trigger type for the activeEffect (Notice that instead of listening to forEach, map, sort invocations in the proxy get trap, it might be better to just listen for length, Symbol.iterator. That also captures the for-loop and spread operator correctly
In the get trap, check if for the activeEffect, ALL was already tracked; if so, tracking the individual index is not necessary. If not, track indices as normal.
For any mutation trap in the proxy (set, deleteProperty) trigger the 'ALL' type as well.

This would indeed prevent creating and adding to a lot of Deps sets while iterating. I'm pretty sure it could make iterating reatice arrays maybe 10x faster.

A downside is that, when setting all items in the array in a loop, the set-trap will trigger the same reactive effects many times unnecessarily. But I think, as this happens follows the same paths (ALL deps instead of individual index deps) that this could become a lot faster as well due to memory caches.

johnsoncodehk

Oct 10th 2023

Hi! I have a very similar idea and have done a very basic PoC in https://github.com/johnsoncodehk/core/pull/14 and can already see a 20x faster performance improvement.

I'm just throwing this out there because making this work completely would require a lot more careful work that I don't have the time for at the moment, at least to prove feasibility. ;)