Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

100x slower for long arrays than recalculating every time #81

Open
valerii15298 opened this issue Jul 7, 2023 · 4 comments
Open

100x slower for long arrays than recalculating every time #81

valerii15298 opened this issue Jul 7, 2023 · 4 comments

Comments

@valerii15298
Copy link

I am trying to build excel like tool in frontend with formulas.
As may guess the main point is to be able to change cell if its dependencies are changed.
From the first look I thought this library is ideal for me, but...
After starting using it I saw performance dropped by more than 1000x and app basically became unresponsive...

When I made isolated repro I see it is really slower(like 100x slower than recalculating every time with), and that is my main problem:

import { memoize } from "proxy-memoize";
const MAX_VALUE = 100;
function getRandomFloat() {
  return Math.random() * MAX_VALUE;
}

let nextId = 1;
function getId() {
  return nextId++;
}

type El = {
  col0: number;
  col1: number;
  col2: number;
  col3: number;
  col4: number;
  id: number;
};
const longArr = new Array(1000).fill(0).map((_, i) => ({
  col0: getRandomFloat(),
  col1: getRandomFloat(),
  col2: getRandomFloat(),
  col3: getRandomFloat(),
  col4: getRandomFloat(),
  id: getId(),
}));

function getCol0Value(arr: El[], rowId: number) {
  const currRow = arr.find((el) => el.id === rowId)!;
  return currRow.col1 * currRow.col2;
}

const mapFuncs = new Map<string, (arr: El[]) => number>();
function getCol0ValueMemoized(arr: El[], rowId: number) {
  const id = `${rowId} - col0`;
  const formula = mapFuncs.get(id);
  if (formula) {
    return formula(arr);
  }
  const memoizedFormula = memoize((arr: El[]) => getCol0Value(arr, rowId));
  mapFuncs.set(id, memoizedFormula);
  return memoizedFormula(arr);
}

function getRows(arr: El[]) {
  return arr.map((el) => {
    // change below `getCol0ValueMemoized` to getCol0Value and see performance boost
    return { ...el, col0: getCol0ValueMemoized(arr, el.id) };
  });
}
const initialGetRowsLabel = "initial get rows";
console.time(initialGetRowsLabel);
const _rows1 = getRows(longArr);
console.timeEnd(initialGetRowsLabel); // 500ms vs 4ms

// change dependency and recalculate
const newArr = longArr.map((el, i) =>
  i === 0 ? { ...el, col3: getRandomFloat() } : el,
);

const afterChangeGetRowsLabel = "after change";
console.time(afterChangeGetRowsLabel);
const _rows2 = getRows(newArr);
console.timeEnd(afterChangeGetRowsLabel); // 100ms vs 5ms

I am using zustand library for it. As you main see main idea is that cell may depend on any other cells from the array(or excel page).

Maybe proxy-memoize just is not designed for such use case?
Or maybe you can recommend some other library, or solution to not recalculate formulas for cell if its dependencies are not changed?
For now I just cannot use this library unfortunately...

@valerii15298
Copy link
Author

I tried my own approach, since in my case formulas only return primitive value, here is what I had come to:

const valueSymbol = Symbol();

// compare deps
function equalDeps(obj, keys) {
  for (const key in keys) {
    if (valueSymbol in keys[key]) {
      if (obj[key] !== keys[key][valueSymbol]) {
        return false;
      }
      continue;
    }
    if (typeof obj[key] !== "object") {
      return false;
    }
    if (!equalDeps(obj[key], keys[key])) {
      return false;
    }
  }
  return true;
}

function createProxy(obj, keys) {
  return new Proxy(obj, {
    get(target, key) {
      keys[key] = {};
      const originalValue = target[key];
      if (typeof originalValue !== "object") {
        keys[key][valueSymbol] = originalValue;
        return originalValue;
      }
      return createProxy(originalValue, keys[key]);
    },
  });
}

function memoize(fn) {
  return function memoizedFunc(obj) {
    if (memoizedFunc.keys && equalDeps(obj, memoizedFunc.keys)) {
      return memoizedFunc.prevValue;
    }

    memoizedFunc.keys = {};

    const proxy = createProxy(obj, memoizedFunc.keys);
    const value = fn(proxy);
    memoizedFunc.prevValue = value;
    return value;
  };
}

It is like 2x faster than using your library but still 50x slower then recalculating every time...
So it seems for my case using Proxies is not the way, and I will need to use selector based dependency comparison(kinda like reselect library).

If you have any ideas how this use case with excel formulas can be solved to not recompute every time I will be very thankful

@dai-shi
Copy link
Owner

dai-shi commented Jul 8, 2023

I think you are right.
I tried changing from el.id to index. While it's much faster but still 100x slower than pure function.

As we discussed in #80, the slowness comes from creating Proxies. Even if you have a big object, if you only touches a few properties, it's fine. But, if you walk the entire object and touch everything, it's costly.

It's very unfortunate. If your simplified version only gets 2x faster, we are probably out of luck.

@dai-shi
Copy link
Owner

dai-shi commented Aug 17, 2024

This should be a design constraint. That doesn't mean to stop coming up with new ideas though.

@valerii15298
Copy link
Author

I thought about making it via functions. Basically the same approach as with proxy but instead first traverse over every value and convert it to function in which remember path of accessed value(I would call it manual proxification). But then I undersatood that every time for new value we need to convert it to functions which is not efficient at all.

Even if you do it level by level(first convert only keys of root object to getters and then based on what getter will be called convert next value to getter), but even this does not seem efficient enough becasue there will be a lot of loops for big objects/arrays...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants