Very slow to launch and display more than 200 tasks #109

Open
nekohayo opened this issue May 24, 2015 · 16 comments
Labels
enhancement · performance (Issues affecting the speed and responsiveness of the application) · priority:high · reproducible-in-git (Issues that affect the current dev version)

Comments

@nekohayo
Member

Essentially, https://bugs.launchpad.net/gtg/+bug/587307 still affects the current (0.3.1) version. If this was solved, there would be no need for issue #43.

@sametmax

+1

@mkorvas

mkorvas commented Jun 22, 2017

After a crash, GTG has been eating 100% CPU for a whopping 40 minutes of launching so far, with no indication of how much longer it's going to take. I normally don't stop GTG for exactly this reason: by the time it has relaunched, half of my working day is gone. It must be doing something really wasteful; my tasks file has some 750 <task> elements, really not enough to justify this much CPU time.

@mkorvas

mkorvas commented Jun 22, 2017

(Not sure why GitHub says that I unassigned @izidormatusov; I didn't mean to do that.)

@diegogangl
Contributor

Of course, this needs to be profiled, but IMO Python + XML = performance poison. I bet most of that time is spent on disk I/O and sloooowly looping over all the XML tags. There are probably some quick fixes to improve performance a little, but the real deal would be switching to SQLite or something like that.

@ksamuel

ksamuel commented Dec 18, 2019

Not at all: with lxml, most of the XML processing is done in C. As for I/O, no language is ever the bottleneck there; even the slowest one is 100 times faster than disk access.

Anyway, there is no reason for either of those to ever be an issue, because you are not supposed to work on the XML directly: load it into memory, translate it into an optimal data structure, work only on that, and dump the result back to disk as XML only when necessary, in a separate thread.
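For illustration, here is a minimal sketch of that load-once / work-in-memory / save-in-a-thread approach. The element and attribute names (task, id, title) are simplified assumptions for the example, not GTG's actual file format:

import threading
from lxml import etree

def load_tasks(path):
    # Parse the XML once (lxml does the heavy lifting in C) into a plain dict.
    tree = etree.parse(path)
    return {
        t.get("id"): {"title": t.findtext("title", default="")}
        for t in tree.iterfind(".//task")
    }

def save_tasks_async(path, tasks):
    # Serialize the in-memory structure back to XML off the main/UI thread.
    def _write():
        root = etree.Element("tasks")
        for task_id, data in tasks.items():
            el = etree.SubElement(root, "task", id=task_id)
            etree.SubElement(el, "title").text = data["title"]
        etree.ElementTree(root).write(path, encoding="utf-8",
                                      xml_declaration=True, pretty_print=True)
    threading.Thread(target=_write, daemon=True).start()

# Normal operation only ever touches the dict, never the XML:
# tasks = load_tasks("gtg_tasks.xml")
# tasks["42"]["title"] = "Renamed task"
# save_tasks_async("gtg_tasks.xml", tasks)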

All in all, the biggest problem of GTG is not raw performance, although I'm pretty sure the code is suboptimal given the small loads it handles and how slow it is (this smells like a quadratic algorithm to me, probably from working directly on the XML data).

No, the problem is that it does the work in the same thread as the UI, so any slowdown freezes the interface and makes it feel sluggish.

Five years ago I would probably have dug into the code and provided a patch, but I'm no longer using GTG: I switched to Dynalist, a proprietary alternative. However, if someone wishes to make it fast, follow what I wrote above; there's an 80% chance those are the problems.

@ploum
Contributor

ploum commented Dec 18, 2019 via email

@diegogangl
Contributor

My bad guys, I got this mixed up with #43, which involves reading/writing the XML.
So yeah, besides profiling and finding that possible bug, moving to lxml would surely improve performance.

@diegogangl
Contributor

I profiled this with cProfile; it looks like the main culprit is in liblarch. I don't really know much about liblarch, but it seems to be the __sort_func being called every time a row is inserted into the model.

We should revisit this for 0.5

Here's the prof in case anyone wants to take a stab:
all_tags.zip

(you can open it in snakeviz)
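For anyone who wants to reproduce this kind of profile, a minimal recipe looks roughly like this (main() stands in for whatever slow code path you want to measure, and the output file name is just an example):

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
main()                                 # placeholder for the slow code path
profiler.disable()

profiler.dump_stats("all_tags.prof")   # open the .prof file with snakeviz
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)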

@nekohayo
Member Author

https://github.com/getting-things-gnome/liblarch/blob/master/liblarch_gtk/__init__.py#L340 ?

My 2¢ / hunch, without actually studying the code:

  • If sorting gets called on every operation, that indeed sounds wasteful. Short of optimizing the sorting function itself, I suppose the app would need a way to inhibit sorting and "trigger sorting when the time is right" (maybe based on a signal plus a resettable timer; see the sketch after this list). I'm just guessing here, but I suspect that sorting only once the view/model has been populated, unless you can do cheap sorted insertion, is much more efficient than sorting the whole model on every insertion (if that's what it does currently).
  • Additionally, I guess the sorting function(s) could be reviewed to see if smarter algorithms can be used. If we could somehow "let Python handle the sorting" instead of whatever naïve sort we might have in place (I haven't checked, I'm just guessing) then there might be a massive gain there too.
    • Python's built-in sorting function is ludicrously fast. It uses Timsort, an adaptive hybrid of merge sort and insertion sort, which runs in O(n log n) time.
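To make the first bullet concrete, here is a rough sketch of the "resettable timer" idea, coalescing a burst of insertions into a single sort pass. The DeferredSorter class and the view.resort() method are hypothetical stand-ins for illustration, not existing GTG/liblarch API:

from gi.repository import GLib

class DeferredSorter:
    def __init__(self, view, delay_ms=200):
        self._view = view
        self._delay_ms = delay_ms
        self._source_id = None

    def request_sort(self):
        # Called on every insertion; only the last request in a burst fires.
        if self._source_id is not None:
            GLib.source_remove(self._source_id)   # reset the timer
        self._source_id = GLib.timeout_add(self._delay_ms, self._do_sort)

    def _do_sort(self):
        self._view.resort()        # one sort for the whole batch of insertions
        self._source_id = None
        return GLib.SOURCE_REMOVE  # one-shot timeout, don't repeat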

@nekohayo nekohayo added the performance Issues affecting the speed and responsiveness of the application label Apr 17, 2020
@nekohayo
Member Author

I did a quick test today: I clicked the "All tasks" view (while I had a tag selected), and GTG (re)loads all the tasks when you do that. That's when I realized that something definitely seems to make it single-threaded; otherwise I don't see why it would use only one of the CPU cores at a time, as the graph during that period shows:

(screenshot: CPU usage graph showing only one core busy during the operation)

@ploum previously said:

The process to read the XML file was really optimised and in its own thread. The whole thing prompted a huge rewrite of liblarch to be multi-thread safe, something which was really hard because GTK was not multi-thread safe.
But this could be a simple bug preventing threads from operating in parallel due to a lock somewhere.

I don't know if the problem is in liblarch itself or not. For what it's worth, there are a few mentions of "threading" in:

...but nowhere else.

If someone were to figure out a way to ensure liblarch+GTG use all CPU cores when searching & displaying tasks, then it would be blazing fast. Even without lxml, you would then divide the waiting times by a factor of 2, 4, 8 or more, depending on the user's machine.

  • Currently, with 833 tasks, GTG git (to become 0.4) takes ~21 seconds to launch.
  • Once launched, if you switch to another tag and then click "All tasks" to show the 833 tasks again, it takes ~11-12 seconds to do so (this surprises me; I would actually have expected the two numbers to be the same).
  • If it was using 4 CPU cores instead of 1 on my machine, it would probably manage to launch in ~5 seconds and switch between tags in at most 2 seconds; that's very significant for perceived speed, responsiveness and usefulness.

lxml might yield immense gains and we should definitely switch to it if it upholds that promise (I think @diegogangl already secretly started a branch for that), but even then, unless full multithreading comes "for free" with the switch to lxml, I don't think spending some effort on this would be a waste. Unless we find out that, after switching to lxml, GTG can start up in less than 5 seconds with 8000 tasks even on a single CPU ;)
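As a sketch of what "doing the work off the UI thread" could look like, the heavy filtering would run in a worker thread and only the final UI update would be pushed back to the GTK main loop via GLib.idle_add. The helper names below are hypothetical, not GTG's real code, and note that because of Python's GIL this keeps the UI responsive but does not by itself use multiple cores for CPU-bound work (true multi-core processing would need multiprocessing or similar):

import threading
from gi.repository import GLib

def filter_tasks(tasks, predicate):
    # Stand-in for GTG's real (CPU-heavy) filtering logic.
    return [task for task in tasks if predicate(task)]

def refresh_view_async(tasks, predicate, populate_view):
    # Run the filtering in a worker thread, then update the UI from the main loop.
    def worker():
        visible = filter_tasks(tasks, predicate)
        # GTK widgets must only be touched from the main loop:
        GLib.idle_add(populate_view, visible)
    threading.Thread(target=worker, daemon=True).start()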

@nekohayo nekohayo changed the title Very slow to display ~250-500 tasks Very slow to display more than 200 tasks Apr 30, 2020
@nekohayo nekohayo changed the title Very slow to display more than 200 tasks Very slow to launch and display more than 200 tasks Apr 30, 2020
@nekohayo nekohayo added the reproducible-in-git Issues that affect the current dev version label Apr 30, 2020
@nekohayo
Member Author

nekohayo commented Jul 8, 2020

In case someone reading this wants to learn about some basic performance profiling techniques I mentioned in issue #402, see https://fortintam.com/blog/profiling-specto-and-whole-python-applications-in-general/ and other posts in https://fortintam.com/blog/tag/profiling/

@Neui
Contributor

Neui commented Oct 29, 2020

I once tried to tackle this problem since it was also getting annoying for me. Documenting it here before I forget.
In my testing a startup takes around 10 seconds for like 600 open tasks. I/O doesn't seem to be the culprit.

I used Python's built-in profiling facilities and flameprof to get a rough overview of what is happening.

A lot of time is spent (re)filtering liblarch nodes in __update_node, which ends up emitting callbacks that insert data into the GTK tree model, which also sorts it (I think I once read it uses some kind of tree, but I haven't confirmed that yet).
There is also an "update task" step which sets values; I can't remember why those are there (I think tags are being added as we go? It was too long ago).

@jaesivsm
Contributor

Hello there. Unaware that the discussion was already taking place here, I had opened an issue in liblarch, which I'm closing to keep the findings focused here.

TL;DR:
  • Why doesn't __update_node work on its own (from the root, deleting what needs to be deleted)? Should we change that?
  • Could/should we have some task/tag versioning logic to limit tree node updates?
  • Why are Tasks pushing stuff into the queue?
  • Noob question: what's a gray task?
What I found

OK, so I did some more profiling myself, and here's what I found (or what I'm unsure about):

  • refilter (whose calls might themselves be optimizable, maybe) browses the children and calls __update_node, which is recursive. I tried changing the logic to call __update_node less, and it works quite well (guesstimate: it cuts the refilter call time by at least 5×), but every parent task stays gray, so I'm guessing I must be missing something. I don't understand why calling __update_node on the root_id would not work; are root nodes a different breed?
diff --git a/liblarch/filteredtree.py b/liblarch/filteredtree.py
index 0b65bf5..3b31576 100644
--- a/liblarch/filteredtree.py
+++ b/liblarch/filteredtree.py
@@ -324,16 +327,9 @@ class FilteredTree(object):
 
         # Build tree again
         root_node = self.tree.get_root()
-        queue = root_node.get_children()
-
-        while queue != []:
-            node_id = queue.pop(0)
-            # FIXME: decide which is the best direction
-            self.__update_node(node_id, direction="both")
-
-            node = self.tree.get_node(node_id)
-            for child_id in node.get_children():
-                queue.append(child_id)
+        self.__update_node(self.root_id, direction="both")
+        for child_id in self.tree.get_root().get_children():
+            self.__update_node(child_id, direction="both")
 
     def __is_displayed(self, node_id):
         """ Should be node displayed regardless of its current status? """
  • about __update_node optimization:
    • lots of __update_node calls end up with an update callback. It seems like update is the default (else: action = "update"). Keeping a version for each task (a hash, a sequence number or whatever) and not triggering an update callback when nothing has actually changed might also be the way to go (see the sketch after this list; I'm not sure of the implications on the GTK side, as I'm not knowledgeable at all about that lib).
    • the first few __update_node calls also remove a lot of the tree (calls to the deleted callbacks). And with a huge data set, I'm seeing a lot of tasks being removed before being re-added. The tree is also wiped clean at the start of a refilter, which might be overkill.
  • Something I don't understand that might be the cause of a lot of useless processing: there are two ways in which GTG's internal data is bound to something in GTK:
    • the liblarch tree, updated through things like __update_node
    • the GTG.core.task.Task.modified method, which is called on each and every task modification
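As a rough illustration of the versioning idea from the bullets above (a content hash is one cheap way to detect that nothing visible changed), here is a hypothetical sketch; task_signature() and the fields it hashes are made up and don't match liblarch's actual update path:

import hashlib

_last_signatures = {}

def task_signature(task):
    # Cheap fingerprint of the fields that affect the displayed row.
    payload = "\0".join((task["title"], task["status"], ",".join(task["tags"])))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

def maybe_emit_update(task_id, task, emit_update_callback):
    # Only emit the (expensive) GTK update callback when the task really changed.
    sig = task_signature(task)
    if _last_signatures.get(task_id) == sig:
        return  # nothing visible changed; skip the callback entirely
    _last_signatures[task_id] = sig
    emit_update_callback(task_id)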

@nekohayo
Member Author

nekohayo commented Jan 3, 2021

My initial testing with @jaesivsm's branch in PR #530 showed various operations, including startup, having their duration cut roughly in half on the heavy "Bryce" sample dataset (./launch.sh -s "bryce"). A 2× speedup is nothing to sneeze at! But I don't know how it would behave with my own personal data (I suspect performance may vary depending on the dataset, especially if yours has a lot of closed tasks).

  • I have posted some benchmark observations as a comment on PR 530, but they may need to be measured again once a fix is provided for the issue I found there.
  • Arguably it would be a bit annoying for switching between the top tabs to no longer be instantaneous (i.e. the tabs vs. tags tradeoff), but I guess I can live with that for now; in theory, however, the tab population should be done asynchronously in the background, to avoid locking up the application "at the last minute" when the user clicks those tabs...
  • Even without nice async background loading of the "other tabs", the performance gains from @jaesivsm's branch will probably be a welcome relief until we have a real fix for the core problem; when that happens, it'll be an extra boost on top!
  • If we can somehow figure out a big optimization in liblarch, the "async tab loading in the background" trick might not be strictly necessary, provided the loading is fast enough (i.e. less than a second for a thousand tasks).
  • If this whole thing could somehow be made multi-threaded (as I pointed out in an earlier comment above), we could either further divide the time needed to process the tasks for each operation, or process all the tabs at once (one per CPU thread on machines that have at least 2-3 cores/threads). Multi-threading would be brute-forcing our way through with hardware rather than optimizing the software, but even with all possible optimizations in place, multithreaded processing would still be nice to have.

@nekohayo
Member Author

nekohayo commented Aug 22, 2022

This would hopefully be fixed by the big core rewrite in #553. What I'm about to say here is not exactly new information, but it's probably worth noting "just in case":

  • The XML file handling has been optimized (with lxml etc.) so it's really fast, and it has absolutely no bearing on performance, because, as I observed again today with 0.6...
  • ...the number of tasks shown in the UI is what kills performance. On startup, with GTG set to the "Actionable" view mode (so that it displays less, as per the optimizations made possible in 0.6 by @jaesivsm's code) and my 2000+ task dataset, the difference between having the majority (maybe half?) of my tasks displayed on startup vs. "deferring" many hundreds of them into the future (e.g. bumping them to the weekend) is the difference between a 48-second startup and a 19-second (or less) startup on my machine. It really is related to the GTK GUI, the liblarch tree building, or both. Hopefully @diegogangl's big core rewrite bypasses/eliminates the whole problem.

@diegogangl
Contributor

@nekohayo do you consider this still to be the case? I'm testing with a 780+ task file and it loads reasonably fast (considering I still have an HDD :) )
