Pretty slow section of code #51
ross-spencer started this conversation in Show and tell
Replies: 1 comment
-
For comparison, the database creation for the 230MB report (no checksums, 580,000 files), which didn't complete even after 4 hours of processing, is now down to just over two minutes:

--- 119.1657600402832 seconds ---

real 2m1.929s
user 1m59.907s
sys 0m1.821s

So that helps. The demystify part of this still takes over 40 minutes, so I will be looking for optimizations there too.
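The "--- seconds ---" line above reads like a simple wall-clock measurement. A minimal sketch of that pattern, assuming Python's standard time module and a hypothetical build_database() stand-in for the database-creation step (not the real sqlitefid entry point):

```python
import time


def build_database():
    """Hypothetical stand-in for the database-creation step being timed."""
    time.sleep(2)  # simulate work


start_time = time.time()
build_database()

# Prints a wall-clock figure in the same "--- N seconds ---" style quoted above.
print("--- %s seconds ---" % (time.time() - start_time))
```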
-
This change in sqlitefid points to a pretty slow piece of code. I was using a loop to iterate through an increasing amount of data to update a field in a dict, and each time the loop ran I'd also be updating the field with exactly the same information... pretty redundant. Don't do it, kids!
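The actual change lives in sqlitefid, but the general shape of the mistake is easy to sketch. A minimal illustration, assuming a list of per-file dicts and a placeholder "source" field; the names and data shapes here are hypothetical, not the real sqlitefid code:

```python
def update_records_slow(file_records):
    """Antipattern: on every new record, re-walk ALL records seen so far and
    rewrite a field with exactly the same value it already has."""
    seen = []
    for record in file_records:
        seen.append(record)
        # The inner loop grows with every outer iteration (O(n^2) overall),
        # and every assignment after the first is redundant.
        for earlier in seen:
            earlier["source"] = "filesystem scan"
    return seen


def update_records_fast(file_records):
    """Fix: set the field once per record and move on."""
    for record in file_records:
        record["source"] = "filesystem scan"
    return file_records


records = [{"name": "file-%d" % i} for i in range(5)]
print(update_records_fast(records))
```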
Anyway, the optimization means the govdocs sample, which previously took over 2 minutes, now only takes 6 seconds, unless I've missed something.

This will be fixed with the py2/py3 release, but it might be worth back-porting to the py2-only release. We'll see which can be delivered first. The hope is that the code will continue to work on both interpreters.

The code is also a bit of a building site right now, but it is slowly moving. The diff from today's fix, along with some other small changes I started making while tracking it down, is below for anyone who wants to apply the patch themselves.

As you can already see from the diff, there will be more tests and there should be some further optimizations in time, but this one does seem pretty chunky. I'm about to try it on a 230MB report which was taking over 4 hours yesterday, so we'll see how much this helps.
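For anyone wanting to track down similar hot spots, Python's built-in cProfile module is one option (not necessarily how this fix was found). A minimal sketch, with run_report_import() as a hypothetical stand-in for the slow step:

```python
import cProfile
import pstats


def run_report_import(path):
    """Hypothetical stand-in for the slow database build being investigated."""
    rows = [{"path": "%s/%d" % (path, i)} for i in range(100000)]
    return len(rows)


profiler = cProfile.Profile()
profiler.enable()
run_report_import("report.droid.csv")
profiler.disable()

# Sort by cumulative time so the slowest call chains float to the top.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(20)
```

The same view is also available without touching the code, via `python -m cProfile -s cumulative yourscript.py`.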