sort could be faster #194
@jonathanpoelen OMG your sort works up to several thousand elements! You are my hero! |
I think the algorithm is spending most of its time in merge. The problem is that merge is hopelessly recursive; I think we need to do some partitioning to shorten the merge runs. |
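For reference, here is a minimal sketch of the kind of pairwise merge being discussed (hypothetical names, not brigand's actual implementation). Every comparison peels off exactly one element, so the instantiation depth grows linearly with the combined length of the two lists, which is why merge tends to dominate compile time:

```cpp
#include <type_traits>

template<class... Ts> struct list {};

// Naive merge of two sorted lists of integral-constant-like types. Only the
// branch selected by conditional_t is instantiated, but each step still
// consumes a single element, so recursion depth is linear in the output size.
template<class L1, class L2, class Out> struct merge_impl;

template<class... Out>
struct merge_impl<list<>, list<>, list<Out...>> { using type = list<Out...>; };

template<class X, class... Xs, class... Out>
struct merge_impl<list<X, Xs...>, list<>, list<Out...>> { using type = list<Out..., X, Xs...>; };

template<class Y, class... Ys, class... Out>
struct merge_impl<list<>, list<Y, Ys...>, list<Out...>> { using type = list<Out..., Y, Ys...>; };

template<class X, class... Xs, class Y, class... Ys, class... Out>
struct merge_impl<list<X, Xs...>, list<Y, Ys...>, list<Out...>>
    : std::conditional_t<(X::value <= Y::value),
          merge_impl<list<Xs...>, list<Y, Ys...>, list<Out..., X>>,
          merge_impl<list<X, Xs...>, list<Ys...>, list<Out..., Y>>> {};

template<class L1, class L2>
using merge = typename merge_impl<L1, L2, list<>>::type;

static_assert(std::is_same<
    merge<list<std::integral_constant<int, 1>, std::integral_constant<int, 4>>,
          list<std::integral_constant<int, 2>>>,
    list<std::integral_constant<int, 1>, std::integral_constant<int, 2>,
         std::integral_constant<int, 4>>>::value, "merge keeps elements ordered");
```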
I added a partition here https://github.com/edouarda/brigand/blob/master/brigand/algorithms/sort.hpp#L191 by splitting our 256-element sorted list and using the middle as a pivot element to partition the rest of the input, as described above. The result is a 28% speed-up on my machine. I think this is the right direction. |
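The idea, roughly (a hypothetical sketch, not the code at the linked sort.hpp): the middle of an already sorted chunk is a reasonable estimate of the median of the input seen so far, so the remaining unsorted input is split around it and subsequent merges operate on shorter runs.

```cpp
#include <type_traits>

template<class... Ts> struct list {};  // same toy list as in the merge sketch above

// Hypothetical partition_around: split the not-yet-merged input around a pivot
// taken from the middle of an already sorted chunk. Lo collects elements that
// compare less than Pivot, Hi collects the rest; the two halves can then be
// sorted and merged independently, keeping the merge runs short.
template<class Pivot, class In, class Lo, class Hi> struct partition_around;

template<class Pivot, class... Lo, class... Hi>
struct partition_around<Pivot, list<>, list<Lo...>, list<Hi...>> {
    using lower = list<Lo...>;
    using upper = list<Hi...>;
};

template<class Pivot, class Head, class... Tail, class... Lo, class... Hi>
struct partition_around<Pivot, list<Head, Tail...>, list<Lo...>, list<Hi...>>
    : std::conditional_t<(Head::value < Pivot::value),
          partition_around<Pivot, list<Tail...>, list<Lo..., Head>, list<Hi...>>,
          partition_around<Pivot, list<Tail...>, list<Lo...>, list<Hi..., Head>>> {};
```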
Here is the fast-tracked version: https://github.com/porkybrain/brigand/tree/merge-partition-sort. On my machine it handles a 2000 (!!!) element list in random order. Not sure yet if I broke anything; there are a lot of moving parts in this algorithm. |
Out of curiosity, have you benchmarked metal's sort (on branch develop)? It
is not directly fast-tracked or anything fancy, so I don't expect it to
beat brigand's, but on my machine it sorts 1000 elements in around 4
seconds and all it takes is about 20 lines of code, so perhaps it
could be much simplified and still remain fast?
Don't get me wrong, it's just that I flinch when I see that much fast
tracking and the verbosity it entails.
|
Wow, I wasn't aware that you had improved it that much. I tested metal several months ago and it was much slower, but that sounds quite similar to brigand and, as most of your stuff is, solved in a purist fashion. I think we can slim down the brigand algorithm quite a lot. |
Since our last discussions, metal's API has become more stable, so I decided to investigate which algorithms would benefit most from fast tracking, and it basically boiled down to
Take it with a grain of salt until we compare results on the same machine, because we might be comparing apples to bananas here depending on how different our hardware and setups are, but at first glance it looks like metal doesn't lag too far behind brigand, which makes me proud, to be honest. |
Either your dev system is awesome or I'm doing something wrong. Here is my code:

template<typename T, typename U>
using eager_less = meta::number<(T::value < U::value)>;

using a = typename metal::sort<my_list, metal::lambda<eager_less>>::type;

Using brigand I get 0.55s for 300 elements and 15.26s for metal. Metal crashes my laptop at 400 elements; brigand goes to like 3000. I'm thinking about making a generic "chunker" which splits a list into smaller lists of a fixed size (like chunker16 or chunker256), which would allow us to fast track and still use a generic fold and would save us from writing so damn many |
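A rough sketch of such a chunker (hypothetical names, not code from either library): a fixed chunk width keeps the number of specializations bounded, while the rest of the algorithm can stay a generic fold over the resulting list of lists. Width 4 is used here for brevity; a chunker16 or chunker256 follows the same pattern with more (machine-generable) specializations.

```cpp
#include <type_traits>

template<class... Ts> struct list {};

// Hypothetical "chunker": splits a flat list into sublists of a fixed width.
template<class In, class Out> struct chunk4_impl;

// fewer than four elements left: emit whatever remains and stop
template<class... Out>
struct chunk4_impl<list<>, list<Out...>> { using type = list<Out...>; };

template<class T0, class... Out>
struct chunk4_impl<list<T0>, list<Out...>> { using type = list<Out..., list<T0>>; };

template<class T0, class T1, class... Out>
struct chunk4_impl<list<T0, T1>, list<Out...>> { using type = list<Out..., list<T0, T1>>; };

template<class T0, class T1, class T2, class... Out>
struct chunk4_impl<list<T0, T1, T2>, list<Out...>> { using type = list<Out..., list<T0, T1, T2>>; };

// fast track: peel off four elements per instantiation
template<class T0, class T1, class T2, class T3, class... Rest, class... Out>
struct chunk4_impl<list<T0, T1, T2, T3, Rest...>, list<Out...>>
    : chunk4_impl<list<Rest...>, list<Out..., list<T0, T1, T2, T3>>> {};

template<class L>
using chunk4 = typename chunk4_impl<L, list<>>::type;

static_assert(std::is_same<
    chunk4<list<int, char, long, float, double, short>>,
    list<list<int, char, long, float>, list<double, short>>>::value,
    "six elements become one full chunk plus a remainder");
```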
I suppose you meant metal::number c:
Also, Metal has been eager for a while, so that typename ::type there is
redundant. Finally, check whether results vary if you use metal::less
instead.
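Putting those three corrections together, the snippet from the previous comment would read roughly as follows. This is only a sketch based on the names mentioned in this thread (metal::number, metal::lambda, metal::sort, metal::less) and on the my_list alias from the snippet above; exact signatures may differ between metal revisions.

```cpp
// Sketch of the corrected snippet; assumes the metal names used in this
// thread and the my_list alias from the comment above.
template<typename T, typename U>
using eager_less = metal::number<(T::value < U::value)>;  // metal::number, not meta::number

// metal is eager, so no trailing ::type is needed:
using sorted_a = metal::sort<my_list, metal::lambda<eager_less>>;

// alternatively, reuse the library's own comparison:
using sorted_b = metal::sort<my_list, metal::lambda<metal::less>>;
```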
Later tonight I'll try to benchmark brigand's sort on my machine and let
you know.
Also, what are you using for an input list? My numbers were based on an
ordered list being sorted with greater (i.e. reversing it).
|
Ok, now I'm using a list of random |
I have been blindly using a list ordered in reverse as a worst-case test for sort, but that obviously doesn't apply to merge sort, now that I take a closer look at it. Silly me, looks like I have been inadvertently cheating :c Curiously, that means merging is the culprit, so perhaps it deserves fast tracking. I'll see what I can learn from Brigand's. |
Thanks for benchmarking btw! |
No problem, @ldionne made the same benchmarking mistake a while back ;) Fast tracking merge is pretty hard because it is essentially a proper fold. @jonathanpoelen's idea of doing a kind of partial merge seems to be the key behind the last speed-up and is a genius idea. Splitting the merged list and using its middle element as a pivot to partition the rest of the input seems to bring another speed-up (this can be seen in my unmerged branch), but I think that can be improved on further. |
I was just taking a look at his implementation of merge and that partial
fast tracking is pretty ingenious indeed.
I'm still not convinced it is necessary, though. The fact that metal's sort
crashes your machine means the memory is overflowing, and that usually stems
from recursive partial template specialization, which can often be worked
around. I have some ideas I still want to give a shot before going fast
tracking. I'll let you know if I get anywhere.
|
Nice to see attention on sort; it is really something I need done well for kvasir. I would love to hear original ideas! |
BTW, which compiler are you using for these benchmarks? |
Clang 3.7 |
So I was able to pin … If only there was a way to implement … |
I finally found time to look at your solution. I think one thing you're doing that is faster than us is your merge specialization |
Well observed, that is a distinctive feature of Metal. It is very strict
with respect to concepts and requirements, which, although it often requires
extra overhead (e.g. SFINAE friendliness no matter what), sometimes also
opens room for performance improvements.
I'm not fond of overly lax requirements, because I feel that makes concepts
harder to grasp and leads the user to unexpected surprises. And maybe
because I'm a little bit too obsessed with symmetry.
|
I have not been able to put a whole lot of time into this, but brigand::sort could benefit from chunking the input using a meta-monad pattern as well as joining many lists rather than two at a time. Join can be implemented using more aliases and less complex types; see metal's implementation of join for a reference. The nested-alias version of conditional should help too. Just wanted to keep the thought process going. |
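As a sketch of what "joining many lists rather than two at a time" can look like (hypothetical names, not metal's or brigand's actual implementation), an n-ary join can flatten several lists per instantiation, with a fast-tracked specialization that consumes four at once:

```cpp
#include <type_traits>

template<class... Ts> struct list {};

// n-ary join: the generic case folds two lists into the accumulator, while the
// fast-tracked case consumes four lists per instantiation.
template<class... Ls> struct join_impl;

template<>
struct join_impl<> { using type = list<>; };

template<class... Ts>
struct join_impl<list<Ts...>> { using type = list<Ts...>; };

template<class... Ts, class... Us, class... Rest>
struct join_impl<list<Ts...>, list<Us...>, Rest...>
    : join_impl<list<Ts..., Us...>, Rest...> {};

// fast track: four lists at a time
template<class... T0, class... T1, class... T2, class... T3, class... Rest>
struct join_impl<list<T0...>, list<T1...>, list<T2...>, list<T3...>, Rest...>
    : join_impl<list<T0..., T1..., T2..., T3...>, Rest...> {};

template<class... Ls>
using join = typename join_impl<Ls...>::type;

static_assert(std::is_same<
    join<list<int>, list<char, long>, list<>, list<float>>,
    list<int, char, long, float>>::value, "n-ary join flattens in one pass");
```

Joining many lists at once keeps the number of instantiations roughly proportional to the number of chunks rather than to the number of pairwise joins.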
Ok, I admit it's my pet peeve ;) I find it interesting how combining algorithms are not efficient in runtime stuff but are efficient in metaprogramming. Basically the current implementation is good to about 500 elements; however, even with the fast tracks, merging 16 elements into 500 elements will be slow.
Coming back to the original partition implementation, the problem was that a bad pivot element would cause a partition of everything only to eliminate one element from the input list. If we were to build up a large list using the merge/insertion sort, split it in the middle, and then use the middle element as a pivot, at least we would be eliminating half of the output list with each partition, even if we are not eliminating anything from the input list.
Another approach would be to sort chunks of, let's say, 64 elements each; "partitioning" them would then essentially be a "split_if" rather than a normal partition. Since we need to presort the lists for merging anyway, we aren't really losing much, and splitting a sorted list should be more efficient than a classic partition on all of its elements. This would essentially be a pure merge sort combined with a split_if.
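A rough sketch of that last idea (hypothetical names, not an existing brigand or metal algorithm): because each chunk is already sorted, "partitioning" it against a pivot reduces to finding the cut point, so the walk stops at the first element that is not less than the pivot and the remaining tail is carried over wholesale.

```cpp
#include <type_traits>

template<class... Ts> struct list {};

template<class Lower, class Upper>
struct split_result { using lower = Lower; using upper = Upper; };

// Split an already sorted list at the first element that is not less than
// Pivot; the tail past the cut is moved over without further comparisons.
template<class Pivot, class Done, class L> struct split_sorted_impl;

template<class Pivot, class... Done>
struct split_sorted_impl<Pivot, list<Done...>, list<>>
    : split_result<list<Done...>, list<>> {};

template<class Pivot, class... Done, class Head, class... Tail>
struct split_sorted_impl<Pivot, list<Done...>, list<Head, Tail...>>
    : std::conditional_t<(Head::value < Pivot::value),
          split_sorted_impl<Pivot, list<Done..., Head>, list<Tail...>>,
          split_result<list<Done...>, list<Head, Tail...>>> {};

template<class Pivot, class L>
using split_sorted = split_sorted_impl<Pivot, list<>, L>;

// usage: with Pivot = std::integral_constant<int, 3> and an already sorted
// chunk list<integral_constant<int, 1>, integral_constant<int, 2>,
// integral_constant<int, 4>>, split_sorted<Pivot, chunk>::lower is the first
// two elements and ::upper is list<integral_constant<int, 4>>.
```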
I would really like to get to 1000 elements for kvasir.