-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fast multiplication #11
base: master
Are you sure you want to change the base?
Conversation
|
||
int usable_len = BN_ARRAY_SIZE; | ||
|
||
/* this section speads up algorithm by "cutting" len of bignum*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to pull this part. Let me first comment it, and then tell you the reason why I would like to avoid it,
I appreciate the effort and understand why you would want this here (and in every other functions. You could also maintain a bitset of highest used bit to optimize for speed.
However it adds to the size of both source- and object code. I also fear you might miss out on vectorization optimizations from the compiler, but I wouldn't know for sure. It also makes you susceptible to leaking info via timing if used in encryption algorithms. I mostly want to take a raincheck because it also breaks the beautiful simplicity of the naïive algorithm.
Thanks for the PR - I really appreciate the effort you've put into this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vector operations bases on processor words so don't worry.
I understand the timing problem in cryptography.
I am not sure are you talking about the same peace of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the section with finding actual length of number can be removed.
usable_len = usable_len > BN_ARRAY_SIZE ? BN_ARRAY_SIZE : usable_len; | ||
// | ||
|
||
for (int i = 0; i < usable_len; ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is brilliant. I love how you've reduced the allocation of two bignum's into none. That really cuts straight to the bone 👍
However I would like to avoid the scoped-declarations of variables inside the for-loop for stylistic reasons and to be a bit more portable against shitty/subset compilers. I will pull it as-is and then edit it myself, no worries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Declarations in for loops are provided by C standard, so if compiler does not support the standard it is not a compiler but shit. It is better style to use temporary variables like iterators in scopes. But if you want to keep same style it's ok for me.
The main improvement is reducing time complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The old multiplication is O(n^3). New is only O(n^2).
For me it is ok to merge it and edit in way you like ;) |
I got away from this PR @kolkil - sorry about that - I took some time off and forgot everything about software development :) I will try and get it merged and edit whatever small details I want to change. If I keep forgetting, please remind me again that I want to merge this PR. |
Good day, First off, I currently use a heavily modified version of this library and hence would ask for someone to try to reproduce this with a "clean" version of the library. However, as the alternative multiplication function does not use any of the other functions, I am fairly certain this is a rare bug in the logic. I'm not yet sure what pattern triggers it, but I do know this works very well in most cases. However, when trying to integrate this with a custom crypto stack, I started to get weird results and decided to compare this mul algo against the original. The first result is the incorrect result of this faster multiplication algorithm (locally modded, but unmodded seems to cause the same issue), the second result is the correct (verified against OpenSSL BN) solution of the original. If I find out what causes this, I'll report back, but I'll need to familiarise myself with the algorithmic idea for a bit.
|
Could you show the code that generated this? |
@kolkil Thanks, I will check it out asap. I'm a bit low on time right now and cannot provide you a sample caller, but you should be able to reproduce it by just parsing above's operands and calling the alternative mul. Frankly, I still do not understand the algo, but I found it odd there was no overflow check around here: https://github.com/kolkil/tiny-bignum-c/blob/fast_multiplication/bn.c#L309 EDIT: Yes, what I found was correct, and this can overflow too: https://github.com/kolkil/tiny-bignum-c/blob/fast_multiplication/bn.c#L308 I cannot give you a diff as, as I said, my local copy is modded, but for the first overflow it is sufficient to store the additive result separately and compare for being smaller than one of the previous operands (i.e. tmp2 = c->array[i] + (DTYPE)tmp, tmp2 < (DTYPE)tmp). This works because, when you think of the negative two's complement, -1 is MAX_UINTx (x being the width), so a wrapped around result must be smaller than both original operands. By the way, with Clang and an X64 target, this uses CF, so it should be pretty good performing. This can be applied to the add and sub operations as well to get rid of the 64-bit int dependency. If someone manages that for mul too, the array int size could be raised to 64-bit to increase performance (given the mul solution is not slow). For the second mentioned overflow, I just raised tmp_to_add to DTYPE_TMP and instead of initialising back to zero, I left-shifted by 32 bits to not discard the high bits. I am not convinced it cannot overflow again given unlucky input, but I will think of something later, or hope you will come up with a new solution (or an argument for why it cannot overflow again). EDIT2: Looking at the amount of iterations where tmp_to_add can be increased and the high results multiplications can yield, I don't feel like tmp_to_add is a feasible option. It might need a function that propagates additions upwards on the BIGNUM itself Thanks for your work |
I understand the problem, gonna fix it soon, thank you ;) |
@kolkil I don't mean to be rude at all; have you had a chance to look at this? |
This is alternative multiplication method based on column operations.
https://www.youtube.com/watch?v=zm3tQ_BPgm8 - idea is from this video.
I also added test that compares methods.
The current multiplication method should be replaced with new multiplication method.