Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructuring the SSL event handling #228

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

tilsche
Copy link
Contributor

@tilsche tilsche commented May 29, 2018

As per the OpenSSL documentation:

Any TLS/SSL I/O function can lead to either of SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE. In particular, SSL_read() or SSL_peek() may want to write data and
SSL_write() may want to read data.

The previous implementation considered this only partially and inconsistently. Particularly a write could call a read when it was caused by a readable event, but then the read failed and changed the state to reading and no more writing was done.

This PR simplifies the whole read/write handling. Basically the first read/write is always attempted¹ to find out what SSL wants (SSL_ERROR_WANT_READ/WRITE). This is saved in the state and used for fd monitoring. For the next process(), these flags are used as filters. read, write, and proceed is always invoked directly from process.

¹ The constructor does not attempt read/write, it sets the want_flags to readable | writable instead to watch and react to any kind of fd events.

Fixes #207 . This is only weakly tested with ASIO and libev. It needs more test & review.

Thanks to @bmario @kinnarr

tilsche and others added 6 commits May 29, 2018 17:43
always try to write if there is something in the buffer, always try to read
remember why SSL failed to use as monitoring flags - and filter in process()
Also refactors redundant code into a helper function
@tilsche
Copy link
Contributor Author

tilsche commented Sep 4, 2018

Could we maybe get some feedback to this PR? I'm also wondering how this relates to 80ce632

@EmielBruijntjes
Copy link
Member

There were indeed problems in handling SSL connections. The changes in 80ce632 were supposed to fix this. Does the issue still exist?

@tilsche
Copy link
Contributor Author

tilsche commented Oct 27, 2018

We have not tested 80ce632 because we have been running this PR's code-base for the last 5 months.

I don't think 80ce632 covers all problematic cases. The state handling is still messy. Consider the example that we observed failing:

Particularly a write could call a read when it was caused by a readable event, but then the read failed and changed the state to reading and no more writing was done.

I can see this still happening as there is no guarantee that the read will fully succeed just because the socket was actually readable (as opposed to wrongly cached readable).

Overall I think running your own checks on the socket is the wrong way to go if you are in an async loop. This introduces possibilities for and redundancy, as well as many redundant syscalls and thus performance issues.

That said the blocking select call in flush seems highly problematic to me as well, but that may be another topic...

@tilsche
Copy link
Contributor Author

tilsche commented Nov 22, 2018

@EmielBruijntjes We took a look at the recent b81bc34. It seems that it still uses the limited states which can end up inconsistent.

While reproducing issues with this is extremely difficult, I believe there is a clear argument for the code. We are happy and would appreciate to discuss this further.

@EmielBruijntjes
Copy link
Member

I do agree that the earlier implementation of the SslConnected class was not correct. But since the latest changes I have the impression that it does work. Can you explain which (edge) cases we are missing with the current implementation?

@tilsche
Copy link
Contributor Author

tilsche commented Nov 26, 2018

The case I am concerned about is the following:

  1. SSLConnected::write
  2. _out.sendto() returns > 0
  3. still data in _out
  4. socket becomes readable because of handshake data coming in
  5. falling through the loop because of isReadable()
  6. SSLConnected::receive
  7. receive will handle the handshake but fails with SSL_ERROR_WANT_READ because there is no actual payload data to receive
  8. _state = state_receiving
  9. onIdle(readable)
  10. we could write now, but process is never called
  11. even if process would be called we would be stuck in _state == state_receiving which fails with WANT_READ as long as the server doesn't send anything, maybe never

@EmielBruijntjes
Copy link
Member

EmielBruijntjes commented Nov 26, 2018

About point 7: does this indeed happen? I though that SSL_ERROR_WANT_READ is only returned when the read-operation should be repeated (because it was only partially ready). If the read-operation fails because no data is available, it returns 0 and goes back to the idle state.

@tilsche
Copy link
Contributor Author

tilsche commented Nov 26, 2018

If SSL_read returns 0, it is the error case (<= 0).

Also:

  SSL_ERROR_NONE
      The TLS/SSL I/O operation completed.  This result code is returned if and only if ret > 0.

So I don't see a way to fall back to regular idle with 0 bytes received.

@EmielBruijntjes
Copy link
Member

About step 10: is that permitted? If SSL_ERROR_WANT_READ was returned by an earlier call, is it then allowed to call ssl_write() instead of repeating the ssl_read() operation?

@EmielBruijntjes
Copy link
Member

EmielBruijntjes commented Nov 26, 2018

I don't think it is permitted to mix ssl_read() and ssl_write() calls if one of them returned WANT_READ or WANT_WRITE (check https://stackoverflow.com/questions/35138931/can-you-interleave-calls-to-ssl-read-ssl-write-when-one-of-them-returns-ss for example). As far as I see, the only option then is to repeat the call. The scenario that you are painting is, simplified, as follows:

  1. socket becomes readable because of handshake data coming in.
  2. we call ssl_read() because of this.
  3. this returns SSL_WANT_READ because the data was only handshake data and there is no actual data.
  4. from now on the only legal method to call is ssl_read() because openssl does not allow calls to ssl_write() or ssl_shutdown() until the previous ssl_read() is finished.
  5. the program will never write again.

This is either an (unlikely) bug in openssl, or openssl already deals with this and this scenario does not occur.

@tilsche
Copy link
Contributor Author

tilsche commented Nov 27, 2018

I think the StackOverflow answer is wrong, but there is also a link to the mailing list which discusses a slight variation scenario I am concerned about:

  1. You call SSL_write. It returns -1, WANT_READ, because a renegotiation is
    in progress.

1A) The renegotiation data arrives.

  1. Independently, you SSL_read. It reads the renegotiation data but no
    application data, so it returns WANT_READ.

  2. You select for read, since OpenSSL told you to do so. The select will
    take forever, because the renegotiation data has already arrived and been
    read. Your SSL_write would succeed immediately now.

When SSL_read returns anything it invalidates any prior WANT_* return from
SSL_write because the SSL_read may have made forward progress. Any call to
an SSL_* function may consume data. You cannot assume that because the data
hadn't been read when you called SSL_read it still hasn't been read after
you called SSL_write.

This will screw you on renegotiations.

In AMQP-CPP, step 1) would cause a repeated SSL_write instead of SSL_read. But I don't see any reason why this 1A) could not happen without 1). The mailing list post strongly indicates that mixing SSL_read/SSL_write is ok.

One point regarding this PR is that the want-flags should be reset to their default whenever any of the functions has received new want flags. I somehow think the default want-flags for SSL_write should be only writable.

Unfortunately the official documentation is indeed not very helpful. There is one paragraph in man SSL_get_error

It is safe to call SSL_read() or SSL_read_ex() when more data is available even when the call that set this error was an SSL_write() or SSL_write_ex(). However if the call
was an SSL_write() or SSL_write_ex(), it should be called again to continue sending the application data.

The second sentence is rather confusing - I suspect it means "it should eventually be called again". It doesn't say whether it is safe to call SSL_write after a SSL_read failed.

@tilsche
Copy link
Contributor Author

tilsche commented Nov 27, 2018

There's also this paragraph in man SSL_get_error (emphasis mine):

Caveat: Any TLS/SSL I/O function can lead to either of SSL_ERROR_WANT_READ and SSL_ERROR_WANT_WRITE. In particular, SSL_read_ex(), SSL_read(), SSL_peek_ex(), or SSL_peek()
may want to write data and SSL_write() or SSL_write_ex() may want to read data. This is mainly because TLS/SSL handshakes may occur at any time during the protocol
(initiated by either the client or the server); SSL_read_ex(), SSL_read(), SSL_peek_ex(), SSL_peek(), SSL_write_ex(), and SSL_write() will handle any pending handshakes.

One might read into this

  • Server starting a handshake can very well happen, and thus my "failed read - never write" scenario
  • The listed functions all handle pending handshakes, thus you may mix them - and should reset the want-flags from previous calls to their presumed defaults.

@EmielBruijntjes
Copy link
Member

Thanks for all your input. I find it very interesting because it would mean that I made some mistakes in other software too. But there are still a lot of unanswered questions here:

  1. Is it ok to mix ssl_read and ssl_write? I always read the documentation so that the same call had to be repeated. The documentation that you quote does indeed say that it is ok to start writing in the middle of a read-operation, but not the other way around (at least that is how I read it). And you are quoting openssl 1.1, while we want to support openssl 1.0 too, which does not contain this text. Maybe openssl 1.1 is more tolerant than 1.0?

  2. But even if it is not forbidden to mix them, is it then also wrong to not do that, and repeat the same operation until it is finished, before switching to the other operation? My opinion is that this is not wrong.

  3. Does it ever happen that a peer starts a renegotiation handshake without it being part of a normal data transfer or shutdown? In other words: there is no intention to send data, but still the peer sends a SSL handshake? This is the 1A scenario. This would indeed trigger a ssl_read, that could indeed lead to an infinite want-read situation. But I doubt this happens. I think that the handshakes only take place when some meaningful data is sent over the line too (afaik openssl does not have an internal timer that spontaneously start sending data over the line, the other party must thus have called ssl_write() or ssl_shutdown() to trigger the handshake). But it should not be so difficult to write a small server application that sends no data, but that does periodically send renegotiating handshakes? That would be the perfect demonstration that our SslConnected class is conceptually wrong.

@tilsche
Copy link
Contributor Author

tilsche commented Nov 28, 2018

I don't have authoritative answers, just my interpretation of the documentation / mailing list.

  1. I did not find anything that says that you can't. Nothing says that you must repeat an operation before calling the other. For what it's worthSSL_read says (emphasis mine) "The calling process then must repeat the call after taking appropriate action to satisfy the needs of the read function", but SSL_get_error says "can be retried later", "... can be called again". I don't know about 1.1, to some extent I am assuming that it is just refinement in the documentation.

  2. Well - if you need to send data you must call SSL_write - in that sense it's wrong to not call it. I think the problem with AMQP is that it's asynchronous whereas with most other protocols its always clear who needs to send and who needs to receive next.

  3. I read SSL_get_error that way: "TLS/SSL handshakes may occur at any time during the protocol (initiated by either the client or the server)".

@EmielBruijntjes
Copy link
Member

My colleague has been further looking into this. We now think that it could indeed be possible to mix calls to ssl_read() and ssl_write(), but we do have to check it more closely to be sure. We also discovered that although the openssl 1.0 docs says that a repeated ssl_write() must be called with exactly the same parameters as the previous call (our interpretation: with the same memory addresses), that the call can in reality also be repeated with the same data, but stored on a different location. This would allow us to copy buffers only when a repeat is necessary, instead of prematurely before each call just in case the call fails.

Anyway, there is some room for improvement in the SSL handling, because we no longer have to delay writing if a read operation is also in progress, and the other way around. And we do not have to buffer all outgoing data, but only the data that could not be sent at the first attempt. Whether the current implementation is now also broken, is not yet sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants