Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Messaging rework #538

Merged
merged 15 commits into from
Jan 8, 2016
Merged

Messaging rework #538

merged 15 commits into from
Jan 8, 2016

Conversation

vincent-zurczak
Copy link
Member

OK. After two minor issues, I focused on reworking the messaging. 😈
The client part of the messaging (what is used in the DM and agents) was already good enough (we refactored the code more than 6 times, it is mature enough now). What was complex was creating a new messaging implementation.

I created a new interface for messaging implementation. I also wrote a mediation layer between the messaging's business API and the API for messaging implementation. The messaging API now gathers the business logic used by agents and the DM. and it invokes technical implementations of messaging clients. This part is more simple than before. The iPojo metadata were also simplified.

I then created an in-memory implementation of Roboconf's messaging. It only works when the DM and agents run in the same JVM (e.g. with agents in-memory). I then fetched Pierre-Yves's work on the HTTP messaging. After squashing the commits, I refactored the code to rely on the new extension API. It was tested with in-memory agents and with Docker (IP = 172.17.0.1).

Eventually, I worked on using several messaging implementations in integration tests. It led me to submit a patch to PAX Exam. Although I have to complete it so that it is accepted, I successfully tested these upgrades on Roboconf's integration tests. The AgentInMemory test can easily run with Rabbit MQ, in-memory and HTTP messaging implementations. I commented a part of these modifications until my patch is integrated and hopefully released along with PAX Exam.

Eventually, notice that this PR also solves #487.
It is not volunteer, but refactoring the messaging API bundle resulted in no more NPE during fast reconfigurations (which seemed to be the cause of #487).

@vincent-zurczak
Copy link
Member Author

Oops! Invalid user... I think invoking factory.stop() after every test should prevent some conflicts. I suppose there are remaning clients that run and influence other tests.

@vincent-zurczak
Copy link
Member Author

Seriously, having the units test that fail randmly with the HTTP messaging is not a good sign for its maturity. I cannot determine whether this is due to the test conditions (starting a Jetty server by hand), a concurrency issue, waiting delays or if it is an implementation problem. 😡

The fact is real tests (by hand) worked.
And these tests do not fail on my machine. In addition to the code review, it would be good to clone my fork and try to run these tests on your machine too. It might help to find the problem.

@vincent-zurczak
Copy link
Member Author

I updated the messaging tests so that it always works with the HTTP implementation.
The randomness of their success was due to a race condition. Although subscriptions messages are sent first, it sometimes happens that they are processed by the DM (the messaging server in the DM) after messages that were sent right after them. Said differently...

  1. Agent 1 sends a subscription message.
  2. Agent 2 sends a HTTP message.
  3. The DM receives the subscription message.
  4. The DM receives the HTTP message.
  5. The DM starts processing the subscription message.
  6. The DM starts processing the HTTP message.

... is the normal flow. But sometimes, steps 5 and 6 are reversed.
I tried to profile what happens in web sockets, but this is really random. Sometimes, serialization goes faster for one...

Anyway, I added delays to secure this.
It was only occurring when several agents are involved. Questions could be raised by this update, but the abstract unit tests for messaging implementations do not exactly mimic agents behavior. These tests only guarantee some sort of contract for implementations.

@vincent-zurczak
Copy link
Member Author

BTW, I squashed my additional commits to fix tests on Travis.

gibello pushed a commit that referenced this pull request Jan 8, 2016
@gibello gibello merged commit 6fb82c0 into roboconf:master Jan 8, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants