Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Versions chart broken if load balancer is used? #184

Open
larsxschneider opened this issue Jun 5, 2019 · 2 comments
Open

Versions chart broken if load balancer is used? #184

larsxschneider opened this issue Jun 5, 2019 · 2 comments

Comments

@larsxschneider
Copy link
Collaborator

The versions chart uses the haproxy.log to determine the Git client versions in use. This has two disadvantages:

  • only HTTPS Git connections are monitored
  • if a load balancer is used then the source IP is 127.0.0.1 in the haproxy log and therefore we cannot distinguish different machines
@larsxschneider
Copy link
Collaborator Author

larsxschneider commented Jun 5, 2019

This snippet should get the versions from the audit log:

$ zcat -f /var/log/github-audit.log.1* | perl -ne 'print if s/.*agent=git\/(\d+(?:\.\d+){0,2}).*"user_id":(\d+).*/\2\t\1/' | sort | uniq | perl -lape 's/\d+ *//' | sort -r -V | uniq -ic

This should also include SSH connections. However, I wonder should take any Git client into considerations with a regex like this (note: I use the user instead of IP here too):

$ zcat -f /var/log/github-audit.log.1* | perl -ne 'print if s/.*agent=git\/(\d+(?:\.\d+){0,2}).*"user_id":(\d+).*/\2\t\1/' | sort | uniq | perl -lape 's/\d+ *//' | sort -r -V | uniq -ic

Even further we could look for other agents:

$ zcat -f /var/log/github-audit.log.1* | perl -ne 'print if s/.*agent=([^\/" ]+\/\d+(?:\.\d+){0,2}).*"user_id":(\d+).*/\2\t\1/' | sort | uniq | perl -lape 's/\d+ *//' | sort | uniq -ic

This would also record JGit clients and the output would look like this:

      1 	git/2.9.0
      2 	git/2.9.2
      3 	git/2.9.3
      4 	JGit/4.5.0
      5	JGit/4.6.1
      6 	JGit/4.8.0

@pluehne Could that be easily be made compatible with your versions check logic? Could everything not git/* be considered "unknown"?

larsxschneider added a commit that referenced this issue Jun 5, 2019
Previously we read the versions from the haproxy log and used the
originating IP address to distinguish machines/users. In a load balancer
setup this IP address can be the same for all requests and consequently
the number of users per version are not counted correctly.

Fix this by using the audit log to count the Git versions. Also count
the versions by user ID and not by IP address.

see #184
larsxschneider added a commit that referenced this issue Jun 12, 2019
Previously we read the versions from the haproxy log and used the
originating IP address to distinguish machines/users. In a load balancer
setup this IP address can be the same for all requests and consequently
the number of users per version are not counted correctly.

Fix this by using the audit log to count the Git versions. Also count
the versions by user ID and not by IP address.

see #184
@pluehne
Copy link
Contributor

pluehne commented Jun 24, 2019

@larsxschneider: I just realize that you addressed most of this issue in #185, which we already merged.

However, I like the idea of recording the client agents in git-versions.tsv. We could extend the chart to support a mixed data format (with pure versions or versions prefixed by the client agent). Alternatively, we could also write a data migration that prefixes all older entries with git/ to make them compatible (would allow for a cleaner implementation on the dashboard side). What do you think?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants