Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Option to display diff even without --dry-run #208

Open
fgabolde opened this issue May 5, 2015 · 15 comments
Open

Option to display diff even without --dry-run #208

fgabolde opened this issue May 5, 2015 · 15 comments

Comments

@fgabolde
Copy link

fgabolde commented May 5, 2015

Currently, commitable ops display the diff only if --dry-run is set, e.g.

pinto update -vv --dry-run Moose
# [verbose output...]
+[rf-] ILMARI/Devel-OverloadInfo-0.002.tar.gz
-[rf-] DROLSKY/Devel-StackTrace-1.27.tar.gz
+[rf-] DROLSKY/Devel-StackTrace-2.00.tar.gz
-[rf-] ZEFRAM/Module-Runtime-0.013.tar.gz
+[rf-] ZEFRAM/Module-Runtime-0.014.tar.gz
+[rf-] ETHER/Module-Runtime-Conflicts-0.002.tar.gz
-[rf-] DOY/Moose-2.0402.tar.gz
+[rf-] ETHER/Moose-2.1404.tar.gz
+[rf-] ETHER/Test-CleanNamespaces-0.18.tar.gz
-[rf-] ETHER/Test-Warnings-0.013.tar.gz
+[rf-] ETHER/Test-Warnings-0.021.tar.gz

I'm using Pinto from within Jenkins, to automate "promoting" distributions from one Pinto to another (e.g. from preproduction to production). I'd like my build logs to include the changes in all cases, not just when doing dry-run ops.

As a workaround, I'm running pinto update first with --dry-run, then again without.

@thaljef
Copy link
Owner

thaljef commented May 5, 2015

Running two commands doesn't seem unreasonable to me. It might look more natural if you did pinto diff after the update instead of using --dry-run before. However, there's no way to express something like diff HEAD^ as you would in git. So maybe that's the thing we really need here.

Does git have a similar feature (i.e. show a diff after commit)?

@fgabolde
Copy link
Author

fgabolde commented May 6, 2015

Running two commands doesn't seem unreasonable to me. It might look more natural if you did pinto diff after the update instead of using --dry-run before. However, there's no way to express something like diff HEAD^ as you would in git. So maybe that's the thing we really need here.

Yes, this feels more natural than my original proposition.

I was actually thinking, maybe the easiest way to do this is just

$ pinto log -d --latest

which would also go some way towards reducing my main gripe with pinto log -d, which is that it takes an unholy amount of time to display the results when talking to a remote Pinto server with a lot of changes, even if I am only interested in what my coworkers did last week while I was on holiday.

Does git have a similar feature (i.e. show a diff after commit)?

I was going to jokingly reply "even git doesn't have ALL the features", but actually it does... I just checked, and git help commit says that with --verbose, the message template includes the diff.

But anyway, the workflows with git and Pinto are different in this respect. With git, you prepare a commit by adding exactly the changes you want -- it doesn't add stuff by itself. With Pinto, you have your desired end state ("update Acme::Foobar") and it calculates what needs to be done to get there. There needs to be a simple way to ask it "what just happened?"

@thaljef
Copy link
Owner

thaljef commented May 6, 2015

$ pinto log -d --latest

Yes, that would work nicely too. I would still want to implement that in a git-like way. So the argument could be a stack name, or a commit id, or some symbolic thing like HEAD and it would just DWIM.

I just checked, and git help commit says that with --verbose, the message template includes the diff.

And that's exactly what pinto does. But you're doing this non-interactively so you never see the template message. I think that's the key difference in this situation.

@thaljef
Copy link
Owner

thaljef commented May 12, 2015

it takes an unholy amount of time to display the results when talking to a remote Pinto server with a lot of changes, even if I am only interested in what my coworkers did last week while I was on holiday.

Since about version 0.085 pinto streams output to the client, so you should get a pretty quick response. Which version do you have?

NB: pinto --version will tell you what version of the client is installed locally, but you'll have to login to the server and look at the pintod code to know which version the server is.

@fgabolde
Copy link
Author

The client is 0.09996 and the server is 0.09995.

These are large repositories with very frequent changes though:

STATISTICS FOR THE "master" STACK
-------------------------------------

                     Stack      Total
               ----------------------
     Packages       12543      175983
Distributions         969        3493

The output of pinto log -d on this repository was around 629K two months ago, just to give you an idea. With the very scientific method of "counting aloud", I measured ~50s before anything would appear in pinto log -d | less (only one screenful -- the next screenful took 50 more seconds, so this is not just $|).

On one of our smaller repositories, which has 65 revisions currently, it still takes ~5s to start displaying stuff, ~17s if piped to a pager -- so basically all the time, and 20.5s to finish (measured by time).

@thaljef
Copy link
Owner

thaljef commented May 14, 2015

Depending on your machine, most pinto command have a startup time of 2-5 seconds due to the overhead of Moose. The overhead is slightly less when using a remote repository because the server has already loaded modules that would have to be loaded by the client if you were using a local repository. I would like to covert everything to Moo, but that's not a high priority right now (any takers?).

Having said all that, I turned on autoflush in 35342ee and I saw a pretty big improvement for both local and remote repositories. You're welcome to try it out yourself, or just wait for the next release (maybe next week). To get the speedup with a remote repository, you'll have to upgrade both server and client. This probably won't improve your time measurements, but I think it will reduce the perceived latency.

@tartansandal
Copy link
Contributor

Would really need to get NTYprof or similar running on that to see what the
real bottleneck is, however, my guess is this has something to do with
Pinto using both SQLite and an ORM (DBIx::Class). I'm thinking we might be
getting sub-optimal SQL queries (via prefetch and the like) and SQLite is
not smart enough to produce a sufficiently good query plan. If this is
true, we might want to explore plugging in a real database like PostgreSQL
or crafting explicit joins so that DBIx::Class produces more optimal SQL.

Kahlil (Kal) Hodgson GPG: C9A02289
Head of Technology (m) +61 (0) 4 2573 0382
DealMax Pty Ltd GitHub: @tartansandal

Suite 1416
401 Docklands Drive
Docklands VIC 3008 Australia

"All parts should go together without forcing. You must remember that
the parts you are reassembling were disassembled by you. Therefore,
if you can't get them together again, there must be a reason. By all
means, do not use a hammer." -- IBM maintenance manual, 1925

On 14 May 2015 at 11:17, Jeffrey Ryan Thalhammer [email protected]
wrote:

Depending on your machine, most pinto command have a startup time of 2-5
seconds due to the overhead of Moose. The overhead is slightly less when
using a remote repository because the server has already loaded modules
that would have to be loaded by the client if you were using a local
repository. I would like to covert everything to Moo, but that's not a high
priority right now (any takers?).

Having said all that, I turned on autoflush in 35342ee
35342ee
and I saw a pretty big improvement for both local and remote repositories.
You're welcome to try it out yourself, or just wait for the next release
(maybe next week). To get the speedup with a remote repository, you'll have
to upgrade both server and client. This probably won't improve your time
measurements, but I think it will reduce the perceived latency.


Reply to this email directly or view it on GitHub
#208 (comment).

@fgabolde
Copy link
Author

I turned on autoflush in 35342ee and I saw a pretty big improvement for both local and remote repositories. You're welcome to try it out yourself, or just wait for the next release (maybe next week).

Neat!

I'd like to try it now, but I suspect my colleagues don't want me cowboy-patching the Pinto server, and I don't have a huge Pinto repository of my own to test on. I'll wait for the release.

On the topic of git-like revsets, it sounds pretty cool and it will definitely help. I can work on it if you like.

@thaljef
Copy link
Owner

thaljef commented Jun 9, 2015

On the topic of git-like revsets, it sounds pretty cool and it will definitely help. I can work on it if you like.

That would be much appreciated.

@thaljef
Copy link
Owner

thaljef commented Jun 10, 2015

I'll wait for the release.

@fgabolde Pinto-0.09998 has shipped.

@fgabolde
Copy link
Author

@fgabolde Pinto-0.09998 has shipped.

Thanks @thaljef, I'll pester IT to set it up :)

I've started work on the rev-list feature, I'll update you as soon as I've got something approaching usable.

@thaljef
Copy link
Owner

thaljef commented Aug 12, 2015

@fgabolde so did this work out for you?

Is it fast enough now to just use pinto log -d?

@fgabolde
Copy link
Author

@thaljef It's faster but it's still pretty slow :/ It got a little better for a while but since then the repository size has kept increasing, and so has the running time of pinto log.

I came across this as well while trying to add options to pinto log, but since you're using sqlite to store the revision ancestry as well, there is no really fast way to build the full history tree (your only option is to run one query for every ancestry level -- the parents method in the Revision result source). I couldn't get it to be fast without extracting ancestry info and dumping it into a different structure on file. That would potentially cause other problems since we would need to keep it synced with the DB...

@thaljef
Copy link
Owner

thaljef commented Aug 12, 2015

Well, rendering all log entries is always going to take more time as the history gets longer. But the time to render the first N log entries should be constant, and I thought that was the primary issue here. Is that not the case?

Yeah, the data structure pretty much requires a query for each revision. I wonder if it would be any faster to just fetch all revisions (or some large chunk of them) and then build the tree in memory.

@fgabolde
Copy link
Author

Well, rendering all log entries is always going to take more time as the history gets longer. But the time to render the first N log entries should be constant, and I thought that was the primary issue here. Is that not the case?

If you check back to the first comment on this ticket, the issue was originally that I wanted to show dep diffs in Jenkins build logs, and I was using pinto update --dry-run for this, avoiding pinto log -d because it was slow (more than doubling build times in extreme cases, actually). Then the discussion turned to adding features to pinto log -d to limit its output.

Regarding the current speed of pinto log, it's fine... if someone is in front of the machine to press C-c once enough output has appeared. So at least one of the use cases ("what have the guys been doing last week") works now.

Yeah, the data structure pretty much requires a query for each revision. I wonder if it would be any faster to just fetch all revisions (or some large chunk of them) and then build the tree in memory.

Rather than all revisions, we only need the ancestry table and one starting revision for building the tree (fortunately the ancestry table is likely to be much smaller, with fewer columns, all ints). From there we can fetch the revisions in batches.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants