-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WORKDIR learned to cache it's potential output layer #3341
base: main
Are you sure you want to change the base?
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -17,9 +17,11 @@ limitations under the License. | |
package commands | ||
|
||
import ( | ||
"fmt" | ||
"os" | ||
"path/filepath" | ||
|
||
kConfig "github.com/GoogleContainerTools/kaniko/pkg/config" | ||
"github.com/GoogleContainerTools/kaniko/pkg/dockerfile" | ||
"github.com/pkg/errors" | ||
|
||
|
@@ -33,6 +35,7 @@ type WorkdirCommand struct { | |
BaseCommand | ||
cmd *instructions.WorkdirCommand | ||
snapshotFiles []string | ||
shdCache bool | ||
} | ||
|
||
// For testing | ||
|
@@ -81,14 +84,116 @@ func (w *WorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile | |
|
||
// FilesToSnapshot returns the workingdir, which should have been created if it didn't already exist | ||
func (w *WorkdirCommand) FilesToSnapshot() []string { | ||
return w.snapshotFiles | ||
return nil | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was able to resolve this by allowing the cache to contain empty images, just to indicate that the command did not change any files and we are aware of this. With this modification we are able to report back an empty list and no longer need to use the FS snapshot function from RUN command. |
||
} | ||
|
||
func (r *WorkdirCommand) ProvidesFilesToSnapshot() bool { | ||
return false | ||
} | ||
|
||
// String returns some information about the command for the image config history | ||
func (w *WorkdirCommand) String() string { | ||
return w.cmd.String() | ||
} | ||
|
||
// CacheCommand returns true since this command should be cached | ||
func (w *WorkdirCommand) CacheCommand(img v1.Image) DockerCommand { | ||
|
||
return &CachingWorkdirCommand{ | ||
img: img, | ||
cmd: w.cmd, | ||
extractFn: util.ExtractFile, | ||
} | ||
} | ||
|
||
func (w *WorkdirCommand) MetadataOnly() bool { | ||
return false | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sometimes calling ie. calling
without any further context is guaranteed to be metadata only in all images to my understanding. |
||
} | ||
|
||
func (r *WorkdirCommand) RequiresUnpackedFS() bool { | ||
return true | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
} | ||
|
||
func (w *WorkdirCommand) ShouldCacheOutput() bool { | ||
return w.shdCache | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was thinking whether we could optimize this instruction here to not cache the output if no files were created, I'm not sure how to tell from the cache consumer side whether a cache entry is missing or not there on purpose. |
||
} | ||
|
||
type CachingWorkdirCommand struct { | ||
BaseCommand | ||
caching | ||
img v1.Image | ||
extractedFiles []string | ||
cmd *instructions.WorkdirCommand | ||
extractFn util.ExtractFunction | ||
} | ||
|
||
func (wr *CachingWorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile.BuildArgs) error { | ||
var err error | ||
logrus.Info("Cmd: workdir") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I copied that entire block of code here, it is to ensure that even if we hit the cache we still do the metadata operation thingy and actually change the workdir. I could of course put that in a function etc. but I would actually prefer to pass the resolved directory into the cache, s.t. I can have a single line here. Not only reusing the functionality, but reusing the result. cfg.WorkingDir = wr.resolvedWorkingDir There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. passing the result is not possible, as either the regular or the cached ExecuteCommand function is called, never both. So I opted for sharing the functionality, instead of extracting a nameless subroutine that fits the entire bill, I extracted the more meaningful There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmmm... passing the result would be possible but requires some creative thinking. We do store the layer as an image when we push to the cache, so we could change the WORKDIR on that image and with that store that variable too. Come to think of it, labels on the cached image could be used to pass arbitrary data between cache creation and reusing stage. but this would be a bigger redefinition of what the cache does and how it is used, not suitable for a small PR like this. Yet still interesting. |
||
workdirPath := wr.cmd.Path | ||
replacementEnvs := buildArgs.ReplacementEnvs(config.Env) | ||
resolvedWorkingDir, err := util.ResolveEnvironmentReplacement(workdirPath, replacementEnvs, true) | ||
if err != nil { | ||
return err | ||
} | ||
if filepath.IsAbs(resolvedWorkingDir) { | ||
config.WorkingDir = resolvedWorkingDir | ||
} else { | ||
if config.WorkingDir != "" { | ||
config.WorkingDir = filepath.Join(config.WorkingDir, resolvedWorkingDir) | ||
} else { | ||
config.WorkingDir = filepath.Join("/", resolvedWorkingDir) | ||
} | ||
} | ||
logrus.Infof("Changed working directory to %s", config.WorkingDir) | ||
|
||
logrus.Infof("Found cached layer, extracting to filesystem") | ||
|
||
if wr.img == nil { | ||
return errors.New(fmt.Sprintf("command image is nil %v", wr.String())) | ||
} | ||
|
||
layers, err := wr.img.Layers() | ||
if err != nil { | ||
return errors.Wrap(err, "retrieving image layers") | ||
} | ||
|
||
if len(layers) != 1 { | ||
return errors.New(fmt.Sprintf("expected %d layers but got %d", 1, len(layers))) | ||
} | ||
|
||
wr.layer = layers[0] | ||
|
||
wr.extractedFiles, err = util.GetFSFromLayers( | ||
kConfig.RootDir, | ||
layers, | ||
util.ExtractFunc(wr.extractFn), | ||
util.IncludeWhiteout(), | ||
) | ||
if err != nil { | ||
return errors.Wrap(err, "extracting fs from image") | ||
} | ||
|
||
return nil | ||
} | ||
|
||
// FilesToSnapshot returns the workingdir, which should have been created if it didn't already exist | ||
func (wr *CachingWorkdirCommand) FilesToSnapshot() []string { | ||
f := wr.extractedFiles | ||
logrus.Debugf("%d files extracted by caching run command", len(f)) | ||
logrus.Tracef("Extracted files: %s", f) | ||
|
||
return f | ||
} | ||
|
||
// String returns some information about the command for the image config history | ||
func (wr *CachingWorkdirCommand) String() string { | ||
if wr.cmd == nil { | ||
return "nil command" | ||
} | ||
return wr.cmd.String() | ||
} | ||
|
||
func (wr *CachingWorkdirCommand) MetadataOnly() bool { | ||
return false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I piggy-backed on the flag for
RUN
instructions as I don't think it's sensible to have a separate flag for each instruction.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cache copy layers probably got its own flag because depending on the context the files might be huge and invalidate the purpose of a cache, which is speeding things up by downloading a layer instead of executing commands. This should not be a problem in our simple case here, so piggy-backing should be fine.