Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WORKDIR learned to cache it's potential output layer #3341

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion pkg/commands/commands.go
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,7 @@ func GetCommand(cmd instructions.Command, fileContext util.FileContext, useNewRu
case *instructions.EnvCommand:
return &EnvCommand{cmd: c}, nil
case *instructions.WorkdirCommand:
return &WorkdirCommand{cmd: c}, nil
return &WorkdirCommand{cmd: c, shdCache: cacheRun}, nil
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I piggy-backed on the flag for RUN instructions as I don't think it's sensible to have a separate flag for each instruction.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache copy layers probably got its own flag because depending on the context the files might be huge and invalidate the purpose of a cache, which is speeding things up by downloading a layer instead of executing commands. This should not be a problem in our simple case here, so piggy-backing should be fine.

case *instructions.AddCommand:
return &AddCommand{cmd: c, fileContext: fileContext}, nil
case *instructions.CmdCommand:
Expand Down
118 changes: 109 additions & 9 deletions pkg/commands/workdir.go
Original file line number Diff line number Diff line change
Expand Up @@ -17,9 +17,11 @@ limitations under the License.
package commands

import (
"fmt"
"os"
"path/filepath"

kConfig "github.com/GoogleContainerTools/kaniko/pkg/config"
"github.com/GoogleContainerTools/kaniko/pkg/dockerfile"
"github.com/pkg/errors"

Expand All @@ -33,6 +35,19 @@ type WorkdirCommand struct {
BaseCommand
cmd *instructions.WorkdirCommand
snapshotFiles []string
shdCache bool
}

func ToAbsPath(path string, workdir string) string {
if filepath.IsAbs(path) {
return path
} else {
if workdir != "" {
return filepath.Join(workdir, path)
} else {
return filepath.Join("/", path)
}
}
}

// For testing
Expand All @@ -46,15 +61,7 @@ func (w *WorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile
if err != nil {
return err
}
if filepath.IsAbs(resolvedWorkingDir) {
config.WorkingDir = resolvedWorkingDir
} else {
if config.WorkingDir != "" {
config.WorkingDir = filepath.Join(config.WorkingDir, resolvedWorkingDir)
} else {
config.WorkingDir = filepath.Join("/", resolvedWorkingDir)
}
}
config.WorkingDir = ToAbsPath(resolvedWorkingDir, config.WorkingDir)
logrus.Infof("Changed working directory to %s", config.WorkingDir)

// Only create and snapshot the dir if it didn't exist already
Expand Down Expand Up @@ -89,6 +96,99 @@ func (w *WorkdirCommand) String() string {
return w.cmd.String()
}

// CacheCommand returns true since this command should be cached
func (w *WorkdirCommand) CacheCommand(img v1.Image) DockerCommand {

return &CachingWorkdirCommand{
img: img,
cmd: w.cmd,
extractFn: util.ExtractFile,
}
}

func (w *WorkdirCommand) MetadataOnly() bool {
return false
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes calling WORKDIR is metadata only, there might be some optimization potential here, depending on whether we can know this a-priori.

ie. calling

WORKDIR /

without any further context is guaranteed to be metadata only in all images to my understanding.

}

func (r *WorkdirCommand) RequiresUnpackedFS() bool {
return true
Copy link
Author

@mzihlmann mzihlmann Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WORKDIR required me to unpack the filesystem as otherwise the user is not known.

error building image: error building stage: failed to execute command: identifying uid and gid for user app: user app is not a uid and does not exist on the system

}

func (w *WorkdirCommand) ShouldCacheOutput() bool {
return w.shdCache
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking whether we could optimize this instruction here to not cache the output if no files were created, I'm not sure how to tell from the cache consumer side whether a cache entry is missing or not there on purpose.

}

type CachingWorkdirCommand struct {
BaseCommand
caching
img v1.Image
extractedFiles []string
cmd *instructions.WorkdirCommand
extractFn util.ExtractFunction
}

func (wr *CachingWorkdirCommand) ExecuteCommand(config *v1.Config, buildArgs *dockerfile.BuildArgs) error {
var err error
logrus.Info("Cmd: workdir")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied that entire block of code here, it is to ensure that even if we hit the cache we still do the metadata operation thingy and actually change the workdir. I could of course put that in a function etc. but I would actually prefer to pass the resolved directory into the cache, s.t. I can have a single line here. Not only reusing the functionality, but reusing the result.

cfg.WorkingDir = wr.resolvedWorkingDir

Copy link
Author

@mzihlmann mzihlmann Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing the result is not possible, as either the regular or the cached ExecuteCommand function is called, never both. So I opted for sharing the functionality, instead of extracting a nameless subroutine that fits the entire bill, I extracted the more meaningful ToAbsPath function. I think the remaining few lines are fine to duplicate. The remaining calls have too many dependencies to be neatly tucked away in a function.

Copy link
Author

@mzihlmann mzihlmann Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... passing the result would be possible but requires some creative thinking. We do store the layer as an image when we push to the cache, so we could change the WORKDIR on that image and with that store that variable too. Come to think of it, labels on the cached image could be used to pass arbitrary data between cache creation and reusing stage.

but this would be a bigger redefinition of what the cache does and how it is used, not suitable for a small PR like this. Yet still interesting.

workdirPath := wr.cmd.Path
replacementEnvs := buildArgs.ReplacementEnvs(config.Env)
resolvedWorkingDir, err := util.ResolveEnvironmentReplacement(workdirPath, replacementEnvs, true)
if err != nil {
return err
}
config.WorkingDir = ToAbsPath(resolvedWorkingDir, config.WorkingDir)
logrus.Infof("Changed working directory to %s", config.WorkingDir)

logrus.Infof("Found cached layer, extracting to filesystem")

if wr.img == nil {
return errors.New(fmt.Sprintf("command image is nil %v", wr.String()))
}

layers, err := wr.img.Layers()
if err != nil {
return errors.Wrap(err, "retrieving image layers")
}

if len(layers) > 1 {
return errors.New(fmt.Sprintf("expected %d layers but got %d", 1, len(layers)))
} else if len(layers) == 0 {
// an empty image in cache indicates that no directory was created by WORKDIR
return nil
}

wr.layer = layers[0]

wr.extractedFiles, err = util.GetFSFromLayers(
kConfig.RootDir,
layers,
util.ExtractFunc(wr.extractFn),
util.IncludeWhiteout(),
)
if err != nil {
return errors.Wrap(err, "extracting fs from image")
}

return nil
}

// FilesToSnapshot returns the workingdir, which should have been created if it didn't already exist
func (wr *CachingWorkdirCommand) FilesToSnapshot() []string {
f := wr.extractedFiles
logrus.Debugf("%d files extracted by caching run command", len(f))
logrus.Tracef("Extracted files: %s", f)

return f
}

// String returns some information about the command for the image config history
func (wr *CachingWorkdirCommand) String() string {
if wr.cmd == nil {
return "nil command"
}
return wr.cmd.String()
}

func (wr *CachingWorkdirCommand) MetadataOnly() bool {
return false
}
8 changes: 6 additions & 2 deletions pkg/executor/build.go
Original file line number Diff line number Diff line change
Expand Up @@ -400,8 +400,12 @@ func (s *stageBuilder) build() error {
if isCacheCommand {
v := command.(commands.Cached)
layer := v.Layer()
if err := s.saveLayerToImage(layer, command.String()); err != nil {
return errors.Wrap(err, "failed to save layer")
if layer == nil && len(files) == 0 {
// an empty cache image without a layer indicates that no files were changed, ie. by WORKDIR
} else {
if err := s.saveLayerToImage(layer, command.String()); err != nil {
return errors.Wrap(err, "failed to save layer")
}
}
} else {
tarPath, err := s.takeSnapshot(files, command.ShouldDetectDeletedFiles())
Expand Down
33 changes: 19 additions & 14 deletions pkg/executor/push.go
Original file line number Diff line number Diff line change
Expand Up @@ -369,10 +369,6 @@ func pushLayerToCache(opts *config.KanikoOptions, cacheKey string, tarPath strin
// layer already gzipped by default
}

layer, err := tarball.LayerFromFile(tarPath, layerOpts...)
if err != nil {
return err
}

cache, err := cache.Destination(opts, cacheKey)
if err != nil {
Expand All @@ -385,18 +381,27 @@ func pushLayerToCache(opts *config.KanikoOptions, cacheKey string, tarPath strin
return errors.Wrap(err, "setting empty image created time")
}

empty, err = mutate.Append(empty,
mutate.Addendum{
Layer: layer,
History: v1.History{
Author: constants.Author,
CreatedBy: createdBy,
// WORKDIR can create empty layers by design, yet still we must cache them
// to transfer the knowledge that they are empty.
if tarPath != "" {
layer, err := tarball.LayerFromFile(tarPath, layerOpts...)
if err != nil {
return err
}
empty, err = mutate.Append(empty,
mutate.Addendum{
Layer: layer,
History: v1.History{
Author: constants.Author,
CreatedBy: createdBy,
},
},
},
)
if err != nil {
return errors.Wrap(err, "appending layer onto empty image")
)
if err != nil {
return errors.Wrap(err, "appending layer onto empty image")
}
}

cacheOpts := *opts
cacheOpts.TarPath = "" // tarPath doesn't make sense for Docker layers
cacheOpts.NoPush = opts.NoPushCache // we do not want to push cache if --no-push-cache is set.
Expand Down