Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Files written/modified using output redirection are ignored by the shim #177

Open
reidsunderland opened this issue Nov 5, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@reidsunderland
Copy link
Member

We're already aware of this, but it wasn't documented in an issue.

@petersilva added a variable, KNOWN_REDIRECTION_BUG, to the tests that can be used to ignore this known issue.

Note: we only expect output redirection to work in subshells that are launched after setting LD_PRELOAD.

@reidsunderland reidsunderland added the bug Something isn't working label Nov 5, 2024
@reidsunderland
Copy link
Member Author

I used strace to find the bug.

This is output redirection without the shim (indented lines are irrelevant):

openat(AT_FDCWD, "dirone/fileone", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3         # fd 3 points to dirone/fileone
	fcntl(1, F_GETFD)                       = 0					# F_GETFD - get file descriptor flags for fd 1, returns 0
	fcntl(1, F_DUPFD, 10)                   = 10					# F_DUPFD - duplicate fd 1 to fd 10
	fcntl(1, F_GETFD)                       = 0					# F_GETFD - get file descriptor flags for fd 1
	fcntl(10, F_SETFD, FD_CLOEXEC)          = 0					# F_SETFD - set file descriptor flags for fd 10 to FD_CLOEXEC - close-on-exec
dup2(3, 1)                              = 1					# duplicate fd 3 to fd 1, closing fd 1 in the process if it is open
close(3)                                = 0					# close fd 3, now fd 1 is the only fd pointing to dirone/fileone (????)
write(1, "fileone\n", 8)                = 8					# write 8 bytes to fd 1
dup2(10, 1)                             = 1					# duplicate fd 10 to fd 1 CLOSING FD 1 in the process. this is where it should post?
	fcntl(10, F_GETFD)                      = 0x1 (flags FD_CLOEXEC)		# 
	close(10)                               = 0

The most important thing to note is the dup2 is what closes fd 1 (stdout), which points to the output file.

With the shim, we see SR_SHIMDEBUG 4 1606638 0.509995 dup2 NO POST path device or proc !:

+ echo '#test 1 sha512 160 stdout redirection in a subdir'
+ echo fileone
SR_SHIMDEBUG 1 1606638 0.509497  dup2 oldfd 3 newfd 1
SR_SHIMDEBUG 4 1606638 0.509574  dup2 newfd is open, so close it explicitly to potentially post.
SR_SHIMDEBUG 4 1606638 0.509609  close fd=1!
SR_SHIMDEBUG 8 1606638 0.509658 close /dev/pts/403 fd=1
SR_SHIMDEBUG 16 1606638 0.509704 set duped_fds[0]=3
SR_SHIMDEBUG 16 1606638 0.509746 set duped_fds[1]=1
SR_SHIMDEBUG 1 1606638 0.509787  dup2 posting /fs/homeu2/ssc/di/ras000/metpx-sr3c-x64-3.24.10p1rc3/dirone/fileone status=1
SR_SHIMDEBUG 4 1606638 0.509832  close fd=3!
SR_SHIMDEBUG 16 1606638 0.50987  is_duped!
SR_SHIMDEBUG 1 1606638 0.509941  dup2 oldfd 10 newfd 1 # why isn't fd1 posted here?
SR_SHIMDEBUG 4 1606638 0.509995  dup2 NO POST path device or proc !
SR_SHIMDEBUG 4 1606638 0.510034  close fd=10!
SR_SHIMDEBUG 8 1606638 0.510082 close /dev/pts/403 fd=10
+ echo
+ echo 'CONTENTS SHOULD BE fileone'
+ cat dirone/fileone

At the time of the dup2 call, fd 1 points to the file that stdout is being redirected to (dirone/fileone) and fd 10 is the pseudoterminal normally connected to stdout.

In dup2, we already have code that explicitly calls close on the newfd (1 in this case). That should trigger a post for fd 1.

sarrac/libsr3shim.c

Lines 1129 to 1135 in 1ed2cae

// If newfd is open, then it will be closed by dup2, perhaps trigger post?
// recipe: https://stackoverflow.com/questions/12340695/how-to-check-if-a-given-file-descriptor-stored-in-a-variable-is-still-valid
if ((fcntl(newfd, F_GETFD) != -1) || errno != EBADF) {
sr_shimdebug_msg(4,
" dup2 newfd is open, so close it explicitly to potentially post.\n");
close(newfd);
}

But in this case, because oldfd (10, /dev/pts/403) is pointing to a /dev/ path, the above close code doesn't get called because this code (below) runs and returns first:

sarrac/libsr3shim.c

Lines 1120 to 1124 in 1ed2cae

if (!strncmp(real_path, "/dev/", 5) || !strncmp(real_path, "/proc/", 6)) {
sr_shimdebug_msg(4, " dup2 NO POST path device or proc !\n");
errno = 0;
return dup2_fn_ptr(oldfd, newfd);
}

The fix is to move the "... NO POST path device or proc" code after the explicit close.

I made this change in bcecd39 and it seems to work.

@petersilva
Copy link
Contributor

While the explanations sound reasonable, just want to highlight aspects of this issue which I don't understand: why this problem isn't on most ubuntu machines, but seemed to show up on redhat 8. ... Pretty sure there was no problem on ubuntu 22.04, for example (perhaps all ubuntu's back to 18...)

@reidsunderland
Copy link
Member Author

That's what I'm still trying to understand in the tests. I think the redirection problem actually is present on all OS versions and the tests are either using bash -c like this for the redirection tests that work:

echo "#test 0 comment 000 shim test posting start"
echo "#test 1 sha512 010 capturing stdout"
bash -c 'echo "hoho" >> ./hoho'

Or we are incorrectly testing and checking for the wrong behaviour, like this example. The redirection is ignored, but the test passes because the touch causes 1 message to be posted.

sarrac/shim_copy_post2.sh

Lines 106 to 112 in e220acc

echo "#test 1 sha512 create test_file (again) from redirection"
echo 2 >test_file
if [ ! "${KNOWN_REDIRECTION_BUG}" ]; then
echo "#no post from touch, refused as repeat"
touch test_file
fi

In this case, shim_post_minterval is not set, so if redirection was actually working properly, we should see 2 posts (and we do, with my fix).


I'm trying to remove all the special KNOWN_REDIRECTION_BUG cases that change the commands run and expected results from the test scripts and compare the results across OS versions and between using bash ./shim_post_run.sh or . ./shim_post_run.sh or ./shim_post_run.sh.

@petersilva
Copy link
Contributor

confusing to me... is that I only set KNOWN_REDIRECTION_BUG on certain Os's .e.g. redhat 8, while things passed without it on ubuntu. It sounds like you're getting to the bottom of it.
good stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants