First of all: this is awesome! I'm extremely pleased you managed to save over 10 bytes from my version! The use of execveat is super cool, and I liked reading all the little optimizations you'd made (if only things were in a favourable position so the F in ELF can be used as
Unfortunately, the current version doesn't work on either of my systems: both my work machine and my personal one (both running Debian 9) get caught in a loop of invalid syscalls. I traced the problem to waitid returning -1 (EFAULT, bad address), caused by passing null for some of its parameters. This doesn't stop it from waiting, but it does cause the
mov ax, SYS_execveat later to not work correctly, since the mov into ax doesn't zero the upper bytes of eax, and so the syscall number becomes something wildly high.
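To make the failure concrete, here is a small Python model of the register arithmetic. It is only a sketch under assumptions: it takes the raw kernel return value in eax to be -EFAULT (strace and libc present this as -1 with errno set), and it assumes the i386 syscall number 358 for execveat. The helper `mov_ax` is hypothetical and exists only for illustration.

```python
MASK32 = 0xFFFFFFFF
EFAULT = 14          # errno value for "bad address" on Linux
SYS_EXECVEAT = 358   # assumed: i386 syscall number for execveat


def mov_ax(eax, imm16):
    """Model `mov ax, imm16`: overwrite the low 16 bits, keep the upper 16."""
    return (eax & 0xFFFF0000) | (imm16 & 0xFFFF)


# eax as a 32-bit register after the failed waitid call
eax_after_waitid = -EFAULT & MASK32

# eax when the later `mov ax, SYS_execveat` runs
eax_for_execveat = mov_ax(eax_after_waitid, SYS_EXECVEAT)

print(hex(eax_after_waitid))   # 0xfffffff2
print(hex(eax_for_execveat))   # 0xffff0166
```

The stale upper half (0xFFFF) survives the 16-bit move, so the kernel sees a syscall number far outside the valid range instead of 358.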
This diff changes the waitid call to a waitpid call, which not only takes fewer parameters and doesn't return EFAULT, but also saves 4 bytes after some fiddling about. Namely, instead of using
mov ax, SYS_waitid we can do
xchg eax, edi to zero eax in a single byte, and then load SYS_waitpid (which, now that it's no larger than 255, fits in a byte) with the 2-byte
mov al, SYS_waitpid. Code relating to esi can be removed entirely.
This saves 4 bytes around the padding area, which I used to do the esi assignment for the execveat call later.
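For reference, the instruction sizes mentioned above can be checked by hand-assembling the encodings. The sketch below assumes 32-bit mode and the i386 syscall numbers 284 (waitid) and 7 (waitpid); the byte values are standard x86 opcodes written out manually, not output from the actual build.

```python
SYS_WAITID = 284   # assumed: i386 syscall number
SYS_WAITPID = 7    # assumed: i386 syscall number

# mov ax, SYS_waitid -> 66 B8 iw (operand-size prefix, opcode, 16-bit immediate)
old = bytes([0x66, 0xB8]) + SYS_WAITID.to_bytes(2, "little")

# xchg eax, edi      -> 97       (0x90 + register number of edi)
# mov al, SYS_waitpid -> B0 ib   (opcode, 8-bit immediate)
new = bytes([0x97]) + bytes([0xB0, SYS_WAITPID])

print(len(old), old.hex())   # 4 66b81c01
print(len(new), new.hex())   # 3 97b007
```

On its own this swap accounts for one byte; the rest of the 4-byte saving would come from the esi-related code that is no longer needed. Note that the xchg only zeroes eax if edi holds zero at that point.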
There is a possible issue with this new code, though, since it assumes that the pid returned by waitpid (waitpid returns the pid of the child that exited) will fit within the 2 bytes of ax. On most Linux distributions pid_max is set to 32768 by default, which does fit in 16 bits. I think people rarely increase pid_max, especially on desktop systems. However, if this is too much of an issue, an
xor eax, eax can be placed after the waitpid call, which will use up 2 of the 4 saved bytes.
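The pid assumption can be illustrated with the same kind of register model. This is a sketch, not the real code: the helper names are hypothetical, 358 is the assumed i386 number for execveat, and the `clear_eax` flag stands in for the optional xor eax, eax.

```python
SYS_EXECVEAT = 358   # assumed: i386 syscall number for execveat


def mov_ax(eax, imm16):
    """Model `mov ax, imm16`: overwrite the low 16 bits, keep the upper 16."""
    return (eax & 0xFFFF0000) | (imm16 & 0xFFFF)


def syscall_number_after_waitpid(pid, clear_eax=False):
    """eax holds the pid returned by waitpid unless it is cleared first."""
    eax = 0 if clear_eax else pid   # clear_eax models `xor eax, eax`
    return mov_ax(eax, SYS_EXECVEAT)


for pid in (32767, 65535, 65536, 4194303):
    ok = syscall_number_after_waitpid(pid) == SYS_EXECVEAT
    print(pid, ok)
```

Pids up to 65535 leave the upper half of eax at zero, so the later 16-bit move yields the right syscall number. Anything larger, which is only possible when pid_max has been raised, corrupts it unless eax is cleared first.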