# EINTR

## The problem

If your code is blocked in a system call when a signal needs to be delivered,
the kernel needs to interrupt that system call. For something like a read(2)
call where some data has already been read, the call can just return with
what data it has. (This is one reason why read(2) sometimes returns less data
than you asked for, even though more data is available. It also explains why
such behavior is relatively rare, and a cause of bugs.)

But what if read(2) hasn't read any data yet? Or what if you've made some other
system call, for which there is no equivalent "partial" success, such as
poll(2)? In poll(2)'s case, there's either something to report (in which
case the system call would already have returned), or there isn't.

The kernel's solution to this problem is to return failure (-1) and set
errno to `EINTR`: "interrupted system call".

### Can I just opt out?

Technically, yes. In practice on Android, no. Technically, if a signal's
disposition is set to ignore, the kernel doesn't even have to deliver the
signal, so your code can just stay blocked in the system call it was already
making. In practice, though, you can't guarantee that all signals are either
ignored or will kill your process... Unless you're a small single-threaded
C program that doesn't use any libraries, you can't realistically make this
guarantee. If any code has installed a signal handler, you need to cope with
`EINTR`. And if you're an Android app, the zygote has already installed a whole
host of signal handlers before your code even starts to run. (And, no, you
can't ignore them instead, because some of them are critical to how ART works.
For example: Java `NullPointerException`s are optimized by trapping `SIGSEGV`
signals so that the code generated by the JIT doesn't have to insert explicit
null pointer checks.)

### Why don't I see this in Java code?

You won't see this in Java because the decision was taken to hide this issue
from Java programmers. Basically, all the libraries like `java.io.*` and
`java.net.*` hide this from you. (The same should be true of `android.*` too,
so it's worth filing bugs if you find any exceptions that aren't documented!)

### Why doesn't libc do that too?

For most people, things would be easier if libc hid this implementation
detail. But there are legitimate use cases, and automatically retrying
would hide those. For example, you might want to use signals and `EINTR`
to interrupt another thread (in fact, that's how interruption of threads
doing I/O works in Java behind the scenes!). As usual, C/C++ choose the more
powerful but more error-prone option.

## The fix

### Easy cases

In most cases, the fix is simple: wrap the system call with the
`TEMP_FAILURE_RETRY` macro. This is basically a while loop that retries the
system call as long as the result is -1 and errno is `EINTR`.

So, for example:
```
  n = read(fd, buf, buf_size); // BAD!
  n = TEMP_FAILURE_RETRY(read(fd, buf, buf_size)); // GOOD!
```

### close(2)

TL;DR: *never* wrap close(2) calls with `TEMP_FAILURE_RETRY`.

The case of close(2) is complicated. POSIX explicitly says that close(2)
shouldn't close the file descriptor if it returns `EINTR`, but that's *not*
true on Linux (and thus on Android). See
[Returning EINTR from close()](https://lwn.net/Articles/576478/)
for more discussion.

Given that most Android code (and especially "all apps") is multithreaded,
retrying close(2) is especially dangerous because the file descriptor might
already have been reused by another thread, so the "retry" succeeds, but
actually closes a *different* file descriptor belonging to a *different*
thread.

### Timeouts

System calls with timeouts are the other interesting case where "just wrap
everything with `TEMP_FAILURE_RETRY()`" doesn't work. Because some amount of
time will have elapsed, you'll want to recalculate the timeout. Otherwise you
can end up with your 1 minute timeout being indefinite if you're receiving
signals at least once per minute, say. In this case you'll want to do
something like adding an explicit loop around your system call, calculating
the timeout _inside_ the loop, and using `continue` each time the system call
fails with `EINTR`.