Browse Source

[2617] treat ECONNRESET on recv as error unconditionally with detailed notes

JINMEI Tatuya 12 years ago
parent
commit
878c361337
3 changed files with 21 additions and 14 deletions
  1. 5 6
      src/bin/msgq/msgq.py.in
  2. 14 2
      src/bin/msgq/msgq_messages.mes
  3. 2 6
      src/bin/msgq/tests/msgq_test.py

+ 5 - 6
src/bin/msgq/msgq.py.in

@@ -364,13 +364,12 @@ class MsgQ:
             try:
                 data = sock.recv(length - len(received))
 
-            # If the remote client has closed the socket there seems to be
-            # two possible cases: getting ECONNRESET or receiving empty data.
-            # These cases are possible in normal operation, so we report them
-            # using MsgQCloseOnReceive.
             except socket.error as err:
-                if err.errno == errno.ECONNRESET:
-                    raise MsgQCloseOnReceive(str(err), continued)
+                # This case includes ECONNRESET, which seems to happen when
+                # the remote client has closed its socket at some subtle
+                # timing (it should normally result in receiving empty data).
+                # Since we didn't figure out how exactly that could happen,
+                # we treat it just like other really-unexpected socket errors.
                 raise MsgQReceiveError(str(err))
             if len(data) == 0:
                 raise MsgQCloseOnReceive("EOF", continued)

+ 14 - 2
src/bin/msgq/msgq_messages.mes

@@ -99,10 +99,22 @@ the authors when figuring the problem out.
 % MSGQ_RECV_ERR Error reading from socket %1: %2
 There was a low-level error when reading from a socket. The error is logged and
 the corresponding socket is dropped.  The errors include receiving
-broken or (non empty but) incomplete data.  In either case it suggests
+broken or (non empty but) incomplete data.  In either case it usually suggests
 something unexpected happens within the BIND 10 system; it's probably
 better to restart the system, and if it continues it should be
-reported as a bug.
+reported as a bug.  One known, probably non critical case is
+the "connection reset by peer" (or its variants) socket error appearing
+on shutdown.  It's known this happens when the remote client closes the
+connection as part of shutdown process.  Such cases are normally expected
+to be reported as receiving empty data (which we log it at the debug level
+as the MSGQ_CLOSE_ON_RECV message), but for some (yet) unknown reason
+it can also be reported as the system error.  At shutdown time it's expected
+that connections are closed, so it's probably safe to ignore these messages
+in such a case.  We still log them as an error as we've not figured out
+how exactly that can happen.  In future, we may make the shutdown process
+more robust so the msgq daemon can explicitly know when a client shuts down
+more reliably.  If and when it's implemented this error message won't appear
+on shutdown unless there's really something unexpected.
 
 % MSGQ_RECV_HDR Received header: %1
 Debug message. This message includes the whole routing header of a packet.

+ 2 - 6
src/bin/msgq/tests/msgq_test.py

@@ -705,18 +705,14 @@ class SocketTests(unittest.TestCase):
         expected_debugs = 0
 
         # if socket.recv() fails due to socket.error, it will be logged
-        # as error or debug depending on the errno.  In either case the socket
-        # will be killed.
+        # as error and the socket will be killed regardless of errno.
         for eno in [errno.ENOBUFS, errno.ECONNRESET]:
             self.__sock_error.errno = eno
             self.__sock.recv_result = self.__sock_error
             self.__killed_socket = None # clear any previuos value
             self.__msgq.process_packet(42, self.__sock)
             self.assertEqual((42, self.__sock), self.__killed_socket)
-            if eno == errno.ENOBUFS:
-                expected_errors += 1
-            else:
-                expected_debugs += 1
+            expected_errors += 1
             self.assertEqual(expected_errors, self.__logger.error_called)
             self.assertEqual(expected_debugs, self.__logger.debug_called)