Browse Source

[2903] catch some errors from asio methods/functions explicitly

otherwise it would result in an uncatchable exception and force the
program to terminate.  some exceptions are still considered super rare
and fatal, and they are still propagated (mainly from the io_service
and deadline timer).  errors on close() are just ignored (there
wouldn't be much we can do to recover in this case anyway).

Further, changed the handling of errors on the result of async_accept():
instead of stop accepting on some errors, log them and keep accepting
new connections.  on a closer look at the code I believe that's more
appropriate; silent stop would be rather confusing (if the error were
really fatal we should rather invoke an exception and trigger termination),
and cases like EMFILE definitely shouldn't cause this situation.
errors that can caused by stop() should still result in stopping accept(),
and the new code behaves that way.
JINMEI Tatuya 12 years ago
parent
commit
11d850685b
2 changed files with 62 additions and 38 deletions
  1. 11 0
      src/lib/asiodns/asiodns_messages.mes
  2. 51 38
      src/lib/asiodns/tcp_server.cc

+ 11 - 0
src/lib/asiodns/asiodns_messages.mes

@@ -53,6 +53,17 @@ The asynchronous I/O code encountered an error when trying to send data to
 the specified address on the given protocol.  The number of the system
 error that caused the problem is given in the message.
 
+% ASIODNS_TCP_ACCEPT_FAIL failed to accept TCP DNS connection: %1
+Accepting a TCP connection from a DNS client failed due to an error
+that could happen but should be rare.  The reason for the error is
+included in the log message.  The server still keeps accepting new
+connections, so unless it happens often it's probably okay to ignore
+this error.  If the shown error indicates something like "too many
+open files", it's probably because the run time environment is too
+restrictive on this limitation, so consider adjusing the limit using
+a tool such as ulimit.  If you see other types of errors too often,
+there may be something overlooked; please file a bug report in that case.
+
 % ASIODNS_UDP_ASYNC_SEND_FAIL Error sending UDP packet to %1: %2
 The low-level ASIO library reported an error when trying to send a UDP
 packet in asynchronous UDP mode. This can be any error reported by

+ 51 - 38
src/lib/asiodns/tcp_server.cc

@@ -100,41 +100,58 @@ TCPServer::operator()(asio::error_code ec, size_t length) {
 
     CORO_REENTER (this) {
         do {
-            /// Create a socket to listen for connections
+            /// Create a socket to listen for connections (no-throw operation)
             socket_.reset(new tcp::socket(acceptor_->get_io_service()));
 
             /// Wait for new connections. In the event of non-fatal error,
             /// try again
             do {
                 CORO_YIELD acceptor_->async_accept(*socket_, *this);
-
-                // Abort on fatal errors
-                // TODO: Log error?
                 if (ec) {
                     using namespace asio::error;
-                    if (ec.value() != would_block && ec.value() != try_again &&
-                        ec.value() != connection_aborted &&
-                        ec.value() != interrupted) {
+                    const error_code::value_type err_val = ec.value();
+                    // The following two cases can happen when this server is
+                    // stopped: operation_aborted in case it's stopped after
+                    // starting accept().  bad_descriptor in case it's stopped
+                    // even before starting.  In these cases we should simply
+                    // stop handling events.
+                    if (err_val == operation_aborted ||
+                        err_val == bad_descriptor) {
                         return;
                     }
+                    // Other errors should generally be temporary and we should
+                    // keep waiting for new connections.  We log errors that
+                    // should really be rare and would only be caused by an
+                    // internal erroneous condition (not an odd remote
+                    // behavior).
+                    if (err_val != would_block && err_val != try_again &&
+                        err_val != connection_aborted &&
+                        err_val != interrupted) {
+                        LOG_ERROR(logger, ASIODNS_TCP_ACCEPT_FAIL).
+                            arg(ec.message());
+                    }
                 }
             } while (ec);
 
             /// Fork the coroutine by creating a copy of this one and
             /// scheduling it on the ASIO service queue.  The parent
-            /// will continue listening for DNS connections while the
+            /// will continue listening for DNS connections while the child
             /// handles the one that has just arrived.
             CORO_FORK io_.post(TCPServer(*this));
         } while (is_parent());
 
+        // From this point, we'll simply return on error, which will
+        // immediately trigger destroying this object, cleaning up all
+        // resources including any open sockets.
+
         /// Instantiate the data buffer that will be used by the
         /// asynchronous read call.
         data_.reset(new char[MAX_LENGTH]);
 
         /// Start a timer to drop the connection if it is idle
         if (*tcp_recv_timeout_ > 0) {
-            timeout_.reset(new asio::deadline_timer(io_));
-            timeout_->expires_from_now(
+            timeout_.reset(new asio::deadline_timer(io_)); // shouldn't throw
+            timeout_->expires_from_now( // consider any exception fatal.
                 boost::posix_time::milliseconds(*tcp_recv_timeout_));
             timeout_->async_wait(boost::bind(&do_timeout, boost::ref(*socket_),
                                  asio::placeholders::error));
@@ -144,8 +161,7 @@ TCPServer::operator()(asio::error_code ec, size_t length) {
         CORO_YIELD async_read(*socket_, asio::buffer(data_.get(),
                               TCP_MESSAGE_LENGTHSIZE), *this);
         if (ec) {
-            socket_->close();
-            CORO_YIELD return;
+            return;
         }
 
         /// Now read the message itself. (This is done in a different scope
@@ -157,16 +173,7 @@ TCPServer::operator()(asio::error_code ec, size_t length) {
         }
 
         if (ec) {
-            socket_->close();
-            CORO_YIELD return;
-        }
-
-        // Due to possible timeouts and other bad behaviour, after the
-        // timely reads are done, there is a chance the socket has
-        // been closed already. So before we move on to the actual
-        // processing, check that, and stop if so.
-        if (!socket_->is_open()) {
-            CORO_YIELD return;
+            return;
         }
 
         // Create an \c IOMessage object to store the query.
@@ -174,7 +181,10 @@ TCPServer::operator()(asio::error_code ec, size_t length) {
         // (XXX: It would be good to write a factory function
         // that would quickly generate an IOMessage object without
         // all these calls to "new".)
-        peer_.reset(new TCPEndpoint(socket_->remote_endpoint()));
+        peer_.reset(new TCPEndpoint(socket_->remote_endpoint(ec)));
+        if (ec) {
+            return;
+        }
 
         // The TCP socket class has been extended with asynchronous functions
         // and takes as a template parameter a completion callback class.  As
@@ -183,7 +193,8 @@ TCPServer::operator()(asio::error_code ec, size_t length) {
         // the underlying Boost TCP socket - DummyIOCallback is used.  This
         // provides the appropriate operator() but is otherwise functionless.
         iosock_.reset(new TCPSocket<DummyIOCallback>(*socket_));
-        io_message_.reset(new IOMessage(data_.get(), length, *iosock_, *peer_));
+        io_message_.reset(new IOMessage(data_.get(), length, *iosock_,
+                                        *peer_));
 
         // Perform any necessary operations prior to processing the incoming
         // packet (e.g., checking for queued configuration messages).
@@ -198,8 +209,7 @@ TCPServer::operator()(asio::error_code ec, size_t length) {
         // If we don't have a DNS Lookup provider, there's no point in
         // continuing; we exit the coroutine permanently.
         if (lookup_callback_ == NULL) {
-            socket_->close();
-            CORO_YIELD return;
+            return;
         }
 
         // Reset or instantiate objects that will be needed by the
@@ -212,23 +222,22 @@ TCPServer::operator()(asio::error_code ec, size_t length) {
         // finished, the coroutine will resume immediately after
         // this point.
         CORO_YIELD io_.post(AsyncLookup<TCPServer>(*this));
+        if (ec) {
+            return;
+        }
 
         // The 'done_' flag indicates whether we have an answer
         // to send back.  If not, exit the coroutine permanently.
         if (!done_) {
             // TODO: should we keep the connection open for a short time
             // to see if new requests come in?
-            socket_->close();
-            CORO_YIELD return;
+            return;
         }
 
-        if (ec) {
-            CORO_YIELD return;
-        }
         // Call the DNS answer provider to render the answer into
         // wire format
-        (*answer_callback_)(*io_message_, query_message_,
-                            answer_message_, respbuf_);
+        (*answer_callback_)(*io_message_, query_message_, answer_message_,
+                            respbuf_);
 
         // Set up the response, beginning with two length bytes.
         lenbuf.writeUint16(respbuf_->getLength());
@@ -241,12 +250,12 @@ TCPServer::operator()(asio::error_code ec, size_t length) {
         // will simply exit at that time).
         CORO_YIELD async_write(*socket_, bufs, *this);
 
-        // All done, cancel the timeout timer
+        // All done, cancel the timeout timer. if it throws, consider it fatal.
         timeout_->cancel();
 
         // TODO: should we keep the connection open for a short time
         // to see if new requests come in?
-        socket_->close();
+        socket_->close(ec);     // ignore any error on close()
     }
 }
 
@@ -259,14 +268,18 @@ TCPServer::asyncLookup() {
 }
 
 void TCPServer::stop() {
+    // passing error_code to avoid getting exception; we simply ignore any
+    // error on close().
+    asio::error_code ec;
+
     /// we use close instead of cancel, with the same reason
     /// with udp server stop, refer to the udp server code
 
-    acceptor_->close();
+    acceptor_->close(ec);
     // User may stop the server even when it hasn't started to
     // run, in that that socket_ is empty
     if (socket_) {
-        socket_->close();
+        socket_->close(ec);
     }
 }
 /// Post this coroutine on the ASIO service queue so that it will
@@ -275,7 +288,7 @@ void TCPServer::stop() {
 void
 TCPServer::resume(const bool done) {
     done_ = done;
-    io_.post(*this);
+    io_.post(*this);  // this can throw, but can be considered fatal.
 }
 
 } // namespace asiodns