
[master] Merge branch 'trac2447'

JINMEI Tatuya 12 years ago
parent
commit
c8b32f1adb

+ 9 - 4
src/bin/auth/auth_messages.mes

@@ -47,10 +47,15 @@ available. It is issued during server startup is an indication that
 the initialization is proceeding normally.
 
 % AUTH_CONFIG_LOAD_FAIL load of configuration failed: %1
-An attempt to configure the server with information from the configuration
-database during the startup sequence has failed. (The reason for
-the failure is given in the message.) The server will continue its
-initialization although it may not be configured in the desired way.
+An attempt to configure the server with information from the
+configuration database during the startup sequence has failed.  The
+server will continue its initialization although it may not be
+configured in the desired way.  The reason for the failure is given in
+the message.  One common reason is that the server failed to acquire a
+socket bound to a privileged port (53 for DNS).  In that case the
+reason in the log message should show something like "permission
+denied", and the solution would be to restart BIND 10 as a super
+(root) user.
 
 % AUTH_CONFIG_UPDATE_FAIL update of configuration failed: %1
 At attempt to update the configuration the server with information
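
The AUTH_CONFIG_LOAD_FAIL text above names a failed bind to the privileged DNS port as a common cause. The Python sketch below shows that check in isolation. It is not BIND 10 code: `try_bind_udp` and `explain_bind_failure` are hypothetical helpers, and the sketch assumes the traditional Unix rule that ports below 1024 require root. Linux lets administrators move that boundary.

```python
import errno
import socket

# Assumption: the traditional Unix boundary for privileged ports.
# Linux can move it via net.ipv4.ip_unprivileged_port_start.
PRIVILEGED_PORT_LIMIT = 1024


def explain_bind_failure(err, port):
    """Turn an OSError raised by bind() into an operator-facing hint."""
    reason = err.strerror or str(err)
    if err.errno == errno.EACCES and port < PRIVILEGED_PORT_LIMIT:
        return ("%s: binding port %d normally requires root; "
                "restart BIND 10 as a super (root) user" % (reason, port))
    return reason


def try_bind_udp(address, port):
    """Try to bind an IPv4 UDP socket; return None on success, else a hint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((address, port))
        return None
    except OSError as e:
        return explain_bind_failure(e, port)
    finally:
        # Close the probe socket whether or not the bind worked
        sock.close()
```

Running `try_bind_udp("127.0.0.1", 53)` as an ordinary user should return the "requires root" hint. Running it as root returns None, unless the port is already in use, in which case the plain reason is returned.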

+ 15 - 5
src/bin/bind10/bind10_messages.mes

@@ -82,6 +82,21 @@ the boss process will try to force them).
 A debug message. The configurator is about to perform one task of the plan it
 is currently executing on the named component.
 
+% BIND10_CONNECTING_TO_CC_FAIL failed to connect to configuration/command channel; try -v to see output from msgq
+The boss process tried to connect to the communication channel for
+commands and configuration updates during initialization, but it
+failed.  This is a fatal startup error, and the process will soon
+terminate after some cleanup.  There can be several reasons for the
+failure, but the most likely cause is that the msgq daemon failed to
+start, and the most likely cause of the msgq failure is that it
+doesn't have permission to create a socket file for the
+communication.  To confirm that, you can see debug messages from msgq
+by starting BIND 10 with the -v command line option.  If they indicate
+a permission problem for msgq, make sure the directory where the socket
+file is to be created is writable for the msgq process.  Note that if
+you specify the -u option to change process users, the directory must
+be writable for that user.
+
 % BIND10_INVALID_STATISTICS_DATA invalid specification of statistics data specified
 An error was encountered when the boss module specified
 statistics data which is invalid for the boss specification file.
@@ -94,11 +109,6 @@ and continue running as the specified user, but the user is unknown.
 The boss module was not able to start every process it needed to start
 during startup, and will now kill the processes that did get started.
 
-% BIND10_KILL_PROCESS killing process %1
-The boss module is sending a kill signal to process with the given name,
-as part of the process of killing all started processes during a failed
-startup, as described for BIND10_KILLING_ALL_PROCESSES
-
 % BIND10_LOST_SOCKET_CONSUMER consumer %1 of sockets disconnected, considering all its sockets closed
 A connection from one of the applications which requested a socket was
 closed. This means the application has terminated, so all the sockets it was
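
The BIND10_CONNECTING_TO_CC_FAIL description advises making sure that msgq can write to the directory that will hold its socket file. The Python sketch below shows one way to pre-check this. `socket_dir_problem` is a hypothetical helper and is not part of BIND 10. Note that `os.access` answers only for the user the checking process runs as. To verify a user given with -u, run the check after switching to that user.

```python
import os
import tempfile


def socket_dir_problem(socket_file):
    """Return None if this process could create socket_file, else a reason.

    Hypothetical helper: os.access() checks the real uid/gid of the
    current process, so run it as the user msgq will run as.
    """
    directory = os.path.dirname(os.path.abspath(socket_file))
    if not os.path.isdir(directory):
        return "directory %s does not exist" % directory
    # Creating a directory entry needs write and search (execute) permission
    if not os.access(directory, os.W_OK | os.X_OK):
        return "directory %s is not writable for uid %d" % (directory, os.getuid())
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # A freshly created private temp directory is writable by its owner
        print(socket_dir_problem(os.path.join(d, "msgq_socket")))  # None
```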

+ 17 - 8
src/bin/bind10/bind10_src.py.in

@@ -331,11 +331,7 @@ class BoB:
             each one.  It then clears that list.
         """
         logger.info(BIND10_KILLING_ALL_PROCESSES)
-
-        for pid in self.components:
-            logger.info(BIND10_KILL_PROCESS, self.components[pid].name())
-            self.components[pid].kill(True)
-        self.components = {}
+        self.__kill_children(True)
 
     def _read_bind10_config(self):
         """
@@ -427,6 +423,7 @@ class BoB:
         while self.cc_session is None:
             # if we have been trying for "a while" give up
             if (time.time() - cc_connect_start) > 5:
+                logger.error(BIND10_CONNECTING_TO_CC_FAIL)
                 raise CChannelConnectError("Unable to connect to c-channel after 5 seconds")
 
             # try to connect, and if we can't wait a short while
@@ -1145,6 +1142,21 @@ def main():
 
     options = parse_args()
 
+    # Announce startup.  Make this the first log message.
+    try:
+        logger.info(BIND10_STARTING, VERSION)
+    except RuntimeError as e:
+        sys.stderr.write('ERROR: failed to write the initial log: %s\n' %
+                         str(e))
+        sys.stderr.write("""\
+TIP: if this is about a permission error for a lock file, check if the directory
+of the file is writable for the user of the bind10 process; often you need
+to start bind10 as a super user.  Also, if you specify the -u option to
+change the user and group, the directory must be writable for the group,
+and the created lock file must be writable for that user.
+""")
+        sys.exit(1)
+
     # Check user ID.
     setuid = None
     setgid = None
@@ -1177,9 +1189,6 @@ def main():
             logger.fatal(BIND10_INVALID_USER, options.user)
             sys.exit(1)
 
-    # Announce startup.
-    logger.info(BIND10_STARTING, VERSION)
-
     # Create wakeup pipe for signal handlers
     wakeup_pipe = os.pipe()
     signal.set_wakeup_fd(wakeup_pipe[1])
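
The connection loop in `BoB` gives up after five seconds, and this commit makes it log BIND10_CONNECTING_TO_CC_FAIL before raising. The Python sketch below shows the general shape of such a bounded retry loop. `connect_with_deadline` and its `connect` callable are hypothetical stand-ins for the real cc_session setup, and the log text is copied from the new message.

```python
import logging
import time

logger = logging.getLogger("bind10-sketch")


class CChannelConnectError(Exception):
    """Stand-in for the exception raised by the boss process."""


def connect_with_deadline(connect, timeout=5.0, interval=0.1):
    """Call connect() until it returns a session or `timeout` seconds pass.

    `connect` is a hypothetical callable that returns a session object, or
    raises OSError while the message queue is not yet accepting connections.
    """
    start = time.monotonic()
    while True:
        try:
            return connect()
        except OSError:
            pass  # msgq may still be starting; retry below
        if time.monotonic() - start > timeout:
            # Log before raising so the hint reaches the log even if the
            # exception is caught and handled higher up
            logger.error("failed to connect to configuration/command channel; "
                         "try -v to see output from msgq")
            raise CChannelConnectError(
                "Unable to connect to c-channel after %g seconds" % timeout)
        time.sleep(interval)
```

The sketch uses `time.monotonic()` rather than the `time.time()` call in the diff. With a monotonic clock, a wall-clock adjustment during startup cannot stretch or shorten the timeout.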

+ 6 - 3
src/bin/msgq/msgq.py.in

@@ -178,6 +178,8 @@ class MsgQ:
             if os.path.exists(self.socket_file):
                 os.remove(self.socket_file)
             self.listen_socket.close()
+            sys.stderr.write("[b10-msgq] failed to setup listener on %s: %s\n"
+                             % (self.socket_file, str(e)))
             raise e
 
         if self.poller:
@@ -543,9 +545,10 @@ if __name__ == "__main__":
 
     msgq = MsgQ(options.msgq_socket_file, options.verbose)
 
-    setup_result = msgq.setup()
-    if setup_result:
-        sys.stderr.write("[b10-msgq] Error on startup: %s\n" % setup_result)
+    try:
+        msgq.setup()
+    except Exception as e:
+        sys.stderr.write("[b10-msgq] Error on startup: %s\n" % str(e))
         sys.exit(1)
 
     try:

+ 30 - 10
src/lib/python/isc/bind10/sockcreator.py

@@ -16,6 +16,7 @@
 import socket
 import struct
 import os
+import errno
 import copy
 import subprocess
 import copy
@@ -36,16 +37,16 @@ class CreatorError(Exception):
     passed to the __init__ function.
     """
 
-    def __init__(self, message, fatal, errno=None):
+    def __init__(self, message, fatal, error_num=None):
         """
         Creates the exception. The message argument is the usual string.
         The fatal one tells if the error is fatal (eg. the creator crashed)
-        and errno is the errno value returned from socket creator, if
+        and error_num is the errno value returned from socket creator, if
         applicable.
         """
         Exception.__init__(self, message)
         self.fatal = fatal
-        self.errno = errno
+        self.errno = error_num
 
 class Parser:
     """
@@ -94,6 +95,13 @@ class Parser:
             self.__socket = None
             raise CreatorError(str(se), True)
 
+    def __addrport_str(self, address, port):
+        '''Convert an IP address and port pair to a common form for logging.'''
+        if address.family == socket.AF_INET:
+            return str(address) + ':' + str(port)
+        else:
+            return '[' + str(address) + ']:' + str(port)
+
     def get_socket(self, address, port, socktype):
         """
         Asks the socket creator process to create a socket. Pass an address
@@ -136,9 +144,9 @@ class Parser:
             elif answer == b'E':
                 # There was an error, read the error as well
                 error = self.__socket.recv(1)
-                errno = struct.unpack('i',
-                                      self.__read_all(len(struct.pack('i',
-                                                                      0))))
+                rcv_errno = struct.unpack('i',
+                                          self.__read_all(len(struct.pack('i',
+                                                                          0))))
                 if error == b'S':
                     cause = 'socket'
                 elif error == b'B':
@@ -147,10 +155,22 @@ class Parser:
                     self.__socket = None
                     logger.fatal(BIND10_SOCKCREATOR_BAD_CAUSE, error)
                     raise CreatorError('Unknown error cause' + str(answer), True)
-                logger.error(BIND10_SOCKET_ERROR, cause, errno[0],
-                             os.strerror(errno[0]))
-                raise CreatorError('Error creating socket on ' + cause, False,
-                                   errno[0])
+                logger.error(BIND10_SOCKET_ERROR, cause, rcv_errno[0],
+                             os.strerror(rcv_errno[0]))
+
+                # Provide as detailed information as possible on the error,
+                # as errors related to socket creation are a common operational
+                # problem.  In particular, we are intentionally very verbose
+                # if it fails due to "permission denied" so the administrator
+                # can easily identify what is wrong and how to fix it.
+                addrport = self.__addrport_str(address, port)
+                error_text = 'Error creating socket on ' + cause + \
+                    ' to be bound to ' + addrport + ': ' + \
+                    os.strerror(rcv_errno[0])
+                if rcv_errno[0] == errno.EACCES:
+                    error_text += ' - probably need to restart BIND 10 ' + \
+                        'as a super user'
+                raise CreatorError(error_text, False, rcv_errno[0])
             else:
                 self.__socket = None
                 logger.fatal(BIND10_SOCKCREATOR_BAD_RESPONSE, answer)
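
The new error path in `Parser.get_socket` unpacks the errno sent by the socket creator and formats the address, putting IPv6 addresses in brackets. For EACCES it also appends a hint. The Python sketch below reproduces that logic in isolation. It uses the standard `ipaddress` module, checking `.version`, in place of the address object with a `.family` attribute used in the diff. The function names are hypothetical.

```python
import errno
import ipaddress
import os
import struct


def addrport_str(address, port):
    """Format an address/port pair for logs, bracketing IPv6 addresses."""
    if address.version == 4:
        return "%s:%d" % (address, port)
    return "[%s]:%d" % (address, port)


def creator_error_text(cause, payload, address, port):
    """Build the error text from the errno payload sent by the socket creator.

    `payload` is assumed to be a native C int packed with struct format 'i',
    matching the struct.unpack('i', ...) call in the diff.
    """
    (err,) = struct.unpack("i", payload)
    text = "Error creating socket on %s to be bound to %s: %s" % (
        cause, addrport_str(address, port), os.strerror(err))
    if err == errno.EACCES:
        # Permission problems are the common case, so spell out the fix
        text += " - probably need to restart BIND 10 as a super user"
    return text


if __name__ == "__main__":
    addr = ipaddress.ip_address("2001:db8::1")
    print(creator_error_text("socket", struct.pack("i", errno.EACCES), addr, 53))
```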

+ 4 - 2
src/lib/util/interprocess_sync_file.cc

@@ -15,6 +15,8 @@
 #include "interprocess_sync_file.h"
 
 #include <string>
+#include <cerrno>
+#include <cstring>
 
 #include <stdlib.h>
 #include <string.h>
@@ -68,8 +70,8 @@ InterprocessSyncFile::do_lock(int cmd, short l_type) {
 
         if (fd_ == -1) {
             isc_throw(InterprocessSyncFileError,
-                      "Unable to use interprocess sync lockfile: " +
-                      lockfile_path);
+                      "Unable to use interprocess sync lockfile ("
+                      << std::strerror(errno) << "): " << lockfile_path);
         }
     }
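
The C++ change adds `strerror(errno)` to the lock-file error message, so a cause such as a permission problem appears next to the path. A rough Python analogue follows, for illustration only. `open_and_lock`, the Python `InterprocessSyncFileError` class, and the 0o660 file mode are assumptions and are not taken from BIND 10.

```python
import errno
import fcntl
import os
import tempfile


class InterprocessSyncFileError(Exception):
    """Hypothetical Python counterpart of the C++ exception."""


def open_and_lock(lockfile_path):
    """Open (creating if needed) and exclusively lock lockfile_path.

    Like the C++ change, the error message includes strerror(errno) so
    a cause such as "Permission denied" is visible next to the path.
    """
    try:
        # 0o660 is an assumed mode for this sketch
        fd = os.open(lockfile_path, os.O_CREAT | os.O_RDWR, 0o660)
    except OSError as e:
        raise InterprocessSyncFileError(
            "Unable to use interprocess sync lockfile (%s): %s"
            % (os.strerror(e.errno), lockfile_path)) from e
    fcntl.lockf(fd, fcntl.LOCK_EX)  # blocks until the lock is available
    return fd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        try:
            open_and_lock(os.path.join(d, "missing", "sync.lock"))
        except InterprocessSyncFileError as e:
            # The original OSError is chained as __cause__
            print(e.__cause__.errno == errno.ENOENT, e)
```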