  1. PuppetDB
  2. PDB-159

Corrupt KahaDB db.data file may cause exception upon receiving any command

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Normal
    • Resolution: Won't Fix
    • Affects Version/s: PDB 1.5.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
    • Template:
    • Story Points:
      13

      Description

      If the KahaDB db.data file is corrupt, you may receive an exception such as:

      <pre>
      2013-02-13 12:06:10,492 INFO [clojure-agent-send-off-pool-2] [server.AbstractConnector] Started SslSelectChannelConnector@puppetdb1.vm:8081
      2013-02-13 12:06:25,078 ERROR [ConcurrentQueueStoreAndDispatch] [kahadb.MessageDatabase] KahaDB failed to store to Journal
      java.io.EOFException
      at java.io.RandomAccessFile.readFully(RandomAccessFile.java:416)
      at java.io.RandomAccessFile.readFully(RandomAccessFile.java:394)
      at org.apache.activemq.store.kahadb.disk.page.PageFile.readPage(PageFile.java:876)
      at org.apache.activemq.store.kahadb.disk.page.Transaction$2.readPage(Transaction.java:456)
      at org.apache.activemq.store.kahadb.disk.page.Transaction$2.<init>(Transaction.java:447)
      at org.apache.activemq.store.kahadb.disk.page.Transaction.openInputStream(Transaction.java:444)
      at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:420)
      at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:377)
      at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.loadNode(BTreeIndex.java:262)
      at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.getRoot(BTreeIndex.java:174)
      at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.put(BTreeIndex.java:189)
      at org.apache.activemq.store.kahadb.MessageDatabase.upadateIndex(MessageDatabase.java:1240)
      at org.apache.activemq.store.kahadb.MessageDatabase$14.execute(MessageDatabase.java:1066)
      at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)
      at org.apache.activemq.store.kahadb.MessageDatabase.process(MessageDatabase.java:1063)
      at org.apache.activemq.store.kahadb.MessageDatabase$13.visit(MessageDatabase.java:1010)
      at org.apache.activemq.store.kahadb.data.KahaAddMessageCommand.visit(KahaAddMessageCommand.java:241)
      at org.apache.activemq.store.kahadb.MessageDatabase.process(MessageDatabase.java:1007)
      at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:918)
      at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:900)
      at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.addMessage(KahaDBStore.java:432)
      at org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask.run(KahaDBStore.java:1191)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      at java.lang.Thread.run(Thread.java:722)
      2013-02-13 12:06:25,102 WARN [qtp326004052-37] [server.AbstractHttpConnection] /v2/commands
      org.springframework.jms.UncategorizedJmsException: Uncategorized exception occured during JMS processing; nested exception is javax.jms.JMSException: java.io.EOFException
      at org.springframework.jms.support.JmsUtils.convertJmsAccessException(JmsUtils.java:316)
      at org.springframework.jms.support.JmsAccessor.convertJmsAccessException(JmsAccessor.java:168)
      at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:469)
      at org.springframework.jms.core.JmsTemplate.send(JmsTemplate.java:543)
      at org.springframework.jms.core.JmsTemplate.convertAndSend(JmsTemplate.java:653)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
      at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
      at clamq.jms$jms_producer$reify__2947.publish(jms.clj:29)
      at clamq.jms$jms_producer$reify__2947.publish(jms.clj:30)
      at clamq.protocol.producer$eval2903$fn_2904$G2895_2912.invoke(producer.clj:3)
      at clamq.protocol.producer$eval2903$fn_2904$G2894_2921.invoke(producer.clj:3)
      at clojure.lang.AFn.applyToHelper(AFn.java:167)
      at clojure.lang.AFn.applyTo(AFn.java:151)
      at clojure.core$apply.invoke(core.clj:603)
      at com.puppetlabs.mq$connect_and_publish_BANG_.doInvoke(mq.clj:136)
      at clojure.lang.RestFn.invoke(RestFn.java:439)
      at com.puppetlabs.puppetdb.command$enqueue_raw_command_BANG_$fn__3563.invoke(command.clj:254)
      at com.puppetlabs.puppetdb.command$enqueue_raw_command_BANG_.invoke(command.clj:253)
      at com.puppetlabs.puppetdb.http.v1.command$enqueue_command.invoke(command.clj:22)
      at com.puppetlabs.middleware$verify_accepts_content_type$fn__4049.invoke(middleware.clj:67)
      at com.puppetlabs.middleware$verify_checksum$fn__4058.invoke(middleware.clj:102)
      at com.puppetlabs.middleware$verify_param_exists$fn__4053.invoke(middleware.clj:79)
      at com.puppetlabs.puppetdb.http.v1.command$command_app.invoke(command.clj:27)
      at com.puppetlabs.puppetdb.http.v2$v2_app$fn__6449.invoke(v2.clj:12)
      at net.cgrand.moustache$alter_request$fn__4259.invoke(moustache.clj:54)
      at com.puppetlabs.puppetdb.http.v2$v2_app.invoke(v2.clj:12)
      at com.puppetlabs.puppetdb.http.server$routes$fn__6851.invoke(server.clj:27)
      at net.cgrand.moustache$alter_request$fn__4259.invoke(moustache.clj:54)
      at com.puppetlabs.puppetdb.http.server$routes.invoke(server.clj:27)
      at ring.middleware.resource$wrap_resource$fn__6831.invoke(resource.clj:14)
      at ring.middleware.params$wrap_params$fn__4209.invoke(params.clj:55)
      at com.puppetlabs.middleware$wrap_with_authorization$fn__4035.invoke(middleware.clj:21)
      at com.puppetlabs.middleware$wrap_with_certificate_cn$fn__4039.invoke(middleware.clj:36)
      at com.puppetlabs.middleware$wrap_with_default_body$fn__4042.invoke(middleware.clj:43)
      at com.puppetlabs.middleware$wrap_with_metrics_STAR_$fn_4062$fn_4063.invoke(middleware.clj:119)
      at com.puppetlabs.middleware.proxy$java.lang.Object$Callable$f8c5758f.call(Unknown Source)
      at com.yammer.metrics.core.Timer.time(Timer.java:91)
      at com.puppetlabs.middleware$wrap_with_metrics_STAR_$fn__4062.invoke(middleware.clj:117)
      at com.puppetlabs.middleware$wrap_with_globals$fn__4045.invoke(middleware.clj:54)
      at ring.adapter.jetty$proxy_handler$fn__3880.invoke(jetty.clj:18)
      at ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$0.handle(Unknown Source)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
      at org.eclipse.jetty.server.Server.handle(Server.java:349)
      at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:452)
      at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:894)
      at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:948)
      at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
      at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
      at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
      at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:191)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
      at java.lang.Thread.run(Thread.java:722)
      Caused by: javax.jms.JMSException: java.io.EOFException
      at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:49)
      at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1391)
      at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1319)
      at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1798)
      at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:289)
      at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:224)
      at org.apache.activemq.ActiveMQMessageProducerSupport.send(ActiveMQMessageProducerSupport.java:269)
      at org.springframework.jms.connection.CachedMessageProducer.send(CachedMessageProducer.java:117)
      at org.springframework.jms.core.JmsTemplate.doSend(JmsTemplate.java:592)
      at org.springframework.jms.core.JmsTemplate.doSend(JmsTemplate.java:569)
      at org.springframework.jms.core.JmsTemplate$4.doInJms(JmsTemplate.java:546)
      at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:466)
      ... 56 more
      Caused by: java.util.concurrent.ExecutionException: java.io.EOFException
      at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
      at java.util.concurrent.FutureTask.get(FutureTask.java:111)
      at org.apache.activemq.broker.region.Queue.doMessageSend(Queue.java:799)
      at org.apache.activemq.broker.region.Queue.send(Queue.java:721)
      at org.apache.activemq.broker.region.AbstractRegion.send(AbstractRegion.java:406)
      at org.apache.activemq.broker.region.RegionBroker.send(RegionBroker.java:392)
      at org.apache.activemq.broker.jmx.ManagedRegionBroker.send(ManagedRegionBroker.java:282)
      at org.apache.activemq.broker.BrokerFilter.send(BrokerFilter.java:129)
      at org.apache.activemq.broker.scheduler.SchedulerBroker.send(SchedulerBroker.java:177)
      at org.apache.activemq.broker.BrokerFilter.send(BrokerFilter.java:129)
      at org.apache.activemq.broker.CompositeDestinationBroker.send(CompositeDestinationBroker.java:96)
      at org.apache.activemq.broker.TransactionBroker.send(TransactionBroker.java:317)
      at org.apache.activemq.broker.MutableBrokerFilter.send(MutableBrokerFilter.java:135)
      at org.apache.activemq.broker.TransportConnection.processMessage(TransportConnection.java:499)
      at org.apache.activemq.command.ActiveMQMessage.visit(ActiveMQMessage.java:749)
      at org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:329)
      at org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:184)
      at org.apache.activemq.transport.ResponseCorrelator.onCommand(ResponseCorrelator.java:116)
      at org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
      at org.apache.activemq.transport.vm.VMTransport.iterate(VMTransport.java:241)
      at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
      at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      ... 1 more
      Caused by: java.io.EOFException
      at java.io.RandomAccessFile.readFully(RandomAccessFile.java:416)
      at java.io.RandomAccessFile.readFully(RandomAccessFile.java:394)
      at org.apache.activemq.store.kahadb.disk.page.PageFile.readPage(PageFile.java:876)
      at org.apache.activemq.store.kahadb.disk.page.Transaction$2.readPage(Transaction.java:456)
      at org.apache.activemq.store.kahadb.disk.page.Transaction$2.<init>(Transaction.java:447)
      at org.apache.activemq.store.kahadb.disk.page.Transaction.openInputStream(Transaction.java:444)
      at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:420)
      at org.apache.activemq.store.kahadb.disk.page.Transaction.load(Transaction.java:377)
      at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.loadNode(BTreeIndex.java:262)
      at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.getRoot(BTreeIndex.java:174)
      at org.apache.activemq.store.kahadb.disk.index.BTreeIndex.put(BTreeIndex.java:189)
      at org.apache.activemq.store.kahadb.MessageDatabase.upadateIndex(MessageDatabase.java:1240)
      at org.apache.activemq.store.kahadb.MessageDatabase$14.execute(MessageDatabase.java:1066)
      at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779)
      at org.apache.activemq.store.kahadb.MessageDatabase.process(MessageDatabase.java:1063)
      at org.apache.activemq.store.kahadb.MessageDatabase$13.visit(MessageDatabase.java:1010)
      at org.apache.activemq.store.kahadb.data.KahaAddMessageCommand.visit(KahaAddMessageCommand.java:241)
      at org.apache.activemq.store.kahadb.MessageDatabase.process(MessageDatabase.java:1007)
      at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:918)
      at org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:900)
      at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.addMessage(KahaDBStore.java:432)
      at org.apache.activemq.store.kahadb.KahaDBStore$StoreQueueTask.run(KahaDBStore.java:1191)
      ... 3 more
      </pre>

      Subsequent commands submitted to PuppetDB then fail with the following errors, because the broker has, well, been broken:

      <pre>
      2013-02-13 12:15:53,687 INFO [command-proc-44] [listener.DefaultMessageListenerContainer] Successfully refreshed JMS Connection
      2013-02-13 12:16:56,839 WARN [qtp764430233-39] [server.AbstractHttpConnection] /v2/commands
      org.springframework.jms.UncategorizedJmsException: Uncategorized exception occured during JMS processing; nested exception is javax.jms.JMSException: Could not create Transport. Reason: java.io.IOException: Broker named 'localhost' does not exist.
      at org.springframework.jms.support.JmsUtils.convertJmsAccessException(JmsUtils.java:316)
      at org.springframework.jms.support.JmsAccessor.convertJmsAccessException(JmsAccessor.java:168)
      at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:469)
      at org.springframework.jms.core.JmsTemplate.send(JmsTemplate.java:543)
      at org.springframework.jms.core.JmsTemplate.convertAndSend(JmsTemplate.java:653)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:601)
      at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
      at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
      at clamq.jms$jms_producer$reify__2946.publish(jms.clj:29)
      at clamq.jms$jms_producer$reify__2946.publish(jms.clj:30)
      at clamq.protocol.producer$eval2902$fn_2903$G2894_2911.invoke(producer.clj:3)
      at clamq.protocol.producer$eval2902$fn_2903$G2893_2920.invoke(producer.clj:3)
      at clojure.lang.AFn.applyToHelper(AFn.java:167)
      at clojure.lang.AFn.applyTo(AFn.java:151)
      at clojure.core$apply.invoke(core.clj:603)
      at com.puppetlabs.mq$connect_and_publish_BANG_.doInvoke(mq.clj:136)
      at clojure.lang.RestFn.invoke(RestFn.java:439)
      at com.puppetlabs.puppetdb.command$enqueue_raw_command_BANG_$fn__3562.invoke(command.clj:254)
      at com.puppetlabs.puppetdb.command$enqueue_raw_command_BANG_.invoke(command.clj:253)
      at com.puppetlabs.puppetdb.http.v1.command$enqueue_command.invoke(command.clj:22)
      at com.puppetlabs.middleware$verify_accepts_content_type$fn__4048.invoke(middleware.clj:67)
      at com.puppetlabs.middleware$verify_checksum$fn__4057.invoke(middleware.clj:102)
      at com.puppetlabs.middleware$verify_param_exists$fn__4052.invoke(middleware.clj:79)
      at com.puppetlabs.puppetdb.http.v1.command$command_app.invoke(command.clj:27)
      at com.puppetlabs.puppetdb.http.v2$v2_app$fn__6448.invoke(v2.clj:12)
      at net.cgrand.moustache$alter_request$fn__4258.invoke(moustache.clj:54)
      at com.puppetlabs.puppetdb.http.v2$v2_app.invoke(v2.clj:12)
      at com.puppetlabs.puppetdb.http.server$routes$fn__6850.invoke(server.clj:27)
      at net.cgrand.moustache$alter_request$fn__4258.invoke(moustache.clj:54)
      at com.puppetlabs.puppetdb.http.server$routes.invoke(server.clj:27)
      at ring.middleware.resource$wrap_resource$fn__6830.invoke(resource.clj:14)
      at ring.middleware.params$wrap_params$fn__4208.invoke(params.clj:55)
      at com.puppetlabs.middleware$wrap_with_authorization$fn__4034.invoke(middleware.clj:21)
      at com.puppetlabs.middleware$wrap_with_certificate_cn$fn__4038.invoke(middleware.clj:36)
      at com.puppetlabs.middleware$wrap_with_default_body$fn__4041.invoke(middleware.clj:43)
      at com.puppetlabs.middleware$wrap_with_metrics_STAR_$fn_4061$fn_4062.invoke(middleware.clj:119)
      at com.puppetlabs.middleware.proxy$java.lang.Object$Callable$f8c5758f.call(Unknown Source)
      at com.yammer.metrics.core.Timer.time(Timer.java:91)
      at com.puppetlabs.middleware$wrap_with_metrics_STAR_$fn__4061.invoke(middleware.clj:117)
      at com.puppetlabs.middleware$wrap_with_globals$fn__4044.invoke(middleware.clj:54)
      at ring.adapter.jetty$proxy_handler$fn__3879.invoke(jetty.clj:18)
      at ring.adapter.jetty.proxy$org.eclipse.jetty.server.handler.AbstractHandler$0.handle(Unknown Source)
      at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
      at org.eclipse.jetty.server.Server.handle(Server.java:349)
      at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:452)
      at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:894)
      at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:948)
      at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
      at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
      at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
      at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:191)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
      at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
      at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
      at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
      at java.lang.Thread.run(Thread.java:722)
      Caused by: javax.jms.JMSException: Could not create Transport. Reason: java.io.IOException: Broker named 'localhost' does not exist.
      at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:35)
      at org.apache.activemq.ActiveMQConnectionFactory.createTransport(ActiveMQConnectionFactory.java:254)
      at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:267)
      at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:239)
      at org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:185)
      at org.springframework.jms.connection.SingleConnectionFactory.doCreateConnection(SingleConnectionFactory.java:342)
      at org.springframework.jms.connection.SingleConnectionFactory.initConnection(SingleConnectionFactory.java:288)
      at org.springframework.jms.connection.SingleConnectionFactory.createConnection(SingleConnectionFactory.java:225)
      at org.springframework.jms.support.JmsAccessor.createConnection(JmsAccessor.java:184)
      at org.springframework.jms.core.JmsTemplate.execute(JmsTemplate.java:456)
      ... 56 more
      Caused by: java.io.IOException: Broker named 'localhost' does not exist.
      at org.apache.activemq.transport.vm.VMTransportFactory.doCompositeConnect(VMTransportFactory.java:116)
      at org.apache.activemq.transport.vm.VMTransportFactory.doConnect(VMTransportFactory.java:54)
      at org.apache.activemq.transport.TransportFactory.doConnect(TransportFactory.java:51)
      at org.apache.activemq.transport.TransportFactory.connect(TransportFactory.java:80)
      at org.apache.activemq.ActiveMQConnectionFactory.createTransport(ActiveMQConnectionFactory.java:252)
      ... 64 more
      </pre>

      This entire effect can be easily replicated using:

      <pre>
      truncate -s 32700 db.data
      </pre>

      ... and then attempting to start PuppetDB.
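
      To see what the command does without touching a live queue, it can be pointed at a scratch copy first. This is a minimal sketch, assuming GNU coreutils `truncate` is installed; the function name is hypothetical, and only the 32700-byte size comes from the command above:

```shell
#!/bin/sh
# Hypothetical helper (not part of PuppetDB): resize a file to 32700 bytes,
# the size used in this report, and print the resulting size in bytes.
# Assumes GNU coreutils `truncate` is available.
simulate_corrupt_index() {
    index_file="$1"
    if [ ! -f "$index_file" ]; then
        echo "no such file: $index_file" >&2
        return 1
    fi
    truncate -s 32700 "$index_file" || return 1
    # wc -c reads the file via stdin, so only the byte count is printed.
    wc -c < "$index_file" | tr -d ' '
}
```

      For example, `simulate_corrupt_index /tmp/scratch/db.data` (a hypothetical path) leaves the scratch file at 32700 bytes.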

      In most circumstances the KahaDB implementation within ActiveMQ used by PuppetDB should attempt data recovery automatically, or at least report a meaningful error. This exception, however, is not caught properly by KahaDB, and the resulting error gives the PuppetDB user no clue as to what is going on.
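
      Since the error itself is not self-explanatory, one way to recognise this situation is to search the PuppetDB log for the two messages quoted in the excerpts above. The following is a sketch only; the function name is hypothetical, and the log location is left to the caller because it is not stated in this report:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given log file contains
# either of the two messages quoted in this report.
kahadb_corruption_suspected() {
    log_file="$1"
    # -F treats the patterns as fixed strings, -q suppresses output.
    grep -q -F \
        -e "KahaDB failed to store to Journal" \
        -e "Broker named 'localhost' does not exist" \
        "$log_file"
}
```

      It could be used as `kahadb_corruption_suspected "$log" && echo "possible KahaDB corruption"`, where `$log` is the path to your PuppetDB log. A match only suggests this issue; the same messages may have other causes.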

      *Note:* None of this corruption should affect your long-term storage of catalogues, facts, reports, etc. in PostgreSQL or HSQLDB. It is simply a loss of the working queue of commands submitted to PuppetDB (like `replace catalogue` and `replace facts`) that have not yet been processed. This queue usually contains only seconds or perhaps minutes of queued content.

      Workaround

      In this scenario you have a few options, but before you start, back up the `/var/lib/puppetdb/mq/localhost/KahaDB` directory. The `KahaDB` directory contains any in-flight commands that PuppetDB may not have processed yet. In some circumstances this data can be regenerated (by running puppet on your nodes, for example), but if you are concerned about in-flight data it is best to be conservative. Be aware that data corruption may mean unrecoverable loss of data; if that happens, re-run puppet (perhaps with --noop) on your nodes to repopulate PuppetDB.
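
      The backup can be as simple as a recursive copy taken while PuppetDB is stopped. A minimal sketch follows; the function name and the timestamped `.backup-` suffix are assumptions for illustration, and only the default path comes from the text above:

```shell
#!/bin/sh
# Hypothetical helper: copy the KahaDB directory to a timestamped sibling and
# print the backup location. Stop PuppetDB first so the files are not changing
# during the copy (how to stop it depends on your platform).
backup_kahadb() {
    kahadb_dir="${1:-/var/lib/puppetdb/mq/localhost/KahaDB}"
    backup_dir="${kahadb_dir}.backup-$(date +%Y%m%d%H%M%S)"
    if [ ! -d "$kahadb_dir" ]; then
        echo "not a directory: $kahadb_dir" >&2
        return 1
    fi
    if [ -e "$backup_dir" ]; then
        echo "backup target already exists: $backup_dir" >&2
        return 1
    fi
    # -R copies recursively; -p preserves modes, ownership and timestamps
    # where permitted.
    cp -Rp "$kahadb_dir" "$backup_dir" && echo "$backup_dir"
}
```

      Called with no argument, it copies the default directory named above; the original directory is left untouched.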

      Possible recovery options:

      • Ideally, if you have an uncorrupted copy of the main index (`db.data`), you can attempt to restore that file. Be aware that the copy must be recent: anything older than a few minutes before the corruption occurred will contain old data that has probably already been processed.
      • If you lack a copy of the original index (`db.data`), you can attempt to just delete it. Upon restarting PuppetDB, KahaDB should then attempt recovery from the corresponding journal files (`db-*.log`).
      • Move the `KahaDB` directory out of the way (for example to `KahaDB.old`), thus abandoning the old data, and restart PuppetDB. This is usually enough to make things work again.
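
      The second and third options can be sketched as shell helpers. These are illustrative only: the function names are hypothetical, and the checks for existing journal files and for a pre-existing `.old` directory are added precautions rather than anything this report requires. Run them only while PuppetDB is stopped and after taking a backup.

```shell
#!/bin/sh
# Hypothetical helpers for the recovery options; not part of PuppetDB.

# Delete the index so that KahaDB can attempt to rebuild it from db-*.log.
# Refuses to act if no journal files are present.
drop_kahadb_index() {
    kahadb_dir="$1"
    if ! ls "$kahadb_dir"/db-*.log >/dev/null 2>&1; then
        echo "no journal files in $kahadb_dir; nothing to rebuild from" >&2
        return 1
    fi
    rm -f "$kahadb_dir/db.data"
}

# Abandon the queued commands by moving the whole directory aside.
move_kahadb_aside() {
    kahadb_dir="$1"
    if [ -e "${kahadb_dir}.old" ]; then
        echo "${kahadb_dir}.old already exists" >&2
        return 1
    fi
    mv "$kahadb_dir" "${kahadb_dir}.old"
}
```

      After either helper succeeds, restart PuppetDB and watch the log to confirm that the exceptions shown earlier no longer appear.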

      If your exception differed slightly from the one in this bug report but the solution was the same, we are interested in both the exception and, most probably, the contents of your corrupt KahaDB directory so that we can replicate the issue. Feel free to comment on this bug or raise a new issue. If you wish to send us data privately, this can be arranged as well.

            People

            • Assignee:
              Unassigned
              Reporter:
              redmine.exporter redmine.exporter