Symptom
An error is thrown when running a Spark Streaming application:
15/07/09 11:26:55 INFO scheduler.JobGenerator: Stopping JobGenerator immediately
15/07/09 11:26:55 INFO util.RecurringTimer: Stopped timer for JobGenerator after time -1
15/07/09 11:26:55 INFO streaming.CheckpointWriter: CheckpointWriter executor terminated ? true, waited for 0 ms.
15/07/09 11:26:55 INFO scheduler.JobGenerator: Stopped JobGenerator
15/07/09 11:26:55 INFO scheduler.JobScheduler: Stopped JobScheduler
Exception in thread "main" org.apache.spark.SparkException: org.apache.spark.streaming.dstream.MappedDStream@5a69b104 has not been initialized
    at org.apache.spark.streaming.dstream.DStream.isTimeValid(DStream.scala:321)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:342)
    at scala.Option.orElse(Option.scala:257)
    at org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:339)
    at org.apache.spark.streaming.dstream.ForEachDStream.generateJob(ForEachDStream.scala:38)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at org.apache.spark.streaming.DStreamGraph$$anonfun$1.apply(DStreamGraph.scala:120)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
    at org.apache.spark.streaming.DStreamGraph.generateJobs(DStreamGraph.scala:120)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:227)
    at org.apache.spark.streaming.scheduler.JobGenerator$$anonfun$restart$4.apply(JobGenerator.scala:222)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at org.apache.spark.streaming.scheduler.JobGenerator.restart(JobGenerator.scala:222)
    at org.apache.spark.streaming.scheduler.JobGenerator.start(JobGenerator.scala:92)
    at org.apache.spark.streaming.scheduler.JobScheduler.start(JobScheduler.scala:73)
    at org.apache.spark.streaming.StreamingContext.liftedTree1$1(StreamingContext.scala:588)
    at org.apache.spark.streaming.StreamingContext.start(StreamingContext.scala:586)
    at DubaNewsMinisiteClick$.main(DubaNewsMinisiteClick.scala:171)
    at DubaNewsMinisiteClick.main(DubaNewsMinisiteClick.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Cause and Solution
- Cause: the checkpoint directory passed to StreamingContext.checkpoint(...) contains checkpoint files written by another application. When StreamingContext.getOrCreate tries to recover from that checkpoint, the load fails and the StreamingContext cannot be initialized.
- Solution: delete the checkpoint directory, or point the application at a different one. For a recompiled application, the checkpoint directory must not contain checkpoint files generated by any other application (see the sketch below).
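A minimal sketch of the getOrCreate pattern involved here, under the assumption of a hypothetical per-application checkpoint path (`hdfs:///checkpoints/checkpoint-example`) and a placeholder socket source; the object name, path, and source are illustrative, not taken from the failing job. Giving each application its own checkpoint directory is one way to keep it from recovering a checkpoint written by a different (or incompatibly recompiled) job:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointExample {
  // Hypothetical checkpoint directory, used only by this application.
  val checkpointDir = "hdfs:///checkpoints/checkpoint-example"

  // Builds a fresh StreamingContext; getOrCreate calls this only when no
  // usable checkpoint exists under checkpointDir.
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("CheckpointExample")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Placeholder source and output, just to give the DStream graph a job.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.map(_.length).print()

    ssc
  }

  def main(args: Array[String]): Unit = {
    // If checkpointDir holds a checkpoint from a different application (or an
    // incompatible recompiled build of this one), the recovered DStream graph
    // will not match this code and start() fails with
    // "... has not been initialized", as in the stack trace above.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the stale checkpoint is simply left over from an earlier run, removing the directory before resubmitting (for example with `hdfs dfs -rm -r` on the checkpoint path) should also clear the error, at the cost of losing the saved streaming state.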