def foreach(f: T => Unit): Unit = withScope {
  val cleanF = sc.clean(f)
  sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
}

def foreachPartition(f: Iterator[T] => Unit): Unit = withScope {
  val cleanF = sc.clean(f)
  sc.runJob(this, (iter: Iterator[T]) => cleanF(iter))
}
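The practical difference between the two methods above is how often the supplied function is invoked. The following is a minimal sketch in plain Scala, not Spark itself: the nested `Seq` is a hypothetical stand-in for an RDD's partitions, and `ForeachSketch` and `setupCounts` are illustrative names. It counts how many times a setup step would run if it were placed inside the function passed to each method.

```scala
object ForeachSketch {
  // Hypothetical stand-in for an RDD: each inner Seq plays the role of one partition.
  val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3), Seq(4, 5))

  // Mirrors foreach: f is applied to every element of every partition.
  def foreachElement(f: Int => Unit): Unit =
    partitions.foreach(part => part.iterator.foreach(f))

  // Mirrors foreachPartition: f receives each partition's iterator exactly once.
  def foreachPartition(f: Iterator[Int] => Unit): Unit =
    partitions.foreach(part => f(part.iterator))

  // Counts how often code placed at the top of f runs under each style.
  def setupCounts(): (Int, Int) = {
    var perElement = 0
    var perPartition = 0
    foreachElement { _ => perElement += 1 }
    foreachPartition { iter =>
      perPartition += 1       // runs once per partition
      iter.foreach(_ => ())   // then consumes the partition's elements
    }
    (perElement, perPartition)
  }
}
```

With five elements spread over two partitions, the first counter ends at 5 and the second at 2, which is why per-partition setup (for example, opening one connection per partition) is usually placed in `foreachPartition`.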

Sep 2 BigData Spark, Performance Optimization Comments Word Count: 851(words) Read Count: 3(minutes)

Differences Between Kafka and Flume

Aug 28 DevOps Flume, Kafka Comments Word Count: 535(words) Read Count: 1(minutes)

An Example of Exception Handling in Scala

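When reading the files on line 11 of the function below (the `sc.textFile` call), the data files arrive irregularly, so a log file may be missing for some dates in the requested range. The exception needs to be handled here: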

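import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// DateUtils is the author's own helper; it is assumed to return dates formatted as yyyyMMdd.
def readLog(sc: SparkContext, startDate: String, endDate: String, logNames: List[String]): RDD[String] = {
  val dateLst = DateUtils.getDateListBetweenTwoDate(startDate, endDate)

  var logRdd = sc.makeRDD(List[String]())
  for (date <- dateLst) {
    val year = date.substring(0, 4)
    val month = date.substring(4, 6)
    val day = date.substring(6, 8)
    for (logName <- logNames) {
      logRdd = logRdd.union( // reassign the outer var; a new `val logRdd` here would not compile
        try { sc.textFile(s"cosn://fuge/mid-data/fuge/ssp/bid-log/$year/$month/$day/${logName}*")
          .map(x => x.split("\\|", -1))
          // 0 and 6 are the status codes for successful requests
          .filter(x => x.length >= 2 && (x(1).trim == "6" || x(1).trim == "0"))
          .map(_.mkString("|")) // rejoin the fields; toString on an Array would not give the line back
        } catch {
          case _: Exception => sc.makeRDD(List[String]()) // fall back to an empty RDD when the read fails
        }
      )
    }
  }
  logRdd
}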
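The same fallback idea can be sketched without Spark using `scala.util.Try`. This is only an illustration under stated assumptions: `readLines` is a hypothetical local-file stand-in for `sc.textFile`, and `ReadLogSketch`, `readLinesOrEmpty`, `keepSuccessful`, and `readAll` are names invented for this example. Unlike this eager sketch, `sc.textFile` is lazy, so in Spark a missing path may only surface when an action runs rather than inside the `try` block.

```scala
import scala.io.Source
import scala.util.Try

object ReadLogSketch {
  // Hypothetical stand-in for sc.textFile: eagerly reads all lines of a local file.
  def readLines(path: String): List[String] = {
    val source = Source.fromFile(path)
    try source.getLines().toList
    finally source.close()
  }

  // Falls back to an empty list when the file is missing or unreadable.
  def readLinesOrEmpty(path: String): List[String] =
    Try(readLines(path)).getOrElse(List.empty[String])

  // Keeps lines with at least two '|'-separated fields whose second field is "0" or "6".
  def keepSuccessful(lines: List[String]): List[String] =
    lines.filter { line =>
      val fields = line.split("\\|", -1)
      fields.length >= 2 && (fields(1).trim == "6" || fields(1).trim == "0")
    }

  // Reads every path, skipping the ones that cannot be read.
  def readAll(paths: List[String]): List[String] =
    paths.flatMap(p => keepSuccessful(readLinesOrEmpty(p)))
}
```

Filtering on the original line, instead of mapping to an array and back, also avoids having to rejoin the fields afterwards.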

Aug 21 Coding Exception, Scala Comments Word Count: 312(words) Read Count: 1(minutes)

Redcore Browser (红芯浏览器) Download

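Redcore Browser, which claimed to have broken the American monopoly by independently developing a Chinese browser kernel, has been accused of using Google's Chrome kernel, and a two-year-old version of it at that (for a detailed analysis, see: "The Pride of Chinese Browsers, With 250 Million in Funding, Turns Out to Be Google Chrome in a New Skin?").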

Aug 16 Opinions Shame, Redcore Browser Comments Word Count: 407(words) Read Count: 1(minutes)

GitLab Issue Roundup

supervise_redis_sleep hangs for a long time

Solution:

1. Press CTRL+C to force the process to stop;

2. Run: sudo systemctl restart gitlab-runsvdir;

3. Run again: sudo gitlab-ctl reconfigure

Aug 8 Issues Gitlab, Issue Comments Word Count: 228(words) Read Count: 1(minutes)

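Using Selenium + Chrome on CentOS Without a GUI

The tracking code on a client's website was recently wiped out twice in a row during site updates. As a result, we could not collect the site's visit data, which affected the downstream big-data analysis.

To address this, we decided to use the Python Selenium module to simulate clicks on the site's buttons while checking whether our backend receives the corresponding events. This tells us whether the button tracking code is deployed correctly.

Selenium is easy to use and powerful, simple to develop with and deploy, and an excellent tool for automated testing. The catch is that we need to deploy it on a server without a GUI, which raises the question of how to install a browser there. I chose Chrome.

Below is a brief account of a pitfall encountered during deployment, which also serves as a summary.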

Aug 7 Python Chromium, Selenium Comments Word Count: 806(words) Read Count: 3(minutes)

V

V’s speech is recognized by the analysts at Smith Change the World Incorporated as one of the most influential speeches of the near future.

Jul 24 Voice V Comments Word Count: 1k(words) Read Count: 6(minutes)

A Brief Introduction to Using the Balancer in a Hadoop Cluster

May 3 BigData BigData, Hadoop Comments Word Count: 436(words) Read Count: 1(minutes)