A Brief Look at the Source Code of DataFrame registerTempTable in SparkSQL

zhao_rock

Blog category: Spark

Big data, real-time computing, SparkSQL

dataFrame.registerTempTable(tableName);
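While using SparkSQL recently, I wondered whether registering 10,000 rows as a temporary table would differ much in efficiency from registering 100 million rows. I was also curious about what registering a DataFrame as a temporary table actually does, so I read the relevant parts of the source code and am recording my notes here.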

A temporary table's lifetime is tied to the SQLContext that created the DataFrame: when that SQLContext's lifetime ends, the temporary table's lifetime ends with it.

Relevant source from DataFrame.scala
/**
   * Registers this [[DataFrame]] as a temporary table using the given name. The lifetime of this
   * temporary table is tied to the [[SQLContext]] that was used to create this DataFrame.
   *
   * @group basic
   * @since 1.3.0
   */
def registerTempTable(tableName: String): Unit = {
    sqlContext.registerDataFrameAsTable(this, tableName)
}

registerTempTable in DataFrame calls registerDataFrameAsTable in SQLContext.
SQLContext in turn uses the SimpleCatalog class, which implements the registerTable method of the Catalog interface.

Relevant source from SQLContext.scala
@transient
protected[sql] lazy val catalog: Catalog = new SimpleCatalog(conf)
/**
   * Registers the given [[DataFrame]] as a temporary table in the catalog. Temporary tables exist
   * only during the lifetime of this instance of SQLContext.
   */
private[sql] def registerDataFrameAsTable(df: DataFrame, tableName: String): Unit = {
    catalog.registerTable(Seq(tableName), df.logicalPlan)
}
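
The call chain above can be sketched in plain Scala without Spark. This is only a minimal model: MiniDataFrame, MiniSQLContext, MiniCatalog and Plan are hypothetical stand-ins invented for illustration, not Spark classes. It shows that the table entry lives in the catalog owned by one context instance, so another context does not see it.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for Spark's classes; none of these are Spark APIs.
object DelegationSketch {
  // Stand-in for LogicalPlan: a description of a query, holding no data.
  final case class Plan(description: String)

  // Stand-in for SimpleCatalog: a mutable map from table name to plan.
  final class MiniCatalog {
    private val tables = mutable.HashMap.empty[String, Plan]
    def registerTable(name: String, plan: Plan): Unit = tables(name) = plan
    def lookup(name: String): Option[Plan] = tables.get(name)
  }

  // Stand-in for SQLContext: each instance owns its own catalog.
  final class MiniSQLContext {
    private val catalog = new MiniCatalog
    def registerDataFrameAsTable(df: MiniDataFrame, name: String): Unit =
      catalog.registerTable(name, df.plan)
    def lookup(name: String): Option[Plan] = catalog.lookup(name)
  }

  // Stand-in for DataFrame: holds its plan and the context that created it.
  final class MiniDataFrame(val ctx: MiniSQLContext, val plan: Plan) {
    def registerTempTable(name: String): Unit =
      ctx.registerDataFrameAsTable(this, name)
  }
}
```

In this model, registering a MiniDataFrame created from one MiniSQLContext makes the name resolvable through that context only. Once the context object is garbage, its catalog and every entry in it go with it.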

SimpleCatalog defines a Map named tables. registerTable stores the logicalPlan in it as the value, keyed by the table name derived from tableIdentifier.

Relevant source from Catalog.scala
val tables = new mutable.HashMap[String, LogicalPlan]()
override def registerTable(
      tableIdentifier: Seq[String],
      plan: LogicalPlan): Unit = {
    val tableIdent = processTableIdentifier(tableIdentifier)
    tables += ((getDbTableName(tableIdent), plan))
}

protected def processTableIdentifier(tableIdentifier: Seq[String]): Seq[String] = {
    if (conf.caseSensitiveAnalysis) {
      tableIdentifier
    } else {
      tableIdentifier.map(_.toLowerCase)
    }
}

protected def getDbTableName(tableIdent: Seq[String]): String = {
    val size = tableIdent.size
    if (size <= 2) {
      tableIdent.mkString(".")
    } else {
      tableIdent.slice(size - 2, size).mkString(".")
    }
}
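
To make the key derivation concrete, here is a small standalone sketch of the two helper methods. It is an approximation for illustration: the caseSensitive parameter is an assumed stand-in for conf.caseSensitiveAnalysis, the key helper is hypothetical, and takeRight(2) is used as an equivalent shorthand for the size check and slice in the original.

```scala
object TableKeySketch {
  // Mirrors processTableIdentifier; caseSensitive stands in for
  // conf.caseSensitiveAnalysis (an assumption of this sketch).
  def processTableIdentifier(id: Seq[String], caseSensitive: Boolean): Seq[String] =
    if (caseSensitive) id else id.map(_.toLowerCase)

  // Mirrors getDbTableName: keep at most the last two parts, joined with ".".
  def getDbTableName(id: Seq[String]): String =
    id.takeRight(2).mkString(".")

  // Hypothetical helper: the map key that registerTable would end up using.
  def key(id: Seq[String], caseSensitive: Boolean = false): String =
    getDbTableName(processTableIdentifier(id, caseSensitive))
}
```

Since registerDataFrameAsTable always passes Seq(tableName), the key for a temporary table is just the table name, lowercased when analysis is case-insensitive. One consequence is that names differing only in case map to the same key in that mode, so a later registration replaces the earlier one.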

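From the code above, registerTempTable ultimately just puts the table name (or table identifier) and the corresponding logical plan into a Map, and the entry disappears when the SQLContext does. Because only the logical plan is stored, the registration step itself does not touch any rows, so its cost does not depend on whether the DataFrame holds 10,000 rows or 100 million.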
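
The point about row count can be illustrated with a plain Scala sketch. The Plan class here is a hypothetical stand-in for a logical plan, not Spark's LogicalPlan: it only describes how rows would be produced and counts how often they are actually computed. Registering it is a single map insert, regardless of how many rows it describes.

```scala
import scala.collection.mutable

object LazyRegistrationSketch {
  // Hypothetical stand-in for a logical plan: describes how to produce rows
  // and counts how many times execution was actually requested.
  final class Plan(val rowCount: Long) {
    var executions = 0
    def execute(): Iterator[Long] = {
      executions += 1
      // Lazy iterator: rows are generated only as they are consumed.
      Iterator.iterate(0L)(_ + 1).takeWhile(_ < rowCount)
    }
  }

  // Stand-in for the tables map in SimpleCatalog.
  val tables = mutable.HashMap.empty[String, Plan]

  // Registration stores a reference to the plan and nothing else.
  def registerTable(name: String, plan: Plan): Unit = tables(name) = plan
}
```

In this model, registering a plan for 10,000 rows and one for 100 million rows both leave executions at 0. Work proportional to the data only happens later, when something runs the plan.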


2015-10-09 13:56