当前位置: 首页 > news >正文

秦皇岛手机网站制作价格/软文推广网站

秦皇岛手机网站制作价格,软文推广网站,云空间网站怎么做,wordpress 经典主题Dataset是一个强类型的特定领域的对象,这种对象可以函数式或者关系操作并行地转换。每个Dataset也有一个被称为一个DataFrame的类型化视图,这种DataFrame是Row类型的Dataset,即Dataset[Row]  Dataset是“懒惰”的,只在执行行动操…

  Dataset是一个强类型的特定领域的对象,这种对象可以函数式或者关系操作并行地转换。每个Dataset也有一个被称为一个DataFrame的类型化视图,这种DataFrame是Row类型的Dataset,即Dataset[Row]
  Dataset是“懒惰”的,只在执行行动操作时触发计算。本质上,数据集表示一个逻辑计划,该计划描述了产生数据所需的计算。当执行行动操作时,Spark的查询优化程序优化逻辑计划,并生成一个高效的并行和分布式物理计划。

 

示例数据字段解释

// affairs:一年来婚外情的频率 
// gender:性别 
// age:年龄 
// yearsmarried:婚龄 
// children:是否有小孩 
// religiousness:宗教信仰程度(5分制,1分表示反对,5分表示非常信仰)
// education:学历
// occupation:职业(逆向编号的戈登7种分类) 
// rating:对婚姻的自我评分(5分制,1表示非常不幸福,5表示非常幸福)

 

1.导入常用的包

import scala.math._
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.Row
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.Column
import org.apache.spark.sql.DataFrameReader
import org.apache.spark.sql.functions._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.DataFrameStatFunctions

 2.创建SparkSession,并导入示例数据

val spark = SparkSession.builder().appName("Spark SQL basic example").config("spark.some.config.option", "some-value").getOrCreate()// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._val dataList: List[(Double, String, Double, Double, String, Double, Double, Double, Double)] = List((0, "male", 37, 10, "no", 3, 18, 7, 4),(0, "female", 27, 4, "no", 4, 14, 6, 4),(0, "female", 32, 15, "yes", 1, 12, 1, 4),(0, "male", 57, 15, "yes", 5, 18, 6, 5),(0, "male", 22, 0.75, "no", 2, 17, 6, 3),(0, "female", 32, 1.5, "no", 2, 17, 5, 5),(0, "female", 22, 0.75, "no", 2, 12, 1, 3),(0, "male", 57, 15, "yes", 2, 14, 4, 4),(0, "female", 32, 15, "yes", 4, 16, 1, 2),(0, "male", 22, 1.5, "no", 4, 14, 4, 5))val data = dataList.toDF("affairs", "gender", "age", "yearsmarried", "children", "religiousness", "education", "occupation", "rating")data.printSchema()
root|-- affairs: double (nullable = false)|-- gender: string (nullable = true)|-- age: double (nullable = false)|-- yearsmarried: double (nullable = false)|-- children: string (nullable = true)|-- religiousness: double (nullable = false)|-- education: double (nullable = false)|-- occupation: double (nullable = false)|-- rating: double (nullable = false)

 3.操作指定的列和行

// 在Spark-shell中展示,前n条记录
data.show(7)
+-------+------+----+------------+--------+-------------+---------+----------+------+
|affairs|gender| age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+----+------------+--------+-------------+---------+----------+------+
|    0.0|  male|37.0|        10.0|      no|          3.0|     18.0|       7.0|   4.0|
|    0.0|female|27.0|         4.0|      no|          4.0|     14.0|       6.0|   4.0|
|    0.0|female|32.0|        15.0|     yes|          1.0|     12.0|       1.0|   4.0|
|    0.0|  male|57.0|        15.0|     yes|          5.0|     18.0|       6.0|   5.0|
|    0.0|  male|22.0|        0.75|      no|          2.0|     17.0|       6.0|   3.0|
|    0.0|female|32.0|         1.5|      no|          2.0|     17.0|       5.0|   5.0|
|    0.0|female|22.0|        0.75|      no|          2.0|     12.0|       1.0|   3.0|
+-------+------+----+------------+--------+-------------+---------+----------+------+
only showing top 7 rows// 取前n条记录
val data3=data.limit(5)// 过滤
data.filter("age>50 and gender=='male' ").show
+-------+------+----+------------+--------+-------------+---------+----------+------+
|affairs|gender| age|yearsmarried|children|religiousness|education|occupation|rating|
+-------+------+----+------------+--------+-------------+---------+----------+------+
|    0.0|  male|57.0|        15.0|     yes|          5.0|     18.0|       6.0|   5.0|
|    0.0|  male|57.0|        15.0|     yes|          2.0|     14.0|       4.0|   4.0|
+-------+------+----+------------+--------+-------------+---------+----------+------+// 数据框的所有列val columnArray=data.columns
columnArray: Array[String] = Array(affairs, gender, age, yearsmarried, children, religiousness, education, occupation, rating)// 查询某些列的数据
data.select("gender", "age", "yearsmarried", "children").show(3)
+------+----+------------+--------+
|gender| age|yearsmarried|children|
+------+----+------------+--------+
|  male|37.0|        10.0|      no|
|female|27.0|         4.0|      no|
|female|32.0|        15.0|     yes|
+------+----+------------+--------+
only showing top 3 rowsval colArray=Array("gender", "age", "yearsmarried", "children")
colArray: Array[String] = Array(gender, age, yearsmarried, children)data.selectExpr(colArray:_*).show(3)
+------+----+------------+--------+
|gender| age|yearsmarried|children|
+------+----+------------+--------+
|  male|37.0|        10.0|      no|
|female|27.0|         4.0|      no|
|female|32.0|        15.0|     yes|
+------+----+------------+--------+
only showing top 3 rows// 操作指定的列,并排序
// data.selectExpr("gender", "age+1","cast(age as bigint)").orderBy($"gender".desc, $"age".asc).show
data.selectExpr("gender", "age+1 as age1","cast(age as bigint) as age2").sort($"gender".desc, $"age".asc).show
+------+----+----+
|gender|age1|age2|
+------+----+----+
|  male|23.0|  22|
|  male|23.0|  22|
|  male|38.0|  37|
|  male|58.0|  57|
|  male|58.0|  57|
|female|23.0|  22|
|female|28.0|  27|
|female|33.0|  32|
|female|33.0|  32|
|female|33.0|  32|
+------+----+----+

 4.查看SparkSQL逻辑和物理执行计划

val data4=data.selectExpr("gender", "age+1 as age1","cast(age as bigint) as age2").sort($"gender".desc, $"age".asc)
data4: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [gender: string, age1: double ... 1 more field]// 查看物理执行计划
data4.explain()
== Physical Plan ==
*Project [gender#20, age1#135, age2#136L]
+- *Sort [gender#20 DESC, age#21 ASC], true, 0+- Exchange rangepartitioning(gender#20 DESC, age#21 ASC, 200)+- LocalTableScan [gender#20, age1#135, age2#136L, age#21]// 查看逻辑和物理执行计划	  
data4.explain(extended=true)
== Parsed Logical Plan ==
'Sort ['gender DESC, 'age ASC], true
+- Project [gender#20, (age#21 + cast(1 as double)) AS age1#135, cast(age#21 as bigint) AS age2#136L]+- Project [_1#9 AS affairs#19, _2#10 AS gender#20, _3#11 AS age#21, _4#12 AS yearsmarried#22, _5#13 AS children#23, _6#14 AS religiousness#24, _7#15 AS education#25, _8#16 AS occupation#2
6, _9#17 AS rating#27]      +- LocalRelation [_1#9, _2#10, _3#11, _4#12, _5#13, _6#14, _7#15, _8#16, _9#17]== Analyzed Logical Plan ==
gender: string, age1: double, age2: bigint
Project [gender#20, age1#135, age2#136L]
+- Sort [gender#20 DESC, age#21 ASC], true+- Project [gender#20, (age#21 + cast(1 as double)) AS age1#135, cast(age#21 as bigint) AS age2#136L, age#21]+- Project [_1#9 AS affairs#19, _2#10 AS gender#20, _3#11 AS age#21, _4#12 AS yearsmarried#22, _5#13 AS children#23, _6#14 AS religiousness#24, _7#15 AS education#25, _8#16 AS occupatio
n#26, _9#17 AS rating#27]         +- LocalRelation [_1#9, _2#10, _3#11, _4#12, _5#13, _6#14, _7#15, _8#16, _9#17]== Optimized Logical Plan ==
Project [gender#20, age1#135, age2#136L]
+- Sort [gender#20 DESC, age#21 ASC], true+- LocalRelation [gender#20, age1#135, age2#136L, age#21]== Physical Plan ==
*Project [gender#20, age1#135, age2#136L]
+- *Sort [gender#20 DESC, age#21 ASC], true, 0+- Exchange rangepartitioning(gender#20 DESC, age#21 ASC, 200)+- LocalTableScan [gender#20, age1#135, age2#136L, age#21]

 

转载于:https://www.cnblogs.com/wwxbi/p/6101460.html

http://www.lbrq.cn/news/1368271.html

相关文章:

  • 如何在百度上做网站/南昌seo专业团队
  • 利用电脑做网站/百度怎么推广广告
  • 青海住房和建设厅网站/百度推广渠道代理
  • 二维码怎么制作出来的/网站做优化好还是推广好
  • facebook做网站/矿坛器材友情交换
  • php网站建设制作方案/爱站官网
  • 长沙做软件的公司/seo营销推广
  • 建设沙滩车官方网站/全网搜索指数查询
  • 网站建设公司华网天下买送活动/武汉seo排名
  • 网站建设后的心得/互联网营销师在哪里报名
  • 淘宝商家版登录入口/班级优化大师app下载学生版
  • 试玩平台怎么做网站/全国十大跨境电商排名
  • 济宁网站建设价格/html网站模板免费
  • 做定制网站/百度搜索智能精选
  • 济南品牌网站建设价格/查询网 域名查询
  • 网站开发需要投入多少时间/谷歌seo搜索引擎下载
  • 做网站怎么找客户联系方式/如何推广我的网站
  • 广东网站备案要多久/1个百度指数代表多少搜索
  • 网站设计 优帮云/seo网站优化排名
  • 淘宝网站如何做虚拟/扬州seo
  • 永久免费国外ip代理/宁波网站推广优化哪家正规
  • iis默认网站路径/郑州中原区最新消息
  • 青羊区定制网站建设报价/惠州seo博客
  • 杭州网站建设及推广/百度官网推广平台
  • 网站怎么做精准引流/百度云app
  • 网站建设需要的费用/搜索引擎优化免费
  • 贵州软件开发 网站开发/竞价恶意点击报案
  • 印刷网站建设 优帮云/网络营销网站建设案例
  • 文件包上传的网站怎么做/石家庄网站建设公司
  • 营销型网站重要特点是/全网营销推广怎么做
  • 深度学习中的模型知识蒸馏
  • webrtv弱网-QualityScalerResource 源码分析及算法原理
  • PPT自动化 python-pptx - 9: 图表(chart)
  • web前端React和Vue框架与库安全实践
  • leetcode热题——组合
  • 通俗易懂解释Java8 HashMap