Apache Pig Tutorial - BinStorage() Function - IT Academy - Intelligent Application Service Branch, China Software Association


The BinStorage() Function


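The BinStorage() function loads and stores data in Pig using a machine-readable (binary) format. Pig typically uses BinStorage() to store the temporary data generated between MapReduce jobs. It also supports multiple locations (files, directories, and globs) as input.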
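The statements below are a minimal sketch of the multiple-input support just described. They assume two hypothetical HDFS directories, /pig_tmp/run1 and /pig_tmp/run2, that were written earlier with STORE ... USING BinStorage(). The paths and relation names are illustrative only.

```pig
-- Assumption: /pig_tmp/run1 and /pig_tmp/run2 are hypothetical directories
-- previously written by STORE ... USING BinStorage().

-- A directory as input: Pig reads the part files inside it.
run1 = LOAD '/pig_tmp/run1' USING BinStorage();

-- A Hadoop glob as input: both directories are read into one relation.
both_runs = LOAD '/pig_tmp/{run1,run2}' USING BinStorage();

-- A single file can also be named directly.
one_part = LOAD '/pig_tmp/run1/part-m-00000' USING BinStorage();
```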

Syntax

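BinStorage() is not a standalone statement. It is passed to a LOAD or STORE statement through the USING clause, as shown below.

grunt> relation = LOAD 'path' USING BinStorage();
grunt> STORE relation INTO 'path' USING BinStorage();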

Example

Suppose the HDFS directory /pig_data/ contains a file named stu_data.txt with the following contents.

stu_data.txt

001,Rajiv_Reddy,21,Hyderabad 
002,siddarth_Battacharya,22,Kolkata 
003,Rajesh_Khanna,22,Delhi 
004,Preethi_Agarwal,21,Pune 
005,Trupthi_Mohanthy,23,Bhuwaneshwar 
006,Archana_Mishra,23,Chennai 
007,Komal_Nayak,24,trivendram 
008,Bharathi_Nambiayar,24,Chennai

Load this data into a relation as shown below.

grunt> student_details = LOAD 'hdfs://localhost:9000/pig_data/stu_data.txt' USING PigStorage(',')
   as (id:int, firstname:chararray, age:int, city:chararray);

Now store this relation in the HDFS directory /pig_Output/mydata using the BinStorage() function.

grunt> STORE student_details INTO 'hdfs://localhost:9000/pig_Output/mydata' USING BinStorage();

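After you execute the statement above, the relation is stored in the given HDFS directory. You can list the directory with the HDFS ls command, as shown below.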

$ hdfs dfs -ls hdfs://localhost:9000/pig_Output/mydata/

Found 2 items
-rw-r--r--   1 Hadoop supergroup          0 2015-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/_SUCCESS
-rw-r--r--   1 Hadoop supergroup        372 2015-10-26 16:58 hdfs://localhost:9000/pig_Output/mydata/part-m-00000

Now load the data from the file part-m-00000.

grunt> result = LOAD 'hdfs://localhost:9000/pig_Output/mydata/part-m-00000' USING BinStorage();

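Verify the contents of the relation as shown below. Because the id field was declared as int when the text file was first loaded, the IDs appear as 1, 2, ... instead of 001, 002, ....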

grunt> Dump result; 

(1,Rajiv_Reddy,21,Hyderabad) 
(2,siddarth_Battacharya,22,Kolkata) 
(3,Rajesh_Khanna,22,Delhi) 
(4,Preethi_Agarwal,21,Pune) 
(5,Trupthi_Mohanthy,23,Bhuwaneshwar) 
(6,Archana_Mishra,23,Chennai) 
(7,Komal_Nayak,24,trivendram) 
(8,Bharathi_Nambiayar,24,Chennai)
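The following variation is a sketch that assumes the STORE above succeeded. It loads the whole output directory instead of naming a single part file, and it declares the schema again with AS. BinStorage() writes field values, not field names, so re-declaring the schema lets later statements refer to columns by name instead of by position ($0, $1, ...). The relation names result_typed and older_students are illustrative.

```pig
-- Load every part file in the output directory at once. Hadoop's input
-- handling skips files whose names start with "_", such as _SUCCESS.
result_typed = LOAD 'hdfs://localhost:9000/pig_Output/mydata' USING BinStorage()
   as (id:int, firstname:chararray, age:int, city:chararray);

-- Print the declared schema of the relation.
DESCRIBE result_typed;

-- With named fields, later statements can refer to columns by name.
older_students = FILTER result_typed BY age > 22;
DUMP older_students;
```

With the sample data, the FILTER keeps the four students aged 23 or 24.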


