Installing Pig
Deploying the Pig package
$ tar zxvf pig-0.8.1.tar.gz
$ sudo mv pig-0.8.1 /usr/local/
$ sudo chown -R hadoop:hadoop /usr/local/pig-0.8.1
$ sudo ln -s /usr/local/pig-0.8.1 /usr/local/pig
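The versioned directory plus a /usr/local/pig symlink presumably exists so that a later upgrade only has to repoint the link. Here is a rough sketch of that layout, rehearsed under a scratch prefix so nothing in /usr/local is touched. The function name install_with_symlink and its arguments are my own, for illustration only.

```shell
# Sketch: create <prefix>/pig-<version> and point <prefix>/pig at it.
# install_with_symlink is a hypothetical helper, not part of Pig.
install_with_symlink() {
    prefix="$1"     # install prefix, e.g. a directory from mktemp -d
    version="$2"    # e.g. 0.8.1
    mkdir -p "$prefix/pig-$version/bin"
    # -f replaces an existing link; -n keeps ln from following the old
    # link and creating the new one inside the previous version's directory.
    ln -sfn "$prefix/pig-$version" "$prefix/pig"
}
```

Calling it a second time with a newer version leaves the old directory in place and only moves the link, so anything that refers to the unversioned path keeps working.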
Pig environment settings
$ export PIG_HOME=/usr/local/pig
$ sudo -e /etc/bashrc
...
export PIG_HOME=/usr/local/pig # added
$ export PATH=$PATH:$PIG_HOME/bin
$ sudo -e /etc/bashrc
...
export PATH=$PATH:$PIG_HOME/bin # added
$ sudo -e /usr/local/pig/conf/pig-env.sh
$ cat /usr/local/pig/conf/pig-env.sh
PIG_CLASSPATH=$HADOOP_HOME/conf
$ sudo chown hadoop:hadoop /usr/local/pig/conf/pig-env.sh
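Before starting the Pig shell, it can help to confirm that the current shell actually picked up these settings. Here is a minimal check, assuming only the PIG_HOME and PATH values set above. The function name check_pig_env is hypothetical.

```shell
# Sketch: report whether PIG_HOME is set and $PIG_HOME/bin is on PATH.
# check_pig_env is an illustrative helper, not something Pig provides.
check_pig_env() {
    if [ -z "${PIG_HOME:-}" ]; then
        echo "PIG_HOME is not set"
        return 1
    fi
    # Wrap PATH in colons so the entry matches at the start, middle, or end.
    case ":$PATH:" in
        *":$PIG_HOME/bin:"*) ;;
        *) echo "$PIG_HOME/bin is not on PATH"; return 1 ;;
    esac
    echo "ok: PIG_HOME=$PIG_HOME"
}
```

If it reports a problem in a new login shell, the lines added to /etc/bashrc are probably not being read by that shell.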
Checking the connection to the Hadoop cluster from the Pig shell
$ sudo su hadoop
$ pig
2011-04-26 10:54:25,548 [main] INFO org.apache.pig.Main - Logging error messages to: /home/hadoop/pig_1303782865545.log
2011-04-26 10:54:25,934 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:54310
2011-04-26 10:54:26,124 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:54311
grunt> ls /
hdfs://localhost:54310/hadoop <dir>
hdfs://localhost:54310/tmp <dir>
hdfs://localhost:54310/user <dir>
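The two "Connecting to ..." lines are what show which HDFS and JobTracker the shell attached to. If the startup messages have been saved somewhere, a small filter like the following pulls out just those endpoints. This is a sketch that assumes the message format shown above. The name pig_endpoints is made up for this note.

```shell
# Sketch: read Pig startup messages on stdin and print "<service> = <address>"
# for each "Connecting to ... at: ..." line. Assumes the log format above.
pig_endpoints() {
    sed -n 's/.*Connecting to \(.*\) at: \(.*\)$/\1 = \2/p'
}

# Usage with one of the lines from the session above:
echo 'INFO HExecutionEngine - Connecting to map-reduce job tracker at: localhost:54311' | pig_endpoints
# prints: map-reduce job tracker = localhost:54311
```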
Pig sample
I tried running the sample from "Pigのインストール - osacaz4の日記".
grunt> ls input
hdfs://localhost:54310/user/hadoop/input/capacity-scheduler.xml<r 3> 3936
hdfs://localhost:54310/user/hadoop/input/core-site.xml<r 3> 390
hdfs://localhost:54310/user/hadoop/input/hadoop-policy.xml<r 3> 4190
hdfs://localhost:54310/user/hadoop/input/hdfs-site.xml<r 3> 407
hdfs://localhost:54310/user/hadoop/input/mapred-site.xml<r 3> 404
grunt> A = LOAD 'input';
grunt> B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
grunt> DUMP B;
...
( dfsadmin and mradmin commands to refresh the security policy in-effect. )
( <name>dfs.name.dir</name>)
( <name>dfs.data.dir</name>)
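Pig's MATCHES follows Java regular expression matching against the whole value, which is why the pattern is wrapped in .* on both sides. To sanity-check the pattern without a cluster, a rough local equivalent is a substring search with grep. This is only a stand-in for the FILTER step, not how Pig evaluates it, and filter_dfs_lines is a name I made up.

```shell
# Sketch: local stand-in for
#   B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
# grep matches anywhere in the line, so the surrounding .* is not needed.
filter_dfs_lines() {
    grep -E 'dfs[a-z.]+'
}

# Usage: only the first line contains "dfs" followed by a letter or dot.
printf '%s\n' '<name>dfs.name.dir</name>' '<name>fs.default.name</name>' | filter_dfs_lines
# prints: <name>dfs.name.dir</name>
```

Unlike DUMP, this prints plain lines rather than tuples in parentheses, and it looks at the whole line, whereas $0 is the first tab-delimited field under Pig's default loader.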