COSC 416 - Special Topics in Databases
Assignment 3 - Apache Pig

In this assignment we will experiment with Apache Pig and Pig Latin to simplify executing Hadoop MapReduce programs. We will do all the questions from lab 2 using Pig and compare the difficulty with regular MapReduce programs. We will also write some larger queries with Pig.

Sample outputs are below:

Tutorial

To use Pig, log in to gpu1.ddl.ok.ubc.ca over SSH with your Novell account. Type pig to start the interactive shell, then enter Pig Latin commands at the prompt.
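As a first session, the following sketch loads a data file and prints a few rows to confirm the shell works. It assumes the fields are comma-separated, which the assignment does not state; if the output shows each row as one long field, change the delimiter passed to PigStorage.

```pig
-- Assumption: the file is comma-delimited. No schema is declared here,
-- so fields are addressed by position ($0, $1, ...) in later statements.
games = LOAD '/user/rlawrenc/416/lab2/small/games.txt' USING PigStorage(',');

-- Look at a handful of rows before writing larger queries.
sample_rows = LIMIT games 5;
DUMP sample_rows;
```

Each statement only defines a relation; nothing runs on the cluster until a DUMP or STORE is reached. The same statements can be saved in a file and passed to the pig command to run them as a script.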

Task #1 (2 marks) - List all Games

Write a Pig script that will list all the game records. The data set is available at /user/rlawrenc/416/lab2/small/games.txt. There should be 100 records printed.

Task #2 (3 marks) - Find a Game by its Id

Write a Pig script that, given a game id (a hard-coded constant), returns the matching game record if one exists. The data set is available at /user/rlawrenc/416/lab2/small/games.txt.
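A lookup like this is a single FILTER. The sketch below is one possible shape, assuming a comma-delimited file whose first field is the game id; the id value 42 is only a placeholder.

```pig
-- Assumptions: comma-delimited input, game id in the first field ($0).
games = LOAD '/user/rlawrenc/416/lab2/small/games.txt' USING PigStorage(',');

-- Without a declared schema every field is a bytearray, so cast before comparing.
-- 42 is a placeholder id, not a value taken from the data set.
found = FILTER games BY (int)$0 == 42;
DUMP found;
```

If no record has that id, DUMP simply prints nothing.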

Task #3 (10 marks) - Players over 18

Write a Pig script that lists only the players over 18. The output should be sorted by age ascending. The data set is available at /user/rlawrenc/416/lab2/small/players.txt. Note that this question is harder than the rest as it requires defining a UDF. It is best to leave this question until last.

Task #4 (5 marks) - Number of Players per Game

Write a Pig script that will calculate the number of players per game. The output does not have to be sorted. The data set is available at /user/rlawrenc/416/lab2/small/player_games.txt.
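Counting per key in Pig is a GROUP followed by a FOREACH with COUNT. The sketch below assumes player_games.txt is comma-delimited with the fields (player id, game id, score) in that order; the field names are hypothetical and should be checked against the actual file.

```pig
-- Assumed layout: playerid,gameid,score (names and order are hypothetical).
pg = LOAD '/user/rlawrenc/416/lab2/small/player_games.txt'
     USING PigStorage(',') AS (playerid:int, gameid:int, score:int);

-- GROUP produces one tuple per game: (group, {bag of pg tuples}).
by_game = GROUP pg BY gameid;

-- COUNT runs over the bag, which is named after the grouped relation.
counts = FOREACH by_game GENERATE group AS gameid, COUNT(pg) AS num_players;
DUMP counts;
```

This counts rows, so it equals the number of players only if each player appears at most once per game. If a player can have several rows for the same game, project (playerid, gameid) and apply DISTINCT before grouping.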

Task #5 (5 marks) - Given a Game Id - List the Top 10 Scores for the Game

Write a Pig script that will output the top 10 scores in descending order for a given game id. The data set is available at /user/rlawrenc/416/lab2/small/player_games.txt.
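A top-N query combines FILTER, ORDER and LIMIT. This sketch uses the same assumed layout for player_games.txt (comma-delimited, fields playerid, gameid, score) and a placeholder game id of 42.

```pig
-- Assumed layout: playerid,gameid,score (names and order are hypothetical).
pg = LOAD '/user/rlawrenc/416/lab2/small/player_games.txt'
     USING PigStorage(',') AS (playerid:int, gameid:int, score:int);

-- Keep only the rows for one game; 42 is a placeholder id.
one_game = FILTER pg BY gameid == 42;

-- Sort highest score first, then keep the first ten rows.
sorted = ORDER one_game BY score DESC;
top10 = LIMIT sorted 10;
DUMP top10;
```

Filtering before sorting keeps the sort small. LIMIT applied directly to an ordered relation returns the first rows in that order.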

Task #6 (5 marks) - Players in Common

Write a Pig script that will output pairs of game ids and the number of players they have in common. For instance, if 2,000 players play both game X and game Y, then output X, Y, 2000. The output does not have to be sorted. The data set is available at /user/rlawrenc/416/lab2/small/player_games.txt.
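One way to find players in common is to join the data set with itself on player id and then count rows per pair of games. The sketch below loads the file twice under two aliases so the two sides of the join can be told apart. The field layout (playerid, gameid, score) and the comma delimiter are assumptions.

```pig
-- Assumed layout: playerid,gameid,score (names and order are hypothetical).
a = LOAD '/user/rlawrenc/416/lab2/small/player_games.txt'
    USING PigStorage(',') AS (playerid:int, gameid:int, score:int);
b = LOAD '/user/rlawrenc/416/lab2/small/player_games.txt'
    USING PigStorage(',') AS (playerid:int, gameid:int, score:int);

-- Every pair of games played by the same player.
joined = JOIN a BY playerid, b BY playerid;

-- Keep each unordered pair once and drop pairs of a game with itself.
pairs = FILTER joined BY a::gameid < b::gameid;
pair_ids = FOREACH pairs GENERATE a::gameid AS game_x, b::gameid AS game_y;

-- Count the shared players for each pair.
by_pair = GROUP pair_ids BY (game_x, game_y);
common = FOREACH by_pair GENERATE FLATTEN(group) AS (game_x, game_y),
                                  COUNT(pair_ids) AS num_common;
DUMP common;
```

The comparison a::gameid < b::gameid is a design choice: it reports (X, Y) but not (Y, X). Use != instead if both orderings are wanted. Pairs with no players in common do not appear in the output.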

Task #7 (5 marks) - Count Top Players for Each Game

Write a Pig script that will list all games along with the number of players in each game with a score over 98,000. If a game has no player with a score over 98,000, it should still appear in the output with a count of 0.
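The requirement to keep games with a count of 0 means a plain GROUP over the filtered scores is not enough, because games with no qualifying rows would disappear. One possible approach, sketched below, is to COGROUP the list of games with the filtered scores; a game with no high scores then gets an empty bag, which COUNT should report as 0. The task does not name its input files, so the use of games.txt and player_games.txt, the comma delimiter, and the field positions are all assumptions.

```pig
-- Assumptions: comma-delimited files; game id is the first field of games.txt;
-- player_games.txt is playerid,gameid,score (hypothetical names).
games = LOAD '/user/rlawrenc/416/lab2/small/games.txt' USING PigStorage(',');
game_ids = FOREACH games GENERATE (int)$0 AS gameid;

pg = LOAD '/user/rlawrenc/416/lab2/small/player_games.txt'
     USING PigStorage(',') AS (playerid:int, gameid:int, score:int);
high = FILTER pg BY score > 98000;

-- COGROUP keeps every key from either input; missing sides become empty bags.
both = COGROUP game_ids BY gameid, high BY gameid;
counts = FOREACH both GENERATE group AS gameid, COUNT(high) AS num_top;
DUMP counts;
```

A LEFT OUTER JOIN from games to the filtered scores followed by a GROUP is an alternative. Note that COGROUP would also list a game id that appears only in player_games.txt; add a filter on the games bag if that matters.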

Task #8 (5 marks) - List All Players with Certain Properties

Write a Pig script that will list all players who either have a score over 90,000 in some game or play a game published by 'Electronic Arts'.

Task #9 (5 marks) - Most Players of Each Gender for a Game for each Publisher

Write a Pig script that lists two records for each publisher. The first record will be the publisher id, "female", and the maximum number of women that play any one of its games. The second record will be the publisher id, "male", and the maximum number of men that play any one of its games. Note that the two maximums may come from different games.

Task #10 (5 marks) - Show Game Breakdown by Gender

For each game, show the total number of players, the total number of female and male players, and the percentage of female and male players.

Submission

Submit a single Pig script file with answers to all questions using Connect or by email. You can demonstrate your work at any time for feedback and marking.

