[Solved-2 Solutions] Pig Latin: bag to tuple after GROUP BY?
- A bag is a collection of tuples.
- A tuple is an ordered set of fields.
- A field is a piece of data.
- A Pig relation is a bag of tuples. A Pig relation is similar to a table in a relational database, where the tuples in the bag correspond to the rows in a table.
- Unlike a relational table, however, Pig relations don't require that every tuple contain the same number of fields or that the fields in the same position (column) have the same type.
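To make the terms above concrete, here is a small sketch of how Pig writes each of them. The file name `input.txt` and the relation name `data` are placeholders chosen for illustration.

```pig
-- A field:  a   or   1
-- A tuple:  (a,1,2)            fields in parentheses
-- A bag:    {(a,1,2),(b,5,6)}  tuples in curly braces
-- A relation is an outer bag of tuples; LOAD produces one.
data = LOAD 'input.txt' AS (t0:chararray, t1:int, t2:int);

-- DESCRIBE prints the schema of the relation.
DESCRIBE data;
```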
What is GROUP BY?
- The GROUP operator is used to group the data in one or more relations. It collects the data having the same key.
Syntax
- The syntax of the group operator.
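A minimal sketch of the basic form follows. The file name `input.txt` and the aliases `data` and `grouped` are placeholders, not taken from the original post.

```pig
-- Hypothetical input: a tab-separated file with three fields.
data = LOAD 'input.txt' AS (t0:chararray, t1:int, t2:int);

-- Basic form: alias = GROUP alias BY expression;
grouped = GROUP data BY t0;

-- Each output tuple has two fields: `group` (the key) and a bag,
-- named after the input relation, holding all tuples with that key.
DESCRIBE grouped;
```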
Problem:
We have data with the schema (t0: chararray, t1: int, t2: int).
We need to produce results grouped by t0, with the values in each group ordered by t1.
Note that we need only tuples in the second component, not bags.
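The sketch below illustrates the problem. The sample rows are made up for illustration and are not the original data; `input.txt` is a placeholder file name.

```pig
-- Hypothetical contents of input.txt (tab-separated):
--   a   3   4
--   a   1   2
--   b   5   6
data = LOAD 'input.txt' AS (t0:chararray, t1:int, t2:int);
grouped = GROUP data BY t0;
DUMP grouped;
-- GROUP alone yields a bag in the second field, for example
--   (a,{(a,3,4),(a,1,2)})
-- and the order of the tuples inside the bag is not guaranteed.
-- The goal is a tuple in that position instead, with values ordered by t1.
```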
Solution 1:
- We can use the code below to convert the bag to a tuple after GROUP BY.
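One possible approach is sketched here; it is not necessarily the original solution. It assumes a Pig version that ships the built-in `BagToTuple` function, and the aliases `sorted`, `pairs`, `flat` and the file name `input.txt` are placeholders.

```pig
data = LOAD 'input.txt' AS (t0:chararray, t1:int, t2:int);
grouped = GROUP data BY t0;

result = FOREACH grouped {
    -- Sort the tuples of each group by t1 inside the nested block.
    sorted = ORDER data BY t1;
    -- Keep only t1 and t2 so the key is not repeated in the output.
    pairs = sorted.(t1, t2);
    -- BagToTuple un-nests the bag's tuples into a single tuple.
    GENERATE group AS t0, BagToTuple(pairs) AS flat;
};
DUMP result;
```

Because `BagToTuple` un-nests the bag into one flat tuple, the number of fields in `flat` depends on how many rows each group has.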
Solution 2:
We can use the code below.
- Tuples should only be used when we know the exact number and position of the fields in the tuple.
- Otherwise the tuple's schema is undefined and its fields are hard to access: Pig treats fields without a declared type as bytearray, so we have to locate each field by position and cast it manually.
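As a rough illustration of the caveat above, the sketch below reads fields from a flattened tuple by position and casts them explicitly. It assumes the built-in `BagToTuple` function is available; the aliases `flat_rel`, `flat`, `first_t1`, `first_t2` and the file name `input.txt` are placeholders.

```pig
data = LOAD 'input.txt' AS (t0:chararray, t1:int, t2:int);
grouped = GROUP data BY t0;
flat_rel = FOREACH grouped GENERATE group AS t0, BagToTuple(data.(t1, t2)) AS flat;

-- The number of values in `flat` varies per group, so fields are
-- read by position ($0, $1, ...) and cast explicitly.
-- Every group has at least one row, so $0 and $1 always exist.
firsts = FOREACH flat_rel GENERATE t0, (int) flat.$0 AS first_t1, (int) flat.$1 AS first_t2;
DUMP firsts;
```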