[Solved-1 Solution] Group key value of map in Pig?
What is group by?
- The GroupByKey core transform is a parallel reduction operation used to process collections of key/value pairs.
- We use GroupByKey with an input PCollection of key/value pairs that represents a multimap, where the collection contains multiple pairs that have the same key but different values.
- The GroupByKey transform lets you gather together all of the values in the multimap that share the same key.
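To make the multimap idea concrete, here is a minimal plain-Java sketch of what grouping by key computes. It is not the GroupByKey transform itself and runs on a single machine with no parallelism. The names `GroupByKeySketch` and `groupByKey` are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class GroupByKeySketch {
    // Gathers every value that shares a key into one list,
    // keeping keys in the order they were first seen.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // A multimap as a flat list of pairs: key "a" appears twice.
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3));
        System.out.println(groupByKey(pairs)); // prints {a=[1, 3], b=[2]}
    }
}
```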
Problem:
Here we have a file
Pig script
We know that we can look up a value in the map by supplying its key. In the example above, we took the values from the map for the key "a". Now assume that we do not know the keys in advance: we need to group the values by key across the relation and dump the result.
Does Pig allow such an operation directly, or do we need a UDF?
Solution 1:
- We can create a custom UDF which converts the map to a bag (using Pig v0.10.0):
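The core of such a UDF is a small transformation: one map in, one bag of (key, value) pairs out. The sketch below shows only that logic in plain Java, without Pig's classes, so that it runs on its own. `MapToBagSketch` and `mapToBag` are hypothetical names, and treating a null map as an empty bag is an assumption. In an actual Pig UDF this logic would typically sit in the `exec` method of a class extending `org.apache.pig.EvalFunc<DataBag>`, with each pair emitted as a two-field tuple.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapToBagSketch {
    // Turns one map into a "bag" of (key, value) pairs, one pair per entry.
    // Pig map keys are strings; values are kept as Object because a Pig map
    // can hold values of any type.
    static List<Map.Entry<String, Object>> mapToBag(Map<String, Object> map) {
        List<Map.Entry<String, Object>> bag = new ArrayList<>();
        if (map == null) {
            return bag; // assumption: a missing map yields an empty bag
        }
        for (Map.Entry<String, Object> e : map.entrySet()) {
            // SimpleImmutableEntry is used because it tolerates null values.
            bag.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return bag;
    }

    public static void main(String[] args) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("a", 1);
        m.put("b", 2);
        System.out.println(mapToBag(m)); // prints [a=1, b=2]
    }
}
```

Once every record's map has been turned into pairs like these, the keys become an ordinary column that can be grouped on without knowing them in advance.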
Then use the code below:
- Now group by key and use a nested foreach:
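As a rough model of what the whole pipeline computes (flatten each map into pairs, group by key, then project the values inside each group), here is a plain-Java sketch. It is not Pig Latin and does not use Pig's API. `GroupMapValuesSketch` and `groupValuesByKey` are hypothetical names, and integer values are assumed only to keep the example short.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupMapValuesSketch {
    // Each element of `records` stands for one row whose single field is a map.
    static Map<String, List<Integer>> groupValuesByKey(List<Map<String, Integer>> records) {
        return records.stream()
            // Comparable to FLATTEN on the bag of (key, value) pairs.
            .flatMap(record -> record.entrySet().stream())
            .collect(Collectors.groupingBy(
                Map.Entry::getKey,   // comparable to GROUP ... BY key
                TreeMap::new,        // sorted keys, so the output order is stable
                // Comparable to projecting the value column inside a nested FOREACH.
                Collectors.mapping(Map.Entry::getValue, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> records = List.of(
            Map.of("a", 1, "b", 2),
            Map.of("a", 3, "c", 4));
        System.out.println(groupValuesByKey(records)); // prints {a=[1, 3], b=[2], c=[4]}
    }
}
```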