[Solved-1 Solution] Pig: changing a schema to the required type
Problem:
We have an existing schema which we need to modify. The source data has 6 columns, as follows:
Each Op value is always C, T or X. We want to split the data into 7 columns, as follows:
That is, the Op column is split into 3 columns, one per Op value, and each of these columns holds the corresponding value from the Value column. How can we do this in Pig?
Solution 1:
One way to achieve the desired result:
- To skip the ORDER BY, which adds an additional reduce phase to the computation, we can prefix each value with its corresponding op in tuple v.
- Then sort the tuple fields with a custom UDF to get the desired OpX, OpC, OpT order:
Below is the tuplearrange code:
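The original tuplearrange source is not reproduced here. As a stand-in, the following is a minimal plain-Java sketch of the ordering step such a UDF might perform. It is not the original UDF and does not use the Pig API. The class name `OpOrder`, the methods `rank` and `arrange`, and the `op:value` prefix format (for example `C:2`) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the ordering step; not the original tuplearrange UDF.
class OpOrder {
    // Desired column order taken from the text: X, then C, then T.
    private static final String ORDER = "XCT";

    // Rank a prefixed value such as "C:2" by its leading op character.
    // Null, empty, or unknown-prefix values are ranked last.
    static int rank(String prefixed) {
        if (prefixed == null || prefixed.isEmpty()) {
            return ORDER.length();
        }
        int idx = ORDER.indexOf(prefixed.charAt(0));
        return idx < 0 ? ORDER.length() : idx;
    }

    // Return a copy of the fields sorted into X, C, T order.
    // The "op:value" prefix format is an assumption of this sketch.
    static List<String> arrange(List<String> fields) {
        List<String> sorted = new ArrayList<>(fields);
        sorted.sort(Comparator.comparingInt(OpOrder::rank));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(arrange(Arrays.asList("T:3", "X:1", "C:2")));
        // prints [X:1, C:2, T:3]
    }
}
```

In an actual Pig UDF, the same comparison would presumably run over the fields of the input tuple inside an `EvalFunc`, with the prefix stripped from each field before it is returned. Sorting the three fields of each tuple in memory is what lets the script avoid the extra reduce phase of an ORDER BY.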