[Solved-2 Solutions] Self cross-join in Pig is disregarded?
What is CROSS
- The CROSS operator computes the cross product (Cartesian product) of two or more relations.
Syntax
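As a rough illustration of the operator, the sketch below crosses two different relations. The file names `a.txt` and `b.txt` and the field names are assumptions made up for this example; the general form noted in the comment follows the Pig Latin reference as best recalled and should be checked against the documentation for your version.

```pig
-- General form (per the Pig Latin reference, to be verified for your version):
--   alias = CROSS alias, alias [, alias ...] [PARALLEL n];

-- Hypothetical inputs: tab-delimited files read with the default PigStorage loader.
A = LOAD 'a.txt' AS (a1:int, a2:int);
B = LOAD 'b.txt' AS (b1:int);

-- Every record of A is paired with every record of B.
X = CROSS A, B;
DUMP X;
```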
Problem :
- Suppose a relation A holds data like this:
- And then A is cross-joined with itself:
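A minimal sketch of the situation described above, assuming a hypothetical input file `data.txt` with two integer columns (the file name and schema are placeholders, not the asker's actual data):

```pig
-- Hypothetical input file and schema, used only for illustration.
A = LOAD 'data.txt' AS (x:int, y:int);
DUMP A;

-- Self cross-join: the same alias appears on both sides.
-- According to the question, Pig 0.11 does not return the full product here.
B = CROSS A, A;
DUMP B;
```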
Why is second A optimized out from the query?
Info: Pig version 0.11
== UPDATE ==
If A is sorted first, like:
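A sketch of the sorted variant, again assuming the hypothetical `data.txt` file and `(x, y)` schema. Sorting produces a new alias, so the CROSS no longer sees the same relation twice:

```pig
-- Hypothetical input file and schema, used only for illustration.
A = LOAD 'data.txt' AS (x:int, y:int);

-- ORDER creates a separate relation derived from A.
A_sorted = ORDER A BY x;

-- Cross the original relation with its sorted copy.
B = CROSS A, A_sorted;
DUMP B;
```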
Then the cross-join gives the correct result.
Solution 1:
- You need to load the data twice to achieve what you want, i.e.:
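A minimal sketch of loading the same data twice, assuming a hypothetical file `data.txt` and an `(x, y)` schema; the alias names `A1` and `A2` are chosen for this example:

```pig
-- Load the same (hypothetical) file under two different aliases.
A1 = LOAD 'data.txt' AS (x:int, y:int);
A2 = LOAD 'data.txt' AS (x:int, y:int);

-- The two aliases are distinct relations, so the cross product is computed.
B = CROSS A1, A2;
DUMP B;
```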
Solution 2:
- We cannot CROSS (or JOIN) a relation with itself. To do this, we must create a copy of the data. In this case, we can use another LOAD statement. To do this with a relation further down a pipeline, we need to duplicate it using FOREACH.
- We have several macros that we use frequently and IMPORT by default in all of our Pig scripts in case we need them. One is used for just this purpose:
The code below performs the cross-join:
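One way such a macro and its use might look is sketched here. The macro name `DUPLICATE`, its parameter name `rel`, the file name `data.txt`, and the schema are all assumptions for illustration. The macro simply copies a relation with `FOREACH ... GENERATE *`, which is the duplication technique described above.

```pig
-- Hypothetical macro: returns a copy of the relation passed in.
DEFINE DUPLICATE(rel) RETURNS out {
    $out = FOREACH $rel GENERATE *;
};

-- Hypothetical input file and schema, used only for illustration.
A1 = LOAD 'data.txt' AS (x:int, y:int);

-- A2 holds the same records as A1 but is a separate relation.
A2 = DUPLICATE(A1);

B = CROSS A1, A2;
DUMP B;
```

Keeping the copy step in a macro means it can be applied to any relation in the middle of a pipeline, not only to one that comes straight from a LOAD.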
- Note that even though A1 and A2 are identical, we cannot assume that the records are in the same order. But if we are doing a CROSS or JOIN, this probably doesn't matter.