[Solved-1 Solution] Pig 0.11.1 - Count groups in a time range?
What is COUNT()?
- The Pig Latin COUNT() function returns the number of elements in a bag. When counting the tuples in a bag, COUNT() skips any tuple that has a NULL value in its first field.
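To see the NULL handling in practice, COUNT() can be compared with COUNT_STAR(), which counts every tuple. This is a minimal sketch; the file name `visits.tsv` and the field names are hypothetical and chosen only for illustration.

```pig
-- Hypothetical input: tab-separated (visitor, url) rows,
-- where some rows may have a NULL visitor field.
visits = LOAD 'visits.tsv' USING PigStorage('\t')
         AS (visitor:chararray, url:chararray);

-- Collect every tuple into a single bag so the whole relation can be counted.
all_visits = GROUP visits ALL;

counts = FOREACH all_visits GENERATE
    COUNT(visits)      AS non_null_first_field,  -- skips tuples whose first field is NULL
    COUNT_STAR(visits) AS all_tuples;            -- counts every tuple, NULLs included

DUMP counts;
```

If the two numbers differ, the difference is the number of tuples whose first field is NULL.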
Problem
We have a dataset, A, with the fields timestamp, visitor, and URL:
We need to count the visits per visitor per URL within a time window of, say, 10 minutes, where the window rolls forward one minute at a time. The output would be:
To simplify the arithmetic, we can convert the timestamp to the minute of the day:
To iterate over A with a moving time window, we create a dataset B containing the minutes of the day:
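The conversion itself is just hour * 60 + minute, so 10:07 becomes 10 * 60 + 7 = 607. A possible sketch in Pig follows, assuming the timestamp is a string in `HH:mm:ss` form (the actual format is an assumption, as are the file name and the alias `raw`). It relies on the ToDate, GetHour and GetMinute built-ins that come with the datetime support in Pig 0.11.

```pig
-- Hypothetical raw input with a string timestamp such as 10:07:42.
raw = LOAD 'visits_raw.tsv' USING PigStorage('\t')
      AS (ts:chararray, visitor:chararray, url:chararray);

-- Replace the timestamp with the minute of the day (0 to 1439).
A = FOREACH raw GENERATE
        GetHour(ToDate(ts, 'HH:mm:ss')) * 60
          + GetMinute(ToDate(ts, 'HH:mm:ss')) AS minute,
        visitor,
        url;
```

With an integer minute, a 10-minute window starting at minute m is simply the range m to m + 9.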
We want to do something like:
"GROUP" isn't allowed inside a "FOREACH" block, but is there a workaround that achieves the same result?
Solution 1:
The code below counts groups in a time range:
myscript.pig
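One common way to get the effect of a GROUP inside a FOREACH is to pair every window start with every visit, keep only the pairs that fall inside the window, and then group once at the top level. The following is a sketch of that idea under stated assumptions, not necessarily the script referred to above: the file names, the field names `minute` and `window_start`, and the intermediate aliases are all hypothetical.

```pig
-- A: visits with the timestamp already converted to minute of the day.
A = LOAD 'visits_by_minute.tsv' USING PigStorage('\t')
    AS (minute:int, visitor:chararray, url:chararray);

-- B: one row per minute of the day, each used as a window start.
B = LOAD 'minutes_of_day.txt' AS (window_start:int);

-- Pair every window start with every visit.
pairs = CROSS B, A;

-- Keep only visits that fall inside the 10-minute window
-- [window_start, window_start + 10).
in_window = FILTER pairs BY A::minute >= B::window_start
                        AND A::minute <  B::window_start + 10;

-- Group once, outside any FOREACH, by window, visitor and URL.
by_window = GROUP in_window BY (B::window_start, A::visitor, A::url);

result = FOREACH by_window GENERATE
    FLATTEN(group)   AS (window_start, visitor, url),
    COUNT(in_window) AS visits;

STORE result INTO 'rolling_counts';
```

CROSS produces one row per (window, visit) pair before filtering, so with 1440 window starts this can be expensive on large inputs; it is shown here because it maps directly onto the problem statement.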