[Solved-1 Solution] Java UDF for adding columns in Pig



What is a UDF?

  • Pig provides extensive support for user defined functions (UDFs) as a way to specify custom processing.
  • Pig UDFs can currently be implemented in three languages: Java, Python, and JavaScript.
  • The most extensive support is provided for Java functions. Python and JavaScript functions are newer, still-evolving additions to the system and have more limited support.

Problem:

  • We want to write a Java UDF that adds a pincode column by matching against the locality column.

Here is the code.

  import java.io.IOException;
  import org.apache.pig.EvalFunc; 
  import org.apache.pig.data.Tuple;
  import org.apache.commons.lang3.StringUtils;
  public class MB_pincodechennai extends EvalFunc<String>
  {
    private String pincode(String input)
    {
      String property_pincode = null;
      String[] items = new String[]{"600088", "600016", "600053", "600070", "600040", "600106", "632301", "600109", "600083", "600054", "600023", "600095", "600077", "600073", "600003", "603001", "600064", "600094", "600044", "600008",
      };

      for (String itm : items)
      {
        if (StringUtils.containsIgnoreCase(input, itm))
        {
          property_pincode = itm;
          break;
        }
      }
      return property_pincode;
    }

    public String exec(Tuple input) throws IOException
    {
      if (input == null || input.size() == 0)
        return null;
      try
      {
        String str = (String) input.get(0);
        return pincode(str);
      }
      catch (Exception e)
      {
        return null;
      }
    }
  }
  • When we run the above UDF, it prints only blank values. The code contains a mistake. What is it?

Solution 1:

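  • To see what is going wrong, change the catch block below to return "Invalid Input" instead of null. The Pig console then prints Invalid Input, which shows that exec() is throwing an exception and the catch block is hiding it.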
catch (Exception e)
{
return null;   // Change this to return "Invalid Input"
}

Reason :

  • The Pig script passes pincode=600073 as an integer, but the Java UDF reads it with a (String) cast. Casting an Integer to a String does not work: it throws a ClassCastException, which the catch block swallows and turns into null.
MB_pincodechennai(pincode) 
  • Here pincode is passed as an integer.
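
The failure can be reproduced without Pig. The following is a minimal sketch, assuming only that the field arrives as a plain Object (which is what Tuple.get returns); the class name CastDemo and the method castLikeUdf are made up for illustration and are not part of the original UDF.

```java
// Standalone demo (no Pig needed): shows why the cast in exec() fails for an int field.
class CastDemo {
    // Mimics the try/catch in the UDF's exec(): returns null when the cast throws.
    static String castLikeUdf(Object field) {
        try {
            return (String) field; // throws ClassCastException when field is an Integer
        } catch (Exception e) {
            return null; // the error is swallowed, so Pig only shows a blank value
        }
    }

    public static void main(String[] args) {
        System.out.println(castLikeUdf("600073")); // prints 600073
        System.out.println(castLikeUdf(600073));   // prints null
    }
}
```

The second call is the situation described above: the value is an Integer, the cast throws, and the caller only ever sees null.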

There are two ways to fix this issue:

  • Declare the pincode field as a string (chararray) instead of an int in the Pig script.
  • Or convert the value to a String on the Java side before doing the match, instead of casting it.
String str = String.valueOf(input.get(0));
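
As a rough sketch of the second option, the lookup can take an Object and convert it to text before matching. This version is kept free of Pig classes so it can be run on its own; the class name PincodeLookup is hypothetical, the list is shortened to a few of the original pincodes, and String.contains stands in for StringUtils.containsIgnoreCase (equivalent here, since the codes are digits only).

```java
import java.util.Arrays;
import java.util.List;

// Illustrative helper: the same matching idea as the UDF, but tolerant of non-String input.
class PincodeLookup {
    // A few of the pincodes from the original UDF, shortened for brevity.
    private static final List<String> PINCODES =
            Arrays.asList("600088", "600016", "600073", "600044");

    // Accepts whatever type Pig hands over (Integer, String, ...) and
    // converts it to text before matching, instead of casting to String.
    static String pincode(Object field) {
        if (field == null) {
            return null;
        }
        String text = String.valueOf(field);
        for (String code : PINCODES) {
            if (text.contains(code)) {
                return code;
            }
        }
        return null; // no known pincode found in the value
    }
}
```

Inside the UDF, exec() would then call something like pincode(input.get(0)) rather than casting the field to String first.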
