
Hive custom function (UDF): UUID


 

 

0 Business goal:

 

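We are migrating business logic from Oracle stored procedures to Hive, which means a lot of SQL ---> HQL rewriting. Most of it translates directly.

Where Hive has no equivalent for an Oracle function, we write a custom function (UDF); where a join uses a non-equality condition, we implement it in MapReduce.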

 

Now consider an insert into a table in Oracle:

 

insert into table f_ent_norm_statistics

select xxx  , SYS_GUID() ;

 

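Oracle's SYS_GUID() generates a globally unique identifier: a 16-byte RAW value, displayed as 32 hexadecimal characters. Hive's rand() cannot serve this purpose, so a custom function is needed.

For reference, here is the source of Hive's rand: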

@Description(name = "rand",
    value = "_FUNC_([seed]) - Returns a pseudorandom number between 0 and 1")
@UDFType(deterministic = false)
@VectorizedExpressions({FuncRandNoSeed.class, FuncRand.class})
public class UDFRand extends UDF {
  private Random random;

  private final DoubleWritable result = new DoubleWritable();

  public UDFRand() {
  }

  public DoubleWritable evaluate() {
    if (random == null) {
      random = new Random();
    }
    result.set(random.nextDouble());
    return result;
  }

  public DoubleWritable evaluate(LongWritable seed) {
    if (random == null) {
      random = new Random(seed.get());
    }
    result.set(random.nextDouble());
    return result;
  }

}

 

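The key points are:

1 random = new Random(seed.get());  The generator is created once, on the first call, and reused for every following row.

2 @UDFType(deterministic = false)  Without this annotation the HQL query returns a single value for all rows: Hive treats a UDF as deterministic by default, so it may evaluate the call once and reuse the result.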
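
Point 1 can be illustrated outside Hive. The sketch below is not part of the Hive source; the class name SeedDemo and the method draw are hypothetical names chosen for illustration. It shows that two java.util.Random instances built from the same seed produce the same sequence, which is why the seeded overload of rand is repeatable.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical demo class, not part of Hive.
class SeedDemo {

    // Draws `count` doubles from a Random created with the given seed.
    static double[] draw(long seed, int count) {
        Random random = new Random(seed);
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextDouble(); // each value is in [0.0, 1.0)
        }
        return values;
    }

    public static void main(String[] args) {
        // Same seed, same sequence: both lines print identical arrays.
        System.out.println(Arrays.toString(draw(42L, 3)));
        System.out.println(Arrays.toString(draw(42L, 3)));
    }
}
```

The same seed gives the same values on every run, so a seeded generator is of no use for producing unique identifiers.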

 

 

Here is my UUID UDF:

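import java.util.UUID;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.io.Text;

/**
 * Returns a UUID as 32 hexadecimal characters (no hyphens),
 * e.g. F18031C69D8345DEB305D4B2E796A282, similar to Oracle's SYS_GUID().
 *
 * @author zm
 */
// Non-deterministic: Hive must call evaluate() for every row.
@UDFType(deterministic = false)
public class SysGuidFun extends UDF {

    public Text evaluate() {
        // randomUUID() yields e.g. "f18031c6-9d83-45de-b305-d4b2e796a282"
        String id = UUID.randomUUID().toString();
        // Drop the hyphens and upper-case to match the example output above
        id = id.replace("-", "").toUpperCase();

        return new Text(id);
    }

}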
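
The string handling can be checked without a Hive dependency. The following is a minimal sketch, not the UDF itself: the class GuidSketch and the method sysGuidLike are hypothetical names, and the upper-casing is an assumption made so that the output matches the SYS_GUID()-style example in the comment above.

```java
import java.util.UUID;

// Hypothetical standalone class for testing the formatting logic without Hive.
class GuidSketch {

    // Returns a random UUID as 32 upper-case hexadecimal characters.
    static String sysGuidLike() {
        // UUID.toString() is lower-case with four hyphens (36 characters).
        String id = UUID.randomUUID().toString();
        // Removing the hyphens leaves 32 characters; upper-casing is an assumption.
        return id.replace("-", "").toUpperCase();
    }

    public static void main(String[] args) {
        // Prints a different 32-character value on each call.
        System.out.println(sysGuidLike());
        System.out.println(sysGuidLike());
    }
}
```

The values are random (version 4) UUIDs, so they look like SYS_GUID() output but are generated differently from Oracle's.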

 

 

 

 

 

 

 
