Performance Management

Optimizing Hive - Big Data

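a. Replace MapReduce with Tez: Tez is an execution engine that runs a query as a single DAG of tasks rather than a chain of MapReduce jobs, and it offers an API built to handle petabytes of data across clusters.
b. Use the ORC file format: Its columnar, compressed storage gives much better read performance.
c. Partitioning: Partition tables according to the logical requirements of the data, typically on the columns that queries filter on most, so that Hive can skip irrelevant partitions.
d. Bucketing: The next level below partitioning. It is most useful when many different requirements have to be served from the same data, because it lets users read the data in the way each requirement needs.
e. Vectorization: Lets Hive process rows in batches rather than one at a time for operations such as scans, aggregations, filters, and joins.
f. CBO (cost-based optimization): Analyze resource usage before finalizing your parallelism. Check the cost of each process and compare them.
g. Indexing: Make sure your tables are indexed. (Index support was removed in Hive 3.0, so this applies to earlier versions.)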
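As a rough illustration, most of the points above map onto a handful of session settings and one table definition. The sketch below is only an assumption about how they might be combined: the table sales_orc, its columns, and the bucket count of 32 are hypothetical, and the right values depend on your data and Hive version.

```sql
-- a. Run queries on Tez instead of MapReduce
SET hive.execution.engine=tez;

-- e. Enable vectorized execution (batches of rows instead of row-at-a-time)
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

-- f. Enable the cost-based optimizer and let it use stored statistics
SET hive.cbo.enable=true;
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;

-- b, c, d. Hypothetical table: ORC storage, partitioned by date,
-- bucketed by customer (names and bucket count are illustrative only)
CREATE TABLE sales_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- f. Collect the column statistics the CBO relies on when costing plans
ANALYZE TABLE sales_orc PARTITION (order_date) COMPUTE STATISTICS FOR COLUMNS;
```

Running EXPLAIN on a query before and after collecting statistics is one way to compare the plans the optimizer chooses.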
