Posts

Showing posts from August, 2021

Optimizing Hive - Big Data

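a. Replace MapReduce with Tez: Tez runs a query as a single DAG of tasks instead of a chain of MapReduce jobs, which avoids writing intermediate results to HDFS between stages and lets Hive handle petabytes of data across clusters.
b. Use the ORC file format: its columnar layout, compression, and built-in statistics give much better read performance.
c. Partitioning: partition on the columns that queries logically filter on, so Hive reads only the partitions a query needs.
d. Bucketing: the next level below partitioning. It is most useful when several different requirements (joins, sampling) share the same data, because rows are hashed into a fixed number of buckets and each use can read only the buckets it needs.
e. Vectorization: enable vectorized execution so that scans, aggregations, filters, and joins process rows in batches (1024 rows by default) instead of one row at a time.
f. CBO (cost-based optimization): analyze resource usage before finalizing your parallelism. Collect table and column statistics, then check the estimated cost of each plan and compare them.
g. Indexing: make sure frequently filtered tables are indexed. Note that Hive 3.0 removed CREATE INDEX; on newer versions, rely on ORC's built-in indexes and bloom filters instead.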
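
As a rough illustration, points a to f above might look like this in HiveQL. This is only a sketch: the table page_views, its columns, and the bucket count of 32 are hypothetical and chosen for the example, while the SET properties are standard Hive configuration names. Indexing (g) is left out because CREATE INDEX is only available before Hive 3.0.

```sql
-- a. Run queries on Tez instead of MapReduce.
SET hive.execution.engine=tez;

-- e. Enable vectorized execution on the map side and the reduce side.
SET hive.vectorized.execution.enabled=true;
SET hive.vectorized.execution.reduce.enabled=true;

-- f. Enable the cost-based optimizer and let it read column statistics.
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;

-- b, c, d. Hypothetical table: stored as ORC, partitioned by day,
-- and bucketed on user_id. Names and the bucket count are assumptions.
CREATE TABLE page_views (
  user_id  BIGINT,
  url      STRING,
  duration INT
)
PARTITIONED BY (view_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.compress' = 'SNAPPY');

-- f. Collect table-level and column-level statistics for the optimizer.
ANALYZE TABLE page_views PARTITION (view_date) COMPUTE STATISTICS;
ANALYZE TABLE page_views PARTITION (view_date) COMPUTE STATISTICS FOR COLUMNS;

-- f. Inspect the plan before running the query for real.
-- The filter on view_date lets Hive prune to a single partition.
EXPLAIN
SELECT user_id, SUM(duration) AS total_duration
FROM page_views
WHERE view_date = '2021-08-01'
GROUP BY user_id;
```

The right bucket count and partition column depend on data volume and query patterns, so treat the values here as placeholders and compare EXPLAIN output before and after each change.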