Author Topic: Why Uber created Hudi, an open source incremental processing ...  (Read 45 times)

YELLO

In the process of rebuilding its Big Data platform, Uber created an open-source Spark library named Hudi (Hadoop Upserts anD Incrementals). This library lets users perform update, insert, and delete operations on existing Parquet data in Hadoop. It also allows data consumers to incrementally pull only the changed data, which significantly improves query efficiency. It is horizontally scalable, can be used from any Spark job, and the best part is that it relies only on HDFS to operate.
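To make the upsert/delete idea concrete, here is a toy sketch in plain Python of the record-key-based merge semantics described above. This is not Hudi's actual API (Hudi runs inside Spark against Parquet files); the function and field names here are purely illustrative.

```python
# Toy illustration of upsert/delete semantics on a keyed dataset.
# A "table" is a list of dicts keyed by 'record_key'; a change set can
# update, insert, or (via an illustrative '_deleted' flag) delete rows.

def upsert(table, changes):
    """Apply a batch of keyed changes to an existing table snapshot."""
    merged = {row["record_key"]: row for row in table}
    for change in changes:
        key = change["record_key"]
        if change.get("_deleted"):
            merged.pop(key, None)  # delete the existing record, if any
        else:
            # insert a new record or overwrite the existing one
            merged[key] = {k: v for k, v in change.items() if k != "_deleted"}
    return list(merged.values())

base = [
    {"record_key": "trip-1", "fare": 10.0},
    {"record_key": "trip-2", "fare": 7.5},
]
changes = [
    {"record_key": "trip-2", "fare": 8.0},       # update
    {"record_key": "trip-3", "fare": 12.0},      # insert
    {"record_key": "trip-1", "_deleted": True},  # delete
]
result = upsert(base, changes)
```

Without this kind of keyed merge, updating a single record in Hadoop means rewriting entire Parquet files, which is exactly the pain point Hudi addresses.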

Why was Hudi introduced?
Uber studied its current data content, data access patterns, and user-specific requirements to identify problem areas. This research revealed the following four limitations:

Scalability limitations in HDFS

The need for faster data delivery in Hadoop

No direct support for updates and deletes on existing data

The need for faster ETL and modeling
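The last two points are addressed by incremental pull: instead of rescanning the whole table, a downstream ETL job reads only records committed after its last checkpoint. The following is a toy plain-Python sketch of that idea; the commit-time field name and the function are illustrative assumptions, not Hudi's actual metadata columns or API.

```python
# Toy illustration of incremental pull: each record carries a commit
# timestamp, and a consumer reads only records newer than its checkpoint.

def incremental_pull(table, last_commit_time):
    """Return only the records written after the consumer's checkpoint."""
    return [row for row in table if row["commit_time"] > last_commit_time]

table = [
    {"record_key": "trip-1", "commit_time": "20181019T100000", "fare": 10.0},
    {"record_key": "trip-2", "commit_time": "20181019T110000", "fare": 7.5},
    {"record_key": "trip-3", "commit_time": "20181019T120000", "fare": 12.0},
]

# The consumer last checkpointed at 11:00, so only trip-3 is new to it.
delta = incremental_pull(table, "20181019T110000")
```

Because each run processes only the delta since the previous checkpoint, ETL and modeling jobs finish faster and data reaches consumers sooner.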

To learn about Hudi in detail, check out Uber's official announcement.


https://eng.uber.com/uber-big-data-platform/

https://hub.packtpub.com/why-uber-created-hudi-an-open-source-incremental-processing-framework-on-apache-hadoop/
« Last Edit: October 19, 2018, 11:33:15 AM by YELLO »