Uber is widely known as a platform that connects riders with drivers, but its success rests on more than transportation. Behind the scenes, Uber is a data and analytics powerhouse that uses data-driven decisions to optimize its operations. At the core of Uber’s analytics is Presto, an open-source SQL query engine that plays a critical role in that success.
Uber operates in more than 10,000 cities worldwide and processes over 18 million trips daily, which generates vast amounts of data. To handle this scale, Uber chose Presto, originally developed at Facebook, for its scalability and its separation of analytical compute from data storage. Presto’s cost-based query optimization and extensibility make it a versatile part of Uber’s analytics stack.
Uber initially used a traditional analytical database platform, but as its business grew, it needed a solution that could handle much larger data volumes and support near-real-time analysis. Presto’s distributed architecture and adherence to ANSI SQL made it a strong fit, and Uber was able to run queries faster than on its previous system.
As Uber’s use of Presto grew, the company became an active participant in the Presto open-source community and made significant contributions: it automated cluster management, improved workload management, and enhanced security features. Uber also applies Presto’s analytical capabilities to different data types, uses both out-of-the-box and custom functions, and supports real-time queries by pairing Presto with Apache Pinot.
Thanks to Presto, Uber benefits from speed and scalability, self-service analytics, data exploration and innovation, operational efficiency, federated data access, and real-time analytics. Presto has become an integral part of Uber’s data ecosystem, allowing the company to process petabytes of data, support diverse analytical use cases, and make informed decisions at an unprecedented scale.
If you’re interested in exploring Presto, you can visit the Presto website to get started. For production use, IBM watsonx.data, an open data lakehouse built on Presto, is one option; it adds querying, governance, and open data formats for accessing and sharing data.
FAQ
What is Presto?
Presto is an open-source distributed SQL query engine designed for data analytics. It can handle datasets of all sizes, from gigabytes to petabytes, and supports a wide range of analytical use cases. Because Presto separates analytical compute from data storage, processing capacity can scale independently of where the data lives, and its cost-based optimizer plans queries efficiently.
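To illustrate what separating the query engine from storage can look like in practice, here is a minimal Python sketch that assembles a federated Presto query. It assumes two catalogs named `hive` and `pinot` are configured, and the schema, table, and column names are hypothetical placeholders rather than anything from Uber’s actual data model. The only Presto convention it relies on is addressing tables as `catalog.schema.table`.

```python
# Sketch only: catalog, schema, table, and column names are hypothetical.

def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name (catalog.schema.table)."""
    return f"{catalog}.{schema}.{table}"


def build_federated_query() -> str:
    """Join live events in one catalog with reference data in another."""
    events = qualified("pinot", "default", "trip_events")  # assumed real-time store
    cities = qualified("hive", "reference", "cities")      # assumed data-lake table
    return (
        "SELECT c.city_name, count(*) AS recent_events\n"
        f"FROM {events} AS e\n"
        f"JOIN {cities} AS c ON e.city_id = c.city_id\n"
        "GROUP BY c.city_name"
    )


if __name__ == "__main__":
    print(build_federated_query())
```

Presto resolves each catalog to its own connector, so a join like this runs in the engine while the data stays in its original systems.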
Why did Uber choose Presto?
Uber chose Presto for its scalability, speed, and support for ANSI SQL. As Uber’s data needs grew, Presto enabled the company to handle massive amounts of data and process queries quickly. Presto’s distributed architecture and advanced features made it a suitable choice for Uber’s analytics requirements.
How does Presto contribute to Uber’s success?
Presto plays a vital role in Uber’s data-driven success by providing speed and scalability, enabling self-service analytics, facilitating data exploration and innovation, optimizing operational efficiency, supporting federated data access, and allowing for real-time analytics. Presto empowers Uber to process large volumes of data, make informed decisions, and continuously improve its services.
How can I get started with Presto?
If you’re new to Presto, you can visit the Presto website and use its “Getting Started” resources to try it out. Alternatively, if you’re ready to use Presto in production, you can explore IBM watsonx.data, an open data lakehouse built on Presto with additional querying, governance, and open data formats.
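For a first experiment, a query can also be submitted programmatically. The sketch below uses only the Python standard library and Presto’s HTTP client protocol: a POST to `/v1/statement` with an `X-Presto-User` header, followed by GET requests to each `nextUri` until none remains. The coordinator address and user name are assumptions for a local test setup, and error handling is deliberately minimal.

```python
import json
import urllib.request


def build_statement_request(base_url: str, sql: str, user: str) -> urllib.request.Request:
    """Build the initial POST to the coordinator's /v1/statement endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/statement",
        data=sql.encode("utf-8"),
        headers={"X-Presto-User": user},
        method="POST",
    )


def run_query(base_url: str, sql: str, user: str = "demo") -> list:
    """Submit a query and collect all rows by following nextUri links."""
    rows = []
    request = build_statement_request(base_url, sql, user)
    while request is not None:
        with urllib.request.urlopen(request) as response:
            page = json.load(response)
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # Result rows arrive in pages; a page may carry no data yet.
        rows.extend(page.get("data", []))
        next_uri = page.get("nextUri")
        request = (
            urllib.request.Request(next_uri, headers={"X-Presto-User": user})
            if next_uri
            else None
        )
    return rows
```

With a coordinator assumed to be listening at `http://localhost:8080`, calling `run_query("http://localhost:8080", "SELECT 1")` would return the result rows as a list. For real workloads, an established Presto client library is likely a better choice than hand-written HTTP calls.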
Is Presto only used by Uber?
No, Presto is an open-source project used by many organizations worldwide. While Uber is a notable user of Presto, other companies across industries leverage Presto’s capabilities for their data analytics needs.
Summary
Uber’s success as a data-driven company relies heavily on Presto, an open-source SQL query engine. Presto enables Uber to process large volumes of data, support various analytical use cases, and make informed decisions at an unprecedented scale. Uber’s contributions to the Presto open-source community have further enhanced Presto’s capabilities for organizations worldwide. If you’re interested in using Presto, you can explore the Presto website or consider IBM watsonx.data for production use.