Join us at the next event in our free technical meetup series, "Dynamic Talks", in San Francisco! Dynamic Talks is an ongoing meetup series featuring technical talks from leading experts in tech in major cities around the US. Hear talks on the most innovative subjects in AI, ML, voice platforms, the Cloud, and search, and enjoy an evening of networking at the Microsoft Reactor space in San Francisco. We hope to see you there!
Topic: "Big Data Analytics"
Agenda
[6:00 pm - 6:30 pm]: Guests arrive, food and drinks are served
[6:30 pm - 7:15 pm]: The first talk will be by Max Martynov on "Implementing data quality automation with open source stack," followed by a Q&A
[7:15 pm - 7:30 pm]: Networking break
[7:30 pm - 8:15 pm]: Egor Pakhomov will present the second talk on "Writing Spark pipelines with less boilerplate code", followed by a Q&A
[8:15 pm - 9:00 pm]: More networking, closing remarks, and the event concludes
Max Martynov talk details:
Title: "Implementing data quality automation with open source stack."
Abstract: The quality of business decisions, machine learning insights, and executive reports depends on the quality and integrity of the underlying data. Data can get corrupted in an analytical data platform in many ways, from de-synchronization with the system-of-record to defects in data pipelines. We will show how to detect and prevent data corruption with automation, open-source tools, and machine learning.
About Max Martynov:
Max Martynov is the VP of Technology at Grid Dynamics, leading High-Performance Computing practices for enterprises. Over the last decade, his technical focus has evolved from HPC and scalable distributed platforms to Cloud, Big Data, DevOps, and Microservices architecture. He is also an author of the book "Continuous Delivery Blueprint: Software change management for enterprises in the era of cloud, microservices, DevOps, and automation", which is now available on Amazon.
Egor Pakhomov talk details:
Title: "Writing Spark pipelines with less boilerplate code"
Abstract: Apache Spark is a general-purpose big data execution engine. You can work with different data sources using the same set of APIs in both batch and streaming mode. Such flexibility is great if you are an experienced Spark developer solving a complicated data engineering problem, which might include ML or streaming. At Airbnb, 95% of all data pipelines are daily batch jobs that read from Hive tables and write to Hive tables. For such jobs, you would like to trade some flexibility for more extensive functionality around writing to Hive or orchestrating multi-day processing. Another advantage of reducing flexibility is that it creates "best practices" that less experienced data engineers can follow. At Airbnb, we've created a framework called "Sputnik," which tries to address these issues. In this talk, I'll show the typical boilerplate code that Sputnik tries to reduce and the concepts it introduces to simplify pipeline development.
About Egor Pakhomov:
Egor is a Spark contributor and Senior Software Engineer at Airbnb, where he works on infrastructure to simplify creating and managing Spark pipelines. Before joining Airbnb, he worked at Apple on configurable, high-load streaming and batch pipelines. At Anchorfree, Egor led the development team responsible for a solution on top of Hadoop that enabled effective data engineering. This solution included an in-house DSL for defining DAGs of Spark jobs, Apache Zeppelin, Impala, and Tableau. Egor has been working with Apache Spark since version 0.9.
Parking is available in the Moscone parking garage at 255 3rd Street, San Francisco, CA 94103, and on the street nearby. Alternatively, use public transport when possible.
| Ticket Information | Ticket Price |
| RSVP | Free |