WHUG #36: „Big Data on Kubernetes” and „Apache Beam – what do I gain?”

When:
25 October 2018 @ 18:00 – 20:00 (CEST)
Where:
Sandomierska 13
Warsaw
Poland

Lokal na Mokotowie
Sandomierska 13 · Warsaw

We are happy to invite you to the 36th meetup of WHUG. We will host two guests: Maciej Bryński and Łukasz Gajowy. Please note that we are going to meet at the pub „Lokal na Mokotowie” (http://localnamokotowie.pl/) at Sandomierska 13, Warsaw (entrance from Rejtana)!
Please find the details of the meetup below.

1) Title: Big Data on Kubernetes

Abstract:
In my presentation I’d like to show how different Big Data technologies can be used on a Kubernetes cluster.
These include Kafka, HDFS, Spark and Flink.
I will talk about my experience and the problems that need to be solved. I will also present how to use Kubernetes with an existing Hadoop infrastructure.
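To give a flavour of what running Spark on Kubernetes involves (this is a sketch, not material from the talk): since version 2.3, Spark can submit jobs directly to a Kubernetes cluster via `spark-submit`. The API server URL, container image and jar path below are placeholders for your own environment.

```shell
# Submit the bundled SparkPi example to a Kubernetes cluster using
# Spark's native Kubernetes scheduler (Spark 2.3+).
# <k8s-apiserver-host> and <your-spark-image> are placeholders.
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
```

The driver then runs as a pod inside the cluster and spawns executor pods itself, which is exactly the kind of setup (and its pitfalls) the talk promises to cover.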

BIO:
Big Data Architect at XCaliber Poland. Apache Spark Instructor and Contributor. Big fan of other Big Data technologies including Apache Flink, Kafka, Cassandra and many more. In addition to designing Big Data systems and processes on a daily basis, he also possesses hands-on expertise with tools needed to create those systems from scratch.

2) Title: Apache Beam – what do I gain?

Abstract:
The Dataflow model known from Flink and Google Cloud offers an “approach shift” when dealing with data. We no longer treat Stream as a special case of Batch and try to fit it in finite chunks – we use a well-designed Unified Model to implement both Batch and Stream scenarios in a consistent manner.
“But I want to use Spark, so this is not for me…” Try Apache Beam. It also implements the Dataflow model but (and this is new) it abstracts away the data processing backend. What if you could use this Unified Model once and run it on a runner of your choice?
“But we only do Python!” Have you tried Beam’s multiple SDKs (Java, Python, Go, Scala)? Beam (once it gets there) will be portable to every runner with every SDK that a developer has used.
Choose your language, write code once, run on any backend you want. Those are the goals the project aims to achieve. I’ll go through the basics of the Dataflow model. I’ll talk about Beam in more detail and familiarize you with the current state of the project. If there’s time, I’ll also try to briefly show the current most important efforts in the project (such as portability).

BIO:
Łukasz Gajowy is an engineer interested in distributed processing and open-source software development. He got into both topics so deeply that he recently became an Apache Beam committer. Besides that, Łukasz works at Polidea, received an MSc in Information Technology from the Warsaw University of Technology, has 6 years of professional experience (mostly in JVM areas) and enjoys jogging in his free time.

https://www.meetup.com/warsaw-hug/events/255227113/