Ankur Ranjan

31 Followers

Feb 26

Stateful transformations in Spark Streaming — Part 1 | Spark Streaming Session 3

In the previous article of this series, Spark Streaming in layman’s terms, we covered the following topics: different streaming sources, and stateful vs. stateless transformations. For those who are reading this article without having read the previous one, I recommend reading it or watching the…

Spark

6 min read

Feb 5

Spark Streaming: Session 2

In the first article of the Spark Streaming series, we covered several important concepts and wrote word count code to understand them. For those who are reading this article without having read the first article of this series, I recommend reading the previous article or watching…

4 min read

Feb 5

Spark Streaming — Part 1

A few months back, I was given a codebase that used Spark Streaming and was written in Scala. We were supposed to make major changes and move to a new version of the project. I had worked with Spark Structured Streaming before, but not with Spark Streaming. I found…

6 min read

Published in Towards Dev · Dec 25, 2022

Disaster Recovery in Kafka Servers

I recently tried onboarding disaster management for our streaming pipeline, which involves Kafka, Spark Streaming and MongoDB, in one of our use cases at Walmart Global Tech. The last few weeks were a good learning curve for me, and I really enjoyed all these awesome implementations of the streaming pipeline…

Kafka

8 min read

Sep 12, 2022

File Formats in Big Data World — Part 1

One of the most fundamental decisions to make in the Data Engineering world is to choose the proper file formats in different zones of the Big Data Pipeline. It helps the team to fetch the data faster and lower the cost of the project. …

Data Engineering

7 min read

Mar 22, 2022

Demystify different compression codec in big data

When working with big data file formats such as Parquet, ORC and Avro, you will often come across different compression codecs such as Snappy, LZO, gzip and bzip2. In this article, we will try to understand some of these compression codecs and discuss the fundamental differences between them. Before starting…

Big

4 min read

Mar 21, 2022

LeetCode Curated SQL Solutions and Discussion — Week 1

SQL is a must when you work in the data domain. Whether you are in Data Engineering, Big Data, Data Analysis or BI development, everyone who works with data should have a good understanding of SQL. I feel that only reading SQL theory is not going to help that much. So…

Leetcode

6 min read

Apr 1, 2021

Apache Sqoop - One smart tool for Big Data World.

When we talk about the big data world, three things are always involved: storage, processing and scalability. We want to store a massive amount of data, process it efficiently in a given amount of time and, above all, design systems that are highly…

Sqoop

8 min read

Mar 25, 2021

I think there is one mistake in the definition of Executors, i.e. `Executor in Spark are worker nodes`. Actually, Executors are not worker nodes; they are processes that run on the worker nodes. And I must agree this is really very well written.

1 min read

Jul 28, 2020

Apache Airflow — Part 1

Every programmer loves automating their stuff. Learning and using any automation tool is fun for us. A few months ago, I came across a wonderful open source project called apache-airflow. I have tried to explore this open source project and use it in my existing codebase. This blog series is…

Airflow

5 min read

Data Engineer III @Walmart
