Apache Hudi: Copy on Write (CoW) Table
As Data Engineers, we frequently encounter the tedious task of performing multiple UPSERT (update + insert) and DELETE operations in batch…
Oct 6, 2023

Supercharging Apps with Polyglot Persistence: A Simple Guide
After working for more than 4 years on data-intensive applications in startup, consultancy, and product-based companies, I think that the…
Sep 4, 2023

Solve Small File Problem using Apache Hudi
One of the biggest pains for Data Engineers is the small file problem.
Aug 25, 2023

Optimize MERGE job in BigQuery
Published in Dev Genius
I love BigQuery and think it is one of the best products ever made by the Google Cloud Platform.
Jul 31, 2023

Stateful transformations in Spark Streaming — Part 1 | Spark Streaming Session 3
In the previous article of this series, i.e. Spark Streaming in layman’s terms, we covered the following things.
Feb 26, 2023

Spark Streaming: Session 2
In the first article of the Spark Streaming series, we covered the following important concept.
Feb 5, 2023

Spark Streaming — Part 1
A few months back, I was given a codebase that used Spark Streaming and was written in Scala. We were supposed to make major changes…
Feb 5, 2023

Disaster Recovery in Kafka Servers
Published in Towards Dev
I recently tried onboarding disaster management for our streaming pipeline, which involves Kafka, Spark Streaming and MongoDB, in one of our…
Dec 25, 2022

File Formats in Big Data World — Part 1
One of the most fundamental decisions to make in the Data Engineering world is to choose the proper file formats in different zones of the…
Sep 12, 2022

Demystify different compression codec in big data
When working with big data file formats like Parquet, ORC, Avro, etc., you will often come across different compression codecs like…
Mar 22, 2022

LeetCode Curated SQL Solutions and Discussion — Week 1
SQL is a must when you are in the domain of Data. Let it be Data Engineering, Big Data, Data Analyst or BI Developer, everyone who is…
Mar 21, 2022

Apache Sqoop — One smart tool for Big Data World.
Published in Analytics Vidhya
When we talk about the big data world, there are always three things involved: storage, processing, and scalability. Here we…
Apr 1, 2021

I think there is one mistake in the definition of Executors i.e.
And I must agree this is really very well written.
Mar 25, 2021

Apache Airflow — Part 1
Published in Analytics Vidhya
Every programmer loves automating their stuff. Learning and using any automation tool is fun for us. A few months ago, I came across a…
Jul 28, 2020

Part 2 - A Beginners Guide to Time profiling in Python.
Published in Analytics Vidhya
Hello folks, welcome back. If you are joining from my last blog, then most of the context about time profiling has already been set. If you…
Jul 5, 2020

A Beginners Guide to Time profiling in Python.
Published in Analytics Vidhya
Writing Python code gives us great power to showcase our ideas by easily programming them, but as someone has rightly said, “With great…
Jun 27, 2020