Apache Hudi: Copy on Write (CoW) Table
As Data Engineers, we frequently encounter the tedious task of performing multiple UPSERT (update + insert) and DELETE operations in batch… (Oct 6, 2023)
Supercharging Apps with Polyglot Persistence: A Simple Guide
After working for more than 4 years on data-intensive applications in startup, consultancy, and product-based companies, I think that the… (Sep 4, 2023)
Solve Small File Problem using Apache Hudi
One of the biggest pains for Data Engineers is the small file problem. (Aug 25, 2023)
Optimize MERGE job in BigQuery
Published in Dev Genius
I love BigQuery and think it is one of the best products ever made by the Google Cloud Platform. (Jul 31, 2023)
Stateful transformations in Spark Streaming — Part 1 | Spark Streaming Session 3
In the previous article of this series, Spark Streaming in layman’s terms, we covered the following things. (Feb 26, 2023)
Spark Streaming: Session 2
In the first article of the Spark Streaming series, we covered the following important concept. (Feb 5, 2023)
Spark Streaming — Part 1
A few months back, I was given a codebase that used Spark Streaming and was written in Scala. We were supposed to make major changes… (Feb 5, 2023)
Disaster Recovery in Kafka Servers
Published in Towards Dev
I recently tried onboarding disaster management for our streaming pipeline, which involves Kafka, Spark Streaming, and MongoDB, in one of our… (Dec 25, 2022)
File Formats in Big Data World — Part 1
One of the most fundamental decisions to make in the Data Engineering world is to choose the proper file formats in different zones of the… (Sep 12, 2022)
Demystify different compression codec in big data
When working with big data file formats like Parquet, ORC, or Avro, you will often come across different compression codecs like… (Mar 22, 2022)