Ankur Ranjan
Apache Hudi: Copy on Write (CoW) Table
As Data Engineers, we frequently encounter the tedious task of performing multiple UPSERT (update + insert) and DELETE operations in batch…
Oct 6, 2023
Ankur Ranjan
Supercharging Apps with Polyglot Persistence: A Simple Guide
After working for more than 4 years on data-intensive applications in a startup, a consultancy, and product-based companies, I think that the…
Sep 4, 2023
Ankur Ranjan
Solve Small File Problem using Apache Hudi
One of the biggest pains for Data Engineers is the small file problem.
Aug 25, 2023
Ankur Ranjan in Dev Genius
Optimize MERGE job in BigQuery
I love BigQuery and think it is one of the best products ever made by Google Cloud Platform.
Jul 31, 2023
Ankur Ranjan
Stateful transformations in Spark Streaming — Part 1 | Spark Streaming Session 3
In the previous article of this series, i.e. Spark Streaming in layman’s terms, we covered the following things.
Feb 26, 2023
Ankur Ranjan
Spark Streaming: Session 2
In the first article of the Spark Streaming series, we covered the following important concept.
Feb 5, 2023
Ankur Ranjan
Spark Streaming — Part 1
A few months back, I was given a codebase that used Spark Streaming and was written in Scala. We were supposed to make major changes…
Feb 5, 2023
Ankur Ranjan in Towards Dev
Disaster Recovery in Kafka Servers
I recently tried onboarding disaster management for our streaming pipeline, which involves Kafka, Spark Streaming and MongoDB, in one of our…
Dec 25, 2022
Ankur Ranjan
File Formats in Big Data World — Part 1
One of the most fundamental decisions to make in the Data Engineering world is to choose the proper file formats in different zones of the…
Sep 12, 2022
Ankur Ranjan
Demystify different compression codec in big data
When working with big data file formats like Parquet, ORC, Avro, etc., you will mostly come across different compression codecs like…
Mar 22, 2022