

Spark: Key Topics for Data Engineering Interviews Part - 2

Key Topics for Data Engineering Interviews

Pravash
5 min read · Apr 29, 2025

In this continuation (Key Topics for Data Engineering Interviews Part - 1), I will explore more crucial concepts that not only illuminate the inner workings of Spark but also serve as key markers in Spark interviews.

Whether you're gearing up for a technical discussion or simply looking to deepen your understanding, this exploration promises to be a rewarding dive into the core of how Spark works.

Let's get started.

1️⃣1️⃣ Window Functions vs. Group By

groupBy:

  • Functionality: Aggregates multiple rows into a single result per group.
  • When to Use: Use GROUP BY when you need pure aggregation, such as totals, averages, or counts, without needing details about individual rows.
  • Performance: Efficient for simple aggregations over grouped datasets; however, a shuffle still happens, especially with large groups.
  • Example (expanded into a runnable sketch right after this list):
    avg_salary_df = df.groupBy("dept_id").agg(avg("salary").alias("avg_salary"))
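
Expanded into a self-contained sketch, this is roughly what the groupBy aggregation looks like end to end. The SparkSession setup and the employees data here are hypothetical, purely for illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import avg

    spark = SparkSession.builder.appName("groupby-vs-window").getOrCreate()

    # Hypothetical employees data: (emp_id, dept_id, salary)
    df = spark.createDataFrame(
        [(1, "D1", 50000), (2, "D1", 70000), (3, "D2", 60000)],
        ["emp_id", "dept_id", "salary"],
    )

    # groupBy collapses all rows of a department into one aggregated row
    avg_salary_df = df.groupBy("dept_id").agg(avg("salary").alias("avg_salary"))
    avg_salary_df.show()
    # One row per dept_id; the original row-level detail is gone:
    # dept_id D1 -> avg_salary 60000.0, dept_id D2 -> avg_salary 60000.0

Note how the output keeps only the grouping column and the aggregate, which is exactly why groupBy fits reporting-style questions (totals, averages, counts) rather than row-level enrichment.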

window:

  • Functionality: Adds aggregated information to each row without collapsing rows, as shown in the sketch below.
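
To make the contrast concrete, here is a minimal window-function sketch that adds the department average to every employee row. It recreates the same hypothetical employees DataFrame used in the groupBy example above:

    from pyspark.sql import SparkSession, Window
    from pyspark.sql.functions import avg

    spark = SparkSession.builder.appName("groupby-vs-window").getOrCreate()

    # Same hypothetical employees data: (emp_id, dept_id, salary)
    df = spark.createDataFrame(
        [(1, "D1", 50000), (2, "D1", 70000), (3, "D2", 60000)],
        ["emp_id", "dept_id", "salary"],
    )

    # Define a window partitioned by department; the aggregate is computed per dept_id
    dept_window = Window.partitionBy("dept_id")

    # Every row is kept; each one simply gains the department average as a new column
    df_with_avg = df.withColumn("avg_dept_salary", avg("salary").over(dept_window))
    df_with_avg.show()
    # emp_id=1, dept_id=D1, salary=50000, avg_dept_salary=60000.0
    # emp_id=2, dept_id=D1, salary=70000, avg_dept_salary=60000.0
    # emp_id=3, dept_id=D2, salary=60000, avg_dept_salary=60000.0  (row order may vary)

Because no rows are collapsed, this pattern is the natural fit for questions like "show each employee alongside their department's average salary".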

