sum("a", 0, 1) → column is named a_sum
average("a", 0, 1) → column is named a_average
count_distinct("a") → column is named… you guessed it, count_distinct(a) 🙃
Honestly though, the count_distinct naming is much nicer (and more consistent with PySpark). @Maegereg @tmager what would you think of using the parenthesized form as the default everywhere?
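
To make the comparison concrete, here is a rough sketch of the two default-naming schemes side by side. The helper names `suffix_name` and `paren_name` are hypothetical and not taken from the codebase, and joining multiple columns with ", " is my assumption about what the parenthesized form would look like for multi-column aggregations.

```python
def suffix_name(func: str, column: str) -> str:
    """Suffix style, as sum/average produce today: '<column>_<func>'."""
    return f"{column}_{func}"


def paren_name(func: str, *columns: str) -> str:
    """Parenthesized style, as count_distinct produces today: '<func>(<cols>)'.

    Assumption: multiple columns are joined with ', '.
    """
    return f"{func}({', '.join(columns)})"


if __name__ == "__main__":
    # Current defaults for the three examples above.
    print(suffix_name("sum", "a"))
    print(suffix_name("average", "a"))
    print(paren_name("count_distinct", "a"))

    # What sum/average would look like if the parenthesized form were the default.
    print(paren_name("sum", "a"))
    print(paren_name("average", "a"))
```

One thing to keep in mind: the parenthesized names contain characters that are awkward in some downstream contexts (e.g. attribute-style column access), so callers may want to pass an explicit name more often.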