It's a minor annoyance that comes up often in GitHub comments: syntax highlighting for Python console sessions. You want the input code (after the prompt) to be highlighted, but the output (which is generally just text or logs) to remain neutral. Turns out there's a syntax highlighter that does just that.
A lightweight method to interrupt (hung) Python processes after a set time using the
Python's logging functionality is very nice once you get the hang of it, but many people either disagree or don't bother to use it. Learn how to redirect other people's pesky print statements into your nice logging setup.
One of the handy features that makes (Py)Spark more flexible than database tools like Hive even for just transforming tabular data is the ease of creating User Defined Functions (UDFs). However, one thing that still remains a little annoying is that you have to separately define a function and declare it as a UDF. With four lines of code you can clean those definitions right up.