
> Compared to pandas? No, it's not.

I'm not saying you shouldn't use pandas; it depends on the size of the data. Right now I'm working on a project where a SELECT * of the entire fact table would be a couple hundred gigabytes.

The flow is SQL -> pandas -> manipulation, and as always in pipelines like that, the more work you can do at the earliest stage, the better.
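
To make that concrete, here's a rough sketch using an in-memory SQLite database. The `fact_events` table, its columns, and the rows are all made up for illustration; the point is only that the filter and the aggregation run in the database, so far fewer rows reach the client.

```python
import sqlite3

# Hypothetical fact table; names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_events (user_id INTEGER, action TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_events VALUES (?, ?, ?)",
    [(1, "buy", 10.0), (1, "buy", 5.0), (2, "buy", 7.5), (2, "view", 0.0), (3, "view", 0.0)],
)

# Filter and aggregate in the database: one row per buying user comes back.
query = """
    SELECT user_id, COUNT(*) AS purchases, SUM(amount) AS revenue
    FROM fact_events
    WHERE action = 'buy'
    GROUP BY user_id
    ORDER BY user_id
"""
summary = conn.execute(query).fetchall()

# The alternative: pull every row and do the same work client-side.
all_rows = conn.execute("SELECT * FROM fact_events").fetchall()
```

If you want the result in a DataFrame, the same query string can be handed to `pandas.read_sql_query(query, conn)`; only the aggregated rows get loaded.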



Yeah, speaking as a data person, the SQL argument is correct. Python/R are much, much, much slower for this kind of work.

OTOH, SQL is super limiting for a lot of data analysis tasks and you'll inevitably need the data in weird forms that require lots of munging.

Personally, I'm a big fan of using SQL/Airflow/whatever to generate the data I'll need all the time at a high level of granularity (user/action, etc.), and then just running a (very quick) SQL query to pull whatever I need into my analytics environment.

Gives you the best of both worlds, IME.


> OTOH, SQL is super limiting for a lot of data analysis tasks and you'll inevitably need the data in weird forms that require lots of munging.

OTGH, to some (I suspect often rather large) extent, that's because most people (I suspect including many "data scientists") are pretty bad at doing their data munging in SQL.


Perhaps (although this tends to be a core skill for any experienced DS - I 100% would not hire anyone without basic SQL, as it's just setting them up for failure). SQL is the only tool I've used everywhere I worked.

But SQL is bad at lots of stuff. For example, if you need to compare id1 and id2 (with some kind of locality sensitive hash or something), that requires a full join, which is prohibitive in any large data environment.

And honestly, writing 50 case when statements when I could write a function in R or Python is not my idea of a good time.
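
As a small sketch of what I mean (the age bands and labels here are hypothetical, not from any real schema): in SQL each band would be its own CASE WHEN branch, while in Python it's one lookup over a sorted list of boundaries.

```python
from bisect import bisect_right

# Hypothetical age bands; in SQL each band would be one CASE WHEN branch.
BOUNDS = [18, 25, 35, 50, 65]
LABELS = ["<18", "18-24", "25-34", "35-49", "50-64", "65+"]

def age_band(age: int) -> str:
    """Return the label of the band that contains `age`."""
    # bisect_right counts how many boundaries are <= age,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(BOUNDS, age)]

bands = [age_band(a) for a in (17, 18, 34, 35, 70)]
```

Adding a band means editing two lists rather than adding another branch.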

I love SQL, and do a lot with it, but there are definitely cases where it's the wrong tool. As an example, retention analyses are really annoying to do in SQL, but pretty easy in R/Python (to be fair, the article actually provides a solution here, but it's not standard).
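
For what it's worth, here's a minimal sketch of a cohort retention calculation in plain Python. The `events` log is invented, and "cohort = first week seen" is just one assumed definition of retention; this isn't the article's approach.

```python
from collections import defaultdict

# Hypothetical activity log: (user_id, week_number) pairs.
events = [(1, 0), (1, 1), (1, 2), (2, 0), (2, 2), (3, 1), (3, 2)]

# Cohort = the first week a user was seen.
first_week = {}
for user, week in sorted(events, key=lambda e: e[1]):
    first_week.setdefault(user, week)

# active[cohort][offset] = users active `offset` weeks after joining.
active = defaultdict(lambda: defaultdict(set))
for user, week in events:
    cohort = first_week[user]
    active[cohort][week - cohort].add(user)

# Retention = share of the cohort (its offset-0 users) active at each offset.
retention = {
    cohort: {off: len(users) / len(offsets[0]) for off, users in sorted(offsets.items())}
    for cohort, offsets in sorted(active.items())
}
```

With a DataFrame this gets shorter still, but even without one the whole thing is a couple of dictionary passes.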



