
I am a pandas user considering refactoring part of my ETL pipeline to SQL. I see the trade-off as memory efficiency vs. expressiveness, and for simple queries on big data, SQL wins. Would you disagree that pandas/Python is more expressive than SQL? I'm less experienced in SQL, but from what I have seen so far, pandas seems clearly more expressive. What is the SQL equivalent of pandas' .apply(lambda x)?


ClickHouse has lambdas for arrays. They are very useful. Here's an example.

  WITH ['a', 'bc', 'def', 'g'] AS array
  SELECT arrayFilter(v -> (length(v) > 1), array) AS filtered
  
  ┌─filtered─────┐
  │ ['bc','def'] │
  └──────────────┘
The lambda in this case is a selector for strings with more than one character. I would not argue that ClickHouse lambdas are as general as pandas, but they are mighty useful. There are more examples in the following article.

https://altinity.com/blog/harnessing-the-power-of-clickhouse...
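For comparison with the Python side of the question, here is a minimal sketch of the same filter in plain Python. It uses only built-ins; the variable names `keep` and `filtered` are just illustrative.

```python
# Plain-Python counterpart of the arrayFilter query above.
array = ["a", "bc", "def", "g"]

# Same predicate as the ClickHouse lambda: keep strings longer than one character.
keep = lambda v: len(v) > 1

filtered = list(filter(keep, array))
print(filtered)  # ['bc', 'def']
```

The lambda has the same shape in both cases: a predicate applied to each element of the array.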


Please define an example lambda function so we can see whether there is an SQL equivalent. IMO, more than 90% of the data issues I have seen can be solved with SQL queries. I have seen some in the finance sector that would require super complex UDFs, but other than these, SQL is the first choice for solving a data issue.
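To make the UDF point concrete, here is a rough sketch of registering a Python lambda as a SQL function. It assumes SQLite through Python's standard sqlite3 module, which is not a database anyone in this thread mentioned, and the `trades` table, its columns, and the `notional` function are hypothetical names made up for the illustration.

```python
import sqlite3

# In-memory SQLite database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (ticker TEXT, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("AAA", 10.0, 3), ("BBB", 2.5, 4)],
)

# Register a Python lambda as a scalar SQL function "notional" taking 2 arguments.
conn.create_function("notional", 2, lambda price, qty: price * qty)

# The UDF is called once per row, much like a row-wise .apply in pandas.
rows = conn.execute(
    "SELECT ticker, notional(price, qty) FROM trades ORDER BY ticker"
).fetchall()
print(rows)  # [('AAA', 30.0), ('BBB', 10.0)]
```

This is roughly what df.apply(lambda row: row.price * row.qty, axis=1) does in pandas. A calculation this simple would not need a UDF at all, since plain SQL arithmetic covers it; the sketch only shows the mechanism.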


I do work in the finance sector and need complex functions :)

And it’s interesting that pandas was invented at a hedge fund, AQR.

I agree that SQL may be better for vanilla BI.



