A story of a large loop with a long instruction dependency chain (johnnysswlab.com)
39 points by signa11 on March 1, 2024 | 4 comments


Since the site seems to have been hugged to death, here's an archive link: https://web.archive.org/web/20240229063944/https://johnnyssw...


At least now they're not exceeding their bandwidth cap ;)


When I was coding an AVX2 matmul last month, having multiple dot product dependency chains operating in parallel was the single biggest thing that brought it into the same league of performance as MKL. It was a night-and-day difference the first time I ran the program after doing that. Using lookaside L1 cache didn't help me very much, since it worked better to share register loads across operations.
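
To illustrate the idea of parallel dependency chains, here is a minimal scalar sketch in C. It is not the commenter's AVX2 code: the function names dot_one_chain and dot_four_chains and the choice of four accumulators are assumptions made for illustration. The first version carries a single accumulator, so each add has to wait for the previous one. The second splits the sum across independent accumulators, which gives the CPU several chains it can overlap.

```c
#include <stddef.h>

/* One accumulator: each add depends on the result of the previous add,
   so the loop forms a single long dependency chain. */
float dot_one_chain(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += a[i] * b[i];
    }
    return acc;
}

/* Four accumulators (the count is an illustrative assumption): the four
   updates in the loop body do not depend on each other, so they can be
   in flight at the same time. */
float dot_four_chains(const float *a, const float *b, size_t n) {
    float acc0 = 0.0f, acc1 = 0.0f, acc2 = 0.0f, acc3 = 0.0f;
    size_t i = 0;

    /* Main loop: process four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        acc0 += a[i]     * b[i];
        acc1 += a[i + 1] * b[i + 1];
        acc2 += a[i + 2] * b[i + 2];
        acc3 += a[i + 3] * b[i + 3];
    }

    /* Tail: handle the remaining 0 to 3 elements. */
    for (; i < n; i++) {
        acc0 += a[i] * b[i];
    }

    /* Combine the partial sums at the end. */
    return (acc0 + acc1) + (acc2 + acc3);
}
```

Splitting the sum changes the order of floating-point additions, so the two versions can differ in the last bits for general inputs. Compilers normally will not make this transformation on their own without flags that permit reassociation, which is why it is often written by hand. The same pattern applies to vector registers, with each accumulator holding a vector of partial sums.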


[flagged]


The punchline is inside the article:

> We at Johnny’s Software Lab LLC are experts in performance.

Comment not intended to come across as snidey, just tickled me a bit after reading yours and clicking the archive link.



