
Again, the kernel will use a simple LRU algorithm, where the granularity is the page.

I don't think it's accurate to describe any performance critical part of the Linux kernel as "simple." For an overview of the page replacement policy, see http://kerneltrap.org/node/7608. I wondered if CLOCK-Pro [1, 2] had made it into the kernel yet, but it looks like it has not.

This author makes compelling arguments for implementing application level paging. But the nice thing about doing systems work is we never have to rely on arguments alone to evaluate something - show me numbers.

[1] http://linux-mm.org/ClockProApproximation

[2] http://www.cse.ohio-state.edu/~fchen/paper/papers/usenix05.p...



There was no mention of Linux in the article. This system also runs on the various *BSDs, Solaris, MacOS...


Then it's worth noting that CLOCK-Pro is used in NetBSD.

My point here is that a lot of people have spent a lot of time working on the page replacement problem. I am very open to the idea that an application can beat the page replacement policy in the underlying kernel, for a variety of reasons. But: numbers. Always evaluate. The implementation of these algorithms in practice is always more subtle than our high level understanding, so we need to do real performance comparisons to know if we actually improved anything.

(I'm getting my information from the algorithm's author's page: http://www.ece.eng.wayne.edu/~sjiang/ Jiang graduated from William and Mary while I was a young grad student there.)


I'm a bit torn on this. I realize that you need to benchmark to really prove anything, and that optimization intuitions are often wrong. But on the other hand, you really need a wide variety of real-world benchmarks to approach any form of real-world "proof." In the early implementation stages, I think clear thinking such as this article provides is probably more important than spending 50 times as long setting up an array of benchmarks. Hopefully Salvatore has done some micro-benchmarks during development to guide his thinking, but even if not, the reasoning in this article is hard to refute.

If you think about the problem space of Redis vs Varnish, it's intuitively obvious that Varnish deals with a wide variety of general data without many opportunities to optimize beyond general purpose algorithms such as an OS provides. Whereas Redis has specific data types often with small footprints, and very careful attention paid to the details of optimization for memory and disk usage.


I think you're overstating the time it takes to come up with benchmarks to evaluate performance optimizations. Even micro-benchmarks tailored to showcase your performance under ideal circumstances are a start. The longer you go without doing any performance comparisons, the longer you go without knowing if your work was worth it.
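To make the point concrete: a first micro-benchmark really can be a few lines. Here's a minimal sketch using Python's timeit, with toy stand-in workloads (the data structures and sizes are assumptions for illustration, not anything from Redis or the kernel):

```python
import timeit

# Toy workloads: membership tests against a dict vs. a list.
# These are placeholders chosen only to show how little setup
# a first measurement needs.
def count_hits(container, keys):
    return sum(1 for k in keys if k in container)

data_dict = {i: None for i in range(10_000)}
data_list = list(range(10_000))
keys = list(range(0, 20_000, 7))  # roughly half hit, half miss

t_dict = timeit.timeit(lambda: count_hits(data_dict, keys), number=100)
t_list = timeit.timeit(lambda: count_hits(data_list, keys), number=100)

print(f"dict: {t_dict:.4f}s  list: {t_list:.4f}s")
```

Even something this crude tells you whether an optimization is in the "worth pursuing" range or the "less than 1%" range.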

I'm not trying to deride his work - it's a neat project, and I will probably read through his earlier entries more. I'm down with all of the reasons provided, but I recognize that as humans, we tend to believe in things we understand. Hence, we need to evaluate.


Perhaps I am, but why are we assuming he hasn't done any benchmarks? It seems quite likely that his opinion is informed by actual results. Does he need to draw up graphs and spend more time on a blog post to be taken seriously?


Well, yes. I take him "seriously," but I'm not yet convinced his techniques outperform the kernel. That's how systems work is done. If you want to convince people that your way is better, then you need data to back it up.


You can't take any of his arguments at face value? Like the blocking argument?


Remember that I started this discussion off by saying "This author makes compelling arguments for implementing application level paging." I'm down with his arguments - people have actually made them before, and I thought they were good arguments then, too. But I can't say "Yes, I agree, what you have done improves performance" until I actually see performance comparisons.

But that's only the first level point. The second level point is: are the optimizations worth it? That is, if you only improve performance by less than 1%, then it's probably not worth the hassle. These are the sorts of things that experiments can tell you.

If it sounds like I'm being pedantic: well, yes, I am. I do systems research. This is the same standard I hold myself and my peers to. If someone asked me to review a systems paper that claimed to improve something, but had no results, I'd reject it. I recognize this is a blog post and not an academic paper, but my standard for "do I accept that this is a better approach" has not changed. And I have seen plenty of blog posts with experiments.


No, performance of complex systems is really really hard to predict.


Yes, but again, why do we assume Salvatore is just pulling this stuff out of his ass?


Agreed. "Problems" without measurements aren't worth much. My initial reaction was to your assumption about Linux, but we agree that assumptions in general aren't worth much.


How is LRU not simple? The page replacement policy in Linux is simple; it just makes use of the available information. The algorithm does not make the system. CLOCK-Pro is not in Linux, and I doubt it ever will be. The goal in writing manageable code is passing the information you know on to the next person, who doesn't, by using comments, and Linux barely does this at all, so hacking on any non-trivial part is near impossible without a few months of poking.
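For what it's worth, an exact page-granularity LRU really is only a few lines; here's a toy sketch (note this is an idealized LRU, not the approximation with active/inactive lists that the kernel actually uses):

```python
from collections import OrderedDict

# Toy exact LRU over "pages". OrderedDict keeps insertion order,
# so the front of the dict is always the least recently used page.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)   # mark as most recently used
            return "hit"
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
        self.pages[page] = None
        return "miss"

cache = LRUCache(2)
print([cache.access(p) for p in ["A", "B", "A", "C", "B"]])
# -> ['miss', 'miss', 'hit', 'miss', 'miss']
# Accessing C evicts B (least recently used at that point),
# so the final access to B misses again.
```

The hard part in a real kernel isn't this policy; it's making the bookkeeping cheap and concurrent under memory pressure.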



