
For a Scala developer that signature makes sense. You want to add an element of type B to your collection with elements of type A, however the addition of an element of type B may not be supported (e.g. you may want to add a String to a BitSet and get in return a plain Set[Any]), so the above works only as long as there is a builder available that can build the new collection. The function also works on covariant collections, because it's building a new collection (i.e. it's for immutable collections), so it doesn't have the covariance gotcha of arrays.
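A minimal sketch of the widening behavior, using `List`'s `appended` / `:+` from Scala 2.13 (the signature under discussion may be the older `CanBuildFrom`-based one, but the shape is the same):

```scala
// List inherits roughly:
//   def appended[B >: A](elem: B): List[B]   // alias :+
// B is a supertype of A, so adding a String to a List[Int]
// widens the element type to their least upper bound, Any.
val ints: List[Int] = List(1, 2, 3)
val widened = ints :+ "four"   // inferred as List[Any]

// The original list is untouched; a new collection is built.
assert(ints == List(1, 2, 3))
assert(widened == List(1, 2, 3, "four"))
```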

Scala's collections have some quirks, but not as many as .NET's collections and you're actually comparing apples to oranges, because you won't find the equivalent of an "add" that returns a new collection instead of modifying the old one in .NET.

This is what happens when you pass judgement unto things you don't understand. Working with immutable data-structures is really, really awesome and Scala's API for these collections is very friendly and very type-safe - as in, if you feel the need to use `isInstanceOf` / `asInstanceOf`, then you're probably doing something wrong ;-)

And I really wish that C# would grow up a little in this regard, as modern programming languages need immutable collections as well, with a nice API to go along with it. And btw - working with Option is super awesome, no category theory needed.
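For instance, everyday Option use looks like this (a hypothetical `findUser` for illustration), with no theory in sight:

```scala
// Plain Option usage: map/getOrElse instead of null checks.
def findUser(id: Int): Option[String] =
  if (id == 1) Some("alice") else None

val greeting = findUser(1).map(name => s"hello, $name").getOrElse("who?")
assert(greeting == "hello, alice")
assert(findUser(2).getOrElse("who?") == "who?")
```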

That said, as I was saying in another comment, I'm really excited about this announcement, because this is mostly about the runtime, not the language. You can run things built with Scala on top of .NET right now by means of IKVM. And the JVM finally has some credible competition.



Oh btw C# does have immutable collections now. Adoption isn't too high yet AFAIK, but they're pretty awesome.

http://msdn.microsoft.com/en-us/library/dn385366(v=vs.110).a...


Interesting, thanks for the link.


>>> This is what happens when you pass judgement unto things you don't understand

I think it's kind of the point if the judgement is on the question "what is easier to understand". I think this is the sentiment many people starting to learn Scala are feeling - the barrier to entry, even if you come not from a blank slate but from a background of many years of programming in many languages, is pretty high. It's not a judgement on "whether Scala and its collections are good/done right", which is an entirely separate question from "whether it is easier for someone to understand how C# collections or Scala collections work".

>>> And btw - working with Option is super awesome, no category theory needed.

Well, if you want to do something like making a function that works on Option from a function that works on the underlying type, you pretty soon find yourself in that general area.
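That "general area" is just the functor `map`; a sketch of lifting a plain function into Option-land:

```scala
// "Lifting" a plain A => B into Option[A] => Option[B]
// is Functor.map in disguise:
def lift[A, B](f: A => B): Option[A] => Option[B] = _.map(f)

val double: Int => Int = _ * 2
val doubleOpt = lift(double)
assert(doubleOpt(Some(21)) == Some(42))
assert(doubleOpt(None) == None)
```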


I agree that Option types are great, so great that I'm happy to work in a much 'weaker' language, use option types, and solve 90% of the problems that more 'theoretically robust' languages address with none of their downsides.

As far as not understanding how awesome immutable data structures are... I'd take a step back before you make assumptions about what other people understand. Do you know anything about cache hierarchy and memory models on modern CPUs? Performance is important to some of us.


> Do you know anything about cache hierarchy and memory models on modern CPUs?

Yes I do. Worked 3 years on a soft real-time system with massive load, profiled the shit out of everything. There's an interesting discussion we could have about when immutable data-structures work best, when they've got problems and when it doesn't matter, especially given the extra benefits in dealing with accidental complexity. This isn't the right place though.

> Performance is important to some of us

Yes it is, but performance problems are fixed by means of profiling and optimizing the bottlenecks. Even in a system that has massive load, in many cases it doesn't matter, and in some cases immutability increases performance by eliminating contention on reads. And seriously, most people who invoke performance problems don't actually have them, hence my assumption.


Well, that's good, sorry for asking. I've had a lot of people point at the big-O notation of that linked-list append and tell me that it's faster than an ArrayList append on average. Usually this is right after telling me that "I just don't get it" with immutable collections. So I hope you can see why I'd react that way.

As I'm sure you know, when it comes to real-world situations, reducing inter-thread communication and isolating anything mutable is the key concern. That's why I find immutable wrappers that mock mutability to be a bit of a sideshow. You shouldn't have read contention with locking in the first place unless it's for a very good reason.


> You shouldn't have read contention with locking in the first place unless it's for a very good reason.

True, but you know how it is in practice :-)

For example I found that persistent data-structures work best in single-producer, multiple-consumer scenarios - you mutate some state and you want to signal it over asynchronous boundaries to multiple consumers. With an immutable data-structure you just signal it, worry free, and then you can keep on changing that state, completely non-blocking and with good algorithmic complexity.

Actually non-blocking logic becomes really easy, as you can always shove an immutable value into an atomic reference (note - I'm not saying "wait free", which still takes a lot of work :))
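A minimal sketch of that pattern, with a hypothetical counter map as the state (single writer, any number of readers):

```scala
import java.util.concurrent.atomic.AtomicReference

// Single producer publishes immutable snapshots; readers never block.
val state = new AtomicReference[Map[String, Int]](Map.empty)

// The single writer replaces the whole immutable value...
state.set(state.get + ("requests" -> 1))
state.set(state.get + ("requests" -> 2))

// ...while consumers read a consistent snapshot at any time.
val snapshot = state.get
assert(snapshot("requests") == 2)
```

With multiple producers you'd need a CAS loop (`compareAndSet` or `updateAndGet`) instead of plain `set`, which is where the contention discussed below comes in.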

So really, persistent data-structures are great in a multi-threading context, as long as you don't have multiple producers pounding on the same reference holding such an immutable value - if you do that, things can get bad compared to specialized concurrent mutable data-structures, because a good concurrent data-structure is able to distribute the contention across multiple buckets instead of just one. But then again, having multiple producers pounding on the same resource is just asking for trouble and has to be avoided, because of Amdahl's law.

Also, as you've hinted at, the problem with a normal linked List is the level of indirection. And in general, persistent data-structures imply the usage of trees, which also implies indirections. More advanced persistent data-structures are much better than a linked list, and this is an active area of research, but on the whole there's still much room for improvement.
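Scala's `Vector` is one such tree-based structure (a 32-way trie), where an "update" copies only the path from root to the changed leaf and shares everything else; a sketch:

```scala
// "Updating" a Vector copies O(log32 n) nodes, not O(n),
// and the old and new versions share the rest of the tree.
val v1 = Vector.tabulate(1000)(identity)
val v2 = v1.updated(500, -1)

assert(v1(500) == 500)   // original is untouched
assert(v2(500) == -1)    // new version sees the change
assert(v2(499) == 499)   // everything else is shared structure
```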

On the other hand, in my opinion when speaking about performance, the first problem one has is to actually use the available CPUs (e.g. getting CPU usage over, say, 70-80%). That is usually hard to achieve if you have a combination of CPU-bound and I/O-bound tasks and your I/O stuff is not asynchronous. Only after that can you move on to optimizing the memory access patterns for cache locality and minimizing the stop-the-world freezes.

Speaking about GC, that's another topic - persistent data-structures have a tendency to generate junk that is neither short-term nor long-term, and that invalidates the assumptions that current GCs are making. The JVM at least has really good GCs, but without paying for a pauseless one (like the one from Azul Systems), you can still end up in trouble if you don't pay attention - but then you fire up YourKit's profiler, find the source of those STWs, optimize, and it works out well.

All in all I encourage everybody to find a good library that implements persistent data-structures and integrate them in their toolbox.


Hey, so I agree with your analysis in general, and sorry for being opaque earlier (was at work), but you totally got what I was driving at and explained it probably better than I could have.

The one thing I disagree with: in many server applications, using the available CPUs is pretty easy. You've got thread pools handling various tasks, just crank them up. In JVM-land, a very heavy 512kb stack per thread is still not really much penalty to pay as long as you re-use them. Aggregate application performance then becomes a matter of completing tasks faster while creating less garbage.

So it all comes down to what you consider a 'task' and how you handle the handoffs between them. The architecture decisions at this level dwarf the improvements from using an array vs list, as you implied, but they also make the usage of immutable types somewhat irrelevant IMO. Seal off mutable code within single-task boundaries and it doesn't matter how ugly it is, as long as you're passing immutable types (just plain javabeans with final members are fine) between boundaries.
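A sketch of that "seal it off" idea (the `tokenize` task is a made-up example): mutation stays local to one function, and only an immutable value crosses the boundary:

```scala
// Mutable code sealed inside one task; only an immutable
// List ever escapes across the boundary.
def tokenize(line: String): List[String] = {
  val buf = scala.collection.mutable.ListBuffer.empty[String]
  for (w <- line.split("\\s+") if w.nonEmpty) buf += w.toLowerCase
  buf.toList   // immutable from here on
}

assert(tokenize("Seal  Off Mutation") == List("seal", "off", "mutation"))
```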

Anyways, just my opinion. Great comment.


> Actually non-blocking logic becomes really easy, as you can always shove an immutable value into an atomic reference (note - I'm not saying "wait free", which still takes a lot of work :))

Why the atomic reference here? I know that provides CAS but if we're talking about a single writer aren't you okay to just replace things anyway?


Yes, I wasn't talking about a single writer. That was a new paragraph. Sorry about not making that clear.


Ah cool, just wanted to clarify as I'm only a beginner for much of this stuff, but found your posts very interesting.

Thanks!


Nice effort, but judging from the nonsense emitted by jbooth, he won't understand the things you are explaining anyway.


> Well, that's good, sorry for asking.

Gotta test for any chinks in the armour.


> Do you know anything about cache hierarchy and memory models on modern CPUs?

Yes.

> Performance is important to some of us.

... and in those cases, you don't have to use data structures that model your problem domain poorly.

The problem isn't "immutable data structures" or "theoretically robust" languages. The problem is finding ways to express computations and their constraints in a way that can be efficiently modeled for your problem domain.


The problem with this logic is that you won't know about your performance problem until the system is scaled up to production-level tasks - unless you build a comprehensive performance model beforehand, which takes about as long as the implementation (and is thus nonsensical for most tasks) and is really hard to do IMO in an FP language.

So what will probably happen is that once you see your performance problems, you can pray that it's only some hot spots that you can then replace with faster code. Often it's not - you end up with, say, 100 places each using around 1% of your time budget, which is when you get to go and start over.


>> And I really wish that C# would grow up a little in this regard, as modern programming languages need immutable collections as well, with a nice API to go along with it

May I suggest the Immutable Collections library from Microsoft? http://msdn.microsoft.com/en-us/library/dn385366(v=vs.110).a...

Admittedly not part of the core library, but installing a NuGet package is pretty darn easy.

>> This is what happens when you pass judgement unto things you don't understand ;)

[edit: Guess I should've refreshed the page to see the prior response before writing this. Apologies.]



