
For a Scala developer that signature makes sense. You want to add an element of type B to your collection with elements of type A, however the addition of an element of type B may not be supported (e.g. you may want to add a String to a BitSet and get in return a plain Set[Any]), so the above works only as long as there is a builder available that can build the new collection. The function also works on covariant collections, because it's building a new collection (i.e. it's for immutable collections), so it doesn't have the covariance gotcha of arrays.
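A minimal sketch of the widening behavior, using `List`'s `appended` / `:+` from Scala 2.13 (the signature under discussion may be the older `CanBuildFrom`-based one, but the shape is the same):

```scala
// List inherits roughly:
//   def appended[B >: A](elem: B): List[B]   // alias :+
// B is a supertype of A, so adding a String to a List[Int]
// widens the element type to their least upper bound, Any.
val ints: List[Int] = List(1, 2, 3)
val widened = ints :+ "four"   // inferred as List[Any]

// The original list is untouched; a new collection is built.
assert(ints == List(1, 2, 3))
assert(widened == List(1, 2, 3, "four"))
```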

Scala's collections have some quirks, but not as many as .NET's collections and you're actually comparing apples to oranges, because you won't find the equivalent of an "add" that returns a new collection instead of modifying the old one in .NET.

This is what happens when you pass judgement unto things you don't understand. Working with immutable data-structures is really, really awesome and Scala's API for these collections is very friendly and very type-safe - as in, if you feel the need to use `isInstanceOf` / `asInstanceOf`, then you're probably doing something wrong ;-)

And I really wish that C# would grow up a little in this regard, as modern programming languages need immutable collections as well, with a nice API to go along with it. And btw - working with Option is super awesome, no category theory needed.
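For instance, everyday Option use looks like this (a hypothetical `findUser` for illustration), with no theory in sight:

```scala
// Plain Option usage: map/getOrElse instead of null checks.
def findUser(id: Int): Option[String] =
  if (id == 1) Some("alice") else None

val greeting = findUser(1).map(name => s"hello, $name").getOrElse("who?")
assert(greeting == "hello, alice")
assert(findUser(2).getOrElse("who?") == "who?")
```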

That said, as I was saying in another comment, I'm really excited about this announcement, because this is mostly about the runtime, not the language. You can run things built with Scala on top of .NET right now by means of IKVM. And the JVM finally has some credible competition.



Oh btw C# does have immutable collections now. Adoption isn't too high yet AFAIK, but they're pretty awesome.

http://msdn.microsoft.com/en-us/library/dn385366(v=vs.110).a...


Interesting, thanks for the link.


>>> This is what happens when you pass judgement unto things you don't understand

I think it's kind of the point if the judgement is on the question "what is easier to understand". I think this is the sentiment many people starting to learn Scala are feeling - the barrier to entry, even if you come not from a blank slate but from a background of many years of programming in many languages, is pretty high. It's not a judgement on "whether Scala and its collections are good/done right", which is an entirely separate question from "whether it is easier for someone to understand how C# collections or Scala collections work".

>>> And btw - working with Option is super awesome, no category theory needed.

Well, if you want to do something like making a function that works on Option from a function that works on the underlying type, you pretty soon find yourself in that general area.
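That "general area" is just the functor `map`; a sketch of lifting a plain function into Option-land:

```scala
// "Lifting" a plain A => B into Option[A] => Option[B]
// is Functor.map in disguise:
def lift[A, B](f: A => B): Option[A] => Option[B] = _.map(f)

val double: Int => Int = _ * 2
val doubleOpt = lift(double)
assert(doubleOpt(Some(21)) == Some(42))
assert(doubleOpt(None) == None)
```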


I agree that Option types are great, so great that I'm happy to work in a much 'weaker' language, use option types, and solve 90% of the problems that more 'theoretically robust' languages address with none of their downsides.

As far as not understanding how awesome immutable data structures are... I'd take a step back before you make assumptions about what other people understand. Do you know anything about cache hierarchy and memory models on modern CPUs? Performance is important to some of us.


> Do you know anything about cache hierarchy and memory models on modern CPUs?

Yes I do. Worked 3 years on a soft real-time system with massive load, profiled the shit out of everything. There's an interesting discussion we could have about when immutable data-structures work best, when they've got problems and when it doesn't matter, especially given the extra benefits in dealing with accidental complexity. This isn't the right place though.

> Performance is important to some of us

Yes it is, but performance problems are fixed by means of profiling and optimizing the bottlenecks. Even in a system that has massive load, in many cases it doesn't matter, and in some cases immutability increases performance by eliminating contention on reads. And seriously, most people who invoke performance problems don't actually have them, hence my assumption.


Well, that's good, sorry for asking. I've had a lot of people point at the big-O notation of that linked-list append and tell me that it's faster than an ArrayList append on average. Usually this is right after telling me that "I just don't get it" with immutable collections. So I hope you can see why I'd react that way.

As I'm sure you know, when it comes to real-world situations, reducing inter-thread communication and isolating anything mutable is the key concern. That's why I find immutable wrappers that mock mutability to be a bit of a sideshow. You shouldn't have read contention with locking in the first place unless it's for a very good reason.


> You shouldn't have read contention with locking in the first place unless it's for a very good reason.

True, but you know how it is in practice :-)

For example I found that persistent data-structures work best in single-producer, multiple-consumer scenarios - you mutate some state and you want to signal it over asynchronous boundaries to multiple consumers. With an immutable data-structure you just signal it, worry free, and then you can keep on changing that state, completely non-blocking and with good algorithmic complexity.

Actually non-blocking logic becomes really easy, as you can always shove an immutable value into an atomic reference (note - I'm not saying "wait free", which still takes a lot of work :))
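A minimal sketch of that pattern, with a hypothetical counter map as the state (single writer, any number of readers):

```scala
import java.util.concurrent.atomic.AtomicReference

// Single producer publishes immutable snapshots; readers never block.
val state = new AtomicReference[Map[String, Int]](Map.empty)

// The single writer replaces the whole immutable value...
state.set(state.get + ("requests" -> 1))
state.set(state.get + ("requests" -> 2))

// ...while consumers read a consistent snapshot at any time.
val snapshot = state.get
assert(snapshot("requests") == 2)
```

With multiple producers you'd need a CAS loop (`compareAndSet` or `updateAndGet`) instead of plain `set`, which is where the contention discussed below comes in.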

So really, persistent data-structures are great in a multi-threading context, as long as you don't have multiple producers pounding on the same reference holding such an immutable value - if you do that, things can get bad compared to specialized concurrent mutable data-structures, because a good concurrent data-structure is able to distribute the contention across multiple buckets instead of just one. But then again, having multiple producers pounding on the same resource is just asking for trouble and has to be avoided, because of Amdahl's law.

Also, as you've hinted at, the problem with a normal linked List is the level of indirection. And in general, persistent data-structures imply the usage of trees, which also implies indirections. More advanced persistent data-structures are much better than a linked list, and this is an active area of research, but on the whole there's still much room for improvement.
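Scala's `Vector` is one such tree-based structure (a 32-way trie), where an "update" copies only the path from root to the changed leaf and shares everything else; a sketch:

```scala
// "Updating" a Vector copies O(log32 n) nodes, not O(n),
// and the old and new versions share the rest of the tree.
val v1 = Vector.tabulate(1000)(identity)
val v2 = v1.updated(500, -1)

assert(v1(500) == 500)   // original is untouched
assert(v2(500) == -1)    // new version sees the change
assert(v2(499) == 499)   // everything else is shared structure
```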

On the other hand, in my opinion when speaking about performance, the first problem one has is to actually use the available CPUs (e.g. getting CPU usage over, say, 70-80%). That is usually hard to achieve if you have a combination of CPU-bound and I/O-bound tasks and your I/O stuff is not asynchronous. Only after that can you move on to optimizing the memory access patterns for cache locality and minimizing the stop-the-world freezes.

Speaking about GC, that's another topic - persistent data-structures have a tendency to generate junk that is neither short-term nor long-term, and that invalidates the assumptions that current GCs are making. The JVM at least has really good GCs, but without paying for a pauseless one (like the one from Azul Systems), you can still end up in trouble if you don't pay attention - but then you fire up YourKit's profiler, find the source of those STWs, optimize, and it works out well.

All in all I encourage everybody to find a good library that implements persistent data-structures and integrate them in their toolbox.


Hey, so I agree with your analysis in general, and sorry for being opaque earlier (was at work), but you totally got what I was driving at and explained it probably better than I could have.

The one thing I disagree with: in many server applications, using the available CPUs is pretty easy. You've got thread pools handling various tasks, just crank them up. In JVM-land, a very heavy 512kb stack per thread is still not really much penalty to pay as long as you re-use them. Aggregate application performance then becomes a matter of completing tasks faster while creating less garbage.

So it all comes down to what you consider a 'task' and how you handle the handoffs between them. The architecture decisions at this level dwarf the improvements from using an array vs list, as you implied, but they also make the usage of immutable types somewhat irrelevant IMO. Seal off mutable code within single-task boundaries and it doesn't matter how ugly it is, as long as you're passing immutable types (just plain javabeans with final members are fine) between boundaries.
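A sketch of that "seal it off" idea (the `tokenize` task is a made-up example): mutation stays local to one function, and only an immutable value crosses the boundary:

```scala
// Mutable code sealed inside one task; only an immutable
// List ever escapes across the boundary.
def tokenize(line: String): List[String] = {
  val buf = scala.collection.mutable.ListBuffer.empty[String]
  for (w <- line.split("\\s+") if w.nonEmpty) buf += w.toLowerCase
  buf.toList   // immutable from here on
}

assert(tokenize("Seal  Off Mutation") == List("seal", "off", "mutation"))
```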

Anyways, just my opinion. Great comment.


> Actually non-blocking logic becomes really easy, as you can always shove an immutable value into an atomic reference (note - I'm not saying "wait free", which still takes a lot of work :))

Why the atomic reference here? I know that provides CAS but if we're talking about a single writer aren't you okay to just replace things anyway?


Yes, I wasn't talking about a single writer. That was a new paragraph. Sorry about not making that clear.


Ah cool, just wanted to clarify as I'm only a beginner for much of this stuff, but found your posts very interesting.

Thanks!


Nice effort, but judging from the nonsense emitted by jbooth, he won't understand the things you are explaining anyway.


> Well, that's good, sorry for asking.

Gotta test for any chinks in the armour.


> Do you know anything about cache hierarchy and memory models on modern CPUs?

Yes.

> Performance is important to some of us.

... and in those cases, you don't have to use data structures that model your problem domain poorly.

The problem isn't "immutable data structures" or "theoretically robust" languages. The problem is finding ways to express computations and their constraints in a way that can be efficiently modeled for your problem domain.


The problem with this logic is that you won't know about your performance problem until the system is scaled up to production-level tasks - unless you build a comprehensive performance model beforehand, which takes about as long as the implementation (and is thus nonsensical for most tasks) and is really hard to do IMO in an FP language.

So what will probably happen is that once you see your performance problems, you can pray that it's only some hot spots that you can then replace with faster code. Often it's not - you end up with, say, 100 places each using around 1% of your time budget, which is when you get to go and start over.


>> And I really wish that C# would grow up a little in this regard, as modern programming languages need immutable collections as well, with a nice API to go along with it

May I suggest the Immutable Collections library from Microsoft? http://msdn.microsoft.com/en-us/library/dn385366(v=vs.110).a...

Admittedly not part of the core library, but installing a NuGet package is pretty darn easy.

>> This is what happens when you pass judgement unto things you don't understand ;)

[edit: Guess I should've refreshed the page to see the prior response before writing this. Apologies.]



