The new sequence IDs are the most interesting thing in this release, I think. Having an officially supported cross-datacenter replication strategy would be real nice.
Lots of folks will be mad, but removing multiple mapping types is a nice change too. It was a feature that never really made sense. Index-per-type was always the better strategy, even going back to the 0.23 days.
As others in this thread will no doubt point out though - the ES folks are moving awfully fast. I still support 1.7.5 clusters that will take heroic efforts to update. I'd love to use the new features on those clusters, but there simply isn't a valid business case to take on the risk. This isn't like small NPM packages that you can update on a whim - some of these systems require terabytes of re-indexing to upgrade :/
Cross data center replication is really a much needed feature.
The way Elasticsearch is going, though, looks promising. With sorted indices, a single mapping type, and the other changes, we might give it another try after having switched to Algolia.
Is there a safe way now to query Elasticsearch directly without the need to go via proxy scripts on the server? This just adds so much overhead to the queries compared to Algolia.
Jason from Elastic here. We are actively developing cross-datacenter replication (internally we are calling it "cross cluster replication", so you will likely see it referred to as that in the future, though of course this is subject to change).
I cannot give a timeframe, but it is one of the top features on the ES roadmap. :)
> Cross data center replication is really a much needed feature.
This works pretty well already if you are running on your own hardware and have a good network. We've been running a three data center setup across the US for four years. Next year we may extend it across the Atlantic.
You have always been able to use the query DSL to write queries, aggregations, etc. If you're referring to scripting, then yes, in 5.0 the "Painless" scripting language was introduced, which allows you to send scripted queries via API request without having to store them on the server. The language was designed to rule out the kinds of exploits that were possible when running scripts in other languages on Elasticsearch.
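To make that concrete, here's a minimal sketch of what an inline Painless scripted query body looks like. This is just the JSON request body expressed as a Python dict; the index concept and the `price`/`max_price` names are made-up placeholders, not anything from this thread.

```python
import json

# Sketch of an inline Painless scripted query: the script travels with
# the search request itself, nothing is stored on the server.
# Field and parameter names here are hypothetical.
query = {
    "query": {
        "script": {
            "script": {
                "lang": "painless",
                "source": "doc['price'].value < params.max_price",
                "params": {"max_price": 100},
            }
        }
    }
}

# This body would be POSTed to an index's _search endpoint.
print(json.dumps(query, indent=2))
```

Because the script is sandboxed Painless rather than, say, Groovy, it can be accepted inline over the API without the remote-execution risks older scripting languages had.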
As in a publicly facing front end? In that case, you wouldn't ever want to expose Elasticsearch to your front end directly. If your front end is private, inside your firewall, then just make HTTP requests to Elasticsearch - it has a RESTful API.
But querying from a publicly facing front end would be a poor idea - would you expose a database directly to the front end?
It is called Elasticsearch, not Elasticdatabase; at one point it sounded like a good idea to jump on the NoSQL bandwagon.
Calling the index directly from the front end is a fantastic idea, and it could be made safe with a read-only type of index or an API key with read-only scope.
The current design, with an unnecessary security layer outside of Elasticsearch, is a poor one, adding too much administrative overhead and ridiculous latency.
They have what you are looking for with their X-Pack Security addition, which requires a license, though under very favourable terms compared to others...
I wish after 1.x they had maintained backwards compatibility with upgrades as they had kind of promised... there is just so much money in making it hard to upgrade ;)
That consulting arm of Elastic needs something to do :)
More charitably, I can understand why they felt the need to make a hard break, but difficult upgrades plus fast release cycle means a fair bit of friction.
The no-downtime upgrades will be nice, but on a big production system I wouldn't feel comfortable upgrading major versions every 6 months.
Parent-child is supposed to be implemented using a join field - but it's not as nice as it is now (AFAIK you can't have multiple join types; e.g., I can't have grandchildren or two different kinds of children). It's really unfortunate, because the issues they had with multiple types could have been handled more elegantly while still keeping the immense power of parent-child fields.
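For reference, a 6.x-style join field is declared once in the index mapping and defines the parent/child relation for the whole index. A minimal sketch, expressed as a Python dict (the type name `_doc` follows 6.x convention; the relation names `question`/`answer` and the field name are hypothetical):

```python
# Sketch of a 6.x join-field mapping, shown as a Python dict.
# A single join field carries the parent/child relation for the index;
# "question" is the parent side, "answer" the child side (made-up names).
mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "my_join_field": {
                    "type": "join",
                    "relations": {"question": "answer"},
                }
            }
        }
    }
}
```

Compare this to the old model, where each child lived in its own mapping type: here everything is one type, with the relation expressed as data in a field.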
It feels, from the discussion on the GitHub issues, that this was something that had caused some Elastic developers pain, and they decided to kill it.
The 6.x way is maybe better than the 5.x way - but there are problems with it, and they could have done 6.x better than they did.
There were two fundamental problems: 1) types had different mappings, which was confusing since internally it's the same index and there is only one mapping; 2) for the use case where you have one type per index, you still had to arbitrarily create a "type".
It could have been done by:
1) making the mapping definition only at the index level - there's no such thing as a mapping for a type (this is how it works internally anyway);
2) making the "type" field optional and specified via a query string instead of the URL path. This would have left all the internals alone. E.g., there could still have been an internal meta "_type" field with a default value of "default" or something. Those people who needed multiple types, specifically for more complex parent-child, could still have used it.
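To make the proposal concrete, here's the difference in addressing it implies. This scheme was never implemented and is purely illustrative (index and type names are made up):

```python
# Pre-6.x: the type is a mandatory segment of the URL path.
old_style = "PUT /my_index/my_type/1"

# Proposed (hypothetical, never implemented): one index-level mapping,
# with the type as an optional query-string parameter backing an internal
# "_type" field that would default to something like "default".
proposed = "PUT /my_index/1?type=my_type"
proposed_default = "PUT /my_index/1"  # _type falls back to the default
```

Under this scheme, single-type indices never mention a type at all, while multi-type users keep working parent-child.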
The current approach is far more complicated because they have to change the internals to support both the new and old ways during the transition, and deal with a lot of internal things breaking because everything expects a "_type". You can check out the GitHub issues and see the work involved.
I detailed some of my issues/suggestions at https://discuss.elastic.co/t/parent-child-and-elastic-6/8572... but to no avail. After I looked through some of the GitHub issues, I realized how long this had been in the pipeline and how much inertia there was behind the direction.