I live in a Russian city of 100k people and there are at least 3 medical MRI machines nearby that I have visited, out of a total of 8 offers that may share machines (though likely not), all of them of 1T field strength or above. I can get a brain MRI this week for some $70, or a full-body MRI (the most expensive as per the price list) for $400. If anything, these services got some 30%-50% cheaper in the past decade. Barring some African hellhole, the argument from scarcity or expense of these machines is complete and utter bullshit. It's just that HN is America-centric, and American healthcare is a cesspit of waste and corruption.
Did a similar test in plain C: https://godbolt.org/z/ffYcWhxz3
It's not quite the same: I've used an increment instead of zeroing, otherwise the entire benchmark gets optimized away. Still got just about the same result (3.7x speedup for 100 iterations), so wasm did good there. Actually, now that I think of it, SIMD code performance probably depends on good register allocation more than on any optimization.
It looks promising! But fixed-width lanes don't seem too cross-platform? I don't just mean the v256 and v512 types that may become ubiquitous in a few years, but also things like optimizing for different L1 cache sizes, doing some operation macro-fusion on the SIMD unit, or directly supporting leading/trailing elements to reduce code size?
In practice it's not possible to optimize "generally" for all possible target architectures your wasm will run on. You're going to optimize for x86-64 or ARM, and probably going to specifically optimize for modern Intel, modern AMD, or Apple's M1. If you try to optimize for everything you're going to run into really painful tradeoffs and probably have mediocre performance on a bunch of architectures after a lot of hard work.
Wouldn't it be possible to ship a binary containing multiple versions of your program, each compiled and optimized for a different CPU configuration, with a switch at runtime that selects one based on CPUID? I think Intel has a compiler for that.
Yes, our github.com/google/highway does that for SSE4/AVX2/AVX-512. It targets at the level of instruction sets, though, not specific microarchitectures.
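For illustration, the runtime-switch idea can be sketched as a dispatch table chosen once at startup. This is a minimal Go sketch under stated assumptions, not how Highway does it: the `features` struct, the `kernel` type and the variant names are hypothetical, and the "wide" variants are plain Go loops with several partial sums standing in for real vector builds. Real code would fill `features` from CPUID, for example via `golang.org/x/sys/cpu`.

```go
package main

import "fmt"

// features is a hypothetical stand-in for the result of CPUID probing.
type features struct {
	AVX2 bool
	SSE4 bool
}

// kernel pairs a variant name with one implementation of the same routine.
type kernel struct {
	name string
	sum  func([]int32) int32
}

// sumUnrolled mimics an N-lane build by keeping N partial sums.
// It is ordinary scalar Go; only the shape of the loop is "vector-like".
func sumUnrolled(lanes int) func([]int32) int32 {
	return func(xs []int32) int32 {
		acc := make([]int32, lanes)
		for i, x := range xs {
			acc[i%lanes] += x
		}
		var s int32
		for _, a := range acc {
			s += a
		}
		return s
	}
}

// pick selects the widest variant the (assumed) feature set allows.
func pick(f features) kernel {
	switch {
	case f.AVX2:
		return kernel{"avx2", sumUnrolled(8)}
	case f.SSE4:
		return kernel{"sse4", sumUnrolled(4)}
	default:
		return kernel{"scalar", sumUnrolled(1)}
	}
}

func main() {
	// The choice is made once; callers only ever see k.sum.
	k := pick(features{SSE4: true})
	fmt.Println(k.name, k.sum([]int32{1, 2, 3, 4, 5}))
}
```

As in the comment above, the switch here is on instruction-set level flags, not on specific microarchitectures.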
Why not? Fixed-size SIMD architectures use mostly the same operations, so if you target SSE2 initially, the code should run just fine on NEON. A runtime that ships a JIT compiler also has the unique opportunity to further optimize SIMD code by using more lanes or limiting the working set to the host platform's L1 cache size. Even the AOT compilers like GCC or clang emulate platform-specific intrinsics using generic vector ones. This should count for something, no?
They are similar but not the same. For instance, SSE has movemask but NEON does not, so it gets emulated (slowly) when targeting that platform. The cross-lane ops are different enough that you might need to rewrite for other platforms. And then you run into situations where an instruction is very fast on one architecture but horribly slow on another because it's basically emulated.
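To make the movemask example concrete, here is a scalar reference for what the byte variant of the operation computes (SSE2's `_mm_movemask_epi8`): bit i of the result is the top bit of byte i. It is a sketch in Go for readability only; the `movemask` name and the `[16]byte` representation of a vector are assumptions of this example, and an actual NEON emulation would use a short sequence of vector instructions rather than a loop.

```go
package main

import "fmt"

// movemask collects the most significant bit of each of the 16 bytes
// into a 16-bit mask, with byte 0 mapping to bit 0.
func movemask(v [16]byte) uint16 {
	var m uint16
	for i, b := range v {
		m |= uint16(b>>7) << i
	}
	return m
}

func main() {
	var v [16]byte
	v[0], v[3], v[15] = 0x80, 0xFF, 0x90 // top bit set in lanes 0, 3 and 15
	v[7] = 0x7F                          // top bit clear, must not show up
	fmt.Printf("%#x\n", movemask(v))
}
```

On x86 this is a single instruction; a platform without it has to build the same mask out of several operations, which is where the slowdown comes from.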
This isn't really relevant to wasm, though. You can't expect it to support platform-specific hacks just for SIMD, so you'll have to make do with the lowest common denominator anyway.
Ads aren't the problem, the surveillance is. That you can't have the former without the latter is a myth FB and Google peddle to justify their existence. They don't even need your data all that much - the duopoly the myth perpetuates is what matters. There's no conclusive proof that personalized ads are more efficient than old banner networks, much less that FB's or Google's services are worth the huge share of profits they take as intermediaries.
Please clarify what you mean by incorporating a C compiler. Does this mean you're abandoning the "custom C frontend inside the D compiler" effort in favor of something like clang? To me, the other possibility - having *yet another* C compiler to look out for - is just terrifying.
Won't this require years of busywork? The various intrinsics, function attributes, C compiler args compatibility, ABI profiles for various platforms/systems (including the Windows GCC/VC ABI idiocy?)
upd: Also, I'm not sure that implementing -funsigned-char in 2022 will be all that great for morale!
Of course AWS is the mainframe in this analogy! It's a huge throughput- and reliability-optimized overhyped crazy expensive server that you hire someone else to run for you. It's designed the same way, with dedicated storage, compute and backplane servers. Even the discourse is the same as in the 80s: "why do we have to rent a $$$ monster when a $ commodity server outperforms it". How this is not obvious is beyond me.
I briefly encountered mainframes in the 90s. I thought it was a perfect analogy to AWS, where the "intelligence" is in the center and we're just piping stuff over the network for AWS to compute. I don't understand how self-hosted/self-provisioned servers are supposedly comparable to a mainframe; if anything, it's the opposite.
> It's 2022 and we're about to rediscover something we know for 40 years already: mainframes are freaking expensive.
My interpretation: AWS is freaking expensive, AWS is in the center performing all computations; AWS is the mainframe we can well do without.
On a side note, Prometheus seems to be built for bloat. AFAIK, it isn't even designed to consume metrics other than from apps linked to its client library. It's like a microservice, but with the footprint of an operating system.
I'm talking about the way I'm expected to provide metrics for my apps. Rather than exporting free-form JSON and then scripting Prometheus to understand it, I'm expected to use a custom client library to export the metrics. As for Kubernetes, you can only use it with Prometheus because of a not insignificant amount of work on both sides. Basically, the latter is designed for vendor lock-in.
Prometheus scrapes the same text format as OpenMetrics 1.0 and over 700 public exporters use this format, and there are TONS of other non-Prometheus software that consume the exact same text format. Prometheus's biggest competitor, Datadog (which is not open source mind you), consumes it too. I think even Grafana consumes it directly. It's becoming an IETF standard[0].
Would I have preferred JSON over a custom text format like this? Yeah. But to claim an open source project like Prometheus with effectively no business at all is using a text format like this to have vendor lock-in? That's quite a stretch.
> Prometheus scrapes the same text format as OpenMetrics 1.0
I find the GP's claims weird - I've written a relative ton of collectors, exporters, and translators and the format is pretty OK, not worse than most that came before it and better than lots - but I think this relationship is backwards. Prometheus "scrapes OpenMetrics" because OpenMetrics was formal documentation of what Prometheus was already doing for years.
I would not have preferred JSON. It's nice that an exposed metric is also a query, and is also pretty close to a schema definition.
I apologize for my mistake, then. My understanding was based on reading the Prometheus docs on making exporters alone - something I needed urgently for a job.
Include the client library if you want, but the wire format is ridiculously simple. I'll implement it from memory in a HN comment.
http.HandleFunc("/metrics", func(w http.ResponseWriter, req *http.Request) {
	// Set headers first: anything added after WriteHeader is not sent.
	w.Header().Set("Content-Type", "text/plain")
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("# HELP foo_bar The number of foos barred.\n# TYPE foo_bar counter\nfoo_bar 42\n"))
})
The client library is largely to keep track of running counters (and gauges, histograms, etc.), with a small amount of code to actually report those metrics when scraped. It's a very simple format.
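To show how little the "keep track of running counters" part involves, here is a rough, self-contained sketch of that split: a counter that is bumped from application code and rendered on scrape. The `counter` type, `expose` and `metricsHandler` are hypothetical names invented for this example, not the API of the real Prometheus client library, and the handler is exercised in-process with `httptest` instead of opening a port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// counter is a hypothetical stand-in for what a client library tracks:
// a name, a help string and a value that is safe to bump concurrently.
type counter struct {
	name, help string
	value      atomic.Uint64
}

// Inc adds one to the counter.
func (c *counter) Inc() { c.value.Add(1) }

// expose renders the counter as HELP, TYPE and sample lines.
func (c *counter) expose() string {
	return fmt.Sprintf("# HELP %s %s\n# TYPE %s counter\n%s %d\n",
		c.name, c.help, c.name, c.name, c.value.Load())
}

// metricsHandler reports the current value whenever it is scraped.
func metricsHandler(c *counter) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain")
		fmt.Fprint(w, c.expose())
	}
}

func main() {
	foos := &counter{name: "foo_bar", help: "The number of foos barred."}
	foos.Inc()
	foos.Inc()

	// Simulate one scrape without starting a server.
	rec := httptest.NewRecorder()
	metricsHandler(foos)(rec, httptest.NewRequest("GET", "/metrics", nil))
	fmt.Print(rec.Body.String())
}
```

In a real service the handler would be registered with `http.HandleFunc("/metrics", metricsHandler(foos))`; gauges and histograms need more bookkeeping than this, which is most of what the actual library provides.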
IIRC (it's been almost a decade since I used varz), having multiple label values would be a map of maps in varz. It got quite ugly if you wanted to have a number of dimensions.
The other commenters have pointed out that it _is_ based on another open standard, but admittedly one less common than, say, JSON. So you'll generally have to implement your own metrics producer or use a client library, that's true.
However it's also a dead simple format and you can probably implement it with a for-loop or a shell script.
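As a rough illustration of the "for-loop" claim, the sketch below renders one metric family with labels, where every extra dimension is just another `name="value"` pair on the same line instead of another level of nesting. The `sample` type and `render` function are made up for this example; the only format details relied on are the `# TYPE` line, the `name{label="value"} number` sample line, and the escaping of backslash, double quote and newline in label values.

```go
package main

import (
	"fmt"
	"strings"
)

// sample is a hypothetical in-memory form of one labeled value.
type sample struct {
	labels [][2]string // ordered label name/value pairs
	value  float64
}

// labelEscaper escapes the characters that are special inside a label value.
var labelEscaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// render writes a TYPE line followed by one line per sample.
func render(name, typ string, samples []sample) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE %s %s\n", name, typ)
	for _, s := range samples {
		b.WriteString(name)
		if len(s.labels) > 0 {
			b.WriteByte('{')
			for i, l := range s.labels {
				if i > 0 {
					b.WriteByte(',')
				}
				fmt.Fprintf(&b, `%s="%s"`, l[0], labelEscaper.Replace(l[1]))
			}
			b.WriteByte('}')
		}
		fmt.Fprintf(&b, " %v\n", s.value)
	}
	return b.String()
}

func main() {
	fmt.Print(render("http_requests_total", "counter", []sample{
		{labels: [][2]string{{"method", "GET"}, {"code", "200"}}, value: 42},
		{labels: [][2]string{{"method", "POST"}, {"code", "500"}}, value: 3},
	}))
}
```

This covers the easy producing side only; it does not validate metric or label names.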
Prometheus supported a JSON representation in the beginning. It was deprecated and removed before 1.0. The current exposition format was created because it cut CPU and memory for scraping metrics in half.
JSON, especially free-form JSON, is not a good format for efficient metrics monitoring.
The design consideration was not that it had to be simple to implement. It's that it had to be easy to parse by a human during an outage when nothing else works.
There are a frustrating number of fundamental corner cases due to variance in floating-point text formats, and slightly more in the descriptor if you also need that. It's simple to implement an expositor for a limited set of cases. As usual, it's much more difficult to parse what you actually find in the world.
Yeah, there are still some corner cases and implementation bugs out there. We spent months deliberating how to deal with some of these, because the base libraries in some languages just don't produce string output from IEEE 754 the same way.
IIRC, Java is different from Python is different from Go. So, really, this is a standardization in languages problem. We tried to work around these as best we could in the OM format.
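A small sketch of the kind of variance being described, using only Go's `strconv`. The same value can arrive spelled several ways (as far as I know, Java's `Double.toString` prints 1e21 as `1.0E21`, while Go and Python print `1e+21`), so a consumer ends up normalizing on parse. The `formatValue` and `parseValue` wrappers are hypothetical helpers for this example; they show Go's behavior, not what the OpenMetrics spec mandates.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatValue renders a float64 with Go's shortest round-tripping form.
func formatValue(v float64) string {
	return strconv.FormatFloat(v, 'g', -1, 64)
}

// parseValue accepts the spellings strconv.ParseFloat understands, which
// include upper- and lower-case exponents and case-insensitive Inf/NaN.
func parseValue(s string) (float64, error) {
	return strconv.ParseFloat(s, 64)
}

func main() {
	inputs := []string{"0.5", "1e21", "1.0E21", "+Inf", "-inf", "NaN"}
	for _, s := range inputs {
		v, err := parseValue(s)
		if err != nil {
			fmt.Printf("%-7s -> error: %v\n", s, err)
			continue
		}
		// Different spellings of the same number collapse to one output.
		fmt.Printf("%-7s -> %s\n", s, formatValue(v))
	}
}
```

Being liberal on the parsing side like this is the easy half; agreeing on what every producer should emit is the part that needs a standard.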
Everything is layers. We build things on top of other things. If every layer had that attitude, then the bloat would be enormous. It's already getting there.
We should praise judicious effort put into optimizing any of the resources used in the systems we build, at every layer.
I understand what you're getting at. IMO you're barking up the wrong tree: the problem with bloatware will not be solved by Prometheus shipping a lighter statically linked binary.
It's all fun and games until you're stuck for an hour downloading 600MB of updated packages over metered LTE. The same goes for RAM usage: 512MB was enough for a phone back in 2014, now a smart TV with 2GB is barely capable of multitasking. Sure, binary sizes don't matter in most contexts. But when they do, it's a PITA.
Sure, but we're talking about an application written for a cloud/hosted environment in a datacenter somewhere. Nitpicking at the size of a statically linked binary meant for production-grade environments with fast computers and fat pipes feels overly pedantic, no? Especially when we're talking about a mere 100MB.
What you're kind of missing is that the S5 was a flagship phone. Generally, one has to save for more than a month to afford a purchase like that. The idea of working an extra month so that some FAANG prick meets their KPI by cutting corners on optimization doesn't even look like feudalism. It looks like idiocracy. Paying the lip service of fat shaming code bloat is the cost-effective option by comparison :)
Don't FAANG people obsess over bloat because they're trying to reach billions of customers? It might not seem that way since their pages are bigger, but I'd be surprised if they were happy to leave tens of millions of customers on the table.
They're just poster children for the particular brand of disdain $100k+/year "tech workers" bear for their users: they make enough for the shiniest of toys, so they're too far above spending their valuable time to make their software run smooth on our $100 crap phones. Nevermind that each Fb client update likely produces hundreds of tons of toxic trash called gadgets. Sure, sometimes they do optimizations. Generally, though, both Fb and Google keep exploring the physical limits to code bloat. Remember that one time that Fb hit the JVM class count limit?
FAANG are the worst offenders. Didn't facebook employ ungodly hacks to unload/load parts of the android app to navigate around the 65k method limit of dex? Have you looked at the js monstrosity of the Google hardware shop website?
You misunderstand the point. You're comparing a 1080p phone to a 4k television when texture memory is what will take up the vast majority of ram. Code footprint is pretty irrelevant.
Still the TV does fine with 2GB. Doesn't seem fair to complain.
I wasn't speaking of a 4k TV, but still, this doesn't check out. A single 2160p framebuffer is 8MPix, or about 32MiB at 4 bytes per pixel. Not counting the original framebuffer size, the extra 1.5GiB is enough for 48 whole framebuffers. You don't need that much image data all at once; the number is ridiculous. No, I believe it's just that the code became that much less efficient.
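The arithmetic behind those numbers, spelled out as a quick check. The assumptions are mine: 3840x2160 for "2160p", 4 bytes per pixel (32-bit RGBA), uncompressed, and 1.5GiB taken as the difference between 2GiB and 512MiB.

```go
package main

import "fmt"

const (
	width         = 3840    // assumed 2160p resolution
	height        = 2160    //
	bytesPerPixel = 4       // assumption: 32-bit RGBA, no compression
	mib           = 1 << 20 // bytes in one MiB
)

// framebufferBytes returns the size of one uncompressed framebuffer.
func framebufferBytes(w, h, bpp int) int { return w * h * bpp }

// fits reports how many whole framebuffers of size fb fit into budget bytes.
func fits(budget, fb int) int { return budget / fb }

func main() {
	fb := framebufferBytes(width, height, bytesPerPixel)
	extra := 1536 * mib // 2 GiB minus 512 MiB
	fmt.Printf("pixels: %d\n", width*height)
	fmt.Printf("one framebuffer: %d bytes (%.1f MiB)\n", fb, float64(fb)/mib)
	fmt.Printf("whole framebuffers in 1.5 GiB: %d\n", fits(extra, fb))
}
```

That works out to roughly 8.3 million pixels, a bit under 32MiB per framebuffer, and 48 whole framebuffers in the extra 1.5GiB.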
Think of each app and all the texture content that needs to be loaded. App textures get 4x as big, all things being equal. You see a 4x change in ram across those devices.
I can guarantee 100% my Prometheus instance will never be running on metered LTE. If such a situation arises then my operational metrics are the least of my concern.
Inventing arbitrary formulae to calculate your wage is merely a dishonest negotiation tactic. Govt companies do it all the time, don't read too much into it. Also, it's perfectly fine to renegotiate your wage once you're off your probation period, as once you're in, you have a better idea of your worth to the company. Just don't do it too often.
A decade ago, an article [1] was published in the Russian "Hacker" magazine where the author alleged that a Russian OEM manufacturer's motherboard sourced from China had a BMC chip (which should've been disabled as per the mobo spec) inject a hypervisor into the host machine.
It was, again, allegedly, discovered because the author was developing some kind of distributed computing software that required a hypervisor of its own, and this exact mobo was crashing in a way that was consistent with a hypervisor being already present. The author goes further to describe how he devised a way to consistently detect hypervisors by measuring platform register access timings, and tried to report the findings to the FSB (Russian CIA/FBI) to no avail.
I personally don't put much stock in the story, as the magazine was a rag and I could come up with something like that at the time, but there it is.