How did they do it? I thought that their GC critically relied on virtual memory support that current OSes do not provide, hence the need to run virtualized.
I bet they just sacrifice granularity and use the MMU to do what they need. Unmapping an entire page and handling the segfault would work, but it wouldn't be as good as the VTD stuff they use in the hypervisor version.
That's roughly what they do in the hypervisor version too (protection is at page granularity). The big problem is that this causes major suckage on an unmodified kernel, because mprotect() changes mappings synchronously and only works on one contiguous region at a time. So basically every page that gets scanned involves a separate global TLB shootdown.
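To make the "one region at a time" point concrete, here is a rough sketch that calls the stock mprotect() through ctypes on a Unix-like system. The helper names (contiguous_runs, protect_pages) are made up for illustration and have nothing to do with Azul's code. It only counts system calls; the TLB shootdown cost of each call is not something user space can observe here.

```python
import ctypes
import mmap
import os

PAGE = mmap.PAGESIZE

# Assumption: a Unix-like libc that exports mprotect().
libc = ctypes.CDLL(None, use_errno=True)
libc.mprotect.argtypes = (ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int)
libc.mprotect.restype = ctypes.c_int


def contiguous_runs(pages):
    """Group page indices into (first_page, page_count) runs of adjacent pages."""
    runs = []
    for p in sorted(set(pages)):
        if runs and p == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((p, 1))
    return runs


def protect_pages(base_addr, pages, prot):
    """Apply prot to the given pages; return how many mprotect() calls it took."""
    calls = 0
    for first, count in contiguous_runs(pages):
        if libc.mprotect(base_addr + first * PAGE, count * PAGE, prot) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err))
        calls += 1
    return calls


heap = mmap.mmap(-1, 16 * PAGE)            # anonymous, page-aligned mapping
anchor = ctypes.c_char.from_buffer(heap)   # keeps the buffer pinned for addressof
base = ctypes.addressof(anchor)

# Pages are only made read-only (never PROT_NONE) and nothing writes to them,
# so the demo cannot fault.
scattered = protect_pages(base, [0, 2, 4, 6], mmap.PROT_READ)
adjacent = protect_pages(base, [8, 9, 10, 11], mmap.PROT_READ)
protect_pages(base, range(16), mmap.PROT_READ | mmap.PROT_WRITE)  # restore
print(scattered, adjacent)  # prints: 4 1
```

Four scattered pages cost four calls, while four adjacent pages collapse into one. A collector that touches pages all over the heap is much closer to the first case.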
The Azul collector doesn't need the changes in page protection to be visible immediately, so their kernel patch allows batching together the changes and committing in a single operation.
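A toy model of the batching idea, purely for illustration: this is not Azul's kernel interface (I don't know what their patch actually exposes), and ToyAddressSpace and its methods are hypothetical names. It just counts simulated shootdowns for eager versus batched protection changes.

```python
class ToyAddressSpace:
    """Toy model: tracks page protections and counts simulated TLB shootdowns."""

    def __init__(self):
        self.prot = {}       # page index -> protection currently in effect
        self.pending = {}    # queued changes that are not yet in effect
        self.shootdowns = 0  # simulated count, not a real measurement

    def protect_now(self, page, prot):
        """Synchronous change, like stock mprotect(): in effect at once."""
        self.prot[page] = prot
        self.shootdowns += 1

    def protect_later(self, page, prot):
        """Queue a change; it does not take effect until commit()."""
        self.pending[page] = prot

    def commit(self):
        """Apply every queued change and pay for a single shootdown."""
        if self.pending:
            self.prot.update(self.pending)
            self.pending.clear()
            self.shootdowns += 1


pages = [3, 17, 42, 99]  # arbitrary, non-adjacent page indices

eager = ToyAddressSpace()
for p in pages:
    eager.protect_now(p, "none")

batched = ToyAddressSpace()
for p in pages:
    batched.protect_later(p, "none")
batched.commit()

print(eager.shootdowns, batched.shootdowns)  # prints: 4 1
```

Both address spaces end up with identical protections. The batched one only gets away with a single shootdown because the caller can tolerate the changes not being visible until commit().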
As far as I can tell from reading their whitepaper, however, they still require kernel patches.