It's clear that the MBA is throttling the calls at the Kernel level since there is nothing at the protocol or hardware level that would necessitate something as slow as 60k calls/sec. While that speed is certainly not bad, especially for a consumer laptop and may beat out of the box Linux distributions, it doesn't mean that the bottleneck isn't something that's hardware dependent.
Ultimately, if you want real speed doing something real simple, you're going to want to use FPGAs anyway which would be a trivial consulting fee to implement compared to a development effort that will go deep into your kernel, hardware drivers and probably protocol tuning as well.
Ultimately, if you want real speed doing something real simple, you're going to want to use FPGAs anyway which would be a trivial consulting fee to implement compared to a development effort that will go deep into your kernel, hardware drivers and probably protocol tuning as well.