It's not just I/O, it's also mutexes, condition variables, time, etc. It's not horrible, but it does add up, so calling it super efficient is a stretch.
A modern allocator with per-thread cache can satisfy some allocations in 20-30 cycles - dynamic dispatch can easily double that, even if the target is still in cache.
It's one of these things where it's extremely use case dependant - like many performance issues, you probably don't care about it - but when you do it matters.
Inderect call cost is a few cycles, if predicted. Now, you can argue, that it may be mispredicted and misprediction would cost about 20-30 cycles. But if it is mispredicted, then you are not calling into allocator often enough. And if you don't hammer it hard, then why do you care about preformance?