It does seem surprising, but ARM BASIC was fast. Seriously fast. I read somewhere that it was later optimized further for the ARM3, so that its core working set (for common operations) would fit inside the ARM3's cache. When you consider that the cache was only 4 KB, you get a feel for how well-written ARM BASIC was.
And when it wasn't quite fast enough, there were easy options: the built-in assembler made it simple to call your own machine code from BASIC, e.g. [.mol% mov r0, #42:mov pc, r14]:PRINT USR mol% (with P% already pointing at a few bytes reserved with DIM).
But then you've got to remember that Sophie Wilson wrote the 16KiB 6502 BASIC interpreter for the BBC Micro, and the BASIC for the Z80 second processor IIRC, and then brought that experience to bear when designing the ARM instruction set. Having done so, she then wrote BBC BASIC V in ARM assembly; hardly surprising that she knew every little nuance and could wring speed out of it.
In later years, the BBC BASIC interpreters, both 6502 and ARM, were good stress tests for emulators because of her wide use of the instruction sets.