Now that you mention it, I know the basic "add/sub is faster than multiply is much faster than divide" order, but I have no idea how casts compare. Why isn't that ever mentioned anywhere?
I'm guessing it's for the same reason casts are slow: they aren't bottlenecks in the most common number-crunching workloads, so it doesn't occur to anyone that they might be bottlenecks in somewhat less common ones.