Because each job runs only once, in the middle of the night, and only when it detects changes. Also, a lot of our tests are very finicky and need to be restarted several times before they pass, so a failing test isn't necessarily an indication of bad code. On top of that, a lot of our tests don't actually test anything useful; they merely fulfill customer requirements from the specifications. The only useful tests for many of these requirements are full system tests, which are not easily automated because they involve a lot of complex, interconnected systems and expensive physical hardware that would not be easy to fake.