Because each job runs only once, in the middle of the night, when it detects changes. Also, a lot of our tests are very finicky and need to be restarted multiple times before they pass, so a failing test isn't necessarily an indication of bad code. On top of that, a lot of our tests don't actually test anything useful; they merely fulfill customer requirements from the specifications. The only useful tests for a lot of these requirements are full system tests, which are not easily automated because they involve many complex, interconnected systems and expensive physical hardware that would not be easy to fake.