It sounds like you're conflating a binary boundary with a service boundary here. In your example you have two cleanly defined services that happen to live in the same binary. The issue with monoliths is when every part of the code does a bit of everything, because the code is a big ball of mud that "cheats" by calling shared subroutines and doesn't have a clearly delineated API. Then you have only one knob to turn to scale.
The 20+2 distinction isn't clear because each part of the code causes action-at-a-distance on the others. Failures become a lot harder to isolate. That's what people mean when they say scaling a monolith is hard.
That doesn't answer the question though. Suppose the APIs are completely independent. Say one is a chess server and the other generates haikus. Granted, it's architecturally silly to put those in a single binary, but that's not the question. My question is why, specifically, managing scalability becomes easier when deploying them independently. My thought is that it actually becomes more difficult, since you have to manage each one independently, whereas if they were a single binary, all you'd have to care about is the net sum of your resource needs.
Usually because they have different usage patterns.
The chess API follows a daily cycle between 10 and 1,000 TPS and is CPU-intensive.
The haiku API is usually 1-10 TPS, except on Fridays (when everyone's device gets a new one) and holidays, when it spikes to 100,000 TPS, and it's IO-intensive.
Scaling a single service with both of these API endpoints being hit in different patterns like the above is a pain. Splitting them lets you choose different host types (e.g. more CPU, or more memory / SSD, etc.) and makes scaling (especially planned / dynamic scaling) easier.
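A back-of-the-envelope sketch of why this hurts, with entirely invented per-host throughput numbers: a split deployment scales each fleet against its own peak on hardware tailored to it, while a monolith fleet on general-purpose hosts has to absorb both workloads' host-fractions at their simultaneous worst case.

```python
import math

# Peak loads loosely based on the comment above.
CHESS_PEAK_TPS = 1_000     # CPU-bound
HAIKU_PEAK_TPS = 100_000   # IO-bound (the Friday spike)

# Assumed per-host throughput on hardware tailored to each workload
# (illustrative numbers only):
CPU_HOST_CHESS_TPS = 50    # CPU-optimized host serving chess
IO_HOST_HAIKU_TPS = 5_000  # IO/SSD-optimized host serving haikus

# A general-purpose host running the combined binary is optimal
# for neither workload:
GP_HOST_CHESS_TPS = 25
GP_HOST_HAIKU_TPS = 2_000

# Split deployment: each fleet scales against its own peak.
split_hosts = (math.ceil(CHESS_PEAK_TPS / CPU_HOST_CHESS_TPS)
               + math.ceil(HAIKU_PEAK_TPS / IO_HOST_HAIKU_TPS))

# Monolith: every host serves a mix of both streams, so the fleet
# must cover the sum of both workloads' host-fractions at once.
mono_hosts = math.ceil(CHESS_PEAK_TPS / GP_HOST_CHESS_TPS
                       + HAIKU_PEAK_TPS / GP_HOST_HAIKU_TPS)

print(split_hosts, mono_hosts)  # → 40 90
```

With these made-up numbers the split fleet needs 40 hosts and the monolith 90, because the monolith can't pick hardware that suits either workload and has to provision for both peaks on every box.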
Thanks, and to add to that, it occurred to me that they may have different criticality too. Some services you may want to scale very aggressively because a failure would be catastrophic, whereas other services may be even more CPU-intensive on average, but failures are acceptable, so you let them run at 90% load. I'd imagine this would be a far more difficult balancing act if they were both in the same process.
For example, if your chess AI engine were running in the same monolith as your web server, it could slow down your response time to the point of timeout. But if they were separate services, your web server could stay snappy and give a meaningful response to the problem: "Our AI service is overloaded right now, but here is a nice haiku while you wait."
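That graceful-degradation idea can be sketched as a bounded wait on the engine call; everything here (function names, timings, the fallback text) is hypothetical, with a `sleep` standing in for an overloaded out-of-process engine:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

FALLBACK = ("Our AI service is overloaded right now, "
            "but here is a nice haiku while you wait.")

_pool = ThreadPoolExecutor(max_workers=4)

def chess_move(board):
    # Stand-in for a call to a separate chess-engine service;
    # the sleep simulates an engine that is currently overloaded.
    time.sleep(2)
    return "e2e4"

def handle_request(board, timeout_s=0.2):
    # Because the engine runs out of process, the web tier can bound
    # its wait and degrade gracefully instead of timing out itself.
    try:
        return _pool.submit(chess_move, board).result(timeout=timeout_s)
    except TimeoutError:
        return FALLBACK

print(handle_request("start"))  # engine too slow, so the fallback text is returned
```

In a single process under the same CPU pressure there's no such clean cut point: the engine's work and the web handler's work contend for the same threads and memory, so the "stay snappy and apologize" path is much harder to guarantee.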
Though still, I'd think of that as a fairly advanced use case. Not something small projects should have to think about.
I think your original question is a good one. It must be thoroughly proven and not just taken as gospel.
You may find cases where decoupling a service is a good idea. That doesn't justify decoupling everything by default. The more you decouple, the more rigid the whole system becomes.
Other than freeing up the memory where the binary would be loaded, and maybe some network ports, I don't see any advantage to scaling two lightweight services separately.
I see your point. Having services entails more complexity.
That said, it allows you to tailor your code to its specific need. One of the services needs a certain database, queuing, or other connectivity consideration while the other doesn't, or has different needs. Bundling them prevents you from splitting things out, so neither design can stay limited and specific.
No, the whole point is that most teams just shove everything into one monolithic program. If you "suppose the APIs are completely independent", that's not a monolith.