So this is stuff I've picked up from modding games like Skyrim. It probably works slightly differently in the Remake, but this should at least give you some idea of the amount of work that goes into things like "I want more diverse enemy models in my games", and why a game company might go "Screw that, the amount of work going into it isn't worth the pay-off".
So there are really two things going on here. First is the amount of work that would go into making different models/textures and how they link up. That is tied to how character creation works (or doesn't work, in the case of the Remake) and how 3D modeling in games is handled, which in turn is tied to how the game engine handles texture data. Some game engines/creation suites are more flexible than others.
The second issue is how enemy generation in the game is handled, and how RNG-based it is or isn't. This depends more on what kind of game is getting made than on anything else.
When it comes to modeling/texture data and the character creation system... there are really two systems at work here... or not at work. In a lot of games, "creatures" are the mob models. The entire "creature" is created as one model with one texture. (For people who don't work with 3D assets a lot, a "model" is the 3D shape, and a "texture" is the 2D image that gets wrapped around the model so it looks "real".) A lot of the time, which texture a model uses is baked into the model itself in some way. You'll have a 3D artist make the model and also create a map (usually called a UV map) for a 2D artist, which shows where the different parts of the 3D model land on the flat texture.
So you have all that. And this brings up the issue of how you change the color of a texture, because it's not as simple as playing around with a colorization slider. Before that can happen, the area of the texture to be colorized has to be defined. Oftentimes the entire texture shouldn't be colorized, only a small part of it. And that area has to be defined in the game somehow... for every texture that needs a variable color. A game could instead just point to differently colored textures, but that would mean storing a separate full texture for every color... which would increase the size of the game. Also, doing that is less flexible in the long run...
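To make the "define the area first" part concrete, here's a rough sketch of tinting through a mask. This isn't how any particular engine does it; the function names, the tiny 1x2 "texture" and the 0.0 to 1.0 mask format are all made up for illustration. The point is just that one base texture plus one mask can give you any number of colors, instead of one full texture per color.

```python
# Hypothetical sketch: tint only the masked part of a texture.
# Pixels are (r, g, b) tuples in 0-255. Mask values run from
# 0.0 (leave the pixel alone) to 1.0 (fully tinted).

def tint_pixel(pixel, tint, amount):
    """Blend one pixel toward its tinted version by `amount`."""
    # Multiply blend: white cloth becomes exactly the tint color.
    tinted = tuple(channel * t // 255 for channel, t in zip(pixel, tint))
    return tuple(round(p + (q - p) * amount) for p, q in zip(pixel, tinted))

def tint_texture(texture, mask, tint):
    """Apply `tint` to every pixel, weighted by the matching mask value."""
    return [
        [tint_pixel(px, tint, m) for px, m in zip(tex_row, mask_row)]
        for tex_row, mask_row in zip(texture, mask)
    ]

# A 1x2 "texture": the left pixel is skin (unmasked),
# the right pixel is white cloth (masked).
texture = [[(200, 150, 120), (255, 255, 255)]]
mask = [[0.0, 1.0]]

# Two color variants from the same base texture and mask.
red_variant = tint_texture(texture, mask, (255, 0, 0))
blue_variant = tint_texture(texture, mask, (0, 0, 255))
```

The catch is that somebody still has to paint that mask for every texture that needs a variable color, which is exactly the extra work I'm talking about.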
This is... a lot simpler in some ways for games that end up needing to develop a very robust, modular character creator. Usually those are games with a customizable player character. On the one hand, it is way more work in the short term to develop a character creator than to just use custom models for everything, mainly because all the models the character creator uses have to be able to work together... as do the customizable textures that can be swapped in and out. On the other hand, the appearance data for such a character creator is usually all stored in the game itself (or in save files, for the player character). So instead of having to store a ton of individual models, it's just a bunch of data in a table that the computer can load up when the player zones into an area of the game. This also tends to mesh really well with a robust loot system that has lots of clothing options, as most clothing is modeled together with the body parts it is "attached" to. So this ends up killing lots of birds with one stone, so to speak. The player gets a lot of options for customizing their character's looks, and the NPCs that are the same "type" of model as the player character already have a lot of body part models to draw from. In other words, this has a lot more flexibility than just doing models of things with no swappable parts... but it also takes way longer to implement.
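Here's a rough idea of what "a bunch of data in a table" can look like. Everything in it (the part names, the `Appearance` fields, the `assemble` helper) is invented for the example and not taken from any real game's data format. Each NPC is just a handful of indices into shared part lists plus a tint.

```python
from dataclasses import dataclass

# Hypothetical shared part libraries, used by the player character
# and by every NPC with the same body type.
HEADS = ["head_01", "head_02", "head_03"]
HAIR = ["hair_short", "hair_long"]
OUTFITS = ["leather_armor", "fur_armor"]

@dataclass(frozen=True)
class Appearance:
    """One table row: indices into the part lists plus a skin tint."""
    head: int
    hair: int
    outfit: int
    skin_tint: tuple

def assemble(appearance):
    """Resolve the indices into the named assets a renderer would load."""
    return {
        "head": HEADS[appearance.head],
        "hair": HAIR[appearance.hair],
        "outfit": OUTFITS[appearance.outfit],
        "skin_tint": appearance.skin_tint,
    }

# Two hand-picked NPCs, stored as data instead of as two custom models.
npc_table = {
    "bandit_a": Appearance(head=0, hair=1, outfit=0, skin_tint=(224, 172, 105)),
    "bandit_b": Appearance(head=2, hair=0, outfit=1, skin_tint=(141, 85, 36)),
}

# Distinct looks available before tints are even counted.
combinations = len(HEADS) * len(HAIR) * len(OUTFITS)
```

Even with these tiny lists you get 3 x 2 x 2 = 12 looks from 7 part models, and adding one more head adds 4 more looks. That's the pay-off for all the up-front work of making the parts fit together.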
As far as the Remake goes... I don't think they are using a character creator (in fact, given what people have found by loading up the models in Unreal Engine, I'm 99% sure they are not). All of the player characters are too different from each other for them to not be doing a ton of custom work on them. The same goes for a lot of the "Main Character" NPCs. We also see a lot of "repeat" NPCs in certain areas of the game, which suggests that swapping around facial features and clothing options is difficult and not modular in nature.
And now we get to the second issue... implementing all those different models in the game as enemies... This applies no matter how the enemy models are handled. The big difference here is what kind of game is being made: one with enemy RNG or one without. In a sandbox game like Skyrim, having a lot of variation is... really needed. The player is going to pass through the same areas a bunch of times, and there needs to be a way to vary which enemies/NPCs they come across. In a game like the Remake, which is more linear... they kinda need to have the same enemies every time. There's not a lot of room for RNG and therefore not a lot of room for varying the enemies the player comes across.
I don't know how the Remake does it, but in Skyrim... you essentially have a list of possible enemies that can spawn, all of which are some variation on one enemy type. The game then "rolls" for which enemy from that list it should spawn. So like... if you come across a bandit camp, the game might roll five times on the list "Lvl.1-5Bandits" and spawn all those enemies. That list would hold a whole set of pre-configured NPCs, each with gear, loot, and appearance data. And that appearance data isn't randomized either. Someone had to go through and pick all the NPC appearance data by hand... for each NPC. This gives the illusion of having a lot of diverse enemies in a game, often because that "Lvl.1-5Bandits" list would have one male and one female enemy of each race in the game. And each of those races only has so many features to choose from.
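In code, the rolling idea looks roughly like this. It's only a sketch of the concept, not Skyrim's actual implementation; the preset names, the fields and the `roll_encounter` function are made up for the example.

```python
import random

# Hypothetical hand-authored presets. Gear and appearance are fixed
# per entry; only WHICH entry spawns is left to the dice.
LVL_1_5_BANDITS = [
    {"name": "bandit_nord_m", "gear": "iron_sword", "appearance": "preset_01"},
    {"name": "bandit_nord_f", "gear": "hunting_bow", "appearance": "preset_02"},
    {"name": "bandit_orc_m", "gear": "iron_war_axe", "appearance": "preset_03"},
    {"name": "bandit_orc_f", "gear": "iron_mace", "appearance": "preset_04"},
]

def roll_encounter(leveled_list, rolls, rng):
    """Pick `rolls` entries from the list; repeats are allowed."""
    return [rng.choice(leveled_list) for _ in range(rolls)]

# A bandit camp that rolls five times on the list. The seed is only
# there so the example gives the same camp each time it runs.
camp = roll_encounter(LVL_1_5_BANDITS, rolls=5, rng=random.Random(42))
camp_names = [npc["name"] for npc in camp]
```

Notice that nothing about the bandits themselves is random. All the variety comes from somebody filling in that list by hand, one preset at a time.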
In a game like the Remake though, having a ton of randomized encounters isn't part of the game design, especially given the very tight combat system the game has. So it has very fixed encounters when it comes to enemy numbers and types. Which means... all the enemy encounters have almost no RNG in them and are heavily scripted. FF games in general are not sandbox games.
Combine this with very high graphical fidelity and game models that probably take a lot of time and effort to get animated correctly. From things we've seen in the game, it's rather obvious that the devs were running out of time to finish it and were cutting anything that wasn't needed. So we have things like Cloud's Wall Market outfits having no combat animations, because the devs knew when they were making the models that there would be no combat in Wall Market. Things like enemy "variations" that are essentially just recolors are... not needed to get the game out, and would drain resources from other areas that needed to get done.
And yeah... this is why robots, faceless mooks, clones... and anything else that isn't visually diverse is so common in video games. It saves the devs a lot of work!