Fixed-Point Designer does not support per-element scaling for vectors and matrices.
A good solution to your need will depend on the nature of your problem.
Idea 1: Set up an automated test suite to see what matters
Set up a test suite that automatically exercises your design over a rich variety of conditions to make sure it meets system-level requirements. This will allow you to productively explore lots of design adjustments and quickly discard the ones that don't meet requirements.
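As a rough sketch of what such a harness could look like, here is a small Python illustration (not Fixed-Point Designer code). The design, the coefficients, the tolerance, and all function names are hypothetical stand-ins; in practice the reference and candidate would be your own double-precision and fixed-point models.

```python
import random

COEFFS = [0.5, -0.25, 0.125]  # hypothetical design coefficients


def reference_design(x):
    """Double-precision reference behavior (stand-in for the real design)."""
    return sum(c * v for c, v in zip(COEFFS, x))


def make_candidate(frac_length):
    """Build a design variant whose inputs are rounded to steps of 2**-frac_length."""
    scale = 2 ** frac_length

    def candidate(x):
        xq = [round(v * scale) / scale for v in x]
        return sum(c * v for c, v in zip(COEFFS, xq))

    return candidate


def run_suite(candidate, tolerance, n_cases=1000, seed=0):
    """Compare candidate against the reference over many random inputs.

    Returns (passed, worst_error). The tolerance plays the role of the
    system-level requirement and is an assumption for this sketch.
    """
    rng = random.Random(seed)  # fixed seed keeps the suite repeatable
    worst = 0.0
    for _ in range(n_cases):
        x = [rng.uniform(-1.0, 1.0) for _ in COEFFS]
        worst = max(worst, abs(candidate(x) - reference_design(x)))
    return worst <= tolerance, worst


# Try several design adjustments and keep only those that meet the requirement.
for fl in (4, 8, 12):
    ok, worst = run_suite(make_candidate(fl), tolerance=1e-3)
    print(f"frac_length={fl:2d} worst_error={worst:.2e} {'PASS' if ok else 'FAIL'}")
```

The point is only the workflow: each candidate adjustment gets the same automated pass/fail verdict, so poor choices can be thrown out without manual inspection.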
Idea 2: Prune out and/or quantize away the small contributors
Machine learning layers often involve large numbers of convolutions, dot products, and matrix multiplies. Two of the techniques used to drastically reduce size, reduce power consumption, and increase speed are quantization and pruning. The idea in pruning is that some parts of the calculation just don't make a meaningful contribution to the higher-level system behavior, so "cut them out." For example, whole rows or whole columns of a matrix multiply might have no meaningful impact, so get rid of them. Perhaps the very small matrix elements in your design also don't matter and can be pruned out or quantized to zero. The automated test suite described above lets you productively determine whether this is the case.
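To make the pruning idea concrete, here is a minimal Python sketch (again, only an illustration). The matrix, the input vector, and the threshold are made-up values; whether the resulting error is acceptable is something only your own system-level tests can decide.

```python
def prune(matrix, threshold):
    """Set every element whose magnitude is below threshold to zero."""
    return [[v if abs(v) >= threshold else 0.0 for v in row] for row in matrix]


def matvec(matrix, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


# Hypothetical matrix with a few very small elements mixed in.
A = [[2.0, 1e-6, -1.5],
     [1e-7, 3.0, 0.5]]
x = [1.0, 2.0, -1.0]

A_pruned = prune(A, threshold=1e-4)

y_full = matvec(A, x)
y_pruned = matvec(A_pruned, x)

# Worst-case change in the output caused by pruning.
max_err = max(abs(a - b) for a, b in zip(y_full, y_pruned))
n_zeroed = sum(v == 0.0 for row in A_pruned for v in row)
print(f"elements zeroed: {n_zeroed}, max output change: {max_err:.1e}")
```

Once the tiny elements are gone, the remaining elements span a much narrower range, so a single shared scaling for the whole matrix wastes far fewer bits.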
Idea 3: Split it up
As you mentioned, splitting up the algorithm to use fewer long vector signals and more scalar or shorter vector signals is one way to go. If the dimensions involved are modest, this may be the best way to go. We've worked with many Simulink users who had concatenated signals for convenience, and this worked fine in the luxury of doubles. But when they realized that big differences in ranges were preventing them from getting high efficiency on their embedded devices, they removed the concatenation with modest effort.
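The effect of a shared scaling versus separate scalings can be sketched in a few lines of Python. This is an illustration only: the signal names and values, the 16-bit word length, and the helpers `quantize` and `frac_length_for` are assumptions made for the example, not Fixed-Point Designer functions.

```python
import math


def quantize(x, word_length, frac_length):
    """Round x to a signed fixed-point grid (round half to even) with saturation."""
    scale = 2 ** frac_length
    lo = -(2 ** (word_length - 1))
    hi = 2 ** (word_length - 1) - 1
    stored = max(lo, min(hi, round(x * scale)))
    return stored / scale


def frac_length_for(max_abs, word_length):
    """Largest fraction length that still represents max_abs without saturating."""
    int_bits = math.floor(math.log2(max_abs)) + 1
    return word_length - 1 - int_bits


WL = 16
# Hypothetical signals with very different ranges, originally concatenated.
signals = {"position": 1000.0, "bias": 0.001}

# Concatenated: one scaling must cover the largest element.
shared_fl = frac_length_for(max(abs(v) for v in signals.values()), WL)
shared = {k: quantize(v, WL, shared_fl) for k, v in signals.items()}

# Split up: each signal gets its own scaling.
split = {k: quantize(v, WL, frac_length_for(abs(v), WL)) for k, v in signals.items()}

for name, value in signals.items():
    print(f"{name}: true={value}, shared={shared[name]}, split={split[name]}")
```

With the shared scaling, the small signal falls below the resolution needed to cover the large one and is lost entirely; with its own scaling it keeps most of the 16 bits of precision.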
Idea 4: State Space to Biquads
If the matrix calculations are part of a State Space model (especially if SISO), then a best practice for avoiding loss of precision (even with floating-point) is to convert the State Space model to a transfer function and then split that transfer function up into biquad sections. This often works like a charm for fixed-point designs, and also for single-precision floating-point designs that are struggling with precision loss.
Idea 5: Look at the bigger problem
We've encountered several users of Simulink and Coders who were struggling to do embedded-efficient matrix inverses. Looking at the problem from a higher level, it turned out that they were really trying to solve a system of linear equations. To solve that higher-level problem, very efficient approaches like hardware-friendly QR can be used.
Hope some of this points in a useful direction for your particular design's needs.
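To illustrate solving the system directly instead of forming an inverse, here is a small Python sketch based on Givens rotations, which is assumed here as one example of a QR variant that tends to map well to hardware. It is not the Fixed-Point Designer implementation, it runs in floating point, and it assumes a square, non-singular matrix; the function name and test values are made up.

```python
import math


def solve_qr_givens(A, b):
    """Solve A x = b via Givens-rotation QR, without ever forming inv(A).

    Assumes A is square and non-singular (an assumption of this sketch).
    """
    n = len(A)
    # Work on a copy of the augmented matrix [A | b].
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]

    # Zero out the sub-diagonal column by column, rotating adjacent rows.
    for col in range(n):
        for row in range(n - 1, col, -1):
            a, c = M[row - 1][col], M[row][col]
            r = math.hypot(a, c)
            if r == 0.0:
                continue  # nothing to eliminate
            cs, sn = a / r, c / r
            for k in range(col, n + 1):
                u, l = M[row - 1][k], M[row][k]
                M[row - 1][k] = cs * u + sn * l
                M[row][k] = -sn * u + cs * l

    # M now holds [R | Q^T b] with R upper triangular: back-substitute.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


A = [[4.0, 1.0],
     [2.0, 3.0]]
b = [1.0, 2.0]
x = solve_qr_givens(A, b)
print(x)  # approximately [0.1, 0.6]
```

The rotations are applied to the right-hand side along with the matrix, so the only division left is in the back-substitution of a triangular system.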
Andy