You really can't from observer. The feeling of pvp and the ability levels of players vary massively.
Take the common complaint of "Wizards shield is OP". Unless you are taking the damage, managing your hp / suns, taking the ticks and reacting as a player, you cannot properly 'get' how pvp feels for each class.
From observer you can see some things for sure, but a lot of that you can do with 2 characters in test, and even then only to make sure your damage calculations are working as intended, since most of it can be done on a calculator.
People with their kits on the line play differently. They make more mistakes, panic, fat-finger keys, all sorts.
I get what you're saying. However, if they make more mistakes and fat-finger keys, why would you balance around that?
If people **** up or are generally bad at the game, you don't balance around that; they just need to get better at the game.
Also, what if you (the GM) are sub-par at PvP compared to the standard? Do you then balance around your poor PvP with a class, rather than how the class is intended to be played?
There are different methods of balancing around PvP. Being in the fray can help, but you can collect ample data, information and a general feel for the balance without needing to get directly into it.
Not arguing with your method by the way, just assume we have different ways to reach the same conclusion.