What can actually be measured is the performance of one program, written in some programming language, compiled with a particular compiler, and run on a particular machine.
People comparing "performance of programming languages" do exactly this. They take some problem, write programs that solve it in the chosen languages, compile those programs with selected compilers, and then run them on some machine. The result measures exactly that and nothing more. It is not the absolute performance of a programming language, which cannot even be given a reasonable definition.
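As a minimal sketch of what such a measurement really records, the Python snippet below times one workload and stores the number together with the implementation and machine it ran on. The names `solve` and `measure` are hypothetical and chosen only for illustration; the point is that the timing says little without the rest of the record.

```python
import platform
import timeit


def solve(n=10_000):
    """Hypothetical workload: sum of the squares of 0..n-1."""
    return sum(i * i for i in range(n))


def measure(func, repeat=5, number=100):
    """Time func and return the result together with its context."""
    # timeit.repeat returns one total time per repetition; keep the best one.
    best = min(timeit.repeat(func, repeat=repeat, number=number))
    return {
        "seconds_per_call": best / number,
        "implementation": platform.python_implementation(),
        "version": platform.python_version(),
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    print(measure(solve))
```

Running the same script under a different interpreter or on a different machine will generally give a different `seconds_per_call`, even though the language and the program are unchanged.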
I can think of two kinds of program that could be measured when evaluating a compiler (in the context of solving some particular problem):
- A program optimized for best performance.
- A program written in the idiomatic style of the language.
The second option is more important. A language should be used the way it was meant to be used. If a program is written as if "fighting the language", then a different language should be chosen, one that allows the solution to be modeled in a more straightforward way. (If you are measuring the performance of C++, you probably should not be writing a custom GC or a Prolog interpreter.)
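To make the two kinds of program concrete, here is a small Python sketch that solves the same hypothetical problem (counting word frequencies) once in idiomatic style and once with a hand-written loop. The function names and the `WORDS` sample data are invented for this illustration, and nothing here claims that either variant is faster; that depends on the interpreter and the machine.

```python
import timeit
from collections import Counter

# Made-up sample input, used only for this illustration.
WORDS = ("alpha beta gamma beta alpha alpha " * 1000).split()


def count_idiomatic(words):
    """Idiomatic style: let the standard library do the counting."""
    return Counter(words)


def count_hand_tuned(words):
    """Hand-written loop that avoids Counter; not necessarily faster."""
    counts = {}
    get = counts.get  # bind the method once to skip repeated attribute lookups
    for word in words:
        counts[word] = get(word, 0) + 1
    return counts


def compare(number=20):
    """Return the best-of-3 seconds per call for each variant."""
    results = {}
    for func in (count_idiomatic, count_hand_tuned):
        best = min(timeit.repeat(lambda: func(WORDS), repeat=3, number=number))
        results[func.__name__] = best / number
    return results


if __name__ == "__main__":
    print(compare())
```

Both functions return the same counts, so any difference in the timings reflects the style of the program and the implementation running it, not the problem being solved.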
So this is what "performance of a programming language" usually means: a subjective evaluation of the performance of popular compiler(s), across a variety of problems and their typical solutions in that language. There is nothing wrong with that, but such statements should always be preceded by "In my opinion...", and some elaboration on the context is very much in order.
