Loudspeaker designers and other experts have studied this problem to some extent. One example is a study by the renowned Danish company Bruel & Kjaer, a manufacturer of audio measurement equipment (Møller, 1974).
The B&K team tested five different loudspeakers in three different rooms, with sound quality evaluated by five listeners in a blindfold listening test. The results were surprising: it was impossible to predict which loudspeaker would perform best in any given room, and no loudspeaker sounded best in all three rooms. This is understandable, because each room presents a different acoustic load to the loudspeaker. Changing the load changes the results, much as an amplifier performs differently depending on its load.
What advice did the B&K research team offer to customers looking to buy a loudspeaker system? They recommended taking various loudspeakers home, measuring them using pink noise,* and then selecting the pair that provides the smoothest response at the listening position within the frequency range of 60 Hz to 6000 Hz. Typically, this loudspeaker also sounds the best.
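The selection step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "smoothest" is taken here to mean the lowest standard deviation of the measured level inside the 60 Hz to 6000 Hz band, which is one possible reading and not a metric given by the B&K team. The function names and the measurement values are hypothetical.

```python
import statistics


def smoothness(freqs_hz, levels_db, lo_hz=60.0, hi_hz=6000.0):
    """Return the standard deviation (dB) of the response inside the band.

    Assumption: a lower standard deviation means a smoother response.
    """
    in_band = [lvl for f, lvl in zip(freqs_hz, levels_db) if lo_hz <= f <= hi_hz]
    if len(in_band) < 2:
        raise ValueError("need at least two measurement points inside the band")
    return statistics.pstdev(in_band)


def pick_smoothest(measurements, lo_hz=60.0, hi_hz=6000.0):
    """Score every loudspeaker and return (best_name, all_scores)."""
    scores = {
        name: smoothness(freqs, levels, lo_hz, hi_hz)
        for name, (freqs, levels) in measurements.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical octave-band levels (dB) measured at the listening position.
freqs = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000]
measurements = {
    "speaker_a": (freqs, [70, 78, 74, 80, 75, 79, 73, 77, 65]),
    "speaker_b": (freqs, [60, 76, 75, 77, 76, 75, 76, 74, 90]),
}

best, scores = pick_smoothest(measurements)
print(best, scores)
```

Points outside the band (31.5 Hz and 8000 Hz in this made-up data) are ignored, so a loudspeaker is not penalized for what it does below 60 Hz or above 6000 Hz.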
Before founding Gradient Ltd., Jorma Salmi, along with his colleague Anders Weckström, decided to study the room/loudspeaker interface problem (Salmi and Weckström, 1982). Instead of testing and measuring loudspeakers in ordinary rooms, they utilized an anechoic chamber. See their findings in "The Absolute Listening Test".
*This method is still in use today. For instance, John Atkinson of Stereophile uses a computerized system that averages 60 measurements for a single loudspeaker, totaling 120 measurements for a stereo pair. This type of measurement correlates well with subjective listening results. As seen in Stereophile (March 1997), the in-room measurement of the Gradient Revolution was an impressive 32 Hz to 10 kHz ±1.3 dB (see fig. 5: Stereophile Review).
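As a rough sketch of how several in-room measurements might be combined and then summarized as a "± dB" figure, the Python below averages curves point by point and reports half the peak-to-peak spread inside a band. Both choices are assumptions made for illustration: the averaging is done in the power domain, and "±" is read as half the range between the highest and lowest level. Neither is confirmed as the procedure Stereophile uses, and the function names and data are hypothetical.

```python
import math


def power_average_db(curves_db):
    """Average several dB curves point by point in the power domain.

    Assumption: all curves share the same frequency points.
    """
    n = len(curves_db)
    averaged = []
    for levels in zip(*curves_db):
        mean_power = sum(10 ** (lvl / 10) for lvl in levels) / n
        averaged.append(10 * math.log10(mean_power))
    return averaged


def half_range_db(freqs_hz, levels_db, lo_hz, hi_hz):
    """Return half the peak-to-peak level spread (dB) inside the band."""
    in_band = [lvl for f, lvl in zip(freqs_hz, levels_db) if lo_hz <= f <= hi_hz]
    if not in_band:
        raise ValueError("no measurement points inside the band")
    return (max(in_band) - min(in_band)) / 2


# Hypothetical measurements taken at three microphone positions.
freqs = [32, 100, 1000, 10000, 20000]
curves = [
    [75.0, 76.0, 74.0, 75.0, 60.0],
    [75.0, 76.0, 75.0, 75.0, 61.0],
    [75.0, 76.0, 74.5, 75.0, 59.0],
]

average = power_average_db(curves)
spread = half_range_db(freqs, average, 32, 10000)
print(f"32 Hz to 10 kHz: +/- {spread:.1f} dB")
```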