Dear Pawel and Eero,
Thank you very much for your responses. The "physical" approach (remixing
of channels) is one I had not considered so far, and it might really help
with some movies. We will probably use only very recent movies, as we have
found that the degree of familiarity is important for eliciting strong
emotion. (Believe it or not, many students do not know Casablanca.) The
movies will be in Dolby.
What I had actually expected was that auditory scene analysis (ASA) would
by now be advanced enough to solve this problem. The task is simpler than
segregating the voice from songs, because it would normally involve spoken
(as opposed to sung) speech overlaid with instrumental music. Years ago I
listened to a demonstration by someone who presented the results of a
computational ASA approach, and as a demo it was quite convincing.
Unfortunately, I remember neither the details nor the name. I have no idea
whether such algorithms are mature enough by now to handle such a task.
Best,
Christian Kaernbach