Beyond Stereo 3 - Virtual Acoustics
May 12, 2004 - by Franck ERNOULD
Translated by vince-y.
In this third and final part of our Beyond Stereo series, we turn to a high-end field known, in all seriousness, as virtual acoustics. Powerful computer tools can now simulate the behaviour of acoustic environments in real time, and this know-how is about to reach applications for the general public.
Virtual acoustics (in the sense of rooms that do not really exist) did not appear overnight! Acousticians first modelled halls. Then they placed one, then several virtual sources in them, and created one, then several listening points, between which movement could be recreated. But for the result to feel like reality, some subtle perceptual aspects still had to be taken into account. Those problems are now almost solved, and although today's applications are still unwieldy contraptions, virtual acoustics may well be everywhere within 10 years.
|Beyond reality|
Every studio reverb has programs such as Room, Hall or Stadium, each in several sizes: Small, Medium, Large… Nothing surprising there. The analogy with a real room seems obvious, and by changing parameters such as the room volume or "HF damp" (high-frequency damping), available on a Lexicon 480 for instance, we can bring these rooms to life: they sound woody or, on the contrary, very reflective, and so on. This approach is intuitive, but it is not concerned with exact values, so there is no real link with reality. It is a bit like computer-generated imagery in films: realism came only later; at first we put up with imperfectly rendered movement.
The acoustics of real halls is a science: a room is described as hundreds of surfaces (known as facets), together with their reflection characteristics (different for low and high frequencies), image sources, etc. The acoustician knows how to compute, for several frequencies, the reverberation times, the intelligibility factors, the sound levels at any given point according to the placement of specific sources, and many other useful parameters required by the specification of the room, whether it is a concert hall, a train station, a stadium, etc. The formulas are often complex to handle and require hours of computation when the facet grid is fine enough.
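As an illustration of the kind of per-frequency computation described above, here is a minimal sketch of a reverberation-time estimate based on Sabine's classical formula, RT60 = 0.161 V / A. The article does not say which formulas acoustic modelling software actually uses; Sabine's formula is swapped in here because it is the simplest well-known one. The room dimensions, the three facets and their absorption coefficients are purely hypothetical.

```python
def sabine_rt60(volume_m3, facets):
    """Estimate the reverberation time in seconds with Sabine's formula.

    facets: list of (area_m2, absorption_coefficient) pairs, one per surface.
    """
    total_absorption = sum(area * alpha for area, alpha in facets)
    if total_absorption <= 0:
        raise ValueError("total absorption must be positive")
    return 0.161 * volume_m3 / total_absorption


# Hypothetical 20 m x 15 m x 8 m hall (values chosen for illustration only).
ROOM_VOLUME_M3 = 20 * 15 * 8

# Floor, ceiling and walls, with made-up absorption coefficients that
# differ per frequency band, as the article says they do in practice.
FACETS_BY_BAND_HZ = {
    125: [(300, 0.10), (300, 0.30), (560, 0.05)],
    1000: [(300, 0.30), (300, 0.60), (560, 0.08)],
    4000: [(300, 0.50), (300, 0.70), (560, 0.10)],
}


def rt60_by_band(volume_m3, facets_by_band):
    """Return {frequency_hz: reverberation_time_s} for each band."""
    return {
        band: sabine_rt60(volume_m3, facets)
        for band, facets in facets_by_band.items()
    }


if __name__ == "__main__":
    for band, rt in rt60_by_band(ROOM_VOLUME_M3, FACETS_BY_BAND_HZ).items():
        print(f"{band} Hz: RT60 = {rt:.2f} s")
```

A real modelling package works with hundreds of facets and far more elaborate methods; the point of the sketch is only that each frequency band gets its own result.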
|The first steps|
A hundred years ago, getting from theory to simulation was not easy. Reduced-scale rooms were built and placed under water, so that one could observe how plane waves created by a small vibrator travelled through them. Another possibility was a scale model of the room, say at 1/20, which was bombarded with scaled sounds, that is, sounds at frequencies 20 times higher than in reality: almost ultrasound, which air absorbs heavily! So the work was done in nitrogen, while the behaviour of the scale model was observed closely… The arrival of computers was very welcome! Computation could be handed over to processors, and so could drawing and modelling: virtual acoustics was born!
This field lets us work on rooms on screen, modifying the surfaces, the materials…
Instantly or almost (from many hours a few years ago to tens of seconds now!), the computer runs its algorithms and delivers its results. We can thus spot problems, freely try other approaches, and start the real building work only once we are sure of the outcome. Architects appreciated this convenience, although there may still be gaps between the theoretical results, published as tables, histograms and curves, and the values actually measured once the work is done, especially below 500 Hz. But the overall trends are right…
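The frequency scaling used with the 1/20 scale models mentioned earlier in this section can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and the speed of sound of 343 m/s is an assumed round value for air at room temperature. The idea is that wavelengths must shrink in the same ratio as the room, so frequencies are multiplied by the scale factor.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def model_frequency(full_scale_hz, scale_factor):
    """Frequency to play in a 1/scale_factor model of the room."""
    return full_scale_hz * scale_factor


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Wavelength in metres of a tone at the given frequency."""
    return speed_m_s / frequency_hz


if __name__ == "__main__":
    scale = 20
    for f in (125, 1000, 4000):
        f_model = model_frequency(f, scale)
        print(f"{f} Hz in the hall -> {f_model} Hz in the 1/{scale} model, "
              f"wavelength {wavelength_m(f_model) * 100:.2f} cm")
```

A 1 kHz tone in the real hall already corresponds to 20 kHz in the model, which is why the article speaks of near-ultrasound.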
|Seeing is fine, hearing is better!|
A second step soon followed. Why not listen to what we are modelling? The words used to describe sound vary from one person to the next, but everyone agrees when actually listening to it! Using geometric acoustic models, nothing prevents us from simulating how sound waves travel through the modelled room (with their reflections, their diffusion, their absorption by the various materials…), then computing the room's impulse response, and finally reconstructing the sound we would hear by means of a convolution. A few software packages take this approach: CATT Acoustics for instance, and the Bose Auditioner II, which may be the best known. Aimed at sound reinforcement, this system is not for sale; it is operated by the manufacturer's own consultants. Based on technology developed by the Australian company Lake DSP (whose business grew steadily in the early 90s thanks to this demand), it handles modelling with the Modeller, which runs on MacOS (you enter the room characteristics and the speakers/sound sources, helped by comprehensive libraries of materials and speaker types), and listening with the Auditioner (an amplifier + speakers + DSP set linked to the Mac via Ethernet). Once the computation is done, the Modeller provides a map of the expected sound levels, a list of values and a file containing the impulse response of the modelled room. This file is sent to the Auditioner; you then simply feed it signals, and they are heard as if they were played in the modelled room. To avoid skewing the result, the sounds fed into the virtual room must be as neutral as possible: music recorded in an anechoic room, or close-miked voice. Otherwise their own reverb would add to that of the simulated room, which would cause confusion!
Listening is necessarily done on the Bose system, which is derived from the famous Acoustimass speakers and designed for the purpose. Tolerances are tight: the responses of the 2 speakers match to within 0.1 dB. The listener rests their head on a chin rest, and the angle and distance of the satellite speakers are also fixed. The listening position is therefore the same for everyone. We thus have the illusion of hearing the sound as played in the room: this is striking in a virtual ice rink, for instance! The audible results are amazing and the room simulation is quite accurate.
|And more ...|
The Auditioner II requires its own listening system, because the type of listening system has to be taken into account in the auralization algorithms for the acoustic creation to be heard properly. The "rendering" computation is not the same for headphones and for speakers: it changes depending on whether you want binaural (headphone) or transaural (speaker) listening.
Other modelling/listening systems let you choose the type of listening: CATT Acoustics for instance (sold in France by Euphonia) runs on an average Pentium (see screen shots). Besides modelling, it has an auralization module compatible with several good-quality sound cards. Quality is essential: since the first hundred reflections determine the subjective quality of a reverb, they must be reproduced with the highest fidelity in order to capture the sound signature of a concert hall, for instance. Do you need the very best? Lake DSP offers large systems dedicated to virtual acoustics, such as the Huron station. It runs Windows NT on a powerful PC fitted with ISA cards, which are needed to drive software such as Convolver, MultiScape (room simulation, including the effect of sound passing through other rooms, doors, walls...), Headscape (building acoustic environments in headphones) or AniScape (animation of sound sources). Animation? Yes, it is time to talk about animation!
|From one place to another|
Creating the ambience of a room is nice; letting the listener move around within it and hear the acoustic field change as in reality is far more spectacular! Given enough DSP power, the software can chain listening points together, smoothing the variations with mathematical approximations to simulate movement. No sooner said than done! But on the first trials with headphones (chosen for their greater immersion), the result was not really convincing. The listener remained outside the constructed sound… Nothing surprising there: whereas the eye readily accepts a perspective effect (see how easily a viewer is immersed in a film, even though it is a 2D representation), the ear is not so easily fooled! The ear is much wilder and harder to tame… and the brain keeps correlating the information from its various sensors: eyes, ears… not forgetting the balance organs located in the latter.
So the work had to be taken up again to refine the perception. And who better to ask for help than the psychoacousticians? They contributed useful concepts for binaural and transaural listening, and especially head-related transfer functions (see the sidebar "The customer's head").
Indeed, in real life we move our heads all the time, imperceptibly and without noticing, and these micro-movements are reflected in what the ears receive, providing instantaneous feedback. Unfortunately, listening on headphones removes this cue entirely. The brain then reacts by placing the sound image inside the head, midway between the ears. The conclusion: if the headphone signal does not dynamically follow the movements of the head, at best the sound stays inside the head and we do not believe in it; at worst, we get headaches or motion sickness... Desperate ills call for desperate remedies: let us put a motion sensor in the listening system. The electromagnetic image stabilisation systems widely used in camcorders seem suitable. In short, there is still a long way to go before we are truly immersed in the "sound bath": accurate reconstruction, allowance for the shape of each listener's head, motion sensors... We can dream of the day when a gamer wearing headphones gets picture and sound reconstructed in real time, with every possible parameter taken into account: coming back down to earth will be hard! The experience of immersion in a virtual sound world is astonishing: we really feel we are inside it. It has nothing in common with the directed mono of most 5.1 mixes.
Since the beginning of this article we have followed a deterministic approach. It must be said that acoustics is not defined in a very friendly way: mathematics and more mathematics! That may scare off more than one enthusiast. This state of affairs could not be ignored by the French researchers at IRCAM (Institute for Research and Coordination in Acoustics/Music), who are keenly interested in integrating sound space into musical scores but faced an acute interface problem: how can one ask a composer to deal with thousands of facets? The Spatialisator ("Spat" to its friends), software developed from the early 90s under the direction of Jean-Marc Jot and Olivier Warusfel, in collaboration with France Telecom and Espaces Nouveaux, is easy to use. The idea is to build a sound scene by manipulating both localisation and room effect using only less abstract notions: "closer", "more ambience", "more envelopment", "the violin is facing away"... In other words, it is more intuitive, understandable and rewarding for a musician than a physical approach in which one defines a virtual room and places speakers and sources in it, which is poorly suited to trial and error... In parallel, fundamental work was carried out on decomposing the reverberant field into several phases (early reflections, intermediate field, late reflections) that are managed separately. This patented innovation leads to very realistic re-creations that traditional hardware effects cannot achieve.
Spat offers several interface levels. At the lowest level, we can reach the core parameters of the room effect in the Room window, which contains the usual controls of an artificial reverb: the dry/wet balance (Direct slider), for instance, which is used to give an impression of distance or depth in a mix. The main working window (SpaT_Oper) has sliders with readily understandable names, each of which drives a combination of low-level parameters. Thus the Source Presence and Room Presence parameters are linked to the notion of distance described above, but they go further: a close sound stays alive (some early reflections are kept), while a distant sound keeps evolving, whereas a traditional reverb would give the same blur over the lowest third of the Direct slider's travel. This is where the decomposition of the reverberant field into several phases proves its worth in Spat. A higher-level interface lets us move the sound sources around a circle, taking into account their orientation (a speaker who turns his back, for instance) or their directivity (which increases the room effect). But for a composer, the interest lies not in fixing a sound scene but in being able to move through the space: towards the bass, the drums, the violin... As far as interactivity is concerned, Spat is well equipped: all its parameters, in other words everything we do in real time with the mouse on screen, can be controlled via MIDI and thus recorded with any sequencer... Nothing could be more direct and easy for a composer!
Another essential aspect of the Spatialisator is the separation of "creation" from "rendering". In other words, everything the composer creates is done in a format-independent way; the result can then be encoded in stereo, transaural, binaural, Dolby Stereo, or for 2, 4, 6, 8 or 12 speakers without changing anything. The software works from the requested perceptual criteria in order to render the spatialization as precisely as possible on the available system. Olivier Warusfel showed us a modest configuration of the Spatialisator: a 300 MHz Mac G3, a Korg 1212 audio card and 2 Tannoy PBM8 coaxial speakers. The result is very interesting! Visitors to the "Sound" exhibition at La Villette could hear a demo with dozens of speakers at the output. Spat also runs with jMAX/FTS on Silicon Graphics or Linux PC platforms. A version in Pro Tools TDM plug-in format is expected. Lake DSP also publishes software of a similar kind, the Audio Display Tools. It naturally runs on the Huron station, and it manages the movement of up to 32 sources within a fairly intuitive 3D environment.
|A few practical examples|
"All this is very nice, but apart from architectural simulation and electro-acoustic music, what is virtual acoustics good for?" you may ask. For many things! All these techniques are making their way into the world of sound engineering. A manufacturer such as Studer, for instance, builds virtual acoustics algorithms into its products (the large D950 digital mixer, for example). The aim is to move a sound within a room more finely than with a mere panpot, or even a motorised joystick. It is a pity that the "listening" aspect is not taken into account for surround-format mixes... When the Spatialisator is released as a TDM plug-in, many sound engineers and musicians should be delighted! Every year we attend the Virtual Acoustic Study Days, organised by the tireless Bruno Suner of the Euphonia company. The lectures are full of practical examples! At Renault, a thorough study of the acoustic characteristics and properties of the vehicle interior helps to correct flaws "physically", then "electronically", by programming the factory-fitted radio to compensate for a dip or a bump in the response curve, or by changing the location of the speakers. In train stations, the intelligibility of announcements can be improved by acting on the signal sent to the speakers: "remote-controlling" their directivity, in a way. In videoconferencing, a virtual meeting table can be shown. The sound must follow and stay locked to the image. But when two people speak at the same time on the same transmission channel, it is a mess: nothing can be understood! Placing the speakers back in a believable environment lets the ear resume intelligent listening and re-create the "cone of presence".
Furthermore, good spatialization makes listening more comfortable. Finally, consider a film application, a field where sound mixers struggle to match the location sound with lines of dialogue that are re-recorded in the studio for reasons of audio quality or intelligibility. It would be enough to capture the "audio signature" of the shooting location and then, after recording the lines in a very dry studio, to convolve the 2 signals. Direct, simple, with a few timbre corrections for the different microphones: who would hear the join?
|And many others!|
In control rooms, the radar screen can be overcrowded. A spatialized alert signal announcing that a new plane is arriving will draw the air traffic controller's attention to the right spot on the screen. An alert is perceived more vividly through an audio signal than through a visual one. There is no need to create a sophisticated acoustic field: a simple spatialization is enough to establish a hierarchy between 6 or 8 simultaneous planes. Work is under way to give blind people an audio equivalent of the physical environment they can no longer see, in terms of relief and shape recognition. Thanks to headphones and "sound rays", they can get an idea of the room they are in. Alongside the synthesis of amazing sound fields, research in virtual acoustics also aims to capture spatial sound information as easily as possible (for instance, with microphones such as the Soundfield SPS422). The CEA provides an example of an application: when nuclear power plants are dismantled, robots work in irradiated areas where people do not go. For now they are piloted through a visual interface, using the images from remote-controlled cameras. But this interface shows its limits when a tool is lost: finding it by turning the cameras at random can be very time-consuming. A sound correctly captured and transmitted to the operator through headphones developed for virtual acoustics applications would make it possible to know immediately where to look for the lost tool, both its direction and its distance...
Last but not least, linking multi-channel sound and virtual acoustics, Dolby and Lake DSP have just worked together on a project called Dolby Headphone, whose first users are the "business class" passengers of Singapore Airlines, from 1 May. Each passenger already had a phone, a video screen and a choice of films to watch (yes, businessmen know how to live!). Dolby bought from Lake DSP the licence for playing 5.1 sound on headphones.
Signal processing tailored to the aircraft in which the film is shown can further enhance the sound impression by taking the cabin's noise signature into account, through suitable compression and equalisation: that of a 747, an A340... First class! Dolby Headphone should soon be adapted for the general public, at least for video and computers. As an aside, the airlines pay Dolby a licence fee for every flight!
|Some Surround headphones|
The market for transposing (we cannot call it "reducing") multi-channel environments modelled on real cinema rooms to headphones is growing. Sennheiser, Lucas and then AKG with its Hearo paved the way: these wireless headphones, with a decoder base and a specialized DSP, used algorithms derived from virtual acoustics to fit the 4 channels of a Dolby Surround mix into only 2 earpieces; the Lucas even lets you choose your location in the room! Lake is applying the Dolby Headphone principles to DVD playback with its Personal Surround, based on an in-house chip. Good ideas are in the air: Sony has just released the MDR DS5000, billed as a Digital Surround Headphone System and a direct competitor to the Personal Surround. As if by chance, the advertising brochure talks about HRTF, binaural listening, early reflections... Now you know what all this is about! As we mentioned last time, one of the creators of the Spatialisator, Jean-Marc Jot, has joined E-mu/Creative Labs. Working together with IRCAM, they are adapting all the techniques of the Spatialisator to the world of games. The research and development phase is coming to an end, and the result should soon be released... This is competition for QSound, Aureal and others, whose 3D approaches did not dare to take real room effects into account, but it seems that this may change before long... So you see, virtual acoustics is useful! When will we have a computer-animated film, for instance, that uses all the resources of the sound field instead of being stuck with the Surround mix...
How do we get from a physical model, built to apply simple acoustic formulas, to something we can hear? Nothing is impossible for a mathematician raised on Fourier transforms… Without going into detail, the computer takes the physical data and calculates a "sound picture" of the room at a given listening point. A mathematical operation called the "convolution product" is applied between a sound that is as neutral as possible (recorded in an anechoic room, or a keyboard sound) and the impulse response of the room (the "sound picture" just mentioned). The result is an audio signal that any well-programmed DSP can compute, and we will have the feeling that the sound, neutral at first, is really resonating in a room that does not exist! Let us recall that the revolutionary Sony DRE-S777 reverb, shown at SATIS last year, is based on this principle. Thanks to its CD-ROM drive, it can load room impulse responses from all over the world, from Japan or elsewhere…
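The "convolution product" described above can be sketched in a few lines. This is a deliberately naive, direct-form version written for clarity, with hypothetical names and made-up sample values: the dry signal and the three-reflection impulse response below are not measured data. Real-time systems generally rely on much faster FFT-based methods, which this sketch does not attempt.

```python
def convolve(dry_signal, impulse_response):
    """Direct-form convolution of a dry signal with a room impulse response.

    Every input sample launches a scaled copy of the impulse response;
    the output is the sum of all those overlapping copies.
    """
    if not dry_signal or not impulse_response:
        return []
    output = [0.0] * (len(dry_signal) + len(impulse_response) - 1)
    for n, sample in enumerate(dry_signal):
        for k, tap in enumerate(impulse_response):
            output[n + k] += sample * tap
    return output


if __name__ == "__main__":
    # Hypothetical "neutral" sound: two samples of a dry recording.
    dry = [1.0, 0.5]
    # Hypothetical impulse response: direct sound plus two weaker echoes.
    room = [1.0, 0.0, 0.6, 0.0, 0.3]
    print(convolve(dry, room))
```

Convolving a single unit impulse with the room response returns the response itself, which is exactly why it is called the impulse response.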
How do we listen to virtual acoustics? One might think it works only with headphones, so as to avoid the response of the room we are listening in. Each ear receives only the sound that was computed for it, and the listener "bathes" in the created space. Known as binaural listening, headphone listening is not the only option… With a little cunning, we can re-create a believable ambience with transaural listening over two speakers. But the crosstalk problem cannot be avoided. The left-to-left, right-to-right separation is lost, because the right ear always receives part of what the left speaker emits, and vice versa. The information that was so carefully reconstructed is thus scrambled. For a precise listening position, one solution is to compute what the ear on the opposite side would have received, then subtract that signal from the source. This is called crosstalk cancellation.
As we do not always sit in the same place in front of the hi-fi system, this principle, which works for a single position only, is too restrictive. The computer, however, is well suited to transaural listening: in front of the screen, the head is always in roughly the same place, more or less fixed between the two speakers: this is the optimal listening zone! In any case, even outside the sweet spot, the principles of transaural listening still hold and the sound does not dissolve into a de-spatialized mush: even if localisation is less precise, the spatialization is preserved.
As we can see, the 2 types of listening are very different, which is why most software packages let us choose a binaural or transaural format for listening to the result of the computation.
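The crosstalk cancellation principle described above can be illustrated with a heavily simplified model. Everything here is an assumption made for the sketch: the leakage from each speaker to the opposite ear is reduced to a single symmetric gain `g`, with no delay and no frequency dependence, and the function names are hypothetical. Real transaural systems use frequency-dependent filters derived from head-related transfer functions, valid only around one listening position.

```python
def crosstalk_cancel(binaural_left, binaural_right, g):
    """Compute speaker feeds so that each ear receives its binaural signal.

    Assumed model: ear_left  = speaker_left  + g * speaker_right
                   ear_right = speaker_right + g * speaker_left
    Solving this 2x2 system for the speaker feeds gives the formulas below.
    """
    if not 0 <= g < 1:
        raise ValueError("leak gain g must satisfy 0 <= g < 1")
    det = 1 - g * g
    pairs = list(zip(binaural_left, binaural_right))
    speaker_left = [(l - g * r) / det for l, r in pairs]
    speaker_right = [(r - g * l) / det for l, r in pairs]
    return speaker_left, speaker_right


def signals_at_ears(speaker_left, speaker_right, g):
    """Apply the assumed leakage model to get what each ear actually hears."""
    pairs = list(zip(speaker_left, speaker_right))
    ear_left = [l + g * r for l, r in pairs]
    ear_right = [r + g * l for l, r in pairs]
    return ear_left, ear_right


if __name__ == "__main__":
    left, right = [1.0, 0.0, -0.5], [0.0, 0.8, 0.2]  # made-up binaural samples
    feeds = crosstalk_cancel(left, right, g=0.4)
    print(signals_at_ears(*feeds, g=0.4))  # close to the original pair
```

The sketch also shows why the method is tied to one position: if the listener moves, the true leakage no longer matches the `g` assumed when computing the feeds, and the cancellation degrades.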
The customer's head
Even if we do not notice it, the sounds we hear are filtered by the masking effect of the head, the geometry of the outer ear and the path to the eardrum… This phenomenon, which alters the spectrum and the phase, is described by the HRTF (Head-Related Transfer Function). If it is ignored, the computer may well calculate a wonderful room ambience, but the listener will not believe in it through headphones! What can we do? Model the ear and the alterations it imposes on sound waves! We therefore have to take a kind of imprint of each listener's ear, then produce a file that will be used by the simulation software. As if it did not already have enough to do! Complex and tedious? Yes. Unfeasible? No: 3D scanners exist, their price will drop, and we can imagine that one day a headphone/DSP pair will come with a complimentary scan of the buyer's head. Another solution is to study a few heads in order to calculate an average model, which would then be adapted to each person's morphology.
Sony's reputation for wireless and wired headphones is well established, and the company has just made a brilliant entry into the "Surround headphone" field with the MDR-DS5000, which accepts Dolby AC-3 sound, in other words 5.1! It is the ideal companion for a DVD-Video player: you just plug it into the digital output to enjoy the sound track in good conditions, without needing a huge Home Theatre amplifier or having to put 5 identical speakers and a subwoofer in the living room. This makes the fairly high price (532 euros) easier to swallow.
On a more technical level, the headphones have an open-back design, the reception range is 10 meters, the reception angle is 90°, the rechargeable batteries give about 30 hours of use, the bandwidth is 12 Hz to 24 kHz, and the base housing the DSP has a standard headphone socket with level control!
Special thanks to Bruno Suner, Vincent Puig and Olivier Warusfel for their help in writing this article.
© Franck ERNOULD