Theoretica Applied Physics

BACCH® 3D SOUND Frequently Asked Questions (FAQ)

What is BACCH 3D Sound?
How Does BACCH 3D Sound differ from surround sound?
Does BACCH 3D Sound require special speakers and/or special room treatment?
Does BACCH 3D Sound require a special placement of the speakers?
How does BACCH 3D Sound work?
What are BACCH Filters?
How big is the sweet spot in which the listener can hear BACCH 3D imaging?
Is BACCH 3D Sound compatible with existing stereo recordings?
Is the 3D realism of BACCH 3D Sound the same with all types of stereo recordings?
Is BACCH 3D Sound compatible with analog audio?
Why call it "BACCH 3D Sound"?
How do the BACCH-SP and BACCH4Mac products compare?
How does the u-BACCH Plug-in differ from BACCH4Mac Intro (and other editions of BACCH4Mac)?
How does BACCH enhance the spatial realism in the reproduction of acoustical recordings made in real acoustical environments?
How does BACCH enhance the spatial imaging of "studio-mixed" recordings without altering the sound intended by the mixing engineer?

What is BACCH 3D Sound?

BACCH 3D Sound is a breakthrough audio technology (developed at Princeton University) that yields unprecedented spatial realism in speaker-based audio playback, allowing the listener to hear, through only two loudspeakers, a truly 3D reproduction of a recorded sound field with uncanny accuracy and detail, and with a high level of tonal and spatial fidelity that is simply unapproachable by even the most expensive and advanced existing high-end audio systems.

BACCH 3D Sound relies on canceling an undesired artifact, called crosstalk, that occurs whenever stereo sound is played through loudspeakers, thus allowing the 3D cues which the brain needs to hear in 3D, and which exist in abundance in practically all well-made stereo recordings, to naturally reach the brain of the listener.

How Does BACCH 3D Sound differ from surround sound?

BACCH 3D Sound has nothing to do with surround sound. Surround sound, which was originally conceived to make the sound of movies more spectacular, does not (and cannot) attempt to reproduce a true 3D sound field. What 5.1 or 7.1 surround sound aims to do is provide some degree of sound envelopment for the listener by surrounding the listener with five, seven, or more loudspeakers. For serious listening to music recorded in real acoustic spaces, audio played through a surround sound system can at best give a sense of simulated hall ambiance but cannot offer an accurate 3D representation of the sound field.

In contrast, BACCH 3D Sound’s primary goal is accurate 3D sound field reproduction. It gives the listener the same 3D audio perspective as that of the ideal listener in the original recording venue.¹ Soundstage "depth" and "width", concepts often used liberally in hi-end audio literature to describe an essentially flat image (relative to that in BACCH 3D Sound), become literal terms for BACCH 3D Sound. If, for instance, in the original sound field a fly circles the head of the ideal listener during the recording, a listener of that recording played back through the two loudspeakers of a BACCH 3D Sound system will hear, simply and naturally, the same fly circling his or her own head. If, in contrast, the same recording is played through standard stereo or surround sound systems, the fly will be perceived to be inside the loudspeakers or, through the artifice of the phantom image, in the limited vertical plane between the loudspeakers.

Does BACCH 3D Sound require special speakers and/or special room treatment?

BACCH 3D Sound will greatly enhance the spatial fidelity of sound reproduction through any loudspeakers. Loudspeakers that have high sound directivity² will give the best and most accurate 3D imaging in a highly reflective room with little or no sound treatment, as room reflections, which degrade imaging, are minimized by such loudspeakers.

However, even loudspeakers with low directivity (i.e. omni-directional loudspeakers) will give a spectacularly spatial soundstage with BACCH 3D Sound in a typical listening room. As the importance of room reflections is decreased (by increasing the ratio of direct to reflected sound through room treatment and/or higher-directivity speakers and/or nearfield listening) the image’s depth and 3D imaging approach the depth and spatial characteristics of the original sound field.

An ongoing investigation of speaker directivity at Princeton University's 3D3A Lab has shown that dipole speaker designs, electrostatic speakers, as well as speakers with horns and waveguides offer significant advantages in 3D imaging with BACCH 3D Sound in highly reflective rooms, as they increase the ratio of direct to reflected sound. Abating early room reflections with physical room treatment (i.e. using sound absorbers on sound-reflective surfaces) in a listening room is always beneficial to any audiophile-grade sound system. For BACCH 3D Sound the effect of sound treatment is equivalent to using loudspeakers with high directivity, or listening in the nearfield. The more directive the loudspeakers are, the less sound treatment is needed for BACCH 3D Sound to produce a full and accurate 3D sound image.

Therefore, in a reflective untreated listening room, directive loudspeakers are more desirable. In a well treated listening room with sound-absorbing surfaces, any loudspeakers, even omnidirectional ones, will produce an excellent 3D image.²

Although it may not always be practical for some listeners, a simple way to increase the ratio of direct to reflected sound, and thus further enhance the 3D imaging of a BACCH system, in even a very reflective room, is simply to listen in the nearfield of the speakers.
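The three levers mentioned above (speaker directivity, room treatment, and nearfield listening) can be illustrated with a rough sketch. It uses the standard Sabine-based "critical distance" approximation from room acoustics, which is not part of BACCH and describes the diffuse reverberant field rather than the early reflections that matter most for imaging, so it should be read only as an indication of the trends. The room volume, reverberation times, directivity factors, and distances below are hypothetical values chosen for illustration.

```python
import math

def critical_distance_m(directivity_q, room_volume_m3, rt60_s):
    """Distance at which direct and reverberant levels are equal.

    Sabine-based textbook approximation (assumed here, not from BACCH):
    d_c ~ 0.057 * sqrt(Q * V / RT60), with V in m^3 and RT60 in seconds.
    """
    return 0.057 * math.sqrt(directivity_q * room_volume_m3 / rt60_s)

def direct_to_reverberant_db(distance_m, directivity_q, room_volume_m3, rt60_s):
    """Direct-to-reverberant ratio in dB at the listening distance."""
    d_c = critical_distance_m(directivity_q, room_volume_m3, rt60_s)
    return 20.0 * math.log10(d_c / distance_m)

# Hypothetical 60 m^3 listening room; every number here is illustrative.
ROOM_VOLUME_M3 = 60.0
cases = [
    ("omni speaker, reflective room, 3 m", 3.0, 1.0, 0.6),
    ("directive speaker (Q = 8), reflective room, 3 m", 3.0, 8.0, 0.6),
    ("omni speaker, treated room, 3 m", 3.0, 1.0, 0.3),
    ("omni speaker, reflective room, nearfield 1 m", 1.0, 1.0, 0.6),
]
for label, distance, q, rt60 in cases:
    ratio = direct_to_reverberant_db(distance, q, ROOM_VOLUME_M3, rt60)
    print(f"{label}: {ratio:+.1f} dB")
```

In this simplified model, each of the three changes raises the ratio relative to the first case, which is the qualitative behavior described in this section.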

Does BACCH 3D Sound require a special placement of the speakers?

While previous non-optimized crosstalk cancellation (XTC) methods required the speakers to be placed very close to each other (the so-called “dipole” configuration), this is not at all the case with BACCH 3D Sound. To zeroth order, speaker placement (excluding the effects of room reflections) does not matter with BACCH 3D Sound. The 3D image you get through a BACCH filter created for a given speaker configuration will be essentially the same as the image obtained for a completely different (even asymmetric) speaker configuration as long as the BACCH filter corresponding to that configuration is used for listening. With BACCH 3D Sound the speakers are only the conduit of the sound waves to the ears of the listener, who then perceives the location of the original sound sources in the recording and not the location of the speakers.

This is often a startling fact for audiophiles as they are used to a phantom image that is anchored in the speakers and whose dimensions (essentially width) are strongly dependent on the speaker span (the angle the speakers subtend at the location of the listener). This is why the recommended speaker span for regular stereo listening is the 60 degrees of the equilateral triangle (often called the “stereo triangle”). With BACCH 3D Sound we recommend the same stereo triangle only because this is what audiophiles are used to and because it allows them to compare the sound through the BACCH filter to that without it (by hitting the bypass button) and hear the significant enhancement to the imaging compared to a case they are well familiar with.
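For readers who want to check the span of their own setup, the angle can be computed from two tape-measure distances. This is a small geometric sketch, assuming a listener centered between the speakers; the function name and the 2.5 m example are hypothetical.

```python
import math

def speaker_span_deg(speaker_separation_m, listener_distance_m):
    """Angle (degrees) subtended by two speakers at a centered listener.

    listener_distance_m is measured from the listener to the midpoint
    of the line joining the two speakers.
    """
    half_angle = math.atan2(speaker_separation_m / 2.0, listener_distance_m)
    return math.degrees(2.0 * half_angle)

# Equilateral "stereo triangle" with hypothetical 2.5 m sides: the listener
# sits at the triangle's height, side * sqrt(3) / 2, from the speaker line.
side_m = 2.5
span = speaker_span_deg(side_m, side_m * math.sqrt(3) / 2.0)
print(f"span = {span:.1f} degrees")
```

For the equilateral case the result is the familiar 60 degrees (+/- 30 degrees on either side of center); moving the speakers closer together or sitting farther back reduces the span.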

This unique independence of the BACCH-purified 3D image from speaker placement leaves the user with far more latitude in speaker placement (e.g. to satisfy esthetic or practical requirements) than is possible with regular stereo. The main thing to keep in mind while choosing a speaker placement for BACCH 3D Sound is to minimize reflections of sound from nearby surfaces, as early reflections are the enemy of good imaging.

How does BACCH 3D Sound work?

Imagine a musician who stands on the extreme right of the stage of a concert hall and plays a single note. A listener sitting in the audience in front of stage center perceives the sound source to be at the correct location because his brain can quickly process certain audio cues received by the ears. The sound is heard by the right ear first and, after a short time delay (called ITD), is heard by the left ear. Furthermore, there is a difference in sound level between the two ears (called ILD) due to the sound having travelled a little longer to reach the left ear, and the presence of the listener’s head in the way. The ILD and ITD are the two most important types of cues for locating sound in 3D and are to a good extent preserved by most stereophonic techniques.³
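To give a feel for the size of the time delay involved, the ITD can be estimated with Woodworth's classic spherical-head approximation. This formula is a textbook model and is not part of the BACCH method; the head radius and speed of sound below are assumed typical values.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature
HEAD_RADIUS_M = 0.0875      # assumed "average" head radius

def woodworth_itd_s(azimuth_deg):
    """Approximate ITD in seconds for a distant source.

    Woodworth's rigid-sphere formula: ITD = (a / c) * (theta + sin(theta)),
    with theta the azimuth in radians (0 = straight ahead, 90 = fully to
    one side). Valid for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))

for azimuth in (0, 30, 60, 90):
    itd_us = woodworth_itd_s(azimuth) * 1e6
    print(f"azimuth {azimuth:2d} deg -> ITD of roughly {itd_us:.0f} microseconds")
```

Under these assumptions the ITD grows from zero for a source straight ahead to well under a millisecond for a source fully to one side, which is why even small timing errors at the ears can shift the perceived location.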

When the stereo recording is played through the two loudspeakers of a standard stereo system, the ILD and ITD cues are largely corrupted because of an important and fundamental problem: the sound recorded on the left channel, which is intended only for the left ear, is heard by both ears. The same applies to the sound on the right channel. Consequently, an audiophile listening to that recording on a standard stereo system will not correctly perceive the musician to be standing on the extreme right of the stage but rather at the location of the right speaker. The perceived soundstage is therefore mostly confined to an essentially flat and relatively limited region between the two loudspeakers, irrespective of the quality and cost of the hardware in the standard stereo system. The 3D image is greatly compromised.⁴

In order to ensure the correct transmission of the ILD and ITD cues to the brain of the audiophile, the sound reaching the right ear from the left loudspeaker, and that reaching the left ear from the right loudspeaker (called "crosstalk"), should be cancelled.

The technique of crosstalk cancellation (XTC) has been known since the 1960's and can be applied by filtering the recorded sound through an XTC filter before feeding it to the speaker. This can easily be done digitally. However, until recently, XTC filters have had a detrimental effect on the sound as they inherently add a strong spectral coloration to the processed signal (i.e. they severely change the tonal character of the sound). This is why, until the advent of BACCH, XTC had not been widely adopted by stereo manufacturers and audiophiles.
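The coloration problem of a plain XTC filter can be seen in a minimal sketch. It assumes an idealized, symmetric, free-field setup in which each ear receives the near speaker directly and the far speaker attenuated by a factor g and delayed by tau; the XTC filter is then simply the inverse of that 2x2 matrix. This is a generic textbook inversion, not the BACCH filter design, and the values of g and tau are hypothetical.

```python
import cmath
import math

def plant(freq_hz, g=0.85, tau_s=65e-6):
    """Speakers-to-ears 2x2 matrix for an idealized symmetric setup.

    g is the assumed crosstalk attenuation and tau_s the assumed extra
    delay of the far-ear path relative to the near-ear path.
    """
    x = g * cmath.exp(-2j * math.pi * freq_hz * tau_s)
    return [[1.0, x], [x, 1.0]]

def invert_2x2(m):
    """Exact inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul_2x2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def xtc_gain_db(freq_hz):
    """Largest gain (dB) of any element of the naive XTC filter."""
    xtc = invert_2x2(plant(freq_hz))
    return 20.0 * math.log10(max(abs(v) for row in xtc for v in row))

for freq in (100.0, 1000.0, 3846.0, 7692.0):
    at_ears = matmul_2x2(plant(freq), invert_2x2(plant(freq)))
    leak = abs(at_ears[0][1])  # residual crosstalk, ideally zero
    print(f"{freq:7.0f} Hz: filter gain {xtc_gain_db(freq):+5.1f} dB, "
          f"residual crosstalk {leak:.1e}")
```

In this toy model the crosstalk is cancelled exactly, but the gain the filter applies swings by many dB across the audio band. That frequency-dependent boost and cut is the spectral coloration referred to above.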

BACCH 3D Sound is based on a breakthrough in XTC filter design that makes it possible to produce optimized XTC filters, called BACCH filters, that add no coloration to the sound for a listener in the sweet spot (or even outside of the sweet spot). Not only do BACCH filters purify the sound from crosstalk, but they also purify it from aberrations by the playback hardware in both the frequency and time domains.

The XTC coloration problem, and the theoretical underpinnings of the BACCH method are discussed in detail in this book chapter.

The result is a 3D soundstage with a striking level of spatial and tonal fidelity never experienced before by audiophiles.

What are BACCH Filters?

There are two types of BACCH filters. The individualized BACCH filter (sometimes called i-BACCH) is custom-made using in-situ acoustic measurements of the audiophile’s entire listening chain, including his hi-fi hardware, loudspeakers, head, torso and ears. It is designed by sending special test tones through the hi-fi chain and recording the sound with miniature microphones placed at the entrance of the audiophile’s ear canals as he is sitting in a sweet spot of his choice. It takes about one minute to do this measurement. Theoretica’s BACCH-SP processor and BACCH-dSP application allow users to easily produce individualized BACCH filters for their systems.

The universal BACCH filter (called u-BACCH) is the same as i-BACCH except that a special dummy head, having microphones in its ears, is used to make the measurements instead of the audiophile’s own head and ears. A u-BACCH filter yields a slightly less accurate 3D image than i-BACCH when used by the audiophile himself to listen to his hi-fi system, but is more compatible with other listeners (who do not have i-BACCH filters designed for them). Since the dummy head was designed to represent the sound diffraction characteristics of an "average" human head, the difference between the sound through the two types of filters is subtle but perceivable by a discerning audiophile.⁵

How big is the sweet spot in which the listener can hear BACCH 3D imaging?

Without Theoretica's advanced head tracking technology, BACCH 3D Sound produces a sweet spot that is quite long (many meters long depending on the directivity of the speakers used) in the longitudinal (backward-forward) direction, but quite narrow (about 40 cm), in the lateral (side-to-side) direction. A listener in that sweet spot can move backward and forward by at least 2 meters without losing much of the 3D imaging, but, without the head tracking feature enabled, must stay within the 40 cm side-to-side limit of the sweet spot, which is not a big concern for a stationary listener.

To completely remove the lateral sweet spot limitation, Theoretica has developed a powerful advanced head tracking technology, that uses either an infrared camera for head tracking even in pitch darkness (which works over a range of only about 2 meters), or a regular webcam for head tracking in regular and dim lighting (which works over a range up to more than 8 meters). All of Theoretica's products (i.e. all of the BACCH-SP models and the BACCH4Mac system) come equipped with this very easy to set up head tracking technology, which automatically and accurately (with millimeter accuracy) tracks the head of the main listener and adjusts the sweet spot dynamically in the lateral direction so that the listener is always in the optimal sweet spot as long as the listener's head is in the calibrated view field of the camera.

Any listener outside the sweet spot would hear regular stereo sound, as if BACCH 3D Sound processing were bypassed (i.e. BACCH 3D Sound causes no audible alteration to the sound outside the sweet spot, so listeners outside the sweet spot can listen without any detriment, albeit without BACCH's 3D imaging). On the other hand, due to the elongated dimension of the sweet spot in the longitudinal direction, additional listeners can sit immediately in front of or behind the main listener and remain in the BACCH 3D Sound sweet spot, thus enjoying practically the same 3D image as the main listener (although this, admittedly, makes for an unorthodox multiple-listener configuration, often seen at BACCH 3D Sound demos in audio shows around the world).

Is BACCH 3D Sound compatible with existing stereo recordings?

Yes. Unlike other 3D audio techniques, all of which require non-stereophonic recording techniques and coding, and many more than two loudspeakers for playback, BACCH 3D Sound is fully compatible with all existing stereo recordings, and requires a single pair of loudspeakers. Therefore audiophiles can delight in re-listening to their existing collections of stereo recordings through BACCH 3D Sound and discover the striking spatial and tonal fidelity that was missing or marred by standard stereo playback.

The vast majority of stereo recordings (and practically all recordings made in real acoustical spaces) contain spatial cues (inter-aural level difference, ILD cues, and inter-aural time difference, ITD cues) that would allow the ear-brain system to perceive the location of the sound source in 3D space, completely independent of the location of the speakers, if transmitted correctly to the listener. The problem (well-known among spatial audio scientists and engineers, but not well advertised in the commercial audio industry) is that crosstalk inherent to speakers-based playback limits the range of these cues at the listener’s position (to essentially the ILD and ITD values for sources located at, and between, the speakers) and the listener perceives mostly an image that is artificially anchored at the speakers with no 3D extent except for some 1-D extent (the phantom image) between the speakers. All that the BACCH filter does is remove the artifice of crosstalk (without introducing any other artifacts) so these inherent spatial cues are perceived correctly by the listener. The brain of the listener does the rest of the work by interpreting these (ILD and ITD) cues to locate the perceived sound source in 3D space.

While binaural recordings contain very realistic ILD and ITD cues (since the recording is done with microphones inside the ears of a human-like head), and are obviously hurt by the crosstalk, any acoustic stereo recording is based on either ITD cues (if recorded with spaced omni mics) or ILD cues (if recorded with ORTF, XY, coincident etc. mics), which also get corrupted if played back through speakers without removing the crosstalk. Therefore one should expect the BACCH filter to improve the spatial realism of most such stereo recordings, not only binaural ones.

Of course, in the case of music not recorded in a real acoustic space (which is the case for most non-classical/non-jazz music) the ILD or ITD cues are artificial and are due to level and/or time-based panning done by the mixing engineer. This only means that the 3D image is artificial in the first place, but it still contains ILD and ITD cues and one should still expect the BACCH filter, which purifies the audio playback from crosstalk, to project/extrude the image (albeit artificially) in 3D space (as opposed to leaving it spatially confined, also artificially, between the two speakers).

The claim that playback of most regular stereo recordings is compatible with, and is greatly enhanced by, BACCH 3D Sound was verified independently by well-known audio critics who listened extensively to various regular stereo recordings through the BACCH filter (see some of the reviews on our Press and Reviews page).

Is the 3D realism of BACCH 3D Sound the same with all types of stereo recordings?

The stereophonic recording technique that is most accurate at spatially representing an acoustic sound field is, incontestably, the so-called "binaural" recording method⁶, which uses a dummy head with high-quality microphones in its ears.⁷ Until the recent advent of BACCH 3D Sound, the only way for an audiophile to experience the spectacular 3D realism of binaural audio was through headphones. Many such recordings exist commercially, and more have recently been made thanks to the recent rise in the popularity of headphones.

BACCH 3D Sound shines at reproducing binaural recordings through two loudspeakers and gives an uncannily accurate 3D reproduction that is far more stable and realistic than that obtained by playing binaural recordings through headphones.⁸

All other stereophonic recordings fall on a spectrum ranging from recordings that highly preserve natural ILD and ITD cues (these include most well-made recordings of "acoustic music" such as most classical and jazz music recordings) to recordings that contain artificially constructed sounds with extreme and unnatural ILD and ITD cues (such as the pan-potted sounds on recordings from the early days of stereo). For stereo recordings that are at or near the first end of this spectrum, BACCH 3D Sound offers the same uncanny 3D realism as for binaural recordings⁹. At the other end of the spectrum, the sound image would be an artificial one and the presence of extreme ILD and ITD values would, not surprisingly, lead to often spectacular sound images perceived to be located in extreme right or left stage, very near the ears of the listener or even sometimes inside of his head (whereas with standard stereo the same extreme recording would yield a mostly flat image restricted to a portion of the vertical plane between the two loudspeakers).

Monaural recordings contain very little, if any, spatial information¹⁰ and thus are not well suited for BACCH 3D Sound. Therefore, it is best to bypass BACCH processing for mono recordings.¹¹

Luckily, many well-made popular music recordings over the past two decades have been recorded and mastered by engineers who understand natural sound localization and construct mostly natural-like stereo images, albeit artificially, using realistic ILD and ITD values. Such recordings would give a rich and highly enjoyable 3D soundstage when reproduced through BACCH 3D Sound.

Is BACCH 3D Sound compatible with analog audio?

Yes. The BACCH-SP Sound processor accommodates (balanced or unbalanced) stereo analog inputs and outputs. Since the BACCH filter is a digital one and must be applied in the digital domain, the input analog signal is converted to a high-resolution digital signal using audiophile-grade A/D converters inside the processor. The processed digital signal can then be sent out as a digital signal (e.g. for an outboard converter or a digital speaker) or converted to analog using an audiophile-grade D/A converter inside the processor.

Why call it "BACCH 3D Sound"?

The word "stereo" was always associated with three-dimensional objects or effects until its later use, in the 1950s, in the word stereophony, which, ironically, is now a term that does not invoke true three-dimensional sound in the popular mind.¹² In fact, the earliest use of "stereo", which comes from the Greek word στερεός (stereos), meaning solid, goes back to the 16th century when the term stereometry was coined to denote the measurement of solid or three-dimensional objects. This was followed by stereographic (17th c.), stereotype (18th c.), stereoscope (19th c.) (a viewer for producing 3D images), and stereophonic (circa 1950). Stereophonic sound, alas, remained a poor approximation of 3D audio until the recent advent of BACCH 3D Sound, which restores to the word stereo its original 16th century 3D connotation.

The epithet "pure" in "BACCH Stereo Purifier" refers to the purifying action of the BACCH filters, which are at the heart of BACCH 3D Sound. A BACCH filter "purifies" the sound from crosstalk for playback on loudspeakers, without adding coloration, and purifies it also from the detrimental effects of spatial comb filtering and non-idealities of the listening room, the loudspeakers and the playback chain.

How do the BACCH-SP and BACCH4Mac products compare?

While the BACCH4Mac Audiophile edition and the BACCH-SP dio share the same algorithm for designing BACCH filters, they are very different products. The former is a computer audio system (catering to the growing population of computer audiophiles) and requires the user to configure a Mac-based system for a given input or audio source, a given clock source, a given DAC etc., as well as to connect, configure and sync various hardware components (audio interface, DAC) correctly. It can be configured to give excellent results (and Theoretica offers tech support assistance to help in this) but the results depend on variables that are under direct control of the user. On the other hand, the BACCH-SP is a no-compromise standalone high-end audio component. The chassis, machined from a single block of aluminum, contains advanced state-of-the-art digital circuitry, a dedicated CPU, a state-of-the-art clock, and an ultra-low-noise linear power supply, all optimized to give optimal results (best possible 3D imaging and pristine audio quality) without having the user tweak anything or add extra components (such as the additional audio interface required for BACCH4Mac). The same applies to the BACCH-SP adio and Grand models, with the additional fact that they also have state-of-the-art A/D and D/A converters.

How does the u-BACCH Plug-in differ from BACCH4Mac Intro (and other editions of BACCH4Mac)?

The u-BACCH Plug-in is an audio plug-in offered by Theoretica’s sister company, BACCH Labs, for Windows or Mac, and requires a third-party DAW (or a similarly acting platform) in which to be instantiated. Like many plug-ins intended for pro audio monitoring, mixing, and content creation applications, the quality of the audio depends on many factors that are outside of the plug-in itself. On the other hand, BACCH-dSP (the Mac application at the heart of all BACCH4Mac editions) is a standalone application, intended for audiophile listening, that can be configured to process audio from any source on the Mac (or connected to the Mac via hardware or the network). BACCH-dSP has full control of the critical audio processes (up-sampling, dithering, buffering, multi-threading etc.) and is optimized to yield the best audio quality possible.

Moreover, the u-BACCH Plug-in relies on a set of generic factory-loaded BACCH filters which assume that the speakers 1) act like perfect sound sources; 2) are perfectly matched in phase and frequency response; and 3) are placed symmetrically with respect to the listener. u-BACCH also assumes that no sound reflections occur in the listening room, and u-BACCH filters were produced using generic head-related transfer functions (HRTFs) of a dummy head. Departures from these ideal conditions, which inevitably occur in real life, would lead to a degradation of the level of crosstalk cancellation (and therefore 3D imaging) that can be achieved for a given room, with an individual listener and a particular pair of speakers. When these departures are minimized, u-BACCH can approximate the 3D imaging of a measured BACCH filter. On the other hand, BACCH4Mac Audiophile Edition (and higher editions of BACCH4Mac) rely on actual acoustic measurements done with the BACCH-BM in-ear microphones, which take into account the particular relevant acoustic properties of the speakers, the room, and the listener’s head, outer ear morphology, and location. The custom BACCH (c-BACCH) filters that are automatically derived by BACCH-dSP are optimized to yield the highest level of crosstalk cancellation (3D imaging) for a given room, with an individual listener and a particular pair of speakers. Additionally, BACCH-dSP incorporates advanced head tracking technology to adjust the custom BACCH filters in real time so that the optimal 3D imaging is maintained as the listener's head is moved over a large area (while a u-BACCH filter is restricted to an area limited to a few inches left and right of the central sweet spot). The u-BACCH Plug-in offers the lowest-cost entry to the world of BACCH 3D Sound (on both Mac and Windows machines).

The Intro version of BACCH4Mac, which is limited to u-BACCH filters, offers the same audio technology as the u-BACCH Plug-in but in the form of an optimized standalone audio processing application intended for critical audiophile listening, and for the Mac platform only. It also includes a number of built-in utilities that are useful for audiophile listening (equalizer, Linkwitz-Riley crossover network, binaural recorder, basic file player, etc.). Unlike the u-BACCH Plug-in, BACCH4Mac Intro offers an upgrade path to the higher editions (BACCH4Mac Audiophile, Audiophile+, Pro), which can be upgraded to from the Intro edition for the difference in price.

How does BACCH enhance the spatial realism in the reproduction of acoustical recordings made in real acoustical environments?

Crosstalk cancellation (XTC) techniques, such as BACCH, suppress the sound recorded on the left (right) channel of a stereo recording at the right (left) ear of the listener during stereo playback from a pair of loudspeakers. This cancellation raises the limit on the level of interaural level difference (ILD) and interaural time difference (ITD) above the levels that the speakers can deliver without XTC, allowing more of the correct spatial cues of the recorded sources to be reproduced at the ears of the listener. (The ILD is the sound pressure level (in dB) caused by a given source at one ear minus the sound pressure level at the other ear. The ITD is the difference between the sound arrival times at the two ears. Both are generally frequency-dependent functions.)

This is best illustrated by first considering the case of ILD (the case of ITD will be discussed subsequently) and acoustical stereo mic recordings in a real space.

Most, if not all, of the statements below can be verified by the experimentally-minded reader using the BACCH-BM microphone and the recording and extensive measurement capabilities of the BACCH-dSP application at the heart of Theoretica’s BACCH4Mac packages.

It is insightful to first consider the general case of a binaural dummy-head recording; it then becomes easier to understand the more particular case of regular stereo miking techniques. The latter are of two general types: Type A stereo miking techniques, which rely on ILD to code the stereo image (e.g. ORTF, XY, and other coincident mic techniques, etc.), and Type B stereo miking techniques, which rely on ITD to code the stereo image (e.g. spaced omni or A-B mics, Decca tree, Jecklin disk, etc.).

Binaural Recordings:

The general case of binaural dummy-head recording is the most natural (i.e. most akin to how humans hear) as it captures both the ILD and ITD cues, as well as the so-called spectral cues (which are associated with the non-flat frequency response imposed by the diffraction of the sound waves around the torso, head, and pinnae of the dummy, or human, head wearing the in-ear microphones). This individualized frequency response, which helps the brain-ear system locate sound sources according to the tonal coloration the listener’s particular brain-ear system expects, becomes flatter as the frequency decreases, because the wavelength becomes larger than the objects (the torso, head, and pinnae) the sound is diffracting about. These “spectral cues”¹³ are used by the human ear-brain system, in addition to the ILD and ITD cues, to locate sound sources.

Let us consider the case of recording a performer on a stage in a real hall. Using a dummy-head binaural microphone (or a human wearing in-ear mics) we would capture all three types of cues (the ILD, ITD and spectral cues) on the two channels of the stereo recording. Say the performer is located at an azimuthal angle of 50 degrees to the left of the dummy head. If one measures the ILD caused, at the dummy’s ears, by a sound source at that location (such a measurement consists of subtracting the SPL measured at the right ear from that at the left ear) one would find, on average, about 8 dB (strictly speaking this depends on the distance and frequency, which, for the sake of illustration, we take to be about 10 feet and 1 kHz, respectively). If the performer, while performing (e.g. clapping), moves to the center position facing the dummy head, the ILD would drop to 0. If she moves further to the left, the ILD increases and can easily exceed 8 dB. If she approaches the recording head from the left, the ILD would build up further (due to the enhanced effect of the head shadowing the right ear) and can reach as high as 20 dB if the performer gets very close to the left ear (since most of the sound will be blocked from reaching the right ear). As a thought experiment, let us record, using the dummy head mic, the performer as she moves (while performing) from the center position to the left position (50 degrees), and then walks to the recording head and whispers in its left ear.

For a stereo playback system to be able to reproduce this entire spatial image accurately from the above-described recording, it must reproduce this entire range of ILD, from 0 to 20 dB at the ears of the listener. We shall now explain why a regular stereo system cannot do so without XTC.

The problem with a “regular” stereo playback system (as opposed to one with XTC) is that the maximum ILD it can deliver is that produced by the left (or right) speaker, which for a regular stereo (equilateral) triangle is only about 3-5 dB at 1 kHz (depending on the radiation pattern of the speaker and the distance of the listener from the speakers). This number can be easily verified by putting a test signal (1 kHz sinewave, or pink noise) in the left channel (and only the left channel), measuring the SPL at the left ear and subtracting from it the SPL measured at the right ear (which can be easily done using a sine sweep in BACCH-dSP to produce a plot of the ILD spectrum over the entire audio band). The plot below shows such a typical measured ILD spectrum made with BACCH-dSP through a typical stereo system in the “regular stereo triangle” (+/- 30 degrees) configuration.
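The arithmetic of the verification described above can be sketched on synthetic data. The two "ear signals" here are generated, not measured: the right-ear signal is simply the 1 kHz test tone scaled down by a hypothetical 5 dB, standing in for what in-ear microphones would capture. The sketch does not use or represent the BACCH-dSP measurement tools.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def ild_db(left_ear, right_ear):
    """ILD in dB: level at the left ear minus level at the right ear."""
    return 20.0 * math.log10(rms(left_ear) / rms(right_ear))

SAMPLE_RATE_HZ = 48000
TONE_HZ = 1000.0
num_samples = SAMPLE_RATE_HZ // 10  # 100 ms, i.e. 100 whole cycles

# Test tone played on the left channel only.
tone = [math.sin(2.0 * math.pi * TONE_HZ * i / SAMPLE_RATE_HZ)
        for i in range(num_samples)]

# Synthetic stand-ins for the in-ear microphone signals (assumption):
# the far (right) ear receives the same tone, 5 dB quieter.
left_ear = tone
right_ear = [10.0 ** (-5.0 / 20.0) * s for s in tone]

print(f"ILD at 1 kHz: {ild_db(left_ear, right_ear):.1f} dB")
```

With real in-ear recordings in place of the synthetic lists, the same two functions give the ILD at the test frequency; repeating the measurement across frequencies yields an ILD spectrum like the one described here.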

The black (red) curve represents the measured ILD spectrum of the left (right) speaker at the ears of the listener. Note that at 1 kHz, the ILD is about 5 dB. At higher frequencies, head shadowing (which acts as a “natural XTC”) causes the ILD to rise a bit (as clearly seen in the plot), but the most important content, perceptually, (especially human voices) is below 1 kHz. Therefore, a listener listening to the recording we made above would hear the performer move from the center towards the left speaker, then get “stuck” at the left speaker as the ILD in the recording builds up above 5 dB, since the reproduced ILD at the listener’s ears cannot exceed 5 dB. This should illustrate clearly the fundamental flaw in speakers-based spatial audio reproduction without XTC. [See Footnote¹⁴ for an additional, more subtle, flaw].
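The “stuck at the speaker” effect can be illustrated with a deliberately crude model in which the playback chain simply clips the recorded ILD at the largest value it can deliver. The hard clamp, the function name and the sample ILD values are assumptions made for illustration; in reality crosstalk alters the ILD continuously rather than clipping it.

```python
def reproduced_ild_db(recorded_ild_db, ceiling_db):
    """Toy model: the ILD at the ears cannot exceed what one speaker alone produces."""
    sign = 1.0 if recorded_ild_db >= 0 else -1.0
    return sign * min(abs(recorded_ild_db), ceiling_db)

# Assumed ILD values along the performer's path:
# center, partway left, 50 degrees left, whispering close to the left ear.
recorded_path_db = [0.0, 4.0, 8.0, 20.0]

for recorded in recorded_path_db:
    heard = reproduced_ild_db(recorded, ceiling_db=5.0)
    print(f"recorded {recorded:5.1f} dB -> reproduced {heard:4.1f} dB")
```

With a 5 dB ceiling, every recorded ILD above 5 dB collapses to the same reproduced value, which is why the image stops moving once it reaches the speaker.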

XTC can remove this limitation. In particular, BACCH can deliver the maximum possible level of XTC (with zero added tonal coloration) for a given pair of speakers in a given room based on a measurement of the two-point HRTF¹⁵ of the listener with the calibrated BACCH-BM microphone. The resulting ILD spectrum (which, by definition, is the same as the XTC spectrum) is shown in the figure below for the same audio system:

It should be clear from this plot that BACCH can deliver, for the same audio system, 15 dB ILD at 1 kHz, with ILD levels well exceeding 20 dB, at the ears of the listener sitting in the sweet spot (the location where the HRTF measurement was made.) Therefore, the performer would now be perceived to walk all the way from the center, way past the left speaker, to an azimuthal angle of 50 degrees, then walk towards the listener and whisper in his left ear, much like in the real life event. This is the case irrespective of the location of the speakers, as long as the BACCH filter used during playback was designed for that particular speakers-listener configuration. (Incidentally, BACCH-dSP has a simple easy-to-use binaural recorder that allows you to verify the above by quickly making such a recording of a performer walking around you with the BACCH-BM in your ears, then immediately listen to it through a BACCH filter.)

Now that, we hope, this is all clear for a dummy-head recording, it is easy to explain how a similar enhancement to the accuracy of spatial reproduction can be attained for a recording done with a regular stereo mic pickup.
Type A Recordings:
Stereo recordings done with a "Type A microphone" (ORTF, XY, coincident mic techniques) rely on mic capsules with directional pickup patterns (cardioid, hypercardioid, etc.) oriented in such a way as to proportionally attenuate the sound of a source located on the right (left) side of the microphone as it reaches the left (right) capsule. Therefore, such a microphone is mostly capturing the “ILD” (and in the case of a coincident stereo microphone, only the ILD). Although this “ILD” may be a bit different from the actual ILD a dummy head would capture (since the attenuation imposed by the highly directive capsules may not accurately represent the attenuation due to head shadowing), it is fully capable of capturing a good part, if not all, of the wide ILD range (0-20 dB) of our proverbial walking performer. Again, a stereo system without XTC will only be able to reproduce a small part of that range (up to about 5 dB) and again, the performer will be stuck at the left speaker as soon as she reaches about 30 degrees azimuth to the left, and will remain there throughout the rest of the recording, while in real life she was walking well past that angle (to 50 degrees) then towards the left side of the microphone. Again, the same stereo system with the BACCH filter whose measured XTC performance is shown in the plot above can reproduce virtually the entire range of ILD, and thus can give the listener a far more accurate spatial reproduction of the full spatial image.

The difference between a binaural recording done with a dummy head, and a stereo recording done with a Type A stereo microphone, when rendered through the same BACCH filter whose XTC performance is shown in the plot above, is that the one-to-one spatial correspondence between the real image and perceived image is more accurate for the former (since the ILD is coded with the attenuation due to a human head shadowing) than the latter (since the ILD is coded with the particular attenuation due to the directivity pattern of the capsules in the Type A stereo mic). However, they both give a spatial image (through the same BACCH filter) that is far more accurate and realistic than that of playback without XTC.
Type B Recordings:
Since "Type B" stereo recording techniques (e.g. spaced omnis) use omni-directional microphones, they rely on spacing the two microphone capsules some distance apart to pick up ITD cues (the captured ILD cues being negligible¹⁶). At first look one might (wrongly) suspect that stereo recordings done with such a stereo microphone might not benefit from XTC during playback as much as Type A or binaural recordings, since XTC only affects the level of the sound pressure at the ears. But in fact, the delay between the arrival times of a source’s sound at the left and right capsules of the microphone will not be reproduced correctly at the ears of the listener if crosstalk is present, as explained in the next (long) paragraph.

To understand why this is the case, consider again the performer moving from the center position, where the ITD is 0, to 50 degrees azimuth left while clapping her hands. A typical ITD for a source there would be something like 400 microseconds. Now if that recording of the performer clapping at 50 degrees azimuth is played back through a pair of stereo speakers, the level of the clap sound is the same on both channels (because there is little if any ILD captured by the Type B stereo microphone) but the clap on the right channel is delayed by 400 microseconds with respect to the left channel of the recording. Therefore, the sound of the clap will arrive at the left ear from the left channel first, then, after a delay of t1 microseconds, that same sound wave will reach the right ear (t1 is the ITD that would be caused at the ears of the listener by a source located where the left speaker is located, i.e., at 30 degrees azimuth. It should be clear that t1 is significantly less than the ITD of a sound source at 50 degrees (400 microseconds)). If, hypothetically, there were no sound from the right speaker, the listener would hear the clap coming from the location of the left speaker (which, at 30 degrees azimuth, is not the correct 50 degree azimuthal location of the real-life clap). However, the right speaker will emit the clap recorded on the right channel 400 microseconds after it was first emitted by the left speaker. This same sound will reach the left ear t1 microseconds after it reaches the right ear (again, if, hypothetically, there were no sound from the left speaker, the listener would hear the clap coming from the location of the right speaker), causing an ITD of t1, which is wrong in value, and also on the wrong side of the listener!
However, due to the Haas precedence effect, the two sounds (emitted from the left and right speakers) are perceived as fused into one, and the ITD caused by the first one (from the left speaker) dominates perceptually, as it arrived first, causing the listener to perceive the sound of the performer clapping to be essentially located at the left speaker, which is at 30 degrees, and not at the correct 50 degrees we seek [see Footnote 17 for a more accurate description of the net effect of the “fusing” of these two sounds].

In contrast, if the crosstalk is cancelled, the left ear (and only that ear) would hear the clap emitted from the left speaker, then the right ear (and only that ear) would hear the clap from the right speaker delayed by 400 microseconds resulting in the correct ITD at the ears, and thus allowing the listener to perceive the correct real-life location of the performer, irrespective of the location of the speakers (again, assuming that the BACCH filter corresponding to that speakers-listener configuration is used).
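The timing argument above can be checked numerically. The sketch below uses Woodworth's spherical-head approximation, ITD = (a/c)(θ + sin θ), to estimate t1; that formula, the head radius (8.75 cm), the speed of sound (343 m/s) and the function names are assumptions brought in for illustration and are not part of BACCH.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value
HEAD_RADIUS = 0.0875     # m, assumed average head radius

def woodworth_itd_us(azimuth_deg):
    """Woodworth spherical-head estimate of the ITD, in microseconds."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta)) * 1e6

def ear_arrivals_us(recorded_delay_us, speaker_itd_us, crosstalk):
    """Arrival times of one clap at each ear, relative to the first arrival.

    The left speaker emits first; the right speaker emits recorded_delay_us later.
    Returns (left_ear_times, right_ear_times), each sorted in time.
    """
    left_ear = [0.0]                 # left speaker -> left ear
    right_ear = [recorded_delay_us]  # right speaker -> right ear
    if crosstalk:
        right_ear.append(speaker_itd_us)                     # left speaker -> right ear
        left_ear.append(recorded_delay_us + speaker_itd_us)  # right speaker -> left ear
    return sorted(left_ear), sorted(right_ear)

t1 = woodworth_itd_us(30.0)  # ITD of a source at the left speaker's position
for crosstalk in (True, False):
    left, right = ear_arrivals_us(400.0, t1, crosstalk)
    first_arrival_itd = right[0] - left[0]
    print(f"crosstalk={crosstalk}: first-arrival ITD = {first_arrival_itd:.0f} microseconds")
```

Under these assumptions t1 comes out at roughly 260 microseconds, and the same formula gives a little over 400 microseconds for 50 degrees, in line with the figure quoted above. With crosstalk the first-arrival ITD equals t1 (the speaker's own position); without it the recorded 400 microseconds reaches the ears intact.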

You can easily verify the above claim that XTC improves the spatial accuracy of Type B recordings using BACCH-dSP: First, make a recording of someone walking, speaking or clapping around you while you have the BACCH-BM microphones in your ears. This first recording would be the reference binaural recording. Then make a second recording of the same performance, but this time hold one capsule in each hand, with the two capsules spaced about 6 inches apart. Since the BACCH-BM capsules are essentially omnidirectional, this is tantamount to a "Type B recording" (spaced omnis). After the recordings are done, play the reference binaural recording while toggling the BACCH filter on and off (which in BACCH-dSP can easily be done by a tap of the mouse) and observe how the spatial accuracy is greatly improved when the BACCH filter is on. Finally, play the Type B recording while toggling the BACCH filter on and off, and you will also hear a significant enhancement in the spatial accuracy when the BACCH filter is on, as discussed above.

In conclusion, XTC greatly benefits the spatial accuracy not only of the speakers-based playback of binaural recordings, but also of Type A and Type B recordings (and therefore of virtually all well-made stereo acoustical recordings made in real acoustical spaces), as it allows both the ILD and ITD cues to be reproduced more correctly at the ears. If XTC worked only for binaural recordings, as some people who have not carefully listened to proper XTC have wrongly surmised, no one would be interested in BACCH, as binaural recordings are a minuscule fraction of available commercial recordings.

There remains the important question of whether XTC can benefit the spatial rendering of recordings that are produced “artificially” by mixing audio stems (which is the vast majority of popular music). This question is addressed in the following section.
How does BACCH enhance the spatial imaging of "studio-mixed" recordings without altering the sound intended by the mixing engineer?

In light of the arguments in FAQ #14 above, we can now address the case of “studio-mixed” recordings, which represent the vast majority of commercially available recordings. In such recordings, the mixing engineer (sometimes with input from the artist(s) and/or producer(s) and, to a lesser extent, the mastering engineer) concocts an artificial stereo image from stems (most often mono stems) mostly through level panning (and, much less often, time or phase panning) between the left and right channels. Mixing to produce a realistic, pleasing or engaging stereo image is an art involving both technical know-how and esthetic decisions.

Many mixing engineers are truly ingenious masters. It goes without saying that their final product deserves the utmost respect and that a good hi-fi reproduction system should not degrade or fundamentally alter their construct. It is also very true that virtually all commercially available mixed recordings were mixed while monitoring on monitors without XTC.

Depending on the techniques used and esthetic decisions made, these concocted recordings range over a wide spectrum: on one end of the spectrum are recordings aiming to emulate a real acoustic environment (e.g. a jazz club). Let us call this end of the spectrum the “pseudo-realistic end”. On the other end of the spectrum are recordings that have no binding ties to realism, and instead aim to evoke sensations, or project certain esthetic expressions (e.g. the chimes in Pink Floyd’s well-known Time track on their Dark Side of the Moon album). Let us refer to this end of the spectrum as the “artificial end”.

We will now consider what happens when such recordings are played back through XTC.

On the pseudo-realistic end of that spectrum, most of the arguments made in FAQ#14 above hold, to some extent, since the mixing engineer is essentially using at least an analog of ILD and ITD to produce a “realistic” stereo image like a stereo mic would, and all that XTC does is remove the artificial ceiling on the ILD and ITD limits imposed by the speakers during playback. Most relevant in this context is reverb. During mixing, reverb is added algorithmically or through convolution with a real space impulse response (with the latter technique yielding far more realistic reverb). In both cases XTC unlocks the perceived reverberation from the speakers and projects it into 3D space. It does so because the perception of a realistic 3D reverb is caused by late reflections (the diffuse field) arriving at the left and right ears at almost random arrival times (i.e. with low L-R correlation, in the parlance of acoustics), and without XTC the sound at the right and left ears would be highly correlated since the sound from each of the L or R channels reaches both ears. Such highly L-R correlated sound causes the listener to perceive the reverb to be largely restricted spatially to a region that is mostly where the speakers are. It is hard to imagine a mixing engineer who would object to his mix being reproduced with a reverb that is more 3D and less “stuck to the speakers” (as long as the tonal and level balance between the direct and reverberant sound is not altered). (BACCH is a patented form of advanced XTC that causes no alteration whatsoever to that balance as described in this standard, but highly technical book chapter.) In fact, one of the most noticeable and striking aspects of listening through a BACCH filter for the first time is the immediate sense of being in a real 3D space due to the higher L-R sound decorrelation that reverb is meant to cause at the ears.
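The link between crosstalk and L-R correlation can be illustrated with a simple, idealized model: two uncorrelated, equal-level channels, with each channel leaking into the opposite ear at a gain set by the XTC level. The model below is an assumption made for illustration (frequency-independent crosstalk, no interaural delay on the crosstalk path, zero-lag correlation only), and the function names are hypothetical.

```python
import math
import random

def crosstalk_gain(xtc_db):
    """Linear gain of the crosstalk path for a given XTC level in dB."""
    return 10.0 ** (-xtc_db / 20.0)

def ear_correlation(xtc_db):
    """Zero-lag correlation between the ear signals for uncorrelated channels.

    ear_L = L + g*R and ear_R = R + g*L give correlation = 2g / (1 + g^2).
    """
    g = crosstalk_gain(xtc_db)
    return 2.0 * g / (1.0 + g * g)

def simulated_ear_correlation(xtc_db, n=20000, seed=0):
    """Monte Carlo check of ear_correlation using Gaussian noise as diffuse reverb."""
    rng = random.Random(seed)
    g = crosstalk_gain(xtc_db)
    left = [rng.gauss(0.0, 1.0) for _ in range(n)]
    right = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ear_l = [l + g * r for l, r in zip(left, right)]
    ear_r = [r + g * l for l, r in zip(left, right)]
    mean_l, mean_r = sum(ear_l) / n, sum(ear_r) / n
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(ear_l, ear_r))
    var_l = sum((a - mean_l) ** 2 for a in ear_l)
    var_r = sum((b - mean_r) ** 2 for b in ear_r)
    return cov / math.sqrt(var_l * var_r)

for xtc in (5.0, 20.0):
    print(f"XTC {xtc:4.1f} dB: model {ear_correlation(xtc):.2f}, "
          f"simulated {simulated_ear_correlation(xtc):.2f}")
```

In this model, about 5 dB of channel separation leaves the ear signals strongly correlated (roughly 0.85), while 20 dB brings the correlation down to roughly 0.2, which is the decorrelation the paragraph above describes.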

On the “artificial end” of the studio-mixed recordings spectrum defined above, the mixing engineer concocts an image whose panned sources constitute an artificial stereo image that does not aim to be a reflection of a reality, but rather an esthetic or artistic construct. While mixing that image the engineer is choosing to place sources in a space that is largely between the two speakers. However, as is well-known by audiophiles, even a stereo playback system without XTC can image in a 3D, albeit relatively restricted, spatial region around the speakers (often called “the soundstage”). The main reason such imaging occurs without active XTC is that the listener’s head, by shadowing the contralateral ear from the loudspeaker (i.e. the speaker on the opposite side), creates a natural crosstalk cancellation that is highly effective at higher frequencies (i.e. frequencies whose wavelengths are smaller than the size of the human head). It should be clear that this natural XTC (which can be seen in the measurement shown in the first plot in FAQ#14) depends on the span between the speakers, the distance between the head and the speakers, the radiation pattern of the speakers, and the extent and relative strength of reflections in the room. A larger speaker span, a shorter distance to the head, a more directive speaker, and a higher ratio of direct-to-reflected sound, all lead to higher values of this natural XTC. This is mainly why different stereo systems in different rooms with different listener-speakers placements, can achieve different levels of “3D imaging”.
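As a rough order-of-magnitude check of where this natural XTC starts to matter, one can compute the frequency at which the wavelength equals the width of the head. The head width (17.5 cm) and speed of sound (343 m/s) are assumed values, and head shadowing sets in gradually rather than at a sharp threshold, so the result is only indicative.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def wavelength_m(frequency_hz):
    """Acoustic wavelength in air at the given frequency."""
    return SPEED_OF_SOUND / frequency_hz

def head_size_frequency_hz(head_width_m=0.175):
    """Frequency whose wavelength equals the (assumed) head width."""
    return SPEED_OF_SOUND / head_width_m

print(f"wavelength at 1 kHz: {wavelength_m(1000.0):.3f} m")
print(f"wavelength equals head width near {head_size_frequency_hz():.0f} Hz")
```

Under these assumptions the wavelength at 1 kHz is about twice the head width, so the head casts little acoustic shadow there, consistent with the modest measured ILD at 1 kHz discussed in FAQ#14.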

A mixing engineer in a given studio with a certain set of stereo speakers concocts a stereo image while hearing a soundstage the spatial extent of which depends largely on the above-listed parameters of the particular monitoring setup in the studio. An audiophile playing back the resulting recording through a good hi-fi stereo system at home generally has no way of knowing what these parameters were when the mix was produced, but still strives to get a good measure of a 3D soundstage. Indeed “3D soundstage” imaging of a playback system is one of the holy grails for audiophiles and audio critics. By choosing and tuning his gear and listening room to enhance such a soundstage, the audiophile does not betray the intent of the mixing engineer as long as the enhancement of the spatial extent of the soundstage does not come at the expense of a change in the spatial balance or tonal content of the recording during playback. It is very possible that an audiophile’s playback system has significantly better 3D imaging capability than the system used by the engineer while monitoring the mix. No one would object if this were the case, or accuse the audiophile of betraying the engineer's intent.

For such recordings (on the “artificial end” of the spectrum), XTC cannot pretend to enhance realism during playback since the stereo image was artificially concocted in the first place. However, like in the case of natural XTC, adding more XTC actively to enhance the spatial extent of the soundstage, without altering the balance or tonal content of the recording (which is the essential characteristic of BACCH XTC), does not strictly betray the intent of the mixing engineer since the spatial extent of the artificial soundstage was not prescribed by him. Of course, this argument becomes more tenuous if XTC leads to extreme spatial panning, which can only happen for hard left or right panned sources in the absence of reflections (e.g. in an anechoic chamber, a hard left or right panned sound source played back through a pair of speakers with high levels of XTC, without any ILD or spectral cues added to the sound, would lead to the sound being perceived to be very close to the left or right ear of the listener, as if wearing headphones). Such extreme imaging does not occur in real listening rooms with typical levels of direct-to-reflected sound ratio.

Of course, the level of active XTC during playback can be dialed down (in BACCH-dSP there is an “XTC percentage” slider that allows doing just that) but it should be clear from the above arguments that this is not recommended for acoustic recordings or for recordings on the “pseudo-realistic end” of the “studio-mixed” recordings spectrum. Moving towards the “artificial end” of the spectrum, the question of betraying the original intent of the engineer does indeed become a valid objection, but only to the extent to which XTC alters the tonal character and spatial balance of the recording (which BACCH, by design, does not do at all) and to the extent to which high levels of XTC can result in jarring extremely panned images, which can occur with BACCH but only in near-anechoic environments and with recordings having extremely panned mono images. The latter issue can be addressed by dialing back the XTC level (or in extreme but very rare cases, by bypassing XTC!).

By the "ideal listener in the recording venue" we mean the actual main stereo recording microphones, or the left and right channels of the stereo master recording, which represent the left and right ear of the ideal listener in the original sound field.
Sound directivity is the extent to which loudspeakers beam the sound towards the listener instead of broadcasting it in all directions around the room.
They are most accurately preserved if the recording is made with a dummy head (see Q&A 7).
Aside from greatly compromising the 3D image, standard stereo (and even more, surround sound), inherently suffers from the problem of comb filtering, which significantly alters the tonal content of sound, and which is due to the interference of sound waves emanating from more than one speaker.
BACCH stands for “Band-Assembled Crosstalk Cancellation Hierarchy”—a name that represents the mathematical filter design method and pays tribute to the great composer with a similar sounding name.
The accuracy is due to the fact that binaural audio preserves not only the correct ILD and ITD cues discussed in Q&A4, but also contains so-called “spectral cues,” which are the effects the torso, head and ears have on the frequency response and which the brain uses, in addition to ILD and ITD cues, to locate sound, especially at higher frequencies.
The spatial accuracy of dummy head recording is only surpassed by recordings made with microphones placed in the listener’s own ears—alas, a rare commodity that would have benefits upon playback for only that listener.
This is because binaural playback through headphones or earphones is very prone to head internalization of sound (which means that the sound is perceived to be inside the head) and requires, in order to avoid this problem, an excellent match between the geometric features of the head of the listener and those of the dummy head with which the recording was made (this problem is surmounted by the BACCH-dHP module of the BACCH-dSP software). BACCH 3D Sound does not suffer from this problem as the sound is played back through loudspeakers far from the listener’s ears.
The 3D realism is the same although the ability of reproducing a sound source at a location that accurately corresponds to the original location is relatively decreased due to the absence of spectral cues.
Some stereo recordings have overwhelmingly strong mono content and may not yield much of an improvement in spatial realism when processed through BACCH.
They contain no ILD, ITD or spectral cues but may contain some distance cues in the relative magnitude of the ratio of direct-to-reflected sound of the recorded sound sources.
Despite the tendency of some audiophiles and audio reviewers for describing the sound from certain hi-fi components as “three-dimensional” or ”holographic”.
“Spectral cues” are sometimes, not very correctly, called HRTF cues, but strictly speaking the HRTF (head-related transfer function) is a complex (“complex” in the mathematical sense of containing real and imaginary parts, corresponding to the magnitude and phase of the binaural response) function that includes the ILD, ITD and spectral information (all three of them) that completely describes the response of a source located anywhere in 3D space as recorded at the entrance of the ear canal. (More strictly speaking, the HRTF of a single human head is the collection of many (often thousands) of the complex responses of calibrated sound sources located over a large area around the head, and is often measured in an anechoic chamber, or calculated numerically by solving the Helmholtz wave equation on a mesh obtained from an individual head scan).
Aside from the major flaw of a non-XTC playback system not being capable of delivering an ILD much above that of a source located at the speaker, the spatial imaging between the speakers does not correspond to the actual recorded image since the crosstalk changes the level of recorded ILD depending on the location of the speakers and listener. As a result, if the performer walks 50 feet from the center to the left to reach the 50 degree azimuthal location, she would be perceived to have only walked half the distance between the speakers (5 feet in the above example), since the perceived image is limited to the location of the left speaker.
More correctly, in spatial audio this is called the BRTF (binaural room transfer function) and not HRTF, since it includes the response of the actual speakers and the room.
Since the two capsules, being omnidirectional, cannot apply direction-dependent attenuation to the sound from a given source, the most “ILD” they can capture is the difference in SPL that the source is producing at the two capsules, which is about 1 dB if the source is much further than the distance between the two capsules, which is typically the case.
In fact, according to the Haas effect, the ITD caused by the delayed right speaker is not completely suppressed, and therefore does have a small effect but, because it is on the wrong side, it pulls the perceived sound location to be a little to the right of the left speaker, which is even more wrong.
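The estimate in the note on spaced omnidirectional capsules above (a captured level difference of about 1 dB for a distant source) can be checked with simple free-field geometry. The sketch assumes ideal point capsules, 1/r spreading with no reflections, a source 10 feet (3.048 m) away at 50 degrees to the left, and two hypothetical capsule spacings: 0.5 m and the 6 inches (0.1524 m) used in the hand-held experiment of FAQ #14.

```python
import math

def spaced_omni_level_difference_db(distance_m, azimuth_deg, spacing_m):
    """Left-minus-right level difference (dB) captured by two spaced omni capsules.

    The source lies azimuth_deg to the left of the array axis at distance_m from
    the midpoint between the capsules; levels fall off as 1/r (free field).
    """
    theta = math.radians(azimuth_deg)
    src_x = -distance_m * math.sin(theta)  # negative x is to the left
    src_y = distance_m * math.cos(theta)
    r_left = math.hypot(src_x + spacing_m / 2.0, src_y)   # capsule at x = -spacing/2
    r_right = math.hypot(src_x - spacing_m / 2.0, src_y)  # capsule at x = +spacing/2
    return 20.0 * math.log10(r_right / r_left)

for spacing in (0.5, 0.1524):
    diff = spaced_omni_level_difference_db(3.048, 50.0, spacing)
    print(f"spacing {spacing:.4f} m: level difference {diff:.2f} dB")
```

Under these assumptions the captured level difference stays at or below roughly 1 dB, far short of the 8 dB a head would register for the same source, which is why Type B recordings rely on time differences instead.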