Martin Rumori*
University of Music
and Performing Arts Graz
Institute of Electronic
Music and Acoustics
Inffeldgasse 10/3, 8010 Graz,
Austria
Georgios Marentakis
Signal Processing and Speech
Communication Laboratory
Graz University of Technology
Parisflâneur. Artistic Approaches
to Binaural Technology and
Their Evaluation
Abstract
This article approaches binaural interactive environments from an artistic research
perspective. Beyond content production, an aesthetic reflection of binaural media
requires pervasive access to digital processing means and ways to employ them in
works. However, most conventional workflows separate media-specific rendering
algorithms from object-based scene authoring. Such a delimitation between
binaural engineering and its application restricts transdisciplinary creation that crosses
both areas. This article assumes that the full potential of immersive media cannot
be explored without investigating technology in the context of aesthetic experience.
A case study is presented in which artistic references are regarded together
with their technical realization. Contemporary user experience evaluation methods
are adopted and refined with reference to the aims of the artist. A subsequent revi-
sion of the work is discussed along with implementation adjustments and conceptual
alterations. The presented project shall exemplify how artistic research may bridge
scholarly investigation and the creative acquirement of media technology beyond its
mere application. A point of departure shall be provided for further cross-fertilization
between engineering and the arts by identifying mutual implications.
1
Introduction
Binaural technology is widely used to provide an immersive spatial experience
in virtual and augmented reality applications, sonification, auditory
displays, or assistive technologies. Unlike in the era of dummy head stereophony
in the 1970s (Paul, 2009; Krebs, 2016), the need to wear headphones can no longer
be considered an obstacle to the acceptance of binaural audio. On the
contrary, awareness of the perceptual implications of stereophony in headphone
reproduction, such as in-head localization, is increasing due to the ubiquity of
headphones, as are efforts to achieve externalization in binaural technology (cf.,
e.g., Gilkey & Anderson, 2015).
Spatial listening takes place in cross-modal correspondence to other senses,
such as vision and proprioception. Spatial audio technology potentially distorts
cross-modal congruency insofar as the sensory relation of listener and environ-
ment is reconfigured (Niklas, 2014). Depending on the application, binaural
audio may support or contradict real or synthesized visual stimuli; augment
or replace the existing auditory environment; or be conceived without explicit
Presence, Vol. 26, No. 2, Spring 2017, 111–137
doi:10.1162/PRES_a_00289
© 2017 by the Massachusetts Institute of Technology
*Correspondence to rumori@iem.at and georgios.marentakis@tugraz.at.
Rumori and Marentakis 111
cross-modal references as a listening-only experience.
Interfaces to binaural systems usually follow the object-
based approach, that is, the auditory scene is described
by multiple virtual sound sources. Each source com-
prises underlying audio material and metadata on the
source’s properties, most significantly its location in
space or a dynamic spatial trajectory. Binaural systems
thus serve as a universal rendering means similar to peri-
phonic projection technologies such as Ambisonics or
Wave Field Synthesis (Bleidt, Borsum, Fuchs, & Weiss,
2014). As a result, engineers concentrate on the
development of rendering algorithms that are preferably
independent of the actual audio material, while
so-called “content producers” deal with the selection
and the arrangement of audio objects in the scene with
limited insight into the algorithms. Neither side can
easily cross the boundary between scene description and
rendering.
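To make this division of labor concrete, the following Python sketch models an object-based scene description under simple assumptions: each object pairs a reference to audio material with positional metadata (a static location or a trajectory), and a renderer queries only that metadata. The class names, fields, and file names are hypothetical and do not correspond to any particular authoring tool or to the system described in this article.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SoundObject:
    """One virtual source: a reference to audio material plus metadata."""
    name: str
    audio_file: str                      # hypothetical path to the source material
    trajectory: Callable[[float], Vec3]  # time in seconds -> (x, y, z) in metres


@dataclass
class Scene:
    objects: List[SoundObject] = field(default_factory=list)

    def positions_at(self, t: float) -> Dict[str, Vec3]:
        """Metadata a renderer would read at time t; the audio is never inspected."""
        return {obj.name: obj.trajectory(t) for obj in self.objects}


def static(pos: Vec3) -> Callable[[float], Vec3]:
    """Trajectory of a source that stays at one location."""
    return lambda t: pos


def linear(start: Vec3, velocity: Vec3) -> Callable[[float], Vec3]:
    """Trajectory of a source moving at constant velocity (metres per second)."""
    return lambda t: tuple(s + v * t for s, v in zip(start, velocity))


scene = Scene([
    SoundObject("market", "market.wav", static((2.0, 0.0, 1.6))),
    SoundObject("bus", "bus.wav", linear((-5.0, 3.0, 1.0), (1.0, 0.0, 0.0))),
])
print(scene.positions_at(2.0))
```

Because `positions_at` never touches the audio itself, the same renderer could be driven by arbitrary material, which is precisely the independence that separates rendering from “content” in such workflows.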
In this article, we approach binaural technology from
an artistic research perspective. Artistic research has received
increasing attention in recent decades because it can
explore areas of knowledge production that are hardly
accessible by formal scientific strategies (Frayling, 1993;
Borgdorff, 2006). In contrast to the common separation
of “content production” from scene rendering, our
aim is to investigate aesthetic implications of technology
and tools in an integral process of creation. This aim is
pursued by means of an artistic case study on interactive,
audio augmented environments using binaural technology,
which is investigated thoroughly on both a theoretical
and an empirical level. A design iteration including
user evaluation is presented.
A central aspect of the case study is an artistic, self-referential
reflection of binaural rendering by composing
a navigable scene out of objects that are themselves
binaural recordings. The perspective to the material,
whether it is heard as an egocentric recording or as an
exocentric environment, may be changed by the listener
through interaction in the scene.
In the following, issues of binaural technology in
the context of artistic creation are introduced (see
Section 2). Subsequently, evaluation methods in interactive
arts are reviewed in Section 3. The case study
Parisflâneur is described in Section 4. The formal
evaluation of the installation is presented in Section 5,
followed by a report on artistic consequences and a substantial
rework of the case study in Section 6. Central
aspects of the project are further discussed in Section 7
before the article concludes (see Section 8).
2
Binaural Technology and
Artistic Creation
In this section, genre-related terms of installation
and environment, notions of interactivity, reactivity,
and immersion are presented as understood in this
article. In addition, cultural aspects of headphone
listening, implications of object-based scene composition,
and aesthetic properties of binaural recordings are
illuminated.
2.1 Installation and Audio
Augmented Environment
The terms installation and environment have
several meanings in the arts and in mixed reality. Art
theory uses both terms to refer to certain art forms that
emerged in the 1960s within conceptual art. This
form of installation is discussed in relation to the preceding
form of environment, although the term installation
was previously used to describe the arrangement of exhi-
bitions in general (Bishop, 2005). An environment is
characterized by the incorporation of the existing sur-
rounding into artistic reflection as it is without explicitly
designing it (Reiss, 1999). Both the installation and
the environment involve a strong spatial component
that substantially codetermines the significance of the
spectating, 那是, the experiencing body.
In the context of virtual and augmented reality, environment
seems to carry the notion of a surrounding
that is created for exploration by spectators or listeners.
While the terms virtual environment and, less extensively,
augmented environment are widespread and often
used synonymously for virtual and augmented reality,
respectively, they only rarely refer exclusively to the
auditory domain. In this case, rather, the term auditory
virtual environment is used (Novo, 2005). Audio
augmented environment appears to have been coined
in the context of the LISTEN research project, a pio-
neering attempt to superimpose binaural interactive
soundscapes on everyday surroundings (Eckel, 2001;
Warusfel & Eckel, 2004).
Throughout this article, installation denotes the phys-
ical and conceptual structures that have been conceived
to form an artifact. In terms of virtual and augmented
reality, an installation would comprise the technical
and medial means as well as conceptual references that
convey objects of aesthetic experience. The entirety of
experienced entities, be they abstract virtual structures or
physical objects, is considered to be the environment.
2.2 Interactivity and Reactivity
Interaction has been commonly understood on a
mostly technical level in terms of human–machine communication.
Very much in contrast to this notion, interactivity
has been thoroughly investigated in many areas,
among them social and communication studies, media,
and art theory (see, e.g., Jensen, 1998; Ryan, 2001;
Paine, 2002; Franinovic & Salter, 2013). In the context
of interaction with sounds, connections have been
formed to the notions of embodiment, enaction, and
tacit knowledge. In this article, a minimal distinction will
be drawn between reactivity and interactivity in order
to identify two different modes of human–machine
communication with respect to aesthetic experience.
Reactivity denotes the action of dynamic mechanisms
controlled by interface input whose effects can be
characterized as compensating or negating. An example
of reactivity is the use of head tracking in binaural
audio systems “to decouple the position of the source
from head movements” (Bronkhorst, Veltman, & van
Breda, 1996, p. 23). In other words, the system compensates
for the listener’s movements so as to convey the
impression of a stable, exocentric auditory environment.
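The compensating character of reactivity can be illustrated with a minimal sketch, assuming a horizontal-plane tracker that reports listener position and head yaw: the source azimuth is computed relative to the tracked head orientation, so that turning the head shifts the rendered direction by the opposite amount. The coordinate conventions (yaw 0 along +x, counter-clockwise positive) and the function name are assumptions for illustration only.

```python
import math
from typing import Tuple


def head_relative(source_xy: Tuple[float, float],
                  listener_xy: Tuple[float, float],
                  head_yaw: float) -> Tuple[float, float]:
    """Azimuth (radians) and distance of a world-fixed source relative to the head.

    Assumed conventions: yaw 0 faces +x and angles grow counter-clockwise,
    so a positive azimuth lies to the listener's left.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    # Subtracting the tracked yaw compensates head rotation: the source keeps
    # its place in the room while the head turns.
    rel = math.atan2(dy, dx) - head_yaw
    rel = math.atan2(math.sin(rel), math.cos(rel))  # wrap to (-pi, pi]
    return rel, math.hypot(dx, dy)


# A source 1 m ahead; after a 90-degree turn to the left it is heard on the right.
print(head_relative((1.0, 0.0), (0.0, 0.0), 0.0))
print(head_relative((1.0, 0.0), (0.0, 0.0), math.pi / 2))
```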
Interactivity, in contrast, implies a participatory function
that is conceptually assigned to the spectator’s or
listener’s input. In binaural audio, this quality is often
indicated by interaction with the presented auditory
scene such that it is intentionally altered, for example,
when controlling a virtual sound object by gestures. Through
interaction, an installation reveals a certain behavior such
that the resulting effect can be related to the input.
2.3 Immersion
The discourse on immersion is as widespread
and heterogeneous as that on interactivity (Ryan, 2001;
Grau, 2003; Reck, 2007). In the realm of virtual and
augmented environments, the level of achievable immersion
or presence is commonly regarded as directly
correlated to the fidelity of mediating technology, rendering,
and projection techniques (see, e.g., Bimber
& Raskar, 2005; Lentz, Assenmacher, Vorländer, &
Kuhlen, 2006; Schärer & Lindau, 2009).
In contrast, a mental state similar to immersion was
introduced into literary theory as early as 1817 as a
“willing suspension of disbelief” (Coleridge, 1898).
Coleridge’s understanding implies
1. an active contribution of the recipient (“willing”)
and
2. that reaching this mental state depends on cognitive
processing and not only on perceptual stimulation
(e.g., by means of a narrative).
Therefore, the mere fidelity of the synthesized stimuli is
a measure of neither the likelihood nor the depth of
possible immersion, even with today’s VR technology
(Ryan, 2001; Ettlinger, 2008). Such a point of view has
been assumed in conceiving the case study described in
this article (see Section 4), especially when estimating
the immersive potential of largely simplified and technically
“inaccurate” rendering techniques and that of
non-reactive binaural recordings.
2.4 Headphone Listening as a
Cultural Technique
Binaural audio often gives rise to surprising experiences
and disturbs most lay listeners’ expectations.
Although externalization is sought by providing perceptual
cues that reference unmediated real-world
situations, the cultural technique of headphone listening
is superimposed on the faculty of natural spatial hearing
(Rumori, 2017b). In cultural theory, cultural techniques
are acquirements that evolved in sociocultural participatory
practice, that is, they are not achievements of
individuals. Classical examples are lighting a fire,
reading and writing, or storytelling.
The cultural technique of headphone listening has
been developed through conventional stereophonic sig-
nals meant to be played on loudspeakers but that are
delivered as ear signals. We are trained to abstract a spa-
tial image from an in-head stereo base localized between
our ears, and we expect to resort to this ability when
putting on headphones. Binaural externalization con-
tradicts this expectation until it is learned and associated
with headphone media.
The perturbance of expectation is even stronger in
the case of reactive binaural images enabled by tracking.
To date, prevalent headphone listening skills include
abstraction from an egocentric perspective, that is, a
common reference of head and sound projection, which
turns and moves along with the listener. Nevertheless, the
conveyed auditory image is considered exocentric, for
example, the orchestra in the concert hall acoustics of a
classical recording.
2.5 Object-Based Soundfield
Representation
Common binaural systems are mostly approached
as black boxes by so-called “content producers.” This
is not only because rendering algorithms are hidden
behind scene authoring tools, but also because sophisticated
adjustments would require detailed technical
knowledge. From an artistic point of view, object-based
scene composition is but one approach for representing
an auditory environment, and not necessarily the most
suitable in all cases. Listening is an integral aesthetic
experience that takes place at various levels, only
one of them being a cognitive scene decomposition (cf.
Bregman, 1990). Depending on the artistic aim, emphasis
may be on overall sonic qualities of an environment
or on the spatial expansion of sonic phenomena, not primarily
on the particular layout of scene objects relative to each
other.
Advanced rendering systems incorporate source directivity
models beyond simple point sources (Lindau,
Klemmer, & Weinzierl, 2008). However, any model
imposes assumptions and approximations. Like any
other representation, object-based scene description
cannot be considered a transparent, lossless capture of
an arbitrarily complex auditory environment; rather, it is
an interpretation that may or may not fit artistic aims.
Artistic approaches to binaural technology in the sense
of media art do not only seek to convey a certain spatial
experience, but at the same time, they reflect on the
conditions and the anthropological implications of simulated
auditory environments. According to Reck (2007,
p. 13), the examination of media as a subject matter targets
“art through media” rather than “art with media.”
For this reason, creative exploration should endeavor to
exceed mere “content production.” In the context of
binaural audio, this implies that rendering algorithms
should not be considered independent of the aesthetic
experience of “content”; rather, they are a part of the
content.
2.6 Binaural Recordings
Like scene decomposition, binaural recordings
also imply an interpretation of an integral environment.
Nevertheless, interpretation as conducted by recordings
takes place on the level of perspective and behavior, such
as recording direction or perspective motion, rather than
on that of separation into sonic objects.
Binaural recordings play only a marginal role in
today’s virtual and augmented reality applications due
to their static nature, both with respect to scene manipulation
and to subsequent dynamic perspective changes
(e.g., upon tracking). Research is being performed to overcome
these limitations, for example, based on source
separation from recorded scenes or on higher-order spatial
recordings (Alon, Sheaffer, & Rafaely, 2015; Liu,
Wang, Jackson, & Cox, 2015). Again, such techniques
involve models of decomposition and rendering whose
implications have to be considered.
From an aesthetic point of view, binaural recordings
are a very effective way of conveying a complex
spatial auditory image, especially to support anecdotal
or narrative artistic aims. Many qualities that are
hard to simulate, such as the overall atmosphere of an
environment, are preserved with high fidelity in recordings.
Depending on the context, this property may
be rated higher than disadvantages such as the described
immutability of binaural recordings.
3
Evaluation in the Context of
Interactive Art
The component of interactivity in interactive works
requires that artists are actively concerned with how the
audience interacts with the artwork, and possibly with
each other through the artwork (Edmonds, 2010).
An increasing body of work therefore investigates
whether, how, and when user-based evaluation could
be involved in the development process of interactive
art.
Evaluating interactive art usually includes a combination
of usability testing and qualitative inquiry into
experiential aspects of interaction. Comparing results with
the artistic intention may seem an obvious step to
take in order to complete the evaluation. Although relevant
to the usability of the installation, such an approach may
not be appropriate for the evaluation of other aspects of
experience, such as emotional and aesthetic responses.
Difficulties arise because artworks rarely contain or aim
to define a specific type of experience. Instead, they aim
at creating an experience that is open to interpretation.
There is value in such interpretations being incompatible
with design expectations or inconsistent among
visitors.
Specific attention has been given to joy of play,
fun, and enjoyment, and to designing for ludic
engagement (Gaver, 2002). This type of engagement
may relate to the priorities of artists and has been identified
as a pragmatic goal to evaluate in interactive
artworks (Morrison, Mitchell, & Brereton, 2007).
Creative engagement has also been associated with
interactive art experience. It emerges when participants
interpret unconventional interaction situations in which
their intentions and expectations are not aligned with
the system responses. It may be accentuated by gradually
drawing visitors in, using interactive elements
of different levels of complexity in order to attract but
also to maintain interest (Bilda, Edmonds, & Candy,
2008).
Historically, important contributions to evaluation in
the context of interactive art have emerged in the Beta
Space (Muller, Edmonds, & Connell, 2006; Muller
& Edmonds, 2006). Evaluation methods including
direct user observation or observation using video,
contextual interviewing, structured interviews, or questionnaires
have been extensively applied (Edmonds,
Bilda, & Muller, 2009; Candy, Amitani, & Bilda, 2006;
Bilda, Costello, & Amitani, 2006; Marentakis, Pirrò,
& Kapeller, 2014). A particularly relevant contribution
is the video-cued recall method, which may be seen as a
dynamic feedback evaluation method (Sengers & Gaver,
2006). In this method, participants are asked to recall
what they experienced while watching their actions in a
video.
Dynamic feedback methods are important for evaluating
open works. They involve giving information
obtained from users back to them for interpretation,
in longitudinal studies that involve a diverse population.
Designers should then weigh the results to justify their
conclusions and make sure that they do not abdicate
the responsibility for the eventual success of the system
(Sengers & Gaver, 2006). An application of dynamic feedback
for the purpose of evaluation can be found in
Boehner, Sengers, and Warner (2008), where it resulted in a significant
deepening of, and shift in, the designers’ perspective
when dealing with ineffable aspects of user experience,
as in the case of designing aesthetics. The co-discovery
method, in which groups of users visited an installation
while their interactions were recorded, could be used
to address social aspects of the interactive art experience
(Höök, Sengers, & Andersson, 2003). More open
techniques, such as shadowing, interviewing and informal
discussion, and questionnaires, brought together by
the grounded theory method (Glaser & Strauss, 1967),
have been used by Morrison et al. (2007). A significant
collection of evaluation works that address interactive
sound art has appeared in Candy and Ferguson (2014)
and Candy, Edmonds, and Ascott (2011). Recent
approaches emphasize the use of artistic techniques
in order to address ineffable aspects of user experience
(Marentakis, Pirrò, & Weger, 2017).
4 Artistic Case Study Parisflâneur
The case study Parisflâneur constitutes an interactive
audio augmented environment that can be
experienced as a sound installation. It has
been implemented iteratively in an integral process as an
experimental system for binaural environments. The case
study underwent several reworkings. One of its incarnations
received a more formal evaluation of its interaction
design (see Section 5).
4.1 Description
Figure 1. Parisflâneur with schematic visualization of “sonic hats.”
Parisflâneur invites listeners to put on headphones
and to navigate freely in virtual auditory space. The
installation does not interfere with the visual or haptic
perception of the visitor apart from the marking of the boundary
of the active installation area on the floor and the
requirement to wear headphones. Seven binaural field
recordings made by the creator in and around Paris are featured.
They represent different urban and rural sound situations. The installation
contrasts the static nature of such field recordings with
their perception as dynamic point sources in a binaurally
rendered, interactive auditory environment.
When entering the environment, a complex auditory
scene is heard. The scene is formed by the seven binaural
field recordings that are rendered as seven spatially dis-
tributed, monaural virtual sources. By walking around
while listening, the recordings comprising the scene
may be identified and localized with gradually increasing
certainty.
The listener may interact with each of the sounds
by moving his or her head below a certain threshold
and then raising it again. In the installation narrative,
this interaction gesture is introduced to the listeners
as “ducking” at the exact position of a virtual source
as if one would crawl under an imaginary “sonic hat”
suspended in space and “put it on.”
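One way such a gesture could be detected from tracking data is sketched below, assuming the tracker provides the listener's horizontal position and head height in metres. Two thresholds form a hysteresis band so that jitter around a single value does not retrigger the gesture, and the gesture selects the nearest source within a given radius of the position where the head went down. The threshold values, the radius, and the class name are hypothetical; this section does not specify how the installation implements the detection.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Sequence, Tuple


@dataclass
class DuckDetector:
    """Detects a 'duck, then raise' gesture from tracked head height (hypothetical values)."""
    low: float = 1.20     # head must drop below this height (m) ...
    high: float = 1.40    # ... and then rise above this one to complete the gesture
    radius: float = 0.5   # maximum horizontal distance (m) to a source
    _ducked_at: Optional[Tuple[float, float]] = field(default=None, init=False, repr=False)

    def update(self, x: float, y: float, height: float,
               sources: Sequence[Tuple[float, float]]) -> Optional[int]:
        """Process one tracking frame; return a source index when a gesture completes."""
        if self._ducked_at is None:
            if height < self.low:
                self._ducked_at = (x, y)  # remember where the head went down
            return None
        if height <= self.high:
            return None  # still ducked, or inside the hysteresis band
        px, py = self._ducked_at
        self._ducked_at = None  # gesture complete; re-arm the detector
        best, best_d = None, self.radius
        for i, (sx, sy) in enumerate(sources):
            d = math.hypot(sx - px, sy - py)
            if d <= best_d:
                best, best_d = i, d
        return best


sources = [(0.0, 0.0), (3.0, 0.0)]
det = DuckDetector()
frames = [(3.1, 0.0, 1.70), (3.1, 0.0, 1.10), (3.1, 0.0, 1.30), (3.1, 0.0, 1.60)]
print([det.update(x, y, h, sources) for x, y, h in frames])
```

The hysteresis band is a common design choice for threshold gestures; a single threshold would toggle repeatedly while the head hovers near it.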
The “ducking” interaction results in a gradual cross-
fade from the interactive scene to the corresponding
binaural recording while the rest of the virtual sources
disappear. The selected sound track migrates from a
dynamically rendered monaural point source toward a
static binaural recording that is therefore not reactive to
the listener’s movements.
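The gradual transition can be sketched as a gain crossfade between the rendered scene and the binaural recording. The equal-power (cosine/sine) law and the linear ramp used here are assumptions for illustration; this section does not state the shape of the installation's actual crossfade.

```python
import math
from typing import Sequence, Tuple


def equal_power_gains(x: float) -> Tuple[float, float]:
    """Gains (scene, recording) for crossfade position x in [0, 1].

    x = 0 plays only the rendered scene, x = 1 only the binaural recording.
    The cosine/sine pair keeps g_scene**2 + g_rec**2 == 1.
    """
    x = min(max(x, 0.0), 1.0)
    return math.cos(0.5 * math.pi * x), math.sin(0.5 * math.pi * x)


def ramp(t: float, start: float, duration: float) -> float:
    """Linear crossfade position: 0 before start, 1 after start + duration."""
    if duration <= 0.0:
        return 1.0 if t >= start else 0.0
    return min(max((t - start) / duration, 0.0), 1.0)


def mix_frame(scene_lr: Sequence[float], recording_lr: Sequence[float],
              x: float) -> Tuple[float, ...]:
    """Mix one stereo frame of the rendered scene and of the recording."""
    g_s, g_r = equal_power_gains(x)
    return tuple(g_s * s + g_r * r for s, r in zip(scene_lr, recording_lr))


# Halfway through a four-second transition, both signals play at about -3 dB.
print(equal_power_gains(ramp(2.0, 0.0, 4.0)))
```

An equal-power law keeps the perceived loudness roughly constant when the two signals are uncorrelated, which is plausible for a rendered scene and an unrelated recording.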
The switchover of the heard environment’s spatial
reference to the listener’s head is reflected in the installa-
tion narrative as the sonic hat “being carried.” The point
source in the virtual scene corresponding to the active
recording is moved along with the listener. This change
happens in the background, hence inaudibly, as long as
the hat remains put on. Only when the hat is “taken off”
by performing the inverse ducking gesture does the virtual
scene become audible again, that is, from the new
listening perspective, and the recording is left at
its new location. In this way, the scene may be completely
rearranged (see Figure 1).
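The bookkeeping behind carrying and dropping a sonic hat can be sketched as a small state object, assuming a two-dimensional floor plan: while a hat is worn, its source position tracks the listener, and taking it off leaves the source at the listener's current position. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Pos = Tuple[float, float]


@dataclass
class HatScene:
    """Which recording ('sonic hat') is worn and where each virtual source sits."""
    positions: Dict[str, Pos]
    worn: Optional[str] = None

    def put_on(self, name: str) -> None:
        self.worn = name

    def move_listener(self, listener: Pos) -> None:
        # The worn source follows the listener inaudibly; all others stay put.
        if self.worn is not None:
            self.positions[self.worn] = listener

    def take_off(self, listener: Pos) -> None:
        # The recording is left behind at the listener's current location.
        if self.worn is not None:
            self.positions[self.worn] = listener
            self.worn = None

    def scene_audible(self) -> bool:
        # The rendered scene is only heard while no hat is worn.
        return self.worn is None


s = HatScene({"market": (0.0, 0.0), "river": (3.0, 1.0)})
s.put_on("market")
s.move_listener((2.0, 2.0))
s.take_off((2.5, 2.0))
print(s.positions, s.scene_audible())
```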
4.2 Aesthetic References
Parisflâneur refers to acoustic ecology and anec-
dotal music by the incorporation of mostly unprocessed
field recordings. The term anecdotal music (musique anecdotique)
was coined by French composer Luc Ferrari starting
in the 1960s. With his compositions, he invited
listeners to pick up associations from the recordings
and develop their own stories while listening (Pauli,
1971). This notion contrasts with musique concrète, the
predominant contemporary genre of composing with
recordings of that era, which required sound qualities to
be received as such, without reference and therefore in a
mode of reduced listening (Kane, 2015).
One of Ferrari’s concepts around anecdotal music
is the diapositive sonore. In his egalitarian understanding,
he claimed that audio recordings should be made
and used just like photographs taken on holiday
(or lantern slides, as he puts it). A slide mounted
for projection may be understood as both a medium that
conveys an image and an object that can be regarded
in various ways and from different perspectives. Parisflâneur
reflects such medial properties metaphorically
by staging binaural recordings in different perceptual
contexts, both as immersive images and as objects in a
virtual “magic lantern.”
The integration of objects in a binaurally rendered
scene that are in turn binaural recordings implies
multiple nested levels of abstraction. At each level,
a conceptual inversion of perspective takes place.
A binaural recording captures an essentially exocentric
experience, that is, auditory entities in the outer world.
Listening to the static recording, however, makes the
experience egocentric because the auditory scene is tied
to the listener’s head. This is true for all recordings, even
conventional stereophonic ones. Media-specific cultural
techniques of listening enable cognitive abstraction
of exocentric references (see Section 2.4). In Parisflâneur,
the inner exocentric reference of the field
recordings is complemented by that of the outer virtual
scene in which the recordings are collapsed into auditory
objects. The provided way of changing perspectives by
interaction provokes the prospect of a second-order
introspection (cf., e.g., von Foerster, 2002), which
always includes both directions: While listening to an
egocentric binaural recording, the listeners may imagine
being immersed in the exocentric recorded situation
as well as watching themselves from the perspective
of the likewise exocentric rendered scene.
Exploring the virtual scene in turn allows for the meta-perspective
of an uninvolved spectator to whom the
listener is an exocentric scene object just like the sound
sources. The metaphor of “carrying a sonic hat” links
the egocentric binaural recording with a corresponding
egocentric sound object in the scene. Egocentrism
allows for “changing the world,” that is, reorganizing
the scene, which is not perceivable directly but requires
retrospection or a second-order meta-perspective. As
one of the many classical examples from fine arts for
such a self-referential conceptual structure, the work
Authorization by Michael Snow (1969) may be named.
The playback of binaural recordings in Parisflâneur
is looped, each starting at a random position. Since the
files have different lengths, the resulting auditory scene
composed of the seven situations is constantly changing.
Conceptually, the installation avoids any intentional
montage but rather seeks the aleatoric recombination of
the recordings’ narratives.
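The following sketch illustrates why looping files of different lengths from random start positions yields a constantly changing combination: each playhead advances modulo its own file length, and the complete constellation only recurs after the least common multiple of all lengths. The durations in the usage example are made up for illustration and are not those of the installation's recordings.

```python
import math
import random
from typing import List, Sequence


def loop_offsets(lengths: Sequence[int], rng: random.Random) -> List[int]:
    """Random start position (in samples) for each looped file."""
    return [rng.randrange(n) for n in lengths]


def playhead(t: int, offset: int, length: int) -> int:
    """Read position of a looped file after t samples of playback."""
    return (offset + t) % length


def repeat_period(lengths: Sequence[int]) -> int:
    """Samples until all loops return to the same relative alignment."""
    return math.lcm(*lengths)  # requires Python 3.9+


fs = 48000
# Hypothetical durations in seconds; not the actual recordings.
durations = [181.3, 242.7, 197.1, 305.9, 160.4, 223.8, 268.2]
lengths = [round(d * fs) for d in durations]
offsets = loop_offsets(lengths, random.Random(1))
hours = repeat_period(lengths) / fs / 3600
print(f"combination repeats after about {hours:.3g} hours")
```

Even with only seven loops of a few minutes each, the repeat period in this example far exceeds any realistic visit, so in practice no visitor hears the same constellation twice.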
4.3 Implementation
4.3.1 Aesthetic Lab for Binaural Research.
Parisflâneur has been implemented in close conjunc-
tion with the development of an experimental system
for binaural audio. The design process was iterative and
driven by the requirements of the case study. Conceiving
the binaural rendering strategy was an integral part
of the artistic evolution of the installation. Aesthetic
considerations focused in particular on the close relation
of rendered and recorded sound material and their
transitions. To pursue this reflection practically, an open
framework was required rather than a ready-made scene
rendering system (cf. Section 2.5). Major requirements
for the framework included:
1. the possibility to explore different rendering
techniques to make their implications explicit,
2. access to simulated physical properties of the vir-
tual space, including virtual room acoustics and
dynamic distance behavior,
3. support for speculative rendering approaches
that initially appear less applicable in terms of
communications engineering,
4. the integration of static binaural source material,
5. the exploration of “non-binaural” effects such as
deliberate in-head localization.
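As an illustration of the second requirement, dynamic distance behavior, the sketch below combines inverse-distance attenuation with a low-pass filter whose cutoff falls with distance, in the spirit of the “low pass filter/attenuation” distance model labeled in Figure 4. The 1/r law, the cutoff mapping, the one-pole filter, and all parameter values are assumptions, not the parameters used in Parisflâneur.

```python
import math
from typing import List, Sequence


def distance_gain(d: float, ref: float = 1.0) -> float:
    """Inverse-distance (1/r) attenuation, clamped to unity within ref metres."""
    return ref / max(d, ref)


def distance_cutoff(d: float, f_near: float = 16000.0, f_far: float = 2000.0,
                    d_far: float = 50.0) -> float:
    """Cutoff falling geometrically from f_near at 0 m to f_far at d_far metres."""
    r = min(max(d / d_far, 0.0), 1.0)
    return f_near * (f_far / f_near) ** r


def one_pole_lowpass(x: Sequence[float], cutoff: float, fs: float) -> List[float]:
    """y[n] = (1 - a) x[n] + a y[n - 1], with a = exp(-2 pi fc / fs)."""
    a = math.exp(-2.0 * math.pi * cutoff / fs)
    y, out = 0.0, []
    for sample in x:
        y = (1.0 - a) * sample + a * y
        out.append(y)
    return out


def apply_distance(x: Sequence[float], d: float, fs: float = 48000.0) -> List[float]:
    """Attenuate and low-pass a monaural source signal for distance d (metres)."""
    g = distance_gain(d)
    return [g * v for v in one_pole_lowpass(x, distance_cutoff(d), fs)]


# A constant (DC) input settles at the 1/r gain: 0.5 at 2 m.
print(apply_distance([1.0] * 100, 2.0)[-1])
```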
Rather than a monolithic system, loose building
blocks have been implemented in the SuperCollider language
(The SuperCollider Book, 2011) for experimental
exploration. Additional software packages such as the
Jconvolver convolution engine have been adapted and
integrated using the Jack audio connection kit (Rumori
Figure 2. Hemispherical speaker setup in IEM Cube.
& Hollerweger, 2013). In the following, implementa-
tion details are described up to the state that has been
formally evaluated.
4.3.2 Virtual Ambisonics. Initial versions of
Parisflâneur were conceived using a three-dimensional
virtual Ambisonics approach and free-field impulse
responses (Noisternig, Sontacchi, Musil, & Höldrich,
2003). The implementation was based on the SuperCollider
AmbIEM package1 and the KEMAR set of
free-field head-related impulse responses (HRIR).2
Most intermediate versions were realized in third-order
Ambisonics, some also in fourth order. Room acoustics
was initially simulated using a simple shoebox model for
first- and second-order reflections.
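For reference, encoding a monaural source into third-order Ambisonics yields (3 + 1)² = 16 channels, which matches the 16-channel encoder output indicated in Figure 4. The sketch below computes the encoding gains for one direction using real spherical harmonics in ACN channel order with SN3D normalization. This convention is an assumption; the AmbIEM package may use a different channel ordering or normalization.

```python
import math
from typing import List


def assoc_legendre(n: int, m: int, x: float) -> float:
    """Associated Legendre function P_n^m(x), 0 <= m <= n, without Condon-Shortley phase."""
    pmm = 1.0
    s = math.sqrt(max(0.0, 1.0 - x * x))
    odd = 1.0
    for _ in range(m):              # P_m^m = (2m - 1)!! (1 - x^2)^(m/2)
        pmm *= odd * s
        odd += 2.0
    if n == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm     # P_{m+1}^m
    for k in range(m + 2, n + 1):   # upward recurrence in degree
        pmm, pm1 = pm1, ((2 * k - 1) * x * pm1 - (k + m - 1) * pmm) / (k - m)
    return pm1


def encode_sn3d(azimuth: float, elevation: float, order: int = 3) -> List[float]:
    """Real spherical-harmonic gains in ACN order with SN3D normalization.

    Returns (order + 1)**2 values; angles in radians, azimuth counter-clockwise.
    """
    gains = [0.0] * ((order + 1) ** 2)
    x = math.sin(elevation)
    for n in range(order + 1):
        for m in range(-n, n + 1):
            am = abs(m)
            norm = math.sqrt((1.0 if m == 0 else 2.0)
                             * math.factorial(n - am) / math.factorial(n + am))
            trig = math.cos(am * azimuth) if m >= 0 else math.sin(am * azimuth)
            gains[n * n + n + m] = norm * assoc_legendre(n, am, x) * trig  # ACN index
    return gains


g = encode_sn3d(math.radians(30), math.radians(10), order=3)
print(len(g))  # 16 channels for third order
```

With SN3D, the squared gains within each order sum to one for any direction, which offers a quick sanity check of an implementation.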
4.3.3 Room Impulse Response Measurements.
After some dissatisfaction with room acoustics simulation,
the integration of measured binaural room impulse
responses (BRIR) was sought. The motivation was to
transfer the convincing spatial quality known from bin-
aural recordings to rendering, considering room impulse
responses as a form of recorded acoustics.
1. https://github.com/supercollider-quarks/AmbIEM
2. http://sound.media.mit.edu/resources/KEMAR.html
Figure 3. Impulse response measurements in IEM Cube.
Most system development took place in the Cube
space of the Institute of Electronic Music and Acoustics
(IEM) in Graz, Austria, which is equipped with a
24-channel hemispherical speaker setup (see Figure 2).
Using a dummy head in the sweet spot, the speaker
setup in IEM Cube was measured using swept sines
(Farina, 2000). The idea was to use this speaker system
as a virtual Ambisonics layout for binaural rendering,
including the captured acoustics of the space.
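The swept-sine technique (Farina, 2000) can be sketched as follows: an exponential sweep is played back, and the recording is convolved with a time-reversed, amplitude-compensated copy of the sweep, which compresses the sweep back into an impulse response. The sample rate, frequency range, and the simulated “system” (a pure delay with gain) in the usage example are arbitrary choices for illustration; an actual measurement would use loudspeaker playback and the dummy-head recording.

```python
import numpy as np


def fft_convolve(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Linear convolution via zero-padded FFTs."""
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()
    return np.fft.irfft(np.fft.rfft(a, nfft) * np.fft.rfft(b, nfft), nfft)[:n]


def exp_sweep(f1: float, f2: float, duration: float, fs: float):
    """Exponential sine sweep from f1 to f2 Hz and its inverse filter."""
    t = np.arange(int(round(duration * fs))) / fs
    rate = duration / np.log(f2 / f1)   # time for the frequency to grow by a factor e
    sweep = np.sin(2.0 * np.pi * f1 * rate * (np.exp(t / rate) - 1.0))
    # Time-reversed sweep with an amplitude proportional to instantaneous
    # frequency (+6 dB per octave), compensating for the longer time the
    # sweep spends at low frequencies.
    inverse = sweep[::-1] * np.exp(-t / rate)
    inverse /= np.max(np.abs(fft_convolve(sweep, inverse)))  # unit peak
    return sweep, inverse


def impulse_response(recording: np.ndarray, inverse: np.ndarray) -> np.ndarray:
    """Deconvolve a recorded sweep; index 0 corresponds to zero delay."""
    return fft_convolve(recording, inverse)[len(inverse) - 1:]


fs = 8000
sweep, inverse = exp_sweep(50.0, 3500.0, 1.0, fs)
system = np.zeros(200)
system[100] = 0.5                          # simulated "room": delay and gain
recording = np.convolve(sweep, system)     # stands in for the dummy-head capture
ir = impulse_response(recording, inverse)
print(int(np.argmax(np.abs(ir))), float(np.max(np.abs(ir))))
```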
The impulse response measurements have been
carried out in different versions: with and without
absorbing first-order floor reflections by placing baffles,
and each with the dummy head mounted at two different
heights, at the level of the lower speaker ring of the
hemisphere and slightly raised (Rumori, Hollerweger, &
Cabrera, 2010). The latter was meant as an experimental
compensation for the frequent unintended elevated localization
of auditory events in binaural environments (see
Figure 3).
4.3.4 Virtual Ambisonics Using Room Impulse
Responses. In the virtual Ambisonics rendering
system, the measured room impulse responses (see
Section 4.3.3) replaced the KEMAR free-field HRIRs
Figure 4. Signal flows in initial and revised versions of Parisflâneur. [Diagram: (a) initial version with file players, egocentric/exocentric crossfade, distance model, third-order Ambisonics encoder, rotator, and decoder (CUBE layout), BRIR convolution, and binaural output; (b) revised version with four-channel file players, crossfade, three-way distance panning, circular and stereo panning, BRIR (far field) and HRIR (near field) convolution, and binaural output.]
for convolving the decoded signals of the virtual loudspeaker setup into a reverberant binaural signal. Strictly speaking, this approach is valid only for an immobile listener, as the impulse responses were measured from a single central listening position and orientation. However, the implementation using static BRIRs was combined with a tracking system. The positions and synthesized distances of virtual sound sources were corrected according to the listener’s movements, whereas the reverberation turned and moved along with the listener’s head due to the static convolution (Rumori, 2017a). This implementation preserved the measured overall room acoustics with low technical complexity, although the relatively long convolutions demand some processing power.
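The virtual-loudspeaker principle can be sketched in a few lines of Python: each decoded speaker feed is convolved with the left-ear and right-ear BRIR measured for that loudspeaker, and the results are summed per ear. Because the BRIRs are static, head tracking can only change the speaker feeds (through the source angles), never the impulse responses, which is why the reverberation moves with the head. The array shapes and the function name are assumptions for illustration; a real-time system would use partitioned FFT convolution rather than `np.convolve`.

```python
import numpy as np


def virtual_speakers_to_binaural(speaker_feeds, brirs):
    """Render decoded loudspeaker feeds to two ears via static BRIRs.

    speaker_feeds: array of shape (n_speakers, n_samples)
    brirs: array of shape (n_speakers, 2, n_taps), one left/right
           impulse-response pair per virtual loudspeaker
    Returns an array of shape (2, n_samples + n_taps - 1).
    """
    n_speakers, n_samples = speaker_feeds.shape
    if brirs.shape[0] != n_speakers or brirs.shape[1] != 2:
        raise ValueError("expected one left/right BRIR pair per speaker")
    n_taps = brirs.shape[2]
    out = np.zeros((2, n_samples + n_taps - 1))
    for spk in range(n_speakers):
        for ear in range(2):
            # Each speaker contributes to both ears through its own BRIR pair.
            out[ear] += np.convolve(speaker_feeds[spk], brirs[spk, ear])
    return out
```

With the 24-speaker CUBE layout, this amounts to 48 convolutions per block, which illustrates why long room impulse responses make the approach computationally demanding.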
4.3.5 Resulting Signal Flow. The signal flow of the resulting implementation is shown in Figure 4(a). The binaural recordings are played back from disk and provide the sound source material. The signals are routed to the crossfade block, which forwards them either directly to the binaural two-channel bus or, as a monaural signal, to the rendering stage. This implementation uses only the left channel of the recording as the monaural signal for encoding (see Section 6.3.1 for a discussion). Fine-grained control over the crossfade transition is provided through breakpoint functions.
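A minimal sketch of such a crossfade, assuming a simple complementary linear crossfade (the source does not state the crossfade law) and a breakpoint function given as lists of times and gains, evaluated with `np.interp`. The function names are hypothetical.

```python
import numpy as np


def breakpoint_gain(t, bp_times, bp_values):
    """Evaluate a piecewise-linear breakpoint function at times t (seconds)."""
    return np.clip(np.interp(t, bp_times, bp_values), 0.0, 1.0)


def crossfade(binaural, fs, bp_times, bp_values):
    """Split a binaural recording into a direct part and a part to be rendered.

    binaural: array of shape (2, n_samples)
    Gain 1.0 sends the recording unchanged to the binaural bus; gain 0.0 sends
    only its left channel, as a mono signal, to the rendering stage.
    """
    t = np.arange(binaural.shape[1]) / fs
    g = breakpoint_gain(t, bp_times, bp_values)
    direct = binaural * g                    # broadcast over both channels
    mono_for_rendering = binaural[0] * (1.0 - g)
    return direct, mono_for_rendering
```

An equal-power law (using sine and cosine gains) would be an equally plausible choice; the linear version keeps the sum of the two gains at one.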
In the rendering branch, distance-dependent gain control and low-pass filtering are applied. Attenuation and filtering parameters, along with their effective ranges,
120 PRESENCE: VOLUME 26, NUMBER 2
were determined by informal subjective evaluation. In fact, both the amplitude attenuation and the simulated air absorption had to be much stronger than in reality in order to support navigation and orientation by listening alone.
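A distance model of this kind can be sketched as a gain law plus a distance-dependent low-pass cutoff. In the sketch below, an exponent of 1 on the reference-distance ratio would correspond to the physical 1/r amplitude law, so larger exponents exaggerate attenuation; the cutoff falls logarithmically with distance. All function names and numbers are illustrative assumptions, not the values determined for Parisflâneur.

```python
import math
import numpy as np


def distance_params(d, d_ref=1.0, exponent=2.0, d_max=10.0,
                    fc_near=16000.0, fc_far=1000.0):
    """Gain and low-pass cutoff (Hz) for a source at distance d (meters).

    exponent=1.0 would be the physical 1/r amplitude law; larger values
    exaggerate attenuation. The cutoff falls logarithmically from fc_near at
    d_ref to fc_far at d_max and stays there beyond d_max.
    """
    d = max(d, d_ref)
    gain = (d_ref / d) ** exponent
    frac = min((d - d_ref) / (d_max - d_ref), 1.0)
    fc = fc_near * (fc_far / fc_near) ** frac
    return gain, fc


def one_pole_lowpass(x, fc, fs):
    """First-order IIR low-pass: y[n] = (1 - a) x[n] + a y[n-1]."""
    a = math.exp(-2.0 * math.pi * fc / fs)  # common approximate coefficient mapping
    y = np.empty(len(x))
    state = 0.0
    for i, sample in enumerate(x):
        state = (1.0 - a) * sample + a * state
        y[i] = state
    return y
```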
The resulting source signal is subsequently encoded into the Ambisonics domain. Encoding also involves a distance-dependent, per-source weighting of the Ambisonics orders in order to increase the apparent source width as the listener comes closer to the virtual source. In addition, tracking input is required at the encoding stage, as the relative encoding angles depend not only on the listener’s rotation but also on the listener’s position (translation).
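The two ingredients named above can be sketched separately: the source direction relative to the listener’s tracked position, and a per-order gain applied to a 3rd-order signal (16 channels). The sketch assumes ACN channel ordering (order n occupies channels n² to (n+1)² − 1), a coordinate convention with x forward, y left, and z up, and a hypothetical linear distance-to-weight mapping; none of these details are given in the source.

```python
import math
import numpy as np


def relative_direction(listener_pos, source_pos):
    """Azimuth/elevation (radians) and distance of a source seen from the listener.

    Positions are (x, y, z) in meters with x forward, y left, z up (assumed
    convention); azimuth is counterclockwise from +x. Head rotation is not
    applied here; in the described signal flow it is left to the rotator.
    """
    dx, dy, dz = (s - l for s, l in zip(source_pos, listener_pos))
    azimuth = math.atan2(dy, dx)
    elevation = math.atan2(dz, math.hypot(dx, dy))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    return azimuth, elevation, distance


def order_weights_for_distance(d, near=0.5, far=2.0, max_order=3):
    """Hypothetical mapping: full weights beyond `far`, only order 0 at `near` or closer."""
    x = min(max((d - near) / (far - near), 0.0), 1.0)
    return [1.0] + [x] * max_order


def apply_order_weights(ambi, weights):
    """Scale each order of an ACN-ordered Ambisonics signal of shape (16, n)."""
    out = ambi.copy()
    for order, w in enumerate(weights):
        out[order ** 2:(order + 1) ** 2] *= w  # channels belonging to this order
    return out
```

Lowering the higher orders reduces directional sharpness, which is one way to make a nearby source appear wider.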
The encoders of all sources add their output to an Ambisonics bus. Subsequently, the Ambisonics signal is rotated according to the tracking data. As the listener may walk around freely in the tracking volume, a head rotation is accompanied by a translation in almost all cases. Consequently, the per-source angles have to be updated in every cycle anyway because of the simultaneous translation. The rotator’s advantage of a computational demand that is constant and independent of the number of sources is therefore of limited benefit here.
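Rotation about the vertical axis (yaw), which is what a head turn mostly amounts to, is cheap in the spherical-harmonic domain: for each order n and degree m > 0, the cosine and sine components mix like a 2D rotation by m times the angle, while the m = 0 components stay unchanged. The sketch assumes ACN ordering with real spherical harmonics using cos(mφ) for m > 0 and sin(|m|φ) for m < 0 (as in AmbiX); it is not the API of any particular library, and pitch and roll are omitted.

```python
import math
import numpy as np


def rotate_yaw(ambi, angle, max_order=3):
    """Rotate an ACN-ordered Ambisonics signal about the vertical axis.

    A positive angle rotates the scene counterclockwise; to compensate a head
    yaw of psi, rotate by -psi. The cost depends only on the order, not on the
    number of encoded sources.
    """
    out = ambi.copy()
    for n in range(1, max_order + 1):
        for m in range(1, n + 1):
            c_idx = n * n + n + m   # cos(m * phi) component
            s_idx = n * n + n - m   # sin(m * phi) component
            c, s = ambi[c_idx], ambi[s_idx]
            cos_a, sin_a = math.cos(m * angle), math.sin(m * angle)
            out[c_idx] = c * cos_a - s * sin_a
            out[s_idx] = s * cos_a + c * sin_a
    return out
```

Because the per-source angles must be recomputed every cycle anyway when the listener translates, a rotator like this saves little in the described setup, which is the point made in the paragraph above.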
The Ambisonics decoding stage integrates a global order weighting that allows experimenting with different decoding optimization strategies. The decoded virtual speaker signals form the input to the convolution matrix of room impulse responses, whose binaural output is mixed into the global binaural bus. A headphone compensation based on inverted dummy head measurements is applied before the signal is played back (Schärer & Lindau, 2009).
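The source does not specify how the dummy head headphone measurements were inverted. One common approach, shown here as a swapped-in technique rather than the authors’ method, is a regularized frequency-domain inversion conj(H) / (|H|² + β), in which β limits the gain at spectral notches and a modelling delay keeps the inverse approximately causal. The function name and all parameter values are assumptions.

```python
import numpy as np


def regularized_inverse(h, n_fft=1024, beta=1e-3, delay=None):
    """FIR inverse of a measured headphone impulse response h.

    Computes conj(H) / (|H|^2 + beta) in the frequency domain; beta keeps deep
    notches from being boosted without bound. A modelling delay (default
    n_fft // 2 samples) makes the inverse approximately causal.
    """
    if delay is None:
        delay = n_fft // 2
    H = np.fft.rfft(h, n_fft)
    k = np.arange(len(H))
    shift = np.exp(-2j * np.pi * k * delay / n_fft)  # pure delay of `delay` samples
    inv = np.conj(H) / (np.abs(H) ** 2 + beta) * shift
    return np.fft.irfft(inv, n_fft)
```

Convolving the measured response with this filter should yield approximately an impulse at the modelling delay; larger β values trade accuracy for robustness.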
5 Evaluation
5.1 Method
To prepare the evaluation, the artist was asked to complete a questionnaire. The answers guided the formulation of the research questions to be addressed by the evaluation. In the questionnaire, the artist commented on:
1. his intentions,
2. the imagined visitor experience,
3. the development process and the internal workings of the installation,
4. the context within which the work has been developed,
5. the expectations from the evaluation process, and
6. whether he felt his intentions had been fulfilled.
The analysis of the questionnaire was complemented by consulting other writings of the artist and by experiencing the installation (see Figure 5).
Figure 5. Photo of an evaluation participant experiencing the Parisflâneur installation.
As described in Section 4, interaction in the installa-
tion is based on a metaphor that relates soundscapes to
“hats,” which a user can put on, walk with, and leave at
a specific location. The metaphor serves to communicate
the ducking gesture.
A listener may therefore interact in the following ways:
1. Explore: move among sounds in order to find out what sounds are there and plan how to engage with them.
2. Listen: either to the composed soundscape or to each sound field recording alone.
3. Resynthesize: perform planned actions to rearrange the soundscape. In this sense, successful interaction should be demonstrated by a fruitful exploration of the soundscape, detailed listening to soundscapes of interest, and resynthesis according to the desires of the individual.
4. Contemplate: walk while immersed in a given
soundscape.
Table 1. Flow of the Evaluation of the Parisflâneur Installation during the Klangräume I Exhibition. Durations Are Suggestive
N = 11
Task | Duration
Documented interaction with the installation | 30 min.
Audiovisual cued recall | 30 min.
Follow-up questions | 15 min.
Filling in scales | 10 min.
In discussion with the artist, the following targets were set for the evaluation:
1. the success of the ducking metaphor, both at a conceptual level and at the level of its implementation,
2. the listening experience: how successfully the intended soundscape was delivered using binaural audio, and what listening experience resulted, and
3. the interaction between the two aspects, that is, the ability of the interaction metaphor to support user engagement with the field recordings.
The video-cued recall method, described in Section 3, appeared particularly appropriate for the evaluation because it allows users to comment directly on their experience; in this way, both the experiential and the usability aspects targeted by the evaluation could be investigated. Pilot tests showed, however, that applying the video-cued recall technique was challenging because a normal video recording lacks detailed audio, which may limit the listeners’ ability to recall their experience. This appears to be a general limiting factor when applying the video-cued recall method to sound installations, especially to installations using binaural technology over headphones. To avoid this problem, the audio output of the installation was routed directly to the camera and recorded in sync with the video stream. This resulted in synchronous audiovisual information in the video recording, which was deemed sufficient for performing the recall method when tested in the pilot experiment.
To further assist the evaluators, a number of open questions were prepared. These addressed the experiences, the ways people discovered and interacted with the installation, possible difficulties with the ducking technique, how people thought the installation works, the appropriateness of the hat metaphor, and general comments relating to what visitors liked and did not like in the installation. These questions were used to guide the discussion at the end of the video-cued recall session in case the topics were not raised by the visitors while recalling their experiences. Finally, participants were required to fill in a number of rating scales at the end of the session, which are shown in the Appendix. In the scales, participants assessed crucial aspects of the installation experience that could be presented using one-dimensional semantic-differential or Likert scales. Table 1 shows the flow of the evaluation. The results are illustrated in Figure 11.
5.2 Procedure
The installation was set up and staged in the
rehearsal room of the MUMUTH building at the Uni-
versity of Music and Performing Arts Graz for a period
of one week, during which time it was also open to the
public at given timeslots. Data were acquired in morning
sessions in which participants were invited to assist with
the evaluation of the installation according to the procedure outlined in Table 1. Visitors were provided with information about the installation. In particular, they received a copy of the public text that normally accompanied the installation, and the ducking gesture was explained to them. In addition, they were allowed to ask questions as they went along.
The resulting dataset consisted of 3 hours of video material, 4.5 hours of interview audio, 36,732 transcribed words, and the scale ratings and tracking data of the eleven visitors.
Interviews and video recordings were analyzed using an iterative coding process in which the coding scheme went through several iterations. The scheme that emerged allowed us to understand which major aspects were experienced in the installation.
5.3 Results
This section is divided into three subsections, which present the coding of the text data, the coding of the video data, and the results from the scales completed by the participants.
5.3.1 Coding of Text Data. Data were coded in seven categories: Object, Auditory Experience, Visual Experience, State, Purpose, Interaction, and Concepts. These were defined as follows:
1. Object: excerpts referring to the objects that gave rise to the experiences of participants,
2. Auditory and
3. Visual Experience, respectively: excerpts referring to the sensory experiences reported by participants, that is, to qualities of the auditory and visual stimulation,
4. Purpose: excerpts revealing the purpose associated with (observable) actions,
5. Concepts: excerpts referring to conceptual associations,
6. State: descriptions of emotional states, and
7. Interaction: excerpts in which participants described interaction with the installation.
Common codes within each category can be inspected in Figure 6. Figure 7 presents the frequency with which excerpts were assigned to codes belonging to each category. In addition, Figure 8 depicts the number of excerpts coded within each category for each person. Discussions appear to have been dominated by references to interaction with the installation, to the objects that generated sensory experiences, and to the auditory experience of the visitors. At a second level, participants described the purpose behind the different actions they had performed, their emotional state, and concepts that emerged while interacting with the installation. Finally, participants referred little to visual aspects, occasionally mentioning the absence of any visual stimulation. This picture is consistent across participants.
Figure 6 showcases common codes within each category according to their frequency. Most references to the Object category related to the content of the binaural recordings in the installation. Most references in the Auditory Experience category related to the experience of listening to and interacting with binaural audio, in particular to its binaural quality. Visitors were quite impressed by the sound quality that can be achieved with this type of technology. Visitors commented extensively on the changes in the auditory experience that the installation offers. This included descriptions of how the auditory feedback changed depending on the different states encountered. In particular, some participants interpreted the contrast between “hat on” and “hat off” as a difference between foreground and background. The issue of distinguishing between foreground and background sounds was also raised relatively often. This referred to difficulties in finding out whether sounds belonged to the overall soundscape or to individual binaural recordings. Most actions that participants performed were motivated by a wish to discover how the installation works and to engage with the different sounds that could be experienced in it. At a second level, participants tested hypotheses about the functionality of the installation, for example, what happens when one leaves the tracking area or what the repercussions of carrying sounds around are, and attempted to manipulate the way things appeared.
Concerning the different states experienced, visitors referred most often to statements of appeal, that is, to what they liked and disliked in the installation. The listening experience was very much part of the discussion, strongly implying that the visitors’ attention was directed to hearing. Participants referred to the extent to which they believed they had discovered all the material, and to the different feelings that they experienced while interacting with the installation.
The installation experience gave rise to a variety of concepts, by way of association with the sound material through personal experience or interpretation. References to the city of Paris featured prominently in the participants’ comments. Moreover, participants commonly referred to the experience of listening in on something that they had encountered accidentally. This was often associated with a feeling of listening without having been invited to or having asked for permission.
Figure 6. Common codes within each category: (a) Object, (b) Auditory Experience, (c) Purpose, (d) State, (e) Concepts, (f) Interaction.
Figure 7. Total number of codings assigned to each category (y-axis: number of codings per category, 0 to 600; categories: Object, Auditory, Purpose, State, Concepts, Visual, Interaction).
Figure 8. The frequency with which excerpts were coded in each category for each participant (P1 to P11; categories: Auditory Experience, Concepts, Interaction, Object, Purpose, State, Visual Experience).
Essentially, participants felt that there was no way for
their presence to be registered within the space in which
the recording occurred. This brought up associations
of “intruding” or eavesdropping, which arguably testify
to a high degree of realism in the binaural recordings,
but also delineate the boundary between the installation
space and the recording fields.
Concepts were also raised with respect to the experience of interacting with the installation, such as “hearing into” or “diving in.” Participants often discussed ideas relating to the technical setup of the installation. Certain suggestions were made, for example, to spread the installation over a larger space, or to use light to indicate the locations of the different sounds. In particular, a number of visitors complained that the scene was too dense, in the sense that more space should have been available for moving around. They claimed that a larger environment would have made it easier to locate sounds.
Concerning interaction, much of the discussion was directed to the difficulties the visitors faced. These were mostly related to using the ducking technique. Some participants complained that they could not always differentiate between the “hat on” and “hat off” states, and that they could not always control when ducking would take effect. Most participants also mentioned that it takes time to get to grips with the ducking technique, and that the help of the evaluation crew was important to clarify how this is done. One participant remarked that the ducking technique might be unnecessary, given that one can simply go close to a recording and listen to it quite well. Participants often did not realize what happened to the sound once they took off their hat. Another difficulty was finding out how to intentionally relocate sounds and rearrange the spatial arrangement of the scene. This was not evident to all participants, and it only became clear to some after interacting with the binaural recordings for a while. Finally, there were difficulties isolating sounds of interest when they were too close to other sounds. Moreover, a few participants wondered what happens when sounds end up very close to each other and mentioned that this made it difficult to engage with the sound they wanted to hear.
5.3.2 Coding of Video Data. Video recordings of participants interacting with the installation were also coded and summarized. Figure 9 shows how the movements of participants were distributed. Figure 10 displays the corresponding average duration for which each specific action was performed.
Figure 9. Average frequency with which different movements occurred in the videos. Error bars correspond to standard error of the mean. (y-axis: category instances, 0 to 50; categories: crouching, ducking, putting, standing, stepping, talking, turning, walking)
Figure 10. Average duration participants engaged in actions associated with each movement category. Error bars correspond to standard error of the mean. (y-axis: time in each category; categories: crouching, ducking, instructor, looking, putting, running, standing, stepping, turning, walking)
Most of the time was evidently spent either standing or moving at slow speed, corresponding to listening to a specific binaural recording or to exploring the space in order to locate a new one, respectively.
Ranking third were performing the ducking technique and walking through the room at normal speed.
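The error bars in Figures 9 and 10 show the standard error of the mean. For reference, a minimal helper computing it is given below; the function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the source does not state the estimator.

```python
import math
from statistics import mean, stdev


def mean_and_sem(values):
    """Mean and standard error of the mean (sample SD divided by sqrt(n))."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    return mean(values), stdev(values) / math.sqrt(len(values))
```

Applied per movement category to the eleven participants’ counts or durations, this yields one bar height and one error bar per category.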
5.3.3 Scales. The subfigures in Figure 11 illustrate the results obtained with the aforementioned scales. A χ² test was used to examine whether the distribution of the responses can be modeled by a uniform distribution. A value of p < 0.05 indicates that this hypothesis can be rejected, and thus that the tendency observed in the graph reflects a tendency in the participants’ responses.
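The goodness-of-fit test described above can be sketched without external libraries: with k rating categories and N responses, the expected count per category is N/k, the statistic is the sum of (observed − expected)²/expected, and it is compared against a χ² distribution with k − 1 degrees of freedom. The survival function below uses the series for the regularized lower incomplete gamma function; it stands in for a statistics library and is adequate for the small statistics that arise with eleven participants. The source does not say which software was used, and the function names are hypothetical.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the series P(a, z) = z^a e^(-z) / Gamma(a) * sum z^n / (a (a+1) ... (a+n))
    with a = df / 2 and z = x / 2.
    """
    if x <= 0.0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chisquare_uniform(counts):
    """Chi-square test of rating counts against a uniform distribution."""
    k = len(counts)
    expected = sum(counts) / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, chi2_sf(stat, k - 1)
```

For example, `chisquare_uniform([0, 1, 2, 5, 3])` tests invented counts of eleven responses on a five-point scale; a p-value below 0.05 would mark the corresponding subfigure as significant.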
Overall, the scale results provided the following findings:
1. Responses concerning the usability of the ducking technique were mixed; on average, its usability was rated as medium.
2. Visitors felt immersed when listening to the individual soundscapes, but there was no particular agreement concerning immersion when listening to the virtual scene composed of all sounds. A Mann–Whitney test showed a significant shift in felt immersion when participants listened to the individual soundscapes (Z = 3.371, p < 0.001).
3. Participants occasionally noticed the headphones, but on average did not find them annoying.
4. Visitors could orient themselves and move toward sounds of interest with relative ease.
5. The overall impression of the installation was positive.
6. Participants wondered how the installation works, but were not convinced they had found a plausible answer.
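The Mann–Whitney statistic reported in item 2 can be illustrated with a normal-approximation sketch: pool both samples, assign average ranks to ties, compute U for the first sample, and standardize it with the tie-corrected variance. The source does not state which variant produced Z = 3.371 (exact or asymptotic, with or without continuity correction), and the sample values below are invented, so the sketch is not expected to reproduce that number. Function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_z(x, y):
    """U statistic of x, its tie-corrected normal-approximation Z, and two-sided p."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mu = n1 * n2 / 2.0
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())  # tie correction term
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - ties / (n * (n - 1))))
    z = (u1 - mu) / sigma
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return u1, z, p_two_sided
```

Likert ratings contain many ties, which is why the tie correction matters for data of this kind.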
5.4 Summary of Evaluation Findings
Participants reported mostly auditory experiences, with some general remarks on visual aspects. Dynamic and static spatial auditory aspects permeated most of the participants’ comments. Since one aim of Parisflâneur was to test participants’ interpretation of this difference, this result is hardly surprising. However, this
Figure 11. Results of the scale analysis of Parisflâneur. A ∗ indicates a significant deviation from a uniform distribution at the p < 0.05 level.
difference was seldom cast in terms of scene spatial dynamics; rather, it was described as a difference in what constituted foreground and background as a function of listening location. The boundary between foreground and background was, however, blurry, as participants were unsure whether sounds belonged to a given binaural recording or to the overall soundscape. This may explain why some participants mentioned a lack of consistency between the different scenes.
Spatial and auditory exploration of the different auditory scenes was often described as a fun and exciting activity, and most participants got a sense of being able to enter and leave auditory scenes. The spatial boundaries of auditory scenes were not always easy to locate, which limited the extent to which participants guided their movement by memorizing sound locations. On the other hand, being inside an auditory scene (i.e., wearing a hat) was sometimes perceived as unpleasant. This may be related to the lack of head-movement cues while “wearing a hat.” Participants often referred to a feeling of peeking into an auditory scene, a sense of overhearing or “voyeurism.” For some, this may also have contributed to the sense of unpleasantness. Participants became aware of the possibility of relocating sound hats (thereby constructing narratives with Parisflâneur), though they rarely did so intentionally.
Interaction with Parisflâneur develops through learn-
ing the interaction mechanism, and eventually becoming
able to put on and take off “sound hats” (auditory
scenes) and switch between a dynamic and a static 3D
audio experience. Learning to perform the ducking ges-
ture was, however, not easy and this was arguably the
major obstacle to exploring Parisflâneur.
The metaphors employed to describe the sound
hats are revealing. The everyday nature of the binaural
recordings was communicated well. The topic of Paris
and of strolling was taken up positively, and it informed
the associations people reported to a large extent. All the
associations, concepts and experiences reported by par-
ticipants refer not so much to the headphones and the
physical relationship to Parisflâneur, but to the virtual
objects perceived in the installation (i.e., street scenes,
traffic, music, etc.), which elicit emotions and associations of Paris. This becomes evident in the images reported by participants, which center on such themes. However, the headphone cable hindered this imagined strolling to a certain extent.
6 Artistic Consequences
of the Evaluation
As a reaction to the evaluation process and its find-
ings summarized in the previous section, Parisflâneur
has been largely reworked by the artist. Most signifi-
cantly, the interaction scheme was adapted to a different
conceptual take on the aims of anecdotal exploration
and aesthetic experience (see Section 6.1). As a conse-
quence, the concept of “sound hat” for an enterable
virtual source has been replaced by the less tangible
“sound island.”
Further changes include a fundamental redesign of the binaural rendering (Section 6.2) and numerous refinements to the processing of sound files for both binaural
presentation and their transformation to virtual sources
(see Section 6.3).
6.1 Modifications to the
Interaction Model
The ducking gesture for “putting on” and “tak-
ing off sound hats” has been dropped. The evaluation
revealed that performing the gesture was generally
too difficult and that unsuccessful interaction attempts
lacked a clear indication of the reason for failure (see
Section 5.3.1). It can be assumed that most unsuccessful ducking gestures did not hit the virtual source position precisely enough. Furthermore, the semantics of the gesture become ambiguous when multiple sound sources are very close to each other in the scene (cf. Section 5.3.1). The threshold for ducking, until then a fixed medium value, did not fit all body heights. Remedying this would have required, for example, either an individual calibration step or an adaptive behavior. Finally, the headphone cable, which was described as impeding the imagined strolling, may be even more cumbersome when ducking (see Section 5.4).
As an alternative to the ducking gesture, a less peculiar interaction scheme was sought that would not necessarily rely on the vertical dimension. Consequently, the aim of decoupling interaction from exploration was abandoned. Two concurrent ways of entering sound islands were conceived: a more controllable one and an automatic mechanism that incorporates a rudimentary dynamic system.
6.1.1 Entering Sound Islands by Approaching. A sound island is entered when the listener stays very close to its virtual position (less than 0.15 meters) for more than 3 seconds. Within this radius, the redesigned rendering presents the virtual source as a conventional monophonic or slightly widened stereophonic signal localized in the listener’s head (see Section 6.2.3), which provides a distinct transition to the externalized binaural version of the recording. Entering sound islands by approaching builds on the Movement category of stepping to locate new sound sources, one of the most frequent modes observed in the evaluation (see Section 5.3.2).
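The approach rule above is essentially a dwell timer. A minimal sketch, assuming that the tracking system delivers the listener-to-source distance every frame and that the timer resets whenever the listener leaves the 0.15-meter radius (the source does not say how interruptions are handled); the class name is hypothetical.

```python
class DwellTrigger:
    """Enters a sound island after the listener stays within `radius` meters of
    its position for longer than `dwell` seconds (0.15 m and 3 s in the text)."""

    def __init__(self, radius=0.15, dwell=3.0):
        self.radius = radius
        self.dwell = dwell
        self.elapsed = 0.0

    def update(self, distance, dt):
        """Advance by dt seconds; return True once the island is entered."""
        if distance < self.radius:
            self.elapsed += dt
        else:
            self.elapsed = 0.0  # assumption: leaving the radius resets the timer
        return self.elapsed > self.dwell
```

Called once per tracking frame, the trigger fires only after an uninterrupted stay of more than three seconds inside the radius.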
6.1.2 Entering Sound Islands Due to Being
Passive. Whenever the listener is “calm,” that is, the
speed of linear movement is less than 0.1 meters per
second, his or her avatar in the scene accumulates a cer-
tain “gravitational” force on the virtual sources. When
a certain threshold is reached, the source closest to the
listener is “attracted” and starts to draw nearer. The
distance-based mechanism for entering the sound island
(see Section 6.1.1) takes effect as soon as the source is
close enough.
It is important to note that “gravity” here does not denote a correctly modeled physical interaction of interdependent masses. Rather, only the closest source is affected, moving according to a velocity curve that depends on its current distance.
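The attraction of only the closest source can be sketched as a per-frame update in the horizontal plane. The source does not give the velocity curve; the exponential curve below, its parameters, and the function names are hypothetical placeholders chosen so that nearer sources move faster.

```python
import math


def attraction_speed(distance, v_max=0.2, falloff=1.0):
    """Hypothetical velocity curve (m/s): faster when near, never above v_max."""
    return v_max * math.exp(-distance / falloff)


def attract_closest(listener_xy, sources_xy, dt):
    """Move only the source closest to the listener toward the listener.

    Returns the index of the moved source and its new position; all other
    sources stay where they are, unlike a physical many-body simulation.
    """
    lx, ly = listener_xy
    dists = [math.hypot(sx - lx, sy - ly) for sx, sy in sources_xy]
    idx = min(range(len(dists)), key=dists.__getitem__)
    d = dists[idx]
    if d == 0.0:
        return idx, sources_xy[idx]
    step = min(attraction_speed(d) * dt, d)  # never overshoot the listener
    sx, sy = sources_xy[idx]
    return idx, (sx + (lx - sx) / d * step, sy + (ly - sy) / d * step)
```

Once the attracted source comes within the entering radius, the distance-based rule of Section 6.1.1 takes over.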
Entering sound islands due to being passive builds on standing still, the other most frequent Movement category in the installation (see Section 5.3.2). According to the evaluation, standing still suggests that a specific sound island is being listened to without an intention of spatial navigation or exploration. This attentive mode is accounted for by interpreting the lack of action as a trigger for interaction, causing the closest source to be entered and the scene to be rearranged.
6.1.3 Leaving a Sound Island. A sound island is left whenever the listener’s distance from the source position exceeds 0.15 meters, with the gravity mechanism taken into account at the same time. Hence, as long as the listener is sufficiently active, the sound island remains immobile and is left as soon as the listener moves away. If the accumulated gravity indicates a passive listener, the virtual source continues to be attracted and “sticks” to the listener’s head, causing the virtual scene to be rearranged just as when wearing a “sound hat” in the earlier implementation. Moving faster than 0.3 meters per second reduces the accumulated gravity until it falls below a threshold at which the sound island is detached from the listener.
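The gravity mechanism of Sections 6.1.2 and 6.1.3 can be sketched as an accumulator with hysteresis. The two speed thresholds (0.1 and 0.3 meters per second) come from the text; holding gravity constant between them, the accumulation and decay rates, and the separate attach and detach thresholds are assumptions, and the class name is hypothetical.

```python
class Gravity:
    """Hypothetical gravity accumulator for the passive-listener mechanism.

    Speed below 0.1 m/s accumulates gravity; speed above 0.3 m/s reduces it.
    Between the two, gravity is held (an assumption). Rates and the
    attach/detach thresholds are illustrative values.
    """

    def __init__(self, gain_rate=1.0, loss_rate=2.0,
                 attach_at=3.0, detach_below=1.0, g_max=5.0):
        self.g = 0.0
        self.attracting = False
        self.gain_rate, self.loss_rate = gain_rate, loss_rate
        self.attach_at, self.detach_below, self.g_max = attach_at, detach_below, g_max

    def update(self, speed, dt):
        """Advance by dt seconds given the listener's linear speed (m/s)."""
        if speed < 0.1:
            self.g = min(self.g + self.gain_rate * dt, self.g_max)
        elif speed > 0.3:
            self.g = max(self.g - self.loss_rate * dt, 0.0)
        if not self.attracting and self.g >= self.attach_at:
            self.attracting = True    # closest source starts to be attracted
        elif self.attracting and self.g < self.detach_below:
            self.attracting = False   # sound island detaches from the listener
        return self.attracting
```

Using a lower detach threshold than attach threshold prevents the island from flickering between attached and detached when the listener’s speed hovers around a single value.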
6.2 Binaural System Redesign
Major modifications were applied to the rendering of the virtual sound scene based on trial-and-error experimental sessions and incremental subjective assessment (Rumori, 2017a). The most significant changes include
the reduction from three-dimensional rendering to two
dimensions (see Section 6.2.1), switching from virtual
Ambisonics-based interpolation to a simpler circular
panning (see Section 6.2.2), and a complete redesign
of the distance model (see Section 6.2.3). The resulting
signal flow of the revision is shown in Figure 4(b).
6.2.1 Two-Dimensional Rendering. In earlier versions, the sound scene was rendered using a three-dimensional approach (see Section 4.3.2). Based on reports from the evaluation, the artist’s subjective experience, and theoretical reflection, rendering was reduced to two dimensions.
This reconsideration was triggered by the “ducking” interaction gesture, which has been dropped in the reworked version (see Section 6.1). Without this form of interaction, the vertical dimension turned out to be barely relevant for exploring the installation, as listeners now move solely on a plane. As nonindividualized
impulse responses are used, the perception of source
elevation cannot be expected to be very accurate anyway,
while azimuthal perception should work relatively well
(Wenzel, Arruda, Kistler, & Wightman, 1993). Further-
more, two-dimensional rendering corresponds better
to the geographic map metaphor of the reactive virtual
scene that in turn refers to the notion of strolling (as in
flâneur).
Finally, the decisive contrast in Parisflâneur with
respect to dimensions and their experience is not that
of two- or three-dimensional rendering of the virtual
scene, but the different representations of the binau-
ral recordings. They are collapsed to a monaural point
source, that is, to a zero-dimensional entity in a physi-
cal understanding for both two- and three-dimensional
exocentric scene rendering. Only when presented as
sound hats or islands, without any rendering taking
place, do they unfold their fully three-dimensional
spatiality.
6.2.2 Circular Panning. With three-dimensional rendering dismissed, the use of Ambisonics was dropped as well. The localization of virtual sources turned out to be a recurring issue in the evaluation, as it generally is in virtual and augmented environments. A simpler implementation framework was sought that allows experimentation with different approaches to distance, room acoustics, and angular resolution, also with respect to computational performance.
The reworked implementation uses two concentric virtual speaker rings representing two levels of source distance and of apparent reverberation. Circular panning between two adjacent virtual speakers results in a crossfade between their respective impulse responses, which serves as interpolation. For the far field, a ring of 12 speakers is formed by a subset of the aforementioned BRIRs of the IEM CUBE (cf. Section 4.3.3). Restrictions similar to those described for Ambisonics rendering with BRIRs in Section 4.3.4 apply: the virtual room acoustics remain attached to the listener’s head while the source positions in virtual space are updated accordingly. The near field is represented by a ring of 36 speakers that correspond to free-field impulse responses taken from the SoundScape Renderer software (Geier & Spors, 2012). Hence, the azimuthal resolution is 30 degrees in the far field and 10 degrees in the near field.
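The following Python sketch illustrates circular panning on such a ring: for a given source azimuth, it selects the two adjacent virtual speakers and linearly crossfades their impulse responses. It assumes that the speakers are evenly spaced starting at 0 degrees and that each speaker's left/right BRIR pair is stored in a NumPy array of shape (n_speakers, n_taps, 2). The function names `ring_pair` and `interpolated_brir` are hypothetical and not taken from the installation's code.

```python
import numpy as np

def ring_pair(azimuth_deg, n_speakers):
    """Return (i, j, w): the two adjacent ring speakers around azimuth_deg and
    the linear crossfade weight w toward speaker j (0 = fully i, 1 = fully j).
    Assumes speaker k sits at k * 360 / n_speakers degrees (hypothetical layout)."""
    spacing = 360.0 / n_speakers
    pos = (azimuth_deg % 360.0) / spacing       # fractional speaker index
    i = int(np.floor(pos)) % n_speakers         # modulo guards the pos == n edge case
    j = (i + 1) % n_speakers                    # wrap around the circle
    w = float(pos - np.floor(pos))
    return i, j, w

def interpolated_brir(brirs, azimuth_deg):
    """Linearly crossfade the impulse-response pairs of the two adjacent speakers.
    brirs: array of shape (n_speakers, n_taps, 2), one left/right pair per speaker."""
    i, j, w = ring_pair(azimuth_deg, brirs.shape[0])
    return (1.0 - w) * brirs[i] + w * brirs[j]

FAR_RING, NEAR_RING = 12, 36   # 30-degree and 10-degree spacing, as in the text
```

For example, `ring_pair(45.0, FAR_RING)` returns `(1, 2, 0.5)` and `ring_pair(355.0, NEAR_RING)` returns `(35, 0, 0.5)`. Because convolution is linear, convolving the source with the crossfaded impulse response gives, for a fixed source position, the same result as crossfading the outputs of two separate convolutions. Either form can therefore be chosen, depending on which is cheaper in a given signal flow.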
The linear interpolation of impulse responses and the
azimuthal resolution have not been formally evaluated.
Although much more advanced interpolation methods exist, the relatively resource-efficient approach chosen here was informally assessed as a significant improvement, probably owing more to the distance-dependent amount of reverberation than to the linear interpolation.
6.2.3 Distance Levels. In addition to the two
distance levels represented by virtual speaker rings, two
kinds of conventional stereophonic techniques have been integrated to convey the notion of a very close source and of a source inside the listener’s head. All levels overlap for smooth, gradual transitions (see Figure 12).
Far-field rendering by the reverberant 12-channel
speaker ring is fully active for source distances of more
than 1.5 meters. As the source distance increases further, the signal is attenuated and low-pass filtered. Below 1.5 meters, the 36-channel speaker ring is used for rendering an anechoic monaural source. If the source is approached closer than 0.5 meters, it splits into the two stereo channels of the processed recordings (see Section 6.3.1), rendered as two sources whose opening angle (i.e., stereo base) gradually increases as the listener advances further.
At an even closer distance (less than 0.2 meters), the
virtual source starts to enter the listener’s head. This is
conveyed by exploiting in-head localization of coinci-
dent signals. The two rendered stereo channels converge
to a mono version of the processed recordings, which is played directly to both headphone channels without any impulse response convolution. Within a very small area around the center (less than 0.1 meters), the source diverges into its two stereo channels again, this time also played directly to the headphone channels, just like the monaural version before. These in-head stereo signals serve as a bridge between conventional stereophonic headphone playback and a static, egocentric binaural recording.
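The distance levels of Figure 12 can be expressed as a set of overlapping crossfade gains. The sketch below uses the zone boundaries given in the text (1.5, 0.5, 0.2, and 0.1 meters). The following are assumptions for illustration only: the width of each crossfade zone (`FADE`), the linear fade law, the linear widening of the stereo base between 0.5 and 0.2 meters, and its maximum angle (`max_deg`). The sketch does not model the extra attenuation and low-pass filtering beyond 1.5 meters.

```python
# Zone edges in meters, as given in Section 6.2.3 / Figure 12.
EDGES = (0.1, 0.2, 0.5, 1.5)
STAGES = ("in_head_stereo", "in_head_mono", "rendered_stereo",
          "near_anechoic", "far_reverberant")
FADE = 0.025  # hypothetical half-width (m) of each crossfade zone

def ramp(d, edge, fade=FADE):
    """0 below the crossfade zone around edge, 1 above it, linear inside."""
    lo, hi = edge - fade, edge + fade
    if d <= lo:
        return 0.0
    if d >= hi:
        return 1.0
    return (d - lo) / (hi - lo)

def stage_gains(d):
    """Linear crossfade gain for each rendering stage at source distance d (m).
    Gains sum to 1 as long as neighboring crossfade zones do not overlap."""
    r = [ramp(d, e) for e in EDGES]          # r[k]: how far d lies above edge k
    gains = {STAGES[0]: 1.0 - r[0]}
    for k in range(1, len(EDGES)):
        gains[STAGES[k]] = r[k - 1] * (1.0 - r[k])
    gains[STAGES[-1]] = r[-1]
    return gains

def stereo_base_deg(d, max_deg=60.0):
    """Hypothetical opening angle of the rendered stereo pair: 0 degrees at 0.5 m,
    widening linearly to max_deg at 0.2 m as the listener approaches."""
    t = (0.5 - d) / (0.5 - 0.2)
    return max_deg * min(max(t, 0.0), 1.0)
```

Using a single distance-driven gain table keeps every stage running in parallel and lets the overlaps produce the gradual transitions described above. Equal-power fades would be an equally plausible choice for the crossfades.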
6.2.4 Audible Tracking Volume Boundary.
Audible feedback was used to indicate the active track-
ing area boundary. Outside the area, the playback fades
130 PRESENCE: VOLUME 26, NUMBER 2
to a generative, modulated noise texture intended to have a subtly dynamic character rather than that of purely static, “technical” noise.
[Figure 12: processing zones by source distance, from outside in: far field, reverberant (beyond 1.5 m); near field, anechoic (0.5 m to 1.5 m); rendered stereo source, L/R (0.2 m to 0.5 m); in-head monaural (0.1 m to 0.2 m); in-head stereophonic (below 0.1 m).]
Figure 12. Sound processing as a function of distance level in the revised version of Parisflâneur.
6.3 Processing of Binaural Recordings
A central quality of Parisflâneur is that the same binaural recordings are used in two different ways: as they are, that is, as static binaural recordings perceived from an egocentric perspective, and as sources for binaural rendering in an exocentric virtual auditory scene. This poses the challenge of processing binaural source material so that it is also suitable for rendering.
In earlier implementations of the installation, the recordings were used nearly unprocessed for binaural presentation, whereas for the monaural virtual sources only the left channel of each recording was used, with minimal equalization (see Section 4.3.5). Picking only one channel results in a spectral imbalance of contralateral sounds, as higher frequencies are increasingly attenuated by the listener’s head (cf. Rumori, 2017a).
Most evaluation participants showed a high willingness to engage with the recordings, both in terms of interaction and of concentrated listening, and their feedback frequently mentioned the changes of auditory experience between the egocentric and the exocentric modes (see Section 5.3.1). To reflect the apparent importance of the sonic quality and of the techniques employed in the revised rendering, a more thorough procedure of sound file processing was sought.
6.3.1 Binaural Recordings as Virtual Sources for Rendering. The reworked rendering approach presents the underlying recordings as a stereophonic pair of virtual sources and as conventional stereophonic signals on headphones, in both cases with gradual transitions to, or from, a monaural presentation (see Section 6.2.3). Mono compatibility is therefore required, which binaural material usually lacks because of phase problems, especially at lower frequencies. A so-called Blumlein shuffler was applied to the binaural recordings; it turns such phase differences into level differences at low frequencies. The recordings were additionally equalized, taking into account the coloration introduced by binaural rendering (i.e., by the impulse responses) and in comparison to the original binaural versions, to support smooth transitions.
6.3.2 Soundfiles for Binaural Playback. Processing for binaural presentation mainly involved common sound engineering tasks. Some subjective per-file equalization and a spectral balancing among the different sound files have been performed using the same headphones as in the installation and with the
headphone correction filter in effect. The dynamic range of some recordings has been slightly reduced by compression so as to better match them to each other when combined in the virtual auditory scene. This compensation addresses the dominance of certain elements in the recordings, which may have partly caused their reported confusion with scene objects during the evaluation (see Section 5.3.1).
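The article does not specify the compressor or settings used to reduce the recordings' dynamic range. As a generic illustration only, the following NumPy sketch implements a hard-knee downward compressor with a simple attack/release envelope follower. The threshold, ratio, and time constants are hypothetical values, not those of the installation.

```python
import numpy as np

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=2.0):
    """Static hard-knee curve: above the threshold, the output level rises
    only 1/ratio dB per dB of input; below it, the gain is 0 dB."""
    over = np.maximum(np.asarray(level_db, dtype=float) - threshold_db, 0.0)
    return -over * (1.0 - 1.0 / ratio)

def compress(x, fs, threshold_db=-20.0, ratio=2.0, attack_s=0.01, release_s=0.2):
    """Downward compression of a mono float signal x sampled at fs Hz."""
    a_att = np.exp(-1.0 / (attack_s * fs))   # smoothing while the level rises
    a_rel = np.exp(-1.0 / (release_s * fs))  # smoothing while the level falls
    env = 0.0
    y = np.empty(len(x), dtype=float)
    for n, v in enumerate(np.abs(x)):
        coeff = a_att if v > env else a_rel
        env = coeff * env + (1.0 - coeff) * v          # envelope follower
        level_db = 20.0 * np.log10(max(env, 1e-9))     # avoid log(0)
        gain_db = compressor_gain_db(level_db, threshold_db, ratio)
        y[n] = x[n] * 10.0 ** (gain_db / 20.0)
    return y
```

Signals that stay below the threshold pass unchanged, while louder passages are pulled down. A mild ratio such as the 2:1 assumed here corresponds to the "slight" reduction mentioned in the text.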
7
Discussion
The case study Parisflâneur has been presented as an artistic approach to researching binaural technology that is restricted neither to the engineering of rendering means nor to mere “content” production. Borders
between the two, as established by common scene
authoring workflows, are constantly crossed. The com-
bination of a rendered binaural scene and static binaural
recordings indicates a meta-perspective on both kinds of
media rather than conveying a particular spatial auditory
image. The transition between the two involves changes
of reference and perspective: while the rendered exocen-
tric scene shall appear tied to the listener’s surrounding
and navigable, the nonreactive binaural recordings are presented egocentrically, fixed to the listener’s head, and are carried along.
Similarly, the rendering of the virtual scene does not
coherently model the physics of sound radiation as
usually suggested by basic principles of communica-
tions engineering. As described in Section 6.2.3, virtual
sources in the very near field, and those coincident with
the listener, are displayed using conventional stereo-
phonic techniques on headphones. The decomposition
of the monaural signal into two channels at very close
distances in fact makes use of a binaurally displayed
virtual loudspeaker pair with a dynamically increasing
stereo width.
The previous virtual Ambisonics implementation
sought a similar effect of apparent source widening
in a coherent way by gradual attenuation of higher-
order components with decreasing source distance
(see Section 4.3.5). Finally, only the omnidirectional
component (zeroth order) remained, resulting in the
same signal on all virtual loudspeakers after decoding
and indicating that the source has been “entered” in
the virtual scene. In the reworked implementation, the
climax of reaching the source is indicated by in-head
localization through coincident ear signals, that is, using
conventional stereophony directly on headphones rather
than on rendered virtual speakers.
Obviously, the psychoacoustic effect of an omnidi-
rectional signal on a multichannel speaker setup that
is rendered for headphone listening is fundamentally
different from an in-head phantom source. The latter
was chosen because of its metaphorical correspondence
to the constellation in the scene and its strong bodily
experience that does not occur in nature, except for a few bodily noises such as chewing (Rumori, 2017b).
This way, the two notions of an exocentric virtual scene
object “inside” the listener’s head and the listener
“inside” an egocentric sound island in terms of a binau-
ral recording are embodied by two extremes of auditory
phenomena.
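The contrast with the earlier Ambisonics implementation can be made concrete with a sketch of order fading in horizontal-only (2D) Ambisonics, using a basic sampling decoder on a regular loudspeaker ring. This is a simplification. The earlier implementation's dimensionality, order, normalization, and exact distance-to-weight mapping are described in Section 4.3.5 and are not reproduced here. The function `order_weights`, with its `d_full` and `d_zero` parameters, is a hypothetical mapping that fades out the highest order first. Once only the zeroth-order weight remains, every loudspeaker receives the same gain, which corresponds to the “entered” source state described above.

```python
import numpy as np

def order_weights(distance, order, d_full=1.5, d_zero=0.2):
    """Hypothetical per-order weights: all orders at 1 beyond d_full; as the
    source approaches, the highest order fades out first, and at d_zero or
    closer only the zeroth (omnidirectional) order remains."""
    t = np.clip((distance - d_zero) / (d_full - d_zero), 0.0, 1.0)
    m = np.arange(order + 1)
    w = np.clip(order * t - (m - 1), 0.0, 1.0)
    w[0] = 1.0  # the omnidirectional component is always kept
    return w

def encode_2d(theta, order, weights):
    """Weighted circular-harmonic encoding of a source at azimuth theta (rad),
    returned as [W, C1, S1, C2, S2, ...]."""
    comps = [weights[0]]
    for m in range(1, order + 1):
        comps += [weights[m] * np.cos(m * theta), weights[m] * np.sin(m * theta)]
    return np.array(comps)

def decode_2d(b, order, n_speakers):
    """Basic sampling decoder for a regular ring (speaker k at 2*pi*k/n_speakers)."""
    phi = 2 * np.pi * np.arange(n_speakers) / n_speakers
    g = np.full(n_speakers, b[0], dtype=float)
    for m in range(1, order + 1):
        g += 2 * (b[2 * m - 1] * np.cos(m * phi) + b[2 * m] * np.sin(m * phi))
    return g / n_speakers
```

With a third-order source on an eight-speaker ring, the far-field decode peaks at the speaker facing the source. At 0.1 m, all eight gains equal 1/8, so the source is no longer localized in any direction.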
The mixture of stereophonic and binaural techniques, in addition to the combination of egocentric and exocentric binaural presentation, makes the reworked implementation even less compatible with common rendering methods for universal scene descriptions than the first. Rather than being considered side effects, the idiosyncrasies of medial representation, be it those of a recording, of binaural rendering, or of stereophony, are treated as intrinsic, inseparable qualities. Beyond “content” production, the installation could not have been realized without access to the signal flow and the rendering algorithms.
The notions of “inside” and “outside” are closely
related to the interaction model of the installation,
which underwent a major revision based on evaluation
results (see Section 6.1). Most significantly, the former
“ducking” gesture turned out to be hard to perform
successfully. A very insightful finding for the artist was
the participants’ frequent perceptual distinction of vir-
tual scene and static binaural recordings (i.e., “wearing a
sound hat”) as “background” and “foreground” rather
than “outside” and “inside.” Equally enlightening was
the observation that some participants had difficulties
in delineating point sources in the virtual scene by their
spatial arrangement but instead mixed elements of them
according to narrative correspondences in the recordings (see Section 5.3). Such findings were not expected by the artist, who in turn realized that his own attitude toward the installation was determined much more by its technical realization than by auditory experience, despite the focused emphasis on acoustic ecology and anecdotal music mentioned in Section 4.2.
One way to address the limited usability of the interaction technique could have been to introduce an even simpler gesture that is easier to perform, complemented by explicit auditory feedback on the success of the interaction (in addition to the change between the two kinds of binaural display). However, such a solution would not have taken into account the foreground/background notion and the narrative-based mixing of anecdotal elements from different recorded tracks. The artist opted for the opposite way. In his view, the described usability issues should not be interpreted as a failure to convey the installation’s functionality clearly. Instead, participants’ comments of this kind may indicate an auditory awareness in terms of ecological rather than analytic listening, despite their documented wish to discover the working principles of the installation (see Section 5.3).
With the newly conceived interaction scheme, enter-
ing and leaving sound islands may appear even more
difficult to perform or prevent deliberately. An explicit
gesture like ducking is no longer needed; simply coming close to the virtual source or remaining passive for some time suffices, which makes unintentional transitions likely. On the other hand, the new mechanism is not conceived to be intuitively mastered but rather to “happen” even if the listener is not consciously aware of it. Unlike the ducking gesture, which is performed vertically, that is, orthogonally to the two-dimensional spatial orientation, entering a sound island by approaching it or by remaining passive is much more entangled with exploration. Self-acting changes to the auditory experience without prior interaction suggest a certain “life of its own.”
In a similar vein, the playful reflection of egocen-
tric and exocentric spatiality by perspective changes
is affected by the revised interaction mechanism.
Evaluation results, among them those with respect to
deliberate changes of perspective by ducking, notions
of meta-perspective and second-order introspection as
introduced in Section 4.2, and the mostly unexplored
feature of interactive scene reorganization, suggest that a
cognitive map of Parisflâneur’s technical functionality is
rarely developed by lay listeners even if supported by an
introductory explanation (see Sections 5.3.1 and 5.3.2).
Only expert listeners of binaural audio may be able to
grasp the described functionality by immediate experi-
ence. However, the reported associations of intruding
or eavesdropping on the recorded situations illustrate
the successful abstraction from the egocentric recordings
towards a metaperspective of an exocentric scene that
includes the listeners themselves. The revised interaction scheme is intended to direct the listener’s attention further away from a technical engagement with the installation, in favor of exploration through ecological listening and metaphorical attributions of meaning to perceived sonic changes.
A conscious choice was to provide visitors with information about the installation prior to the evaluation and to give them the opportunity to ask questions as they went along. In this sense, the evaluation does not explicitly test how visitors would spontaneously attribute meaning to the installation. This highly interesting question was deliberately placed outside the scope of the evaluation. Instead, we wanted to approximate typical visiting conditions. We assumed that a visitor would typically read the descriptive text and glimpse the primary modes of interaction by observing others. Furthermore, questions were allowed in order to observe which points, if any, needed clarification, and to help visitors explore the installation without getting stuck. We felt that both the behavioral patterns and the issues emerging under these conditions would be more relevant for the subsequent development of the installation.
These developments prompt us to rethink the definition of “binaural.” For example, the use of in-head localization
through stereophony on headphones is not usually con-
sidered “binaural,” as no head-related impulse responses
are involved. Reflecting the discussion in the previ-
ous paragraphs, we propose to extend the definition of
“binaural” to any intended use of ear signals irrespective
of their properties, origin, or means of projection (e.g.,
headphones or transaural). The term “binaural” has also been used with different meanings in the past. For some time,
it merely meant the spatial augmentation of audio signal
transmission by adding a second channel (Alexander,
2000; Wade & Deutsch, 2008; Paul, 2009).
Realism is often a criterion for the immersive potential
of virtual environments. From a perspective of aesthetic
experience, “realism” does not address reality in terms
of the real world but the semblance of reality in a specific
medial context. In fact, the notion of “realism” in bin-
aural engineering usually translates to the reproduction
fidelity of ear signals, that is, minimizing their devia-
tion from those in an equivalent real-world situation
(Sunder, He, Tan, & Gan, 2015). For the case study
“Parisflâneur,” there is obviously no such corresponding
real-world situation. At best, the virtual scene may be
regarded as a collection of omnidirectional loudspeak-
ers playing back monaural field recordings. However,
experiencing the virtual scene in Parisflâneur does not imply imagining seven loudspeakers in space but rather the successful development of a mental map comprising abstract sound objects. Credibility, or, inversely
put, suspension of disbelief, is achieved when the listener
engages with the virtual scene such that interaction with
its objects becomes possible. This notion of immersion is
based on culturally established acousmatic experience of
sounds disembodied from their origins. Nevertheless,
such abstract sound objects may gain physical pres-
ence without a correspondence to physical reality when
supported by a narrative.
8
Conclusion
In this article, we presented an integral take on
binaural audio technology from an artistic research per-
spective. The project has been carried out in the area of
interactive, sound-based installation art. A case study
has been introduced whose artistic aims and imple-
mentation details have been thoroughly described. In
particular, interactive elements of the installation have
been analyzed for subsequent evaluation.
The complexity of developing formal evaluation
methods for aspects of artistic works has been demon-
strated. Based on an extensive literature review, appro-
priate methods from related areas were adopted and
refined for the intended evaluation task. It turned out
that such efforts frame an intensive, fruitful process for
both the artists and the researchers and yield valuable
material for further theoretical reflections and artistic
practice.
Finally, a substantially different conceptual take on
the case study and its realization is documented as the
artist’s reaction to the evaluation and the reflection process influenced by it. Changes to the implementation
and the interaction model are described in detail, and
major aspects are discussed.
The project exemplifies that the separation of techni-
cal engineering and so-called “content” production as
currently widespread may be inappropriate. Depend-
ing on the context, this finding may apply to areas
other than binaural audio as well, whenever media are not regarded as a mere container for conveying an independent subject matter but as part of an integral aesthetic experience. Consequently, close connections between scholarly and artistic research as well as engineering offer a promising lead for further advances in transdisciplinary collaboration.
Acknowledgments
We would like to thank the members of the Klangräume
project team, David Pirrò, Stefan Reichmann, Marian Weger;
and the institutional partners, the Institute of Electronic
Music and Acoustics Graz, University of Applied Sciences
FH Joanneum Graz and ESC Media Art Lab Graz, Austria.
Klangräume has been funded as part of the programme
Exciting Science and Social Innovations of Zukunftsfonds
Steiermark (Funds for the future development of the region
of Styria, Austria). The case study described in Section 4 was
initially conceived during two short-term scientific missions
by Martin Rumori, funded by the Sonic Interaction Design
European COST action (COST IC0601, Rocchesso, 2011).
The editors and reviewers of Presence: Teleoperators and Vir-
tual Environments provided valuable and much appreciated
suggestions for the improvement of earlier revisions of this
article.
References
Alexander, R. (2000). The inventor of Stereo: The life and works of Alan Dower Blumlein. Waltham, MA: Focal Press.
Alon, D. L., Sheaffer, J., & Rafaely, B. (2015). Robust plane-wave decomposition of spherical microphone array recordings for binaural sound reproduction. The Journal of the Acoustical Society of America, 138(3).
Bilda, Z., Costello, B., & Amitani, S. (2006). Collaborative analysis framework for evaluating interactive art experience. CoDesign, 2(4), 225–238.
Bilda, Z., Edmonds, E., & Candy, L. (2008). Designing for creative engagement. Design Studies, 29(6), 525–540.
Bimber, O., & Raskar, R. (2005). Spatial augmented reality. Natick, MA: A K Peters.
Bishop, C. (2005). Installation art. London: Tate Publishing.
Bleidt, R., Borsum, A., Fuchs, H., & Weiss, S. M. (2014). Object-based audio: Opportunities for improved listening experience and increased listener involvement. Proceedings of SMPTE Annual Technical Conference & Exhibition, 1–20.
Boehner, K., Sengers, P., & Warner, S. (2008). Interfaces with the ineffable: Meeting aesthetic experience on its own terms. ACM Transactions on Computer–Human Interaction, 15(3), 12:1–12:29.
Borgdorff, H. (2006). The debate on research in the arts (Sensuous Knowledge No. 2). Bergen: Bergen Academy of Art and Design.
Bregman, A. S. (1990). Auditory scene analysis. The perceptual organization of sound. Cambridge, MA/London: MIT Press.
Bronkhorst, A. W., Veltman, J. A., & van Breda, L. (1996). Application of a three-dimensional auditory display in a flight task. Human Factors, 38(1), 23–33.
Candy, L., Amitani, S., & Bilda, Z. (2006). Practice-led strategies for interactive art research. CoDesign, 2(4), 209–223.
Candy, L., Edmonds, E., & Ascott, R. (2011). Interacting: Art, research and the creative practitioner. Oxfordshire: Libri Pub.
Candy, L., & Ferguson, S. (2014). Interactive experience in the digital age: Evaluating new art practice. Berlin/Heidelberg: Springer.
Coleridge, S. T. (1898). Biographia literaria or biographical sketches of my literary life and opinions and two lay sermons. London: George Bell and Sons.
Eckel, G. (2001). The vision of the LISTEN project. Proceedings of the 7th International Conference on Virtual Systems and Multimedia, 393–396.
Edmonds, E. (2010). The art of interaction. In Create 10 (n.p.) Retrieved from http://www.bcs.org/upload/pdf/ewic_create10_keynote3.pdf
Edmonds, E., Bilda, Z., & Muller, L. (2009). Artist, evaluator
and curator: Three viewpoints on interactive art, evaluation
and audience experience. Digital Creativity, 20(3), 141–
151.
Ettlinger, O. (2008). The architecture of virtual space.
Ljubljana: University of Ljubljana.
Farina, A. (2000). Simultaneous measurement of impulse
response and distortion with a swept-sine technique. Pro-
ceedings of Audio Engineering Society Convention, 108,
1–23.
Franinović, K., & Salter, C. (2013). The experience of sonic interaction. In K. Franinović & S. Serafin (Eds.), Sonic interaction design (pp. 39–75). Cambridge, MA/London: MIT Press.
Frayling, C. (1993). Research in art and design. Royal College
of Art Research Papers, 1(1), 1–5.
Gaver, B. (2002). Designing for homo ludens. I3 Magazine,
2–6.
Geier, M., & Spors, S. (2012). Spatial audio reproduc-
tion with the SoundScape Renderer. Proceedings of 27th
Tonmeistertagung—VDT International Convention,
646–655.
Gilkey, R., & Anderson, T. R. (Eds.). (2015). Binaural and
spatial hearing in real and virtual environments. Hove:
Psychology Press.
Glaser, B. G., & Strauss, A. L. (1967). The discovery of
grounded theory: Strategies for qualitative research.
Hawthorne, NY: Aldine de Gruyter.
Grau, O. (2003). Virtual art. From illusion to immersion.
Cambridge, MA/London: MIT Press.
Höök, K., Sengers, P., & Andersson, G. (2003). Sense and
sensibility: Evaluation and interactive art. Proceedings of
the SIGCHI Conference on Human Factors in Computing
Systems, 5, 241–248.
Jensen, J. F. (1998). Interactivity. Tracking a new concept
in media and communication studies. Nordicom Review,
19(1), 185–204.
Kane, B. (2015). Sound unseen. Acousmatic sound in theory
and practice. Oxford: Oxford University Press.
Krebs, S. (2016). The failure of binaural stereo: German sound engineers and the introduction of dummy head microphones. Kunstkopf Stereophony. Failure and Success of Dummy Head Recording: An Innovation History of 3D Listening. Retrieved from https://binauralrecording.wordpress.com/2016/08/03/the-failure-of-binaural-stereo-german-sound-engineers-and-the-introduction-of-dummy-head-microphones/
Lentz, T., Assenmacher, I., Vorländer, M., & Kuhlen, T. (2006). Precise near-to-head acoustics with binaural synthesis. Journal of Virtual Reality and Broadcasting, 3(2).
Lindau, A., Klemmer, M., & Weinzierl, S. (2008). Zur binauralen Simulation verteilter Schallquellen [On binaural simulation of distributed sound sources]. Proceedings of the 34th DAGA, 897–898.
Liu, Q., Wang, W., Jackson, J. B., & Cox, T. J. (2015). A source separation evaluation method in object-based spatial audio. Proceedings of European Signal Processing Conference, 1088–1092.
Marentakis, G., Pirrò, D., & Kapeller, R. (2014). Zwischenräume—A case study in the evaluation of interactive sound installations. Proceedings of the Joint International Computer Music/Sound and Music Computing Conferences, 277–284.
Marentakis, G., Pirrò, D., & Weger, M. (2017). Creative evaluation. Proceedings of the 2017 Conference on Designing Interactive Systems, 853–864.
Morrison, A., Mitchell, P., & Brereton, M. (2007). The lens of ludic engagement: Evaluating participation in interactive art installations. MultiMedia 2007, 509–512.
Muller, L., & Edmonds, E. (2006). Living laboratories: Making and curating interactive art. SIGGRAPH 2006 Electronic Art and Animation, 147–150. Retrieved from http://siggraph.org/artdesign/gallery/S06/paper2.pdf
Muller, L., Edmonds, E., & Connell, M. (2006). Living laboratories for interactive art. CoDesign, 2(4), 195–207.
Niklas, S. (2014). Die Kopfhörerin. Mobiles Musikhören als ästhetische Erfahrung [The headphone listener. Mobile music listening as aesthetic experience]. Paderborn: Fink.
Noisternig, M., Sontacchi, A., Musil, T., & Höldrich, R. (2003). A 3D Ambisonic based binaural sound reproduction system. Proceedings of the 24th Audio Engineering Society Conference, 1–5.
Novo, P. (2005). Auditory virtual environments. In J. Blauert (Ed.), Communication acoustics (pp. 277–297). Berlin/Heidelberg: Springer.
Paine, G. (2002). Interactivity, where to from here? Organised Sound, 7(3), 295–304.
Paul, S. (2009). Binaural recording technology: A historical review and possible future developments. Acta Acustica united with Acustica, 95, 767–788.
Pauli, H. (1971). Für wen komponieren Sie eigentlich? [For whom do you actually compose?]. Frankfurt: Fischer.
Reck, H. U. (2007). The myth of media art. Weimar: VDG.
Reiss, J. H. (1999). From margin to center. The spaces of installation art. Cambridge, MA/London: MIT Press.
Rocchesso, D. (2011). Explorations in sonic interaction design. Berlin: Logos.
Rumori, M. (2017a). Binaural floss—Exploring media, immersion, technology. Proceedings of the International Linux Audio Conference, 13–20.
Rumori, M. (2017b). Space and body in sound art: Artistic explorations in binaural audio augmented environments. In C. Wöllner (Ed.), Body, sound and space in music and beyond (pp. 235–256). London: Routledge.
Rumori, M., & Hollerweger, F. (2013). Production and application of room impulse responses for multichannel setups using FLOSS tools. Proceedings of the International Linux Audio Conference, 125–132.
Rumori, M., Hollerweger, F., & Cabrera, A. (2010). Binaural room impulse responses for composition, documentation, virtual acoustics and audio augmented environments. Proceedings of 26th Tonmeistertagung—VDT International Convention, 670–679.
Ryan, M. L. (2001). Narrative as virtual reality. Immersion and interactivity in literature and electronic media. Baltimore/London: Johns Hopkins University Press.
Schärer, Z., & Lindau, A. (2009). Evaluation of equalization methods for binaural signals. Proceedings of the Audio Engineering Society Convention, 126, 1–17.
Sengers, P., & Gaver, B. (2006). Staying open to interpretation: Engaging multiple meanings in design and evaluation. Proceedings of the 6th Conference on Designing Interactive Systems, 99–108.
Sunder, K., He, J., Tan, E. L., & Gan, W. S. (2015). Natural sound rendering for headphones: Integration of signal processing techniques. IEEE Signal Processing Magazine (Special Issue on Signal Processing Techniques for Assisted Listening), 32(2), 110–113.
The SuperCollider book. (2011). Cambridge, MA/London: MIT Press.
von Foerster, H. (2002). Understanding understanding: Essays on cybernetics and cognition. Berlin/Heidelberg: Springer.
Wade, N., & Deutsch, D. (2008). Binaural hearing. Before
and after the stethophone. Acoustics Today, 4(3), 16–27.
Warusfel, O., & Eckel, G. (2004). LISTEN. Augmenting everyday environments through interactive soundscapes. Workshop Proceedings of IEEE VR 04 (n.p.) Retrieved from http://resumbrae.com/vr04/warusfel.pdf
Wenzel, E. M., Arruda, M., Kistler, D. J., & Wightman, F. L. (1993). Localization using non-individualized head-related transfer functions. The Journal of the Acoustical Society of America, 94(1), 111–123.
Appendix
Introductory Text to Parisflâneur in the Exhibition
(This text normally accompanies public exhibitions of the installation and was also presented to the participants of the evaluation; see Section 5.2.)
Parisflâneur is an interactive sound environment using binaural rendering and tracked headphones. Seven invisible but audible “sound hats” are arranged in the virtual space, where the listeners can find them by approaching, turning and listening. The “hats” contain different sound situations recorded in and around the city of Paris. By ducking, the listeners can put on a specific hat, which lets them leave the virtual auditory scene and fully fade into the binaural recording. By ducking again, the attached hat can be taken off; it then remains at its new position in the virtual scene.
Evaluation Response Sheet
(Translated from German, reproduced on next
page.)
The questionnaire combined semantic differential and Likert scale items. For each question, we debated which style would be more appropriate. We used semantic differential scales for questions that could also have been answered with a number on a scale (e.g., state how difficult something was on a scale from 1 to 7). We gave the semantic differential scales seven levels of resolution, a typical choice for such scales. We used Likert scales for questions in which we felt that the aforementioned strategy would require visitors to make diverging assumptions. In such cases, we felt that a short textual explanation of each scale point would increase interpretation consistency and be of value for both the evaluation and the participants. We chose five levels and debated whether including a neutral value would be meaningful in all cases. In one case, it was not considered meaningful, which left us with four levels.