Martin Rumori*
University of Music
and Performing Arts Graz
Institute of Electronic
Music and Acoustics
Inffeldgasse 10/3, 8010 Graz,
Austria

Georgios Marentakis
Signal Processing and Speech
Communication Laboratory
Graz University of Technology

Parisflâneur. Artistic Approaches
to Binaural Technology and
Their Evaluation

Abstract

This article approaches binaural interactive environments from an artistic research
perspective. Beyond content production, an aesthetic reflection of binaural media
requires pervasive access to digital processing means and ways to employ them in
artworks. However, most conventional workflows separate media-specific rendering
algorithms from object-based scene authoring. Such a delimitation between
binaural engineering and its application restricts transdisciplinary creation that crosses
both areas. This article assumes that the full potential of immersive media cannot
be explored without investigating technology in the context of aesthetic experience.
A case study is presented in which artistic references are regarded together
with its technical realization. Contemporary user experience evaluation methods
are adopted and refined with reference to the aims of the artist. A subsequent revision
of the work is discussed along with implementation adjustments and conceptual
alterations. The presented project shall exemplify how artistic research may bridge
scholarly investigation and the creative acquirement of media technology beyond its
mere application. A point of departure shall be provided for further cross-fertilization
between engineering and the arts by identifying mutual implications.

1 Introduction

Binaural technology is widely used to provide an immersive spatial experience
in virtual and augmented reality applications, sonification, auditory
display, or assistive technologies. Unlike in the era of dummy head stereophony
in the 1970s (Paul, 2009; Krebs, 2016), the need to wear headphones can no longer
be considered an obstacle to the acceptance of binaural audio. On the
contrary, awareness of the perceptual implications of stereophony in headphone
reproduction such as in-head localization is increasing due to the ubiquity of
headphones, as are efforts to achieve externalization in binaural technology (cf.,
e.g., Gilkey & Anderson, 2015).

Spatial listening takes place in cross-modal correspondence to other senses,
such as vision and proprioception. Spatial audio technology potentially distorts
cross-modal congruency insofar as the sensory relation of listener and environ-
ment is reconﬁgured (Niklas, 2014). Depending on the application, binaural
audio may support or contradict real or synthesized visual stimuli; augment
or replace the existing auditory environment; or be conceived without explicit

Presence, Vol. 26, No. 2, Spring 2017, 111–137

doi:10.1162/PRES_a_00289

*Correspondence to rumori@iem.at and georgios.marentakis@tugraz.at.

cross-modal references as a listening-only experience.
Interfaces to binaural systems usually follow the object-based
approach, that is, the auditory scene is described
by multiple virtual sound sources. Each source comprises
underlying audio material and metadata on the
source’s properties, most significantly its location in
space or a dynamic spatial trajectory. Binaural systems
thus serve as a universal rendering means similar to periphonic
projection technologies such as Ambisonics or
Wave Field Synthesis (Bleidt, Borsum, Fuchs, & Weiss,
2014). As a result, engineers concentrate on the
development of rendering algorithms that are preferably
independent of the actual audio material, while
so-called “content producers” deal with the selection
and the arrangement of audio objects in the scene with
limited insight into the algorithms. Neither side can
easily cross the boundary between scene description and
rendering.

In this article, we approach binaural technology from
an artistic research perspective. Artistic research has received
increasing attention in the past decades because it can
explore areas of knowledge production that are hardly
accessible by formal scientific strategies (Frayling, 1993;
Borgdorff, 2006). In contrast to the common separation
of “content production” from scene rendering, our
aim is to investigate aesthetic implications of technology
and tools in an integral process of creation. This aim is
pursued by means of an artistic case study on interactive,
audio augmented environments using binaural technology,
which is thoroughly investigated on a theoretical as
well as on an empirical level. A design iteration including
user evaluation is presented.

A central aspect of the case study is an artistic, self-referential
reflection of binaural rendering by composing
a navigable scene out of objects that are themselves
binaural recordings. The perspective on the material,
whether it is heard as an egocentric recording or as an
exocentric environment, may be changed by the listener
through interaction in the scene.

In the following, issues of binaural technology in
the context of artistic creation are introduced (see
Section 2). Subsequently, evaluation methods in interactive
arts are reviewed in Section 3. The case study
Parisflâneur is described in Section 4. The formal
evaluation of the installation is presented in Section 5,
followed by a report on artistic consequences and a substantial
rework of the case study in Section 6. Central
aspects of the project are further discussed in Section 7
before the article concludes (see Section 8).

2 Binaural Technology and
Artistic Creation

In this section, the genre-related terms installation
and environment and the notions of interactivity, reactivity,
and immersion are presented as understood in this
article. Furthermore, cultural aspects of headphone
listening, implications of object-based scene composition,
and aesthetic properties of binaural recordings are
illuminated.

2.1 Installation and Audio
Augmented Environment

The terms installation and environment have
several meanings in the arts and in mixed reality. Art
theory uses both terms to refer to certain art forms that
have emerged since the 1960s within conceptual art. The
form of installation is discussed in relation to the preceding
form of environment, although the term installation
was previously used to describe the arrangement of exhibitions
in general (Bishop, 2005). An environment is
characterized by the incorporation of the existing surrounding
into artistic reflection as it is without explicitly
designing it (Reiss, 1999). Both the installation and
the environment involve a strong spatial component
that substantially codetermines the significance of the
spectating, that is, the experiencing body.

In the context of virtual and augmented reality, environment
seems to carry the notion of a surrounding
that is created for exploration by spectators or listeners.
While the terms virtual environment and, less extensively,
augmented environment are widespread and often
used synonymously for virtual and augmented reality
respectively, they only rarely refer exclusively to the
auditory domain. In this case, rather, the term auditory
virtual environment is used (Novo, 2005). Audio
augmented environment appears to have been coined
in the context of the LISTEN research project, a pio-
neering attempt to superimpose binaural interactive
soundscapes on everyday surroundings (Eckel, 2001;
Warusfel & Eckel, 2004).

Throughout this article, installation denotes the physical
and conceptual structures that have been conceived
to form an artifact. In terms of virtual and augmented
reality, an installation would comprise the technical
and medial means as well as conceptual references that
convey objects of aesthetic experience. The entirety of
experienced entities, be it abstract virtual structures or
physical objects, is considered to be the environment.

2.2 Interactivity and Reactivity

Interaction has been commonly understood on a
mostly technical level in terms of human–machine communication.
Very much in contrast to this notion, interactivity
has been thoroughly investigated in many areas,
among them social and communication studies, media,
and art theory (see, e.g., Jensen, 1998; Ryan, 2001;
Paine, 2002; Franinovic & Salter, 2013). In the context
of interaction with sounds, connections have been
formed to the notions of embodiment, enaction, and
tacit knowledge. In this article, a minimal distinction will
be drawn between reactivity and interactivity in order
to identify two different modes of human–machine
communication with respect to aesthetic experience.

Reactivity denotes the action of dynamic mechanisms
controlled by interface input whose effects can be
characterized as compensating or negating. An example
of reactivity is the use of head tracking in binaural
audio systems “to decouple the position of the source
from head movements” (Bronkhorst, Veltman, & van
Breda, 1996, p. 23). In other words, the system compensates
for the listener’s movements so as to convey the
impression of a stable, exocentric auditory environment.
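
The compensation described above can be illustrated with a minimal sketch. It is not taken from the installation; it assumes that azimuth and head yaw are both measured in degrees, counter-clockwise from a common room reference, and the function name is hypothetical.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Return the source azimuth relative to the head, wrapped to [-180, 180).

    Assumption: both angles share the same room-fixed reference direction.
    """
    relative = source_azimuth_deg - head_yaw_deg
    return (relative + 180.0) % 360.0 - 180.0


# A source that stays at 30 degrees in the room while the head turns:
for yaw in (0.0, 30.0, 90.0):
    print(yaw, relative_azimuth(30.0, yaw))
```

Rendering the source at the relative angle rather than the room angle is what keeps it perceptually fixed in the room while the head turns.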

Interactivity, in contrast, implies a participatory function
that is conceptually assigned to the spectator’s or
listener’s input. In binaural audio, this quality is often
indicated by interaction with the presented auditory
scene such that it is intentionally altered, for example,
when controlling a virtual sound object by gestures. Through
interaction, an installation reveals a certain behavior such
that the resulting effect can be related to the input.

2.3 Immersion

The discourse on immersion is as widespread
and heterogeneous as that on interactivity (Ryan, 2001;
Grau, 2003; Reck, 2007). In the realm of virtual and
augmented environments, the level of achievable immersion
or presence is commonly regarded as directly
correlated to the fidelity of mediating technology, rendering,
and projection techniques (see, e.g., Bimber
& Raskar, 2005; Lentz, Assenmacher, Vorländer, &
Kuhlen, 2006; Schärer & Lindau, 2009).

In contrast, a mental state similar to immersion was
introduced into literary theory as early as 1817 as a
“willing suspension of disbelief” (Coleridge, 1898).
Coleridge’s understanding implies

1. an active contribution of the recipient (“willing”) and
2. that reaching this mental state depends on cognitive
processing and not only on perceptual stimulation
(e.g., by means of a narrative).

Therefore, the mere fidelity of the synthesized stimuli is
a measure neither of the likelihood nor of the depth of
possible immersion, even with today’s VR technology
(Ryan, 2001; Ettlinger, 2008). Such a point of view has
been assumed in conceiving the case study described in
this article (see Section 4), especially when estimating
the immersive potential of largely simplified and technically
“inaccurate” rendering techniques and that of
non-reactive binaural recordings.

2.4 Headphone Listening as a
Cultural Technique

Binaural audio often gives rise to surprising experiences
and disturbs most lay listeners’ expectations.
Although externalization is sought by providing perceptual
cues that reference unmediated real-world
situations, the cultural technique of headphone listening
is superimposed on the faculty of natural spatial hearing
(Rumori, 2017b). In cultural theory, cultural techniques
are acquirements that evolved in sociocultural participatory
practice, that is, they are not achievements of
individuals. Classical examples are lighting a fire,
reading and writing, or storytelling.

The cultural technique of headphone listening has
been developed through conventional stereophonic sig-
nals meant to be played on loudspeakers but that are
delivered as ear signals. We are trained to abstract a spa-
tial image from an in-head stereo base localized between
our ears, and we expect to resort to this ability when
putting on headphones. Binaural externalization con-
tradicts this expectation until it is learned and associated
with headphone media.

The perturbation of expectation is even stronger in
the case of reactive binaural images enabled by tracking.
So far, prevalent headphone listening skills include
the abstraction from an egocentric perspective, that is, a
common reference of head and sound projection, which
turns and moves along with the listener. However, the
conveyed auditory image is considered exocentric, for
example, the orchestra in the concert hall acoustics of a
classical recording.

2.5 Object-Based Soundfield
Representation

Common binaural systems are mostly approached
as black boxes by so-called “content producers.” This
is not only because rendering algorithms are hidden
behind scene authoring tools, but also because sophisticated
adjustments would require detailed technical
knowledge. From an artistic point of view, object-based
scene composition is but one approach for representing
an auditory environment, not necessarily the one most
suitable for all situations. Listening is an integral aesthetic
experience that takes place at various levels, only
one of them being a cognitive scene decomposition (cf.
Bregman, 1990). Depending on the artistic aim, emphasis
may be on overall sonic qualities of an environment
or on the spatial expansion of sonic phenomena, not primarily
on the particular layout of scene objects relative to each
other.

Advanced rendering systems incorporate source directivity
models beyond simple point sources (Lindau,
Klemmer, & Weinzierl, 2008). However, any model
imposes assumptions and approximations. Like any
other representation, object-based scene description
cannot be considered a transparent, lossless capture of
an arbitrarily complex auditory environment; rather, it is
an interpretation that may or may not fit artistic aims.

Artistic approaches to binaural technology in the sense
of media art do not only seek to convey a certain spatial
experience, but at the same time, they reflect on the
conditions and the anthropological implications of simulated
auditory environments. According to Reck (2007,
13), the examination of media as a subject matter targets
“art through media” rather than “art with media.”
For this reason, creative exploration should endeavor to
exceed mere “content production.” In the context of
binaural audio, this implies that rendering algorithms
should not be considered as independent of the aesthetic
experience of “content”; rather, they are a part of the
content.

2.6 Binaural Recordings

Like scene decomposition, binaural recordings
also imply an interpretation of an integral environment.
Nevertheless, interpretation as conducted by recordings
takes place on the level of perspective and behavior, such
as recording direction or perspective motion, rather than
that of separation into sonic objects.

Binaural recordings play only a marginal role in
today’s virtual and augmented reality applications due
to their static nature, both with respect to scene manipulation
and to subsequent dynamic perspective changes
(e.g., upon tracking). Research is performed to overcome
these limitations, for example, based on source
separation out of recorded scenes or higher-order spatial
recordings (Alon, Sheaffer, & Rafaely, 2015; Liu,
Wang, Jackson, & Cox, 2015). Again, such techniques
involve models of decomposition and rendering whose
implications have to be considered.

From an aesthetic point of view, binaural recordings
are a very effective way of conveying a complex
spatial auditory image, especially to support anecdotal
or narrative artistic aims. Many qualities that are
hard to simulate, such as the overall atmosphere of an
environment, are preserved with high fidelity in recordings.
Depending on the context, this property may
be rated higher than disadvantages like the described
immutability of binaural recordings.

3 Evaluation in the Context of
Interactive Art

The component of interactivity in interactive works
requires that artists are actively concerned with how the
audience interacts with the artwork, and possibly with
each other, through the artwork (Edmonds, 2010).
An increasing body of work therefore investigates
whether, how, and when user-based evaluation could
be involved in the development process of interactive
art.

Evaluating interactive art usually includes a combination
of usability testing and qualitative inquiry into
experiential aspects of interaction. Contrasting the results with
the artistic intention may seem an obvious step to
take in order to complete the evaluation. Although relevant
to the usability of the installation, such an approach may
not be appropriate for the evaluation of other aspects of
experience, such as emotional and aesthetic responses.
Difficulties arise because artworks rarely contain or aim
to define a specific type of experience. Instead, they aim
at creating an experience that is open to interpretation.
There is value in such interpretations being incompatible
with design expectations or inconsistent among
visitors.

Specific attention has been given to joy of play,
fun, and enjoyment, and to designing for ludic
engagement (Gaver, 2002). This type of engagement
may relate to the priorities of artists and has been identified
as a pragmatic goal to evaluate in interactive
artworks (Morrison, Mitchell, & Brereton, 2007).
Creative engagement has also been associated with
interactive art experience. It emerges when participants
interpret unconventional interaction situations in which
their intentions and expectations are not aligned with
the system responses. It may be accentuated by gradually
drawing visitors in, by using interactive elements
of different levels of complexity in order to attract but
also to maintain interest (Bilda, Edmonds, & Candy,
2008).

Historically, important contributions to evaluation in
the context of interactive art have emerged in the Beta
Space (Muller, Edmonds, & Connell, 2006; Muller
& Edmonds, 2006). Evaluation methods including
direct user observation or observation using video,
contextual interviewing, structured interviews, or questionnaires
have been extensively applied (Edmonds,
Bilda, & Muller, 2009; Candy, Amitani, & Bilda, 2006;
Bilda, Costello, & Amitani, 2006; Marentakis, Pirrò,
& Kapeller, 2014). A particularly relevant contribution
is the video-cued recall method, which may be seen as a
dynamic feedback evaluation method (Sengers & Gaver,
2006). In this method, participants are asked to recall
what they experienced while watching their actions in a
video.

Dynamic feedback methods are important for evaluating
open works. This relates to giving information
obtained from the users back to them for interpretation,
in longitudinal studies that involve a diverse population.
Designers then should weigh the results to justify their
conclusions and make sure that they do not abdicate
the responsibility for the eventual success of the system
(Sengers & Gaver, 2006). Application of dynamic feedback
for the purpose of evaluation could be observed in
Boehner, Sengers, and Warner (2008), resulting in a significant
deepening and shift in the designer perspective
when dealing with ineffable aspects of user experience,
as in the case of designing aesthetics. The co-discovery
method, in which groups of users visited an installation
while their interactions were recorded, could be used
to address social aspects of the interactive art experience
(Höök, Sengers, & Andersson, 2003). More open
techniques, such as shadowing, interviewing and informal
discussion, and questionnaires, brought together by
the grounded theory method (Glaser & Strauss, 1967),
have been used by Morrison et al. (2007). A significant
collection of evaluation works that address interactive
sound art has appeared in Candy and Ferguson (2014)
and Candy, Edmonds, and Ascott (2011). Recent
approaches emphasize the use of artistic techniques
in order to address ineffable aspects of user experience
(Marentakis, Pirrò, & Weger, 2017).

4 Artistic Case Study Parisflâneur

The case study Parisflâneur constitutes an interactive
audio augmented environment that can be
experienced as a sound installation. The case study has
been implemented iteratively in an integral process as an
experimental system for binaural environments. It
underwent several reworkings. One of its incarnations
received a more formal evaluation of its interaction
design (see Section 5).

4.1 Description

Figure 1. Parisflâneur with schematic visualization of “sonic hats.”

Parisflâneur invites listeners to put on headphones
and to navigate freely in virtual auditory space. The
installation does not interfere with the visual or haptic
perception of the visitor apart from marking the boundary
of the active installation area on the floor, and the
requirement to wear headphones. Seven binaural field
recordings from Paris and its surroundings are featured, which
were made by the creator. They represent different
urban and rural sound situations. The installation
contrasts the static nature of such field recordings with
their perception as dynamic point sources in a binaurally
rendered, interactive auditory environment.

On entering the environment, the listener hears a complex auditory
scene. The scene is formed by the seven binaural
field recordings that are rendered as seven spatially distributed,
monaural virtual sources. By walking around
while listening, the recordings comprising the scene
may be identified and localized with gradually increasing
certainty.

The listener may interact with each of the sounds
by moving his or her head below a certain threshold
and then raising it again. In the installation narrative,
this interaction gesture is introduced to the listeners
as “ducking” at the exact position of a virtual source
as if one would crawl under an imaginary “sonic hat”
suspended in space and “put it on.”
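
A minimal sketch of how such a gesture could be detected from tracked head height follows. It is not the installation's code. The class name, the threshold of 1.2 m, and the small hysteresis margin are assumptions made for illustration, and the sketch ignores the requirement that the listener stands at the position of a virtual source.

```python
class DuckDetector:
    """Detect a 'duck': head height drops below a threshold, then rises again."""

    def __init__(self, threshold_m, hysteresis_m=0.05):
        self.low = threshold_m                  # head must go below this height
        self.high = threshold_m + hysteresis_m  # and rise above this to complete
        self.is_down = False

    def update(self, head_height_m):
        """Feed one tracker sample; return True once when a gesture completes."""
        if not self.is_down and head_height_m < self.low:
            self.is_down = True
            return False
        if self.is_down and head_height_m > self.high:
            self.is_down = False
            return True
        return False


detector = DuckDetector(threshold_m=1.2)       # hypothetical threshold
heights = [1.7, 1.5, 1.1, 1.0, 1.3, 1.7]        # hypothetical tracker samples
events = [detector.update(h) for h in heights]
print(events)
```

The hysteresis margin is a design choice of this sketch: it avoids repeated triggering when the tracked height jitters around the threshold.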

The “ducking” interaction results in a gradual crossfade
from the interactive scene to the corresponding
binaural recording while the rest of the virtual sources
disappear. The selected sound track migrates from a
dynamically rendered monaural point source toward a
static binaural recording that is therefore not reactive to
the listener’s movements.
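
The transition between the two signal paths can be sketched as a crossfade of gains. The equal-power curve used here is an assumption, since the text only states that the crossfade is gradual, and the function names are hypothetical.

```python
import math


def crossfade_gains(position):
    """Equal-power gains (rendered, binaural) for a fade position in [0, 1].

    position 0.0: only the rendered scene; 1.0: only the static recording.
    """
    position = min(1.0, max(0.0, position))
    angle = position * math.pi / 2.0
    return math.cos(angle), math.sin(angle)


def mix_frame(rendered, binaural, position):
    """Mix one stereo frame (left, right) from each of the two paths."""
    g_rendered, g_binaural = crossfade_gains(position)
    return tuple(g_rendered * r + g_binaural * b
                 for r, b in zip(rendered, binaural))


print(mix_frame((0.5, 0.5), (0.2, -0.2), 0.5))
```

With an equal-power curve the squared gains always sum to one, which keeps the perceived loudness roughly constant for uncorrelated signals during the fade.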

The switchover of the heard environment’s spatial
reference to the listener’s head is reflected in the installation
narrative as the sonic hat “being carried.” The point
source in the virtual scene corresponding to the active
recording is moved along with the listener. This change
happens in the background, hence inaudibly, as long as
the hat remains put on. Only when the hat is “taken off”
by performing the inverse ducking gesture does the virtual
scene become audible again, namely, from the new
listening perspective, and the recording is left at
its new location. This way, the scene may be completely
rearranged (see Figure 1).
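
The carrying behavior described above amounts to a small piece of scene state. The following sketch is only an illustration under assumed names; the class, the source names, and the two-dimensional coordinates are hypothetical and not taken from the installation.

```python
class Scene:
    """Positions of virtual sources; one source may be carried by the listener."""

    def __init__(self, sources):
        self.sources = dict(sources)  # source name -> (x, y) position in metres
        self.carried = None           # name of the "hat" currently put on

    def put_on(self, name):
        self.carried = name

    def move_listener(self, position):
        # While a hat is worn, its point source silently follows the listener.
        if self.carried is not None:
            self.sources[self.carried] = position

    def take_off(self):
        # The source stays at the listener's last position.
        self.carried = None


scene = Scene({"market": (1.0, 1.0), "metro": (4.0, 2.0)})  # hypothetical names
scene.put_on("market")
scene.move_listener((3.0, 5.0))
scene.take_off()
scene.move_listener((0.0, 0.0))
print(scene.sources)
```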

4.2 Aesthetic References

Parisflâneur refers to acoustic ecology and anecdotal
music by the incorporation of mostly unprocessed
field recordings. The term anecdotal music (musique anecdotique)
was coined by French composer Luc Ferrari starting
in the 1960s. With his compositions, he invited
listeners to pick up associations from the recordings
and develop their own stories while listening (Pauli,
1971). This notion contrasts with musique concrète, the
predominant contemporary genre of composing with
recordings of that era, which required sound qualities to
be received as such, without reference and therefore in a
mode of reduced listening (Kane, 2015).

One of Ferrari’s concepts around anecdotal music
is the diapositive sonore. In his egalitarian understanding,
he claimed that audio recordings should be carried
out and used just like photographs are taken on holidays
(or lantern slides, as he puts it). A slide mounted
for projection may be understood as both a medium that
conveys an image and an object that can be regarded
in various ways and from different perspectives. Parisflâneur
reflects such medial properties metaphorically
by staging binaural recordings in different perceptual
contexts, both as immersive images and as objects in a
virtual “magic lantern.”

The integration of objects in a binaurally rendered
scene that are in turn binaural recordings implies
multiple nested levels of abstraction. At each level,
a conceptual inversion of perspective takes place.
A binaural recording captures an essentially exocentric
experience, that is, auditory entities in the outer world.
Listening to the static recording, however, makes the
experience egocentric because the auditory scene is tied
to the listener’s head. This is true for all recordings, even
conventional stereophonic ones. Media-specific cultural
techniques of listening enable cognitive abstraction
of exocentric references (see Section 2.4). In Parisflâneur,
the inner exocentric reference of the field
recordings is complemented by that of the outer virtual
scene in which the recordings are collapsed to auditory
objects. The provided way of changing perspectives by
interaction provokes the prospect of a second-order
introspection (cf., e.g., von Foerster, 2002), which
always includes both directions: While listening to an
egocentric binaural recording, the listeners may imagine
being immersed in the exocentric recorded situation
as well as watching themselves from the perspective
of the likewise exocentric rendered scene.
Exploring the virtual scene in turn allows for the meta-perspective
of an uninvolved spectator to whom the
listener is an exocentric scene object just like the sound
sources. The metaphor of “carrying a sound hat” links
the egocentric binaural recording with a corresponding
egocentric sound object in the scene. Egocentrism
allows for “changing the world,” that is, reorganizing
the scene, which is not perceivable directly but requires
retrospection or a second-order meta-perspective. As
one of the many classical examples from fine arts for
such a self-referential conceptual structure, the work
Authorization by Michael Snow (1969) may be named.

The playback of binaural recordings in Parisflâneur
is looped, each starting at a random position. Since the
files have different lengths, the resulting auditory scene
composed of the seven situations is constantly changing.
Conceptually, the installation avoids any intentional
montage but rather seeks the aleatoric recombination of
the recordings’ narratives.
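
The effect of looping files of different lengths from random start positions can be sketched as follows. The file durations are hypothetical placeholders, since the actual lengths are not given in the text, and the function names are illustrative.

```python
import random


def random_starts(lengths_s, rng):
    """Draw one random start offset (in seconds) per looped file."""
    return [rng.uniform(0.0, length) for length in lengths_s]


def positions_at(starts_s, lengths_s, elapsed_s):
    """Playback position within each looped file after elapsed_s seconds."""
    return [(start + elapsed_s) % length
            for start, length in zip(starts_s, lengths_s)]


lengths = [181.0, 247.5, 312.0]  # hypothetical durations in seconds
starts = random_starts(lengths, random.Random())
print(positions_at(starts, lengths, 600.0))
```

Because each file wraps around at its own length, the relative alignment of the recordings drifts continuously, so a given combination of moments recurs only rarely.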

4.3 Implementation

4.3.1 Aesthetic Lab for Binaural Research.
Parisflâneur has been implemented in close conjunction
with the development of an experimental system
for binaural audio. The design process was iterative and
driven by the requirements of the case study. Conceiving
the binaural rendering strategy was an integral part
of the artistic development of the installation. Aesthetic
considerations focused in particular on the close relation
of rendered and recorded sound material and their
transitions. To pursue this reflection practically, an open
framework was required rather than a ready-made scene
rendering system (cf. Section 2.5). Major requirements
for the framework included:

1. the possibility to explore different rendering
techniques to make their implications explicit,
2. access to simulated physical properties of the virtual
space, including virtual room acoustics and
dynamic distance behavior,
3. support for speculative rendering approaches
that initially appear less applicable in terms of
communications engineering,
4. the integration of static binaural source material,
5. the exploration of “non-binaural” effects such as
deliberate in-head localization.

Rather than a monolithic system, loose building
blocks have been implemented in the SuperCollider language
(The SuperCollider Book, 2011) for experimental
exploration. Additional software packages such as the
Jconvolver convolution engine have been adapted and
integrated using the Jack audio connection kit (Rumori
& Hollerweger, 2013). In the following, implementation
details are described up to the state that has been
formally evaluated.

Figure 2. Hemispherical speaker setup in IEM Cube.

4.3.2 Virtual Ambisonics. Initial versions of
Parisflâneur were conceived using a three-dimensional
virtual Ambisonics approach and free-field impulse
responses (Noisternig, Sontacchi, Musil, & Höldrich,
2003). The implementation was based on the SuperCollider
AmbIEM package¹ and the KEMAR set of
free-field head-related impulse responses (HRIR).²
Most intermediate versions were realized in third-order
Ambisonics, some also in fourth order. Room acoustics
were initially simulated using a simple shoebox model for
first- and second-order reflections.
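
As a rough illustration of the Ambisonics encoding stage, the sketch below computes the channel count of a full three-dimensional signal of a given order and encodes a mono sample in first order only. This is a deliberate simplification of the third-order encoding mentioned above and is not the AmbIEM API. The angle convention (azimuth counter-clockwise from the front, elevation upwards) and the traditional 1/sqrt(2) weighting of the W channel are assumptions.

```python
import math


def ambisonic_channel_count(order):
    """Number of channels of a full three-dimensional Ambisonics signal."""
    return (order + 1) ** 2


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumed convention: azimuth counter-clockwise from the front,
    elevation upwards, W channel weighted by 1/sqrt(2).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z


print(ambisonic_channel_count(3), ambisonic_channel_count(4))
print(encode_first_order(1.0, 90.0, 0.0))
```

Higher orders add further spherical harmonic components per source, which is why the third-order versions work with 16 channels and the fourth-order ones with 25.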

4.3.3 Room Impulse Response Measurements.
After some dissatisfaction with room acoustics simulation,
the integration of measured binaural room impulse
responses (BRIR) was sought. The motivation was to
transfer the convincing spatial quality known from binaural
recordings to rendering, considering room impulse
responses as a form of recorded acoustics.

1. https://github.com/supercollider-quarks/AmbIEM
2. http://sound.media.mit.edu/resources/KEMAR.html

Figure 3. Impulse response measurements in IEM Cube.

Most system development took place in the Cube
space of the Institute of Electronic Music and Acoustics
(IEM) in Graz, Austria, which is equipped with a
24-channel hemispherical speaker setup (see Figure 2).
Using a dummy head in the sweet spot, the speaker
setup in IEM Cube was measured using swept sines
(Farina, 2000). The idea was to use this speaker system
as a virtual Ambisonics layout for binaural rendering,
including the captured acoustics of the space.
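
The excitation signal of a swept-sine measurement can be sketched as an exponential sine sweep. The frequency range, duration, and sample rate below are hypothetical, as the measurement parameters are not stated in the text, and the deconvolution step that yields the impulse response is omitted.

```python
import math


def exponential_sweep(f_start_hz, f_end_hz, duration_s, sample_rate_hz):
    """Return an exponential sine sweep as a list of samples in [-1, 1]."""
    rate = math.log(f_end_hz / f_start_hz)  # ln(f_end / f_start)
    scale = 2.0 * math.pi * f_start_hz * duration_s / rate
    count = int(duration_s * sample_rate_hz)
    return [
        math.sin(scale * (math.exp(n / sample_rate_hz / duration_s * rate) - 1.0))
        for n in range(count)
    ]


def instantaneous_frequency(f_start_hz, f_end_hz, duration_s, time_s):
    """Frequency of the sweep at a given time; it rises exponentially."""
    return f_start_hz * (f_end_hz / f_start_hz) ** (time_s / duration_s)


sweep = exponential_sweep(20.0, 20000.0, 1.0, 48000)  # hypothetical parameters
print(len(sweep), instantaneous_frequency(20.0, 20000.0, 1.0, 0.5))
```

In an actual measurement, the sweep would be played over each loudspeaker, recorded at both ears of the dummy head, and deconvolved with the inverse sweep to obtain one BRIR pair per speaker.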

The impulse response measurements have been
carried out in different versions: with and without
absorbing first-order floor reflections by placing baffles;
and each with the dummy head mounted at two different
heights, at the level of the lower speaker ring of the
hemisphere and slightly raised (Rumori, Hollerweger, &
Cabrera, 2010). The latter was meant as an experimental
compensation for frequent unintended elevated localization
of auditory events in binaural environments (see
Figure 3).

4.3.4 Virtual Ambisonics Using Room Impulse
Responses. In the virtual Ambisonics rendering
system, the measured room impulse responses (see
Section 4.3.3) replaced the KEMAR free-field HRIRs
for convolving the decoded signals of the virtual loudspeaker
setup to a reverberant binaural signal. Strictly
speaking, this approach is valid only for an immobile
listener, as the impulse responses were measured
from only one central listening position and orientation.
However, the implementation using static BRIRs has
been combined with a tracking system. The positions
and synthesized distances of virtual sound sources were
corrected according to the listener’s movements, while
the reverb information turned and moved along with the
listener’s head due to the static convolution (Rumori,
2017a). This implementation preserved the measured
overall room acoustics with low technical complexity,
although the relatively long convolutions demand some
processing power.

Figure 4. Signal flows in initial and revised versions of Parisflâneur.
[Block diagram; recoverable block labels: file players, tracking data input,
crossfade (egocentric/exocentric), application control, distance model
(low-pass filter/attenuation), Ambisonics encoder, rotator, and decoder
(3rd order, CUBE layout), BRIR convolution, HRIR convolution, circular and
stereo panning, binaural output with headphone compensation.]

4.3.5 Resulting Signal Flow. The signal flow of
the resulting implementation is shown in Figure 4(a).
The binaural recordings are played back from disk
and provide the sound source material. The signals
are routed to the crossfade block, which forwards
them either directly to the binaural two-channel bus
or to the rendering stage as a monaural signal. This
implementation uses only the left channel of the
recording as the monaural signal for encoding (see
Section 6.3.1 for a discussion). Fine-grained control of
the crossfade transition is provided through break-point
functions.
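
The break-point control of the crossfade can be pictured as a piecewise-linear envelope. The following is a minimal sketch, not the code of the installation: the function names, the break-point values, and the equal-power mixing law are assumptions made for illustration only.

```python
import math
from bisect import bisect_right


def breakpoint_value(points, x):
    """Piecewise-linear interpolation through sorted (x, y) break points."""
    xs = [p[0] for p in points]
    if x <= xs[0]:
        return points[0][1]
    if x >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, x)  # index of the first break point right of x
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def crossfade_gains(points, progress):
    """Return (binaural_gain, rendered_gain) for a progress value in [0, 1].

    The envelope maps progress to a mix position. The equal-power law
    is an assumption and is not taken from the article.
    """
    mix = breakpoint_value(points, progress)
    return math.cos(mix * math.pi / 2), math.sin(mix * math.pi / 2)


# Hypothetical envelope: hold, fast move, slow tail.
ENVELOPE = [(0.0, 0.0), (0.2, 0.0), (0.5, 0.8), (1.0, 1.0)]
```

Shaping the transition through break points rather than a fixed curve keeps the artistic decision (how abruptly the egocentric recording gives way to the rendered source) separate from the mixing code.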

In the rendering branch, a distance-dependent gain
control and low-pass filtering are applied. Attenuation
and filtering parameters along with their effective ranges
were determined by informal subjective evaluation.
In fact, both the amplitude attenuation and the
simulation of air absorption were required to be much
stronger than in reality in order to support navigation
and orientation solely by listening.
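
A distance stage of this kind can be sketched as follows. The exponent, the reference distance, and the cutoff range are hypothetical values chosen for illustration. The article states only that attenuation and air-absorption filtering were made much stronger than in reality, and a one-pole low-pass is assumed here in place of whatever filter the installation uses.

```python
import math


def distance_gain(distance_m, ref_m=1.0, exponent=2.0):
    """Amplitude gain. Exponent 1.0 is the physical 1/r law; larger values exaggerate."""
    d = max(distance_m, ref_m)  # no boost inside the reference distance
    return (ref_m / d) ** exponent


def lowpass_cutoff_hz(distance_m, near_hz=18000.0, far_hz=2000.0, max_m=10.0):
    """Cutoff falls exponentially from near_hz to far_hz over max_m meters."""
    t = min(max(distance_m / max_m, 0.0), 1.0)
    return near_hz * (far_hz / near_hz) ** t


def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000.0):
    """Simple one-pole low-pass standing in for air absorption."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


def apply_distance(samples, distance_m):
    """Filter first, then attenuate, both as a function of distance."""
    g = distance_gain(distance_m)
    filtered = one_pole_lowpass(samples, lowpass_cutoff_hz(distance_m))
    return [g * s for s in filtered]
```

With `exponent=2.0`, doubling the distance reduces the amplitude to a quarter instead of a half, which is one simple way to make distance cues more audible than they would be physically.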

The resulting source signal is subsequently encoded
to the Ambisonics domain. Encoding also involves a
distance-dependent per-source weighting of Ambisonics
orders, which increases the apparent source width as the
listener approaches the virtual source. Beyond that,
tracking input is required at the encoding stage, as
relative encoding angles also depend on the listener’s
position (translation), not only the rotation.

All sources’ encoders add their output to an Ambisonics
bus. Subsequently, the Ambisonics signal is rotated
according to tracking data. As the listener may walk
around freely in the tracking volume, a head rotation
also involves a translation in almost all cases.
Consequently, per-source angles have to be adjusted each cycle
anyway due to simultaneous translation. The rotator’s
advantage of constant computational demand independent
of the number of sources is therefore less effective
here.
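
The dependence of the encoding angle on both position and orientation can be illustrated with a short sketch. The coordinate convention (x forward, y to the left, counterclockwise angles in radians) and the function names are assumptions and are not taken from the installation. For brevity, the head rotation is applied per source here, whereas the system described above applies it globally in the Ambisonics rotator.

```python
import math


def relative_azimuth(source_xy, listener_xy, head_yaw):
    """Azimuth of a source as seen from the listener's head, in radians.

    Translation enters through the difference vector, rotation through
    head_yaw. The result is wrapped to [-pi, pi]; 0 means straight ahead
    and positive angles are to the listener's left.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    angle = math.atan2(dy, dx) - head_yaw
    return math.atan2(math.sin(angle), math.cos(angle))  # wrap


def source_distance(source_xy, listener_xy):
    """Euclidean distance between source and listener on the floor plane."""
    return math.hypot(source_xy[0] - listener_xy[0],
                      source_xy[1] - listener_xy[1])
```

A step sideways changes the azimuth of every source by a different amount, which is why translation cannot be handled by a single global rotation.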

Integrated with Ambisonics decoding, a global order
weighting takes place that allows for experimenting with
different decoding optimization strategies. The decoded
virtual speaker signals form the input to the convolution
matrix of room impulse responses, whose binaural out-
put is mixed into the global binaural bus. A headphone
compensation based on inverted dummy head measure-
ments is applied before the signal is played back (Schärer
& Lindau, 2009).
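
The convolution matrix at the end of this chain can be sketched in a few lines. This is a didactic, direct-form version under stated assumptions: the function names are hypothetical, the impulse responses passed in would stand for the measured BRIRs, and a real-time system would typically use partitioned FFT convolution instead.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(speaker_signals, brirs):
    """Sum every virtual speaker feed convolved with its (left, right) BRIR pair.

    speaker_signals: list of equally long sample lists, one per virtual speaker.
    brirs: list of (left_ir, right_ir) pairs of equal length, in the same order.
    """
    n = len(speaker_signals[0]) + len(brirs[0][0]) - 1
    left, right = [0.0] * n, [0.0] * n
    for feed, (ir_l, ir_r) in zip(speaker_signals, brirs):
        for k, v in enumerate(convolve(feed, ir_l)):
            left[k] += v
        for k, v in enumerate(convolve(feed, ir_r)):
            right[k] += v
    return left, right
```

Because the impulse responses are fixed, the cost of this stage depends only on the number of virtual speakers and the length of the responses, not on the number of sound sources in the scene.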

5 Evaluation

5.1 Method

In order to proceed with the evaluation, the artist
was asked to complete a questionnaire. The answers
were used to guide the formation of the research
questions that would be addressed by the evaluation. In the
questionnaire, the artist commented on:

1. his intentions,
2. the imagined visitor experience,
3. the development process and the internal workings
of the installation,
4. the context within which the work has been
developed,
5. the expectations from the evaluation process, and
6. whether he felt the intentions had been fulfilled.

The analysis of the questionnaire was augmented by
consulting other writings of the artist and by experiencing
the installation (see Figure 5).

Figure 5. Photo of an evaluation participant experiencing the
Parisflâneur installation.

As described in Section 4, interaction in the installa-
tion is based on a metaphor that relates soundscapes to
“hats,” which a user can put on, walk with, and leave at
a speciﬁc location. The metaphor serves to communicate
the ducking gesture.

A listener may therefore interact in the following
ways:

1. Explore: move among sounds in order to find out
what sounds are there and plan how to engage
with them.
2. Listen: either to the composed soundscape or to
each sound field recording alone.
3. Resynthesize: perform planned actions to rearrange
the soundscape.
4. Contemplate: walk while immersed in a given
soundscape.

In this sense, successful interaction
should be demonstrated by a fruitful exploration of
the soundscape, detailed listening to soundscapes of
interest, and resynthesis according to the desire of
the individual.

In discussion with the artist, the following targets
were set for the evaluation:

1. the success of the ducking metaphor, both at
a conceptual level and at the level of its
implementation,
2. the listening experience, that is, the success with which
the intended soundscape was delivered using binaural
audio and the resulting listening experience, and
3. the interaction between the two aspects, that is, the
ability of the interaction metaphor to support user
engagement with the field recordings.

The video-cued recall method, already described in
Section 3, appeared to be particularly appropriate for
the evaluation. It allows users to comment directly
on their experience, so that both the experiential and
the usability aspects targeted by the evaluation
could be investigated. Pilot tests showed, however, that
applying the video-cued recall technique was challenging
because a normal video recording lacks detailed audio
feedback, which may limit the ability of listeners to recall
their experience. This appears to be a general limiting
factor when considering the application of the video-cued
recall method to sound installations, especially in
the case of installations using binaural technology over
headphones. To avoid this problem, the audio output
of the installation was routed directly to the camera and
recorded in sync with the video stream. This resulted
in synchronous audiovisual information in the video
recording, which was deemed sufficient for performing
the recall method when tested in the pilot
experiment.

To further support the evaluators, a number of open
questions were prepared. These addressed the experience,
the ways people discovered and interacted with the
installation, possible difficulties with the ducking technique,
the way people thought the installation works,
the appropriateness of the hat metaphor, and general
comments relating to what visitors liked and did not
like in the installation. These questions were used to
guide the discussion at the end of the video-cued recall
method in case the topics were not raised by the visitors
while recalling their experiences. Finally, participants
were required to fill in a number of rating scales at the
end of the session, which are shown in the Appendix.
In the scales, participants assessed crucial aspects of the
installation experience that could be presented using
one-dimensional semantic-differential or Likert scales.
Table 1 shows the flow of evaluation. The results are
illustrated in Figure 11.

Table 1. Flow of the Evaluation of the Parisflâneur Installation
during the Klangräume I Exhibition. Durations Are Suggestive

N   Task                                           Duration
1   Documented Interaction with the Installation   30 min.
2   Audiovisual cued-recall                        30 min.
3   Followup Questions                             15 min.
4   Filling in Scales                              10 min.

5.2 Procedure

The installation was set up and staged in the
rehearsal room of the MUMUTH building at the
University of Music and Performing Arts Graz for a period
of one week, during which time it was also open to the
public at given timeslots. Data were acquired in morning
sessions in which participants were invited to assist with
the evaluation of the installation according to the
procedure outlined in Table 1. Visitors were provided with
information with respect to the installation. In particular,
they received a copy of the public text that normally
accompanied the installation, and the ducking gesture
was explained to them. In addition, they were allowed
to ask questions as they went along.

The resulting dataset consisted of 3 hours of video
material, 4.5 hours of audio material in interviews,
36,732 transcribed words, plus scales and tracking data
from the eleven visitors.

Interviews and video recordings were analyzed using
an iterative coding process. The coding scheme that
emerged allowed us to understand what major aspects
were experienced in the installation. The coding scheme
went through several iterations.

5.3 Results

The Results section is broken down into three
subsections that deal with the coding of text and video
data, respectively, and with the
results from the scales completed by the participants.

5.3.1 Coding of Text Data. Data were coded
in seven different categories: Object, Auditory Experience,
Visual Experience, State, Purpose, Interaction, and
Concepts. These were defined as follows:

1. Object: excerpts referring to the objects that gave
rise to the experiences of participants,
2. Auditory and
3. Visual experience, respectively: excerpts referring
to the sensory experiences reported by participants,
that is, to qualities of the auditory and visual
stimulation,
4. Purpose: excerpts revealing the associated purpose
of (observable) actions,
5. Concepts: excerpts referring to conceptual
associations,
6. State: descriptions of emotional states, and
7. Interaction: excerpts in which participants
described interaction with the installation.

Common codes within each category can be inspected
in Figure 6. Figure 7 presents the frequency with which
excerpts were assigned to codes belonging to each
category. In addition, Figure 8 depicts the number of excerpts
that were coded within each category for each person.
It appears that discussions were dominated by
references to interaction with the installation, the objects
that generated sensory experiences, and the auditory
experience of the visitors. At a second level, participants
described the purpose behind different actions they had
performed, their emotional state, and concepts that
emerged while interacting with the installation. Finally,
participants referred little to visual aspects, occasionally
mentioning the absence of any visual stimulation. This
picture is consistent across participants.

Figure 6 showcases common codes within each
category according to their frequency. Most references
to the Object category were related to the content of
the binaural recordings in the installation. Most
references in the Auditory Experience category related
to the experience of listening to and interacting with
binaural audio, in particular to the aspect of binaurality.
Visitors were quite impressed by the sound quality
that can be achieved with this type of technology.
Visitors commented extensively on the changes in the
auditory experience that the installation offers. This
included descriptions of changes in the auditory
feedback depending on the different states one encountered.
In particular, the contrast of “hat on” to “hat off” was
interpreted as a difference between foreground and
background by some participants. The issue of
distinguishability between foreground and background
sounds was also raised relatively often. This referred to
difficulties in finding out whether sounds belonged to
the overall soundscape or to individual binaural
recordings. Most actions that participants performed were
motivated by a will to discover how the installation
works and to engage with the different sounds that
could be experienced in the installation. At a second
level, there was some hypothesis testing in relation to
the functionality of the installation, that is, what
happens when one gets out of the tracking area, and the
repercussions of carrying sounds around, as well as attempts to
manipulate the way things appeared.

Concerning different experienced states, visitors
referred most often to statements of appeal, relating to
what they liked and disliked in the installation. The
listening experience was a prominent part of the discussion,
which strongly suggests that visitor attention was
directed to hearing. Participants referred to the extent
to which they believed they had discovered all material,
and the different feelings that they experienced while
interacting with the installation.

The installation experience gave rise to a variety of
concepts, by way of association to the sound material
through personal experience or interpretation.
References to the city of Paris featured prominently in the
participants’ comments. In addition, participants
commonly referred to the experience of listening into
something that they encountered accidentally. This
was often associated with a feeling of listening without
having been invited to or having asked for permission.

Figure 6. Common codes within each category. [Plots not reproduced.
Recoverable panel titles: (a) Object, (b) Auditory Experience,
(d) State, (e) Concepts, (f) Interaction.]

Figure 7. Total number of codings assigned to each category.
[Bar chart not reproduced. Vertical axis: number of codings per
category, with ticks from 100 to 600. Categories: Object, Auditory,
Purpose, State, Concepts, Visual, Interaction.]

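Figure 8. The frequency with which excerpts were coded in each
category depending on each participant. [Chart not reproduced.
Participants are labeled by number (for example, P10 and P11). Legend:
Auditory Experience, Concepts, Interaction, Object, Purpose, State,
Visual Experience. Axis ticks from 50 to 200.]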

In essence, participants felt that there was no way for
their presence to be registered within the space in which
the recording occurred. This brought up associations
of “intruding” or eavesdropping, which arguably testify
to a high degree of realism in the binaural recordings,
but also delineate the boundary between the installation
space and the recording fields.

Concepts were also raised with respect to the
experience of interacting with the installation, hearing into, or
diving in. Participants often discussed ideas relating to
the technical setup of the installation. Certain
suggestions were made, for example, to spread the installation
over a larger space, or to use light in order to indicate
the location of the different sounds. In particular, a
number of visitors complained that the scene was too
dense, in the sense that more space should have been
available for moving around. They claimed that a larger
environment would have made it easier to locate sounds.
Concerning interaction, much of the discussion was
directed to the difficulties the visitors faced. These were
mostly related to using the ducking technique. Some
participants complained that they could not always
differentiate between the “hat on” and “hat off” states,
and that they could not always control when ducking
would take effect. Most participants also mentioned that
it takes time to get to grips with the ducking technique,
and that the help of the evaluation crew was important
to clarify how this is done. One participant mentioned
that it may be useless to have the ducking technique,
given that one can simply go close to a recording and
listen to it quite well. Participants often did not realize
what happened to the sound once they took off their hat.
Another difficulty was to find out how to intentionally
relocate sounds and rearrange the spatial layout
of the scene. This was not evident to all participants and
it only became clear to some after interacting with the
binaural recordings for a while. Finally, there were
difficulties isolating sounds of interest in case they were
too close to other sounds. In addition, a few
participants wondered what happens when sounds end
up very close to each other and mentioned that this
leads to difficulties in engaging with the sound they
wanted to hear.

5.3.2 Coding of Video Data. Video recordings
of participants interacting with the installation were
also coded and summarized. Figure 9 shows how the
movements of participants were distributed. Figure 10
displays the corresponding duration for which each specific
action was performed.

Figure 9. Average frequency with which different movements
occurred in the videos. Error bars correspond to standard error of the
mean. [Bar chart not reproduced. Vertical axis: instances per
movement category.]

Figure 10. Average duration participants engaged in actions
associated with each movement category. Error bars correspond to
standard error of the mean. [Bar chart not reproduced. Vertical axis:
time in each category (m).]

It is evident that most of the time was spent either
standing or moving at slow speed, corresponding to
either listening to a specific binaural recording or
exploring the space in order to locate a new one, respectively.
The third most common actions were performing the
ducking technique and walking at a normal speed in the
room.

5.3.3 Scales. The subfigures in Figure 11 illustrate
the results obtained using the aforementioned
scales. A χ2 test was used to examine whether the
distribution of the responses can be modeled by the uniform
distribution. A value of p < 0.05 indicates that this
hypothesis can be rejected, and thus that the tendency
observed in the graph reflects a tendency in participants’
responses. Overall, the scale results provided the
following findings:

1. Mixed responses were obtained concerning the usability
of the ducking technique, which was rated as average.
2. Visitors felt immersed when listening to the individual
soundscapes, but there was no particular agreement
concerning immersion in the case of listening to the virtual
scene composed of all sounds. A Mann–Whitney test showed
that a significant shift in felt immersion occurred when
participants listened to the individual soundscapes
(Z = 3.371, p < 0.001).
3. Participants occasionally noticed the headphones, but on
average they were not found annoying.
4. Visitors could orient and move toward sounds of interest
with relative ease.
5. The impression from the installation was overall positive.
6. Participants questioned how the installation works, but
were not convinced they had found a plausible answer.

Figure 11. Results of the scale analysis of Parisflâneur. An asterisk
(∗) indicates a significant deviation from a uniform distribution at
the p < 0.05 level. [Plots not reproduced.]

5.4 Summary of Evaluation Findings

Participants reported mostly auditory experiences, with
some general remarks on visual aspects. Dynamic and static
spatial auditory aspects permeated most of participants’
comments. Since one aim of Parisflâneur was to test
participants’ interpretation of this difference, this result is
rather unsurprising. However, this difference was seldom cast
in terms of a difference in scene spatial dynamics, but it was
rather described as a difference in what constituted
foreground and background as a function of listening
location. The boundary between foreground and background
was, however, blurry, as participants were unsure whether
sounds belonged to a given binaural recording or the overall
soundscape. This may explain why some participants
mentioned a lack of consistency between the different scenes.

Spatial and auditory exploration of the different auditory
scenes was often referred to as a fun and exciting activity,
and most participants got a sense of being able to enter and
leave auditory scenes. The spatial boundaries of auditory
scenes were not always easy to locate. This limited the extent
to which participants guided their movement by memorizing
sound locations. On the other hand, being inside an auditory
scene (i.e., wearing a hat) was sometimes perceived as
unpleasant. This may be related to the lack of head-movement
cues while “wearing a hat.” Participants often referred to a
feeling of peeking into an auditory scene, a sense of
overhearing or “voyeurism.” For some, this may have also
contributed to the sense of unpleasantness. Participants
became aware of the possibility to relocate sound hats
(thereby constructing narratives with Parisflâneur), though
this was rarely used intentionally.

Interaction with Parisflâneur develops through learning
the interaction mechanism, and eventually becoming able to
put on and take off “sound hats” (auditory scenes) and switch
between a dynamic and a static 3D audio experience.
Learning to perform the ducking gesture was, however, not
easy and this was arguably the major obstacle to exploring
Parisflâneur. The metaphors employed to describe the sound
hats are revealing.
The everyday nature of the binaural recordings was
communicated well. The topic of Paris and of strolling was
taken up positively, and it informed the associations people
reported to a large extent. All the associations, concepts, and
experiences reported by participants refer not so much to the
headphones and the physical relationship to Parisflâneur, but
to the virtual objects perceived in the installation (i.e., street
scenes, traffic, music, etc.), which elicit emotions and
associations of Paris. This becomes evident in images
reported by participants, which center on such themes.
However, the cable used for the headphones to a certain
extent hindered the latter activity.

6 Artistic Consequences of the Evaluation

As a reaction to the evaluation process and its findings
summarized in the previous section, Parisflâneur has been
largely reworked by the artist. Most significantly, the
interaction scheme was adapted to a different conceptual take
on the aims of anecdotal exploration and aesthetic experience
(see Section 6.1). As a consequence, the concept of “sound
hat” for an enterable virtual source has been replaced by the
less tangible “sound island.”

Further changes include the principal redesign of the
binaural rendering (see Section 6.2) and numerous refinements
to the processing of soundfiles for both binaural presentation
and their transformation to virtual sources (see Section 6.3).

6.1 Modifications to the Interaction Model

The ducking gesture for “putting on” and “taking off
sound hats” has been dropped. The evaluation revealed that
performing the gesture was generally too difficult and that
unsuccessful interaction attempts lacked a clear indication of
the reason for failure (see Section 5.3.1). It can be assumed
that most unsuccessful ducking gestures did not catch the
virtual source position precisely enough. Furthermore, the
semantics of the gesture becomes ambiguous when multiple
sound sources are very close to each other in the scene (cf.
Section 5.3.1). The threshold for ducking, so far a fixed
medium value, did not fit all body heights. This would have
called for some improvement, for example, through either an
individual calibration step or an adaptive behavior. Finally,
the headphone cable that was described as impeding the
strolling imagination may be even more cumbersome when
ducking (see Section 5.4).

As an alternative to the ducking gesture, a less peculiar
interaction scheme was sought that did not force the use of
the vertical dimension. Consequently, the aim of decoupling
interaction from exploration was discarded. Two concurrent
ways of entering sound islands have been conceived, one
more controllable and one automatic mechanism that
incorporates a rudimentary dynamic system.

6.1.1 Entering Sound Islands by Approaching. A sound
island is entered when the listener approaches its virtual
position very closely (less than 0.15 meters) for more than
3 seconds. Within this radius, the redesigned rendering
presents the virtual source as a conventional monophonic or
slightly opening stereophonic signal localized in the listener’s
head (see Section 6.2.3), which provides a distinct transition
to the externalized binaural version of the recording.

Entering sound islands by approaching picks up the
Movement category of stepping to locate new sound sources,
one of the most frequent modes performed according to the
evaluation (see Section 5.3.2).

6.1.2 Entering Sound Islands Due to Being Passive.
Whenever the listener is “calm,” that is, the speed of linear
movement is less than 0.1 meters per second, his or her
avatar in the scene accumulates a certain “gravitational”
force on the virtual sources.
When a certain threshold is reached, the source closest to
the listener is “attracted” and starts to draw nearer. The
distance-based mechanism for entering the sound island (see
Section 6.1.1) takes effect as soon as the source is close
enough.

It is important to notice that “gravity” here does not mean
a correctly modeled physical effect of interdependent masses.
Rather, only the closest source is influenced based on a
certain velocity curve in terms of the current distance.

Entering sound islands due to being passive picks up
standing still, the other most frequent Movement category in
the installation (see Section 5.3.2). According to the
evaluation, standing still suggests that a specific sound island
is listened to without an attitude of spatial navigation or
exploration. This attentive mode is accounted for by
interpreting the lack of action as a trigger for interaction,
causing the closest source to be entered and the scene to be
rearranged.

6.1.3 Leaving a Sound Island. A sound island is left
whenever the listener exceeds a distance of 0.15 meters from
the source position, taking into account the gravity
mechanism at the same time. Hence, as long as the listener is
sufficiently active, the sound island remains immobile and is
left as soon as the listener moves away. If accumulated
gravity indicates a passive listener, the virtual source will
continue to be attracted and “sticks” to the listener’s head,
causing the virtual scene to be rearranged just like wearing a
“sound hat” in the earlier implementation. Moving with more
than 0.3 meters per second reduces accumulated gravity,
until it goes below a threshold that causes the sound island to
be detached from the listener.

6.2 Binaural System Redesign

Major modifications were applied to the rendering of the
virtual sound scene based on trial-and-error experimental
sessions and incremental subjective assessment (Rumori,
2017a).
Most significant changes include the reduction from
three-dimensional rendering to two dimensions (see
Section 6.2.1), switching from virtual Ambisonics-based
interpolation to a simpler circular panning (see
Section 6.2.2), and a complete redesign of the distance model
(see Section 6.2.3). The resulting signal flow of the revision is
shown in Figure 4(b).

6.2.1 Two-Dimensional Rendering. In earlier versions,
the sound scene was rendered using a three-dimensional
approach (see Section 4.3.2). Based on reports from the
evaluation, subjective experience of the artist and theoretical
reflection, rendering was reduced to two dimensions.

Consideration was triggered by the “ducking” interaction
gesture, which has been dropped in the reworked version (see
Section 6.1). Without this form of interaction, the vertical
dimension turned out to be barely relevant for exploring the
installation by listeners that now move solely on a plane. As
nonindividualized impulse responses are used, the perception
of source elevation cannot be expected to be very accurate
anyway, while azimuthal perception should work relatively
well (Wenzel, Arruda, Kistler, & Wightman, 1993).
Furthermore, two-dimensional rendering corresponds better
to the geographic map metaphor of the reactive virtual scene
that in turn refers to the notion of strolling (as in flâneur).

Finally, the decisive contrast in Parisflâneur with respect
to dimensions and their experience is not that of two- or
three-dimensional rendering of the virtual scene, but the
different representations of the binaural recordings.
They are collapsed to a monaural point source, that is, to a
zero-dimensional entity in a physical understanding for both
two- and three-dimensional exocentric scene rendering. Only
when presented as sound hats or islands, without any
rendering taking place, do they unfold their fully
three-dimensional spatiality.

6.2.2 Circular Panning. After having dismissed
three-dimensional rendering, the use of Ambisonics has been
dropped as well. The localization of virtual sources appeared
to be a recurring issue in the evaluation, as it is generally in
virtual and augmented environments. A simpler
implementation framework was sought that allows for
experimentation with different approaches for distance, room
acoustics, and angular resolution, also with respect to
computational performance.

The reworked implementation uses two concentric virtual
speaker rings representing two levels of source distance and
of apparent reverberation. Circular panning between two
adjacent virtual speakers results in crossfading the two
corresponding impulse responses for interpolation. For the far
field, a ring of 12 speakers is formed by a subset of the
aforementioned BRIRs of IEM CUBE (cf. Section 4.3.3).
Similar restrictions apply as described for Ambisonics
rendering with BRIRs in Section 4.3.4: the virtual room
acoustics remains attached to the listener’s head while the
source positions in virtual space are updated accordingly. The
near field is represented by a ring of 36 speakers that
correspond to free-field impulse responses taken from the
SoundScapeRenderer software (Geier & Spors, 2012). Hence,
the azimuthal resolution is 30 degrees in the far and
10 degrees in the near field.

The linear interpolation of impulse responses and the
azimuthal resolution have not been formally evaluated.
Although there are much more advanced interpolation methods, the relatively resource-effective approach chosen was informally assessed as a significant improvement, probably due to the distance-dependent amount of reverb rather than the linear interpolation.

6.2.3 Distance Levels. In addition to the two distance levels represented by virtual speaker rings, two kinds of conventional stereophonic techniques have been integrated for the notion of a very close source and of one inside the listener’s head. All levels overlap for smooth, gradual transitions (see Figure 12).

Far-field rendering by the reverberant 12-channel speaker ring is fully active for source distances of more than 1.5 meters. With further increments in source distance, the signal is attenuated and low-pass filtered. Below this distance, the 36-channel speaker ring is used for rendering an anechoic monaural source. If the source is approached closer than 0.5 meters, it splits into two stereo channels of the processed recordings (see Section 6.3.1), rendered as two sources whose opening angle (i.e., stereo base) gradually increases when further advancing.

At an even closer distance (less than 0.2 meters), the virtual source starts to enter the listener’s head. This is conveyed by exploiting in-head localization of coincident signals. The two rendered stereo channels converge to a mono version of the processed recordings, which is directly played to both headphone channels without any impulse response convolution taking place. For a very small area around the center (less than 0.1 meters), the source diverges into its two stereo channels again, this time played directly to the headphone channels just like the monaural version before. The anticipated stereo signals serve as a bridge between a conventional stereophonic headphone playback and a static, egocentric binaural recording.

6.2.4 Audible Tracking Volume Boundary. Audible feedback was used to indicate the active tracking area boundary.
Outside the area, the playback fades to a generative modulated noise texture that shall have a minimal dynamic appearance rather than purely static, “technical” noise.

Figure 12. Sound processing as a function of distance level in the revised version of Parisflâneur. (Figure labels: far field, reverberant; 1.5 m; near field, anechoic; 0.5 m; rendered stereo source, L/R; 0.2 m; in-head monaural; 0.1 m; in-head stereophonic.)

6.3 Processing of Binaural Recordings

A central quality of Parisflâneur is that the same binaural recordings are used in two different ways: as they are, that is, as static binaural recordings perceived from an egocentric perspective, and as sources for binaural rendering in an exocentric virtual auditory scene. This poses the challenge of processing binaural source material for beneficial rendering. In earlier implementations of the installation, the recordings were used nearly unprocessed for binaural presentation, whereas for the monaural virtual sources only the left channel of each recording was used with minimal equalization (see Section 4.3.5). Picking only one channel results in a spectral imbalance of contralateral sounds, as higher frequencies are increasingly attenuated by the listener’s head (cf. Rumori, 2017a).

Most evaluation participants showed a high willingness to engage with the recordings, both in terms of interaction and of concentrated listening; and their feedback frequently mentioned the changes of auditory experience between the egocentric and the exocentric modes (see Section 5.3.1). To reflect the apparent importance of the sonic quality and the techniques employed in the revised rendering, a more thorough procedure of sound file processing was sought.

6.3.1 Binaural Recordings as Virtual Sources for Rendering. The reworked rendering approach presents the underlying recordings as a stereophonic pair of virtual sources and as conventional stereophonic signals on headphones, in both cases with gradual transitions to, or from, a monaural presentation (see Section 6.2.3). Thus, mono compatibility is required, which binaural material usually lacks due to phase problems, especially at lower frequencies. A so-called Blumlein shuffler was applied to the binaural recordings, which turns such phase differences into level differences at low frequencies. The recordings were additionally equalized considering the coloration introduced by binaural rendering (i.e., the impulse responses) and in comparison to the original binaural versions to support smooth transitions.

6.3.2 Soundfiles for Binaural Playback. Processing for binaural presentation involved mainly common sound engineering tasks. Some subjective per-file equalization and a spectral balancing among the different sound files have been performed using the same headphones as in the installation and with the headphone correction filter in effect. The dynamic range of some recordings has been slightly reduced by compression so as to better adjust them to each other for combination in the virtual auditory scene. This compensation addresses the dominance of certain elements in the recordings which might have partly caused their reported confusion with scene objects during evaluation (see Section 5.3.1).

7 Discussion

The case study Parisflâneur has been presented as an artistic approach to researching binaural technology that is neither restricted to engineering of rendering means nor to mere “content” production.
Borders between the two, as established by common scene authoring workflows, are constantly crossed. The combination of a rendered binaural scene and static binaural recordings indicates a meta-perspective on both kinds of media rather than conveying a particular spatial auditory image. The transition between the two involves changes of reference and perspective: while the rendered exocentric scene shall appear tied to the listener’s surroundings and navigable, the nonreactive binaural recordings are presented egocentrically to the listener’s head and are carried along.

Similarly, the rendering of the virtual scene does not coherently model the physics of sound radiation as usually suggested by basic principles of communications engineering. As described in Section 6.2.3, virtual sources in the very near field, and those coincident with the listener, are displayed using conventional stereophonic techniques on headphones. The decomposition of the monaural signal into two channels at very close distances in fact makes use of a binaurally displayed virtual loudspeaker pair with a dynamically increasing stereo width. The previous virtual Ambisonics implementation sought a similar effect of apparent source widening in a coherent way by gradual attenuation of higher-order components with decreasing source distance (see Section 4.3.5). Finally, only the omnidirectional component (zeroth order) remained, resulting in the same signal on all virtual loudspeakers after decoding and indicating that the source has been “entered” in the virtual scene. In the reworked implementation, the climax of reaching the source is indicated by in-head localization through coincident ear signals, that is, using conventional stereophony directly on headphones rather than on rendered virtual speakers.
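To make the succession of presentation modes from Section 6.2.3 concrete, the following sketch maps a source distance to mixing weights for the five distance levels. Only the thresholds (1.5, 0.5, 0.2, and 0.1 meters) and the statement that all levels overlap are taken from the description. The width and shape of the transitions are not specified there, so the linear crossfade, the `rel_width` parameter, and the names `LEVELS`, `fade`, and `level_weights` are hypothetical. The sketch also reduces the gradual convergence and divergence of the stereo channels to plain crossfades and omits the far-field attenuation and low-pass filtering.

```python
# Distance levels from the innermost to the outermost presentation mode.
LEVELS = ["in_head_stereo", "in_head_mono", "rendered_stereo",
          "near_field", "far_field"]

# Boundaries between adjacent levels in meters, as given in Section 6.2.3.
THRESHOLDS_M = [0.1, 0.2, 0.5, 1.5]


def fade(distance_m, threshold_m, rel_width):
    """Linear ramp from 0 (inside) to 1 (outside) across a band around the
    threshold. The band shape and width are assumptions for illustration."""
    lo = threshold_m * (1.0 - rel_width)
    hi = threshold_m * (1.0 + rel_width)
    return min(1.0, max(0.0, (distance_m - lo) / (hi - lo)))


def level_weights(distance_m, rel_width=0.2):
    """Return a dict of mixing weights per distance level.

    With rel_width < 1/3 the transition bands do not overlap each other,
    so at most two adjacent levels are active and the weights sum to 1.
    """
    fades = [fade(distance_m, t, rel_width) for t in THRESHOLDS_M]
    weights = {}
    for i, name in enumerate(LEVELS):
        fade_in = fades[i - 1] if i > 0 else 1.0                 # inner boundary
        fade_out = 1.0 - fades[i] if i < len(fades) else 1.0     # outer boundary
        weights[name] = fade_in * fade_out
    return weights


if __name__ == "__main__":
    for d in (2.0, 1.5, 0.45, 0.15, 0.05):
        print(d, level_weights(d))
```

A relative rather than absolute transition width was chosen here so that the narrow bands around 0.1 and 0.2 meters stay separate, which is a design choice of this sketch and not a documented property of the installation.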
Obviously, the psychoacoustic effect of an omnidirectional signal on a multichannel speaker setup that is rendered for headphone listening is fundamentally different from an in-head phantom source. The latter was chosen because of its metaphorical correspondence to the constellation in the scene and its strong bodily experience that does not occur in nature, except for a few bodily noises such as chewing (Rumori, 2017b). This way, the two notions of an exocentric virtual scene object “inside” the listener’s head and the listener “inside” an egocentric sound island in terms of a binaural recording are embodied by two extremes of auditory phenomena.

The mixture of stereophonic and binaural techniques in addition to the combination of egocentric and exocentric binaural presentation makes the reworked implementation even less compatible with common rendering methods for universal scene descriptions than the first. Instead, the idiosyncrasies of medial representation, be it those of a recording, those of binaural rendering, or those of stereophony, are not considered side effects but intrinsic, inseparable qualities. Apart from “content” production, the installation could not have been realized without access to the signal flow and the rendering algorithms.

The notions of “inside” and “outside” are closely related to the interaction model of the installation, which underwent a major revision based on evaluation results (see Section 6.1). Most significantly, the former “ducking” gesture turned out to be hard to perform successfully. A very insightful finding for the artist was the participants’ frequent perceptual distinction of virtual scene and static binaural recordings (i.e., “wearing a sound hat”) as “background” and “foreground” rather than “outside” and “inside.” Equally enlightening was the observation that some participants had difficulties in delineating point sources in the virtual scene by their spatial arrangement but instead mixed elements of them according to narrative correspondences in the recordings (see Section 5.3). Such findings were not expected by the artist, who in turn discovered his own attitude to the installation as being much more determined by its technical realization than by auditory experience, despite the focused emphasis on acoustic ecology and anecdotal music as mentioned in Section 4.2.

One possibility to address the limited usability of the interaction technique could have been to introduce an even simpler and easier to perform gesture, complemented by an explicit auditory feedback on the success of interaction (in addition to the change between the two kinds of binaural display). Nevertheless, the foreground/background notion and the narrative-based mixing of anecdotal elements from different recorded tracks would then not be considered. The artist opted for the opposite way. In his mind, the described usability issues should not be interpreted as a weakness in conveying clarity of the installation’s functionality. Instead, participants’ comments of this kind may indicate an auditory awareness in terms of ecological rather than analytic listening despite the documented will to discover working principles of the installation (see Section 5.3).

With the newly conceived interaction scheme, entering and leaving sound islands may appear even more difficult to perform or to prevent deliberately. An explicit gesture like ducking is not needed any more; just coming close to the virtual source or being passive for some time suffices, which is prone to unintentional transitions. On the other hand, the new mechanism is not conceived to be intuitively mastered but rather to “happen” even if the listener is not consciously aware of it.
Unlike the ducking gesture to be performed vertically, that is, orthogonal to two-dimensional spatial orientation, entering a sound island by approaching or due to being passive is much more entangled with exploration. Self-acting changes to the auditory experience without prior interaction indicate a certain “life of its own.”

In a similar vein, the playful reflection of egocentric and exocentric spatiality by perspective changes is affected by the revised interaction mechanism. Evaluation results, among them those with respect to deliberate changes of perspective by ducking, notions of meta-perspective and second-order introspection as introduced in Section 4.2, and the mostly unexplored feature of interactive scene reorganization, suggest that a cognitive map of Parisflâneur’s technical functionality is rarely developed by lay listeners even if supported by an introductory explanation (see Sections 5.3.1 and 5.3.2). Only expert listeners of binaural audio may be able to grasp the described functionality by immediate experience. However, the reported associations of intruding or eavesdropping on the recorded situations illustrate the successful abstraction from the egocentric recordings towards a meta-perspective of an exocentric scene that includes the listeners themselves. The revised interaction scheme shall further direct the listener’s attention away from a technical engagement with the installation in favor of exploration by ecological listening, and metaphorical attributions of sense to perceived sonic changes.

A conscious choice was to provide visitors with information about the installation prior to the evaluation and to give them the opportunity to ask questions as they went along. In this sense, the evaluation does not explicitly test the ways visitors would attribute meaning to the installation spontaneously. This highly interesting question was defined to be outside the scope of the evaluation.
Instead, we wanted to approximate typical visiting conditions. We assumed that a visitor would typically read the descriptive text and glimpse primary modes of interaction by observing others. Furthermore, questions were allowed in order to observe the points that needed clarification, if any, and help visitors with exploring the installation without getting stuck. We felt that both behavioral patterns and emerging issues under the aforementioned conditions would have been more relevant for the subsequent installation development.

The developments force us to rethink the definition of binaural. For example, the use of in-head localization through stereophony on headphones is not usually considered “binaural,” as no head-related impulse responses are involved. Reflecting the discussion in the previous paragraphs, we propose to extend the definition of “binaural” to any intended use of ear signals irrespective of their properties, origin, or means of projection (e.g., headphones or transaural). The term “binaural” has also been used in different meanings before. For some time, it merely meant the spatial augmentation of audio signal transmission by adding a second channel (Alexander, 2000; Wade & Deutsch, 2008; Paul, 2009).

Realism is often a criterion for the immersive potential of virtual environments. From a perspective of aesthetic experience, “realism” does not address reality in terms of the real world but the semblance of reality in a specific medial context. In fact, the notion of “realism” in binaural engineering usually translates to the reproduction fidelity of ear signals, that is, minimizing their deviation from those in an equivalent real-world situation (Sunder, He, Tan, & Gan, 2015).
For the case study “Parisflâneur,” there is obviously no such corresponding real-world situation. At best, the virtual scene may be regarded as a collection of omnidirectional loudspeakers playing back monaural field recordings. However, experiencing the virtual scene in Parisflâneur does not imply the imagination of seven loudspeakers in space but the successful development of a mental map comprising abstract sound objects. Credibility, or, inversely put, suspension of disbelief, is achieved when the listener engages with the virtual scene such that interaction with its objects becomes possible. This notion of immersion is based on culturally established acousmatic experience of sounds disembodied from their origins. Nevertheless, such abstract sound objects may gain physical presence without a correspondence to physical reality when supported by a narrative.

8 Conclusion

In this article, we presented an integral take on binaural audio technology from an artistic research perspective. The project has been carried out in the area of interactive, sound-based installation art. A case study has been introduced whose artistic aims and implementation details have been thoroughly described. In particular, interactive elements of the installation have been analyzed for subsequent evaluation.

The complexity of developing formal evaluation methods for aspects of artistic works has been demonstrated. Based on an extensive literature review, appropriate methods from related areas were adopted and refined for the intended evaluation task. It turned out that such efforts frame an intensive, fruitful process for both the artists and the researchers and yield valuable material for further theoretical reflections and artistic practice.

Finally, a substantially different conceptual take on the case study and its realization is documented as the artist’s reaction to the evaluation and the reflection process influenced by it.
Changes to the implementation and the interaction model are described in detail, and major aspects are discussed.

The project exemplifies that the separation of technical engineering and so-called “content” production as currently widespread may be inappropriate. Depending on the context, this finding may apply to areas other than binaural audio as well, whenever media is not regarded as a mere container for conveying an independent subject matter but as part of an integral aesthetic experience. Consequently, close connections between scholarly and artistic research as well as engineering pose a promising lead for a further advance in transdisciplinary collaboration.

Acknowledgments

We would like to thank the members of the Klangräume project team, David Pirrò, Stefan Reichmann, Marian Weger; and the institutional partners, the Institute of Electronic Music and Acoustics Graz, University of Applied Sciences FH Joanneum Graz and ESC Media Art Lab Graz, Austria. Klangräume has been funded as part of the programme Exciting Science and Social Innovations of Zukunftsfonds Steiermark (Funds for the future development of the region of Styria, Austria). The case study described in Section 4 was initially conceived during two short-term scientific missions by Martin Rumori, funded by the Sonic Interaction Design European COST action (COST IC0601, Rocchesso, 2011). The editors and reviewers of Presence: Teleoperators and Virtual Environments provided valuable and much appreciated suggestions for the improvement of earlier revisions of this article.

References

Alexander, R. (2000). The inventor of Stereo: The life and works of Alan Dower Blumlein. Waltham, MA: Focal Press.
Alon, D. L., Sheaffer, J., & Rafaely, B. (2015). Robust plane-wave decomposition of spherical microphone array recordings for binaural sound reproduction. The Journal of the Acoustical Society of America, 138(3).
Bilda, Z., Costello, B., & Amitani, S. (2006). Collaborative analysis framework for evaluating interactive art experience. CoDesign, 2(4), 225–238.
Bilda, Z., Edmonds, E., & Candy, L. (2008). Designing for creative engagement. Design Studies, 29(6), 525–540.
Bimber, O., & Raskar, R. (2005). Spatial augmented reality. Natick, MA: A K Peters.
Bishop, C. (2005). Installation art. London: Tate Publishing.
Bleidt, R., Borsum, A., Fuchs, H., & Weiss, S. M. (2014). Object-based audio: Opportunities for improved listening experience and increased listener involvement. Proceedings of SMPTE Annual Technical Conference & Exhibition, 1–20.
Boehner, K., Sengers, P., & Warner, S. (2008). Interfaces with the ineffable: Meeting aesthetic experience on its own terms. ACM Transactions on Computer–Human Interaction, 15(3), 12:1–12:29.
Borgdorff, H. (2006). The debate on research in the arts (Sensuous Knowledge No. 2). Bergen: Bergen Academy of Art and Design.
Bregman, A. S. (1990). Auditory scene analysis. The perceptual organization of sound. Cambridge, MA/London: MIT Press.
Bronkhorst, A. W., Veltman, J. A., & van Breda, L. (1996). Application of a three-dimensional auditory display in a flight task. Human Factors, 38(1), 23–33.
Candy, L., Amitani, S., & Bilda, Z. (2006). Practice-led strategies for interactive art research. CoDesign, 2(4), 209–223.
Candy, L., Edmonds, E., & Ascott, R. (2011). Interacting: Art, research and the creative practitioner. Oxfordshire: Libri Pub.
Candy, L., & Ferguson, S. (2014). Interactive experience in the digital age: Evaluating new art practice. Berlin/Heidelberg: Springer.
Coleridge, S. T. (1898). Biographia literaria or biographical sketches of my literary life and opinions and two lay sermons. London: George Bell and Sons.
Eckel, G. (2001). The vision of the LISTEN project. Proceedings of the 7th International Conference on Virtual Systems and Multimedia, 393–396.
Edmonds, E. (2010). The art of interaction. In Create 10 (n.p.). Retrieved from http://www.bcs.org/upload/pdf/ewic_create10_keynote3.pdf
Edmonds, E., Bilda, Z., & Muller, L. (2009). Artist, evaluator and curator: Three viewpoints on interactive art, evaluation and audience experience. Digital Creativity, 20(3), 141–151.
Ettlinger, O. (2008). The architecture of virtual space. Ljubljana: University of Ljubljana.
Farina, A. (2000). Simultaneous measurement of impulse response and distortion with a swept-sine technique. Proceedings of Audio Engineering Society Convention, 108, 1–23.
Franinović, K., & Salter, C. (2013). The experience of sonic interaction. In K. Franinović & S. Serafin (Eds.), Sonic interaction design (pp. 39–75). Cambridge, MA/London: MIT Press.
Frayling, C. (1993). Research in art and design. Royal College of Art Research Papers, 1(1), 1–5.
Gaver, B. (2002). Designing for homo ludens. I3 Magazine, 2–6.
Geier, M., & Spors, S. (2012). Spatial audio reproduction with the SoundScape Renderer. Proceedings of 27th Tonmeistertagung – VDT International Convention, 646–655.
Gilkey, R., & Anderson, T. R. (Eds.). (2015). Binaural and spatial hearing in real and virtual environments. Hove: Psychology Press.
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Hawthorne, NY: Aldine de Gruyter.
Grau, O. (2003). Virtual art. From illusion to immersion. Cambridge, MA/London: MIT Press.
Höök, K., Sengers, P., & Andersson, G. (2003). Sense and sensibility: Evaluation and interactive art. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 5, 241–248.
Jensen, J. F. (1998). Interactivity. Tracking a new concept in media and communication studies. Nordicom Review, 19(1), 185–204.
Kane, B. (2015). Sound unseen. Acousmatic sound in theory and practice. Oxford: Oxford University Press.
Krebs, S. (2016). The failure of binaural stereo: German sound engineers and the introduction of dummy head microphones. Kunstkopf Stereophony. Failure and Success of Dummy Head Recording: An Innovation History of 3D Listening. Retrieved from https://binauralrecording.wordpress.com/2016/08/03/the-failure-of-binaural-stereo-german-sound-engineers-and-the-introduction-of-dummy-head-microphones/
Lentz, T., Assenmacher, I., Vorländer, M., & Kuhlen, T. (2006). Precise near-to-head acoustics with binaural synthesis. Journal of Virtual Reality and Broadcasting, 3(2).
Lindau, A., Klemmer, M., & Weinzierl, S. (2008). Zur binauralen Simulation verteilter Schallquellen [On binaural simulation of distributed sound sources]. Proceedings of the 34th DAGA, 897–898.
Liu, Q., Wang, W., Jackson, J. B., & Cox, T. J. (2015). A source separation evaluation method in object-based spatial audio. Proceedings of European Signal Processing Conference, 1088–1092.
Marentakis, G., Pirrò, D., & Kapeller, R. (2014). Zwischenräume – A case study in the evaluation of interactive sound installations. Proceedings of the Joint International Computer Music/Sound and Music Computing Conferences, 277–284.
Marentakis, G., Pirrò, D., & Weger, M. (2017). Creative evaluation. Proceedings of the 2017 Conference on Designing Interactive Systems, 853–864.
Morrison, A., Mitchell, P., & Brereton, M. (2007). The lens of ludic engagement: Evaluating participation in interactive art installations. MultiMedia 2007, 509–512.
Muller, L., & Edmonds, E. (2006). Living laboratories: Making and curating interactive art. SIGGRAPH 2006 Electronic Art and Animation, 147–150. Retrieved from http://siggraph.org/artdesign/gallery/S06/paper2.pdf
Muller, L., Edmonds, E., & Connell, M. (2006). Living laboratories for interactive art. CoDesign, 2(4), 195–207.
Niklas, S. (2014). Die Kopfhörerin. Mobiles Musikhören als ästhetische Erfahrung [The headphone listener. Mobile music listening as aesthetic experience]. Paderborn: Fink.
Noisternig, M., Sontacchi, A., Musil, T., & Höldrich, R. (2003). A 3D Ambisonic based binaural sound reproduction system. Proceedings of the 24th Audio Engineering Society Conference, 1–5.
Novo, P. (2005). Auditory virtual environments. In J. Blauert (Ed.), Communication acoustics (pp. 277–297). Berlin/Heidelberg: Springer.
Paine, G. (2002). Interactivity, where to from here? Organised Sound, 7(3), 295–304.
Paul, S. (2009). Binaural recording technology: A historical review and possible future developments. Acta Acustica united with Acustica, 95, 767–788.
Pauli, H. (1971). Für wen komponieren Sie eigentlich? [For whom do you actually compose?]. Frankfurt: Fischer.
Reck, H. U. (2007). The myth of media art. Weimar: VDG.
Reiss, J. H. (1999). From margin to center. The spaces of installation art. Cambridge, MA/London: MIT Press.
Rocchesso, D. (2011). Explorations in sonic interaction design. Berlin: Logos.
Rumori, M. (2017a). Binaural floss – Exploring media, immersion, technology. Proceedings of the International Linux Audio Conference, 13–20.
Rumori, M. (2017b). Space and body in sound art: Artistic explorations in binaural audio augmented environments. In C. Wöllner (Ed.), Body, sound and space in music and beyond (pp. 235–256). London: Routledge.
Rumori, M., & Hollerweger, F. (2013). Production and application of room impulse responses for multichannel setups using FLOSS tools. Proceedings of the International Linux Audio Conference, 125–132.
Rumori, M., Hollerweger, F., & Cabrera, A. (2010). Binaural room impulse responses for composition, documentation, virtual acoustics and audio augmented environments. Proceedings of 26th Tonmeistertagung – VDT International Convention, 670–679.
Ryan, M. L. (2001). Narrative as virtual reality. Immersion and interactivity in literature and electronic media. Baltimore/London: Johns Hopkins University Press.
Schärer, Z., & Lindau, A. (2009). Evaluation of equalization methods for binaural signals. Proceedings of the Audio Engineering Society Convention, 126, 1–17.
Sengers, P., & Gaver, B. (2006). Staying open to interpretation: Engaging multiple meanings in design and evaluation. Proceedings of the 6th Conference on Designing Interactive Systems, 99–108.
Sunder, K., He, J., Tan, E. L., & Gan, W. S. (2015). Natural sound rendering for headphones: Integration of signal processing techniques. IEEE Signal Processing Magazine (Special Issue on Signal Processing Techniques for Assisted Listening), 32(2), 110–113.
The SuperCollider book. (2011). Cambridge, MA/London: MIT Press.
von Foerster, H. (2002). Understanding understanding: Essays on cybernetics and cognition. Berlin/Heidelberg: Springer.
Wade, N., & Deutsch, D. (2008). Binaural hearing. Before and after the stethophone. Acoustics Today, 4(3), 16–27.
Warusfel, O., & Eckel, G. (2004). LISTEN. Augmenting everyday environments through interactive soundscapes. Workshop Proceedings of IEEE VR 04 (n.p.). Retrieved from http://resumbrae.com/vr04/warusfel.pdf
Wenzel, E. M., Arruda, M., Kistler, D. J., & Wightman, F. L. (1993). Localization using non-individualized head-related transfer functions. The Journal of the Acoustical Society of America, 94(1), 111–123.

Appendix

Introductory Text to Parisflâneur in the Exhibition

(This text normally accompanies public exhibitions of the installation and was also presented to the participants of the evaluation; see Section 5.2.)

Parisflâneur is an interactive sound environment using binaural rendering and tracked headphones. Seven invisible but audible “sound hats” are arranged in the virtual space which the listeners can find by approaching, turning and listening. The “hats” contain different sound situations recorded in and around the city of Paris. By ducking, the listeners can put on a specific hat, which lets them leave the virtual auditory scene and fully fade into the binaural recording. By ducking again, the attached hat can be put off again and remains at its new position in the virtual scene.

Evaluation Response Sheet

(Translated from German, reproduced on next page.)

The questionnaire was structured using both the semantic differential as well as the Likert scale styles. For each question, we debated which style would have been more appropriate. We used semantic differential for questions which could have also been answered with a number on a scale (e.g., state how difficult was something on a scale from 1 to 7). We decided to give semantic differential scales seven levels of resolution, a typical decision for such scales. We used Likert scales for questions in which we felt that the aforementioned strategy would require visitors to make diverging assumptions. In such cases, we felt that the short textual explanation of each scale point would increase interpretation consistency and have value for both the evaluation and the participants. We went for five levels, and debated whether including a neutral value would be meaningful in all cases. In one case, this was not considered meaningful, which left us with four levels.