Ge Wang
Center for Computer Research in Music
and Acoustics (CCRMA)
Department of Music
Stanford University
660 Lomita Drive
Stanford, California 94305, USA
ge@ccrma.stanford.edu

Ocarina: Designing the
iPhone’s Magic Flute

Abstract: Ocarina, created in 2008 for the iPhone, is one of the first musical artifacts in the age of pervasive,
app-based mobile computing. It presents a flute-like physical interaction using microphone input, multi-touch, and
accelerometers—and a social dimension that allows users to listen in to each other around the world. This article
chronicles Smule’s Ocarina as a mobile musical experiment for the masses, examining in depth its design, aesthetics,
physical interaction, and social interaction, as well as documenting its inextricable relationship with the rise of mobile
computing as catalyzed by mobile devices such as the iPhone.

Ocarina for the iPhone was one of the earliest
mobile-musical (and social-musical) apps in
this modern era of personal mobile computing.
Created and released in 2008, it re-envisions an
ancient flute-like clay instrument—the four-hole
“English-pendant” ocarina—and transforms it in
the kiln of modern technology (see Figure 1). It
features physical interaction, making use of breath
input, multi-touch, and accelerometer, as well
as social interaction that allows users to listen
in to each other playing this instrument around
the world, anonymously (in a sort of musical
“voyeurism”), by taking advantage of the iPhone’s
Global Positioning System (GPS) location and its
persistent network connection (see Figure 2). To
date, the Smule Ocarina and its successor, Ocarina 2
(released in 2012), have more than ten million users
worldwide, and Ocarina was a first-class inductee into
Apple’s App Store Hall of Fame. More than five
years after its inception and the beginning of a
new era of apps on powerful smartphones, we look
in depth at Ocarina’s design—both physical and
social—as well as user case studies, and reflect on
what we have learned so far.

When the Apple App Store launched in 2008—one
year after the introduction of the first iPhone—few
could have predicted the transformative effect app-
mediated mobile computing would have on the
world, ushering in a new era of personal computing
and new waves of designers, developers, and even
entire companies. In 2012 alone, 715 million

Computer Music Journal, 38:2, pp. 8–21, Summer 2014
doi:10.1162/COMJ_a_00236
© 2014 Massachusetts Institute of Technology.

new units of smartphones were sold worldwide.
Meanwhile, in Apple’s App Store, there are now
over one million distinct apps spanning dozens
of categories, including lifestyle, travel, games,
productivity, and music. In the humble, early days
of mobile apps, however, there were far fewer
(on the order of a few thousand) apps. Ocarina
was one of the very first musical apps. It was
designed to be an expressive musical instrument,
and represents perhaps the first mass-adopted,
social-mobile musical instrument.

Origins and Related Works

The ingredients in creating such an artifact can be
traced to interactive computer music software such
as the ChucK programming language (Wang 2008),
which runs in every instance of Ocarina, laptop
orchestras at Princeton University and Stanford
University (Trueman 2007; Wang et al. 2008, 2009a),
and the first mobile phone orchestra (Wang, Essl,
and Penttinen 2008, 2014; Oh et al. 2010), utilizing
research from 2003 until the present. These works
helped lead to the founding of the mobile-music
startup company Smule (Wang et al. 2009b; Wang
2014, 2015), which released its first apps in summer
2008 and, at the time of this writing (in 2013), has
reached over 100 million users.

More broadly, much of this was inspired and
informed by research on mobile music, which
was taking place in computer music and related
communities well before critical mass adoption of
an app-driven mobile device like the iPhone.

Figure 1. Ocarina for the iPhone. The user blows
into the microphone to articulate the sound,
multi-touch is used to control pitch, and
accelerometers control vibrato.

Figure 2. As counterpoint to the physical
instrument, Ocarina also presents a social
interaction that allows users to listen in,
surreptitiously, to others playing Ocarina around
the world, taking advantage of GPS location and
cloud-based networking.

Reports on an emerging community of mobile
music and its potential can be traced back to
2004 and 2006 (Tanaka 2004; Gaye et al. 2006).
The first sound synthesis on mobile phones was
documented by projects such as PDa (Geiger 2003),
Pocket Gamelan (Schiemer and Havryliv 2006),
and Mobile STK (Essl and Rohs 2006). The last of
these was a port of Perry Cook and Gary Scavone’s
Synthesis Toolkit to the Symbian OS platform,
and was the first programmable framework for
parametric sound synthesis on mobile devices.
More recently, Georg Essl, the author, and Michael
Rohs outlined a number of developments and
challenges in considering mobile phones as musical
performance platforms (Essl, Wang, and Rohs 2008).
Researchers have explored various sensors on
mobile phones for physical interaction design. It
is important to note that, although Ocarina ex-
plored a number of new elements (physical elements
and social interaction on a mass scale), the con-
cept of blowing into a phone (or laptop) has been
documented in prior work. In the Princeton Lap-
top Orchestra classroom of 2007, Matt Hoffman
created an instrument and piece for “unplugged”
(i.e., without external amplification) laptops, called
Breathalyzer, which required performers to blow
into the microphone to expressively control audio
synthesis (Hoffman 2007). Ananya Misra, with Essl
and Rohs, conducted a series of experiments that
used the microphone for mobile music performance
(including breath input, combined with camera
input; see Misra, Essl, and Rohs 2008). As far as
we know, theirs was the first attempt to make a
breath-mediated, flute-like mobile phone interface.
In addition, Essl and Rohs (2007) documented sig-
nificant exploration in combining audio synthesis,
accelerometer, compass, and camera in creating
purely on-device (i.e., no laptop) musical interfaces,
collectively called ShaMus.

Location and global positioning play a significant
role in Ocarina. This notion of “locative media,” a
term used by Atau Tanaka and Lalya Gaye (Tanaka
and Gemeinboeck 2006), has been explored in various
installations, performances, and other projects.
These include Johan Wagenaar’s Kadoum, in which
GPS sensors reported heart-rate information from
24 participants in Australia to an art installation on
a different continent. Gaye, Mazé, and Holmquist
(2003) explored locative media in Sonic City with
location-aware body sensors. Tanaka et al. have
pioneered a number of projects on this topic,
including Malleable Mobile Music and Net Dérive,
the latter making use of a centralized installation
that tracked and interacted with geographically
diverse participants (Tanaka and Gemeinboeck
2008).

Lastly, the notion of using mobile phones for
musical expression in performance can be traced
back to Golan Levin’s Dialtones (Levin 2001),
perhaps the earliest concert concept that used the
audience’s mobile phones as the centerpiece of a
sustained live performance. More recently, the
aforementioned Stanford Mobile Phone Orchestra
was formed in 2007 as the first ensemble of its kind.
The Stanford Mobile Phone Orchestra explored a
more mobile, locative notion of “electronic chamber
music” as pioneered by the Princeton Laptop
Orchestra (Trueman 2007; Smallwood et al. 2008;
Wang et al. 2008) and the Stanford Laptop Orchestra
(Wang et al. 2009a), and also focused on various
forms of audience participation in performance (Oh
and Wang 2011). Since 2008, mobile music has
entered into the curriculum at institutions such
as Stanford University, University of Michigan,
Princeton University, and the California Institute
of the Arts, exploring various combinations of live
performance, instrument design, social interaction,
and mobile software design.

Physical Interaction Design Process

The design of Ocarina took place in the very early
days of mobile apps, and was, by necessity, an
experiment, which explored an intersection of
aesthetics, physical interaction design, and multiple
modalities in sound, graphics, and gesture.

“Inside-Out Design”

Why an ocarina?

If one were to create a musical instrument on a
powerful mobile device such as the iPhone, why not
a harpsichord, violin, piano, drums, or something
else—anything else?

The choice to create an ocarina started with
the iPhone itself—by considering its very form
factor while embracing its inherent capabilities
and limitations. The design aimed to use only the
existing features without hardware add-ons—and to
use these capabilities to their maximum potential.
For one, the iPhone was about the physical size
of a four-hole ocarina. In addition, the hardware
and software capabilities of the iPhone naturally
seemed to support certain physical interactions that
an ocarina would require: microphone for breath
input, up to 5-point multi-touch (quite enough for a
four-hole instrument), and accelerometers to map to
additional expressive dimensions (e.g., vibrato rate
and depth). Furthermore, additional features on the
device, including GPS location and persistent data
connectivity, beckoned for the exploration of a new
social interaction. Working backwards or “inside-
out” from these features and constraints, the design
suggested the ocarina, which fit the profile in terms
of physical interaction and as a promising candidate
for social experimentation.

Physical Aesthetics

From an aesthetic point of view, the instrument
aspect of Ocarina was rigorously designed as a
physical artifact. The visual presentation consists
only of functional elements (such as animated
finger holes, and breath gauge in Ocarina 2) and
visualization elements (animated waves or ripples
in response to breath). In doing so, the statement
was not “this simulates an ocarina,” but rather
“this is an ocarina.” There are no attempts to adorn
or “skin” the instrument, beyond allowing users
to customize colors, further underscoring that the
physical device is the enclosure for the instrument.
Even the naming of the app reflects this design
thinking, deliberately avoiding the common early
naming convention of prepending app names with
the lowercase letter “i” (e.g., iOcarina). Again,
it was a statement of what this app is, rather than
what it is trying to emulate.

This design approach also echoed that of a certain
class of laptop orchestra instruments, where the
very form factor of the laptop is used to create
physical instruments, embracing its natural benefits
and limitations (Fiebrink, Wang, and Cook 2007).
This shifted the typical screen-based interaction
to a physical interaction, in our corporeal world,
where the user engages the experience with palpable
dimensions of breath, touch, and tilt.

Physical Interaction

The physical interaction design of Ocarina takes ad-
vantage of three onboard input sensors: microphone
for breath, multi-touch for pitch control, and ac-
celerometers for vibrato. In addition, Ocarina uses
two output modalities: audio and real-time graph-
ical visualization. The original design schematics
that incorporated these elements can be seen in
Figure 3. The intended playing method of Ocarina
asks the user to “hold the iPhone as one might a
sandwich,” supporting the device with thumbs and
ring fingers, putting the user in position to blow into
the microphone at the bottom of the device, while
also freeing up both index fingers and both middle
fingers to hold down different combinations of the
four onscreen finger holes.

Figure 3. Initial physical interaction design
schematic.

Breath

The user articulates Ocarina literally by blowing
into the phone, specifically into the onboard mi-
crophone. Inside the app, a ChucK program tracks
the amplitude of the incoming microphone signal in
real time, and an initial amplitude envelope is cal-
culated using a leaky integrator, implemented as a
one-pole feedback filter (the actual filter parame-
ter was determined empirically; later versions of
Ocarina actually contained a table of device-specific
gains to further compensate for variation across
device generations). The initial breath signal is con-
ditioned through additional filters tuned to balance
between responsiveness and smoothness, and is
then fed into the Ocarina’s articulator (including
a second envelope generator), which controls the
amplitude of the synthesized Ocarina signal. The
signal resulting from air molecules blown into the
microphone diaphragm has significantly higher en-
ergy than speech and ambient sounds, and naturally
distinguishes between blowing interactions and
other sounds (e.g., typical speech).
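
To make the signal flow concrete, here is a minimal ChucK sketch of a
leaky-integrator breath tracker of the kind described above; the filter pole
and the blowing threshold are illustrative assumptions, not Ocarina's
actual, device-specific values.

    // Minimal sketch of a leaky-integrator breath tracker (assumed values;
    // not Ocarina's actual parameters). The microphone is rectified and
    // smoothed by a one-pole feedback filter, and the result is compared
    // against a threshold that blowing exceeds but typical speech does not.
    adc => FullRect rect => OnePole leaky => blackhole;
    0.999 => leaky.pole;           // "leakiness" of the integrator (assumed)
    0.05 => float breathThreshold; // hypothetical blowing-vs-speech threshold

    while (true)
    {
        if (leaky.last() > breathThreshold)
            <<< "breath detected, envelope:", leaky.last() >>>;
        10::ms => now;             // poll the envelope at a modest rate
    }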

Real-Time Graphics

There are two real-time graphical elements that
respond to breath input. Softly glowing ripples
smoothly “wash over” the screen when significant
breath input is being detected, serving both as
visual feedback to breath interaction and as
an aesthetic element of the visual presentation. In
the more recent Ocarina 2, an additional graphical
element visualizes the intensity of the breath input:
Below an internal breath threshold, the visualization
points out the general region to apply breath; above
the threshold, an aurora-like light gauge rises and
falls with the intensity of the breath input.
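
A sketch of that two-state logic, with an assumed threshold and stand-in
graphics routines (not Ocarina 2's actual drawing code), might look like this:

    // Hypothetical sketch of the two-state breath visualization described
    // above; the threshold and the graphics routines are stand-ins.
    0.02 => float vizThreshold;

    fun void showBlowHint() { /* point out the region to blow into */ }
    fun void showBreathGauge(float level) { /* gauge rises and falls with breath */ }

    fun void updateBreathViz(float breathEnv)
    {
        if (breathEnv < vizThreshold) showBlowHint();
        else showBreathGauge(breathEnv);
    }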

Multi-Touch Interaction and Animation

Multi-touch is used to detect different combinations
of tone holes held by the user’s fingers. Modeled after
a four-hole English-pendant acoustic ocarina, le
mobile phone instrument provides four independent,
virtual finger holes, resulting in a total of 16 different
fingerings. Four real-time graphical finger holes are
visualized onscreen. They respond to touch gestures
in four quadrants of the screen, maximizing the
effective real estate for touch interaction. The finger
holes respond graphically to touch: They grow and
shrink to reinforce the interaction, and to help
compensate for lack of tactility. Although the touch
screen provides a solid physical object to press
against, there is no additional tactile information
regarding where the four finger holes are. The real-
time visualization aims to mitigate this missing
element by subtly informing the user of the current
fingering. This design also helps first-time users
to learn the basic interaction of the instrument by
simply playing around with it—Ocarina actually
includes a built-in tutorial, but providing more “on-
the-fly” cues to novices seemed useful nonetheless.
A nominal pitch mapping for Ocarina can be seen in
Figure 4, including extended pitch mappings beyond
those found on an acoustic four-hole ocarina.
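
As an illustration of the quadrant-based touch handling, the following
hypothetical ChucK sketch combines active touches into a 4-bit fingering
mask (16 possible values); the hole-to-quadrant assignment and the function
names are assumptions, not Ocarina's actual mapping.

    // Hypothetical sketch: map each active touch (normalized 0..1 screen
    // coordinates) to one of four finger holes by screen quadrant, and
    // combine all touches into a 4-bit fingering mask.
    fun int holeForTouch(float x, float y)
    {
        0 => int hole;
        if (x >= 0.5) 1 +=> hole;   // right half of the screen
        if (y >= 0.5) 2 +=> hole;   // lower half of the screen (by convention)
        return hole;                // quadrant index 0..3
    }

    fun int fingeringMask(float xs[], float ys[])
    {
        0 => int mask;
        for (0 => int i; i < xs.size(); i++)
            mask | (1 << holeForTouch(xs[i], ys[i])) => mask;
        return mask;
    }

    // example: two touches, one in each left-side quadrant
    [0.2, 0.3] @=> float xs[];
    [0.2, 0.8] @=> float ys[];
    <<< "fingering mask:", fingeringMask(xs, ys) >>>;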

Accelerometers

Accelerometers are mapped to two parameters
of synthesized vibrato. This mapping offers an
additional, independent channel of expressive
control, and further encourages physical movement
with Ocarina. For example, the user can lean
forward to apply vibrato, perhaps inspired by the
visuel, performative gestures of brass and woodwind
players when expressing certain passages. The front-
to-back axis of the accelerometer is mapped to
vibrato depth, ranging from no vibrato—when the
device is flat—to significant vibrato when the device
is tilted forward (e.g., the screen is facing away
from the player). A secondary left-to-right mapping
allows the more seasoned player to control vibrato
rate, varying linearly between 2 Hz from one side to
10 Hz on the opposite side (the vibrato is at 6 Hz in its
non-tilted center position). Such interaction offers
“one-order higher” expressive parameters, akin to
expression control found on MIDI keyboards. In
practice, it is straightforward to apply vibrato in
Ocarina to adorn passages, and the mechanics also
allow subtle variation of vibrato for longer notes.
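
A hedged ChucK sketch of this tilt-to-vibrato mapping, assuming each
accelerometer axis has been normalized to the range -1 to 1:

    // Hypothetical sketch of the tilt-to-vibrato mapping described above,
    // assuming each accelerometer axis is normalized to the range -1..1.
    fun float vibratoDepth(float tiltFrontBack)
    {
        // no vibrato when the device is flat, increasing as it tilts forward
        return Math.max(0.0, tiltFrontBack);
    }

    fun float vibratoRateHz(float tiltLeftRight)
    {
        // linear between 2 Hz and 10 Hz, centered at 6 Hz when untilted
        return 6.0 + 4.0 * tiltLeftRight;
    }

    <<< "rate (Hz):", vibratoRateHz(0.0), "depth:", vibratoDepth(0.5) >>>;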

Sound Synthesis

Audio output in Ocarina is synthesized in real
time in a ChucK program that includes the afore-
mentioned amplitude tracker and articulator. Le
synthesis itself is straightforward (the acoustic oca-
rina sound is not complex). The synthesis elements
include a triangle wave, modulated by a second
oscillator (for vibrato), and multiplied against the
amplitude envelope generated by the articulator
situated between Ocarina’s analysis and synthesis
modules. The resulting signal is fed into a reverber-
ator. (A general schematic of the synthesis can be
seen in Figure 5.)
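
The chain can be sketched in a few lines of ChucK; the parameter values
below (vibrato depth, envelope times, reverb mix) are placeholders for
illustration rather than Smule's production settings.

    // Sketch of the synthesis chain described above (placeholder values):
    // a triangle-wave carrier with an LFO-driven vibrato, gated by an ADSR,
    // scaled by the breath envelope, low-pass filtered, and reverberated.
    SinOsc lfo => blackhole;                  // vibrato LFO
    TriOsc carrier => ADSR env => OnePole lp => NRev rev => dac;
    6.0 => lfo.freq;                          // nominal vibrato rate
    0.99 => lp.pole;                          // gentle low-pass smoothing
    0.1 => rev.mix;
    env.set(10::ms, 10::ms, 0.9, 60::ms);     // on/off articulation envelope

    440.0 => float basePitch;                 // pitch chosen by the fingering
    0.8 => float breathEnv;                   // breath envelope (from the tracker)
    5.0 => float vibratoDepthHz;              // from the accelerometer mapping

    env.keyOn();                              // note articulated by breath onset
    while (true)
    {
        basePitch + vibratoDepthHz * lfo.last() => carrier.freq;
        breathEnv => carrier.gain;            // breath envelope scales amplitude
        1::ms => now;
    }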

Figure 4. Pitch mappings for C Ionian. Five
additional pitch mappings not possible in
traditional four-hole ocarinas are denoted with
dotted outline.

Figure 5. Ocarina’s general sound synthesis scheme
as implemented in ChucK. (Unit generators shown:
SinOsc vibrato LFO, TriOsc carrier, ADSR on/off
envelope, OnePole and Step envelope units, OnePole
low-pass filter, and NRev reverberator.)

The acoustic ocarina produces sound as a
Helmholtz resonator, and the size of the finger
holes are carefully chosen to affect the amount
of total uncovered area as a ratio to the enclosed
volume and thickness of the ocarina—this relation-
ship directly affects the resulting frequency. Le
pitch range of an acoustic four-hole English-pendant
ocarina is typically one octave, the lowest note
played by covering all four finger holes, et le
highest played by uncovering all finger holes. Some
chromatic pitches are played by partially covering
certain holes. No longer coupled to the physical
parameters, the digital Ocarina offers precise in-
tonation for all pitches, extended pitch mapping,
and additional expressive elements, such as vibrato
and even portamento in Ocarina 2. The tuning is
not fixed; the player can choose different root keys
and diatonic modes (Ionian, Dorian, Phrygian, etc.),
offering multiple pitch mappings (see Figure 6).

Figure 6. Initial option screen design, allowing
users to name their instrument (for social
interaction), change key and mode, and make simple
customizations to the instrument’s appearance.
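
As a rough illustration of how a chosen root key and diatonic mode
determine the sounded pitch, here is a hypothetical ChucK sketch; it
rotates the Ionian pattern and does not reproduce Ocarina's actual
fingering-to-degree table (Figure 4).

    // Hypothetical sketch: compute the frequency of a scale degree from a
    // chosen root key and diatonic mode by rotating the Ionian pattern.
    [0, 2, 4, 5, 7, 9, 11] @=> int ionian[];  // semitone offsets of Ionian

    fun float degreeToFreq(int rootMidi, int degree, int mode)
    {
        // offset of this degree within the rotated (modal) scale
        ionian[(degree + mode) % 7] - ionian[mode] => int offset;
        if (offset < 0) 12 +=> offset;        // wrap into the octave
        return Std.mtof(rootMidi + offset);
    }

    // example: the third scale degree of D Dorian (root MIDI note 62)
    <<< degreeToFreq(62, 2, 1) >>>;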

The app even contains a newly invented (i.e.,
rather apocryphal) “Zeldarian” mode, where the
pitches are mapped to facilitate the playing of a
single melody: The Legend of Zelda theme song. In
popular culture, the Nintendo 64 video game The
Legend of Zelda: Ocarina of Time (1998) may be
the most prominent and enduring reference to the
acoustic ocarina. In this action-adventure game, the
protagonist, Link, must learn to play songs on an
in-game ocarina with magical powers to teleport
through time. The game is widely considered to be
in the pantheon of greatest video games (Wikipedia
2013), and for that reason continues to endure and
delight—and continues to introduce the ocarina
to new generations of gamers (so effectively that
apparently a portion of the population mistakenly
believes the ocarina is a purely fictional instrument
that exists only in the mythical in-game realm of Hyrule).
In any case, there is a sense of magic associated with
the ocarina, something that the design of Ocarina
aimed to capture. After all, isn’t hinting at magic a
powerful way to hide technology, while encouraging
users to focus on the experience?

Figure 7. A typical tablature in Ocarina’s online
songbook database, populated with content from the
user community (shown: Twinkle Twinkle Little
Star, traditional; root C, Ionian mode).

Figure 8. Ocarina 2 provides a teaching mode that
shows the next three fingerings for any particular
song (from center and up). This mode also provides
basic harmony accompaniment that follows the
user’s melody playing.

Incorporating Expressive Game-Like Elements

In Ocarina, users learn to play various melodies via a
Web site specially crafted for users to share tablatures
for the iPhone-based instrument (Hamilton, Smith,
and Wang 2011). Each tablature shows a suggested
root key, mode, and sequence of fingerings (see
Figure 7). An editor interface on the Web site allows
users to input and share new tablatures. Through
this site, users are able to search and access over
5,000 user-generated Ocarina tablatures; during
peak usage the site had more than a million hits per
month. Users would often display the tablature on
a second computer (e.g., their laptop), while using
their iPhone to play the music. This is reminiscent
of someone learning to play a recorder while reading
music from a music stand—only here, the physical
instrument is embodied by the mobile phone, and
the computer has become both score and music
stand.

A sequel to Ocarina was created and released in
2012, called Ocarina 2 (abbreviated as O2—alluding
to the oxygen molecule and the breath interaction
needed for the app). Inspired by the success of the
Web-based tablatures, Ocarina 2’s most significant
new core features are (1) a game-like “songbook
mode” that teaches players how to play songs
note by note and (2) a dynamic harmony-rendering
engine that automatically accompanies the player. In
addition, every color, animation, spacing, sound, and
graphical effect was further optimized in Ocarina 2.
For a given song in Ocarina 2, an onscreen
queue of ocarina fingerings shows the next note to
play, as well as two more fingerings beyond that (see
Figure 8). The player is to hold the right combination
of finger holes onscreen, and articulate the note by
blowing—the Ocarina 2 songbook engine detects
these conditions, and advances to the next note. It is
important to emphasize there are no time or tempo
restrictions in this mode—players are generally free
to hold each note as long as they wish (and apply
dynamics and vibrato as desired), and furthermore
they are encouraged to play at their own pace. In
essence, this songbook mode follows the player,
not the other way around. The design aims to both
provide a more natural and less stressful experience
to learn, and also to leave as much space as possible
for open expression. The player is responsible for
tempo and tempo variations, articulation (and co-
articulation of multi-note passages), dynamics, et
vibrato. The player is also responsible for the pitch
by holding the correct fingerings as shown, but is
free to embellish by adding notes and even trills.
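
The advance condition can be sketched as follows; the fingering and breath
accessors are hypothetical stubs and the articulation threshold is assumed.

    // Hypothetical sketch of the "follow the player" songbook logic: advance
    // only when the expected fingering is held and a breath articulation is
    // detected; there is no clock or tempo. Accessors are stand-in stubs.
    fun int currentFingering() { return 0; }    // live finger-hole state (stub)
    fun float breathEnvelope() { return 0.0; }  // live breath envelope (stub)

    fun void followSong(int expectedFingering[])
    {
        0.05 => float threshold;                // assumed articulation threshold
        0 => int i;
        while (i < expectedFingering.size())
        {
            if (currentFingering() == expectedFingering[i]
                && breathEnvelope() > threshold)
            {
                i++;
                // wait for the breath to release before arming the next note
                while (breathEnvelope() > threshold) 10::ms => now;
            }
            10::ms => now;
        }
    }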

There is no game-score reward system in
Ocarina 2, though game-like achievements can
be earned. Progress is accumulated per song, via
“breath points” as a general measurement of how
much a user has blown into his or her phone.
Achievements like “Every Breath You Take” (accu-
mulate 300 breath points) can be earned over time.
Probably the most hard-core achievement in Oca-
rina 2 is one called “Lungevity,” which challenges
the user to accumulate 1,000,000 breath points.
By rough estimation, to get this achievement, one
would need to play 500 songs, each 200 times!

Ocarina 2 was an exploration to strike a balance
between an expressive musical artifact (i.e., an
instrument) and a game or toy. The goal is to retain
genuine expressive possibilities while offering game-
like qualities that can drastically reduce the barrier
to entry into the experience. The theory was that
people are much less inhibited and intimidated
by trying something they perceive as a game, in
contrast to a perceived musical instrument—yet,
perhaps the two are not mutually exclusive. It
should be possible to have game-like elements
that draw people in, and even benignly “trick” the
user into being expressive—and, for some, possibly
getting a first-time taste for the joy of making music.

Social Interaction Design

Ocarina is possibly the first-ever massively adopted
instrument that allows its users to hear one another
around the world, accompanied by a visualization
of the world that shows where each musical
snippet originated. After considering the physical
interaction, the design underwent an exercise to use
the additional hardware and software capabilities of
the iPhone to maximum advantage, aimed to enable
a social-musical experience—something that one
could not do with a traditional acoustic ocarina (or
perhaps any instrument). The exercise sought to
limit the design to exactly one social feature, but
then to make that feature as compelling as possible.
(If nothing else, this was to be an interesting and
fun experiment!)

From there, it made sense to consider the device’s
location capabilities—because the phone is, by
definition, mobile and travels in daily life with its
user, and it is always connected to the Internet. The
result was the globe in Ocarina, which allows any
user to anonymously (and surreptitiously) listen in
on potentially any other Ocarina user around the
world (see Figure 9). Users would only be identified
by their location (if they agreed to provide it to the
app), a moniker they could choose for themselves
(e.g., Link123 or ZeldaInRome), and their music
(see Figure 10).

If listeners like what they hear, they can “love”
the snippet by tapping a heart icon. The snippet
being heard is chosen via an algorithm at a central
Ocarina server, and takes into account recency,
popularity (as determined by users via “love”
count), geographic diversity of the snippets, as well
as filter selections by the user. Listeners can choose
to listen to (1) the world, (2) a specific region, (3)
snippets that they have loved, and (4) snippets they
have played. To the author’s knowledge, this type
of social–musical interaction is the first of its kind
and scale, as users have listened to each other over
40 million times on the globe. A map showing the
rough distribution of Ocarina users can be seen in
Figure 11.

How is this social interaction accomplished,
technically? As a user plays Ocarina, an algorithm in
the analysis module decides when to record snippets
as candidates for uploading to a central Ocarina
server, filtering out periods of inactivity, limiting
maximum snippet lengths (this server-controlled
parameter is usually set to 30 seconds), and even
taking into account central server load. When
snippet recording is enabled, the Ocarina engine
rapidly takes snapshots of gestural data, including
current breath-envelope value, finger-hole state,
and tilt from two accelerometers. Through this
process a compact network packet is created,
time-stamped, and geotagged with GPS information,
and uploaded to the central Ocarina server and
database.
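
A plausible shape for such a gesture snapshot and recording loop, sketched
in ChucK; the field names and snapshot rate are assumptions, since Smule's
actual data format is not documented here.

    // Hypothetical sketch of a gesture snapshot record and a recording loop,
    // as described above (field names and snapshot rate are assumptions).
    class GestureFrame
    {
        time timestamp;     // when the snapshot was taken
        float breathEnv;    // current breath-envelope value
        int fingerMask;     // 4-bit finger-hole state
        float tiltX;        // accelerometer tilt (two axes)
        float tiltY;
    }

    GestureFrame frames[0];                 // the snippet being recorded
    30::second => dur maxSnippetLength;     // server-controlled limit (typical)
    now + maxSnippetLength => time stopAt;

    while (now < stopAt)
    {
        GestureFrame f;
        now => f.timestamp;
        // in the real app these would come from the analysis/input modules
        0.0 => f.breathEnv;
        0 => f.fingerMask;
        0.0 => f.tiltX;
        0.0 => f.tiltY;
        frames << f;
        20::ms => now;                      // snapshot rate (assumed)
    }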

During playback in Ocarina’s globe visualization,
the app requests new snippets from the server
according to listener preference (region, popularity,
and other filters). A server-side algorithm identifies
a set of snippets that most closely matches the
desired criteria, and sends back a snippet selected
at random from this matching set. Note that no
audio recording is ever stored on the server—only
gesture information (which is more compact and
potentially richer). The returned snippet is rendered
by the Ocarina app client, feeding the gesture data
recording into the same synthesis engine used for
the instrument, and rendering it into sound in the
visualized globe. The system design of Ocarina,

from physical interaction to cloud-mediated social
interaction, can be seen in Figure 12.

Figure 9. Social interaction design for Ocarina. The
goal was to combine GPS location and data
connectivity into a single social feature.

Figure 10. Listening to the world in Ocarina.

Figure 11. Distribution of the first 2 billion breath
blows around the world.

Figure 12. Ocarina system design, from physical
interaction to social interaction. (Components
shown: breath input, multi-touch, and
accelerometers feeding an envelope generator and
sound synthesis on the device, plus a gesture
recorder/player and network module connecting
over the Internet to central servers and a database
of anonymous user data.)
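
Because a snippet is gesture data rather than audio, client-side playback
amounts to stepping through time-stamped frames and driving the same
synthesis mapping used for live play. A hypothetical sketch, with stand-in
helper functions and an assumed frame layout:

    // Hypothetical playback sketch: step through recorded gesture frames at
    // their original inter-frame timing and drive the synthesis mapping.
    fun void applyFingering(int mask) { /* sets the carrier pitch */ }
    fun void applyBreath(float env) { /* drives the amplitude envelope */ }
    fun void applyTilt(float x, float y) { /* vibrato depth and rate */ }

    fun void renderSnippet(dur dt[], float breath[], int fingers[],
                           float tiltX[], float tiltY[])
    {
        for (0 => int i; i < dt.size(); i++)
        {
            applyFingering(fingers[i]);
            applyBreath(breath[i]);
            applyTilt(tiltX[i], tiltY[i]);
            dt[i] => now;           // honor the original timing
        }
    }

    // example: a two-frame snippet
    [100::ms, 250::ms] @=> dur dt[];
    [0.5, 0.7] @=> float breath[];
    [3, 7] @=> int fingers[];
    [0.0, 0.2] @=> float tx[];
    [0.1, 0.0] @=> float ty[];
    renderSnippet(dt, breath, fingers, tx, ty);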

User Case Studies

Ocarina users have listened in on each other over
40 million times, and somehow created an
unexpected flood of self-expression in their ev-
eryday life. Within a few days of the release of
Ocarina (in November 2008), user-created videos
began surfacing on the Internet in channels such
as YouTube (voir la figure 13). Thousands of videos
showcased everyday users performing on their
iPhone Ocarinas, in living rooms, dorm rooms,
kitchens, holiday parties, on the streets, and many
other settings. Performers vary in age from young
children to adults, and seem to come from all over
the globe. They play many types of music, from
Ode to Joy, video game music (e.g., Legend of Zelda,
Super Mario Bros., Tetris), themes from movies
and television shows (e.g., The X-Files, Star Wars,
Star Trek), to pop and rock music, show tunes,
and folk melodies (e.g., Amazing Grace, Kumbaya,
Shenandoah). Many are solo performances; others
are accompanied by acoustic guitars, piano, and
even other iPhone-based musical instruments.

Figure 13. Ocarina users share their performances
via Internet video.

As an example, one user created a series of videos
in which she plays Ocarina by blowing into the
iPhone with her nose (top left in Figure 13). Ap-
parently, she has a long history of playing nose
flutes, and Ocarina was her latest nasal-musical
experiment. She began with a nose-mediated ren-

dition of Music of the Night and, after this video
gained renown on YouTube, followed up with per-
formances of The Blue Danube (this one played
upside-down to further increase the difficulty), the
Jurassic Park theme, The Imperial March from Star
Wars, and Rick Astley’s Never Gonna Give You Up.
One user braved snowy streets to busk for money
with his iPhone Ocarina and filmed the experience.
Another group of users created a video promoting
tourism in Hungary. Some have crafted video
tutorials to teach Ocarina; others have scripted
and produced original music videos. All of these
represent creative uses of the instrument, some
that even we, its creators, had not anticipated.
There is something about playing Ocarina on one’s
iPhone that seems to overcome the inhibition
of performing, especially in people who are not
normally performers and who don’t typically call
themselves musicians.

It was surprising to see such mass adoption
of Ocarina, in spite of the app’s unique demand on
physically using the iPhone in unconventional ways.
Over the years, one could reasonably surmise that
much of its popularity stemmed from the sheer novelty
and curiosity of playing a flute-like instrument on
a mobile phone, which effectively overcame barriers
to trying a new musical instrument. And if the physical
interaction of Ocarina provoked curiosity through
novelty, the social globe interaction provided
something—perhaps a small sense of wonder—that
was not possible without a mobile, location-aware,
networked computer.

Discussion

Is the app a new form of interactive art? Can an app
be considered art? What might the role of technology
be in inspiring or ushering a large population into
exploring musical expression? Although the mobile
app world has evolved with remarkable speed since
2008, the medium is perhaps still too young to fully
answer these questions. We can ponder, nonetheless.
There are definite limitations to the mobile
phone as a platform for crafting musical expression,
especially in creating an app designed to reach
a wide audience. In a sense, we have to work
with what is available on the device, and nothing
else. We might do our best to embrace the capabili-
ties and limitations, but is that enough? Traditional
instruments are designed and crafted over decades
or even centuries, whereas something like Ocarina
was created in six weeks. Does it even make sense
to compare the two?

On the other hand, alongside limitations lie
possibilities for new interactions—both physical
and social—and new ways to inspire a large
population to be musical. Ocarina affords a sense
of expressiveness. There are moments in Ocarina’s
globe interaction where one might easily forget
the technology, and feel a small, yet nonetheless
visceral, connection with strangers on the other
side of the world. Is that not a worthwhile human
experience, one that was not possible before? The
tendrils of possibility seem to reach out and plant
the seeds for some yet-unknown global community.
Is that not worth exploring?

As a final anecdote, here is a review for Ocarina
(Apple App Store 2008):

This is my peace on earth. I am currently
deployed in Iraq, and hell on earth is an every

day occurrence. The few nights I may have
off I am deeply engaged in this app. The globe
feature that lets you hear everybody else in the
world playing is the most calming art I have
ever been introduced to. It brings the entire
world together without politics or war. C'est
the EXACT opposite of my life—Deployed U.S.
Soldier.

Is Ocarina itself a new form of art? Or is it a toy?
Or maybe a bit of both? These are questions for each
person to decide.

Acknowledgments

This work owes much to the collaboration of many
individuals at Smule, Stanford University, CCRMA,
and elsewhere, including Spencer Salazar, Perry
Cook, Jeff Smith, David Zhu, Arnaud Berry, Mattias
Ljungstrom, Jonathan Berger, Rob Hamilton, Georg
Essl, Rebecca Fiebrink, Turner Kirk, Tina Smith,
Chryssie Nanou, and the Ocarina community.

References

Apple App Store. 2008. “Ocarina.” Available online
at itunes.apple.com/us/app/ocarina/id293053479.
Accessed October 2013.

Essl, G., and M. Rohs. 2006. “Mobile STK for Symbian
OS.” In Proceedings of the International Computer
Music Conference, pp. 278–281.

Essl, G., and M. Rohs. 2007. “ShaMus—A Sensor-Based
Integrated Mobile Phone Instrument.” In Proceedings
of the International Computer Music Conference, pp.
200–203.

Essl, G., G. Wang, and M. Rohs. 2008. “Developments and
Challenges Turning Mobile Phones into Generic Music
Performance Platforms.” In Proceedings of Mobile
Music Workshop, pp. 11–14.

Fiebrink, R., G. Wang, and P. R. Cook. 2007. “Don’t
Forget the Laptop: Using Native Input Capabilities
for Expressive Musical Control.” In Proceedings of
the International Conference on New Interfaces for
Musical Expression, pp. 164–167.

Gaye, L., R. Mazé, and L. E. Holmquist. 2003. “Sonic City:
The Urban Environment as a Musical Interface.” In
Proceedings of the International Conference on New
Interfaces for Musical Expression, pp. 109–115.

Gaye, L., et al. 2006. “Mobile Music Technology: Report
on an Emerging Community.” In Proceedings of
the International Conference on New Interfaces for
Musical Expression, pp. 22–25.

Geiger, G. 2003. “PDa: Real Time Signal Processing
and Sound Generation on Handheld Devices.” In
Proceedings of the International Computer Music
Conference, pp. 283–286.

Hamilton, R., J. Smith, and G. Wang. 2011. “Social
Composition: Musical Data Systems for Expres-
sive Mobile Music.” Leonardo Music Journal 21:
57–64.

Hoffman, M. 2007. “Breathalyzer.” Available online at
smelt.cs.princeton.edu/pieces/Breathalyzer. Accessed
October 2013.

Levin, G. 2001. “Dialtones (a Telesymphony).” Available
online at www.flong.com/projects/telesymphony.
Accessed December 2013.

Misra, A., G. Essl, and M. Rohs. 2008. “Microphone as
Sensor in Mobile Phone Performance.” In Proceedings
of the International Conference on New Interfaces for
Musical Expression, pp. 185–188.

Oh, J., and G. Wang. 2011. “Audience–Participation
Techniques Based on Social Mobile Computing.” In
Proceedings of the International Computer Music
Conference, pp. 665–671.

Oh, J., et al. 2010. “Evolving the Mobile Phone Orchestra.”
In Proceedings of the International Conference on New
Interfaces for Musical Expression, pp. 82–87.

Smallwood, S., et al. 2008. “Composing for Laptop
Orchestra.” Computer Music Journal 32(1):9–25.

Schiemer, G., and M. Havryliv. 2006. “Pocket Gamelan:
Tuneable Trajectories for Flying Sources in Mandala 3
and Mandala 4.” In Proceedings of the International
Conference on New Interfaces for Musical Expression,
pp. 37–42.

Tanaka, A. 2004. “Mobile Music Making.” In Proceedings
of the International Conference on New Interfaces for
Musical Expression, pp. 154–156.

Tanaka, A., and P. Gemeinboeck. 2006. “A Framework for
Spatial Interaction in Locative Media.” In Proceedings
of the International Conference on New Interfaces for
Musical Expression, pp. 26–30.

Tanaka, A., and P. Gemeinboeck. 2008. “Net Dérive: Con-
ceiving and Producing a Locative Media Artwork.” In
G. Goggins and L. Hjorth, eds. Mobile Technologies:
From Telecommunications to Media. London: Rout-
ledge, pp. 174–186.

Trueman, D. 2007. “Why a Laptop Orchestra?” Organised
Sound 12(2):171–179.

Wang, G. 2008. “The ChucK Audio Programming Lan-
guage: A Strongly-Timed and On-the-Fly Environ/
mentality.” PhD Thesis, Princeton University.

Wang, G. 2014. “The World Is Your Stage: Making Music
on the iPhone.” In S. Gopinath and J. Stanyek, eds.
Oxford Handbook of Mobile Music Studies, Volume 2.
Oxford: Oxford University Press, pp. 487–504.

Wang, G. 2015. “Improvisation of the Masses: Anytime,
Anywhere Music.” In G. Lewis and B. Piekut, eds.
Oxford Handbook of Improvisation Studies. Oxford:
Oxford University Press.

Wang, G., G. Essl, and H. Penttinen. 2008. “MoPhO:
Do Mobile Phones Dream of Electric Orchestras?”
In Proceedings of the International Computer Music
Conference, pp. 331–337.

Wang, G., G. Essl, and H. Penttinen. 2014. “Mobile Phone
Orchestra.” In S. Gopinath and J. Stanyek, eds. Oxford
Handbook of Mobile Music Studies, Volume 2. Oxford:
Oxford University Press, pp. 453–469.

Wang, G., et al. 2008. “The Laptop Orchestra as Class-
room.” Computer Music Journal 32(1):26–37.

Wang, G., et al. 2009a. “Stanford Laptop Orchestra
(SLOrk).” In Proceedings of the International Computer
Music Conference, pp. 505–508.

Wang, G., et al. 2009b. “Smule = Sonic Media: An
Intersection of the Mobile, Musical, and Social.” In
Proceedings of the International Computer Music
Conference, pp. 283–286.

Wikipedia. 2013. “The Legend of Zelda: Ocarina of Time.”
Available online at en.wikipedia.org/wiki/
The_Legend_of_Zelda:_Ocarina_of_Time. Accessed
October 2013.
