Ge Wang
Center for Computer Research in Music
and Acoustics (CCRMA)
Department of Music
Stanford University
660 Lomita Drive
Stanford, California 94305, USA
ge@ccrma.stanford.edu
Ocarina: Designing the
iPhone’s Magic Flute
Abstract: Ocarina, created in 2008 for the iPhone, is one of the first musical artifacts in the age of pervasive,
app-based mobile computing. It presents a flute-like physical interaction using microphone input, multi-touch, and
accelerometers—and a social dimension that allows users to listen in to each other around the world. This article
chronicles Smule’s Ocarina as a mobile musical experiment for the masses, examining in depth its design, aesthetics,
physical interaction, and social interaction, as well as documenting its inextricable relationship with the rise of mobile
computing as catalyzed by mobile devices such as the iPhone.
Ocarina for the iPhone was one of the earliest
mobile-musical (and social-musical) apps in
this modern era of personal mobile computing.
Created and released in 2008, it re-envisions an
ancient flute-like clay instrument—the four-hole
“English-pendant” ocarina—and transforms it in
the kiln of modern technology (see Figure 1). It
features physical interaction, making use of breath
input, multi-touch, and accelerometer, as well
as social interaction that allows users to listen
in to each other playing this instrument around
the world, anonymously (in a sort of musical
“voyeurism”), by taking advantage of the iPhone’s
Global Positioning System (GPS) location and its
persistent network connection (see Figure 2). To
date, the Smule Ocarina and its successor, Ocarina 2
(released in 2012), has more than ten million users
worldwide, and was a first class inductee into
Apple’s App Store Hall of Fame. More than five
years after its inception and the beginning of a
new era of apps on powerful smartphones, we look
in depth at Ocarina’s design—both physical and
social—as well as user case studies, and reflect on
what we have learned so far.
When the Apple App Store launched in 2008—one
year after the introduction of the first iPhone—few
could have predicted the transformative effect app-
mediated mobile computing would have on the
world, ushering in a new era of personal computing
and new waves of designers, developers, and even
entire companies. In 2012 alone, 715 million
Computer Music Journal, 38:2, pp. 8–21, Summer 2014
doi:10.1162/COMJ_a_00236
© 2014 Massachusetts Institute of Technology.
new units of smartphones were sold worldwide.
Meanwhile, in Apple’s App Store, there are now
over one million distinct apps spanning dozens
of categories, including lifestyle, travel, games,
productivity, and music. In the humble, early days
of mobile apps, however, there were far fewer
(on the order of a few thousand) apps. Ocarina
was one of the very first musical apps. It was
designed to be an expressive musical instrument,
and represents perhaps the first mass-adopted,
social-mobile musical instrument.
Origins and Related Works
The ingredients in creating such an artifact can be
traced to interactive computer music software such
as the ChucK programming language (Wang 2008),
which runs in every instance of Ocarina, laptop
orchestras at Princeton University and Stanford
University (Trueman 2007; Wang et al. 2008, 2009a),
and the first mobile phone orchestra (Wang, Essl,
and Penttinen 2008, 2014; Oh et al. 2010), utilizing
research from 2003 until the present. These works
helped lead to the founding of the mobile-music
startup company Smule (Wang et al. 2009b; Wang
2014, 2015), which released its first apps in summer
2008 and, at the time of this writing (in 2013), has
reached over 100 million users.
More broadly, much of this was inspired and
informed by research on mobile music, which
was taking place in computer music and related
communities well before critical mass adoption of
an app-driven mobile device like the iPhone.
Reports on an emerging community of mobile
music and its potential can be traced back to
2004 and 2006 (Tanaka 2004; Gaye et al. 2006).
The first sound synthesis on mobile phones was
documented by projects such as PDa (Geiger 2003),
Pocket Gamelan (Schiemer and Havryliv 2006),
and Mobile STK (Essl and Rohs 2006). The last of
these was a port of Perry Cook and Gary Scavone’s
Synthesis Toolkit to the Symbian OS platform,
and was the first programmable framework for
parametric sound synthesis on mobile devices.
More recently, Georg Essl, the author, and Michael
Rohs outlined a number of developments and
challenges in considering mobile phones as musical
performance platforms (Essl, Wang, and Rohs 2008).
Figure 1. Ocarina for the iPhone. The user blows into the microphone to articulate the sound, multi-touch is used to control pitch, and accelerometers control vibrato.
Figure 2. As counterpoint to the physical instrument, Ocarina also presents a social interaction that allows users to listen in, surreptitiously, to others playing Ocarina around the world, taking advantage of GPS location and cloud-based networking.
Researchers have explored various sensors on
mobile phones for physical interaction design. It
is important to note that, although Ocarina explored
a number of new elements (physical elements
and social interaction on a mass scale), the concept
of blowing into a phone (or laptop) has been
documented in prior work. In the Princeton Laptop
Orchestra classroom of 2007, Matt Hoffman
created an instrument and piece for “unplugged”
(i.e., without external amplification) laptops, called
Breathalyzer, which required performers to blow
into the microphone to expressively control audio
synthesis (Hoffman 2007). Ananya Misra, with Essl
and Rohs, conducted a series of experiments that
used the microphone for mobile music performance
(including breath input, combined with camera
input; see Misra, Essl, and Rohs 2008). As far as
we know, theirs was the first attempt to make a
breath-mediated, flute-like mobile phone interface.
Furthermore, Essl and Rohs (2007) documented significant
exploration in combining audio synthesis,
accelerometer, compass, and camera in creating
purely on-device (i.e., no laptop) musical interfaces,
collectively called ShaMus.
Location and global positioning play a significant
role in Ocarina. This notion of “locative media,” a
term used by Atau Tanaka and Lalya Gaye (Tanaka
and Gemeinboeck 2006), has been explored in various
installations, performances, and other projects.
These include Johan Wagenaar’s Kadoum, in which
GPS sensors reported heart-rate information from
24 participants in Australia to an art installation on
a different continent. Gaye, Mazé, and Holmquist
(2003) explored locative media in Sonic City with
location-aware body sensors. Tanaka et al. have
pioneered a number of projects on this topic,
including Malleable Mobile Music and Net Dérive,
the latter making use of a centralized installation
that tracked and interacted with geographically
diverse participants (Tanaka and Gemeinboeck
2008).
Lastly, the notion of using mobile phones for
musical expression in performance can be traced
back to Golan Levin’s Dialtones (Levin 2001),
perhaps the earliest concert concept that used the
audience’s mobile phones as the centerpiece of a
sustained live performance. More recently, the
aforementioned Stanford Mobile Phone Orchestra
was formed in 2007 as the first ensemble of its kind.
The Stanford Mobile Phone Orchestra explored a
more mobile, locative notion of “electronic chamber
music” as pioneered by the Princeton Laptop
Orchestra (Trueman 2007; Smallwood et al. 2008;
Wang et al. 2008) and the Stanford Laptop Orchestra
(Wang et al. 2009a), and also focused on various
forms of audience participation in performance (Oh
and Wang 2011). Since 2008, mobile music has
entered into the curriculum at institutions such
as Stanford University, University of Michigan,
Princeton University, and the California Institute
of the Arts, exploring various combinations of live
performance, instrument design, social interaction,
and mobile software design.
Physical Interaction Design Process
The design of Ocarina took place in the very early
days of mobile apps, and was, by necessity, an
experiment, which explored an intersection of
aesthetics, physical interaction design, and multiple
modalities in sound, graphics, and gesture.
“Inside-Out Design”
Why an ocarina?
If one were to create a musical instrument on a
powerful mobile device such as the iPhone, why not
a harpsichord, violin, piano, drums, or something
else—anything else?
The choice to create an ocarina started with
the iPhone itself—by considering its very form
factor while embracing its inherent capabilities
and limitations. The design aimed to use only the
existing features without hardware add-ons—and to
use these capabilities to their maximum potential.
For one, the iPhone was about the physical size
of a four-hole ocarina. Additionally, the hardware
and software capabilities of the iPhone naturally
seemed to support certain physical interactions that
an ocarina would require: microphone for breath
input, up to 5-point multi-touch (quite enough for a
four-hole instrument), and accelerometers to map to
additional expressive dimensions (e.g., vibrato rate
and depth). Furthermore, additional features on the
device, including GPS location and persistent data
connectivity, beckoned for the exploration of a new
social interaction. Working backwards or “inside-
out” from these features and constraints, the design
suggested the ocarina, which fit the profile in terms
of physical interaction and as a promising candidate
for social experimentation.
Physical Aesthetics
From an aesthetic point of view, the instrument
aspect of Ocarina was rigorously designed as a
physical artifact. The visual presentation consists
only of functional elements (such as animated
finger holes, and breath gauge in Ocarina 2) and
visualization elements (animated waves or ripples
in response to breath). In so doing, the statement
was not “this simulates an ocarina,” but rather
“this is an ocarina.” There are no attempts to adorn
or “skin” the instrument, beyond allowing users
to customize colors, further underscoring that the
physical device is the enclosure for the instrument.
Even the naming of the app reflects this design
thinking, deliberately avoiding the common early
naming convention of prepending app names with
the lowercase letter “i” (e.g., iOcarina). Once again,
it was a statement of what this app is, rather than
what it is trying to emulate.
This design approach also echoed that of a certain
class of laptop orchestra instruments, where the
very form factor of the laptop is used to create
physical instruments, embracing its natural benefits
and limitations (Fiebrink, Wang, and Cook 2007).
This shifted the typical screen-based interaction
to a physical interaction, in our corporeal world,
where the user engages the experience with palpable
dimensions of breath, touch, and tilt.
Physical Interaction
The physical interaction design of Ocarina takes advantage
of three onboard input sensors: microphone
for breath, multi-touch for pitch control, and accelerometers
for vibrato. Additionally, Ocarina uses
two output modalities: audio and real-time graphical
visualization. The original design schematics
that incorporated these elements can be seen in
Figure 3. The intended playing method of Ocarina
asks the user to “hold the iPhone as one might a
sandwich,” supporting the device with thumbs and
ring fingers, putting the user in position to blow into
the microphone at the bottom of the device, while
also freeing up both index fingers and both middle
fingers to hold down different combinations of the
four onscreen finger-holes.
Figure 3. Initial physical interaction design schematic.
Breath
The user articulates Ocarina literally by blowing
into the phone, specifically into the onboard microphone.
Inside the app, a ChucK program tracks
the amplitude of the incoming microphone signal in
real time, and an initial amplitude envelope is calculated
using a leaky integrator, implemented as a
one-pole feedback filter (the actual filter parameter
was determined empirically; later versions of
Ocarina actually contained a table of device-specific
gains to further compensate for variation across
device generations). The initial breath signal is conditioned
through additional filters tuned to balance
between responsiveness and smoothness, and is
then fed into the Ocarina’s articulator (including
a second envelope generator), which controls the
amplitude of the synthesized Ocarina signal. The
signal resulting from air molecules blown into the
microphone diaphragm has significantly higher energy
than speech and ambient sounds, and naturally
distinguishes between blowing interactions and
other sounds (e.g., typical speech).
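The breath tracker described above can be sketched roughly as follows. This is an illustrative Python sketch, not the app's ChucK code: the pole value, the rectification step, the `gain` parameter (standing in for the device-specific gain table), and the blowing threshold are all assumptions chosen for the example, since the article states only that the real parameters were determined empirically.

```python
def breath_envelope(samples, pole=0.99, gain=1.0):
    """Leaky integrator (one-pole feedback filter) over the rectified signal.

    pole and gain are hypothetical values, not the ones used in Ocarina.
    """
    envelope = []
    y = 0.0
    for x in samples:
        # y[n] = (1 - pole) * |gain * x[n]| + pole * y[n - 1]
        y = (1.0 - pole) * abs(gain * x) + pole * y
        envelope.append(y)
    return envelope


def is_blowing(level, threshold=0.2):
    """Breath carries far more energy than speech, so a simple threshold
    (hypothetical value) is enough to separate the two in this sketch."""
    return level > threshold
```

A pole closer to 1 gives a smoother but slower envelope, which is the responsiveness-versus-smoothness trade-off the text mentions.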
Real-Time Graphics
There are two real-time graphical elements that
respond to breath input. Softly glowing ripples
smoothly “wash over” the screen when significant
breath input is being detected, serving both as
visual feedback to breath interaction and as
an aesthetic element of the visual presentation. In
the more recent Ocarina 2, an additional graphical
element visualizes the intensity of the breath input:
Below an internal breath threshold, the visualization
points out the general region to apply breath; above
the threshold, an aurora-like light gauge rises and
falls with the intensity of the breath input.
Multi-Touch Interaction and Animation
Multi-touch is used to detect different combinations
of tone holes held by the user’s fingers. Modeled after
a four-hole English-pendant acoustic ocarina, the
mobile phone instrument provides four independent,
virtual finger holes, resulting in a total of 16 different
fingerings. Four real-time graphical finger holes are
visualized onscreen. They respond to touch gestures
in four quadrants of the screen, maximizing the
effective real estate for touch interaction. The finger
holes respond graphically to touch: They grow and
shrink to reinforce the interaction, and to help
compensate for lack of tactility. Although the touch
screen provides a solid physical object to press
against, there is no additional tactile information
regarding where the four finger holes are. The real-
time visualization aims to mitigate this missing
element by subtly informing the user of the current
fingering. This design also helps first-time users
to learn the basic interaction of the instrument by
simply playing around with it—Ocarina actually
includes a built-in tutorial, but providing more “on-
the-fly” cues to novices seemed useful nonetheless.
A nominal pitch mapping for Ocarina can be seen in
Figure 4, including extended pitch mappings beyond
those found on an acoustic four-hole ocarina.
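As a rough illustration of how four quadrant-sized finger holes yield 16 fingerings, the sketch below maps touch points to a 4-bit mask. The hole numbering, the coordinate convention, and the function names are assumptions made for this example; the actual fingering-to-pitch table is given in Figure 4 and is not reproduced here.

```python
def quadrant_hole(x, y, width, height):
    """Return the hole index (0..3) for a touch, one hole per screen quadrant.

    Index layout (hypothetical): 0 top-left, 1 top-right,
    2 bottom-left, 3 bottom-right.
    """
    col = 0 if x < width / 2 else 1
    row = 0 if y < height / 2 else 1
    return row * 2 + col


def fingering(touches, width, height):
    """Fold all current touch points into a 4-bit mask of covered holes."""
    mask = 0
    for x, y in touches:
        mask |= 1 << quadrant_hole(x, y, width, height)
    return mask


# Four independent holes give 2**4 = 16 distinct fingerings.
ALL_FINGERINGS = list(range(1 << 4))
```

Treating each quadrant as a whole hole, rather than hit-testing the drawn circle, matches the stated goal of maximizing the effective touch area.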
Accelerometers
Accelerometers are mapped to two parameters
of synthesized vibrato. This mapping offers an
additional, independent channel of expressive
control, and further encourages physical movement
with Ocarina. For example, the user can lean
forward to apply vibrato, perhaps inspired by the
visual, performative gestures of brass and woodwind
players when expressing certain passages. The front-
to-back axis of the accelerometer is mapped to
vibrato depth, ranging from no vibrato—when the
device is flat—to significant vibrato when the device
is tilted forward (e.g., the screen is facing away
from the player). A secondary left-to-right mapping
allows the more seasoned player to control vibrato
rate, varying linearly from 2 Hz at one side to
10 Hz on the opposite side (the vibrato is at 6 Hz in its
non-tilted center position). Such interaction offers
“one-order higher” expressive parameters, akin to
expression control found on MIDI keyboards. In
practice, it is straightforward to apply vibrato in
Ocarina to adorn passages, and the mechanics also
allow subtle variation of vibrato for longer notes.
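The tilt-to-vibrato mapping above can be written down in a few lines. This is a minimal sketch under stated assumptions: tilt values are taken to be already normalized (forward tilt in 0..1, sideways tilt in -1..1), the depth scale `max_depth` is hypothetical, and which side corresponds to 2 Hz is an arbitrary choice here. Only the 2 Hz, 6 Hz, and 10 Hz figures come from the text.

```python
def vibrato_from_tilt(forward, sideways, max_depth=1.0):
    """Map normalized tilt to (vibrato depth, vibrato rate in Hz).

    forward:  0.0 = device flat (no vibrato), 1.0 = fully tilted forward.
    sideways: -1.0 = one side (2 Hz), 0.0 = centered (6 Hz), 1.0 = other side (10 Hz).
    """
    forward = min(max(forward, 0.0), 1.0)
    sideways = min(max(sideways, -1.0), 1.0)
    depth = max_depth * forward
    rate_hz = 6.0 + 4.0 * sideways  # linear between 2 Hz and 10 Hz
    return depth, rate_hz
```

Clamping keeps sensor spikes from pushing the rate outside the stated 2 to 10 Hz range.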
Sound Synthesis
Audio output in Ocarina is synthesized in real
time in a ChucK program that includes the aforementioned
amplitude tracker and articulator. The
synthesis itself is straightforward (the acoustic oca-
rina sound is not complex). The synthesis elements
include a triangle wave, modulated by a second
oscillator (for vibrato), and multiplied against the
amplitude envelope generated by the articulator
situated between Ocarina’s analysis and synthesis
modules. The resulting signal is fed into a reverber-
ator. (A general schematic of the synthesis can be
seen in Figure 5.)
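To make the signal chain concrete, here is a hedged Python sketch of the core of it: a triangle-wave carrier whose frequency is modulated by a sine LFO, multiplied by an amplitude envelope. It is not the app's ChucK patch. The reverberator is omitted, and treating `vib_depth` as a fractional frequency deviation is an assumption of this example.

```python
import math


def triangle(phase):
    """Triangle wave in [-1, 1]; phase is measured in cycles."""
    frac = phase % 1.0
    return 4.0 * abs(frac - 0.5) - 1.0


def synthesize(freq_hz, envelope, vib_rate_hz=6.0, vib_depth=0.0,
               sample_rate=44100):
    """Render one output sample per envelope value.

    vib_depth is a hypothetical fractional deviation (0.01 = +/- 1 percent).
    """
    out = []
    phase = 0.0
    for n, amp in enumerate(envelope):
        lfo = math.sin(2.0 * math.pi * vib_rate_hz * n / sample_rate)
        inst_freq = freq_hz * (1.0 + vib_depth * lfo)  # vibrato
        phase += inst_freq / sample_rate               # phase accumulator
        out.append(amp * triangle(phase))              # apply envelope
    return out
```

In use, `envelope` would come from the breath articulator and `freq_hz` from the current fingering, so the three input sensors each drive one argument.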
Figure 4. Pitch mappings for C Ionian. Five additional pitch mappings not possible in traditional four-hole ocarinas are denoted with dotted outline. (From the Ocarina 1.0 design specification, October 2008.)
Figure 5. Ocarina’s general sound synthesis scheme as implemented in ChucK. Diagram labels: breath input (articulation), accelerometers (vibrato), multi-touch (pitch), SinOsc (LFO for vibrato), TriOsc (carrier oscillator), ADSR (on/off envelope), OnePole (rough envelope), Step (secondary envelope), OnePole (low-pass filter), NRev (reverberator), audio output.
Figure 6. Initial option screen design, allowing users to name their instrument (for social interaction), change key and mode, as well as simple customizations for the instrument’s appearance. Diagram labels: root (C through B), mode (Ionian, Dorian, Phrygian, Lydian, Mixolydian, Aeolian, Locrian, Zeldarian), finger and breath colors, echo, breath sensitivity, and a “name your ocarina” text field. (From the Ocarina version 1.0 design specification, October 2008.)
The acoustic ocarina produces sound as a
Helmholtz resonator, and the sizes of the finger
holes are carefully chosen to affect the amount
of total uncovered area as a ratio to the enclosed
volume and thickness of the ocarina; this relationship
directly affects the resulting frequency. The
pitch range of an acoustic four-hole English-pendant
ocarina is typically one octave, the lowest note
played by covering all four finger holes, and the
highest played by uncovering all finger holes. Some
chromatic pitches are played by partially covering
certain holes. No longer coupled to the physical
parameters, the digital Ocarina offers precise intonation
for all pitches, extended pitch mapping,
and additional expressive elements, such as vibrato
and even portamento in Ocarina 2. The tuning is
not fixed; the player can choose different root keys
and diatonic modes (Ionian, Dorian, Phrygian, etc.),
offering multiple pitch mappings (see Figure 6).
The app even contains a newly invented (i.e.,
rather apocryphal) “Zeldarian” mode, where the
pitches are mapped to facilitate the playing of a
single melody: The Legend of Zelda theme song. In
popular culture, the Nintendo 64 video game The
Legend of Zelda: Ocarina of Time (1998) may be
the most prominent and enduring reference to the
acoustic ocarina. In this action-adventure game, the
protagonist, Link, must learn to play songs on an
in-game ocarina with magical powers to teleport
through time. The game is widely considered to be
in the pantheon of greatest video games (Wikipedia
2013), and for that reason continues to endure and
delight, and continues to introduce the ocarina
to new generations of gamers (so effectively that
apparently a portion of the population mistakenly
believes the ocarina is a purely fictional instrument that
exists only in the mythical in-game realm of Hyrule).
In any case, there is a sense of magic associated with
the ocarina, something that the design of Ocarina
aimed to capture. After all, isn’t hinting at magic a
powerful way to hide technology, while encouraging
users to focus on the experience?
Figure 7. A typical tablature on Ocarina’s online songbook database populated with content from the user community. (Example shown: “Twinkle Twinkle Little Star,” traditional; root: C, mode: Ionian.)
Figure 8. Ocarina 2 provides a teaching mode that shows the next three fingerings for any particular song (from center and up). This mode also provides basic harmony accompaniment that follows the user’s melody playing.
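The selectable root keys and diatonic modes mentioned earlier in this section follow standard music theory, which can be sketched as below. The function names and the choice of octave are illustrative assumptions. How Ocarina assigns these scale degrees to its 16 fingerings, and the contents of the “Zeldarian” mode, are not specified in the text and are not modeled here.

```python
MAJOR_STEPS = [2, 2, 1, 2, 2, 2, 1]  # semitone steps of the Ionian mode
MODES = ["ionian", "dorian", "phrygian", "lydian",
         "mixolydian", "aeolian", "locrian"]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def mode_scale(root, mode, octave=4):
    """Return MIDI note numbers for one octave of the given root and mode."""
    start = MODES.index(mode.lower())
    # Each diatonic mode is a rotation of the major-scale step pattern.
    steps = MAJOR_STEPS[start:] + MAJOR_STEPS[:start]
    note = 12 * (octave + 1) + NOTE_NAMES.index(root)
    scale = [note]
    for step in steps:
        note += step
        scale.append(note)
    return scale


def midi_to_hz(midi_note):
    """Equal-tempered frequency with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)
```

Because pitch is computed rather than produced by a resonating cavity, every note lands exactly on its equal-tempered frequency, which is the precise intonation the digital instrument offers.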
Incorporating Expressive Game-Like Elements
In Ocarina, users learn to play various melodies via a
Web site specially crafted for users to share tablatures
for the iPhone-based instrument (Hamilton, Smith,
and Wang 2011). Each tablature shows a suggested
root key, mode, and sequence of fingerings (see
Figure 7). An editor interface on the Web site allows
users to input and share new tablatures. Through
this site, users are able to search and access over
5,000 user-generated Ocarina tablatures; during
peak usage the site had more than a million hits per
month. Users would often display the tablature on
a second computer (e.g., their laptop), while using
their iPhone to play the music. This is reminiscent
of someone learning to play a recorder while reading
music from a music stand—only here, the physical
instrument is embodied by the mobile phone, and
the computer has become both score and music
stand.
A sequel to Ocarina was created and released in
2012, called Ocarina 2 (abbreviated as O2—alluding
to the oxygen molecule and the breath interaction
needed for the app). Inspired by the success of the
Web-based tablatures, Ocarina 2’s most significant
new core features are (1) a game-like “songbook
mode” that teaches players how to play songs
note by note and (2) a dynamic harmony-rendering
engine that automatically accompanies the player. In
addition, every color, animation, spacing, sound, and
graphical effect was further optimized in Ocarina 2.
For a given song in Ocarina 2, an onscreen
queue of ocarina fingerings shows the next note to
play, as well as two more fingerings beyond that (see
Figure 8). The player is to hold the right combination
of finger holes onscreen, and articulate the note by
blowing; the Ocarina 2 songbook engine detects
these conditions, and advances to the next note. It is
important to emphasize there are no time or tempo
restrictions in this mode; players are generally free
to hold each note as long as they wish (and apply
dynamics and vibrato as desired), and furthermore
they are encouraged to play at their own pace. In
essence this songbook mode follows the player,
not the other way around. The design aims both to
provide a more natural and less stressful experience
to learn, and also to leave as much space as possible
for open expression. The player is responsible for
tempo and tempo variations, articulation (and co-articulation
of multi-note passages), dynamics, and
vibrato. The player is also responsible for the pitch
by holding the correct fingerings as shown, but is
free to embellish by adding notes and even trills.
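A score follower of the kind described, one that waits for the player rather than imposing a tempo, might look like the sketch below. The class name, the breath threshold, and in particular the rule that a repeated note needs a fresh articulation (a drop in breath) while different notes may be slurred are assumptions of this example; the article does not specify how the Ocarina 2 engine handles these cases.

```python
class SongFollower:
    """Advance through a list of fingerings at the player's own pace."""

    def __init__(self, fingerings, breath_threshold=0.2):
        self.fingerings = list(fingerings)
        self.breath_threshold = breath_threshold  # hypothetical value
        self.position = 0
        self._sounding = None  # fingering currently being blown, if any

    def upcoming(self, count=3):
        """The next fingerings to display (three, as in Figure 8)."""
        return self.fingerings[self.position:self.position + count]

    def finished(self):
        return self.position >= len(self.fingerings)

    def update(self, fingering, breath):
        """Feed the current sensor state; returns the current position."""
        if breath <= self.breath_threshold:
            self._sounding = None  # silence re-arms the next articulation
            return self.position
        is_new_note = fingering != self._sounding
        self._sounding = fingering
        if (is_new_note and not self.finished()
                and fingering == self.fingerings[self.position]):
            self.position += 1
        return self.position
```

There is no clock anywhere in the class, so holding a note, pausing, or adding extra notes never counts against the player; only the correct fingering plus breath moves the song forward.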
There is no game-score reward system in
Ocarina 2, though game-like achievements can
be earned. Progress is accumulated per song, via
“breath points” as a general measurement of how
much a user has blown into his or her phone.
Achievements like “Every Breath You Take” (accumulate
300 breath points) can be earned over time.
Probably the most hard-core achievement in Ocarina 2
is one called “Lungevity,” which challenges
the user to accumulate 1,000,000 breath points.
By rough estimation, to get this achievement, one
would need to play 500 songs each 200 times!
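Working through that rough estimate shows what it implies per play-through. The average of about 10 breath points per play is an inference from the two stated figures, not a number given in the article.

```latex
\[
500 \text{ songs} \times 200 \text{ plays per song} = 100{,}000 \text{ plays},
\qquad
\frac{1{,}000{,}000 \text{ breath points}}{100{,}000 \text{ plays}}
  = 10 \text{ breath points per play}.
\]
```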
Ocarina 2 was an exploration to strike a balance
between an expressive musical artifact (i.e., an
instrument) and a game or toy. The goal is to retain
genuine expressive possibilities while offering game-like
qualities that can drastically reduce the barrier of
entry into the experience. The theory was that
people are much less inhibited and intimidated
by trying something they perceive as a game, in
contrast to a perceived musical instrument; yet
perhaps the two are not mutually exclusive. It
should be possible to have game-like elements
that draw people in, and even benignly “trick” the
user into being expressive—and, for some, possibly
getting a first-time taste for the joy of making music.
Social Interaction Design
Ocarina is possibly the first-ever massively adopted
instrument that allows its users to hear one another
around the world, accompanied by a visualization
of the world that shows where each musical
snippet originated. After considering the physical
interaction, the design underwent an exercise to use
the additional hardware and software capabilities of
the iPhone to maximum advantage, aimed to enable
a social-musical experience, something that one
could not do with a traditional acoustic ocarina (or
perhaps any instrument). The exercise sought to
limit the design to exactly one social feature, but
then to make that feature as compelling as possible.
(If nothing else, this was to be an interesting and
fun experiment!)
From there, it made sense to consider the device’s
location capabilities, because the phone is, by
definition, mobile and travels in daily life with its
user, and it is always connected to the Internet. The
result was the globe in Ocarina, which allows any
user to anonymously (and surreptitiously) listen in
on potentially any other Ocarina user around the
world (see Figure 9). Users would only be identified
by their location (if they agreed to provide it to the
app), a moniker they could choose for themselves
(e.g., Link123 or ZeldaInRome), and their music
(see Figure 10).
If listeners like what they hear, they can “love”
the snippet by tapping a heart icon. The snippet
being heard is chosen via an algorithm at a central
Ocarina server, and takes into account recency,
popularity (as determined by users via “love”
count), geographic diversity of the snippets, as well
as filter selections by the user. Listeners can choose
to listen to (1) the world, (2) a specific region, (3)
snippets that they have loved, and (4) snippets they
have played. To the author’s knowledge, this type
of social–musical interaction is the first of its kind
and scale, as users have listened to each other over
40 million times on the globe. A map showing the
rough distribution of Ocarina users can be seen in
Figure 11.
How is this social interaction accomplished,
technically? As a user plays Ocarina, an algorithm in
the analysis module decides when to record snippets
as candidates for uploading to a central Ocarina
server, filtering out periods of inactivity, limiting
maximum snippet lengths (this server-controlled
parameter is usually set to 30 seconds), and even
taking into account central server load. When
snippet recording is enabled, the Ocarina engine
rapidly takes snapshots of gestural data, including
current breath-envelope value, finger-hole state,
and tilt from two accelerometers. Through this
process a compact network packet is created,
time-stamped, and geotagged with GPS information,
and uploaded to the central Ocarina server and
database.
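To illustrate why gesture data is so compact, the sketch below packs a snippet into a small binary packet. The wire format here (field order, types, and sizes) is entirely hypothetical; the article says only that each snapshot holds the breath-envelope value, finger-hole state, and two tilt values, and that the packet is time-stamped and geotagged.

```python
import struct
from dataclasses import dataclass

# Hypothetical layouts, little-endian with no padding.
HEADER = struct.Struct("<dffI")  # start time, latitude, longitude, frame count
FRAME = struct.Struct("<fBff")   # breath, 4-bit hole mask, tilt forward, tilt sideways


@dataclass
class Snippet:
    start_time: float
    latitude: float
    longitude: float
    frames: list  # list of (breath, holes, tilt_forward, tilt_sideways)


def encode(snippet):
    """Serialize a snippet to bytes: one header followed by fixed-size frames."""
    header = HEADER.pack(snippet.start_time, snippet.latitude,
                         snippet.longitude, len(snippet.frames))
    return header + b"".join(FRAME.pack(*frame) for frame in snippet.frames)


def decode(data):
    """Inverse of encode()."""
    start_time, latitude, longitude, count = HEADER.unpack_from(data, 0)
    frames = [FRAME.unpack_from(data, HEADER.size + i * FRAME.size)
              for i in range(count)]
    return Snippet(start_time, latitude, longitude, frames)
```

With this layout each snapshot costs 13 bytes, far less than the equivalent stretch of recorded audio, which is consistent with the choice to upload gestures rather than sound.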
Figure 9. Social interaction design for Ocarina. The goal was to utilize GPS location and data connectivity into a single social feature. Design notes (Ocarina version 1.0 design specification, October 2008): audio playback plays back selections of uploaded snippets, or perhaps in real time; with the user’s permission, his or her GPS or tower location is uploaded to a central Smule server, and the server then sends updates to the phone, which displays and animates the current ocarina usage around the world.
Figure 10. Listening to the world in Ocarina.
Figure 11. Distribution of the first 2 billion breath blows around the world.
Figure 12. Ocarina system design, from physical interaction to social interaction. Diagram labels: breath input (articulation), multi-touch (pitch), accelerometers (vibrato), envelope generator, sound synthesis, audio output, gesture recorder/player, network module, Internet, central servers, database, anonymous user data.
During playback in Ocarina’s globe visualization,
the app requests new snippets from the server
according to listener preference (region, popularity,
and other filters). A server-side algorithm identifies
a set of snippets that most closely matches the
desired criteria, and sends back a snippet selected
at random from this matching set. Note that no
audio recording is ever stored on the server; only
gesture information is stored (which is more compact and
potentially richer). The returned snippet is rendered
by the Ocarina app client, feeding the gesture data
recording into the same synthesis engine used for
the instrument, and rendering it into sound in the
visualized globe. The system design of Ocarina,
from physical interaction to cloud-mediated social
interaction, can be seen in Figure 12.
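The filter-then-pick-at-random selection step can be sketched as follows. The record fields (`region`, `loves`, `time`), the parameter names, and the hard filtering are assumptions for illustration; the real server also weighs geographic diversity and closeness of match, which this simplified version does not model.

```python
import random


def select_snippet(snippets, region=None, min_loves=0, max_age_s=None,
                   now=0.0, rng=random):
    """Pick one snippet at random from those matching the listener's filters.

    snippets is a list of dicts with hypothetical keys
    "region", "loves", and "time" (seconds).
    """
    matching = [
        s for s in snippets
        if (region is None or s["region"] == region)        # region filter
        and s["loves"] >= min_loves                          # popularity
        and (max_age_s is None or now - s["time"] <= max_age_s)  # recency
    ]
    return rng.choice(matching) if matching else None
```

Choosing at random within the matching set, instead of always returning the top result, keeps repeated requests with the same filters from replaying the same snippet.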
User Case Studies
Ocarina users have listened in on each other over
40 million times, and somehow created an
Figure 13. Ocarina users
share their performances
via Internet video.
unexpected flood of self-expression in their everyday
life. Within a few days of the release of
Ocarina (in November 2008), user-created videos
began surfacing on the Internet in channels such
as YouTube (see Figure 13). Thousands of videos
showcased everyday users performing on their
iPhone Ocarinas, in living rooms, dorm rooms,
kitchens, holiday parties, on the streets, and many
other settings. Performers vary in age from young
children to adults, and seem to come from all over
the globe. They play many types of music, from
Ode to Joy, video game music (e.g., Legend of Zelda,
Super Mario Bros., Tetris), themes from movies
and television shows (e.g., The X-Files, Star Wars,
Star Trek), to pop and rock music, show tunes,
and folk melodies (e.g., Amazing Grace, Kumbaya,
Shenandoah). Many are solo performances; others
are accompanied by acoustic guitars, piano, and
even other iPhone-based musical instruments.
As an example, one user created a series of videos
in which she plays Ocarina by blowing into the
iPhone with her nose (top left in Figure 13).
Apparently, she has a long history of playing nose
flutes, and Ocarina was her latest nasal-musical
experiment. She began with a nose-mediated
rendition of Music of the Night and, after this video
gained renown on YouTube, followed up with
performances of The Blue Danube (this one played
upside-down to further increase the difficulty), the
Jurassic Park theme, The Imperial March from Star
Wars, and Rick Astley’s Never Gonna Give You Up.
One user braved snowy streets to busk for money
with his iPhone Ocarina and filmed the experience.
Another group of users created a video promoting
tourism in Hungary. Some have crafted video
tutorials to teach Ocarina; others have scripted
and produced original music videos. All of these
represent creative uses of the instrument, some
that even we, its creators, had not anticipated.
There is something about playing Ocarina on one’s
iPhone that seems to overcome the inhibition
of performing, especially in people who are not
normally performers and who don’t typically call
themselves musicians.
It was surprising to see such mass adoption
of Ocarina, in spite of the app’s unique demand on
physically using the iPhone in unconventional ways.
Over the years, one could reasonably surmise that
much of its popularity may be that the sheer novelty
and curiosity of playing a flute-like instrument on
a mobile phone effectively overcame barriers to
trying a new musical instrument. And if the physical
interaction of Ocarina provoked curiosity through
novelty, the social globe interaction provided
something—perhaps a small sense of wonder—that
was not possible without a mobile, location-aware,
networked computer.
Discussion
Is the app a new form of interactive art? Can an app
be considered art? What might the role of technology
be in inspiring or ushering a large population into
exploring musical expression? Although the mobile
app world has evolved with remarkable speed since
2008, the medium is perhaps still too young to fully
answer these questions. We can ponder, nonetheless.
There are definite limitations to the mobile
phone as a platform for crafting musical expression,
especially in creating an app designed to reach
a wide audience. In a sense, we have to work
with what is available on the device, and nothing
more. We might do our best to embrace the
capabilities and limitations, but is that enough? Traditional
instruments are designed and crafted over decades
or even centuries, whereas something like Ocarina
was created in six weeks. Does it even make sense
to compare the two?
On the other hand, alongside limitations lie
possibilities for new interactions—both physical
and social—and new ways to inspire a large
population to be musical. Ocarina affords a sense
of expressiveness. There are moments in Ocarina’s
globe interaction where one might easily forget
the technology, and feel a small, yet nonetheless
visceral, connection with strangers on the other
side of the world. Is that not a worthwhile human
experience, one that was not possible before? The
tendrils of possibility seem to reach out and plant
the seeds for some yet-unknown global community.
Is that not worth exploring?
As a final anecdote, here is a review for Ocarina
(Apple App Store 2008):
This is my peace on earth. I am currently
deployed in Iraq, and hell on earth is an every
day occurrence. The few nights I may have
off I am deeply engaged in this app. The globe
feature that lets you hear everybody else in the
world playing is the most calming art I have
ever been introduced to. It brings the entire
world together without politics or war. It is
the EXACT opposite of my life—Deployed U.S.
Soldier.
Is Ocarina itself a new form of art? Or is it a toy?
Or maybe a bit of both? These are questions for each
person to decide.
Acknowledgments
This work owes much to the collaboration of many
individuals at Smule, Stanford University, CCRMA,
and elsewhere, including Spencer Salazar, Perry
Cook, Jeff Smith, David Zhu, Arnaud Berry, Mattias
Ljungstrom, Jonathan Berger, Rob Hamilton, Georg
Essl, Rebecca Fiebrink, Turner Kirk, Tina Smith,
Chryssie Nanou, and the Ocarina community.
References
Apple App Store. 2008. “Ocarina.” Available online
at itunes.apple.com/us/app/ocarina/id293053479.
Accessed October 2013.
Essl, G., and M. Rohs. 2006. “Mobile STK for Symbian
OS.” In Proceedings of the International Computer
Music Conference, pp. 278–281.
Essl, G., and M. Rohs. 2007. “ShaMus—A Sensor-Based
Integrated Mobile Phone Instrument.” In Proceedings
of the International Computer Music Conference, pp.
200–203.
Essl, G., G. Wang, and M. Rohs. 2008. “Developments and
Challenges Turning Mobile Phones into Generic Music
Performance Platforms.” In Proceedings of Mobile
Music Workshop, pp. 11–14.
Fiebrink, R., G. Wang, and P. R. Cook. 2007. “Don’t
Forget the Laptop: Using Native Input Capabilities
for Expressive Musical Control.” In Proceedings of
the International Conference on New Interfaces for
Musical Expression, pp. 164–167.
Gaye, L., R. Mazé, and L. E. Holmquist. 2003. “Sonic City:
The Urban Environment as a Musical Interface.” In
Proceedings of the International Conference on New
Interfaces for Musical Expression, pp. 109–115.
Gaye, L., et al. 2006. “Mobile Music Technology: Report
on an Emerging Community.” In Proceedings of
the International Conference on New Interfaces for
Musical Expression, pp. 22–25.
Geiger, G. 2003. “PDa: Real Time Signal Processing
and Sound Generation on Handheld Devices.” In
Proceedings of the International Computer Music
Conference, pp. 283–286.
Hamilton, R., J. Smith, and G. Wang. 2011. “Social
Composition: Musical Data Systems for Expressive
Mobile Music.” Leonardo Music Journal 21:57–64.
Hoffman, M. 2007. “Breathalyzer.” Available online at
smelt.cs.princeton.edu/pieces/Breathalyzer. Accessed
October 2013.
Levin, G. 2001. “Dialtones (a Telesymphony).” Available
online at www.flong.com/projects/telesymphony.
Accessed December 2013.
Misra, A., G. Essl, and M. Rohs. 2008. “Microphone as
Sensor in Mobile Phone Performance.” In Proceedings
of the International Conference on New Interfaces for
Musical Expression, pp. 185–188.
Oh, J., and G. Wang. 2011. “Audience–Participation
Techniques Based on Social Mobile Computing.” In
Proceedings of the International Computer Music
Conference, pp. 665–671.
Oh, J., et al. 2010. “Evolving the Mobile Phone Orchestra.”
In Proceedings of the International Conference on New
Interfaces for Musical Expression, pp. 82–87.
Smallwood, S., et al. 2008. “Composing for Laptop
Orchestra.” Computer Music Journal 32(1):9–25.
Schiemer, G., and M. Havryliv. 2006. “Pocket Gamelan:
Tuneable Trajectories for Flying Sources in Mandala 3
and Mandala 4.” In Proceedings of the International
Conference on New Interfaces for Musical Expression,
pp. 37–42.
Tanaka, A. 2004. “Mobile Music Making.” In Proceedings
of the International Conference on New Interfaces for
Musical Expression, pp. 154–156.
Tanaka, A., and P. Gemeinboeck. 2006. “A Framework for
Spatial Interaction in Locative Media.” In Proceedings
of the International Conference on New Interfaces for
Musical Expression, pp. 26–30.
Tanaka, A., and P. Gemeinboeck. 2008. “Net Dérive:
Conceiving and Producing a Locative Media Artwork.” In
G. Goggins and L. Hjorth, eds. Mobile Technologies:
From Telecommunications to Media. London:
Routledge, pp. 174–186.
Trueman, D. 2007. “Why a Laptop Orchestra?” Organised
Sound 12(2):171–179.
Wang, G. 2008. “The ChucK Audio Programming
Language: A Strongly-Timed and On-the-Fly
Environ/mentality.” PhD Thesis, Princeton University.
Wang, G. 2014. “The World Is Your Stage: Making Music
on the iPhone.” In S. Gopinath and J. Stanyek, eds.
Oxford Handbook of Mobile Music Studies, Volume 2.
Oxford: Oxford University Press, pp. 487–504.
Wang, G. 2015. “Improvisation of the Masses: Anytime,
Anywhere Music.” In G. Lewis and B. Piekut, eds.
Oxford Handbook of Improvisation Studies. Oxford:
Oxford University Press.
Wang, G., G. Essl, and H. Penttinen. 2008. “MoPhO:
Do Mobile Phones Dream of Electric Orchestras?”
In Proceedings of the International Computer Music
Conference, pp. 331–337.
Wang, G., G. Essl, and H. Penttinen. 2014. “Mobile Phone
Orchestra.” In S. Gopinath and J. Stanyek, eds. Oxford
Handbook of Mobile Music Studies, Volume 2. Oxford:
Oxford University Press, pp. 453–469.
Wang, G., et al. 2008. “The Laptop Orchestra as
Classroom.” Computer Music Journal 32(1):26–37.
Wang, G., et al. 2009a. “Stanford Laptop Orchestra
(SLOrk).” In Proceedings of the International Computer
Music Conference, pp. 505–508.
Wang, G., et al. 2009b. “Smule = Sonic Media: An
Intersection of the Mobile, Musical, and Social.” In
Proceedings of the International Computer Music
Conference, pp. 283–286.
Wikipedia. 2013. “The Legend of Zelda: Ocarina of Time.”
Wikipedia. Available online at en.wikipedia.org/
wiki/The_Legend_of_Zelda:_Ocarina_of_Time. Accessed
October 2013.