The creation of a believable mixed reality application often requires an attentive audiovisual calibration. For the non-initiated, mixed-reality (MR) aims to blend the real and virtual worlds into one dependent scene, where the digital elements adapt to the local boundaries that define our space. Most high-quality use-cases are now location-fixed and require the touch of an engineer or a design artist to tune and blend the elements together (but mobile augmented reality apps are now a common example for smartphones).
By using the material captured in the previous session, we decided to create a VR/MR collaborative demo, where a live percussionist would play along the pre-recorded musicians represented as digital character. In the same space, an audience member would then be able to observe and hear both the digital characters, and a live digital avatar of the real-time percussionist, through VR goggles.
The ultimate goal here is to provide to the audience the concurrent impression of both “being there” with the musicians, and them “being here” with the observer. In other words, a coherent cognitive space needs to subjectively exist for the participants to feel compelled by the experience.
To make our digital characters, we imported the motion capture data from Motive into Maya to rig a skeleton figure. That skeleton was then used to animate whatever digital asset we wanted to test in Unity. Our djembe percussion trio was thus transformed into sort of video-game animations.
Our next step was to choose an appropriate demo location, we chose the NYU Future Reality Lab, who are also participants to the Holodeck project, since they were able to provide us with a digital recreation of that same environment for an occluded VR headset implementation. This is important because there is always a relation between the auditory and visual component of a mixed-reality scene, the presence of one creates the expectation of the other, our brains need cohesiveness to subconsciously create a sense of reality. Since the audience is subject to the sound character of a real percussionist placed a few feet away, we need to show a digital environment which comes close to an expected scenario able to produce that acoustic character. In our workflow, the auditory sense defined the visual element needed to justify itself.
After an audiovisual synching process, we used SteamVR as the dynamic spatial audio engine to create localizable point-sound-sources from the spot-microphones’ captures. To enhance the blend with the real drummer, we processed the dry object-audio material with an acoustic impulse response filter recorded at the listener’s location. This process effectively “transfers” the acoustic property of a room onto a signal recorded into a different one. The final mix was tuned through rehearsals, where we evaluated the blend between the live and recorded virtual drummers.
During the presentation, the live drummer was fitted with a mocap suit and live-rendered as an avatar. We then reproduced the pre-recorded playback to the observer and the musician. The observer was fitted with open-back headphones in order to be able to hear the local drummer too. See the outcome in the video below.
Game characters are animated using the pre-recorded motion capture and sound of a Djembe percussion trio and a dancer. During the presentation, a live, motion-capped musician, joins the performance becoming a real-time avatar (brighter color).
An audience member can observe the whole performance using a VR headset. The pre-recorded sound is processed with a dynamic spatial audio engine and acoustic filters to match the local acoustics. Sound in external view is from my phone (excuse bad quality)
Several future improvements are planned. Formal research investigations will follow using the paradigms involved in this pilot test. As far as music goes, more types of instrumentations and music should be tested, as well as more types of interactive combinations (e.g. all musicians in real-time, audience in a separate room etc.). Variations and modifications of these factors imply different sets of requirements and different possibilities.
Recording and motion capturing an African djembe trio and a dancer
The HoloDeck computing network aims to prototype the digital collaborative interactions of the future. By setting up an audiovisual network of data exchange between calibrated rooms, we can do today what will be possible to do in 10 years from now with augmented and mixed reality technologies. We can thus study the social impact of remote digital immersive interaction for a vast number of applications.
One of the many applications of interest to us is music. How will remote music performance improve in the future? How can VR and AR be used to give a realistic impression to the musician and to the audience alike? Will we ever achieve an immersive digital experience comparable to real-life collaborative performance? In other words… can we make people create music remotely together and believe this is real? These are the questions that we are approaching by putting together this first iteration of our VR demo experience.
In order to collect material to use for our experiments, we recorded the sound and motion capture of an African percussion trio, plus a dancer (all members of the NYU percussion program). This material can later be used to make digital avatars that represent these musicians… sort of video-game characters. These avatar characters will eventually be used to provide a canvas in VR to test the response of additional musicians, dancers, and audience members. In these videos, we show the recording setup of 2 out of 3 drummers, and the dancer capturing.
Motive and Optitrack were used for the motion capture and ProTools for recording the sound. We tried to capture the different sound areas of the djembe in the most isolated way possible using spot-mics, in order to create audio objects easy to manipulate in a VR implementation. The motion tracking suits did require a little bit of posture adaptation for the performer’s drumming technique but the results were good nonetheless. For the dancer, we simply played back the lead-drum recording and mocapped the movements, in a future version we will try to capture the sound of the footsteps too.
Soundfield recording is the process of capturing an environmental auditory scenes. Not long ago, the NYU immersive audio group picked up what used to be called “IHearNY3D”, a collection of 3D Ambisonics recordings of various interesting locations of New York City. We soon realized how many applications in the media and research field can make good use of spatial audio recordings. 360 sound design, VR gaming, urban data analysis, classification training… just to name a few.
From there, CityTones started. We decided to begin a field collection repository of soundfield data, complete with 360 video, objective location data and subjective soundfield quality assessments. By recording the same location at different times of the day, we can now inspect how the soundfield environment changes with the dynamic routine of the urban crowd (see also SONYC).
Since then, we developed the project into an open source repository to be used for non-commercial purposes by engineer, scientists and students alike. This database wants to collect worldwide spatial recordings in B-format Ambisonics of interesting locations at different times of the day.
We encourage all recording engineers with an interest in soundfields to help us make this repository grow. You can find the submission portal on the button above and the submission specs on the publications page. The recordings will soon be open and available to download for the general public as multi-channel B-format files with panoramic spherical video.
On this page, I will attempt to keep a regular journal blog on the immersive media projects I participate in throughout my career. My intention is primarily to keep a record of the numerous activities, side-gigs, resources, and personal projects which do not see the light of academic publication, lest they get forgotten into oblivion.
I hope to keep the language on this blog more accessible to the general public, while reserving the technical details to the publications page. I expect the kind of posts here to be sort of diverse and eclectic… hopefully for a wider audience than just a few colleagues
Thanks for your time!