HoloDeck – Percussion Demo

… continuing from previous post

Creating a believable mixed-reality application often requires careful audiovisual calibration. For the uninitiated, mixed reality (MR) aims to blend the real and virtual worlds into one interdependent scene, where the digital elements adapt to the local boundaries that define our space. Most high-quality use cases are location-fixed and require an engineer or a design artist to tune and blend the elements together, although mobile augmented reality apps are now common on smartphones.

Using the material captured in the previous session, we decided to create a collaborative VR/MR demo in which a live percussionist would play along with the pre-recorded musicians, represented as digital characters. In the same space, an audience member would then be able to see and hear, through VR goggles, both the digital characters and a live digital avatar of the real-time percussionist.

The ultimate goal here is to give the audience the simultaneous impression of both “being there” with the musicians and of the musicians “being here” with the observer. In other words, a coherent cognitive space needs to exist subjectively for the participants to find the experience compelling.

To make our digital characters, we imported the motion capture data from Motive into Maya to rig a skeleton figure. That skeleton was then used to animate whatever digital asset we wanted to test in Unity. Our djembe percussion trio was thus transformed into a sort of video-game animation.

Rigging the percussionist motion capture data in Maya to use as character animation
A prototype avatar in Unity

Our next step was to choose an appropriate demo location. We chose the NYU Future Reality Lab, which also participates in the Holodeck project, because its team was able to provide us with a digital recreation of that same environment for an occluded VR headset implementation. This matters because the auditory and visual components of a mixed-reality scene are always related: the presence of one creates the expectation of the other, and our brains need that coherence to subconsciously build a sense of reality. Since the audience hears the acoustic character of a real percussionist placed a few feet away, we need to show a digital environment close to one that could plausibly produce that acoustic character. In our workflow, the auditory sense defined the visual element needed to justify it.

After an audiovisual synchronization process, we used SteamVR as the dynamic spatial audio engine to create localizable point sound sources from the spot-microphone captures. To enhance the blend with the real drummer, we processed the dry object-audio material with an acoustic impulse response filter recorded at the listener’s location. This process effectively “transfers” the acoustic properties of one room onto a signal recorded in a different one. The final mix was tuned through rehearsals, where we evaluated the blend between the live and the recorded virtual drummers.
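As a rough illustration of the impulse response step described above, the sketch below convolves a dry signal with a room impulse response and blends the result with the original. It is not the processing chain used in the demo: the function names, the `wet_mix` parameter, and the toy sample values are all assumptions made for this example, and a real implementation would use FFT-based convolution on actual recordings.

```python
def convolve(dry, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    if not dry or not impulse_response:
        return []
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(impulse_response):
            # Each input sample triggers a scaled, delayed copy of the IR.
            out[i + j] += x * h
    return out


def apply_room(dry, impulse_response, wet_mix=1.0):
    """Blend the dry signal with its room-convolved ("wet") version.

    wet_mix is a hypothetical control: 0.0 keeps only the dry signal,
    1.0 keeps only the convolved signal.
    """
    wet = convolve(dry, impulse_response)
    # Pad the dry signal with silence so it matches the reverb tail length.
    padded_dry = list(dry) + [0.0] * (len(wet) - len(dry))
    return [(1.0 - wet_mix) * d + wet_mix * w
            for d, w in zip(padded_dry, wet)]


# Toy data (assumed, not measured): a single drum "click" and a short IR
# with a direct path followed by two decaying reflections.
dry_hit = [1.0, 0.0, 0.0, 0.0]
toy_ir = [1.0, 0.0, 0.5, 0.25]

print(apply_room(dry_hit, toy_ir, wet_mix=0.5))
```

The dry click comes out followed by quieter copies at the delays of the toy reflections, which is the sense in which the room’s character is “transferred” onto the recording.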

During the presentation, the live drummer was fitted with a mocap suit and rendered live as an avatar. We then played the pre-recorded material back to both the observer and the musician. The observer wore open-back headphones in order to hear the local drummer as well. See the outcome in the video below.

Game characters are animated using the pre-recorded motion capture and sound of a djembe percussion trio and a dancer. During the presentation, a live, motion-captured musician joins the performance, becoming a real-time avatar (brighter color).

An audience member can observe the whole performance using a VR headset. The pre-recorded sound is processed with a dynamic spatial audio engine and acoustic filters to match the local acoustics. Sound in the external view is from my phone (excuse the bad quality).

Several future improvements are planned. Formal research investigations will follow, using the paradigms involved in this pilot test. As far as music goes, more types of instrumentation and music should be tested, as well as more types of interactive combinations (e.g. all musicians in real time, audience in a separate room, etc.). Varying these factors implies different sets of requirements and different possibilities.