Lip Sync info in wav file

  • Thread starter: BrianJ (Guest)
In the recent interview with Gabe Newell, they talked about how they packed lip sync information into a wav file. Does anybody know the name of this technique or have any info on it?
 
I'd imagine that the Source engine has a custom wav player that searches for a specific sequence of data that marks the transition from sound data to lip synch data.

If you played the altered WAVs back in something like Windows Media Player, you'd probably hear the speech, then some static or crackling from the lip synch data.
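
Roughly the kind of scan I'm picturing, as a quick Python sketch (pure guesswork on my part, assuming the extra data sits in its own RIFF chunk after the audio; the file name is made up):

Code:
import struct

def list_wav_chunks(path):
    """Walk the RIFF chunks of a WAV file and return (id, size) for each one."""
    chunks = []
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        assert riff == b"RIFF" and wave_id == b"WAVE", "not a WAV file"
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            chunks.append((chunk_id.decode("ascii", "replace"), chunk_size))
            # Skip the chunk body (plus the pad byte chunks get when their size is odd).
            # A player that only cares about 'fmt ' and 'data' would do the same with
            # anything it doesn't recognise.
            f.seek(chunk_size + (chunk_size & 1), 1)
    return chunks

print(list_wav_chunks("some_speech_line.wav"))   # hypothetical file name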

EDIT: Of course, that's only my take on the situation. If I'm wrong, I'd love to know how it's really done...
 
Yes, that's right; they said that when you process the basic speech WAV, you have to also give the editor a line of text to work out the phonemes from.
 
Voice comm real-time lip-synching works... well, at least in Half-Life. :p

Would be stupid if they didn't include that in HL2.
 
BrianJ said:
In the recent interview with Gabe Newell, they talked about how they packed lip sync information into a wav file. Does anybody know the name of this technique or have any info on it?

Well, the stolen source had full SDK docs for the lipsynch and facial animation components. I can't remember what it was called, but it involved writing phonemes... for example, the word "open" might be written OH - p - N or something... those symbols are readable by the facial animation system. (I don't remember how it worked exactly; I haven't seen the source since it was first leaked.) It's something like that. Every sound group is linked to a certain facial muscle position, so the game reads the sound file, lines it up with the phonemes that have been coded into a special WAV file, and then lip-synchs in real time.
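
Something like this, maybe (the phoneme symbols, morph names and timings below are all made up by me just to show the idea, not what Valve actually uses):

Code:
# Made-up mapping: every sound group gets a facial pose, and the engine looks up
# whichever phoneme is active at the current playback time.
PHONEME_TO_MORPH = {
    "OH": {"jaw_open": 0.7, "lips_round": 0.9},
    "P":  {"lips_closed": 1.0},
    "N":  {"jaw_open": 0.2, "tongue_up": 0.8},
}

def active_morph(timeline, t):
    """timeline: (start_time, phoneme) pairs sorted by time; t: playback time in seconds."""
    current = {}
    for start, phoneme in timeline:
        if start <= t:
            current = PHONEME_TO_MORPH.get(phoneme, {})
        else:
            break
    return current

# the word "open" spoken over about half a second
word_open = [(0.00, "OH"), (0.25, "P"), (0.35, "N")]
print(active_morph(word_open, 0.30))   # -> {'lips_closed': 1.0}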

If this is too much info, please edit.
 
Spiffae said:
Well, the stolen source had full SDK docs for the lipsynch and facial animation components. I can't remember what it was called, but it involved writing phonemes... for example, the word "open" might be written OH - p - N or something... those symbols are readable by the facial animation system. (I don't remember how it worked exactly; I haven't seen the source since it was first leaked.) It's something like that. Every sound group is linked to a certain facial muscle position, so the game reads the sound file, lines it up with the phonemes that have been coded into a special WAV file, and then lip-synchs in real time.

If this is too much info, please edit.



Has anyone noticed in the 20-minute video or whatever that the G-Man kind of talks robotically?

Maybe it's just me, I dunno.
But then again, Alyx and that doctor seemed to talk fine, so I guess it's probably not a problem with the lip sync thing.
 
I'd say what you're hearing is a problem with the voice actor. The voice isn't generated by the phonemes... it's a simple recording of an actor in a recording studio.
 
Spiffae said:
I'd say what you're hearing is a problem with the voice actor. The voice isn't generated by the phonemes... it's a simple recording of an actor in a recording studio.

Yeah, you're right.

Did he talk like that in the first one?
 
The G-Man's funny talking is the same. He always did sound odd, almost like the Stephen Hawking computer voice. The words are fine, but the pauses between each sentence are off.
People wonder whether he's an alien or just freaky evil.
 
I think that's an intentional thing on the part of the voice actor. It makes him seem a bit enigmatic. It worked out well in the original Half-Life. Multiplayer lip-synching is possible (I've seen it in HL) but it's the really simplistic 'mouth opens when sound is made, closes when no sound is present' kind.
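
That simplistic kind is basically just an amplitude check, something like this sketch (the window size, threshold and scaling are numbers I pulled out of the air, not anything from the engine):

Code:
import struct
import wave

def mouth_open_amounts(path, window_ms=50, threshold=500):
    """Crude HL1-style lip flap: the mouth opens whenever the signal is loud enough.

    Returns one 0.0-1.0 "mouth open" value per window. No phonemes involved, which
    is why it never looks like real articulation.
    """
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2, "sketch assumes 16-bit mono PCM"
        rate = w.getframerate()
        n = w.getnframes()
        samples = struct.unpack("<%dh" % n, w.readframes(n))

    window = max(1, rate * window_ms // 1000)
    amounts = []
    for i in range(0, len(samples), window):
        peak = max(abs(s) for s in samples[i:i + window])
        amounts.append(min(1.0, peak / 8000) if peak > threshold else 0.0)
    return amounts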
 
The G-Man's voice works. Makes him seem creepy.

Wouldn't it be possible to perform accurate realtime lip-synching based on frequencies in the sound file...?
 
Not unless you added in the phonemes.

One idea I had for HL2 multiplayer a while back was to make some sort of device that could read the movements of your face as you talked, use that information to animate your character's face, and sync it up to your voice.

If it existed, it would probably be one of those control-point-based motion capture systems, and it would require you to stick little adhesive pads to parts of your face.

But to be able to let an enemy see your smile as he turns a blind corner to catch a face full of shotgun blast, it would be so worth it. Especially if he had it too and you saw his surprise.
 
Not a bad idea... how about a set of bands that wrap comfortably around your face and alter their internal resistance based upon how much they stretch, and then feed the data into some kind of controller? That could work. It'd look a bit like a holey balaclava...
 
Either way, I doubt it would catch on for two reasons. Both systems would probably not be cheap, which most gamers are. Also, the fact that you have to stick stuff to your face may be a big put-off for most people.

But oh well, a man can dream, can't he?
 
Flyingdebris said:
Either way, I doubt it would catch on for two reasons. Both systems would probably not be cheap, which most gamers are. Also, the fact that you have to stick stuff to your face may be a big put-off for most people.

But oh well, a man can dream, can't he?
Umm, I think he was joking :)
 
No, I wasn't.

And I didn't mean it to be sticky. The wearer would merely stretch it over his/her face, with the middle of each band passing over an area of major facial movement.
 
Seems like you got a bit off-topic (I don't mind) but I'd still like to answer the original question.

Lipsync data can easily be inserted into WAV files as "tags", which are a standard feature of the WAV format. Basically, a tag is a short line of text that marks when you've reached a certain position in the audio file. Then all they need to do is create events, such as "when 'phoneme_f' is found in the WAV file, activate 'facial_morph_phoneme_f'".
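
If anyone wants to play with it, here's a rough Python sketch that appends markers to a WAV using the standard 'cue ' and 'labl' chunks. I'm writing the layout from memory, so double-check it against the RIFF spec; the file names and labels are just examples, not Valve's actual format.

Code:
import struct

def add_phoneme_cues(src_path, dst_path, cues):
    """Append phoneme markers to a WAV as 'cue ' + LIST/'adtl'/'labl' chunks.

    cues: list of (sample_offset, label_text) pairs.
    """
    with open(src_path, "rb") as f:
        riff = bytearray(f.read())

    # 'cue ' chunk: a count, then one 24-byte record per marker
    cue_body = struct.pack("<I", len(cues))
    for i, (offset, _) in enumerate(cues, start=1):
        cue_body += struct.pack("<II4sIII", i, offset, b"data", 0, 0, offset)
    riff += b"cue " + struct.pack("<I", len(cue_body)) + cue_body

    # LIST/'adtl' chunk holding one null-terminated 'labl' string per marker
    adtl = b"adtl"
    for i, (_, label) in enumerate(cues, start=1):
        text = label.encode("ascii") + b"\x00"
        sub_size = 4 + len(text)
        sub = b"labl" + struct.pack("<I", sub_size) + struct.pack("<I", i) + text
        if sub_size % 2:
            sub += b"\x00"   # pad to even length; the pad byte is not counted in the size
        adtl += sub
    riff += b"LIST" + struct.pack("<I", len(adtl)) + adtl

    # patch the top-level RIFF size field so it covers the new chunks
    struct.pack_into("<I", riff, 4, len(riff) - 8)

    with open(dst_path, "wb") as f:
        f.write(riff)

# e.g. tag an "f" sound at sample 22050 and an "oh" at sample 44100 (made-up file names)
add_phoneme_cues("barney_line.wav", "barney_line_tagged.wav",
                 [(22050, "phoneme_f"), (44100, "phoneme_oh")])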

Simple as that :)
 
Brian Damage said:
The G-Man's voice works. Makes him seem creepy.

Wouldn't it be possible to perform accurate realtime lip-synching based on frequencies in the sound file...?

Actually, I was thinking about this a couple of weeks ago. I was trying to think of Valve's likely method of synching the lips to the WAV, and I thought of the possibility of detecting the highs and lows in the frequency of the WAV and matching the mouth accordingly, but that isn't really possible now that I think of it, because if the frequency shoots up, there's no distinguishing how it actually sounds... like "dog" and "cat" might spark the same frequency and it wouldn't have anything to work with.
Embedding the text contained in the WAV sounds like the best plan, but it also sounds like a pain if you've got like 500 WAVs to create :rolleyes:
Also, real-time lip-synching on the fly isn't really possible using the current method. Sure, it can detect when noise is made and make your mouth open and close (like in HL1), but it won't be perfectly realistic. Good enough for me though :E
 
I'd say it's not so much frequencies, as combinations of frequencies, or combinations of frequencies that differ relative to one another...
 
Brian Damage said:
I'd say it's not so much frequencies, as combinations of frequencies, or combinations of frequencies that differ relative to one another...

We're simply not at the level where we can do that realtime stuff yet. Spoken language is so subtle, and so arbitrary in some cases, that the phoneme system is essential. There are a lot of times where two sounds might sound the same to the ear (and the computer), but the facial muscles you use to make those sounds are completely different. The phonemes help create the illusion that the character is actually articulating the word.
 
So we also need detection of context with other detected phonemes, then...
 