Lip Sync info in wav file

  • Thread starter: BrianJ (Guest)
In the recent interview with Gabe Newell, they talked about how they packed lip sync information into a wav file. Does anybody know the name of this technique or have any info on it?
 
I'd imagine that the Source engine has a custom wav player that searches for a specific sequence of data that marks the transition from sound data to lip synch data.

If you played the altered WAVs back in something like Windows Media Player, you'd probably hear the speech, then some static or crackling from the lip synch data.
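
Roughly the kind of scan I'm picturing, as a quick Python sketch (pure guesswork on my part, assuming the extra data sits in its own RIFF chunk after the audio; the file name is made up):

Code:
import struct

def list_wav_chunks(path):
    """Walk the RIFF chunks of a WAV file and return (id, size) for each one."""
    chunks = []
    with open(path, "rb") as f:
        riff, _size, wave_id = struct.unpack("<4sI4s", f.read(12))
        assert riff == b"RIFF" and wave_id == b"WAVE", "not a WAV file"
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            chunks.append((chunk_id.decode("ascii", "replace"), chunk_size))
            # Skip the chunk body (plus the pad byte chunks get when their size is odd).
            # A player that only cares about 'fmt ' and 'data' would do the same with
            # anything it doesn't recognise.
            f.seek(chunk_size + (chunk_size & 1), 1)
    return chunks

print(list_wav_chunks("some_speech_line.wav"))   # hypothetical file name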

EDIT: Of course, that's only my take on the situation. If I'm wrong, I'd love to know how it's really done...
 
Yes, that's right; they said that when you process the basic speech WAV, you have to also give the editor a line of text to work out the phonemes from.
 
Voice comm real-time lip-synching works... well, at least in Half-Life. :p

Would be stupid if they didn't include that in HL2.
 
BrianJ said:
In the recent interview with Gabe Newell, they talked about how they packed lip sync information into a wav file. Does anybody know the name of this technique or have any info on it?

Well, the stolen source had full SDK docs for the lipsynch and facial animation components. I can't remember what it was called, but it involved writing phonemes... for example, the word "open" might be written OH - p - N or something... those symbols are readable by the facial animation system. (I don't remember how it worked exactly; I haven't seen the source since it was first leaked.) It's something like that. Every sound group is linked to a certain facial muscle position, so the game reads the sound file, lines it up with the phonemes that have been coded into a special WAV file, and then lip-synchs in real time.
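
Something like this, maybe (the phoneme symbols, morph names and timings below are all made up by me just to show the idea, not what Valve actually uses):

Code:
# Made-up mapping: every sound group gets a facial pose, and the engine looks up
# whichever phoneme is active at the current playback time.
PHONEME_TO_MORPH = {
    "OH": {"jaw_open": 0.7, "lips_round": 0.9},
    "P":  {"lips_closed": 1.0},
    "N":  {"jaw_open": 0.2, "tongue_up": 0.8},
}

def active_morph(timeline, t):
    """timeline: (start_time, phoneme) pairs sorted by time; t: playback time in seconds."""
    current = {}
    for start, phoneme in timeline:
        if start <= t:
            current = PHONEME_TO_MORPH.get(phoneme, {})
        else:
            break
    return current

# the word "open" spoken over about half a second
word_open = [(0.00, "OH"), (0.25, "P"), (0.35, "N")]
print(active_morph(word_open, 0.30))   # -> {'lips_closed': 1.0}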

If this is too much info, please edit.
 
Spiffae said:
Well, the stolen source had full SDK docs for the lipsynch and facial animation components. I can't remember what it was called, but it involved writing phonemes... for example, the word "open" might be written OH - p - N or something... those symbols are readable by the facial animation system. (I don't remember how it worked exactly; I haven't seen the source since it was first leaked.) It's something like that. Every sound group is linked to a certain facial muscle position, so the game reads the sound file, lines it up with the phonemes that have been coded into a special WAV file, and then lip-synchs in real time.

If this is too much info, please edit.



Has anyone noticed in the 20-minute video or whatever that the G-Man kind of talks robotically?

Maybe it's just me, I dunno.
But then again, Alyx and that doctor seemed to talk fine, so I guess it's probably not a problem with the lip sync thing.
 
I'd say what you're hearing is a problem with the voice actor. The voice isn't generated by the phonemes... it's a simple recording of an actor in a recording studio.
 
Spiffae said:
I'd say what you're hearing is a problem with the voice actor. The voice isn't generated by the phonemes... it's a simple recording of an actor in a recording studio.

Yeah, you're right.

Did he talk like that in the first one?
 
The G-Man's funny talking is the same. He always did sound odd, almost like the Stephen Hawking computer voice. The words are fine, but the pauses between each sentence are off.
People wonder whether he's an alien or just freaky evil.
 
I think that's an intentional thing on the part of the voice actor. It makes him seem a bit enigmatic. It worked out well in the original Half-Life. Multiplayer lip-synching is possible (I've seen it in HL) but it's the really simplistic 'mouth opens when sound is made, closes when no sound is present' kind.
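
That simplistic kind is basically just an amplitude check, something like this sketch (the window size, threshold and scaling are numbers I pulled out of the air, not anything from the engine):

Code:
import struct
import wave

def mouth_open_amounts(path, window_ms=50, threshold=500):
    """Crude HL1-style lip flap: the mouth opens whenever the signal is loud enough.

    Returns one 0.0-1.0 "mouth open" value per window. No phonemes involved, which
    is why it never looks like real articulation.
    """
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2, "sketch assumes 16-bit mono PCM"
        rate = w.getframerate()
        n = w.getnframes()
        samples = struct.unpack("<%dh" % n, w.readframes(n))

    window = max(1, rate * window_ms // 1000)
    amounts = []
    for i in range(0, len(samples), window):
        peak = max(abs(s) for s in samples[i:i + window])
        amounts.append(min(1.0, peak / 8000) if peak > threshold else 0.0)
    return amounts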
 
The G-Man's voice works. Makes him seem creepy.

Wouldn't it be possible to perform accurate realtime lip-synching based on frequencies in the sound file...?
 
Not unless you added in the phonemes.

One idea I had for HL2 multiplayer a while back was to make some sort of device that could read the movements of your face as you talked, use that information to animate your character's face, and sync it up to your voice.

If it existed, it would probably be one of those control-point-based motion capture systems, and it would require you to stick little adhesive pads to parts of your face.

But to be able to let an enemy see your smile as he turns a blind corner to catch a face full of shotgun blast, it would be so worth it. Especially if he had it too and you saw his surprise.
 
Not a bad idea... how about a set of bands that wrap comfortably around your face and alter their internal resistance based upon how much they stretch, and then feed the data into some kind of controller? That could work. It'd look a bit like a holey balaclava...
 
Either way, I doubt it would catch on for two reasons. Both systems would probably not be cheap, which most gamers are. Also, the fact that you have to stick stuff to your face may be a big put-off for most people.

But oh well, a man can dream, can't he?
 
Flyingdebris said:
Either way, I doubt it would catch on for two reasons. Both systems would probably not be cheap, which most gamers are. Also, the fact that you have to stick stuff to your face may be a big put-off for most people.

But oh well, a man can dream, can't he?
Umm, I think he was joking :)
 
No, I wasn't.

And I didn't mean it to be sticky. The wearer would merely stretch it over his/her face, with the middle of each band passing over an area of major facial movement.
 
Seems like you got a bit off-topic (I don't mind) but I'd still like to answer the original question.

Lipsync data can easily be inserted into WAV files as "tags", which are a standard feature of the WAV format. Basically, a tag is a short line of text that marks when you've reached a certain position in the audio file. Then all they need to do is create events, such as "when 'phoneme_f' is found in the WAV file, activate 'facial_morph_phoneme_f'".
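
If anyone wants to play with it, here's a rough Python sketch that appends markers to a WAV using the standard 'cue ' and 'labl' chunks. I'm writing the layout from memory, so double-check it against the RIFF spec; the file names and labels are just examples, not Valve's actual format.

Code:
import struct

def add_phoneme_cues(src_path, dst_path, cues):
    """Append phoneme markers to a WAV as 'cue ' + LIST/'adtl'/'labl' chunks.

    cues: list of (sample_offset, label_text) pairs.
    """
    with open(src_path, "rb") as f:
        riff = bytearray(f.read())

    # 'cue ' chunk: a count, then one 24-byte record per marker
    cue_body = struct.pack("<I", len(cues))
    for i, (offset, _) in enumerate(cues, start=1):
        cue_body += struct.pack("<II4sIII", i, offset, b"data", 0, 0, offset)
    riff += b"cue " + struct.pack("<I", len(cue_body)) + cue_body

    # LIST/'adtl' chunk holding one null-terminated 'labl' string per marker
    adtl = b"adtl"
    for i, (_, label) in enumerate(cues, start=1):
        text = label.encode("ascii") + b"\x00"
        sub_size = 4 + len(text)
        sub = b"labl" + struct.pack("<I", sub_size) + struct.pack("<I", i) + text
        if sub_size % 2:
            sub += b"\x00"   # pad to even length; the pad byte is not counted in the size
        adtl += sub
    riff += b"LIST" + struct.pack("<I", len(adtl)) + adtl

    # patch the top-level RIFF size field so it covers the new chunks
    struct.pack_into("<I", riff, 4, len(riff) - 8)

    with open(dst_path, "wb") as f:
        f.write(riff)

# e.g. tag an "f" sound at sample 22050 and an "oh" at sample 44100 (made-up file names)
add_phoneme_cues("barney_line.wav", "barney_line_tagged.wav",
                 [(22050, "phoneme_f"), (44100, "phoneme_oh")])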

Simple as that :)
 
Brian Damage said:
The G-Man's voice works. Makes him seem creepy.

Wouldn't it be possible to perform accurate realtime lip-synching based on frequencies in the sound file...?

Actually, I was thinking about this a couple of weeks ago. I was trying to think of Valve's likely method of synching the lips to the WAV, and I thought of the possibility of detecting the highs and lows in the frequency of the WAV and matching the mouth accordingly, but that isn't really possible now that I think of it, because if the frequency shoots up, there's no distinguishing how it actually sounds... like "dog" and "cat" might spark the same frequency and it wouldn't have anything to work with.
Embedding the text contained in the WAV sounds like the best plan, but it also sounds like a pain if you've got like 500 WAVs to create :rolleyes:
Also, real-time lip-synching on the fly isn't really possible using the current method. Sure, it can detect when noise is made and make your mouth open and close (like in HL1), but it won't be perfectly realistic. Good enough for me though :E
 
I'd say it's not so much frequencies, as combinations of frequencies, or combinations of frequencies that differ relative to one another...
 
Brian Damage said:
I'd say it's not so much frequencies, as combinations of frequencies, or combinations of frequencies that differ relative to one another...

We're simply not at the level where we can do that realtime stuff yet. Spoken language is so subtle, and so arbitrary in some cases, that the phoneme system is essential. There are a lot of times where two sounds might sound the same to the ear (and the computer), but the facial muscles you use to make those sounds are completely different. The phonemes help create the illusion that the character is actually articulating the word.
 
So we also need detection of context with other detected phonemes, then...
 