Video montage with Kdenlive: Working with audio

If we had to put a percentage on the relative importance of the audio and the picture (photography or video) in an audiovisual production, the split would be about 50/50. So in any audiovisual production, whatever its type, the audio is as important as the video.

Its narrative power is very high: in cinema, a sound can stand in for something that was never present on set. A simple example is the barking of a dog: just by hearing it, the viewer deduces that there is a dog behind the wall the protagonist is jumping over, without ever seeing it. If the bark is that of a chihuahua, it will make us laugh; if it is the bark of a fierce dog and a big, dangerous one comes to mind, we will fear for the protagonist's physical integrity.

The same goes for destruction. When making a short film it is hard to find volunteers willing to have their car or shop window destroyed, but with the sound of braking, the corresponding impact sound and a close-up of an actor with the face of someone witnessing an accident, the accident has happened. Bear in mind that everything in cinema is a lie.

Introduction to audio processing in an audiovisual production

Each type and genre of audiovisual production has its small variations, but broadly speaking some aspects are common to all of them. An exception would be a music video, where the only audio present is the song itself; or perhaps not, since ambient sounds or short narrative introductions or endings are sometimes added. A clear example is the video for the song “I Wanna Rock” by the group Twisted Sister, which has a narrative introduction, ambient sounds mixed with the music (see the pool scene from minute 3:00 to 3:11) and a slightly “funny” ending.

In an audiovisual production the audio is not processed in the video editor; this task is carried out in dedicated audio editors. On Linux we have the wonderful Audacity for this. What can be done perfectly well in the video editor is the final mix of the already processed parts of the audio and, of course, the synchronization of ambient sounds, special effects, etc. with the video.

Let’s see how this is handled in professional production companies. To explain the workflow, we will take as an example the making of a short film with dialogue, ambient sounds, special effects and background music.

I will lay out a step-by-step outline, starting from the point where we begin the video montage and aiming for a final result in stereo.

  1. As we saw in the chapter on basic editing, the story is first assembled without effects or transitions, and with no sounds other than those that come with the clips, which carry the dialogue and the ambient sound of the recording. If in some fragment the music sets the pace, or goes from being background filler to a narrative element, it can also be included. If there is an explosion that lasts 3 seconds and we do not have those shots yet (because they belong to the advanced montage), we insert a 3-second color clip (red, for instance) as a placeholder where the corresponding images and sounds will go later.
  2. Important: once the basic montage is finished, the audio track is exported on its own (in addition to a draft of the video). In Kdenlive this is as simple as going to the export wizard and choosing a “preset” that exports only the audio. [Image: 01.png]
  3. This audio track is perfectly synchronized with the video, so it is important not to alter its duration or split it. It does not matter if, once processed, some stretches contain only silence: once cleaned and treated, the track will be inserted back into the video and will still be perfectly in sync. In the advanced montage, the raw audio track (the one exported in the previous step) is removed and replaced with the processed one. The processing consists, among other things, of removing unwanted sounds or background noise, equalizing, adjusting levels, etc.
  4. Once the advanced montage of the film, short film, documentary, etc. is finished, including its dialogue and narrative audio, the video is exported at good quality. The core of our project is now finished in a single video clip, but the credits and texts, which are added in the “pre-final” montage, are still missing.
  5. This “pre-final” montage is a new project (within the same production) that we start from scratch: going from basic to advanced was a continuation, but here we start a new project and insert the result of the advanced montage as the main video clip. In this montage the opening and closing credits and the text titles are created or inserted (they may have been made in a separate project). In short, it is the credits and texts stage. To synchronize the credits we may already have brought in the music that accompanies them as a reference. Once the “pre-final” is finished, we export the video again at good quality, but with the music muted, since the music belongs to the final montage, where the audio mix is carried out. So this video will still contain only the dialogue.
  6. At this point we begin the final part of the project: finishing the audio. We start from another blank project and insert the “pre-final” result as the only video clip. It already contains every element from beginning to end, and we are ready to dress it with the audio.
  7. In this final project, the single video track¹ that spans the whole project gives us smooth playback, in most cases in real time. Here we create the tracks we need for ambient sounds, special effects and music. Typically at least one audio track is used per sound element; in this example we would need at least 4 tracks: dialogue, ambience, effects and music. In some cases, because ambiences or effects overlap, even more are used.
  8. Once all the audio has been inserted, there are two ways to proceed, depending on the project or the producer’s preferences.
    • The final result is mixed in the video editor itself.
    • The tracks are exported individually, each with the full duration of the project; the mix is done in the audio studio, which returns a single track (stereo, in this case) that is inserted into the final video, muting or deleting the other tracks.
      *Advice: always save a copy of the video montage before making important changes. Deleting the audio tracks in order to insert the final stereo track with all the audio of the project is an important change and deserves a saved copy.

¹ To finish synchronizing the audio in step 7, it is common to insert a low-resolution copy of the video (an exact copy of the high-resolution one), so that it plays back easily in real time, which makes synchronizing the audio smoother. Once finished, the low-resolution video is replaced with the good version, or the audio is exported and inserted into the project that works at high resolution for the final export.

Basic processing of dialogues

I’m going to give a few tips that have to do not with Kdenlive but with audio. They cover the processing mentioned in step 3 of the previous section.

Once the audio track is exported, it is sent to the sound technicians; in this case, ourselves.

In the audio editor, we clean up the background noise. In stretches where there is no dialogue or ambience, we mute the audio without changing its duration; we compress if necessary to even out the levels; and we equalize, in particular the typical attenuation of bass below 100 Hz and treble above 5000 Hz. There is a wealth of information on the web about how to do this in Audacity, so I will focus on some tips that affect our video production.
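As a rough illustration of that band-limiting step, here is a minimal sketch in plain Python: a first-order high-pass at 100 Hz followed by a first-order low-pass at 5000 Hz. The function names, the 48 kHz sample rate and the gentle first-order slopes are all assumptions made for the example; Audacity's own filters are different and steeper. Note that the output has exactly as many samples as the input, so the duration of the track is not altered.

```python
import math

SAMPLE_RATE = 48000  # assumed sample rate in Hz


def high_pass(samples, cutoff_hz, sample_rate=SAMPLE_RATE):
    """First-order high-pass filter: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out


def low_pass(samples, cutoff_hz, sample_rate=SAMPLE_RATE):
    """First-order low-pass filter: attenuates content above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out = []
    prev_out = 0.0
    for x in samples:
        prev_out = prev_out + alpha * (x - prev_out)
        out.append(prev_out)
    return out


def clean_dialogue_band(samples, low_cut_hz=100.0, high_cut_hz=5000.0,
                        sample_rate=SAMPLE_RATE):
    """Attenuate bass below low_cut_hz and treble above high_cut_hz."""
    without_rumble = high_pass(samples, low_cut_hz, sample_rate)
    return low_pass(without_rumble, high_cut_hz, sample_rate)


def sine(freq_hz, seconds=0.5, sample_rate=SAMPLE_RATE):
    """Test tone with amplitude 1.0."""
    count = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(count)]


def peak(samples):
    """Peak level of the second half, once the filters have settled."""
    tail = samples[len(samples) // 2:]
    return max(abs(s) for s in tail)


if __name__ == "__main__":
    for freq in (20, 1000, 15000):
        level = peak(clean_dialogue_band(sine(freq)))
        print(f"{freq} Hz tone -> peak level {level:.2f}")
```

A 1000 Hz tone, in the middle of the voice range, passes almost untouched, while a 20 Hz rumble and a 15000 Hz hiss come out clearly quieter.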

The biggest problem we often run into is that dialogue recorded outdoors is degraded when the voice is cleaned, because the background noise level is very high. So we have to resort to dubbing: extra work that is present in almost all audiovisual productions, where the actor not only records the video scenes but then has to go to the audio studio to repeat the dialogue with the same emphasis, rhythm and speed as in the original.

To do this dubbing, the corresponding video fragments of the basic montage are exported with the dirty audio; these clips are the reference the actor uses for the dubbing. The actor watches the raw scene, hears the original through headphones and, after a little rehearsal of each scene, performs the dialogue again until it matches and the take is accepted.

These recordings are then inserted on a new track in Audacity, exactly in sync with the originals, and the raw ones are muted.

The sound technician uses the video draft that was exported together with the audio track as a reference, in order to do the job as accurately as possible and place each sound in the stereo field according to the action and the position of the actors. That said, balance and panning can also be done in the video editor.

Curiosity: in cinema, this dubbing is done by the main and supporting actors themselves when necessary, and it is very common with extras. These are people hired to fill the scene who sometimes have a brief interaction in the dialogue, a phrase or a word; if that audio has not turned out well it will be dubbed, and by anyone from the production team. It is very common for someone who goes to see their minute of glory as an extra in a blockbuster to be surprised that the voice they hear is not their own.

Monaural (abbreviated mono) or stereo tracks?

In video montage it is very common to use mono tracks for dialogue, effects and some ambient sounds; mixing these tracks yields a stereo track on export in which the sounds come from the same direction as the action. It should be said that stereo sound is always richer than mono, as long as we have recorded with a stereo microphone or with two microphones, one per channel, placed so that they give a correct stereo field.

It should be noted that a stereo track with identical channels is the same as a mono track. For a track to be considered truly stereo there must be differences between the two channels, even if they are very subtle. These differences can be of many types, in tone, in time, in volume, different elements in each channel, etc.
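This idea can be checked numerically. The following is a minimal sketch, assuming the two channels are already available as plain lists of samples; the function names are hypothetical and not part of any audio library.

```python
def is_dual_mono(left, right, tolerance=0.0):
    """True when both channels carry the same signal (within tolerance)."""
    if len(left) != len(right):
        return False
    return all(abs(l - r) <= tolerance for l, r in zip(left, right))


def side_signal(left, right):
    """Half the difference between channels; all zeros means no stereo content."""
    return [(l - r) / 2.0 for l, r in zip(left, right)]


if __name__ == "__main__":
    voice = [0.0, 0.5, -0.5, 0.25]
    quieter_copy = [s * 0.5 for s in voice]
    print(is_dual_mono(voice, list(voice)))    # identical channels
    print(is_dual_mono(voice, quieter_copy))   # volume differs between channels
```

Even a plain difference in volume between the two channels is enough for the track to stop being equivalent to a mono track.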

Some examples of the above:

  • A barking dog that we cannot see in the picture: if we hear it through the left channel, we deduce that it is on the left. So, a mono track balanced to the left.
  • A vehicle that crosses the shot from left to right: a mono track that is panned along the same path.
  • A narration from someone we don’t see, which is called a voice-over: a mono track in the center.
  • An actor's voice: a mono track balanced slightly towards the position the actor occupies in the shot.
  • Ambiences are best recorded and inserted in stereo, but they can also be the result of mixing several tracks of different library sounds to set the scene where the action takes place.
  • Music: in stereo. Normally it is already processed; all we have to do is adjust the volume to our needs.
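Placing a mono sound in the stereo field comes down to sending it to both channels with different gains. Here is a minimal sketch using a constant-power pan law; the law itself, the position range from -1.0 (left) to +1.0 (right) and the function names are assumptions made for the example, and the Kdenlive effects may use a different curve.

```python
import math


def pan_gains(position):
    """Gains (left, right) for a position from -1.0 (left) to +1.0 (right)."""
    position = max(-1.0, min(1.0, position))
    angle = (position + 1.0) * math.pi / 4.0  # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def place_mono(samples, position):
    """Turn a mono clip into stereo frames fixed at one position."""
    left_gain, right_gain = pan_gains(position)
    return [(s * left_gain, s * right_gain) for s in samples]


def sweep_mono(samples, start, end):
    """Move a mono clip from start to end position over its duration."""
    count = len(samples)
    frames = []
    for i, s in enumerate(samples):
        progress = i / (count - 1) if count > 1 else 0.0
        left_gain, right_gain = pan_gains(start + (end - start) * progress)
        frames.append((s * left_gain, s * right_gain))
    return frames


if __name__ == "__main__":
    bark = [0.8, -0.6, 0.4]
    print(place_mono(bark, -1.0))            # dog on the left
    print(place_mono(bark, 0.0))             # voice-over in the center
    print(sweep_mono([1.0] * 5, -1.0, 1.0))  # vehicle crossing left to right
```

With this law the sum of the squared gains is always 1, so the sound keeps the same perceived loudness as it moves across the field.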

Bear in mind that individual elements do not naturally produce stereo sound (one barking dog is not stereo; two barking dogs are). Two mono tracks balanced to different positions in the stereo field already give us a sense of stereo and of location. In this case, of the two dogs, we place the small one on the right, because that is where we see it in the shot (and if we do not see it, we deduce that it is there from hearing its bark in that position), and the large one in the center, or wherever we need it, for the same reason.

A motorcycle does not produce a stereo sound, but if it crosses the shot from left to right, the right thing to do is to balance its mono track in the same direction. It begins to be heard on the left and the volume increases as it gets closer; little by little the sound moves from the left towards the right. When the level is the same in both channels, that is, in the center, and the volume is at its maximum, the motorcycle is right in front of us; then the sound moves to the right and the volume drops as it moves away. All of this can be done with a mono motorcycle sample. For more realism, while it approaches we equalize the sound so that it sounds a little higher, boosting the treble and attenuating the bass; when it is in front of us we use the real sound, without equalizing; and as it leaves we attenuate the treble and boost the bass. This is what happens in reality (the Doppler effect): the sound waves are compressed as the source approaches, so it sounds higher, and they stretch as it moves away, so it sounds lower.
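The physical phenomenon behind this is the Doppler effect, and strictly speaking it shifts the pitch rather than the equalization; the equalization trick above is a practical approximation. As a rough guide to how strong the shift is, here is a minimal sketch, assuming a source moving straight towards or away from a listener who stands still, and a speed of sound of 343 m/s. The function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees Celsius (assumed)


def doppler_ratio(speed_mps, approaching=True, speed_of_sound=SPEED_OF_SOUND):
    """Ratio between the frequency heard and the frequency emitted."""
    if not 0.0 <= speed_mps < speed_of_sound:
        raise ValueError("speed must be between 0 and the speed of sound")
    if approaching:
        return speed_of_sound / (speed_of_sound - speed_mps)
    return speed_of_sound / (speed_of_sound + speed_mps)


if __name__ == "__main__":
    speed = 20.0  # a motorcycle at 72 km/h
    print(f"approaching: x{doppler_ratio(speed, approaching=True):.3f}")
    print(f"receding:    x{doppler_ratio(speed, approaching=False):.3f}")
```

At 20 m/s the ratio is 343/323, roughly 1.06, while the motorcycle approaches and 343/363, roughly 0.94, while it moves away: a small but clearly audible change.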

After reading the previous paragraphs, if I now say “a dog can produce stereo sound”, it will seem like a contradiction. Let me explain: the dog itself cannot, but depending on what surrounds it, a dog, a single person, a single motorcycle, etc. can indeed sound in stereo. How? Say the dog is barking inside a cathedral: the bark will generate the well-known reverberation typical of this kind of building, which in reality goes far beyond a stereo field, but which a simple two-channel system can emulate very realistically. So in this case we apply a reverb effect to the library sound (mono, or stereo with identical channels) to emulate this environment.

The final export will be done in stereo and each sound will be located in its corresponding location.

In stereo recordings, correct microphone placement is very important. If the microphones were placed correctly, the audio can be used without balancing, since the sounds will already match the action; if it needs retouching, Kdenlive has some effects for this purpose.

When mixing in stereo, make sure the speakers are correctly positioned and correspond to the left and right channels. It would be very funny to have a car go from right to left and hear it the other way round because we worked with the speakers swapped. Since the effect is funny, it can also be used as a comic device, typical of comedies that play with the absurd or of surreal productions.

Finally!!! Working with audio in Kdenlive

Kdenlive works with two channels, left and right, which is known as stereo sound. On each audio track in the project we can insert either mono or stereo audio and balance or “pan” it as we wish with the effects we will see later.

The “timeline” of a video editor has two types of tracks: video and audio. In Kdenlive, video tracks are multipurpose and play both picture and sound, while audio tracks play only the audio. We can place a video clip on an audio track if we only want to use its audio, although the better approach is to extract the audio from the video, as we will see later in this section, and delete the video.


In Kdenlive we can add as many tracks as we need, both audio and video. Adding a track is as simple as right-clicking on a track header and choosing the option “Insert track”: a very intuitive dialog opens that lets us choose the position of the track, its type and its name. [Image: tracks.png]

By default, the audio is displayed as in the following image: [Image: 03.png]

But if we want to see it in the traditional way: [Image: 04.png]

We go to the menu “Preferences → Configure kdenlive”, open the “Timeline” section and, in the thumbnails section, enable the option “Separate channels”. [Image: 05.png]

In the track header there is a button to mute the audio of all the clips on the track: [Image: 05 2.png] This button affects the video export: if we mute a track, the sounds on it will not be present in the export.

In the project monitor there is a button to adjust the listening volume while we work, and an output level indicator. [Image: 06.png] The listening volume does not affect the output levels; it is only an aid for monitoring while we work. So it does not affect the export: even if the listening volume is turned all the way down, the exported audio will be at the level shown by the corresponding meters.

Separate the audio from the video clip into different tracks

An option widely used in professional video editors is to separate the audio from a video and place it on its own track. In Kdenlive this is done by right-clicking on the video clip and choosing the option “Split Audio”. [Image: 07.png] For this to work, there must be an audio track with empty space spanning the whole video clip whose audio we are extracting. If this requirement is not met, an error appears in the lower left corner (where Kdenlive shows its informational messages) warning that there is not enough space. [Image: 08.png] In that case, create a new audio track and the audio will be placed on it.

Note that if we had applied an audio effect to the video clip before splitting the audio, the effect remains active on the audio in the new track.

When the audio has been extracted this way, audio and video remain linked, so any edit we make to one of them (moving, stretching, shrinking or cutting) affects the other as if they were a single clip. If we want the audio to be independent of the video, or vice versa, we break the link by right-clicking on either clip and choosing the option “Ungroup clips”, or by pressing “Ctrl + Shift + G” with either of them selected.

If we want the audio to be separated from the video clips automatically as we insert them, we have to enable the corresponding option at the bottom right of the timeline. [Image: 09.png] The clips are kept grouped in this case too.

When we enable automatic splitting, a green LED lights up in the header of one of the tracks [Image: 10.png]; the audio will be inserted on that track. To change the target track, simply click in the place of the LED on another audio track to activate it.

As when we do it manually, the track must have enough free space to hold the whole clip (the same duration as the video clip). If there is not enough space, the audio will not be split, but the video clip will be muted.

Among the tools at the lower right of the timeline there are icons to show or hide the waveforms. See the following image, where I have annotated the function of each of these timeline tools. [Image: time line.png]

Audio effects

Kdenlive has a wide assortment of audio effects and in the effects menu they are arranged in three families:

  • Audio: It covers all kinds of effects more typical of an audio editor than a video one, but sometimes it is useful to be able to use them directly in the video editor itself.
  • Audio channels: Effects that affect stereo channels or mono tracks and that are often used in the video editor to be able to position the sound with the action.
  • Audio correction: Effects also frequently used in video editors to manage the volume level of the different audio tracks.

Then there are the audio fades, in the “Fundidos” (fades) family, which they share with the video fades.

In Kdenlive, audio effects are distinguished from video effects by the shape of their icon: square for video and round for audio. [Image: 11.png]

Audio effects and fades are applied exactly like video effects, and we have the same two types of animatable effects, clock and keyframe bar effects, which work the same as for video.
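Keyframe animation boils down to interpolating a parameter between pairs of (time, value). Below is a minimal sketch for a gain expressed in decibels, comparing straight-line interpolation with an eased one. The function names are hypothetical, and the smoothstep easing only stands in for a “smooth” mode; it is an assumption, not the curve Kdenlive actually uses.

```python
def db_to_linear(gain_db):
    """Convert a gain in dB to the factor applied to each sample."""
    return 10.0 ** (gain_db / 20.0)


def gain_at(keyframes, time_s, smooth=False):
    """Gain in dB at time_s.

    keyframes: list of (time_seconds, gain_db) with strictly increasing times.
    """
    if time_s <= keyframes[0][0]:
        return keyframes[0][1]
    if time_s >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, g0), (t1, g1) in zip(keyframes, keyframes[1:]):
        if t0 <= time_s <= t1:
            progress = (time_s - t0) / (t1 - t0)
            if smooth:
                # smoothstep easing: slow start and slow arrival
                progress = progress * progress * (3.0 - 2.0 * progress)
            return g0 + (g1 - g0) * progress
    raise ValueError("keyframes must be sorted by time")


if __name__ == "__main__":
    # Duck the music by 12 dB under a line of dialogue, then bring it back.
    music = [(0.0, 0.0), (2.0, -12.0), (4.0, 0.0)]
    for t in (0.0, 0.5, 1.0, 2.0, 3.0):
        linear = gain_at(music, t)
        eased = gain_at(music, t, smooth=True)
        print(f"t={t:.1f}s linear={linear:+.2f} dB eased={eased:+.2f} dB "
              f"factor={db_to_linear(linear):.3f}")
```

Both modes pass through the same keyframes; they only differ in how the value travels between them.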

On video tracks that carry embedded audio, we can apply audio effects alongside the video effects, but for a comfortable workflow it is recommended to separate the audio from the video and apply the video effects on the video track and the audio effects on the audio track.

Commonly used effects in video editors
  • Fade in and fade out, from the “Fundidos” family. Apart from providing a smooth entry and exit of the audio when needed, they are a must at both ends whenever we cut the audio, even with a duration of just 2 or 3 frames, to avoid unwanted clicks and pops. They are also recommended, for the same reason, on tracks that have been processed, just as on cuts. [Image: 1.png]
  • Volume (animatable), from the “Audio Correction” family. Its name says what it does, and it can be animated. Kdenlive automatically adjusts the volume of the different tracks, computing an average, so that the audio stays below the saturation level. But sometimes we need to alter these volumes to bring out a specific sound (dialogue, for instance), either by attenuating the other sounds or by giving it more gain.
    • This editor does not have an audio mixer, so this effect is essential for adjusting the volumes of the different clips the way we need when the automatic mode does not suit us.
    Being animatable, it lets us vary the volume of a clip over time. [Image: vol 01.png] Its interface is very simple and has a single animatable parameter, the gain. Its default value is 0; to raise the volume we move the gain towards positive values (→), and to lower it, towards negative values (←). It also offers a choice of interpolation, “Linear” by default; if smoother animation curves are desired we can set it to “Smooth”.
    Another thing to highlight is that, when the effect is selected, it can be edited directly on the corresponding track. The following image shows two audio tracks that, with the effect selected, display its editing line. [Image: vol 03.png] Inserting a new keyframe is as easy as double-clicking on the line; to move keyframes, drag them with the mouse. For the points to be visible the clip must be selected; this also shows a central reference line at 0 dB alongside the line that shows graphically how the gain is affected.
    Notice in the previous image that in the upper clip the line is made of straight segments (with sharp corners at the changes of direction), while in the lower clip it is more rounded (smoother). This is due to interpolation: the upper one is in “Linear” mode and the lower one in “Smooth” mode.
    [Image: vol 04.png] Interpolation setting detail

    The interpolation can be changed from the effect or from the clip. When we hover over a keyframe it turns red; right-clicking then opens a menu with the option to change the interpolation of that point. The change applies from the modified keyframe to the next one on its right. [Image: interpol.png]

  • Gain, from the “Audio Correction” family. Raises or lowers the volume of the clip it is applied to. It cannot be animated.
  • Balance, from the “Audio Channels” family. It can be animated. It is used on mono and stereo tracks alike for the same purpose: to move the sound to one side of the stereo field. It is very useful on stereo tracks with different sounds in each channel, to attenuate one of them. It has a single value that goes from 0 to 1000: at 0 the sound is entirely in the left channel, at 1000 entirely in the right channel, and 500 is the center point. [Image: bal 01.png]
    • Mono tracks in a stereo system always sound in the center, but by balancing them we can place the sound wherever we want in the stereo field.
    • On stereo tracks this effect works as follows: if we move the value towards the right, the left channel is attenuated until it is muted, and vice versa. On a stereo track with the same sound on both channels, it behaves as on a mono track. But if the channels hold different sounds, this effect can attenuate or even silence one of them. If, for example, the left channel has a dog barking and the right channel a cat meowing, moving the value towards the right gives the feeling that the dog is moving away, until it is completely silenced at the maximum right value, 1000. If we do the opposite (moving towards the maximum left value, which is 0 in this effect), it is the cat that moves away until it is silenced.
  • Panning, from the “Audio Channels” family. Designed for stereo tracks. This effect moves the sound of one channel across the stereo field towards the other channel, without affecting that other channel. It has a box to choose which channel it acts on and a slider called “Pan” that goes from 0 (left) to 1000 (right). Its initial value of 500 places the chosen channel in the center of the stereo field. The “Pan” slider can be animated. [Image: pan 01.png] It works as follows:
    • It affects the channel of the stereo track that is selected in the channel parameter.
    • If the left channel is selected, the value 0 leaves that channel unaltered, the value 500 places the audio of the left channel in the center, and the value 1000 makes the left channel sound only on the right, together with the right channel, leaving the left channel of the track silent.
    • If the right channel is selected, the value 1000 leaves that channel unaltered, the value 500 places the audio of the right channel in the center, and the value 0 makes the right channel sound only on the left, together with the left channel, leaving the right channel of the track silent.
    • Taking the same stereo track as in the “Balance” example (a dog barking on the left and a cat meowing on the right): if we choose the left channel and animate the pan from 0 to 1000, the impression will be that the dog moves from left to right while the cat stays still on the right.
    • To do the opposite (the cat going to the dog) we would choose the right channel and animate the pan from 1000 to 0.
  • Swap channels, from the “Audio Channels” family. Swaps the channels of a stereo track. Applied to the track of the previous example (a dog on the left, a cat on the right), the cat will sound on the left and the dog on the right. Very useful when flipping (mirroring) shots that already come with stereo sound, such as a vehicle passing by. It has no parameters.
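The behaviour of the Balance, Panning and Swap channels effects described above can be modelled on a single stereo frame (one left sample and one right sample). This is only a sketch: the function names are hypothetical, the 0 to 1000 range is the one quoted in the text, and the linear gain curves are an assumption, not Kdenlive's actual implementation.

```python
def balance(frame, value):
    """Balance: 0 = only left, 500 = untouched, 1000 = only right."""
    left, right = frame
    left_gain = min(1.0, (1000 - value) / 500.0)
    right_gain = min(1.0, value / 500.0)
    return left * left_gain, right * right_gain


def pan(frame, channel, value):
    """Panning: move one channel across the field, leaving the other in place."""
    left, right = frame
    amount = value / 1000.0  # 0.0 = fully left, 1.0 = fully right
    if channel == "left":
        return left * (1.0 - amount), right + left * amount
    if channel == "right":
        return left + right * (1.0 - amount), right * amount
    raise ValueError("channel must be 'left' or 'right'")


def swap_channels(frame):
    """Swap channels: left becomes right and right becomes left."""
    left, right = frame
    return right, left


if __name__ == "__main__":
    dog_and_cat = (0.5, 0.25)  # dog on the left, cat on the right
    print(balance(dog_and_cat, 1000))      # the dog is silenced
    print(pan(dog_and_cat, "left", 500))   # the dog sits in the center
    print(pan(dog_and_cat, "left", 1000))  # the dog joins the cat on the right
    print(swap_channels(dog_and_cat))      # the cat on the left, the dog on the right
```

The difference between the two effects shows up in the numbers: Balance only ever attenuates a channel, while Panning adds the moved channel into the opposite side.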

For the rest of the processing, with a few exceptions, it is better to export the audio and process it in Audacity or our favorite audio editor.

One exception would be applying a reverb to sounds that are too dry because they have been dubbed and have lost the reverberation that every space has; but be careful not to overdo the effect, since the dialogue can become unintelligible.

The reverb effect I liked best for its simplicity and effectiveness is the Plate Reverb: with the default values, just adjusting the “Damping” value gives good results. Lowering this value is like enlarging the room we are in, and raising it does the opposite. With the default values on a voice, the reverb is very subtle and suitable for most situations; if we lower “Damping” to 0, we have put that person in a cathedral.


I hope this is useful to you, and thank you for your attention.
