Spectrogram View

From Audacity Development Manual
Revision as of 16:19, 5 February 2015 by PeterSampson (talk | contribs) (Spectral selection: added link to Spectral Selection)
The Spectrogram View of an audio track provides a visual indication of how the energy in different frequency bands changes over time.
  • Bill 11Nov2014: Here is a start on detailed documentation of spectrogram view. I intend to continue with this, showing the log(f) view, then concluding with some sort of comparison that shows that different settings (window size, linear vs. log(f), min and max frequency) are better for different source material. I also need to demonstrate when the Frequency Gain setting can be useful.
  • Bill 22Dec2014: Added Spectrogram log(f) with examples. Still need example of frequency gain.
  • Bill 22Dec2014: Needs review.
    • Steve 23Jan15: Looks good, but as part of the P2 I think it needs examples of spectral selection (assuming that spectral selection is included in 2.1)
    • Peter 04Feb15: I really don't agree. We already have a page called Spectral Selection which documents that. Furthermore the Waveform view page doesn't carry an image with a selection. We could consider a link to Spectral Selection. Removed the P2.
      • Steve 04Feb15: As
  1. Spectral Selection is on by default,
  2. Even when you "switch it off" with the Q button, it comes back on as soon as you make a new selection (so there really is no way to "turn it off"),
  3. The user is more likely to have a selection than not have a selection,
  4. We have a section here called What the Colors Mean,
I think users will, in the absence of any explanation, find it confusing that they will see colours that are different from what is shown here. All that I think we need is one image of a Spectral Selection, saying:
"This is how the track appears in spectrogram view when there is a selection. (link to Spectral Selection)".
  • Gale 05Feb15: ToDo-1 I have not seen this page before now, but I'm going to go further and call it P1 for an image showing spectral selection (or possibly two, one with selection with no bandwidth, and one for a selection with defined bandwidth). Even making a selection with no bandwidth draws a horizontal line in the selection. What is it assumed to be, without explanation? Is it another kind of zoom band? And as Steve says, if you drag down, what is the colour overlay?

What the Colors Mean

To demonstrate how the various settings affect the appearance of an audio track in spectrogram view, we will start with this artificially constructed test track. It consists of five segments of a sine wave tone at 2000 Hz, each 2 seconds long. The first segment is at a level of -10 dB, the second at -30 dB, and subsequent segments at -50 dB, -70 dB and -90 dB.
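A track like this can also be built outside Audacity. The following is a minimal sketch, not the method used to make the track shown on this page. The sample rate of 44100 Hz and the names make_test_track and db_to_amplitude are assumptions for illustration, and the levels are treated as peak levels relative to a full-scale amplitude of 1.0.

```python
import math

SAMPLE_RATE = 44100        # assumed; this page does not state the project rate
TONE_HZ = 2000.0           # tone frequency given in the text
SEGMENT_SECONDS = 2.0      # length of each segment given in the text
LEVELS_DB = [-10, -30, -50, -70, -90]


def db_to_amplitude(level_db):
    """Convert a level in dB (relative to full scale 1.0) to a linear amplitude."""
    return 10.0 ** (level_db / 20.0)


def make_test_track():
    """Return the five tone segments joined end to end as a list of floats."""
    samples = []
    per_segment = int(SAMPLE_RATE * SEGMENT_SECONDS)
    for level_db in LEVELS_DB:
        amplitude = db_to_amplitude(level_db)
        for i in range(per_segment):
            phase = 2.0 * math.pi * TONE_HZ * i / SAMPLE_RATE
            samples.append(amplitude * math.sin(phase))
    return samples
```

Each 20 dB step down divides the amplitude by 10, which is why the later segments are barely visible in waveform view.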

This is how the track appears in waveform view.

SpectrogramView 01.png

This is how the track appears in spectrogram view, using the default settings.

SpectrogramView 02.png

The default settings are:

  • Window size: 256
  • Window type: Hanning
  • Minimum frequency (Hz): 0
  • Maximum frequency (Hz): 8000
  • Gain (dB): 20
  • Range (dB): 80
  • Frequency Gain (dB/dec): 0

What do these settings mean and how do they relate to what you see on the screen?

The minimum and maximum frequency settings determine the lowest and highest frequencies displayed, as indicated in the track vertical scale.

Gain can be said to increase the "brightness" of the display. It does this by amplifying the signal by the indicated amount before it is drawn. With the default setting of 20 dB, any frequency band that originally had a level of -20 dB or greater (and so has a level of 0 dB or greater after amplification) will be displayed as white. Similarly, the lower level bands will also "get brighter".

There are six color bands in spectrogram view: white, red, magenta, dark blue, light blue and grey. The Range setting determines the spacing between colors.

With the default settings of Gain = 20 dB and Range = 80 dB, the colors correspond to the following levels:

  • anything above -20 dB is indistinguishably white (the tone at -10 dB in the image above is white)
  • levels from -40 dB to -20 dB transition from red to white (the tone at -30 dB in the image above is light red)
  • levels from -60 dB to -40 dB transition from magenta to red (the tone at -50 dB in the image above is magenta)
  • levels from -80 dB to -60 dB transition from dark blue to magenta (the tone at -70 dB in the image above is bluish purple)
  • levels from -100 dB to -80 dB transition from light blue to dark blue (the tone at -90 dB in the image above is light blue)
  • anything below -100 dB is grey
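The list above suggests a simple rule: white starts at minus the Gain value, and the remaining colors are spaced a quarter of the Range apart below that. The sketch below encodes that rule. It is inferred only from the default values listed on this page and is not taken from Audacity's source code, and the function names color_levels and describe are hypothetical.

```python
# Colors from brightest to darkest, as listed above (grey lies below the last one).
COLORS = ["white", "red", "magenta", "dark blue", "light blue"]


def color_levels(gain_db=20.0, range_db=80.0):
    """Return (color, level in dB) pairs, assuming evenly spaced colors."""
    top = -gain_db                            # assumed: white starts at -Gain
    step = range_db / (len(COLORS) - 1)       # assumed: Range spans four equal steps
    return [(color, top - i * step) for i, color in enumerate(COLORS)]


def describe(level_db, gain_db=20.0, range_db=80.0):
    """Say which pair of colors a given level falls between."""
    levels = color_levels(gain_db, range_db)
    if level_db >= levels[0][1]:
        return "white"
    if level_db < levels[-1][1]:
        return "grey"
    for (upper_color, upper), (lower_color, lower) in zip(levels, levels[1:]):
        if lower <= level_db < upper:
            return f"between {lower_color} and {upper_color}"
```

With the defaults, describe(-30) returns "between red and white", matching the light red tone in the image above. If the rule holds for other values, changing Gain to 0 dB would shift every boundary up by 20 dB.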

Time Smearing and Frequency Smearing

If this is a pure tone, why does the spectrogram show energy at frequencies between 0 Hz and 5000 Hz?

Spectrogram view uses the Fast Fourier Transform (FFT) to display the frequency information versus time. There is an inherent trade-off between frequency resolution and time resolution: a short window locates events precisely in time but spreads them over a wide band of frequencies, and a long window does the opposite. When using the default window size of 256 the spectrogram is drawn quickly, but the frequency resolution is coarse.

The image below shows the time smearing at the start of the track.

SpectrogramView 03.png

Changing the Window Size to 2048 and displaying the entire track results in this view of the track.

SpectrogramView 04.png

We can see that the frequency resolution has increased. That is, there is much less "frequency smearing" with the larger window size. Note that the "spikes" every two seconds are the result of the discontinuities created when joining the segments of tone at different levels.

Looking at the first 0.04 seconds of the track, we can see that the "time smearing" has increased with a window size of 2048 compared to 256.

SpectrogramView 05.png
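The trade-off between the two window sizes can be put into numbers. In the sketch below, the spacing between FFT bins is the sample rate divided by the window size, and the length of audio covered by one window is the window size divided by the sample rate. The sample rate of 44100 Hz is an assumption, as this page does not state the project rate, and the function name resolution is hypothetical.

```python
SAMPLE_RATE = 44100  # assumed; this page does not state the project rate


def resolution(window_size, sample_rate=SAMPLE_RATE):
    """Return (bin width in Hz, window length in seconds) for an FFT window."""
    bin_width_hz = sample_rate / window_size
    window_seconds = window_size / sample_rate
    return bin_width_hz, window_seconds


for size in (256, 2048):
    bin_hz, seconds = resolution(size)
    print(f"window {size}: bin width {bin_hz:.1f} Hz, "
          f"window length {seconds * 1000:.1f} ms")
# window 256: bin width 172.3 Hz, window length 5.8 ms
# window 2048: bin width 21.5 Hz, window length 46.4 ms
```

Under that assumption a window of 2048 samples covers about 0.046 seconds, which is consistent with the smearing visible over the first 0.04 seconds of the track.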

You can zoom in on the vertical (frequency) axis.

SpectrogramView 06.png

After zooming in, the vertical ruler changes to allow greater precision of the scale.

SpectrogramView 07.png

Effect of Different Window Types

In the case of this particular test track we can get even better frequency resolution by changing the Window Type to Blackman-Harris.

SpectrogramView 08.png

Changing to a rectangular window causes the track to be redrawn faster at the expense of very bad frequency smearing. Note that this is still at a window size of 2048.

SpectrogramView 09.png
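The difference between window types can be illustrated by measuring how much of a tone "leaks" into a frequency bin some distance away from it. The following is a rough sketch using a slow, direct DFT in plain Python. It is not Audacity's code: the window size of 256, the tone position, the measurement offset and all function names are assumptions chosen for illustration, and the Blackman-Harris coefficients are the commonly published 4-term values.

```python
import math


def make_window(name, n):
    """Return n window coefficients for the named window type."""
    if name == "rectangular":
        return [1.0] * n
    if name == "hann":  # listed as "Hanning" in the settings above
        return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    if name == "blackman-harris":
        a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
        return [a0
                - a1 * math.cos(2 * math.pi * i / n)
                + a2 * math.cos(4 * math.pi * i / n)
                - a3 * math.cos(6 * math.pi * i / n)
                for i in range(n)]
    raise ValueError(f"unknown window: {name}")


def bin_magnitude(signal, k):
    """Magnitude of DFT bin k, computed directly (slow but dependency-free)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
    return math.hypot(re, im)


def leakage_db(name, n=256, tone_bin=32.5, offset=8):
    """Level of a bin `offset` bins above the peak bin, relative to the peak, in dB.

    The tone sits halfway between two bins, which is a worst case for leakage.
    """
    window = make_window(name, n)
    signal = [window[i] * math.sin(2 * math.pi * tone_bin * i / n)
              for i in range(n)]
    peak = bin_magnitude(signal, int(tone_bin))
    far = bin_magnitude(signal, int(tone_bin) + offset)
    return 20 * math.log10(far / peak)


for name in ("rectangular", "hann", "blackman-harris"):
    print(f"{name}: {leakage_db(name):.1f} dB")
```

At this offset the rectangular window leaks the most and Blackman-Harris the least, which corresponds to the heavy smearing seen with the rectangular window and the cleaner line seen with Blackman-Harris. The ranking can change at other offsets, because the side lobes of each window fall away at different rates.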

Spectrogram log(f) View

Choosing Spectrogram log(f) from the Track Dropdown menu will display a logarithmic vertical scale. Here is a music track displayed in Spectrogram log(f) view with the default settings of: Window size of 256, Window type of Hanning, Minimum Frequency 0 and Maximum frequency 8000.

SpectrogramView 10.png

Different settings can improve the visibility of certain elements in the recording. In the image below the settings were: Window size of 2048, Window type of Hanning, Minimum Frequency 20 and Maximum frequency 22000.

SpectrogramView 11.png
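To see why a logarithmic scale gives low frequencies more room, compare where a frequency falls on each kind of scale. This is a generic sketch of linear and logarithmic axis mapping, not Audacity's drawing code, and the function names are hypothetical. The logarithmic mapping needs a minimum frequency above 0 Hz, so the example uses the 20 Hz to 22000 Hz range from the second image. How Audacity treats a minimum of 0 Hz in this view is not covered here.

```python
import math


def linear_position(freq_hz, min_hz, max_hz):
    """Fraction of the way up a linear scale (0.0 = bottom, 1.0 = top)."""
    return (freq_hz - min_hz) / (max_hz - min_hz)


def log_position(freq_hz, min_hz=20.0, max_hz=22000.0):
    """Fraction of the way up a logarithmic scale (0.0 = bottom, 1.0 = top)."""
    if not 0 < min_hz < max_hz:
        raise ValueError("a log scale needs 0 < min_hz < max_hz")
    clamped = min(max(freq_hz, min_hz), max_hz)
    return math.log(clamped / min_hz) / math.log(max_hz / min_hz)


for freq in (100, 200, 1000, 2000, 10000):
    print(f"{freq:>6} Hz: linear {linear_position(freq, 20.0, 22000.0):.3f}, "
          f"log {log_position(freq):.3f}")
```

On the logarithmic scale every octave (a doubling of frequency) takes up the same height, so the octave from 100 Hz to 200 Hz gets as much space as the one from 1000 Hz to 2000 Hz.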

Spectral selection

To define a time range combined with a spectral range, hover at the vertical position that you want to be the approximate center frequency to act on, then click and drag a selection horizontally. A horizontal line that defines the center frequency appears beside the I-Beam mouse pointer.

Drag vertically (with or without continuing to drag horizontally) to define the bandwidth (range of frequencies) to be acted on. A "box" containing a combined frequency and time range is now drawn in a yellowish tint as shown below:

Spectral 02.png

For more detail see Spectral Selection.