Time-frequency analysis in Bioinformatics

This page demonstrates the use of our time-frequency analysis add-on package for Mathematica. For an introduction to time-frequency analysis, read our tutorial.

Bioinformatics example

Wavelet analysis can be productively applied to many different signals in bioinformatics. The Kyte-Doolittle hydrophobicity profile of a protein is one such signal. The hydrophobicity of the amino acids along the backbone of a protein is known to be an important determining factor in protein folding, and the Kyte-Doolittle scale is a common way to quantify it.
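As a rough illustration of how such a profile can be built from a sequence, the sketch below maps one-letter residue codes to the published Kyte-Doolittle hydropathy index and optionally averages over a sliding window. The names kyteDoolittle, hydropathy and smoothedHydropathy, and the five-residue fragment, are hypothetical and only for illustration. Whether the data file used on this page holds raw or window-averaged values is not stated here, so treat this as an assumption-laden sketch rather than the procedure used to produce that file.

```mathematica
(* Kyte-Doolittle hydropathy index for the 20 standard amino acids *)
kyteDoolittle = {
   "A" -> 1.8, "R" -> -4.5, "N" -> -3.5, "D" -> -3.5, "C" -> 2.5,
   "Q" -> -3.5, "E" -> -3.5, "G" -> -0.4, "H" -> -3.2, "I" -> 4.5,
   "L" -> 3.8, "K" -> -3.9, "M" -> 1.9, "F" -> 2.8, "P" -> -1.6,
   "S" -> -0.8, "T" -> -0.7, "W" -> -0.9, "Y" -> -1.3, "V" -> 4.2};

(* Per-residue hydropathy of a one-letter sequence string *)
hydropathy[seq_String] := Characters[seq] /. kyteDoolittle

(* Optional smoothing: mean over every window of w consecutive residues *)
smoothedHydropathy[seq_String, w_Integer] :=
  Mean /@ Partition[hydropathy[seq], w, 1]

hydropathy["MKVLA"]   (* hypothetical fragment; gives {1.9, -3.9, 4.2, 3.8, 1.8} *)
```

Positive values mark hydrophobic residues and negative values hydrophilic ones, so the resulting list can be treated as a signal indexed by residue number.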

Consider the C-terminal domain of rabbit serum haemopexin (1hxn in the Protein Data Bank), which has a four-bladed propeller structure:

Show[Import["Protein/1hxn.png"]] ;

[Graphics:HTMLFiles/tutorial_198.gif]

Load the Kyte-Doolittle hydrophobicity profile:

HΦ = ReadList["Protein/hydropathy", Number] ;

In order to avoid artefacts due to sharp features at the boundary, append the reverse of the data to form an even-symmetric function which can then be analyzed using the CWT:

HΦinterp = Interpolation[Join[HΦ, Reverse[HΦ]]] ;
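The effect of appending the reversed data can be checked on a small list. The sketch below uses a hypothetical five-point profile h (not the haemopexin data) and shows that the extended list has matching end points, so its periodic repetition has no jump at the wrap-around point.

```mathematica
h = {3, 1, 4, 1, 5};            (* hypothetical short profile *)
hEven = Join[h, Reverse[h]]     (* {3, 1, 4, 1, 5, 5, 1, 4, 1, 3} *)

(* The extended list reads the same forwards and backwards ... *)
hEven === Reverse[hEven]        (* True *)

(* ... so its last and first samples agree when it is repeated periodically *)
Last[hEven] == First[hEven]     (* True *)

(* The raw data, by contrast, would jump from 5 back to 3 at the boundary *)
Last[h] == First[h]             (* False *)
```

The price of this construction is that the signal length doubles, which is why only the first half of the extended range needs to be inspected afterwards.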

Little information can be extracted from the hydrophobicity data by eye:

Plot[HΦinterp[t], {t, 1, Length[HΦ]}, AxesLabel -> {"n-th residue", "Φ(n)"}] ;

[Graphics:HTMLFiles/tutorial_202.gif]

Use dynamic programming (each result is stored the first time it is computed) to obtain the TFR (time-frequency representation) of the Kyte-Doolittle hydrophobicity profile of this protein, with even-periodic boundary conditions, for a given wavelet parameter:

HΦtfr[σ_] := HΦtfr[σ] = FunctionTFR[HΦinterp[t], {t, 1, 2Length[HΦ], 2Length[HΦ]}, Parameter -> σ, TemporalRange -> {1, Length[HΦ]}]
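The pattern f[x_] := f[x] = ... used above is the standard Mathematica caching idiom: the first call evaluates the right-hand side and stores the result as a specific definition for that argument, and later calls with the same argument return the stored value. The minimal sketch below uses a hypothetical function slowSquare and a counter calls to make the caching visible. It does not involve the add-on package.

```mathematica
calls = 0;   (* counts how often the expensive body actually runs *)

(* Hypothetical stand-in for an expensive computation, cached per argument *)
slowSquare[x_] := slowSquare[x] = (calls++; x^2)

slowSquare[4];   (* body runs, result 16 is stored as slowSquare[4] = 16 *)
slowSquare[4];   (* stored value is returned, body does not run again *)
calls            (* 1 *)

slowSquare[5];   (* new argument, so the body runs once more *)
calls            (* 2 *)
```

One consequence of this idiom is that cached values persist if the underlying data change, so the stored definitions should be removed with Clear and the function redefined whenever the input signal is reloaded.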

The TFR of the Kyte-Doolittle hydrophobicity profile can be plotted to look for interesting signal components, with white lines overlaid to show the positions of the alpha-helices that separate the four blades of this propeller-shaped protein. Because the data are complex, no single value of the wavelet parameter is clearly best. It is therefore productive to plot the results for several parameter values, giving progressively finer vertical ("spectral") resolution at the cost of coarser horizontal ("temporal") resolution:

Do[Module[{tfr = HΦtfr[2^lnσ]}, Show[{ContourPlot[Abs[tfr[t, 2π ν]], {t, ...  Length[HΦ] + 2}, {0, .36}}, DisplayFunction -> $DisplayFunction]], {lnσ, 0, 4}] ;

[Graphics:HTMLFiles/tutorial_205.gif]

[Graphics:HTMLFiles/tutorial_206.gif]

[Graphics:HTMLFiles/tutorial_207.gif]

[Graphics:HTMLFiles/tutorial_208.gif]

[Graphics:HTMLFiles/tutorial_209.gif]

Little work has been done in this area, but these results suggest that the time-frequency content of the hydrophobicity profile correlates strongly with the protein's structure. In particular, the hydrophobicity appears to contain pseudo-periodic fluctuations which correlate with the weaving of the protein backbone. In the center of each of the four propeller blades, the hydrophobicity exhibits strong components with a frequency of around 0.1 residues^(-1), that is, a period of about 10 residues. The corresponding half period of 5 residues roughly matches the depth of the propeller, indicating that time-frequency analysis may be usable to quantitatively predict the size of protein motifs.
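The relation between the quoted frequency and the half period can be checked on an idealized signal. The sketch below assumes a pure cosine at exactly 1/10 residues^(-1), which is a hypothetical stand-in for the pseudo-periodic component and not the measured profile, and locates its maxima and minima by residue index.

```mathematica
ν = 1/10;                                        (* assumed frequency, residues^-1 *)
signal = Table[Cos[2 Pi ν n], {n, 0, 20}];       (* idealized oscillation, exact values *)

(* Residue indices of maxima and minima (positions are 1-based, n starts at 0) *)
peaks   = Flatten[Position[signal, 1, {1}]] - 1   (* {0, 10, 20} *)
troughs = Flatten[Position[signal, -1, {1}]] - 1  (* {5, 15} *)

period     = 1/ν        (* 10 residues between successive maxima *)
halfPeriod = period/2   (* 5 residues from a maximum to the next minimum *)
```

A maximum-to-minimum swing therefore spans 5 residues, which is the length scale compared with the propeller depth above.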