
Deep Recurrent Learning for Heart Sounds Segmentation based on Instantaneous Frequency Features
Álvaro Joaquín Gaona ∗1, Pedro David Arini ∗†2
∗ Facultad de Ingeniería, Universidad de Buenos Aires, Instituto de Ingeniería Biomédica (IIBM),
Avenida Paseo Colón 850, C1063ACV, Buenos Aires, Argentina
1 agaona@fi.uba.ar
† Instituto Argentino de Matemática "Alberto P. Calderón", CONICET,
Saavedra 15, C1083ACA, Buenos Aires, Argentina
2 pedro.arini@conicet.gov.ar
Abstract—In this work, a novel stack of well-known techniques is presented as an automatic method to segment the heart sounds in a phonocardiogram (PCG). We show a deep recurrent neural network (DRNN) capable of segmenting a PCG into its main components, together with a specific way of extracting instantaneous frequency features that plays an important role in training and testing the proposed model. More specifically, the method involves a Long Short-Term Memory (LSTM) neural network combined with the Fourier Synchrosqueezed Transform (FSST), which is used to extract instantaneous time-frequency features from the PCG. The approach was tested on heart sound signals longer than 5 seconds and shorter than 35 seconds, drawn from freely available databases. It shows that, with a relatively small architecture, a small dataset and the right features, near state-of-the-art performance can be achieved: an average sensitivity of 89.5%, an average positive predictive value of 89.3% and an average accuracy of 91.3%.
Keywords: phonocardiogram; Fourier synchrosqueezed transform; long short-term memory.
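To make the pipeline concrete, the sketch below illustrates the two stages the abstract describes: time-frequency feature extraction followed by a recurrent sequence labeler. It is a minimal illustration, not the authors' implementation: the feature extractor is a toy STFT-based stand-in for the FSST (a true FSST is available, for instance, as fsst in MATLAB or ssq_stft in the Python package ssqueezepy), and the layer sizes and the four-state output (S1, systole, S2, diastole) are assumptions made for illustration.

import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft

def extract_features(pcg: np.ndarray, fs: int) -> np.ndarray:
    """Toy stand-in for FSST feature extraction.

    A real implementation would compute the Fourier Synchrosqueezed
    Transform and keep the band where heart sounds live (roughly below
    200 Hz). Returns an array of shape (time_steps, n_features).
    """
    f, t, Z = stft(pcg, fs=fs, nperseg=128, noverlap=64)
    band = f <= 200                        # keep the low-frequency band
    return np.log1p(np.abs(Z[band]).T)     # (time_steps, n_features)

class PCGSegmenter(nn.Module):
    """Bidirectional LSTM mapping a feature sequence to per-frame states."""
    def __init__(self, n_features: int, n_states: int = 4, hidden: int = 80):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_states)

    def forward(self, x):                  # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out)              # (batch, time, n_states)

# Usage on a synthetic 10-second recording sampled at 1 kHz.
fs = 1000
pcg = np.random.randn(10 * fs).astype(np.float32)
feats = torch.from_numpy(extract_features(pcg, fs)).float().unsqueeze(0)
model = PCGSegmenter(n_features=feats.shape[-1])
states = model(feats).argmax(dim=-1)       # per-frame state indices

In practice the frame-wise predictions would be trained with a cross-entropy loss against reference annotations and then smoothed into contiguous S1/systole/S2/diastole segments.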
I. INTRODUCTION
Phonocardiography is a method to graphically record the acoustic phenomena of the heart. It provides information about the cardiac cycle by plotting the sounds and murmurs of the heart. These sounds result from the closure of the heart valves, and at least two of them can be identified. The first, S1, corresponds to the closure of the atrioventricular valves (mitral and tricuspid) at the beginning of systole. At this point the ventricles, filled with blood from the atria, begin to contract, ejecting deoxygenated and oxygenated blood into the pulmonary and systemic circuits, respectively. After most of the blood has been ejected from the ventricles, the aortic and pulmonary valves close, producing the second sound, S2. Additionally, two other segments of the phonocardiogram (PCG) can be identified. The first is the S1-S2 segment, called isovolumetric contraction, and the second is the S2-S1 segment, called isovolumetric relaxation, which
is usually shorter than the first.
Heart sound segmentation dates back to 1997, when H. Liang et al. used a deterministic algorithm based on the normalized average Shannon energy of the PCG signal, achieving a 93% correct ratio. This approach, however, suffers from drawbacks such as corrupting noise. In the same year, H. Liang et al. [1] proposed an algorithm based on wavelet decomposition and reconstruction that performed correctly in over 93% of cases.
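For reference, the normalized average Shannon energy underlying the first of these approaches is simple to compute. The sketch below follows the general recipe of [1] under assumed parameters: the 20 ms frame and 10 ms hop are illustrative choices, not necessarily the values used in that work.

import numpy as np

def shannon_energy_envelope(pcg, fs, frame_ms=20, hop_ms=10):
    """Normalized average Shannon energy envelope of a PCG signal."""
    x = pcg / (np.max(np.abs(pcg)) + 1e-12)        # normalize amplitude
    frame = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    energies = []
    for start in range(0, len(x) - frame + 1, hop):
        s2 = x[start:start + frame] ** 2
        # Average Shannon energy of the frame: -(1/N) sum(x^2 * log(x^2))
        energies.append(-np.mean(s2 * np.log(s2 + 1e-12)))
    e = np.asarray(energies)
    return (e - e.mean()) / (e.std() + 1e-12)      # zero-mean, unit variance

Peaks of this envelope mark candidate S1 and S2 locations, which also illustrates the noise sensitivity mentioned above: any noise burst of comparable energy produces a spurious peak.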
Heart sound segmentation boomed in 2010, when Schmidt et al. [2] proposed a Hidden Markov Model (HMM) that explicitly models state durations, known as a duration-dependent Hidden Markov Model (DHMM). That work also introduced the use of annotations derived from the EKG to label training sets for the proposed model, an idea later used by Springer et al. [3] to go even further and outperform the previous work by adding logistic regression and modifying the implementation of the Viterbi algorithm. In 2018, Renna et al. [4] used deep learning techniques to segment the PCG. Their