International Journal of Advance Foundation and Research in Computer (IJAFRC)
Volume 1, Issue 8, August 2014. ISSN 2348 - 4853
www.ijafrc.org
Voice Recognition Using Artificial Neural Network
…, Derrick Pereira, …, Zanshila Rodrigues
1,2,3,4,5 Electronics and Telecommunication, Padre Conceicao College of Engineering, Verna, Goa, India
E-mail: suriya"adi@rediff.com
rachisatardekar3@gmail.com
garrydacosta13@gmail.com
zanshilarod24@gmail.com
ABSTRACT
Voice is the sound produced in a person's larynx and uttered through the mouth, as speech or song. Like fingerprints and the iris of the eye, the voice of each person is unique. The parameters of a person's voice differ in pitch, timbre and vocal intensity; hence the voice can be used for authentication. In this paper, the feature extraction methods used are Linear Predictive Coding, the Fast Fourier Transform and the wavelet transform, which extract the above-mentioned features from a database of recorded voices. An artificial neural network classifier then operates on the extracted features. Some of the applications are forensics, artificial intelligence, access control, transaction authentication, law enforcement and speech data management.
Index Terms: ANN: Artificial Neural Network, LPC: Linear Predictive Coding, DWT: Discrete Wavelet Transform, FFT: Fast Fourier Transform
I. INTRODUCTION
The voice is produced in a person's larynx and uttered through the mouth, as speech or song. The three parameters that make a person's voice distinctive are vocal intensity, pitch (a frequency representation that depends on the speed of vibration of the vocal cords) and timbre (the quality of a person's voice, measured from the envelope of the power spectrum, which is different for each person).
II. PROCEDURE OVERVIEW
In this paper, the methods used for feature extraction are the Fast Fourier Transform, Linear Predictive Coding and the Discrete Wavelet Transform. An artificial neural network is used for classification.
Figure 1. Block diagram
A. Feature Extraction
1. Fast Fourier Transform (FFT)
A Fast Fourier Transform (FFT) is an algorithm to compute the discrete Fourier transform (DFT) and its inverse. A Fourier transform converts a signal from the time domain to the frequency domain and vice versa; an FFT computes such transforms rapidly. As a result, FFTs are widely used in many applications in engineering, science and mathematics. An FFT gives exactly the same result as the DFT; the only difference is that the FFT does it faster. The sequence of $N$ complex numbers $x_0, \ldots, x_{N-1}$ is transformed into $N$ complex numbers $X_0, \ldots, X_{N-1}$ according to the DFT formula:
$$X_k = \sum_{n=0}^{N-1} x_n \, e^{-2\pi i k n / N}, \qquad k = 0, 1, \ldots, N-1$$
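The DFT formula can be checked numerically. The following is a minimal sketch in pure Python, not the MATLAB code used in the paper; the function names `dft` and `idft` are illustrative assumptions. It evaluates the definition term by term, so its cost grows with the square of the sequence length.

```python
import cmath


def dft(x):
    """Evaluate the DFT definition directly (about N*N complex multiplications)."""
    n_points = len(x)
    spectrum = []
    for k in range(n_points):
        # X_k = sum over n of x_n * exp(-2*pi*i*k*n/N)
        total = 0j
        for n, x_n in enumerate(x):
            total += x_n * cmath.exp(-2j * cmath.pi * k * n / n_points)
        spectrum.append(total)
    return spectrum


def idft(spectrum):
    """Inverse DFT: same sum with the opposite sign in the exponent, scaled by 1/N."""
    n_points = len(spectrum)
    samples = []
    for n in range(n_points):
        total = 0j
        for k, x_k in enumerate(spectrum):
            total += x_k * cmath.exp(2j * cmath.pi * k * n / n_points)
        samples.append(total / n_points)
    return samples
```

A constant sequence such as `[1, 1, 1, 1]` puts all of its energy in the first output, and `idft(dft(x))` returns the original samples up to rounding error.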
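Evaluating this definition directly requires $O(N^2)$ operations: there are $N$ outputs $X_k$, and each output requires a sum of $N$ terms. An FFT is any method to compute the same results in $O(N \log N)$ operations.
By far the most commonly used FFT is the Cooley-Tukey algorithm. This is a divide and conquer algorithm that recursively breaks down a DFT of any composite size $N = N_1 N_2$ into many smaller DFTs of sizes $N_1$ and $N_2$, along with $O(N)$ multiplications by complex roots of unity.
In MATLAB, Y = fft(X, n) returns the n-point DFT. fft(X) is equivalent to fft(X, n) where n is the size of X in the first non-singleton dimension. If the length of X is less than n, X is padded with trailing zeros to length n. If the length of X is greater than n, the sequence X is truncated. When X is a matrix, the length of the columns is adjusted in the same manner.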
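The divide and conquer idea and the pad/truncate rule can be sketched as follows. This is an illustrative pure-Python sketch of the radix-2 special case of Cooley-Tukey (lengths that are powers of two), not MATLAB's implementation; the names `fft_radix2` and `fft_n` are assumptions made for this example.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    # Split into two half-size DFTs over the even and odd samples.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Multiply by a complex root of unity (the "twiddle factor").
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_n(x, n):
    """Mimic the rule described for fft(X, n): truncate to n or pad with trailing zeros."""
    samples = list(x)[:n]
    samples += [0.0] * (n - len(samples))
    return fft_radix2(samples)
```

For example, `fft_n([1.0, 2.0, 3.0], 4)` transforms the zero-padded sequence `[1.0, 2.0, 3.0, 0.0]`, while `fft_n(x, 4)` on a longer input uses only its first four samples.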
2. Linear Predictive Coding (LPC)
Linear predictive coding (LPC) is a tool used mostly in audio signal processing and voice recognition for representing the spectral envelope of a digital signal of a voice sample in compressed form, using the information of a linear predictive model. It is one of the most powerful voice analysis techniques and one of the most useful methods for encoding good quality voice at a low bit rate, and it provides extremely accurate estimates of voice parameters. LPC analyzes the voice signal by estimating the formants, removing their effects from the voice signal, and estimating the intensity and frequency of the remaining buzz.
Figure 2. Linear Predictor Block Diagram
As per Fig. 2, past voice samples are used to predict the new voice sample $\hat{s}(n)$. The difference between the current voice sample $s(n)$ and the linearly predicted voice sample $\hat{s}(n)$ gives the error $e(n) = s(n) - \hat{s}(n)$. Using the error signal $e(n)$, the filter coefficients are found. The predicted sample $\hat{s}(n)$ is
$$\hat{s}(n) = \sum_{k=1}^{p} a_k \, s(n-k)$$
that is, $\hat{s}(n)$ is predicted from a linearly weighted summation of the past voice samples, where $a_k$ denotes the filter coefficients and $p$ the number of past samples used.
The error signal $e(n)$ gives only the amplitude information. To extract the pitch (or frequency) information of the voice, the Fast Fourier Transform (FFT) of the error signal is taken. The reason is that the LPC filter coefficients are not dominant in the time domain; it is only when the FFT of the error signal is taken that the coefficients are seen in their proper domain (the frequency domain). These filter coefficients in the frequency domain are given to the artificial neural network for further processing.
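The linear prediction step can be illustrated with a short sketch. The paper does not say how the predictor weights are computed; the autocorrelation method with the Levinson-Durbin recursion used here is one common choice and is an assumption, as are the function names `autocorrelation`, `lpc_coefficients` and `prediction_error`.

```python
def autocorrelation(signal, max_lag):
    """r(lag) = sum of s(i) * s(i + lag) over the available samples."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(signal, order):
    """Levinson-Durbin recursion; returns weights a_1..a_p of the linear predictor."""
    r = autocorrelation(signal, order)
    if r[0] == 0:
        raise ValueError("signal has no energy")
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1 - k * k
    return a[1:]


def prediction_error(signal, coeffs):
    """e(n) = s(n) - s_hat(n); samples before the start are treated as zero."""
    errors = []
    for n, s_n in enumerate(signal):
        s_hat = sum(a_k * signal[n - k]
                    for k, a_k in enumerate(coeffs, start=1) if n - k >= 0)
        errors.append(s_n - s_hat)
    return errors
```

For a signal in which every sample is 0.9 times the previous one, the first weight comes out close to 0.9 and the error is close to zero after the first sample, which is the behaviour the block diagram describes.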
3. Discrete Wavelet Transform (DWT)
In the discrete wavelet transform, at each decomposition level the half-band filters produce signals spanning only half the frequency band. In accordance with Nyquist's rule, if the original signal has a highest frequency of ω, the signal at the output of a half-band low-pass filter has a highest frequency of ω/2, so it can be downsampled by two, discarding half the samples with no loss of information.
Figure 3. Process of DWT
According to Fig. 3, the DWT is computed by successive low-pass and high-pass filtering of the discrete time-domain signal. At each level, the high-pass filter produces detailed information, while the low-pass filter, associated with the scaling function, produces coarse approximations.
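The successive filtering and downsampling can be sketched with the Haar wavelet, the simplest filter pair. This is a swapped-in illustration: the paper's results table refers to Daubechies (db) wavelets, whose filters are longer, and the function names `haar_dwt_level` and `haar_dwt` are assumptions.

```python
import math


def haar_dwt_level(signal):
    """One decomposition level: low-pass gives the approximation, high-pass the detail.

    Both outputs are downsampled by two, so each has half the input length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    scale = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * scale for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Repeat the split on the approximation, collecting the detail of each level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        details.append(detail)
    return approx, details
```

A constant signal yields zero detail coefficients at every level, since it contains no high-frequency content; all of its energy stays in the coarse approximation.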
III. ARTIFICIAL NEURAL NETWORK
A. Back propagation algorithm
Figure 4. Back propagation process
As per Fig. 4, the output for the presented inputs is calculated. The error is found by comparing the actual output with the desired output. The errors are passed back through the neural network by computing the contribution of each hidden processing unit and deriving the corresponding adjustment needed to produce the correct output. The connection weights are then adjusted, and the neural network has just "learned" from an experience.
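The back propagation steps described above can be sketched for a very small network. The paper does not give its network size, activation function or learning rate, so the single hidden layer, the sigmoid units, the rate of 0.5 and the class name `TinyNetwork` are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNetwork:
    """One hidden layer and one sigmoid output, trained by back propagation."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the input weights followed by one bias weight.
        self.hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                       for _ in range(n_hidden)]
        self.output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, inputs):
        """Calculate the output for the presented inputs."""
        hidden_out = [sigmoid(sum(w * v for w, v in zip(row, inputs)) + row[-1])
                      for row in self.hidden]
        y = sigmoid(sum(w * h for w, h in zip(self.output, hidden_out)) + self.output[-1])
        return hidden_out, y

    def train_step(self, inputs, target, rate=0.5):
        """One forward pass, one backward pass, one weight adjustment."""
        hidden_out, y = self.forward(inputs)
        error = target - y  # desired output minus actual output
        delta_out = error * y * (1 - y)
        # Contribution of each hidden unit, computed before the weights change.
        delta_hidden = [delta_out * self.output[j] * h * (1 - h)
                        for j, h in enumerate(hidden_out)]
        for j, h in enumerate(hidden_out):
            self.output[j] += rate * delta_out * h
        self.output[-1] += rate * delta_out
        for j, row in enumerate(self.hidden):
            for i, v in enumerate(inputs):
                row[i] += rate * delta_hidden[j] * v
            row[-1] += rate * delta_hidden[j]
        return 0.5 * error ** 2
```

Calling `train_step` repeatedly over a set of (input, target) pairs is what adjusts the connection weights; in the paper's setting the inputs would be the extracted voice features rather than the toy values used to try this out.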
IV. IMPLEMENTATION
In this paper, a database of voices is made using PCM Recorder, an Android application that records voice samples at 8000 Hz (Mono). Five voice samples from each of five different people are recorded. These voice signals together make up the database. All the voice signals are used by MATLAB in real-time from the saved database in the computer. By setting a threshold of a particular value, the noise is removed. Feature extraction is then performed using Linear Predictive Coding, the Fast Fourier Transform and the Discrete Wavelet Transform, separately. The samples of three voice signals of each person are used to make the training data sheets, and the samples of the remaining two voice signals of each person are used to make the testing data sheets. Fig. 5 shows how the database is divided.
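The thresholding and the three/two division of recordings can be sketched as follows. The paper does not state the threshold value or how it is applied, so zeroing samples whose magnitude falls below the threshold is an assumed reading, and the names `remove_noise` and `split_recordings` are hypothetical.

```python
def remove_noise(samples, threshold):
    """Zero out samples whose magnitude is below the threshold (assumed interpretation)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


def split_recordings(database, n_train=3):
    """Divide each speaker's recordings into training and testing sets.

    database maps a speaker label to a list of recordings; the first n_train
    recordings of each speaker go to training and the rest to testing.
    """
    train, test = [], []
    for speaker, recordings in database.items():
        for index, recording in enumerate(recordings):
            target = train if index < n_train else test
            target.append((speaker, recording))
    return train, test
```

With five speakers and five recordings each, this yields 15 training items and 10 testing items, matching the division described in the text.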
Figure 5. Block diagram indicating the making of the training and testing data sheets
The results of classification based on the different feature extraction methods are as follows:
Table 1: Results
Extraction Method    Accuracy
FFT                  88%
LPC                  ?2% (first digit illegible in the source)
DWT db4              80%
DWT db?              80% (wavelet order illegible in the source)
V. APPLICATIONS
1. Forensic applications: Voice recognition using ANN can be used in cases of kidnapping, where the criminal can be identified from a known pool of criminal databases.
2. Access control applications: To control access to computer networks (adds biometric information to the usual password).
3. Transaction authentication applications: In telephone banking, higher levels of secure authentication are achieved using voice recognition with ANN.
4. Personalization applications: Mobiles and cars can use voice recognition with ANN to identify the owner of a mobile or a car.
VI. LIMITATIONS
• While recording the database, if noise is above the threshold, it will be considered part of the person's voice signal and will cause problems later in the system.
• People can mimic the stored voices, and in this way an intruder would be declared as the correct person.
VII. CONCLUSION
LPC and FFT are the most effective feature extraction methods for voice recognition. With these methods, pitch and vocal intensity of a person's voice are the best suited parameters for identifying an individual.