
Online Handwritten Script Recognition


 

ONLINE HANDWRITTEN SCRIPT RECOGNITION

Presentation By: Priya Ahuja CSE 6C 10-CSU-110

 

CONTENTS
• Online Recognition
• Why Handwriting Recognition?
• Why is Handwriting Recognition difficult?
• Properties of Scripts
• Features of Handwritten Script
• Steps in Handwritten Script Recognition
• Future Scope
• References

 

Online Reconition On-line handwriting recognition involves the automatic conversion of text as it is written on a special digitie digitierr or P!"# where a sensor pic$s up the pen-tip movements as well as pen-up%pen-down switching& 'hat $ind of data is $nown as digital in$ and can (e regarded as a dynamic representation of handwriting& 'he o(tained signal converted into letter codes which are usa(le within computer and is text-processing applications& 'he elements of an on-line handwriting recognition interface typically include) *+ orsensitive s tylus forsurface# stylus the userwhich to write with& ,+ a a pen touch may (e integrated with# or adacent to# an output display& .+ a software application which interprets the movements of the stylus across the writing surface# translating the resulting stro$es into digital text&

 

Figure: Devices that accept on-line handwritten data. From the top left: Pocket PC, CrossPad, Ink Link, Cell Phone, Smart Board, Tablet with display, Anoto pen, Wacom Tablet, Tablet PC.

 

Why Handwriting Recognition?

Online documents may be written in different languages and scripts. A single document page in itself may contain text written in multiple scripts.

A script is defined as a graphic form of a writing system. Different scripts may follow the same writing system. For example, the alphabetic system is adopted by scripts like Roman and Greek, and the phonetic-alphabetic system is adopted by most Indian scripts, including Devnagari. A specific script like Roman may be used by multiple languages such as English, German, and French.

• The general class of Han-based scripts includes Chinese, Japanese, and Korean (we do not consider Kana or Han-Gul).
• Devnagari script is used by many Indian languages, including Hindi, Sanskrit, Marathi, and Rajasthani.
• Arabic script is used by Arabic, Farsi, Urdu, etc.
• Roman script is used by many European languages like English, German, French, and Italian.

 



The most important characteristic of online documents is that they capture the temporal sequence of strokes while writing the document.



We use stroke properties as well as the spatial and temporal information of a collection of strokes to identify the script used in the document.

 

Why is Handwriting Recognition Difficult?

• High variability of individual characters:
  • Writing style
  • Stroke width and quality
  • Size of the writing
  • Variation even for a single writer!
• Reliable segmentation of cursive script is extremely problematic due to "merging" of adjacent characters.

 

Properties of Scripts

Arabic: Arabic is written from right to left within a line and the lines are written from top to bottom. A typical Arabic character contains a relatively long main stroke which is drawn from right to left, along with one to three dots. The character set contains three long vowels. Short markings (diacritics) may be added to the main character to indicate short vowels. Due to these diacritical marks and the dots in the script, the length of the strokes varies considerably.

Cyrillic: Cyrillic script looks very similar to the cursive Roman script. The most distinctive features of Cyrillic script, compared to Roman script, are:
1) individual characters, connected together in a word, form one long stroke,
2) the absence of delayed strokes.
Delayed strokes cause movement of the pen in the direction opposite to the regular writing direction.

Figure: The word "trait" contains three delayed strokes, shown as bold dotted lines.

 

&e*naari ) 'he most important characteristic of !evnagari script is the horiontal line present at the top of each word# called ;Shirore$ha< &'hese lines are usually drawn after the word is written and hence are similar to delayed stro$es in Roman script& 'he words are written from left to right in a line&  'he word ;devnagari< written in !evnagari script& 'he Shirore$ha is shown in (old&

"an) 2haracters of Han script are composed of multiple short stro$es& 'he stro$es are usually drawn from top to (ottom and left to right within a character& 'he direction of writing of words in a line is either left to right or top to (ottom&

"e)re$: Words in a line of He(rew script are written from right to left and# hence# the script is temporally similar to "ra(ic& 'he most distinguishing factor of He(rew from "ra(ic is that the stro$es are more uniform in length in the former&

Ro+an) Roman script has the same writing direction as 2yrillic# !evnagari# and Han scripts& 0n addition# the length of the stro$es tends to fall (etween that of !evnagari and 2yrillic scripts&

 

Features of Handwritten Script

Horizontal Interstroke Direction (HID): This is the sum of the horizontal directions between the starting points of consecutive strokes in the pattern. The feature essentially captures the writing direction within a line. In its definition, Xstart(.) denotes the x-coordinate of the pen-down position of the stroke, n is the number of strokes in the pattern, and r is set to 3 to reduce errors due to abrupt changes in direction between successive strokes. The value of HID falls in the range [r - n, n - r].
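The HID description can be sketched in code. This is a minimal sketch under stated assumptions, not the presentation's own formula: each direction term is taken to be +1 when the later stroke starts to the right of the earlier one and -1 otherwise, and stroke i is compared with stroke i + r, which is consistent with the stated range [r - n, n - r]. The stroke representation (a sequence of (x, y) points with the pen-down point first) and the function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = Sequence[Point]  # assumed layout: pen-down point first, pen-up point last


def horizontal_interstroke_direction(strokes: List[Stroke], r: int = 3) -> int:
    """Sum of +1/-1 terms comparing the start x of stroke i with that of stroke i + r."""
    total = 0
    for i in range(len(strokes) - r):
        x_earlier = strokes[i][0][0]
        x_later = strokes[i + r][0][0]
        # Assumed convention: +1 for left-to-right movement, -1 otherwise.
        total += 1 if x_earlier < x_later else -1
    return total
```

With this convention a pattern written strictly left to right reaches the upper bound n - r, and one written strictly right to left reaches r - n.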

 

A*erae Stro/e 2enth AS2: 1ach stro$e is resampled during preprocessing so that the sample points are e8uidistant& Hence# the num(er of sample points in a stro$e is used as a measure of its length& 'he "verage "verage Stro$e Aength is defined as the average length of the individual stro$es in the pattern&

where n is the num(er of stro$es in the pattern& 'he value of "SA is a real num(er which falls in the range >*&B# R B @#where the value of RB depends on the resampling distance used during preprocessing& Shirore/ha Strenth) Strenth) 'his feature measures the strength of the horiontal line component in the pattern using the Hough transform& 'he value of this feature is computed as)

 

Where H5r#denotes 5r#th th (in in the twoH5r#denotes the num(er of votes in the 5r# dimensional Hough transform space& 'he Hough transform can (e

 

computed forisdynamic (y(ins considering only the points& 'heefficiently numerator the sumdata of the corresponding to sample line orientations (etween -*Bo and *Bo and the denominator is the sum of all the (ins in the Hough transform space& 'he value of Shirore$ha Strength is a real num(er which falls in the range >B&B# *&[email protected]&

Shirorekha Confidence: We compute a confidence measure for a stroke being a Shirorekha. Each stroke in the pattern is inspected for three different properties of a Shirorekha: Shirorekhas span the width of a word, always occur at the top of the word, and are horizontal. Hence, the confidence C of a stroke s is computed from these three properties.

 

Stroke Density: This is the number of strokes per unit length (x-axis) of the pattern. Note that the Han script is written using short strokes, while Roman and Cyrillic are written using longer strokes.

$$\text{Stroke Density} = \frac{n}{\mathrm{width}(\mathrm{pattern})}$$

where n is the number of strokes in the pattern. The value of Stroke Density is a real number and can vary within the range (0.0, R1), where R1 is a positive real number.

Aspect Ratio: This is the ratio of the width to the height of a pattern. The value of Aspect Ratio is a real number and can vary within the range (0.0, R2), where R2 is a positive real number.

Reverse Distance: This is the distance by which the pen moves in the direction opposite to the normal writing direction. The normal writing direction is different for different scripts. The value of Reverse Distance is a nonnegative integer and its observed values were in the range [0, 1200].
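A minimal sketch of the Stroke Density and Aspect Ratio computations, assuming that the width and height of a pattern are those of its axis-aligned bounding box and that the pattern has non-zero extent in both directions. The stroke representation and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = Sequence[Point]


def bounding_box(strokes: List[Stroke]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) over every sample point of the pattern."""
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    return min(xs), min(ys), max(xs), max(ys)


def stroke_density(strokes: List[Stroke]) -> float:
    """Number of strokes per unit length along the x-axis."""
    x_min, _, x_max, _ = bounding_box(strokes)
    return len(strokes) / (x_max - x_min)


def aspect_ratio(strokes: List[Stroke]) -> float:
    """Ratio of the width of the pattern to its height."""
    x_min, y_min, x_max, y_max = bounding_box(strokes)
    return (x_max - x_min) / (y_max - y_min)
```

A degenerate pattern (a single dot, or a perfectly horizontal line) would raise ZeroDivisionError here; a real system would need to decide how to handle that case.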

 

A*erae "oriontal Stro/e &irection) Horiontal Stro$e !irection 5H!+ of a stro$e# s# can (e understood as the horiontal direction from the start of the t he stro$e to its end& Formally# we define H!5s+ as) where =pen-down5&+and =pen-up5&+ are the x-coordinates of the pen-down and pen-up positions# respectively& For an n-stro$e pattern# the Horiontal Stro$e computed as the l average of the H! values of"verage its component stro$es& 'he!irection value of is "verage Horionta Horiontal Stro$e !irection falls in the range >-*&B#*&[email protected]&

A*erae 3ertical Stro/e &irection) &irection ) 0t is defined similar to the "verage "verage Horiontal Stro$e !irection& 'he 9ertical 9ertical !irection 59!+ of a single stro$e s is defined as)

where E pen-down 5&+and E pen-up5&+ are the y-coordinates of the pen-down and pen-up positions# respectively &For an n-stro$e pattern# the "verage 9ertical 9ertical Stro$e !irection is computed as the average of the 9! values of its component stro$es&

'he value of "verage 9ertical 9ertical Stro$e !irection falls in the range >-*&B#*&[email protected]&  
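The two averaged stroke-direction features can be sketched as follows. The sign convention is an assumption chosen to match the stated range [-1.0, 1.0]: a stroke contributes +1 when its pen-up coordinate is greater than its pen-down coordinate and -1 otherwise. Function names and the stroke representation are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = Sequence[Point]  # assumed layout: pen-down point first, pen-up point last


def _direction(start: float, end: float) -> int:
    # Assumed convention: +1 if the coordinate increases from pen-down to pen-up.
    return 1 if start < end else -1


def average_horizontal_stroke_direction(strokes: List[Stroke]) -> float:
    """Average of the per-stroke horizontal directions (HD)."""
    return sum(_direction(s[0][0], s[-1][0]) for s in strokes) / len(strokes)


def average_vertical_stroke_direction(strokes: List[Stroke]) -> float:
    """Average of the per-stroke vertical directions (VD)."""
    return sum(_direction(s[0][1], s[-1][1]) for s in strokes) / len(strokes)
```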

Vertical Interstroke Direction (VID): The Vertical Interstroke Direction is defined in terms of Y(s), the average of the y-coordinates of the stroke points, and n, the number of strokes in the pattern. The value of VID is an integer and falls in the range (1 - n, n - 1).

Variance of Stroke Length: This is the variance in sample lengths of individual strokes within a pattern. The value of Variance of Stroke Length is a nonnegative integer.
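The two length-based features, Average Stroke Length and Variance of Stroke Length, can be sketched together. This sketch assumes the strokes have already been resampled to equidistant points, so that the number of sample points stands in for stroke length as described earlier. Using the population variance is an assumption, since the slides do not say which variance is meant. Function names are hypothetical.

```python
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Stroke = Sequence[Point]  # assumed to be resampled to equidistant points already


def average_stroke_length(strokes: List[Stroke]) -> float:
    """Mean number of sample points per stroke."""
    return mean(len(stroke) for stroke in strokes)


def variance_of_stroke_length(strokes: List[Stroke]) -> float:
    """Population variance of the per-stroke sample counts (assumed variant)."""
    return pvariance([len(stroke) for stroke in strokes])
```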

 

STEPS IN HANDWRITTEN SCRIPT RECOGNITION

1) Preprocessing

Goal is to remove unwanted variation.

Common Methods:
• Skew / Slant / Size normalization:
  • Trajectory data mapped to 2D representation
  • Baselines / core area estimated similar to offline case

Special Online Methods:
• Outlier Elimination: Remove position measurements caused by interferences
• Resampling and smoothing of the trajectory
• Elimination of delayed strokes

Resampling and smoothing of the trajectory:
• Goal: Normalize variations in writing speed (no identification!)
• Equidistant resampling & interpolation
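Equidistant resampling with interpolation can be sketched as below. This is a minimal sketch assuming linear interpolation between the recorded pen positions and a caller-chosen spacing; the slides do not specify the interpolation scheme, and the function name is hypothetical. Smoothing is not included.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def resample_equidistant(points: List[Point], spacing: float) -> List[Point]:
    """Return points placed every `spacing` units of arc length along the polyline."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    if not points:
        return []
    resampled = [points[0]]
    dist_to_next = spacing  # arc length still to travel before the next output point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        pos = 0.0  # distance already consumed along the current segment
        while seg_len - pos >= dist_to_next:
            pos += dist_to_next
            t = pos / seg_len  # linear interpolation parameter in [0, 1]
            resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            dist_to_next = spacing
        dist_to_next -= seg_len - pos
    return resampled
```

Because the output points are spaced by distance rather than by time, fast and slow renderings of the same shape produce the same sequence, which is the stated goal of normalizing writing speed.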

Elimination of delayed strokes:
• Handling of delayed strokes problematic, additional time variability!
• Remove by heuristic rules

 

Feature Extraction

Basic Idea: Describe shape of pen trajectory locally

Typical Features:
• Slope angle of local trajectory (represented as sin and cos: continuous variation)
• Binary pen-up vs. pen-down feature
• Hat feature for describing delayed strokes (strokes that spatially correspond to removed delayed strokes are marked)

Feature Dynamics: In all applications of HMMs dynamic features greatly enhance performance.
• Discrete time derivative of features
• Here: Differences between successive slope angles
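The slope-angle feature and its dynamic counterpart can be sketched as follows. This is an illustrative sketch only: it assumes the local trajectory direction is taken between successive sample points and that angle differences are wrapped into [-π, π). Function names are hypothetical, and the pen-up/pen-down and hat features are not covered.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def slope_features(points: List[Point]) -> List[Tuple[float, float]]:
    """Return (cos, sin) of the slope angle for each segment between successive points."""
    features = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        norm = math.hypot(dx, dy)
        if norm > 0:  # skip repeated points, which have no defined direction
            features.append((dx / norm, dy / norm))
    return features


def slope_angle_deltas(features: List[Tuple[float, float]]) -> List[float]:
    """Differences between successive slope angles, wrapped into [-pi, pi)."""
    angles = [math.atan2(sin_a, cos_a) for cos_a, sin_a in features]
    return [(b - a + math.pi) % (2 * math.pi) - math.pi for a, b in zip(angles, angles[1:])]
```

Representing the angle by its sine and cosine avoids the jump at ±π that the raw angle would have, which is what "continuous variation" refers to.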

 

CLASSIFICATION

The last big step is classification. In this step various models are used to map the extracted features to different classes, thus identifying the characters or words the features represent.

 

Andrei Andreyevich Markov
Born: 14 June 1856 in Ryazan, Russia
Died: 20 July 1922 in Petrograd (now St Petersburg), Russia

Markov is particularly remembered for his study of Markov chains, sequences of random variables in which the future variable is determined by the present variable but is independent of the way in which the present state arose from its predecessors. This work launched the theory of stochastic processes.

 

Markov random processes

• A random sequence has the Markov property if its distribution is determined solely by its current state. Any random process having this property is called a Markov random process.
• For observable state sequences (state is known from data), this leads to a Markov chain model.
• For non-observable states, this leads to a Hidden Markov Model (HMM).

 

Chain Rule & Markov Property

Bayes rule:

$$P(q_t, q_{t-1}, \ldots, q_1) = P(q_t \mid q_{t-1}, \ldots, q_1)\, P(q_{t-1}, \ldots, q_1)$$

$$P(q_t, q_{t-1}, \ldots, q_1) = P(q_t \mid q_{t-1}, \ldots, q_1)\, P(q_{t-1} \mid q_{t-2}, \ldots, q_1)\, P(q_{t-2}, \ldots, q_1)$$

$$P(q_t, q_{t-1}, \ldots, q_1) = P(q_1) \prod_{i=2}^{t} P(q_i \mid q_{i-1}, \ldots, q_1)$$

Markov property:

$$P(q_i \mid q_{i-1}, \ldots, q_1) = P(q_i \mid q_{i-1}) \quad \text{for } i > 1$$

$$P(q_t, q_{t-1}, \ldots, q_1) = P(q_1) \prod_{i=2}^{t} P(q_i \mid q_{i-1}) = P(q_1)\, P(q_2 \mid q_1) \cdots P(q_t \mid q_{t-1})$$

A Markov System

• Has N states, called s1, s2 .. sN
• There are discrete timesteps, t=0, t=1, …
• On the t'th timestep the system is in exactly one of the available states. Call it qt. Note: qt ∈ {s1, s2 .. sN}
• Between each timestep, the next state is chosen randomly.
• The current state determines the probability distribution for the next state.
• Often notated with arcs between states.

Example with N=3: the current state is q0 = s3 at t=0 and q1 = s2 at t=1. The transition probabilities are:

P(qt+1=s1|qt=s1) = 0      P(qt+1=s2|qt=s1) = 0      P(qt+1=s3|qt=s1) = 1
P(qt+1=s1|qt=s2) = 1/2    P(qt+1=s2|qt=s2) = 1/2    P(qt+1=s3|qt=s2) = 0
P(qt+1=s1|qt=s3) = 1/3    P(qt+1=s2|qt=s3) = 2/3    P(qt+1=s3|qt=s3) = 0

Markov Property: qt+1 is conditionally independent of {qt-1, qt-2, … q1, q0} given qt. In other words:

P(qt+1 = sj | qt = si) = P(qt+1 = sj | qt = si, any earlier history)

Notation:

aij = P(qt+1 = sj | qt = si)
πi = P(q1 = si)
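The three-state example can be turned into a short calculation. The sketch below uses the transition probabilities of the N=3 system given in the slides and multiplies them along a path, which is what the chain rule reduces to under the Markov property. The dictionary layout, the function name and the chosen path are illustrative assumptions; the starting state is treated as known (probability 1).

```python
from fractions import Fraction
from typing import Dict, List

# TRANSITIONS[current][next] = P(q_{t+1} = next | q_t = current), values from the N=3 example.
TRANSITIONS: Dict[str, Dict[str, Fraction]] = {
    "s1": {"s1": Fraction(0), "s2": Fraction(0), "s3": Fraction(1)},
    "s2": {"s1": Fraction(1, 2), "s2": Fraction(1, 2), "s3": Fraction(0)},
    "s3": {"s1": Fraction(1, 3), "s2": Fraction(2, 3), "s3": Fraction(0)},
}


def path_probability(path: List[str],
                     transitions: Dict[str, Dict[str, Fraction]],
                     initial: Fraction = Fraction(1)) -> Fraction:
    """P(path) = P(first state) * product of one-step transition probabilities."""
    prob = initial
    for current, nxt in zip(path, path[1:]):
        prob *= transitions[current][nxt]
    return prob


# Starting in s3 (as in the slides at t=0), then s2, s1, s3: 2/3 * 1/2 * 1
print(path_probability(["s3", "s2", "s1", "s3"], TRANSITIONS))  # prints 1/3
```

Exact fractions are used so the result can be compared directly with a hand calculation.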

 

Example: A Simple Markov Model For Prediction

On any given day, the weather can be described as being in one of three states:
• State 1: precipitation (rain, snow, hail, etc.)
• State 2: cloudy
• State 3: sunny

Transitions between states are described by the transition matrix.

This model can then be described by a directed graph.

 

Basic Calculations-1

Example: What is the probability that the weather for eight consecutive days is "sun-sun-sun-rain-rain-sun-cloudy-sun"?

Solution: O = sun, sun, sun, rain, rain, sun, cloudy, sun, which corresponds to the state sequence 3, 3, 3, 1, 1, 3, 2, 3.
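The calculation can be sketched in code. The transition matrix is not reproduced in this copy of the slides, so the values below are an assumption: they are the ones commonly used for this three-state weather example in the HMM tutorial literature. The sketch also assumes that the first day is known to be sunny (probability 1). The function name is hypothetical.

```python
from typing import List, Sequence

WEATHER = ("rain", "cloudy", "sun")  # states 1, 2, 3 of the example

# ASSUMED transition matrix (not taken from the slides): row = today, column = tomorrow.
ASSUMED_TRANSITIONS = [
    [0.4, 0.3, 0.3],  # from rain
    [0.2, 0.6, 0.2],  # from cloudy
    [0.1, 0.1, 0.8],  # from sun
]


def weather_sequence_probability(days: Sequence[str],
                                 transitions: List[List[float]],
                                 first_day_prob: float = 1.0) -> float:
    """Probability of a weather sequence under a first-order Markov chain."""
    states = [WEATHER.index(day) for day in days]
    prob = first_day_prob
    for today, tomorrow in zip(states, states[1:]):
        prob *= transitions[today][tomorrow]
    return prob


sequence = ["sun", "sun", "sun", "rain", "rain", "sun", "cloudy", "sun"]
probability = weather_sequence_probability(sequence, ASSUMED_TRANSITIONS)
```

With these assumed values the product is 0.8 · 0.8 · 0.1 · 0.4 · 0.3 · 0.1 · 0.2, about 1.536 × 10⁻⁴; a different transition matrix would of course give a different number.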

 

From Markov To Hidden Markov

The previous model assumes that each state can be uniquely associated with an observable event.
• Once an observation is made, the state of the system is then trivially retrieved.
• This model, however, is too restrictive to be of practical use for most realistic problems.

To make the model more flexible, we will assume that the outcomes or observations of the model are a probabilistic function of each state.
• Each state can produce a number of outputs according to a unique probability distribution, and each distinct output can potentially be generated at any state.
• These are known as Hidden Markov Models (HMM), because the state sequence is not directly observable; it can only be approximated from the sequence of observations produced by the system.

 

HMM Formal Definition

An HMM, λ, is a 5-tuple consisting of:
• N, the number of states
• M, the number of possible observations
• {π1, π2, .. πN}, the starting state probabilities: P(q0 = Si) = πi
• The state transition probabilities: P(qt+1=Sj | qt=Si) = aij

  a11  a12  …  a1N
  a21  a22  …  a2N
   :    :       :
  aN1  aN2  …  aNN

• The observation probabilities: P(Ot=k | qt=Si) = bi(k)

  b1(1)  b1(2)  …  b1(M)
  b2(1)  b2(2)  …  b2(M)
    :      :         :
  bN(1)  bN(2)  …  bN(M)

 

The coin-toss problem

To illustrate the concept of an HMM, consider the following scenario:
• Assume that you are placed in a room with a curtain
• Behind the curtain there is a person performing a coin-toss experiment
• This person selects one of several coins, and tosses it: heads (H) or tails (T)
• The person tells you the outcome (H,T), but not which coin was used each time

Your goal is to build a probabilistic model that best explains a sequence of observations O={o1,o2,o3,o4,…}={H,T,T,H,…}

 

 

• The coins represent the states; these are hidden because you do not know which coin was tossed each time
• The outcome of each toss represents an observation
• A "likely" sequence of coins may be inferred from the observations, but this state sequence will not be unique

 

The Coin Toss Example: 1 coin

• As a result, the Markov model is observable since there is only one state
• In fact, we may describe the system with a deterministic model where the states are the actual observations
• The model parameter P(H) may be found from the ratio of heads and tails
• O = H H H T T H
• S = 1 1 1 2 2 1

 

The Coin Toss Example: 2 coins

The Coin Toss Example: 3 coins

 

1, 2 or 3 coins?

Which of these models is best?
• Since the states are not observable, the best we can do is select the model that best explains the data (e.g., Maximum Likelihood criterion)
• Whether the observation sequence is long and rich enough to warrant a more complex model is a different story, though
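Comparing models by likelihood can be sketched with the forward algorithm, which sums over all hidden coin sequences without enumerating them. This is an illustrative sketch only: the one-coin model takes P(H) = 4/6 from the observation sequence H H H T T H of the one-coin slide, while every parameter of the two-coin model is a hypothetical value chosen for the example, not something given in the slides. The observation sequence is assumed to be non-empty.

```python
from typing import List, Sequence


def forward_likelihood(obs: Sequence[int],
                       pi: List[float],
                       A: List[List[float]],
                       B: List[List[float]]) -> float:
    """P(obs | model) for an HMM with start probs pi, transitions A, emissions B."""
    n = len(pi)
    # alpha[j] = P(observations so far, current state = j)
    alpha = [pi[j] * B[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)


H, T = 0, 1
observations = [H, H, H, T, T, H]

# One coin: a single state whose P(H) is the observed fraction of heads.
one_coin = dict(pi=[1.0], A=[[1.0]], B=[[4 / 6, 2 / 6]])

# Two coins: HYPOTHETICAL parameters, one coin biased to heads and one to tails.
two_coins = dict(
    pi=[0.5, 0.5],
    A=[[0.7, 0.3], [0.3, 0.7]],
    B=[[0.9, 0.1], [0.2, 0.8]],
)

likelihood_one = forward_likelihood(observations, **one_coin)
likelihood_two = forward_likelihood(observations, **two_coins)
```

Under a maximum likelihood criterion the model with the larger value would be preferred, although a model with more parameters can always fit a short sequence at least as well once its parameters are tuned, which is the caveat raised in the last bullet.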

 

Future Scope

Over the past three decades, many different methods have been explored by a large number of scientists to recognize characters. A variety of approaches have been proposed and tested by researchers in different parts of the world to improve the experience of usability.

 

References

• ieeexplore.ieee.org/…/01261096.pdf
• F. Coulmas, Writing Systems: An Introduction to Their Linguistic Analysis. Cambridge University Press, 2003.
• R. Plamondon and S. N. Srihari, "On-line and off-line handwriting recognition: A comprehensive survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 63-84, 2000.
• A. L. Spitz, "Determination of the script and language content of document images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 235-245, 1997.
• G. X. Tan, C. Viard-Gaudin, and A. Kot, "Automatic Writer Identification Framework for Online Handwritten Documents Using Character Prototypes," Pattern Recogn., 2009.

 

THANK YOU!
