ACCEPTED CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2001

Rapid Object Detection using a Boosted Cascade of Simple Features

Paul Viola                              Michael Jones
viola@merl.com                          mjones@crl.dec.com
Mitsubishi Electric Research Labs       Compaq CRL
201 Broadway, 8th FL                    One Cambridge Center
Cambridge, MA 02139                     Cambridge, MA 02142
Abstract

This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers [6]. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object-specific focus-of-attention mechanism which, unlike previous approaches, provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
1. Introduction

This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid object detection. This framework is demonstrated on, and in part motivated by, the task of face detection. Toward this end we have constructed a frontal face detection system which achieves detection and false positive rates which are equivalent to the best published results [16, 12, 15, 11, 1]. This face detection system is most clearly distinguished from previous approaches in its ability to detect faces extremely rapidly. Operating on 384 by 288 pixel images, faces are detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences, or pixel color in color images, has been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.

There are three main contributions of our object detection framework. We will introduce each of these ideas briefly below and then describe them in detail in subsequent sections.
The first contribution of this paper is a new image representation called an integral image that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou et al., our detection system does not work directly with image intensities [10]. Like these authors we use a set of features which are reminiscent of Haar basis functions (though we will also use related filters which are more complex than Haar filters). In order to compute these features very rapidly at many scales we introduce the integral image representation for images. The integral image can be computed from an image using a few operations per pixel. Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.

The second contribution of this paper is a method for constructing a classifier by selecting a small number of important features using AdaBoost [6]. Within any image sub-window the total number of Haar-like features is very large, far larger than the number of pixels. In order to ensure fast classification, the learning process must exclude a large majority of the available features, and focus on a small set of critical features. Motivated by the work of Tieu and Viola, feature selection is achieved through a simple modification of the AdaBoost procedure: the weak learner is constrained so that each weak classifier returned can depend on only a

single feature [2]. As a result each stage of the boosting process, which selects a new weak classifier, can be viewed as a feature selection process. AdaBoost provides an effective learning algorithm and strong bounds on generalization performance [13, 9, 10].

The third major contribution of this paper is a method for combining successively more complex classifiers in a cascade structure which dramatically increases the speed of the detector by focusing attention on promising regions of the image. The notion behind focus of attention approaches is that it is often possible to rapidly determine where in an image an object might occur [17, 8, 1]. More complex processing is reserved only for these promising regions. The key measure of such an approach is the "false negative" rate of the attentional process. It must be the case that all, or almost all, object instances are selected by the attentional filter.
We will describe a process for training an extremely simple and efficient classifier which can be used as a "supervised" focus of attention operator. The term supervised refers to the fact that the attentional operator is trained to detect examples of a particular class. In the domain of face detection it is possible to achieve fewer than 1% false negatives and 40% false positives using a classifier constructed from two Haar-like features. The effect of this filter is to reduce by over one half the number of locations where the final detector must be evaluated.

Those sub-windows which are not rejected by the initial classifier are processed by a sequence of classifiers, each slightly more complex than the last. If any classifier rejects the sub-window, no further processing is performed. The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Geman and colleagues [1, 4].

An extremely fast face detector will have broad practical applications. These include user interfaces, image databases, and teleconferencing. In applications where rapid frame-rates are not necessary, our system will allow for significant additional post-processing and analysis. In addition our system can be implemented on a wide range of small low power devices, including hand-helds and embedded processors. In our lab we have implemented this face detector on the Compaq iPaq handheld and have achieved detection at two frames per second (this device has a low power 200 MIPS StrongArm processor which lacks floating point hardware).

The remainder of the paper describes our contributions and a number of experimental results, including a detailed description of our experimental methodology. Discussion of closely related work takes place at the end of each section.
2. Features

Our object detection procedure classifies images based on the value of simple features. There are many motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data. For this system there is also a second critical motivation for features: the feature-based system operates much faster than a pixel-based system.

Figure 1: Example rectangle features shown relative to the enclosing detection window. The sum of the pixels which lie within the white rectangles is subtracted from the sum of pixels in the grey rectangles. Two-rectangle features are shown in (A) and (B). Figure (C) shows a three-rectangle feature, and (D) a four-rectangle feature.

The simple features used are reminiscent of Haar basis functions which have been used by Papageorgiou et al. [10]. More specifically, we use three kinds of features. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent (see Figure 1). A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. Finally, a four-rectangle feature computes the difference between diagonal pairs of rectangles.

Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large: over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete¹.
2.1. Integral Image

Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image.² The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

    ii(x, y) = Σ_{x'≤x, y'≤y} i(x', y')

¹A complete basis has no linear dependence between basis elements and has the same number of elements as the image space, in this case 576. The full set of 180,000 features is many times over-complete.
²There is a close relation to "summed area tables" as used in graphics [3]. We choose a different name here in order to emphasize its use for the analysis of images, rather than for texture mapping.
Figure 2: The sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A + B, at location 3 is A + C, and at location 4 is A + B + C + D. The sum within D can be computed as 4 + 1 − (2 + 3).
where ii(x, y) is the integral image and i(x, y) is the original image. Using the following pair of recurrences:

    s(x, y) = s(x, y − 1) + i(x, y)      (1)
    ii(x, y) = ii(x − 1, y) + s(x, y)    (2)

(where s(x, y) is the cumulative row sum, s(x, −1) = 0, and ii(−1, y) = 0) the integral image can be computed in one pass over the original image.
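As a concrete illustration, recurrences (1) and (2) can be realized in a single pass. The sketch below uses plain Python lists for a grayscale image and is illustrative rather than optimized; the function name is ours.

```python
# One-pass integral image via recurrences (1) and (2):
#   s(x, y)  = s(x, y-1) + i(x, y)
#   ii(x, y) = ii(x-1, y) + s(x, y)
def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for x in range(w):
        s = 0  # s(x, -1) = 0
        for y in range(h):
            s += img[y][x]                       # recurrence (1)
            left = ii[y][x - 1] if x > 0 else 0  # ii(-1, y) = 0
            ii[y][x] = left + s                  # recurrence (2)
    return ii
```

For example, the 2x2 image [[1, 2], [3, 4]] yields the integral image [[1, 3], [4, 10]].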
Using the integral image any rectangular sum can be computed in four array references (see Figure 2). Clearly the difference between two rectangular sums can be computed in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums they can be computed in six array references, eight in the case of the three-rectangle features, and nine for four-rectangle features.
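A sketch of the four-reference rectangle sum, and a two-rectangle feature built from it, assuming an integral image stored as `ii[y][x]` = inclusive sum of pixels above and to the left of (x, y); the helper names are ours.

```python
# Sum of the w-by-h rectangle whose top-left pixel is (x, y),
# using four references into the integral image (see Figure 2).
def rect_sum(ii, x, y, w, h):
    def at(xx, yy):  # references outside the image count as 0
        return ii[yy][xx] if xx >= 0 and yy >= 0 else 0
    return (at(x + w - 1, y + h - 1) - at(x - 1, y + h - 1)
            - at(x + w - 1, y - 1) + at(x - 1, y - 1))

# A horizontal two-rectangle feature: the adjacent sums share two corners,
# so an optimized version needs only six distinct array references.
def two_rect_feature(ii, x, y, w, h):
    return rect_sum(ii, x + w, y, w, h) - rect_sum(ii, x, y, w, h)
```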
2.2. Feature Discussion

Rectangle features are somewhat primitive when compared with alternatives such as steerable filters [5, 7]. Steerable filters, and their relatives, are excellent for the detailed analysis of boundaries, image compression, and texture analysis. In contrast rectangle features, while sensitive to the presence of edges, bars, and other simple image structure, are quite coarse. Unlike steerable filters the only orientations available are vertical, horizontal, and diagonal. The set of rectangle features does however provide a rich image representation which supports effective learning. In conjunction with the integral image, the efficiency of the rectangle feature set provides ample compensation for their limited flexibility.
3. Learning Classification Functions

Given a feature set and a training set of positive and negative images, any number of machine learning approaches could be used to learn a classification function. In our system a variant of AdaBoost is used both to select a small set of features and to train the classifier [6]. In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple (sometimes called weak) learning algorithm. There are a number of formal guarantees provided by the AdaBoost learning procedure. Freund and Schapire proved that the training error of the strong classifier approaches zero exponentially in the number of rounds. More importantly, a number of results were later proved about generalization performance [14]. The key insight is that generalization performance is related to the margin of the examples, and that AdaBoost achieves large margins rapidly.

Recall that there are over 180,000 rectangle features associated with each image sub-window, a number far larger than the number of pixels. Even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive. Our hypothesis, which is borne out by experiment, is that a very small number of these features can be combined to form an effective classifier. The main challenge is to find these features.

In support of this goal, the weak learning algorithm is designed to select the single rectangle feature which best separates the positive and negative examples (this is similar to the approach of [2] in the domain of image database retrieval). For each feature, the weak learner determines the optimal threshold classification function, such that the minimum number of examples are misclassified. A weak classifier h_j(x) thus consists of a feature f_j, a threshold θ_j and a parity p_j indicating the direction of the inequality sign:

    h_j(x) = 1 if p_j f_j(x) < p_j θ_j, and 0 otherwise.

Here x is a 24x24 pixel sub-window of an image. See Table 1 for a summary of the boosting process.

In practice no single feature can perform the classification task with low error. Features which are selected in early rounds of the boosting process had error rates between 0.1 and 0.3. Features selected in later rounds, as the task becomes more difficult, yield error rates between 0.4 and 0.5.
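The threshold-plus-parity weak classifier can be sketched as a decision stump. The brute-force threshold search below is purely illustrative (an efficient implementation would sort the examples once per feature); all names are ours.

```python
# Weak classifier: h(x) = 1 if p * f(x) < p * theta, else 0.
def weak_classify(f_val, theta, p):
    return 1 if p * f_val < p * theta else 0

# Exhaustively pick the (theta, parity) pair minimizing the weighted
# classification error for one feature, given that feature's value on
# every training example.
def best_stump(feature_values, labels, weights):
    best = (float("inf"), 0.0, 1)  # (error, theta, parity)
    for theta in sorted(set(feature_values)):
        for p in (1, -1):
            err = sum(w for f, y, w in zip(feature_values, labels, weights)
                      if weak_classify(f, theta, p) != y)
            if err < best[0]:
                best = (err, theta, p)
    return best
```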
3.1. Learning Discussion

Many general feature selection procedures have been proposed (see chapter 8 of [18] for a review). Our final application demanded a very aggressive approach which would discard the vast majority of features. For a similar recognition problem Papageorgiou et al. proposed a scheme for feature selection based on feature variance [10]. They demonstrated good results selecting 37 features out of a total of 1734 features.
Roth et al. propose a feature selection process based on the Winnow exponential perceptron learning rule [11]. The Winnow learning process converges to a solution where many of these weights are zero. Nevertheless a very large number of features are retained (perhaps a few hundred or thousand).

- Given example images (x_1, y_1), ..., (x_n, y_n) where y_i = 0, 1 for negative and positive examples respectively.
- Initialize weights w_{1,i} = 1/(2m), 1/(2l) for y_i = 0, 1 respectively, where m and l are the number of negatives and positives respectively.
- For t = 1, ..., T:
  1. Normalize the weights, w_{t,i} ← w_{t,i} / Σ_{j=1}^{n} w_{t,j}, so that w_t is a probability distribution.
  2. For each feature j, train a classifier h_j which is restricted to using a single feature. The error is evaluated with respect to w_t: ε_j = Σ_i w_i |h_j(x_i) − y_i|.
  3. Choose the classifier h_t with the lowest error ε_t.
  4. Update the weights: w_{t+1,i} = w_{t,i} β_t^{1−e_i}, where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and β_t = ε_t / (1 − ε_t).
- The final strong classifier is:

      h(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and 0 otherwise,

  where α_t = log(1/β_t).

Table 1: The AdaBoost algorithm for classifier learning. Each round of boosting selects one feature from the 180,000 potential features.
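Assuming each candidate weak classifier is already wrapped as a function h(x) returning 0 or 1, the boosting loop of Table 1 can be sketched as follows. The small epsilon guard against a zero-error stump is our addition, not part of the original algorithm.

```python
import math

def adaboost(stumps, xs, ys, T):
    m = sum(1 for y in ys if y == 0)  # number of negatives
    l = len(ys) - m                   # number of positives
    w = [1 / (2 * m) if y == 0 else 1 / (2 * l) for y in ys]
    chosen = []                       # (alpha_t, h_t) pairs
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]  # step 1: normalize
        # steps 2-3: weighted error of every stump; keep the best
        errs = [(sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y), h)
                for h in stumps]
        eps, h = min(errs, key=lambda e: e[0])
        beta = max(eps, 1e-10) / (1 - eps)  # guard against eps = 0
        # step 4: down-weight correctly classified examples
        w = [wi * (beta if h(x) == y else 1) for wi, x, y in zip(w, xs, ys)]
        chosen.append((math.log(1 / beta), h))

    def strong(x):  # final strong classifier: weighted majority vote
        return 1 if (sum(a * h(x) for a, h in chosen)
                     >= 0.5 * sum(a for a, _ in chosen)) else 0
    return strong
```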
3.2. Learning Results

While details on the training and performance of the final system are presented in Section 5, several simple results merit discussion. Initial experiments demonstrated that a frontal face classifier constructed from 200 features yields a detection rate of 95% with a false positive rate of 1 in 14084. These results are compelling, but not sufficient for many real-world tasks. In terms of computation, this classifier is probably faster than any other published system, requiring 0.7 seconds to scan a 384 by 288 pixel image. Unfortunately, the most straightforward technique for improving detection performance, adding features to the classifier, directly increases computation time.
For the task of face detection, the initial rectangle features selected by AdaBoost are meaningful and easily interpreted. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks (see Figure 3). This feature is relatively large in comparison with the detection sub-window, and should be somewhat insensitive to size and location of the face. The second feature selected relies on the property that the eyes are darker than the bridge of the nose.

Figure 3: The first and second features selected by AdaBoost. The two features are shown in the top row and then overlayed on a typical training face in the bottom row. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks. The feature capitalizes on the observation that the eye region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose.
4. The Attentional Cascade

This section describes an algorithm for constructing a cascade of classifiers which achieves increased detection performance while radically reducing computation time. The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances (i.e. the threshold of a boosted classifier can be adjusted so that the false negative rate is close to zero). Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called upon to achieve low false positive rates.
The overall form of the detection process is that of a degenerate decision tree, what we call a "cascade" (see Figure 4). A positive result from the first classifier triggers the evaluation of a second classifier which has also been adjusted to achieve very high detection rates. A positive result from the second classifier triggers a third classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window.
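This sequential evaluation can be sketched minimally as follows, with each stage represented as a function returning 1 (accept) or 0 (reject); the names are ours.

```python
def cascade_accept(stages, window):
    """Run boosted stages in order; a single rejection ends processing."""
    for stage in stages:
        if stage(window) == 0:
            return False  # rejected sub-window: no further computation
    return True           # survived every stage: a candidate detection
```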
Stages in the cascade are constructed by training classifiers using AdaBoost and then adjusting the threshold to minimize false negatives. Note that the default AdaBoost threshold is designed to yield a low error rate on the training data. In general a lower threshold yields higher detection rates and higher false positive rates.

Figure 4: Schematic depiction of the detection cascade. A series of classifiers are applied to every sub-window. The initial classifier eliminates a large number of negative examples with very little processing. Subsequent layers eliminate additional negatives but require additional computation. After several stages of processing the number of sub-windows has been reduced radically. Further processing can take any form such as additional stages of the cascade (as in our detection system) or an alternative detection system.
For example an excellent first stage classifier can be constructed from a two-feature strong classifier by reducing the threshold to minimize false negatives. Measured against a validation training set, the threshold can be adjusted to detect 100% of the faces with a false positive rate of 40%. See Figure 3 for a description of the two features used in this classifier.

Computation of the two-feature classifier amounts to about 60 microprocessor instructions. It seems hard to imagine that any simpler filter could achieve higher rejection rates. By comparison, scanning a simple image template, or a single layer perceptron, would require at least 20 times as many operations per sub-window.

The structure of the cascade reflects the fact that within any single image an overwhelming majority of sub-windows are negative. As such, the cascade attempts to reject as many negatives as possible at the earliest stage possible. While a positive instance will trigger the evaluation of every classifier in the cascade, this is an exceedingly rare event.

Much like a decision tree, subsequent classifiers are trained using those examples which pass through all the previous stages. As a result, the second classifier faces a more difficult task than the first. The examples which make it through the first stage are "harder" than typical examples. The more difficult examples faced by deeper classifiers push the entire receiver operating characteristic (ROC) curve downward. At a given detection rate, deeper classifiers have correspondingly higher false positive rates.

4.1. Training a Cascade of Classifiers
The cascade training process involves two types of trade-offs. In most cases classifiers with more features will achieve higher detection rates and lower false positive rates. At the same time classifiers with more features require more time to compute. In principle one could define an optimization framework in which: i) the number of classifier stages, ii) the number of features in each stage, and iii) the threshold of each stage are traded off in order to minimize the expected number of evaluated features. Unfortunately finding this optimum is a tremendously difficult problem.

In practice a very simple framework is used to produce an effective classifier which is highly efficient. Each stage in the cascade reduces the false positive rate and decreases the detection rate. A target is selected for the minimum reduction in false positives and the maximum decrease in detection. Each stage is trained by adding features until the target detection and false positive rates are met (these rates are determined by testing the detector on a validation set). Stages are added until the overall target for false positive and detection rate is met.
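This stage-wise procedure can be sketched as a nested loop. Here `train_stage` and `eval_rates` are hypothetical helpers standing in for boosted-stage training and validation-set measurement, and the parameter names and targets are illustrative, not taken from the paper.

```python
def train_cascade(target_fp, stage_fp_factor, stage_det_goal,
                  train_stage, eval_rates):
    # target_fp: overall false positive rate to reach
    # stage_fp_factor: required per-stage reduction in false positives
    # stage_det_goal: minimum acceptable detection rate after each stage
    cascade, overall_fp = [], 1.0
    while overall_fp > target_fp:
        n = 0
        while True:  # add features until this stage meets its targets
            n += 1
            stage = train_stage(n)
            det, fp = eval_rates(cascade + [stage])
            if fp <= stage_fp_factor * overall_fp and det >= stage_det_goal:
                break
        cascade.append(stage)
        overall_fp = fp
    return cascade
```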
4.2. Detector Cascade Discussion

The complete face detection cascade has 38 stages with over 6000 features. Nevertheless the cascade structure results in fast average detection times. On a difficult dataset, containing 507 faces and 75 million sub-windows, faces are detected using an average of 10 feature evaluations per sub-window. In comparison, this system is about 15 times faster than an implementation of the detection system constructed by Rowley et al.³ [12]

A notion similar to the cascade appears in the face detection system described by Rowley et al. in which two detection networks are used [12]. Rowley et al. used a faster yet less accurate network to prescreen the image in order to find candidate regions for a slower more accurate network. Though it is difficult to determine exactly, it appears that Rowley et al.'s two network face system is the fastest existing face detector.⁴

The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Amit and Geman [1]. Unlike techniques which use a fixed detector, Amit and Geman propose an alternative point of view where unusual co-occurrences of simple image features are used to trigger the evaluation of a more complex detection process. In this way the full detection process need not be evaluated at many of the potential image locations and scales. While this basic insight

³Henry Rowley very graciously supplied us with implementations of his detection system for direct comparison. Reported results are against his fastest system. It is difficult to determine from the published literature, but the Rowley-Baluja-Kanade detector is widely considered the fastest detection system and has been heavily tested on real-world problems.
⁴Other published detectors have either neglected to discuss performance in detail, or have never published detection and false positive rates on a large and difficult training set.

is very valuable, in their implementation it is necessary to first evaluate some feature detector at every location. These features are then grouped to find unusual co-occurrences. In practice, since the form of our detector and the features that it uses are extremely efficient, the amortized cost of evaluating our detector at every scale and location is much faster than finding and grouping edges throughout the image.

In recent work Fleuret and Geman have presented a face detection technique which relies on a "chain" of tests in order to signify the presence of a face at a particular scale and location [4]. The image properties measured by Fleuret and Geman, disjunctions of fine scale edges, are quite different from rectangle features, which are simple, exist at all scales, and are somewhat interpretable. The two approaches also differ radically in their learning philosophy. The motivation for Fleuret and Geman's learning process is density estimation and density discrimination, while our detector is purely discriminative. Finally the false positive rate of Fleuret and Geman's approach appears to be higher than that of previous approaches like Rowley et al. and this approach. Unfortunately the paper does not report quantitative results of this kind. The included example images each have between 2 and 10 false positives.
5. Results

A 38 layer cascaded classifier was trained to detect frontal upright faces. To train the detector, a set of face and non-face training images were used. The face training set consisted of 4916 hand labeled faces scaled and aligned to a base resolution of 24 by 24 pixels. The faces were extracted from images downloaded during a random crawl of the world wide web. Some typical face examples are shown in Figure 5. The non-face subwindows used to train the detector come from 9544 images which were manually inspected and found to not contain any faces. There are about 350 million subwindows within these non-face images.

The number of features in the first five layers of the detector is 1, 10, 25, 25 and 50 features respectively. The remaining layers have increasingly more features. The total number of features in all layers is 6061.

Each classifier in the cascade was trained with the 4916 training faces (plus their vertical mirror images for a total of 9832 training faces) and 10,000 non-face sub-windows (also of size 24 by 24 pixels) using the AdaBoost training procedure. For the initial one-feature classifier, the non-face training examples were collected by selecting random sub-windows from a set of 9544 images which did not contain faces. The non-face examples used to train subsequent layers were obtained by scanning the partial cascade across the non-face images and collecting false positives. A maximum of 10000 such non-face sub-windows were collected for each layer.
Figure 5: Example frontal upright face images used for training.

Speed of the Final Detector

The speed of the cascaded detector is directly related to the number of features evaluated per scanned sub-window. Evaluated on the MIT+CMU test set [12], an average of 10 features out of a total of 6061 are evaluated per sub-window. This is possible because a large majority of sub-windows are rejected by the first or second layer in the cascade. On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about .067 seconds (using a starting scale of 1.25 and a step size of 1.5 described below). This is roughly 15 times faster than the Rowley-Baluja-Kanade detector [12] and about 600 times faster than the Schneiderman-Kanade detector [15].
Image Processing

All example sub-windows used for training were variance normalized to minimize the effect of different lighting conditions. Normalization is therefore necessary during detection as well. The variance of an image sub-window can be computed quickly using a pair of integral images. Recall that σ² = (1/N) Σ x² − m², where σ is the standard deviation, m is the mean, N is the number of pixels, and x is the pixel value within the sub-window. The mean of a sub-window can be computed using the integral image. The sum of squared pixels is computed using an integral image of the image squared (i.e. two integral images are used in the scanning process). During scanning the effect of image normalization can be achieved by post-multiplying the feature values rather than pre-multiplying the pixels.
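A sketch of this computation, assuming `ii` and `ii_sq` are integral images of the image and of its square, each stored as `t[y][x]` with the usual inclusive convention; the helper names are ours.

```python
# Mean and variance of a w-by-h sub-window at (x, y) from two integral
# images, using sigma^2 = (1/N) * sum(x^2) - m^2.
def window_stats(ii, ii_sq, x, y, w, h):
    def boxsum(t, x0, y0):
        def at(xx, yy):  # references outside the image count as 0
            return t[yy][xx] if xx >= 0 and yy >= 0 else 0
        return (at(x0 + w - 1, y0 + h - 1) - at(x0 - 1, y0 + h - 1)
                - at(x0 + w - 1, y0 - 1) + at(x0 - 1, y0 - 1))
    n = w * h
    mean = boxsum(ii, x, y) / n
    var = boxsum(ii_sq, x, y) / n - mean * mean
    return mean, var
```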
Scanning the Detector

The final detector is scanned across the image at multiple scales and locations. Scaling is achieved by scaling the detector itself, rather than scaling the image. This process makes sense because the features can be evaluated at any
Detector                 | False detections
                         |   10     31     50     65     78     95    167
Viola-Jones              | 76.1%  88.4%  91.4%  92.0%  92.1%  92.9%  93.9%
Viola-Jones (voting)     | 81.1%  89.7%  92.1%  93.1%  93.1%  93.2%  93.7%
Rowley-Baluja-Kanade     | 83.2%  86.0%    -      -      -    89.2%  90.1%
Schneiderman-Kanade      |   -      -      -    94.4%    -      -      -
Roth-Yang-Ahuja          |   -      -      -      -   (94.8%)   -      -

Table 2: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.
scale with the same cost. Good results were obtained using a set of scales a factor of 1.25 apart.

The detector is also scanned across location. Subsequent locations are obtained by shifting the window some number of pixels Δ. This shifting process is affected by the scale of the detector: if the current scale is s the window is shifted by [sΔ], where [·] is the rounding operation.

The choice of Δ affects both the speed of the detector as well as accuracy. The results we present are for Δ = 1.0. We can achieve a significant speedup by setting Δ = 1.5 with only a slight decrease in accuracy.
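The multi-scale, multi-location scan can be sketched as follows. Here `accepts(x, y, size)` is a hypothetical stand-in for running the appropriately scaled cascade on the square window at (x, y) with side length `size`; the function and parameter names are ours.

```python
# Scan a base-24 detector over an img_w-by-img_h image: scales grow by
# factors of 1.25, and at scale s the window is shifted by round(s * delta).
def scan(img_w, img_h, accepts, base=24, scale_step=1.25, delta=1.0):
    hits, s = [], 1.0
    while round(base * s) <= min(img_w, img_h):
        size = round(base * s)
        shift = max(1, round(s * delta))  # [s * delta], never zero
        for y in range(0, img_h - size + 1, shift):
            for x in range(0, img_w - size + 1, shift):
                if accepts(x, y, size):
                    hits.append((x, y, size))
        s *= scale_step
    return hits
```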
Integration of Multiple Detections

Since the final detector is insensitive to small changes in translation and scale, multiple detections will usually occur around each face in a scanned image. The same is often true of some types of false positives. In practice it often makes sense to return one final detection per face. Toward this end it is useful to postprocess the detected sub-windows in order to combine overlapping detections into a single detection.

In these experiments detections are combined in a very simple fashion. The set of detections are first partitioned into disjoint subsets. Two detections are in the same subset if their bounding regions overlap. Each partition yields a single final detection. The corners of the final bounding region are the average of the corners of all detections in the set.
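A sketch of this partition-and-average step, with detections represented as (x1, y1, x2, y2) corner boxes; the representation and names are ours.

```python
def overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def merge_detections(boxes):
    """Partition boxes into overlap-connected subsets, then average each
    subset's corners to produce one final detection per subset."""
    groups = []
    for b in boxes:
        touching = [g for g in groups if any(overlap(b, m) for m in g)]
        merged = [b] + [m for g in touching for m in g]
        groups = [g for g in groups if g not in touching] + [merged]
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```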
Experiments on a Real-World Test Set

We tested our system on the MIT+CMU frontal face test set [12]. This set consists of 130 images with 507 labeled frontal faces. A ROC curve showing the performance of our detector on this test set is shown in Figure 6. To create the ROC curve the threshold of the final layer classifier is adjusted from −∞ to +∞. Adjusting the threshold to +∞ will yield a detection rate of 0.0 and a false positive rate of 0.0. Adjusting the threshold to −∞, however, increases both the detection rate and false positive rate, but only to a certain point. Neither rate can be higher than the rate of the detection cascade minus the final layer. In effect, a threshold of −∞ is equivalent to removing that layer. Further increasing the detection and false positive rates requires decreasing the threshold of the next classifier in the cascade. Thus, in order to construct a complete ROC curve, classifier layers are removed. We use the number of false positives as opposed to the rate of false positives for the x-axis of the ROC curve to facilitate comparison with other systems. To compute the false positive rate, simply divide by the total number of sub-windows scanned. In our experiments, the number of sub-windows scanned is 75,081,800.

Unfortunately, most previous published results on face detection have only included a single operating regime (i.e. a single point on the ROC curve). To make comparison with our detector easier we have listed our detection rate for the false positive rates reported by the other systems. Table 2 lists the detection rate for various numbers of false detections for our system as well as other published systems. For the Rowley-Baluja-Kanade results [12], a number of different versions of their detector were tested, yielding a number of different results; they are all listed under the same heading. For the Roth-Yang-Ahuja detector [11], they reported their result on the MIT+CMU test set minus 5 images containing line drawn faces.

Figure 7 shows the output of our face detector on some test images from the MIT+CMU test set.
A simple voting scheme to further improve results

In Table 2 we also show results from running three detectors (the 38 layer one described above plus two similarly trained detectors) and outputting the majority vote of the three detectors. This improves the detection rate as well as eliminating more false positives. The improvement would be greater if the detectors were more independent. The correlation of their errors results in a modest improvement over the best single detector.
6. Conclusions

We have presented an approach for object detection which minimizes computation time while achieving high detection accuracy. The approach was used to construct a face detection system which is approximately 15 times faster than any previous approach.

This paper brings together new algorithms, representations, and insights which are quite generic and may well have broader application in computer vision and image processing.

Finally this paper presents a set of detailed experiments on a difficult face detection dataset which has been widely studied. This dataset includes faces under a very wide range of conditions including: illumination, scale, pose, and camera variation. Experiments on such a large and complex dataset are difficult and time consuming. Nevertheless systems which work under these conditions are unlikely to be brittle or limited to a single set of conditions. More importantly, conclusions drawn from this dataset are unlikely to be experimental artifacts.

Figure 6: ROC curve for our face detector on the MIT+CMU test set. The detector was run using a step size of 1.0 and starting scale of 1.0 (75,081,800 sub-windows scanned).

Figure 7: Output of our face detector on a number of test images from the MIT+CMU test set.
References

[1] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classifiers, 1997.
[2] Anonymous. Anonymous. In Anonymous, 2000.
[3] F. Crow. Summed-area tables for texture mapping. In Proceedings of SIGGRAPH, volume 18(3), pages 207-212, 1984.
[4] F. Fleuret and D. Geman. Coarse-to-fine face detection. Int. J. Computer Vision, 2001.
[5] William T. Freeman and Edward H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891-906, 1991.
[6] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory: Eurocolt '95, pages 23-37. Springer-Verlag, 1995.
[7] H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C. Anderson. Overcomplete steerable pyramid filters and rotation invariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[8] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Patt. Anal. Mach. Intell., 20(11):1254-1259, November 1998.
[9] Edgar Osuna, Robert Freund, and Federico Girosi. Training support vector machines: an application to face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.
[10] C. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. In International Conference on Computer Vision, 1998.
[11] D. Roth, M. Yang, and N. Ahuja. A SNoW based face detector. In Neural Information Processing 12, 2000.
[12] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 22-38, 1998.
[13] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat., 26(5):1651-1686, 1998.
[14] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In Proceedings of the Fourteenth International Conference on Machine Learning, 1997.
[15] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In International Conference on Computer Vision, 2000.
[16] K. Sung and T. Poggio. Example-based learning for view-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 39-51, 1998.
[17] J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. H. Lai, N. Davis, and F. Nuflo. Modeling visual attention via selective tuning. Artificial Intelligence Journal, 78(1-2):507-545, October 1995.
[18] Andrew Webb. Statistical Pattern Recognition. Oxford University Press, New York, 1999.
