ACCEPTED CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2001

Rapid Object Detection using a Boosted Cascade of Simple Features

Paul Viola                              Michael Jones
viola@merl.com                          mjones@crl.dec.com
Mitsubishi Electric Research Labs       Compaq CRL
201 Broadway, 8th FL                    One Cambridge Center
Cambridge, MA 02139                     Cambridge, MA 02142
Abstract

This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers [6]. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object-specific focus-of-attention mechanism which, unlike previous approaches, provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
1. Introduction

This paper brings together new algorithms and insights to construct a framework for robust and extremely rapid object detection. This framework is demonstrated on, and in part motivated by, the task of face detection. Toward this end we have constructed a frontal face detection system which achieves detection and false positive rates which are equivalent to the best published results [16, 12, 15, 11, 1]. This face detection system is most clearly distinguished from previous approaches in its ability to detect faces extremely rapidly. Operating on 384 by 288 pixel images, faces are detected at 15 frames per second on a conventional 700 MHz Intel Pentium III. In other face detection systems, auxiliary information, such as image differences in video sequences, or pixel color in color images, has been used to achieve high frame rates. Our system achieves high frame rates working only with the information present in a single grey scale image. These alternative sources of information can also be integrated with our system to achieve even higher frame rates.

There are three main contributions of our object detection framework. We will introduce each of these ideas briefly below and then describe them in detail in subsequent sections.
The first contribution of this paper is a new image representation called an integral image that allows for very fast feature evaluation. Motivated in part by the work of Papageorgiou et al., our detection system does not work directly with image intensities [10]. Like these authors we use a set of features which are reminiscent of Haar basis functions (though we will also use related filters which are more complex than Haar filters). In order to compute these features very rapidly at many scales we introduce the integral image representation for images. The integral image can be computed from an image using a few operations per pixel. Once computed, any one of these Haar-like features can be computed at any scale or location in constant time.

The second contribution of this paper is a method for constructing a classifier by selecting a small number of important features using AdaBoost [6]. Within any image sub-window the total number of Haar-like features is very large, far larger than the number of pixels. In order to ensure fast classification, the learning process must exclude a large majority of the available features, and focus on a small set of critical features. Motivated by the work of Tieu and Viola, feature selection is achieved through a simple modification of the AdaBoost procedure: the weak learner is constrained so that each weak classifier returned can depend on only a

single feature [2]. As a result each stage of the boosting process, which selects a new weak classifier, can be viewed as a feature selection process. AdaBoost provides an effective learning algorithm and strong bounds on generalization performance [13, 9, 10].

The third major contribution of this paper is a method for combining successively more complex classifiers in a cascade structure which dramatically increases the speed of the detector by focusing attention on promising regions of the image. The notion behind focus of attention approaches is that it is often possible to rapidly determine where in an image an object might occur [17, 8, 1]. More complex processing is reserved only for these promising regions. The key measure of such an approach is the "false negative" rate of the attentional process. It must be the case that all, or almost all, object instances are selected by the attentional filter.
We will describe a process for training an extremely simple and efficient classifier which can be used as a "supervised" focus of attention operator. The term supervised refers to the fact that the attentional operator is trained to detect examples of a particular class. In the domain of face detection it is possible to achieve fewer than 1% false negatives and 40% false positives using a classifier constructed from two Haar-like features. The effect of this filter is to reduce by over one half the number of locations where the final detector must be evaluated.

Those sub-windows which are not rejected by the initial classifier are processed by a sequence of classifiers, each slightly more complex than the last. If any classifier rejects the sub-window, no further processing is performed. The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Geman and colleagues [1, 4].

An extremely fast face detector will have broad practical applications. These include user interfaces, image databases, and teleconferencing. In applications where rapid frame-rates are not necessary, our system will allow for significant additional post-processing and analysis. In addition our system can be implemented on a wide range of small low power devices, including hand-helds and embedded processors. In our lab we have implemented this face detector on the Compaq iPaq handheld and have achieved detection at two frames per second (this device has a low power 200 MIPS StrongArm processor which lacks floating point hardware).

The remainder of the paper describes our contributions and a number of experimental results, including a detailed description of our experimental methodology. Discussion of closely related work takes place at the end of each section.
2. Features

Our object detection procedure classifies images based on the value of simple features. There are many motivations for using features rather than the pixels directly. The most common reason is that features can act to encode ad-hoc domain knowledge that is difficult to learn using a finite quantity of training data. For this system there is also a second critical motivation for features: the feature-based system operates much faster than a pixel-based system.

Figure 1: Example rectangle features shown relative to the enclosing detection window. The sum of the pixels which lie within the white rectangles is subtracted from the sum of pixels in the grey rectangles. Two-rectangle features are shown in (A) and (B). Figure (C) shows a three-rectangle feature, and (D) a four-rectangle feature.

The simple features used are reminiscent of Haar basis functions which have been used by Papageorgiou et al. [10]. More specifically, we use three kinds of features. The value of a two-rectangle feature is the difference between the sums of the pixels within two rectangular regions. The regions have the same size and shape and are horizontally or vertically adjacent (see Figure 1). A three-rectangle feature computes the sum within two outside rectangles subtracted from the sum in a center rectangle. Finally, a four-rectangle feature computes the difference between diagonal pairs of rectangles.

Given that the base resolution of the detector is 24x24, the exhaustive set of rectangle features is quite large: over 180,000. Note that unlike the Haar basis, the set of rectangle features is overcomplete¹.
2.1. Integral Image

Rectangle features can be computed very rapidly using an intermediate representation for the image which we call the integral image.² The integral image at location (x, y) contains the sum of the pixels above and to the left of (x, y), inclusive:

    ii(x, y) = Σ_{x'≤x, y'≤y} i(x', y')

¹A complete basis has no linear dependence between basis elements and has the same number of elements as the image space, in this case 576. The full set of 180,000 features is many times over-complete.
²There is a close relation to "summed area tables" as used in graphics [3]. We choose a different name here in order to emphasize its use for the analysis of images, rather than for texture mapping.
Figure 2: The sum of the pixels within rectangle D can be computed with four array references. The value of the integral image at location 1 is the sum of the pixels in rectangle A. The value at location 2 is A + B, at location 3 is A + C, and at location 4 is A + B + C + D. The sum within D can be computed as 4 + 1 − (2 + 3).
where ii(x, y) is the integral image and i(x, y) is the original image. Using the following pair of recurrences:

    s(x, y) = s(x, y − 1) + i(x, y)      (1)
    ii(x, y) = ii(x − 1, y) + s(x, y)    (2)

(where s(x, y) is the cumulative row sum, s(x, −1) = 0, and ii(−1, y) = 0) the integral image can be computed in one pass over the original image.
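As a concrete illustration, recurrences (1) and (2) can be realized in a single pass. The sketch below uses plain Python lists for a grayscale image and is illustrative rather than optimized; the function name is ours.

```python
# One-pass integral image via recurrences (1) and (2):
#   s(x, y)  = s(x, y-1) + i(x, y)
#   ii(x, y) = ii(x-1, y) + s(x, y)
def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for x in range(w):
        s = 0  # s(x, -1) = 0
        for y in range(h):
            s += img[y][x]                       # recurrence (1)
            left = ii[y][x - 1] if x > 0 else 0  # ii(-1, y) = 0
            ii[y][x] = left + s                  # recurrence (2)
    return ii
```

For example, the 2x2 image [[1, 2], [3, 4]] yields the integral image [[1, 3], [4, 10]].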
Using the integral image any rectangular sum can be computed in four array references (see Figure 2). Clearly the difference between two rectangular sums can be computed in eight references. Since the two-rectangle features defined above involve adjacent rectangular sums they can be computed in six array references, eight in the case of the three-rectangle features, and nine for four-rectangle features.
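A sketch of the four-reference rectangle sum, and a two-rectangle feature built from it, assuming an integral image stored as `ii[y][x]` = inclusive sum of pixels above and to the left of (x, y); the helper names are ours.

```python
# Sum of the w-by-h rectangle whose top-left pixel is (x, y),
# using four references into the integral image (see Figure 2).
def rect_sum(ii, x, y, w, h):
    def at(xx, yy):  # references outside the image count as 0
        return ii[yy][xx] if xx >= 0 and yy >= 0 else 0
    return (at(x + w - 1, y + h - 1) - at(x - 1, y + h - 1)
            - at(x + w - 1, y - 1) + at(x - 1, y - 1))

# A horizontal two-rectangle feature: the adjacent sums share two corners,
# so an optimized version needs only six distinct array references.
def two_rect_feature(ii, x, y, w, h):
    return rect_sum(ii, x + w, y, w, h) - rect_sum(ii, x, y, w, h)
```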
2.2. Feature Discussion

Rectangle features are somewhat primitive when compared with alternatives such as steerable filters [5, 7]. Steerable filters, and their relatives, are excellent for the detailed analysis of boundaries, image compression, and texture analysis. In contrast rectangle features, while sensitive to the presence of edges, bars, and other simple image structure, are quite coarse. Unlike steerable filters the only orientations available are vertical, horizontal, and diagonal. The set of rectangle features does however provide a rich image representation which supports effective learning. In conjunction with the integral image, the efficiency of the rectangle feature set provides ample compensation for their limited flexibility.
3. Learning Classification Functions

Given a feature set and a training set of positive and negative images, any number of machine learning approaches could be used to learn a classification function. In our system a variant of AdaBoost is used both to select a small set of features and to train the classifier [6]. In its original form, the AdaBoost learning algorithm is used to boost the classification performance of a simple (sometimes called weak) learning algorithm. There are a number of formal guarantees provided by the AdaBoost learning procedure. Freund and Schapire proved that the training error of the strong classifier approaches zero exponentially in the number of rounds. More importantly, a number of results were later proved about generalization performance [14]. The key insight is that generalization performance is related to the margin of the examples, and that AdaBoost achieves large margins rapidly.

Recall that there are over 180,000 rectangle features associated with each image sub-window, a number far larger than the number of pixels. Even though each feature can be computed very efficiently, computing the complete set is prohibitively expensive. Our hypothesis, which is borne out by experiment, is that a very small number of these features can be combined to form an effective classifier. The main challenge is to find these features.

In support of this goal, the weak learning algorithm is designed to select the single rectangle feature which best separates the positive and negative examples (this is similar to the approach of [2] in the domain of image database retrieval). For each feature, the weak learner determines the optimal threshold classification function, such that the minimum number of examples are misclassified. A weak classifier h_j(x) thus consists of a feature f_j, a threshold θ_j and a parity p_j indicating the direction of the inequality sign:

    h_j(x) = 1 if p_j f_j(x) < p_j θ_j, and 0 otherwise.

Here x is a 24x24 pixel sub-window of an image. See Table 1 for a summary of the boosting process.

In practice no single feature can perform the classification task with low error. Features which are selected in early rounds of the boosting process had error rates between 0.1 and 0.3. Features selected in later rounds, as the task becomes more difficult, yield error rates between 0.4 and 0.5.
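The threshold-plus-parity weak classifier can be sketched as a decision stump. The brute-force threshold search below is purely illustrative (an efficient implementation would sort the examples once per feature); all names are ours.

```python
# Weak classifier: h(x) = 1 if p * f(x) < p * theta, else 0.
def weak_classify(f_val, theta, p):
    return 1 if p * f_val < p * theta else 0

# Exhaustively pick the (theta, parity) pair minimizing the weighted
# classification error for one feature, given that feature's value on
# every training example.
def best_stump(feature_values, labels, weights):
    best = (float("inf"), 0.0, 1)  # (error, theta, parity)
    for theta in sorted(set(feature_values)):
        for p in (1, -1):
            err = sum(w for f, y, w in zip(feature_values, labels, weights)
                      if weak_classify(f, theta, p) != y)
            if err < best[0]:
                best = (err, theta, p)
    return best
```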
3.1. Learning Discussion

Many general feature selection procedures have been proposed (see chapter 8 of [18] for a review). Our final application demanded a very aggressive approach which would discard the vast majority of features. For a similar recognition problem Papageorgiou et al. proposed a scheme for feature selection based on feature variance [10]. They demonstrated good results selecting 37 features out of a total of 1734 features.
Roth et al. propose a feature selection process based on the Winnow exponential perceptron learning rule [11]. The Winnow learning process converges to a solution where many of these weights are zero. Nevertheless a very large number of features are retained (perhaps a few hundred or thousand).

- Given example images (x_1, y_1), ..., (x_n, y_n) where y_i = 0, 1 for negative and positive examples respectively.
- Initialize weights w_{1,i} = 1/(2m), 1/(2l) for y_i = 0, 1 respectively, where m and l are the number of negatives and positives respectively.
- For t = 1, ..., T:
  1. Normalize the weights, w_{t,i} ← w_{t,i} / Σ_{j=1}^{n} w_{t,j}, so that w_t is a probability distribution.
  2. For each feature j, train a classifier h_j which is restricted to using a single feature. The error is evaluated with respect to w_t: ε_j = Σ_i w_i |h_j(x_i) − y_i|.
  3. Choose the classifier h_t with the lowest error ε_t.
  4. Update the weights: w_{t+1,i} = w_{t,i} β_t^{1−e_i}, where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and β_t = ε_t / (1 − ε_t).
- The final strong classifier is:

      h(x) = 1 if Σ_{t=1}^{T} α_t h_t(x) ≥ (1/2) Σ_{t=1}^{T} α_t, and 0 otherwise,

  where α_t = log(1/β_t).

Table 1: The AdaBoost algorithm for classifier learning. Each round of boosting selects one feature from the 180,000 potential features.
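Assuming each candidate weak classifier is already wrapped as a function h(x) returning 0 or 1, the boosting loop of Table 1 can be sketched as follows. The small epsilon guard against a zero-error stump is our addition, not part of the original algorithm.

```python
import math

def adaboost(stumps, xs, ys, T):
    m = sum(1 for y in ys if y == 0)  # number of negatives
    l = len(ys) - m                   # number of positives
    w = [1 / (2 * m) if y == 0 else 1 / (2 * l) for y in ys]
    chosen = []                       # (alpha_t, h_t) pairs
    for _ in range(T):
        total = sum(w)
        w = [wi / total for wi in w]  # step 1: normalize
        # steps 2-3: weighted error of every stump; keep the best
        errs = [(sum(wi for wi, x, y in zip(w, xs, ys) if h(x) != y), h)
                for h in stumps]
        eps, h = min(errs, key=lambda e: e[0])
        beta = max(eps, 1e-10) / (1 - eps)  # guard against eps = 0
        # step 4: down-weight correctly classified examples
        w = [wi * (beta if h(x) == y else 1) for wi, x, y in zip(w, xs, ys)]
        chosen.append((math.log(1 / beta), h))

    def strong(x):  # final strong classifier: weighted majority vote
        return 1 if (sum(a * h(x) for a, h in chosen)
                     >= 0.5 * sum(a for a, _ in chosen)) else 0
    return strong
```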
3.2. Learning Results

While details on the training and performance of the final system are presented in Section 5, several simple results merit discussion. Initial experiments demonstrated that a frontal face classifier constructed from 200 features yields a detection rate of 95% with a false positive rate of 1 in 14084. These results are compelling, but not sufficient for many real-world tasks. In terms of computation, this classifier is probably faster than any other published system, requiring 0.7 seconds to scan a 384 by 288 pixel image. Unfortunately, the most straightforward technique for improving detection performance, adding features to the classifier, directly increases computation time.
For the task of face detection, the initial rectangle features selected by AdaBoost are meaningful and easily interpreted. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks (see Figure 3). This feature is relatively large in comparison with the detection sub-window, and should be somewhat insensitive to size and location of the face. The second feature selected relies on the property that the eyes are darker than the bridge of the nose.

Figure 3: The first and second features selected by AdaBoost. The two features are shown in the top row and then overlayed on a typical training face in the bottom row. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks. The feature capitalizes on the observation that the eye region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose.
4. The Attentional Cascade

This section describes an algorithm for constructing a cascade of classifiers which achieves increased detection performance while radically reducing computation time. The key insight is that smaller, and therefore more efficient, boosted classifiers can be constructed which reject many of the negative sub-windows while detecting almost all positive instances (i.e. the threshold of a boosted classifier can be adjusted so that the false negative rate is close to zero). Simpler classifiers are used to reject the majority of sub-windows before more complex classifiers are called upon to achieve low false positive rates.
The overall form of the detection process is that of a degenerate decision tree, what we call a "cascade" (see Figure 4). A positive result from the first classifier triggers the evaluation of a second classifier which has also been adjusted to achieve very high detection rates. A positive result from the second classifier triggers a third classifier, and so on. A negative outcome at any point leads to the immediate rejection of the sub-window.
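This sequential evaluation can be sketched minimally as follows, with each stage represented as a function returning 1 (accept) or 0 (reject); the names are ours.

```python
def cascade_accept(stages, window):
    """Run boosted stages in order; a single rejection ends processing."""
    for stage in stages:
        if stage(window) == 0:
            return False  # rejected sub-window: no further computation
    return True           # survived every stage: a candidate detection
```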
Stages in the cascade are constructed by training classifiers using AdaBoost and then adjusting the threshold to minimize false negatives. Note that the default AdaBoost threshold is designed to yield a low error rate on the training data. In general a lower threshold yields higher detection rates and higher false positive rates.

Figure 4: Schematic depiction of the detection cascade. A series of classifiers are applied to every sub-window. The initial classifier eliminates a large number of negative examples with very little processing. Subsequent layers eliminate additional negatives but require additional computation. After several stages of processing the number of sub-windows has been reduced radically. Further processing can take any form such as additional stages of the cascade (as in our detection system) or an alternative detection system.
For example an excellent first stage classifier can be constructed from a two-feature strong classifier by reducing the threshold to minimize false negatives. Measured against a validation training set, the threshold can be adjusted to detect 100% of the faces with a false positive rate of 40%. See Figure 3 for a description of the two features used in this classifier.

Computation of the two-feature classifier amounts to about 60 microprocessor instructions. It seems hard to imagine that any simpler filter could achieve higher rejection rates. By comparison, scanning a simple image template, or a single layer perceptron, would require at least 20 times as many operations per sub-window.

The structure of the cascade reflects the fact that within any single image an overwhelming majority of sub-windows are negative. As such, the cascade attempts to reject as many negatives as possible at the earliest stage possible. While a positive instance will trigger the evaluation of every classifier in the cascade, this is an exceedingly rare event.

Much like a decision tree, subsequent classifiers are trained using those examples which pass through all the previous stages. As a result, the second classifier faces a more difficult task than the first. The examples which make it through the first stage are "harder" than typical examples. The more difficult examples faced by deeper classifiers push the entire receiver operating characteristic (ROC) curve downward. At a given detection rate, deeper classifiers have correspondingly higher false positive rates.

4.1. Training a Cascade of Classifiers
The cascade training process involves two types of trade-offs. In most cases classifiers with more features will achieve higher detection rates and lower false positive rates. At the same time classifiers with more features require more time to compute. In principle one could define an optimization framework in which: i) the number of classifier stages, ii) the number of features in each stage, and iii) the threshold of each stage are traded off in order to minimize the expected number of evaluated features. Unfortunately finding this optimum is a tremendously difficult problem.

In practice a very simple framework is used to produce an effective classifier which is highly efficient. Each stage in the cascade reduces the false positive rate and decreases the detection rate. A target is selected for the minimum reduction in false positives and the maximum decrease in detection. Each stage is trained by adding features until the target detection and false positive rates are met (these rates are determined by testing the detector on a validation set). Stages are added until the overall target for false positive and detection rate is met.
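This stage-wise procedure can be sketched as a nested loop. Here `train_stage` and `eval_rates` are hypothetical helpers standing in for boosted-stage training and validation-set measurement, and the parameter names and targets are illustrative, not taken from the paper.

```python
def train_cascade(target_fp, stage_fp_factor, stage_det_goal,
                  train_stage, eval_rates):
    # target_fp: overall false positive rate to reach
    # stage_fp_factor: required per-stage reduction in false positives
    # stage_det_goal: minimum acceptable detection rate after each stage
    cascade, overall_fp = [], 1.0
    while overall_fp > target_fp:
        n = 0
        while True:  # add features until this stage meets its targets
            n += 1
            stage = train_stage(n)
            det, fp = eval_rates(cascade + [stage])
            if fp <= stage_fp_factor * overall_fp and det >= stage_det_goal:
                break
        cascade.append(stage)
        overall_fp = fp
    return cascade
```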
4.2. Detector Cascade Discussion

The complete face detection cascade has 38 stages with over 6000 features. Nevertheless the cascade structure results in fast average detection times. On a difficult dataset, containing 507 faces and 75 million sub-windows, faces are detected using an average of 10 feature evaluations per sub-window. In comparison, this system is about 15 times faster than an implementation of the detection system constructed by Rowley et al.³ [12]

A notion similar to the cascade appears in the face detection system described by Rowley et al. in which two detection networks are used [12]. Rowley et al. used a faster yet less accurate network to prescreen the image in order to find candidate regions for a slower more accurate network. Though it is difficult to determine exactly, it appears that Rowley et al.'s two network face system is the fastest existing face detector.⁴

The structure of the cascaded detection process is essentially that of a degenerate decision tree, and as such is related to the work of Amit and Geman [1]. Unlike techniques which use a fixed detector, Amit and Geman propose an alternative point of view where unusual co-occurrences of simple image features are used to trigger the evaluation of a more complex detection process. In this way the full detection process need not be evaluated at many of the potential image locations and scales. While this basic insight

³Henry Rowley very graciously supplied us with implementations of his detection system for direct comparison. Reported results are against his fastest system. It is difficult to determine from the published literature, but the Rowley-Baluja-Kanade detector is widely considered the fastest detection system and has been heavily tested on real-world problems.
⁴Other published detectors have either neglected to discuss performance in detail, or have never published detection and false positive rates on a large and difficult training set.

is very valuable, in their implementation it is necessary to first evaluate some feature detector at every location. These features are then grouped to find unusual co-occurrences. In practice, since the form of our detector and the features that it uses are extremely efficient, the amortized cost of evaluating our detector at every scale and location is much faster than finding and grouping edges throughout the image.

In recent work Fleuret and Geman have presented a face detection technique which relies on a "chain" of tests in order to signify the presence of a face at a particular scale and location [4]. The image properties measured by Fleuret and Geman, disjunctions of fine scale edges, are quite different from rectangle features, which are simple, exist at all scales, and are somewhat interpretable. The two approaches also differ radically in their learning philosophy. The motivation for Fleuret and Geman's learning process is density estimation and density discrimination, while our detector is purely discriminative. Finally the false positive rate of Fleuret and Geman's approach appears to be higher than that of previous approaches like Rowley et al. and this approach. Unfortunately the paper does not report quantitative results of this kind. The included example images each have between 2 and 10 false positives.
5. Results

A 38 layer cascaded classifier was trained to detect frontal upright faces. To train the detector, a set of face and non-face training images were used. The face training set consisted of 4916 hand labeled faces scaled and aligned to a base resolution of 24 by 24 pixels. The faces were extracted from images downloaded during a random crawl of the world wide web. Some typical face examples are shown in Figure 5. The non-face subwindows used to train the detector come from 9544 images which were manually inspected and found to not contain any faces. There are about 350 million subwindows within these non-face images.

The number of features in the first five layers of the detector is 1, 10, 25, 25 and 50 features respectively. The remaining layers have increasingly more features. The total number of features in all layers is 6061.

Each classifier in the cascade was trained with the 4916 training faces (plus their vertical mirror images for a total of 9832 training faces) and 10,000 non-face sub-windows (also of size 24 by 24 pixels) using the AdaBoost training procedure. For the initial one-feature classifier, the non-face training examples were collected by selecting random sub-windows from a set of 9544 images which did not contain faces. The non-face examples used to train subsequent layers were obtained by scanning the partial cascade across the non-face images and collecting false positives. A maximum of 10000 such non-face sub-windows were collected for each layer.
Figure 5: Example frontal upright face images used for training.

Speed of the Final Detector

The speed of the cascaded detector is directly related to the number of features evaluated per scanned sub-window. Evaluated on the MIT+CMU test set [12], an average of 10 features out of a total of 6061 are evaluated per sub-window. This is possible because a large majority of sub-windows are rejected by the first or second layer in the cascade. On a 700 MHz Pentium III processor, the face detector can process a 384 by 288 pixel image in about .067 seconds (using a starting scale of 1.25 and a step size of 1.5 described below). This is roughly 15 times faster than the Rowley-Baluja-Kanade detector [12] and about 600 times faster than the Schneiderman-Kanade detector [15].
Image Processing

All example sub-windows used for training were variance normalized to minimize the effect of different lighting conditions. Normalization is therefore necessary during detection as well. The variance of an image sub-window can be computed quickly using a pair of integral images. Recall that σ² = (1/N) Σ x² − m², where σ is the standard deviation, m is the mean, N is the number of pixels, and x is the pixel value within the sub-window. The mean of a sub-window can be computed using the integral image. The sum of squared pixels is computed using an integral image of the image squared (i.e. two integral images are used in the scanning process). During scanning the effect of image normalization can be achieved by post-multiplying the feature values rather than pre-multiplying the pixels.
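A sketch of this computation, assuming `ii` and `ii_sq` are integral images of the image and of its square, each stored as `t[y][x]` with the usual inclusive convention; the helper names are ours.

```python
# Mean and variance of a w-by-h sub-window at (x, y) from two integral
# images, using sigma^2 = (1/N) * sum(x^2) - m^2.
def window_stats(ii, ii_sq, x, y, w, h):
    def boxsum(t, x0, y0):
        def at(xx, yy):  # references outside the image count as 0
            return t[yy][xx] if xx >= 0 and yy >= 0 else 0
        return (at(x0 + w - 1, y0 + h - 1) - at(x0 - 1, y0 + h - 1)
                - at(x0 + w - 1, y0 - 1) + at(x0 - 1, y0 - 1))
    n = w * h
    mean = boxsum(ii, x, y) / n
    var = boxsum(ii_sq, x, y) / n - mean * mean
    return mean, var
```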
Scanning the Detector

The final detector is scanned across the image at multiple scales and locations. Scaling is achieved by scaling the detector itself, rather than scaling the image. This process makes sense because the features can be evaluated at any
Detector                 | False detections
                         |   10     31     50     65     78     95    167
Viola-Jones              | 76.1%  88.4%  91.4%  92.0%  92.1%  92.9%  93.9%
Viola-Jones (voting)     | 81.1%  89.7%  92.1%  93.1%  93.1%  93.2%  93.7%
Rowley-Baluja-Kanade     | 83.2%  86.0%    -      -      -    89.2%  90.1%
Schneiderman-Kanade      |   -      -      -    94.4%    -      -      -
Roth-Yang-Ahuja          |   -      -      -      -   (94.8%)   -      -

Table 2: Detection rates for various numbers of false positives on the MIT+CMU test set containing 130 images and 507 faces.
scale with the same cost. Good results were obtained using a set of scales a factor of 1.25 apart.

The detector is also scanned across location. Subsequent locations are obtained by shifting the window some number of pixels Δ. This shifting process is affected by the scale of the detector: if the current scale is s the window is shifted by [sΔ], where [·] is the rounding operation.

The choice of Δ affects both the speed of the detector as well as accuracy. The results we present are for Δ = 1.0. We can achieve a significant speedup by setting Δ = 1.5 with only a slight decrease in accuracy.
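The multi-scale, multi-location scan can be sketched as follows. Here `accepts(x, y, size)` is a hypothetical stand-in for running the appropriately scaled cascade on the square window at (x, y) with side length `size`; the function and parameter names are ours.

```python
# Scan a base-24 detector over an img_w-by-img_h image: scales grow by
# factors of 1.25, and at scale s the window is shifted by round(s * delta).
def scan(img_w, img_h, accepts, base=24, scale_step=1.25, delta=1.0):
    hits, s = [], 1.0
    while round(base * s) <= min(img_w, img_h):
        size = round(base * s)
        shift = max(1, round(s * delta))  # [s * delta], never zero
        for y in range(0, img_h - size + 1, shift):
            for x in range(0, img_w - size + 1, shift):
                if accepts(x, y, size):
                    hits.append((x, y, size))
        s *= scale_step
    return hits
```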
Integration of Multiple Detections

Since the final detector is insensitive to small changes in translation and scale, multiple detections will usually occur around each face in a scanned image. The same is often true of some types of false positives. In practice it often makes sense to return one final detection per face. Toward this end it is useful to postprocess the detected sub-windows in order to combine overlapping detections into a single detection.

In these experiments detections are combined in a very simple fashion. The set of detections are first partitioned into disjoint subsets. Two detections are in the same subset if their bounding regions overlap. Each partition yields a single final detection. The corners of the final bounding region are the average of the corners of all detections in the set.
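A sketch of this partition-and-average step, with detections represented as (x1, y1, x2, y2) corner boxes; the representation and names are ours.

```python
def overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes intersect."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def merge_detections(boxes):
    """Partition boxes into overlap-connected subsets, then average each
    subset's corners to produce one final detection per subset."""
    groups = []
    for b in boxes:
        touching = [g for g in groups if any(overlap(b, m) for m in g)]
        merged = [b] + [m for g in touching for m in g]
        groups = [g for g in groups if g not in touching] + [merged]
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```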
Experiments on a Real-World Test Set

We tested our system on the MIT+CMU frontal face test set [12]. This set consists of 130 images with 507 labeled frontal faces. A ROC curve showing the performance of our detector on this test set is shown in Figure 6. To create the ROC curve the threshold of the final layer classifier is adjusted from −∞ to +∞. Adjusting the threshold to +∞ will yield a detection rate of 0.0 and a false positive rate of 0.0. Adjusting the threshold to −∞, however, increases both the detection rate and false positive rate, but only to a certain point. Neither rate can be higher than the rate of the detection cascade minus the final layer. In effect, a threshold of −∞ is equivalent to removing that layer. Further increasing the detection and false positive rates requires decreasing the threshold of the next classifier in the cascade. Thus, in order to construct a complete ROC curve, classifier layers are removed. We use the number of false positives as opposed to the rate of false positives for the x-axis of the ROC curve to facilitate comparison with other systems. To compute the false positive rate, simply divide by the total number of sub-windows scanned. In our experiments, the number of sub-windows scanned is 75,081,800.

Unfortunately, most previous published results on face detection have only included a single operating regime (i.e. a single point on the ROC curve). To make comparison with our detector easier we have listed our detection rate for the false positive rates reported by the other systems. Table 2 lists the detection rate for various numbers of false detections for our system as well as other published systems. For the Rowley-Baluja-Kanade results [12], a number of different versions of their detector were tested, yielding a number of different results; they are all listed under the same heading. For the Roth-Yang-Ahuja detector [11], they reported their result on the MIT+CMU test set minus 5 images containing line drawn faces.

Figure 7 shows the output of our face detector on some test images from the MIT+CMU test set.
A simple voting scheme to further improve results

In Table 2 we also show results from running three detectors (the 38 layer one described above plus two similarly trained detectors) and outputting the majority vote of the three detectors. This improves the detection rate as well as eliminating more false positives. The improvement would be greater if the detectors were more independent. The correlation of their errors results in a modest improvement over the best single detector.
6. Conclusions

We have presented an approach for object detection which minimizes computation time while achieving high detection accuracy. The approach was used to construct a face detection system which is approximately 15 times faster than any previous approach.

This paper brings together new algorithms, representations, and insights which are quite generic and may well have broader application in computer vision and image processing.

Finally this paper presents a set of detailed experiments on a difficult face detection dataset which has been widely studied. This dataset includes faces under a very wide range of conditions including: illumination, scale, pose, and camera variation. Experiments on such a large and complex dataset are difficult and time consuming. Nevertheless systems which work under these conditions are unlikely to be brittle or limited to a single set of conditions. More importantly, conclusions drawn from this dataset are unlikely to be experimental artifacts.

Figure 6: ROC curve for our face detector on the MIT+CMU test set. The detector was run using a step size of 1.0 and starting scale of 1.0 (75,081,800 sub-windows scanned).

Figure 7: Output of our face detector on a number of test images from the MIT+CMU test set.
References

[1] Y. Amit, D. Geman, and K. Wilder. Joint induction of shape features and tree classifiers, 1997.
[2] Anonymous. Anonymous. In Anonymous, 2000.
[3] F. Crow. Summed-area tables for texture mapping. In Proceedings of SIGGRAPH, volume 18(3), pages 207-212, 1984.
[4] F. Fleuret and D. Geman. Coarse-to-fine face detection. Int. J. Computer Vision, 2001.
[5] William T. Freeman and Edward H. Adelson. The design and use of steerable filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(9):891-906, 1991.
[6] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory: Eurocolt '95, pages 23-37. Springer-Verlag, 1995.
[7] H. Greenspan, S. Belongie, R. Goodman, P. Perona, S. Rakshit, and C. Anderson. Overcomplete steerable pyramid filters and rotation invariance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1994.
[8] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Patt. Anal. Mach. Intell., 20(11):1254-1259, November 1998.
[9] Edgar Osuna, Robert Freund, and Federico Girosi. Training support vector machines: an application to face detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1997.
[10] C. Papageorgiou, M. Oren, and T. Poggio. A general framework for object detection. In International Conference on Computer Vision, 1998.
[11] D. Roth, M. Yang, and N. Ahuja. A SNoW based face detector. In Neural Information Processing 12, 2000.
[12] H. Rowley, S. Baluja, and T. Kanade. Neural network-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 22-38, 1998.
[13] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat., 26(5):1651-1686, 1998.
[14] Robert E. Schapire, Yoav Freund, Peter Bartlett, and Wee Sun Lee. Boosting the margin: A new explanation for the effectiveness of voting methods. In Proceedings of the Fourteenth International Conference on Machine Learning, 1997.
[15] H. Schneiderman and T. Kanade. A statistical method for 3D object detection applied to faces and cars. In International Conference on Computer Vision, 2000.
[16] K. Sung and T. Poggio. Example-based learning for view-based face detection. In IEEE Patt. Anal. Mach. Intell., volume 20, pages 39-51, 1998.
[17] J. K. Tsotsos, S. M. Culhane, W. Y. K. Wai, Y. H. Lai, N. Davis, and F. Nuflo. Modeling visual attention via selective tuning. Artificial Intelligence Journal, 78(1-2):507-545, October 1995.
[18] Andrew Webb. Statistical Pattern Recognition. Oxford University Press, New York, 1999.
