Robust object recognition with cortex-like mechanisms

Cited by: 1057
Authors
Serre, Thomas
Wolf, Lior
Bileschi, Stanley
Riesenhuber, Maximilian
Poggio, Tomaso
Affiliations
[1] MIT, Ctr Biol & Computat Learning, McGovern Inst Brain Res, Cambridge, MA 02139 USA
[2] MIT, Brain & Cognit Sci Dept, Cambridge, MA 02139 USA
[3] Georgetown Univ, Med Ctr, Washington, DC 20007 USA
Keywords
object recognition; model; visual cortex; scene understanding; neural network
DOI
10.1109/TPAMI.2007.56
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
We introduce a new general framework for the recognition of complex visual scenes, which is motivated by biology: We describe a hierarchical system that closely follows the organization of visual cortex and builds an increasingly complex and invariant feature representation by alternating between a template matching and a maximum pooling operation. We demonstrate the strength of the approach on a range of recognition tasks, from invariant single object recognition in clutter to multiclass categorization problems and complex scene understanding tasks that rely on the recognition of both shape-based and texture-based objects. Given the biological constraints that the system had to satisfy, the approach performs surprisingly well: It has the capability of learning from only a few training examples and competes with state-of-the-art systems. We also discuss the existence of a universal, redundant dictionary of features that could handle the recognition of most object categories. In addition to its relevance for computer vision, the success of this approach suggests a plausibility proof for a class of feedforward models of object recognition in cortex.
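The alternation between template matching and maximum pooling described in the abstract can be illustrated with a minimal NumPy sketch of a single pair of stages. This is not the authors' implementation: the Gaussian tuning function, the non-overlapping pooling grid, and all names (`template_match`, `max_pool`, `sigma`, `size`) are assumptions chosen for illustration only.

```python
import numpy as np

def template_match(image, templates, sigma=1.0):
    """Template-matching stage: Gaussian-tuned response of every image
    patch to each stored template (the tuning function is an assumption)."""
    k = templates.shape[1]                      # templates has shape (n, k, k)
    h, w = image.shape
    out = np.empty((len(templates), h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            patch = image[i:i + k, j:j + k]
            # Squared Euclidean distance from this patch to each template.
            d2 = ((templates - patch) ** 2).sum(axis=(1, 2))
            out[:, i, j] = np.exp(-d2 / (2.0 * sigma ** 2))
    return out

def max_pool(maps, size=2):
    """Pooling stage: maximum over non-overlapping size x size neighbourhoods,
    which gives tolerance to small shifts of the matched pattern."""
    n, h, w = maps.shape
    h2, w2 = h // size, w // size
    trimmed = maps[:, :h2 * size, :w2 * size]   # drop any ragged border
    return trimmed.reshape(n, h2, size, w2, size).max(axis=(2, 4))

rng = np.random.default_rng(0)
image = rng.random((9, 9))
templates = rng.random((3, 2, 2))
templates[0] = image[4:6, 4:6]                  # plant one exact match

s = template_match(image, templates)            # shape (3, 8, 8)
c = max_pool(s, size=2)                         # shape (3, 4, 4)
```

Stacking several such pairs, with the pooled maps of one pair serving as input to the next matching stage, would give the kind of increasingly complex and invariant representation the abstract refers to; how the templates are learned is outside the scope of this sketch.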
Pages: 411-426
Number of pages: 16