Web Spambot Detection Based on Web Navigation Behaviour

Cited by: 24
Authors
Hayati, Pedram [1]
Potdar, Vidyasagar [1]
Chai, Kevin [1]
Talevski, Alex [1]
Affiliation
[1] Curtin Univ Technol, Digital Ecosyst & Business Intelligence Inst, Antispam Res Lab ASRL, Perth, WA, Australia
Source
2010 24TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS (AINA) | 2010
Keywords
Web spambot detection; Web 2.0; spam; user behaviour
DOI
10.1109/AINA.2010.92
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Web robots have been widely used for various beneficial and malicious activities. Web spambots are a type of web robot that spreads spam content throughout the web, typically by targeting Web 2.0 applications. They are intelligently designed to replicate human behaviour in order to bypass system checks. Spam content not only wastes valuable resources but can also mislead users to unsolicited websites and award undeserved search engine rankings to spammers' campaign websites. While most research in anti-spam filtering focuses on identifying spam content on the web, only a few studies have investigated the origin of spam content; hence, the identification and detection of web spambots remains an open area of research. In this paper, we describe an automated supervised machine learning solution which utilises web navigation behaviour to detect web spambots. We propose a new feature set (referred to as an action set) as a representation of user behaviour to differentiate web spambots from human users. Our experimental results show that our solution achieves 96.24% accuracy in classifying web spambots.
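The general idea in the abstract, classifying sessions from the actions a visitor performs, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the action names, the toy sessions, the frequency encoding, and the use of a simple perceptron as a stand-in supervised learner. The abstract does not specify the paper's actual action set or classifier, and this sketch does not reproduce its results.

```python
# Illustrative sketch only: the action names, sessions, and the perceptron
# are assumptions, not the feature set, data, or classifier from the paper.

# Hypothetical vocabulary of navigation actions (one feature per action).
ACTIONS = ["view_topic", "register", "login", "post_reply", "edit_profile", "search"]


def to_vector(session):
    """Encode a session (list of action names) as per-action frequencies."""
    total = len(session) or 1
    return [session.count(a) / total for a in ACTIONS]


def train_perceptron(samples, labels, epochs=50, lr=1.0):
    """Train a linear perceptron; labels are +1 (spambot) or -1 (human)."""
    weights = [0.0] * len(ACTIONS)
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified: nudge the decision boundary
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
    return weights, bias


def predict(model, session):
    """Label a session as 'spambot' or 'human' using the trained model."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, to_vector(session))) + bias
    return "spambot" if score > 0 else "human"


# Made-up labelled sessions, for illustration only.
SPAMBOT_SESSIONS = [
    ["register", "login", "post_reply", "post_reply", "post_reply"],
    ["login", "post_reply", "post_reply", "edit_profile"],
    ["register", "post_reply", "post_reply"],
]
HUMAN_SESSIONS = [
    ["view_topic", "view_topic", "search", "view_topic", "login", "post_reply"],
    ["search", "view_topic", "view_topic"],
    ["view_topic", "login", "view_topic", "view_topic", "edit_profile"],
]

sessions = SPAMBOT_SESSIONS + HUMAN_SESSIONS
labels = [1] * len(SPAMBOT_SESSIONS) + [-1] * len(HUMAN_SESSIONS)
model = train_perceptron([to_vector(s) for s in sessions], labels)
```

Calling `predict(model, session)` on a new list of action names returns a label. In practice the encoding, the learner, and the evaluation protocol would all need to be chosen and validated on real, labelled traffic.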
Pages: 797-803
Page count: 7