Journal of Applied Intelligent Systems and Information Sciences

Authorship Clustering using Homogeneous Feature Space and Two-stepped Automatic Fuzzy Cmeans Clustering

Document Type : Original Article

Authors
Computer Engineering Department, Bu Ali Sina University, Hamedan, Iran
Abstract
Identifying the authorship of an anonymous or disputed document is a cornerstone of automatic forensic applications. It is also a challenging task for both humans and computers, given the complex content of documents and the variety of their backgrounds. Owing to the nature of the task, it is always treated as an unsupervised problem. Clustering documents according to the linguistic style of their authors has been little studied by the research community. To address this gap, the PAN Evaluation Framework became the first effort to promote the development of author clustering. Among the different approaches to the task, this article proposes a method based on a set of homogeneous features and two-stepped automatic fuzzy c-means (FCM) clustering. We use word n-grams, part-of-speech tags, and some other context-free features; we then estimate the number of clusters from a document similarity graph (DSG); finally, we cluster the corpus with FCM. The method completes the task in a very short amount of time, and its performance is comparable with that of the leaderboard competitors in the PAN CLEF 2017 challenge.
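The pipeline outlined in the abstract (features, then a cluster-count estimate from a similarity graph, then FCM) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not say how the DSG is built, so the sketch assumes a thresholded cosine-similarity graph whose connected components give the number of clusters. It uses word bigrams only and omits the part-of-speech and context-free features. The function names, the threshold of 0.3, and the fuzzifier m = 2 are all hypothetical choices.

```python
import math
import random
from collections import Counter

def word_ngrams(text, n=2):
    """Word n-gram counts for one document (bigrams by default)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count profiles."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def estimate_k(profiles, threshold=0.3):
    """Assumed DSG step: link documents whose similarity reaches the
    threshold and count connected components (union-find)."""
    parent = list(range(len(profiles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(profiles)):
        for j in range(i + 1, len(profiles)):
            if cosine(profiles[i], profiles[j]) >= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(profiles))})

def fuzzy_c_means(points, k, m=2.0, iters=50, seed=0):
    """Standard FCM: alternate centre and membership updates."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(k)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centres are means of the points weighted by membership ** m.
        centers = []
        for c in range(k):
            w = [u[i][c] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][j] for i in range(n)) / total
                            for j in range(d)])
        # Memberships are proportional to distance ** (-2 / (m - 1)).
        for i in range(n):
            inv = [max(math.dist(points[i], centers[c]), 1e-12) ** (-2.0 / (m - 1.0))
                   for c in range(k)]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return u, centers

# Toy corpus, purely illustrative.
docs = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "stock prices fell sharply today",
    "stock prices rose sharply today",
]
profiles = [word_ngrams(doc) for doc in docs]
k = estimate_k(profiles)

# Dense vectors over the shared bigram vocabulary, as input to FCM.
vocab = sorted(set().union(*profiles))
vectors = [[p[g] for g in vocab] for p in profiles]
memberships, centers = fuzzy_c_means(vectors, k)
```

Each row of `memberships` gives one document's degree of membership in every cluster. Taking the largest entry per row yields a hard assignment when one is needed.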

Volume 1, Issue 1
March 2020
Pages 54-63

  • Received Date 09 February 2020
  • Revised Date 29 February 2020
  • Accepted Date 01 March 2020