PuPoCl: Development of Punjabi Poetry Classifier Using Linguistic Features and Weighting

Jasleen Kaur, Jatinderkumar R. Saini

Abstract

Analysis of poetic text is very challenging from a computational linguistics perspective. For a library recommendation framework, poems can be classified along different dimensions, such as author, time period, sentiment, emotion and topic. In this paper, a subject-based Punjabi poetry classifier was developed using the Weka toolset. Four categories were manually populated with 2034 poems: the NAFE, LIPA, RORE and PHSP categories consist of 505, 399, 529 and 601 poems, respectively. After tokenization of the 2034 poems, 45667 features were extracted and passed to the noise removal sub-phase. A total of 31938 features remained after noise removal and were weighted using term frequency (tf); the entire process was repeated for the tf-idf weighting scheme. Two types of linguistic features, lexical and syntactic, were explored to develop the classifier using machine learning algorithms. Naive Bayes, Support Vector Machine (SVM), Hyper Pipes and K-nearest neighbour (KNN) algorithms were evaluated with 31938 lexical features and 30396 syntactic features. Results show that SVM outperformed all other classifiers under both the tf and tf-idf weighting schemes, whereas KNN was the worst performer. Adding POS tags to words increased the accuracy of SVM by 1%. Results also revealed that, with a testing time of 0.19 s, SVM is the most efficient machine learning algorithm for Punjabi poetry classification using the tf-idf scheme.
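As an illustration of the two weighting schemes named in the abstract, the following is a minimal sketch in plain Python. It assumes the textbook definitions (raw term counts for tf, and idf(t) = log(N / df(t)) for tf-idf); the abstract does not state which exact variant the Weka pipeline applied, so the formula, the function names and the toy English tokens are assumptions for illustration only, not the authors' implementation or data.

```python
import math
from collections import Counter

# Toy corpus standing in for tokenized poems.
# Hypothetical placeholder tokens, not data from the paper.
poems = [
    ["river", "moon", "river"],
    ["moon", "love"],
    ["nation", "love", "nation", "nation"],
]


def term_frequency(tokens):
    """Return the raw count of each term in one poem (tf weighting)."""
    return Counter(tokens)


def inverse_document_frequency(corpus):
    """Return idf(t) = log(N / df(t)) for every term in the corpus.

    N is the number of poems and df(t) is the number of poems containing t.
    This is the textbook variant, assumed here for illustration.
    """
    n_docs = len(corpus)
    doc_freq = Counter(term for tokens in corpus for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tfidf_vectors(corpus):
    """Return one sparse {term: tf * idf} mapping per poem."""
    idf = inverse_document_frequency(corpus)
    return [
        {term: count * idf[term] for term, count in term_frequency(tokens).items()}
        for tokens in corpus
    ]


tf_vectors = [term_frequency(tokens) for tokens in poems]
weighted = tfidf_vectors(poems)
```

Under tf alone, a term is weighted only by how often it occurs in a poem. Under tf-idf, a term that appears in many poems (here "moon" and "love", each in two of three poems) receives a lower weight than an equally frequent term confined to a single poem, which is the usual motivation for comparing both schemes.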

Article Details

How to Cite
KAUR, Jasleen; SAINI, Jatinderkumar R. PuPoCl: Development of Punjabi Poetry Classifier Using Linguistic Features and Weighting. INFOCOMP, [S.l.], v. 16, n. 1-2, p. 1-7, dec. 2017. ISSN 1982-3363. Available at: <http://www.dcc.ufla.br/infocomp/index.php/INFOCOMP/article/view/546>. Date accessed: 19 oct. 2018.
Section
Machine Learning and Computational Intelligence