Modification of Stemming Algorithm Using A Non Deterministic Approach To Indonesian Text

Bibliographic Details
Main Authors: Rifai, Wafda (Author), Winarko, Edi (Author)
Format: EJournal Article
Published: IndoCEISS in collaboration with Universitas Gadjah Mada, Indonesia, 2019-10-31.
Description
Summary: Natural Language Processing is a branch of Artificial Intelligence that focuses on language processing. One stage of Natural Language Processing is preprocessing, which prepares data before it is processed. Preprocessing involves many kinds of processes, one of which is stemming. Stemming is the process of finding the root word of a given word. Errors in determining root words can lead to misinformation. In addition, stemming does not always produce a single root word, because several Indonesian words can be read either as a root word or as an affixed word, e.g. the word "beruang" (the root word meaning "bear", or the prefix "ber-" attached to "uang", "money"). To handle these problems, this study proposes a stemmer that produces more accurate results by employing a non-deterministic algorithm that can return more than one candidate word. All rules are checked and the resulting words are kept in a candidate list. If several candidate words are found, one result is chosen. The stemmer was tested on 15,934 words and achieved an accuracy of 93%. The stemmer can therefore be used to detect words with more than one root word.
Item Description: https://jurnal.ugm.ac.id/ijccs/article/view/49072
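
The candidate-list idea described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tiny lexicon `ROOT_WORDS`, the affix lists, and the function names `candidate_roots` and `stem` are all hypothetical, and the final selection step (taking the first candidate) is an assumed placeholder, since the summary does not say how the result is chosen.

```python
# Hypothetical toy lexicon of root words; a real stemmer would load a full dictionary.
ROOT_WORDS = {"beruang", "uang", "makan", "main", "ajar"}

# A small, illustrative subset of Indonesian affixes (not the paper's rule set).
PREFIXES = ["ber", "me", "di", "ter"]
SUFFIXES = ["kan", "an", "i"]


def candidate_roots(word):
    """Try every prefix/suffix rule and collect all results found in the lexicon."""
    candidates = []

    def keep(form):
        # Record a form only if it is a known root and not already listed.
        if form in ROOT_WORDS and form not in candidates:
            candidates.append(form)

    keep(word)  # the word may already be a root word
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        without_prefix = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if suffix and not without_prefix.endswith(suffix):
                continue
            keep(without_prefix[:len(without_prefix) - len(suffix)])
    return candidates


def stem(word):
    """Return one candidate; taking the first is an assumed tie-break rule."""
    candidates = candidate_roots(word)
    return candidates[0] if candidates else word


print(candidate_roots("beruang"))  # ['beruang', 'uang']
print(stem("makanan"))             # makan
```

Unlike a deterministic stemmer that stops at the first matching rule, this sketch checks every rule, so an ambiguous word such as "beruang" yields both readings before a single result is selected.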