Time complexity in Rejang language stemming

The Rejang language belongs to the Malay language family, which is spread throughout almost all of Indonesia, and it has unique features that distinguish it from Malay in general. Its speakers are spread across six districts, namely Rejang Lebong Regency, Lebong Regency, Kepahiang Regency, North Bengkulu Regency, Central Bengkulu, and Bengkulu City, and the language is used mainly in Bengkulu Province. The Rejang language has a complex word structure that is not easily understood, so a dictionary in the form of a printed book or a digital dictionary is needed that allows everyone to access it anywhere and at any time. In Bengkulu there is currently no digital dictionary, while building a language translator application requires a stemming algorithm. Stemming is the process of separating basic words from affixed words in sentences, where the affixes can consist of prefixes, infixes, and suffixes. The Stemming Algorithm is applied to the Rejang language to help preserve local culture so that it does not lose its cultural roots, and the specific purpose is to design a model for implementing the Rejang stemming algorithm in the form of an AI-integrated database system. The algorithms used are Enhanced Confix Stripping (ECS), New Enhanced Confix Stripping (NECS), and the Rejang Algorithm. In ECS and NECS the affix groups are modified: prefixes are grouped into several prefix groups and suffixes into several suffix groups.
The study succeeded in building an efficient and effective Rejang language stemming algorithm. Efficiency is indicated by the algorithm's time complexity of O(log n), and effectiveness by an accuracy of 99% on a test of 9000 affixed words. This accuracy value indicates that overstemming and understemming occurred in 1% of the words. A failure rate of 1% was also observed across the 15 documents tested, with processing times ranging from 4.27 seconds to 111.754 seconds.


I. INTRODUCTION
Each province in Indonesia has its own regional language, such as Sundanese from West Java Province, Minang from West Sumatra Province, Batak from North Sumatra Province, and Rejang from Bengkulu Province. Although most provinces have local language libraries in the form of books, they do not have digital libraries. Building a digital dictionary is the main requirement, because it is the source of supporting data used by language translator applications.
In Bengkulu, there is currently no Rejang digital dictionary for a Rejang-to-Indonesian translator application, while building a language translator application requires a stemming algorithm. Stemming is the process of separating the basic word from an affixed word in a sentence, where the affixes can consist of prefixes, insertions (infixes), and suffixes [1]. Languages generally differ from one another in morphology; for example, the Javanese language has a different morphology from the Rejang language. A stemming algorithm is influenced by several factors, including overstemming, understemming, unchanged words, and spelling exceptions [2]. The fastest stemming process is found in the Vega algorithm, and the highest accuracy is found in the Nazief and Adriani algorithm [3], [4].
Rejang is a local language of the Rejang tribal community in Bengkulu Province, with speakers spread over four districts, namely Rejang Lebong Regency, Lebong Regency, Kepahiang Regency, and North Bengkulu Regency [4]. At least five languages are recorded in Bengkulu Province: the Bengkulu language, Enggano language, Rejang language, Javanese language, and Sundanese language [5]. Stemming algorithms are used to improve the performance of Information Retrieval (IR), which is the process of separating documents that are considered relevant from a set of available documents [6]-[8].
Jurnal Infotel, Vol. 14, No. 3, August 2022, https://doi.org/10.20895/infotel.v14i3.764
The stemming process ultimately increases the number of documents retrieved in the IR system. Text grouping, categorization, and summarization also require these conversions as part of the preprocessing before actually implementing the related algorithms [9], [10].
The proposed algorithm is applied to 41 different sets of QAPLIB benchmarks, and the results obtained show that it is more efficient and accurate than the Taboo Search, Evolutionary Differential, and Genetics algorithms. Both and convex relaxation achieve good sparse coefficients compared to other optimization algorithms. The resulting BP is more accurate than the greedy technique, with a slight increase in processing time [11], [12].
Stemming methods that rely on the morphology of a word have several drawbacks, such as incorrectly removing prefixes from basic words that start with the letters "k", "t", "s", and "p", and incorrectly removing suffixes, especially the suffixes "-kan" and "-an" [13]-[18].
The study of Indonesian morphology is based not only on the teaching materials used so far but also on new teaching materials sought according to the needs of students, including those related to the Jambi Malay morphophonemic process as part of local wisdom, as demanded by the vision and mission of the Study Program. First, we have to understand the process of forming words and morphemes, the smallest parts of the language itself. Recurring errors in the use of Indonesian are not good for the development of the language, and they can be avoided if language users are willing to learn its rules [19]-[24].

II. RESEARCH METHOD
The Rejang stemming algorithm was developed by first studying how the Enhanced Confix Stripping (ECS), New Enhanced Confix Stripping (NECS), and stemming algorithms work. In this study, the researchers used the stemming algorithm as the main reference in making the Rejang algorithm. Next, the strengths and weaknesses of the three algorithms were analyzed. After studying how the Indonesian stemming algorithms work and analyzing their strengths and weaknesses as a reference, a stemming algorithm for the Rejang language was designed based on the morphology of the Rejang language.

A. Building a Rejang Language Digital Dictionary
The steps for making a digital dictionary are as follows.

1) Inventory or Data Collection of Rejang Language Basic Words
The basic words used in this study were taken from the Rejang Language Dictionary published in 2013 by the Bengkulu Provincial Language Office of the Ministry of Education and Culture, which consists of 6983 Rejang language basic words.
2) Database Creation of Rejang Language Basic Word Dictionary
The database structure of the Rejang language base words consists of three fields: Id, which is the primary key; Base-Word, which contains the Rejang language basic word; and Meaning, which contains the meaning of the Rejang language word in Indonesian. The database structure of the Rejang language dictionary is shown in Table 1.
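The three-field structure described above can be sketched with an in-memory SQLite database. This is only an illustration under stated assumptions: the table name base_words, the column names, the helper functions, and the Indonesian glosses in the sample rows are hypothetical and are not taken from Table 1 or from the authors' system.

```python
import sqlite3

def create_dictionary(conn):
    """Create a three-field table mirroring Id, Base-Word, and Meaning."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS base_words (
            id        INTEGER PRIMARY KEY,  -- Id (primary key)
            base_word TEXT NOT NULL,        -- Rejang basic word
            meaning   TEXT NOT NULL         -- meaning in Indonesian
        )
        """
    )

def add_word(conn, base_word, meaning):
    """Insert one dictionary entry; SQLite assigns the id automatically."""
    conn.execute(
        "INSERT INTO base_words (base_word, meaning) VALUES (?, ?)",
        (base_word, meaning),
    )

def lookup(conn, base_word):
    """Return the stored meaning, or None if the word is not a basic word."""
    row = conn.execute(
        "SELECT meaning FROM base_words WHERE base_word = ?", (base_word,)
    ).fetchone()
    return row[0] if row else None

# Illustrative rows only; the glosses are assumptions, not dictionary data.
conn = sqlite3.connect(":memory:")
create_dictionary(conn)
add_word(conn, "temot", "duduk")
add_word(conn, "umeak", "rumah")
print(lookup(conn, "umeak"))    # rumah
print(lookup(conn, "bekerjo"))  # None (an affixed word, not stored)
```

A stemmer would call a lookup of this kind each time it removes an affix, to check whether the remaining string is a basic word.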

B. Study and Analysis of Rejang Language Morphology
To create a new stemming algorithm, the Rejang Algorithm, we first begin by analyzing the groups, sequences, and methods of deleting the morphological affixes of the Rejang language.

1) Analysis of Rejang Language Structure
The Rejang language belongs to the Malay or Indonesian language family. Its syntactic structure follows the DM law: the element being explained (D) is followed by the explanatory element (M). For example, in umeak lei "big house", umeak is the element being explained (D) and lei is the element that explains it (M). Table 2 is an example of the application of the DM law in a verb phrase.

Table 2. Example of the Application of the DM Law in Verb Phrases

The example above also shows that Rejang verbs have their own system. In this system some verbs consist of one morpheme, for example temot 'sit' and bedan 'stop', and some verbs consist of two morphemes, for example bekerjo 'work', majak 'to invite', melilei 'to run', and besaweak 'to work a rice field'. Verbs of two morphemes consist of a bound morpheme and a free morpheme. The bound morphemes in the examples above are prefixes, namely be- in bekerjo and besaweak, and men- in majak and melilei. In addition, the examples above contain an original form, namely the form meaning 'invite' in majak. Even though it has undergone the morphological process, the original form cannot be used as a free morpheme in a syntactic construction. The free morphemes in the examples above are temot and bedan.

3) Morphological Process Analysis
The morphological process of the Rejang language is the process of adding affixes to the basic words of the Rejang language. Affixes in the Rejang language consist of a prefix group (men-, ke-, be-, ne-, te-, se-), an insertion group (-em-, -en-), a suffix group (-ke), a prefix-prefix compound group (be- + ke-, te- + ke-, se- + ke-), and a prefix-suffix compound group (ke- + -ke). The addition of an affix can lead to one of three conditions: 1) a change in the phonemes (vowels) of the basic word, 2) a change in the form of the basic word, or 3) no change in the form of the basic word.
a) Analysis of the addition of affixes that cause changes in the form of the basic word, consisting of:

1) Deletion Sequence Creation
In the development of the Rejang language stemming algorithm, affixes are deleted in the following order: prefixes first, then insertions, then suffixes, and finally combinations.

2) Sequencing the Deletion Process in Detail
The Rejang algorithm is based on the grouping of the morphological affixes of the Rejang language. The order and method of deleting affixes in the Rejang language stemming algorithm are expressed as a sequence of affix deletion rules covering prefixes, insertions, and suffixes. The affixes removed are the prefixes "men", "ke", "be", "ne", "te", and "se", the insertions "em" and "en", and the suffix "ke".
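The deletion order described above (prefix, insertion, suffix, then combinations) can be sketched as follows. This is a simplified illustration, not the authors' implementation: the toy word set BASE_WORDS, the function names, and the assumption that an insertion sits directly after the first letter are hypothetical, and the phoneme changes that some affixes cause (for example men- in majak) are deliberately ignored.

```python
PREFIXES = ("men", "ke", "be", "ne", "te", "se")
INFIXES = ("em", "en")
SUFFIXES = ("ke",)

# Hypothetical toy dictionary; the real one holds 6983 basic words.
BASE_WORDS = {"kerjo", "saweak", "temot", "bedan"}

def strip_prefix(word):
    """Yield each candidate obtained by removing one listed prefix."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            yield word[len(p):]

def strip_infix(word):
    """Yield candidates with an insertion removed after the first letter
    (the position is an assumption made for this sketch)."""
    for i in INFIXES:
        if word[1:1 + len(i)] == i and len(word) > len(i) + 1:
            yield word[0] + word[1 + len(i):]

def strip_suffix(word):
    """Yield each candidate obtained by removing one listed suffix."""
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            yield word[:-len(s)]

def stem(word, dictionary=BASE_WORDS):
    """Return the basic word if one is found, else the word unchanged."""
    if word in dictionary:
        return word
    # Steps 1-3: single prefix, then insertion, then suffix.
    for step in (strip_prefix, strip_infix, strip_suffix):
        for candidate in step(word):
            if candidate in dictionary:
                return candidate
    # Step 4: combinations (prefix + prefix, or prefix + suffix).
    for first in strip_prefix(word):
        for step in (strip_prefix, strip_suffix):
            for candidate in step(first):
                if candidate in dictionary:
                    return candidate
    return word

print(stem("bekerjo"))   # kerjo
print(stem("besaweak"))  # saweak
print(stem("temot"))     # temot (already a basic word)
```

Checking the dictionary before any deletion keeps one-morpheme words such as temot from losing their first syllable merely because it looks like the prefix te-.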

D. Analyzing Algorithm Performance
The Rejang language stemming algorithm is built on a searching procedure that looks up basic words in the Rejang language digital dictionary. The searching procedure used in this research is binary search, which has a time complexity of O(log n), where n is the number of basic words in the Rejang language digital dictionary. The main procedure consists of procedure 1 (check prefix), procedure 2 (check insertion), procedure 3 (check suffix), and procedure 4 (check merge). Every path in each sub-procedure uses the searching procedure. The time complexity of procedure 3 therefore comes only from the time complexity of the procedure for the Removal of Vowel Prefixes or Consonant Prefixes. Thus the time complexity of procedure 3 is O(log n).
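The O(log n) bound follows from the dictionary lookup: binary search over n sorted words inspects at most floor(log2 n) + 1 entries, and if each procedure performs only a bounded number of such lookups, the total stays O(log n). A minimal sketch of the lookup is given below, assuming the basic words are held in a sorted Python list; the function names and the sample word list are illustrative and do not come from the paper.

```python
from bisect import bisect_left

def in_dictionary(sorted_words, word):
    """Binary search: True if word occurs in the sorted list."""
    i = bisect_left(sorted_words, word)
    return i < len(sorted_words) and sorted_words[i] == word

def max_probes(n):
    """Worst-case number of entries inspected among n sorted items,
    i.e. floor(log2(n)) + 1 for n >= 1."""
    return n.bit_length()

# Illustrative word list only.
words = sorted(["bedan", "kerjo", "lei", "saweak", "temot", "umeak"])
print(in_dictionary(words, "kerjo"))    # True
print(in_dictionary(words, "bekerjo"))  # False
print(max_probes(6983))                 # 13
```

For the 6983 basic words in the dictionary, a single lookup therefore needs about 13 comparisons in the worst case, compared with up to 6983 for a linear scan.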

III. RESULT
A. Testing the Rejang Language Stemming Algorithm

1) Testing Affixes
The Rejang digital dictionary contains 6983 basic words. The algorithm was tested on 9000 affixed words built from 1500 basic words of the Rejang language. The 1500 basic words were selected in three groups (basic words starting with a vowel, starting with a consonant, and starting with the letter 'k'), with 500 words per group, all taken from the digital dictionary of the Rejang language.
a) List of Affixed Words Tested: The number of affixed words tested in the Rejang algorithm application was 9000, consisting of 18 affix groups of 500 words each. Each group can contain affixed words whose basic words start with a vowel, a consonant, or the letter "k".
b) Results of Testing Affixed Words: The stemming test of the Rejang algorithm application on the 9000 affixed words, run per affix group, gave Success = 8973 and Fail = 27.
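The counts above can be checked with a few lines of arithmetic. The variable names are illustrative; the figures are the ones given in the text. The raw ratios come out at about 99.7% success and 0.3% failure, which the paper reports as 99% and 1%; how the reported figures were rounded is not stated.

```python
groups, words_per_group = 18, 500
success, fail = 8973, 27

total = groups * words_per_group
accuracy = success / total * 100      # percentage stemmed correctly
failure_rate = fail / total * 100     # percentage stemmed incorrectly

print(total)                   # 9000
print(f"{accuracy:.1f}")       # 99.7
print(f"{failure_rate:.1f}")   # 0.3
```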
2) Testing a Document Containing a Line of Words
The word line test with the REJANG Stemmer application was carried out on 15 documents taken from the following Rejang language books:
1. "Kelpiak Ukum Adat Ngen Riyan Ca'o Kutai Jang, Rejang Lebong Regency"
2. "Ireak Ca'o Kutei Jang"
3. "Kejai Ca'o Malim"
The books used for testing were initially available only in printed form. To serve as test data for the Rejang Algorithm, they were converted into digital books in .txt format. The output of testing a document consists of 1) the total number of words in the document, 2) the number of affixed words that were stemmed, and 3) the stemming processing time. The output of the test results for the 15 documents is shown in Table 4.
The affixed words that do not change after stemming are shown in Table 5.
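The three per-document outputs (total words, stemmed words, and processing time) could be collected along the following lines. This is a sketch under assumptions: stem_document and toy_stem are hypothetical names, words are split on whitespace without handling punctuation, and a word is counted as stemmed when the stemmer returns something different from its input, which may not match how the REJANG Stemmer application counts.

```python
import time

def stem_document(text, stem):
    """Return (total words, words changed by stemming, elapsed seconds)."""
    start = time.perf_counter()
    words = text.lower().split()
    changed = sum(1 for w in words if stem(w) != w)
    elapsed = time.perf_counter() - start
    return len(words), changed, elapsed

def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a leading 'be'."""
    return word[2:] if word.startswith("be") and len(word) > 2 else word

total, changed, seconds = stem_document("bekerjo temot besaweak", toy_stem)
print(total, changed)  # 3 2
```

Passing the stemmer in as an argument keeps the timing loop independent of any particular stemming algorithm, so the same harness could be reused to compare several stemmers on the same .txt files.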

B. Comparison of Stemming Algorithms
The Rejang algorithm presented in this article is novel in several respects compared to the Enhanced Confix Stripping (ECS) and New Enhanced Confix Stripping (NECS) algorithms.

1) Affix Group
In the Enhanced Confix Stripping (ECS) and New Enhanced Confix Stripping (NECS) algorithms, the affix groups are modified: prefixes are grouped into several groups of prefixes and suffixes into several groups of suffixes, while there is no grouping of insertion affixes. In the Rejang algorithm, the grouping of affixes is based on the morphology of the Rejang language, and the affix groups are not modified. A comparison of the affix groups of each algorithm is shown in Table 6.

2) Removal Order
The Enhanced Confix Stripping (ECS) algorithm removes affixes in the following order: particle removal, pronoun removal, suffix removal, standard prefix removal, and complex prefix removal. The New Enhanced Confix Stripping (NECS) algorithm begins with removal of prefix 1, prefix 2, and prefix 3, followed by particle deletion, deletion of the property suffix, deletion of suffix 1, deletion of suffix 2, and deletion of insertions, beginning with particle deletion, pronoun deletion, unprocessed prefix deletion, combined prefix deletion, morphophonemic deletion of prefixes, and suffix deletion. The Rejang algorithm begins with the removal of the prefix, followed by the removal of the insertion, the removal of the suffix, and the removal of the merge. The order of deletion is shown in Table 7.

IV. DISCUSSION
The Rejang language digital dictionary database contains about 7000 base words (6983), of which 1500 were used to construct the 9000 affixed words used in testing the Rejang language stemming algorithm.
From the morphological analysis of the process of adding affixes, 18 groups/types of the smallest affixes in the Rejang language were identified and used as conditional statements in the Rejang language stemming algorithm, with the Enhanced Confix Stripping (ECS) and New Enhanced Confix Stripping (NECS) algorithms as references. The test of 9000 affixed words in the Rejang Algorithm application gave Success = 8973 and Fail = 27. From the process of adding affixes, it was also determined that the Rejang language stemming algorithm deletes affixes in sequence, starting with the prefix, then the insertion, then the suffix, and finally the merge.

V. CONCLUSION
This research succeeded in building a Rejang language stemming algorithm, based on a study of the ECS and NECS algorithms, that is efficient and effective. Efficiency is indicated by the algorithm's time complexity of O(log n), and effectiveness by 99% accuracy on the test of 9000 affixed words. In other words, overstemming and understemming occurred in 1% of the words. The study also tested 15 text documents, with an average stemming failure rate of 1% and processing times ranging from 4.277 seconds to 111.7543 seconds.