Human height and weight classification based on footprint using Gabor wavelet and k-NN methods

In the forensic field, height and weight are among the parameters used to identify a person. They are usually measured manually using scales and height-measuring instruments. However, this becomes difficult when a person's body parts have been separated from one another. One way to estimate weight and height is to measure foot length. The relationship between height and leg length can be expressed by a correlation coefficient (r), and the same holds for weight. Therefore, this study implements an Android system that estimates human height and weight from footprint images. The methods used are the Gabor wavelet and k-Nearest Neighbor (k-NN). In simulation, the best accuracy obtained is 75%, and the system processes an image with an average computation time of 8.92 seconds. The proposed application is expected to assist forensic experts in identifying a person's height and weight when body parts have been separated from one another. The system can also be used to categorize the ideal body level according to the Body Mass Index (BMI).

INTRODUCTION
Human footprints have been studied for a variety of reasons. Scientists have concentrated their efforts on footprint analysis in areas such as morphometric or orthopedic diagnosis [1], sex determination [2]-[4], anatomy and anthropology [5]-[7], Pleistocene studies [8], biometrics [9]-[12], and forensically oriented studies of footprints [13], [14]. Forensic science has several ways to determine a person's height and weight when body parts have been separated, and one of them is to analyze the soles of the feet. In some countries, footprints are commonly found at crime scenes. If a scene is examined carefully, footprints can be found at almost any crime scene [15]. Numerous studies have simulated systems for estimating height from footprint measurements [16]. However, only a few studies have examined the relationship between height, weight, and footprint. An individual's stature is well understood to be related to body parts such as the head, trunk, limbs, feet, and footprints [14], [17]. A link between a person's height and weight and their footprints is therefore expected, and some past forensic studies have confirmed this relationship.
Some researchers have studied the relationship between anthropometric foot dimensions and body weight [18]-[21]. They used linear regression for weight estimation, but their studies did not address height estimation. Therefore, this study proposes a system that classifies human height and weight from footprints using an Android application. The study was carried out in two stages: training and testing. We use 51 footprint images for training and 12 footprint images for testing. The Gabor wavelet is used for feature extraction, and K-Nearest Neighbor (KNN) is used for classification. Gabor wavelets are a widely used mathematical tool for analyzing mathematical functions and determining the best solutions. Their applications include object tracking [22], image compression [23], fabric classification [24], text classification [25], and face classification [26]. Meanwhile, KNN is among the simplest algorithms for predicting the class of an image sample [27]-[29]. The rest of this paper is organized as follows. Section II describes the methods used. Section III explains the results achieved. Finally, Section IV presents the conclusion, limitations, and future work.
Figure 1 shows an overview of the designed system for estimating human height and weight based on the footprint. The system consists of image acquisition, preprocessing, feature extraction, and, as the last stage, height and weight estimation using a classifier. The following subsections detail each stage and the methods used in this study.

A. Data Collection
In image acquisition, the sole of the foot is stamped onto A4 paper using red ink. The footprint is taken from the right foot, which must be normal (not deformed), with the subject in an upright standing position. An example of the resulting footprint image is shown in Figure 2a. Digital images are obtained by scanning the footprint and are stored in JPEG format. Each image then passes through a preprocessing stage that improves and standardizes its quality so that high accuracy can be achieved. Identification is then simulated in two phases, training and testing, as presented in Figure 3. The two phases follow a similar process, but the training phase additionally stores the feature extraction results as numeric values. These results are stored in a database and used as reference data against which test images are compared during classification [30]. In the training phase, the pixel values thus serve as the database reference that is matched during classification. Meanwhile, the test phase is the process of testing an image so that the proposed system can classify it. The training phase used 51 images, and the test phase used 12 images.

B. Image Pre-processing
In this research, preprocessing is used to improve the quality of the image to be processed at the next stage. It consists of grayscale conversion and black-and-white conversion using histograms, as shown in Figure 4. For the black-and-white conversion, Otsu thresholding is used to determine the threshold, so that the value adapts dynamically to each image.
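The preprocessing steps can be sketched as follows. This is a minimal illustration in plain Python, not the Android implementation used in the study: the function names, the BT.601 luma weights for the grayscale conversion, and the sample pixel values are all assumptions made for the example.

```python
def to_grayscale(rgb_pixels):
    """Convert (r, g, b) tuples to 0-255 gray levels (BT.601 weights, an assumption)."""
    return [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for r, g, b in rgb_pixels]


def otsu_threshold(gray_pixels):
    """Return the gray level that maximizes the between-class variance."""
    histogram = [0] * 256
    for value in gray_pixels:
        histogram[value] += 1

    total = len(gray_pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += histogram[level]
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * histogram[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(gray_pixels, threshold):
    """Pixels above the threshold become white (1); the rest become black (0)."""
    return [1 if value > threshold else 0 for value in gray_pixels]


# Hypothetical sample: four red-ink pixels and six white paper pixels.
pixels = [(200, 30, 30)] * 4 + [(255, 255, 255)] * 6
gray = to_grayscale(pixels)
threshold = otsu_threshold(gray)
binary = binarize(gray, threshold)
```

Because the threshold is derived from each image's own histogram, a lighter or darker scan yields a different cut-off without any manual tuning.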

C. Feature Extraction
In this phase, feature extraction is carried out to obtain the characteristics of the processed image. The feature extraction method used in this study is the Gabor wavelet transform. Because the Gabor wavelet is complex-valued, its real and imaginary parts must be handled separately. The steps to generate a feature vector using Gabor wavelets are as follows [31], [32]:

Generating Kernel
The kernel is generated separately for each part (real or imaginary). The feature vector that this process eventually produces is a combination of the orientation and frequency values of the Gabor wavelet.
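As an illustration of kernel generation, the sketch below builds the real and imaginary parts of a Gabor kernel from the commonly used formulation (a Gaussian envelope multiplied by a complex sinusoid). The function name, the parameter names, and the values of size, wavelength, sigma, and the orientation set are assumptions for the example; the study does not state its parameters.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Return (real, imag) Gabor kernels as size x size nested lists.

    wavelength sets the frequency (1 / wavelength) and theta the orientation.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    real, imag = [], []
    for y in range(-half, half + 1):
        real_row, imag_row = [], []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the kernel orientation.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
            phase = 2 * math.pi * x_rot / wavelength + psi
            real_row.append(envelope * math.cos(phase))  # real part
            imag_row.append(envelope * math.sin(phase))  # imaginary part
        real.append(real_row)
        imag.append(imag_row)
    return real, imag


# Hypothetical bank: 4 orientations x 2 wavelengths.
orientations = [i * math.pi / 4 for i in range(4)]
wavelengths = [4.0, 8.0]
bank = [gabor_kernel(9, w, t, sigma=3.0) for t in orientations for w in wavelengths]
```

Each entry in the bank corresponds to one orientation and frequency pair, which is why the final feature vector grows with the number of pairs chosen.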

Kernel Convolution
Convolution is carried out once the Gabor kernel has been formed. The kernel is convolved with the pixel values of each part of the scanned image of the sole, proceeding from the top-left pixel to the bottom-right pixel.
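The convolution step can be sketched in plain Python as below. The zero padding at the borders and the combination of the real and imaginary responses into a magnitude are assumptions for the example; the text does not specify either choice.

```python
import math


def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding (kernel dimensions must be odd)."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    pad_r, pad_c = k_rows // 2, k_cols // 2
    output = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):          # scan from the top-left pixel ...
        for c in range(cols):      # ... to the bottom-right pixel
            acc = 0.0
            for i in range(k_rows):
                for j in range(k_cols):
                    # Convolution flips the kernel relative to the image window.
                    rr = r + pad_r - i
                    cc = c + pad_c - j
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[i][j]
            output[r][c] = acc
    return output


def gabor_magnitude(image, real_kernel, imag_kernel):
    """Convolve with both kernel parts and combine them into a magnitude map."""
    real_resp = convolve2d(image, real_kernel)
    imag_resp = convolve2d(image, imag_kernel)
    return [
        [math.hypot(a, b) for a, b in zip(real_row, imag_row)]
        for real_row, imag_row in zip(real_resp, imag_resp)
    ]
```

A direct nested-loop convolution costs time proportional to the number of image pixels times the number of kernel pixels, which is one reason larger images take noticeably longer to process.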

Feature Vector
After the convolution, each segmented part yields a feature vector consisting of the mean, variance, and deviation.
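A minimal sketch of this step is given below. It assumes that "deviation" means the standard deviation and that population statistics are used; both are assumptions, as are the function names.

```python
import math
import statistics


def response_features(response):
    """Return [mean, variance, standard deviation] of one filtered image part."""
    values = [value for row in response for value in row]
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)  # population variance
    deviation = math.sqrt(variance)
    return [mean, variance, deviation]


def feature_vector(responses):
    """Concatenate the statistics of every filtered part into one flat vector."""
    features = []
    for response in responses:
        features.extend(response_features(response))
    return features
```

With this layout, a bank of n kernels applied to m image parts gives a vector of 3 x n x m numbers per footprint.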

D. Classification using K-Nearest Neighbor (KNN)
KNN classifies an object based on the distance to its nearest neighbors, in the following way [24], [33]: 1) calculate the distance from every training vector to the test vector, 2) take the K values closest to the test vector, 3) calculate the average value. If k = 1, the object is assigned to the class of its nearest neighbor [34], [35].
The best value of k depends on the amount of data. In general, the higher the value of k, the lower the effect of noise on the classification process. All the parameters resulting from feature extraction are used as predictors in the classification process. The k-NN classification in this study uses k = 1 with the city block distance.
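The classification step can be sketched as follows, with the three distance types that are compared later in the tests. The text lists an averaging step; because class labels are categorical, this sketch substitutes a majority vote among the k nearest neighbors, which is an assumption. The function names and the sample labels are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def knn_classify(train_vectors, train_labels, query, k=1, distance=city_block):
    """Return the most common label among the k training vectors nearest to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: distance(pair[0], query),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    # On a tie, Counter keeps first-seen order, so the nearer neighbor wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors with made-up class labels.
train_vectors = [[0, 0], [1, 1], [5, 5], [6, 5]]
train_labels = ["class A", "class A", "class B", "class B"]
predicted = knn_classify(train_vectors, train_labels, [0.5, 0.2], k=1)
```

Passing `distance=euclidean` or `distance=chebyshev` switches the metric without changing the rest of the procedure.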

E. Application design
The application's user interface is designed to run on the Android mobile platform. Figure 5 shows the design of the proposed application.

F. System performance evaluation
The proposed system can be evaluated by using formulas that are described in Eq.3 and Eq.4:

III. RESULTS
The tests were carried out in three scenarios to determine the system's performance as each parameter of the method is changed. In the first scenario, the k-NN method was tested using the Euclidean distance with k = 1-5 and 10. The second scenario used the city block distance with k = 1-5 and 10. The third scenario used the Chebyshev distance with k = 1-5 and 10. The results for the three distances were then compared to find the best k value from the simulation. Another test parameter is the image size, which was changed to 1500x1060 pixels. Each test used 12 test images.
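To make the reported percentages concrete, the sketch below assumes that accuracy is the share of correctly classified test images; the function name is hypothetical. With 12 test images, each image accounts for about 8.33 percentage points, so an accuracy of 75% is consistent with 9 correct classifications and 41.66% with 5.

```python
def accuracy_percent(predicted, actual):
    """Percentage of test images whose predicted class matches the true class."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Accuracy values that are possible with 12 test images.
possible_accuracies = [100.0 * correct / 12 for correct in range(13)]
```

The small test set also means that a single misclassified image changes the accuracy by more than eight percentage points.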

A. Scenario-1 Test Result
The first scenario uses the Euclidean distance. The k values used are 1, 2, 3, 4, 5, and 10, and the system is tested with images of 800x566 pixels. Table 1 shows the test results, including accuracy and computation time.

B. Scenario-2 Test Result
This scenario uses the city block distance, with the same k values and image size as scenario-1. The results of the scenario-2 test are shown in Table 2.

C. Scenario-3 Test Result
The third scenario uses the Chebyshev distance, with the same k values and image size as the previous scenarios. The results of the scenario-3 test are shown in Table 3.

D. Scenario-4 Test Result
The fourth scenario varies the image size. The previous scenarios used images of 800x566 pixels; in this scenario, the size was changed to 1500x1060 pixels. The distance is calculated using the Euclidean distance. The results of the scenario-4 test are shown in Table 4.
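A quick calculation shows how much more data the larger image contains. The sketch only compares pixel counts for the two sizes named above and makes no assumption about how the processing time scales; the function name is hypothetical.

```python
def pixel_count(width, height):
    """Total number of pixels in an image of the given size."""
    return width * height


small = pixel_count(800, 566)     # image size used in scenarios 1 to 3
large = pixel_count(1500, 1060)   # image size used in scenario 4
ratio = large / small             # how many times more pixels must be processed
```

The larger image has roughly three and a half times as many pixels, so every per-pixel operation in preprocessing and convolution has correspondingly more work to do.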

E. Scenario-5 Test Result
The fifth scenario compares the best results from scenarios 1, 2, and 3. Table 5 shows the results of the scenario-5 test, including the highest accuracy obtained when testing on the Android application.
From Table 1, the fastest computation time in this test is 8.53 seconds, obtained with k = 1. Meanwhile, the best accuracy in this test, 75%, is obtained with k = 10, at the cost of a computation time that is longer by no more than one second. From Table 2, the fastest computation time is 8.65 seconds, and all k values achieve an accuracy of 41.66%. From Table 3, the fastest computation time is 8.54 seconds, and the highest accuracy, 41.66%, is obtained with k = 10. In scenario-4, a larger image size leads to a longer computation time: the 1500x1060 image takes longer to process than the 800x566 image in scenario-1, with an average computation time of 29.27 seconds. The combined results presented in Table 5 show that the highest accuracy, 75%, is achieved with the Euclidean distance, while the city block and Chebyshev distances produced 41.66% and 42% accuracy, respectively.
The test scenarios presented show that the best combination of parameters from scenarios 1, 2, and 3 raised the accuracy to 75%. In this case, morphological image operations, including dilation and erosion on the black-and-white images, strongly affect the estimated geometric measurements of the foot. Errors in this step carry over into the height and weight estimates and can substantially reduce accuracy. At a minimum, the proposed system shows a correlation between foot size and human height and weight. In the forensic field, this can be useful for describing human posture. The performance validation in this study is considered more valuable than that of Eboh and Ewamayinma, who investigated only the relationship of height to footprint [36]. This preliminary study opens up opportunities to explore other methods that improve detection accuracy.

V. CONCLUSION
This study has designed a system that estimates height and weight based on foot size. The tests carried out on the system lead to the following conclusions: (1) the best result of the proposed system is 75% accuracy with a computation time of 8.92 seconds, and (2) the system is implemented on Android with a minimum version of 4.4 (KitKat). The application is expected to assist forensic medicine in identifying a person's weight and height using an Android mobile device. However, the study is limited by its relatively small number of samples. Future research can therefore explore other image processing methods to improve detection accuracy with a larger sample.