Breast Cancer Image Segmentation Using K-Means Clustering Based on GPU CUDA Parallel Computing

Image processing technology is now widely used in healthcare, for example to help radiologists analyze the results of MRI (Magnetic Resonance Imaging), CT scans and mammography. Image segmentation aims to extract the objects contained in an image by dividing the image into several regions whose attributes are similar, in order to facilitate the analysis process. The growing amount of patient data and larger image sizes create a new challenge for segmentation: it must use time efficiently while maintaining quality. Medical image segmentation has been widely researched, but few studies combine it with parallel computing. In this research, K-Means clustering is applied to mammography images using two modes of computation, serial and parallel. The results show that parallel computing is, on average, up to twice as fast as serial computing.


I. INTRODUCTION
Cancer is one of the leading causes of death worldwide. In 2012, about 8.2 million deaths were caused by cancer. Lung, liver, stomach, colorectal and breast cancer are the biggest causes of cancer deaths each year. According to GLOBOCAN (IARC) 2012 data, breast cancer has the highest age-standardized percentage of new cases, 43.3%, and the age-standardized percentage of deaths due to breast cancer is 12.9% [1].
More than 30% of cancers could be prevented by modifying the risk factors, including dietary factors, that cause cancer. Cancer that is detected early has a better chance of successful treatment. In medicine, cancer can be detected early by scanning the body with equipment such as MRI (Magnetic Resonance Imaging), CT scanners and mammography [2]. After the scan, a radiologist analyzes the medical image to reach a diagnosis. Computer programs can assist this analysis to speed up the diagnostic process and improve its accuracy [3]. In medical image analysis, a field within bioinformatics, image segmentation is one of the computational processes that can still be improved to support this work [4].
Image segmentation is a process that extracts the objects contained in an image by dividing the image into several regions whose attributes are similar [5]. The main purpose of image segmentation is to simplify analysis so that the results are more meaningful [6]. Image segmentation is usually used to locate objects and boundaries in an image. In medical imaging, regions with similar attributes correspond to tissues or internal organs observed with medical equipment [7]. Many studies have addressed the segmentation of medical images to produce high-quality segmented images, but few of them combine segmentation with parallel computing technology. In practice, however, medical image segmentation in hospitals and clinics must be both fast and accurate, because the amount of data available for each patient is increasing and image sizes are growing [8]. Therefore, in addition to segmentation quality, processing time must be managed well.
A previous study by Indah Soesanti, Adhi Susanto, Thomas Sri Widodo and Maesadji Tjokronegoro segmented brain MRI images with and without noise [9]. In that research, Fuzzy C-Means (FCM) clustering was combined with spatial information to reduce the influence of noise on medical image segmentation. The aim was to produce good segmentation results even when the original image contains noise. Gaussian noise was added at levels of 2%, 4%, 6%, 8% and 10%. FCM with spatial information was able to segment brain MRI images both with and without noise, producing segmented images with minimal influence of noise. The computation times were not much different, since each run required the same number of iterations.
Fakhrurrozi Basyid and Kusworo Adi also segmented medical images to recognize cancer objects, using the Active Contour method [10]. In that research, active contours were used to segment images into four types of volumes: Gross Tumor Volume (GTV), Clinical Target Volume (CTV), Planning Target Volume (PTV) and organ at risk (OR). The method can segment images containing multiple adjacent regions, but it is sensitive to noise in both 2D and 3D images.
Whereas [9] applied fuzzy clustering with spatial information to brain MRI images, Dr. J. Dinesh Peter applied it to mammogram images for breast cancer analysis [3]. The study used mammogram images from the mini-MIAS database and compared the standard FCM method with FCM using new spatial information. The results show that FCM with new spatial information, which combines fuzzy clustering with level set segmentation, accelerates both the segmentation process and the level set segmentation stage.
While the three studies above focus on segmentation methods, Muhammad Koprawi, Teguh Bharata Adji and Dani Adhiptaiagn compared GPU parallel computing performance with serial performance on digital images [11]. The image processing operations tested, which are related to medical image segmentation, take three forms: grayscale, negative and black-and-white conversion. The study shows a very large time difference between the CPU and the GPU, with the GPU up to 100 times faster.
In this research, the segmentation process is applied to medical images of cancer with the K-Means clustering method, implemented with two types of computation. K-Means clustering has previously been used for image segmentation by Dina Budhi Utami and Muhammad Ichwan to support hand-pose recognition using Hu moments [12]. That research used images produced by a Kinect sensor. K-Means was used in the training process to categorize the sample data and obtain the hand poses, and the results were stored in a database for use in the recognition process. The system could recognize hand poses even when they were not in the trained position; moreover, it recognized hand poses despite changes in rotation, position and size, with an accuracy of 88.57%.
These studies show that, besides the methods discussed above, K-Means clustering can also be used for segmentation. Therefore, this paper uses K-Means clustering, and its results can serve as evidence that the speed of the segmentation process can be improved not only by refining the segmentation method but also by using parallel computing.

II. RESEARCH METHOD
This research uses an experimental methodology, in which a problem-solving model is designed and then evaluated.

A. Literature Study
The literature study is the first step of this research. It consists of identifying references and reading the literature to meet the information needs, focusing on research related to medical image segmentation, the K-Means clustering method, and its implementation with parallel computing, especially GPU CUDA.

B. Data Collection
Following a reference paper that conducted a qualitative study on breast cancer detection with mammogram segmentation [13], the dataset used in this study was taken from the Mammographic Image Analysis Society (MIAS) database [14]. MIAS is an organization of UK research groups interested in understanding mammograms, and it has produced a database of digital mammograms. The data are publicly available breast images obtained from mammography. Four sample images were segmented using K-Means clustering with both serial and parallel computing.
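The programs described in this paper load the mammograms with OpenCV, where a call such as `cv2.imread(path, cv2.IMREAD_GRAYSCALE)` returns the image as a grayscale array. For readers who want to inspect the data without OpenCV, the following is a minimal, dependency-free sketch. It assumes the MIAS images are stored as 8-bit binary PGM (P5) files; the function name `read_pgm_p5` is hypothetical and is not part of the original implementation.

```python
def read_pgm_p5(data: bytes):
    """Parse an 8-bit binary PGM (P5) image into rows of gray levels (0-255).

    Sketch only: assumes a single image per file and maxval <= 255.
    """
    tokens, pos = [], 0
    # Header fields: magic number, width, height, maxval; '#' starts a comment line
    while len(tokens) < 4:
        while pos < len(data) and (data[pos:pos + 1].isspace() or data[pos:pos + 1] == b"#"):
            if data[pos:pos + 1] == b"#":
                # skip the comment up to the end of the line
                while pos < len(data) and data[pos:pos + 1] != b"\n":
                    pos += 1
            else:
                pos += 1
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    magic, width, height, maxval = tokens[0], int(tokens[1]), int(tokens[2]), int(tokens[3])
    if magic != b"P5" or maxval > 255:
        raise ValueError("expected an 8-bit binary PGM (P5) file")
    pos += 1  # exactly one whitespace byte separates the header from the pixels
    pixels = data[pos:pos + width * height]
    if len(pixels) != width * height:
        raise ValueError("truncated pixel data")
    # one byte per pixel, stored row by row
    return [list(pixels[r * width:(r + 1) * width]) for r in range(height)]
```

For example, `read_pgm_p5(open("sample.pgm", "rb").read())` would return the rows of a local file with the hypothetical name `sample.pgm`.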

C. System Implementation
At this stage, two segmentation processes are implemented, one using serial computing and one using parallel computing. Parallel computing is the simultaneous use of multiple resources in a computational process [15]. It can be done on one computer with multiple processors, on multiple computers connected via a network (a computer cluster), or on a combination of both [15]. Serial and parallel computation are compared in Fig. 1 and Fig. 2. In serial computing, a problem is broken into a series of instructions that a single CPU executes one after another. In parallel computing, the problem is first divided into several parts, each part is broken into instructions executed by its own processor, and the parts run simultaneously.
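The difference between the two modes can be sketched in plain Python. The example below, which uses only the standard library, divides a sum over pixel intensities into parts, processes each part with a separate worker, and then combines the partial results; the same reduction pattern appears in the K-Means centroid update. The names `serial_sum` and `parallel_sum` are hypothetical. Because CPython threads share one interpreter lock, this illustrates the decomposition rather than a real speedup; on a GPU the parts would run on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor


def serial_sum(values):
    """Serial: a single worker adds the values one after another."""
    total = 0
    for v in values:
        total += v
    return total


def parallel_sum(values, parts=4):
    """Split the work into `parts` chunks, sum each chunk in its own worker, then combine."""
    if not values:
        return 0
    size = -(-len(values) // parts)  # ceiling division: length of each chunk
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partial_sums = list(pool.map(serial_sum, chunks))  # results keep chunk order
    return serial_sum(partial_sums)  # combine the partial results
```

Both functions return the same value; only the way the work is scheduled differs.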
Parallel computing requires a large processing capacity to handle many computations, and the user must write programs in a parallel programming language or model to realize it. The GPU is a processor found on graphics cards and is usually used for multimedia purposes. Because a GPU has many cores, it can support many operations at the same time. However, since the GPU is intended for multimedia, it is not used automatically for processes unrelated to multimedia [11].
CUDA™ (Compute Unified Device Architecture) is a parallel computing platform and programming model that enables significant increases in computing performance by harnessing the power of the graphics processing unit (GPU) [8]. Since its introduction in 2006, CUDA has been widely deployed through thousands of applications and published research papers, with over 300 million CUDA-enabled GPUs installed in notebooks, workstations, compute clusters and supercomputers. GPU computing has also been applied in astronomy, biology, chemistry, physics, data mining, manufacturing and finance [16].
Figure 3 describes the memory hierarchy in CUDA. Each stream processor, analogous to a thread indexed by (x, y), is grouped into blocks. Each block has its own memory, called shared memory, which is a small, high-speed memory that only the stream processors in the same block can access. The bandwidth between the stream processors and shared memory is very high, reaching 70.7 GB/s. Another type of memory is device memory, which is accessible to all multiprocessors but has lower bandwidth. The computer's main memory is termed host memory.

The implementation uses the Python programming language. The program is written in two forms: one with serial computing and one with parallel computing. Following previous research [11], parallel computing in Python requires a library to access the GPU. This research used PyCUDA, a Python library for accessing NVIDIA graphics cards through CUDA. PyCUDA was selected because its code resembles regular Python code; one of its advantages is simpler error handling compared with C or CUDA itself. In addition, the OpenCV library (cv2) is needed in both the serial and the parallel programs to load the image data into the program. The system requirements used for the implementation can be summarized as follows.

D. Segmentation Process
K-Means clustering is a commonly used algorithm because it is easy to implement, and it is even classified as one of the oldest clustering algorithms [17]. The K-Means algorithm groups n objects into k clusters based on attribute similarity, where k < n. It works by iteratively finding the closest centroid for each data point [18]. The K-Means algorithm is implemented with the following steps. After all centroids are updated, they are compared with the previous centroid values. If no centroid position changes, the algorithm ends. If any centroid position changes, the process returns to steps c) and d) and then updates the centroids again, repeating until all centroids are stable (their positions no longer change). The steps of the algorithm are illustrated by the flow diagram in Fig. 4.
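The serial (CPU) version of the algorithm on grayscale pixels can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the authors' program: pixels are given as a flat list of intensities, the initial centroids are k distinct intensities chosen at random, the distance is the one-dimensional Euclidean distance |x - c|, and an empty cluster keeps its previous centroid (the paper does not say how empty clusters are handled). The function name `kmeans_gray` is hypothetical.

```python
import random


def kmeans_gray(pixels, k=3, seed=None, max_iter=100):
    """K-Means on a flat list of grayscale intensities.

    Returns (centroids, labels, iterations). Requires at least k distinct
    intensities, otherwise random.sample raises ValueError.
    """
    rng = random.Random(seed)
    # b) pick k distinct intensities at random as the initial centroids
    centroids = [float(v) for v in rng.sample(sorted(set(pixels)), k)]
    labels = []
    for iteration in range(1, max_iter + 1):
        # c) + d) distance to every centroid (|x - c| in 1-D), assign to the nearest
        labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in pixels]
        # e) move each centroid to the mean of the pixels assigned to it
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            # assumption: an empty cluster keeps its old centroid
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        # stop when no centroid position changed
        if new_centroids == centroids:
            return centroids, labels, iteration
        centroids = new_centroids
    return centroids, labels, max_iter
```

`kmeans_gray(pixels, k=3)` returns the final centroids, the cluster label of every pixel, and the number of iterations. Here the count includes the final pass in which no centroid moves; the paper does not state which counting convention was used for Table 1.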

E. System Evaluation
The system is evaluated by segmenting the test images and measuring the execution time and the number of iterations. In this research, the test compares the computation time of the CPU and the GPU when segmenting medical images.
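A timing harness in the spirit of this evaluation can be sketched with Python's `time.perf_counter`. The helper names `average_runtime` and `speedup` are hypothetical, and the sketch assumes each segmentation run is wrapped in a function that returns its result (for example, the centroids, labels and iteration count). When timing GPU code, the host must wait for the device to finish before the clock is stopped, because CUDA kernel launches are asynchronous; otherwise the measured time can be too short.

```python
import time
from statistics import mean


def average_runtime(fn, *args, repeats=5):
    """Run fn(*args) `repeats` times; return (mean wall-clock seconds, last result)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    times, result = [], None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)  # for GPU code, fn must wait for the device before returning
        times.append(time.perf_counter() - start)
    return mean(times), result


def speedup(cpu_seconds, gpu_seconds):
    """Ratio of serial to parallel time; 2.0 means the parallel run took half as long."""
    return cpu_seconds / gpu_seconds
```

A speedup of 2.0 would mean the GPU run took half as long as the CPU run on the same image.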

III. RESULT
Each original image is segmented into 3 clusters. As in the K-Means clustering method, each iteration calculates the distance between the centroids and the data, and then updates the centroids. In this study, the medical images are first converted to grayscale, and the distance between each pixel and each centroid is then calculated. The observed execution time and number of iterations of each process are shown in Table 1.
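The grayscale conversion and the final segmented image can be sketched as follows. The sketch assumes color input in OpenCV's BGR channel order and uses the luma weights that OpenCV documents for `cv2.COLOR_BGR2GRAY` (Y = 0.299 R + 0.587 G + 0.114 B); OpenCV's own fixed-point arithmetic may differ by one gray level in some cases, and mammograms that are already grayscale skip this step. The function names are hypothetical, and drawing each cluster in its centroid intensity is one common way to visualize the three segments.

```python
def bgr_to_gray(image):
    """Convert rows of (B, G, R) pixels to 8-bit gray levels.

    Uses Y = 0.299 R + 0.587 G + 0.114 B, the weights OpenCV documents for
    cv2.COLOR_BGR2GRAY (OpenCV's fixed-point result may differ by one level).
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (b, g, r) in row] for row in image]


def render_segments(gray, centroids):
    """Replace each pixel by the nearest (rounded) centroid level, e.g. 3 levels for k = 3."""
    levels = [round(c) for c in centroids]
    return [[min(levels, key=lambda lv: abs(p - lv)) for p in row] for row in gray]
```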

IV. DISCUSSION
As Table 1 in the previous section shows, GPU CUDA computation is faster than CPU computation for all samples, because the computation is performed by more than one processor. However, the time gap differs between samples. Figure 11 shows the time gap between computation with the CPU (serial) and with GPU CUDA (parallel).

V. CONCLUSION
Using parallel computing in the segmentation process with the K-Means clustering method makes execution up to two times faster than serial computing on the CPU. This suggests that parallel computing can be an alternative way to increase processing speed, in addition to modifying the segmentation method. Because the numbers of iterations on the CPU and the GPU are not much different, the faster processing on the GPU is not due to fewer iterations.
Future studies should evaluate not only the execution time but also the quality of the segmented images. Other segmentation methods could also be combined with parallel computing to find the best way to produce high-quality segmented images, especially medical images, with the shortest computation time.

ACKNOWLEDGMENT

Fig. 1. Serial Computing [15]

a) Stream: In CUDA, streams are stored in the GPU memory called global memory. Streams are organized according to the number of threads that will access them in parallel, and those threads are then organized into thread blocks.
b) Kernel: In CUDA, a kernel is a program that operates on the data stored in global memory.
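The thread-block organization can be made concrete with a serial emulation in Python. In a CUDA C kernel, a thread usually computes its pixel coordinates as `x = blockIdx.x * blockDim.x + threadIdx.x` and `y = blockIdx.y * blockDim.y + threadIdx.y`, and with PyCUDA a compiled kernel is launched by passing keyword arguments such as `block=(16, 16, 1)` and `grid=(gx, gy)`. The sketch below reproduces only that indexing: the hypothetical `label_kernel` assigns one pixel to its nearest centroid, Python lists stand in for global memory, shared memory is not modeled, and the threads run one after another instead of concurrently. It is not the authors' kernel.

```python
def grid_for(height, width, block=(16, 16)):
    """Number of blocks (gx, gy) needed to cover a height x width image."""
    bdx, bdy = block
    # ceiling division, so partially filled edge blocks are still launched
    return (-(-width // bdx), -(-height // bdy))


def launch_2d(kernel, grid, block, *args):
    """Serially emulate a 2-D CUDA launch: one kernel call per (block, thread)."""
    gdx, gdy = grid
    bdx, bdy = block
    for by in range(gdy):
        for bx in range(gdx):
            for ty in range(bdy):
                for tx in range(bdx):
                    kernel((bx, by), (tx, ty), (bdx, bdy), *args)


def label_kernel(block_idx, thread_idx, block_dim, image, centroids, labels):
    """Hypothetical kernel body: label one pixel with its nearest centroid."""
    x = block_idx[0] * block_dim[0] + thread_idx[0]  # column: blockIdx.x * blockDim.x + threadIdx.x
    y = block_idx[1] * block_dim[1] + thread_idx[1]  # row: blockIdx.y * blockDim.y + threadIdx.y
    # the grid is rounded up, so threads past the image edge must do nothing
    if y < len(image) and x < len(image[0]):
        p = image[y][x]
        labels[y][x] = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
```

For a 20 × 30 image and 16 × 16 blocks, `grid_for` gives a 2 × 2 grid of 1,024 threads, of which the 600 inside the image do useful work.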

Fig. 3. CUDA Memory Hierarchy

a) Software
- Operating system: Windows 10
- Python 3.5
- Anaconda IDE (Spyder)
- PyCUDA (minimum 2.7)
- OpenCV module
b) Hardware
- Processor: Intel Core i5-2430M CPU @ 2.40 GHz × 4
- Graphics card: NVIDIA GeForce 930MX, CUDA, 1 GB (48 cores)
- RAM: 4 GB DDR3
- Hard disk: 500 GB

a) Initialize the number of clusters. In this step, specify the number of clusters or segments to be generated. This number, called k, is a positive integer and is usually determined heuristically or from the case study.
b) Determine the initial centroid positions. Randomly select k objects from the data set as the initial centroids.
c) Determine the distance between the centroids and the other objects. The distance from every non-centroid data object to every centroid is calculated, generally with the Euclidean distance [18]:
$$d(x_i, c_j) = \sqrt{\sum_{p=1}^{m} (x_{ip} - c_{jp})^2}$$
where $x_{ip}$ is the p-th attribute of object $x_i$, $c_{jp}$ is the p-th attribute of centroid $c_j$, and m is the number of attributes.
d) Assign each object to the closest cluster. Once an object has k distance values, it is assigned to the cluster whose centroid is closest.
e) Update the centroid positions. The centroid values are updated with the following formula [18]:
$$c_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i$$
where $C_j$ is the set of objects assigned to cluster j and $n_j$ is the number of objects in $C_j$.
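For grayscale images each object has a single attribute (m = 1), so the Euclidean distance reduces to an absolute difference. A small worked example with illustrative values (one pixel of intensity 90 and centroids 0, 100 and 200, not taken from the MIAS data):

```latex
\begin{aligned}
d(x, c_j) &= \sqrt{(x - c_j)^2} = |x - c_j| \\
d(90, c_1) &= |90 - 0| = 90, \quad d(90, c_2) = |90 - 100| = 10, \quad d(90, c_3) = |90 - 200| = 110 \;\Rightarrow\; x \in C_2 \\
c_2 &= \frac{1}{n_2} \sum_{x_i \in C_2} x_i = \frac{90 + 100 + 110}{3} = 100 \quad \text{(if } C_2 = \{90, 100, 110\}\text{)}
\end{aligned}
```

The pixel joins the cluster with the smallest distance, and the centroid update then averages all members of that cluster.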

Fig. 11. Computing Duration Comparison

Samples 1, 3 and 4 show a gap of at least two times between the CPU and GPU computing durations. This may be caused by differences in the sample conditions.

Fig. 12. Number of Iterations

Figure 12 shows the number of iterations of each sample computation on both the CPU and the GPU. Samples 1 and 2 have the same number of iterations, while samples 3 and 4 differ by only one iteration, which is not significant.

Table 1. Execution Time and Number of Iterations