Object Position Estimation based on Dual Sight Perspective Configuration

Development of a coordination system requires a dataset because the dataset provides information about the system's surroundings that the coordination system can use to make decisions. Therefore, the capability to process and display position data of the objects around the robots is necessary. This paper provides a method to estimate an object's position. The method is based on the Indoor Positioning System (IPS) idea and object position estimation with a multi-camera system (i.e., stereo vision). The method needs two inputs to estimate the ball's position: the input image and the robot's relative position. The approach adopts simple and easy calculation techniques: trigonometry, angle rotations, and a linear function. The method was tested on a ROS and Gazebo simulation platform. The experimental result shows that this configuration could estimate the object's position with a root mean square error of 0.383 meters. Besides, the R-squared value of the distance calibration is 0.9932, which implies that this system worked very well at estimating an object's position.


INTRODUCTION
The development of a coordination system requires a dataset [1] because the data provide information about the system's surroundings. The command or coordination system then uses the dataset to produce a command set. Therefore, processing and displaying position-related data (in coordinate format) is necessary. The dataset represents the information and the layout of objects around the robot. A command set containing each robot's next behavior is then created by the command system as a result.
Various approaches and models have been proposed to capture the conditions around a system or robot, depending on the sensor types used. One of those approaches uses a sensor configuration based on an omnidirectional camera [2]. With this configuration, the system can detect objects in a circular area within a certain radius around it. Moreover, this configuration can calculate the distance between the robot and an object even with a simple regression function [3].
Localization methods are also under development so that the robot can retrieve data from the surrounding environment; one of them is Monte Carlo Localization [4]. The result generated by this method is a map, and it produces better results in indoor areas [5], [6]. However, this method requires high-capability computation hardware and appropriate sensors such as LiDAR [7], [8]. Otherwise, systems with lower computation capability (standard microcontrollers) would degrade the system's performance.
In this research, a simple method that can estimate an object's coordinates is proposed, named the Dual-Sight Perspective configuration. The method is tested in a simulation system. The technique consists of trigonometric and angle rotation functions. This configuration is expected to ease the computation so that it can be implemented on hardware with low computation capability.

A. Camera Configuration
Fundamentally, detecting objects requires sensors. In this case, the desired sensor output is an image, so the right sensor to use is a vision sensor such as a camera. Several vision-sensor configurations are available for detecting objects, such as mono or perspective vision [9], stereo vision [10], and omnidirectional vision [11], as shown in Fig.1. Further, in the case of detecting objects using omnidirectional cameras, a few available methods are blob tracking, the Hough circle transform [12], [13], and even neural network methods [14].
Stereo vision is an example, or even a subset, of a multi-camera system, and other multi-camera systems are also available. Applications of multi-camera systems include distance measurement [15], 3D image reconstruction [16], object or robot position control [17], [18], and indoor positioning systems [19].
In this study, the omnidirectional camera configuration is the best option because this camera has a field of view of almost, or even more than, 180 degrees. Furthermore, a single-viewpoint omnidirectional camera is desirable, although the captured image would need a geometry correction or calibration [20].

B. Indoor Positioning System
In research discussing indoor positioning systems (IPS), several approaches are classified based on their signal categories, such as radio frequency (RF), light, sound, and magnetic field [21]. For example, methods based on radio frequency signals include Wi-Fi connections [22], [23], Bluetooth connections [24], and the ZigBee protocol [25]. For IPS using Wi-Fi signals, there are two available methods: received signal strength (RSS) measurement and time and space attributes of the received signal (TSARS) [26]. Usually, there are more than two hotspots (access points) acting as the anchors of an RSS measurement (shown in Fig.2), so this technique requires a trilateration or multilateration technique, as sketched below.
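As an illustration only, and not part of this paper's method, the following minimal Python sketch shows how a 2D position can be trilaterated from three anchors with known positions and RSS-derived distance estimates. The anchor coordinates and distances are placeholder values, not measurements from any cited system.

# Minimal 2D trilateration sketch (illustrative placeholder values).
# Given three anchors with known positions and RSS-derived distance estimates,
# subtract the first circle equation from the other two and solve the linear system.
import numpy as np

def trilaterate(anchors, distances):
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = distances
    A = np.array([[2 * (x2 - x1), 2 * (y2 - y1)],
                  [2 * (x3 - x1), 2 * (y3 - y1)]])
    b = np.array([d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2,
                  d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2])
    return np.linalg.solve(A, b)

# Example: three hypothetical access points and measured distances.
anchors = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
distances = np.array([2.83, 3.61, 3.61])
print(trilaterate(anchors, distances))  # approximately (2, 2)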

C. System Modelling
Finally, the designed configuration adopts and modifies earlier research. The stereo camera configuration with different camera positions provides an interesting starting point. The Dual-Sight Perspective idea came from reviewing the camera's position while every camera is mounted on a robot that intentionally moves around, like the omnidirectional camera mounted on soccer robots.
The indoor positioning system provides an interesting point as well. The RSS measurement result is sampled at specific distance values, and the matching function is found with a regression method so that subsequently measured signal strengths can be calibrated using the generated function [27]. For the Dual-Sight Perspective design, the distance value is obtained from an image, so its value is in pixels. The pixel value can then be calibrated into a real distance value in meters.
Finally, the dual sight perspective configuration is a combined method of stereo vision and an indoor positioning system (Fig.3). First, the camera model used in stereo vision is changed into an omnidirectional camera configuration, and the camera positions vary since each camera is mounted on a robot. Then, the data acquisition system adopts the IPS method, where the collected data are the object-to-robot distances in pixels inside the captured images.

D. Data Acquisition System
At a glance, the DSP configuration looks like the IPS system when it is online [22], because all of the robots perform a raw data collecting procedure. Every robot captures images of the object (e.g., a soccer ball) with its camera and calculates its pose and orientation on the field (see Fig.4). The collected data are shared with the ROS system that runs the main algorithm.
The ROS system consists of four support systems: the Plugin, the Data Pre-processor, the Position Estimation algorithm, and the Coordination system (not included in this research). The plugin works as a data buffer between ROS and Gazebo, as this study uses both platforms; it sends data from ROS to Gazebo and vice versa.
The data pre-processor block contains two algorithms: image processing and the robot's pose parser. Image processing detects and recognizes the object to be estimated. The image processing is divided into three stages: HSV filtering, morphology filtering, and a blob tracking algorithm. The output of the image processing is the detected object's coordinate in the captured image, known as the centroid data (pixel x, pixel y).

E. Simulation System
As the initial step, this system runs in a three-dimensional simulation system. The Gazebo simulator is used to provide the world simulation and ROS to run the programs needed. A 3D design of each element used in the simulation must be built before proceeding to the simulation [28].
A Gazebo and ROS simulation package designed for RoboCup MSL, named simatch [29], is available as open source under the Apache 2.0 license. The package contains the robot's design, the MSL field texture, the goals, and the ball, based on the RoboCup MSL game (Fig.5). Unfortunately, the omnidirectional camera and the robot's odometry system are not modeled inside the package [30], so additional plugins (an omnidirectional camera and an odometry system) are added to the robot's model (Fig.6). No physical sensors are used; instead, two plugins take data from the simulation system. The camera plugin takes image data from inside the simulator; the output image is set at a 1080x1080 pixel resolution with up to 180 degrees of FOV. The positioning plugin takes the position and orientation data of all objects in Gazebo. To produce specific data for each robot, a parser program is used, as explained earlier.
Overall, based on the system modeling described in this section, every model is built in the ROS system. The ROS graph structure is shown in Fig.7. While the simulation system is running, the built ROS system also runs simultaneously. The programs containing every algorithm created are executed following the ROS standard communication system (publishing/subscribing to topics) [31], as sketched below.
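As a hedged illustration of this communication pattern, and not the paper's actual node code, the following minimal rospy sketch publishes and subscribes to a single topic. The topic name "/robot1/ball_estimate" and the Point message type are assumptions made for the example.

#!/usr/bin/env python
# Minimal rospy publish/subscribe sketch. The topic name and message type
# are illustrative assumptions, not the paper's actual interface.
import rospy
from geometry_msgs.msg import Point

def callback(msg):
    rospy.loginfo("estimated ball position: (%.3f, %.3f)", msg.x, msg.y)

def main():
    rospy.init_node("estimation_listener")
    pub = rospy.Publisher("/robot1/ball_estimate", Point, queue_size=10)
    rospy.Subscriber("/robot1/ball_estimate", Point, callback)
    rate = rospy.Rate(10)  # 10 Hz publishing loop
    while not rospy.is_shutdown():
        pub.publish(Point(x=0.0, y=0.294, z=0.0))  # example estimate
        rate.sleep()

if __name__ == "__main__":
    main()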
The acquired data is an image captured by the simulated camera on the robot. The detection range covers almost half of the field on one side. Note that this plugin produces an ideally calibrated image. Fig.8 shows a sample image captured by a simulated camera and the elements inside it. The zero image point is located at the top-left corner, since the image is represented as a matrix [32]. The center image point occurs at the center of the image; as the resolution is 1080x1080 pixels, the center point is located at the coordinate (539, 539). The yellow dashed line represents the distance and direction between the ball and the center of the robot.

F. Computer Vision
The main objective at this point is to identify the specified object (the ball). The input image is filtered by two methods, HSV filtering and morphology filtering. First, the image coloring format is converted from RGB into HSV format as in [32]. Then, by applying a threshold to the HSV-formatted image, the image is filtered so that it focuses on the ball's color in binary format, as in [33]. The morphology filter intends to increase (dilate/open operation) or reduce (erosion/close operation) the selected structure area, as in [32]; the open operation is chosen for the morphology filter in this study. If the robot can identify the ball, then the ball's position relative to the robot can be estimated. The result of the filtering process is a morphology-filtered binary image, which is then used as the input to blob tracking. Blob tracking is a method to detect a selected area depending on its surrounding information. The blob result is then reduced to the center point of the selected area (called the "centroid") using the image moments equations. Equations (1) and (2) determine the centroid position as in [32]. The output value is in Cartesian coordinate format (Cx, Cy) in pixel units (see Fig.9). A minimal sketch of this pipeline is given below.
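The following minimal OpenCV sketch illustrates the described pipeline under stated assumptions: HSV thresholding, a morphological open operation, and the centroid from image moments. The HSV threshold values are illustrative placeholders, not the thresholds used in this study.

# Minimal OpenCV sketch of the pipeline above: HSV thresholding, morphological
# opening, then the centroid (Cx, Cy) from image moments. The HSV bounds below
# are placeholders, not the thresholds used in the paper.
import cv2
import numpy as np

def find_ball_centroid(bgr_image):
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)          # RGB/BGR -> HSV
    mask = cv2.inRange(hsv, (20, 100, 100), (35, 255, 255))   # keep ball-colored pixels
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)      # open operation
    m = cv2.moments(mask)                                      # blob moments
    if m["m00"] == 0:
        return None                                            # ball not detected
    cx = m["m10"] / m["m00"]                                   # centroid x, as in Eq. (1)
    cy = m["m01"] / m["m00"]                                   # centroid y, as in Eq. (2)
    return cx, cy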

G. Pixel to Meter Calibrator Function
This process uses the blob tracking algorithm only to find the position of the centroid. The centroid data are sampled by moving the ball in front of the robot in 0.1-meter steps. The ball's initial position is 0.4 meters from the robot's middle point, and the sampling task continues until the distance between the robot and the ball is 9 meters (Fig.11). To acquire the distance between the ball's centroid and the robot's middle point (see the dashed line in Fig.8), a modified Pythagorean method is used (the Euclidean distance), as in [34]; the formula is shown as (3). In comparison, (4) is the function generated from the distributed data in Fig.10 with the linear regression analysis method. A sketch of both steps is given below.
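The following sketch illustrates the two steps under stated assumptions: the Euclidean pixel distance of (3) and a linear fit in the spirit of (4). The sample arrays are placeholders, not the measured data shown in Fig.10.

# Sketch of the calibration step: Euclidean pixel distance between the centroid
# and the image center (as in Eq. 3), then a linear fit mapping pixel distance to
# meters (in the spirit of Eq. 4). The sample arrays are placeholder values.
import numpy as np

def pixel_distance(cx, cy, center=(539, 539)):
    return np.hypot(cx - center[0], cy - center[1])   # Euclidean distance in pixels

# Sampled pixel distances versus known ball distances (0.4 m to 9 m in 0.1 m steps
# in the paper; only a few placeholder pairs are listed here).
pixel_samples = np.array([60.0, 180.0, 310.0, 450.0])   # placeholder values
meter_samples = np.array([0.4, 2.0, 4.0, 6.0])          # placeholder values

slope, intercept = np.polyfit(pixel_samples, meter_samples, 1)  # linear regression

def pixel_to_meter(d_pixel):
    return slope * d_pixel + intercept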

H. Quaternion to Euler Converter Algorithm
This part is intended as a pose data collecting and converting algorithm, because the robot's yaw angle is required to estimate the ball's position. The Gazebo system provides the positioning data in quaternion format (x, y, w, z), while the data are required in Euler format (roll, pitch, yaw). No library-specific quaternion-to-Euler conversion function, such as "getRPY()", is used; instead, the conversion from quaternion format to Euler format is implemented directly. A sketch of the quaternion-to-Euler conversion is given at the end of this subsection. Figure 8 is an example of a captured image when the omnidirectional camera is in an active state. The image's initial point (0, 0) is located at the top-left corner because an image is equivalent to a matrix, as mentioned in [32]. The reference point is located at the center of the image, at the coordinate (539, 539). This point is used as the reference for the ball-to-robot distance in the estimation calculation.
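The paper's original pseudo-code is not reproduced here; the following is a minimal sketch of the standard quaternion-to-yaw conversion that any such algorithm implements.

# Minimal sketch of the standard quaternion-to-Euler (yaw) conversion,
# equivalent in spirit to the pseudo-code referenced in the text.
import math

def quaternion_to_yaw(x, y, z, w):
    siny_cosp = 2.0 * (w * z + x * y)
    cosy_cosp = 1.0 - 2.0 * (y * y + z * z)
    return math.atan2(siny_cosp, cosy_cosp)   # yaw in radians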

I. Estimation Function
Referring to Fig.8, the dashed line's length is an important value. First, the length is calculated using (3). Then, the ball's angle relative to the reference point's x axis is calculated. The angle is determined using the inverse trigonometric method [35], as in (5). The angle value is expressed in radians, so any degree value used as input must be converted to radians.

θ = acos((Cx − 539) / d)                    (5)
There are two conditions in this process: first, when the Cy value is below or equal to the y value of the image center (Cy ≤ 539); second, when the Cy value is above the y value of the image center (Cy > 539). The two conditions give two different results when calculated with (5). Equation (6) is provided to handle this problem by adding a rotation effect; when the second condition occurs, the uncorrected result deviates by up to double the real distance between the ball and the robot. The symbol in (6) is a variable containing the value 0.017444 (approximately 3.14/180), which converts an angle in degrees to radians. The conditions mentioned above are described in Fig.12. The next step is to determine the amount of influence caused by the robot's direction; this process is described in (7) and (8). The result is the angle, in degrees, between the robot's heading and the ball's direction relative to the robot. As the ball's direction changes, the image's centroid point also changes, which is handled by (9) and (10). After that, the new centroid point is calibrated using (4). The result of the calibration is stored in the variables (dx, dy), while the centroid point (Cx1, Cy1) substitutes for the x value in (4); the transformed equations are shown as (11) and (12). The result of (11) and (12) is a delta distance value in meters, so the former pixel distance (Cx1, Cy1) has been calibrated into a real distance in meters. Finally, the ball's position is computed using (13) for the x component and (14) for the y component.
In (13) and (14), Pr denotes the robot's position on the field: the Prx and Pry elements represent each coordinate element, and Prth is the robot's direction expressed in radians. The result of (13) and (14) is the estimated ball position in field coordinates. A sketch of the whole estimation step is given below.
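The following is a hedged sketch of the estimation pipeline under assumed axis conventions; it paraphrases the steps of (3) to (14) rather than reproducing the equations, and pixel_to_meter stands in for the calibration function of (4).

# Hedged sketch of the estimation steps described above (Eqs. 3-14 paraphrased,
# not reproduced verbatim). pixel_to_meter() is the calibration function of Eq. (4).
import math

IMG_CENTER = (539, 539)

def estimate_ball_position(cx, cy, robot_x, robot_y, robot_yaw, pixel_to_meter):
    # Pixel distance between the ball centroid and the image center, as in (3).
    d_pixel = math.hypot(cx - IMG_CENTER[0], cy - IMG_CENTER[1])
    # Bearing of the ball relative to the robot's heading. The image's "up"
    # direction is assumed here to coincide with the robot's forward axis; the
    # paper resolves this with acos() plus a sign correction on Cy in (5)-(8).
    bearing = math.atan2(IMG_CENTER[0] - cx, IMG_CENTER[1] - cy)
    # Convert the pixel distance to meters with the calibration of (4), then
    # split it into field-frame components, in the spirit of (9)-(12).
    d_meter = pixel_to_meter(d_pixel)
    angle = robot_yaw + bearing
    dx = d_meter * math.cos(angle)
    dy = d_meter * math.sin(angle)
    # Translate by the robot's pose on the field, as in (13)-(14).
    return robot_x + dx, robot_y + dy

# Example (first scenario): robot 1 at (-8.5, 0) with yaw 0 and ball centroid
# (539, 40); with a calibration close to (4), the estimate lands near (0, 0).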

J. Testing Scenarios
Finally, the system has been tested under four different scenarios: the first is illustrated in Fig.13, the second in Fig.14, the third in Fig.15, and the last in Fig.16. The black hairline inside those figures illustrates a 1-meter distance on the x or y axis. In the first scenario, robot 1's pose is (-8.5, 0, 0), robot 2's pose is (-2, 1, 0), and robot 3's pose is (-2, -1, 0). The ball is located at the center of the field, representing the zero coordinate (0, 0). All of the robots face the same direction (to the right side of the picture, shown by the "Yr" annotation in Fig.13). The pose values above represent the x and y axes plus the robot's heading: the x and y elements represent the robot's coordinate on the map (field), and the yaw element shows the direction, in degrees, the robot is facing. For the second and third scenarios, all of the robots start at the same initial poses; the only difference is that the ball is moved along the y axis while the robots keep the same coordinates and headings. The ball is posed at (0, 3) for the second scenario and at (0, -3) for the third scenario. Finally, for the last scenario, the robots are posed at different coordinates and headings, except for robot 1, which plays the role of the goalkeeper. In summary, robot 2 is located at (-2, 3) with a heading angle of -90 degrees, and robot 3 is located at (-5, -2) with a heading angle of 45 degrees.

K. Estimation Coordinates Errors
The estimated coordinates are compared with the actual ball's coordinates to obtain the estimation error (deviation) values. The Root Mean Square Error (RMSE) used to find the deviation errors is shown in (21). Suppose that (Pbx, Pby) are the estimated values and (Prx, Pry) are the actual coordinates; M is the number of estimation points taken, i.e., the number of scenarios, following [27]. A sketch of this computation is given below.
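As a minimal sketch of the RMSE computation, the example below reuses the first scenario's estimate reported later in the Discussion; it is illustrative, not the paper's full result table.

# Minimal RMSE sketch over M scenarios: squared Euclidean deviation per scenario,
# averaged over the scenarios, then square-rooted.
import math

def rmse(estimates, actuals):
    squared = [(ex - ax) ** 2 + (ey - ay) ** 2
               for (ex, ey), (ax, ay) in zip(estimates, actuals)]
    return math.sqrt(sum(squared) / len(squared))

# Example with the first scenario only: estimate (0, 0.294) vs actual (0, 0) -> 0.294.
print(rmse([(0.0, 0.294)], [(0.0, 0.0)]))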
III. RESULTS
Figures 17 to 20 show the illustrations of the estimated ball's coordinates, respectively. Each graph illustrates the viewpoint of the field used in the simulation. The maximum x and y ranges have been shrunk so that the deviations can be recognized clearly. Every dot inside the image is a coordinate point produced by the estimation coordinate system on the field. The points displayed in the graph are the average values of each robot's samples calculated before. As an example, the visible blue dot in Fig.17 is the average value from robot 1's sampled object-position data. Because the robots detect and calculate the estimation separately, the estimation produces three different points, which of course causes differences in the estimation results. Therefore, to produce a uniform point, the average of each element of the ball's coordinate point is calculated. The final result is determined by calculating the average values from all of the estimated results using (20) and (21). The average results are shown in Fig.17 to Fig.20 and Table 1.

IV. DISCUSSION
The omnidirectional camera on soccer robots usually uses a reflective convex mirror, as shown in Fig.1c. Therefore, the captured image has a distortion effect and needs calibration. However, implementing the distortion and calibration in the camera simulation burdens the computer running the simulation, so to keep the simulation light, the distortion and image calibration are skipped in this study. As a result, the captured image looks like Fig.8 and is considered a calibrated image ready to be processed. Figure 10 shows the comparison between the distance values in pixels and the real distances in meters between the robots and the ball. The graph shows the linear function written as (4). By applying a linear analysis, the R-squared value is 0.9932. This result only applies to this study, because this study uses a simulation system in which every condition can be considered a perfect condition.
Consider a robot located at the coordinate (-8.5, 0) with a heading angle of about 360 degrees, while the ball is located at (0, 0) and has no heading angle. Finally, the ball is located in the image at the coordinate (539, 40). These data are summarized in Table 2. According to the data in Table 2, the best equations to use are (15) and (16). First, the ball is detected in the image at the coordinate (539, 40) pixels for Robot 1. Then, knowing that the robot's center point is located at (539, 539) of the image, the calculation proceeds as follows. Based on this calculation, Robot 1 predicts that the ball is located at the coordinate (-0.0954, 0.2786). The prediction results of Robots 2 and 3 are shown in Table 3. The average of the predicted ball's coordinates is (0, 0.294). This average value is compared to the actual ball's position, i.e. (0, 0), and the squared error is calculated as below.
SE = √((0 − 0)² + (0.294 − 0)²) = 0.294
The estimated ball positions number 2 to 4 in Table 1 are calculated in the same way, yielding the different SE values in Table 1. Then, all of the SE values in Table 1 were used to calculate the root mean square error (19). The root mean square error over the 4 testing scenarios in this study is 0.383 meters. Therefore, it can be concluded that the prediction of the object's coordinate position is fairly accurate, so this configuration could be considered for use in a system with lower computational capability to estimate an object's position.

V. CONCLUSION
The output of this system is the object's coordinates, written as (x, y). The dual sight perspective configuration can predict the object's position with an estimated error of 0.383 meters, while the R-squared value of the image distance to the real simulated distance is 0.9932. However, this result is obtained in the realm of simulation; the dual sight perspective configuration still needs to be developed in other ways and tested in real-world applications.