Table of Contents

Chapter 1. Introduction
Chapter 2. Project Objectives
Chapter 3. Bibliographic research
3.1. Problem context - sport events image/video analysis
3.1.1. Players tracking
3.1.2. Offside detection from a still image
Chapter 4. Analysis and Theoretical Foundation
4.1. Color spaces: RGB vs HSV
4.2. Background subtraction, finding HS histogram’s maximums
4.3. Morphological operations
4.3.1. Dilation
4.3.2. Erosion
4.3.3. Opening
4.3.4. Closing
4.4. Contour extraction based on its features
4.5. Edge detection - Canny algorithm and Hough’s transform
4.5.1. Canny’s edge detection algorithm
4.5.2. Hough’s classical transform for line detection in an edge image
4.6. Vanishing point
4.7. Clustering
4.7.1. BSAS - Basic Sequential Algorithmic Scheme
4.8. Proposed solution
Chapter 5. Detailed Design and Implementation
5.1. Background subtraction
5.2. Obtaining the field’s mask
5.3. Obtaining the players’ masks
5.4. Players clustering
5.5. Line detection and finding the vanishing point
5.6. Determining direction of play and differentiating between attackers and defenders
5.7. Drawing the offside line
Chapter 6. Testing and Validation
Chapter 7. User’s manual
7.1. Installation manual
7.1.1. Software resources
7.1.2. Hardware resources
7.1.3. Installing the application
7.2. Running the application
Chapter 8. Conclusions
8.1. Contributions/achievements
8.2. Critical analysis of the results achieved and possible improvements
Bibliography

Introduction

The usage of technology in sports has always been a [anonimizat], physical, [anonimizat]. [anonimizat]/likelihood of human error would be worthwhile.

This has eventually led to a [anonimizat], [anonimizat], [anonimizat], ensuring every split millisecond is accounted for and any final ranking is not up for debate. [anonimizat], snooker or even boxing all make use of such tools in an attempt to help referees make better calls and not have their erroneous decisions be the talk of the day after an important sports event.

Paradoxically, football, arguably the world’s most popular sport, long resisted such aids, preferring a more dynamic play and excitement to fewer wrong calls that would supposedly benefit all parties involved. Eventually, though, the focus shifted more onto referees’ performances than onto the teams themselves, and a few important technologies emerged.

The 2014 World Cup saw the introduction of goal-line technology such as “Goal Line Control” to determine whether the ball has passed the goal line in its entirety and thus whether a goal should be awarded or not. It was a momentous day when Benzema’s goal in France vs Honduras eventually stood after replays showed the ball had indeed passed the goal line in its entirety, a much closer decision than the outrageous disallowing of Lampard’s goal in England vs Germany just 4 years prior.

One of the most common errors made by football referees, central or assistant, remains the offside call though. No technology has been implemented to prevent wrong calls, and several matches have been unfairly decided by such poor decisions. What is all the more frustrating is that TV replays usually do show the offside line, in particular those employing a special suite of tools that also provides superimposed team logos, the score and so on; in theory, then, we should not be far from putting it into practice as well, especially since the likelihood of a wrongly called offside is much higher than that of a missed goal-line call. The offside rule is not restricted to football, and each sport that uses it has distinct rules that define what puts a player in an offside position.

According to [1], the football offside rule states that:

“It is not an offence in itself to be in an offside position.

A player is in an offside position if:

he is nearer to his opponents' goal line than both the ball and the second-last opponent

A player is not in an offside position if:

he is in his own half of the field of play or

he is level with the second-last opponent or

he is level with the last two opponents”

The current paper will deal with the most common scenario, not taking into consideration the ball’s position, as ball tracking, or rather detection, in a still image is a tedious task that is beyond the scope and means of our implementation. More specifically, a player is considered to be offside if, at the exact moment a team-mate situated behind him passes the ball towards him, his position is more advanced than that of the last opponent in reference to the opposite team’s goal line.

Figures 1.1 and 1.2 show two different moments of play where one of the players from the attacking team is in an offside position (Figure 1.1) or not (Figure 1.2).

Figure 1.1 The right-most red player (attacker) is in an offside position since he is closer to the goal-line than the last defender from the opposite team

Figure 1.2 The red most advanced player is not in an offside situation since he is behind the line that marks the last blue player

It is worth pointing out that most technologies previously mentioned make use of several high-speed cameras, some capable of shooting at 500 fps [12], strategically placed and fine-tuned so as to provide accurate results in the shortest time possible. Given the lack of image sets acquired by such systems, much less the systems themselves, the current paper addresses the possibility of offside position detection from a single static image, taken by a single camera, with zero parameters known beforehand. There are serious limitations that come with this approach, all of which will be explained in the paper; the work is a proof of concept of how image processing could be used in such instances, especially given access to powerful resources.

As such, the current paper seeks to analyze the unique, proprietary features of an image representing a moment of play and output a modified version of it where the line of the last defender is drawn; from there, the position of the most advanced forward with respect to it can clue in the viewer on whether he is in an offside position or not. We will see that, contrary to an initial assumption that analysis on a still image can be easily extended to video in a frame-by-frame manner, a video analysis with tracking of players would yield better results, albeit at a greater cost in time, effort and resources. The algorithm proposed was geared towards a generic approach, relying on statistical features that should be shared by images shot at football matches, but there are great variances that cannot be accounted for completely. Consequently, the algorithm will mostly be illustrated on a sample image, rather than a set of such samples, precisely because a single frame caught at a match does not yield sufficient statistical information that can be extended to all such instances.

Project Objectives

The main purpose of this project is to build an application that could act as a classifier which, provided with an image, could decide whether there are any attackers in a potential offside situation. The end goal is quite a simple one, but it requires several steps that could each be worthy of individual research papers. The limitations associated with the image acquisition resources slightly shift the original intent towards maximizing what can be achieved in the current conditions and exposing these limitations for what they are and what it would take to overcome them.

The steps that make up the general algorithm described in this paper can constitute individual, independent objectives on their own and can be formulated as such:

Given an image taken at a football match, remove everything that does not belong to the field - a background subtraction algorithm applicable to a static image that takes away banners, stands and graphical elements, but keeps the lines and players of the field

Perform player detection on a single static image - differentiate between blobs representing (residual) lines/patches/noise and players

Classify each actor in a static image shot at a football match as an attacker, defender, goalie or referee

Find the vanishing point of the ‘parallel’ horizontal lines of the field- the midfield line/goal lines

By combining the previous four results, output a modified version of the original image that is close to what TV replays showing offsides look like nowadays, with significantly fewer resources

Given the project objectives, it is easy to extract the functional and non-functional requirements of the project. The functional requirements follow from the formulated features and describe how the system should behave.

As such, the application should be able to perform:

background subtraction

player segmentation and clustering

line detection

calculation of vanishing point

contrast/color-related filtering

final offside line detection and drawing of line/offside area

The non-functional requirements refer to how the system’s operations can be evaluated based on different criteria:

Performance: the system should be able to output the final image in a reasonable time, less than three seconds

Accuracy: the line drawn should pass close to what the human eye identifies as a player’s most extreme point and appear parallel to the field’s lines

Usability: the application should be very easy to use, since it has a straightforward objective, and the intermediary results it outputs should let the user know which step of the process was responsible for a potentially faulty final output

Bibliographic research

Problem context - sport events image/video analysis

Image analysis of sport events has long been approached in the specialized literature, but even more so in recent years, as technology has advanced, stakes were raised and sport organizations, teams and spectators all have something to gain from it. Whether it is goal-line technology or detailed information about players’ movement around the field, statistics related to distances covered, effort and other detailed reports that help the teams identify strengths and weaknesses in an objective manner, there is no denying that the use of technology in sports comes with great benefits.

Player tracking thus becomes an important task in computer vision, with specialized systems that provide statistics for individual players proving to be very profitable and serving the great majority of professional teams in elite football.

The needs and benefits of such technology become obvious when one thinks about the effort involved in performing the same task manually, presumably analyzing the same video sequence a number of times to keep track of every player individually. Professional systems such as Prozone [13] or Sport Universal [14] can provide statistics for all players, shortly after the match has ended in the first case, or even in real time for the latter, which is truly remarkable and can constitute a highly valuable tool in managing a team.

The task of player and ball tracking is a highly challenging one, perhaps much more complicated than the detection of pedestrians for example, since the players often wear similar kits, the scene is subject to light and shadow variations, and the dynamics and constant changing of positions are even more problematic. Another source of worry is player occlusion, which happens very often in almost every sport, especially when the cameras acquiring the images are placed close to the field and not high enough.

While the current paper is not concerned with player or ball tracking in a video sequence, but rather with correct detection and clustering in a single, static image, several techniques carry over between the two approaches in analyzing a scene from a sports event, a football match in this case.

Players tracking

As motivated before, the tracking of players in a video sequence is in no way a trivial task but at the same time, if performed correctly, it can provide valuable information for fans and managers alike, such as distance and main areas covered by a particular player, maximum speed and so on, metrics that would be very hard to account for manually.

For such a challenging task, numerous papers have tackled it from different perspectives- a single broadcast camera like Liu Jia et al. [2], a dedicated, manually placed camera, as described by Dearden et al. in [3] or even several cameras, Ming et al [4][9][10]. The concepts used in the algorithm range from the background subtraction found in most papers, to unsupervised learning, Markov Chain Monte Carlo association[2], convolutional neural networks and Adaboost[5], Support Vector Machines[6][8], template tracking with Kalman filters[7] or Histogram of Oriented Gradients[8].

In [2], Liu Jia et al. provide player appearances in the form of hundreds of samples for unsupervised learning, which are then used for labeling the players - team #1, team #2, referees - while players’ position and scale are determined by a boosting-based detector.

Figure 3.1 System framework of [2]

The system framework is illustrated in Figure 3.1 and it can be summarized as a two-pass video scan. During the first pass, the dominant color of the field is obtained by accumulating color histograms, and the unsupervised learning of player samples gives the player models. In the second pass, the playfield is segmented based on the dominant color learned in the first pass, a boosting detection (based on Haar features) locates the players, and the learned models label them. Finally, the MCMC association generates the players’ trajectories.

The algorithm was tested with sample videos from the World Cup 2006 with the following results:

Table 3.2 Detection results of [2] algorithm

Figure 3.3 Results of tracking the players in [2]

Another solution is offered by S. Mackowiak in [8], who uses the widely popular HOG method from pedestrian detection, along with Support Vector Machines, in addition to the already common tasks of dominant-color-based segmentation and playfield modeling based on the Hough Transform.

Figure 3.4 System overview of [8]

Figure 3.5 Player detection and playfield fitting results of [8]

The same issues identified in [2] are apparent in [8] as well, relating to player occlusion and the scenarios where the overlap is too significant and makes the detection fail.

Ming et al. [4] and Hamid et al. [9] make use of multiple fixed-point cameras to overcome the occlusion problem and other clutter in the scene, trying to fuse the observations obtained from each camera. Hamid et al. do this by iteratively finding minimum-weight K-length cycles in a complete K-partite graph, whose nodes are represented by the blobs of detected players. The edge weights of the graph represent a function of pair-wise similarity between blobs observed in camera pairs and their corresponding ground plane distances. Another strong point of this algorithm is the minimal supervision needed.

Figure 3.6 Example set of player locations and the corresponding K-partite graph, where K=3

Figure 3.7 Actual offside detection steps making use of the 3 simultaneous recording cameras

Another attempt at detecting offside situations, by a 3D reconstruction based on the synchronized feeds of six cameras, is described by T. D’Orazio et al. [10]. The cameras, placed three on each side of the field with their optical axes parallel to the goal line to reduce perspective errors, are responsible for acquiring the images used to detect the players’ and ball’s positions in real time. Furthermore, the player providing the pass is identified and then it can be decided whether an active offside situation has occurred.

Figure 3.8 Scheme of visual system

The conclusions of this paper show that it is indeed feasible to have real-time detection of offside positions, given powerful enough resources, but even so, the success rate is still unacceptable for such a system to actually be adopted by football organizations without further improvements. Despite its complexity, it is suggested that the addition of a few other cameras would increase this rate; ultimately, in its current form, the algorithm could still be used in an attempt to help the referees make decisions, instead of making the decisions for them.

Offside detection from a still image

While the previously mentioned works are mainly concerned with the problem of player tracking in video sequences, a similarly themed paper on the subject of offside detection in static images was written by Davide Devescovi [10]. The main steps of the algorithm are very close to the final form of this paper’s solution, but their actual implementation differs greatly.

Steps:

Field extraction

Find field mask

Find player mask

Cluster players

Find field lines and vanishing point

Draw offside line

For example, the field extraction method is based on [11], executed in the RGB space, as opposed to the HSV space we used. Given the RGB histograms’ maximums and some threshold values for each component, the following formula was derived for the binarization of the image:

Figure 3.9 Binarization of the image based on the histogram peaks Rpeak, Gpeak and Bpeak and pre-established thresholds Rth, Gth and Bth. A pixel is considered a field pixel if its differences in R, G, B values from the histogram’s peaks are all less than the fixed thresholds and the green value of the pixel’s color is greater than the blue and red ones

The lines detected as part of step 5) with the Hough Transform are filtered based on their slope; after this first round of discarding lines, we also discard any line that is too ‘similar’ in angle to an already processed line (probably part of the same real line, detected as multiple lines due to thickness). The final part consists of converging towards the vanishing point by calculating the intersection of every pair of lines and averaging the resulting point with the previous position.

Figure 3.10 Line detection and vanishing point detection in [10]

The conclusions of this paper are close to what we experienced as well, in that it is way too complex a task to draw the offside line for all scenarios. With only one frame and no prior knowledge, the scene does not contain enough valuable information regarding textures, players’ positions etc., so in the end the author used hardcoded players’ masks for testing a second picture and dealing with player occlusions.

Figure 3.11 Final result of algorithm presented in [10]

Analysis and Theoretical Foundation

The algorithm proposed in this paper can be seen as a sequence of steps, but behind every one of those steps there is some elementary theoretical foundation, ranging from color spaces used, to line detection algorithms, clustering and so on. We shall look at each of these concepts individually, explain why certain solutions were preferred to others and their role in the grand scheme of things that is the final procedure.

Color spaces: RGB vs HSV

The images used as samples for the application are originally represented in the RGB space. In the previously mentioned paper [10], the initial binarization of the image was made based on the R, G and B values of individual pixels. While this is easier to process from a mathematical point of view, the HSV space is popularly used in computer vision and image processing as a better alternative to the RGB space, since the latter fails to separate subtleties that come with illumination, shadows and other noise from the true color components, so much so that the dominant RGB component in a green light might not be green, for example.

Numerous papers have shown the HSV space is much better suited for image processing, especially in the case of object detection where static background subtraction precedes it. Since the current paper addresses the idea of background subtraction as well, in addition to the more perceptive and intuitive feel of this space, the HSV space was used, by converting the original image via built-in functions.

While a color in the RGB space lends itself more to hardware applications, a color in the HSV space has three main components that are closer to how humans perceive and interpret color, mapped on a cone instead of the RGB cube:

Hue- defines the color itself, representing an angle going from 0 to 360 degrees, starting at 0 with red and going through green, blue and other intermediary colors before completing the circle

Saturation- defines the hue’s dominance in the color, such that on the outer edge of the cone we have ‘stronger’, more ‘pure’ colors, but as we move towards the center, we have grayscale tones, i.e. desaturated colors, where no hue is dominating; goes from 0 (the cone’s vertical axis, fully desaturated) to 1 (fully saturated)

Value- refers to the lightness or darkness of the color; goes from 0 (the darkest color, at the tip of the cone) and increases in lightness towards the base of the cone, up to 1

Figure 4.1 Cone representing the HSV color space vs the cube of the RGB color space (Chroma=Saturation)

While the OpenCV built-in function cvtColor(src, hsv, CV_BGR2HSV) does the conversion automatically, this can be done by hand given the R, G, B values using the formulas in (4.2), according to Travis in [20].

$V = \max(R, G, B)$, $\qquad S = \dfrac{V - \min(R, G, B)}{V}$ if $V \neq 0$, otherwise $S = 0$

$H = \begin{cases} 60\,(G - B)/\Delta & \text{if } V = R \\ 120 + 60\,(B - R)/\Delta & \text{if } V = G \\ 240 + 60\,(R - G)/\Delta & \text{if } V = B \end{cases}$, where $\Delta = V - \min(R, G, B)$, adding 360 to $H$ if it comes out negative (4.2)

Background subtraction, finding HS histogram’s maximums

Background subtraction has been the subject of a great deal of research, some algorithms and solutions having already been mentioned in Chapter 3 of this paper. It is usually considered a mandatory step in trying to localize regions of interest in an image, all the more in detecting moving objects in videos, on the basis that the background does not change from one frame to another, as opposed to the objects in question. The current paper does not address video processing; however, it does have to deal with a slightly different connotation of the expression ‘background subtraction’. To be able to localize players and differentiate them from fans, for example, we need to delimit the field and subtract everything else, be it stands, logos etc.

The basic idea for this process is that the largest part of the picture is supposed to be represented by the field, which, in turn, should be mainly green, perhaps in different green tones. By binarizing the image in such a way that every field pixel (that is, every green pixel, not players or field lines) is turned black while everything else turns white, we can get a sense of the image. We might have some noise left in the stands/banners/field itself where there are still some green tones, but the largest part of the image should now be a black polygon, representing the field, which can be further processed to effectively obtain the regions of interest, i.e. the players. This can be done by performing some morphological operations to remove the outsides of the field, after the field itself has been delimited by means of boundary detection.

As for the actual binarization process, we performed color segmentation in both spaces, RGB and HSV with mixed results. One obvious, brute classifier for field pixels in the RGB space was to consider every pixel in the image pertaining to the field if its green value was greater than both the red and blue one. Unsurprisingly, this did not yield great results, nor did a threshold difference in color values between the green channel and the other two ones.

So instead of using the RGB space as in [10], the image was converted to the HSV space and a 2D HS histogram was calculated for the sample image. A matrix HS[180][256] was created, where HS[i][j] = how many pixels in the image have a hue of value i and a saturation of value j.
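As a minimal sketch (assuming hsv is the CV_8UC3 result of the cvtColor conversion mentioned in the previous subchapter), the histogram can be accumulated by hand:

for(int i = 0; i < hsv.rows; i++) {
    for(int j = 0; j < hsv.cols; j++) {
        Vec3b p = hsv.at<Vec3b>(i, j);  // p[0] = hue, p[1] = saturation, p[2] = value
        HS[p[0]][p[1]]++;               // one more pixel with this (hue, saturation) pair
    }
}

where HS is the 180×256 integer matrix, initialized with zeros.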

Figure 4.3 The HS histogram viewed in gray levels and zoomed in (closer to white means a greater number of pixels of that hue and saturation)

Figure 4.4 There are 5650 pixels in the image having hue=43 and saturation=171, still less than the 5764 pixels having the same hue but a greater saturation of 176.

It is to be noted that the 180 and 256 dimensions come from OpenCV’s interpretation of valid ranges for the HSV space: hue range is [0, 179], saturation range is [0, 255], and value range is [0, 255].

The analysis of the HS histogram provides interesting information, perhaps more so than the RGB one, and there are several interpretations that could shape the next steps of the algorithm. In order to have a generic view of what is happening in an image, regardless of field texture, lighting conditions and so on, it follows that a statistical measure should be used to filter out non-field pixels. Finding and using the HS histogram’s maximums lends itself to this purpose easily.

Choosing a window size wsize and a threshold th, we iterate through the histogram’s points and compute the average value of the window of width (2*wsize+1), centered at the current point. If we find that a point is a local maximum for the window but also greater than the average by at least th, then we consider the point to be a local maximum and we store its coordinates. Different window sizes and thresholds yield different results.

One way of using these local maximums is then to iterate over the HSV image pixel by pixel and, assuming the maximums pertain to the field parts of the image (possibly with another condition on the hue’s range), turn black the pixels that are close to any of the found maximums, given a delta hue and a delta saturation. The hue and saturation of a pixel vote together against all the maximums found.

However, this is problematic as it is not very generic, having 4 variables to set manually- the deltas, wsize and th. One reasonable way to get rid of at least two of them, the deltas, would be to sort the array of maximums found, first by hue, then by saturation and compute them as:

deltaHue = maxHue – minHue + const1;

deltaSat = maxSat – minSat + const2;

One might argue that we got rid of deltaHue and deltaSat only to introduce const1 and const2 instead; however, the added constants contribute little to the final result, as opposed to before, when a deltaHue would need an adjustment of 10 (ten) from one picture to another. Several tries have shown that good values for const1 and const2 are 2 and 30, respectively, so the final binarization equation becomes:

$dst(x, y) = 0$ (field pixel) if there is a local maximum $(H_k, S_k)$ such that $|H(x, y) - H_k| \le \Delta_{hue}$ and $|S(x, y) - S_k| \le \Delta_{sat}$, with $38 < H(x, y) < 80$; $dst(x, y) = 255$ otherwise (4.5)

An example of how this binarization works can be seen in figure 4.6, other images being presented in the next chapter as well.

Figure 4.5 Source image

Figure 4.6 Output after the binarization process

It is also important to note that the result of the binarized image needs to also preserve the lines of the field, aside from the players themselves, a task that is not so trivial, given the slight difference in tones from the main greens of the field. The next sections will detail the importance of retaining the lines and how they can be used for further processing.

Morphological operations

Mathematical morphology plays a great part in the pre- and post-processing of images. In an effort to detect objects in an image, one can perform the binarization and then apply some morphological operators, most of which are quite simple. The two fundamental morphological operations are dilation and erosion; several other operations are based on a combination of the two.

Dilation

Dilation is one of the two fundamental morphological operations and uses a structuring element in order to probe and expand the shapes that are contained in an image. It can also be thought of as a combination of two sets by using vector addition of set elements, such that for the set of black pixels in an image, a dilation with the structuring element B is obtained by adding the points of B to the underlying point set of black pixels. Even though dilation is based on set operations, it is reminiscent of convolution, since B is practically flipped around its origin and slid around the set/image A.

The main use of dilation is to fill holes in an image or to expand a shape, as is the case with OCR for example, where gaps in the letters are being bridged.

The main notation that is being used is:

$A \oplus B = \{\, z \mid (\hat{B})_z \cap A \neq \emptyset \,\}$,

where A and B are sets in Z^2, based on obtaining the reflection of B about its origin and shifting this reflection by z. The result is then given by the set of all displacements z such that B and A overlap by at least one element.

Figure 4.7 a) Original image; b) image dilated by the 3×3 rectangle structuring element

Erosion

Erosion is the second fundamental morphological operation and can be considered dilation’s counterpart, since it can also be obtained by dilating the complement of the black pixels and then taking the complement of the resulting point set.

Once again, we need a structuring element B that is translated across the initial set of points A. For each pixel in A we superimpose the origin of B and if B is completely contained in A, we keep the pixel, otherwise we remove (erode) it.

The main application of erosion is to filter out noise or other unwanted objects, or even just to break the weak connecting lines between object and non-object pixels. In the context of this paper though, erosion is heavily used to ‘break’ apart two players appearing as one.

The main notation being used is:

$A \ominus B = \{\, z \mid (B)_z \subseteq A \,\}$

Figure 4.8 Image before and after erosion with a 3×3 rectangle structuring element (all 1’s)

Opening

Opening is obtained by first applying an erosion, followed by a dilation with the same structuring element; it has a similar effect to erosion, but is less destructive.

In that sense, it does remove some of the foreground pixels from the edges of regions of those pixels, but it does preserve the regions that have a shape close to that of the structuring element or completely enclose it.

Another important property of the opening is its idempotence. This means that once an opening has been applied, any other successive application with the same structuring element will have no effect on the final image, since the result of the first one gives boundaries that completely fit the kernel.

The general notation is:

$A \circ B = (A \ominus B) \oplus B$

Figure 4.9 Effect of opening an image with a 3×3 rectangle structuring element

Closing

Similar to the opening operation, closing is also obtained by a combination of dilations and erosions, this time in reverse order though, that is a dilation followed by an erosion with the same structuring element.

Since the dilation is applied first, the effect of the closing is somewhat similar to it, in that it tends to expand the boundaries of foreground regions in the image, but not overly so, keeping more of the original shape.

Similarly to its dual operator, after having been carried out once, any new closing with the same structuring element does not change an image anymore and it can be used to filter out different shapes.

The general notation is the following:

$A \bullet B = (A \oplus B) \ominus B$

Figure 4.10. Before and after closing with the same 3×3 structuring element as before
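For illustration, all four operations map directly onto OpenCV primitives; the following sketch (the 3×3 rectangular kernel mirrors the one used in the figures above, while the variable names are ours) shows the correspondence on a binary image binary:

Mat kernel = getStructuringElement(MORPH_RECT, Size(3, 3));
Mat dilated, eroded, opened, closed;
dilate(binary, dilated, kernel);                   // expand shapes, fill holes
erode(binary, eroded, kernel);                     // shrink shapes, remove small noise
morphologyEx(binary, opened, MORPH_OPEN, kernel);  // erosion followed by dilation
morphologyEx(binary, closed, MORPH_CLOSE, kernel); // dilation followed by erosion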

Contour extraction based on its features

In finding the regions representing the players, one would first have to find the field’s mask, which is assumed to be the largest contour in the image, and then the contours contained inside it. As previously mentioned, we also need to retain the lines of the field for further processing, so between the players there would be several other contours detected, even after performing some morphological operations, if the lines are very visible. Any further erosion to get rid of the lines would affect the contours of the players, so another way of filtering out shapes is needed.

This is where contour filtering based on features comes in, features that can range from size, color, intensity, orientation, slope and so on. The next chapter will present in more detail which of the features gave the best results, but a few valuable features that apply to a great range of object-filtering tasks, besides the obvious area, perimeter, bounding box etc., are listed below; a code sketch follows the list:

Aspect ratio: represents the ratio of width to height; once the bounding rectangle of a contour has been obtained, the formula follows easily as aspect_ratio = width/height; one particular useful application of this is in number plate recognition

Extent: represents the ratio of contour area to bounding rectangle area

Solidity: ratio of contour area to its convex hull area

Orientation: angle at which object is directed

Mean color/intensity: the average color or intensity in case of grayscale images

Extreme points: the leftmost, rightmost, topmost and bottommost points of a contour
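As a sketch, these features follow from a handful of OpenCV calls for a single contour cnt (the variable names are illustrative):

Rect box = boundingRect(cnt);
double area = contourArea(cnt);
double aspectRatio = (double)box.width / box.height;      // width to height ratio
double extent = area / (double)(box.width * box.height);  // contour area / bounding rectangle area
vector<Point> hull;
convexHull(cnt, hull);
double solidity = area / contourArea(hull);               // contour area / convex hull area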

Edge detection - Canny algorithm and Hough’s transform

Canny’s edge detection algorithm

Edge detection has been the subject of several research papers and numerous algorithms exist for this purpose, but one of the most popular ones is the Canny detection algorithm. The three main objectives this algorithm seeks to achieve are:

minimize the number of falsely detected edge points that appear in the image due to noise; can also be formulated in terms of maximizing the signal to noise ratio and thus the probability of obtaining real edge points

a good localization of edge points, i.e. the potential detected edge points are as close as possible to the real edges in the image

the so-called non-maximum suppression of the gradient magnitude, that is, minimizing the number of positive responses around a single edge

In order to achieve these objectives, the Canny algorithm can be decomposed in the following four steps:

filter out the noise in the image that is responsible for false edge points with a Gaussian kernel

compute the image gradient (a vector that points in the direction of the intensity variation around that particular point)

non-maximum suppression of the gradient’s magnitude in order to thin out the produced edges, which is done by only keeping the points having the greatest gradient magnitude in the direction of the image intensity variation

remove edges caused by noise and link those edges that are ‘valid’ by the so-called hysteresis thresholding

While the first step is pretty self-explanatory, the gradient’s computation has a sound mathematical foundation as the gradient itself can be defined as a continuous function made up by the partial derivatives in the x and y directions, as given by (4.11).

$\nabla f = \left( \dfrac{\partial f}{\partial x}, \dfrac{\partial f}{\partial y} \right)$ (4.11)

Figure 4.12 Illustration of the image gradient

However, in the image space, we approximate this by making ∆x = ∆y = 1, to:

$\dfrac{\partial f}{\partial x} \approx f(x + 1, y) - f(x, y), \qquad \dfrac{\partial f}{\partial y} \approx f(x, y + 1) - f(x, y)$ (4.13)

The components of the gradient in the x and y direction could also be calculated by the convolution of the image with the Prewitt(4.14) or Sobel(4.15) kernels:

Prewitt:

$P_x = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}, \qquad P_y = \begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$ (4.14)

Sobel:

$S_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad S_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$ (4.15)

The gradient being a vector, it is defined by both a magnitude (4.16) and a direction.

$|\nabla f| = \sqrt{\left(\dfrac{\partial f}{\partial x}\right)^2 + \left(\dfrac{\partial f}{\partial y}\right)^2}, \qquad \theta = \arctan\left(\dfrac{\partial f/\partial y}{\partial f/\partial x}\right)$ (4.16)

For the third step of Canny’s algorithm, the direction of the gradient obtained in the previous step can be put into one of the 8 bins covering the [0-360] space. The edge points that are kept in the image are those whose gradient magnitude is greater than that of both its neighbors in the direction of the gradient.

Figure 4.17 Gradient directions split into bins

So for example if the current point being analyzed has a gradient direction of ’1’ as described in Figure 4.17, it would only be retained as an edge point in the final image if its magnitude would be greater than that of the neighboring points in its north-east and south-west directions.

The last step concerned with eliminating false edge points and linking valid ones can be summed up as:

discard all points with a gradient magnitude lower than some threshold value tlow.

accept all points that have a gradient magnitude higher than another threshold value thigh; these are also called strong edge points

for points with a gradient magnitude situated between the two thresholds tlow and thigh, retain only those (called weak edge points) that are connected to strong edge points by a continuous chain of strong and weak edge points

Figure 4.18 Result of the Canny edge detection algorithm: left - original image, right - the edges that were detected

Hough’s classical transform for line detection in an edge image

As an alternative to the computationally expensive $O(n^2)$ detection of lines through pairs of points, Hough’s classical transform solves one of the most important image processing problems - line detection in an image containing a set of interest points. Already having the advantage of counting, in real time, the number of points placed on each possible line in the image, Peter Hough’s work became even more popular after the publications of Duda and Hart.

An edge detection pre-processing step is usually desirable before applying the Hough Line Transform, as is any form of filtering out unwanted noise.

The starting point of the algorithm is given by the line’s dual representation:

in the Cartesian coordinate system, with the slope and intercept parameters $(m, b)$: $y = mx + b$

in the polar coordinate system, with parameters $(\rho, \theta)$: the distance from the origin to the line and the angle of its normal

Figure 4.19 Polar coordinate system

From figure 4.19, we can deduce the following equation for a line:

$\rho = x \cos\theta + y \sin\theta$ (4.20)

This can be rewritten such that for every point (x0, y0) in the Cartesian system, we can define the family of lines going through that point as:

$\rho(\theta) = x_0 \cos\theta + y_0 \sin\theta$ (4.21)

Based on (4.21), if we were to plot the family of lines passing through the point P(8,6), we would get the following sinusoid in the $(\theta, \rho)$ space:

Figure 4.22 The family of lines going through P(8,6)

However, if we were to add the family of lines passing through other two points, say (9,4) and (12,3), we would get the following plot:

Figure 4.23 Family of lines going through points P1(8,6), P2(9,4) and P3(12,3)

The three curves describing the various lines passing through each of the three points intersect at the same point, which is representative of the line that contains all three points. From this we can derive the main idea of the Hough transform: the more curves intersect at a point in the $(\theta, \rho)$ space, the more image points lie on the line that point represents, so we can detect lines based on a threshold value TH. If the number of intersections is greater than TH, we store the line with its parameters.

The results of applying the classical Hough Transform are illustrated below:

Figure 4.24 Original input image

Figure 4.25 Lines found for a threshold of 70. If we were to increase this threshold, we would get fewer lines, since more points would be needed to describe such a line
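Both steps are available as OpenCV primitives; a minimal sketch, with illustrative threshold values (the accumulator threshold matches the 70 used above):

Mat gray, edges;
cvtColor(src, gray, CV_BGR2GRAY);
GaussianBlur(gray, gray, Size(5, 5), 1.4);     // Canny step 1: Gaussian noise filtering
Canny(gray, edges, 50, 150);                   // tlow = 50, thigh = 150 (illustrative)
vector<Vec2f> lines;                           // each detected line stored as (rho, theta)
HoughLines(edges, lines, 1, CV_PI / 180, 70);  // 1 pixel rho step, 1 degree theta step, TH = 70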

Vanishing point

Another important aspect worth drawing attention to is the camera perspective. We assume all pictures are taken with a camera from a certain height, but the view itself is not a top-view one.

Lines of the field are parallel in real life and in top-view pictures like Figure 4.26. Referees themselves make offside calls with reference to the horizontal lines of the field, and often enough perspective is an issue for them as well: they can be tempted to consider a player further in the background to be closer to the goal than one situated in his proximity. Research has shown that wrong offside calls should not even be considered a mere source of human error, since by the time the eyes and brain register the positions of the players involved, the positions have already changed.

Figure 4.26 Top-view of the football field

In any other image captured from a regular camera, the parallel lines that cross the field do not appear as parallel at all, but instead intersect in an imaginary, vanishing point. In order to draw the offside line at the last defender, which needs to appear as parallel to the other ones, we would first need to find the vanishing point and then unite it with the most extreme point of said defender, to make it look realistic.

The problem of finding the vanishing point itself is trivial, since it is in no way different from finding the intersection of two regular lines; the only abstraction is that the lines have an intersection point only at the image level, not in real life. As such, supposing we have at least two lines, L1 and L2, given by two points each, P1(x1,y1), P2(x2,y2) and P3(x3,y3), P4(x4,y4) respectively, we can compute the intersection point as given in (4.27)

$P_x = \dfrac{(x_1 y_2 - y_1 x_2)(x_3 - x_4) - (x_1 - x_2)(x_3 y_4 - y_3 x_4)}{(x_1 - x_2)(y_3 - y_4) - (y_1 - y_2)(x_3 - x_4)}, \qquad P_y = \dfrac{(x_1 y_2 - y_1 x_2)(y_3 - y_4) - (y_1 - y_2)(x_3 y_4 - y_3 x_4)}{(x_1 - x_2)(y_3 - y_4) - (y_1 - y_2)(x_3 - x_4)}$ (4.27)
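Translated directly into code, (4.27) becomes the following sketch (the function name is ours; a zero denominator means the lines are parallel and must be handled by the caller):

// Intersection of the line through p1, p2 with the line through p3, p4.
bool intersect(Point2f p1, Point2f p2, Point2f p3, Point2f p4, Point2f& out)
{
    float d = (p1.x - p2.x) * (p3.y - p4.y) - (p1.y - p2.y) * (p3.x - p4.x);
    if (d == 0) return false;             // parallel lines, no intersection
    float a = p1.x * p2.y - p1.y * p2.x;  // cross term of the first line's points
    float b = p3.x * p4.y - p3.y * p4.x;  // cross term of the second line's points
    out.x = (a * (p3.x - p4.x) - (p1.x - p2.x) * b) / d;
    out.y = (a * (p3.y - p4.y) - (p1.y - p2.y) * b) / d;
    return true;
}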

Clustering

Found under different names depending on the literature’s context, be it unsupervised learning, numerical taxonomy, typology or partitioning, clustering is basically categorizing entities. Such an activity comes naturally to humans, given the vast amount of information we receive every day in an attempt to keep track of everything. Clustering can range from telling a male apart from a female, a chair from a table, any entity can be assigned to a cluster. The end result is a group of entities, the cluster, that is defined by the attributes shared by all entities that form it.

The ‘membership’ to a cluster is so significant that a human can draw conclusions about an entity, such as a Golden Retriever barking on the street, irrespective of the fact that this particular dog has not been encountered before: it pertains to the cluster ‘dog’, among whose particularities is ‘barking’.

In order to perform the task of clustering, several steps must be taken by the expert and they can be summed up as follows:

Feature selection- given the task of interest, select the features that are most relevant and describe the entities best with as little information as possible, to avoid redundancy

Proximity measure- given two feature vectors, define the measure that tells you how similar or dissimilar those two vectors are

Clustering criterion- can be expressed either by a cost function or another combination of rules, with different sensitivities

Clustering algorithm- putting together the information gathered in the previous steps, the general schema for the clustering task can be built

Validation of results- after the actual clustering was performed, one can check its correctness via appropriate sets of tests

Interpretation of results- results obtained upon performing the task of clustering can be combined with other related knowledge or evidence in order to ensure the conclusions drawn are correct

There are several types of clustering algorithms, each coming with advantages and disadvantages. One of the simpler ones is the sequential clustering algorithm, whose strength comes from its straightforwardness and speedy resolution, but which is dependent on the order in which the vectors are presented to the algorithm.

Figure 4.28 Result of clustering

BSAS - Basic Sequential Algorithmic Scheme

In order to describe the BSAS, we start off with a few notations:

d(x,C) – distance(dissimilarity) between a feature vector x and the cluster C; can be given by all vectors of C, a representative one, a mean vector etc

Θ – the maximum dissimilarity allowed, that is, if a vector x has a distance d(x,C) > Θ, it should not be assigned to the cluster C

q – the maximum number of clusters allowed

m – the number of clusters created until now

A short schema of the algorithm can be presented as:
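(a pseudocode sketch using the notations above; the representative update in the last step is optional)

m = 1                                  // number of clusters created so far
C1 = { x(1) }                          // the first vector starts the first cluster
for i = 2 to N
    find Ck such that d(x(i), Ck) is minimum over all existing clusters
    if d(x(i), Ck) > Θ and m < q then
        m = m + 1                      // dissimilar enough and room left:
        Cm = { x(i) }                  // start a new cluster
    else
        Ck = Ck ∪ { x(i) }             // assign to the closest cluster
        update the representative of Ck if necessary
    end if
end for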

So, starting with a single cluster C1 whose only member so far is the feature vector x(1), we compute the distance from each new feature vector x(i) to all the clusters created so far. If the distance is greater than the maximum allowed dissimilarity and we have not yet reached the maximum number of clusters that can be formed, we start a new cluster to which we add x(i). Otherwise, x(i) is assigned to the closest existing cluster. Whenever a new feature vector is added to a cluster, its representative might be updated, but this is an optional step and is highly dependent on the problem’s context.

The distance d(x,C) itself can be calculated in a lot of ways, some of the more popular choices being:

$d(x, y) = \sqrt{\sum_{i=1}^{l} (x_i - y_i)^2}$ (the Euclidean distance) (4.29)

or

$d(x, y) = \sum_{i=1}^{l} |x_i - y_i|$ (the Manhattan distance), (4.30)

where x and y are feature vectors from the data set X.

When a cluster is newly formed, its representative is given by its only member, so d(x,C) = d(x, mC); but supposing the representative feature vector of the cluster is chosen as the mean feature vector, every time a new point is added to the cluster we need to update the cluster’s representative as in (4.31).

$m_{C_k}^{new} = \dfrac{(n_{C_k} - 1) \cdot m_{C_k}^{old} + x}{n_{C_k}}$, (4.31)

where $n_{C_k}$ is the number of members of $C_k$ after the assignment of x.

It is easy to see that the algorithm is dependent on the order in which feature vectors are evaluated, same as it is on the maximum number of clusters/dissimilarity threshold, any small change in values potentially leading to entirely different cluster formations.

Proposed solution

Having presented the details behind the main concepts being used as part of the offside detection algorithm, we can make a recap of how they fit in the grand schema:

1) HSV space conversion and determining the local maximums of the HS histogram for extraction of the field

2) Morphological operations to clean up the image of as much noise as possible, while still staying as close as possible to the original contours of the players

3) Contour finding and filtering based on features like area, aspect ratio, extent in order to detect the players’ mask and separate them from other objects present in the field’s region, such as field patches, goal lines and so on

4) Canny edge detection and Hough transform to find the ‘vertical’ lines crossing the field

5) Basic Sequential Algorithm Schema to perform the clustering of players into team #1, team #2, referees, goalkeepers

6) Finding the vanishing point as an intersection of two of the lines found in step 4), candidate lines being those that make an angle greater than a threshold value with each other, so as not to analyze two nearly identical lines

7) Uniting the vanishing point with the last identified defender and marking the offside line and corresponding out of play area
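Put together, the pipeline can be sketched as a chain of calls (the function names below are purely illustrative, not the application’s actual API):

Mat src = imread("match.jpg");
Mat hsv, binary, fieldMask, playersMask;
cvtColor(src, hsv, CV_BGR2HSV);                          // 1) HSV conversion
binary = binarization(hsv);                              // 1)-2) histogram maximums + morphology
fieldMask = findFieldMask(binary);                       // largest contour = the field
playersMask = filterPlayerContours(binary & fieldMask);  // 3) feature-based filtering
vector<Vec2f> lines = detectFieldLines(src);             // 4) Canny + Hough
vector<int> teams = clusterPlayers(src, playersMask);    // 5) BSAS clustering
Point2f vp = vanishingPoint(lines);                      // 6) intersection of two lines
drawOffsideLine(src, vp, teams, playersMask);            // 7) final output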

Detailed Design and Implementation

Figure 5.1 Overall view of the algorithm. Each step is detailed in the next sections

Background subtraction

The background subtraction method that performs the binarization of the image eliminates most of the objects that are not ‘on the field’, by classifying pixels as field pixels or not, based on their HSV color properties. The classifier’s function has been given in (4.5); the pseudocode for obtaining the local HS histogram maximums, as well as the final binarization loop, is presented below, as part of the binarization() method.

// Slide a (2*wsize+1) x (2*wsize+1) window over the HS histogram binVal
// and collect the local maximums that exceed the window average by th.
for(int a = wsize; a < 180-wsize; a++) {
    for(int b = wsize; b < 256-wsize; b++) {
        max = binVal[a-wsize][b-wsize];
        sum = 0;
        for(int i = a-wsize; i <= a+wsize; i++) {
            for(int j = b-wsize; j <= b+wsize; j++) {
                sum = sum + binVal[i][j];
                if(binVal[i][j] > max)
                    max = binVal[i][j];
            }
        }
        binValAvg = sum / ((2*wsize+1)*(2*wsize+1));
        // keep (a, b) if it is the window's maximum and stands out from the average
        if(binVal[a][b] == max && binVal[a][b] > binValAvg + th) {
            newMax.i = a;
            newMax.j = b;
            maxHistogram.push_back(newMax);
        }
    }
}

Having previously set wsize to a value of 5, we move around the HS histogram with a window of size 11×11 and, if the value centered in that window is greater than the average value of the window’s cells by a threshold value th, set manually to 1000 (it could be normalized), and is also the window’s maximum, we consider it to be a local maximum of the histogram.

// Classify each pixel: a field pixel (black) must be within deltaHue/deltaSat
// of one of the local maximums and have a hue in the green range (38, 80).
for(int i = 0; i < hsv.rows; i++) {
    for(int j = 0; j < hsv.cols; j++) {
        Vec3b pix = hsv.at<Vec3b>(i,j);
        dst.at<uchar>(i,j) = 255;                    // default: non-field (white)
        for(int k = 0; k < maxHistogram.size(); k++) {
            if(abs(pix[0]-maxHistogram[k].i) <= deltaHue && abs(pix[1]-maxHistogram[k].j) <= deltaSat
               && pix[0] > 38 && pix[0] < 80) {
                dst.at<uchar>(i,j) = 0;              // field pixel (black)
                break;                               // one matching maximum is enough
            }
        }
    }
}

Then, iterating through the pixels of the image converted to the HSV space, if the hue of the current pixel is within deltaHue of any of the histogram’s maximums, the same holds for its saturation with deltaSat, and the hue value is between 38 and 80 (a green tone), we consider it to be a field pixel and turn it black; otherwise we turn it white. deltaHue and deltaSat have been described before, obtained statistically as the difference between the greatest and lowest HS histogram maximums, plus some constants.

The method works surprisingly well for more than just one photo, in different lighting conditions and with different field textures, more scenarios being presented in the testing chapter of this paper.

For the current chapter, we will limit to presenting the results obtained on the single next image, as we go through the steps involved in the offside detection algorithm.

Figure 5.2 The input image used for the rest of this chapter: the blue team is attacking, the white one is defending, the yellow man is a referee and the red player is the white team’s keeper. The right-most blue player is in an offside position

Figure 5.3 The binarization result of Figure 5.2

Some noise is still apparent in Figure 5.3, but the main contours of the players have been preserved, as have the lines of the field, which is what we were interested in. However, some objects are connected, like the last couple of players or the player in the midfield and the surrounding circle, so eroding the image with a small structuring element is in order. This was followed by a dilation, so as to restore as much as possible of the original players’ masks, with the result given in Figure 5.4.

Figure 5.4 Aside from the fact that some players have been somewhat separated, we can observe that the midfield line appears to have disappeared; this is acceptable for the next step, where we wish to find the largest contour in the image, and such a line would break the field in half

Obtaining the field’s mask

After the binarization was performed and the lines that could have broken the field into two or more pieces have been eliminated, we can use OpenCV’s findContours() function in order to retrieve all the contours in the image, with the purpose of finding the greatest of them all, presumably represented by the field itself.

findContours(src, contours, hierarchy, CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE);

The method finds the contours in the source image src, stores them in the vector contours, where each element is itself a vector of points, and organizes them into a two-level hierarchy, based on the algorithm developed by Suzuki in [21].

While the fourth parameter of the function could be changed to CV_RETR_EXTERNAL to only keep the external contours and not the holes inside it as well, we stick to the original parameters and iterate through the contours to mark the index of the contour that has the greatest area of all.
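That iteration is a simple maximum search (a sketch; largest_contour_index is the same name used in the drawContours call below):

int largest_contour_index = -1;
double largestArea = 0;
for(size_t i = 0; i < contours.size(); i++) {
    double area = contourArea(contours[i]);   // area enclosed by contour i
    if(area > largestArea) {
        largestArea = area;
        largest_contour_index = (int)i;       // remember the biggest one, i.e. the field
    }
}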

Figure 5.5 Result of finding the contours on figure 5.4, if all contours are kept in the image

Figure 5.6 Result of the same function findContours, but choosing to only keep the external contours and not the ‘holes’ inside it as well

Once this contour has been identified, we draw it on a blank all-black image, by filling it with white.

drawContours(contourDst, contours, largest_contour_index, color, CV_FILLED, 8, hierarchy, 0);

Figure 5.7 The mask of the field given in white, so as to perform an AND operation between this and the binarized image containing the players’ contours as well

Figure 5.8 What the field’s mask translates to if moved back to its original colors and only have the stands/fans removed

Obtaining the players’ masks

After the field mask was detected, we can move further in detecting the players within the field’s boundaries. We start off by combining Figures 5.4 and 5.7, such that all white objects are potential candidates for the players’ masks. The quite crude initial result in Fig. 5.6 needs much better refinement.

Figure 5.6 The AND result between figures 5.4 and 5.7

From this point on, we need to filter out any unwanted groups of pixels that do not represent a player, and this is done based on the features of the contours, as presented in subchapter 4.4.

First, we remove any contours with an area smaller than a threshold value minArea, by iterating through the contours and filling those with an area <= minArea with black. While this does get rid of much of the noise and residual midfield line fragments, we still have a large concentration of pixels around the 16 m goal-line and near the goalkeeper that cannot be filtered out under the same conditions.

Figure 5.7. Result of elimination of contours based on area, some lines still apparent

To get rid of the lines we use the extent and aspect ratio features of the contours as follows:

if(aspectRatios[i] < 2.0 && aspectRatios[i] > 0.2 && extents[i] > 0.2)
    filteredContours.push_back(contours2[i]);
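where the two feature vectors can be filled in beforehand, for example as in this sketch (assuming contours2 holds the contours that survived the area filtering):

vector<double> aspectRatios, extents;
for(size_t i = 0; i < contours2.size(); i++) {
    Rect box = boundingRect(contours2[i]);
    double area = contourArea(contours2[i]);
    aspectRatios.push_back((double)box.width / box.height);      // very flat or very tall blobs -> lines
    extents.push_back(area / (double)(box.width * box.height));  // thin diagonal lines have a low extent
}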

Figure 5.8. The image with the players’ masks converted back to black and white, after all other contours have been eliminated based on extent and aspect ratio

After the labeling of the players is done, their most extreme points in both left and right directions are computed and shown as red (left) and green (right) dots. This helps, once the last defender and attacker plus the direction of play have been identified, in uniting the vanishing point with the representative point of the last defender in order to draw the offside line. The following two figures present the labeling result, first in brighter colors for effect, then in gray-scale for a greater ease in manipulating the blobs but also for a better contrast with the representative points.

Figure 5.9 The labeling result of the blobs representing the players

Figure 5.10 The same players’ masks, now with their extreme points marked in red and green

One might argue that taking the right-most and left-most pixels of the players is not exactly right, since it is again a matter of perspective, and they would not be wrong; but for simplicity reasons, and because the extremities are already affected by the erosion performed initially, we stick with the general course of the algorithm before getting lost in too many details.

Players clustering

An important part of the offside detection process is how to group together the players of the same team, how to separate them from referees and goalkeepers and so on. This is not trivial at all, since the most intuitive algorithms that come to mind all have obvious drawbacks, so a more elaborate clustering algorithm is necessary.

Before even going into the feature selection or proximity measures, it must be pointed out that clustering itself has seen countless variations and algorithms in the specialized literature, going from the popular K-means clustering to various hierarchical algorithms. However, K-means clustering does not apply to this problem, since we do not know in advance how many clusters there will be; realistically speaking, this number can range from 2 to 4: two clusters representing the teams, one cluster being the referees, and another one being represented by a goalkeeper (we can suppose one keeper appears in an image, but it is a stretch to assume both will; such cases are not taken into consideration in the scope of this paper).

Say we were to establish 2 as the number of clusters for the K-means algorithm: we could detect a referee/goalkeeper as a player from either team and that could interfere with the result of the algorithm. Similarly, if we were to extend this to 4 from the beginning, we might be in the situation where neither referee nor goalkeeper is in sight, and we would needlessly split the players into 4 groups, instead of two. Neither situation is desirable, nor acceptable, so the clustering algorithm chosen was BSAS (Basic Sequential Algorithmic Scheme), which is simple and flexible in terms of what we want and can achieve.

The algorithm’s steps have already been described in the previous chapter, we will concentrate on the alternatives for the feature selection, the proximity measure being more or less the same for all cases, given by a distance in histogram values.

Solution 1: for each player, obtain a mean color col(r,g,b); while this might work in some cases, it is bound to fail miserably for teams having combinations of shorts/t-shirt colors that converge to the same mean (e.g. one kit with shorts (100, 0, 100) and shirt (0, 200, 200), and another with shorts (200, 200, 100) and shirt (0, 0, 200), supposing the areas of the two kit pieces are close).

Solution 2: use the color that appears the most in a player’s mask; this makes sense since there are not bound to be many significant differences due to shadows or illumination to affect the RGB’s capabilities and it would be a well suited feature for the proximity measure both computationally and intuitively, but it has the drawback of perhaps finding a player in a position where the majority of his equipment pixels are given by his shorts instead of his t-shirt or by his socks and so on.

Solution 3: use the normalized RGB histogram for each player, made up of 16 bins for every channel; for example, the current player has 0.06 of the total pixels with the red channel value between 32 and 47, 0.06 of the total pixels with the blue channel value between 48 and 63 and 0.14 of the total pixels with the green channel value between 64 and 79.

Figure 5.11. The RGB histogram on 16 bins of a particular player

Choosing the third solution for the feature selection part of the clustering algorithm, a proximity measure that can be derived is:

Dist(x_1, x_2) = \sqrt{\sum_{i=0}^{15}\left[(h^{R}_{x_1}[i]-h^{R}_{x_2}[i])^2+(h^{G}_{x_1}[i]-h^{G}_{x_2}[i])^2+(h^{B}_{x_1}[i]-h^{B}_{x_2}[i])^2\right]} (5.12)

where x1 and x2 are two different players, and h^R, h^G, h^B denote their normalized per-channel histograms.

An example of applying this formula for the players described by the following two histograms:

Figure 5.13. Histograms of players 4 and 5, and their distance given by distances[3][4], calculated by the formula (5.12)

The direct translation of formula (5.12) is the function findDistanceBetweenBlobs(myBlob first, myBlob second), which returns the distance as a double and takes as arguments two objects of type myBlob, a structure previously defined to hold information about players.

double findDistanceBetweenBlobs(myBlob first, myBlob second)
{
    // Accumulates the squared bin differences of the two players' normalized
    // histograms over all three channels, as in formula (5.12)
    double distance = 0;
    for (int i = 0; i < 16; i++)
    {
        distance += pow(first.histogram.bHist[i] - second.histogram.bHist[i], 2);
        distance += pow(first.histogram.gHist[i] - second.histogram.gHist[i], 2);
        distance += pow(first.histogram.rHist[i] - second.histogram.rHist[i], 2);
    }
    return sqrt(distance);
}

While the clustering algorithm's pseudocode was presented in the previous chapter and the C++ code can be found in the annexes of this paper, what remains to be pointed out is how to choose the maximum distance threshold; r, the maximum number of clusters, is easily set to 4, as explained previously. Since the algorithm depends greatly on the order in which the feature vectors (players) are presented, suppose we iterate through them by position, top to bottom and left to right. Based on the original image in Figure 5.2, the first cluster would be initialized by the player in blue equipment. Next, we would analyze a white player, which should start its own new cluster; therefore, the distance between the two players must be large enough that they are not put together.

Furthermore, we could opt to update a cluster's representative once a new player is added to it, perhaps by computing a mean histogram over the players in the cluster, but this is problematic once a player is assigned by mistake, since it would shift the average considerably and risk attracting even more players to a cluster they should not belong to.

Instead, we label each processed player with its cluster index, and as we iterate through the unlabeled players, we compare each one against the closest player that has already been labeled. This does introduce some risk, especially for the first couple of assignments, but an experimental threshold value of 0.5 yielded good results, and all players were correctly classified across a number of different images. The risky moments are when we move from a player to an opponent, and whether that opponent will be classified correctly. Once the clusters grow in size, supposing their members have been rightfully assigned so far, the risk of a mistake decreases: there are more and more labeled players to choose the closest one from, so that closest distance is increasingly likely to fall safely within the threshold, leaving some margin for the close cases. The testing chapter will also present the clusters generated for other values of this distance, but below is the assignment with no mistakes, treating even the goalkeeper as a member of a third cluster, seen in a darker blue.
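For clarity, here is a minimal sketch of this labeling scheme, reusing findDistanceBetweenBlobs from above; the label field on myBlob and the function name are assumptions, and the players are presumed already sorted top-to-bottom, left-to-right:

// Sequential, BSAS-style labeling: each player joins the cluster of its
// closest already-labeled player, unless that distance exceeds the threshold,
// in which case it starts a new cluster (up to maxClusters clusters)
void clusterPlayers(vector<myBlob>& players, double threshold, int maxClusters)
{
    int clusterCount = 0;
    for (size_t i = 0; i < players.size(); i++)
    {
        double bestDist = DBL_MAX;             // from <cfloat>
        int bestLabel = -1;
        for (size_t j = 0; j < i; j++)         // only previously labeled players
        {
            double d = findDistanceBetweenBlobs(players[i], players[j]);
            if (d < bestDist)
            {
                bestDist = d;
                bestLabel = players[j].label;
            }
        }
        if (bestLabel == -1 || (bestDist > threshold && clusterCount < maxClusters))
            players[i].label = clusterCount++; // start a new cluster
        else
            players[i].label = bestLabel;      // join the closest player's cluster
    }
}

With the values discussed above, the call would simply be clusterPlayers(players, 0.5, 4).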

Figure 5.14. Result of the clustering algorithm: 3 clusters resulted, pink for the white players in the original image, light blue for the blue players, and red for the goalkeeper (the referee had previously been removed from the image by the erosions)

Line detection and finding the vanishing point

In order to find the horizontal lines crossing the field, we first apply Canny edge detection on the original image and then detect the lines with the Hough transform. This is done using the OpenCV built-in functions Canny() and HoughLines(), both of which are briefly described next.

Canny(srcImage, dstImage, lowThreshold, lowThreshold*ratio, kernel_size) has the following parameters:

srcImage- the source image

dstImage- the destination image; can be the same as the input

lowThreshold- the low threshold value used in the hysteresis thresholding phase

ratio- lowThreshold*ratio gives the high threshold value used in the hysteresis thresholding phase (usually 3, as per Canny's recommendation)

kernel_size- the size of the Sobel kernel used to compute the gradients, usually chosen to be 3

Figure 5.15 Output of the Canny edge detection function

HoughLines(dst, lines, rho, theta, threshold, srn, stn) has the following parameters:

dst- the output of the edge detector; formally a grayscale image, although in practice a binary one

lines- vector holding the parameters, in polar coordinates, of the detected lines; must be previously declared as vector<Vec2f>

rho- resolution of the parameter r in pixels; a value of 1 was used

theta- resolution of the parameter theta in radians; CV_PI/180 (i.e., 1 degree) was used

threshold- the minimum number of intersections necessary in order to 'detect' a line

srn, stn- default parameters, set to 0
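Put together, the two calls form the following short pipeline; the threshold values are illustrative, not necessarily the exact ones used:

Mat gray, edges;
cvtColor(srcImage, gray, CV_BGR2GRAY);        // Canny expects a single-channel image
int lowThreshold = 50;                         // illustrative low threshold
Canny(gray, edges, lowThreshold, lowThreshold * 3, 3);
vector<Vec2f> lines;
HoughLines(edges, lines, 1, CV_PI / 180, 150, 0, 0);  // rho = 1 px, theta = 1 degree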

Obviously, choosing a higher threshold gives fewer lines, but even so, we must still filter the lines to keep only those that are 'vertical', i.e., perpendicular to the width of the field.

Figure 5.16 All lines detected by the HoughLines function

In order to keep only the lines we are interested in, we must filter them based on their parameters, namely the theta parameter. This is done with the following condition:

if ((theta * 180 / CV_PI > -delta && theta * 180 / CV_PI < delta) || (theta * 180 / CV_PI > 180 - delta && theta * 180 / CV_PI < 180 + delta))

where delta is set to 60 degrees.
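Each surviving (rho, theta) pair can then be converted into two distant points on the corresponding line, in the standard way for HoughLines output, so the line can be drawn and its angle computed later; the 2000-pixel extent is an illustrative choice:

for (size_t i = 0; i < lines.size(); i++)
{
    float rho = lines[i][0], theta = lines[i][1];
    double a = cos(theta), b = sin(theta);
    double x0 = a * rho, y0 = b * rho;         // point on the line closest to the origin
    Point start(cvRound(x0 + 2000 * (-b)), cvRound(y0 + 2000 * a));
    Point end(cvRound(x0 - 2000 * (-b)), cvRound(y0 - 2000 * a));
    // start and end can now be stored in a myLine and drawn with line()
}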

Figure 5.17 The result of filtering the detected lines to preserve only those that are 'parallel' and can be used to help draw the final offside line

Once the lines have been found and drawn, we can find their 'intersection' point, which helps in drawing other lines 'parallel' to them. Given that many such lines can be found, some very close together, we choose two lines that differ by a minimum angle in order to get a vanishing point close to reality. More sophisticated methods could be implemented, such as computing the intersection points of every pair of lines and choosing as the final vanishing point the one closest to the greatest concentration of such points, but for simplicity we take the first two lines whose angles differ by more than 20 degrees. A line's angle is obtained from its start and end points, which are in turn derived from the parameters returned by HoughLines, with the following function:

double findLineAngle(myLine currLine)
{
    // Angle of the line in degrees, computed from its start and end points;
    // atan2 returns a value in (-180, 180], so the absolute value suffices
    double angle = atan2(currLine.end.y - currLine.start.y, currLine.end.x - currLine.start.x) * 180.0 / CV_PI;
    return fabs(angle);
}

Obtaining the final vanishing point is again a matter of simple line equations; given two lines represented by their start and end points, their intersection is computed as follows:

bool intersection2(Point2f o1, Point2f p1, Point2f o2, Point2f p2, Point2f &r)
{
    // Parametric intersection of the lines (o1, p1) and (o2, p2);
    // returns false when the lines are (nearly) parallel
    Point2f x = o2 - o1;
    Point2f d1 = p1 - o1;
    Point2f d2 = p2 - o2;
    float cross = d1.x * d2.y - d1.y * d2.x;
    if (fabs(cross) < 1e-8 /*EPS*/)
        return false;
    double t1 = (x.x * d2.y - x.y * d2.x) / cross;
    r = o1 + d1 * t1;
    return true;
}
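A sketch of how the two functions above combine to produce the vanishing point, assuming the filtered lines are stored in a vector<myLine> called fieldLines (an illustrative name):

// Takes the first two filtered lines whose angles differ by more than
// 20 degrees and intersects them to obtain the vanishing point
Point2f vanishingPoint;
bool found = false;
for (size_t i = 0; i < fieldLines.size() && !found; i++)
    for (size_t j = i + 1; j < fieldLines.size() && !found; j++)
        if (fabs(findLineAngle(fieldLines[i]) - findLineAngle(fieldLines[j])) > 20.0)
            found = intersection2(fieldLines[i].start, fieldLines[i].end,
                                  fieldLines[j].start, fieldLines[j].end,
                                  vanishingPoint);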

Uniting the resulting intersection point r with the top-most pixel of every player, we get the following image:

Figure 5.18. Uniting each player's first discovered point with the vanishing point

Determining direction of play and differentiating between attackers and defenders

Another problem posed by analysing a single static image, shot at a random time during a match, is how to distinguish between attackers and defenders in order to identify the last player of each team; then, to draw the offside line, we must take their left-most or right-most point, depending on the direction of play.

There are many approaches to both of these tasks, none guaranteeing accurate results every single time, but this can be overlooked: in a real-life scenario, both variables are known and could be set from the start of the match.

For a general, optimistic attempt at finding these two variables from the image alone, we proceed as follows:

We only analyse players that are members of clusters with a size greater than 1 (we suppose, again for simplicity, that at most one referee and one goalkeeper are visible, and that each team has at least 2 visible players)

We iterate through the players and find those closer to the previously identified goalkeeper. This is done in a less crude manner than just computing the difference on the x-axis between their representative points: instead, we compute the angle formed at the vanishing point obtained in 5.5 by each player's representative point and the goalkeeper's (see the sketch after this list)

We then sort the vector of angles in increasing order and iterate through it, stopping when we have found a number X of players belonging to one cluster

At that moment we know there are X players of cluster C1 closer to the goalkeeper, and Y players of cluster C2, where X > Y.

The main assumption in this scenario is that there should be more defenders close to the goalkeeper than attackers

A reasonable value for X was chosen as 3

The direction of play could be deduced from the position of the goalkeeper in the event that he is visible in the picture
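A minimal sketch of this heuristic follows; the representative field on the players, the clusterSize array, and the vp and keeper points are assumptions used only for illustration:

// Angle at the vanishing point between the ray towards the goalkeeper and
// the ray towards a given player
double angleToKeeper(Point2f vp, Point2f keeper, Point2f player)
{
    double aK = atan2(keeper.y - vp.y, keeper.x - vp.x);
    double aP = atan2(player.y - vp.y, player.x - vp.x);
    return fabs(aK - aP);
}

// Rank players by that angle and find the first cluster with X = 3 members
vector<pair<double, int> > ranked;             // (angle, cluster label)
for (size_t i = 0; i < players.size(); i++)
    if (clusterSize[players[i].label] > 1)     // skip referee/goalkeeper clusters
        ranked.push_back(make_pair(angleToKeeper(vp, keeper, players[i].representative),
                                   players[i].label));
sort(ranked.begin(), ranked.end());
int count[4] = {0};
int defendingCluster = -1;
for (size_t i = 0; i < ranked.size() && defendingCluster == -1; i++)
    if (++count[ranked[i].second] == 3)        // X = 3, as chosen above
        defendingCluster = ranked[i].second;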

Other possibilities include taking into consideration the players' inclination or their numbers compared to the opposition, but most have obvious drawbacks, and there is no clear way to describe all possible scenarios and derive a formulation that fits every time. We can therefore either stick with the optimistic approach detailed above or simply set the two variables at the start of the algorithm, since this kind of fine-tuning is part of the settings made by real-world systems in this context as well.

Drawing the offside line

After the last defender has been identified, all that remains is to unite the vanishing point with the defender's representative point and stretch the line so that it extends to the opposite side of the field. This is done by obtaining the slope of the line through the two points and intersecting it with the lowest row of the image.

For a better result, the line was then traversed with a built-in OpenCV function, and only those of its pixels contained in the field mask were kept.

An optional step, making the final picture look closer to the replays shown on TV, is to darken the side of the field beyond the offside line through a simple multiplication of the current RGB values by a factor k, 0 < k < 1.
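A sketch of these final drawing steps, assuming the direction of play is to the right and the offside line is not vertical; vp, defender, fieldMask and k are illustrative names:

// Extend the line from the vanishing point through the defender's
// representative point down to the bottom row of the image
double slope = (defender.y - vp.y) / (defender.x - vp.x);
Point bottom(cvRound(vp.x + (image.rows - 1 - vp.y) / slope), image.rows - 1);

// Traverse the line and keep only the pixels inside the field mask
LineIterator it(image, Point(cvRound(vp.x), cvRound(vp.y)), bottom, 8);
for (int i = 0; i < it.count; i++, ++it)
    if (fieldMask.at<uchar>(it.pos()) > 0)
        image.at<Vec3b>(it.pos()) = Vec3b(255, 255, 255);   // white offside line

// Darken the side of the field beyond the offside line by a factor k
double k = 0.6;                                // 0 < k < 1, illustrative value
for (int y = 0; y < image.rows; y++)
    for (int x = 0; x < image.cols; x++)
        if (x > vp.x + (y - vp.y) / slope)     // beyond the line, towards the goal
        {
            Vec3b& px = image.at<Vec3b>(y, x);
            px[0] = saturate_cast<uchar>(px[0] * k);
            px[1] = saturate_cast<uchar>(px[1] * k);
            px[2] = saturate_cast<uchar>(px[2] * k);
        }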

Figure 5.19 Final image with the white offside line drawn, clearly showing the last blue attacker in an offside position

Testing and Validation

All the intermediary images obtained by following the steps described in Chapter 5 have already been presented. The final offside line is detected correctly and the end result is close to what one might expect to see on TV in a real situation.

However, in trying the algorithm on other pictures, the limitations of a single camera and a single static image became apparent, as there are far too many variables to set, most of which cannot be obtained statistically once the image has been loaded.

A good starting point was the background subtraction method, which yielded good results for a variety of other pictures; while the routine could still be improved, its results were comparable to, if not better than, some of the background subtraction methods presented in Chapter 2.

Some of these results are presented below, and we dwell on this step because it is representative of what can be achieved within the given limits: it is the first step of the algorithm, and while a good result is imperative at this stage, having few variables to set allows for some flexibility in handling differences between images.

The first two sets of images, along with those presented in Chapter 1 and Chapter 4, show good results, while the last two are given as examples where even this first step proves difficult to get right.

Figure 6.1. Test image 1

Figure 6.2. Background subtraction performed on test image 1

Figure 6.3 Test image 3

Figure 6.4 Background subtraction performed on test image 3; players that appear on the sideline can be detected as part of the foreground, and it is the next steps' responsibility to deal with this scenario (if possible)

Figure 6.5 Test image 4

Figure 6.6 Result of background subtraction on test image 4; hands and field irregularities make player detection quite hard in this case, and the result is not easy to process further: erosions would be needed, but not so strong as to eliminate important parts of the players, possibly including a body's most 'advanced' part, which would make for a poor offside line from the start

Figure 6.7 Test image 5

Figure 6.8 Result for test image 5; the players' detection was not too bad, but in this particular image, keeping the 16m line of the right-hand goal would be very difficult, given how close its tone is to the green tones of the field

The clustering algorithm also produced good results once the players' masks had been identified, even for different scenes.

Figure 6.9 Labeling of players (those not previously eroded) for test image 3 from Figure 6.3

It is worth noting that once the variable parameters have been taken out of the equation (or rather adjusted for the current test image), the algorithm can complete successfully and give the expected end result, as in Figures 6.10 and 6.11. However, in practice there are far too many scenarios introducing too many variables, with very little statistical information obtainable from a single image, so neither this version of the algorithm nor revised ones would likely ever cover all cases. What this paper did, in turn, was present the steps that lead to a solution and choose among the alternatives for each step in order to arrive at a workable, runnable algorithm that illustrates the whole process, which, given the resources available, seems a fair compromise between the highly challenging task of offside detection and the algorithm's final validity.

Figure 6.10 Final result after manipulation of variables for test image 1 (correctly showing the last blue attacker in an offside position)

Figure 6.11 Final result after manipulation of variables for the original image (correctly showing the last blue attacker in an offside position)

User’s manual

The application implemented is quite straightforward and does not require many resources or great effort on the user's part.

Installation manual

Software resources

The only software restriction regards the Operating System, which must be Windows 7, on 32 or (preferably) 64 bits.

For any further development of the application, one should install Visual Studio 2010 (or higher) and the OpenCV library (2.3 or higher).

Hardware resources

For a good user experience, the minimum requirements for running the application are:

At least 2 GB of RAM

At least a 1.8 GHz dual-core processor

Installing the application

A user can either run the OffsideDetection.exe file directly, or open the project in Visual Studio and run it via the dedicated menu button.

Running the application

Upon starting the application, the user is prompted to choose a single image as input for the algorithm, as seen in Figure 7.1.

In the demonstrative version of the program, the first couple of intermediary images are displayed immediately, and the program waits for the user's key press to proceed with the next stages of the algorithm and their results, leading up to the final image. The user can press any key to advance through the pictures at whatever pace feels comfortable, since all previous pictures remain open. This way the user can go back, see an intermediate result, and perhaps find the culprit in a scenario that does not give the desired result. After the final key press, i.e. the one after the final image with the offside line has been drawn, all windows are closed and the program terminates.
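The display behavior behind this is essentially the standard OpenCV pattern sketched below; stageNames and stageImages are illustrative names:

// Show each intermediary image in its own window; any key press advances,
// previously opened windows stay visible until the end
for (size_t i = 0; i < stageImages.size(); i++)
{
    imshow(stageNames[i], stageImages[i]);
    waitKey(0);                                // blocks until the user presses a key
}
destroyAllWindows();                           // final key press closes everything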

Figure 7.1 File open dialog and initial state of the application once started

Figure 7.2 Images obtained so far in the algorithm; the user needs to press a key to get the next images, up until the very last one. Previously obtained images remain visible until the last key press, which effectively terminates the program

Conclusions

Contributions/achievements

Overall, an algorithm has been developed that achieves the final goal of drawing the offside line in a static image based on the image's features alone. Several of its sub-goals could constitute subjects of individual research papers, and while in no way complete, they scratch the surface of classical, important problems in image processing. Some of these sub-goals are listed below:

Background subtraction/field extraction- while not understood in the same way the term is used in the specialized literature, i.e. pertaining more to moving-object detection, this part represents the backbone of the whole algorithm; by using the HSV space, as opposed to the RGB space used by most of the papers in Chapter 2, we provided an alternative that gives good results

Detecting players- in no way a trivial task; we managed to extract the players' masks with sufficient pixel accuracy, especially for 'favorable' images in terms of contrast and field texture

Clustering players into teams- based on their normalized RGB histograms and by making use of the BSAS algorithm, with good results for various images

Drawing the final offside line determined by the vanishing point and representative point of the last identified player, along with the associated out of play area

Critical analysis of the results achieved and possible improvements

While the technical term for this detection algorithm would be a classifier, i.e. one deciding whether a certain image shows a player in a potential offside situation, there has not been a solid number of test images to support its validity. However, this is understandable given the very complex task at hand, where only a few scenarios could be accounted for; even then, numerous variables are introduced by the particularities of each image, such as field texture, players positioned at the margins of the field, occlusions that make two players be detected as one, or a player merging with a field line, and so on.

The three main aforementioned sub-goals have all been achieved with relative success and, provided with a good set of input images, are able to output the desired results. No matter the other variables introduced, the algorithms should be flexible enough to be reused in a later revision of the offside detection algorithm, perhaps a new version of it, or even an algorithm only tangential to it, extracting other information of interest from the scene of a football field.

Ultimately, actual offside detection based on a single image, without any prior knowledge of its properties, is hard to solve. A proof in this sense has already been formulated in the research part of this paper: one single camera cannot capture the positions of all players without the risk of occlusions, and even with the several cameras installed by the professional systems offering the official replays we see on TV, a certain amount of fine-tuning is involved.

The previously mentioned variables, which make the task hard even for sets of images where players do not overlap much, could be supplied manually, say with a color picker for the field and the players' main jerseys, to speed the algorithm up and make it more practical; however, the entire purpose of this paper was to do as much as possible with as little direct interaction with the image as possible. Thus, given a simple map of pixels we know nothing about in advance, we were able to extract valuable information without any further inquiries or user intervention.

The task of offside detection can be extended or viewed from several other angles, each with its benefits and disadvantages. As mentioned in the introductory part of this paper, while in theory detection should require more effort for a video sequence, a sequence would also provide much more useful information, making the algorithm more knowledgeable about its input features. Again, this refers to a video sequence captured by one single camera. It would indeed be interesting to have the captures of several cameras and attempt a 3D reconstruction of the scene; this would no doubt remove most of the limitations we were confronted with, but it would not only demand a much greater effort, it would also require the resources to obtain footage from several angles. If we limit ourselves to what could be achieved with the same resources used for this algorithm, we can still find several improvement ideas:

Determining statistically the minimum area an object must have in order to be detected as a player

Determining the threshold distance used in the BSAS algorithm statistically as well: 0.5 proved to be a good value for most test cases, but one idea would be to choose the largest (or second largest) of the minimum distances from one player to another. By choosing the largest of the minimums, we could effectively solve all cases where referees and goalkeepers are not visible, since it makes sense for every player's closest player to be in its own team; this way, we are sure to assign the most problematic pair of players to the same team, and the rest follows easily

A better identification of the players' representative point. This was chosen as the right-most or left-most point of the 2D pixels that make up a player's area, but once again it is a matter of perspective; while this works for obvious calls, for close ones it might make all the difference

A more precise detection of the vanishing point, either by the solution given in [11] or another that makes use of all lines found in the image

Dealing with pictures where the players are very close to the camera or they are at the very far end of the field

A simple extension for a better user experience would be to allow video processing, where a user could simply load a video sequence and pause at the frame (s)he wants to analyze

Another similar extension could be to allow the continuous offside detection on a frame by frame basis, but this would also require improvements in terms of performance

While we have rejected the idea of manual manipulation via color pickers, for example, or any other direct interaction with the colors in the image, the main variables that make it hard for one algorithm to work across a wide variety of pictures could be adjusted manually via sliders or other elements in a simple user interface. This would give the user the possibility of manipulating the picture and still getting a good result for a much larger set of images, at the expense of maybe a few seconds. Of course, the user would have to be trained to know what constitutes a good intermediary image, as well as what exactly those variables refer to

Try to account for the other offside scenarios as well: receiving a pass from one's own half, for instance, does not make a player liable to be flagged offside. Ball detection in itself would be a big improvement, but definitely a hard task too, since there are several other objects it could be mistaken for (field patches) and, most of the time, it would be seen as part of a player's contour, the image being expected to be taken at the exact moment a player kicks the ball

