转自:https://en.wikipedia.org/wiki/Connected-component_labeling
Connected-component labeling (alternatively connected-component analysis, blob extraction, region labeling, blob discovery, or region extraction) is an algorithmic application of graph theory, where subsets of connected components are uniquely labeled based on a given heuristic.
连通区域标记(或连通区域分析、blob提取、区域标记、blob发现或区域提取)是图论的算法应用,其中连通区域的子集是基于给定的启发式的唯一标记。
Connected-component labeling is not to be confused with segmentation.
连接元件标签不能与分割混淆。
Connected-component labeling is used in computer vision to detect connected regions in binary digital images, although color images and data with higher dimensionality can also be processed.
在计算机视觉中,连通区域标记用于检测二进制数字图像中的连接区域,即使彩色图像和具有较高维度的数据也可以被处理。
When integrated into an image recognition system or human-computer interaction interface, connected component labeling can operate on a variety of information.
当集成到图像识别系统或人机交互界面时,连通区域标签可以操作各种(各部分)信息。
Blob extraction is generally performed on the resulting binary image from a thresholding step, but it can be applicable to gray-scale and color images as well.
通常在阈值步骤中对产生的二进制图像执行Blob提取,但它也适用于灰度和彩色图像。(Blob在机器视觉中是指图像中的具有相似颜色、纹理等特征所组成的一块连通区域。)
Blobs may be counted, filtered, and tracked.
blob可以被计数、过滤和跟踪。
Blob extraction is related to but distinct from blob detection.
Blob提取与Blob检测有关联,但与Blob检测不同。
Contents
[hide]- 1Overview
- 2Definition [6][7]
- 3Algorithms
- 3.1One component at a time
- 3.2Two-pass
- 4Graphical example of two-pass algorithm
- 4.1Sequential algorithm
- 5Others
- 5.1One-pass version
- 6Performance evaluation
- 7Hardware architectures
- 8See also
- 9References
- 10External links
overview
A graph, containing vertices and connecting edges, is constructed from relevant input data.
一个包含顶点和连接边的图是由相关的输入数据构造的。
The vertices contain information required by the comparison heuristic, while the edges indicate connected 'neighbors'.
这些顶点包含了比较启发式所要求的信息,而边缘则表示连接的“邻居”。
An algorithm traverses the graph, labeling the vertices based on the connectivity and relative values of their neighbors.
一种算法遍历图像,根据相邻的连通性和相对值对顶点进行标记。
Connectivity is determined by the medium;
连接性是由媒介决定的;
image graphs, for example, can be 4-connected neighborhood or 8-connected neighborhood.
例如,图像图可以是4个连接的社区或8个连接的社区。
Following the labeling stage, the graph may be partitioned into subsets, after which the original information can be recovered and processed .
在标记阶段之后,图表可以被划分为子集,之后原始信息可以被恢复和处理。
definition
The usage of the term connected components labeling (CCL) and its definition is quite consistent in the academic literature, whereas connected components analysis (CCA) varies in terms of both, terminology and problem definition.
术语连通区域标记(CCL)及其定义在学术文献中是相当一致的,而连通区域分析(CCA)则在术语和问题定义方面各不相同。
Rosenfeld et al.[8] define connected components labeling as the “[c]reation of a labeled image in which the positions associated with the same connected component of the binary input image have a unique label.
罗森菲尔德等人将连通区域标记定义为“标记图像的创建”,在这个图像中,与二进制输入图像中相同的连通区域相关联的位置有一个唯一的标签。”
Shapiro et al.[9] define CCL as an operator whose “input is a binary image and
[... 夏比奥等9将CCL定义为一个操作,它的“输入是二进制图像和……”]
output is a symbolic image in which the label assigned to each pixel is an integer uniquely identifying the connected component to which that pixel belongs.”
输出是一个符号图像,其中分配给每个像素的标签是唯一标识该像素所属的连通区域的整数标签。”
There is no consensus on the definition of connected components analysis in the academic literature.
摘要在学术文献中,对连通区域分析(CCA)的定义没有达成共识。
It is often used interchangeably with CCL.
它经常与CCL互换使用。
[10][11] A more extensive definition is given by Shapiro et al.:[9] “Connected component analysis consists of connected component labeling of the black pixels followed by property measurement of the component regions and decision making.
夏比奥等人给出了一个更广泛的定义:9“连通区域分析由黑色像素的连接组件标记,然后是组件区域的属性测量和决策制定。”
The definition for connected components analysis presented here is more general, taking the thoughts expressed in [10][11][9] into account. “
这里给出的连接组件分析的定义更一般,考虑了10 11 9中所表达的思想。
“Connected components analysis derives one feature vector of each connected component in a binary 2-D input image.
“互联组件分析在二进制2-D输入图像中派生出每个连接组件的一个特征向量。
A feature vector of a connected component is an n-tuple composed of functions of the component’s pattern.”
连接组件的一个特征向量是由组件模式的函数组成的n元组。”
algorithm
The algorithms discussed can be generalized to arbitrary dimensions, albeit with increased time and space complexity.
讨论的算法可以被推广到任意维度,尽管时间和空间的复杂性增加了。
One component at a time
一次一个组件
This is a fast and very simple method to implement and understand.
这是一个快速且非常简单的实现和理解的方法。
It is based on graph traversal methods in graph theory.
它是基于图论中的图形遍历方法。
In short, once the first pixel of a connected component is found, all the connected pixels of that connected component are labelled before going onto the next pixel in the image.
简而言之,一旦找到连接组件的第一个像素,连接组件的所有连接像素都会在进入图像的下一个像素之前被标记。
This algorithm is part of Vincent and Soille's watershed segmentation algorithm,[12] other implementations also exist.[13]
该算法是Vincent和Soille的分水岭分割算法的一部分,其他12种实现也存在。
In order to do that a linked list is formed that will keep the indexes of the pixels that are connected to each other, steps (2) and (3) below.
为了做到这一点,形成了一个链表,它将保持相互连接的像素的索引,下面的步骤(2)和(3)。
The method of defining the linked list specifies the use of a depth or a breadth first search.
定义链表的方法指定了深度或广度优先搜索的使用。
For this particular application, there is no difference which strategy to use.
对于这个特殊的应用程序,使用哪种策略没有区别。
The simplest kind of a last in first out queue implemented as a singly linked list will result in a depth first search strategy.
最简单的一种最后的队列是作为单链表实现的,这将导致深度优先搜索策略。
It is assumed that the input image is a binary image, with pixels being either background or foreground and that the connected components in the foreground pixels are desired.
假设输入图像是一个二进制图像,像素是背景或前景,并且前景像素中的连接组件是需要的。
The algorithm steps can be written as:
算法步骤可以写成:
1.Start from the first pixel in the image.
从图像的第一个像素开始。
Set current label to 1.
将当前标签设置为1。
Go to (2). (2)。
2.If this pixel is a foreground pixel and it is not already labelled, give it the current label and add it as the first element in a queue, then go to (3). If it is a background pixel or it was already labelled, then repeat (2) for the next pixel in the image.
如果这是一个前景像素并没有贴上标签,给它当前的标签并将它添加作为一个队列的第一个元素,然后去(3)。如果是一个背景像素或者它已经贴上标签,然后重复(2)像素的图像。
3.Pop out an element from the queue, and look at its neighbours (based on any type of connectivity).
从队列中取出一个元素,然后查看它的邻居(基于任何类型的连接)。
If a neighbour is a foreground pixel and is not already labelled, give it the current label and add it to the queue.
如果一个邻居是一个前台像素,并且没有被标记,那么给它当前的标签并将其添加到队列中。
Repeat (3) until there are no more elements in the queue.
重复(3)直到队列中没有更多的元素。
4.Go to (2) for the next pixel in the image and increment current label by 1.
转到(2)图像中的下一个像素,并将当前标签增加1。
Note that the pixels are labelled before being put into the queue.
注意,在放入队列之前,像素被标记为标记。
The queue will only keep a pixel to check its neighbours and add them to the queue if necessary.
队列只会保留一个像素来检查其邻居,并在必要时将其添加到队列中。
This algorithm only needs to check the neighbours of each foreground pixel once and doesn't check the neighbours of background pixels.
这个算法只需要检查每个前景像素的邻居一次,并且不检查背景像素的邻居。
Two-pass
双行程
Relatively simple to implement and understand, the two-pass algorithm,[14] (also known as the Hoshen–Kopelman algorithm) iterates through 2-dimensional binary data.
相对简单的实现和理解,双程算法,14(也称为Hoshen——Kopelman算法)迭代2维二进制数据。
The algorithm makes two passes over the image.
该算法对图像进行了两次传递。
The first pass to assign temporary labels and record equivalences and the second pass to replace each temporary label by the smallest label of its equivalence class.
第一次通过分配临时标签和记录相等性,第二次传递用等价类的最小标签替换每个临时标签。
The input data can be modified in situ (which carries the risk of data corruption), or labeling information can be maintained in an additional data structure.
输入数据可以就地修改(这可能会带来数据损坏的风险),或者可以在额外的数据结构中维护标签信息。
Connectivity checks are carried out by checking neighbor pixels' labels (neighbor elements whose labels are not assigned yet are ignored), or say, the North-East, the North, the North-West and the West of the current pixel (assuming 8-connectivity).
连接检查是通过检查邻居像素的标签(标签未被分配的邻居元素被忽略),或者说是东北、北部、西北和当前像素的西部(假设8个连接)。
4-connectivity uses only North and West neighbors of the current pixel.
4-连通性只使用当前像素的北和西邻居。
The following conditions are checked to determine the value of the label to be assigned to the current pixel (4-connectivity is assumed)
检查以下条件,以确定被分配给当前像素的标签的值(假定4-连通性)
Conditions to check: 条件检查:
1. Does the pixel to the left (West) have the same value as the current pixel?
左边(西方)的像素是否与当前像素有相同的值?
Yes – We are in the same region.
是的——我们在同一个地区。
Assign the same label to the current pixel
将相同的标签分配给当前像素
No – Check next condition
不——检查下一个条件
2.Do both pixels to the North and West of the current pixel have the same value as the current pixel but not the same label?
在当前像素的北部和西部,两个像素都有与当前像素相同的值,但不是相同的标签吗?
Yes – We know that the North and West pixels belong to the same region and must be merged.
是的——我们知道,北部和西部的像素属于同一地区,必须合并。
Assign the current pixel the minimum of the North and West labels, and record their equivalence relationship
将当前的像素分配给北部和西部的最小值,并记录它们的等价关系
No – Check next condition
不——检查下一个条件
3.Does the pixel to the left (West) have a different value and the one to the North the same value as the current pixel?
左边(西方)的像素有不同的值,而向北的像素与当前像素的值相同吗?
Yes – Assign the label of the North pixel to the current pixel
是的——将北部像素的标签分配给当前像素
No – Check next condition
不——检查下一个条件
4.Do the pixel's North and West neighbors have different pixel values than current pixel?
像素的北部和西部邻居的像素值是否与当前像素不同?
Yes – Create a new label id and assign it to the current pixel
是的——创建一个新的label id并将其分配给当前像素
The algorithm continues this way, and creates new region labels whenever necessary.
该算法以这种方式继续进行,并在必要时创建新的区域标签。
The key to a fast algorithm, however, is how this merging is done.
然而,快速算法的关键在于如何实现这种合并。
This algorithm uses the union-find data structure which provides excellent performance for keeping track of equivalence relationships.
该算法使用了联合查找的数据结构,为跟踪等价关系提供了良好的性能。
[15] Union-find essentially stores labels which correspond to the same blob in a disjoint-set data structure, making it easy to remember the equivalence of two labels by the use of an interface method E.g.: findSet(l).
15个工会发现本质上是在一个分离的数据结构中存储与相同的blob对应的标签,这样就可以很容易地记住两个标签的等价性,例如:
findSet(l)。findSet(l) returns the minimum label value that is equivalent to the function argument 'l'.
findSet(l)返回等价于函数参数'l'的最小标签值。
Once the initial labeling and equivalence recording is completed, the second pass merely replaces each pixel label with its equivalent disjoint-set representative element.
一旦初始标记和等效记录完成,第二次传递仅仅用等效的dis联合集合代表元素替换每个像素标签。
A faster-scanning algorithm for connected-region extraction is presented below.[16]
下面给出了一种用于连接区域提取的快速扫描算法。
On the first pass:
第一遍:
1.Iterate through each element of the data by column, then by row (Raster Scanning)
逐列遍历数据的每个元素,然后逐行(光栅扫描)
2.If the element is not the background
如果元素不是背景
2.1 Get the neighboring elements of the current element
获取当前元素的相邻元素
2.2 If there are no neighbors, uniquely label the current element and continue
如果没有邻居,唯一的标签是当前元素并继续
2.3 Otherwise, find the neighbor with the smallest label and assign it to the current element
否则,找到最小标签的邻居并将其分配给当前元素
2.4 Store the equivalence between neighboring labels
存储相邻标签之间的等价性
On the second pass:
第二步:
1.Iterate through each element of the data by column, then by row
遍历数据的每个元素,逐列,然后逐行遍历
2.If the element is not the background
如果元素不是背景
2.1 Relabel the element with the lowest equivalent label
将元素与最低的等价标签进行重新标记
Here, the background is a classification, specific to the data, used to distinguish salient elements from the foreground.
在这里,背景是一种特定于数据的分类,用于区分突出的元素和前景。
If the background variable is omitted, then the two-pass algorithm will treat the background as another region.
如果省略了背景变量,那么双程算法将把背景作为另一个区域来处理。
Graphical example of two-pass algorithm
双程算法的图形示例
The array from which connected regions are to be extracted is given below (8-connectivity based).
下面给出了要提取连接区域的阵列(基于8个连接)。
We first assign different binary values to elements in the graph.
我们首先将不同的二进制值分配给图中的元素。
Attention should be paid that the "0~1" values written on the center of the elements in the following graph are elements' values.
应该注意的是,在下面的图中,在元素的中心写的“0 1”值是元素的值。
While, the "1,2,... ……,7" values in the next two graphs are the elements' labels.
同时,“1、2、.... 7“下两个图中的值是元素的标签。
The two concepts should not be confused.
这两个概念不应该被混淆。
2. After the first pass, the following labels are generated.
在第一次通过之后,会生成下列标签。
A total of 7 labels are generated in accordance with the conditions highlighted above.
根据上面提到的条件,总共生成了7个标签。
The label equivalence relationships generated are,
生成的标签等价关系是,
Set ID | Equivalent Labels |
---|---|
1 | 1,2 |
2 | 1,2 |
3 | 3,4,5,6,7 |
4 | 3,4,5,6,7 |
5 | 3,4,5,6,7 |
6 | 3,4,5,6,7 |
7 | 3,4,5,6,7 |
3. Array generated after the merging of labels is carried out.
在标签合并后生成的数组被执行。
Here, the label value that was the smallest for a given region "floods" throughout the connected region and gives two distinct labels, and hence two distinct labels.
在这里,给定区域中最小的标签值“洪水”贯穿整个连接区域,并给出两个不同的标签,因此有两个不同的标签。
4. 所示。
Final result in color to clearly see two different regions that have been found in the array.
最后的结果是颜色,清楚地看到在数组中找到的两个不同的区域。
The pseudocode is as follows:
伪代码如下:
algorithm TwoPass(data)linked = []labels = structure with dimensions of data, initialized with the value of BackgroundFirst passfor row in data:for column in row:if data[row][column] is not Backgroundneighbors = connected elements with the current element's valueif neighbors is emptylinked[NextLabel] = set containing NextLabellabels[row][column] = NextLabelNextLabel += 1elseFind the smallest labelL = neighbors labelslabels[row][column] = min(L)for label in Llinked[label] = union(linked[label], L)Second passfor row in datafor column in rowif data[row][column] is not Backgroundlabels[row][column] = find(labels[row][column])return label
Sequential algorithm
序贯算法
Create a region counter
创建一个地区计数器
Scan the image (in the following example, it is assumed that scanning is done from left to right and from top to bottom):
扫描图像(在下面的例子中,假设扫描是从左到右,从上到下):
For every pixel check the north and west pixel (when considering 4-connectivity) or the northeast, north, northwest, and west pixel for 8-connectivity for a given region criterion (i.e. intensity value of 1 in binary image, or similar intensity to connected pixels in gray-scale image).
对于每个像素,检查北部和西部的像素(在考虑4-连通性)或东北、北、西北和西像素,为一个给定区域的标准(即二进制图像的强度值为1,或灰度图像中与连接像素相似的强度)。
If none of the neighbors fit the criterion then assign to region value of the region counter.
如果没有一个邻居符合这个标准,那么就分配区域计数器的区域值。
Increment region counter.
增加地区计数器。
If only one neighbor fits the criterion assign pixel to that region.
如果只有一个邻居符合这个标准,那么就把像素分配给那个区域。
If multiple neighbors match and are all members of the same region, assign pixel to their region.
如果多个邻居匹配并且都是同一区域的成员,那么将像素分配给他们的区域。
If multiple neighbors match and are members of different regions, assign pixel to one of the regions (it doesn't matter which one).
如果多个邻居匹配并且是不同区域的成员,那么将像素分配给其中一个区域(这并不重要)。
Indicate that all of these regions are equivalent.
表示所有这些区域都是等价的。
Scan image again, assigning all equivalent regions the same region value.
再次扫描图像,将相同区域的值分配给相同的区域。
Others
其他
Some of the steps present in the two-pass algorithm can be merged for efficiency, allowing for a single sweep through the image.
双程算法中的一些步骤可以合并为效率,允许对图像进行一次扫描。
Multi-pass algorithms also exist, some of which run in linear time relative to the number of image pixels.[17]
多通道算法也存在,其中一些算法在线性时间内运行,相对于图像像素的数量。
In the early 1990s, there was considerable interest in parallelizing connected-component algorithms in image analysis applications, due to the bottleneck of sequentially processing each pixel.[18]
在20世纪90年代早期,由于顺序处理每个像素的瓶颈,在图像分析应用程序中有相当大的兴趣在图像分析应用程序中并行化连接组件算法。
The interest to the algorithm arises again with an extensive use of CUDA.
对该算法的兴趣再次出现,广泛使用了CUDA。
One-pass version
一次通过的版本
A one pass (also referred to as single pass) version of the connected-component-labeling algorithm is given as follows.
摘要给出了一种基于连接件标记算法的递进(也称为单通)版本。
The algorithm identifies and marks the connected components in a single pass.
该算法在一次传递中识别并标记连接的组件。
The run time of the algorithm depends on the size of the image and the number of connected components (which create an overhead).
该算法的运行时间取决于图像的大小和连接组件的数量(这会产生开销)。
The run time is comparable to the two pass algorithm if there are a lot of small objects distributed over the entire image such that they cover a significant number of pixels from it.
如果在整个图像上分布了大量的小对象,那么运行时间就可以与两种传递算法相媲美,这样它们就可以覆盖大量的像素。Otherwise the algorithm runs fairly fast.
否则,算法运行得相当快。
Algorithm:
算法:
1.Connected-component matrix is initialized to size of image matrix.
连接组件矩阵被初始化为图像矩阵的大小。
2.A mark is initialized and incremented for every detected object in the image.
对图像中每个检测到的对象都初始化并增加一个标记。
3.A counter is initialized to count the number of objects.
一个计数器被初始化以计算对象的数量。
4.A row-major scan is started for the entire image.
对整个图像进行了一次大的扫描。
5.If an object pixel is detected, then following steps are repeated while (Index !=0)
如果检测到一个对象像素,则重复执行下列步骤(index!=0)
5.1 Set the corresponding pixel to 0 in Image.
在图像中设置相应的像素为0。
5.2 A vector (Index) is updated with all the neighboring pixels of the currently set pixels.
一个矢量(索引)被更新为当前设置的像素的所有相邻像素。
5.3 Unique pixels are retained and repeated pixels are removed.
独特的像素被保留,重复的像素被删除。
5.4 Set the pixels indicated by Index to mark in the connected-component matrix.
在连接组件矩阵中设置由索引标记的像素。
6. Increment the marker for another object in the image.
增加图像中另一个物体的标记。
The source code is as follows(4-connectivity based):
源代码如下(4-连通性):
One-Pass(Image)[M, N]=size(Image);Connected = zeros(M,N);Mark = Value;Difference = Increment;Offsets = [-1; M; 1; -M];Index = [];No_of_Objects = 0;for i: 1:M :for j: 1:N:if(Image(i,j)==1)No_of_Objects = No_of_Objects +1;Index = [((j-1)*M + i)];Connected(Index)=Mark;while ~isempty(Index)Image(Index)=0;Neighbors = bsxfun(@plus, Index, Offsets');Neighbors = unique(Neighbors(:));Index = Neighbors(find(Image(Neighbors)));Connected(Index)=Mark;endMark = Mark + Difference;endendend