Skip to content

API Reference

This is an implementation of the Chinese Whispers clustering algorithm.

WeightDict

Bases: TypedDict

A dictionary-like class that stores weights for nodes.

Attributes:

Name Type Description
weight float

The weight value associated with a key.

Source code in chinese_whispers/chinese_whispers.py
17
18
19
20
21
22
23
class WeightDict(TypedDict):
    """A dictionary-like class that stores weights for nodes.

    Attributes:
        weight: The weight value associated with a key.
    """
    weight: float

aggregate_clusters(G, label_key='label')

Produce a dictionary with the keys being cluster IDs and the values being sets of cluster elements.

Parameters:

Name Type Description Default
G Graph[T]

The graph object containing the clusters.

required
label_key str

The attribute key used to identify the clusters. Defaults to 'label'.

'label'

Returns:

Type Description
Dict[int, Set[T]]

A dictionary where the keys represent cluster IDs and the values are sets of cluster elements.

Source code in chinese_whispers/chinese_whispers.py
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
def aggregate_clusters(
        G: 'Graph[T]',
        label_key: str = 'label'
) -> Dict[int, Set[T]]:
    """
    Produce a dictionary with the keys being cluster IDs and the values being sets of cluster elements.

    Parameters:
        G: The graph object containing the clusters.
        label_key: The attribute key used to identify the clusters. Defaults to 'label'.

    Returns:
        A dictionary where the keys represent cluster IDs and the values are sets of cluster elements.
    """

    clusters: Dict[int, Set[T]] = {}

    for node in G:
        label = G.nodes[node][label_key]

        if label not in clusters:
            clusters[label] = {node}
        else:
            clusters[label].add(node)

    return clusters

chinese_whispers(G, weighting='top', iterations=20, seed=None, label_key='label')

Perform clustering of nodes in a graph using the 'weighting' method.

Parameters:

Name Type Description Default
G Graph[T]

The input graph.

required
weighting Union[str, Callable[[Graph[T], T, T], float]]

The weighing method to use. It can be either a string specifying one of the three available schemas ('top', 'lin', 'log'), or a custom weighting function. Defaults to 'top'.

'top'
iterations int

The maximum number of iterations to perform. Defaults to 20.

20
seed Optional[int]

The random seed to use. Defaults to None.

None
label_key str

The key to store the cluster labels in the graph nodes. Defaults to 'label'.

'label'

Returns:

Type Description
Graph[T]

The input graph with cluster labels assigned to nodes.

Three weighing schemas are available:

  • top: Just use the edge weights from the input graph.
  • lin: Normalize an edge weight by the degree of the related node.
  • log: Normalize an edge weight by the logarithm of the related node degree.

It is possible to specify the maximum number of iterations as well as the random seed to use.

Source code in chinese_whispers/chinese_whispers.py
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
def chinese_whispers(
        G: 'Graph[T]',
        weighting: Union[str, Callable[['Graph[T]', T, T], float]] = 'top',
        iterations: int = 20,
        seed: Optional[int] = None,
        label_key: str = 'label'
) -> 'Graph[T]':
    """
    Perform clustering of nodes in a graph using the 'weighting' method.

    Parameters:
        G: The input graph.
        weighting: The weighing method to use.
            It can be either a string specifying one of the three available schemas ('top', 'lin', 'log'),
            or a custom weighting function. Defaults to 'top'.
        iterations: The maximum number of iterations to perform. Defaults to 20.
        seed: The random seed to use. Defaults to None.
        label_key: The key to store the cluster labels in the graph nodes. Defaults to 'label'.

    Returns:
        The input graph with cluster labels assigned to nodes.

    Three weighing schemas are available:

    - `top`: Just use the edge weights from the input graph.
    - `lin`: Normalize an edge weight by the degree of the related node.
    - `log`: Normalize an edge weight by the logarithm of the related node degree.

    It is possible to specify the maximum number of iterations as well as the random seed to use.
    """
    weighting_func = resolve_weighting(weighting)

    rng = create_py_random_state(seed)

    for i, node in enumerate(G):
        G.nodes[node][label_key] = i + 1

    nodes = list(G)

    for i in range(iterations):
        changes = False

        rng.shuffle(nodes)

        for node in nodes:
            previous = G.nodes[node][label_key]

            if G[node]:
                scores = score(G, node, weighting_func, label_key)
                G.nodes[node][label_key] = random_argmax(scores.items(), choice=rng.choice)

            changes = changes or previous != G.nodes[node][label_key]

        if not changes:
            break

    return G

linear_weighting(G, node, neighbor)

Calculates the weight of an edge between two nodes in a graph using linear weighting, which is the edge weight divided by the degree of the destination node. If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

Parameters:

Name Type Description Default
G Graph[T]

The graph that contains the nodes and edges.

required
node T

The source node of the edge.

required
neighbor T

The destination node of the edge.

required

Returns:

Type Description
float

The weight of the edge.

Source code in chinese_whispers/chinese_whispers.py
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
def linear_weighting(G: 'Graph[T]', node: T, neighbor: T) -> float:
    """
    Calculates the weight of an edge between two nodes in a graph using linear weighting,
    which is the edge weight divided by the degree of the destination node.
    If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

    Parameters:
        G: The graph that contains the nodes and edges.
        node: The source node of the edge.
        neighbor: The destination node of the edge.

    Returns:
        The weight of the edge.
    """
    return cast(WeightDict, G[node][neighbor]).get('weight', 1.) / G.degree[neighbor]

log_weighting(G, node, neighbor)

Calculates the weight of an edge between two nodes in a graph using logarithm weighting, which is the edge weight divided by the logarithm of the degree of the destination node. If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

Parameters:

Name Type Description Default
G Graph[T]

The graph that contains the nodes and edges.

required
node T

The source node of the edge.

required
neighbor T

The destination node of the edge.

required

Returns:

Type Description
float

The weight of the edge.

Source code in chinese_whispers/chinese_whispers.py
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
def log_weighting(G: 'Graph[T]', node: T, neighbor: T) -> float:
    """
    Calculates the weight of an edge between two nodes in a graph using logarithm weighting,
    which is the edge weight divided by the logarithm of the degree of the destination node.
    If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

    Parameters:
        G: The graph that contains the nodes and edges.
        node: The source node of the edge.
        neighbor: The destination node of the edge.

    Returns:
        The weight of the edge.
    """
    return cast(WeightDict, G[node][neighbor]).get('weight', 1.) / log2(G.degree[neighbor] + 1)

random_argmax(items, choice=random.choice)

An argmax function that breaks the ties randomly.

Parameters:

Name Type Description Default
items ItemsView[T, float]

A sequence of items with their corresponding weights.

required
choice Callable[[Sequence[T]], T]

A callable function that takes in a sequence of items and returns one of them.

choice

Returns:

Type Description
Optional[int]

An optional integer representing the index of the maximum item, if exists.

Source code in chinese_whispers/chinese_whispers.py
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
def random_argmax(
        items: ItemsView[T, float],
        choice: Callable[[Sequence[T]], T] = random.choice
) -> Optional[int]:
    """
    An argmax function that breaks the ties randomly.

    Args:
        items: A sequence of items with their corresponding weights.
        choice: A callable function that takes in a sequence of items and returns one of them.

    Returns:
        An optional integer representing the index of the maximum item, if exists.
    """
    if not items:
        # https://github.com/python/mypy/issues/1003
        return None

    _, maximum = max(items, key=itemgetter(1))

    keys = [k for k, v in items if v == maximum]

    return cast('Optional[int]', choice(keys))

resolve_weighting(weighting)

Resolve the weighting function.

Parameters:

Name Type Description Default
weighting Union[str, Callable[[Graph[T], T, T], float]]

The weighing method to use. It can be either a string specifying one of the three available schemas ('top', 'lin', 'log'), or a custom weighting function. Defaults to 'top'.

required

Returns:

Type Description
Callable[[Graph[T], T, T], float]

The weighting function.

Source code in chinese_whispers/chinese_whispers.py
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
def resolve_weighting(
        weighting: Union[str, Callable[['Graph[T]', T, T], float]]
) -> Callable[['Graph[T]', T, T], float]:
    """
    Resolve the weighting function.

    Parameters:
        weighting: The weighing method to use.
            It can be either a string specifying one of the three available schemas ('top', 'lin', 'log'),
            or a custom weighting function. Defaults to 'top'.

    Returns:
        The weighting function.
    """
    if isinstance(weighting, str):
        return WEIGHTING[weighting]
    else:
        return weighting

score(G, node, weighting_func, label_key)

Compute label scores in the given node neighborhood.

Parameters:

Name Type Description Default
G Graph[T]

The input graph.

required
node T

The node in the graph.

required
weighting_func Callable[[Graph[T], T, T], float]

A function to calculate the weight between two nodes.

required
label_key str

The key to access the label value for each node in the graph.

required

Returns:

Type Description
DefaultDict[int, float]

A dictionary with label scores as values.

Source code in chinese_whispers/chinese_whispers.py
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
def score(
        G: 'Graph[T]',
        node: T,
        weighting_func: Callable[['Graph[T]', T, T], float],
        label_key: str
) -> DefaultDict[int, float]:
    """
    Compute label scores in the given node neighborhood.

    Parameters:
        G: The input graph.
        node: The node in the graph.
        weighting_func: A function to calculate the weight between two nodes.
        label_key: The key to access the label value for each node in the graph.

    Returns:
        A dictionary with label scores as values.
    """
    scores: DefaultDict[int, float] = defaultdict(float)

    if node not in G:
        return scores

    for neighbor in G[node]:
        scores[G.nodes[neighbor][label_key]] += weighting_func(G, node, neighbor)

    return scores

top_weighting(G, node, neighbor)

Return the weight of an edge between two nodes.

This function calculates the weight of an edge between two nodes in a graph. The weight is determined by the 'weight' attribute of the edge. If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

Parameters:

Name Type Description Default
G Graph[T]

The graph containing the edge.

required
node T

The source node of the edge.

required
neighbor T

The target node of the edge.

required

Returns:

Type Description
float

The weight of the edge.

Source code in chinese_whispers/chinese_whispers.py
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
def top_weighting(G: 'Graph[T]', node: T, neighbor: T) -> float:
    """
    Return the weight of an edge between two nodes.

    This function calculates the weight of an edge between two nodes in a graph.
    The weight is determined by the 'weight' attribute of the edge.
    If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

    Parameters:
        G: The graph containing the edge.
        node: The source node of the edge.
        neighbor: The target node of the edge.

    Returns:
        The weight of the edge.
    """
    return cast(WeightDict, G[node][neighbor]).get('weight', 1.)