Skip to content

API Reference

An implementation of the Chinese Whispers clustering algorithm.

UnknownWeightingError

Bases: ValueError

Exception raised when an unknown weighting schema is encountered.

Source code in chinese_whispers/chinese_whispers.py
23
24
25
26
27
28
class UnknownWeightingError(ValueError):
    """Exception raised when an unknown weighting schema is encountered."""

    def __init__(self, weighting: object) -> None:
        """Initialize the exception with the unknown weighting."""
        super().__init__(f"Unknown weighting: {weighting}")

__init__(weighting)

Initialize the exception with the unknown weighting.

Source code in chinese_whispers/chinese_whispers.py
26
27
28
def __init__(self, weighting: object) -> None:
    """Initialize the exception with the unknown weighting."""
    super().__init__(f"Unknown weighting: {weighting}")

WeightDict

Bases: TypedDict

A dictionary-like class that stores weights for nodes.

Attributes:

Name Type Description
weight float

The weight value associated with a key.

Source code in chinese_whispers/chinese_whispers.py
31
32
33
34
35
36
37
38
39
40
class WeightDict(TypedDict):
    """
    A dictionary-like class that stores weights for nodes.

    Attributes:
        weight: The weight value associated with a key.

    """

    weight: float

Weighting

Bases: Enum

Available weighting schemas.

Source code in chinese_whispers/chinese_whispers.py
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
class Weighting(Enum):
    """Available weighting schemas."""

    TOP = "top"
    LINEAR = "linear"
    LOGARITHMIC = "logarithmic"

    def __call__(self, graph: Graph[T], u: T, v: T) -> float:
        """Resolve to the corresponding weighting function and call it."""
        if self is Weighting.TOP:
            return top_weighting(graph, u, v)
        if self is Weighting.LINEAR:
            return linear_weighting(graph, u, v)
        if self is Weighting.LOGARITHMIC:
            return log_weighting(graph, u, v)

        raise UnknownWeightingError(self)

__call__(graph, u, v)

Resolve to the corresponding weighting function and call it.

Source code in chinese_whispers/chinese_whispers.py
118
119
120
121
122
123
124
125
126
127
def __call__(self, graph: Graph[T], u: T, v: T) -> float:
    """Resolve to the corresponding weighting function and call it."""
    if self is Weighting.TOP:
        return top_weighting(graph, u, v)
    if self is Weighting.LINEAR:
        return linear_weighting(graph, u, v)
    if self is Weighting.LOGARITHMIC:
        return log_weighting(graph, u, v)

    raise UnknownWeightingError(self)

WeightingFunc

Bases: Protocol[T]

Callable protocol for edge weighting functions.

Source code in chinese_whispers/chinese_whispers.py
103
104
105
106
107
108
class WeightingFunc(Protocol[T]):
    """Callable protocol for edge weighting functions."""

    def __call__(self, graph: Graph[T], u: T, v: T) -> float:
        """Calculate the weight of an edge between two nodes in a graph."""
        ...

__call__(graph, u, v)

Calculate the weight of an edge between two nodes in a graph.

Source code in chinese_whispers/chinese_whispers.py
106
107
108
def __call__(self, graph: Graph[T], u: T, v: T) -> float:
    """Calculate the weight of an edge between two nodes in a graph."""
    ...

aggregate_clusters(graph, label_key='label')

Produce a dictionary with the keys being cluster IDs and the values being sets of cluster elements.

Parameters:

Name Type Description Default
graph Graph[T]

The graph object containing the clusters.

required
label_key str

The attribute key used to identify the clusters. Defaults to 'label'.

'label'

Returns:

Type Description
dict[int, set[T]]

A dictionary where the keys represent cluster IDs and the values are sets of cluster elements.

Source code in chinese_whispers/chinese_whispers.py
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
def aggregate_clusters(
    graph: Graph[T],
    label_key: str = "label",
) -> dict[int, set[T]]:
    """
    Produce a dictionary with the keys being cluster IDs and the values being sets of cluster elements.

    Args:
        graph: The graph object containing the clusters.
        label_key: The attribute key used to identify the clusters. Defaults to 'label'.

    Returns:
        A dictionary where the keys represent cluster IDs and the values are sets of cluster elements.

    """
    clusters: dict[int, set[T]] = {}

    for node in graph:
        label = graph.nodes[node].get(label_key)

        if label is not None:
            if label not in clusters:
                clusters[label] = {node}
            else:
                clusters[label].add(node)

    return clusters

chinese_whispers(graph, weighting=None, iterations=20, ignore=None, seed=None, label_key='label')

Perform clustering of nodes in a graph using the 'weighting' method.

Parameters:

Name Type Description Default
graph Graph[T]

The input graph.

required
weighting Literal['top', 'linear', 'logarithmic', 'log'] | Weighting | WeightingFunc[T] | None

The weighing method to use. It can be either a string specifying one of the available schemas ('top', 'linear', 'logarithmic', 'log'), an instance of the Weighting Enum, or a custom weighting function. Defaults to Weighting.TOP.

None
iterations int

The maximum number of iterations to perform. Defaults to 20.

20
ignore Container[T] | None

The set of nodes to ignore. Defaults to an empty set.

None
seed int | None

The random seed to use. Defaults to None.

None
label_key str

The key to store the cluster labels in the graph nodes. Defaults to 'label'.

'label'

Returns:

Type Description
Graph[T]

The input graph with cluster labels assigned to nodes.

Three weighing schemas are available:

  • top: Just use the edge weights from the input graph.
  • lin: Normalize an edge weight by the degree of the related node.
  • log: Normalize an edge weight by the logarithm of the related node degree.

It is possible to specify the maximum number of iterations as well as the random seed to use.

Source code in chinese_whispers/chinese_whispers.py
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
def chinese_whispers(
    graph: Graph[T],
    weighting: Literal["top", "linear", "logarithmic", "log"] | Weighting | WeightingFunc[T] | None = None,
    iterations: int = 20,
    ignore: Container[T] | None = None,
    seed: int | None = None,
    label_key: str = "label",
) -> Graph[T]:
    """
    Perform clustering of nodes in a graph using the 'weighting' method.

    Args:
        graph: The input graph.
        weighting: The weighing method to use.
            It can be either a string specifying one of the available schemas ('top', 'linear', 'logarithmic', 'log'),
            an instance of the Weighting Enum, or a custom weighting function. Defaults to Weighting.TOP.
        iterations: The maximum number of iterations to perform. Defaults to 20.
        ignore: The set of nodes to ignore. Defaults to an empty set.
        seed: The random seed to use. Defaults to None.
        label_key: The key to store the cluster labels in the graph nodes. Defaults to 'label'.

    Returns:
        The input graph with cluster labels assigned to nodes.

    Three weighing schemas are available:

    - `top`: Just use the edge weights from the input graph.
    - `lin`: Normalize an edge weight by the degree of the related node.
    - `log`: Normalize an edge weight by the logarithm of the related node degree.

    It is possible to specify the maximum number of iterations as well as the random seed to use.

    """
    if ignore is None:
        ignore = set()

    weighting_func: WeightingFunc[T] = resolve_weighting(weighting or "top")

    rng = create_py_random_state(seed)

    nodes: list[T] = []

    for node in graph:
        graph.nodes[node].pop(label_key, None)

        if node not in ignore:
            nodes.append(node)

            graph.nodes[node][label_key] = len(nodes)

    for _ in range(iterations):
        changes = False

        rng.shuffle(nodes)

        for node in nodes:
            previous = graph.nodes[node][label_key]

            if graph[node]:
                scores = score(graph, node, weighting_func, ignore, label_key)

                graph.nodes[node][label_key] = random_argmax(scores.items(), choice=rng.choice)

            changes = changes or previous != graph.nodes[node][label_key]

        if not changes:
            break

    return graph

linear_weighting(graph, node, neighbor)

Calculate the edge weight using the linear weighting schema.

This function calculates the weight of an edge between two nodes in a graph using linear weighting, which is the edge weight divided by the degree of the destination node. If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

Parameters:

Name Type Description Default
graph Graph[T]

The graph that contains the nodes and edges.

required
node T

The source node of the edge.

required
neighbor T

The destination node of the edge.

required

Returns:

Type Description
float

The weight of the edge.

Source code in chinese_whispers/chinese_whispers.py
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
def linear_weighting(graph: Graph[T], node: T, neighbor: T) -> float:
    """
    Calculate the edge weight using the linear weighting schema.

    This function calculates the weight of an edge between two nodes in a graph using linear weighting,
    which is the edge weight divided by the degree of the destination node.
    If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

    Args:
        graph: The graph that contains the nodes and edges.
        node: The source node of the edge.
        neighbor: The destination node of the edge.

    Returns:
        The weight of the edge.

    """
    return cast("WeightDict", graph[node][neighbor]).get("weight", 1.0) / graph.degree[neighbor]

log_weighting(graph, node, neighbor)

Calculate the edge weight using the logarithm weighting schema.

This function calculates the weight of an edge between two nodes in a graph using logarithm weighting, which is the edge weight divided by the logarithm of the degree of the destination node. If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

Parameters:

Name Type Description Default
graph Graph[T]

The graph that contains the nodes and edges.

required
node T

The source node of the edge.

required
neighbor T

The destination node of the edge.

required

Returns:

Type Description
float

The weight of the edge.

Source code in chinese_whispers/chinese_whispers.py
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
def log_weighting(graph: Graph[T], node: T, neighbor: T) -> float:
    """
    Calculate the edge weight using the logarithm weighting schema.

    This function calculates the weight of an edge between two nodes in a graph using logarithm weighting,
    which is the edge weight divided by the logarithm of the degree of the destination node.
    If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

    Args:
        graph: The graph that contains the nodes and edges.
        node: The source node of the edge.
        neighbor: The destination node of the edge.

    Returns:
        The weight of the edge.

    """
    return cast("WeightDict", graph[node][neighbor]).get("weight", 1.0) / log2(graph.degree[neighbor] + 1)

random_argmax(items, choice=random.choice)

Break the ties randomly.

This is an argmax function that breaks the ties randomly.

Parameters:

Name Type Description Default
items Collection[tuple[T, float]]

A sequence of items with their corresponding weights.

required
choice Callable[[Sequence[T]], T]

A callable function that takes in a sequence of items and returns one of them.

choice

Returns:

Type Description
int | None

An optional integer representing the index of the maximum item, if exists.

Source code in chinese_whispers/chinese_whispers.py
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
def random_argmax(
    items: Collection[tuple[T, float]],
    choice: Callable[[Sequence[T]], T] = random.choice,
) -> int | None:
    """
    Break the ties randomly.

    This is an argmax function that breaks the ties randomly.

    Args:
        items: A sequence of items with their corresponding weights.
        choice: A callable function that takes in a sequence of items and returns one of them.

    Returns:
        An optional integer representing the index of the maximum item, if exists.

    """
    if not items:
        # https://github.com/python/mypy/issues/1003
        return None

    _, maximum = max(items, key=itemgetter(1))

    keys = [k for k, v in items if v == maximum]

    return cast("int | None", choice(keys))

resolve_weighting(weighting)

resolve_weighting(weighting: Literal['top', 'linear', 'logarithmic', 'log']) -> WeightingFunc[T]
resolve_weighting(weighting: Weighting) -> WeightingFunc[T]
resolve_weighting(weighting: WeightingFunc[T]) -> WeightingFunc[T]
resolve_weighting(weighting: str) -> WeightingFunc[T]

Resolve the weighting function.

Parameters:

Name Type Description Default
weighting str | Weighting | WeightingFunc[T]

The weighing method to use. It can be either a string specifying one of the available schemas ('top', 'linear', 'logarithmic'), an instance of the Weighting enum, or a custom weighting function. Defaults to 'top'.

required

Returns:

Type Description
WeightingFunc[T]

The weighting function.

Source code in chinese_whispers/chinese_whispers.py
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
def resolve_weighting(weighting: str | Weighting | WeightingFunc[T]) -> WeightingFunc[T]:
    """
    Resolve the weighting function.

    Args:
        weighting: The weighing method to use.
            It can be either a string specifying one of the available schemas ('top', 'linear', 'logarithmic'),
            an instance of the Weighting enum, or a custom weighting function. Defaults to 'top'.

    Returns:
        The weighting function.

    """
    if isinstance(weighting, Weighting):
        return weighting

    if isinstance(weighting, str):
        try:
            return Weighting(weighting.lower())
        except ValueError:
            if weighting.lower() == "log":
                return Weighting.LOGARITHMIC
            raise UnknownWeightingError(weighting) from None

    return weighting

score(graph, node, weighting_func, ignore, label_key)

Compute label scores in the given node neighborhood.

Parameters:

Name Type Description Default
graph Graph[T]

The input graph.

required
node T

The node in the graph.

required
weighting_func WeightingFunc[T]

A function to calculate the weight between two nodes.

required
ignore Container[T]

The set of nodes to ignore.

required
label_key str

The key to access the label value for each node in the graph.

required

Returns:

Type Description
defaultdict[int, float]

A dictionary with label scores as values.

Source code in chinese_whispers/chinese_whispers.py
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
def score(
    graph: Graph[T],
    node: T,
    weighting_func: WeightingFunc[T],
    ignore: Container[T],
    label_key: str,
) -> defaultdict[int, float]:
    """
    Compute label scores in the given node neighborhood.

    Args:
        graph: The input graph.
        node: The node in the graph.
        weighting_func: A function to calculate the weight between two nodes.
        ignore: The set of nodes to ignore.
        label_key: The key to access the label value for each node in the graph.

    Returns:
        A dictionary with label scores as values.

    """
    scores: defaultdict[int, float] = defaultdict(float)

    if node not in graph or node in ignore:
        return scores

    for neighbor in graph[node]:
        if neighbor not in ignore:
            scores[graph.nodes[neighbor][label_key]] += weighting_func(graph, node, neighbor)

    if not scores:
        scores[graph.nodes[node][label_key]] = math.inf

    return scores

top_weighting(graph, node, neighbor)

Return the weight of an edge between two nodes.

This function calculates the weight of an edge between two nodes in a graph. The weight is determined by the 'weight' attribute of the edge. If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

Parameters:

Name Type Description Default
graph Graph[T]

The graph containing the edge.

required
node T

The source node of the edge.

required
neighbor T

The target node of the edge.

required

Returns:

Type Description
float

The weight of the edge.

Source code in chinese_whispers/chinese_whispers.py
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
def top_weighting(graph: Graph[T], node: T, neighbor: T) -> float:
    """
    Return the weight of an edge between two nodes.

    This function calculates the weight of an edge between two nodes in a graph.
    The weight is determined by the 'weight' attribute of the edge.
    If the 'weight' attribute is not present, a default weight of 1.0 is assumed.

    Args:
        graph: The graph containing the edge.
        node: The source node of the edge.
        neighbor: The target node of the edge.

    Returns:
        The weight of the edge.

    """
    return cast("WeightDict", graph[node][neighbor]).get("weight", 1.0)