# Group sizes, durations and aggreation¶

The analysis of real-world face-to-face data is often based on the analysis of networks in terms of disconnected groups — the group sizes, their lifetimes and the contact durations distributions. Furthermore, an aggregated static network is often used to analyze the social structure. In the following we show how to compute and analyze these observables easily.

## Base analysis function¶

To do a complete measurement of the group size histograms,
the group life time distributions, the contact and
inter-contact distributions, and the aggregated weighted
network use `tacoma.api.measure_group_sizes_and_durations()`

```
result = tc.measure_group_sizes_and_durations(temporal_network)
```

The returned object is an instance of `_tacoma.group_sizes_and_durations`

.

## Group sizes¶

We define the group size distribution \(N_g(t)\) describing the number of groups of size \(g\) at time \(t\). Note that the group size distribution is restricted by

The time-dependent group size distribution is saved in two ways.

### For edge_lists¶

For each edge list, a size
histogram as a dictionary of (int, int)-pairs is saved in the
attribute `result.size_histograms`

. The key is the group size
\(g\) and the value is number of its occurences.

### For edge_changes¶

A single group size histogram is computed for the initial edge list.
Then, for each edge changing event,
a dictionary is saved in `result.size_histogram_differences`

, where
keys are the group size \(g\) and the corresponding value is
the change of occurences in comparison to the network’s previous state.

Note

Since evaluating the size histograms might be heavy on memory, you can omit the detailed computation using

```
result = tc.measure_group_sizes_and_durations(
temporal_network,
ignore_size_histogram=True
)
```

### Averaged size distribution¶

The average group size distribution

is saved in `result.aggregated_size_histogram`

, a list of floats
where the g-th entry contains the average number of groups of size
g. Note that this implies that the 0-th entry is always equal to 0.

You can easily get a `numpy.ndarray`

of group sizes g and corresponding
average number of groups of this size `N_g`

using the function
`tacoma.tools.group_size_histogram()`

.

```
g, N_g = tc.group_size_histogram(result)
```

Note that this function returns values for non-zero occurences only.

If you are just interested in the mean group size, use
`tacoma.tools.mean_group_size()`

```
mean_g = tc.mean_group_size(result)
```

### Number of groups¶

At all times, the number of seperated components (groups) is given by

and thus the mean number of components over time is

It can be computed with `tacoma.tools.mean_number_of_groups()`

.

```
mean_c = tc.mean_number_of_groups(result)
```

### Coordination number¶

The coordination number \(n\) of a single node is defined as the group size this node is part of, as was first done in a study by Zhao, Stehlé, Bianconi, and Barrat. The probability distribution of the coordination number is given by the group size distribution \(N_g\) as

It can also be interpreted as the probability of a single node to be in a group of size \(n\).

The mean coordination number \(\left\langle n \right\rangle\) can be computed as

```
mean_n = tc.mean_coordination_number(result)
```

### Analysis¶

Plot a distribution of group sizes as

```
from tacoma.analysis import plot_group_size_histogram
import matplotlib.pyplot as pl
ht09 = tc.load_sociopatterns_hypertext_2009()
result = tc.measure_group_sizes_and_durations(ht09)
fig, ax = pl.subplots(1,1)
plot_group_size_histogram(result, ax)
pl.show()
```

## Durations¶

### Group durations¶

A group is initiated as soon as an event leads to a change in members. The duration of this group is defined as the time it takes until the next event at which the constituents of the group changes.

All durations of groups of size \(g\) are saved in
`result.group_durations[g]`

(hence, `result.group_durations[0]`

is always empty).

Groups which were active at \(t_0\) are not considered in the measurement since we do not know when they were initiated, so including them in the analysis would skew the distribution. Likewise, groups which are still active at \(t_\mathrm{max}\) are omitted for the same reason.

The durations can be analyzed using `tacoma.analysis.plot_group_durations()`

.

```
from tacoma.analysis import plot_group_durations
import matplotlib.pyplot as pl
ht09 = tc.load_sociopatterns_hypertext_2009()
result = tc.measure_group_sizes_and_durations(ht09)
fig, ax = pl.subplots(1, 1)
plot_group_durations(result, ax, max_group=4, time_unit='s')
pl.show()
```

### (Inter-) Contact durations¶

The durations of all contacts is saved in `result.contact_durations`

.
However, contacts which were active at \(t_0\) are omitted since
we do not know when they started, so including them in the analysis would
skew the distribution. Likewise, contacts which are still active at
\(t_\mathrm{max}\) are omitted for the same reason.

The inter-contact duration of a node is defined as the time a node spends
alone (i.e. in a group of size \(g=1\)). Hence, all inter-contact
durations are saved in `result.group_durations[1]`

.

The contact and inter-contact durations can be analyzed using
`tacoma.analysis.plot_group_durations()`

.

```
from tacoma.analysis import plot_contact_durations
import matplotlib.pyplot as pl
ht09 = tc.load_sociopatterns_hypertext_2009()
result = tc.measure_group_sizes_and_durations(ht09)
fig, ax = pl.subplots(1, 1, figsize=(4,3))
plot_contact_durations(result, ax, time_unit='s')
pl.show()
```

## Aggregated network¶

The aggregated network

is given as a dictionary in `result.aggregated_network`

. Each key
is a pair of ints, representing the edge \((i, j)\), the
corresponding value is \(W_{ij}\).

If you just want the aggregated network without the other results,
use `tacoma.api.aggregated_network()`

.