Introduction
The package ZLAvian tests for patterns consistent with Zipf’s Law of
Abbreviation (ZLA) in animal communication following the methods
described in Lewis et al. (2023) and Gilman et al. (2023).
testZLA
This function measures and tests the statistical significance of the
concordance between note duration and frequency of use in a sample of
animal communication represented in a dataframe that must include
columns with the following names and information:
-
A column “note” (factor/character) that describes the type to which each
note, phrase, or syllable in the sample has been assigned;
-
A column “duration” (numeric) that describes the duration of each note,
phrase, or syllable; and
-
A column “ID” (factor) identifies the individual animal that produced
each note, phrase, or syllable.
Other columns in the dataframe are ignored in the analysis.
Youngblood (2024) observed that the column duration might alternatively
include data on any other numerical measure that estimates the effort
involved in producing a note type.
testZLA computes the mean concordance (i.e., Kendall’s tau) between
note duration and frequency of use within individuals, averages across
all individuals in the data set, and compares this to the expectation
under the null hypothesis that note duration and frequency of use are
unrelated. The null distribution is computed by permutation while
constraining for the observed similarity of note repertoires among
individuals. This controls for the possibility that individuals in the
population learn their repertoires from others. The significance test is
one-tailed, so p-values close to 1 suggest evidence for a positive
concordances, contrary to ZLA. See Lewis at al. (2023) for the formal
computation of the null distribution and Gilman et al. (2023) for
discussion.
Users can control the following parameters in testZLA:
-
minimum: the minimum number of times a note type must appear in the data
set to be included in the analysis. All notes types that appear less
than this number of times are removed from the sample before analysis.
minimium must be a positive integer. The default value is 1.
-
null: the number of permutations used to estimate the null distribution.
This must be a positive integer 99 or greater. The default value is 999.
-
est: takes values “mixed,” “mean,” or “median.” If est = “mixed” then
the expected logged duration for each note type in the population is
computed as the intercept from an intercept-only mixed effects model
(fit using the lmer() function of lme4) that includes a random effect of
individual ID. This computes a weighted mean across individuals, and
accords greater weights to individuals that produce the note type more
frequently. If est = “mean” then the expected logged duration for each
note type in the population is computed as the mean of the means for the
individuals, with each individual weighted equally. If est = “median”
then the expected logged duration for each note type within individuals
is taken to be the median logged duration of the note type when produced
by that individual, and the expected logged duration for each note type
in the population is taken to be the median of the medians for the
individuals that produced that note type.The expected durations for note
types are used in the permutation algorithm. Estimation using the
“mixed” approach is more precise - it gives greater weight to means
based on more notes, because we can estimate those means more
accurately. However, estimation using the “mean” approach is faster.
-
cores: The permutation process in testZLA is computationally expensive.
To make simulating the null distribution faster, testZLA allows users to
assign the task to multiple cores (i.e., parallelization). cores must be
an integer between 1 and the number of cores available on the user’s
machine, inclusive. Users can find the number of cores on their machines
using the function detectCores() in the package parallel. The default
value is 2.
-
transform: Takes values “log” or “none.” Indicates how duration data
should be transformed prior to analysis. Defaults to “log.” Gilman and
colleagues (2023) argue that log transformation is often appropriate for
duration data, but other measures might be better analysed as
untransformed data.
data(testdata, package = "ZLAvian")
test.ZLA.output = testZLA(data, minimum = 1, null = 999, est = "mixed", cores = 2)
#> tau p
#> individual level -0.03745038 0.4360000
#> population level -0.15000000 0.2088536
testZLA prints a table that reports concordances (tau) and p-values
at the individual and population levels. Results at the individual level
are obtained using the method described in Lewis et al (2023) and Gilman
et al (2023). Results at the population level report the concordance
between note type duration and frequency of use in the full dataset,
without considering which individuals produced which notes.
Population-level concordances may be problematic when studying ZLA in
animal communication (see Gilman et al 2023 for discussion) but have
been widely used to study ZLA in human languages.
Further information can be extracted from the function:
-
$stats: a matrix that reports Kendall’s tau and the p-value associated
with Kendall’s tau computed at both the population and individual
levels.
-
$unweighted: in stats, the population mean value of Kendall’s tau is
computed with the value of tau for each individual weighted by its
inverse variance. The inverse variance depends on the individual’s
repertoire size. In unweighted, Kendall’s tau and the p-value associated
with tau are computed with tau for each individual weighted equally
regardless of repertoire size.
-
$overview: a matrix that reports the total number of notes, total number
of individuals, total number of note types, mean number of notes per
individual, and mean number of note types produced by each individual in
the dataset.
-
$shannon: a matrix that reports the Shannon diversity of note classes in
the population and the mean Shannon diversity of note classes used by
individuals in the dataset.
-
$plotObject: a list containing data used by the plotZLA function to
produce a web plot illustrating the within-individual concordance in the
data set.
-
$thresholds: a matrix that reports the 90% inclusion interval for the
null distribution for the mean value Kendall’s tau in population, both
with and without weighting individual taus by their inverse variances.
The lower bound represents the least negative concordance that would be
inferred to be evidence for ZLA, and is thus a measure of the power of
the analysis.
plotZLA
plotZLA uses the output of testZLA to create a web plot illustrating
the concordance between note class duration and frequency of use within
individuals in the population.
Users can control the following parameters in testZLA:
-
title: a title for the webplot.
-
ylab: a label for the y-axis of the webplot. Defaults to “duration (s).”
-
x.scale: takes values “log” or “linear.” Indicates how the x-axis should
be scaled. Defaults to “log.”
-
y.base: takes values 2 or 10. Controls tick mark positions on the
y-axis. When set to 2, tick marks indicate that values differ by a
factor of 2. When set to 10, tick marks indicate that values differ by a
factor of 10. If data was not log transformed in the analysis being
illustrated, then the y-axis is linear and this argument is ignored.
Defaults to 2.
store <- testZLA(data, minimum = 1, null = 999, est = "mixed", cores = 2)
#> tau p
#> individual level -0.03745038 0.4490000
#> population level -0.15000000 0.2088536
plotZLA(store, ylab = "duration (ms)", x.scale = "linear")
In the figure produced by plotZLA, each point represents a note or
phrase type in the population repertoire. Note types are joined by a
line if at least one individual produces both note types. The weight of
the line is proportional to the number of individuals that produce both
note types. The color of the lines indicates whether there is a positive
(blue) or negative (red) concordance between the duration and frequency
of use of the joined note types. Negative concordances are consistent
with Zipf’s law of abbreviation. Shades between blue and red indicate
that the concordance is positive in some individuals and negative in
others. For example, this can happen if some individuals use the note
types more frequently than others, such that the rank order of frequency
of use varies among individuals. Grey crosses centered on each point
show the longest and shortest durations of the note type (vertical) and
the highest and lowest frequencies of use (horizontal) in the
population.
References
Gilman, R. T., Durrant, C. D., Malpas, L., and Lewis, R. N. (2023)
Does Zipf’s law of abbreviation shape birdsong? bioRxiv
(doi.org/10.1101/2023.12.06.569773).
Lewis, R. N., Kwong, A., Soma, M., de Kort, S. R., Gilman, R. T.
(2023) Java sparrow song conforms to Mezerath’s law but not to Zipf’s
law of abbreviation. bioRxiv
(doi.org/10.1101/2023.12.13.571437).
Youngblood, M. (2024) Language-like efficiency and structure in house
finch song. Proceedings of the Royal Society B - Biological
Sciences, 291:20240250
(doi.org/10.1098/rspb.2024.0250).