Supporting Information for: Fifty years later, and we still don’t know about badges of status

Authors

Alfredo Sanchez-Tojar, Pietro B. D’Amelio

Importing original new dataset

This corresponds to the dataset containing all recalculated effect sizes generated by script ‘001_effect_size_calculation.R’.

Code
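# packages used below: dplyr for %>%, filter() and select(); metafor for escalc()
library(dplyr)
library(metafor)

metadata.new <- read.csv("../data/new/meta_complete_data2_new.csv", header = TRUE)

# excluding pteridine entries here already because the authors decided to exclude them from the final analyses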
metadata.new <- metadata.new %>%
  filter(Classification!="pteridine")

# creating a copy for fixing and adding effect sizes
metadata.new.updated <- metadata.new

# list of columns of interest for re-extracting and adding new effect sizes
columns.of.interest <- c("Authors","Publication.Year","Study","Species",
                         "Sample.Size","Stat.Test","Test.Statistic","df1","df2",
                         "r","n1","mean1","sd1","n2","mean2","sd2","yi","vi")


knitr::kable(head(metadata.new[,-c(1:2)]),format = "html")
Authors Publication.Year Study Species Geographic Vert_Invert Color1 Color2 Color3 Pattern Classification Eu_Pheomelanin Social_Rank_Controlled Obs_vs_Exp Condition_Stats Condition Age Sex Location Season Plasticity Aggression Aggression.Units Sample.Size Stat.Test Test.Statistic df1 df2 p.value r n1 mean1 var1 se1 sd1 n2 mean2 var2 se2 sd2 yi vi
Lehtonen, TK 2014 Lehtonen 2014 - 1 Amphilophus sagittae Crater Lake Xiloa, Nicaragua vertebrate gold dark body melanocortin eumelanin dummy used Exp Covariate Length mature males field breeding No Direct rate/5 min 45 mean NA NA NA NA NA 21 2.655 NA NA 2.8620000 24 3.070 NA NA 2.207000 0.1045729 0.0351715
Lehtonen, TK 2014 Lehtonen 2014 - 2 Amphilophus sagittae Crater Lake Xiloa, Nicaragua vertebrate gold dark body melanocortin eumelanin dummy used Exp Covariate Length mature females field breeding No Direct rate/5 min 38 mean NA NA NA NA NA 15 3.463 NA NA 2.6770000 23 4.446 NA NA 1.928000 0.2721993 0.0386500
Clement, TS; Parikh, V; Schrumpf, M; Fernald, RD 2005 Clement et al 2005 Astatotilapia burtoni Lake Tanaganyika, Tanzania vertebrate drab blue/yellow bright blue/yellow body carotenoid N/A video Exp NS (F1,10=0.628, p = 0.451) Size Matched -activity level same mature males lab year round Plastic Direct territorial 28 mean NA NA NA NA NA 5 0.382 NA 0.144 0.3220000 5 0.732 NA 0.028 0.063000 0.8079880 0.0405434
Renn, SCP; Fraser, EJ; Aubin-Horth, N; Trainor, BC; Hofmann, HA 2012 Renn et al 2012 Astatotilapia burtoni Lake Tanaganyika, Tanzania vertebrate no black black face melanocortin eumelanin uncontrolled Obs Size matched Weight and length mature females lab year round No Direct number of chases 36 mean NA NA NA NA NA 21 0.840 NA 0.160 0.7332121 15 11.460 NA 0.860 3.330766 1.1689200 0.0022032
Boerner, M; Kruger, O 2009 Boerner and Kruger 2009 - 1 Buteo buteo Westphalia, Germany vertebrate light intermediate dark body melanocortin eumelanin dummy used Exp uncontrolled None measured mature males field breeding No Indirect score 54 mean NA NA NA NA NA 16 1.728 NA 0.141 0.5640000 7 0.570 NA 0.368 0.973600 -0.8151808 0.0195386
Boerner, M; Kruger, O 2009 Boerner and Kruger 2009 - 2 Buteo buteo Westphalia, Germany vertebrate light intermediate dark body melanocortin eumelanin dummy used Exp uncontrolled None measured mature females field breeding No Indirect score 54 mean NA NA NA NA NA 20 0.632 NA 0.176 0.7871000 4 1.710 NA 0.255 0.510000 0.7243681 0.0402852


In the following sections, we reassess the data extraction of 11 studies, which correspond to 15% of all studies included in the meta-analysis of Ruckman et al. (2024).

Andrews et al. 1996

This study appears in the dataset as:

Code
Andrews_et_al_1996_subset <- metadata.new %>% filter(Authors=="Andrews, TJ; Summers, CH") %>% select(all_of(columns.of.interest))

knitr::kable(Andrews_et_al_1996_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Andrews, TJ; Summers, CH 1996 Andrews and Summers 1996 - 1 Anolis carolinensis 18 t 1.920 16 NA NA NA NA NA NA NA NA 0.4327311 0.0388560
Andrews, TJ; Summers, CH 1996 Andrews and Summers 1996 - 2 Anolis carolinensis 18 F 16.393 5 53 NA NA NA NA NA NA NA 0.7792981 0.0090711


For this species (i.e., Anolis carolinensis), Ruckman et al. (2024) classified Light green as the “Light Color” and Dark green or Brown as the “Dark Color”. Two effect sizes were extracted.

The t value was extracted from the following original text: “Dominant and subordinate females were also not significantly (t16 = 1.92, p > 0.072) different in mean body color in the absence of a male; all visible pigmented body surface of both females were a light to moderate green”.

  • Our assessment: The t value comes from an independent t test comparing subordinate vs dominant females in the no-male condition. The data extraction is clear.

  • Required action: None.

The F value was extracted from the following original text: “However, when males were present body coloration was significantly darker in all females, and statistically darkest in dominant or single females (F5,53 = 16.393, p < 0.001)”.

  • Our assessment: The F value seems to correspond to the following ANOVA (“Comparisons were made statistically for aggressive, submissive and courtship behavior, perch site selection and color by paired t-test or ANOVA”), which contains 2 predictors: (1) context (levels: no male, male present), and (2) treatment (levels: single, subordinate, dominant). This explains why df1 = 5 (2 x 3 = 6 combinations, minus 1). The reason for df2 = 53 is that there are 9 females in each group for a total of 54. Since 18 rather than 54 is used as the sample size when calculating Vr, there is no action required for this study.

  • Required action: None.
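As a cross-check on the two entries above, the following sketch recomputes yi and vi by hand in base R. It assumes the conventional conversions r = t / sqrt(t^2 + N - 2) and r = sqrt(df1*F / (df1*F + df2)), together with vi = (1 - r^2)^2 / (N - 1), the variance expression used for the recalculations in this document. The helper names are illustrative and are not taken from script ‘001_effect_size_calculation.R’. Under these assumptions the results should agree with the tabulated values up to rounding.

```r
# Hypothetical helpers (names are illustrative, not from the original scripts)
t_to_r <- function(t, n) t / sqrt(t^2 + n - 2)
F_to_r <- function(Fval, df1, df2) sqrt((df1 * Fval) / (df1 * Fval + df2))
var_r  <- function(r, n) ((1 - r^2)^2) / (n - 1)

# Andrews and Summers 1996 - 1: t16 = 1.92, N = 18
r_t  <- t_to_r(1.92, 18)
vi_t <- var_r(r_t, 18)

# Andrews and Summers 1996 - 2: F5,53 = 16.393, N = 18
r_F  <- F_to_r(16.393, 5, 53)
vi_F <- var_r(r_F, 18)

print(round(c(r_t = r_t, vi_t = vi_t, r_F = r_F, vi_F = vi_F), 7))
```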

Carola et al. 2014

This study appears in the dataset as:

Code
Carola_et_al_2014_subset <- metadata.new %>% filter(Authors=="Carola, V; Perlas, E; Zonfrillo, F; Soini, HA; Novotny, MV; Gross, CT") %>% select(all_of(columns.of.interest))

knitr::kable(Carola_et_al_2014_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Carola, V; Perlas, E; Zonfrillo, F; Soini, HA; Novotny, MV; Gross, CT 2014 Carola et al 2014 Mus musculus 20 F -5.4 1 18 NA NA NA NA NA NA NA -0.6546537 0.0171858


For this species (i.e., Mus musculus), Ruckman et al. (2024) classified Non-agouti as the “Light Color” and Agouti as the “Dark Color”. Note that Ruckman et al. (2024) stated that aggression between different morphs was excluded: “we therefore limit our data set to measure of aggression within color morphs”. One effect size was extracted.

The F value was extracted from the following original text: “Non-agouti mice showed significantly increased aggressive-like behavior when compared to agouti littermates in the test, exhibiting more attacks [Figure 1A; repeated measure ANOVA, genotype effect: F1,18 = 5.40, P = 0.032]…”.

  • Our assessment: Since Non-agouti is considered the “Light Color”, the sign of the final effect size should be negative, which it is.

  • Required action: None.

When revisiting this study, we realized that there was an additional aggression proxy that was not extracted: “…and a shorter latency to the first attack [Figure 1B; repeated measure ANOVA, genotype effect: F1,18 = 7.77; P = 0.012] toward a non-agouti intruder over three consecutive trials.”.

  • Our assessment: There is no clear reason why this proxy was not extracted since latency was extracted for other studies in the dataset. In the full dataset there are several other papers where multiple effect sizes from the same group of animals were extracted. This sort of nonindependence (i.e., multiple estimates from the same group of animals) should be accounted for with a random effect (i.e., “Group ID”). This should be done for all such cases in the dataset.

  • Required action: We extracted the data for the additional effect size from Figure 1B using the R package metaDigitise (Pick et al. 2019). The corresponding rbis is calculated below and then added to the dataset.

Code
Carola_et_al_2014_extra_1 <- metadata.new %>% filter(Authors=="Carola, V; Perlas, E; Zonfrillo, F; Soini, HA; Novotny, MV; Gross, CT")

# emptying entry
Carola_et_al_2014_extra_1[,c(1:ncol(Carola_et_al_2014_extra_1))] <- NA

# adding variables of interest from original sources
Carola_et_al_2014_extra_1[,c("Authors","Publication.Year","Species")] <- Carola_et_al_2014_subset[,c("Authors","Publication.Year","Species")]

Carola_et_al_2014_extra_1[,"Study"] <- "Carola et al 2014 - 2"
Carola_et_al_2014_extra_1[,"Sample.Size"] <- 20
Carola_et_al_2014_extra_1[,"Stat.Test"] <- "mean"
Carola_et_al_2014_extra_1[,"Test.Statistic"] <- 7.77
Carola_et_al_2014_extra_1[,"df1"] <- 1
Carola_et_al_2014_extra_1[,"df2"] <- 18
Carola_et_al_2014_extra_1[,"n1"] <- 10
Carola_et_al_2014_extra_1[,"mean1"] <- 22.94877
Carola_et_al_2014_extra_1[,"sd1"] <- 10.578670
Carola_et_al_2014_extra_1[,"n2"] <- 10
Carola_et_al_2014_extra_1[,"mean2"] <- 35.74442
Carola_et_al_2014_extra_1[,"sd2"] <- 9.785269

# calculating rbis
Carola_et_al_2014_extra_1 <- as.data.frame(escalc(measure = "RBIS",
                                                  n2i = n1,
                                                  n1i = n2,
                                                  m2i = mean1,
                                                  m1i = mean2,
                                                  sd2i = sd1,
                                                  sd1i = sd2,
                                                  data = Carola_et_al_2014_extra_1))

# flipping the sign to reflect that it is the Non-agouti (Light Color) that takes less time to attack
Carola_et_al_2014_extra_1[,"yi"] <- Carola_et_al_2014_extra_1[,"yi"] * (-1)

# finally, adding this entry to the new dataset
metadata.new.updated <- rbind(metadata.new.updated,Carola_et_al_2014_extra_1)

There seem to be several other effect sizes that could have been extracted from this paper: “To evaluate if aggressive behavior of the resident could be modulated by the genotype of the intruder a fourth trial was carried out in which each group was split and half were exposed to non-agouti and the other half to agouti intruders. No significant behavioral differences between mice exposed to agouti or non-agouti intruders were detected (Figure S2)”.

Naretto and Chiaraviglio 2023

This study appears in the dataset as:

Code
Naretto_and_Chiaraviglio_2023_subset <- metadata.new %>% filter(Authors=="Naretto, Sergio; Chiaraviglio, Margarita") %>% select(all_of(columns.of.interest))

knitr::kable(Naretto_and_Chiaraviglio_2023_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Naretto, Sergio; Chiaraviglio, Margarita 2023 Naretto et al 2023 Pristidactylus achalensis 46 F -6.88 2 45 NA NA NA NA NA NA NA -0.6636723 0.0069574


For this species (i.e., Pristidactylus achalensis), Ruckman et al. (2024) classified Lighter as the “Light Color” and Darker as the “Dark Color”. One effect size was extracted.

The extracted F value corresponds to the following: “Average lightness was higher in winners compared to both losers and males categorized as having no clear outcome in the first two rounds of the tournament (Table 2; Round 1 F2,45 = 6.88, P = 0.002…”.

  • Our assessment: Since Lighter is the “Light Color”, the sign of the final effect size should indeed be negative, which it is. Nonetheless, according to Table 2, the sample size should be 48 (i.e., 17+14+17) instead of 46.

  • Required action: Change the sample size to 48, and recalculate vi to account for this change.

Code
metadata.new.updated[metadata.new.updated$Study=="Naretto et al 2023","Sample.Size"] <- 48

yi.tmp <- metadata.new.updated[metadata.new.updated$Study=="Naretto et al 2023","yi"]
Sample.Size.tmp <- metadata.new.updated[metadata.new.updated$Study=="Naretto et al 2023","Sample.Size"]

metadata.new.updated[metadata.new.updated$Study=="Naretto et al 2023","vi"] <- ((1-(yi.tmp^2))^2)/(Sample.Size.tmp-1)
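To make the size of this correction concrete, the sketch below applies the variance expression used in the recalculation above, vi = (1 - yi^2)^2 / (N - 1), to the tabulated yi with N = 46 and with N = 48. Variable and helper names are illustrative only.

```r
yi <- -0.6636723                      # tabulated effect size for Naretto et al 2023

# Hypothetical helper implementing vi = (1 - r^2)^2 / (N - 1)
var_r <- function(r, n) ((1 - r^2)^2) / (n - 1)

vi_old <- var_r(yi, 46)               # sample size as originally extracted
vi_new <- var_r(yi, 48)               # sample size according to Table 2 (17+14+17)

print(c(vi_old = vi_old, vi_new = vi_new, ratio = vi_new / vi_old))
```

Because yi is unchanged, the variance shrinks by exactly the factor 45/47, so the correction is modest.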

When revisiting this study, we realized that there were additional effect sizes that could have been extracted corresponding to Rounds 2 and 3: “…; Round 2 F2,39 = 5.752, P = 0.006; Round 3 F2,37 = 1.344, P = 0.273)”.

  • Our assessment: Those two effect sizes come from the same group of animals. In the full dataset there are several other papers where multiple effect sizes from the same group of animals were extracted. This sort of nonindependence (i.e., multiple estimates from the same group of animals) should be accounted for with a random effect (i.e., “Group ID”). This should be done for all such cases in the dataset.

  • Required action: To extract these effect sizes, we first confirm, by checking Table 2, that the direction should indeed remain negative, and then transform those two F values as we did for the F value above.

Code
Naretto_and_Chiaraviglio_2023_extra_1 <- metadata.new %>% filter(Authors=="Naretto, Sergio; Chiaraviglio, Margarita")

Naretto_and_Chiaraviglio_2023_extra_2 <- metadata.new %>% filter(Authors=="Naretto, Sergio; Chiaraviglio, Margarita")

# emptying entry
Naretto_and_Chiaraviglio_2023_extra_1[,c(1:ncol(Naretto_and_Chiaraviglio_2023_extra_1))] <- NA

Naretto_and_Chiaraviglio_2023_extra_2[,c(1:ncol(Naretto_and_Chiaraviglio_2023_extra_2))] <- NA


# adding variables of interest from original sources
Naretto_and_Chiaraviglio_2023_extra_1[,c("Authors","Publication.Year","Species")] <- Naretto_and_Chiaraviglio_2023_subset[,c("Authors","Publication.Year","Species")]

Naretto_and_Chiaraviglio_2023_extra_2[,c("Authors","Publication.Year","Study","Species")] <- Naretto_and_Chiaraviglio_2023_subset[,c("Authors","Publication.Year","Study","Species")]

# Round 2 value
Naretto_and_Chiaraviglio_2023_extra_1[,"Study"] <- "Naretto et al 2023 - 2"
Naretto_and_Chiaraviglio_2023_extra_1[,"Sample.Size"] <- (9+24+9)
Naretto_and_Chiaraviglio_2023_extra_1[,"Stat.Test"] <- "F"
Naretto_and_Chiaraviglio_2023_extra_1[,"Test.Statistic"] <- 5.752
Naretto_and_Chiaraviglio_2023_extra_1[,"df1"] <- 2
Naretto_and_Chiaraviglio_2023_extra_1[,"df2"] <- 39

# calculating corresponding r
df1.tmp <- Naretto_and_Chiaraviglio_2023_extra_1[,"df1"]
df2.tmp <- Naretto_and_Chiaraviglio_2023_extra_1[,"df2"]
Test.Statistic.tmp <- Naretto_and_Chiaraviglio_2023_extra_1[,"Test.Statistic"]

Naretto_and_Chiaraviglio_2023_extra_1[,"yi"] <- sqrt((df1.tmp*Test.Statistic.tmp)/(df1.tmp*Test.Statistic.tmp+df2.tmp))

# adjusting the sign accordingly 
Naretto_and_Chiaraviglio_2023_extra_1[,"yi"] <- Naretto_and_Chiaraviglio_2023_extra_1[,"yi"]*(-1)

# calculating vi
yi.tmp <- Naretto_and_Chiaraviglio_2023_extra_1[,"yi"]
Sample.Size.tmp <- Naretto_and_Chiaraviglio_2023_extra_1[,"Sample.Size"]
Naretto_and_Chiaraviglio_2023_extra_1[,"vi"] <- ((1-(yi.tmp^2))^2)/(Sample.Size.tmp-1)

# Round 3 value
Naretto_and_Chiaraviglio_2023_extra_2[,"Study"] <- "Naretto et al 2023 - 3"
Naretto_and_Chiaraviglio_2023_extra_2[,"Sample.Size"] <- (5+30+5)
Naretto_and_Chiaraviglio_2023_extra_2[,"Stat.Test"] <- "F"
Naretto_and_Chiaraviglio_2023_extra_2[,"Test.Statistic"] <- 1.344
Naretto_and_Chiaraviglio_2023_extra_2[,"df1"] <- 2
Naretto_and_Chiaraviglio_2023_extra_2[,"df2"] <- 37

# calculating corresponding r
df1.tmp <- Naretto_and_Chiaraviglio_2023_extra_2[,"df1"]
df2.tmp <- Naretto_and_Chiaraviglio_2023_extra_2[,"df2"]
Test.Statistic.tmp <- Naretto_and_Chiaraviglio_2023_extra_2[,"Test.Statistic"]

Naretto_and_Chiaraviglio_2023_extra_2[,"yi"] <- sqrt((df1.tmp*Test.Statistic.tmp)/(df1.tmp*Test.Statistic.tmp+df2.tmp))

# adjusting the sign accordingly 
Naretto_and_Chiaraviglio_2023_extra_2[,"yi"] <- Naretto_and_Chiaraviglio_2023_extra_2[,"yi"]*(-1)

# calculating vi
yi.tmp <- Naretto_and_Chiaraviglio_2023_extra_2[,"yi"]
Sample.Size.tmp <- Naretto_and_Chiaraviglio_2023_extra_2[,"Sample.Size"]
Naretto_and_Chiaraviglio_2023_extra_2[,"vi"] <- ((1-(yi.tmp^2))^2)/(Sample.Size.tmp-1)

# finally, adding these entries to the new dataset
metadata.new.updated <- rbind(metadata.new.updated,
                              Naretto_and_Chiaraviglio_2023_extra_1,
                              Naretto_and_Chiaraviglio_2023_extra_2)
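The nonindependence noted above could be encoded by giving all effect sizes that share animals the same group identifier. The following is only a sketch under our own assumptions: the column names Group.ID and ES.ID and the identifier values are hypothetical, the toy data frame holds only Study labels (no effect sizes), and the model call is left as a comment because the random-effects structure of the final analyses is not specified in this section.

```r
# Toy data frame with only the Study labels discussed so far
dat <- data.frame(Study = c("Carola et al 2014", "Carola et al 2014 - 2",
                            "Naretto et al 2023", "Naretto et al 2023 - 2",
                            "Naretto et al 2023 - 3"),
                  stringsAsFactors = FALSE)

# Hypothetical lookup: entries measured on the same animals share one Group.ID.
# Assigned manually, since a shared study label does not always imply shared animals
# (e.g., separate entries for males and females of the same study).
group.lookup <- c("Carola et al 2014"      = "Carola_2014_G1",
                  "Carola et al 2014 - 2"  = "Carola_2014_G1",
                  "Naretto et al 2023"     = "Naretto_2023_G1",
                  "Naretto et al 2023 - 2" = "Naretto_2023_G1",
                  "Naretto et al 2023 - 3" = "Naretto_2023_G1")

dat$Group.ID <- unname(group.lookup[dat$Study])
dat$ES.ID    <- seq_len(nrow(dat))   # one identifier per effect size

print(table(dat$Group.ID))

# With yi and vi present, a multilevel model could then be fitted, e.g. with metafor:
# rma.mv(yi, vi, random = list(~ 1 | Group.ID, ~ 1 | ES.ID), data = dat)
```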

In addition, the study also provides three additional tests corresponding to differences in lightness before the trials: “There were no significant differences in lightness before the beginning of each trial between categories (Table 2; Opponent A and Opponent B in Round 1: F 1,46 = 0.003, P = 0.955; W, NCO and L in Round 2: F2,39 = 1.604, P = 0.214; W, NCO and L in Round 3: F 2,37 = 0.661, P = 0.523).”.

  • Our assessment: From what is provided, we consider this set of three effect sizes an alternative to the three already extracted, meaning that there is a reasonable argument for extracting the effect sizes either before or after the trial, depending on the question at hand. Since Ruckman et al. (2024) decided to extract the post-trial values, we follow that reasoning and do not extract these three additional effect sizes. Note that adding them would further reduce the overall effect size.

  • Required action: None.

Martin and Hengstebeck 1981

This study appears in the dataset as:

Code
Martin_and_Hengstebeck_1981_subset <- metadata.new %>% filter(Authors=="MARTIN, FD; HENGSTEBECK, MF") %>% select(all_of(columns.of.interest))

knitr::kable(Martin_and_Hengstebeck_1981_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 2 Poecilia reticulata 33 X2 21.30 6 NA NA NA NA NA NA NA NA 0.5680909 0.0143343
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 3 Poecilia reticulata 12 X2 21.30 6 NA NA NA NA NA NA NA NA 0.9420722 0.0011506
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 4 Poecilia reticulata 61 X2 21.30 6 NA NA NA NA NA NA NA NA 0.4178399 0.0113550
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 5 Poecilia reticulata 64 X2 21.30 6 NA NA NA NA NA NA NA NA 0.4079292 0.0110298
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 6 Poecilia reticulata 27 X2 18.18 6 NA NA NA NA NA NA NA NA 0.5802298 0.0169235
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 7 Poecilia reticulata 11 X2 18.18 6 NA NA NA NA NA NA NA NA 0.9090455 0.0030150
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 8 Poecilia reticulata 16 X2 18.18 6 NA NA NA NA NA NA NA NA 0.7537407 0.0124344
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 9 Poecilia reticulata 11 X2 18.18 6 NA NA NA NA NA NA NA NA 0.9090455 0.0030150
MARTIN, FD; HENGSTEBECK, MF 1981 Martin and Hengstebeck 1981 - 1 Poecilia reticulata 74 F 3.80 2 17 NA NA NA NA NA NA NA 0.5558265 0.0065419


For this species (i.e., Poecilia reticulata), Ruckman et al. (2024) classified Less black eye as the “Light Color” and Darker eye as the “Dark Color”. Nine effect sizes were extracted.

The extracted F value corresponds to the following: “Mean bout lengths of aggressive encounters were 10.3 s for dark-eyed fish, 7.2 s for intermediate fish, and 1.8 s for light-eyed fish (F2,17 = 3.80, P < 0.005)”.

  • Our assessment: We think that the extracted F value is comparable to those extracted from other studies, and the results suggest that dark-eyed fish spend more time on aggressive encounters than light-eyed fish. However, the assigned sample size (74) is not consistent with df2 (17): 74 seems to be the number of sets of observations performed, not the number of individuals.

  • Required action: Change sample size to 17+2 = 19.

Code
metadata.new.updated[metadata.new.updated$Study=="Martin and Hengstebeck 1981 - 1","Sample.Size"] <- 17+2

yi.tmp <- metadata.new.updated[metadata.new.updated$Study=="Martin and Hengstebeck 1981 - 1","yi"]
Sample.Size.tmp <- metadata.new.updated[metadata.new.updated$Study=="Martin and Hengstebeck 1981 - 1","Sample.Size"]

metadata.new.updated[metadata.new.updated$Study=="Martin and Hengstebeck 1981 - 1","vi"] <- ((1-(yi.tmp^2))^2)/(Sample.Size.tmp-1)

The 8 extracted X2 values correspond to Table IV, where the provided values do not correspond to the number of individuals but to the number of encounters.

  • Our assessment: The study does not provide the number of individuals observed to generate the data presented in Table IV, not even an approximate number. The only information on sample sizes is: “Litters selected for observations had a minimum number of five fish, and for Indiana fish the observed maximum was 21. Some of the Puerto Rico fish were removed on the first day after birth so that the maximum number in a tank was 11”, but the number of tanks is not reported. As far as we can tell, all the observations could come from an extremely low number of individuals (even 3, in the most extreme case).

  • Required action: We do not think the X2 values can be reliably used for the meta-analysis, as the sample size is unknown and the raw data are not available; therefore, we exclude them from the dataset.

Code
#saving the useful entry
Martin_and_Hengstebeck_1981.tmp <- metadata.new.updated[metadata.new.updated$Study=="Martin and Hengstebeck 1981 - 1",]

#deleting the rest
metadata.new.updated <- metadata.new.updated[metadata.new.updated$Authors!="MARTIN, FD; HENGSTEBECK, MF",]

# adding the study back
metadata.new.updated <- rbind(metadata.new.updated,Martin_and_Hengstebeck_1981.tmp)

Dijkstra et al 2009b

This study appears in the dataset as:

Code
Dijkstra_et_al_2009b_subset <- metadata.new %>% filter(Authors=="Dijkstra, PD; van Dijk, S; Groothuis, TGG; Pierotti, MER; Seehausen, O") %>% select(all_of(columns.of.interest))

knitr::kable(Dijkstra_et_al_2009b_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Dijkstra, PD; van Dijk, S; Groothuis, TGG; Pierotti, MER; Seehausen, O 2009 Dijkstra et al 2009b Haplochromis omnicaeruleus 12 X2 9.5 2 NA NA NA NA NA NA NA NA 0.8897565 0.0039457


For this species (i.e., Haplochromis omnicaeruleus), Ruckman et al. (2024) classified Plain (orange) as the “Light Color” and Black as the “Dark Color”. Note that Ruckman et al. (2024) stated that aggression between different morphs was excluded: “we therefore limit our data set to measure of aggression within color morphs”. One effect size was extracted.

The extracted X2 value corresponds to: “The female morphs differed significantly in ranking (ranking mean+-SE: OB female 1.9 +/- 0.2; P female 2.6 +/-0.5; WB female 1.5 +/- 0.2, Friedman test, X2 = 9.50, df = 2, P = 0.009, n = 12)”.

  • Our assessment: The authors of the original study report: “3 distinct female color morphs coexist, black-and-white blotched (WB), orange blotched (OB), and plain (P) color morphs. First, we investigated dominance relationships among female morphs using triadic and dyadic encounters in the laboratory”. We therefore assume that the three morphs are part of a continuum with the extremes being P (plain) and WB (black-and-white), and orange blotched (OB) being intermediate. As far as we can see, everything seems correct with the data extraction from this study.

  • Required action: None.

Yang et al 2018

This study appears in the dataset as:

Code
Yang_et_al_2018_subset <- metadata.new %>% filter(Authors=="Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL") %>% select(all_of(columns.of.interest))

knitr::kable(Yang_et_al_2018_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL 2018 Yang et al 2018 - 1 Oophaga pumilio 140 X2 2.10 2 NA NA NA NA NA NA NA NA 0.1224745 0.0069800
Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL 2018 Yang et al 2018 - 2 Oophaga pumilio 139 X2 4.25 2 NA NA NA NA NA NA NA NA 0.1748586 0.0068100
Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL 2018 Yang et al 2018 - 3 Oophaga pumilio 140 X2 2.77 2 NA NA NA NA NA NA NA NA 0.1406617 0.0069124
Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL 2018 Yang et al 2018 - 4 Oophaga pumilio 139 X2 1.77 2 NA NA NA NA NA NA NA NA 0.1128442 0.0070630
Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL 2018 Yang et al 2018 - 5 Oophaga pumilio 140 X2 4.16 2 NA NA NA NA NA NA NA NA 0.1723783 0.0067731
Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL 2018 Yang et al 2018 - 6 Oophaga pumilio 139 X2 0.61 2 NA NA NA NA NA NA NA NA 0.0662457 0.0071829
Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL 2018 Yang et al 2018 - 7 Oophaga pumilio 140 X2 2.33 2 NA NA NA NA NA NA NA NA 0.1290072 0.0069568
Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL 2018 Yang et al 2018 - 8 Oophaga pumilio 139 X2 3.16 2 NA NA NA NA NA NA NA NA 0.1507774 0.0069206


For this species (i.e., Oophaga pumilio), Ruckman et al. (2024) classified Light red, Green and Red as the “Light Color” and Dark red, Red and Blue as the “Dark Color”, respectively. Note that Ruckman et al. (2024) stated that aggression between different morphs was excluded: “we therefore limit our data set to measure of aggression within color morphs”. Eight effect sizes were extracted.

The extracted X2 values correspond to Tables S4-S7, where Likelihood Ratio (LR) X2 values are presented: “The four tables below are generalized linear models evaluating the influence of male color (red, intermediate and blue), model intruder color (red, blue) and their interaction term on the likelihood of a territorial male to track (Table S4), approach (Table S5), call (Table S6) and challenge (Table S7) in the two polymorphic populations. Perch height and conspecific interaction (y/n) were included as covariates”.

  • Our assessment: The direction of the X2 values in Tables S4-S7 (as well as the general ones in Tables 2 and S3) is not reported in the original study. The authors of the original study only provide the direction of the effect for results that are statistically significant, and report the rest without direction, e.g. “When considering all territorial males, regardless of interaction with conspecifics during the trial, neither the main effects of male colour and intruder colour nor their interaction was a significant predictor of the probability of attack in the high-red polymorphic population (Table 2).” or “GLMs for the other four variables (likelihood to track, approach, call and challenge) are presented in Tables S4–S7. We did not detect any significant main effects or an interaction between male colour and model intruder colour in any of the models”. The only indication that Bluer males are more aggressive than Redder males comes from the high-blue polymorphic population in Table S3, where the authors of the original study reported “However, the likelihood of attack was positively correlated with PC2 (a hue indicator that increases with male ‘blueness’; Table S1), suggesting that bluer males were more aggressive than redder males”. Table S3 shows PC1 (X2 = 0.45, p-value = 0.505) and PC2 (X2 = 6.49, p-value = 0.011), which are quantitative measures as opposed to the “by-eye male colour” categorizations presented in Tables 2 and S4-S7 (Table S10 shows similar results to Table S3 but for “the subset of observations in which the focal male did not interact with a conspecific”). According to the authors of the original study, “PC1 captures the brightness (but much higher green and blue loading) of the male dorsum; PC2 captures hue, or how blue the male was along the red-blue spectrum”.
In summary, from the reported results it is not possible to know the direction of any reported X2 value other than PC2 in Table S3 (X2 = 6.49, p-value = 0.011) and the corresponding one in Tables S9 and S10, which present a subset of the same data used in Tables 2 and S3. Thus, without additional information, we cannot assume that all those X2 values are positive.

  • Required action: Exclude the study. For 7 out of 8 X2 values we do not know the direction of the effect. The only X2 value for which we know the direction is the one corresponding to PC2 in Table S3 (X2 = 6.49, p-value = 0.011); however, the corresponding PC1 (which reflects male brightness) is statistically nonsignificant and its direction is unknown. Thus, we think that extracting the only X2 value for which we know the direction would lead to a biased representation of the findings of the study. Lastly, all X2 values provided come from binomial GLMs rather than X2 tests, which adds complexity to their transformation into an effect size.

Code
#deleting the study
metadata.new.updated <- metadata.new.updated[metadata.new.updated$Authors!="Yang, Y; Dugas, MB; Sudekum, HJ; Murphy, SN; Richards-Zawacki, CL",]
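The direction problem can be illustrated numerically. Assuming the conversion r = sqrt(X2 / N), which reproduces the tabulated yi for ‘Yang et al 2018 - 1’ (X2 = 2.10, N = 140), the result is non-negative by construction, so any sign has to come from information reported elsewhere in the original study. The helper name is hypothetical.

```r
# Hypothetical helper: magnitude of r from a chi-square statistic
chisq_to_r <- function(x2, n) sqrt(x2 / n)

r_abs <- chisq_to_r(2.10, 140)   # Yang et al 2018 - 1
print(round(r_abs, 7))

# The statistic carries no direction: +r and -r imply exactly the same X2
x2_from_pos <- 140 * (+r_abs)^2
x2_from_neg <- 140 * (-r_abs)^2
print(c(x2_from_pos, x2_from_neg))
```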

Rose and Soole 2020

This study appears in the dataset as:

Code
Rose_and_Soole_2020_subset <- metadata.new %>% filter(Authors=="Rose, P; Soole, L") %>% select(all_of(columns.of.interest))

knitr::kable(Rose_and_Soole_2020_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Rose, P; Soole, L 2020 Rose and Soole 2020 Phoenicoparrus minor 45 F 6.45 4 40 0.33 NA NA NA NA NA NA 0.6261767 0.0083988


For this species (i.e., Phoenicoparrus minor), Ruckman et al. (2024) classified White as the “Light Color” and Pink as the “Dark Color”. One effect size was extracted.

The extracted F value corresponds to: “Differences in time spent on aggression and plumage colour score are significant between birds (F4,40 = 6.45; r2 = 33%; p = .0004).

  • Our assessment: In this study, plumage colour is scored in four categories: 1 being white, and 2 being pink, and from the results shown in Figure 7, from which the F value was extracted, it is clear that, as reported in the original study: “Figure 7 shows that the brightest flamingos are least likely to be seen foraging and being aggressive regardless of the type of foraging location. Birds with a colour score of 3 were most often seen being aggressive; birds with a colour score of 4 had the lowest foraging occurrences”. Thus, extracting the F value, which corresponds to an omnibus test on all four categories, would be rather misleading. Instead, the most straightforward way of extracting this result would have been directly from the figure. However, sample sizes for each category are missing. Thus, the second-best choice here would be to extract the t value, which, despite seemingly coming from a GLM, explicitly shows the direction of the effect: “Birds with a brighter plumage are more likely to be aggressive during foraging than paler birds (estimate = 10.23; SE = 4.88; t value = 2.09; p = .04)”.

  • Required action: Substitute the extracted F value with the t value, delete the extracted r value (which corresponds to the R2 value of the F test), and recalculate the corresponding r and Vr values using escalc() as done for the other t values.

Code
# making changes 
metadata.new.updated[metadata.new.updated$Study=="Rose and Soole 2020","Stat.Test"] <- "t"
metadata.new.updated[metadata.new.updated$Study=="Rose and Soole 2020","Test.Statistic"] <- 2.09 # no sign change needed
metadata.new.updated[metadata.new.updated$Study=="Rose and Soole 2020","df1"] <- NA
metadata.new.updated[metadata.new.updated$Study=="Rose and Soole 2020","df2"] <- NA
metadata.new.updated[metadata.new.updated$Study=="Rose and Soole 2020","p.value"] <- 0.04
metadata.new.updated[metadata.new.updated$Study=="Rose and Soole 2020","r"] <- NA

# adding the corresponding yi and vi values
metadata.new.updated[metadata.new.updated$Study=="Rose and Soole 2020","yi"] <- escalc(measure = "COR",
                                                                                       ti = 2.09,
                                                                                       ni = 45)[1]
metadata.new.updated[metadata.new.updated$Study=="Rose and Soole 2020","vi"] <- escalc(measure = "COR",
                                                                                       ti = 2.09,
                                                                                       ni = 45)[2]
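
As a rough cross-check of what the escalc() calls above compute, the conversion from a t value to r can be written by hand. This is a minimal sketch, assuming the usual conversion r = t / sqrt(t^2 + n - 2) and the large-sample variance (1 - r^2)^2 / (n - 1); the helper name t_to_r is hypothetical and not part of metafor.

```r
# Hypothetical helper (not from metafor): convert a t value into a
# correlation coefficient and its approximate sampling variance.
t_to_r <- function(t, n) {
  df <- n - 2                      # assumed degrees of freedom
  r  <- t / sqrt(t^2 + df)         # t-to-r conversion
  vi <- (1 - r^2)^2 / (n - 1)      # large-sample variance of r
  c(yi = r, vi = vi)
}

# Rose and Soole 2020: t = 2.09, n = 45
rose_soole <- t_to_r(t = 2.09, n = 45)
rose_soole
```

Applied to the first Podberscek and Serpell 1996 entry listed in the following section (t = 2.582, n = 428), the same formulas give yi of about 0.1241 and vi of about 0.00227, in line with the tabulated values.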

Podberscek and Serpell 1996

This study appears in the dataset as:

Code
Podberscek_and_Serpell_1996_subset <- metadata.new %>% filter(Authors=="Podberscek, AL; Serpell, JA") %>% select(all_of(columns.of.interest))

knitr::kable(Podberscek_and_Serpell_1996_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 1 Canis lupus familiaris 428 t 2.582 427 NA NA NA NA NA NA NA NA 0.1241308 0.0022703
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 10 Canis lupus familiaris 423 t 2.400 422 NA NA NA NA NA NA NA NA 0.1161768 0.0023061
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 11 Canis lupus familiaris 426 t 3.492 425 NA NA NA NA NA NA NA NA 0.1671993 0.0022232
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 12 Canis lupus familiaris 428 t 3.643 427 NA NA NA NA NA NA NA NA 0.1738172 0.0022025
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 3 Canis lupus familiaris 426 t 2.774 425 NA NA NA NA NA NA NA NA 0.1335113 0.0022698
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 4 Canis lupus familiaris 142 t 3.365 141 NA NA NA NA NA NA NA NA 0.2735472 0.0060705
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 6 Canis lupus familiaris 425 t 3.336 424 NA NA NA NA NA NA NA NA 0.1601094 0.0022391
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 7 Canis lupus familiaris 428 t 4.988 427 NA NA NA NA NA NA NA NA 0.2349070 0.0020906
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 8 Canis lupus familiaris 427 t 4.524 426 NA NA NA NA NA NA NA NA 0.2143458 0.0021367
Podberscek, AL; Serpell, JA 1996 Podberscek and Serpell 1996 - 9 Canis lupus familiaris 428 t 3.161 427 NA NA NA NA NA NA NA NA 0.1513858 0.0022358


For this species (i.e., Canis lupus familiaris), Ruckman et al. (2024) classified Black as the “Light Color” and Red/golden as the “Dark Color”. Ten effect sizes were extracted (83% of all mammal ones, 10/12).

Results are shown in: “Within the solid colour group, red/goldens were compared with blacks. Here it was found that red/goldens were significantly more likely to be aggressive in a number of situations. These included, A1 (towards strange dogs; Mann-Whitney U test, Z = 2.582, P < 0.01), A4 (towards persons approaching owner away from home; Z = 2.774, P < 0.01), A5 (towards children in the household; Z = 3.365, P < 0.001), A7 (when owner gives attention to other person or animal; Z = 3.336, P < 0.001), A8 (toward owner or member of owner’s family; Z = 4.988, P < 0.001), A9 (when disciplined; Z = 4.524, P < 0.001), A10 (when reached for or handled; Z = 3.161, P < 0.01), A11 (when in restricted spaces; Z = 2.4, P < 0.05), A12 (at meal times/ defending food; Z = 3.492, P < 0.001), A13 (sudden and without apparent reason; Z = 3.643, P < 0.001).”

  • Our assessment: Ruckman et al. (2024) established the following criterion: “We defined aggression as any variable that measured antagonistic behaviors (e.g., biting or chasing) toward a conspecific (of same sex, color class, and age class) or mirror image.” Of the 13 questions asked of the dogs’ owners in Podberscek and Serpell 1996, only two (A1 and A6) refer to aggression towards conspecifics (see table 1 of the original paper). Hence, only these two should be considered. Of these two, only one (A1) was reported, because the original study reported only statistically significant findings. Extracting only significant results would bias the results, and therefore even this single effect size should be excluded. Moreover, the methodology used (dog owner surveys) is not at all comparable with that of the rest of the studies, where aggression was measured directly, and we therefore think that the study should have been excluded a priori in any case.

  • Required action: Exclude this study.

Code
metadata.new.updated <- metadata.new.updated[metadata.new.updated$Authors!="Podberscek, AL; Serpell, JA",]

Karlsson et al. 2011

This study appears in the dataset as:

Code
Karlsson_et_al_2011_subset <- metadata.new %>% filter(Authors=="Karlsson, AC; Mormede, P; Kerje, S; Jensen, P") %>% select(all_of(columns.of.interest))

knitr::kable(Karlsson_et_al_2011_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Karlsson, AC; Mormede, P; Kerje, S; Jensen, P 2011 Karlsson et al 2011 - 1 Gallus gallus domesticus 23 mean NA NA NA NA 9 2.2 1.500 14 1.1 1.1225 -0.5109224 0.0464459
Karlsson, AC; Mormede, P; Kerje, S; Jensen, P 2011 Karlsson et al 2011 - 2 Gallus gallus domesticus 22 mean NA NA NA NA 6 2.0 1.715 16 3.6 2.4000 0.4227407 0.0643280
Karlsson, AC; Mormede, P; Kerje, S; Jensen, P 2011 Karlsson et al 2011 - 3 Gallus gallus domesticus 23 mean NA NA NA NA 9 1.5 1.800 14 2.0 2.6192 0.1378032 0.0712365
Karlsson, AC; Mormede, P; Kerje, S; Jensen, P 2011 Karlsson et al 2011 - 4 Gallus gallus domesticus 22 mean NA NA NA NA 6 0.0 0.000 16 6.2 8.0000 0.5168769 0.0549581


For this species (i.e., Gallus gallus domesticus), Ruckman et al. (2024) classified White as the “Light Color” and Wild type (red) as the “Dark Color”. Note that Ruckman et al. (2024) stated that measures of aggression between different morphs were excluded: “we therefore limit our data set to measure of aggression within color morphs”. Four effect sizes were extracted.

Results are shown in Table 2.

  • Our assessment: All values were extracted correctly.

  • Required action: None.

Seaver and Hurd 2017

This study appears in the dataset as:

Code
Seaver_and_Hurd_2017_subset <- metadata.new %>% filter(Authors=="Seaver, CMS; Hurd, PL") %>% select(all_of(columns.of.interest))

knitr::kable(Seaver_and_Hurd_2017_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Seaver, CMS; Hurd, PL 2017 Seaver and Hurd 2017 - 1 Pelvicachromis pulcher 85 mean NA NA NA NA 70 6.703 1.058 15 9.937 3.293 0.8834182 0.0055856
Seaver, CMS; Hurd, PL 2017 Seaver and Hurd 2017 - 2 Pelvicachromis pulcher 85 mean NA NA NA NA 70 3.468 0.803 15 1.783 0.431 -0.9611099 0.0034565
Seaver, CMS; Hurd, PL 2017 Seaver and Hurd 2017 - 3 Pelvicachromis pulcher 85 mean NA NA NA NA 70 4.645 0.941 15 9.761 3.470 1.1248359 0.0025228
Seaver, CMS; Hurd, PL 2017 Seaver and Hurd 2017 - 4 Pelvicachromis pulcher 85 mean NA NA NA NA 70 325.038 42.573 15 457.507 135.023 0.8863804 0.0054991


For this species (i.e., Pelvicachromis pulcher), Ruckman et al. (2024) classified Yellow as the “Light Color” and Red as the “Dark Color”. Note that Ruckman et al. (2024) stated that measures of aggression between different morphs were excluded: “we therefore limit our data set to measure of aggression within color morphs”. Four effect sizes were extracted.

The effect sizes were seemingly extracted from Figure 2.

  • Our assessment: All values were extracted correctly. However, there seems to be an additional effect size that could have been extracted: “There was no significant difference between females, yellow males and red males in the proportion that showed aggression to their mirror image (χ²(2) = 3.20, p = 0.20; Table 1)”. The corresponding 2x2 contingency table for yellow and red males would be:
Code
#create table
table1.Seaver <- matrix(c(19,51,4,11), ncol=2, byrow=TRUE)
rownames(table1.Seaver) <- c("yellow","red")
colnames(table1.Seaver) <- c("nonaggressor","aggressor")
table1.Seaver <- as.table(table1.Seaver)
table1.Seaver
       nonaggressor aggressor
yellow           19        51
red               4        11

From which we can calculate the corresponding X2 value as:

Code
#chisq.test(table1.Seaver)
chisq.test(table1.Seaver,correct=F)
Warning in chisq.test(table1.Seaver, correct = F): Chi-squared approximation
may be incorrect

    Pearson's Chi-squared test

data:  table1.Seaver
X-squared = 0.0014192, df = 1, p-value = 0.9699


  • Required action: Add this additional effect size by transforming the X2 value into an r value, following the procedure used for the other studies.
Code
Seaver_and_Hurd_2017_extra_1 <- metadata.new %>% filter(Study=="Seaver and Hurd 2017 - 1")

# emptying entry
Seaver_and_Hurd_2017_extra_1[,c(1:ncol(Seaver_and_Hurd_2017_extra_1))] <- NA


# adding variables of interest from original sources
Seaver_and_Hurd_2017_extra_1[,c("Authors","Publication.Year","Species")] <- Seaver_and_Hurd_2017_subset[1,c("Authors","Publication.Year","Species")]

Seaver_and_Hurd_2017_extra_1[,"Study"] <- "Seaver and Hurd 2017 - 5"
Seaver_and_Hurd_2017_extra_1[,"Sample.Size"] <- sum(table1.Seaver)
Seaver_and_Hurd_2017_extra_1[,"Stat.Test"] <- "X2"
Seaver_and_Hurd_2017_extra_1[,"Test.Statistic"] <- chisq.test(table1.Seaver,correct=F)$statistic[[1]]
Warning in chisq.test(table1.Seaver, correct = F): Chi-squared approximation
may be incorrect
Code
Seaver_and_Hurd_2017_extra_1[,"df1"] <- chisq.test(table1.Seaver,correct=F)$parameter[[1]]
Warning in chisq.test(table1.Seaver, correct = F): Chi-squared approximation
may be incorrect
Code
Seaver_and_Hurd_2017_extra_1[,"p.value"] <- chisq.test(table1.Seaver,correct=F)$p.value[[1]]
Warning in chisq.test(table1.Seaver, correct = F): Chi-squared approximation
may be incorrect
Code
# calculating corresponding r
Seaver_and_Hurd_2017_extra_1[,"yi"] <- sqrt((chisq.test(table1.Seaver,correct=F)$statistic[[1]])/
                                              (sum(table1.Seaver)*(nrow(table1.Seaver)-1)))
Warning in chisq.test(table1.Seaver, correct = F): Chi-squared approximation
may be incorrect
Code
# calculating vi
Seaver_and_Hurd_2017_extra_1[,"vi"] <- ((1 - (Seaver_and_Hurd_2017_extra_1[,"yi"] ^ 2)) ^ 2)/(sum(table1.Seaver) - 1)

# finally, adding this entry to the new dataset
metadata.new.updated <- rbind(metadata.new.updated,
                              Seaver_and_Hurd_2017_extra_1)
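
The X2-to-r steps above can also be sketched in a more compact form that calls chisq.test() only once. This is a minimal sketch under the same formulas as the code above (r = sqrt(X2 / (N * (rows - 1))) and vi = (1 - r^2)^2 / (N - 1)); the object names tab, res, x2 and N are illustrative only. It also prints the expected counts, since the repeated warning arises because one expected cell count is below 5, and it compares the proportion of aggressors per morph, because X2 itself carries no direction.

```r
# 2x2 table for yellow and red males (same counts as entered above)
tab <- matrix(c(19, 51, 4, 11), ncol = 2, byrow = TRUE,
              dimnames = list(c("yellow", "red"),
                              c("nonaggressor", "aggressor")))

# run the test once, without continuity correction, and reuse the result
res <- suppressWarnings(chisq.test(tab, correct = FALSE))
res$expected                       # one expected count is below 5

x2 <- unname(res$statistic)        # Pearson's X2
N  <- sum(tab)                     # total sample size

r  <- sqrt(x2 / (N * (nrow(tab) - 1)))   # same formula as above
vi <- (1 - r^2)^2 / (N - 1)              # large-sample variance of r

# X2 is directionless: compare the proportion of aggressors per morph
# to decide which sign the effect size should carry
prop.aggressor <- tab[, "aggressor"] / rowSums(tab)
c(yi = r, vi = vi, prop.aggressor)
```

In this table the proportion of aggressors is marginally higher for red than for yellow males, so the positive sign obtained from the square root is consistent with the direction of the (very small) difference.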

Zinzow-Kramer et al. 2015

This study appears in the dataset as:

Code
Zinzow_Kramer_et_al_2015_subset <- metadata.new %>% filter(Authors=="Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL") %>% select(all_of(columns.of.interest))

knitr::kable(Zinzow_Kramer_et_al_2015_subset,format = "html")
Authors Publication.Year Study Species Sample.Size Stat.Test Test.Statistic df1 df2 r n1 mean1 sd1 n2 mean2 sd2 yi vi
Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL 2015 Zinzow-Kramer et al 2015 - 1 Zonotrichia albicollis 19 mean NA NA NA NA 10 23.516 19.6829630 9 1.733 7.0893827 -0.7591132 0.0257579
Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL 2015 Zinzow-Kramer et al 2015 - 2 Zonotrichia albicollis 19 mean NA NA NA NA 10 0.000 0.2469136 9 0.000 0.4938272 0.0000000 0.0874048
Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL 2015 Zinzow-Kramer et al 2015 - 3 Zonotrichia albicollis 19 mean NA NA NA NA 10 20.846 41.3876543 9 16.172 19.3279012 -0.0937890 0.0861864
Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL 2015 Zinzow-Kramer et al 2015 - 4 Zonotrichia albicollis 19 mean NA NA NA NA 10 7.565 9.0933333 9 11.342 11.4419753 0.2391339 0.0796381
Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL 2015 Zinzow-Kramer et al 2015 - 5 Zonotrichia albicollis 19 mean NA NA NA NA 10 10.154 0.3037037 9 10.155 0.8355556 0.0010778 0.0874046
Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL 2015 Zinzow-Kramer et al 2015 - 6 Zonotrichia albicollis 19 mean NA NA NA NA 10 7.168 4.4069136 9 6.047 4.0217284 -0.1737715 0.0832583
Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL 2015 Zinzow-Kramer et al 2015 - 7 Zonotrichia albicollis 19 mean NA NA NA NA 10 0.782 0.6508642 9 0.977 1.0834568 0.1455103 0.0844867
Zinzow-Kramer, WM; Horton, BM; McKee, CD; Michaud, JM; Tharp, GK; Thomas, JW; Tuttle, EM; Yi, S; Maney, DL 2015 Zinzow-Kramer et al 2015 - 8 Zonotrichia albicollis 19 mean NA NA NA NA 10 0.247 0.2128395 9 0.354 0.6306173 0.1530315 0.0841802


For this species (i.e., Zonotrichia albicollis), Ruckman et al. (2024) classified White (WS) as the “Light Color” and Tan (TS) as the “Dark Color”. Eight effect sizes were extracted.

The effect sizes were extracted from Figure 1.

  • Our assessment: All values extracted are correct, but there are two effect sizes for which the sign should have been flipped. Those two correspond to latency to approach (the time from start of playback until the resident male arrived: the longer, the more scared) and distance of closest approach to the decoy (the further, the more scared). In addition, the extracted values correspond to medians rather than means, which should probably have been accounted for, since medians can be rather far from means when data are skewed (for more on this, see https://training.cochrane.org/handbook/current/chapter-06#section-6-5-2). We ignore this last issue as likely inconsequential.

  • Required action: Assign a negative sign to (i.e., flip the sign of) the effect sizes corresponding to latency to approach and distance of closest approach to the decoy.

Code
# adjusting the sign accordingly 
metadata.new.updated[metadata.new.updated$Study=="Zinzow-Kramer et al 2015 - 7","yi"] <- metadata.new.updated[metadata.new.updated$Study=="Zinzow-Kramer et al 2015 - 7","yi"]*(-1)
metadata.new.updated[metadata.new.updated$Study=="Zinzow-Kramer et al 2015 - 8","yi"] <- metadata.new.updated[metadata.new.updated$Study=="Zinzow-Kramer et al 2015 - 8","yi"]*(-1)
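
The sign adjustment above can also be applied to both effect sizes in a single step. The following is a minimal sketch on a toy data frame (toy.data and flip.ids are illustrative names; the yi and vi values are copied from the table above), assuming that only the sign of yi changes and vi is left untouched.

```r
# toy copy of three of the Zinzow-Kramer et al 2015 entries
toy.data <- data.frame(
  Study = c("Zinzow-Kramer et al 2015 - 6",
            "Zinzow-Kramer et al 2015 - 7",
            "Zinzow-Kramer et al 2015 - 8"),
  yi = c(-0.1737715, 0.1455103, 0.1530315),
  vi = c(0.0832583, 0.0844867, 0.0841802)
)

# entries whose direction needs reversing (larger value = less aggressive)
flip.ids <- c("Zinzow-Kramer et al 2015 - 7",
              "Zinzow-Kramer et al 2015 - 8")

# flip the sign of yi for those rows only; vi does not depend on the sign
to.flip <- toy.data$Study %in% flip.ids
toy.data$yi[to.flip] <- -toy.data$yi[to.flip]
toy.data
```

Listing the affected entries in one vector keeps the set of reversed effect sizes visible in a single place, which may help when further traits of this kind are identified.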

Describing differences between original and updated dataset

After accounting for the issues found in 8 of the 11 reassessed studies (73%; the 11 reassessed studies correspond to 15% of all studies included in Ruckman et al. 2024), the new and updated dataset contains 147 effect sizes extracted from 72 studies and covering 55 species*, whereas the original dataset contained 169 effect sizes extracted from 74 studies and covering 55 species*.

* Note that the final number of species for the analysis is 54 because we renamed Haplochromis omnicaeruleus as Haplochromis paludinosus following the updated taxonomic information.

Our reassessment allowed us to reduce (but likely not eliminate) the consequences of the excess of positive values found in the original dataset, which, based on our reassessment, we believe to be largely caused by incorrect handling of effect size direction (more below). Indeed, the percentage of positive values for each effect size origin in the original dataset looked like:

Code
################################################################################
# Exploring effect size type disagreements
################################################################################

# calculate percentage of positive values for each type of effect size for
# the original
effect.size.positive.perc.original <- metadata.new %>% 
  group_by(Stat.Test) %>% 
  mutate(Stat.Test = factor(Stat.Test, 
                            levels = c("r","mean","t","F","X2"))) %>%
  mutate(Stat.Test = recode(Stat.Test, r = "Pearson's r",
                            mean = "mean, SD, N",
                            t = "t value",
                            F = "F value",
                            X2 = "X2 value")) %>% 
  # share of effect sizes with yi >= 0 (first cell of table() is the FALSE count)
  summarise(Percentage = round(100*table(yi<0)[1]/n(),1))

knitr::kable(effect.size.positive.perc.original,format = "html") # output format specification is optional
Stat.Test Percentage
Pearson's r 44.0
mean, SD, N 61.0
t value 85.7
F value 93.9
X2 value 100.0

The corresponding percentages for the new and updated dataset looked like:

Code
# and the updated database
effect.size.positive.perc.updated <- metadata.new.updated %>%
  group_by(Stat.Test) %>%
  mutate(Stat.Test = factor(Stat.Test,
                            levels = c("r","mean","t","F","X2"))) %>%
  mutate(Stat.Test = recode(Stat.Test, r = "Pearson's r",
                            mean = "mean, SD, N",
                            t = "t value",
                            F = "F value",
                            X2 = "X2 value")) %>%
  # share of effect sizes with yi >= 0 (first cell of table() is the FALSE count)
  summarise(Percentage = round(100*table(yi<0)[1]/n(),1))

knitr::kable(effect.size.positive.perc.updated,format = "html") # output format specification is optional
Stat.Test Percentage
Pearson's r 44.0
mean, SD, N 54.8
t value 80.8
F value 88.2
X2 value 100.0
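
Note that the expression table(yi<0)[1] used above returns the count of non-negative values only when at least one such value exists in the group, and that effect sizes equal to zero are counted as positive. A minimal base-R sketch of a more explicit alternative is shown below; toy.es contains made-up values for illustration only (not the real data), and counting only strictly positive values is an assumption about the intended definition.

```r
# toy effect sizes (made-up values, for illustration only)
toy.es <- data.frame(
  Stat.Test = c("r", "r", "r", "r", "t", "t", "F", "F"),
  yi        = c(-0.2, 0.1, 0.3, 0.0, -0.4, -0.1, 0.5, 0.2)
)

# percentage of strictly positive effect sizes per effect size origin
perc.positive <- round(100 * tapply(toy.es$yi > 0, toy.es$Stat.Test, mean), 1)
perc.positive

# for comparison: the table()-based expression on a group with only
# negative values picks up the TRUE count, because no FALSE cell exists
yi.t <- toy.es$yi[toy.es$Stat.Test == "t"]
perc.table <- round(100 * table(yi.t < 0)[1] / length(yi.t), 1)
perc.table   # 100, although none of these values is positive
```

For the groups reported above this distinction matters only where effect sizes are exactly zero or where a group contains no non-negative values, but the explicit version avoids relying on the order of the cells returned by table().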

Here are the corresponding figures for the original dataset:

Code
# generating the data subset
metadata.original.yi <- metadata.new %>%
  select(c(yi,vi,Stat.Test)) %>%
  mutate(Stat.Test = factor(Stat.Test,
                            levels = c("r","mean","t","F","X2"))) %>%
  mutate(Stat.Test = recode(Stat.Test, r = "Pearson's r",
                            mean = "mean, SD, N",
                            t = "t value",
                            F = "F value",
                            X2 = "X2 value"))

# generating label for annotation
effect.size.positive.perc.original$label.perc <- paste0(round(effect.size.positive.perc.original$Percentage,0),
                                                        "%\npositive")


# effect size magnitude # more at: https://indrajeetpatil.github.io/ggstatsplot/reference/ggbetweenstats.html
set.seed(77)
yi.plot.original <- ggbetweenstats(
  data  = metadata.original.yi,
  x     = Stat.Test,
  y     = yi,
  point.args = list(position = ggplot2::position_jitterdodge(dodge.width = 0.6),
                    alpha = 0.4,
                    size = 1/sqrt(metadata.original.yi$vi)-min(1/sqrt(metadata.original.yi$vi))+0.1,
                    stroke = 0, na.rm = TRUE),
  #point.args = list(size = 1),
  type = "parametric",
  pairwise.display = "none",
  #p.adjust.method = "none", # if no multiple correction used, differences are everywhere
  #ggsignif.args = list(textsize = 3, tip_length = 0.02, na.rm = TRUE), # if pairwise.display on, change size
  bf.message = F,
  effsize.type = "eta", # which corresponds to the partial eta squared we are using to transform F-to-r
  #results.subtitle = F, # to remove statistical results from the top of the plot
  centrality.label.args = list(size = 3, nudge_x = 0.4,
                               segment.linetype = 3,
                               min.segment.length = 0),
  xlab = "\nEffect size origin\n",
  ylab = "\nEffect size\n(r and rbis)",
  title = "\nDoes effect size magnitude differ considerably depending on their origin?"
) +
  # modifying text size
  theme(axis.text=element_text(size=10),
        axis.title=element_text(size=11,face="bold"),
        plot.title = element_text(size=12)) +
  # adding the percentage of positive effect sizes for each type
  annotate("text",
           x = seq(0.65,4.65,1),
           y = 1.1,
           label = effect.size.positive.perc.original$label.perc) +
  # adding grey area to better signal positive vs negative values
  annotate("rect", xmin = 0, xmax = 6, ymin = -1, ymax = 0,
           alpha = .1)

yi.plot.original

and the new and updated dataset:

Code
# generating the data subset
metadata.updated.yi <- metadata.new.updated %>%
  select(c(yi,vi,Stat.Test)) %>%
  mutate(Stat.Test = factor(Stat.Test,
                            levels = c("r","mean","t","F","X2"))) %>%
  mutate(Stat.Test = recode(Stat.Test, r = "Pearson's r",
                            mean = "mean, SD, N",
                            t = "t value",
                            F = "F value",
                            X2 = "X2 value"))

# generating label for annotation
effect.size.positive.perc.updated$label.perc <- paste0(round(effect.size.positive.perc.updated$Percentage,0),
                                                        "%\npositive")


# effect size magnitude # more at: https://indrajeetpatil.github.io/ggstatsplot/reference/ggbetweenstats.html
set.seed(77)
yi.plot.updated <- ggbetweenstats(
  data  = metadata.updated.yi,
  x     = Stat.Test,
  y     = yi,
  point.args = list(position = ggplot2::position_jitterdodge(dodge.width = 0.6),
                    alpha = 0.4,
                    size = 1/sqrt(metadata.updated.yi$vi)-min(1/sqrt(metadata.updated.yi$vi))+0.1,
                    stroke = 0, na.rm = TRUE),
  #point.args = list(size = 1),
  type = "parametric",
  pairwise.display = "none",
  #p.adjust.method = "none", # if no multiple correction used, differences are everywhere
  #ggsignif.args = list(textsize = 3, tip_length = 0.02, na.rm = TRUE), # if pairwise.display on, change size
  bf.message = F,
  effsize.type = "eta", # which corresponds to the partial eta squared we are using to transform F-to-r
  #results.subtitle = F, # to remove statistical results from the top of the plot
  centrality.label.args = list(size = 3, nudge_x = 0.4,
                               segment.linetype = 3,
                               min.segment.length = 0),
  xlab = "\nEffect size origin\n",
  ylab = "\nEffect size\n(r and rbis)",
  title = "\nDoes effect size magnitude differ considerably depending on their origin?"
) +
  # modifying text size
  theme(axis.text=element_text(size=10),
        axis.title=element_text(size=11,face="bold"),
        plot.title = element_text(size=12)) +
  # adding the percentage of positive effect sizes for each type
  annotate("text",
           x = seq(0.65,4.65,1),
           y = 1.1,
           label = effect.size.positive.perc.updated$label.perc) +
  # adding grey area to better signal positive vs negative values
  annotate("rect", xmin = 0, xmax = 6, ymin = -1, ymax = 0,
           alpha = .1)

yi.plot.updated
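
In both figures, point size reflects precision: each point is sized by 1/sqrt(vi), shifted so that the least precise effect size gets a size of 0.1. A minimal sketch of that scaling as a standalone function is given below; precision.size is a hypothetical helper name, the na.rm = TRUE argument is an addition not present in the plotting code above, and the vi values are copied from the Seaver and Hurd 2017 table.

```r
# Hypothetical helper: precision-based point sizes, as used in the plots
precision.size <- function(vi, min.size = 0.1) {
  precision <- 1 / sqrt(vi)                         # inverse standard error
  precision - min(precision, na.rm = TRUE) + min.size
}

# sampling variances of the four Seaver and Hurd 2017 effect sizes
vi.example <- c(0.0055856, 0.0034565, 0.0025228, 0.0054991)
sizes <- precision.size(vi.example)
sizes
```

The effect size with the smallest variance (the third one here) receives the largest point, and the one with the largest variance receives the minimum size.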

Based on our exploration of 15% of all studies included in Ruckman et al. (2024), the excess of positive values is likely due to an incorrect assignment of effect size direction in the original dataset, caused by: (1) not adjusting the direction of effect sizes for traits for which larger values mean less aggression (e.g., latency to approach), (2) assigning a positive sign to directionless inferential statistics such as F and X2 values, and (3) an unexpectedly lower likelihood of negative effect sizes.

Conclusions

Based on our reassessment of 15% of all studies included in Ruckman et al. (2024), we cannot guarantee the reliability of the dataset. That is, although we have fixed the issues found in 73% of all reassessed studies, there is strong evidence suggesting that those (and possibly other) issues will be present in a substantial percentage of the remaining 85% of the studies, which we did not reassess. Thus, our re-analyses should be interpreted with extreme caution, as there is reason to expect that their results still exaggerate the true association between aggression and coloration.

The code below saves the new and updated dataset for the corresponding analyses.

Code
# saving dataset
write.csv(metadata.new.updated, "../data/new/meta_complete_data2_new_and_updated.csv")

R session info

Code
sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default


locale:
[1] LC_COLLATE=English_Germany.utf8  LC_CTYPE=English_Germany.utf8   
[3] LC_MONETARY=English_Germany.utf8 LC_NUMERIC=C                    
[5] LC_TIME=English_Germany.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] knitr_1.49          ggplot2_3.5.1       dplyr_1.1.4        
[4] ggstatsplot_0.12.5  metafor_4.6-0       numDeriv_2016.8-1.1
[7] metadat_1.2-0       Matrix_1.6-1       

loaded via a namespace (and not attached):
 [1] gtable_0.3.6           xfun_0.49              bayestestR_0.15.0     
 [4] htmlwidgets_1.6.4      insight_1.4.2          ggrepel_0.9.6         
 [7] lattice_0.21-8         paletteer_1.6.0        mathjaxr_1.6-0        
[10] vctrs_0.6.5            tools_4.3.1            generics_0.1.3        
[13] datawizard_1.2.0       sandwich_3.1-0         tibble_3.2.1          
[16] pacman_0.5.1           pkgconfig_2.0.3        correlation_0.8.6     
[19] lifecycle_1.0.4        compiler_4.3.1         farver_2.1.2          
[22] munsell_0.5.1          codetools_0.2-19       htmltools_0.5.8       
[25] yaml_2.3.10            pillar_1.10.1          tidyr_1.3.1           
[28] MASS_7.3-60            statsExpressions_1.6.1 multcomp_1.4-28       
[31] nlme_3.1-162           tidyselect_1.2.1       digest_0.6.35         
[34] mvtnorm_1.3-2          purrr_1.2.1            rematch2_2.1.2        
[37] labeling_0.4.3         splines_4.3.1          fastmap_1.1.1         
[40] grid_4.3.1             colorspace_2.1-0       cli_3.6.1             
[43] magrittr_2.0.3         patchwork_1.3.0        survival_3.5-5        
[46] TH.data_1.1-3          withr_3.0.2            scales_1.3.0          
[49] estimability_1.5.1     rmarkdown_2.29         emmeans_1.10.6        
[52] zoo_1.8-14             coda_0.19-4            evaluate_1.0.3        
[55] parameters_0.23.0      rlang_1.1.4            Rcpp_1.1.0            
[58] zeallot_0.1.0          xtable_1.8-4           glue_1.8.0            
[61] rstudioapi_0.17.1      jsonlite_1.8.8         effectsize_0.8.9      
[64] R6_2.6.1               prismatic_1.1.2