Grouped column sums (preferably with dplyr) [duplicate] - r

I am trying to produce a col sum where there are two grouping variables: p_id and stimuli_length.
df <- structure(list(p_id = c("p_id3", "p_id3", "p_id3", "p_id3", "p_id3",
"p_id3", "p_id3", "p_id3", "p_id3", "p_id4", "p_id4", "p_id4",
"p_id4", "p_id4", "p_id4", "p_id4", "p_id4", "p_id4", "p_id4",
"p_id4", "p_id5", "p_id5", "p_id5", "p_id5"), stimuli_length = c(4L,
4L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 4L, 4L, 5L, 5L, 6L,
6L, 6L, 6L, 7L, 7L, 7L, 7L), value = c(1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1), sum = c(2, 2,
2, 2, 3, 3, 3, 3, 1, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,
3), expected_result = c(2, 2, 2, 2, 3, 3, 3, 3, 1, 3, 3, 3, 2,
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3)), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"))
For each p_id, and each stimuli_length, I would like the sum of the value column.
So, for p_id3 and stimuli_length == 4, the sum is 1 + 1 = 2.
My attempt does not give the correct sum:
res <- df %>% group_by(p_id) %>% group_by(stimuli_length) %>%
select(value) %>% rowwise() %>% mutate(sum = sum(value))

We don't need rowwise() here: it groups by each row, so sum() only ever sees one observation. Note also that a second group_by() call replaces the earlier grouping by default rather than adding to it. Instead, pass both 'p_id' and 'stimuli_length' to a single group_by() call and use mutate() directly to get the sum of 'value'.
library(dplyr)
df %>%
  group_by(p_id, stimuli_length) %>%
  mutate(Sum = sum(value)) %>%
  ungroup()
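As a quick check of the idea, here is a minimal sketch on made-up data. The `toy` tibble and its values are hypothetical, not the asker's data. It shows that chained group_by() calls keep only the last grouping, and that grouping by both columns at once gives the per-group sum, which base R's ave() reproduces.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the question
toy <- tibble(
  p_id           = c("a", "a", "a", "b", "b"),
  stimuli_length = c(4L, 4L, 5L, 4L, 4L),
  value          = c(1, 1, 0, 1, 0)
)

# Chaining two group_by() calls keeps only the last grouping variable
last_groups <- group_vars(toy %>% group_by(p_id) %>% group_by(stimuli_length))

# Grouping by both variables in one call gives the intended sums
res <- toy %>%
  group_by(p_id, stimuli_length) %>%
  mutate(Sum = sum(value)) %>%
  ungroup()

# Base R cross-check: ave() applies sum() within each combination
base_sum <- ave(toy$value, toy$p_id, toy$stimuli_length, FUN = sum)
```

If a one-row-per-group table is wanted instead of a column added to the original rows, summarise() could replace mutate() in the same pipeline.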

Related

misaligned bars in geom_bar using fill

I have a problem with ggplot's geom_bar.
I have several bar charts rendered from several variables all using the same fourth variable as a fill:
For some reason in the third chart the columns for cohort 2 are misaligned.
All three charts use the same Dataset and the same code.
library(tidyverse)
library(patchwork)
myColours <- c("#A71C49","#11897A","#DD4814", "#282A36")
DataSet <- structure(list(`Var1` = c(3, 2, 5, 3, 4, 1, 3, 1,
5, 4, 5, 3, 5, 5, 5, 4, 4, 5, 4, 5, 5, 5, 5, 1, 5, 5, 4, 4, 3,
5, 5, 4, 1, 3, 5, 2, 5, 5, 4, 4, 2, 5, 1, 5, 3, 5, 5, 5, 2, 5,
3, 1, 5, 5, 5, 5, 4), `Var2` = c(3, 1, 4, 1, 2, 2,
3, 1, 3, 3, 3, 3, 2, 5, 5, 1, 4, 4, 5, 5, 4, 5, 3, 2, 3, 5, 2,
3, 3, 5, 5, 2, 1, 3, 4, 2, 4, 5, 3, 3, 5, 3, 1, 4, 3, 5, 3, 4,
2, 4, 1, 4, 4, 5, 1, 3, 3), `Var3` = c(3, 2, 1,
3, 1, 4, 3, 2, 4, 3, 3, 3, 5, 3, 3, 3, 3, 5, 5, 5, 3, 3, 3, 4,
5, 2, 4, 4, 4, 5, 5, 1, 1, 3, 5, 2, 5, 5, 3, 4, 3, 1, 1, 4, 3,
2, 5, 4, 2, 4, 4, 1, 4, 5, 1, 5, 2), Cohort = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"
), class = "factor")), row.names = c(NA, -57L), class = c("tbl_df",
"tbl", "data.frame"))
c5 <- ggplot(DataSet, aes(`Var1`, fill=Cohort)) +
geom_bar() +
theme(legend.position = "none") +
ylim(0,25) +
scale_fill_manual(values=myColours)
c6 <- ggplot(DataSet, aes(`Var2`, fill=Cohort)) +
geom_bar() +
theme(legend.position = "none") +
ylim(0,25) +
scale_fill_manual(values=myColours)
c7 <- ggplot(DataSet, aes(`Var3`, fill=Cohort)) +
geom_bar() +
ylim(0,25) +
scale_fill_manual(values=myColours)
(c5 | c6 ) /
(c7 | guide_area())
I have the following error messages:
1: Removed 2 rows containing missing values (geom_bar).
2: position_stack requires non-overlapping x intervals
The missing values refer to the graph for Var1, the non-overlapping x intervals for the third graph.
If I render out just Cohort two I also get these weirdly misaligned bars:
And Cohort 3 Var 3 to compare
I would have suspected the fact, that there are only two different numbers in cohort 2, but it works for the barchart with Var1 above. It is also not patchwork buggering it up, as it is the same when I render out just the Var3 barchart. It is also not the legend only being rendered for the Var3 Graph
Does anyone have an idea what the problem is or how I can force ggplot to align the bars correctly?
Thank you!
(R version 4.0.4 Patched (2021-02-17 r80030); tidyverse v.1.3.1; patchwork v.1.1.1)
The third graph is misaligned because the x values are treated as numeric (continuous), which makes them unsuitable for stacking (should x = 1 and x = 1.00001 be stacked on top of each other?). Converting the values to an ordered factor tells ggplot to treat each value as its own category.
Consider this example only using the tidyverse:
myColours <- c("#A71C49","#11897A","#DD4814", "#282A36")
# Lengthen the dataset
DataSet2 <- DataSet %>% pivot_longer(cols = -Cohort, names_to = "Variable")
# This helps against the non-overlapping x intervals issue
DataSet2$value <- as.ordered(DataSet2$value)
ggplot(DataSet2, aes(x = value, fill = Cohort)) +
  geom_bar(position = position_stack()) +
  ylim(0, 25) +
  facet_wrap(vars(Variable)) + # one panel per level of the column "Variable"
  scale_fill_manual(values = myColours)
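One possible reading of why only Cohort 2 in the Var3 chart breaks is sketched below. It rests on an assumption the answer does not state: that with a numeric x, the default bar width within each fill group depends on the smallest gap between that group's distinct x values. The helper `min_gap` is a hypothetical, simplified stand-in for that calculation, written in base R. The values are the Cohort 2 rows (10 to 15) of DataSet.

```r
# Cohort 2 values copied from DataSet (rows 10 to 15)
cohort2_var1 <- c(4, 5, 3, 5, 5, 5)
cohort2_var3 <- c(3, 3, 3, 5, 3, 3)

# Hypothetical helper: smallest gap between the distinct values of x
min_gap <- function(x) min(diff(sort(unique(x))))

gap_var1 <- min_gap(cohort2_var1)  # values 3, 4, 5 are one unit apart
gap_var3 <- min_gap(cohort2_var3)  # only 3 and 5 occur, two units apart

# As an ordered factor, each value becomes a category with its own slot,
# so gaps between the numbers no longer matter
var3_factor <- as.ordered(cohort2_var3)
```

Under that assumption, the Cohort 2 bars for Var3 would be drawn wider than the bars of the other cohorts, which would fit the "non-overlapping x intervals" warning. Converting to a factor avoids the issue either way.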

recoding variables in a loop in R

I want to recode several variables at once. All of these variables will undergo the same recoding.
For this, I followed the thread "recode variable in loop R", which describes two ways of doing it:
1). Using column numbers
2). Using variable names
I tried both, but I get the same error message for 1) and 2):
Error in (function (var, recodes, as.factor, as.numeric = TRUE, levels) :
unused arguments (2 = "1", 3 = "1", 1 = "0", 4 = "0", na.rm = TRUE)
#Uploading libraries
library(dplyr)
library(magrittr)
library(plyr)
library(readxl)
library(tidyverse)
#Importing file
mydata <- read_excel("CCorr_Data.xlsx")
df <- data.frame(mydata)
attach(df)
#replacing codes for variables
df %>%
mutate_at(c(1:7), recode, '2'='1', '3'='1', '1'='0', '4'='0', na.rm = TRUE) %>%
mutate_at(c(15:24), recode, '2'='0', na.rm = TRUE)
df %>%
mutate_at(vars(E301, E302, E303), recode,'2'='1', '3'='1', '1'='0', '4'='0', na.rm = TRUE) %>%
mutate_at(vars(B201, B202, B203), recode, '2'='0', na.rm = TRUE)
Can someone tell me where I am going wrong?
My dataset has missing values, which is why I included na.rm = T. I also tried without it, and the error message was the same.
Please see below for sample data.
structure(list(Country = c(1, 1, 1, 1, 1, 1), HHID = c("12ae5148e245079f-122042",
"12ae5148e245079f-123032", "12ae5148e245079f-123027", "12ae5148e245079f-123028",
"12ae5148e245079f-N123001", "12ae5148e245079f-123041"), HHCode = c("122042",
"123032", "123027", "123028", "N123001", "123041"), A103 = c(2,
2, 2, 2, 2, 2), A104 = c("22", "23", "23", "23", "23", "23"),
Community = c("Mehmada", "Dhobgama", "Dhobgama", "Dhobgama",
"Dhobgama", "Dhobgama"), E301 = c(3, 3, 3, 3, 3, 3), E302 = c(3,
2, 4, 4, 3, 3), E303 = c(3, 2, 3, 3, 3, 3), E304 = c(3, 4,
4, 4, 3, 3), E305 = c(3, 2, 3, 3, 3, 3), E306 = c(3, 3, 3,
3, 3, 3), E307 = c(3, 3, 3, 3, 3, 3), E308 = c(3, 1, 3, 3,
3, 3), B201.1 = c(NA, 1, 1, 1, 1, 1), B202.1 = c(NA, 1, 1,
1, 1, 1), B203.1 = c(NA, 1, 1, 2, 2, 1), B204.1 = c(NA, 2,
1, 2, 1, 1), B205.1 = c(NA, 2, 1, 2, 2, 2), B206.1 = c(NA,
1, 1, 1, 2, 1), B207.1 = c(NA, 2, 1, 2, 2, 1), B208.1 = c(NA,
2, 2, 2, 2, 2), B209.1 = c(NA, 2, 1, 1, 1, 1), B210.1 = c(NA,
1, 1, 1, 1, 1)), row.names = c(NA, 6L), class = "data.frame")
The issue is the na.rm = TRUE argument: recode doesn't have that argument, so drop it.
library(dplyr)
df %>%
  mutate_at(vars(E301, E302, E303), recode, '2'='1', '3'='1', '1'='0', '4'='0') %>%
  mutate_at(vars(B201, B202, B203), recode, '2'='0')
Or, using column positions:
library(dplyr)
df %>%
  mutate_at(1:7, recode, '2'='1', '3'='1', '1'='0', '4'='0') %>%
  mutate_at(15:24, recode, '2'='0')
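The same recoding can be sketched with across(), the newer dplyr idiom. Two assumptions are made here that the answer does not state. First, the signature in the error message (var, recodes, ...) is not that of dplyr::recode, which is recode(.x, ..., .default, .missing), so a recode() from another attached package may have been picked up; the call is namespaced as dplyr::recode to rule that out. Second, numeric replacement values are used so that unmatched values keep their original value. The `toy` data frame is hypothetical.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's columns
toy <- data.frame(
  E301 = c(3, 2, 4, 1, NA),
  E302 = c(3, 3, 2, 4, 1),
  B201 = c(1, 2, 2, NA, 1)
)

recoded <- toy %>%
  mutate(
    # 2 and 3 become 1; 1 and 4 become 0; NA stays NA
    across(c(E301, E302),
           ~ dplyr::recode(.x, `2` = 1, `3` = 1, `1` = 0, `4` = 0)),
    # only 2 is recoded; other values are left unchanged
    across(B201, ~ dplyr::recode(.x, `2` = 0))
  )
```

Missing values need no special handling here: recode() leaves NA as NA unless a .missing value is supplied.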

R: Error when grouping data using subset or index methods- produces list of all column names

I am encountering an error when trying to subset my data based on the categories of a variable, and I am not sure why it is happening, because I have used the two most widely recommended methods, subset(dataframe, variable == X) and dataframe[dataframe$variable == X, ], successfully with past data sets and, just now, with the mtcars dataset.
The problem is that when I try to run my code, I get an error in which R just prints out the names of all of my variables(see below).
I am not quite sure how to "show" this problem-- any recommendations regarding what information would be useful to you all would be greatly appreciated. This problem is not reproducible with other datasets. Thank you for any advice you are able to give.
My dataset "wits" has 363 observations and 92 variables. My variable "complete" is a factor variable with four levels: "completed all", "stopped after demos", "stopped after consent", and "skipped manip bc poor id." I would like to create a new dataset made up only of participants with "completed all". I have tried these two methods:
wits_c <- wits[wits$complete=="Completed all", ]
wits_c <-subset(wits,complete=="Completed all")
Which results in the following error:
Error: Columns `Start`, `End`, `GameCode`, `workerID`, `condition`, `about`, `valid`, `consv`, `merit1`, `merit2`, `merit3`, `gender`, `gender_TEXT`, `poorid`, `choseskip`, `age`, `edu`, `race`, `race_TEXT`, `complete`, `distracted`, `happen`, `about__1`, `playread`, `thinking_1`, `thinking_2`, `thinking_3`, `thinking_4`, `thinking_5`, `thinking_6`, `thinking_7`, `text`, `logical_1`, `logical_2`, `logical_3`, `logical_4`, `controll_1`, `controll_2`, `controll_3`, `controll_4`, `controll_5`, `controll_6`, `controll_7`, `controll_8`, `controll_9`, `controll_10`, `privatesol`, `publicsol`, `privatesol_2`, `privatesol_3`, `privatesol_5`, `publicsol_2`, `publicsol_3`, `publicsol_5`, `policy_1`, `policy_2`, `policy_3`, `policy_4`, `colaction_5`, `colaction_6`, `colaction_7`, `colaction_8`, `colaction_10`, `colaction_13`, `joke`, `random`, `say`, `wits`, `wits_nb`, `neutral`, `rural_id`, `relig_id`, `prog_id`, `vignette`, `merit3R`, `policy_3R`, `policy_4R`, `controll_4R`, `controll_5R`,`co
Thank you to user Markdly for the suggestion to include the following output which provides more detailed information about my dataset:
dput(head(wits))
structure(list(Start = structure(c(1499525516, 1499516293, 1499516379,
1499516319, 1499516949, 1499516709), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), End = structure(c(1499525762, 1499518121,
1499516954, 1499517222, 1499517412, 1499517512), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), GameCode = c(2991999, 5712506, 1002944,
8916111, 3495462, 9127270), workerID = c("ACIHCWKHNFC7U", "A3UAO2LYUPO7L6",
"A8L94A9EF23BV", "A258JTYUD56LOE", "A12SJSJIUR3A23", "A1HHOCO3ZZHCJZ"
), condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("WITS",
"WITS No Blurb", "Neutral", "Read"), class = "factor"), about = c(2,
2, 2, 2, 2, 2), valid = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("valid responses",
"invalid responses"), class = "factor"), consv = c(4, 2, 2, 6,
4, 4), merit1 = c(5, 3, 2, 4, 6, 4), merit2 = c(4, 4, 2, 5, 5,
4), merit3 = c(3, 4, 2, 5, 4, 5), gender = structure(c(1L, 1L,
2L, 2L, 2L, 2L), .Label = c("man", "woman", "non-binary", "other"
), class = "factor"), gender_TEXT = c(NA, NA, NA, NA, NA, NA),
poorid = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("not id as poor",
"id as poor"), class = "factor"), choseskip = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Label = c("poor but continued", "poor and skipped"), class = "factor"),
age = c(28, 30, 33, 41, 26, 30), edu = c(5, 5, 6, 5, 3, 5
), race = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("white",
"black", "latino", "asian", "native american", "other", "multiracial"
), class = "factor"), race_TEXT = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), complete = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Completed
all",
"Stopped after demos", "Stopped after consent", "Skipped manip bc poor id"
), class = "factor"), distracted = c(4, 5, 0, 0, 0, 5), happen = c(0,
0, 0, 0, 0, 0), about__1 = c(2, 2, 2, 2, 2, 2), playread = c(1,
1, 1, 1, 1, 1), thinking_1 = c(1, 6, 5, 4, 6, 4), thinking_2 = c(4,
3, 5, 1, 5, 5), thinking_3 = c(1, 4, 7, 4, 5, 5), thinking_4 = c(6,
4, 7, 1, 5, 3), thinking_5 = c(5, 3, 6, 6, 5, 4), thinking_6 = c(4,
3, 7, 6, 6, 4), thinking_7 = c(6, 3, 7, 6, 5, 5), text = c(2,
4, 5, 4, 2, 3), logical_1 = c(1, 3, 4, 4, 4, 4), logical_2 = c(2,
4, 3, 4, 5, 4), logical_3 = c(4, 3, 5, 4, 5, 4), logical_4 = c(1,
6, 3, 4, 4, 3), controll_1 = c(6, 6, 1, 4, 4, 4), controll_2 = c(1,
3, 1, 4, 5, 5), controll_3 = c(1, 4, 3, 6, 4, 5), controll_4 = c(3,
4, 6, 3, 4, 4), controll_5 = c(6, 3, 6, 1, 4, 4), controll_6 = c(3,
3, 1, 3, 5, 4), controll_7 = c(2, 5, 1, 5, 4, 5), controll_8 = c(2,
2, 1, 4, 5, 3), controll_9 = c(6, 2, 6, 3, 5, 5), controll_10 = c(1,
3, 6, 2, 5, 3), privatesol = c(5, 5.33333333333333, 12, 8,
8.33333333333333, 8.33333333333333), publicsol = c(7.66666666666667,
2.66666666666667, 12, 2, 11.3333333333333, 8.33333333333333
), privatesol_2 = c(1, 11, 12, 11, 11, 3), privatesol_3 = c(3,
2, 12, 2, 3, 11), privatesol_5 = c(11, 3, 12, 11, 11, 11),
publicsol_2 = c(11, 3, 12, 2, 12, 11), publicsol_3 = c(11,
2, 12, 2, 11, 3), publicsol_5 = c(1, 3, 12, 2, 11, 11), policy_1 = c(1,
5, 6, 1, 4, 4), policy_2 = c(3, 2, 6, 1, 5, 4), policy_3 = c(1,
3, 2, 6, 5, 3), policy_4 = c(6, 3, 5, 6, 5, 4), colaction_5 = c(2,
5, 6, 1, 2, 5), colaction_6 = c(6, 4, 1, 6, 5, 4), colaction_7 = c(6,
2, 6, 1, 3, 4), colaction_8 = c(4, 5, 6, 1, 2, 4), colaction_10 = c(4,
3, 1, 6, 5, 4), colaction_13 = c(3, 2, 6, 1, 2, 3), joke = c(2,
2, 2, 2, 2, 2), random = c(2, 2, 2, 2, 2, 2), say = c(NA,
"Nope", NA, "This was highly biased survey.", "good survey",
"NO"), wits = c(1, 1, 1, 1, 1, 1), wits_nb = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), neutral = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), rural_id = c(0,
0, 0, 1, 0, 0), relig_id = c(0, 1, 1, 1, 1, 1), prog_id = c(0,
0, 1, 0, 1, 1), vignette = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = c("WITS", "WITS No Blurb", "Neutral", "Read"
), class = "factor"), merit3R = c(4, 3, 5, 2, 3, 2), policy_3R = c(6,
4, 5, 1, 2, 4), policy_4R = c(1, 4, 2, 1, 2, 3), controll_4R = c(4,
3, 1, 4, 3, 3), controll_5R = c(1, 4, 1, 6, 3, 3), controll_9R = c(1,
5, 1, 4, 2, 2), controll_10R = c(6, 4, 1, 5, 2, 4), colaction_6R = c(1,
3, 6, 1, 2, 3), colaction_10R = c(3, 4, 6, 1, 2, 3), logical = c(2,
4, 3.75, 4, 4.5, 3.75), cognition = structure(c(1, 6, 5,
4, 6, 4, 4, 3, 5, 1, 5, 5, 1, 4, 7, 4, 5, 5, 6, 4, 7, 1,
5, 3, 5, 3, 6, 6, 5, 4, 4, 3, 7, 6, 6, 4, 6, 3, 7, 6, 5,
5), .Dim = 6:7), engage = c(5, 3, 6.66666666666667, 6, 5.33333333333333,
4.33333333333333), pertake = c(3.66666666666667, 3.66666666666667,
6.33333333333333, 2, 5, 4.33333333333333), policy = c(2.75,
3.75, 4.75, 1, 3.25, 3.75), colaction = c(3.16666666666667,
3.5, 6, 1, 2.16666666666667, 3.66666666666667), controllability = c(2.7,
3.9, 1.2, 4.5, 3.7, 3.8), completebi = structure(c(1L, 1L,
1L, 1L, 1L, 1L), .Label = c("Completed all", "Stopped after demos"
), class = "factor"), gender2 = structure(c(1L, 1L, 2L, 2L,
2L, 2L), .Label = c("man", "woman"), class = "factor")), .Names = c("Start", "End", "GameCode", "workerID", "condition", "about", "valid",
"consv", "merit1", "merit2", "merit3", "gender", "gender_TEXT",
"poorid", "choseskip", "age", "edu", "race", "race_TEXT", "complete",
"distracted", "happen", "about__1", "playread", "thinking_1",
"thinking_2", "thinking_3", "thinking_4", "thinking_5", "thinking_6",
"thinking_7", "text", "logical_1", "logical_2", "logical_3",
"logical_4", "controll_1", "controll_2", "controll_3", "controll_4",
"controll_5", "controll_6", "controll_7", "controll_8", "controll_9",
"controll_10", "privatesol", "publicsol", "privatesol_2", "privatesol_3",
"privatesol_5", "publicsol_2", "publicsol_3", "publicsol_5",
"policy_1", "policy_2", "policy_3", "policy_4", "colaction_5",
"colaction_6", "colaction_7", "colaction_8", "colaction_10",
"colaction_13", "joke", "random", "say", "wits", "wits_nb", "neutral",
"rural_id", "relig_id", "prog_id", "vignette", "merit3R", "policy_3R",
"policy_4R", "controll_4R", "controll_5R", "controll_9R", "controll_10R",
"colaction_6R", "colaction_10R", "logical", "cognition", "engage",
"pertake", "policy", "colaction", "controllability", "completebi",
"gender2"), row.names = c(NA, 6L), class = c("tbl_df", "tbl",
"data.frame"))
wits$complete
#[1] Completed \nall Completed \nall Completed \nall Completed \nall Completed \nall Completed \nall
#Levels: Completed \nall Stopped after demos Stopped after consent Skipped manip bc poor id
You can see that the level is "Completed \nall" (with an embedded newline), not "Completed all", in your wits data frame.
## So you can use the which() function to find the matching rows.
which(wits$complete == "Completed \nall")
#[1] 1 2 3 4 5 6 ## These are the row indices. Use them to subset your data frame as below.
## This will subset your data frame
wits[which(wits$complete == "Completed \nall"),]
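An alternative, sketched here on a made-up factor rather than the real wits data, is to clean the level labels once so that the comparison against "Completed all" works as originally written. The vector `complete` below is hypothetical and only mimics the stray newline.

```r
# Toy factor reproducing the stray newline seen in wits$complete
complete <- factor(c("Completed \nall", "Stopped after demos", "Completed \nall"))

# Collapse any run of whitespace (including the newline) into one space
levels(complete) <- gsub("\\s+", " ", levels(complete))

# The plain label now matches
n_completed <- sum(complete == "Completed all")
```

Applied to the real data, the same gsub() call on levels(wits$complete) would presumably let the original subset() call match, though that has not been checked against the full dataset.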

Setting edge attributes conditionally in list-column graphs using R igraph and dplyr (purrr)

I have a dataframe with a series of igraph objects in list-column format. I would like to conditionally set the edge color attribute.
I've included the dput output for a sample version of the actual dataframe (very large, thousands of graphs) containing just three graphs. It's still long, so I've put it at the bottom of this post and I'll explain a couple of the ideas I've tried so far.
First attempt was multiple uses of mutate and map using the purrr package.
sampleColored <- sampleGraphs %>% mutate(map(graph, function(x)
E(x)[weights == 0]$color = "blue")) %>% mutate(map(graph, function(x)
E(x)[weights < 0]$color = "red")) %>% mutate(map(graph, function(x)
E(x)[weights > 0]$color = "green"))
No error messages, but the command
shortPlots <- sampleColored %>%
mutate(plots = map(graph, function(x) plot(x, layout=layout.circle,
vertex.size=20,
edge.curved=TRUE)))
produced nice graphs with all edges colored grey.
Likewise with my second attempt where I created an edgeColor function and used a single map call.
edgecolor <- function(x) {
E(x)[weights == 0]$color <- "blue"
E(x)[weights < 0]$color <- "red"
E(x)[weights > 0]$color <- "green"
return(E(x))
}
sampleColored <- sampleGraphs %>% mutate(map(graph, function(x) edgecolor(x)))
No error and grey edges. Dropping the mutate command gives rise to the error message:
Error in as.numeric(n): cannot coerce type 'closure' to vector of type 'double'
I'm confident that this is possible and I simply don't have the understanding to get to the correct syntax. Any suggestions will be appreciated. Thanks for looking.
Here's the sampleGraph dput:
sampleGraphs <- structure(list(ID = 997:1000, graph = list(structure(list(5,
TRUE, c(0, 1, 2, 0, 3, 4, 1, 2, 4, 3, 0, 4, 2, 3, 0, 1, 3,
1, 4, 2), c(1, 0, 0, 4, 1, 1, 4, 3, 0, 2, 3, 2, 1, 4, 2,
3, 0, 2, 3, 4), c(0, 14, 10, 3, 1, 17, 15, 6, 2, 12, 7, 19,
16, 4, 9, 13, 8, 5, 11, 18), c(1, 2, 16, 8, 0, 12, 4, 5,
14, 17, 9, 11, 10, 15, 7, 18, 3, 6, 19, 13), c(0, 4, 8, 12,
16, 20), c(0, 4, 8, 12, 16, 20), list(c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("3", "0", "2", "4", "1")), .Names = "name"),
structure(list(weights = c(3L, -4L, 4L, -3L, 43L, 8L,
4L, 14L, 1L, 55L, 2L, 22L, 26L, 64L, 9L, 2L, 13L, -12L,
25L, 16L)), .Names = "weights")), <environment>), class = "igraph"),
structure(list(5, TRUE, c(0, 1, 2, 2, 1, 3, 1, 3, 4, 3, 3,
0, 4, 0, 4, 4, 2, 1, 2, 0), c(3, 3, 4, 0, 2, 1, 4, 2, 0,
4, 0, 2, 1, 4, 2, 3, 3, 0, 1, 1), c(19, 11, 0, 13, 17, 4,
1, 6, 3, 18, 16, 2, 10, 5, 7, 9, 8, 12, 14, 15), c(17, 3,
10, 8, 19, 18, 5, 12, 11, 4, 7, 14, 0, 1, 16, 15, 13, 6,
2, 9), c(0, 4, 8, 12, 16, 20), c(0, 4, 8, 12, 16, 20), list(
c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("2", "0", "1", "3", "4")), .Names = "name"),
structure(list(weights = c(4L, -4L, 25L, 22L, 4L, 3L,
2L, -3L, 55L, 2L, 9L, 16L, 43L, 14L, 64L, 13L, 1L, -12L,
8L, 26L)), .Names = "weights")), <environment>), class = "igraph"),
structure(list(5, TRUE, c(0, 1, 2, 3, 4, 0, 1, 2, 1, 3, 1,
3, 2, 4, 2, 4, 0, 0, 3, 4), c(1, 4, 3, 4, 0, 4, 2, 0, 0,
2, 3, 1, 4, 1, 1, 2, 3, 2, 0, 3), c(0, 17, 16, 5, 8, 6, 10,
1, 7, 14, 2, 12, 18, 11, 9, 3, 4, 13, 15, 19), c(8, 7, 18,
4, 0, 14, 11, 13, 17, 6, 9, 15, 16, 10, 2, 19, 5, 1, 12,
3), c(0, 4, 8, 12, 16, 20), c(0, 4, 8, 12, 16, 20), list(
c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("4", "0", "3", "2", "1")), .Names = "name"),
structure(list(weights = c(43L, 4L, 9L, 16L, 25L, 64L,
-4L, 2L, 2L, 4L, -11L, 26L, -3L, 8L, 3L, 1L, 55L, 13L,
14L, 22L)), .Names = "weights")), <environment>), class = "igraph"),
structure(list(5, TRUE, c(0, 1, 2, 3, 4, 1, 3, 2, 4, 0, 1,
3, 2, 4, 0, 0, 2, 4, 1, 3), c(4, 4, 4, 1, 2, 0, 2, 3, 0,
3, 2, 0, 1, 1, 2, 1, 0, 3, 3, 4), c(15, 14, 9, 0, 5, 10,
18, 1, 16, 12, 7, 2, 11, 3, 6, 19, 8, 13, 4, 17), c(5, 16,
11, 8, 15, 12, 3, 13, 14, 10, 6, 4, 9, 18, 7, 17, 0, 1, 2,
19), c(0, 4, 8, 12, 16, 20), c(0, 4, 8, 12, 16, 20), list(
c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("1", "4", "0", "2", "3")), .Names = "name"),
structure(list(weights = c(1L, 13L, -4L, 14L, 3L, 64L,
26L, -11L, -3L, 22L, 43L, 16L, 2L, 2L, 8L, 25L, 4L, 8L,
55L, 4L)), .Names = "weights")), <environment>), class = "igraph"))), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("ID",
"graph"))
Using set_edge_attr rather than assigning through igraph's idiomatic E() edge function helps. I had to convert the sampleGraphs list-column to a simple list of graphs and upgrade the graphs to the newer igraph format, but this works:
library(igraph)
library(purrr)
graphs <- sampleGraphs$graph # a simple list of graphs
graphs <- lapply(graphs, function(x) upgrade_graph(x)) # upgrade to the current igraph format
edgecolor <- function(x) {
  E(x)[weights == 0]$color <- "blue"
  E(x)[weights < 0]$color <- "red"
  E(x)[weights > 0]$color <- "green"
  return(E(x)$color)
} # The function now returns a vector of colors based on the edge weights
# Pass the function's result to the "value" argument of "set_edge_attr"
graphs_colored <- graphs %>% map(., function(x) set_edge_attr(x, "color", value = edgecolor(x)))
par(mfrow = c(2,2), mar = c(0,0,0,0))
shortPlots <- graphs_colored %>%
  map(., function(x) plot(x,
                          layout=layout.circle,
                          vertex.size=20,
                          edge.curved=TRUE,
                          edge.arrow.size = 0.5))
Got it! Thanks to @paqmo for the suggestions. I needed to use mutate to redefine the graph list-column variable.
edgecolor <- function(x) {
E(x)[weights == 0]$color <- "#FF000000"
E(x)[weights < 0]$color <- "red"
E(x)[weights > 0]$color <- "green"
return(E(x)$color)
}
sampleColored <- sampleGraphs %>% mutate(graph = map(graph, function(x)
set_edge_attr(x, "color", value = edgecolor(x))))
par(mfrow = c(2,2), mar = c(0,0,0,0))
samplePlots <- sampleColored %>%
mutate(plots = map(graph, function(x) plot(x, layout=layout.circle,
vertex.size=20,
edge.curved=TRUE)))
This generates the same image as @paqmo's answer.
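The reason the earlier attempts left the edges grey can be illustrated on a small made-up graph: R functions work on a copy of their argument, so a graph modified inside a function is lost unless the modified graph is returned and assigned back. The graph `g` and the helpers `edge_colors` and `color_edges` below are hypothetical and only sketch the pattern.

```r
library(igraph)

# Hypothetical 4-edge graph with a "weights" edge attribute
g <- make_ring(4)
E(g)$weights <- c(-1, 0, 2, 5)

# Map weights to colors: zero is blue, negative is red, positive is green
edge_colors <- function(w) {
  ifelse(w == 0, "blue", ifelse(w < 0, "red", "green"))
}

# Return the modified graph so the caller can keep it
color_edges <- function(graph) {
  set_edge_attr(graph, "color", value = edge_colors(E(graph)$weights))
}

g_colored <- color_edges(g)

# The original graph is untouched; only the returned copy has colors
original_has_color <- "color" %in% edge_attr_names(g)
new_colors <- E(g_colored)$color
```

In the list-column setting, the same pattern is what mutate(graph = map(graph, color_edges)) would do: each returned graph replaces the old one in the graph column.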

Calculating medians via dplyr vs. aggregate in R

Hello: I am getting slightly different medians for a data set that looks like the one created below when I produce them via dplyr/ tidyr versus aggregate. Can anyone explain the difference? Thank you!
#dataset
out2<-structure(list(d3 = structure(c(1L, 2L, NA, NA, 1L, 1L, NA,
2L,NA,3L,1L, NA, NA, 1L, 3L, NA, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 3L, 1L), .Label
= c("Professional journalist", "Elected politician", "Online blogger"),
class = "factor"), Accessible = c(3, 5, 2,NA, 1, 2, NA, 3, NA, 4, 2, 5, NA,
3, 4, NA, 2, NA, 3, 4, 4, 4,2, 2, 2), Information = c(1, 2, 1, NA, 4, 1, NA,
2, NA, 2, 1, 1, NA, 4, 1, NA, 1, 1, 1, 3, 1, 3, 3, 4, 1), Responsive = c(5,
4, 6, NA, 2, 3, NA, 1, NA, 5, 4, 4, NA, 6, 3, NA, 4, NA, 2, 2, 6, 2, 1, 1,
3), Debate = c(6, 3, 4, NA, 3, 4, NA, 5, NA, 6, 5,6, NA, 1, 5, NA, 5, 2, NA,
1, 5, 6, 5, 5, 7), Officials = c(2,1, 5, NA, 5, 5, NA, 6, NA, 3, 6, 2, NA, 2,
2, NA, 6, 3, NA, 5,2, 5, 4, 6, 5), Social = c(7, 6, 7, NA, 7, 7, NA, 4, NA,
7, 7,
7, NA, 7, 7, NA, 7, NA, NA, 7, 7, 1, 6, 7, 6), `Trade-Offs` = c(4,
7, 3, NA, 6, 6, NA, 7, NA, 1, 3, 3, NA, 5, 6, NA, 3, NA, NA,
6, 3, 7, 7, 3, 4)), .Names = c("d3", "Accessible", "Information",
"Responsive", "Debate", "Officials", "Social", "Trade-Offs"), row.names =
c(171L, 126L, 742L, 379L, 635L, 3L, 303L, 419L, 324L, 97L, 758L, 136L,
770L, 405L, 101L, 674L, 386L, 631L, 168L, 590L, 731L, 387L, 673L, 208L,
728L), class = "data.frame")
#Find Medians via tidyR and dplyr
test<-out2 %>%
gather(variable, value, -1) %>%
filter(is.na(d3)==FALSE)%>%
group_by(d3, variable) %>%
summarise(value=median(value, na.rm=TRUE))
#dataframe
test<-data.frame(test)
#find Medians via aggregate
test2<-aggregate(.~d3, data=out2, FUN=median, na.rm=TRUE)
#Gather for plotting
test2<-test2 %>%
gather(variable, value, -d3)
#Plot Medians via tidyr
ggplot(test, aes(x=d3, y=value,
group=d3))+facet_wrap(~variable)+
geom_bar(stat='identity')+labs(title='Medians via TidyR')
#Plot Medians Via aggregate
ggplot(test2, aes(x=d3, y=value,
group=d3))+facet_wrap(~variable)+geom_bar(stat='identity')+
labs(title='Medians via Aggregate')
#Compare Debate, Information and Responsive
The results produced by aggregate are different because aggregate is dropping entire rows where any value is NA, even if some variables in that row contain data.
You can correct this by specifying a value for the na.action argument, as described in this accepted answer. Here it would be:
test2<-aggregate(.~d3, data=out2, FUN=median, na.rm = TRUE, na.action=NULL)
test2<-test2 %>%
gather(variable, value, -d3)
Confirm that the results are the same:
identical(as.data.frame(test %>% arrange(d3, variable, value)),
as.data.frame(test2 %>% arrange(d3, variable, value)))
[1] TRUE
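The row-dropping behaviour can be seen on a small made-up data frame. This sketch uses na.action = na.pass, a documented option that also keeps rows containing NA, in place of the na.action = NULL used above; the `toy` data and its values are hypothetical.

```r
# Hypothetical data: row 2 has an NA in y but a valid x
toy <- data.frame(
  g = c("a", "a", "a", "b", "b"),
  x = c(1, 3, 100, 2, 4),
  y = c(10, NA, 30, 5, 7)
)

# Default na.action (na.omit) drops row 2 entirely, so x = 3 is lost
dropped <- aggregate(. ~ g, data = toy, FUN = median, na.rm = TRUE)

# na.pass keeps the row; na.rm = TRUE then skips the NA per column
kept <- aggregate(. ~ g, data = toy, FUN = median, na.rm = TRUE,
                  na.action = na.pass)

dropped$x  # group "a" median computed from 1 and 100
kept$x     # group "a" median computed from 1, 3 and 100
```

The medians of y are the same in both results, because the NA is excluded either way; only columns that had valid values in the dropped rows change.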
