I am trying to produce a column sum where there are two grouping variables: p_id and stimuli_length.
df <- structure(list(p_id = c("p_id3", "p_id3", "p_id3", "p_id3", "p_id3",
"p_id3", "p_id3", "p_id3", "p_id3", "p_id4", "p_id4", "p_id4",
"p_id4", "p_id4", "p_id4", "p_id4", "p_id4", "p_id4", "p_id4",
"p_id4", "p_id5", "p_id5", "p_id5", "p_id5"), stimuli_length = c(4L,
4L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 4L, 4L, 5L, 5L, 6L,
6L, 6L, 6L, 7L, 7L, 7L, 7L), value = c(1, 1, 1, 1, 0, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1), sum = c(2, 2,
2, 2, 3, 3, 3, 3, 1, 3, 3, 3, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,
3), expected_result = c(2, 2, 2, 2, 3, 3, 3, 3, 1, 3, 3, 3, 2,
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3)), row.names = c(NA, -24L), class = c("tbl_df",
"tbl", "data.frame"))
For each p_id, and each stimuli_length, I would like the sum of the value column.
So, for p_id3 and stimuli_length == 4, the sum is 1 + 1 = 2.
My attempt does not give the correct sum:
res <- df %>% group_by(p_id) %>% group_by(stimuli_length) %>%
select(value) %>% rowwise() %>% mutate(sum = sum(value))
We don't need rowwise(): it groups by each row, so sum() only ever sees one observation. Note also that a second group_by() call replaces the first grouping rather than adding to it. Instead, pass both 'p_id' and 'stimuli_length' to a single group_by() call and use mutate() directly to get the sum of 'value':
library(dplyr)
df %>%
  group_by(p_id, stimuli_length) %>%
  mutate(Sum = sum(value)) %>%
  ungroup()
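For comparison, the same per-group sum can be computed without dplyr. This is only a sketch using base R's ave(), which is a different technique from the answer above; the `toy` data frame is made up for illustration and is not the asker's data:

```r
# Hypothetical toy data with the same column names as the question
toy <- data.frame(
  p_id           = c("a", "a", "a", "b", "b"),
  stimuli_length = c(4L, 4L, 5L, 4L, 4L),
  value          = c(1, 1, 0, 1, 0)
)

# ave() applies FUN within each combination of the grouping vectors
# and returns a vector as long as the input, so every row is kept
toy$Sum <- ave(toy$value, toy$p_id, toy$stimuli_length, FUN = sum)

print(toy)
```

Like mutate() after group_by(), this repeats the group total on every row of the group instead of collapsing the data.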
I have this working data frame:
lf3 = structure(list(session_id = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L,
5L, 6L, 6L, 6L, 6L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 10L,
11L, 12L, 12L, 12L, 13L, 13L, 14L), userId = c(1, 1, 1, 2, 3,
3, 4, 4, 5, 6, 6, 6, 6, 7, 7, 8, 8, 8, 8, 9, 9, 9, 10, 11, 12,
12, 12, 13, 13, 14), datetime = structure(c(1457029336, 1457029337,
1457029340, 1457029596, 1457030783, 1457030784, 1457030918, 1457030920,
1457031472, 1457031674, 1457031675, 1457031677, 1457031678, 1457032116,
1457032117, 1457032963, 1457032964, 1457032966, 1457032967, 1457033246,
1457033247, 1457033249, 1457033359, 1457033530, 1457034351, 1457034353,
1457034356, 1457034623, 1457034624, 1457035397), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), referer = c(22, 2, 6, 20, 21, 6, 23,
17, 21, 22, 11, 10, 6, 24, 10, 6, 25, 26, 27, 8, 6, 1, 6, 28,
29, 30, 31, 6, 30, 32), request = c(1, 2, 3, 4, 5, 6, 7, 8, 5,
9, 6, 10, 10, 9, 6, 11, 9, 12, 13, 8, 3, 9, 3, 14, 13, 11, 15,
6, 6, 16)), .Names = c("session_id", "userId", "datetime", "referer",
"request"), row.names = c(NA, 30L), class = "data.frame")
Now I am looking for certain request ids: I want to check whether each one occurs in a session and, if it does, return its position index in the data frame.
I am trying this code, but without success:
lf3 %>% group_by(session_id) %>% do(.,match(6,.,request))
The intent is to check, for each session, whether request = 6 is present and to return its positional index.
I think this code will print your desired output:
as.numeric(rownames(lf3[lf3$request == 6, ]))
This is one option:
lf3 %>% group_by(session_id) %>%
summarise(do_match = 6 %in% request,
act_match = ifelse(6 %in% request,which(request==6),0))
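Mind that a request might appear more than once in a session; this captures only the first occurrence. The index returned is the position within the session, not the row number in the full data frame.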
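If every occurrence is needed, along with its row number in the full data frame, one possible approach is to record the positions first and then filter. This is a minimal sketch under assumptions: `toy` is a made-up stand-in for lf3, and the column names `row_index` and `pos_in_session` are hypothetical:

```r
library(dplyr)

# Hypothetical toy data standing in for lf3
toy <- data.frame(
  session_id = c(1L, 1L, 1L, 2L, 3L, 3L, 3L),
  request    = c(1, 2, 6, 4, 6, 5, 6)
)

all_matches <- toy %>%
  mutate(row_index = row_number()) %>%        # position in the whole data frame
  group_by(session_id) %>%
  mutate(pos_in_session = row_number()) %>%   # position within each session
  ungroup() %>%
  filter(request == 6)                        # keep every matching row

print(all_matches)
```

Sessions without the request simply produce no rows here, so a missing session_id in the result means "no match".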
I am encountering an error when trying to group my data based on the categories of a variable, and I am not sure why it is happening. I have used the two most widely recommended methods, subset(dataframe, variable == X) and dataframe[dataframe$variable == X, ], successfully with past data sets and, just now, with the mtcars dataset.
The problem is that when I try to run my code, I get an error in which R just prints out the names of all of my variables (see below).
I am not quite sure how to "show" this problem, so any recommendations regarding what information would be useful to you all would be greatly appreciated. This problem is not reproducible with other datasets. Thank you for any advice you are able to give.
My dataset "wits" has 363 observations and 92 variables. My variable "complete" is a factor variable with four levels: "completed all", "stopped after demos", "stopped after consent", and "skipped manip bc poor id." I would like to create a new dataset made up only of participants with "completed all". I have tried these two methods:
wits_c <- wits[wits$complete=="Completed all", ]
wits_c <-subset(wits,complete=="Completed all")
Which results in the following error:
Error: Columns `Start`, `End`, `GameCode`, `workerID`, `condition`, `about`, `valid`, `consv`, `merit1`, `merit2`, `merit3`, `gender`, `gender_TEXT`, `poorid`, `choseskip`, `age`, `edu`, `race`, `race_TEXT`, `complete`, `distracted`, `happen`, `about__1`, `playread`, `thinking_1`, `thinking_2`, `thinking_3`, `thinking_4`, `thinking_5`, `thinking_6`, `thinking_7`, `text`, `logical_1`, `logical_2`, `logical_3`, `logical_4`, `controll_1`, `controll_2`, `controll_3`, `controll_4`, `controll_5`, `controll_6`, `controll_7`, `controll_8`, `controll_9`, `controll_10`, `privatesol`, `publicsol`, `privatesol_2`, `privatesol_3`, `privatesol_5`, `publicsol_2`, `publicsol_3`, `publicsol_5`, `policy_1`, `policy_2`, `policy_3`, `policy_4`, `colaction_5`, `colaction_6`, `colaction_7`, `colaction_8`, `colaction_10`, `colaction_13`, `joke`, `random`, `say`, `wits`, `wits_nb`, `neutral`, `rural_id`, `relig_id`, `prog_id`, `vignette`, `merit3R`, `policy_3R`, `policy_4R`, `controll_4R`, `controll_5R`,`co
Thank you to user Markdly for the suggestion to include the following output which provides more detailed information about my dataset:
dput(head(wits))
structure(list(Start = structure(c(1499525516, 1499516293, 1499516379,
1499516319, 1499516949, 1499516709), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), End = structure(c(1499525762, 1499518121,
1499516954, 1499517222, 1499517412, 1499517512), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), GameCode = c(2991999, 5712506, 1002944,
8916111, 3495462, 9127270), workerID = c("ACIHCWKHNFC7U", "A3UAO2LYUPO7L6",
"A8L94A9EF23BV", "A258JTYUD56LOE", "A12SJSJIUR3A23", "A1HHOCO3ZZHCJZ"
), condition = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("WITS",
"WITS No Blurb", "Neutral", "Read"), class = "factor"), about = c(2,
2, 2, 2, 2, 2), valid = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("valid responses",
"invalid responses"), class = "factor"), consv = c(4, 2, 2, 6,
4, 4), merit1 = c(5, 3, 2, 4, 6, 4), merit2 = c(4, 4, 2, 5, 5,
4), merit3 = c(3, 4, 2, 5, 4, 5), gender = structure(c(1L, 1L,
2L, 2L, 2L, 2L), .Label = c("man", "woman", "non-binary", "other"
), class = "factor"), gender_TEXT = c(NA, NA, NA, NA, NA, NA),
poorid = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("not id as poor",
"id as poor"), class = "factor"), choseskip = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Label = c("poor but continued", "poor and skipped"), class = "factor"),
age = c(28, 30, 33, 41, 26, 30), edu = c(5, 5, 6, 5, 3, 5
), race = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("white",
"black", "latino", "asian", "native american", "other", "multiracial"
), class = "factor"), race_TEXT = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), complete = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Completed
all",
"Stopped after demos", "Stopped after consent", "Skipped manip bc poor id"
), class = "factor"), distracted = c(4, 5, 0, 0, 0, 5), happen = c(0,
0, 0, 0, 0, 0), about__1 = c(2, 2, 2, 2, 2, 2), playread = c(1,
1, 1, 1, 1, 1), thinking_1 = c(1, 6, 5, 4, 6, 4), thinking_2 = c(4,
3, 5, 1, 5, 5), thinking_3 = c(1, 4, 7, 4, 5, 5), thinking_4 = c(6,
4, 7, 1, 5, 3), thinking_5 = c(5, 3, 6, 6, 5, 4), thinking_6 = c(4,
3, 7, 6, 6, 4), thinking_7 = c(6, 3, 7, 6, 5, 5), text = c(2,
4, 5, 4, 2, 3), logical_1 = c(1, 3, 4, 4, 4, 4), logical_2 = c(2,
4, 3, 4, 5, 4), logical_3 = c(4, 3, 5, 4, 5, 4), logical_4 = c(1,
6, 3, 4, 4, 3), controll_1 = c(6, 6, 1, 4, 4, 4), controll_2 = c(1,
3, 1, 4, 5, 5), controll_3 = c(1, 4, 3, 6, 4, 5), controll_4 = c(3,
4, 6, 3, 4, 4), controll_5 = c(6, 3, 6, 1, 4, 4), controll_6 = c(3,
3, 1, 3, 5, 4), controll_7 = c(2, 5, 1, 5, 4, 5), controll_8 = c(2,
2, 1, 4, 5, 3), controll_9 = c(6, 2, 6, 3, 5, 5), controll_10 = c(1,
3, 6, 2, 5, 3), privatesol = c(5, 5.33333333333333, 12, 8,
8.33333333333333, 8.33333333333333), publicsol = c(7.66666666666667,
2.66666666666667, 12, 2, 11.3333333333333, 8.33333333333333
), privatesol_2 = c(1, 11, 12, 11, 11, 3), privatesol_3 = c(3,
2, 12, 2, 3, 11), privatesol_5 = c(11, 3, 12, 11, 11, 11),
publicsol_2 = c(11, 3, 12, 2, 12, 11), publicsol_3 = c(11,
2, 12, 2, 11, 3), publicsol_5 = c(1, 3, 12, 2, 11, 11), policy_1 = c(1,
5, 6, 1, 4, 4), policy_2 = c(3, 2, 6, 1, 5, 4), policy_3 = c(1,
3, 2, 6, 5, 3), policy_4 = c(6, 3, 5, 6, 5, 4), colaction_5 = c(2,
5, 6, 1, 2, 5), colaction_6 = c(6, 4, 1, 6, 5, 4), colaction_7 = c(6,
2, 6, 1, 3, 4), colaction_8 = c(4, 5, 6, 1, 2, 4), colaction_10 = c(4,
3, 1, 6, 5, 4), colaction_13 = c(3, 2, 6, 1, 2, 3), joke = c(2,
2, 2, 2, 2, 2), random = c(2, 2, 2, 2, 2, 2), say = c(NA,
"Nope", NA, "This was highly biased survey.", "good survey",
"NO"), wits = c(1, 1, 1, 1, 1, 1), wits_nb = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), neutral = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), rural_id = c(0,
0, 0, 1, 0, 0), relig_id = c(0, 1, 1, 1, 1, 1), prog_id = c(0,
0, 1, 0, 1, 1), vignette = structure(c(1L, 1L, 1L, 1L, 1L,
1L), .Label = c("WITS", "WITS No Blurb", "Neutral", "Read"
), class = "factor"), merit3R = c(4, 3, 5, 2, 3, 2), policy_3R = c(6,
4, 5, 1, 2, 4), policy_4R = c(1, 4, 2, 1, 2, 3), controll_4R = c(4,
3, 1, 4, 3, 3), controll_5R = c(1, 4, 1, 6, 3, 3), controll_9R = c(1,
5, 1, 4, 2, 2), controll_10R = c(6, 4, 1, 5, 2, 4), colaction_6R = c(1,
3, 6, 1, 2, 3), colaction_10R = c(3, 4, 6, 1, 2, 3), logical = c(2,
4, 3.75, 4, 4.5, 3.75), cognition = structure(c(1, 6, 5,
4, 6, 4, 4, 3, 5, 1, 5, 5, 1, 4, 7, 4, 5, 5, 6, 4, 7, 1,
5, 3, 5, 3, 6, 6, 5, 4, 4, 3, 7, 6, 6, 4, 6, 3, 7, 6, 5,
5), .Dim = 6:7), engage = c(5, 3, 6.66666666666667, 6, 5.33333333333333,
4.33333333333333), pertake = c(3.66666666666667, 3.66666666666667,
6.33333333333333, 2, 5, 4.33333333333333), policy = c(2.75,
3.75, 4.75, 1, 3.25, 3.75), colaction = c(3.16666666666667,
3.5, 6, 1, 2.16666666666667, 3.66666666666667), controllability = c(2.7,
3.9, 1.2, 4.5, 3.7, 3.8), completebi = structure(c(1L, 1L,
1L, 1L, 1L, 1L), .Label = c("Completed all", "Stopped after demos"
), class = "factor"), gender2 = structure(c(1L, 1L, 2L, 2L,
2L, 2L), .Label = c("man", "woman"), class = "factor")), .Names = c("Start", "End", "GameCode", "workerID", "condition", "about", "valid",
"consv", "merit1", "merit2", "merit3", "gender", "gender_TEXT",
"poorid", "choseskip", "age", "edu", "race", "race_TEXT", "complete",
"distracted", "happen", "about__1", "playread", "thinking_1",
"thinking_2", "thinking_3", "thinking_4", "thinking_5", "thinking_6",
"thinking_7", "text", "logical_1", "logical_2", "logical_3",
"logical_4", "controll_1", "controll_2", "controll_3", "controll_4",
"controll_5", "controll_6", "controll_7", "controll_8", "controll_9",
"controll_10", "privatesol", "publicsol", "privatesol_2", "privatesol_3",
"privatesol_5", "publicsol_2", "publicsol_3", "publicsol_5",
"policy_1", "policy_2", "policy_3", "policy_4", "colaction_5",
"colaction_6", "colaction_7", "colaction_8", "colaction_10",
"colaction_13", "joke", "random", "say", "wits", "wits_nb", "neutral",
"rural_id", "relig_id", "prog_id", "vignette", "merit3R", "policy_3R",
"policy_4R", "controll_4R", "controll_5R", "controll_9R", "controll_10R",
"colaction_6R", "colaction_10R", "logical", "cognition", "engage",
"pertake", "policy", "colaction", "controllability", "completebi",
"gender2"), row.names = c(NA, 6L), class = c("tbl_df", "tbl",
"data.frame"))
wits$complete
#[1] Completed \nall Completed \nall Completed \nall Completed \nall Completed \nall Completed \nall
#Levels: Completed \nall Stopped after demos Stopped after consent Skipped manip bc poor id
You can see that the level is "Completed \nall" (with an embedded newline), not "Completed all", in your wits data frame, so a comparison against "Completed all" never matches.
## So, use which() with the exact level string to find the matching rows.
which(wits$complete == "Completed \nall")
#[1] 1 2 3 4 5 6 ## These are the row indices.
## Use those indices to subset your data frame
wits[which(wits$complete == "Completed \nall"),]
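If you would rather keep the original comparison, one option is to normalise the factor levels first. This is a minimal sketch, assuming the only problem is stray whitespace inside the level names; the `complete` vector below is a hypothetical stand-in for wits$complete:

```r
# Hypothetical stand-in for wits$complete, with a newline inside one level
complete <- factor(c("Completed \nall", "Stopped after demos", "Completed \nall"))

# Collapse any run of whitespace (spaces, newlines, tabs) in the level
# names into a single space; the underlying integer codes are untouched
levels(complete) <- gsub("\\s+", " ", levels(complete))

print(levels(complete))
print(which(complete == "Completed all"))
```

Applied to the real data, the same two steps would be run on wits$complete before subsetting.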
I have a dataframe with a series of igraph objects in list-column format. I would like to conditionally set the edge color attribute.
I've included the dput output for a sample version of the actual dataframe (very large, thousands of graphs) containing just three graphs. It's still long, so I've put it at the bottom of this post and I'll explain a couple of the ideas I've tried so far.
First attempt was multiple uses of mutate and map using the purrr package.
sampleColored <- sampleGraphs %>% mutate(map(graph, function(x)
E(x)[weights == 0]$color = "blue")) %>% mutate(map(graph, function(x)
E(x)[weights < 0]$color = "red")) %>% mutate(map(graph, function(x)
E(x)[weights > 0]$color = "green"))
No error messages, but the command
shortPlots <- sampleColored %>%
mutate(plots = map(graph, function(x) plot(x, layout=layout.circle,
vertex.size=20,
edge.curved=TRUE)))
produced nice graphs with all edges colored grey.
Likewise with my second attempt, where I created an edgecolor function and used a single map call.
edgecolor <- function(x) {
E(x)[weights == 0]$color <- "blue"
E(x)[weights < 0]$color <- "red"
E(x)[weights > 0]$color <- "green"
return(E(x))
}
sampleColored <- sampleGraphs %>% mutate(map(graph, function(x) edgecolor(x)))
No error and grey edges. Dropping the mutate command gives rise to the error message:
Error in as.numeric(n): cannot coerce type 'closure' to vector of type 'double'
I'm confident that this is possible and I simply don't have the understanding to get to the correct syntax. Any suggestions will be appreciated. Thanks for looking.
Here's the sampleGraph dput:
sampleGraphs <- structure(list(ID = 997:1000, graph = list(structure(list(5,
TRUE, c(0, 1, 2, 0, 3, 4, 1, 2, 4, 3, 0, 4, 2, 3, 0, 1, 3,
1, 4, 2), c(1, 0, 0, 4, 1, 1, 4, 3, 0, 2, 3, 2, 1, 4, 2,
3, 0, 2, 3, 4), c(0, 14, 10, 3, 1, 17, 15, 6, 2, 12, 7, 19,
16, 4, 9, 13, 8, 5, 11, 18), c(1, 2, 16, 8, 0, 12, 4, 5,
14, 17, 9, 11, 10, 15, 7, 18, 3, 6, 19, 13), c(0, 4, 8, 12,
16, 20), c(0, 4, 8, 12, 16, 20), list(c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("3", "0", "2", "4", "1")), .Names = "name"),
structure(list(weights = c(3L, -4L, 4L, -3L, 43L, 8L,
4L, 14L, 1L, 55L, 2L, 22L, 26L, 64L, 9L, 2L, 13L, -12L,
25L, 16L)), .Names = "weights")), <environment>), class = "igraph"),
structure(list(5, TRUE, c(0, 1, 2, 2, 1, 3, 1, 3, 4, 3, 3,
0, 4, 0, 4, 4, 2, 1, 2, 0), c(3, 3, 4, 0, 2, 1, 4, 2, 0,
4, 0, 2, 1, 4, 2, 3, 3, 0, 1, 1), c(19, 11, 0, 13, 17, 4,
1, 6, 3, 18, 16, 2, 10, 5, 7, 9, 8, 12, 14, 15), c(17, 3,
10, 8, 19, 18, 5, 12, 11, 4, 7, 14, 0, 1, 16, 15, 13, 6,
2, 9), c(0, 4, 8, 12, 16, 20), c(0, 4, 8, 12, 16, 20), list(
c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("2", "0", "1", "3", "4")), .Names = "name"),
structure(list(weights = c(4L, -4L, 25L, 22L, 4L, 3L,
2L, -3L, 55L, 2L, 9L, 16L, 43L, 14L, 64L, 13L, 1L, -12L,
8L, 26L)), .Names = "weights")), <environment>), class = "igraph"),
structure(list(5, TRUE, c(0, 1, 2, 3, 4, 0, 1, 2, 1, 3, 1,
3, 2, 4, 2, 4, 0, 0, 3, 4), c(1, 4, 3, 4, 0, 4, 2, 0, 0,
2, 3, 1, 4, 1, 1, 2, 3, 2, 0, 3), c(0, 17, 16, 5, 8, 6, 10,
1, 7, 14, 2, 12, 18, 11, 9, 3, 4, 13, 15, 19), c(8, 7, 18,
4, 0, 14, 11, 13, 17, 6, 9, 15, 16, 10, 2, 19, 5, 1, 12,
3), c(0, 4, 8, 12, 16, 20), c(0, 4, 8, 12, 16, 20), list(
c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("4", "0", "3", "2", "1")), .Names = "name"),
structure(list(weights = c(43L, 4L, 9L, 16L, 25L, 64L,
-4L, 2L, 2L, 4L, -11L, 26L, -3L, 8L, 3L, 1L, 55L, 13L,
14L, 22L)), .Names = "weights")), <environment>), class = "igraph"),
structure(list(5, TRUE, c(0, 1, 2, 3, 4, 1, 3, 2, 4, 0, 1,
3, 2, 4, 0, 0, 2, 4, 1, 3), c(4, 4, 4, 1, 2, 0, 2, 3, 0,
3, 2, 0, 1, 1, 2, 1, 0, 3, 3, 4), c(15, 14, 9, 0, 5, 10,
18, 1, 16, 12, 7, 2, 11, 3, 6, 19, 8, 13, 4, 17), c(5, 16,
11, 8, 15, 12, 3, 13, 14, 10, 6, 4, 9, 18, 7, 17, 0, 1, 2,
19), c(0, 4, 8, 12, 16, 20), c(0, 4, 8, 12, 16, 20), list(
c(1, 0, 1), structure(list(), .Names = character(0)),
structure(list(name = c("1", "4", "0", "2", "3")), .Names = "name"),
structure(list(weights = c(1L, 13L, -4L, 14L, 3L, 64L,
26L, -11L, -3L, 22L, 43L, 16L, 2L, 2L, 8L, 25L, 4L, 8L,
55L, 4L)), .Names = "weights")), <environment>), class = "igraph"))), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L), .Names = c("ID",
"graph"))
Using set_edge_attr rather than assigning through igraph's idiomatic E() function helps. I had to reduce sampleGraphs to a simple list of graphs and upgrade them to the newer igraph format, but this works:
graphs <- sampleGraphs$graph # a simple list of graphs
graphs <- lapply(graphs, upgrade_graph) # upgrade each graph to the current igraph format
edgecolor <- function(x) {
  E(x)[weights == 0]$color <- "blue"
  E(x)[weights < 0]$color <- "red"
  E(x)[weights > 0]$color <- "green"
  return(E(x)$color)
} # The function now returns a character vector of colors, one per edge
# Pass the function's result to the "value" argument of set_edge_attr
graphs_colored <- graphs %>% map(., function(x) set_edge_attr(x, "color", value = edgecolor(x)))
par(mfrow = c(2,2), mar = c(0,0,0,0))
shortPlots <- graphs_colored %>%
map(., function(x) plot(x,
layout=layout.circle,
vertex.size=20,
edge.curved=TRUE,
edge.arrow.size = 0.5))
Got it! Thanks to @paqmo for the suggestions. I needed to use mutate to redefine the graph list-column variable.
edgecolor <- function(x) {
E(x)[weights == 0]$color <- "#FF000000"
E(x)[weights < 0]$color <- "red"
E(x)[weights > 0]$color <- "green"
return(E(x)$color)
}
sampleColored <- sampleGraphs %>% mutate(graph = map(graph, function(x)
set_edge_attr(x, "color", value = edgecolor(x))))
par(mfrow = c(2,2), mar = c(0,0,0,0))
samplePlots <- sampleColored %>%
mutate(plots = map(graph, function(x) plot(x, layout=layout.circle,
vertex.size=20,
edge.curved=TRUE)))
This generates the same image as @paqmo's answer.
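For anyone wondering why the earlier attempts ran without error yet left the edges grey: an R function works on a copy of its argument, and a function whose last expression is an assignment returns only the assigned value. The sketch below uses plain lists as stand-ins for igraph objects, and all names in it are made up for illustration:

```r
# Plain lists stand in for graphs here (an assumption for illustration)
items <- list(list(weight = -1, color = "grey"),
              list(weight =  2, color = "grey"))

# Changes a local copy; the function's value is just the string "red"
recolor_lost <- function(x) x$color <- "red"

# Changes the local copy and returns the whole modified object
recolor_kept <- function(x) {
  x$color <- if (x$weight < 0) "red" else "green"
  x
}

lost <- lapply(items, recolor_lost)  # a list of strings, not of objects
kept <- lapply(items, recolor_kept)  # a list of recolored objects

print(items[[1]]$color)  # the original is unchanged
print(kept[[1]]$color)
```

The same logic explains why the result has to be assigned back to the graph column, as in mutate(graph = map(...)) above.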
Hello: I am getting slightly different medians for a data set that looks like the one created below when I produce them via dplyr/ tidyr versus aggregate. Can anyone explain the difference? Thank you!
#dataset
out2<-structure(list(d3 = structure(c(1L, 2L, NA, NA, 1L, 1L, NA,
2L,NA,3L,1L, NA, NA, 1L, 3L, NA, 1L, 2L, 3L, 2L, 1L, 3L, 2L, 3L, 1L), .Label
= c("Professional journalist", "Elected politician", "Online blogger"),
class = "factor"), Accessible = c(3, 5, 2,NA, 1, 2, NA, 3, NA, 4, 2, 5, NA,
3, 4, NA, 2, NA, 3, 4, 4, 4,2, 2, 2), Information = c(1, 2, 1, NA, 4, 1, NA,
2, NA, 2, 1, 1, NA, 4, 1, NA, 1, 1, 1, 3, 1, 3, 3, 4, 1), Responsive = c(5,
4, 6, NA, 2, 3, NA, 1, NA, 5, 4, 4, NA, 6, 3, NA, 4, NA, 2, 2, 6, 2, 1, 1,
3), Debate = c(6, 3, 4, NA, 3, 4, NA, 5, NA, 6, 5,6, NA, 1, 5, NA, 5, 2, NA,
1, 5, 6, 5, 5, 7), Officials = c(2,1, 5, NA, 5, 5, NA, 6, NA, 3, 6, 2, NA, 2,
2, NA, 6, 3, NA, 5,2, 5, 4, 6, 5), Social = c(7, 6, 7, NA, 7, 7, NA, 4, NA,
7, 7,
7, NA, 7, 7, NA, 7, NA, NA, 7, 7, 1, 6, 7, 6), `Trade-Offs` = c(4,
7, 3, NA, 6, 6, NA, 7, NA, 1, 3, 3, NA, 5, 6, NA, 3, NA, NA,
6, 3, 7, 7, 3, 4)), .Names = c("d3", "Accessible", "Information",
"Responsive", "Debate", "Officials", "Social", "Trade-Offs"), row.names =
c(171L, 126L, 742L, 379L, 635L, 3L, 303L, 419L, 324L, 97L, 758L, 136L,
770L, 405L, 101L, 674L, 386L, 631L, 168L, 590L, 731L, 387L, 673L, 208L,
728L), class = "data.frame")
#Find Medians via tidyR and dplyr
test<-out2 %>%
gather(variable, value, -1) %>%
filter(is.na(d3)==FALSE)%>%
group_by(d3, variable) %>%
summarise(value=median(value, na.rm=TRUE))
#dataframe
test<-data.frame(test)
#find Medians via aggregate
test2<-aggregate(.~d3, data=out2, FUN=median, na.rm=TRUE)
#Gather for plotting
test2<-test2 %>%
gather(variable, value, -d3)
#Plot Medians via tidyr
ggplot(test, aes(x=d3, y=value,
group=d3))+facet_wrap(~variable)+
geom_bar(stat='identity')+labs(title='Medians via TidyR')
#Plot Medians Via aggregate
ggplot(test2, aes(x=d3, y=value,
group=d3))+facet_wrap(~variable)+geom_bar(stat='identity')+
labs(title='Medians via Aggregate')
#Compare Debate, Information and Responsive
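The results produced by aggregate are different because its formula method drops entire rows where any value is NA (the default is na.action = na.omit), even if some variables in that row contain data. This happens before FUN is called, so na.rm = TRUE cannot bring those rows back.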
You can correct this by specifying a value for the na.action argument, as described in this accepted answer. Here it would be:
test2<-aggregate(.~d3, data=out2, FUN=median, na.rm = TRUE, na.action=NULL)
test2<-test2 %>%
gather(variable, value, -d3)
Confirm that the results are the same:
identical(as.data.frame(test %>% arrange(d3, variable, value)),
as.data.frame(test2 %>% arrange(d3, variable, value)))
[1] TRUE
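The row-dropping is easy to see on a small example. This is a sketch with made-up data (`toy` is hypothetical), and it uses na.action = na.pass, which, like NULL above, leaves the NA rows in place for FUN to handle:

```r
# Hypothetical toy data: the first row has an NA in y but a valid x
toy <- data.frame(
  g = factor(c("a", "a", "a", "b", "b")),
  x = c(1, 2, 9, 4, 6),
  y = c(NA, 5, 7, 1, 3)
)

# Default na.action = na.omit: the first row is removed entirely,
# so x = 1 never reaches median()
dropped <- aggregate(. ~ g, data = toy, FUN = median, na.rm = TRUE)

# Rows are kept; median(..., na.rm = TRUE) skips NAs column by column
kept <- aggregate(. ~ g, data = toy, FUN = median, na.rm = TRUE,
                  na.action = na.pass)

print(dropped)
print(kept)
```

For group "a", the median of x is taken over 2 and 9 in the first call but over 1, 2 and 9 in the second, which is the kind of discrepancy seen in the question.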