I have a bunch of data frames that look like this in R:
print(output[2])
Button Intensity Acc Intensity RT Time tdelta SubjectID CoupleID PrePost
1: 0 30 0 0.0 0 83325.87 0.000 1531 153 Post
2: 1 30 1 13.5 0 83362.65 36.782 1531 153 Post
3: 1 30 1 15.0 0 83376.68 14.027 1531 153 Post
4: 1 30 1 6.0 0 83392.27 15.585 1531 153 Post
5: 1 30 1 15.0 0 83398.77 6.507 1531 153 Post
print(output[1])
[[1]]
Button Intensity Acc Intensity RT Time tdelta SubjectID CoupleID PrePost
1: 0 30 0 0.0 0 77987.93 0.000 1531 153 Pre
2: 1 30 1 13.5 0 78084.57 96.639 1531 153 Pre
3: 1 30 1 15.0 0 78098.62 14.054 1531 153 Pre
4: 1 30 1 6.0 0 78114.13 15.508 1531 153 Pre
5: 1 30 1 15.0 0 78120.67 6.537 1531 153 Pre
I want to combine them into one big data frame that has the following logic and format:
SubjectID CoupleID PrePost Miss1RT Miss2RT Miss3RT Hit1RT Hit2RT Hit3RT
1531 153 Post 0.00 NA NA NA 36.78 14.027
1531 153 Pre 0.00 NA NA NA 96.638 14.054
If Button == 0, it's a Miss; if Button == 1, it's a Hit. So, in pseudocode, it should be something like:
for row in output[i].rows:
    if Button == 0:
        Miss1RT = tdelta
    elif Button == 1:
        Miss1RT = 'NA'
and then a flipped version where, if Button is 1, Hit[i]RT is tdelta, or else 'NA'.
There are 26 lines per data frame and each row is either a hit or a miss so there will be 26 Miss and 26 Hit columns and each SubjectID gets two rows - one for Pre and one for Post. So the column headers for the final output will be:
SubjectID CoupleID PrePost Miss1RT Miss2RT ...Miss26RT Hit1RT Hit2RT ... Hit26RT
I'm new to R and struggling with the proper syntax.
Something like this should work (melt and dcast come from the reshape2 package):
library(reshape2)
#Get data in structure OP has
output <- list(pre, post)
output2 <- lapply(output, function(x) cbind(x, num = paste0(1:nrow(x), "RT")))
pre_post <- do.call("rbind", output2)
#Perform actual calculations
pre_post$miss <- ifelse(pre_post$Button == 0, pre_post$tdelta, NA)
pre_post$hit <- ifelse(pre_post$Button == 1, pre_post$tdelta, NA)
pre_post_melted <- melt(pre_post, id.vars = c("SubjectID", "CoupleID", "num", "PrePost"), measure.vars = c("hit","miss"))
pre_post_res <- dcast(pre_post_melted, SubjectID + CoupleID + PrePost ~ variable + num, sep = "")
pre_post_res
#SubjectID CoupleID PrePost hit_1RT hit_2RT hit_3RT hit_4RT hit_5RT miss_1RT miss_2RT miss_3RT miss_4RT miss_5RT
#1 1531 153 Post NA 36.782 14.027 15.585 6.507 0 NA NA NA NA
#2 1531 153 Pre NA 96.639 14.054 15.508 6.537 0 NA NA NA NA
We reshape the data from long to wide to create all the variables dynamically, and we stack the data frames first so that each step runs only once.
Data:
pre <- structure(list(Button = c(0L, 1L, 1L, 1L, 1L), Intensity = c(30L,
30L, 30L, 30L, 30L), Acc = c(0L, 1L, 1L, 1L, 1L), Intensity = c(0,
13.5, 15, 6, 15), RT = c(0L, 0L, 0L, 0L, 0L), Time = c(77987.93,
78084.57, 78098.62, 78114.13, 78120.67), tdelta = c(0, 96.639,
14.054, 15.508, 6.537), SubjectID = c(1531L, 1531L, 1531L, 1531L,
1531L), CoupleID = c(153L, 153L, 153L, 153L, 153L), PrePost = c("Pre",
"Pre", "Pre", "Pre", "Pre")), .Names = c("Button", "Intensity",
"Acc", "Intensity", "RT", "Time", "tdelta", "SubjectID", "CoupleID",
"PrePost"), row.names = c(NA, -5L), class = "data.frame")
post <- structure(list(Button = c(0L, 1L, 1L, 1L, 1L), Intensity = c(30L,
30L, 30L, 30L, 30L), Acc = c(0L, 1L, 1L, 1L, 1L), Intensity = c(0,
13.5, 15, 6, 15), RT = c(0L, 0L, 0L, 0L, 0L), Time = c(83325.87,
83362.65, 83376.68, 83392.27, 83398.77), tdelta = c(0, 36.782,
14.027, 15.585, 6.507), SubjectID = c(1531L, 1531L, 1531L, 1531L,
1531L), CoupleID = c(153L, 153L, 153L, 153L, 153L), PrePost = c("Post",
"Post", "Post", "Post", "Post")), .Names = c("Button", "Intensity",
"Acc", "Intensity", "RT", "Time", "tdelta", "SubjectID", "CoupleID",
"PrePost"), row.names = c(NA, -5L), class = "data.frame")
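For comparison, here is a minimal base R sketch of the same idea that needs no reshaping package and yields the column names and order asked for in the question (Miss1RT ... MissnRT, then Hit1RT ... HitnRT). It is an alternative to the melt/dcast approach, not a replacement for it. The helper name wide_row is hypothetical, and the data frames are cut-down stand-ins that keep only the columns used here.

```r
# Cut-down stand-ins for the question's data: only the columns used here.
pre <- data.frame(Button = c(0L, 1L, 1L, 1L, 1L),
                  tdelta = c(0, 96.639, 14.054, 15.508, 6.537),
                  SubjectID = 1531L, CoupleID = 153L, PrePost = "Pre")
post <- data.frame(Button = c(0L, 1L, 1L, 1L, 1L),
                   tdelta = c(0, 36.782, 14.027, 15.585, 6.507),
                   SubjectID = 1531L, CoupleID = 153L, PrePost = "Post")
output <- list(pre, post)

# wide_row() is a hypothetical helper: it turns one data frame into one wide row.
wide_row <- function(d) {
  idx <- seq_len(nrow(d))
  vals <- c(ifelse(d$Button == 0, d$tdelta, NA),   # Miss columns
            ifelse(d$Button == 1, d$tdelta, NA))   # Hit columns
  names(vals) <- c(paste0("Miss", idx, "RT"), paste0("Hit", idx, "RT"))
  # Identifier columns come from the first row; they are constant within a frame.
  cbind(d[1, c("SubjectID", "CoupleID", "PrePost")],
        as.data.frame(as.list(vals)))
}

res <- do.call(rbind, lapply(output, wide_row))
rownames(res) <- NULL
res
```

Because the column names are built from the row index, the columns stay in numeric order (Miss1RT, Miss2RT, ..., Miss26RT) even with 26 rows per data frame.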
I have a patient data set, and for each ID I need to drop the rows after the first occurrence of 1 in the Disease column. For instance:
ID Date Disease
123 02-03-2012 0
123 03-03-2013 1
123 04-03-2014 0
321 03-03-2015 1
423 06-06-2016 1
423 07-06-2017 1
543 08-05-2018 1
543 09-06-2019 0
645 08-09-2019 0
and the expected output I want is:
ID Date Disease
123 02-03-2012 0
123 03-03-2013 1
321 03-03-2015 1
423 06-06-2016 1
543 08-05-2018 1
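One way with dplyr is to select rows up to the first occurrence of 1 for each ID. IDs that never have Disease == 1 (such as 645) are dropped, because the comparison evaluates to NA for them.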
library(dplyr)
df %>% group_by(ID) %>% filter(row_number() <= which(Disease == 1)[1])
# ID Date Disease
# <int> <fct> <int>
#1 123 02-03-2012 0
#2 123 03-03-2013 1
#3 321 03-03-2015 1
#4 423 06-06-2016 1
#5 543 08-05-2018 1
We can also use slice
df %>% group_by(ID) %>% slice(if(any(Disease == 1)) 1:which.max(Disease) else 0)
data
df <- structure(list(ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L,
543L, 645L), Date = structure(c(1L, 2L, 4L, 3L, 5L, 6L, 7L, 9L,
8L), .Label = c("02-03-2012", "03-03-2013", "03-03-2015", "04-03-2014",
"06-06-2016", "07-06-2017", "08-05-2018", "08-09-2019", "09-06-2019"
), class = "factor"), Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L,
0L, 0L)), class = "data.frame", row.names = c(NA, -9L))
I am not sure why the last line, 645 08-09-2019 0, is missing from your expected result. The first occurrence of disease for ID 645 has not appeared yet, so I guess you might have missed it in your expected result.
Based on my guess above, maybe you can try the base R solution below, using subset + ave:
dfout <- subset(df,!!ave(Disease,ID,FUN = function(v) !duplicated(cumsum(v)>0)))
such that
> dfout
ID Date Disease
1 123 02-03-2012 0
2 123 03-03-2013 1
4 321 03-03-2015 1
5 423 06-06-2016 1
7 543 08-05-2018 1
9 645 08-09-2019 0
DATA
df <- structure(list(ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L,
543L, 645L), Date = c("02-03-2012", "03-03-2013", "04-03-2014",
"03-03-2015", "06-06-2016", "07-06-2017", "08-05-2018", "09-06-2019",
"08-09-2019"), Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L
)), class = "data.frame", row.names = c(NA, -9L))
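One caveat about the subset + ave approach: !duplicated(cumsum(v) > 0) keeps only the first row before the first 1, so an ID whose rows are 0, 0, 1 would lose its second row. The sample data never has more than one leading 0, so the issue does not show up there. A minimal sketch of a variant that keeps every row up to and including the first 1 is below; ID 999 is a hypothetical extra patient added only to show the difference, and like the ave answer this keeps IDs that never reach 1.

```r
df <- data.frame(
  ID = c(123L, 123L, 123L, 321L, 423L, 423L, 543L, 543L, 645L,
         999L, 999L, 999L),                    # 999 is hypothetical
  Date = c("02-03-2012", "03-03-2013", "04-03-2014", "03-03-2015",
           "06-06-2016", "07-06-2017", "08-05-2018", "09-06-2019",
           "08-09-2019", "01-01-2020", "01-01-2021", "01-01-2022"),
  Disease = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 1L)
)

# cumsum(v == 1) counts the 1s seen so far; the second cumsum stays <= 1
# only up to (and including) the row where the first 1 occurs.
keep <- ave(df$Disease, df$ID,
            FUN = function(v) as.integer(cumsum(cumsum(v == 1)) <= 1))
dfout <- df[keep == 1, ]
dfout
```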
What I am trying to do is close to propensity score matching (or causal matching, MatchIt) but not quite the same.
I am simply interested in finding and gathering together the closest (pairwise) observations from a dataset with mixed variables (categorical and numerical).
The dataset looks like this:
id child age edu y
1 11011209 0 69 some college 495
2 11011212 0 44 secondary/primary 260
3 11011213 1 40 some college 175
4 11020208 1 47 secondary/primary 0
5 11020212 1 50 secondary/primary 25
6 11020310 0 65 secondary/primary 525
7 11020315 1 43 college 0
8 11020316 1 41 secondary/primary 5
9 11031111 0 49 secondary/primary 275
10 11031116 1 42 secondary/primary 0
11 11031119 0 32 college 425
12 11040801 1 38 secondary/primary 0
13 11040814 0 52 some college 260
14 11050109 0 59 some college 405
15 11050111 1 35 secondary/primary 20
16 11050113 0 51 secondary/primary 40
17 11051001 1 38 college 165
18 11051004 1 36 college 10
19 11051011 0 63 secondary/primary 455
20 11051018 0 44 college 40
What I want is to match the variables {child, age, edu} but not y (nor id).
Because the dataset has mixed variables, I can use the Gower distance:
library(cluster)
library(dplyr)    # %>%, group_by, filter, mutate
library(reshape2) # melt
# test on first ten observations
dt = dt[1:10, ]
# gower distance
ddmen = daisy(dt[,-c(1,5)], metric = 'gower')
Now, I want to retrieve the closest observations
mg = as.matrix(ddmen)
mgg = mg %>% melt() %>% group_by(Var2) %>% filter(value != 0) %>% mutate(m =
min(value)) %>% mutate(closest = Var1[m == value]) %>% as.data.frame()
close = mgg %>% dplyr::select(Var2, closest, dis = m) %>% distinct()
close gives me
Var2 closest dis
1 1 6 0.37931034
2 2 9 0.05747126
3 3 8 0.34482759
4 4 5 0.03448276
5 5 4 0.03448276
6 6 9 0.18390805
7 7 10 0.34482759
8 8 10 0.01149425
9 9 2 0.05747126
10 10 8 0.01149425
I can merge close to my original data
dt$id = 1:10
dt2 = merge(dt, close, by.x = 'id', by.y = 'Var2', all = T)
Then, bind it
vlist = vector('list', 10)
for(i in 1:10){
vlist[[i]] = dt2[ c( which(dt2$id == i), dt2$closest[dt2$id == i] ), ] %>%
mutate(p = i)
}
bind_rows(vlist)
and get
id child age edu y closest dis p
1 1 0 69 some college 495 6 0.37931034 1
2 6 0 65 secondary/primary 525 9 0.18390805 1
3 2 0 44 secondary/primary 260 9 0.05747126 2
4 9 0 49 secondary/primary 275 2 0.05747126 2
...
p is then the identifier of the matched pair, based on id. Note that an individual can appear in several pairs, because closest matches are not necessarily symmetric: if 2 is the closest match of 1, 1 need not be the closest match of 2.
Questions
First, there is a little bug in the code here:
mgg = mg %>% melt() %>% group_by(Var2) %>% filter(value != 0) %>% mutate(m =
min(value)) %>% mutate(closest = Var1[m == value]) %>% as.data.frame()
I get this error message: Column closest must be length 19 (the group size) or one, not 2
The code works for the first 10 observations but not for all 20 (complete dataset provided below).
Why?
Second, is there a package available to do this automatically?
dt = structure(list(id = c(11011209L, 11011212L, 11011213L, 11020208L,
11020212L, 11020310L, 11020315L, 11020316L, 11031111L, 11031116L,
11031119L, 11040801L, 11040814L, 11050109L, 11050111L, 11050113L,
11051001L, 11051004L, 11051011L, 11051018L), child = structure(c(1L,
1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
2L, 1L, 1L), .Label = c("0", "1"), class = "factor"), age = c(69L,
44L, 40L, 47L, 50L, 65L, 43L, 41L, 49L, 42L, 32L, 38L, 52L, 59L,
35L, 51L, 38L, 36L, 63L, 44L), edu = structure(c(3L, 2L, 3L,
2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 3L, 3L, 2L, 2L, 1L, 1L, 2L,
1L), .Label = c("college", "secondary/primary", "some college"
), class = "factor"), y = c(495, 260, 175, 0, 25, 525, 0, 5,
275, 0, 425, 0, 260, 405, 20, 40, 165, 10, 455, 40)), class = "data.frame",
.Names = c("id",
"child", "age", "edu", "y"), row.names = c(NA, -20L))
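On the error message: "length 19 ... not 2" is what one would expect if some observation has two equally close neighbours, since Var1[m == value] then returns two values instead of one. The sketch below reproduces that situation on a small hypothetical distance matrix (not the real data) and shows one way to always get a single match per observation with which.min, which returns the first minimum. Breaking ties by taking the first is an assumption; another rule may suit the analysis better.

```r
# Hypothetical 4 x 4 symmetric distance matrix; observation 1 is equally
# close to observations 2 and 3 (a tie at 0.2).
d <- matrix(c(0,   0.2, 0.2, 0.5,
              0.2, 0,   0.4, 0.3,
              0.2, 0.4, 0,   0.1,
              0.5, 0.3, 0.1, 0),
            nrow = 4, byrow = TRUE)

# Exclude self-matches via the diagonal instead of filtering value != 0,
# which would also discard distinct observations at distance 0.
diag(d) <- Inf

# The tie: two candidates for observation 1, hence a length-2 result.
ties <- which(d[, 1] == min(d[, 1]))

# which.min keeps only the first minimum, so each column gets one match.
close <- data.frame(Var2    = seq_len(ncol(d)),
                    closest = apply(d, 2, which.min),
                    dis     = apply(d, 2, min))
close
```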
I have a right-censored dataset containing lifetimes and different types of death for a given sample. I want to plot a survival curve (using the actual values calculated from the sample, not a model estimate) with the different types of death shown as a stacked area chart, something like this:
How can I accomplish this in R?
The dataset would look something like this:
death type time event
1 Type3 81 1
2 NA 868 0
3 Type3 1022 1
4 NA 868 0
5 NA 868 0
6 NA 868 0
7 NA 868 0
8 NA 887 0
9 Type3 156 1
10 NA 868 0
11 NA 868 0
12 NA 868 0
13 Type3 354 1
14 Type3 700 1
15 Type3 632 1
16 NA 868 0
17 Type1 308 1
18 NA 1001 0
19 NA 1054 0
20 NA 1059 0
21 Type3 120 1
22 NA 732 0
23 Type3 543 1
24 Type1 379 1
25 NA 613 0
26 NA 1082 0
27 Type3 226 1
28 Type2 1 0
29 NA 976 0
30 NA 1000 0
31 NA 706 0
32 NA 1015 0
33 Type3 882 1
34 NA 1088 0
35 NA 642 0
36 Type3 953 1
37 NA 1068 0
38 NA 819 0
39 NA 1029 0
40 Type3 34 1
41 NA 1082 0
42 Type3 498 1
43 NA 923 0
44 NA 1041 0
45 Type3 321 1
46 NA 557 0
47 NA 628 0
48 Type3 197 1
49 Type3 155 1
50 NA 955 0
Where death type with NA indicates censored data, time is the time of death or time of censoring, and event is 1 for those who are dead and 0 for those who are censored. (This is the format required by 'survfit' but I also have it with actual start and end times as dates)
(Now, with only 50 points it wouldn't be possible to construct such a curve, but the data has a lot more rows that wouldn't fit here).
It's an ugly bit of code, but it gets the idea across. I didn't take the time to figure out how to add the legend. Please also note that this kind of figure, while interesting in concept, isn't necessarily going to mirror a KM curve. To be honest, if you're going to present the data this way, it makes more sense to do it as stacked bars at fixed time points.
Please note, I'm pretty sure there are some flaws lurking in this code. It comes with no warranty, but might get you started.
SurvData <- structure(list(row.names = c("", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", ""), death = 1:50, type = c("Type3",
NA, "Type3", NA, NA, NA, NA, NA, "Type3", NA, NA, NA, "Type3",
"Type3", "Type3", NA, "Type1", NA, NA, NA, "Type3", NA, "Type3",
"Type1", NA, NA, "Type3", "Type2", NA, NA, NA, NA, "Type3", NA,
NA, "Type3", NA, NA, NA, "Type3", NA, "Type3", NA, NA, "Type3",
NA, NA, "Type3", "Type3", NA), time = c(81L, 868L, 1022L, 868L,
868L, 868L, 868L, 887L, 156L, 868L, 868L, 868L, 354L, 700L, 632L,
868L, 308L, 1001L, 1054L, 1059L, 120L, 732L, 543L, 379L, 613L,
1082L, 226L, 1L, 976L, 1000L, 706L, 1015L, 882L, 1088L, 642L,
953L, 1068L, 819L, 1029L, 34L, 1082L, 498L, 923L, 1041L, 321L,
557L, 628L, 197L, 155L, 955L), event = c(1L, 0L, 1L, 0L, 0L,
0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 1L, 1L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 1L, 0L,
0L, 0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 1L, 0L)), .Names = c("row.names",
"death", "type", "time", "event"), class = "data.frame", row.names = c(NA,
-50L))
library(dplyr)
library(zoo)
library(RColorBrewer)
library(ggplot2)
SurvDataSummary <-
arrange(SurvData, time, type) %>%
mutate(type = ifelse(is.na(type), "Alive", type)) %>%
group_by(time) %>%
#* Count the number of each type at each time point
summarise(n_at_time = n(),
alive_at_time = sum(type == "Alive"),
type1_at_time = sum(type == "Type1"),
type2_at_time = sum(type == "Type2"),
type3_at_time = sum(type == "Type3")) %>%
ungroup() %>%
mutate(n_alive = sum(n_at_time) - cumsum(lag(n_at_time, default = 0)),
#* Proportion of each type
p_type1_at_time = type1_at_time / n_alive,
p_type2_at_time = type2_at_time / n_alive,
p_type3_at_time = type3_at_time / n_alive,
#* convert 0 to NA
p_type1_at_time = ifelse(p_type1_at_time == 0, NA, p_type1_at_time),
p_type2_at_time = ifelse(p_type2_at_time == 0, NA, p_type2_at_time),
p_type3_at_time = ifelse(p_type3_at_time == 0, NA, p_type3_at_time),
#* Carry the last known value forward over NAs (leading NAs are kept)
p_type1_at_time = na.locf(p_type1_at_time, FALSE),
p_type2_at_time = na.locf(p_type2_at_time, FALSE),
p_type3_at_time = na.locf(p_type3_at_time, FALSE),
#* make leading NAs 0
p_type1_at_time = ifelse(is.na(p_type1_at_time), 0, p_type1_at_time),
p_type2_at_time = ifelse(is.na(p_type2_at_time), 0, p_type2_at_time),
p_type3_at_time = ifelse(is.na(p_type3_at_time), 0, p_type3_at_time),
#* Calculate cumulative proportions
p_alive_at_time = 1 - p_type1_at_time - p_type2_at_time - p_type3_at_time,
cump_type1_at_time = p_alive_at_time + p_type1_at_time,
cump_type2_at_time = cump_type1_at_time + p_type2_at_time,
cump_type3_at_time = cump_type2_at_time + p_type3_at_time,
#* Get the following time for using geom_rect
next_time = lead(time))

pal <- brewer.pal(4, "PRGn")
ggplot(SurvDataSummary,
aes(xmin = time,
xmax = next_time)) +
geom_rect(aes(ymin = 0, ymax = p_alive_at_time), fill = pal[1]) +
geom_rect(aes(ymin = p_alive_at_time, ymax = cump_type1_at_time), fill = pal[2]) +
geom_rect(aes(ymin = cump_type1_at_time, ymax = cump_type2_at_time), fill = pal[3]) +
geom_rect(aes(ymin = cump_type2_at_time, ymax = cump_type3_at_time), fill = pal[4])
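As for the missing legend, one possible route is to put the bands in long format, with one row per interval and status, and map fill to the status so that ggplot2 builds the legend itself. The sketch below uses a small hypothetical stand-in for SurvDataSummary with made-up proportions and shortened column names (alive, cum1, cum2, cum3), so it only illustrates the plotting step, not the calculation above.

```r
library(ggplot2)

# Hypothetical stand-in for SurvDataSummary: one row per time interval,
# with the upper boundary of each stacked band already computed.
bands_wide <- data.frame(
  time      = c(0, 100, 200),
  next_time = c(100, 200, 300),
  alive     = c(1.0, 0.8, 0.6),
  cum1      = c(1.0, 0.9, 0.7),
  cum2      = c(1.0, 0.9, 0.8),
  cum3      = c(1.0, 1.0, 1.0)
)

status <- c("Alive", "Type1", "Type2", "Type3")
upper  <- as.matrix(bands_wide[, c("alive", "cum1", "cum2", "cum3")])
lower  <- cbind(0, upper[, -4])   # each band starts where the previous one ends

# Long format: as.vector() unrolls the matrices column by column, which
# matches rep(..., times = 4) for time and rep(..., each = n) for status.
bands_long <- data.frame(
  time      = rep(bands_wide$time, times = 4),
  next_time = rep(bands_wide$next_time, times = 4),
  status    = factor(rep(status, each = nrow(bands_wide)), levels = status),
  ymin      = as.vector(lower),
  ymax      = as.vector(upper)
)

p <- ggplot(bands_long,
            aes(xmin = time, xmax = next_time,
                ymin = ymin, ymax = ymax, fill = status)) +
  geom_rect() +
  scale_fill_brewer(palette = "PRGn", name = "Status")
p
```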
In this image, I want to rearrange my table (on the left side) into a table (on the right side) containing 3 columns.
https://drive.google.com/file/d/0B4GgTf6nYI4YMHltWjRkeDhob3M/view?usp=sharing
That is, I have a table like this
0 3 6 9 13 16 31 64
N 100,0 98,7 96,7 97,5 91,2 15,7 0,4 0,6
N1 100,0 102,0 97,8 98,6 89,8 11,0 0,3 0,2
and want to arrange it like this:
Alkanes Time Degradation
N 0 100,0
N 3 98,7
N 6 96,7
N 9 97,5
N 13 91,2
N 16 15,7
N 31 0,4
N 64 0,6
N1 0 100,0
N1 3 102,0
N1 6 97,8
N1 9 98,6
N1 13 89,8
N1 16 11,0
N1 31 0,3
N1 64 0,2
Sample data:
x <- structure(list(X = structure(1:3, .Label = c("N", "N1", "N2"), class = "factor"), X0 = c(100, 100, 100), X3 = c(98.7, 102, 95.1), X6 = c(96.7, 97.8, 94.5), X9 = c(97.5, 98.6, 101), X13 = c(91.2, 89.8, 89.4), X16 = c(15.7, 11, 22.5), X31 = c(0.4, 0.3, 0), X64 = c(0.6, 0.2, 0)), .Names = c("X", "X0", "X3", "X6", "X9", "X13", "X16", "X31", "X64"), class = "data.frame", row.names = c(NA, -3L))
Desired output:
y <- structure(list(Alkanes = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("N", "N1", "N2"), class = "factor"), Time = c(0L, 3L, 6L, 9L, 13L, 16L, 31L, 64L, 0L, 3L, 6L, 9L, 13L, 16L, 31L, 64L, 0L, 3L, 6L, 9L, 13L, 16L, 31L, 64L), Degradation = c(100, 98.7, 96.7, 97.5, 91.2, 15.7, 0.4, 0.6, 100, 102, 97.8, 98.6, 89.8, 11, 0.3, 0.2, 100, 95.1, 94.5, 101, 89.4, 22.5, 0, 0)), .Names = c("Alkanes", "Time", "Degradation"), class = "data.frame", row.names = c(NA, -24L))
Given "x" as:
x
# X X0 X3 X6 X9 X13 X16 X31 X64
# 1 N 100 98.7 96.7 97.5 91.2 15.7 0.4 0.6
# 2 N1 100 102.0 97.8 98.6 89.8 11.0 0.3 0.2
# 3 N2 100 95.1 94.5 101.0 89.4 22.5 0.0 0.0
You can try something like:
as.data.frame(
as.table(
`dimnames<-`(as.matrix(x[-1]), list(x[[1]], gsub("X", "", names(x)[-1])))))
# Var1 Var2 Freq
# 1 N 0 100.0
# 2 N1 0 100.0
# 3 N2 0 100.0
# 4 N 3 98.7
# 5 N1 3 102.0
# 6 N2 3 95.1
# 7 N 6 96.7
# 8 N1 6 97.8
# 9 N2 6 94.5
# 10 N 9 97.5
# 11 N1 9 98.6
# 12 N2 9 101.0
# 13 N 13 91.2
# 14 N1 13 89.8
# 15 N2 13 89.4
# 16 N 16 15.7
# 17 N1 16 11.0
# 18 N2 16 22.5
# 19 N 31 0.4
# 20 N1 31 0.3
# 21 N2 31 0.0
# 22 N 64 0.6
# 23 N1 64 0.2
# 24 N2 64 0.0
From there, it's just sorting and renaming your columns, which are fairly standard operations.
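For completeness, the sorting and renaming step might look like the sketch below. It repeats the as.table conversion so that it runs on its own; the object name long is just a placeholder.

```r
x <- data.frame(X   = c("N", "N1", "N2"),
                X0  = c(100, 100, 100),   X3  = c(98.7, 102, 95.1),
                X6  = c(96.7, 97.8, 94.5), X9  = c(97.5, 98.6, 101),
                X13 = c(91.2, 89.8, 89.4), X16 = c(15.7, 11, 22.5),
                X31 = c(0.4, 0.3, 0),      X64 = c(0.6, 0.2, 0))

long <- as.data.frame(
  as.table(
    `dimnames<-`(as.matrix(x[-1]),
                 list(x[[1]], gsub("X", "", names(x)[-1])))))

# Rename the default Var1/Var2/Freq columns.
names(long) <- c("Alkanes", "Time", "Degradation")
# Time comes back as a factor; convert via character to get the real numbers.
long$Time <- as.integer(as.character(long$Time))
# Sort by alkane, then by time, and reset the row names.
long <- long[order(long$Alkanes, long$Time), ]
row.names(long) <- NULL
long
```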
You can try the following, where my_data is the wide data frame (x in the sample data above):
library(reshape2)
names(my_data) <- sub('[^0-9]+', '', names(my_data))
m1 <- as.matrix(my_data[-1])
row.names(m1) <- my_data[,1]
d1 <- melt(m1)
d2 <- setNames(d1[order(d1$Var1),], c('Alkanes', 'Time', 'Degradation'))
Or
my_data1 <- my_data[-1]
dN <- data.frame(Alkanes= my_data[1][row(my_data1)],
Time= names(my_data1)[col(my_data1)], Degradation=unlist(my_data1))
dN1 <- dN[order(dN[,1]),]
row.names(dN1) <- NULL
I am trying to get more control over the text that appears when using add_tooltip in ggvis.
Say I want to plot 'totalinns' against 'avg' for this dataframe. Color points by 'country'.
The text I want to appear in the hovering tooltip would be: 'player', 'country', 'debutyear', 'avg'
tmp:
# player totalruns totalinns totalno totalout avg debutyear country
# 1 AG Ganteaume 112 1 0 1 112.00000 1948 WI
# 2 DG Bradman 6996 80 10 70 99.94286 1928 Aus
# 3 MN Nawaz 99 2 1 1 99.00000 2002 SL
# 4 VH Stollmeyer 96 1 0 1 96.00000 1939 WI
# 5 DM Lewis 259 5 2 3 86.33333 1971 WI
# 6 Abul Hasan 165 5 3 2 82.50000 2012 Ban
# 7 RE Redmond 163 2 0 2 81.50000 1973 NZ
# 8 BA Richards 508 7 0 7 72.57143 1970 SA
# 9 H Wood 204 4 1 3 68.00000 1888 Eng
# 10 JC Buttler 200 3 0 3 66.66667 2014 Eng
I understand that I need to make a key/id variable as ggvis only takes information supplied to it. Therefore I need to refer back to the original data. I have tried changing my text inside of my paste0() command, but still can't get it right.
library(ggvis)
tmp$id <- 1:nrow(tmp)
all_values <- function(x) {
if(is.null(x)) return(NULL)
row <- tmp[tmp$id == x$id, ]
paste0(tmp$player, tmp$country, tmp$debutyear,
tmp$avg, format(row), collapse = "<br />")
}
tmp %>% ggvis(x = ~totalinns, y = ~avg, key := ~id) %>%
layer_points(fill = ~factor(country)) %>%
add_tooltip(all_values, "hover")
Find below code to reproduce example:
tmp <- structure(list(player = c("AG Ganteaume", "DG Bradman", "MN Nawaz",
"VH Stollmeyer", "DM Lewis", "Abul Hasan", "RE Redmond", "BA Richards",
"H Wood", "JC Buttler"), totalruns = c(112L, 6996L, 99L, 96L,
259L, 165L, 163L, 508L, 204L, 200L), totalinns = c(1L, 80L, 2L,
1L, 5L, 5L, 2L, 7L, 4L, 3L), totalno = c(0L, 10L, 1L, 0L, 2L,
3L, 0L, 0L, 1L, 0L), totalout = c(1L, 70L, 1L, 1L, 3L, 2L, 2L,
7L, 3L, 3L), avg = c(112, 99.9428571428571, 99, 96, 86.3333333333333,
82.5, 81.5, 72.5714285714286, 68, 66.6666666666667), debutyear = c(1948L,
1928L, 2002L, 1939L, 1971L, 2012L, 1973L, 1970L, 1888L, 2014L
), country = c("WI", "Aus", "SL", "WI", "WI", "Ban", "NZ", "SA",
"Eng", "Eng")), .Names = c("player", "totalruns", "totalinns",
"totalno", "totalout", "avg", "debutyear", "country"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
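I think this is closer. Look up the hovered row by its id and paste only that row's fields:
all_values <- function(x) {
  if(is.null(x)) return(NULL)
  # x$id is the key of the hovered point; pick the matching row of tmp
  row <- tmp[tmp$id == x$id, ]
  paste(row$player, row$country, row$debutyear,
        row$avg, sep="<br>")
}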
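If labelled lines are wanted in the tooltip, the same function can be extended as in the sketch below. The label text ("Player:", "Country:" and so on) and the rounding of avg to two decimals are assumptions for illustration, not something the question specifies. Since the function only needs a list with an id element, it can be checked without drawing the plot.

```r
# Two rows of the question's data are enough to try the function.
tmp <- data.frame(player    = c("AG Ganteaume", "DG Bradman"),
                  avg       = c(112, 99.9428571428571),
                  debutyear = c(1948L, 1928L),
                  country   = c("WI", "Aus"),
                  stringsAsFactors = FALSE)
tmp$id <- seq_len(nrow(tmp))

all_values <- function(x) {
  if (is.null(x)) return(NULL)
  row <- tmp[tmp$id == x$id, ]
  # One labelled line per field, joined with HTML line breaks.
  paste0("Player: ",  row$player,    "<br />",
         "Country: ", row$country,   "<br />",
         "Debut: ",   row$debutyear, "<br />",
         "Avg: ",     round(row$avg, 2))
}

# ggvis passes a list holding the hovered point's key; mimic that here.
all_values(list(id = 2))
# "Player: DG Bradman<br />Country: Aus<br />Debut: 1928<br />Avg: 99.94"
```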