I have a data.frame (corresponding to a leaderboard) like this one:
structure(list(PJ = c(4, 4, 4, 4, 4, 4), V = c(4, 2, 2, 2, 1,
1), E = c(0, 0, 0, 0, 0, 0), D = c(0, 2, 2, 2, 3, 3), GF = c(182,
91, 92, 185, 126, 119), GC = c(84, 143, 144, 115, 141, 168),
Dif = c(98, -52, -52, 70, -15, -49), Pts = c(12, 6, 6, 6,
3, 3)), class = "data.frame", row.names = c("Player1", "Player2",
"Player3", "Player4", "Player5", "Player6"))
I would like to order the rows by the number of points, Pts, which can be done with df[order(df$Pts, decreasing=TRUE),]. The problem arises when several players are tied on Pts: in that case I want to break the tie by ordering those rows according to Dif.
How can this be done?
The order function which you are already using can take multiple arguments, each used sequentially to break ties in the previous one; see ?order
So you simply have to add Dif to your existing call:
df[order(df$Pts, df$Dif, decreasing=T),]
You can add further terms to break any remaining ties, e.g. Player2 and Player3 who have identical Pts and Dif.
If you want to specify which direction each argument should be ordered by (increasing or decreasing), you can either specify the decreasing argument as a vector (which requires method = "radix"), as in @r.user.05apr's comment, or use my preferred lazy solution of adding - to any term that should be ordered in a decreasing direction:
df[order(-df$Pts, df$Dif),]
(this will order by Pts decreasing and Dif increasing; it won't work if e.g. one of the ordering columns is character)
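For a character tie-breaker, where the unary minus is not available, two base R options are to pass decreasing as a vector together with method = "radix", or to negate xtfrm() of the column. The following is a minimal sketch on the question's data; the Name column is a hypothetical addition (copied from the row names) used only to have a character key to sort on.

```r
df <- structure(list(PJ = c(4, 4, 4, 4, 4, 4), V = c(4, 2, 2, 2, 1, 1),
    E = c(0, 0, 0, 0, 0, 0), D = c(0, 2, 2, 2, 3, 3),
    GF = c(182, 91, 92, 185, 126, 119), GC = c(84, 143, 144, 115, 141, 168),
    Dif = c(98, -52, -52, 70, -15, -49), Pts = c(12, 6, 6, 6, 3, 3)),
    class = "data.frame",
    row.names = c("Player1", "Player2", "Player3", "Player4", "Player5", "Player6"))

# Hypothetical character column, added only for illustration
df$Name <- rownames(df)

# Option 1: one direction per key; a vector for `decreasing` needs method = "radix".
# Pts decreasing, Dif decreasing, Name increasing.
by_vector <- df[order(df$Pts, df$Dif, df$Name,
                      decreasing = c(TRUE, TRUE, FALSE), method = "radix"), ]

# Option 2: xtfrm() turns the character key into sortable numbers, so "-" works.
# Pts decreasing, Dif decreasing, Name decreasing.
by_xtfrm <- df[order(-df$Pts, -df$Dif, -xtfrm(df$Name)), ]

rownames(by_vector)
rownames(by_xtfrm)
```

The only difference between the two results is the order of Player2 and Player3, the pair that is tied on both Pts and Dif.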
You can use the sqldf or dplyr library:
library(sqldf)
sqldf('select *
from "df"
order by "Pts" desc, "Dif" desc ')
Output
PJ V E D GF GC Dif Pts
1 4 4 0 0 182 84 98 12
2 4 2 0 2 185 115 70 6
3 4 2 0 2 91 143 -52 6
4 4 2 0 2 92 144 -52 6
5 4 1 0 3 126 141 -15 3
6 4 1 0 3 119 168 -49 3
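The answer mentions dplyr but only shows the sqldf version. A rough dplyr equivalent might look like the sketch below. The Player column is an assumption added here (copied from the row names) so that the example does not depend on how dplyr treats row names; it also serves as a final tie-breaker.

```r
library(dplyr)

df <- structure(list(PJ = c(4, 4, 4, 4, 4, 4), V = c(4, 2, 2, 2, 1, 1),
    E = c(0, 0, 0, 0, 0, 0), D = c(0, 2, 2, 2, 3, 3),
    GF = c(182, 91, 92, 185, 126, 119), GC = c(84, 143, 144, 115, 141, 168),
    Dif = c(98, -52, -52, 70, -15, -49), Pts = c(12, 6, 6, 6, 3, 3)),
    class = "data.frame",
    row.names = c("Player1", "Player2", "Player3", "Player4", "Player5", "Player6"))

# Hypothetical helper column: keep the player names as ordinary data
df$Player <- rownames(df)

# Pts descending, then Dif descending, then Player ascending for remaining ties
sorted <- df %>% arrange(desc(Pts), desc(Dif), Player)
sorted
```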
I have educational data in R that looks like this:
df <- data.frame(
"StudentID" = c(101, 102, 103, 104, 105, 106, 111, 112, 113, 114, 115, 116, 121, 122, 123, 124, 125, 126),
"FedEthn" = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3),
"HIST.11.LEV" = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5, 3, 3),
"HIST.11.SCORE" = c(96, 95, 95, 97, 88, 99, 89, 96, 79, 83, 72, 95, 96, 93, 97, 98, 96, 87),
"HIST.12.LEV" = c(2, 2, 1, 2, 1, 1, 2, 3, 2, 2, 2, 2, 4, 3, 3, 3, 3, 3),
"SCI.9.LEV" = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
"SCI.9.SCORE" = c(91, 99, 82, 95, 65, 83, 96, 97, 99, 94, 95, 96, 89, 78, 96, 95, 97, 90),
"SCI.10.LEV" = c(1, 2, 1, 2, 1, 1, 3, 3, 2, 2, 2, 3, 3, 3, 4, 3, 4, 3)
)
## StudentID FedEthn HIST.11.LEV HIST.11.SCORE HIST.12.LEV SCI.9.LEV SCI.9.SCORE SCI.10.LEV
## 1 101 1 1 96 2 1 91 1
## 2 102 1 1 95 2 1 99 2
## 3 103 2 1 95 1 1 82 1
## 4 104 2 1 97 2 1 95 2
## 5 105 3 1 88 1 1 65 1
## 6 106 3 1 99 1 1 83 1
## 7 111 1 2 89 2 2 96 3
## 8 112 1 2 96 3 2 97 3
## 9 113 2 2 79 2 2 99 2
## 10 114 2 2 83 2 2 94 2
## 11 115 3 2 72 2 2 95 2
## 12 116 3 2 95 2 2 96 3
## 13 121 1 3 96 4 3 89 3
## 14 122 1 3 93 3 3 78 3
## 15 123 2 3 97 3 3 96 4
## 16 124 2 3 98 3 3 95 3
## 17 125 3 3 96 3 3 97 4
## 18 126 3 3 87 3 3 90 3
HIST.11.LEV stands for the student's academic level in their 11th grade history course. (5 = highest academic level, 1 = lowest academic level. For example, 5 might be an AP or IB course.) HIST.11.SCORE indicates the student's score in the course.
When a student scores 95 or higher in a course, they're eligible to move up to a higher academic level in the following year (such that HIST.12.LEV = 1 + HIST.11.LEV). However, only some of these eligible students actually move up, and the teacher must agree to it. What I'm analyzing is whether these move-up rates for eligible students differ by reported federal ethnicity.
Here's how I'm achieving this so far:
library(dplyr)
var.level <- 1
var.ethn <- 1
# Students who were eligible AND actually moved up one level the next year
actual.move.ups <-
  (df %>% filter(FedEthn==var.ethn,
                 HIST.11.LEV==var.level,
                 HIST.11.SCORE>94,
                 HIST.12.LEV==var.level+1) %>%
     count) +
  (df %>% filter(FedEthn==var.ethn,
                 SCI.9.LEV==var.level,
                 SCI.9.SCORE>94,
                 SCI.10.LEV==var.level+1) %>%
     count)
# Students who were eligible to move up (score of 95 or higher)
eligible.move.ups <-
  (df %>% filter(FedEthn==var.ethn,
                 HIST.11.LEV==var.level,
                 HIST.11.SCORE>94) %>%
     count) +
  (df %>% filter(FedEthn==var.ethn,
                 SCI.9.LEV==var.level,
                 SCI.9.SCORE>94) %>%
     count)
This works, and I could iterate var.level from 1:5 and var.ethn from 1:7 and store the results in a data frame. But in my actual data, this approach would require 15 iterations of df %>% filter(...) %>% count (and I'd sum them all). The reason is that, in my actual data, there are 15 opportunities to move up across 5 subjects (HIST, SCI, MATH, ENG, WL) and 4 grade levels (9, 10, 11, 12).
My question is whether there's a more compact way to filter and count all instances where COURSE.GRADE.LEV==i, COURSE.GRADE+1.LEV==i+1, and COURSE.GRADE.SCORE>94 without typing/hard-coding each course name (HIST, SCI, MATH, ENG, WL) and each grade level (9, 10, 11, 12). And, what's the best way to store the results in a data frame?
For my sample data above, here's the ideal output. The data frame doesn't need to have this exact structure, though.
## FedEthn L1.Actual L1.Eligible L2.Actual L2.Eligible L3.Actual L3.Eligible
## 1 1 3 3 3 3 1 1
## 2 2 2 3 0 1 1 3
## 3 3 0 1 1 3 1 2
*Note: I've read this helpful answer, but for my variable names, the grade level (9, 10, 11, 12) doesn't have a consistent string location (e.g., SCI.9 vs. HIST.11). Also, in some instances, I need to count a single row multiple times, since a single student could move up in multiple classes. Maybe the solution is to reshape the data from wide to long before performing the count?
Using this great answer from @akrun, I was able to come up with a solution. I think I'm still making it unnecessarily complicated, though, and I hope to accept someone else's more compact answer.
course.names <- c("HIST.","SCI.")
grade.levels <- 9:11
tally.actual <- function(var.ethn, var.level){
  total.tally.actual <- NULL
  for(i in course.names){
    course.tally.actual <- NULL
    for(j in grade.levels){
      new.tally.actual <- df %>% filter(
        FedEthn == var.ethn,
        !!(rlang::sym(paste0(i,j,".LEV"))) == var.level,
        !!(rlang::sym(paste0(i,(j+1),".LEV"))) == (var.level+1),
        !!(rlang::sym(paste0(i,j,".SCORE"))) > 94
      ) %>% count
      course.tally.actual <- c(new.tally.actual, course.tally.actual)
    }
    total.tally.actual <- c(total.tally.actual, course.tally.actual)
  }
  return(sum(unlist(total.tally.actual)))
}
tally.eligible <- function(var.ethn, var.level){
  total.tally.eligible <- NULL
  for(i in course.names){
    course.tally.eligible <- NULL
    for(j in grade.levels){
      new.tally.eligible <- df %>% filter(
        FedEthn == var.ethn,
        !!(rlang::sym(paste0(i,j,".LEV"))) == var.level,
        !!(rlang::sym(paste0(i,j,".SCORE"))) > 94
      ) %>% count
      course.tally.eligible <- c(new.tally.eligible, course.tally.eligible)
    }
    total.tally.eligible <- c(total.tally.eligible, course.tally.eligible)
  }
  return(sum(unlist(total.tally.eligible)))
}
results <- data.frame("FedEthn" = 1:3,
                      "L1.Actual" = NA, "L1.Eligible" = NA,
                      "L2.Actual" = NA, "L2.Eligible" = NA,
                      "L3.Actual" = NA, "L3.Eligible" = NA)
for(var.ethn in 1:3){
  for(var.level in 1:3){
    results[var.ethn,(var.level*2)] <- tally.actual(var.ethn,var.level)
    results[var.ethn,(var.level*2+1)] <- tally.eligible(var.ethn,var.level)
  }
}
This approach works, but it requires df to contain every combination of course (SCI, MATH, HIST, ENG, WL) and year (9, 10, 11, 12). See below for how I added to the original df. Including all possible combinations isn't a problem for my actual data, but I'm hoping there's a solution that doesn't require adding a bunch of columns filled with NA:
df$HIST.9.LEV = NA
df$HIST.9.SCORE = NA
df$HIST.10.LEV = NA
df$HIST.10.SCORE = NA
df$HIST.12.SCORE = NA
df$SCI.10.SCORE = NA
df$SCI.11.LEV = NA
df$SCI.11.SCORE = NA
df$SCI.12.LEV = NA
df$SCI.12.SCORE = NA
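The question wonders whether reshaping from wide to long would help, and that route also avoids the NA padding columns. The following is only a sketch of that idea, assuming every column other than StudentID and FedEthn is named COURSE.GRADE.LEV or COURSE.GRADE.SCORE; the names long, next_year, NEXT.LEV, Actual and Eligible are made up for the example. It uses tidyr::pivot_longer() with the ".value" sentinel to get one row per student, course and grade, then joins each row to the same student's level in the following grade.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  "StudentID" = c(101, 102, 103, 104, 105, 106, 111, 112, 113, 114, 115, 116, 121, 122, 123, 124, 125, 126),
  "FedEthn" = c(1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3),
  "HIST.11.LEV" = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5, 3, 3),
  "HIST.11.SCORE" = c(96, 95, 95, 97, 88, 99, 89, 96, 79, 83, 72, 95, 96, 93, 97, 98, 96, 87),
  "HIST.12.LEV" = c(2, 2, 1, 2, 1, 1, 2, 3, 2, 2, 2, 2, 4, 3, 3, 3, 3, 3),
  "SCI.9.LEV" = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3),
  "SCI.9.SCORE" = c(91, 99, 82, 95, 65, 83, 96, 97, 99, 94, 95, 96, 89, 78, 96, 95, 97, 90),
  "SCI.10.LEV" = c(1, 2, 1, 2, 1, 1, 3, 3, 2, 2, 2, 3, 3, 3, 4, 3, 4, 3)
)

# One row per student x course x grade, with LEV and SCORE as columns.
# Splitting on "." copes with both SCI.9 and HIST.11; a missing SCORE becomes NA.
long <- df %>%
  pivot_longer(
    cols = -c(StudentID, FedEthn),
    names_to = c("course", "grade", ".value"),
    names_sep = "\\."
  ) %>%
  mutate(grade = as.integer(grade))

# The level taken one grade later, shifted back so it lines up with the current grade
next_year <- long %>%
  transmute(StudentID, course, grade = grade - 1L, NEXT.LEV = LEV)

results <- long %>%
  left_join(next_year, by = c("StudentID", "course", "grade")) %>%
  filter(SCORE > 94) %>%                      # eligible rows only (NA scores drop out)
  group_by(FedEthn, LEV) %>%
  summarise(
    Actual = sum(NEXT.LEV == LEV + 1, na.rm = TRUE),
    Eligible = n(),
    .groups = "drop"
  )

results
```

The result is in long form (one row per FedEthn and LEV) rather than the wide layout shown in the question; tidyr::pivot_wider() could spread it if needed. It also contains a level-5 row for FedEthn 2, because student 124 scored 98 in a level-5 history course.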
I have a data.table DT with 3 columns, Zeit, Spuer and Eingriff.
DT <- data.table(Zeit = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
Spuer = c(45, 45, 32, 25, 30, 44, 34, 42, 44),
Eingriff = c(0, 0, 1, 0, 0, 0, 1, 0, 0))
For every row where Eingriff == 1, I want to check whether at least one of the next 3 values of Spuer is < 30. If so, Eingriff should stay 1; otherwise it should be set to 0. In my real data I need to check the next 20 or more values of Spuer, so chaining lead(Spuer, 1), lead(Spuer, 2), etc. is not a good solution.
I already tried to implement a solution with frollapply and shift but couldn't make it work.
In the end the result should look like this:
res <- data.table(Zeit = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
Spuer = c(45, 45, 32, 25, 30, 44, 34, 42, 44),
Eingriff = c(0, 0, 1, 0, 0, 0, 0, 0, 0))
Here is one option using sapply :
We first find the indices where Eingriff == 1 and, for each of those indices, check whether any of the values in the window is less than 30.
library(data.table)
window <- 3
inds <- which(DT$Eingriff == 1)
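DT[inds, Eingriff := as.integer(sapply(inds, function(x)
  # inspect the `window` rows after x; na.rm guards against running past the last row
  any(DT$Spuer[(x + 1):(x + window)] < 30, na.rm = TRUE)))]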
DT
# Zeit Spuer Eingriff
#1: 1 45 0
#2: 2 45 0
#3: 3 32 1
#4: 4 25 0
#5: 5 30 0
#6: 6 44 0
#7: 7 34 0
#8: 8 42 0
#9: 9 44 0
Here is another option using non-equi join:
k <- 3L
DT[, c("start", "end") := .(.I, .I + k)]
DT[Eingriff==1L, Eingriff :=
DT[.SD, on=.(start>start, start<=end), by=.EACHI, +any(x.Spuer < 30)]$V1
]
output:
Zeit Spuer Eingriff start end
1: 1 45 0 1 4
2: 2 45 0 2 5
3: 3 32 1 3 6
4: 4 25 0 4 7
5: 5 30 0 5 8
6: 6 44 0 6 9
7: 7 34 0 7 10
8: 8 42 0 8 11
9: 9 44 0 9 12
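Since the question mentions trying shift, here is a possible vectorised variant along those lines. It is a sketch rather than a tuned solution: the helper names low, leads and low_ahead are made up for the example, and it assumes the window covers the k rows strictly after the current one. shift() accepts a vector of offsets, so the k lead vectors can be produced in one call and combined with a logical OR.

```r
library(data.table)

DT <- data.table(Zeit = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
                 Spuer = c(45, 45, 32, 25, 30, 44, 34, 42, 44),
                 Eingriff = c(0, 0, 1, 0, 0, 0, 1, 0, 0))

k <- 3L                      # how many following rows to inspect
low <- DT$Spuer < 30         # TRUE where Spuer is below the threshold

# One lead vector per offset 1..k; rows past the end count as "not low"
leads <- shift(low, n = seq_len(k), type = "lead", fill = FALSE)
if (!is.list(leads)) leads <- list(leads)   # shift() returns a plain vector when k == 1

# TRUE where at least one of the next k rows is low
low_ahead <- Reduce(`|`, leads)

# Keep Eingriff == 1 only where a low value follows within k rows
DT[, Eingriff := as.numeric(Eingriff == 1 & low_ahead)]
DT
```

For the real data, setting k to 20 or more only changes the number of lead vectors that are built; no per-row loop is needed.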
I have a vector of numbers:
n1 = c(1, 1, 0, 6, 0, 0, 10, 10, 11, 12, 0, 0, 19, 23, 0, 0)
I need to replace each 0 with the next nonzero number that follows it, while leaving the 0s at the tail alone (because nothing follows them), to get:
n2 = c(1, 1, 6, 6, 10, 10, 10, 10, 11, 12, 19, 19, 19, 23, 0, 0)
How can I get from n1 to n2?
This seems to be a much harder question than the one I've asked earlier:
How to fill in the preceding numbers whenever there is a 0 in R?
where flodel has come up with an elegant solution:
n2 <- n1[cummax(seq_along(n1) * (n1 != 0))]
However, this solution does not work here; I've tried but failed to adapt the code.
Can someone else figure out an elegant solution?
Thanks in advance!
If you don't also have NA in the vector, you can use na.locf from package zoo:
library(zoo)
n1[n1==0] <- NA
n2 = na.locf(n1, na.rm=FALSE, fromLast=TRUE)
n2[is.na(n2)] <- 0
n2
## [1] 1 1 6 6 10 10 10 10 11 12 19 19 19 23 0 0
Here's a rle approach:
out <- rle(n1)
locs <- out$values == 0 & !seq_along(out$values) %in% length(out$values)
out$values[locs] <- out$values[which(locs) + 1]
with(out, rep(values, lengths))
## [1] 1 1 6 6 10 10 10 10 11 12 19 19 19 23 0 0
You can use flodel's suggestion in reverse
na = c(1, 1, 0, 6, 0, 0, 10, 10, 11, 12, 0, 0, 19, 23, 0, 0)
locf<-function(x) {
x<-rev(x)
a<-x[cummax(seq_along(x) * (x != 0))]
c(rev(a), rep(0, length(x)-length(a)))
}
locf(na)
# [1] 1 1 6 6 10 10 10 10 11 12 19 19 19 23 0 0
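As a variation on the reversed cummax idea, the same result can be sketched without an explicit helper function by taking a running minimum from the right. This is an illustrative alternative, not flodel's code; the names n, pos and nxt are made up here. Each zero is given the sentinel index n + 1, and a 0 is appended at that position so trailing zeros stay 0.

```r
n1 <- c(1, 1, 0, 6, 0, 0, 10, 10, 11, 12, 0, 0, 19, 23, 0, 0)
n <- length(n1)

# Own index for nonzero entries, sentinel n + 1 for zeros
pos <- ifelse(n1 != 0, seq_along(n1), n + 1L)

# Running minimum taken from the right: index of the next nonzero entry at or after i
nxt <- rev(cummin(rev(pos)))

# Position n + 1 holds the appended 0, which is what trailing zeros pick up
n2 <- c(n1, 0)[nxt]
n2
```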
I am doing some computations as part of a scientific research project, and I am stuck on a problem that has to do with data visualization.
I have a list of sublists of different lengths. Each of those sublists is a vector of numeric values of the main variable for one experimental situation. The problem is this:
is there a way to display it in a 3D plot in the following way:
Let's say the x-axis stands for one factor of the experiment, the y-axis stands for another factor, and the z-axis carries the numerical values of our numeric variable. I need to display the data as vertical lines (parallel to the z-axis). The number of those vertical lines is equal to the number of factor combinations (the x-axis and y-axis). Here is how it looked before with a smaller number of values (when the lists were all the same size):
https://www.dropbox.com/s/wdcgihjcqzobsqs/sample0.jpeg
I would like to keep the same layout, only with a larger number of points. Each of these sublists stands for one of the 6 factor combinations.
Or maybe there is a different, better way to visualize this kind of data in 3D.
And here is the list of sublists I need to make my visualization for (I do not know if this is relevant here):
> temp
[[1]]
[1] 395 310 235 290 240 490 270 225 430 385 170 55 295 320 270 130 300 285 130 200 225 90 205
[24] 340
[[2]]
[1] 3 8
[[3]]
[1] 1 0 0 0 3 2 5 2 3 5 2 3
[[4]]
[1] 1 0 0 0 3 2 5 2 3 5 2 3
[[5]]
[1] 1 1 1 2 3 5 2 5 3 3 3 2 3 2 3
[[6]]
[1] 0 0 195 150 2 2 0 2 1 1 2 1 2 1 1 1 3 2 2 1 2 2 1
[24] 1 2 3 2 2 1 3 1 1
Any help/suggestions will be appreciated.
Here is an alternate visualization. Note that you don't have a 6D problem, it's really a 3D problem with 2 factor dimensions and one continuous one. There are 6 possible factor combinations. Note I had to make assumptions about what factor combination corresponds to what item in your list:
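# f1 and f2 are not defined in the original answer; these levels are hypothetical placeholders (2 x 3 = 6 combinations)
f1 <- c("A", "B")
f2 <- c("X", "Y", "Z")
facs <- cbind(f1=rep(f1, length(f2)), f2=rep(f2, each=length(f1))) # create factor combos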
lst <- list(c(395, 310, 235, 290, 240, 490, 270, 225, 430, 385, 170, 55, 295, 320, 270, 130, 300, 285, 130, 200, 225, 90, 205, 340 ), c(3, 8), c(1, 0, 0, 0, 3, 2, 5, 2, 3, 5, 2, 3), c(1, 0, 0, 0, 3, 2, 5, 2, 3, 5, 2, 3), c(1, 1, 1, 2, 3, 5, 2, 5, 3, 3, 3, 2, 3, 2, 3), c(0, 0, 195, 150, 2, 2, 0, 2, 1, 1, 2, 1, 2, 1, 1, 1, 3, 2, 2, 1, 2, 2, 1, 1, 2, 3, 2, 2, 1, 3, 1, 1))
library(data.table)
facs.dt <- as.data.table(facs)[,list(time=sort(lst[[.GRP]])), by=list(f1, f2)]
facs.dt[, id:=seq_along(time), by=list(f1, f2)]
library(ggplot2)
ggplot(facs.dt, aes(x=id, y=time)) +
geom_bar(stat="identity", position="dodge") +
scale_y_log10() + facet_grid(f1 ~ f2)
The resulting plot displays, for each of the 6 factor combinations, all the time values on a log scale. This makes it much easier to read the continuous variable than a 3D cube.
And an alternate view with free scales:
ggplot(facs.dt, aes(x=id, y=time)) +
geom_bar(stat="identity", position="dodge") +
facet_wrap(~ f1 + f2, scales="free") +
theme(axis.text.x=element_blank(), axis.ticks.x=element_blank()) # opts() has been removed from ggplot2; theme() replaces it
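The step that makes either plot possible is turning the list of unequal-length vectors into one long data frame with a column identifying the factor combination. A base R sketch of that step is below, using three of the sublists from the question to keep it short. The column names combo and value are made up here, and stripchart() is shown only as one way to get a vertical column of points per combination in 2D.

```r
# Three of the sublists from the question ([[2]], [[3]] and [[5]])
temp <- list(
  c(3, 8),
  c(1, 0, 0, 0, 3, 2, 5, 2, 3, 5, 2, 3),
  c(1, 1, 1, 2, 3, 5, 2, 5, 3, 3, 3, 2, 3, 2, 3)
)

# Long format: repeat each list index once per element of its sublist
long <- data.frame(
  combo = factor(rep(seq_along(temp), lengths(temp))),
  value = unlist(temp)
)

# One vertical column of jittered points per factor combination
stripchart(value ~ combo, data = long, vertical = TRUE, method = "jitter",
           pch = 16, xlab = "factor combination", ylab = "value")
```

Which list element corresponds to which pair of factor levels still has to be supplied by hand, just as in the answer above.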