I have the following variables:
loc.dir <- c(1, -1, 1, -1, 1, -1, 1)
max.index <- c(40, 46, 56, 71, 96, 113, 156)
min.index <- c(38, 48, 54, 69, 98, 112, 155)
My goal is to produce the following:
data.loc <- c(40, 48, 56, 69, 96, 112, 156)
In words, I look at each element of loc.dir. If the ith element is 1, I take the ith element of max.index. If the ith element is -1, I take the ith element of min.index.
I am able to get the elements that should be in data.loc by using:
plus.1 <- max.index[which(loc.dir == 1)]
minus.1 <- min.index[which(loc.dir == -1)]
But now I don't know how to combine plus.1 and minus.1 so that the result is identical to data.loc.
ifelse was designed for this:
ifelse(loc.dir == 1, max.index, min.index)
#[1] 40 48 56 69 96 112 156
It does something similar to this:
res <- min.index
res[loc.dir == 1] <- max.index[loc.dir == 1]
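If you specifically want to recombine plus.1 and minus.1, one option is to write them back into an empty vector at the positions they came from. This is only a minimal sketch; the names pos.max and pos.min are introduced here for illustration and are not part of the question.

```r
loc.dir <- c(1, -1, 1, -1, 1, -1, 1)
max.index <- c(40, 46, 56, 71, 96, 113, 156)
min.index <- c(38, 48, 54, 69, 98, 112, 155)

# Logical masks marking where each source vector applies (illustrative names)
pos.max <- loc.dir == 1
pos.min <- loc.dir == -1

plus.1 <- max.index[pos.max]
minus.1 <- min.index[pos.min]

# Start from an empty vector and fill each group back into its original slots
data.loc <- numeric(length(loc.dir))
data.loc[pos.max] <- plus.1
data.loc[pos.min] <- minus.1
data.loc
#[1]  40  48  56  69  96 112 156
```

This relies on logical indexing preserving order, so the kth TRUE in pos.max receives the kth element of plus.1.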
Related
I'm looking to find multiple max values using multiple ranges from a single table without using a loop.
It's difficult to explain, but here's an example:
list_of_value <- c(100, 110, 54, 64, 73, 23, 102)
beginning_of_max_range <- c(1, 2, 4)
end_of_max_range <- c(3, 5, 6)
Desired output:
110, 110, 73
which corresponds to:
max(100, 110, 54)
max(110, 54, 64, 73)
max(64, 73, 23)
You can do this with mapply:
list_of_value <- c(100, 110, 54, 64, 73, 23, 102)
beginning_of_max_range <- c(1, 2, 4)
end_of_max_range <- c(3, 5, 6)
mapply(function(x, y) max(list_of_value[x:y]), beginning_of_max_range, end_of_max_range)
#[1] 110 110 73
For each pair, we create the index sequence from beginning_of_max_range to end_of_max_range, use it to subset list_of_value, and take the max of that subset.
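To see what happens inside the mapply call, the same computation can be split into two steps. This is just an illustrative sketch; the intermediate names ranges and maxes are made up for this example.

```r
list_of_value <- c(100, 110, 54, 64, 73, 23, 102)
beginning_of_max_range <- c(1, 2, 4)
end_of_max_range <- c(3, 5, 6)

# Step 1: collect the values that fall inside each begin:end range
ranges <- Map(function(x, y) list_of_value[x:y],
              beginning_of_max_range, end_of_max_range)
# ranges[[2]] holds positions 2 to 5, i.e. 110, 54, 64, 73

# Step 2: take the max of each subset; vapply guarantees a numeric vector
maxes <- vapply(ranges, max, numeric(1))
maxes
#[1] 110 110  73
```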
Below, assume this is part of the data:
library(tibble)
df <- tribble(
~temp1, ~temp2, ~temp3, ~temp4, ~temp5, ~temp6, ~temp7, ~temp8,
75, 88, 85, 71, 98, 76, 71, 57,
80, 51, 84, 72, 59, 81, 70, 64,
54, 65, 90, 66, 93, 88, 77, 59,
59, 87, 94, 75, 74, 53, 56, 87,
52, 55, 64, 77, 50, 64, 83, 87,
)
Now I want to write a loop to get the results. In this example, temp1 should be paired with temp2 ONLY, temp3 with temp4 only, temp5 with temp6 only, and temp7 with temp8 only.
Suppose I want to run a correlation or a t-test between the intended pairs of variables (temp1 with temp2, temp3 with temp4, temp5 with temp6, temp7 with temp8 ONLY).
I would also like to extract only the statistics, for example only the value of r from the correlation. A table would be very helpful.
From what I have found, it seems I need to use one of the map functions, but I struggled to make it work. Can this be done in R?
We can use seq to pick the odd and even columns and map2 to get the correlation between temp1 and temp2, temp3 and temp4, and so on:
library(purrr)
out <- map2_dbl(df[seq(1, ncol(df), 2)], df[seq(2, ncol(df), 2)], ~ cor(.x, .y))
names(out) <- paste0("Time", seq_along(out))
Or with Map from base R
out <- unlist(Map(function(x, y) cor(x, y), df[seq(1, ncol(df), 2)],
df[seq(2, ncol(df), 2)]))
names(out) <- paste0("Time", seq_along(out))
You could split your data frame in two: one with columns 1, 3, 5, 7 and the other with columns 2, 4, 6, 8.
Then take one column from each at a time and run cor or t.test on the pair with pmap.
library(purrr)
df %>%
split.default(rep_len(1:2, ncol(.))) %>%
pmap_dbl(~cor(.x,.y))
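Since the question also asks for a t-test and a table, here is one possible base R sketch that collects r, the t statistic and the p-value for each pair in a data frame. The column names pair, r, t and p_value are made up for this example, and t.test is called with its defaults (an unpaired Welch test), which is an assumption; use paired = TRUE if the two columns are repeated measurements on the same subjects.

```r
# Same data as in the question, built with data.frame() to avoid extra packages
df <- data.frame(
  temp1 = c(75, 80, 54, 59, 52), temp2 = c(88, 51, 65, 87, 55),
  temp3 = c(85, 84, 90, 94, 64), temp4 = c(71, 72, 66, 75, 77),
  temp5 = c(98, 59, 93, 74, 50), temp6 = c(76, 81, 88, 53, 64),
  temp7 = c(71, 70, 77, 56, 83), temp8 = c(57, 64, 59, 87, 87)
)

odd  <- seq(1, ncol(df), by = 2)   # temp1, temp3, temp5, temp7
even <- seq(2, ncol(df), by = 2)   # temp2, temp4, temp6, temp8

# One row of statistics per pair, stacked into a single table
res <- do.call(rbind, Map(function(x, y, nx, ny) {
  tt <- t.test(x, y)               # assumption: unpaired Welch test
  data.frame(pair = paste(nx, ny, sep = "_"),
             r = cor(x, y),
             t = unname(tt$statistic),
             p_value = tt$p.value)
}, df[odd], df[even], names(df)[odd], names(df)[even]))
rownames(res) <- NULL
res
```

The result has one row per pair, so further columns (for example the confidence interval from tt$conf.int) can be added in the same way.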
Ciao,
Following earlier feedback, I have put together a reproducible example along with my coding attempts.
Here is a sample data frame:
df <- data.frame("STUDENT" = 1:10,
"test_FALL" = c(0, 0, 0, 0, 1, 1, 1, 0, 0, NA),
"test_SPRING" = c(1, 1, 1, 1, 0, 0, 0, 1, 1, 1),
"score_FALL" = c(53, 52, 98, 54, 57 ,87, 62, 95, 75, NA),
"score_SPRING" = c(75, 54, 57, 51, 51, 81, 77, 87, 73, 69),
"final_SCORE" = c(75, 54, 57, 51, 57, 87, 62, 87, 73, 69))
And sample code:
df$final_score[df$test_SPRING == 1] <- df$score_SPRING
df$final_score[df$test_FALL == 1] <- df$score_FALL
And a second attempt at the code:
df$final_score1[(df$test_SPRING == 1 & !is.na(df$test_SPRING))] <- df$score_SPRING
df$final_score1[(df$test_FALL == 1 & !is.na(df$test_FALL))] <- df$score_FALL
In my data frame I have indicators for when a student took a test (test_SPRING, test_FALL) and the scores on those tests (score_SPRING, score_FALL). I want to create the final_SCORE column that is included in the data frame above, such that if test_SPRING = 1 then final_SCORE = score_SPRING, else if test_FALL = 1 then final_SCORE = score_FALL. I have been unable to do so after many hours. Please offer any advice you may have.
There are a couple of ways to create the 'final_score'.
1) Using ifelse - Based on the example, the 'test' columns are mutually exclusive. So, we can use a single ifelse that checks the condition on 'test_SPRING' (test_SPRING == 1). If it is TRUE, we take 'score_SPRING'; otherwise we take 'score_FALL'.
with(df, ifelse(test_SPRING == 1, score_SPRING, score_FALL))
#[1] 75 54 57 51 57 87 62 87 73 69
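2) Arithmetic - Multiplying a number by 1 returns the number itself and multiplying by 0 gives 0, so multiply the 'score' columns by the corresponding 'test' columns, cbind the products, and use rowSums. The NA^(...) factor returns NA for rows where neither test is flagged (NA^TRUE is NA) and leaves the other rows unchanged (NA^FALSE is 1).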
with(df, NA^(!test_FALL & !test_SPRING) * rowSums(cbind(
score_SPRING * test_SPRING,
score_FALL * test_FALL),
na.rm = TRUE))
#[1] 75 54 57 51 57 87 62 87 73 69
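The NA^ factor in the arithmetic approach is easier to follow on a tiny example. The four-row vectors below are hypothetical and are not the data from the question; row 3 is a student with neither test flagged.

```r
# In R, anything raised to the power 0 is 1, even NA
NA^TRUE    # NA
NA^FALSE   # 1

# Hypothetical toy data
test_FALL    <- c(0, 1, 0, NA)
test_SPRING  <- c(1, 0, 0, 1)
score_FALL   <- c(50, 60, 70, NA)
score_SPRING <- c(55, 65, 75, 85)

# Without the mask, row 3 would come out as 0, which looks like a real score
raw <- rowSums(cbind(score_SPRING * test_SPRING,
                     score_FALL * test_FALL), na.rm = TRUE)
raw
#[1] 55 60  0 85

# The mask is NA only where neither test is flagged, 1 elsewhere
final <- NA^(!test_FALL & !test_SPRING) * raw
final
#[1] 55 60 NA 85
```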
I have a cdplot and I'm trying to find the x value at which the distribution (the y value) equals 0.5, but I couldn't find a method that works. Additionally, I want to find the y value when x is 0, and would like help with that too if the approach is different.
I can't really provide my code as it relies on a saved workspace with a large data frame. I'll give this as an example:
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,1, 2, 1, 1, 1, 1, 1),levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
cdplot(fail ~ temperature)
So I don't need a quick and dirty way to solve this specific example; I need code that I can apply to my own workspace.
If you capture the return of cdplot, you get a function that you can use to find these values.
CDP <- cdplot(fail ~ temperature)
uniroot(function(x) { CDP$no(x) - 0.5 }, c(55, 80))
$root
[1] 62.34963
$f.root
[1] 3.330669e-16
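Since you asked for something reusable, the same idea can be wrapped in a small function. This is a sketch under a few assumptions: find_x is a hypothetical helper name, the search interval must bracket exactly one crossing of the target, and plot = FALSE is used only to skip the drawing step.

```r
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

# plot = FALSE returns the list of conditional density functions without drawing
CDP <- cdplot(fail ~ temperature, plot = FALSE)

# Hypothetical helper: the x at which function f crosses `target`.
# f(lower) - target and f(upper) - target must have opposite signs.
find_x <- function(f, target, interval) {
  uniroot(function(x) f(x) - target, interval, tol = 1e-8)$root
}

x_half <- find_x(CDP$no, 0.5, c(55, 80))

# Going the other way (y for a given x) is a plain function call
y_at_60 <- CDP$no(60)
```

For your own data, replace 60 with the x value you care about (for example 0, if that lies within the range of your x variable) and adjust the interval passed to find_x.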
How can I extract the exact probabilities for each level of the factor y at any value of x with cdplot(y ~ x)?
Thanks
Following the example from the help file of ?cdplot you can do...
## NASA space shuttle o-ring failures
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
1, 2, 1, 1, 1, 1, 1),
levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
## CD plot
result <- cdplot(fail ~ temperature)
And this is a simple way to obtain the probabilities from the cdplot output.
# Getting the probabilities for each group.
lapply(split(temperature, fail), result[[1]])
$no
[1] 0.8166854 0.8209055 0.8209055 0.8209055 0.8090438 0.7901473 0.7718317 0.7718317 0.7579343
[10] 0.7664731 0.8062898 0.8326761 0.8326761 0.8905854 0.9185472 0.9626185
$yes
[1] 3.656304e-05 6.273653e-03 1.910046e-02 6.007471e-01 7.718317e-01 7.718317e-01 8.062898e-01
Note that result is a list, returned invisibly by cdplot, whose elements are the conditional density functions (cumulative over the levels of fail). With two levels it holds a single function, result[[1]], so we can split temperature by fail and apply that function to each group of values using lapply.
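Here is a simplified version of getS3method('cdplot', 'default'):
get.props <- function(x, y, n) {
  ny <- nlevels(y)
  # cumulative proportions of the levels of y
  yprop <- cumsum(prop.table(table(y)))
  # density of x over all observations; n is passed by name because
  # the second positional argument of density() is bw
  dx <- density(x, n = n)
  y1 <- matrix(0, nrow = ny - 1L, ncol = n)
  for (i in seq_len(ny - 1L)) {
    # density of x restricted to the first i levels, on the same grid
    dxi <- density(x[y %in% levels(y)[seq_len(i)]],
                   bw = dx$bw, n = n, from = min(dx$x), to = max(dx$x))
    y1[i, ] <- dxi$y / dx$y * yprop[i]
  }
  # return the evaluation grid and the cumulative conditional proportions
  list(x = dx$x, y1 = y1)
}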