How to write a sequential for loop with conditional lookup capability - r

Beginner here.
I am trying to write a for loop to adjust a score.
The first part of the for loop creates a conditional output which should be passed to the second part of the for loop which has a lookup function.
In the first data.frame, a, there are 2 columns
score (participant score)
problem (problems with the participant score)
In the second data.frame, b, there are 5 columns
score (participant score)
q (adjusted score when problem = 1)
r (adjusted score when problem = 2)
s (adjusted score when problem = 0)
t (adjusted score when problem > 2)
In the first for loop, I find the index where
a$score = b$score
Then in the second loop, I pass the index to another loop.
Based on the value in a$problem, the loop returns the correct adjusted value in (q, r, s, t).
Here is data.frame a
id score problem
1 11 1
2 12 6
3 13 2
4 14 0
5 NA NA
Here is data.frame b
score q r s t
11 12 13 11 NA
12 14 15 12 NA
13 16 20 13 NA
14 18 22 14 NA
NA NA NA NA NA
I would like the output of the function to be a new column in a, a$adjusted
Here is the function I have been trying,
adjust <- function (y, z){
# y = problem
# z = score
for(j in z){
index <- sapply(j, function(x) b$score %in% x)
for (i in y){
ifelse(i > 2,
z(i) <- b[index, 5],
ifelse(i == 2,
z(i) <- b[index, 3],
ifelse(i == 1,
z(i) <- b[index, 2],
ifelse( i == 0,
z(i) <- b[index, 4],
z(i) <- b[index, 5]))))
print(z(i))
}
}
}
This is still new for me.
Not sure where I'm going wrong.
When I assign:
a$adjusted <- adjust(a$problem, a$score)
Nothing happens
Any and all help very much appreciated here.

To simplify the nested ifelse statements, I used the case_when function from the dplyr package. I also used the match function to replace the inner lookup (i.e. the sapply call).
a<-read.table(header=TRUE, text="id score problem
1 12 1
2 11 6
3 13 2
4 14 0
5 NA NA")
b<-read.table(header=TRUE, text="score q r s t
11 12 13 11 NA
12 14 15 12 NA
13 16 20 13 NA
14 18 22 14 NA
NA NA NA NA NA")
library(dplyr)
#find column name associated with problem score if NA use the score column
colname<-case_when(
a$problem==0 ~ "s",
a$problem==1 ~ "q",
a$problem==2 ~ "r",
a$problem >2 ~ "t",
is.na(a$problem) ~"score"
)
# find the corresponding row in table b to match data frame a
rowscore<-match(a$score, b$score)
# colname and rowscore have the same length (one entry per row of a)
# for each row i, read the adjusted score from column colname[i], row rowscore[i] of b
a$adjusted<-sapply(seq_along(rowscore), function(i){b[[colname[i]]][rowscore[i]]} )
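As a possible alternative to the sapply loop, base R matrix indexing can pick one cell per row in a single step. This is only a sketch: it assumes every column of b is numeric (so as.matrix(b) stays numeric), and the names col_name, row_idx and col_idx are made up for illustration.

```r
a <- data.frame(id = 1:5,
                score = c(12, 11, 13, 14, NA),
                problem = c(1, 6, 2, 0, NA))
b <- data.frame(score = c(11, 12, 13, 14, NA),
                q = c(12, 14, 16, 18, NA),
                r = c(13, 15, 20, 22, NA),
                s = c(11, 12, 13, 14, NA),
                t = NA_real_)

# column of b to read for each row of a (the score column when problem is NA)
col_name <- ifelse(is.na(a$problem), "score",
            ifelse(a$problem == 0, "s",
            ifelse(a$problem == 1, "q",
            ifelse(a$problem == 2, "r", "t"))))

row_idx <- match(a$score, b$score)    # row of b for each score
col_idx <- match(col_name, names(b))  # column position in b

# a two-column (row, column) index matrix extracts one cell per row of a
a$adjusted <- as.matrix(b)[cbind(row_idx, col_idx)]
a$adjusted
```

This avoids the per-row anonymous function, at the cost of converting b to a matrix first.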

Related

Rollapply with expanding window in R

Let's say I have a simple toy vector in R like:
x = seq(1:10);x
[1] 1 2 3 4 5 6 7 8 9 10
I want to use the rollapply function from the zoo package, but in a different way. rollapply applies a function to a vector x over a rolling window whose size is set by the width argument. Instead of a rolling window, I want an expanding one. There are similar questions here and here, but they don't solve my problem.
For example, I want to calculate the sum of the first observations of vector x and then expand the window, but in steps of 2.
Here is what I tried:
rollapplyr(x, seq_along(x) ,sum,by=2,partial = 5,fill=NA)
[1] NA NA NA NA 15 21 28 36 45 55
or replace the NA's
na.locf0(rollapplyr(x, 5 ,sum,by=2,partial = 5,fill=NA))
[1] NA NA NA NA 15 15 25 25 35 35
But what I ideally want as a result is:
[1] NA NA NA NA 15 15 28 28 45 45
Imagine that my dataset is huge (2500 time series observations) and the function is an econometric or statistical model, not a simple one like the sum I use here.
How can I do this? Any help?
x <- seq(10)
expandapply <- function(x, start, by, FUN){
# set points to apply function up to
checkpoints <- seq(start, length(x), by)
# apply function to all windows
vals <- sapply(checkpoints, function(i) FUN(x[seq(i)]))
# fill in numeric vector at these points (assumes output is numeric)
out <- replace(rep(NA_real_, length(x)), checkpoints, vals)
# forward-fill the gaps
zoo::na.locf(out, na.rm = FALSE)
}
expandapply(x, start = 5, by = 2, FUN = sum)
#> [1] NA NA NA NA 15 15 28 28 45 45
Created on 2022-03-13 by the reprex package (v2.0.1)
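If you would rather not depend on zoo for the forward-fill, the same idea can be sketched in base R with findInterval. The function name expand_apply_base is hypothetical, and the sketch assumes FUN returns a single number for each window.

```r
expand_apply_base <- function(x, start, by, FUN) {
  # positions at which the expanding window is evaluated
  checkpoints <- seq(start, length(x), by)
  # apply FUN to x[1..i] for every checkpoint i
  vals <- vapply(checkpoints, function(i) FUN(x[seq_len(i)]), numeric(1))
  # for each position, index of the latest checkpoint at or before it (0 = none yet)
  last_cp <- findInterval(seq_along(x), checkpoints)
  out <- rep(NA_real_, length(x))
  out[last_cp > 0] <- vals[last_cp[last_cp > 0]]
  out
}

res <- expand_apply_base(1:10, start = 5, by = 2, FUN = sum)
res
```

Positions before the first checkpoint stay NA, and every later position reuses the most recent checkpoint's value.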
Define nonNA as the positions which should not be NA. You can change x and nonNA to whatever you need.
Then assign to w a vector of widths, using zero for the components that are to be NA. Finally, apply na.locf0.
(The two extreme cases are that if nonNA is seq_along(x) so that all elements are not to be NA'd out then this is the same as rollapplyr(x, seq_along(x), sum) and if nonNA is c() so that there are no non-NAs then it returns all NAs.)
library(zoo)
x <- 1:10
nonNA <- seq(5, length(x), 2)
w <- ifelse(seq_along(x) %in% nonNA, seq_along(x), 0)
na.locf0(rollapplyr(x, w, function(x) if (length(x)) sum(x) else NA, fill=NA))
## [1] NA NA NA NA 15 15 28 28 45 45
Another way is to use a list for the width= argument of rollapply whose components contain the offsets. x and nonNA are from above.
L <- lapply(seq_along(x), function(x) if (x %in% nonNA) -seq(x-1, 0))
na.locf0(rollapplyr(x, L, sum, fill = NA))
## [1] NA NA NA NA 15 15 28 28 45 45
Update
Simplified solution and added second approach.

Is there a way to recode an SPSS function in R to create a single new variable?

Can somebody please help me with a recode from SPSS into R?
SPSS code:
RECODE variable1
(1,2=1)
(3 THRU 8 =2)
(9, 10 =3)
(ELSE = SYSMIS)
INTO variable2
I can create new variables with the different values. However, I'd like it to be in the same variable, as SPSS does.
Many thanks.
x <- y<- 1:20
x
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
y[x %in% (1:2)] <- 1
y[x %in% (3:8)] <- 2
y[x %in% (9:10)] <- 3
y[!(x %in% (1:10))] <- NA
y
[1] 1 1 2 2 2 2 2 2 3 3 NA NA NA NA NA NA NA NA NA NA
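A more compact variant of the same recode might use a lookup vector with match. This is a sketch that assumes variable1 only takes whole-number values (so "3 THRU 8" can be listed explicitly); old_values and new_values are names made up for the example.

```r
variable1 <- c(-1, 1, 2, 3, 8, 9, 10, 11, NA)

old_values <- 1:10
new_values <- c(1, 1, 2, 2, 2, 2, 2, 2, 3, 3)  # new code for each entry of old_values

# values not found in old_values (including NA) become NA, like ELSE = SYSMIS
variable2 <- new_values[match(variable1, old_values)]
variable2
```

Assigning the result back to variable1 instead of variable2 would recode in place.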
I wrote a function whose syntax is very similar to the SPSS RECODE command. See here
variable1 <- -1:11
recodeR(variable1, c(1, 2, 1), c(3:8, 2), c(9, 10, 3), else_do= "missing")
NA NA 1 1 2 2 2 2 2 2 3 3 NA
This function now works also for other examples. This is how the function is defined
recodeR <- function(vec_in, ..., else_do){
l <- list(...)
# extract the "from" values
from_vec <- unlist(lapply(l, function(x) x[1:(length(x)-1)]))
# extract the "to" values
to_vec <- unlist(lapply(l, function(x) rep(x[length(x)], length(x)-1)))
# plyr is required for mapvalues
require(plyr)
# recode the variable
vec_out <- mapvalues(vec_in, from_vec, to_vec)
# if "missing" is written then all outside the defined range will be missings.
# Otherwise values outside the defined range stay the same
if(else_do == "missing"){
vec_out <- ifelse(vec_in < min(from_vec, na.rm=TRUE) | vec_in > max(from_vec, na.rm=TRUE), NA, vec_out)
}
# return resulting vector
return(vec_out)}

How to make a function to get average (B) when (A) is a certain condition on data.frame in R

This is my setup:
#this is for setting
A <- c(1,1,2,2,2,3,4,4,5,5)
B <- c(1:10)
C <- c(11:20)
ABC <- data.frame(A,B,C)
#so, I made up my own ABC like this
A B C
1 1 1 11
2 1 2 12
3 2 3 13
4 2 4 14
5 2 5 15
6 3 6 16
7 4 7 17
8 4 8 18
9 5 9 19
10 5 10 20
With this setup,
I want to know how to get the mean of (B) or (C) when (A) meets a specific condition.
For example,
if (A) is in 2:4, get mean(B) and mean(C):
new_ABC <- subset(ABC, A >= 2 & A <= 4)
mean(new_ABC$B)
mean(new_ABC$C)
and it works.
But I want to turn this into a function. I have tried for several days and have no idea how...
getMeanB <- function(condition){
for(i in min(condition) : max(condition)){
# I do not really know what to do..
}
}
Any help would be much appreciated!!
If the argument 'condition' is a vector, we can take its minimum and maximum and subset the rows that fall in that range:
getMean <- function(data, condition, cName) {
minC <- min(condition)
maxC <- max(condition)
i1 <- data[[cName]] >= minC & data[[cName]] <= maxC
colMeans(data[i1,setdiff(names(data), cName)], na.rm = TRUE)
}
getMean(ABC, 2:4, "A")
# B C
# 5.5 15.5
NOTE: Here, the 'data' and 'cName' arguments are added to make it more dynamic and applied to other datasets with different column names.
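If only one column's mean is needed, a narrower helper could look like the sketch below. The function name mean_in_range and its arguments are hypothetical, and it assumes the bounds are inclusive, as in the 2:4 example.

```r
ABC <- data.frame(A = c(1, 1, 2, 2, 2, 3, 4, 4, 5, 5),
                  B = 1:10,
                  C = 11:20)

mean_in_range <- function(data, cond_col, value_col, lower, upper) {
  # rows whose condition column lies in [lower, upper]; NA never matches
  keep <- !is.na(data[[cond_col]]) &
    data[[cond_col]] >= lower & data[[cond_col]] <= upper
  mean(data[[value_col]][keep], na.rm = TRUE)
}

mean_B <- mean_in_range(ABC, "A", "B", 2, 4)
mean_C <- mean_in_range(ABC, "A", "C", 2, 4)
c(B = mean_B, C = mean_C)
```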

Calculate mean of a range of rows

Given a data frame like this:
A <- c(1,2,3,4,NA,6,7,8,9,10,11,12,13,14,15)
B <- c(NA,NA,NA,20,NA,NA,NA,15,NA,NA,NA,NA,11,NA,9)
DF <- data.frame(A, B)
I would like to calculate the mean for a range of values in column A, based on the value in column B. Specifically, every time there is a non-NA value in column B, I would like to calculate the mean of the range of rows 2 above and 2 below in column A.
For example, the first non-NA value in column B is 20. So I would like to calculate the mean of the two rows above (2, 3), the two rows below (NA, 6), and the row itself (4), ignoring the NA. So:
mean(c(2,3,4,NA,6), na.rm = TRUE)
Similarly, the next non-NA value in column B is 15, which would be
mean(c(6,7,8,9,10))
So, the end result for the entire data frame would be a new column C
DF$C <- c(NA,NA,NA,3.75,NA,NA,NA,8,NA,NA,NA,NA,13,NA,14)
You could try the following.
nona <- !is.na(DF$B)
DF$C <- replace(
DF$B,
nona,
vapply(which(nona), function(i) {
ii <- (i-2):(i+2)
mean(DF$A[ii[ii > 0]], na.rm = TRUE)
}, 1)
)
Here we find the non-NA values in column B and use their positions to build the indices of the values in column A to average, taking care to drop any zero or negative subscripts that would occur if one of the first two values of column B were not NA. Indices past the end of the vector return NA, which na.rm = TRUE removes. The above code gives the following result for DF.
A B C
1 1 NA NA
2 2 NA NA
3 3 NA NA
4 4 20 3.75
5 NA NA NA
6 6 NA NA
7 7 NA NA
8 8 15 8.00
9 9 NA NA
10 10 NA NA
11 11 NA NA
12 12 NA NA
13 13 11 13.00
14 14 NA NA
15 15 9 14.00
Here is an approach with the zoo package:
library(zoo)
width <- 5 # the observation ± 2
DF$C <- rollapply(DF$A, width, mean, na.rm = TRUE, partial = TRUE)
# when DF$B is NA, assign NA to corresponding DF$C
DF$C[is.na(DF$B)] <- NA
partial = TRUE allows calculating the mean with a partial window at the leading and trailing parts of the DF$A vector where the whole window can't be accommodated (i.e. the first 2 and last 2 values of DF$A where a window of size 5 is not possible).
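For comparison, the same windowed mean can be sketched as a plain base R loop that clamps the window at both ends of the vector. The name window_mean and the parameter k (rows on each side) are assumptions made for this example.

```r
A <- c(1, 2, 3, 4, NA, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
B <- c(NA, NA, NA, 20, NA, NA, NA, 15, NA, NA, NA, NA, 11, NA, 9)

window_mean <- function(values, flags, k = 2) {
  n <- length(values)
  out <- rep(NA_real_, n)
  # only positions where flags is not NA get a value
  for (i in which(!is.na(flags))) {
    idx <- max(1, i - k):min(n, i + k)  # window clamped to 1..n
    out[i] <- mean(values[idx], na.rm = TRUE)
  }
  out
}

C <- window_mean(A, B)
C
```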

Calculate cumulative sums of certain values

Assume you have a data frame like this:
df <- data.frame(Nums = c(1,2,3,4,5,6,7,8,9,10), Cum.sums = NA)
> df
Nums Cum.sums
1 1 NA
2 2 NA
3 3 NA
4 4 NA
5 5 NA
6 6 NA
7 7 NA
8 8 NA
9 9 NA
10 10 NA
and you want an output like this:
Nums Cum.sums
1 1 0
2 2 0
3 3 0
4 4 3
5 5 5
6 6 7
7 7 9
8 8 11
9 9 13
10 10 15
The 4th element of the column Cum.sums is the sum of 1 and 2, the 5th element is the sum of 2 and 3, and so on.
In other words, I want to compute sums from the first column and store them in the second column. However, I don't want the ordinary cumulative sum, but the sum of the element 2 rows above the current row plus the element 3 rows above the current row.
I already tried playing around with the sum and cumsum functions, but I failed.
Any ideas?
Thanks!
You could use the embed function to create the appropriate lags, rowSums to sum, then lag appropriately (I used head).
df$Cum.sums[-(1:3)] <- head(rowSums(embed(df$Nums,2)),-2)
You don't need any special function, just use normal vector operations (these solutions are all equivalent):
df$Cum.sums[-(1:3)] <- head(df$Nums, -3) + head(df$Nums[-1], -2)
or
df <- within(df, Cum.sums[-(1:3)] <- head(Nums, -3) + head(Nums[-1], -2))
or
df$Cum.sums[-(1:3)] <- df$Nums[1:(nrow(df)-3)] + df$Nums[2:(nrow(df)-2)]
I believe the first 3 sums SHOULD be NA, not 0, but if you prefer zeroes, you can initialize the sums first:
df$Cum.sums <- 0
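To make the lag and the number of summed rows adjustable, the vector operations above could be wrapped in a small helper. This is a sketch only: lagged_window_sum is a hypothetical name, and it assumes width <= lag + 1 so the window never reaches past the current row.

```r
lagged_window_sum <- function(x, lag = 3, width = 2) {
  stopifnot(width <= lag + 1)
  n <- length(x)
  out <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    first <- i - lag            # oldest element in the window
    last <- first + width - 1   # newest element in the window
    if (first >= 1) out[i] <- sum(x[first:last])
  }
  out
}

sums <- lagged_window_sum(1:10)
sums
```

Rows without enough history stay NA, matching the suggestion that the first three sums should be NA rather than 0.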
Another solution, elegant and general, uses matrix multiplication, which makes it very inefficient for large data. So it's not very practical, though a nice exercise:
len <- nrow(df)
sr <- 2 # number of rows to sum
lag <- 3
mat <- matrix(
head(c(
rep(0, lag * len),
rep(rep(1:0, c(sr, len - sr + 1)), len)
), len * len),
nrow = len, byrow = TRUE
)
mat %*% df$Nums
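If the matrix approach is too slow, a one-sided linear filter from base R's stats package might do the same job without building a len-by-len matrix. In this sketch the coefficients c(0, 0, 1, 1) apply to the current value and lags 1 to 3, so only lags 2 and 3 contribute; x stands in for df$Nums.

```r
x <- 1:10

# sides = 1 uses only the current and past values:
# result[i] = 0*x[i] + 0*x[i-1] + 1*x[i-2] + 1*x[i-3]
lagged <- as.numeric(stats::filter(x, c(0, 0, 1, 1), sides = 1))
lagged
```

Calling stats::filter with the namespace prefix avoids picking up dplyr's filter when dplyr is loaded.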
