Filtering scores from one variable and placing them in new variables - R

I have a variable, test_scores, coded on a scale from 1 to 9.
I need to classify scores of 1-3 as low, 4-6 as good and 7-9 as high, each in a new variable.
Then I need to make one new variable that compares low with high and another that compares low with good.
library(dplyr)
test_scores <- sample(1:9, 122, replace = TRUE)
test_scores <- as.data.frame(test_scores)
low <- filter(test_scores, test_scores > 3)
high <- filter(test_scores, test_scores < 7)
good <- filter(test_scores, test_scores == 4:6)
But the numbers of rows in the new variables do not add up to 122.
I thought of using ifelse():
low<- ifelse(test_scores$test_scores == 1:3 , 1:3 , 0)
mods<- ifelse(test_scores$test_scores == 4:6, 4:6, 0)
high<- ifelse(test_scores$test_scores == 7:9, 7:9, 0)
But some scores are not picked up; they become 0 even though the score matches. Any ideas?

You can use "cut" to generate the new bins:
set.seed(123)
test_scores <- sample(1:9, 122, T)
test_scores
#> [1] 3 3 2 6 5 4 6 9 5 3 9 9 9 3 8 7 9 3 4 1 7 5 7 9 9 7 5 7 5 6 9 2 5 8 2 1 9
#> [38] 9 6 5 9 4 6 8 6 6 7 1 6 2 1 2 4 5 6 3 9 4 6 9 9 7 3 8 9 3 7 3 7 6 5 5 8 3
#> [75] 2 2 6 4 1 6 3 8 3 8 1 7 7 7 6 7 5 6 8 5 7 4 3 9 7 6 9 7 2 3 8 4 7 4 1 8 4
#> [112] 9 8 6 4 8 3 4 4 6 1 4
cuts <- cut(test_scores, c(0,3,6,9), labels = F)
cuts
#> [1] 1 1 1 2 2 2 2 3 2 1 3 3 3 1 3 3 3 1 2 1 3 2 3 3 3 3 2 3 2 2 3 1 2 3 1 1 3
#> [38] 3 2 2 3 2 2 3 2 2 3 1 2 1 1 1 2 2 2 1 3 2 2 3 3 3 1 3 3 1 3 1 3 2 2 2 3 1
#> [75] 1 1 2 2 1 2 1 3 1 3 1 3 3 3 2 3 2 2 3 2 3 2 1 3 3 2 3 3 1 1 3 2 3 2 1 3 2
#> [112] 3 3 2 2 3 1 2 2 2 1 2
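If you want a separate variable for each bin, with zero otherwise, use %in% rather than ==. The == operator recycles the shorter vector, so each score is compared with only one value of 1:3 (the first score with 1, the second with 2, the third with 3, the fourth with 1 again, and so on) instead of with all three. Applied to the test_scores data frame from the question: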
low<- ifelse(test_scores$test_scores %in% 1:3 , test_scores$test_scores , 0)
mods<- ifelse(test_scores$test_scores %in% 4:6, test_scores$test_scores, 0)
high<- ifelse(test_scores$test_scores %in% 7:9, test_scores$test_scores, 0)
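A minimal sketch putting both ideas together, assuming the question's data frame layout (one column called test_scores). It shows that == with 1:3 only ever flags a subset of the low scores, and that a single labelled factor from cut() covers all 122 rows. The low_vs_high column is one possible reading of "a variable that compares low and high" (high = 1, low = 0, good = NA); it is an assumption, not something the question specifies.

```r
set.seed(123)
test_scores <- data.frame(test_scores = sample(1:9, 122, replace = TRUE))
s <- test_scores$test_scores

# == recycles 1:3 along s, so most matching scores are missed;
# R also warns here because 122 is not a multiple of 3
wrong <- suppressWarnings(s == 1:3)
# %in% asks "is this score one of 1, 2, 3?" for every element
right <- s %in% 1:3

# One labelled factor instead of three separate variables:
# (0,3] -> low, (3,6] -> good, (6,9] -> high
test_scores$level <- cut(s, breaks = c(0, 3, 6, 9),
                         labels = c("low", "good", "high"))
table(test_scores$level)  # the three counts add up to 122

# Assumed coding for a low-vs-high comparison: 1 = high, 0 = low, NA = good
test_scores$low_vs_high <- ifelse(test_scores$level == "high", 1,
                                  ifelse(test_scores$level == "low", 0, NA))
```

Keeping everything in one factor column also makes later comparisons (t-tests, models) easier than juggling separate vectors padded with zeros.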


Optimization of Holt-Winters smoothing parameters using optim() stops after zero iterations [R]

For a large number of time series, I want to optimize the smoothing parameters of the Holt-Winters forecasting method so that I get one set of optimal parameters. There are three parameters: alpha, beta and gamma. Below I present a simplified procedure for one time series to illustrate the problem. I create a seasonal time series as follows:
check_vec <- rep(c(7,6,5,4,3,2,1), times = 100)
check_ts <- ts(check_vec, frequency = 7)
The time series looks like the following.
Time Series:
Start = c(1, 1)
End = c(100, 7)
Frequency = 7
[1] 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6
[52] 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4
[103] 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2
[154] 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7
[205] 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5
[256] 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3
[307] 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1
[358] 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6
[409] 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4
[460] 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2
[511] 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7
[562] 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5
[613] 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3
[664] 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1 7 6 5 4 3 2 1
Then I use the following objective function for the optimization:
check_func <- function(param) {
a <- param[[1]]
b <- param[[2]]
c <- param[[3]]
if (c > 1 - a | a < b) {
return(100000)
} else {
rmse <- accuracy(hw(check_ts, h = 14, alpha = a, beta = b, gamma = c))[2]
return(rmse)
}
}
The function thus returns the root mean squared error (which I want to minimize). It returns different values for different inputs.
> check_func(c(a = 0.18, b = 0.07, c = 0.1))
[1] 3.77942e-16
> check_func(c(a = 0.18, b = 0.07, c = 0.2))
[1] 3.382083e-16
I use the following optim() command to optimize the parameters:
optim(par = c(a = 0.18, b = 0.07, c = 0.1),
fn = check_func,
lower = c(0.005,0.005,0.005),
upper = c(0.99, 0.99, 0.99),
method = "L-BFGS-B",
control = list(trace = 6,
pgtol = 1.490117e-08))
Executing the optim() command returns the initial parameters as the result (that is, the optimization performs zero iterations). It prints the following message:
N = 3, M = 5 machine precision = 2.22045e-16
L = 0.005 0.005 0.005
X0 = 0.18 0.07 0.1
U = 0.99 0.99 0.99
At X0, 0 variables are exactly at the bounds
At iterate 0 f= 37.794 |proj g|= 0
iterations 0
function evaluations 1
segments explored during Cauchy searches 0
BFGS updates skipped 0
active bounds at final generalized Cauchy point 0
norm of the final projected gradient 0
final function value 37.7942
X = 0.18 0.07 0.1
F = 37.7942
final value 37.794202
converged
$par
a b c
0.18 0.07 0.10
$value
[1] 37.7942
$counts
function gradient
1 1
$convergence
[1] 0
$message
[1] "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
I have tried scaling up the function's output and decreasing pgtol, without success. Does anybody know what to do?
EDIT: I have added more code and the results of the procedure.
EDIT 2: This is the modified check_func I use to test whether Enrico's method works.
check_func <- function(param) {
a <- param[[1]]
b <- param[[2]]
c <- param[[3]]
rmse <- try(accuracy(hw(check_ts, h = 14, alpha = a, beta = b, gamma = c))[2])
if (inherits(rmse, "try-error"))
return(200)
else
return(rmse)
}
You might get more helpful answers if you provided a reproducible example. Are accuracy and hw from package forecast? In any case, you could try a grid search:
library("NMOF")
res <- gridSearch(check_func,
lower = rep(0.05, 3),
upper = rep(0.99, 3),
n = 10)
## List of 4
## $ minfun : num 4.11e-17
## $ minlevels: num [1:3] 0.05 0.05 0.886
## $ values : num [1:1000] 3.81e-16 3.81e-16 ...
## $ levels :List of 1000
## ..$ : num [1:3] 0.05 0.05 0.05
## ..$ : num [1:3] 0.154 0.05 0.05
## ..$ : num [1:3] 0.259 0.05 0.05
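If NMOF is not available, the same exhaustive grid search can be sketched in base R. grid_search() below is a hypothetical helper written for illustration, not NMOF's gridSearch(), and the toy quadratic stands in for check_func so the example runs without the forecast package.

```r
# Hypothetical helper: evaluate fun on every point of an n^k grid over a box
grid_search <- function(fun, lower, upper, n = 10) {
  # one evenly spaced sequence per parameter
  axes <- Map(function(lo, up) seq(lo, up, length.out = n), lower, upper)
  grid <- as.matrix(expand.grid(axes))   # one row per parameter combination
  values <- apply(grid, 1, fun)          # objective value for each row
  best <- which.min(values)
  list(value = values[best], par = unname(grid[best, ]), values = values)
}

# Toy objective standing in for check_func; its minimum is at (0.35, 0.15, 0.65)
toy <- function(p) sum((p - c(0.35, 0.15, 0.65))^2)
res <- grid_search(toy, lower = rep(0.05, 3), upper = rep(0.95, 3), n = 10)
res$par
```

The cost grows as n^k, so with three parameters and n = 10 this is 1000 evaluations. A common follow-up is to start optim() from res$par to refine the best grid point.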
For many parameter choices, check_func would fail. So I'd safeguard the computation with something like:
rmse <- try(accuracy(hw(check_ts, h = 14, alpha = a, beta = b, gamma = c))[2])
if (inherits(rmse, "try-error"))
return(200)
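The same safeguard can be sketched with tryCatch(), which returns the penalty directly instead of requiring an inherits() check, and does not print the error message the way try() does by default. safe_objective() and risky() are hypothetical names; the penalty of 200 follows the snippet above.

```r
# Hypothetical wrapper: any error raised by f becomes a fixed penalty value
safe_objective <- function(f, penalty = 200) {
  function(param) {
    tryCatch(f(param), error = function(e) penalty)
  }
}

# Toy objective that fails for some inputs, standing in for check_func
risky <- function(p) {
  if (p[1] > 0.5) stop("model could not be fitted")
  sum(p^2)
}

obj <- safe_objective(risky)
obj(c(0.2, 0.1))  # 0.05
obj(c(0.9, 0.1))  # 200, because the error was caught
```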
(Disclosure: I am the maintainer of package NMOF.)
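One plausible explanation for the zero-iteration stop, offered as an illustration rather than a diagnosis: with no gradient supplied, L-BFGS-B estimates it by finite differences (optim's ndeps control, default 1e-3). If the objective does not change at that scale, which may happen when hw() reproduces a perfectly periodic series like check_ts almost exactly and the RMSE sits at rounding-error level (around 1e-16 in the question), the estimated gradient is zero and the algorithm declares convergence at the start point. The constant toy objective below is a stand-in for check_func and reproduces the same message.

```r
# Stand-in for an objective that is flat around the starting point
flat <- function(p) 37.7942

start <- c(a = 0.18, b = 0.07, c = 0.1)
res <- optim(par = start, fn = flat, method = "L-BFGS-B",
             lower = rep(0.005, 3), upper = rep(0.99, 3))

res$par      # unchanged starting values
res$counts   # a single function and gradient evaluation
res$message  # "CONVERGENCE: NORM OF PROJECTED GRADIENT <= PGTOL"
```

If this is what happens with the real data, rescaling the output will not help; a less trivial test series or a derivative-free search such as the grid search is more likely to move.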

Group by each increasing sequence in data frame

If I have a data frame with a column made up of runs of increasing values, such as:
x
1
2
3
4
1
2
3
1
2
3
4
5
6
1
2
How do I add a column to group each increasing sequence that results in:
x y
1 1
2 1
3 1
4 1
1 2
2 2
3 2
1 3
2 3
3 3
4 3
5 3
6 3
1 4
2 4
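I can only think of using a loop, which would be slow.
You can use cumsum() for this, as long as every sequence starts at 1: each 1 marks the start of a new group, so the running count of 1s is the group number.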
> x <- c(1,2,3,4,1,2,3,1,2,4,5,1,2)
> cumsum(x==1)
[1] 1 1 1 1 2 2 2 3 3 3 3 4 4
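I would use diff() and compute the cumulative sum. A new group starts wherever a value is smaller than the one before it, so counting those drops gives the group number: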
df$y <- c(1, cumsum(diff(df$x) < 0 ) + 1)
> df
x y
1 1 1
2 2 1
3 3 1
4 4 1
5 1 2
6 2 2
7 3 2
8 1 3
9 2 3
10 3 3
11 4 3
12 5 3
13 6 3
14 1 4
15 2 4
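A self-contained check of both answers on the question's data, plus a short vector whose runs do not start at 1, which is where the two approaches diverge. The names df and x2 are illustrative.

```r
df <- data.frame(x = c(1, 2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 5, 6, 1, 2))

# A new run starts wherever x drops relative to the previous value
df$y <- c(1, cumsum(diff(df$x) < 0) + 1)
# Only valid if every run starts at exactly 1
df$y_ones <- cumsum(df$x == 1)

# A run starting at 2: the diff() version still counts it, cumsum(x == 1) does not
x2 <- c(2, 3, 1, 2)
c(1, cumsum(diff(x2) < 0) + 1)  # 1 1 2 2
cumsum(x2 == 1)                  # 0 0 1 1
```

With diff(x) < 0, two equal consecutive values stay in the same run; if a repeat should also start a new group, diff(x) <= 0 is the condition to use.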

How do I add a vector where I collapse scores from individuals within pairs?

I ran an experiment in which participants solved a task in pairs. Each participant then received a score for how well they did the task. Pairs went through different numbers of trials.
I have a data frame similar to the one below:
participant <- c(1,1,2,2,3,3,3,4,4,4,5,6)
pair <- c(1,1,1,1,2,2,2,2,2,2,3,3)
trial <- c(1,2,1,2,1,2,3,1,2,3,1,1)
score <- c(2,3,6,3,4,7,3,1,8,5,4,3)
data <- data.frame(participant, pair, trial, score)
participant pair trial score
1 1 1 2
1 1 2 3
2 1 1 6
2 1 2 3
3 2 1 4
3 2 2 7
3 2 3 3
4 2 1 1
4 2 2 8
4 2 3 5
5 3 1 4
6 3 1 3
I would like to add a new vector to the data frame, where each participant gets the numeric difference between their own score and the other participant's score within each trial.
Does someone have an idea about how one might do that?
It should end up looking something like this:
participant pair trial score difference
1 1 1 2 4
1 1 2 3 0
2 1 1 6 4
2 1 2 3 0
3 2 1 4 3
3 2 2 7 1
3 2 3 3 2
4 2 1 1 3
4 2 2 8 1
4 2 3 5 2
5 3 1 4 1
6 3 1 3 1
Here's a solution that involves first reordering data such that each sequential pair of rows corresponds to a single pair within a single trial. This allows us to make a single call to diff() to extract the differences:
data <- data[order(data$trial,data$pair,data$participant),];
data$diff <- rep(diff(data$score)[c(T,F)],each=2L)*c(-1L,1L);
data;
## participant pair trial score diff
## 1 1 1 1 2 -4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 -3
## 11 5 3 1 4 1
## 12 6 3 1 3 -1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 -1
## 9 4 2 2 8 1
## 7 3 2 3 3 -2
## 10 4 2 3 5 2
I assumed you wanted the sign to capture the direction of the difference. So, for instance, if a participant has a score 4 points below the other participant in the same trial-pair, then I assumed you would want -4. If you want all-positive values, you can remove the multiplication by c(-1L,1L) and add a call to abs():
data$diff <- rep(abs(diff(data$score)[c(T,F)]),each=2L);
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 3 2 1 1 6 4
## 5 3 2 1 4 3
## 8 4 2 1 1 3
## 11 5 3 1 4 1
## 12 6 3 1 3 1
## 2 1 1 2 3 0
## 4 2 1 2 3 0
## 6 3 2 2 7 1
## 9 4 2 2 8 1
## 7 3 2 3 3 2
## 10 4 2 3 5 2
Here's a solution built around ave() that doesn't require reordering the whole data.frame first:
data$diff <- ave(data$score,data$trial,data$pair,FUN=function(x) abs(diff(x)));
data;
## participant pair trial score diff
## 1 1 1 1 2 4
## 2 1 1 2 3 0
## 3 2 1 1 6 4
## 4 2 1 2 3 0
## 5 3 2 1 4 3
## 6 3 2 2 7 1
## 7 3 2 3 3 2
## 8 4 2 1 1 3
## 9 4 2 2 8 1
## 10 4 2 3 5 2
## 11 5 3 1 4 1
## 12 6 3 1 3 1
Here's how you can get the score of the other participant in the same trial-pair:
data$other <- ave(data$score,data$trial,data$pair,FUN=rev);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 2 1 1 2 3 3
## 3 2 1 1 6 2
## 4 2 1 2 3 3
## 5 3 2 1 4 1
## 6 3 2 2 7 8
## 7 3 2 3 3 5
## 8 4 2 1 1 4
## 9 4 2 2 8 7
## 10 4 2 3 5 3
## 11 5 3 1 4 3
## 12 6 3 1 3 4
Or, assuming the data.frame has been reordered as per the initial solution:
data$other <- c(rbind(data$score[c(F,T)],data$score[c(T,F)]));
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
Alternatively, using matrix() instead of rbind():
data$other <- c(matrix(data$score,2L)[2:1,]);
data;
## participant pair trial score other
## 1 1 1 1 2 6
## 3 2 1 1 6 2
## 5 3 2 1 4 1
## 8 4 2 1 1 4
## 11 5 3 1 4 3
## 12 6 3 1 3 4
## 2 1 1 2 3 3
## 4 2 1 2 3 3
## 6 3 2 2 7 8
## 9 4 2 2 8 7
## 7 3 2 3 3 5
## 10 4 2 3 5 3
Here is an option using data.table:
library(data.table)
setDT(data)[,difference := abs(diff(score)), by = .(pair, trial)]
data
# participant pair trial score difference
# 1: 1 1 1 2 4
# 2: 1 1 2 3 0
# 3: 2 1 1 6 4
# 4: 2 1 2 3 0
# 5: 3 2 1 4 3
# 6: 3 2 2 7 1
# 7: 3 2 3 3 2
# 8: 4 2 1 1 3
# 9: 4 2 2 8 1
#10: 4 2 3 5 2
#11: 5 3 1 4 1
#12: 6 3 1 3 1
A slightly faster option would be:
setDT(data)[, difference := abs((score - shift(score))[2]) , by = .(pair, trial)]
If we need the score of the other participant in the pair:
data[, other:= rev(score) , by = .(pair, trial)]
data
# participant pair trial score difference other
# 1: 1 1 1 2 4 6
# 2: 1 1 2 3 0 3
# 3: 2 1 1 6 4 2
# 4: 2 1 2 3 0 3
# 5: 3 2 1 4 3 1
# 6: 3 2 2 7 1 8
# 7: 3 2 3 3 2 5
# 8: 4 2 1 1 3 4
# 9: 4 2 2 8 1 7
#10: 4 2 3 5 2 3
#11: 5 3 1 4 1 3
#12: 6 3 1 3 1 4
Or using dplyr:
library(dplyr)
data %>%
group_by(pair, trial) %>%
mutate(difference = abs(diff(score)))
# participant pair trial score difference
# <dbl> <dbl> <dbl> <dbl> <dbl>
#1 1 1 1 2 4
#2 1 1 2 3 0
#3 2 1 1 6 4
#4 2 1 2 3 0
#5 3 2 1 4 3
#6 3 2 2 7 1
#7 3 2 3 3 2
#8 4 2 1 1 3
#9 4 2 2 8 1
#10 4 2 3 5 2
#11 5 3 1 4 1
#12 6 3 1 3 1

Comparing each element in subsets of a large data set

I have a large data set of raw responses and want to compare each element for subject 1 in group 1 with the corresponding element for subject 1 in group 2. Likewise, subject 2 in group 1 must be compared with subject 2 in group 2, subject 3 in group 1 with subject 3 in group 2, and so on. What makes the problem more complex is that there are 100 groups, which form 50 pairs of groups.
The output needs to keep the original raw response if the two responses are the same. If they differ, the raw response needs to be replaced with '9'.
I'm fairly sure I could do this with a for loop, but I'm wondering whether there is anything better in R, such as ifelse or apply.
A simplified version of my data looks like this:
df<-as.data.frame(matrix(sample(c(1:5),60,replace=T),nrow=12))
df$subject<-rep(1:3)
df$group<-rep(1:4, each=3)
Thanks for any help.
#Initialization of data
df<-as.data.frame(matrix(sample(c(1:5),60,replace=T),nrow=12))
df$subject<-rep(1:3)
df$group<-rep(1:4, each=3)
>df
V1 V2 V3 V4 V5 subject group
1 3 3 3 4 5 1 1
2 4 4 3 1 3 2 1
3 3 2 2 4 2 3 1
4 4 4 3 5 3 1 2
5 3 2 1 5 1 2 2
6 2 5 4 4 1 3 2
7 3 2 3 2 2 1 3
8 1 2 3 3 3 2 3
9 2 2 2 2 5 3 3
10 3 3 3 5 4 1 4
11 5 3 5 4 2 2 4
12 5 3 1 1 3 3 4
Processing without for loop
#processing without for loop
# assumption: initial data is sorted by group (can be easily done)
coloumns<-!dimnames(df)[[2]] %in% c('group','subject');
subjects<-df[, 'subject']
tabl<-table(subjects)
rows<-order(subjects)
rows2<-cumsum(tabl)
rows1<-rows2-tabl+1
df[rows[-rows1],coloumns][df[rows[-rows1],coloumns]!=df[rows[-rows2],coloumns]]<-9
>df
V1 V2 V3 V4 V5 subject group
1 3 3 3 4 5 1 1
2 4 4 3 1 3 2 1
3 3 2 2 4 2 3 1
4 9 9 3 9 9 1 2
5 9 9 9 9 9 2 2
6 9 9 9 4 9 3 2
7 9 9 3 9 9 1 3
8 9 2 9 9 9 2 3
9 2 9 9 9 9 3 3
10 3 9 3 9 9 1 4
11 9 9 9 9 9 2 4
12 9 9 9 9 9 3 4
Below is what I did to get the output. Again, thanks to Stanislav.
df<-as.data.frame(matrix(sample(c(1:5),60,replace=T),nrow=12))
df$subject<-rep(1:3)
df$group<-rep(1:4, each=3)
> df
V1 V2 V3 V4 V5 subject group
1 1 4 3 1 5 1 1
2 2 1 4 1 5 2 1
3 1 2 5 4 5 3 1
4 5 4 1 4 3 1 2
5 5 1 3 2 2 2 2
6 1 2 2 4 5 3 2
7 5 4 2 3 1 1 3
8 2 3 4 3 5 2 3
9 2 5 3 5 3 3 3
10 4 2 1 4 1 1 4
11 2 3 3 5 5 2 4
12 5 3 3 4 5 3 4
col<-!dimnames(df)[[2]] %in% c('subject','group')
n<-length(df[,1])
temp<-table(df$group)
n.sub<-temp[1]
temp<-seq(1,n,by=2*n.sub)
s1<-c(sapply(temp, function(x) seq.int(x, length.out=n.sub)))
temp<-seq(n.sub+1,n,by=2*n.sub)
s2<-c(sapply(temp, function(x) seq.int(x, length.out=n.sub)))
df[s2,col][df[s1,col]!=df[s2,col]]<-9
> df
V1 V2 V3 V4 V5 subject group
1 1 4 3 1 5 1 1
2 2 1 4 1 5 2 1
3 1 2 5 4 5 3 1
4 9 4 9 9 9 1 2
5 9 1 9 9 9 2 2
6 1 2 9 4 5 3 2
7 5 4 2 3 1 1 3
8 2 3 4 3 5 2 3
9 2 5 3 5 3 3 3
10 9 9 9 9 1 1 4
11 2 3 9 9 5 2 4
12 9 9 3 9 9 3 4
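A variant of the same idea that does not rely on the rows being sorted: each row of an even-numbered group is matched to its partner row (same pair of groups, same subject) with match(). It assumes groups are paired as (1, 2), (3, 4), ..., that the odd group of each pair is the reference left unchanged, and that every subject appears in both groups of a pair. The names pair_id, is_second and key are illustrative.

```r
set.seed(1)
df <- as.data.frame(matrix(sample(1:5, 60, replace = TRUE), nrow = 12))
df$subject <- rep(1:3, times = 4)
df$group <- rep(1:4, each = 3)
original <- df  # copy kept only for comparison

resp <- setdiff(names(df), c("subject", "group"))

# Assumed pairing: groups (1, 2), (3, 4), ...; odd group = reference
pair_id <- (df$group + 1) %/% 2
is_second <- df$group %% 2 == 0
key <- paste(pair_id, df$subject)

# For each even-group row, the position of its partner among the odd-group rows
ref <- match(key[is_second], key[!is_second])
first <- as.matrix(df[!is_second, resp])[ref, , drop = FALSE]
second <- as.matrix(df[is_second, resp])

# Keep equal responses, replace differing ones with 9
second[second != first] <- 9L
df[is_second, resp] <- second
df
```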

From table to data.frame

I have a table that looks like:
dat = data.frame(expand.grid(x = 1:10, y = 1:10),
z = sample(LETTERS[1:3], size = 100, replace = TRUE))
tabl <- with(dat, table(z, y))
tabl
y
z 1 2 3 4 5 6 7 8 9 10
A 5 3 1 1 3 6 3 7 2 4
B 4 5 3 6 5 1 3 1 4 4
C 1 2 6 3 2 3 4 2 4 2
Now, how do I transform it into a data.frame that looks like this:
1 2 3 4 5 6 7 8 9 10
A 5 3 1 1 3 6 3 7 2 4
B 4 5 3 6 5 1 3 1 4 4
C 1 2 6 3 2 3 4 2 4 2
Here are a couple of options.
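The reason as.data.frame(tabl) doesn't work is that it dispatches to the S3 method as.data.frame.table(), which does something useful but different from what you want: it returns the table in long format, with one row per combination of z and y and the counts in a Freq column.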
as.data.frame.matrix(tabl)
# 1 2 3 4 5 6 7 8 9 10
# A 5 4 3 1 1 3 3 2 6 2
# B 1 4 3 4 5 3 4 4 3 3
# C 4 2 4 5 4 4 3 4 1 5
## This will also work
as.data.frame(unclass(tabl))
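A short side-by-side of the long and wide results, using the question's data with a fixed seed (the seed is added here only for reproducibility). The long form has one row per z-y combination; the wide form has one row per level of z and one column per level of y.

```r
set.seed(42)
dat <- data.frame(expand.grid(x = 1:10, y = 1:10),
                  z = sample(LETTERS[1:3], size = 100, replace = TRUE))
tabl <- with(dat, table(z, y))

long <- as.data.frame(tabl)         # as.data.frame.table(): columns z, y, Freq
wide <- as.data.frame.matrix(tabl)  # 3 rows (A, B, C) by 10 columns ("1" to "10")
head(long)
wide
```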
