Removing rows based on a vector - r

Given a matrix (m), I want to remove from it the subjects given by a changing vector. I am trying to do this with a loop, but it only removes the last input:
m <- matrix(1:4, 10, 3)
changing_vector <- c(2, 1) # or c(1, 4), etc.
for(j in 1:length(changing_vector))
{
a = subData[!(subData$subject== changing_vector [j]),]
}
Does anyone know why it does not work? Can you suggest another way to do it?
Thanks in advance for your help,
G.

Always try to post reproducible examples, so that others can see what you are trying to do. Also try to be very precise, as it is sometimes very hard to understand what people want to do (as in your case).
Maybe this can help you with your problem:
m <- matrix(1:5, 15, 5)
vec <- c(x, y) # x, y: the values whose rows should be removed
z <- logical(nrow(m)) # TRUE marks a row to drop
for(i in 1:nrow(m)){
  z[i] <- any(m[i,] %in% vec)
}
m <- m[!z,]
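The row-wise test in the loop above can also be written without an explicit loop by using apply(). A minimal sketch, assuming the values to drop are 2 and 4 (placeholders standing in for x and y):

```r
# Example matrix from the answer: each row repeats one of the values 1..5
m <- matrix(1:5, 15, 5)
# Values whose rows should be dropped (placeholders for x and y)
vec <- c(2, 4)
# TRUE for every row that contains at least one value from vec
drop_row <- apply(m, 1, function(r) any(r %in% vec))
# Keep only rows without any of those values; drop = FALSE keeps a matrix
m_kept <- m[!drop_row, , drop = FALSE]
```

Here rows 2, 4, 7, 9, 12 and 14 contain a 2 or a 4, so 9 of the 15 rows remain.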

I appreciate your help, but although it did not solve the issue, here is what I did:
# remove subjects who did not reach a performance > 70 % (for example; easier to understand this way)
subjectsTOremove <- which(performance < 70)
vector_poz <- c()
for(j in 1:length(subjectsTOremove))
{
  # row positions belonging to the j-th subject to remove
  aa <- which(data$subid == subjectsTOremove[j])
  vector_poz <- c(vector_poz, aa)
}
# then these subjects' rows are set to NaN and the NaN rows removed
data[vector_poz,] <- NaN # this transformation allows checking visually which data are out
data <- data[complete.cases(data),]
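The loop in the original question only removes the last subject because every iteration starts again from subData, so each new `a` throws away the previous removal. A minimal sketch with a hypothetical data frame (the subject IDs and scores are made up for illustration) shows two fixes: keep filtering the same object inside the loop, or drop the loop and use %in%:

```r
# Hypothetical data: 5 subjects with 2 rows each
subData <- data.frame(subject = rep(1:5, each = 2),
                      score   = 1:10)
changing_vector <- c(2, 4)

# Fix 1: filter the already-filtered result on each iteration
a <- subData
for (j in seq_along(changing_vector)) {
  a <- a[a$subject != changing_vector[j], ]
}

# Fix 2: one vectorised step, no loop needed
b <- subData[!(subData$subject %in% changing_vector), ]
```

Applied to the code above, the same idea would be data[!(data$subid %in% subjectsTOremove), ], which also avoids the complete.cases() step dropping rows that already contained NA for other reasons.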

Related

Subtract each col in a df from every other col

I would like to try out a normalisation method a friend recommended, in which each column of a df is subtracted from the others: first the first column minus every other column, then the second column minus every other column, and so on for every column of that df.
e.g.:
df <- data.frame(replicate(9,1:4))
x_df_1 <- df[,1] - df[2:ncol(df)]
x_df_2 <- df[,2] - df[c(1, 3:ncol(df))]
x_df_3 <- df[,3] - df[c(1:2, 4:ncol(df))]
# ...
x_df_ncol <- df[,ncol(df)] - df[1:(ncol(df)-1)]
As the df has 90 cols, doing this by hand would be terrible (and very bad coding). I am sure there must be an elegant way to solve this and to end up with a list containing all the dfs, but I am totally stuck on how to get there. I would appreciate a dplyr method (for familiarity), but any working solution would be fine.
Thanks a lot for your help!
Sebastian
I may have found a solution, which I am sharing here.
Please correct me if I'm wrong.
Since a - b is just -(b - a), each pair of columns only needs to be subtracted once, so this is a combinations (not permutations) task.
The original df has 90 cols.
Let's check how many combinations are possible first:
(from: https://davetang.org/muse/2013/09/09/combinations-and-permutations-in-r/)
comb_with_replacement <- function(n, r){
return( factorial(n + r - 1) / (factorial(r) * factorial(n - 1)) )
}
comb_with_replacement(90,2) # 4095, but this also counts each column paired with itself
choose(90, 2) # 4005 pairs of two different columns
Now using a modified answer from here: https://stackoverflow.com/a/16921442/10342689
(df has 90 cols; I don't know how to create a proper example df for this here.)
cc_90 <- combn(colnames(df), 2) # every pair of column names: a 2 x 4005 matrix
result <- apply(cc_90, 2, function(x) df[[x[1]]]-df[[x[2]]])
dim(result) # nrow(df) rows x 4005 columns
That should work.
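A quick check of the pairwise approach on a small stand-in for the 90-column data frame. The data frame below is made up (4 rows, 90 columns with different values, so the differences are not all zero). It confirms that combn(..., 2) yields choose(90, 2) = 4005 pairs and names each result column after the pair it came from:

```r
# Hypothetical stand-in: 4 rows x 90 columns, named X1..X90
df <- as.data.frame(matrix(seq_len(4 * 90), nrow = 4))
names(df) <- paste0("X", seq_len(90))

# Every unordered pair of column names: a 2 x 4005 character matrix
cc_90 <- combn(colnames(df), 2)

# One difference vector per pair; apply() binds them into a 4 x 4005 matrix
result <- apply(cc_90, 2, function(x) df[[x[1]]] - df[[x[2]]])
colnames(result) <- paste(cc_90[1, ], cc_90[2, ], sep = "-")
```

Each pair appears only once; the reverse difference (e.g. X2 minus X1) is simply the negative of the stored column.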
In R one can index using negative indices to represent "all except this index".
So we can re-write the first of your normalization rows:
x_df_1 <- df[,1] - df[2:ncol(df)]
# rewrite as:
x_df_1 <- df[,1] - df[,-1]
From this, it's a pretty easy next step to write a loop to generate the 90 new dataframes that you generated 'by hand':
list_of_dfs=lapply(seq_len(ncol(df)),function(x) df[,x]-df[,-x])
This seems to be somewhat different to what you're proposing in your own answer to your question, though...
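A small runnable version of the lapply() answer. The example data are an assumption: a 4 x 9 data frame with distinct columns, since replicate(9, 1:4) makes every column identical and every difference zero. Naming the list after the columns makes it easy to pick out the result for a given column:

```r
# Hypothetical example data: 4 rows, 9 distinct columns V1..V9
df <- as.data.frame(matrix(1:36, nrow = 4))

# For column x, subtract every other column from it (negative index = "all except x")
list_of_dfs <- lapply(seq_len(ncol(df)), function(x) df[, x] - df[, -x])
names(list_of_dfs) <- names(df)
```

For example, list_of_dfs$V1 holds V1 minus each of V2..V9 as an 8-column data frame.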

R Matching closest number from columns

I have a list of responses to 7 questions from a survey, each in its own column, and am trying to find the response within the first 6 that is closest (numerically) to the 7th. Some won't be exactly the same, so I want to create a new variable that holds the difference between the closest number in the first 6 and the 7th. The example below would produce 0.
s <- c(1,2,3,4,5,6,3)
s <- t(s)
s <- as.data.frame(s)
s
Any help is deeply appreciated. I apologize for not including any attempted code, but nothing I have tried has come close.
How about this?
which.min( abs(s[1, 1:6] - s[1, 7]))
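This gives the position of the closest of the first six responses; use min() instead of which.min() to get the absolute difference itself. I'm assuming you want it generalized somehow, but you'd need to provide more info for that. Or just run it through a loop :-)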
EDIT: added the loop from the comment and changed exactly 2 tiny things.
s <- c(1,2,3,4,5,6,3)
t <- c(1,2,3,4,5,6,7)
p <- c(1,2,3,4,5,6,2)
s <- data.frame(s,t,p)
k <- t(s)
k <- as.data.frame(k)
k$t <- NA ### need to initialize the column
for(i in 1:3){
## need to refer to each line of k when populating the t column
k[i,]$t <- which.min(abs(k[i, 1:6] - k[i, 7])) }
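The loop above stores the position of the closest response. A loop-free sketch with apply() can return both that position and the absolute difference the question asks for (0 for the first example). The three rows are the example vectors from the edit, written here as a plain numeric matrix:

```r
# One respondent per row: six responses, then the 7th (target) response
k <- rbind(c(1, 2, 3, 4, 5, 6, 3),
           c(1, 2, 3, 4, 5, 6, 7),
           c(1, 2, 3, 4, 5, 6, 2))

# Position (1..6) of the response closest to the 7th
closest_pos <- apply(k, 1, function(row) which.min(abs(row[1:6] - row[7])))

# Absolute difference between that closest response and the 7th
closest_diff <- apply(k, 1, function(row) min(abs(row[1:6] - row[7])))
```

apply() converts a data frame to a matrix first, so using it on the survey data assumes all seven response columns are numeric.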

Matrix of expected values from matrix of observed using a loop

I am trying to figure out how to use a for loop to create a matrix of expected values. It should be able to handle a matrix of any size. This is all I've been able to come up with so far:
for(i in 1:obsv){
for(j in 1:obsv){
obsv[i,j]<-(sum(obsv[i,])*sum(obsv[,j]))/sum(obsv)
}
}
##obsv is the name of the matrix of observed values
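Your loop is wrong; see below. The main error is that you need to loop over 1:nrow(obsv) and 1:ncol(obsv), not 1:obsv. Also, writing the results back into obsv changes the row and column sums used by later iterations, so the code below modifies a copy instead.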
I will use a fake matrix, since you haven't posted an example dataset.
obsv <- matrix(1:25, ncol = 5)
obsv2 <- obsv # modify a copy
for(i in 1:nrow(obsv)){
for(j in 1:ncol(obsv)){
obsv2[i, j] <- sum(obsv[i, ])*sum(obsv[, j])/sum(obsv)
}
}
Now, the above code can be greatly simplified. A one-liner will do it.
obsv3 <- rowSums(obsv) %*% t(colSums(obsv))/sum(obsv)
identical(obsv2, obsv3)
#[1] TRUE
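For a matrix of any size, the one-liner can be wrapped in a function. outer() is an equivalent way to write the row-sum by column-sum product, and for a contingency table the result should match the expected counts reported by chisq.test(). The function name expected_counts is hypothetical:

```r
# Expected counts under independence: E[i, j] = rowSum[i] * colSum[j] / total
expected_counts <- function(obsv) {
  outer(rowSums(obsv), colSums(obsv)) / sum(obsv)
}

obsv <- matrix(1:25, ncol = 5)
e1 <- expected_counts(obsv)

# Cross-check against chisq.test(); suppressWarnings() hides the
# "approximation may be incorrect" warning caused by small expected counts
e2 <- suppressWarnings(chisq.test(obsv))$expected
```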

How to extract a parameter from a list of functions in a loop

I have a large data set, and I want to apply several functions at once and extract one parameter from each.
The test dataset:
testdf <- data.frame(vy = rnorm(60), vx = rnorm(60) , gvar = rep(c("a","b"), each=30))
I first defined a list of functions:
require(fBasics)
normfuns <- list(jarqueberaTest=jarqueberaTest, shapiroTest=shapiroTest, lillieTest=lillieTest)
Then a function to perform the tests for each level of the grouping variable:
mynormtest <- function(d) {
norm_test <- res_reg <- list()
for (i in c("a","b")){
res_reg[[i]] <- residuals(lm(vy~vx, data=d[d$gvar==i,]))
norm_test[[i]] <- lapply(normfuns, function(f) f(res_reg[[i]]))
}
return(norm_test)
}
mynormtest(testdf)
I obtain a list of test summaries for each grouping variable.
However, I am interested in getting only the parameter "STATISTIC" and I did not manage to find out how to extract it.
You can obtain the value stored as "STATISTIC" in the output of the various tests with
res_list <- mynormtest(testdf)
res_list$a$shapiroTest@test$statistic
res_list$a$jarqueberaTest@test$statistic
res_list$a$lillieTest@test$statistic
And correspondingly for group b:
res_list$b$shapiroTest@test$statistic
res_list$b$jarqueberaTest@test$statistic
res_list$b$lillieTest@test$statistic
Hope this helps.
Concerning your function fgetparam, I think it is a nice starting point. Here's my suggestion with a few minor modifications:
getparams2 <- function(myp) {
  m <- matrix(NA, nrow = length(myp), ncol = 3)
  for (i in 1:length(myp)) {
    # one row per group, one column per test
    m[i,] <- sapply(1:3, function(x) myp[[i]][[x]]@test$statistic)
  }
  return(m)
}
This function is a minor generalization in the sense that it allows for an arbitrary number of groups, while in your case this was fixed to two groups, a and b. The code can certainly be shortened further, but it might then also become somewhat more cryptic. I believe that when developing code it helps to strike a balance between efficiency and compactness on the one hand and readability on the other.
Edit
As pointed out by @akrun and @Roland, the function getparams2() can be written in a much shorter and more elegant form. One possibility is
getparams2 <- function(myp) {
  # byrow = TRUE: rapply() lists all statistics of group a first, then group b
  matrix(unname(rapply(myp, function(x) x@test$statistic)), ncol = 3, byrow = TRUE)}
Another great alternative is
getparams2 <- function(myp){t(sapply(myp, sapply, function(x) x@test$statistic))}
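The fBasics tests return S4 objects, hence the @test$statistic access; base R's tests from the stats package return S3 "htest" lists, whose statistic is read with $statistic instead. As a sketch of the same nested-sapply pattern without fBasics, assuming shapiro.test() and a Kolmogorov-Smirnov test on standardised residuals are acceptable stand-ins for the fBasics tests (the names normfuns_base and stat_mat are hypothetical):

```r
set.seed(1)
testdf <- data.frame(vy = rnorm(60), vx = rnorm(60),
                     gvar = rep(c("a", "b"), each = 30))

# Stand-in test functions from base R (stats); each returns an "htest" list
normfuns_base <- list(
  shapiro = shapiro.test,
  ks      = function(x) ks.test(as.numeric(scale(x)), "pnorm")
)

# Run every test on the regression residuals of each group
res_list <- lapply(split(testdf, testdf$gvar), function(d) {
  res <- residuals(lm(vy ~ vx, data = d))
  lapply(normfuns_base, function(f) f(res))
})

# Groups in rows, tests in columns; unname() drops labels such as "W"
stat_mat <- t(sapply(res_list, sapply, function(h) unname(h$statistic)))
```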

Using a for loop to append to an empty object in R

This may seem like a novice question, but I'm struggling to understand why this doesn't work:
answer = c()
for(i in 1:8){
answer = c()
knn.pred <- knn(data.frame(train_week$Lag2), data.frame(test_week$Lag2), train_week$Direction, k=i)
test <- mean(knn.pred == test_week$Direction)
append(answer, test)
}
I want the results for k = 1 to 8 in a vector called answer. It should loop 8 times, so ideally my output would be a vector with 8 numbers. When I run the for loop, I only get the final answer, meaning it isn't appending. Any help would be appreciated; sorry for the novice question, I'm really trying to learn R.
First of all, please include a reproducible example in your question next time. See "How to make a great R reproducible example".
Second, you set answer to c() in the first line of your loop, so it is reset in every iteration.
Third, append, like almost all functions in R, does not modify its argument in place; it returns a new object. So the correct code is:
library(class) # provides knn()
answer = c()
for (i in 1:8){
knn.pred <- knn(data.frame(train_week$Lag2), data.frame(test_week$Lag2),
train_week$Direction, k = i)
test <- mean(knn.pred == test_week$Direction)
answer <- append(answer, test)
}
While this wasn't the question, I can't help noting that this is a very inefficient way of creating vectors and lists; it is an anti-pattern. If you know the length of the result vector, allocate it up front and set its elements, e.g.:
answer = numeric(8)
for (i in 1:8){
knn.pred <- knn(data.frame(train_week$Lag2), data.frame(test_week$Lag2),
train_week$Direction, k = i)
test <- mean(knn.pred == test_week$Direction)
answer[i] <- test
}
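When the number of iterations is known, sapply() or vapply() can build the result vector directly, without append() or manual preallocation. Since train_week and test_week are not available here, a hypothetical stand-in function score_k() replaces the knn() accuracy computation; all three versions give the same vector:

```r
# Hypothetical stand-in for mean(knn(..., k = i) == test_week$Direction)
score_k <- function(i) 1 / (1 + i)

# 1. Growing with append(): works, but builds a new vector on every iteration
answer1 <- c()
for (i in 1:8) answer1 <- append(answer1, score_k(i))

# 2. Preallocating and filling by index
answer2 <- numeric(8)
for (i in 1:8) answer2[i] <- score_k(i)

# 3. vapply(): also checks that every call returns a single number
answer3 <- vapply(1:8, score_k, numeric(1))
```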
You are overwriting answer inside the for loop. Try removing that line. Also, append doesn't act on its arguments directly; it returns the modified vector. So you need to assign it.
library(class) # provides knn()
answer <- c()
for(i in 1:8){
knn.pred <- knn(data.frame(train_week$Lag2), data.frame(test_week$Lag2), train_week$Direction, k=i)
test <- mean(knn.pred == test_week$Direction)
answer <- append(answer, test)
}
