I'm quite new to R and failed to google the answer. The question is: how can I tell R to treat a character string value as part of the code, the way the SAS resolve() function does?
Say I have a data frame containing numeric columns V1, ..., Vn and exactly n rows. I wish to sum up all the 'diagonal' elements V = V1[1] + V2[2] + ... + Vn[n], but n is too large for manual summation (below, n=2 for simplicity).
I'm trying to put the strings "dat$V1[1]", "dat$V2[2]" in a character variable C and then extract the corresponding numerical value (all in a loop step):
> dat <- data.frame(V1 = c(2,3), V2 = c(7,11))
> dat
V1 V2
1 2 7
2 3 11
V = 0
for(i in 1:nrow(dat))
{
C = paste('dat$V',format(i,trim=TRUE),'[',format(i,trim=TRUE),']',
sep="" )
f = Xfun(C)
V = V + f
}
What should be used instead of Xfun? I've tried as.formula(), asOneSidedFormula(), get("...") and some others, but the problem is that dat$V1 is not an object in its own right:
> exists("dat")
[1] TRUE
> exists("dat$V1")
[1] FALSE
Your help is much appreciated.
If you just want to sum the diagonal elements of a square matrix, just do
dat <- data.frame(V1 = c(2,3), V2 = c(7,11))
sum(diag(as.matrix(dat)))
If you want to evaluate a text string in R, read up a bit on eval, or run ?eval in R.
For your problem, you can do this:
dat <- data.frame(V1 = c(2,3), V2 = c(7,11))
dat
V = 0
for(i in 1:nrow(dat))
{
C = paste('dat$V',format(i,trim=TRUE),'[',format(i,trim=TRUE),']',
sep="" )
f = eval(parse(text=C))
V = V + f
}
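A minimal sketch of why get() and exists() fail here while eval(parse()) works, reusing the small dat from the question: exists() and get() look up a single object by name, so the whole string "dat$V2[2]" is not found, whereas parse() first turns the text into an expression that eval() can run. The last line shows the same lookup with [[ and a constructed column name, which needs no parsing at all.

```r
dat <- data.frame(V1 = c(2, 3), V2 = c(7, 11))
code <- "dat$V2[2]"
# exists()/get() look for an object literally named "dat$V2[2]", which does not exist
exists(code)                # FALSE
# parse() turns the text into an unevaluated expression; eval() then runs it
expr <- parse(text = code)
eval(expr)                  # 11
# Plain column access by a constructed name, no text evaluation needed
dat[[paste0("V", 2)]][2]    # 11
```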
To sum up:
a) the correct answer using text parsing and evaluation
V = 0
for(i in 1:nrow(dat))
{ C = paste('dat$V',as.character(i),'[',as.character(i),']',
sep='' )
V = V + eval(parse(text=C))
}
b) the correct approach, especially for a data frame: no text evaluation is needed, because the data frame columns are addressed directly by name
V = 0
for(i in 1:nrow(dat))
{ col = paste('V',as.character(i),
sep='')
V = V + dat[i,col]
}
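If the columns really are V1, ..., Vn in that order (an assumption: the sketch indexes by position, not by name) and all numeric, the loop can also be replaced by a single indexing step. A two-column index matrix picks element (i, i) for every i; the vapply() line is an alternative that never converts the data frame to a matrix.

```r
dat <- data.frame(V1 = c(2, 3), V2 = c(7, 11))
n <- nrow(dat)
# Each row of the index matrix is a (row, column) pair: (1, 1), (2, 2), ...
diag_vals <- as.matrix(dat)[cbind(seq_len(n), seq_len(n))]
diag_vals        # 2 11
sum(diag_vals)   # 13
# Column-by-column alternative: take element i of column i
sum(vapply(seq_len(n), function(i) dat[[i]][i], numeric(1)))   # 13
```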
I have trouble using the grep function within a for loop.
In my data set, I have several columns whose names differ only in the last 5-6 letters. With the loop I want to apply the same functions to all 16 situations.
Here is my code:
situations <- c("KKKTS", "KKKNL", "KKDTS", "KKDNL", "NkKKTS", "NkKKNL", "NkKDTS", "NkKDNL", "KTKTS", "KTKNL", "KTDTS", "KTDNL", "NkTKTS", "NkTKNL", "NkTDTS", "NkTDNL")
View(situations)
for (i in situations[1:16]) {
## Trust Skala
a <- vector("numeric", length = 1L)
b <- vector("numeric", length = 1L)
a <- grep("Tru_1_[i]", colnames(cleandata))
b <- grep("Tru_5_[i]", colnames(cleandata))
cleandata[, c(a:b)] <- 8-cleandata[, c(a:b)]
attach(cleandata)
cleandata$scale_tru_[i] <- (Tru_1_[i] + Tru_2_[i] + Tru_3_[i] + Tru_4_[i] + Tru_5_[i])/5
detach(cleandata)
}
With the grep function I first want to find the column numbers of e.g. Tru_1_KKKTS and Tru_5_KKKTS. Then I want to reverse code the items in those columns. The last part worked without the loop when I manually used grep for every single situation.
Here is the manual version:
# KKKTS
grep("Tru_1_KKKTS", colnames(cleandata)) #29 -> find the index of respective column
grep("Tru_5_KKKTS", colnames(cleandata)) #33
cleandata[,c(29:33)] <- 8-cleandata[c(29:33)] # trust scale ranges from 1 to 7 [8-1/2/3/4/5/6/7 = 7/6/5/4/3/2/1]
attach(cleandata)
cleandata$scale_tru_KKKTS <- (Tru_1_KKKTS + Tru_2_KKKTS + Tru_3_KKKTS + Tru_4_KKKTS + Tru_5_KKKTS)/5
detach(cleandata)
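The loop version finds nothing because, inside a quoted pattern, [i] is not the loop variable: it is a regular-expression character class that matches the single letter "i". In the same way, Tru_1_[i] inside attach() indexes an object called Tru_1_ instead of building a column name. A small sketch with a few hypothetical column names shows the difference; the pattern has to be built from the loop variable with paste0().

```r
situation <- "KKKTS"
# Hypothetical column names standing in for colnames(cleandata)
cols <- c("Tru_1_KKKTS", "Tru_2_KKKTS", "Tru_1_KKKNL")
# "[i]" is a character class, so this pattern means "Tru_1_" followed by the letter i
grep("Tru_1_[i]", cols)                  # integer(0): nothing matches
# Building the pattern from the loop variable finds the intended column
grep(paste0("Tru_1_", situation), cols)  # 1
```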
You can do:
Mean5 <- function(sit) {
cnames <- paste0("Tru_", 1:5, "_", sit)
rowMeans(cleandata[cnames])
}
cleandata[, paste0("scale_tru_", situations)] <- sapply(situations, FUN=Mean5)
How about something like this? It's a bit more compact, and you don't have to use attach.
situations <- c("KKKTS", "KKKNL", "KKDTS", "KKDNL", "NkKKTS", "NkKKNL", "NkKDTS", "NkKDNL", "KTKTS", "KTKNL", "KTDTS", "KTDNL", "NkTKTS", "NkTKNL", "NkTDTS", "NkTDNL")
for (i in situations[1:16]) {
cols <- paste("Tru", 1:5, i, sep = "_")
result <- paste("scale_tru" , i, sep = "_")
cleandata[cols] <- 8 - cleandata[cols]
cleandata[result] <- rowMeans(cleandata[cols])
}
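To check this loop without the real data, here is a sketch on a made-up cleandata: two situations, three respondents, and every item answered 1, 4 and 7 (all values invented for illustration). Reverse coding maps each answer x to 8 - x, so the mean of five identical items is simply 8 - x, i.e. 7, 4, 1.

```r
situations <- c("KKKTS", "KKKNL")
# Hypothetical data: every Tru_k_<situation> item holds the answers 1, 4, 7
cleandata <- data.frame(id = 1:3)
for (s in situations) {
  for (k in 1:5) {
    cleandata[[paste("Tru", k, s, sep = "_")]] <- c(1, 4, 7)
  }
}
for (i in situations) {
  cols <- paste("Tru", 1:5, i, sep = "_")
  result <- paste("scale_tru", i, sep = "_")
  cleandata[cols] <- 8 - cleandata[cols]         # reverse code the 1-7 items
  cleandata[result] <- rowMeans(cleandata[cols])  # mean of the five items
}
cleandata$scale_tru_KKKTS   # 7 4 1
```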
I took for granted that by a:b you mean all the columns from Tru_1 to Tru_5, i.e. that the columns in between are named Tru_2 to Tru_4.
situations <- c("KKKTS", "KKKNL", "KKDTS", "KKDNL", "NkKKTS", "NkKKNL", "NkKDTS", "NkKDNL", "KTKTS", "KTKNL", "KTDTS", "KTDNL", "NkTKTS", "NkTKNL", "NkTDTS", "NkTDNL")
# constructor for column names
get_col_names <- function(part) paste("Tru", 1:5, part, sep="_")
for (situation in situations) {
# revert the values in the columns in situ
cleandata[, get_col_names(situation)] <- 8 - cleandata[, get_col_names(situation)]
# and calculate the average
subdf <- cleandata[, get_col_names(situation)]
cleandata[, paste0("scale_tru_", situation)] <- rowSums(subdf)/ncol(subdf)
}
By the way, you call it "scale", but your code shows an average/mean calculation (scale without centering).
More newbie questions... I am trying to understand why rollapply is turning all my columns to strings. Suppose I have this:
> library(zoo)
> df <- data.frame(col1=c(1,2,3,4),
col2=c("a","b","c","d"),
col3=c("!","#","#","$"),
stringsAsFactors = FALSE)
> v <- zoo(df, toupper(df$col2))
> v
col1 col2 col3
A 1 a !
B 2 b #
C 3 c #
D 4 d $
And then I run rollapply:
> rollapply(v, 2, by.column = F, function(x) {
+ sum(x[,"col1"])
+ })
Error in sum(x[, "col1"]) : invalid 'type' (character) of argument
Why is col1 now a character? and how do I fix it so I get a slice of my original zoo object in each window?
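The conversion happens when the zoo object is created, not inside rollapply: zoo() converts the data frame to a matrix, and a matrix holds a single type, so a data frame with character columns becomes an all-character matrix. The sketch below shows that coercion with base R's as.matrix() and then rolls over row indexes of the data frame itself (width 2, as in the question), which keeps col1 numeric. The names m, width, starts and sums are just for illustration.

```r
df <- data.frame(col1 = c(1, 2, 3, 4),
                 col2 = c("a", "b", "c", "d"),
                 col3 = c("!", "#", "#", "$"),
                 stringsAsFactors = FALSE)
# A matrix has one type for all cells, so the numbers become strings
m <- as.matrix(df)
typeof(m)   # "character"
# Roll over row indexes instead, slicing the original data frame in each window
width <- 2
starts <- seq_len(nrow(df) - width + 1)
sums <- sapply(starts, function(s) sum(df[s:(s + width - 1), "col1"]))
sums        # 3 5 7
```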
I rolled my own rollapply function based on some reading of other posts on SO. It passes each window's indexes into the data (i.e. the zoo object) to FUN, rather than the data itself:
rollapply.list <- function(data, width, FUN) {
len <- NROW(data)
add <- rep(0:(len-width),each=width)
lst <- rep(1:(width),len-width+1)
seq.list <- split(lst+add, add)
lapply(seq.list, FUN)
}
and then use the indexes on the original data, like this:
rollapply.list(data=v, width=2, FUN=function(x) {
slice <- v[x] #slice out indexes from the original zoo object
...
})
I would like to evaluate different indexes in a for loop.
These indexes have different formulas, and they do not always need to be evaluated.
For instance,
my indices to evaluate might be:
a=1
b=2
c=5
d=8
IDX1=function(a,b) {result=a+b}
IDX2=function(c,b) {result=c+b}
IDX3=function(d,b) {result=d+b-c}
IDX4=function(a,d) {result=a+d+b+c}
The formulas don't really matter.
In a data frame I have the iteration number and the indices I need to evaluate at each loop (let's say that I have to evaluate 2 indices for each iteration):
head=c("iter","IndexA","IndexB")
r1=c(1,"IDX1","IDX2")
r2=c(2,"IDX3","IDX4")
r3=c(3,"IDX1","IDX4")
df=as.data.frame(rbind(head,r1,r2,r3))
What I would like to do is, within the loop, evaluate the respective 2 indices for each iteration, automatically calling the right formula and feeding it the right arguments:
iter 1: IndexA = IDX1(args) = 3; IndexB = IDX2(args) = 7
iter 2: IndexA = IDX3(args) = 5; IndexB = IDX4(args) = 16
iter 3: IndexA = IDX1(args) = 3; IndexB = IDX4(args) = 16
Please do not answer with "just run all the functions and recall the needed result in the loop".
I'm working with big matrices and memory is indeed a problem. I need to evaluate the functions within the loop to reduce memory usage.
I believe the answer is somewhere in this discussion, but I can't get through it:
How to create an R function programmatically?
Can somebody explain to me:
1. how to build a function that can be programmatically changed in a loop?
2. once I have it, how can I run the formula and get the result I want?
Thanks
You can use a combination of the eval and parse functions to evaluate any string as code. First, you have to construct such a string. For this, you can specify your indexes as character strings, for example IDX1 = "a + b". Then you can get that value by name with get("IDX1").
Try this code:
# Your preparations
a <- 1
b <- 2
c <- 5
d <- 8
IDX1 <- "a + b"
IDX2 <- "c + b"
IDX3 <- "d + b - c"
IDX4 <- "a + d + b + c"
head = c("iter", "IndexA", "IndexB")
r1 = c(1, "IDX1", "IDX2")
r2 = c(2, "IDX3", "IDX4")
r3 = c(3, "IDX1", "IDX4")
df = as.data.frame(rbind(r1, r2, r3))
colnames(df) <- head
# Loop over df with apply
result <- apply(df, 1, function(x){
# Construct call string for further evaluation
IndexA_call <- paste("ia <- ", get(x[2]), sep = "")
IndexB_call <- paste("ib <- ", get(x[3]), sep = "")
# eval each call string
eval(parse(text = IndexA_call))
eval(parse(text = IndexB_call))
x[2:3] <- c(ia, ib)
return(as.numeric(x))
})
result <- t(result)
colnames(result) <- head
print(result)
This gives:
iter IndexA IndexB
r1 1 3 7
r2 2 5 16
r3 3 3 16
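An alternative sketch that avoids eval(parse()) altogether: keep the index functions in a named list, look each one up by the name stored in the data frame, and call it with do.call(), taking its arguments from a list of values by argument name. Only the two functions named for an iteration are called in that iteration. This is a swapped-in technique, not the approach above; the names idx_funs, vals, plan and call_index are hypothetical, and IDX3 and IDX4 are rewritten to take every variable they use as an explicit argument (in the question they read c and b from the global environment).

```r
a <- 1; b <- 2; c <- 5; d <- 8
# Hypothetical named list of index functions, all inputs passed as arguments
idx_funs <- list(
  IDX1 = function(a, b) a + b,
  IDX2 = function(c, b) c + b,
  IDX3 = function(d, b, c) d + b - c,
  IDX4 = function(a, d, b, c) a + d + b + c
)
vals <- list(a = a, b = b, c = c, d = d)
plan <- data.frame(iter = 1:3,
                   IndexA = c("IDX1", "IDX3", "IDX1"),
                   IndexB = c("IDX2", "IDX4", "IDX4"),
                   stringsAsFactors = FALSE)
# Look a function up by name and pass it only the arguments it declares
call_index <- function(name) {
  f <- idx_funs[[name]]
  do.call(f, vals[names(formals(f))])
}
result <- t(sapply(seq_len(nrow(plan)), function(i) {
  c(iter = plan$iter[i],
    IndexA = call_index(plan$IndexA[i]),
    IndexB = call_index(plan$IndexB[i]))
}))
result
```

Because the arguments are matched by name via formals(), adding a new index only means adding one entry to idx_funs.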
I am trying to build a data frame (df2) based on the following relationship: df1[i,j] = df2[i,j]^2. To do this, I need to solve a system of non-linear equations:
library(nleqslv)
df1 = data.frame(a = c(9,9), b = c(9,9))
df2 = df1
for(i in colnames(df1)){
f = function(x) {df1[i] - x^2}
xstart = c(df2[i])
df2[i] = nleqslv(xstart, f)[[1]]
}
The expected result is:
a b
1 3 3
2 3 3
But i get the following error message:
Error in nleqslv(xstart, f) :
Argument 'x' cannot be converted to numeric!
I'm not sure what causes the problem. Could you give me some advice, please?
Well, I don't know what you are trying to accomplish, but I think the function you defined has to be fixed. You can do it in the following manner, although the answer is still not correct (this f solves x = x^2, not 9 = x^2).
library(nleqslv)
f <- function(x) x - x^2
df1 = data.frame(a = c(9,9), b = c(9,9))
sapply(df1, function(y) nleqslv(y, f)[[1]])
You should instead use sqrt() since it is vectorized.
sqrt(df1)
# a b
# 1 3 3
# 2 3 3
I'm unclear as to why you need such a complex solution for such a simple operation (df2 <- sqrt(df1) would produce your example solution). But if you want to know what's producing that error, it comes down to how R indexes lists.
df1[1] returns a one-column data frame (i.e. a list), whereas df1[[1]] (double brackets) returns the column vector itself. The nleqslv function expects vectors, so all we have to do is modify your existing code to use double brackets instead of single ones:
library(nleqslv)
df1 = data.frame(a = c(9,9), b = c(9,9))
df2 = df1
for(i in colnames(df1)){
f = function(x) {df1[[i]] - x^2}
xstart = c(df2[[i]])
df2[i] = nleqslv(xstart, f)[[1]]
}
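A small base R sketch of the difference between the two kinds of brackets, using the df1 from the question: single brackets keep the data frame wrapper, and turning that wrapped length-2 column into a plain numeric vector fails, which is the same kind of failure the nleqslv error message describes. The variable ok is only there to capture the error.

```r
df1 <- data.frame(a = c(9, 9), b = c(9, 9))
class(df1["a"])     # "data.frame": single brackets keep the wrapper
class(df1[["a"]])   # "numeric": double brackets extract the column
# Coercing the wrapped column to numeric raises an error
ok <- tryCatch({ as.numeric(df1["a"]); TRUE }, error = function(e) FALSE)
ok                  # FALSE
as.numeric(df1[["a"]])   # 9 9
```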
First creating the data:
df2 <- data.frame(a=c(9,9), b=c(9,9))
df1 <- df2
Now, to solve it iteratively, here's the R code:
for(i in 1:nrow(df1)){
for(j in 1:ncol(df1)){
df2[i, j] <- sqrt(df1[i,j])
}
}
df2
This will return:
  a b
1 3 3
2 3 3
You could have used a vectorized solution (df2 <- sqrt(df1)) to achieve the above as well, but the loop above will work for you if you need to solve for it iteratively using a traditional loop.
I have the following data frame and variables:
u0 <- c(1,1,1,1,1)
df <- data.frame (u0)
a = .793
b = 2.426
r = 0.243
q = 1
w = 2
j = 1
z = .314
Using the following loop, I do some calculations and put the results in the first row of my data frame.
while (j<5){
df[q,w] <- df[q, w-1] * (r+j-1)*(b+j-1)*(z) / ((a+b+j-2)*j)
j = j + 1
w = w + 1
}
Now I want to create another loop that does the same calculations for all rows of my data frame (i.e. I need the q variable to vary). I would be thankful if anyone could help me.
You could either do this by putting your while loop inside of a for loop that goes over q, but a more R-tastic way would be to simply define q <- 1:5, and leave the rest of your code as-is. Then df will fill up entirely. I take it in this example you want all rows to be identical?
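A minimal sketch of the q <- 1:5 suggestion, using the variables from the question: with q covering every row, each assignment fills a whole column at once, and assigning to a column number one past the last existing column adds that column to the data frame. Since every row starts from the same u0 value, all five rows end up identical.

```r
u0 <- c(1, 1, 1, 1, 1)
df <- data.frame(u0)
a <- .793; b <- 2.426; r <- 0.243; z <- .314
q <- 1:5   # all rows at once instead of q = 1
w <- 2
j <- 1
while (j < 5) {
  # column w is computed from column w - 1 for every row in q
  df[q, w] <- df[q, w - 1] * (r + j - 1) * (b + j - 1) * z / ((a + b + j - 2) * j)
  j <- j + 1
  w <- w + 1
}
ncol(df)   # 5
```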
Can't you just put it in a for loop?
df <- data.frame (d1=u0, d2=u0+1, d3=u0+2, d4=u0+4, d5=u0+5)
for (q in 1:5) {
j = 1   # reset the counters for every row, otherwise the while loop only runs for the first q
w = 2
while (j<5){
df[q,w] <- df[q, w-1] * (r+j-1)*(b+j-1)*(z) / ((a+b+j-2)*j)
j = j + 1
w = w + 1
} }
You may want to check the algorithm. It doesn't seem to be doing anything very interesting.