I have another simple R question that hopefully someone can help with. I have a series of data frames with a repetitive name structure, and I would like to loop through them and perform some analysis. Here is a hardcoded example of what I want to do, using some fake data:
#Create some fake data
n1 = c(2, 3, 5, 7)
s1 = c(1, 1, 2, 0)
b1 = c(6, 0, 0, 0)
Tank001.df = data.frame(n1, s1, b1)
n2 = c(1, 2, 4, 6)
s2 = c(2, 2, 0, 0)
b2 = c(8, 9, 10, 0)
Tank002.df = data.frame(n2, s2, b2)
n3 = c(7, 12, 0, 0)
s3 = c(5, 3, 0, 0)
b3 = c(8, 9, 10, 4)
Tank003.df = data.frame(n3, s3, b3)
The first action I would like to automate is the conversion of 0 values to NA. Here is the hardcoded version, but ideally this would be automated for however many Tankxxx.df data frames I have:
#Convert zeros to NA
Tank001.df[Tank001.df==0] <- NA
Tank002.df[Tank002.df==0] <- NA
Tank003.df[Tank003.df==0] <- NA
Finally I would like to complete a series of queries of the data, a simple example of which might be the number of values smaller than 5 in each dataframe:
#Return the number of values smaller than 5
Tank001.less.than.5 <- numeric(length(Tank001.df))
for (i in 1:(length(Tank001.df))) {Tank001.less.than.5[i] <- sum(Tank001.df[[i]] < 5,na.rm=TRUE)}
Tank002.less.than.5 <- numeric(length(Tank002.df))
for (i in 1:(length(Tank002.df))) {Tank002.less.than.5[i] <- sum(Tank002.df[[i]] < 5,na.rm=TRUE)}
Tank003.less.than.5 <- numeric(length(Tank003.df))
for (i in 1:(length(Tank003.df))) {Tank003.less.than.5[i] <- sum(Tank003.df[[i]] < 5,na.rm=TRUE)}
Ideally I would also like to know how to write the results of such simple calculations to a new dataframe. In this case for example Less.than.5$TankXXX etc.
Any help would be greatly appreciated.
Create a list of your data.frames and use a combination of lapply and sapply as follows:
TankList <- list(Tank001.df, Tank002.df, Tank003.df)
lapply(TankList, function(x) {
x[x == 0] <- NA
sapply(x, function(y) sum(y < 5, na.rm = TRUE))
})
# [[1]]
# n1 s1 b1
# 2 3 0
#
# [[2]]
# n2 s2 b2
# 3 2 0
#
# [[3]]
# n3 s3 b3
# 0 1 1
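To avoid typing each data frame name, and to collect the counts in a single data frame as the question asks, something along these lines may work. It is only a sketch: it assumes every object in the current environment whose name matches Tank<digits>.df is a data frame with the same number of columns, and the names env, tank_names and count_small are illustrative rather than part of the original answer.

```r
# Fake data from the question
Tank001.df <- data.frame(n1 = c(2, 3, 5, 7), s1 = c(1, 1, 2, 0), b1 = c(6, 0, 0, 0))
Tank002.df <- data.frame(n2 = c(1, 2, 4, 6), s2 = c(2, 2, 0, 0), b2 = c(8, 9, 10, 0))
Tank003.df <- data.frame(n3 = c(7, 12, 0, 0), s3 = c(5, 3, 0, 0), b3 = c(8, 9, 10, 4))

# Collect every object named like Tank001.df into a named list
env <- environment()
tank_names <- ls(envir = env, pattern = "^Tank[0-9]+\\.df$")
TankList <- mget(tank_names, envir = env)

# Per data frame: zeros become NA, then count values below 5 in each column
count_small <- function(x) {
  x[x == 0] <- NA
  unname(colSums(x < 5, na.rm = TRUE))
}

# One column of counts per tank, e.g. Less.than.5$Tank001
Less.than.5 <- as.data.frame(lapply(TankList, count_small))
names(Less.than.5) <- sub("\\.df$", "", names(TankList))
Less.than.5
```

Each row of Less.than.5 corresponds to a column position (first, second, third) in the tanks, since the column names differ between data frames.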
This also works with a single lapply and colSums:
l <- list(Tank001.df, Tank002.df, Tank003.df) # create a list
lapply(l, function(x) colSums("is.na<-"(x, !x) < 5, na.rm = TRUE))
# [[1]]
# n1 s1 b1
# 2 3 0
#
# [[2]]
# n2 s2 b2
# 3 2 0
#
# [[3]]
# n3 s3 b3
# 0 1 1
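The "is.na<-"(x, !x) call above can look cryptic. As far as I can tell it is just the function form of is.na(x) <- !x, where !x is TRUE exactly where x is 0. The small data frame below is made up for illustration and is not from the question.

```r
# Hypothetical toy data, not from the question
x <- data.frame(a = c(2, 0, 7), b = c(0, 1, 3))

# Usual replacement form: mark zeros as NA
y <- x
is.na(y) <- !y            # same effect as y[y == 0] <- NA

# Function-call form used in the one-liner; returns the modified copy
z <- "is.na<-"(x, !x)

identical(y, z)

# Counting values below 5 per column, ignoring the new NAs
small <- colSums(z < 5, na.rm = TRUE)
small
```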
I really need some help to write a recursion in R.
The function that I want changes a certain observation according to a set of comparisons between different rows in a data frame, which I shall call g. One of these comparisons depends on the previous value of this same observation.
Suppose first that I want to update the value of column index, row i in my data df in the following way:
j <- 1:4
g <- (df$dom[i] > 0 &
abs(df$V2009[i] - df$V2009[j]) <= w) |
df$index[i] == df$index[j]
df$index[i] <- ifelse(any(g), which(g)[[1]], df$index[[i]])
The thing is, the object w is actually a list:
w = list(0, 1, 2, df$age[i])
So, as you can see, I want to create a function foo() that updates df$index iteratively. It changes it by looping through w and comparisons depend on updated values.
Here is some data:
df <- data.frame(dom = c(0, 0, 6, 6),
V2009 = c(9, 11, 9, 11),
index = c(1, 2, 1, 2),
age = c(2, 2, 2, 2))
I am not sure if a recursive function is actually needed or if something like reduce or map would do it.
Thank you!
The following function uses a double for loop to change the values of column index according to the condition defining g. It accepts a data.frame as input and returns the updated data.frame.
foo <- function(x){
change_index <- function(x, i, w){
j <- seq_len(nrow(x))
(x$dom[i] > 0 & abs(x$V2009[i] - x$V2009[j]) <= w) |
x$index[i] == x$index[j]
}
for(i in seq_len(nrow(x))){
W <- list(0, 1, 2, x$age[i])
for(w in W){
g <- change_index(x, i, w)
if(any(g)) x$index[i] <- which(g)[1]
}
}
x
}
foo(df)
# dom V2009 index age
#1 0 9 1 2
#2 0 11 2 2
#3 6 9 1 2
#4 6 11 1 2
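Since the question asks whether something like reduce would do, here is a sketch of the same double loop written with base R's Reduce. It is meant to reproduce foo() step for step, not to be faster. The names step and foo_reduce are hypothetical, and each step is a (row, threshold) pair applied to the running data frame.

```r
df <- data.frame(dom = c(0, 0, 6, 6),
                 V2009 = c(9, 11, 9, 11),
                 index = c(1, 2, 1, 2),
                 age = c(2, 2, 2, 2))

# One update: row i, threshold w, applied to the current state x
step <- function(x, iw) {
  i <- iw$i
  w <- iw$w
  j <- seq_len(nrow(x))
  g <- (x$dom[i] > 0 & abs(x$V2009[i] - x$V2009[j]) <= w) |
    x$index[i] == x$index[j]
  if (any(g)) x$index[i] <- which(g)[1]
  x
}

foo_reduce <- function(x) {
  # Flat list of (i, w) pairs in the same order as the nested for loops
  steps <- unlist(
    lapply(seq_len(nrow(x)), function(i) {
      lapply(list(0, 1, 2, x$age[i]), function(w) list(i = i, w = w))
    }),
    recursive = FALSE
  )
  # Thread the data frame through every step
  Reduce(step, steps, x)
}

res <- foo_reduce(df)
res
```

Because every step sees the data frame left by the previous one, later comparisons use the updated index values, which is the dependency the question describes.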
One can define w inside a function and use lexical scoping (closure).
Using your instructions, the function index_value calculates for any given i the index value.
correct_index_col returns the corrected df.
df <- data.frame(dom = c(0, 0, 6, 6),
V2009 = c(9, 11, 9, 11),
index = c(1, 2, 1, 2),
age = c(2, 2, 2, 2))
index_value <- function(df, i) {
j <- nrow(df)
w <- c(0, 1, 2, df$age[i])
g <- (df$dom[i] > 0 & abs(df$V2009[i] - df$V2009[j]) <= w) |
df$index[i] == df$index[j]
ifelse(any(g), which(g)[[1]], df$index[[i]])
}
correct_index_col <- function(df) {
indexes <- Vectorize(function(i) {
index_value(df, i)
})
df$index <- indexes(1:nrow(df))
df
}
# > correct_index_col(df)
# dom V2009 index age
# 1 0 9 1 2
# 2 0 11 1 2
# 3 6 9 3 2
# 4 6 11 1 2
#
If you want to really update (mutate) your df, then you have to do
df <- correct_index_col(df).
Here is an attempt of my own. I guess I figured out a way to use recursion over mutate:
library(dplyr)  # mutate(), n(), cur_data(), %>%
library(purrr)  # map_dbl()
test <- function(i, df, k){
j <- 1:nrow(df)
w <- list(0, 1, 2, df$age[i])
g <- (df$dom[i] > 0 & abs(df$V2009[i] - df$V2009[j]) <= w[[k]]) |
df$index[i] == df$index[j]
l <- ifelse(any(g), which(g)[1], df$index[i])
return(l)
}
loop <- function(data,
k = 1) {
data <- data %>%
mutate(index = map_dbl(seq(n()),
~ test(.x, df = cur_data(), k)))
if (k == 4) {
return(data)
} else {
return(loop(data, k + 1))
}
}
df %>% loop()
I welcome any comments, especially if this is inefficient for large datasets.
I need to find which elements in a new vector (vb) have been added relative to another vector (va). For example, if there is only one "2" in va but two "2"s in vb, then one "2" has been added.
The comment in the code below shows what is sought.
va <- c(1, 2) # Original vector
vb <- c(1, 2) # NA or NULL
vb <- c(2, 2) # 2
vb <- c(1, 1) # 1
vb <- c(1) # NA or NULL
vb <- c(2) # NA or NULL
vb <- c(3, 3) # c(3, 3)
I've tried match, union, intersect, %in%, etc. but can't get it to work to consider also multiple instances. This feels irritatingly simple...
The following reproduces your expected outcome. Just as an honest heads-up, I'm not really happy with my solution, this seems oddly convoluted:
f <- function(a, b) {
a <- as.data.frame(unclass(rle(a)));
b <- as.data.frame(unclass(rle(b)));
t <- merge(a, b, by = "values", all = TRUE);
t$lengths.x[is.na(t$lengths.x)] <- 0;
t$diff <- t$lengths.y - t$lengths.x;
t <- t[!is.na(t$diff) & t$diff > 0, ];
return(rep(t$values, t$diff));
}
va <- c(1, 2);
vb <- c(1, 2) # NA or NULL
f(va, vb);
#numeric(0)
vb <- c(2, 2) # 2
f(va, vb);
#[1] 2
vb <- c(1, 1) # 1
f(va, vb);
#[1] 1
vb <- c(1) # NA or NULL
f(va, vb);
#numeric(0)
vb <- c(2) # NA or NULL
f(va, vb);
#numeric(0)
vb <- c(3, 3) # c(3, 3)
f(va, vb);
#[1] 3 3
Explanation: I'm making use of rle to compare the lengths (level of duplicity) of different entries in va and vb; then report only those that are not already in va.
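One caveat worth keeping in mind: rle counts consecutive runs, so it only gives per-value counts when equal values sit next to each other, as they do in the examples above. A hedged sketch of the difference, with a hypothetical helper run_counts that sorts first:

```r
# rle on unsorted input reports three runs of length 1
unsorted <- rle(c(2, 1, 2))
unsorted$lengths

# Sorting first turns run lengths into per-value counts
run_counts <- function(v) as.data.frame(unclass(rle(sort(v))))
rc <- run_counts(c(2, 1, 2))
rc
```

If the inputs may arrive unsorted, sorting inside f before calling rle would be one way to keep the merge on "values" unambiguous.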
Update
Here is a much cleaner method using a recursive function.
f <- function(a, b) {
if (length(b) == 0) return(NULL);
if (length(a) == 0) return(b); # nothing left to match: all of b was added
m <- data.frame(idx.a = 1:length(a), idx.b = match(a, b));
m <- m[complete.cases(m), ];
# Here is the recursive call
if (nrow(m) > 0) f(a[-m$idx.a[1]], b[-m$idx.b[1]]) else b;
}
va <- c(1, 2) # Original vector
f(va, c(1, 2));
#NULL
f(va, c(2, 2));
#[1] 2
f(va, c(1, 1));
#[1] 1
f(va, c(1));
#NULL
f(va, c(2));
#NULL
f(va, c(3, 3));
#[1] 3 3
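Another way to think about the problem, offered only as a sketch and not as part of the answer above: number each occurrence of a value (first 2, second 2, and so on) and keep the elements of vb whose (value, occurrence) pair does not exist in va. The function names added and occ are hypothetical, and the sketch assumes both vectors are non-empty.

```r
added <- function(a, b) {
  # occ(v)[i] is k if v[i] is the k-th occurrence of its value in v
  occ <- function(v) ave(seq_along(v), v, FUN = seq_along)
  key_a <- paste(a, occ(a))
  key_b <- paste(b, occ(b))
  # Keep elements of b whose (value, occurrence) pair is absent from a
  b[!key_b %in% key_a]
}

va <- c(1, 2)
added(va, c(1, 2))
added(va, c(2, 2))
added(va, c(3, 3))
added(va, c(1, 2, 3))
```

This returns a zero-length vector rather than NULL when nothing was added.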
Not the most elegant, but it works for all your cases:
Diff_frequency <- function(va,vb){
df <- merge(as.data.frame(table(va)), as.data.frame(table(vb)), by.x="va", by.y="vb", all=T)
df$Freq.x[is.na(df$Freq.x)] <- 0
df$Dif <- df$Freq.y - df$Freq.x
df$Dif[is.na(df$Dif) | df$Dif < 0] <- 0
return(rep(as.numeric(as.character(df[,1])), df$Dif))
}
Diff_frequency(va,vb)
Examples of output:
va=c(1,1,1,2,2,2,3)
vb=c(1,1,4,4,2,2,5)
Diff_frequency(va,vb)
[1] 4 4 5
va=c(1,1,1,2,2,2,3)
vb=c(1,1,1,1,2,2,2,3,3,5)
Diff_frequency(va,vb)
[1] 1 3 5
va=c(1,1,1,2,2,2,3)
vb=c(1,1,2,3)
Diff_frequency(va,vb)
numeric(0)
I have a dataframe like this
sample <- data.frame(x1=c(1,2),y1=c(2,1), x2=c(2,4),y2=c(3,4),x3=c(5,2),y3=c(1,6))
How can I operate pairwise (sum up x1&y1, x2&y2, x3&y3, 2 columns at a time) to create 3 new columns sum1, sum2, sum3? Thanks
Method 1
Since it is pairwise, you can do it with mapply.
Create two vectors referring to the pairs
p1 <- seq(1, 6, by = 2)
p2 <- seq(2, 6, by = 2)
Then use mapply to apply pairwise summation for the columns desired:
mapply(x = p1, y = p2, function(x, y) sample[[x]] + sample[[y]])
Result:
[,1] [,2] [,3]
[1,] 3 5 6
[2,] 3 8 8
Method 2
I also like to use the packages dplyr and wrapr in conjunction if you need to output the pairwise operation in the sample table.
require(dplyr)
require(wrapr)
newcols <- paste0(names(sample)[seq(1, 6, by = 2)], names(sample)[seq(2, 6, by = 2)])
for (i in c(1:3)) {
# pair i uses columns 2*i - 1 (x) and 2*i (y)
wrapr::let(list(RES = newcols[i],
COL1 = names(sample)[2 * i - 1],
COL2 = names(sample)[2 * i]),
sample <- dplyr::mutate(sample, RES = COL1 + COL2))}
sample
x1 y1 x2 y2 x3 y3 x1y1 x2y2 x3y3
1 1 2 2 3 5 1 3 5 6
2 2 1 4 4 2 6 3 8 8
I like to use those packages because I find the result easier to understand. But if you can't install them for any reason, you can do it with base R:
newcols <- paste0(names(sample)[seq(1, 6, by = 2)], names(sample)[seq(2, 6, by = 2)])
for (i in c(1:3)) {
# pair i uses columns 2*i - 1 (x) and 2*i (y)
sample[newcols[i]] <- sample[, names(sample)[2 * i - 1]] + sample[, names(sample)[2 * i]]
}
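If the column names reliably follow the x1/y1, x2/y2 pattern, the pairs can also be matched by name instead of by position, which gives the sum1, sum2, sum3 columns the question asks for. This is a sketch under that naming assumption; x_cols, y_cols and sums are illustrative names.

```r
sample <- data.frame(x1 = c(1, 2), y1 = c(2, 1),
                     x2 = c(2, 4), y2 = c(3, 4),
                     x3 = c(5, 2), y3 = c(1, 6))

# Find the x columns and derive the matching y columns from their names
x_cols <- grep("^x[0-9]+$", names(sample), value = TRUE)
y_cols <- sub("^x", "y", x_cols)

# Element-wise sum of the two column blocks, then rename to sum1, sum2, ...
sums <- sample[x_cols] + sample[y_cols]
names(sums) <- sub("^x", "sum", x_cols)

sample <- cbind(sample, sums)
sample
```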
How about this?
sample <- data.frame(x1=c(1,2),y1=c(2,1),
x2=c(2,4),y2=c(3,4),x3=c(5,2),y3=c(1,6))
sample_new <- data.frame(x1=c(1,2),y1=c(2,1),
x2=c(2,4),y2=c(3,4),x3=c(5,2),y3=c(1,6),sumx1y1=(sample$x1 + sample$y1),
sumx2y2=(sample$x2 + sample$y2),sumx3y3=(sample$x3 + sample$y3))
Sorry if the title is confusing.
I have a list of data frames combined into temp.list. I want to raise every row of a specific column to the power given by the matching value in vec. For example, vec has the values 2, 0, and 3, and I want to compute X2^2, log(X2), and X2^3 respectively, i.e. use log(X2) wherever the value in vec is 0. The last three lines of code are where I have an issue.
M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
M2 <- data.frame(matrix(1:9, nrow = 3, ncol = 3))
M3 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
mlist <- list(M1, M2, M3)
temp.list <-mlist
vec <- c(2,0,3)
The code below works! But I don't want to raise X2^0.
for(i in 1:length(vec)){
temp.list[[i]]$X2 <- temp.list[[i]]$X2^vec[[i]]
}
The code below replaces all rows of X2 by the first value calculated in X2.
for(i in 1:length(vec)){
temp.list[[i]]$X2 <- ifelse(vec[[i]]==0,log(temp.list[[i]]$X2),temp.list[[i]]$X2^vec[[i]])
}
Any other ways of doing this would also be much appreciated.
You could use this:
for(i in 1:length(vec)){
temp.list[[i]]$X2 <- if(vec[[i]]==0) log(temp.list[[i]]$X2)
else temp.list[[i]]$X2^vec[[i]]
}
temp.list
# [[1]]
# X1 X2
# 1 1 9
# 2 2 16
# [[2]]
# X1 X2 X3
# 1 1 1.386294 7
# 2 2 1.609438 8
# 3 3 1.791759 9
# [[3]]
# X1 X2
# 1 1 27
# 2 2 64
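The problem is with the ifelse(...) call, which returns a vector of the same length as its test argument. Here vec[[i]]==0 has length 1, so only the first computed value comes back and is then recycled over every row of X2. The if (...) ... else ... statement instead evaluates the length-one condition once and returns the whole result of whichever branch applies.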
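A small illustration of the difference, followed by a loop-free variant with Map. The vector x and exponent p are made-up values, and transform_x2 is a hypothetical helper name; treat this as a sketch rather than the only way to do it.

```r
x <- c(3, 4)
p <- 2

# ifelse: result has the length of the test (1), so only the first value survives
short <- ifelse(p == 0, log(x), x^p)
length(short)

# if/else: the whole vector from the chosen branch is returned
full <- if (p == 0) log(x) else x^p
length(full)

# Same idea applied to the list of data frames from the question, without a for loop
M1 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
M2 <- data.frame(matrix(1:9, nrow = 3, ncol = 3))
M3 <- data.frame(matrix(1:4, nrow = 2, ncol = 2))
mlist <- list(M1, M2, M3)
vec <- c(2, 0, 3)

transform_x2 <- function(d, p) {
  d$X2 <- if (p == 0) log(d$X2) else d$X2^p
  d
}
out <- Map(transform_x2, mlist, vec)
```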
I have a question on the following issue:
Suppose I have some matrices
A1 <- matrix(runif(rowsA1*T), rowsA1, T)
…
AD <- matrix(runif(rowsAD*T), rowsAD, T)
The number of matrices is variable (but most certainly not too large).
Is there a way to perform the following more efficiently (but in a set-up that allows for a variable number of matrices):
f1 <- function(A1, A2, ..., AD) {
for(i in 1:nrow(A1)) {
for(j in 1:nrow(A2)) {
...
for(d in 1:nrow(AD)) {
ret[i,j,...,d] <- \sum_{t=1}^T (A1[i,t]*A2[j,t]*...*AD[d,t])
}
...
}
}
ret
}
Thank you very much for your help!
Romain
---------------------------------- Edit with example ----------------------------------
A1 <- |a b c| A2 <- |j k l| A3 <- |s t u|
|d e f| |m n o| |v w x|
|g h i| |p q r| |y z ä|
And I want for instance to get the following:
ret[1,1,1] <- a*j*s + b*k*t + c*l*u
ret[2,1,3] <- d*j*y + e*k*z + f*l*ä
Hopefully this makes my point clearer.
---------------------------------- Edit Nov. 26th, 2013 -------------------------------
Hi @flodel. I tried to implement your code, but there seems to be an issue once one has more than three matrices.
Suppose, I have the following matrices
A1 <- matrix(runif(4*3), nrow = 4, ncol = 3)
A2 <- matrix(runif(3*3), nrow = 3, ncol = 3)
A3 <- matrix(runif(2*3), nrow = 2, ncol = 3)
A4 <- matrix(runif(1*3), nrow = 1, ncol = 3)
and plugging them into your code
output.f1 <- f1(A1,A2,A3,A4)
provides the correct number of dimensions
dim(output.f1)
# [1] 4 3 2 1
but the output is full of NAs
output.f1
# , , 1, 1
# [,1] [,2] [,3]
# [1,] 0.13534704 NA NA
# [2,] 0.07360135 NA NA
# [3,] 0.07360135 NA NA
# [4,] 0.07360135 NA NA
# , , 2, 1
# [,1] [,2] [,3]
# [1,] NA NA NA
# [2,] NA NA NA
# [3,] NA NA NA
# [4,] NA NA NA
Thanks for some help...
Best,
Romain
Give this a try. With a big apply loop, it might be slow with large matrices, but it will do the job as far as being general to any number of matrices without necessarily the same number of rows:
f1 <- function(...) {
args <- list(...)
nrows <- sapply(args, nrow)
idx <- do.call(expand.grid, lapply(nrows, seq.int))
get.row <- function(i, mat) mat[i, ]
get.val <- function(i.vec) sum(Reduce(`*`, Map(get.row, i.vec, args)))
idx$val <- apply(idx, 1, get.val)
ret <- array(NA, dim = nrows)
ret[as.matrix(idx[, seq_along(args)])] <- idx$val
ret
}
Example usage:
A1 <- matrix(1:12, nrow = 4, ncol = 3)
A2 <- matrix(1:9, nrow = 3, ncol = 3)
A3 <- matrix(1:6, nrow = 2, ncol = 3)
out <- f1(A1, A2, A3)
Check:
identical(out[3, 2, 1],
sum(A1[3, ] * A2[2, ] * A3[1, ]))
# [1] TRUE
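If the apply loop turns out to be slow, one possible alternative is to build the array one column t at a time: for a fixed t, the products A1[i,t]*A2[j,t]*... over all row combinations are a chained outer product of the t-th columns, and summing those arrays over t gives the result. The function name f1_outer is hypothetical, and the sketch assumes at least two matrices that all have the same number of columns.

```r
f1_outer <- function(...) {
  args <- list(...)
  n_t <- ncol(args[[1]])
  per_t <- lapply(seq_len(n_t), function(t) {
    cols <- lapply(args, function(m) m[, t])  # t-th column of every matrix
    Reduce(outer, cols)                       # array with dims nrow(A1) x nrow(A2) x ...
  })
  Reduce(`+`, per_t)                          # sum over t
}

A1 <- matrix(1:12, nrow = 4, ncol = 3)
A2 <- matrix(1:9, nrow = 3, ncol = 3)
A3 <- matrix(1:6, nrow = 2, ncol = 3)
out3 <- f1_outer(A1, A2, A3)

# Four matrices, including one with a single row
A4 <- matrix(1:3, nrow = 1, ncol = 3)
out4 <- f1_outer(A1, A2, A3, A4)
dim(out4)
```

This does one outer product chain per column instead of one function call per cell, so it may scale better when the matrices have many rows and few columns.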