rbind data frames in each list using an lapply function

I want to add some data points. odtl is the original data and adtl contains the data points to add. The x values in adtl are set to NA but will be interpolated with zoo::na.spline after the rbind.
The two lists (odtl and adtl) contain three data frames each. I want to combine the data frames in the order in which they are stored in each list.
I succeeded in doing this with a for loop as follows, but my lapply attempt doesn't work. Could you rewrite this loop with lapply or another apply-family function?
Thanks.
> odtl # original dataset
[[1]]
x index
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
[[2]]
x index
1 1 1
2 2 2
3 3 3
4 4 4
[[3]]
x index
1 1 1
2 2 2
3 3 3
> adtl # dataset for add
[[1]]
x index
1 NA 1.5
[[2]]
x index
1 NA 1.5
2 NA 2.5
3 NA 3.5
[[3]]
x index
1 NA 1.5
2 NA 2.5
> wdtl <- list() # This is the goal.
> for(i in 1:length(odtl)){
+ wdtl[[i]] <- rbind(odtl[[i]], adtl[[i]])
+ }
> wdtl # This is the goal, but I want to produce it with lapply or something similar
[[1]]
x index
1 1 1.0
2 2 2.0
3 3 3.0
4 4 4.0
5 5 5.0
6 NA 1.5
[[2]]
x index
1 1 1.0
2 2 2.0
3 3 3.0
4 4 4.0
5 NA 1.5
6 NA 2.5
7 NA 3.5
[[3]]
x index
1 1 1.0
2 2 2.0
3 3 3.0
4 NA 1.5
5 NA 2.5
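As a side note, here is a minimal sketch of the zoo::na.spline step mentioned at the top of the question; it is not part of the original post and assumes the interpolation of x should run over the index column of each combined data frame.
library(zoo)

# hypothetical follow-up step: sort each combined frame by index,
# then spline-interpolate the missing x values over index
filled <- lapply(wdtl, function(d) {
  d <- d[order(d$index), ]
  d$x <- na.spline(d$x, x = d$index)
  d
})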

You may use Map(), which applies a function element-wise to the corresponding elements of each of its arguments.
Map(rbind, odtl, adtl)
# [[1]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 4 4.0
# 5 5 5.0
# 6 NA 1.5
# 7 NA 2.5
# 8 NA 3.5
# 9 NA 4.5
# 10 NA 5.5
#
# [[2]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 4 4.0
# 5 NA 1.5
# 6 NA 2.5
# 7 NA 3.5
# 8 NA 4.5
#
# [[3]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 NA 1.5
# 5 NA 2.5
# 6 NA 3.5
Data
odtl <- list(data.frame(x=1:5, index=1:5),
             data.frame(x=1:4, index=1:4),
             data.frame(x=1:3, index=1:3))
adtl <- list(data.frame(x=NA, index=seq(1.5, 5.5, 1)),
             data.frame(x=NA, index=seq(1.5, 4.5, 1)),
             data.frame(x=NA, index=seq(1.5, 3.5, 1)))

I think the solution in the comments by @thelatemail should be the most elegant one. If you want to use lapply, then the following would be something like what you want (note that sapply would try to simplify the list of data frames, so lapply over the indices is used here):
wdtl <- lapply(seq_along(odtl), function(k) rbind(odtl[[k]], adtl[[k]]))

Specifically from the lapply/apply family of functions, you could use mapply:
> odtl <- list(data.frame(x=1:5, index=1:5),
               data.frame(x=1:4, index=1:4),
               data.frame(x=1:3, index=1:3))
> adtl <- list(data.frame(x=NA, index=seq(1.5, 5.5, 1)),
               data.frame(x=NA, index=seq(1.5, 4.5, 1)),
               data.frame(x=NA, index=seq(1.5, 3.5, 1)))
> mapply(rbind, odtl, adtl, SIMPLIFY = FALSE)
# [[1]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 4 4.0
# 5 5 5.0
# 6 NA 1.5
# 7 NA 2.5
# 8 NA 3.5
# 9 NA 4.5
# 10 NA 5.5
#
# [[2]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 4 4.0
# 5 NA 1.5
# 6 NA 2.5
# 7 NA 3.5
# 8 NA 4.5
#
# [[3]]
# x index
# 1 1 1.0
# 2 2 2.0
# 3 3 3.0
# 4 NA 1.5
# 5 NA 2.5
# 6 NA 3.5
Note that Map is a wrapper around mapply(FUN = f, ..., SIMPLIFY = FALSE).
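As a quick check of that note, the two calls can be compared directly (a small sketch, assuming the odtl and adtl objects defined above):
# Map() and mapply(..., SIMPLIFY = FALSE) build the same list of data frames
identical(Map(rbind, odtl, adtl),
          mapply(rbind, odtl, adtl, SIMPLIFY = FALSE))
# should return TRUE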

Related

Replace NA with average of the case before and after the NA, unless row starts or ends with NA [duplicate]

Say I have a data.frame:
t<-c(1,1,2,4,NA,3)
u<-c(1,3,4,6,4,2)
v<-c(2,3,4,NA,3,2)
w<-c(2,3,4,5,2,3)
x<-c(2,3,4,5,6,NA)
df<-data.frame(t,u,v,w,x)
df
t u v w x
1 1 1 2 2 2
2 1 3 3 3 3
3 2 4 4 4 4
4 4 6 NA 5 5
5 NA 4 3 2 6
6 3 2 2 3 NA
I would like to replace each NA with the average of the value before it and the value after it. However, if a row starts with an NA, I would like it to be replaced by the value that follows it. When a row ends with an NA, I would like it to be replaced by the value before it. Thus, I would like to get the following result:
t u v w x
1 1 1 2 2 2
2 1 3 3 3 3
3 2 4 4 4 4
4 4 6 5.5 5 5 --> NA becomes average of 6 and 5
5 4 4 3 2 6 --> NA becomes value of next case
6 3 2 2 3 3 --> NA becomes value of previous case
I have thousands of rows, so any help is very much appreciated!
Based on previous na.approx solutions this might do the trick:
library(zoo)
t(apply(df, 1, function(x) na.approx(x, rule = 2)))
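The row-wise apply() returns a matrix rather than a data frame; if a data frame is needed, the result can be wrapped, as in this small follow-up sketch (assuming df as defined in the question):
# same interpolation as above, converted back to a data frame
res <- as.data.frame(t(apply(df, 1, function(x) na.approx(x, rule = 2))))
names(res) <- names(df)  # keep the original column names
res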
Always look for the na.rm = TRUE parameter in the functions you use.
In this case you would take the mean of the relevant column with na.rm set to TRUE.
Then you substitute the NAs:
dt[is.na(dt[,'t']),'t'] = 0
(assuming that I did not reverse the order of the dimensions)
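For what it's worth, here is a sketch of what this answer describes, using the question's data frame df in place of dt. Note that it fills NAs with a column mean rather than the row-wise neighbour average the question asks for.
# replace the NAs in column t with the mean of the non-missing values of t
df[is.na(df$t), "t"] <- mean(df$t, na.rm = TRUE)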
Here is a possible solution:
if a value is NA, replace it with (lag + lead)/2; if it is still NA, replace it with lead; if it is still NA, replace it with lag.
library(dplyr)
t(apply(df, 1, function(x){
  lagx = dplyr::lag(x)
  leadx = dplyr::lead(x)
  b = ifelse(is.na(x), (leadx + lagx)/2, x)
  b = ifelse(is.na(b), leadx, b)
  b = ifelse(is.na(b), lagx, b)
  return(b)
}))
#output
t u v w x
[1,] 1 1 2.0 2 2
[2,] 1 3 3.0 3 3
[3,] 2 4 4.0 4 4
[4,] 4 6 5.5 5 5
[5,] 4 4 3.0 2 6
[6,] 3 2 2.0 3 3
t<-c(1,1,2,4,NA,3)
u<-c(1,3,4,6,4,2)
v<-c(2,3,4,NA,3,2)
w<-c(2,3,4,5,2,3)
x<-c(2,3,4,5,6,NA)
df<-data.frame(t,u,v,w,x)
df[which(is.na(t)), "t"] <- df[which(is.na(t)), "u"]
df[which(is.na(x)), "x"] <- df[which(is.na(x)), "w"]
df[which(is.na(v)), "v"] <- (df[which(is.na(v)), "u"] + df[which(is.na(v)), "w"])/2
> df
t u v w x
1 1 1 2.0 2 2
2 1 3 3.0 3 3
3 2 4 4.0 4 4
4 4 6 5.5 5 5
5 4 4 3.0 2 6
6 3 2 2.0 3 3

Average Value of Subgraph in R

The data looks like this:
library(igraph)
From <- c(1,2,3,4,5,6,7,8)
To <- c(NA,1,2,3,2,NA,6,7)
Value<- c(1,0,0.5,0.5,0,-1,-1,-0.5)
Data <- data.frame(From,To, Value)
Network <- graph.data.frame(Data[,c("From","To")])
Network<- Network - "NA"
plot(Network)
Now I would like to know the average value of the partial graph (connected component) each node is in, and add it to the initial data frame.
At the end it should look like this:
From <- c(1,2,3,4,5,6,7,8)
To <- c(NA,1,2,3,2,NA,6,7)
Value<- c(1,0,0.5,0.5,0,-1,-1,-0.5)
AverageTreeValue<- c(0.4,0.4,0.4,0.4,0.4,-0.833,-0.833,-0.833)
FinalData <- data.frame(From,To, Value, AverageTreeValue)
You can use the clusters function to compute connected components in your graph, aggregate to compute the mean value for each of these clusters, and merge to combine the two together:
Data$group <- clusters(Network)$membership
(FinalData <- merge(Data, aggregate(Value~group, Data, mean), by="group"))
# group From To Value.x Value.y
# 1 1 1 NA 1.0 0.4000000
# 2 1 2 1 0.0 0.4000000
# 3 1 3 2 0.5 0.4000000
# 4 1 4 3 0.5 0.4000000
# 5 1 5 2 0.0 0.4000000
# 6 2 6 NA -1.0 -0.8333333
# 7 2 7 6 -1.0 -0.8333333
# 8 2 8 7 -0.5 -0.8333333
Alternatively, you could use match to perform the merge and get more control over the name of the generated column:
groups <- clusters(Network)$membership
means <- aggregate(Value~group, data.frame(Value=Data$Value, group=groups), mean)
Data$AverageTreeValue <- means$Value[match(groups, means$group)]
Data
# From To Value AverageTreeValue
# 1 1 NA 1.0 0.4000000
# 2 2 1 0.0 0.4000000
# 3 3 2 0.5 0.4000000
# 4 4 3 0.5 0.4000000
# 5 5 2 0.0 0.4000000
# 6 6 NA -1.0 -0.8333333
# 7 7 6 -1.0 -0.8333333
# 8 8 7 -0.5 -0.8333333
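A more compact variant of the same idea, not from the original answer: ave() computes group means by default, so the component membership can be used directly as the grouping variable. This relies on the vertex order matching the row order of Data, which the answer above already assumes; in recent igraph versions the same membership is also available from components().
# group mean of Value per connected component, aligned back to each row
Data$AverageTreeValue <- ave(Data$Value, clusters(Network)$membership)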

Computing Colwise Means on a Given Interval

I have a data frame in R that can be approximated as:
df <- data.frame(x = rep(1:5, each = 4), y = rep(2:6, each = 4), z = rep(3:7, each = 4))
> df
x y z
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 2 3 4
6 2 3 4
7 2 3 4
8 2 3 4
9 3 4 5
10 3 4 5
11 3 4 5
12 3 4 5
13 4 5 6
14 4 5 6
15 4 5 6
16 4 5 6
17 5 6 7
18 5 6 7
19 5 6 7
20 5 6 7
I'd like to compute colwise means at intervals of 5, and then collapse these means into a new data frame. For example, I'd like to compute the colwise means of df[1:5,], df[6:10,], df[11:15,], and df[16:20,], and return a df that looks as follows:
[,1] [,2] [,3]
[1,] 1.2 2.2 3.2
[2,] 2.4 3.4 4.4
[3,] 3.6 4.6 5.6
[4,] 4.8 5.8 6.8
I'm currently using a for-loop as such (where temp.coeff would correspond to the "5" specified above):
my.means <- NULL
for (j in 1:baseFreq) {
  temp.mean <- colMeans(temp.df[(temp.coeff*(j-1)+1):(temp.coeff*j), ])
  my.means <- rbind(my.means, temp.mean)
}
my.means <- t(my.means)
collapsed.df <- t(data.frame(colMeans(my.means)))
...but I feel like there's an apply statement that could do the job a lot more efficiently. In addition, while the above data frame only has 20 rows, the ones I'll be working on will have several thousand. Thoughts?
Many thanks in advance SO.
aggregate can do this if you aggregate against an appropriate running index. You do end up with another column in the result (which can be removed).
aggregate(. ~ rep(seq(nrow(df)/5), each=5), data=df, FUN=mean)
## rep(seq(nrow(df)/5), each = 5) x y z
## 1 1 1.2 2.2 3.2
## 2 2 2.4 3.4 4.4
## 3 3 3.6 4.6 5.6
## 4 4 4.8 5.8 6.8
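A small follow-up sketch of the clean-up mentioned above (assuming df from the question): drop the running-index column after aggregating.
res <- aggregate(. ~ rep(seq(nrow(df)/5), each = 5), data = df, FUN = mean)
res[-1]  # drop the grouping column, keeping only x, y, z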
I really think data.table works great for situations like this. It is fast and easy.
require("data.table")
dt <- data.table(df)
dt[,row.num:=.I]
dt[,lapply(.SD,mean),by=list(interval=cut(row.num,seq(0,nrow(dt),by=5)))]
# interval x y z
# 1: (0,5] 1.2 2.2 3.2
# 2: (5,10] 2.4 3.4 4.4
# 3: (10,15] 3.6 4.6 5.6
# 4: (15,20] 4.8 5.8 6.8
This is a possible solution with a combination of apply and sapply:
apply(df, 2, function(x) sapply(seq(1,nrow(df),5), function(y) mean(x[y:(y+4)])))
# x y z
#[1,] 1.2 2.2 3.2
#[2,] 2.4 3.4 4.4
#[3,] 3.6 4.6 5.6
#[4,] 4.8 5.8 6.8
Edit after comment by @jbaums: depending on the desired behavior, you might want to add na.rm = TRUE to the mean calculation:
apply(df, 2, function(x) sapply(seq(1,nrow(df),5), function(y) mean(x[y:(y+4)], na.rm = TRUE)))
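For completeness, one more compact base-R possibility that is not from the original answers; it assumes the groups are exactly 5 consecutive rows, sums each block with rowsum(), and divides by the block size to get the means:
# block sums divided by the block size give the block means
rowsum(df, group = rep(seq_len(nrow(df) / 5), each = 5)) / 5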

Finding the original vectors from an interaction table

A = c(1,2,3,2,1,2,2,2,1,2,3,2,1)
B = c(2,3,2,3,2,2,1,1,2,1,2,2,3)
mytable = table(A,B)
What is the best way to recover the two vectors from mytable? Of course, they will not be exactly the same vectors, but the pairing of A with B has to be respected. Does that make sense?
You can use data.frame and rep:
X <- as.data.frame(mytable)
X[] <- lapply(X, function(z) type.convert(as.character(z)))
Y <- X[rep(rownames(X), X$Freq), 1:2]
Y
# A B
# 2 2 1
# 2.1 2 1
# 2.2 2 1
# 4 1 2
# 4.1 1 2
# 4.2 1 2
# 5 2 2
# 5.1 2 2
# 6 3 2
# 6.1 3 2
# 7 1 3
# 8 2 3
# 8.1 2 3
Y$A contains the same values as A, and Y$B contains the same values as B.
all.equal(sort(Y$A), sort(A))
# [1] TRUE
all.equal(sort(Y$B), sort(B))
# [1] TRUE
Alternatively, with #Matthew's comments:
X <- data.matrix(data.frame(mytable))
X[rep(sequence(nrow(X)), X[, "Freq"]), 1:2]
The result in this case is a two-column matrix.
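A quick sanity check that is not part of the original answer, assuming Y from the first approach above: the recovered pairs should reproduce the original contingency table.
# every (A, B) combination should occur exactly as often as in mytable
all(table(Y$A, Y$B) == mytable)  # should be TRUE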
Update (more than a year later)
You can also use expandRows from my "splitstackshape" package after converting the table to a data.table. Notice that it also gives you a message about what combinations had zero values, and were thus dropped when expanding to a long form.
library(splitstackshape)
expandRows(as.data.table(mytable), "N")
# The following rows have been dropped from the input:
#
# 1, 3, 9
#
# A B
# 1: 2 1
# 2: 2 1
# 3: 2 1
# 4: 1 2
# 5: 1 2
# 6: 1 2
# 7: 2 2
# 8: 2 2
# 9: 3 2
# 10: 3 2
# 11: 1 3
# 12: 2 3
# 13: 2 3

R - subset data if conditions

How can I subset data using logical conditions?
Assume that I have the data below. As a first condition I would like to select all animals that have an FCR record, and then put all animals that are in the same pen as those animals into a new data set.
animal Feed Litter Pen
1 0.2 5 3
2 NA 5 3
3 0.2 5 3
4 0.2 6 4
5 0.3 5 4
6 0.3 4 4
7 0.3 5 3
8 0.3 5 3
9 NA 5 5
10 NA 3 5
11 NA 3 3
12 NA 3 5
13 0.4 7 3
14 0.4 7 3
15 NA 7 5
I'm assuming that "FCR record" (in your question) relates to "Feed". Then, if I understand the question correctly, you can do this:
split(df[complete.cases(df),], df[complete.cases(df), 4])
# $`3`
# animal Feed Litter Pen
# 1 1 0.2 5 3
# 3 3 0.2 5 3
# 7 7 0.3 5 3
# 8 8 0.3 5 3
# 13 13 0.4 7 3
# 14 14 0.4 7 3
#
# $`4`
# animal Feed Litter Pen
# 4 4 0.2 6 4
# 5 5 0.3 5 4
# 6 6 0.3 4 4
In the above, complete.cases drops any incomplete observations. If you need to condition on a specific variable instead, you can use something like df[!is.na(df$Feed), ] in place of complete.cases. Then, split creates a list of data.frames split by Pen.
# all animals with Feed data
df[!is.na(df$Feed), ]
# all animals from pens with at least one animal with feed data in the pen
df[ave(!is.na(df$Feed), df$Pen, FUN = any), ]
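As a brief illustration of how that ave() line works (using the df above): FUN = any is applied to the non-missing indicator of Feed within each pen, so every row of a pen is flagged TRUE if at least one animal in that pen has Feed data.
# logical flag per row: does this row's pen contain any non-NA Feed value?
keep <- ave(!is.na(df$Feed), df$Pen, FUN = any)
df[keep, ]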
