Calculations on data frames in a list - r

I have a list of data frames:
str(Test)
List of 3
$ A:'data.frame': 32400 obs. of 4 variables:
..$ X : num [1:32400] -0.0152 -0.0302 -0.0453 -0.0604 -0.0755 ...
..$ Y : num [1:32400] 0.00875 0.01745 0.02615 0.0349 0.0436 ...
..$ Z : num [1:32400] -1 -0.999 -0.999 -0.998 -0.996 ...
..$ Ts: num [1:32400] 0.000427 0.001696 0.003805 0.006765 0.010537 ...
$ B:'data.frame': 32400 obs. of 4 variables:
..$ X : num [1:32400] -0.0153 -0.0305 -0.0457 -0.061 -0.0763 ...
..$ Y : num [1:32400] 0.00848 0.01692 0.02536 0.03384 0.04228 ...
..$ Z : num [1:32400] -1 -0.999 -0.999 -0.998 -0.996 ...
..$ Ts: num [1:32400] 0.000427 0.001696 0.003805 0.006765 0.010537 ...
$ C:'data.frame': 32400 obs. of 4 variables:
..$ X : num [1:32400] -0.0155 -0.0308 -0.0462 -0.0616 -0.077 ...
..$ Y : num [1:32400] 0.00822 0.01638 0.02455 0.03277 0.04094 ...
..$ Z : num [1:32400] -1 -0.999 -0.999 -0.998 -0.996 ...
..$ Ts: num [1:32400] 0.000427 0.001696 0.003805 0.006765 0.010537 ...
I want to create two new columns in each dataframe. The new values are based on X, Y, Z of each dataframe:
new_x = sqrt(2/(1-Z)) * X
new_y = sqrt(2/(1-Z)) * Y
I have tried a few things (and read a lot) and this is what I think should work:
t=function(x){
new_x = sqrt(2/(1-x[,3])) * x[,1]
new_y = sqrt(2/(1-x[,3])) * x[,2] }
New_Test=lapply(Test, within, t)
However, this only creates a new list that is exactly like the old list.
I have tried to use mapply and looked into the plyr package but could not find a solution. I am fairly new to R so be kind ;-)
Edit: Both solutions posted below work! Thanks for your help :-)

Here is a full implementation of the suggestion I made in the comments.
First we simulate some data:
listOfDataframes<- list(
df1 = data.frame(X = runif(100), Y = runif(100), Z = runif(100)),
df2 = data.frame(X = runif(100), Y = runif(100), Z = runif(100)),
df3 = data.frame(X = runif(100), Y = runif(100), Z = runif(100))
)
Then we write a function to perform the desired operation. Note that the use of return is unnecessary and is included only for clarity.
yourFun <- function(df) {
df$new_x <- sqrt(2/(1-df$Z)) * df$X
df$new_y <- sqrt(2/(1-df$Z)) * df$Y
return(df) # just "df" would produce the same result
}
Then we apply the function to your list of data.frames and assign the result to a new list:
newList <- lapply(listOfDataframes, yourFun)
Finally, we display the first few entries of every dataframe to verify our results.
lapply(newList, head)
$df1
X Y Z new_x new_y
1 0.7122989 0.85574735 0.26176397 1.1724104 1.4085198
2 0.8373206 0.18083472 0.19733040 1.3217167 0.2854489
3 0.6780758 0.76722834 0.48987088 1.3426203 1.5191462
4 0.3694669 0.42579811 0.10287797 0.5516515 0.6357597
5 0.1466816 0.69924651 0.08006688 0.2162781 1.0310202
6 0.3280546 0.06574292 0.22372561 0.5265669 0.1055252
$df2
X Y Z new_x new_y
1 0.9385518 0.50570095 0.3062779 1.593604 0.85864969
2 0.6672409 0.66002494 0.3721208 1.190857 1.17797831
3 0.7559528 0.73025591 0.4063969 1.387591 1.34042338
4 0.1960170 0.01639017 0.9700715 1.602382 0.13398487
5 0.9336734 0.76437690 0.3301318 1.613301 1.32077206
6 0.7320958 0.03640788 0.1761000 1.140632 0.05672482
$df3
X Y Z new_x new_y
1 0.45050818 0.9843507 0.8956288 1.9720924 4.3089794
2 0.32775145 0.1385610 0.9713440 2.7381165 1.1575725
3 0.07208382 0.8635344 0.5244027 0.1478200 1.7708221
4 0.50439997 0.1328935 0.5827728 1.1043424 0.2909594
5 0.46265459 0.3394566 0.4912585 0.9173252 0.6730551
6 0.15894944 0.4517309 0.3610197 0.2812097 0.7991919
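As an alternative sketch (not part of the original answer), base R's transform() can add both columns without a separately named function. The list name dfs and its contents are made up for illustration, and the sketch assumes every data frame has columns X, Y and Z with Z < 1.

```r
# Hypothetical example data: two small data frames with columns X, Y, Z (Z < 1)
set.seed(1)
dfs <- list(
  A = data.frame(X = runif(5), Y = runif(5), Z = runif(5, min = -1, max = 0.9)),
  B = data.frame(X = runif(5), Y = runif(5), Z = runif(5, min = -1, max = 0.9))
)

# transform() evaluates the expressions using the columns of each data frame
new_dfs <- lapply(dfs, function(df) {
  transform(df,
            new_x = sqrt(2 / (1 - Z)) * X,
            new_y = sqrt(2 / (1 - Z)) * Y)
})

lapply(new_dfs, head, n = 3)
```

Referring to columns by name (X, Y, Z) rather than by position (x[,1], x[,3]) keeps the code correct if the column order ever changes.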

Related

How to retrieve name of element in list (data frame) to use it as a title of the plot?

So briefly and without further ado: is it possible to retrieve just the name of an element in a list and use it as the main title of a plot?
Let me explain - example:
Let's create a random df:
a <- c(1,2,3,4)
b <- runif(4)
c <- runif(4)
d <- runif(4)
e <- runif(4)
f <- runif(4)
df <- data.frame(a,b,c,d,e,f)
head(df)
a b c d e f
1 1 0.9694204 0.9869154 0.5386678 0.39331278 0.15054698
2 2 0.8949330 0.9910894 0.1009689 0.03632476 0.15523628
3 3 0.4930752 0.7179144 0.6957262 0.36579883 0.32006026
4 4 0.4850141 0.5539939 0.3196953 0.14348259 0.05292068
Then I want to create a list of data frames (based on the one above) with specific columns to make plots. In other words, I'd like to make plots where the first column of df (a) is the x axis and columns b, c, d, e and f are the values on the y axis. Yes, there will be 5 plots - that's the point!
So my idea was to write a simple function that creates a list of data frames based on the one created above:
my_fun <- function(x){
a <- df[1]
b <- x
aname <- "x_label"
bname <- "y_label"
df <- data.frame(a,b)
names(df) <- c(aname,bname)
return(df)
}
Run it for all (specified) columns:
df_s <- apply(df[,2:6], 2, function(x) my_fun(x))
So I have now:
class(df_s)
[1] "list"
str(df_s)
List of 5
$ b:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.969 0.895 0.493 0.485
$ c:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.987 0.991 0.718 0.554
$ d:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.539 0.101 0.696 0.32
$ e:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.3933 0.0363 0.3658 0.1435
$ f:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.1505 0.1552 0.3201 0.0529
That is what I wanted, but here's the question. I'd like to create a plot for every data frame in my list. As a result I want 5 plots with main titles b, c, d, e, f respectively. The axis labels are the same for every plot; only the title differs. So I tried:
lapply(df_s, function(x) plot(x[2] ~ x[1], data = x, main = ???))
What should go in place of the question marks? I tried main = names(df_s)[x], but it didn't work.
I think the following works. However, I think it might be best to use ggplot2 instead of the plot function (unless you are saving the plots inside lapply).
lapply(1 : length(df_s), function(x)
plot(df_s[[x]][,2] ~ df_s[[x]][,1],
xlab = names(df_s[[x]])[1],
ylab = names(df_s[[x]])[2],
main = names(df_s[x])))
With ggplot2:
library(ggplot2)
plot_lst <- lapply(seq_along(df_s), function(i) {
ggplot(df_s[[i]], aes(x=x_label, y=y_label)) +
geom_point() +
theme(plot.title = element_text(hjust = 0.5)) +
ggtitle(names(df_s)[i]) })
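Another way to get at the names, sketched here under the assumption that the list is named after the source columns, is to iterate over the data frames and their names in parallel with Map(). The data below is regenerated for illustration, and pdf(NULL) is used only so the sketch runs without opening a window or writing a file.

```r
# Rebuild an example list; lapply over a named vector keeps the names
set.seed(42)
df <- data.frame(a = 1:4, b = runif(4), c = runif(4), d = runif(4),
                 e = runif(4), f = runif(4))
cols <- names(df)[-1]
df_s <- lapply(setNames(cols, cols), function(nm) {
  data.frame(x_label = df$a, y_label = df[[nm]])
})

pdf(NULL)  # throwaway graphics device; drop this line to plot on screen
# Map() passes each data frame together with its name
invisible(Map(function(d, nm) plot(y_label ~ x_label, data = d, main = nm),
              df_s, names(df_s)))
invisible(dev.off())
```

Inside lapply(df_s, function(x) ...) the function only sees the data frame itself, not its position or name, which is why main = names(df_s)[x] cannot work there.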

How to split list at every 10th item in R?

I have a list of 100 items.
In Code 1, I want to split it after every 10th item.
Code 2 takes a list of two such lists and should split it into 20 lists of 10 items each.
Code 1
Expected output: ten lists of 10 items.
A <- 100
a <- rnorm(A) # [1:100]
n <- 10
str(a)
# Not resulting in equal size of chunks with vectors so reject
# http://stackoverflow.com/a/3321659/54964
#d <- split(d, ceiling(seq_along(d)/(length(d)/n)))
# Works for vectors but not with lists
# http://stackoverflow.com/a/16275428/54964
#d <- function(d,n) split(d, cut(seq_along(d), n, labels = FALSE))
str(d)
Test code 2
Input: a list of two lists
aa <- list(a, rnorm(a))
Expected output: 20 lists of 10 item size
Testing Loki's answer
segmentLists <- function(A, segmentSize) {
res <- lapply(A, function(x) split(unlist(x), cut(seq_along(unlist(x)), segmentSize, labels = F)))
#print(res)
res <- unlist(res, recursive = F)
}
segmentLists(aa, 10)
Output: loop going on, never stopping
OS: Debian 8.5
R: 3.3.1
You can use lapply:
aa <- list(a, rnorm(a))
aa
n <- 10
x <- lapply(aa, function(x) split(unlist(x), cut(seq_along(unlist(x)), n, labels = F)))
y <- unlist(x, recursive = F)
str(y)
# List of 20
# $ 1 : num [1:10] 1.0895 -0.0477 0.225 -0.6308 -0.1558 ...
# $ 2 : num [1:10] -0.469 -0.381 0.709 -0.798 1.183 ...
# $ 3 : num [1:10] 0.757 -1.128 -1.394 -0.712 0.494 ...
# $ 4 : num [1:10] 1.135 0.324 0.75 -0.83 0.794 ...
# $ 5 : num [1:10] -0.786 -0.068 -0.179 0.354 -0.597 ...
# $ 6 : num [1:10] -0.115 0.164 -0.365 -1.827 -2.036 ...
...
length(y)
# [1] 20
To remove the names of the list elements in y ($ 1, $ 2 etc.) you can use unname():
str(unname(y))
# List of 20
# $ : num [1:10] 1.0895 -0.0477 0.225 -0.6308 -0.1558 ...
# $ : num [1:10] -0.469 -0.381 0.709 -0.798 1.183 ...
# $ : num [1:10] 0.757 -1.128 -1.394 -0.712 0.494 ...
# $ : num [1:10] 1.135 0.324 0.75 -0.83 0.794 ...
# $ : num [1:10] -0.786 -0.068 -0.179 0.354 -0.597 ...
...
If you wrap this in a function, return res at the end of the function:
segmentLists <- function(A, segmentSize)
{
res <- lapply(A, function(x) split(unlist(x), cut(seq_along(unlist(x)), segmentSize, labels = F)))
#print(res)
res <- unlist(res, recursive = F)
res <- unname(res)
res
}
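For the first case (a single list of 100 items), a minimal sketch that groups by a fixed chunk size rather than by a number of intervals might look like this. The names items and chunks are made up for illustration; if the length is not a multiple of n, the last chunk is simply shorter.

```r
# Hypothetical input: a list of 100 items
items <- as.list(1:100)
n <- 10  # chunk size

# ceiling(seq_along(items) / n) is 1 for items 1-10, 2 for items 11-20, ...
chunks <- split(items, ceiling(seq_along(items) / n))

length(chunks)           # number of chunks
unique(lengths(chunks))  # size of each chunk
```

Because split() indexes with the grouping factor, this works on lists as well as on atomic vectors.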

Subsetting and replacing in a list variable nested in a dataframe

Here is my example dataframe. It includes a list column named "dta", which holds the n values I want to keep for each of my scenarios:
set.seed(777)
df <- data.frame(theo = numeric(),
size = numeric(),
dta = I(list()))
df[ 1: 5,"theo"] <- qlnorm(0.1, meanlog=0, sdlog=1, lower.tail = TRUE, log.p = FALSE)
df[ 6:10,"theo"] <- qlnorm(0.2, meanlog=0, sdlog=1, lower.tail = TRUE, log.p = FALSE)
df[ 1: 5,"size"] <- 10
df[ 6:10,"size"] <- 20
for(i in 1:10){
df$dta[i] <- list(rlnorm(df$size[i], meanlog = 0, sdlog = 1))
}
df
str(df)
This should give a df like:
theo size dta
1 0.2776062 10 1.631967....
2 0.2776062 10 0.737667....
3 0.2776062 10 0.131252....
4 0.2776062 10 1.937334....
5 0.2776062 10 0.739868....
6 0.4310112 20 4.631176....
7 0.4310112 20 2.610180....
8 0.4310112 20 0.175918....
9 0.4310112 20 3.501670....
10 0.4310112 20 0.588178....
or:
'data.frame': 10 obs. of 3 variables:
$ theo: num 0.278 0.278 0.278 0.278 0.278 ...
$ size: num 10 10 10 10 10 20 20 20 20 20
$ dta :List of 10
..$ : num 1.632 0.671 1.667 0.671 5.148 ...
..$ : num 0.738 1.056 0.152 0.967 10.089 ...
..$ : num 0.131 1.256 0.457 3.574 4.211 ...
..$ : num 1.937 2.359 3.496 0.297 4.587 ...
..$ : num 0.74 0.66 0.481 0.434 1.874 ...
..$ : num 4.631 0.298 10.28 0.933 1.286 ...
..$ : num 2.61 0.472 0.251 1.61 0.303 ...
..$ : num 0.176 0.566 2.156 0.407 3.52 ...
..$ : num 3.502 1.748 1.283 0.648 1.359 ...
..$ : num 0.588 0.392 2.447 1.926 0.86 ...
..- attr(*, "class")= chr "AsIs"
Now, I want to subset that list in such a way that:
for each list, each value is compared with the fixed value "theo" stored in the dataframe
when that value is below or equal to "theo", then recode that value NA
Here is working code that gives me exactly what I want:
df$dta2 <- df$dta
for(i in 1:10){
df$dta2[[i]] [ df$dta2[[i]] <= df$theo[i] ] <- NA
}
However, I was wondering if there is a way to get the same result with a single line of code and no "for loop", i.e. a conditional replacement of values contained in a list that is nested in a dataframe?
We can use Map
df$dta3 <- Map(function(x,y) replace(x, x<=y, NA), df$dta, df$theo)
all.equal(df$dta2, df$dta3, check.attributes=FALSE)
#[1] TRUE
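To see what Map and replace do here, a tiny stand-alone sketch with made-up values (vals and thr are illustrative names, not from the question) may help: Map calls the function once per pair of elements, and replace returns a copy of x with the selected positions overwritten.

```r
# Two made-up vectors and one threshold per vector
vals <- list(c(1, 5, 3), c(2, 8))
thr  <- c(3, 2)

# For each pair (x, y): set every value of x that is <= y to NA
out <- Map(function(x, y) replace(x, x <= y, NA), vals, thr)
out
# [[1]]
# [1] NA  5 NA
#
# [[2]]
# [1] NA  8
```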

Using Reduce function to merge recursively [duplicate]

This question already has answers here:
Simultaneously merge multiple data.frames in a list
(9 answers)
Closed 6 years ago.
Suppose I have a list of lists, where each inner list contains a set of dataframes. I want to merge the dataframes within each inner list, but I don't want to merge all the lists together. For example:
list<- list(list(df1_2010,df2_2010,df3_2010), list(df1_2011,df2_2011,df3_2011), list(df1_2012,df2_2012,df3_2012))
I want to merge all the 2010 dataframes together by, let's say, column id. I want to merge the 2011 dataframes together by a similar column id, and all the 2012 dataframes together by another similar column id.
I want to output a list of merged dataframes by year:
list(df2010, df2011, df2012)
Here's a schematic of how I want to use the Reduce function:
f<-function(b) merge(...,by="ID",all.x=T)
list<- Reduce(f, list)
But I think this will merge all three lists together instead of each list separately. Let me know your suggestions.
Here's a simple reproducible example that I think maps onto your structure:
n <- 5
set.seed(n)
l <- list( list( data.frame(ID = 1:5, a = rnorm(n)),
data.frame(ID = 1:5, b = rnorm(n)),
data.frame(ID = 1:5, c = rnorm(n)),
data.frame(ID = 1:5, d = rnorm(n)) ),
list( data.frame(ID = 1:5, a = rnorm(n)),
data.frame(ID = 1:5, b = rnorm(n)),
data.frame(ID = 1:5, c = rnorm(n)),
data.frame(ID = 1:5, d = rnorm(n)) ),
list( data.frame(ID = 1:5, a = rnorm(n)),
data.frame(ID = 1:5, b = rnorm(n)),
data.frame(ID = 1:5, c = rnorm(n)),
data.frame(ID = 1:5, d = rnorm(n)) ))
You can write an lapply based function that uses Reduce on each element of the list:
out <-
lapply(l, function(x) Reduce(function(...) merge(..., by="ID", all.x=T), x))
And you should get a list of merged dataframes:
str(out)
List of 3
$ :'data.frame': 5 obs. of 5 variables:
..$ ID: int [1:5] 1 2 3 4 5
..$ a : num [1:5] -0.8409 1.3844 -1.2555 0.0701 1.7114
..$ b : num [1:5] -0.603 -0.472 -0.635 -0.286 0.138
..$ c : num [1:5] 1.228 -0.802 -1.08 -0.158 -1.072
..$ d : num [1:5] -0.139 -0.597 -2.184 0.241 -0.259
$ :'data.frame': 5 obs. of 5 variables:
..$ ID: int [1:5] 1 2 3 4 5
..$ a : num [1:5] 0.901 0.942 1.468 0.707 0.819
..$ b : num [1:5] -0.293 1.419 1.499 -0.657 -0.853
..$ c : num [1:5] 0.316 1.11 2.215 1.217 1.479
..$ d : num [1:5] 0.952 -1.01 -2 -1.762 -0.143
$ :'data.frame': 5 obs. of 5 variables:
..$ ID: int [1:5] 1 2 3 4 5
..$ a : num [1:5] 1.5501 -0.8024 -0.0746 1.8957 -0.4566
..$ b : num [1:5] 0.5622 -0.887 -0.4602 -0.7243 -0.0692
..$ c : num [1:5] 1.463 0.188 1.022 -0.592 -0.112
..$ d : num [1:5] -0.925 0.7533 -0.1126 -0.0641 0.2333
Another way to perform the recursive merge would be to use join_all from library(plyr)
library(plyr)
out1 <- lapply(l, join_all, by="ID") #using the example dataset of #Thomas
identical(out, out1)
# [1] TRUE
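As a small illustration of what Reduce does to one inner list, the sketch below (with made-up data frames d1, d2, d3 and a hypothetical helper merge_by_id) shows that folding from the left is the same as nesting the merge calls by hand.

```r
# Three small made-up data frames sharing an ID column
d1 <- data.frame(ID = 1:3, a = c(10, 20, 30))
d2 <- data.frame(ID = 1:3, b = c("x", "y", "z"))
d3 <- data.frame(ID = 1:3, c = c(TRUE, FALSE, TRUE))

# Hypothetical helper: left join of two data frames on ID
merge_by_id <- function(x, y) merge(x, y, by = "ID", all.x = TRUE)

# Reduce() folds the list from the left ...
folded <- Reduce(merge_by_id, list(d1, d2, d3))
# ... which is the same as nesting the calls by hand
nested <- merge_by_id(merge_by_id(d1, d2), d3)

identical(folded, nested)
```

Wrapping the Reduce call in lapply, as in the answer above, applies this fold to each inner list separately instead of across the outer list.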

Building a list in a loop in R - getting item names correct

I have a function which contains a loop over two lists and builds up some calculated data. I would like to return these data as a list of lists, indexed by some value, but I'm getting the assignment wrong.
A minimal example of what I'm trying to do, and where I'm going wrong, would be:
mybiglist <- list()
for(i in 1:5){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
name <- paste('item:',i,sep='')
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[name]] <- append(mybiglist, tmp)
}
If you run this and look at the output mybiglist, you will see that something is going very wrong in the way each item is being named.
Any ideas on how I might achieve what I actually want?
Thanks
ps. I know that in R there is a sense in which one has failed if one has to resort to loops, but in this case I do feel justified ;-)
It works if you don't use the append command:
mybiglist <- list()
for(i in 1:5){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
name <- paste('item:',i,sep='')
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[name]] <- tmp
}
# List of 5
# $ item:1:List of 3
# ..$ uniform : num [1:10] 0.737 0.987 0.577 0.814 0.452 ...
# ..$ normal : num [1:16] -0.403 -0.104 2.147 0.32 1.713 ...
# ..$ binomial: num [1:8] 0 0 0 0 1 0 0 1
# $ item:2:List of 3
# ..$ uniform : num [1:10] 0.61 0.62 0.49 0.217 0.862 ...
# ..$ normal : num [1:16] 0.945 -0.154 -0.5 -0.729 -0.547 ...
# ..$ binomial: num [1:8] 1 2 2 0 2 1 0 2
# $ item:3:List of 3
# ..$ uniform : num [1:10] 0.66 0.094 0.432 0.634 0.949 ...
# ..$ normal : num [1:16] -0.607 0.274 -1.455 0.828 -0.73 ...
# ..$ binomial: num [1:8] 2 2 3 1 1 1 2 0
# $ item:4:List of 3
# ..$ uniform : num [1:10] 0.455 0.442 0.149 0.745 0.24 ...
# ..$ normal : num [1:16] 0.0994 -0.5332 -0.8131 -1.1847 -0.8032 ...
# ..$ binomial: num [1:8] 2 3 1 1 2 2 2 1
# $ item:5:List of 3
# ..$ uniform : num [1:10] 0.816 0.279 0.583 0.179 0.321 ...
# ..$ normal : num [1:16] -0.036 1.137 0.178 0.29 1.266 ...
# ..$ binomial: num [1:8] 3 4 3 4 4 2 2 3
Change
mybiglist[[name]] <- append(mybiglist, tmp)
to
mybiglist[[name]] <- tmp
To show that an explicit for loop is not required
unif_norm <- replicate(5, list(uniform = runif(10),
normal = rnorm(16)), simplify=F)
binomials <- lapply(seq_len(5)/10, function(prob) {
list(binomial = rbinom(n = 8, size = 5, prob = prob))})
biglist <- setNames(mapply(c, unif_norm, binomials, SIMPLIFY = F),
paste0('item:',seq_along(unif_norm)))
In general, if you go down the for loop path, it is better to preallocate the list. This is more memory efficient.
mybiglist <- vector('list', 5)
names(mybiglist) <- paste0('item:', seq_along(mybiglist))
for(i in seq_along(mybiglist)){
a <- runif(10)
b <- rnorm(16)
c <- rbinom(8, 5, i/10)
tmp <- list(uniform=a, normal=b, binomial=c)
mybiglist[[i]] <- tmp
}
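A more compact variant, offered only as a sketch, builds each item inside lapply and attaches the names afterwards. It assumes the same item structure as in the question; item_ids is an illustrative name.

```r
set.seed(123)
item_ids <- 1:5

# Each call returns one inner list; lapply collects them in order
mybiglist <- lapply(item_ids, function(i) {
  list(uniform  = runif(10),
       normal   = rnorm(16),
       binomial = rbinom(8, 5, i / 10))
})
names(mybiglist) <- paste0("item:", item_ids)

str(mybiglist, max.level = 1)
```

Because lapply allocates the result list up front, there is no need to grow or preallocate it by hand.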
