Flatten or unlist a data frame in R

I am using the googleway package to get elevation information for a set of lat/long coordinates, 954 in total.
I've broken the calls into 3 separate batches, but the results come back as lists, and when I convert them to data frames they contain nested data frames. I've been trying to flatten and unlist them, but with no success.
DF <- read.csv("Site Coor R.csv", header = T, colClasses = c("numeric","numeric"))
result1 <- google_elevation(df_locations = DF[1:350,], key = "KEY")
result2 <- google_elevation(df_locations = DF[351:700,], key = "KEY")
result3 <- google_elevation(df_locations = DF[701:954,], key = "KEY")
> str(result1)
List of 2
$ results:'data.frame': 350 obs. of 3 variables:
..$ elevation : num [1:350] 14.15 2.14 2.66 6.78 23.27 ...
..$ location :'data.frame': 350 obs. of 2 variables:
.. ..$ lat: num [1:350] 52.7 52.7 52.7 52.9 52.7 ...
.. ..$ lng: num [1:350] -8.61 -8.83 -8.92 -8.98 -8.91 ...
..$ resolution: num [1:350] 611 611 611 611 611 ...
$ status : chr "OK"
do.call("c", result1[["location"]])
or
result1 <- unlist(result1, recursive = TRUE, use.names = TRUE)
or
write.table(data.frame(subset(result1DF,select=-c(results.location)),unclass(result1DF$results.location)))
Since result1, result2 and result3 are of the same structure is there a simple way to merge them, flatten the conjoined table and then export as CSV?

We can gather all the objects into a list and create a flattened data.frame from each in a single call:
lst <- lapply(mget(paste0("result", 1:3)), function(x) do.call(data.frame, x$results))
str(lst[[1]])
#'data.frame': 12 obs. of 3 variables:
#$ elevation : num -0.546 0.537 0.42 -0.584 0.847 ...
#$ location.lat: int 61 85 53 80 82 52 66 62 68 57 ...
#$ location.lng: int 11 7 10 19 1 -2 -6 -8 -14 -13 ...
If we need a single table, then rbind them together
library(data.table)
dt <- rbindlist(lst)
fwrite(dt, file = "yourfile.csv")
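If you would rather stay in base R, a minimal sketch of the same flatten-stack-export (assuming result1, result2 and result3 have the structure shown in the question) would be:
# flatten each result into a plain data.frame, stack them, then write out
lst <- lapply(list(result1, result2, result3),
              function(x) do.call(data.frame, x$results))
all_results <- do.call(rbind, lst)
write.csv(all_results, file = "yourfile.csv", row.names = FALSE)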
data
f1 <- function(seed){
  set.seed(seed)
  results <- data.frame(elevation = rnorm(12))
  results$location <- data.frame(lat = sample(50:100, 12, replace = TRUE),
                                 lng = sample(-15:20, 12, replace = TRUE))
  results
}
result1 <- list(results = f1(24), status = "OK")
result2 <- list(results = f1(42), status = "OK")
result3 <- list(results = f1(343), status = "OK")

Related

How to plot a nested list of data frames by group?

I have a directory structure with rasters like this:
folder absorbance: farm1.tif, farm2.tif
folder resistance: farm1.tif, farm2.tif
They are GeoTIFF rasters.
I have to make comparison graphs between the two farms (farm1 and farm2) for the same type of data (absorbance, resistance).
Farm1 and Farm2 are not stackable, so I don't use rasterStacks; they can have a different number of cells.
I got as far as creating nested lists of data frames:
```
library(terra)  # for rast() and terra::as.data.frame()

raster_dir <- list.dirs(path = cartella, recursive = FALSE, full.names = FALSE)
raster_files <- lapply(raster_dir, function(dir) {
  as.list(list.files(path = paste(cartella, dir, sep = '/'),
                     pattern = "\\.tif$",
                     full.names = TRUE,
                     recursive = FALSE))
})
names(raster_files) <- raster_dir
rasters <- rapply(raster_files, rast, how = "list", deflt = NA_integer_)
rast_df <- rapply(rasters, terra::as.data.frame, how = "list", deflt = NA_integer_)
```
> str(rast_df)
List of 2
 $ absorbance:List of 2
  ..$ :'data.frame': 1920 obs. of 1 variable:
  .. ..$ ognibene: num [1:1920] 1 1 1 0.000392 0.000252 ...
  ..$ :'data.frame': 2401 obs. of 1 variable:
  .. ..$ toderici: num [1:2401] 0.000467 0.000565 1 1 1 ...
 $ resistance:List of 3
  ..$ :'data.frame': 1920 obs. of 1 variable:
  .. ..$ ognibene: num [1:1920] 1 1 1 0.404 0.211 ...
  ..$ :'data.frame': 2401 obs. of 1 variable:
  .. ..$ toderici: num [1:2401] 0.584 0.706 1 1 1 ...
The problem is probably in how the nested list is renamed. Why doesn't this work?
```
fun_violin_plot <- function(df) {
  ggplot(df, aes(x = '', y = df[, 1])) +
    geom_violin(na.rm = TRUE, scale = "count") +
    labs(x = NULL, y = NULL)
}
rast_violin_plot <- rapply(rast_df, fun_violin_plot, how = "unlist", deflt = NA_integer_)
```
> Error in `fortify()`:
! `data` must be a <data.frame>, or an object coercible by `fortify()`, not a list.
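One way to sidestep the error is to avoid rapply() here and instead map over the nested list at a fixed depth, so the plotting function always receives one of the inner data frames, and to keep the resulting ggplot objects in a list rather than unlisting them. A minimal sketch, assuming rast_df is the nested list of data frames shown above:
```
library(ggplot2)
library(purrr)

fun_violin_plot <- function(df) {
  ggplot(df, aes(x = '', y = df[[1]])) +
    geom_violin(na.rm = TRUE, scale = "count") +
    labs(x = NULL, y = NULL)
}

# apply the plotting function two levels down, keeping the list structure and names
rast_violin_plot <- map_depth(rast_df, 2, fun_violin_plot)

# or, in base R, a nested lapply so each leaf handed to the function is a data frame
rast_violin_plot <- lapply(rast_df, lapply, fun_violin_plot)
```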

Loop in R not working, generating single Value

I have some metabolomics data I am trying to process (validate the compounds that are actually present).
'data.frame': 544 obs. of 48 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ No. : int 2 32 34 95 114 141 169 234 236 278 ...
$ RT..min. : num 0.89 3.921 0.878 2.396 0.845 ...
$ Molecular.Weight : num 70 72 72 78 80 ...
$ m.z : num 103 145 114 120 113 ...
$ HMDB.ID : chr "HMDB0006804" "HMDB0031647" "HMDB0006112" "HMDB0001505" ...
$ Name : chr "Propiolic acid" "Acrylic acid" "Malondialdehyde" "Benzene" ...
$ Formula : chr "C3H2O2" "C3H4O2" "C3H4O2" "C6H6" ...
$ Monoisotopic_Mass: num 70 72 72 78 80 ...
$ Delta.ppm. : num 1.295 0.833 1.953 1.023 0.102 ...
$ X1 : num 288.3 16.7 1130.9 3791.5 33.5 ...
$ X2 : num 276.8 13.4 1069.1 3228.4 44.1 ...
$ X3 : num 398.6 19.3 794.8 2153.2 15.8 ...
$ X4 : num 247.6 100.5 1187.5 1791.4 33.4 ...
$ X5 : num 98.4 162.1 1546.4 1646.8 45.3 ...
I tried to write a loop so that if the Delta.ppm value is larger than (m/z - molecular weight)/molecular weight, the entire row is deleted in the subsequent dataframe.
for (i in 1:nrow(rawdata)) {
  ppm <- (rawdata$m.z[i] - rawdata$Molecular.Weight[i]) /
    rawdata$Molecular.Weight[i]
  if (ppm > rawdata$Delta.ppm[i]) {
    filtered_data <- rbind(filtered_data, rawdata[i,])
  }
}
Instead of giving me a new df with the validated compounds, under the 'Values' section, it generates a single number for 'ppm'.
Still very new to R, any help is super appreciated!
No need to do this row-by-row, we can remove all undesired rows in one operation:
## base R
good <- with(rawdata, (m.z - Molecular.Weight)/Molecular.Weight < Delta.ppm.)
newdat <- rawdata[good, ]
## dplyr
library(dplyr)
newdat <- filter(rawdata, (m.z - Molecular.Weight)/Molecular.Weight < Delta.ppm.)
Iteratively adding rows to a frame with rbind(old, newrow) works in practice but scales horribly; see "Growing Objects" in The R Inferno. Each added row makes a complete copy of all rows already in old, so the loop slows down more and more as the frame grows. It is far better to collect the new rows in a list and rbind them all at once; e.g.,
out <- list()
for (...) {
  # ... newrow ...
  out <- c(out, list(newrow))
}
alldat <- do.call(rbind, out)
As for why you only saw a single number for ppm: the loop overwrites ppm on every iteration, so only the last value survives. If you want to keep each ratio, index into a vector instead:
ppm <- NULL
for (i in 1:nrow(rawdata)) {
  ppm[i] <- (rawdata$m.z[i] - rawdata$Molecular.Weight[i]) /
    rawdata$Molecular.Weight[i]
  if (ppm[i] > rawdata$Delta.ppm[i]) {
    filtered_data <- rbind(filtered_data, rawdata[i,])
  }
}
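Putting the two pieces of advice together (keep every ppm value and grow a list instead of rbind-ing inside the loop), a minimal sketch using the column names from the str() output above might look like:
out <- list()
ppm <- numeric(nrow(rawdata))
for (i in 1:nrow(rawdata)) {
  # keep every computed ratio instead of overwriting a single value
  ppm[i] <- (rawdata$m.z[i] - rawdata$Molecular.Weight[i]) /
    rawdata$Molecular.Weight[i]
  if (ppm[i] > rawdata$Delta.ppm.[i]) {
    # collect matching rows in a list and rbind once at the end
    out <- c(out, list(rawdata[i, ]))
  }
}
filtered_data <- do.call(rbind, out)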

How to create a list of lists and then perform a vectorised function over it

I'm looking for help with two specific points in this request:
1) how to create a list of lists given my data set (all.df) below
2) how to vectorise a function over this list of lists
I'm trying to generate a forecast at a customer / product level using the prophet library.
I'm struggling to vectorise the operation. I currently run a for loop, which I want to avoid in order to speed up my calculations.
Data for the analysis
library(lubridate)  # for dmy()

set.seed(1123)
df1 <- data.frame(
  Date = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),
  Customer = "a",
  Product = "xxx",
  Revenue = sample(1:100, 365*2, replace = TRUE))
df2 <- data.frame(
  Date = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),
  Customer = "a",
  Product = "yyy",
  Revenue = sample(25:200, 365*2, replace = TRUE))
df3 <- data.frame(
  Date = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),
  Customer = "b",
  Product = "xxx",
  Revenue = sample(1:100, 365*2, replace = TRUE))
df4 <- data.frame(
  Date = seq(dmy("01/01/2017"), by = "day", length.out = 365*2),
  Customer = "b",
  Product = "yyy",
  Revenue = sample(25:200, 365*2, replace = TRUE))
all.df <- rbind(df1, df2, df3, df4)
This is my forecast function
# packages used below
library(prophet)
library(dplyr)

daily_forecast <- function(df, forecast.days = 365){
  # fit actuals into prophet
  m <- prophet(df,
               yearly.seasonality = TRUE,
               weekly.seasonality = TRUE,
               changepoint.prior.scale = 0.55) # default value is 0.05
  # create dummy data frame to hold predictions
  future <- make_future_dataframe(m, periods = forecast.days, freq = "day")
  # run the prediction
  forecast <- predict(m, future)
  # select the date and forecast from the model, then merge with actuals
  daily_fcast <- forecast %>% select(ds, yhat) %>% dplyr::rename(Date = ds, fcast.daily = yhat)
  actual.to.merge <- df %>% dplyr::rename(Date = ds, Actual.Revenue = y)
  daily_fcast <- merge(actual.to.merge, daily_fcast, all = TRUE)
}
Currently, I work through one customer/product at a time using a for loop:
x <- df1 %>% select(-c(Customer, Product)) %>%
  dplyr::rename(ds = Date, y = Revenue) %>%
  daily_forecast()
I would like to vectorise the whole operation instead:
1) Create a list of lists, i.e. split all.df by
a) Product and then
b) Customer
2) Then have the daily_forecast function map over the list of lists created in 1) above.
I would very much like to use functions from purrr.
Here is how I would do what you're asking with purrr:
library(tidyverse)
library(lubridate)
library(prophet)
res <-
  all.df %>%
  split(.$Customer) %>%
  map(~ split(.x, .x$Product)) %>%
  at_depth(2, select, ds = Date, y = Revenue) %>%
  at_depth(2, daily_forecast)
str(res)
# List of 2
# $ a:List of 2
# ..$ xxx:'data.frame': 1095 obs. of 3 variables:
# .. ..$ Date : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 76 87 87 56 83 17 19 72 92 35 ...
# .. ..$ fcast.daily : num [1:1095] 55.9 57.9 51.9 51.9 54 ...
# ..$ yyy:'data.frame': 1095 obs. of 3 variables:
# .. ..$ Date : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 62 87 175 186 168 190 30 192 119 170 ...
# .. ..$ fcast.daily : num [1:1095] 121 121 119 119 116 ...
# $ b:List of 2
# ..$ xxx:'data.frame': 1095 obs. of 3 variables:
# .. ..$ Date : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 71 94 81 32 85 59 59 55 50 50 ...
# .. ..$ fcast.daily : num [1:1095] 51.9 54.2 54.5 53.1 51.9 ...
# ..$ yyy:'data.frame': 1095 obs. of 3 variables:
# .. ..$ Date : Date[1:1095], format: "2017-01-01" ...
# .. ..$ Actual.Revenue: int [1:1095] 105 46 153 136 59 59 34 72 70 85 ...
# .. ..$ fcast.daily : num [1:1095] 103.3 103.3 103.1 103.1 91.5 ...
But the following would be more natural to me (keeping everything in a dataframe):
res_2 <-
  all.df %>%
  rename(ds = Date, y = Revenue) %>%
  nest(ds, y) %>%
  transmute(Customer, Product, res = map(data, daily_forecast)) %>%
  unnest()
# # A tibble: 4,380 × 5
# Customer Product Date Actual.Revenue fcast.daily
# <fctr> <fctr> <date> <int> <dbl>
# 1 a xxx 2017-01-01 76 55.93109
# 2 a xxx 2017-01-02 87 57.92577
# 3 a xxx 2017-01-03 87 51.92263
# 4 a xxx 2017-01-04 56 51.86267
# 5 a xxx 2017-01-05 83 54.04588
# 6 a xxx 2017-01-06 17 52.75289
# 7 a xxx 2017-01-07 19 52.35083
# 8 a xxx 2017-01-08 72 53.91887
# 9 a xxx 2017-01-09 92 55.81202
# 10 a xxx 2017-01-10 35 49.78302
# # ... with 4,370 more rows
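Note that at_depth() has since been renamed map_depth() in purrr, so with a current purrr installation the nested-list version above could be sketched as follows (same all.df and daily_forecast() as before):
library(tidyverse)
library(prophet)
res <-
  all.df %>%
  split(.$Customer) %>%
  map(~ split(.x, .x$Product)) %>%
  map_depth(2, select, ds = Date, y = Revenue) %>%
  map_depth(2, daily_forecast)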

Flatten list and push list key to vector on second level

I suppose this is simple, but I just can't seem to figure it out.
I need to flatten the second level structure and push the list name/key to a vector on the same level as the other vectors. The current structure of myList is
$ 13454:List of 30
..$ subjectId : num 187
..$ procedureId : num 3
..$ procedureSampleId: num 3
..$ timestamp : chr "2017-04-21T17:15:10.911Z"
..$ n001 : num -999
..$ n002 : num -999
..$ gender : num 1
..$ age : num 18
$ 13455:List of 30
..$ subjectId : num 188
..$ procedureId : num 3
..$ procedureSampleId: num 3
..$ timestamp : chr "2017-04-21T17:15:10.913Z"
..$ n001 : num -999
..$ n002 : num -999
..$ gender : num -999
..$ age : num 28
whereas this is the structure I'm looking for:
$ ID : chr "13454" "13455"
$ subjectId : num 187 188
$ procedureId : num 3 3
and so on.
I've tried to achieve this by:
myList2 <- sapply(names(myList), function(y){
  unlist(c('ID' = y, myList[[y]]), use.names = TRUE)
})
But I end up with the transpose of what I need. I could use t(myList2), but I want to understand how to do this correctly. Thank you!
EDIT: Reproducible data:
myList <- list(
  '13454' = list('subjectId' = 187, 'procedureId' = 3, 'procedureSampleId' = 3,
                 'timestamp' = "2017-04-21T17:15:10.911Z", 'n001' = -999, 'n002' = -999,
                 'gender' = 1, 'age' = 18),
  '13455' = list('subjectId' = 188, 'procedureId' = 3, 'procedureSampleId' = 3,
                 'timestamp' = "2017-04-21T17:15:10.913Z", 'n001' = -999, 'n002' = -999,
                 'gender' = -999, 'age' = 28))
myList can be turned into a data.frame using lapply() and rbindlist() from the data.table package:
result <- data.table::rbindlist(lapply(myList, as.data.frame), idcol = "ID")
result[["ID"]] <- names(myList)
result
# ID subjectId procedureId procedureSampleId timestamp n001 n002 gender age
#1: 13454 187 3 3 2017-04-21T17:15:10.911Z -999 -999 1 18
#2: 13455 188 3 3 2017-04-21T17:15:10.913Z -999 -999 -999 28
Edit: This can be even more streamlined:
library(data.table)
rbindlist(myList, idcol = "ID")[, ID := names(myList)][]
Edit: To get to the list format you indicate above in base R, use rbind to build the data frame, then unlist the necessary elements with lapply.
With your list above, you can use do.call to call rbind in base R:
example <- data.frame(ID = as.character(names(myList)),
                      do.call("rbind", myList),
                      row.names = NULL)
exAsList <- lapply(example, function(x) unlist(x, use.names = FALSE))
exAsList
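If dplyr is already in your toolchain, bind_rows() offers a similar one-step flatten, since it accepts a named list of lists; a minimal sketch, assuming myList as defined above:
library(dplyr)

# each inner list becomes one row; .id turns the list names into an ID column
result <- bind_rows(myList, .id = "ID")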

Using Reduce function to merge recursively [duplicate]

This question already has answers here: Simultaneously merge multiple data.frames in a list (9 answers). Closed 6 years ago.
I have a list of lists, where each inner list contains a set of data frames, and I want to merge the data frames within each inner list without merging everything across lists. For example:
list <- list(list(df1_2010, df2_2010, df3_2010), list(df1_2011, df2_2011, df3_2011), list(df1_2012, df2_2012, df3_2012))
I want to merge all the 2010 data frames together by, say, a column id, merge the 2011 data frames by the same kind of id column, and likewise merge the 2012 data frames together.
I want to output a list of merged dataframes by year:
list(df2010, df2011, df2012)
Here's a schematic of how I want to use the Reduce function:
f <- function(b) merge(..., by = "ID", all.x = TRUE)
list <- Reduce(f, list)
But I think this will merge all three lists together instead of each list separately. Let me know your suggestions.
Here's a simple reproducible example that I think maps onto your structure:
n <- 5
set.seed(n)
l <- list(list(data.frame(ID = 1:5, a = rnorm(n)),
               data.frame(ID = 1:5, b = rnorm(n)),
               data.frame(ID = 1:5, c = rnorm(n)),
               data.frame(ID = 1:5, d = rnorm(n))),
          list(data.frame(ID = 1:5, a = rnorm(n)),
               data.frame(ID = 1:5, b = rnorm(n)),
               data.frame(ID = 1:5, c = rnorm(n)),
               data.frame(ID = 1:5, d = rnorm(n))),
          list(data.frame(ID = 1:5, a = rnorm(n)),
               data.frame(ID = 1:5, b = rnorm(n)),
               data.frame(ID = 1:5, c = rnorm(n)),
               data.frame(ID = 1:5, d = rnorm(n))))
You can write an lapply based function that uses Reduce on each element of the list:
out <- lapply(l, function(x) Reduce(function(...) merge(..., by = "ID", all.x = TRUE), x))
And you should get a list of merged dataframes:
str(out)
List of 3
$ :'data.frame': 5 obs. of 5 variables:
..$ ID: int [1:5] 1 2 3 4 5
..$ a : num [1:5] -0.8409 1.3844 -1.2555 0.0701 1.7114
..$ b : num [1:5] -0.603 -0.472 -0.635 -0.286 0.138
..$ c : num [1:5] 1.228 -0.802 -1.08 -0.158 -1.072
..$ d : num [1:5] -0.139 -0.597 -2.184 0.241 -0.259
$ :'data.frame': 5 obs. of 5 variables:
..$ ID: int [1:5] 1 2 3 4 5
..$ a : num [1:5] 0.901 0.942 1.468 0.707 0.819
..$ b : num [1:5] -0.293 1.419 1.499 -0.657 -0.853
..$ c : num [1:5] 0.316 1.11 2.215 1.217 1.479
..$ d : num [1:5] 0.952 -1.01 -2 -1.762 -0.143
$ :'data.frame': 5 obs. of 5 variables:
..$ ID: int [1:5] 1 2 3 4 5
..$ a : num [1:5] 1.5501 -0.8024 -0.0746 1.8957 -0.4566
..$ b : num [1:5] 0.5622 -0.887 -0.4602 -0.7243 -0.0692
..$ c : num [1:5] 1.463 0.188 1.022 -0.592 -0.112
..$ d : num [1:5] -0.925 0.7533 -0.1126 -0.0641 0.2333
Another way to perform the recursive merge would be to use join_all from library(plyr)
library(plyr)
out1 <- lapply(l, join_all, by="ID") #using the example dataset of #Thomas
identical(out, out1)
# [1] TRUE
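A tidyverse equivalent, if you prefer purrr and dplyr joins, could be sketched like this (using the same example list l and assuming every data frame shares the ID column):
library(purrr)
library(dplyr)

# for each inner list, fold its data frames together with a left join on ID
out2 <- map(l, ~ reduce(.x, left_join, by = "ID"))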
