How to plot data frames in a nested list, by group? - r

I have a directory structure with rasters, like this:
folder absorbance: farm1.tif, farm2.tif
folder resistance: farm1.tif, farm2.tif
The rasters are in GeoTIFF format.
I have to make comparison graphs between the two farms (farm1 and farm2) for the same type of data (absorbance, resistance).
Farm1 and Farm2 are not stackable, so I don't use raster stacks; they can also have a different number of cells.
So far I have managed to create nested lists of data frames:
```
library(terra)

# one sub-folder per data type (absorbance, resistance)
raster_dir <- list.dirs(path = cartella, recursive = FALSE, full.names = FALSE)
# for each sub-folder, list the GeoTIFF files it contains
raster_files <- lapply(raster_dir, function(dir) {
  as.list(list.files(path = paste(cartella, dir, sep = "/"),
                     pattern = "\\.tif$",
                     full.names = TRUE,
                     recursive = FALSE))
})
names(raster_files) <- raster_dir
# read every file as a SpatRaster, then convert each raster to a data frame
rasters <- rapply(raster_files, rast, how = "list", deflt = NA_integer_)
rast_df <- rapply(rasters, terra::as.data.frame, how = "list", deflt = NA_integer_)
```
> str(rast_df)
List of 2
absorbance:List of 2
..$ :'data.frame': 1920 obs. of 1 variable:
ognibene: num [1:1920] 1 1 1 0.000392 0.000252 ...
..$ :'data.frame': 2401 obs. of 1 variable:
toderici: num [1:2401] 0.000467 0.000565 1 1 1 ...
resistance :List of 3
..$ :'data.frame': 1920 obs. of 1 variable:
ognibene: num [1:1920] 1 1 1 0.404 0.211 ...
..$ :'data.frame': 2401 obs. of 1 variable:
toderici: num [1:2401] 0.584 0.706 1 1 1 ...
The problem is probably in the naming of the nested list.
Why doesn't this work?
```
fun_violin_plot <- function(df) gg <- ggplot(df,aes(x='',y=df[,1])) +
geom_violin(na.rm = T,scale="count") +
labs(x=NULL,y = NULL)
rast_violin_plot <- rapply(rast_df,fun_violin_plot,how = "unlist",deflt = NA_integer_)
```
> Error in `fortify()`:
! `data` must be a <data.frame>, or an object coercible by `fortify()`, not a list.
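One possible way around this, offered as a sketch rather than a confirmed diagnosis: as far as I know, rapply() descends into anything list-like, and a data frame is itself a list of columns, so the plotting function may never receive a whole data frame. Two explicit lapply() levels stop at the data-frame level. The mock rast_df below only imitates the structure shown above (the values are made up), and to_long is a hypothetical helper name.

```r
# Mock data standing in for the real rast_df (values are random).
set.seed(1)
rast_df <- list(
  absorbance = list(data.frame(ognibene = runif(5)), data.frame(toderici = runif(7))),
  resistance = list(data.frame(ognibene = runif(5)), data.frame(toderici = runif(7)))
)

# Hypothetical helper: turn one single-column data frame into long format,
# keeping the farm name (the column name) and the data type (the folder name).
to_long <- function(df, type) {
  data.frame(type = type, farm = names(df)[1], value = df[[1]])
}

# The outer lapply() walks the folders; the inner one walks the data frames
# and never descends into their columns.
long_list <- lapply(names(rast_df), function(type) {
  do.call(rbind, lapply(rast_df[[type]], to_long, type = type))
})
long_df <- do.call(rbind, long_list)

# Optional plot: one violin per farm, one panel per data type.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long_df, aes(x = farm, y = value)) +
    geom_violin(na.rm = TRUE, scale = "count") +
    facet_wrap(~type, scales = "free_y") +
    labs(x = NULL, y = NULL)
}
```

Putting everything in one long data frame also sidesteps the different cell counts of the two farms, since each farm simply contributes a different number of rows.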

Related

How can I write a dataframe to a csv after running scale() in R?

I'm scaling one column in a dataset with the intention of fitting a linear model. However, when I try to write the dataframe (with scaled column) to a csv, it doesn't work because the scaled column became complex with center and scale attributes.
Can someone please indicate how to convert the scaled column to something that can write to a csv? (and maybe why scale() needs to do it this way.)
# make a data frame
testDF <- data.frame(x1 = c(1,2,2,3,2,4,4,5,6,15,36,42,11,12,23,24,25,66,77,18,9),
x2 = c(1,4,5,9,4,15,17,25,35,200,1297,1764,120,150,500,500,640,4200,6000,365,78))
# scale the x1 attribute
testDF <- testDF %>%
mutate(x1_scaled = scale(x1, center = TRUE, scale = TRUE))
# write to csv doesn't work
write_csv(as.matrix(testDF), "testDF.csv")
# but plotting and lm do work
ggplot(testDF, aes(x1_scaled)) +
geom_histogram(aes(y = ..density..),binwidth = 1)
Lm_scaled <- lm(x2 ~ x1_scaled, data = testDF)
plot(Lm_scaled)
scale returns a matrix. We could extract the column or use as.vector to remove the dim attribute:
testDF <- testDF %>%
mutate(x1_scaled = as.vector(scale(x1, center = TRUE, scale = TRUE)))
Check the structure of the output without as.vector and with as.vector
> testDF %>%
+ mutate(x1_scaled = scale(x1, center = TRUE, scale = TRUE)) %>% str
'data.frame': 21 obs. of 3 variables:
$ x1 : num 1 2 2 3 2 4 4 5 6 15 ...
$ x2 : num 1 4 5 9 4 15 17 25 35 200 ...
$ x1_scaled: num [1:21, 1] -0.824 -0.776 -0.776 -0.729 -0.776 ...
..- attr(*, "scaled:center")= num 18.4
..- attr(*, "scaled:scale")= num 21.2
> testDF %>%
+ mutate(x1_scaled = as.vector(scale(x1, center = TRUE, scale = TRUE))) %>% str
'data.frame': 21 obs. of 3 variables:
$ x1 : num 1 2 2 3 2 4 4 5 6 15 ...
$ x2 : num 1 4 5 9 4 15 17 25 35 200 ...
$ x1_scaled: num -0.824 -0.776 -0.776 -0.729 -0.776 ...
You can simply convert the scaled column to numeric in base R and write out the data frame:
testDF$x1_scaled <- as.numeric(testDF$x1_scaled)
write_csv(testDF, "testDF.csv")
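To make the matrix-versus-vector difference concrete, here is a small base R sketch (the short vector x and the temporary file path are illustrative, not from the question). It also hints at why scale() keeps the extra attributes: presumably so that the transformation can be reversed or re-applied later.

```r
x <- c(1, 2, 2, 3, 2, 4)

scaled <- scale(x)                # one-column matrix carrying extra attributes
is.matrix(scaled)                 # TRUE
names(attributes(scaled))         # includes "scaled:center" and "scaled:scale"

scaled_vec <- as.vector(scaled)   # plain numeric vector, attributes dropped

# The stored center and scale allow the original values to be recovered.
restored <- scaled_vec * attr(scaled, "scaled:scale") + attr(scaled, "scaled:center")
all.equal(restored, x)

# A data frame with only plain columns writes to CSV without trouble.
out <- data.frame(x = x, x_scaled = scaled_vec)
csv_path <- tempfile(fileext = ".csv")
write.csv(out, csv_path, row.names = FALSE)
```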

How to retrieve name of element in list (data frame) to use it as a title of the plot?

Briefly: is it possible to retrieve only the name of an element in a list and use it as the main title of a plot?
Let me explain with an example.
Let's create a random df:
a <- c(1,2,3,4)
b <- runif(4)
c <- runif(4)
d <- runif(4)
e <- runif(4)
f <- runif(4)
df <- data.frame(a,b,c,d,e,f)
head(df)
a b c d e f
1 1 0.9694204 0.9869154 0.5386678 0.39331278 0.15054698
2 2 0.8949330 0.9910894 0.1009689 0.03632476 0.15523628
3 3 0.4930752 0.7179144 0.6957262 0.36579883 0.32006026
4 4 0.4850141 0.5539939 0.3196953 0.14348259 0.05292068
Then I want to create a list of data frames (based on the one above) with specific columns to make a plot. In other words, I'd like to make plots where the first column of df (a) is the x axis and columns b, c, d, e and f represent the values on the y axis. Yes, there will be 5 plots - that's the point!
So my idea was to write a simple function that creates a list of data frames based on the one created above:
my_fun <- function(x){
a <- df[1]
b <- x
aname <- "x_label"
bname <- "y_label"
df <- data.frame(a,b)
names(df) <- c(aname,bname)
return(df)
}
Run it for all (specified) columns:
df_s <- apply(df[,2:6], 2, function(x) my_fun(x))
So I have now:
class(df_s)
[1] "list"
str(df_s)
List of 5
$ b:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.969 0.895 0.493 0.485
$ c:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.987 0.991 0.718 0.554
$ d:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.539 0.101 0.696 0.32
$ e:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.3933 0.0363 0.3658 0.1435
$ f:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.1505 0.1552 0.3201 0.0529
That is what I wanted, but here's the question. I'd like to create a plot for every df in my list. As a result I want 5 plots with main titles b, c, d, e, f respectively. The axis labels are the same for every plot; only the title differs. So I tried:
lapply(df_s, function(x) plot(x[2] ~ x[1], data = x, main = ???))
What should go in place of the question marks? I tried main = names(df_s)[x], but it didn't work.
I think the following works. However, it might be best to use ggplot2 instead of the plot function (unless you are saving the plots inside lapply).
lapply(seq_along(df_s), function(x)
  plot(df_s[[x]][,2] ~ df_s[[x]][,1],
       xlab = names(df_s[[x]])[1],
       ylab = names(df_s[[x]])[2],
       main = names(df_s[x])))
With ggplot2:
library(ggplot2)
plot_lst <- lapply(seq_along(df_s), function(i) {
  ggplot(df_s[[i]], aes(x=x_label, y=y_label)) +
    geom_point() +
    theme(plot.title = element_text(hjust = 0.5)) +
    ggtitle(names(df_s)[i]) })
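An alternative to indexing, sketched here in base R under the assumption that df_s is a named list: Map() walks the data frames and their names in parallel, so the name is available as an ordinary argument. The two-column mock data and the off-screen pdf(NULL) device are only there to keep the example self-contained.

```r
set.seed(42)
df <- data.frame(a = 1:4, b = runif(4), c = runif(4))

# Build the named list of two-column data frames (names come from df's columns).
df_s <- lapply(df[-1], function(y) data.frame(x_label = df$a, y_label = y))

pdf(NULL)  # off-screen device, so nothing is written to disk
titles <- Map(function(d, nm) {
  plot(y_label ~ x_label, data = d, main = nm)  # nm is the list element's name
  nm
}, df_s, names(df_s))
invisible(dev.off())
```

The original attempt, main = names(df_s)[x], fails because inside lapply(df_s, ...) the argument x is the data frame itself, not its position or name.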

Add level to R Data frame

Let's say we have a data frame/table organized like this
x$user1, x$user2, etc..
x$usern is a data table with attributes like $age, $department, $sale, $price, etc.
I would like to "push" and regroup the data frame in x$usern to one lower level, so that I can add other data tables below x$usern
Perhaps it's better with illustration : the current structure is
x
$user1 $user2
$price,$age, etc. $price, $age, etc.
Target structure is
x
$user1 $user2
$data $stat $data $stat
$price,$age, etc. $min, $max, etc. $price,$age, etc. $min, $max, etc.
What would be the best way to achieve this? I am thinking of lapply and/or looping through all users, but perhaps there is a more elegant way to do this?
Thank you.
This seems like a good place for lapply (or one of its kin). Some mock data:
x <- list(
user1 = data.frame(price = 11, age = 12),
user2 = data.frame(price = 21, age = 22)
)
str(x)
# List of 2
# $ user1:'data.frame': 1 obs. of 2 variables:
# ..$ price: num 11
# ..$ age : num 12
# $ user2:'data.frame': 1 obs. of 2 variables:
# ..$ price: num 21
# ..$ age : num 22
The transformation:
newx <- lapply(x, function(l) {
st <- data.frame(min = 0.9*min(l$price), max = 1.1*max(l$age))
list(data = l, stat = st)
})
str(newx)
# List of 2
# $ user1:List of 2
# ..$ data:'data.frame': 1 obs. of 2 variables:
# .. ..$ price: num 11
# .. ..$ age : num 12
# ..$ stat:'data.frame': 1 obs. of 2 variables:
# .. ..$ min: num 9.9
# .. ..$ max: num 13.2
# $ user2:List of 2
# ..$ data:'data.frame': 1 obs. of 2 variables:
# .. ..$ price: num 21
# .. ..$ age : num 22
# ..$ stat:'data.frame': 1 obs. of 2 variables:
# .. ..$ min: num 18.9
# .. ..$ max: num 24.2
(Obviously, my definition of st would have to be tailored to your needs. Additionally, it does not strictly need to be defined within the lapply, but it makes sense to do it there if you already know its definition based on x$user1$....)
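If the statistics are computed in a separate step, a sketch along these lines pairs each user's data with its stats using Map(). The mock data and the choice of min/max of price are illustrative assumptions, not part of the answer above.

```r
x <- list(
  user1 = data.frame(price = c(11, 13), age = c(12, 14)),
  user2 = data.frame(price = c(21, 25), age = c(22, 26))
)

# Stats computed on their own (which statistics to use is up to you).
stats <- lapply(x, function(d) data.frame(min = min(d$price), max = max(d$price)))

# Pair each user's data frame with its stats, pushing the data one level down.
newx <- Map(function(d, s) list(data = d, stat = s), x, stats)

newx$user2$stat$max   # access follows the target structure: user, then stat, then max
```

Because x is a named list, Map() keeps the user names on the result, so further tables can later be added with, for example, newx$user1$other <- ....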

Flatten or unlist a data frame in R

I am using the Googleway package to get the elevation information for a bunch of lat long coordinates of which there are 954 in total.
I've broken the calls into 3 separate files but they're in list format and when I convert them to a dataframe they are in nested dataframe formats. I've been trying to flatten the files and unlist them but I am having no success.
DF <- read.csv("Site Coor R.csv", header = T, colClasses = c("numeric","numeric"))
result1 <- google_elevation(df_locations = DF[1:350,], key = "KEY")
result2 <- google_elevation(df_locations = DF[351:700,], key = "KEY")
result3 <- google_elevation(df_locations = DF[701:954,], key = "KEY")
> str(result1)
List of 2
$ results:'data.frame': 350 obs. of 3 variables:
..$ elevation : num [1:350] 14.15 2.14 2.66 6.78 23.27 ...
..$ location :'data.frame': 350 obs. of 2 variables:
.. ..$ lat: num [1:350] 52.7 52.7 52.7 52.9 52.7 ...
.. ..$ lng: num [1:350] -8.61 -8.83 -8.92 -8.98 -8.91 ...
..$ resolution: num [1:350] 611 611 611 611 611 ...
$ status : chr "OK"
do.call("c", result1[["location"]])
or
result1 <- unlist(result1, recursive = TRUE, use.names = TRUE)
or
write.table(data.frame(subset(result1DF,select=-c(results.location)),unclass(result1DF$results.location)))
Since result1, result2 and result3 are of the same structure is there a simple way to merge them, flatten the conjoined table and then export as CSV?
We can collect all the objects in a list with mget and flatten each nested data.frame with do.call(data.frame, ...) in a single call:
lst <- lapply(mget(paste0("result", 1:3)), function(x) do.call(data.frame, x$results))
str(lst[[1]])
#'data.frame': 12 obs. of 3 variables:
#$ elevation : num -0.546 0.537 0.42 -0.584 0.847 ...
#$ location.lat: int 61 85 53 80 82 52 66 62 68 57 ...
#$ location.lng: int 11 7 10 19 1 -2 -6 -8 -14 -13 ...
If we need a single table, then rbind them together
library(data.table)
dt <- rbindlist(lst)
fwrite(dt, file = "yourfile.csv")
data
f1 <- function(seed){
set.seed(seed)
results <- data.frame(elevation = rnorm(12))
results$location <- data.frame(lat = sample(50:100, 12, replace=TRUE),
lng = sample(-15:20, 12, replace=TRUE))
results
}
result1 <- list(results = f1(24), status = "OK")
result2 <- list(results = f1(42), status = "OK")
result3 <- list(results = f1(343), status = "OK")
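For readers without data.table, the same idea can be sketched in base R only. make_result is a hypothetical helper that imitates the shape of the google_elevation() output shown in the question; the values are made up.

```r
# Mock result with a nested data frame column, like the str() output in the question.
make_result <- function(n) {
  results <- data.frame(elevation = seq_len(n) * 1.5)
  results$location <- data.frame(lat = 50 + seq_len(n), lng = -seq_len(n))
  results$resolution <- 611
  list(results = results, status = "OK")
}
results_list <- list(make_result(3), make_result(2))

# do.call(data.frame, ...) expands the nested column into location.lat / location.lng.
flat_list <- lapply(results_list, function(r) do.call(data.frame, r$results))

# Stack the flattened tables and export them.
flat <- do.call(rbind, flat_list)
names(flat)
out_path <- tempfile(fileext = ".csv")
write.csv(flat, out_path, row.names = FALSE)
```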

How to perform function on a list of dataframes

I have a list of dataframes as follows (dput is way too big even with head=1 so I've had to do a mockup here with str(df_list))
$ OC_AH_026C :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 45.183 111.038 162.785 -0.712 83.473 ...
$ OC_AH_026C.1:'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 69.6 125.1 156.4 12.8 97.4 ...
$ OC_AH_026T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 13 12.5 103.1 56.7 145.4 ...
$ OC_AH_058T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 87.114 118.963 184.31 -0.173 171.733 ...
$ OC_AH_084T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 29.111 103.142 57.476 -0.712 50.156 ...
$ OC_AH_086T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 49.8 81 111.5 47 98.8 ...
$ OC_AH_088T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 117 152 224 121 196 ...
$ OC_AH_096T :'data.frame': 13081 obs. of 3 variables:
..$ chr : num [1:13081] 1 1 1 1 1 1 1 1 1 1 ...
..$ leftPos: num [1:13081] 736092 818159 4105086 4140849 4464314 ...
..$ Means : num [1:13081] 49.5 102.8 93.6 15.2 103.2 ...
I am trying to flag significant scores in the third column of each dataframe (Means, grouped into bins using dplyr): if a value is significantly elevated it gets a 1, if significantly depressed a -1, and otherwise a 0, in a new column of each dataframe.
To do the grouping I have done as follows which works fine:
CLL <- function (col) {
col <- col %>%
group_by(chr, binnum = (leftPos) %/% 500000) %>%
summarise(Means = mean(Means)) %>%
mutate(leftPos = (binnum+1) * 120000) %>%
select(leftPos, Means)}
CML<-lapply(df_list, CLL)
I am stuck on then calculating the upper and lower limits for each Means column in each dataframe. I think this is because I do not know how to reference this column because it is in a list of dataframes. For a non list dataframe I use:
UL = median(col2, na.rm = TRUE) + alpha*IQR(col2, na.rm = TRUE)
LL = median(col2, na.rm = TRUE) - alpha*IQR(col2, na.rm = TRUE)
I have tried to reference the third column of each dataframe as follows:
tre<-lapply(CML, "[[", 3)
but of course this extracts the third column and puts it in 'tre' whereas I want to alter the dataframes in the list so that the third column has its relationship with the other two columns maintained.
So:
a) How do I reference the Means column and get the upper and lower limit of each dataframe, and then
b) on the basis of whether each row in the Means column of each dataframe is > upper limit or < lower limit, assign 1, -1 or 0 in a new column?
This is what you can do, which is similar to @Roland's answer.
Say that you have data that looks like this (a simplified version of the data you showed):
df_list <- list(OC_AH_026C = data.frame(chr = 1,
leftPos= c(73, 81, 41, 44),
Means = c(111, 111, 162, -0.7)),
OC_AH_026C.1 = data.frame(chr = 1,
leftPos = c(73, 81, 41, 44),
Means = c(69, 125, 156, 12)))
You can use lapply to "loop" through the elements of the list like this. The function calculates the UL and LL of the column named by col2 (here "Means") and adds a column (res) that indicates whether the Means value lies below LL (-1), above UL (1), or in between (0):
df_list2 <- lapply(df_list, function(df, alpha, col2) {
# perform all your calculations here
df$LL <- median(df[, col2], na.rm = T) - alpha*IQR(df[, col2], na.rm = T)
df$UL <- median(df[, col2], na.rm = T) + alpha*IQR(df[, col2], na.rm = T)
# -1 if Means < LL,
# 1 if Means > UL
# 0 otherwise, nest the operators
# if you wish to calculate more complex conditions
df$res <- 0 + ((df$Means < df$LL)*(-1)) + ((df$Means > df$UL)*1)
return(df)
}, alpha = 0.95, col2 = "Means")
df_list2
# $OC_AH_026C
# chr leftPos Means LL UL res
# 1 1 73 111.0 72.35875 149.6412 0
# 2 1 81 111.0 72.35875 149.6412 0
# 3 1 41 162.0 72.35875 149.6412 1
# 4 1 44 -0.7 72.35875 149.6412 -1
#
# $OC_AH_026C.1
# chr leftPos Means LL UL res
# 1 1 73 69 22.9 171.1 0
# 2 1 81 125 22.9 171.1 0
# 3 1 41 156 22.9 171.1 0
# 4 1 44 12 22.9 171.1 -1
(I hope I got your question right of what you need, otherwise let me know and I will correct the answer).
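As a worked check of the arithmetic, the limits for the first mock data frame can be reproduced on a plain vector. This is only a sketch of the same formula; the shorter expression for res is an equivalent rewrite, not the answer's own code.

```r
means <- c(111, 111, 162, -0.7)
alpha <- 0.95

# Half-width of the band: alpha times the interquartile range.
half_width <- alpha * IQR(means, na.rm = TRUE)
LL <- median(means, na.rm = TRUE) - half_width   # 72.35875
UL <- median(means, na.rm = TRUE) + half_width   # 149.64125

# Logical comparisons coerce to 0/1, so this yields 1 above UL, -1 below LL, else 0.
res <- (means > UL) - (means < LL)
res   # 0 0 1 -1
```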
data.table way
For the sake of completeness, I include a data.table way, which is faster (but gets rid of the list structure). The approach looks like this:
library(data.table)
library(magrittr) # for some piping
# combine all listed data.frames to one data.table with another column, which indicates the name
dt <- lapply(1:length(df_list), function(i) {
nam <- names(df_list)[i]
df <- df_list[[i]]
tmpdt <- data.table(name = nam, df)
}) %>% rbindlist
# calculate the limits
alpha <- 0.95
dt[, LL := median(Means, na.rm = T) - alpha*IQR(Means, na.rm = T), by = name]
dt[, UL := median(Means, na.rm = T) + alpha*IQR(Means, na.rm = T), by = name]
# compare the columns of dt directly; no data frame named df exists at this point
dt[, res := 0 + ((Means < LL)*(-1)) + ((Means > UL)*1)]

Resources