ggplot2: fortify copies all SpPolyDaFr entries multiple times - r

My goal is to plot this shapefile colored by a specific column.
It contains 100 polygons. I apply fortify() to it and join some missing columns:
# convert SpPolyDaFrame into normal dataFrame for plotting
data.df = fortify(data)
# join missing columns
data@data$id = rownames(data@data)
data.df$perc_ch = data@data$perc_ch
data.df = left_join(data.df, data@data, by=c('id'='id'))
After calling fortify(), every entry exists five times. (see 'order').
Calling str() on 'data.df':
'data.frame': 500 obs. of 11 variables:
$ long : num 421667 421667 416057 416057 421667 ...
$ lat : num 8064442 8060421 8060421 8064442 8064442 ...
$ order : int 1 2 3 4 5 1 2 3 4 5 ...
$ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ piece : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
$ id : chr "0" "0" "0" "0" ...
$ group : Factor w/ 100 levels "0.1","1.1","2.1",..: 1 1 1 1 1 2 2 2 2 2 ...
$ perc_ch.x: num 17.4 11.4 20.5 12 15 ...
$ z : int 1 1 1 1 1 2 2 2 2 2 ...
$ Ch_area : num 3914498 3914498 3914498 3914498 3914498 ...
$ perc_ch.y: num 17.4 17.4 17.4 17.4 17.4 ...
This is introduced by fortify(). However, it does not change the plot outcome as long as I join the missing columns on a matching key column (as with perc_ch.y).
If I add missing columns without a matching key (as with perc_ch.x), I run into trouble: because of the redundant entries, wrong values are assigned to the polygons.
Why does fortify() copy the entries like this?

No need to bind the data to the polygons:
library(ggplot2)
library(rgeos)
library(maptools)
library(rgdal)
URL <- "https://www.dropbox.com/s/rsr49jwm1pf9abu/data.zip?dl=1"
fil <- "sodata.zip"
if (!file.exists(fil)) download.file(URL, fil)
fils <- unzip(fil)
shp <- grep("shp$", fils, value=TRUE)
geo <- readOGR(shp, ogrListLayers(shp)[[1]], stringsAsFactors=FALSE, verbose=FALSE)
geo_map <- fortify(geo, region="z")
gg <- ggplot()
gg <- gg + geom_map(data=geo_map, map=geo_map,
                    aes(x=long, y=lat, map_id=id),
                    color=NA, size=0, fill=NA)
gg <- gg + geom_map(data=geo@data, map=geo_map,
                    aes(fill=perc_ch, map_id=z),
                    color="#2b2b2b", size=0.15)
gg <- gg + viridis::scale_fill_viridis()
gg <- gg + ggthemes::theme_map()
gg <- gg + theme(legend.position="right")
gg
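As for why every entry appears five times: fortify() returns one row per polygon vertex, not one row per polygon, and a closed rectangle needs five points because the first vertex is repeated at the end (rows 1 and 5 of long/lat in the str() output are identical). The sketch below uses made-up coordinates and a hypothetical two-polygon table in base R, not the original shapefile, to show why a positional assignment recycles values across vertices while a keyed join does not.

```r
# Hypothetical vertex table shaped like fortify() output:
# two rectangles, five rows each (the ring is closed, so vertex 1 == vertex 5).
verts <- data.frame(
  id    = rep(c("0", "1"), each = 5),
  order = rep(1:5, times = 2),
  long  = c(0, 0, 1, 1, 0,   1, 1, 2, 2, 1),
  lat   = c(1, 0, 0, 1, 1,   1, 0, 0, 1, 1)
)

# One row per polygon, like the @data slot (values are illustrative).
attrs <- data.frame(id = c("0", "1"), perc_ch = c(17.4, 11.4))

# Positional assignment: the 2 values are recycled over the 10 vertex rows,
# so vertices of the same polygon end up with different values.
positional <- verts
positional$perc_ch <- attrs$perc_ch

# Keyed join: every vertex of a polygon receives that polygon's value.
joined <- merge(verts, attrs, by = "id")

table(joined$id)                           # vertex rows per polygon
tapply(joined$perc_ch, joined$id, unique)  # one value per polygon
```

The repeated rows are therefore the polygon outlines themselves, and any attribute added afterwards has to be matched on id rather than by row position.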

Related

How to plot a nested list of data frames by group?

I have a directory structure with rasters like this:
folder absorbance: farm1.tif, farm2.tif
folder resistance: farm1.tif, farm2.tif
The rasters are GeoTIFF files.
I have to make comparison graphs between the two farms (farm1 and farm2) for the same type of data (absorbance, resistance).
Farm1 and Farm2 are not stackable, so I don't use raster stacks; they can have a different number of cells.
So far I have created nested lists of data frames:
```
raster_dir <- c(list.dirs(path = cartella,recursive = F,full.names = F))
raster_files <- lapply(raster_dir,function(dir) {
raster_files <- as.list(list.files(path=paste(cartella,dir,sep='/'),
pattern = "\\.tif$",
full.names = TRUE,
recursive = F))
})
names(raster_files) <- raster_dir
rasters <- rapply(raster_files,rast,how = "list",deflt = NA_integer_)
rast_df <- rapply(rasters,terra::as.data.frame,how = "list",deflt = NA_integer_)
```
> str(rast_df)
List of 2
 $ absorbance:List of 2
  ..$ :'data.frame': 1920 obs. of 1 variable:
  .. ..$ ognibene: num [1:1920] 1 1 1 0.000392 0.000252 ...
  ..$ :'data.frame': 2401 obs. of 1 variable:
  .. ..$ toderici: num [1:2401] 0.000467 0.000565 1 1 1 ...
 $ resistance :List of 3
  ..$ :'data.frame': 1920 obs. of 1 variable:
  .. ..$ ognibene: num [1:1920] 1 1 1 0.404 0.211 ...
  ..$ :'data.frame': 2401 obs. of 1 variable:
  .. ..$ toderici: num [1:2401] 0.584 0.706 1 1 1 ...
Probably the problem is in renaming the nested list.
Why doesn't this work?
```
fun_violin_plot <- function(df) gg <- ggplot(df,aes(x='',y=df[,1])) +
geom_violin(na.rm = T,scale="count") +
labs(x=NULL,y = NULL)
rast_violin_plot <- rapply(rast_df,fun_violin_plot,how = "unlist",deflt = NA_integer_)
```
> Error in `fortify()`:
! `data` must be a <data.frame>, or an object coercible by `fortify()`, not a list.
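One thing worth checking, offered as a hedged sketch rather than a confirmed diagnosis of the error above: rapply() treats a data frame as a list and descends into its columns, so a function passed to it is never handed a whole data frame. The example uses small made-up data frames in place of the raster-derived ones. Nested lapply() calls keep each data frame intact, which is what a plotting function expecting a data frame would need.

```r
# Made-up stand-in for rast_df: a list of groups, each holding data frames.
nested <- list(
  absorbance = list(farm1 = data.frame(v = c(1, 0.5, 0.2)),
                    farm2 = data.frame(v = c(0.4, 1))),
  resistance = list(farm1 = data.frame(v = c(0.9, 0.3, 0.1)),
                    farm2 = data.frame(v = c(0.6, 1)))
)

# rapply() recurses into the data frames and calls the function on each column.
seen_by_rapply <- unlist(rapply(nested, function(x) class(x)[1], how = "list"))

# Nested lapply() stops at the data frame level.
seen_by_lapply <- lapply(nested, function(group) {
  lapply(group, function(df) class(df)[1])
})

unique(seen_by_rapply)            # class of what rapply() passed to the function
seen_by_lapply$absorbance$farm1   # class of what lapply() passed to the function
```

In place of class(df)[1], the inner function could build a plot from df, and names(group) would supply a label for each farm.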

How to retrieve name of element in list (data frame) to use it as a title of the plot?

Briefly: is it possible to retrieve only the name of a list element and use it as the main title of a plot?
Let me explain - example:
Let's create a random df:
a <- c(1,2,3,4)
b <- runif(4)
c <- runif(4)
d <- runif(4)
e <- runif(4)
f <- runif(4)
df <- data.frame(a,b,c,d,e,f)
head(df)
a b c d e f
1 1 0.9694204 0.9869154 0.5386678 0.39331278 0.15054698
2 2 0.8949330 0.9910894 0.1009689 0.03632476 0.15523628
3 3 0.4930752 0.7179144 0.6957262 0.36579883 0.32006026
4 4 0.4850141 0.5539939 0.3196953 0.14348259 0.05292068
Then I want to create a list of data frames (based on the one above) with specific columns to make plots. In other words, the first column of df (a) will be the x axis, and columns b, c, d, e and f will supply the y values. Yes, there will be 5 plots; that's the point!
So my idea was to write a simple function that creates a list of data frames based on the one above:
my_fun <- function(x){
  a <- df[1]
  b <- x
  aname <- "x_label"
  bname <- "y_label"
  df <- data.frame(a,b)
  names(df) <- c(aname,bname)
  return(df)
}
Run it for all (specified) columns:
df_s <- apply(df[,2:6], 2, function(x) my_fun(x))
So I have now:
class(df_s)
[1] "list"
str(df_s)
List of 5
$ b:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.969 0.895 0.493 0.485
$ c:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.987 0.991 0.718 0.554
$ d:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.539 0.101 0.696 0.32
$ e:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.3933 0.0363 0.3658 0.1435
$ f:'data.frame': 4 obs. of 2 variables:
..$ x_label: num [1:4] 1 2 3 4
..$ y_label: num [1:4] 0.1505 0.1552 0.3201 0.0529
That is what I wanted, but here's the question: I'd like to create a plot for every data frame in my list. As a result I want 5 plots with main titles b, c, d, e, f respectively. The axis labels are the same for every plot; the titles are not. So I tried:
lapply(df_s, function(x) plot(x[2] ~ x[1], data = x, main = ???))
What should go in place of the question marks? I tried main = names(df_s)[x], but it didn't work.
I think the following works. However, I think it might be best to use ggplot2 instead of the plot function (unless you are saving the plots inside lapply).
lapply(seq_along(df_s), function(x)
  plot(df_s[[x]][,2] ~ df_s[[x]][,1],
       xlab = names(df_s[[x]])[1],
       ylab = names(df_s[[x]])[2],
       main = names(df_s[x])))
With ggplot2
plot_lst <- lapply(seq_along(df_s), function(i) {
  ggplot(df_s[[i]], aes(x=x_label, y=y_label)) +
    geom_point() +
    theme(plot.title = element_text(hjust = 0.5)) +
    ggtitle(names(df_s)[i]) })
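The reason main = names(df_s)[x] fails inside lapply(df_s, ...) is that x is the data frame itself, not its position or name. A minimal base R sketch of one alternative, assuming a small made-up df_s, is to iterate over the data frames and their names in parallel with Map(). The pdf(NULL) call is only there so the example runs without opening a graphics window.

```r
# Made-up list shaped like df_s from the question.
df_s <- list(
  b = data.frame(x_label = 1:4, y_label = c(0.97, 0.89, 0.49, 0.49)),
  c = data.frame(x_label = 1:4, y_label = c(0.99, 0.99, 0.72, 0.55))
)

pdf(NULL)  # draw to a null device; remove this line to see the plots
titles <- Map(function(d, nm) {
  plot(d$x_label, d$y_label,
       xlab = names(d)[1], ylab = names(d)[2],
       main = nm)
  nm  # return the title that was used
}, df_s, names(df_s))
dev.off()

unlist(titles)
```

Map() passes one data frame and its matching name on each call, so no indexing into df_s is needed inside the function.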

Heatmap with leaflet

I got a GeoJSON file from the Austin government with polygons of districts and their names.
gjson_austin <- geojson_read('Neighborhood Planning Areas.geojson',
                             method = "local", what = "sp")
gjson_austin@proj4string <- CRS("+proj=longlat +datum=WGS84")
gjson_austin <- as(gjson_austin, 'SpatialPolygons')
place_name_austin = fread("ci6h-vmgm.csv")
After that, I build a data frame:
austin_frame <- gjson_austin %>% fortify()
austin_frame$id <- as.numeric(austin_frame$id)
objects_austin <- left_join(austin_frame, place_name_austin %>% select(FID, PLANNING_A), by = c("id"="FID"))
Now I have a data frame:
'data.frame': 10268 obs. of 8 variables:
$ long : num -97.7 -97.7 -97.7 -97.7 -97.7 ...
$ lat : num 30.3 30.3 30.3 30.3 30.3 ...
$ order : int 1 2 3 4 5 6 7 8 9 10 ...
$ hole : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ piece : Factor w/ 16 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
$ id : num 0 0 0 0 0 0 0 0 0 0 ...
$ group : Factor w/ 126 levels "0.1","1.1","2.1",..: 1 1 1 1 1 1 1 1 1 1 ...
$ PLANNING_A: chr "CHESTNUT" "CHESTNUT" "CHESTNUT" "CHESTNUT" ...
The polygons are grouped by id, and every polygon has a name in the PLANNING_A column.
I want to draw these polygons with leaflet:
leaflet() %>% addTiles() %>%
  addPolygons(data = objects_austin, lng = ~long, lat = ~lat)
But the result is not what I need.
I can draw the polygons if I use
leaflet() %>% addTiles() %>%
  addPolygons(data = gjson_austin)
but I need the values from the JSON for my calculations.
Afterwards I calculate some values grouped by polygon, and I want to draw the polygons colored by these values, like a heatmap.
Please help me with this problem.

Shading a specific area using a density plot - ggplot2

I have a data visualization question regarding ggplot2.
I'm trying to figure out how I can shade a specific area in my density plot. I googled it a lot and tried all the solutions I found.
My code is:
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
ggplot(data=original_12, aes(x=sum)) + geom_density() +
  facet_wrap(~sex) +
  geom_vline(data=original_12, aes(xintercept=cutoff_12),
             linetype="dashed", color="red", size=1)
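So, from the plot this produces, I want a version with a specific area shaded (the original post showed both as images, which are not included here).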
The question on ggplot2 shade area under density curve by group is different than mine because they use different groups and graphs.
Similar to this SO question except the facet adds an additional complexity.
You need to rename the PANEL data as "sex" and factor it correctly to match your already existing aesthetic option. Your original "sex" factor is ordered alphabetically (the default data.frame behavior), which is a little confusing at first.
Assign your plot to a variable, here p, so the ggplot object can be reused:
p <- ggplot(data=original_12, aes(x=sum)) +
  geom_density() +
  facet_wrap(~sex) +
  geom_vline(data=original_12, aes(xintercept=cutoff_12),
             linetype="dashed", color="red", size=1)
The ggplot object data can be extracted...here is the structure of the data:
str(ggplot_build(p)$data[[1]])
'data.frame': 1024 obs. of 16 variables:
$ y : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ x : num 17 17 17.1 17.1 17.2 ...
$ density : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ scaled : num 0.0121 0.0128 0.0137 0.0145 0.0154 ...
$ count : num 0.0568 0.0604 0.0644 0.0684 0.0727 ...
$ n : int 50 50 50 50 50 50 50 50 50 50 ...
$ PANEL : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
$ group : int -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
$ ymin : num 0 0 0 0 0 0 0 0 0 0 ...
$ ymax : num 0.00114 0.00121 0.00129 0.00137 0.00145 ...
$ fill : logi NA NA NA NA NA NA ...
$ weight : num 1 1 1 1 1 1 1 1 1 1 ...
$ colour : chr "black" "black" "black" "black" ...
$ alpha : logi NA NA NA NA NA NA ...
$ size : num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
$ linetype: num 1 1 1 1 1 1 1 1 1 1 ...
It cannot be used directly because you need to rename the PANEL data and factor it to match your original dataset. You can extract the data from the ggplot object like this:
# build the plot once and reuse the computed density layer
built <- ggplot_build(p)$data[[1]]
to_fill <- data.frame(
  x = built$x,
  y = built$y,
  sex = factor(built$PANEL, levels = c(1,2), labels = c("F","M")))
p + geom_area(data = to_fill[to_fill$x >= cutoff_12, ],
              aes(x=x, y=y), fill = "red")
#DATA
set.seed(2)
original_12 <- data.frame(sum=rnorm(100,30,5), sex=c("M","F"))
cutoff_12 <- 35
#Calculate density for each sex
temp = do.call(rbind, lapply(split(original_12, original_12$sex), function(a){
  d = density(a$sum)
  data.frame(sex = a$sex[1], x = d$x, y = d$y)
}))
#For each sex, separate the data for the shaded area
temp2 = do.call(rbind, lapply(split(temp, temp$sex), function(a){
  rbind(data.frame(sex = a$sex[1], x = cutoff_12, y = 0), a[a$x > cutoff_12,])
}))
#Plot
ggplot(temp) +
  geom_line(aes(x = x, y = y)) +
  geom_vline(xintercept = cutoff_12) +
  geom_polygon(data = temp2, aes(x = x, y = y)) +
  facet_wrap(~sex) +
  theme_classic()
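A small refinement of the approach above, sketched in base R on the same simulated data: the shaded polygon can start exactly on the curve at the cutoff by interpolating the density there with approx(). The helper name shade_one is hypothetical, and the sketch assumes the cutoff lies inside the range of each group's density estimate.

```r
set.seed(2)
original_12 <- data.frame(sum = rnorm(100, 30, 5), sex = c("M", "F"))
cutoff_12 <- 35

# Build the outline of the shaded region for one group (hypothetical helper).
shade_one <- function(a, cutoff) {
  d <- density(a$sum)
  # Height of the density curve at the cutoff, by linear interpolation.
  y_cut <- approx(d$x, d$y, xout = cutoff)$y
  keep <- d$x > cutoff
  data.frame(
    sex = a$sex[1],
    # Start on the baseline, rise to the curve, follow it, return to baseline.
    x = c(cutoff, cutoff, d$x[keep], max(d$x)),
    y = c(0, y_cut, d$y[keep], 0)
  )
}

shade <- do.call(rbind, lapply(split(original_12, original_12$sex),
                               shade_one, cutoff = cutoff_12))
head(shade)
```

The resulting shade data frame can be passed to geom_polygon(data = shade, aes(x = x, y = y)) together with facet_wrap(~sex), in the same way as temp2 above.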

ggplot2 facet_wrap geom_text not accepting date values

I have a small data set, local (5 observations), with two types of principal: P and V.
Each observation has a Date field (p.start), a ratio, and a duration.
local
principal p.start duration allocated.days ratio
1 P 2015-03-18 1 162.0000 162.0000
2 V 2015-08-28 4 24.0000 6.0000
3 V 2015-09-03 1 89.0000 89.0000
4 V 2015-03-30 1 32.0000 32.0000
5 P 2015-01-29 1 150.1667 150.1667
str(local)
'data.frame': 5 obs. of 5 variables:
$ principal : chr "P" "V" "V" "V" ...
$ p.start : Date, format: "2015-03-18" "2015-08-28" "2015-09-03" "2015-03-30" ...
$ duration : Factor w/ 10 levels "1","2","3","4",..: 1 4 1 1 1
$ allocated.days: num 162 24 89 32 150
$ ratio : num 162 6 89 32 150
I have another data frame, stats, with text to be added to a faceted plot.
stats
principal xx yy zz
1 P 2015-02-28 145.8 Average = 156
2 V 2015-02-28 145.8 Average = 24
str(stats)
'data.frame': 2 obs. of 4 variables:
$ principal: chr "P" "V"
$ xx : Date, format: "2015-02-28" "2015-02-28"
$ yy : num 146 146
$ zz : chr "Average = 156" "Average = 24"
The following code fails:
p = ggplot (local, aes (x = p.start, y = ratio, size = duration))
p = p + geom_point (colour = "blue"); p
p = p + facet_wrap (~ principal, nrow = 2); p
p = p + geom_text(aes(x=xx, y=yy, label=zz), data= stats)
p
Error: Continuous value supplied to discrete scale
Any ideas? I'm missing something obvious.
The problem is that you are plotting from 2 data.frames, but your initial ggplot call includes aes parameters referring to just the local data.frame.
So although your geom_text specifies data=stats, it is still looking for size=duration.
The following works for me:
ggplot(local) +
  geom_point(aes(x=p.start, y=ratio, size=duration), colour="blue") +
  facet_wrap(~ principal, nrow=2) +
  geom_text(data=stats, aes(x=xx, y=yy, label=zz))
Just remove size = duration from ggplot (local, aes (x = p.start, y = ratio, size = duration)) and add it into geom_point (colour = "blue"). Then, it should work.
ggplot(local, aes(x=p.start, y=ratio)) +
  geom_point(colour="blue", aes(size=duration)) +
  facet_wrap(~principal, nrow=2) +
  geom_text(aes(x=xx, y=yy, label=zz), data=stats)
