Creating a heatmap based on values in R - r

I am trying to generate a heatmap based on values.
Here is my dataset which consists of three variables: Lat (latitude), Lon (longitude), and Value.
https://www.dropbox.com/s/s53xeplywz9jh15/sample_data.csv?dl=0
I have looked through the relevant posts and found this useful:
Generating spatial heat map via ggmap in R based on a value
I copied the code from that post, and my code looks like this:
# import data and libraries
library(ggplot2)
library(ggmap)
Yunan<-read.csv("C:\\Program Files\\RStudio\\data\\pb_sp\\sample_data.csv", header = TRUE)
# call the map to see point distribution
Yunan_map<-get_map(location="yunan",zoom=6,maptype="terrain",scale=2)
ggmap(Yunan_map)+geom_point(data=Yunan,aes(x=Yunan$Lon,y=Yunan$Lat,fill="red",alpha=0.3,size=0.05,shape=21))+scale_shape_identity()
# 1. generate bins for x, y coordinates (unit=decimal degree)
xbreaks <- seq(floor(min(Yunan$Lat,na.rm=TRUE)), ceiling(max(Yunan$Lat,na.rm=TRUE)), by = 0.5)
ybreaks <- seq(floor(min(Yunan$Lon,na.rm=TRUE)), ceiling(max(Yunan$Lon,na.rm=TRUE)), by = 0.5)
# 2. allocate the data points into the bins
Yunan$latbin <- xbreaks[cut(Yunan$Lat, breaks = xbreaks, labels=F)]
Yunan$longbin <- ybreaks[cut(Yunan$Lon, breaks = ybreaks, labels=F)]
# 3. summarise the data for each bin (use the median)
datamat <- Yunan[, list(Value= median(Value)),
by = c("latbin", "longbin" )]
# 4. Merge the summarised data with all possible x, y coordinate combinations to get
# a value for every bin
datamat <- merge(setDT(expand.grid(latbin = xbreaks, longbin = ybreaks)), datamat,
by = c("latbin", "longbin"), all.x = TRUE, all.y = FALSE)
# 5. Fill the empty bins with 0 to smooth the contour plot
datamat[is.na(Value), ]$Value <- 0
# 6. Plot the contours
ggmap(Yunan_map,extent ="device") +
stat_contour(data = datamat, aes(x = longbin, y = latbin, z = Value,
fill = ..level.., alpha = ..level..), geom = 'polygon', binwidth = 30) +
scale_fill_gradient(name = "Value", low = "green", high = "red") +
guides(alpha = FALSE)
However, I encountered two problems:
1. After executing step 3 (summarise the data for each bin), I got this error message:
Error in `[.data.frame`(Yunan, , list(Value = median(Value)), by = c("latbin", :
unused argument (by = c("latbin", "longbin"))
2. I wish to change the colour scheme from a gradient to discrete colours, something like this map:
Since the values in my dataset range from 17 to 21, I want to classify them into different bins such as 17-17.5, 17.5-18, 18-18.5, ... with corresponding colours.
Any suggestions on how I can fix these problems? Thanks in advance.
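A possible way to approach both problems, sketched on simulated data (the coordinates and values below are made up, and Value_class is a hypothetical column name). The error in step 3 most likely arises because read.csv returns a plain data.frame, whose `[` method has no by argument; the linked answer converts its data with setDT() from data.table before summarising. For discrete colours, cut() can classify the binned values into 0.5-wide classes.

```r
library(data.table)

# Simulated stand-in for the real dataset (all numbers are illustrative only)
set.seed(1)
Yunan <- data.frame(
  Lat   = runif(50, 22, 28),
  Lon   = runif(50, 98, 105),
  Value = runif(50, 17, 21)
)

# Problem 1: convert the data.frame to a data.table so that `by =` is understood
setDT(Yunan)

# Bin the coordinates as in steps 1 and 2 of the question
xbreaks <- seq(floor(min(Yunan$Lat)), ceiling(max(Yunan$Lat)), by = 0.5)
ybreaks <- seq(floor(min(Yunan$Lon)), ceiling(max(Yunan$Lon)), by = 0.5)
Yunan[, latbin  := xbreaks[cut(Lat, breaks = xbreaks, labels = FALSE)]]
Yunan[, longbin := ybreaks[cut(Lon, breaks = ybreaks, labels = FALSE)]]

# Step 3 now runs without the "unused argument" error
datamat <- Yunan[, list(Value = median(Value)), by = c("latbin", "longbin")]

# Problem 2: classify the medians into 0.5-wide classes (17-17.5, 17.5-18, ...)
datamat[, Value_class := cut(Value, breaks = seq(17, 21, by = 0.5),
                             include.lowest = TRUE)]

table(datamat$Value_class)
```

The resulting factor could then be mapped to fill in a layer such as geom_tile(), together with a discrete scale such as scale_fill_brewer(). That plotting step is not shown or tested here.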

Related

Specify number bins when passing sf object to ggmap R

I have a dataframe object, created by reading in a shape file with sf::read_sf and merged with some pre-existing data with a common geography column:
boundaries <- sf::read_sf('./shapefile')
map <- merge(boundaries, data, by.x = "InterZone",
by.y = "IntermediateZone2011Code", all.x = FALSE, duplicateGeoms = TRUE)
This is then overlaid using ggmap on top of a provider tile obtained with ggmap's get_map function:
myMap <- get_map(location = c(lon = -2.27, lat = 57.1), zoom = 6,
maptype="toner", crop=FALSE, source = 'stamen')
ggmap(myMap) +
geom_sf(data = map, aes(fill=as.factor(column1)), inherit.aes = FALSE) +
scale_fill_brewer(palette = "OrRd") +
coord_sf(crs = st_crs(4326)) +
labs(x = 'Longitude', y = 'Latitude', fill = 'column1') +
ggtitle('column1')
The issue is that this automatically creates hundreds of bins.
I have been looking through the documentation but cannot find an argument to specify the number of bins. How can I break the column down into a fixed number of bins and then map it?
Without a reproducible example it is hard to say exactly what is going on, but it looks like you might be converting a continuous variable into a factor with fill=as.factor(column1), which creates one level per distinct value.
One option is to remove as.factor and use scale_fill_continuous or some other continuous color scale of your choice.
Another option is to look into cut, which bins continuous data either into a given number of bins or between specific start and end points that you supply.
# Make n bins
map$data_bin <- cut(map$column, breaks = n )
# Or make specific start and end points for bins
map$data_bin <- cut(map$column, breaks = c(-Inf,50,100,Inf) )
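As a small illustration of the two cut() variants above, here is a sketch on a made-up numeric vector (column1 below is simulated and only stands in for the real map column):

```r
# Hypothetical continuous values standing in for map$column1
set.seed(42)
column1 <- runif(200, min = 0, max = 150)

# Five equal-width bins: cut() returns a factor with one level per bin
bin_equal <- cut(column1, breaks = 5)

# Explicit break points: -Inf and Inf catch everything below 50 and above 100
bin_fixed <- cut(column1, breaks = c(-Inf, 50, 100, Inf))

# Number of observations falling into each bin
table(bin_equal)
table(bin_fixed)
```

Either factor has only a handful of levels, so mapping it to fill should give a correspondingly small legend.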

Using ggmap: what does the error "Computation failed in `stat_density2d()`: missing value where TRUE/FALSE needed" mean?

I am using R as a GIS tool for creating maps.
I want to create a contour map or heatmap of a species distribution over a geographical area.
I want to see on the map where each species (animal or plant) is present and colour that area in a specific colour.
I am using a dataset downloaded from GBIF.
You can download the dataset from my GitHub (https://github.com/RosarioIacono/stackoverflow_question/blob/master/species2t.csv).
species <- read.delim("./species.txt")
library(readr)
species2t <- read_csv("species2t.csv")
View(species2t)
ggmap(map1)+
stat_density_2d(data = subset(species2t, order=="Anseriformes"),
aes( x = decimalLongitude,
y = decimalLatitude,
fill = ..level..),
alpha = .3,
geom = "polygon",
color = species)+
theme(legend.position = "none")
But I get an error:
Error: Aesthetics must be either length 1 or the same as the data (190): colour
I don't have your data frame, but I think your problem comes from one of the groups having n=1. This can happen when some of the longitudes and latitudes in your data fall outside the map:
library("ggmap")
map <- c(left = -0.7, bottom = 53, right =-0.3 , top =53.6 )
map1<-get_stamenmap(map, maptype = "toner-lite")
#simulate data
species_dens = data.frame(species=c("A","B","A","B"),
decimalLongitude=c(-0.4,-0.5,-0.3,-0.2),
decimalLatitude=c(53.1,53.2,53.3,53.4))
# returns your error
ggmap(map1)+
geom_density_2d(data = species_dens,aes( x = decimalLongitude,
y = decimalLatitude,
colour = species))
From the above, you can see that the last data point is off the map, so if you use geom_density_2d within the map limits, species "B" will have n=1. Using your current dataset, if you set the colours by species, you still end up with n=1:
library(readr)
species2t <- read_csv("species2t.csv")
X=subset(species2t, order=="Anseriformes")
table(X$species)
       Anas crecca Anas platyrhynchos      Anas strepera        Anser anser
                 1                  1                  1                  1
   Aythya fuligula        Cygnus olor    Tadorna tadorna
                 1                  1                  1
This means you cannot colour according to species. But you can see how this order is distributed:
ggmap(map1)+
stat_density_2d(data = X,
aes( x = decimalLongitude,
y = decimalLatitude,
fill = ..level..),
alpha = .3,
geom = "polygon")+
theme(legend.position = "none")
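To check for this situation before plotting, one could count the points per group that fall inside the map extent. The sketch below reuses the simulated species_dens data and the bounding box from this answer; inside, counts and too_small are illustrative names, and the idea that a density layer needs at least two points per group follows the reasoning above rather than any documented threshold.

```r
# Bounding box of the map, as used in the answer above
bbox <- c(left = -0.7, bottom = 53, right = -0.3, top = 53.6)

# Simulated occurrences from the answer above
species_dens <- data.frame(
  species          = c("A", "B", "A", "B"),
  decimalLongitude = c(-0.4, -0.5, -0.3, -0.2),
  decimalLatitude  = c(53.1, 53.2, 53.3, 53.4)
)

# TRUE for points that lie within the map extent
inside <- with(species_dens,
  decimalLongitude >= bbox[["left"]]   & decimalLongitude <= bbox[["right"]] &
  decimalLatitude  >= bbox[["bottom"]] & decimalLatitude  <= bbox[["top"]])

# Points per species that remain on the map
counts <- table(species_dens$species[inside])
counts

# Groups with fewer than two remaining points are the likely troublemakers
too_small <- names(counts)[counts < 2]
too_small
```

Running the same count on the real data, grouped by whatever variable is mapped to colour, should show which groups to drop or merge.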

Adding layer of interpolated values to ggplot chart in R

I have created the following dataframe in R to generate a plot using ggplot
library(data.table)
library(ggplot2)
library(plotly)
df <- data.frame("X_Frequency" = c(5, 10, 55, 180, 300, 360, 1000, 2000)
, "X_Axis" = c(0.009185742, 0.207822221, 0.067542222, 0.002597778,
0.002597778, 0.001454756, 0.001454756 , 0.001454756))
Next I have generated a plot using ggplot
B <- ggplot(data = df,
mapping = aes(x = X_Frequency, y = X_Axis)) +
geom_line() + labs(x = "Frequency(Hz)", y="Axis")
B <- ggplotly(B, dynamicTicks = TRUE)###Hovering enabled
B <- layout(B, yaxis = list(type = "log"))##X Y log scales enabled
B <- layout(B, xaxis = list(type = "log"))
B
I have created the following dataframe df241 with interpolated values between the observations in df. First we create the slopes:
df$X_Slope2 <- 0### Initiate slope column
for(i in 2:nrow(df)){
df$X_Slope2[i] = (df$X_Axis[i] - df$X_Axis[i-1]) /
(df$X_Frequency[i] - df$X_Frequency[i - 1])
}
Next we assign the respective slopes to all values
df_new <- bind_cols(df %>%
select(X_Frequency, X_Axis, X_Slope2) %>%
complete(., expand(., X_Frequency = 5:2000))
Now we calculate the interpolated values of X-Frequency, X_Axis from the df_new using slopes
for(i in 1: nrow(df241)){
if(is.na(df241$X_Axis[i]) == T){
df241$X_Axis[i] = df241$X_Slope2[i] *
(df241$X_Frequency[i] - df241$X_Frequency[i-1]) +
df241$X_Axis[i-1] } else {
df241$X_Axis[i] = df241$X_Axis[i]}}
I want to place these interpolated values from df241 on the original chart B generated above. How can this be accomplished? Any help would be appreciated.
Note: I have tried generating a new plot based on the df_new dataframe, but the chart looks very different from the original chart B.
It might be simpler to use the approx function for your interpolation. I believe this gets a similar result as your interpolation steps.
library(dplyr)  # provides %>%, as_tibble() and rename()
df_interp <- approx(df$X_Frequency, df$X_Axis, xout = 5:2000) %>%
as_tibble() %>%
rename(X_Frequency = x, X_Axis = y)
A linear interpolation may look unexpected on a log-log scale. I was unable to run your code as provided (is df241 created somewhere?), so I'm not sure if this is what you encountered when you said the chart with the interpolated values appears very different.
B <- ggplot(data = df,
mapping = aes(x = X_Frequency, y = X_Axis)) +
geom_line() +
geom_point(data = df_interp, size = 0.1, color = "blue") +
labs(x = "Frequency(Hz)", y="Axis")
B <- ggplotly(B, dynamicTicks = TRUE)###Hovering enabled
B <- layout(B, yaxis = list(type = "log"))##X Y log scales enabled
B <- layout(B, xaxis = list(type = "log"))
B
Edit: interpolation on log scale
Alternatively, you could interpolate using log-transformed inputs, and then use exp to convert back onto the original scale:
df_interp <- approx(log(df$X_Frequency), log(df$X_Axis), xout = log(5:2000)) %>%
as_tibble() %>%
mutate(X_Frequency = exp(x),
X_Axis = exp(y))
Which would result in this:
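To see numerically how the two interpolations differ, here is a base R sketch using the data from the question (freq, axis, lin and loglog are illustrative names). Both variants reproduce the observed values at the measured frequencies; they differ only in between, where the log-log version follows a straight line on log-log axes.

```r
# Observations from the question
freq <- c(5, 10, 55, 180, 300, 360, 1000, 2000)
axis <- c(0.009185742, 0.207822221, 0.067542222, 0.002597778,
          0.002597778, 0.001454756, 0.001454756, 0.001454756)

xout <- 5:2000

# Linear interpolation on the original scale
lin <- approx(freq, axis, xout = xout)$y

# Linear interpolation on the log-log scale, transformed back with exp()
loglog <- exp(approx(log(freq), log(axis), xout = log(xout))$y)

# Both agree with the data at the observed frequencies
idx <- match(freq, xout)
all.equal(lin[idx], axis)
all.equal(loglog[idx], axis)

# Between observations they differ, for example at 7 Hz
c(linear = lin[xout == 7], loglog = loglog[xout == 7])
```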

Generating spatial heat map via ggmap in R based on a value

I'd like to generate a choropleth map using the following data points:
Longitude
Latitude
Price
Here is the dataset - https://www.dropbox.com/s/0s05cl34bko7ggm/sample_data.csv?dl=0.
I would like the map to show the areas where the price is higher and where the price is lower. It should most probably look like this (sample image):
Here is my code:
library(ggmap)
map <- get_map(location = "austin", zoom = 9)
data <- read.csv(file.choose(), stringsAsFactors = FALSE)
data$average_rate_per_night <- as.numeric(gsub("[\\$,]", "",
data$average_rate_per_night))
ggmap(map, extent = "device") +
stat_contour( data = data, geom="polygon",
aes( x = longitude, y = latitude, z = average_rate_per_night,
fill = ..level.. ) ) +
scale_fill_continuous( name = "Price", low = "yellow", high = "red" )
I'm getting the following error message:
2: Computation failed in `stat_contour()`:
Contour requires single `z` at each combination of `x` and `y`.
I'd really appreciate any help on how this can be fixed or any other method to generate this type of heatmap. Please note that I'm interested in the weight of the price, not density of the records.
If you insist on using the contour approach, then you need to provide a value for every possible x,y coordinate combination in your data. To achieve this, I would highly recommend gridding the space and generating summary statistics per bin.
I attach a working example below based on the data you provided:
library(ggmap)
library(data.table)
map <- get_map(location = "austin", zoom = 12)
data <- setDT(read.csv(file.choose(), stringsAsFactors = FALSE))
# convert the rate from string into numbers
data[, average_rate_per_night := as.numeric(gsub(",", "",
substr(average_rate_per_night, 2, nchar(average_rate_per_night))))]
# generate bins for the x, y coordinates
xbreaks <- seq(floor(min(data$latitude)), ceiling(max(data$latitude)), by = 0.01)
ybreaks <- seq(floor(min(data$longitude)), ceiling(max(data$longitude)), by = 0.01)
# allocate the data points into the bins
data$latbin <- xbreaks[cut(data$latitude, breaks = xbreaks, labels=F)]
data$longbin <- ybreaks[cut(data$longitude, breaks = ybreaks, labels=F)]
# Summarise the data for each bin
datamat <- data[, list(average_rate_per_night = mean(average_rate_per_night)),
by = c("latbin", "longbin")]
# Merge the summarised data with all possible x, y coordinate combinations to get
# a value for every bin
datamat <- merge(setDT(expand.grid(latbin = xbreaks, longbin = ybreaks)), datamat,
by = c("latbin", "longbin"), all.x = TRUE, all.y = FALSE)
# Fill the empty bins with 0 to smooth the contour plot
datamat[is.na(average_rate_per_night), ]$average_rate_per_night <- 0
# Plot the contours
ggmap(map, extent = "device") +
stat_contour(data = datamat, aes(x = longbin, y = latbin, z = average_rate_per_night,
fill = ..level.., alpha = ..level..), geom = 'polygon', binwidth = 100) +
scale_fill_gradient(name = "Price", low = "green", high = "red") +
guides(alpha = FALSE)
You can then play around with the bin size and the contour binwidth to get the desired result. You could additionally apply a smoothing function on the grid to get an even smoother contour plot.
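The gridding idea can also be checked without a map or data.table. The following base R sketch uses simulated points (pts, price and the 0.1 degree step are assumptions for illustration, and aggregate() is swapped in for the data.table summary) and confirms that, after the merge, every grid cell has exactly one value, which is what stat_contour requires.

```r
# Simulated listings (coordinates and prices are made up for illustration)
set.seed(7)
pts <- data.frame(
  latitude  = runif(30, 30.1, 30.5),
  longitude = runif(30, -97.9, -97.6),
  price     = runif(30, 50, 400)
)

# Bin edges for both coordinates
step <- 0.1
xbreaks <- seq(floor(min(pts$latitude)),  ceiling(max(pts$latitude)),  by = step)
ybreaks <- seq(floor(min(pts$longitude)), ceiling(max(pts$longitude)), by = step)

# Assign each point to the lower edge of its bin
pts$latbin  <- xbreaks[cut(pts$latitude,  breaks = xbreaks, labels = FALSE)]
pts$longbin <- ybreaks[cut(pts$longitude, breaks = ybreaks, labels = FALSE)]

# One summary value per occupied bin
binned <- aggregate(price ~ latbin + longbin, data = pts, FUN = mean)

# Left-join onto the full grid so that empty bins are present too
grid <- expand.grid(latbin = xbreaks, longbin = ybreaks)
full <- merge(grid, binned, by = c("latbin", "longbin"), all.x = TRUE)

# Empty bins get 0, as in the answer above
full$price[is.na(full$price)] <- 0

nrow(full) == length(xbreaks) * length(ybreaks)
```

Filling empty bins with 0 is a design choice carried over from the answer: it pulls contours towards zero around sparse areas, so a different fill value or a smoother may suit other data better.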
You could use the stat_summary_2d() or stat_summary_hex() function to achieve a similar result. These functions divide the data into bins (defined by x and y), and then the z values for each bin are summarised based on a given function. In the example below I have selected mean as an aggregation function and the map basically shows the average price in each bin.
Note: I needed to treat your average_rate_per_night variable appropriately in order to convert it into numbers (removed the $ sign and the comma).
library(ggmap)
library(data.table)
map <- get_map(location = "austin", zoom = 12)
data <- setDT(read.csv(file.choose(), stringsAsFactors = FALSE))
data[, average_rate_per_night := as.numeric(gsub(",", "",
substr(average_rate_per_night, 2, nchar(average_rate_per_night))))]
ggmap(map, extent = "device") +
stat_summary_2d(data = data, aes(x = longitude, y = latitude,
z = average_rate_per_night), fun = mean, alpha = 0.6, bins = 30) +
scale_fill_gradient(name = "Price", low = "green", high = "red")

Adding dummy values on axis in ggplot2 to add asymmetric distance between ticks

How can I add dummy values on the x-axis in ggplot2?
I have the values 0, 2, 4, 6, 12, 14, 18, 22, 26 in my data, and I have plotted them on the x-axis. Is there a way to add the remaining even numbers for which there is no data in the table? This would create the appropriate gaps on the x-axis.
Afterwards, the x-axis should show 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26.
I have already tried using rbind.fill to add dummy data, but when I convert the values to a factor, 8, 10, 12, etc. come last.
Thanks
Hope this makes sense:
library(ggplot2)
gvals <- factor(letters[1:3])
xvals <- factor(c(0,2,4,6,12,14,18,22,26), levels = seq(0, 26, by = 2))
yvals <- rnorm(10000, mean = 2)
df <- data.frame(x = sample(xvals, size = length(yvals), replace = TRUE),
y = yvals,
group = sample(gvals, size = length(yvals), replace = TRUE))
ggplot(df, aes(x = x, y = y)) + geom_boxplot(aes(fill = group)) +
scale_x_discrete(drop = FALSE)
The trick is to create the x variable as a factor with all the levels you need and to specify drop = FALSE in the scale.
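The effect of declaring all levels up front can be seen without plotting. This is a minimal sketch using the values from the question (observed is an illustrative name):

```r
# Values that actually occur in the data
observed <- c(0, 2, 4, 6, 12, 14, 18, 22, 26)

# Declare every even number from 0 to 26 as a level, in numeric order
x <- factor(observed, levels = seq(0, 26, by = 2))

# All 14 levels exist, in the intended order
levels(x)

# Levels without data simply have a count of zero
table(x)
```

Because the unused levels are part of the factor, scale_x_discrete(drop = FALSE) has something to keep, and the empty positions stay on the axis in order.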

Resources