Create a map of spatial clusters (LISA) in R

I would like to create a map showing local spatial clusters of a phenomenon, preferably using Local Moran (LISA).
In the reproducible example below, I calculate the local Moran's I using spdep, but I would like to know if there is a simple way to map the clusters, preferably using ggplot2. Any help?
library(UScensus2000tract)
library(ggplot2)
library(spdep)
# load data
data("oregon.tract")
# plot Census Tract map
plot(oregon.tract)
# create Queens contiguity matrix
spatmatrix <- poly2nb(oregon.tract)
#calculate the local moran of the distribution of black population
lmoran <- localmoran(oregon.tract@data$black, nb2listw(spatmatrix))
Now, to make this example more similar to my real dataset, I have some NA values in my shapefile, which represent holes in the polygons, so these areas shouldn't be used in the calculation.
oregon.tract@data$black[3:5] <- NA

Here is a strategy:
library(UScensus2000tract)
library(spdep)
library(ggplot2)
library(dplyr)
# load data
data("oregon.tract")
# plot Census Tract map
plot(oregon.tract)
# create Queens contiguity matrix
spatmatrix <- poly2nb(oregon.tract)
# create a neighbours list with spatial weights
listw <- nb2listw(spatmatrix)
# calculate the local moran of the distribution of white population
lmoran <- localmoran(oregon.tract$white, listw)
summary(lmoran)
# standardize the variable and save it to a new column
oregon.tract$s_white <- scale(oregon.tract$white) %>% as.vector()
# create a spatially lagged variable and save it to a new column
oregon.tract$lag_s_white <- lag.listw(listw, oregon.tract$s_white)
# summary of variables, to inform the analysis
summary(oregon.tract$s_white)
summary(oregon.tract$lag_s_white)
# moran scatterplot, in basic graphics (with identification of influential observations)
x <- oregon.tract$s_white
y <- oregon.tract$lag_s_white %>% as.vector()
xx <- data.frame(x, y)
moran.plot(x, listw)
# Moran scatterplot, in ggplot
# (without identification of influential observations - which is possible but requires more effort)
ggplot(xx, aes(x, y)) +
  geom_point() +
  geom_smooth(method = 'lm', se = FALSE) +
  geom_hline(yintercept = 0, linetype = 'dashed') +
  geom_vline(xintercept = 0, linetype = 'dashed')
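One possible way to also flag influential observations in the ggplot version (a sketch, not spdep's exact method): fit the simple regression underlying the Moran plot and use base R's influence.measures(), then map the flag to colour.
# flag influential observations for the ggplot Moran scatterplot (illustrative sketch)
fit <- lm(y ~ x, data = xx)
xx$influential <- apply(influence.measures(fit)$is.inf, 1, any)
ggplot(xx, aes(x, y, colour = influential)) +
  geom_point() +
  geom_smooth(aes(group = 1), method = 'lm', se = FALSE, colour = 'black') +
  geom_hline(yintercept = 0, linetype = 'dashed') +
  geom_vline(xintercept = 0, linetype = 'dashed')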
# create a new variable identifying the moran plot quadrant for each observation, dismissing the non-significant ones
oregon.tract$quad_sig <- NA
# high-high quadrant
oregon.tract[(oregon.tract$s_white >= 0 &
oregon.tract$lag_s_white >= 0) &
(lmoran[, 5] <= 0.05), "quad_sig"] <- "high-high"
# low-low quadrant
oregon.tract[(oregon.tract$s_white <= 0 &
oregon.tract$lag_s_white <= 0) &
(lmoran[, 5] <= 0.05), "quad_sig"] <- "low-low"
# high-low quadrant
oregon.tract[(oregon.tract$s_white >= 0 &
oregon.tract$lag_s_white <= 0) &
(lmoran[, 5] <= 0.05), "quad_sig"] <- "high-low"
# low-high quadrant
oregon.tract@data[(oregon.tract$s_white <= 0
& oregon.tract$lag_s_white >= 0) &
(lmoran[, 5] <= 0.05), "quad_sig"] <- "low-high"
# non-significant observations
oregon.tract@data[(lmoran[, 5] > 0.05), "quad_sig"] <- "not signif."
oregon.tract$quad_sig <- as.factor(oregon.tract$quad_sig)
oregon.tract@data$id <- rownames(oregon.tract@data)
# plotting the map
df <- fortify(oregon.tract, region="id")
df <- left_join(df, oregon.tract@data)
df %>%
ggplot(aes(long, lat, group = group, fill = quad_sig)) +
geom_polygon(color = "white", size = .05) + coord_equal() +
theme_void() + scale_fill_brewer(palette = "Set1")
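For reference, the quadrant assignment above can also be written more compactly with dplyr::case_when; this is a sketch of an equivalent alternative, assuming the same 0.05 cutoff on the local Moran p-values:
oregon.tract@data <- oregon.tract@data %>%
  mutate(quad_sig = factor(case_when(
    s_white >= 0 & lag_s_white >= 0 & lmoran[, 5] <= 0.05 ~ "high-high",
    s_white <= 0 & lag_s_white <= 0 & lmoran[, 5] <= 0.05 ~ "low-low",
    s_white >= 0 & lag_s_white <= 0 & lmoran[, 5] <= 0.05 ~ "high-low",
    s_white <= 0 & lag_s_white >= 0 & lmoran[, 5] <= 0.05 ~ "low-high",
    TRUE ~ "not signif."
  )))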
This answer was based on this page, suggested by Eli Knaap on Twitter, and also borrowed from the answer by @timelyportfolio to this question.
I used the variable white instead of black because black produced less clear-cut clusters.
Concerning NAs, localmoran() includes the argument na.action, about which the documentation says:
na.action is a function (default na.fail), can also be na.omit or na.exclude - in these cases the weights list will be subsetted to remove NAs in the data. It may be necessary to set zero.policy to TRUE because this subsetting may create no-neighbour observations. Note that only weights lists created without using the glist argument to nb2listw may be subsetted. If na.pass is used, zero is substituted for NA values in calculating the spatial lag.
I tried:
oregon.tract@data$white[3:5] <- NA
lmoran <- localmoran(oregon.tract@data$white, listw, zero.policy = TRUE,
na.action = na.exclude)
But I ran into problems in lag.listw and did not have time to look into it. Sorry.
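One workaround that sidesteps the lag.listw issue (a rough sketch, not tested on this dataset) is to drop the NA tracts, rebuild the neighbour list and weights on the remaining polygons, run localmoran() there, and then pad the results back into the full object:
complete <- !is.na(oregon.tract@data$white)
ore_sub <- oregon.tract[complete, ]                       # keep only tracts with data
listw_sub <- nb2listw(poly2nb(ore_sub), zero.policy = TRUE)
lmoran_sub <- localmoran(ore_sub@data$white, listw_sub, zero.policy = TRUE)
oregon.tract@data$lmoran_ii <- NA                         # pad results back to full length
oregon.tract@data$lmoran_ii[complete] <- lmoran_sub[, 1]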

This is obviously very late, but I came across the post while working on something similar. This uses the rgeoda package, which wasn't out when the question was posted, but it's developed by the GeoDa folks to port some of the functionality of that software to R. sf has also really taken off in the meantime, which makes manipulating spatial data very easy; the rgeoda functions generally expect sf objects.
Like another poster, I'm using the white population instead of Black because the clusters show up better. I converted the original data, with those few observations missing, to sf. rgeoda::local_moran doesn't work when there's missing data, but if you make a copy with the missing observations removed, you can run the analysis and join them back together by ID. Use a right join so you retain all the IDs from the original data, including missing values.
Because this mimics GeoDa, the same colors and labels are stored in the LISA object that local_moran returns. Extract those and use them as the color palette. Because the palette is named, and those names don't include "NA", you can add an NA value to the palette vector, or manually specify a color for NA values to make sure those shapes still get drawn. I made it green just so it would be visible (top left corner).
library(UScensus2000tract)
library(ggplot2)
library(dplyr)
library(sf)
library(rgeoda)
# load data
data("oregon.tract")
oregon.tract@data$white[3:5] <- NA
ore_sf <- st_as_sf(oregon.tract) %>%
tibble::rownames_to_column("id")
to_clust <- ore_sf %>%
filter(!is.na(white))
queen_wts <- queen_weights(to_clust)
moran <- local_moran(queen_wts, st_drop_geometry(to_clust["white"]))
moran_lbls <- lisa_labels(moran)
moran_colors <- setNames(lisa_colors(moran), moran_lbls)
ore_clustered <- to_clust %>%
st_drop_geometry() %>%
select(id) %>%
mutate(cluster_num = lisa_clusters(moran) + 1, # add 1 bc clusters are zero-indexed
cluster = factor(moran_lbls[cluster_num], levels = moran_lbls)) %>%
right_join(ore_sf, by = "id") %>%
st_as_sf()
ggplot(ore_clustered, aes(fill = cluster)) +
geom_sf(color = "white", size = 0) +
scale_fill_manual(values = moran_colors, na.value = "green") +
theme_dark()
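As a quick sanity check you can tabulate the resulting categories; and if you want to apply your own significance cutoff, rgeoda should also expose the pseudo p-values (hedged: lisa_pvalues() is assumed here by analogy with the lisa_clusters()/lisa_labels()/lisa_colors() accessors used above, so check the package documentation):
# how many tracts fall in each cluster category (including the NA tracts)
table(ore_clustered$cluster, useNA = "ifany")
# pseudo p-values per observation, if the accessor is available in your rgeoda version
# pvals <- lisa_pvalues(moran)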

I don't think this answer is worthy of a bounty, but perhaps it will get you closer to an answer. Since I don't know anything about localmoran, I just guessed at a fill.
library(UScensus2000tract)
library(ggplot2)
library(spdep)
# load data
data("oregon.tract")
# plot Census Tract map
plot(oregon.tract)
# create Queens contiguity matrix
spatmatrix <- poly2nb(oregon.tract)
#calculate the local moran of the distribution of black population
lmoran <- localmoran(oregon.tract@data$black, nb2listw(spatmatrix))
# get our id from the rownames in a data.frame
oregon.tract@data$id <- rownames(oregon.tract@data)
oregon.tract@data$lmoran_ii <- lmoran[,1]
oregon_df <- merge(
# convert to a data.frame
fortify(oregon.tract, region="id"),
oregon.tract@data,
by="id"
)
ggplot(data=oregon_df, aes(x=long,y=lat,group=group)) +
geom_polygon(fill=scales::col_numeric("Blues",domain=c(-1,5))(oregon_df$lmoran_ii)) +
geom_path(color="white")

Related

How to preserve scale_fill_color from first plot using shapefiles in ggplot2

I need help with scale_fill_manual using a shapefile in ggplot2.
I have tried many things and I am finally posting, so hopefully someone
will be able to give me a hint.
I am basically plotting a shapefile and using scale_fill_manual to visualize it
with custom colors. I then overlay some points on it, but when I try to include
my new points in the legend, my original colors are there but the values get all
messed up. The upper part plotting the shapefile works fine, but the bottom part
overlaying the new points is where I need help. I have some comments inline. See below:
The path to download the shapefile is:
https://login.filesanywhere.com/fs/v.aspx?v=8c6a66875b6574bbaa68
library(tidyverse)
library(rgdal)
library(maptools)
library(plyr)
library(sp)
library(geosphere)
library(data.table)
library(rgeos)
wolves.map <- readOGR(dsn=".", layer="PNW_wolf_habitat_grid")
message(proj4string(wolves.map)) # it is in Albers Equal Area projection.
#Select presence/abscense only (1 and 0)
wolfsub <- wolves.map[!wolves.map$WOLVES_99 %in% 2,]
wolfsub$MAJOR_LC <-as.numeric(as.character(wolfsub$MAJOR_LC))
# Add columns to the wolfsub dataset. 42 = Forest, 51 = Shrub, > 81 = Agriculture
wolfsub$Forest<-ifelse(wolfsub$MAJOR_LC==42,1,0)
wolfsub$Shrub<-ifelse(wolfsub$MAJOR_LC==51,1,0)
wolfsub$Agriculture <- ifelse(wolfsub$MAJOR_LC > 81,1,0)
# create the model
mod1<-glm(WOLVES_99 ~ RD_DENSITY + Forest + Shrub + Agriculture,family = binomial,data = wolfsub)
summary(mod1)
#fitted(mod1)
wolfsub$WOLVES_99pred <- fitted(mod1) # add the predicted values to wolfsub
# Convert the wolves.map shapefile to data.frame
wolves.mapDF <- as.data.frame(wolves.map)
#fortify wolves.map to be used with ggplot2
wolves.ds <- fortify(wolves.map,region="GRID2_ID")
# Rename 'GRID2_ID' to 'id' to be able to merge with the shapefile wolves.map
wolves.mapDF <- rename(wolves.mapDF,c(GRID2_ID="id"))
# merge the shapefile wolves.ds and wolves.mapDF dataframe to be able to use the wolves.mapDF variables with ggplot2
wolves.ggmap <- merge(wolves.ds, wolves.mapDF, by = "id", all = TRUE)
wolves.ggmap <- wolves.ggmap[order(wolves.ggmap$order), ]
wolves.ggmap$MAJOR_LC <-as.numeric(as.character(wolves.ggmap$MAJOR_LC))
### Now do the whole data set
# 42 = Forest, 51 = Shrub, > 81 = Agriculture
wolves.ggmap$Forest<-ifelse(wolves.ggmap$MAJOR_LC==42,1,0)
wolves.ggmap$Shrub<-ifelse(wolves.ggmap$MAJOR_LC==51,1,0)
wolves.ggmap$Agriculture<-ifelse(wolves.ggmap$MAJOR_LC>81,1,0)
# Predict probabilities for the whole dataset
wolves.ggmap$PredictedSuit <- predict(mod1,newdata=wolves.ggmap,type='response')
#Make PredictedSuit a factor
wolves.ggmap$DiscretePred <- cut(wolves.ggmap$PredictedSuit,breaks=c(0,0.29,0.40,0.45,0.6,0.69),dig.lab = 2,include.lowest=TRUE)
#plot and display a legend with the new cuts
Palette1 <- c('grey80','orange','yellow','green','green3','blue')
wolves.pred3 <- ggplot(wolves.ggmap,aes(long,lat,group=group)) + theme_bw() + theme_void() +
geom_polygon(aes(fill=DiscretePred), colour = alpha("white", 1/2), size = 0.2) + theme(legend.position = c(0.14, 0.16)) +
scale_fill_manual(values=Palette1) + guides(fill=guide_legend(ncol=2,"Predicted\n Suitability\n > 0.45"))
wolves.pred3
I get the following graph (good):
All of the above code works as expected. The problem that I am having is down below. The code below works well overlaying the points from a subset of the same shapefile above. However, I lose my scale_fill_manual colors when I try to add the new points to the legend.
#Extract wolves from 2001 first and overlay them on map
wolfsub_01 <- wolves.map[wolves.map$WOLVES_01 %in% 1,]
wolfsub_01$MAJOR_LC <-as.numeric(as.character(wolfsub_01$MAJOR_LC))
#Get centroids to overlay on existing plot
test <- gCentroid(wolfsub_01, byid = TRUE)
#Convert to dataframe to be used with ggplot2
wolf <- as.data.frame(wolfsub_01)
test <- as.data.frame(test)
wolves_test <- cbind(wolf,test)
#Overlay on existing plot
wolves.pred3 +
geom_point(data=wolves_test,aes(x,y,group=NULL,fill='2001 wolves'),color='blue')
If I try to include '2001 wolves' in my legend, my colors stay in the correct order. However, my legend values get all messed up. I tried to re-arrange them with a different palette, but it only makes things worse because the labels no longer line up with the corresponding colors. I also would like help with removing the dots from the legend. How can I get my colors back to my original Palette1 used above in the original plot? Probably a simple thing, but I have spent many hours trying and can't figure it out. Thanks in advance.
I get this plot. Notice the values are all over the place. I need the values to be in the same order as in the first plot.
EDIT: This is what my plots show behind the scenes. The first plot has the following color order:
> g <- ggplot_build(wolves.pred3)
> unique(g$data[[1]]["fill"])
fill
1 grey80
9 orange
115 yellow
241 green3
271 green
And my second plot has this color order which is different from the first one. I wonder how I can make the second match the first color order.
> g <- ggplot_build(a)
> unique(g$data[[1]]["fill"])
fill
1 green3
9 grey80
115 orange
241 green
271 yellow
Here is what I tried for you. I went through all of your code and had the impression that the data processing became more complicated than it needed to be. I used to use the sp approach and write code like yours, and I think that approach forced you to "twist" your data processing somewhere (e.g., the moment you used merge()). Here I rewrote your code in another way in order to deliver the expected outcome; I left explanations in the script below. The takeaway, it seems to me, is to avoid tricky data manipulation, and one way of doing that is to use the sf and tidyverse packages. I hope this will help you.
library(sf)
library(dplyr)
library(ggplot2)
library(rgeos)
# You can use the sf package to read a shapefile.
wolves.map <- st_read(dsn = ".", layer = "PNW_wolf_habitat_grid")
# Step 1
# Sub data
# Select presence/abscense only (1 and 0)
# You used base R to write your script. sp class objects do not work with
# tidyverse verbs, but sf objects do, which makes your life much easier.
wolfsub <- filter(wolves.map, WOLVES_99 != 2) %>%
mutate(Forest = if_else(MAJOR_LC == 42, 1, 0),
Shrub = if_else(MAJOR_LC == 51, 1, 0),
Agriculture = if_else(MAJOR_LC > 81, 1, 0))
# Create the model
mod1 <- glm(WOLVES_99 ~ RD_DENSITY + Forest + Shrub + Agriculture, family = binomial, data = wolfsub)
summary(mod1)
# Fitted(mod1)
wolfsub$WOLVES_99pred <- fitted(mod1) # add the predicted values to wolfsub
# Step 2: Whole data
# Here I can avoid creating a new data frame for ggplot2. I saw that you worked
# to arrange a new data frame with all numbers. But that is not necessary any more.
wolves.map %>%
mutate(Forest = if_else(MAJOR_LC == 42, 1, 0),
Shrub = if_else(MAJOR_LC == 51, 1, 0),
Agriculture = if_else(MAJOR_LC > 81, 1, 0)) -> wolves.map
wolves.map$PredictedSuit <- predict(mod1,newdata = wolves.map,type = 'response')
mutate(wolves.map,
DiscretePred = cut(PredictedSuit,
breaks = c(0,0.29,0.40,0.45,0.6,0.69),
dig.lab = 2,include.lowest = TRUE)) -> out
# Plot and display a legend with the new cuts
Palette1 <- c('grey80','orange','yellow','green','green3','blue')
ggplot() +
geom_sf(data = out, aes(fill = DiscretePred),
colour = alpha("white", 1/2), size = 0.2) +
scale_fill_manual(values = Palette1) +
theme_bw() +
theme_void() +
theme(legend.position = c(0.14, 0.16)) +
guides(fill = guide_legend(ncol = 2,"Predicted\n Suitability\n > 0.45")) -> g
# Step 3
# Extract wolves from 2001 first and overlay them on map
wolfsub_01 <- filter(wolves.map, WOLVES_01 == 1)
# Get centroids to overlay on existing plot. I used st_centroid() instead of gCentroid().
# Then, I added long and lat to the original data frame, `wolfsub_01`.
# I also added a new column for color.
test <- bind_cols(wolfsub_01,
as.data.frame(st_coordinates(st_centroid(wolfsub_01)))) %>%
mutate(color = "blue")
# Finally, I am adding a new layer to the previous graphic.
g +
geom_point(data = test, aes(x = X, y = Y, color = color)) +
scale_color_identity(labels = "2001 wolves", guide ="legend",
name = NULL) -> gg
I'm not cool enough to comment, but I'm not confident you need a fill aesthetic on the dots. Adding new values to the fill scale is likely shifting the colors. If deleting it doesn't work, try giving the dots fill=NA inside geom_point() but not in aes().
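A minimal sketch of that suggestion, based on the overlay call in the question (the points then no longer touch the polygons' fill scale, although they also won't get their own legend entry; the scale_color_identity() approach in the answer above handles the legend):
wolves.pred3 +
  geom_point(data = wolves_test, aes(x, y, group = NULL), color = 'blue')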

R PCA makes graph that is fishy, can't ID why

Link to data as txt file here
I'm having trouble with this PCA. PC1 results appear binary, and I can't figure out why as none of my variables are binary.
df = bees
pca_dat_condition <- bees %>% ungroup() %>%
select(Length.1:Length.25, OBJECTID, Local, Elevation, Longitude,
Latitude, Cubital.Index) %>%
na.omit()
pca_dat_first <- pca_dat_condition %>% #remove the final nonnumerical information
select(-Local, -OBJECTID, -Elevation, -Longitude, -Latitude)
pca <- pca_dat_first%>%
scale() %>%
prcomp()
# add identifying information back into PCA data
pca_data <- data.frame(pca$x, Local=pca_dat_condition$Local, ID =
pca_dat_condition$OBJECTID, elevation = pca_dat_condition$Elevation,
Longitude = pca_dat_condition$Longitude, Latitude =
pca_dat_condition$Latitude)
ggplot(pca_data, aes(x=PC1, y=PC2, color = Latitude)) +
geom_point() +ggtitle("PC1 vs PC2: All Individuals") +
scale_colour_gradient(low = "blue", high = "red")
I'm not getting any error messages with the code, and when I look at the data frame nothing looks out of place. Should I be using a different function for the PCA? Any insight into why my graph may look like this?
Previously, I did the same PCA but for the average values for each Local (whereas this is each individual), and it came out as a normal PCA with no clear clustering. I don't understand why this problem would arise when looking at individual points. It's possible I merged some other data frames in a wonky way, but the structure of the dataset seems completely normal.
This is how the PCA looks.
bees <- read.csv(paste0("https://gist.githubusercontent.com/AkselA/",
"08a4e78a6a29a918ed597e9a32adc228/raw/",
"6d0005fad4cb91830bcf7087176283b18683e9cd/bees.csv"),
header=TRUE)
# bees <- bees[bees[,1] < 10,] # This will remove the three offending rows
bees <- na.omit(bees)
bees.cond <- bees[, grep("Length|OBJ|Loc|Ele|Lon|Lat|Cubi", colnames(bees))]
bees.first <- bees[, grep("Length|Cubi", colnames(bees))]
summary(bees.first)
par(mfrow=c(7, 4), mar=rep(1, 4))
q <- lapply(1:ncol(bees.first), function(x) {
h <- hist(scale(bees.first[, x]), plot=FALSE)
h$counts <- log1p(h$counts)
plot(h, main="", axes=FALSE, ann=FALSE)
legend("topright", legend=names(bees.first[x]),
bty="n", cex=0.8, adj=c(0, -2), xpd=NA)
})
bees.pca <- prcomp(bees.first, scale.=TRUE)
biplot(bees.pca)
Before removing the outliers
After
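To close the loop, here is a sketch of re-running the asker's ggplot view once the offending rows are removed; it assumes the gist contains the Latitude column used in the question and that ggplot2 is loaded.
library(ggplot2)
bees2 <- bees[bees[, 1] < 10, ]                        # drop the three offending rows
bees2.first <- bees2[, grep("Length|Cubi", colnames(bees2))]
bees2.pca <- prcomp(bees2.first, scale. = TRUE)
pca_data2 <- data.frame(bees2.pca$x, Latitude = bees2$Latitude)
ggplot(pca_data2, aes(PC1, PC2, color = Latitude)) +
  geom_point() +
  scale_colour_gradient(low = "blue", high = "red") +
  ggtitle("PC1 vs PC2 after removing outliers")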

How to plot numerous polygons in each data category?

I am working with ggplot to plot bivariate data in groups along with standard ellipses of these data using a separate set of tools. These return n=100 x,y coordinates that define each ellipse, and then for each group, I would like to plot about 10-25 ellipses.
Conceptually, how can this be achieved? I can plot a single ellipse easily using geom_polygon, but I am confused how to get the data organized to make it work so multiple ellipses are plotted and guides (color, fills, linetypes, etc.) are applied per group.
In the traditional R plotting, I could just keep adding lines using a for loop.
Thanks!
UPDATE: Here is a CSV containing 100 coordinates for a single ellipse.
Data
Let's say I have three groups of bivariate data to which the ellipse fitting has been applied: Green, Red, Blue. For each group, I'd like to plot several ellipses.
I don't know how I would organize the data in such a way as to work in the long format preferred by ggplot and preserve the group affiliations. Would a list work?
UPDATE2:
Here is a csv of raw x and y data organized into two groups: river and lake
Data
The data plot like this:
test.data <- read.csv("ellipse_test_data.csv")
ggplot(test.data) +
geom_point(aes(x, y, color = group)) +
theme_classic()
I am using a package called SIBER, which will fit Bayesian ellipses to the data for comparing groups by ellipse area, etc. The output of the following creates a list with one element per group of data, and each element contains an n x 6 matrix (n = number of draws) for the fitted ellipses: the first four columns are the covariance matrix Sigma in vector format and the last two are the bivariate means:
# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^5 # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 100 # thin the posterior by this many
parms$n.chains <- 2 # run this many chains
# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3
# fit the ellipses which uses an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the
# means. Fitting is via the JAGS method.
ellipses.test <- siberMVN(siber.test, parms, priors)
First few rows of the first element in the list:
$`1.river`
Sigma2[1,1] Sigma2[2,1] Sigma2[1,2] Sigma2[2,2] mu[1] mu[2]
[1,] 1.2882740 2.407070e-01 2.407070e-01 1.922637 -15.52846 12.14774
[2,] 1.0677979 -3.997169e-02 -3.997169e-02 2.448872 -15.49182 12.37709
[3,] 1.1440816 7.257331e-01 7.257331e-01 4.040416 -15.30151 12.14947
I would like to be able to extract a random number of these ellipses and plot them with ggplot using alpha transparency.
The package SIBER has a function (addEllipse) to convert the '6 x n' entries to a set number of x and y points that define an ellipse, but I don't know how to organize that output for ggplot. I thought there might be an elegant way to do it all internally with ggplot.
The ideal output would be something like this, but in ggplot so the ellipses could match the aesthetics of the levels of data:
Some code to do this on the bundled demo dataset from SIBER follows.
In this example we try to create some plots of the multiple samples of the posterior ellipses using ggplot2.
library(SIBER)
library(ggplot2)
library(dplyr)
library(ellipse)
Fit a basic SIBER model to the example data bundled with the package.
# load in the included demonstration dataset
data("demo.siber.data")
#
# create the siber object
siber.example <- createSiberObject(demo.siber.data)
# Calculate summary statistics for each group: TA, SEA and SEAc
group.ML <- groupMetricsML(siber.example)
# options for running jags
parms <- list()
parms$n.iter <- 2 * 10^4 # number of iterations to run the model for
parms$n.burnin <- 1 * 10^3 # discard the first set of values
parms$n.thin <- 10 # thin the posterior by this many
parms$n.chains <- 2 # run this many chains
# define the priors
priors <- list()
priors$R <- 1 * diag(2)
priors$k <- 2
priors$tau.mu <- 1.0E-3
# fit the ellipses which uses an Inverse Wishart prior
# on the covariance matrix Sigma, and a vague normal prior on the
# means. Fitting is via the JAGS method.
ellipses.posterior <- siberMVN(siber.example, parms, priors)
# The posterior estimates of the ellipses for each group can be used to
# calculate the SEA.B for each group.
SEA.B <- siberEllipses(ellipses.posterior)
siberDensityPlot(SEA.B, xticklabels = colnames(group.ML),
xlab = c("Community | Group"),
ylab = expression("Standard Ellipse Area " ('\u2030' ^2) ),
bty = "L",
las = 1,
main = "SIBER ellipses on each group"
)
Now we want to create some plots of some sample ellipses from these distributions. We need to create a data.frame object of all the ellipses for each group. In this example we simply take the first 10 posterior draws, assuming them to be independent of one another, but you could take a random sample if you prefer (a sketch of that variation follows the data-frame construction below).
# how many of the posterior draws do you want?
n.posts <- 10
# decide how big an ellipse you want to draw
p.ell <- 0.95
# for a standard ellipse use
# p.ell <- pchisq(1,2)
# a list to store the results
all_ellipses <- list()
# loop over groups
for (i in 1:length(ellipses.posterior)){
# a dummy variable to build in the loop
ell <- NULL
post.id <- NULL
for ( j in 1:n.posts){
# covariance matrix
Sigma <- matrix(ellipses.posterior[[i]][j,1:4], 2, 2)
# mean
mu <- ellipses.posterior[[i]][j,5:6]
# ellipse points
out <- ellipse::ellipse(Sigma, centre = mu , level = p.ell)
ell <- rbind(ell, out)
post.id <- c(post.id, rep(j, nrow(out)))
}
ell <- as.data.frame(ell)
ell$rep <- post.id
all_ellipses[[i]] <- ell
}
ellipse_df <- bind_rows(all_ellipses, .id = "id")
# now we need the group and community names
# extract them from the ellipses.posterior list
group_comm_names <- names(ellipses.posterior)[as.numeric(ellipse_df$id)]
# split them and convert to a matrix, NB byrow = TRUE
split_group_comm <- matrix(unlist(strsplit(group_comm_names, "[.]")),
nrow(ellipse_df), 2, byrow = TRUE)
ellipse_df$community <- split_group_comm[,1]
ellipse_df$group <- split_group_comm[,2]
ellipse_df <- dplyr::rename(ellipse_df, iso1 = x, iso2 = y)
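If you would rather take a random sample of the posterior draws instead of the first n.posts (as mentioned above), only the loop index changes; a minimal sketch of the modified loop skeleton:
set.seed(1)  # illustrative seed so the random draw is reproducible
for (i in 1:length(ellipses.posterior)){
  draw.ids <- sample(nrow(ellipses.posterior[[i]]), n.posts)
  for (j in draw.ids){
    # ... same body as in the loop above ...
  }
}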
Now to create the plots. First plot all the raw data as we want.
first.plot <- ggplot(data = demo.siber.data, aes(iso1, iso2)) +
geom_point(aes(color = factor(group):factor(community)), size = 2)+
ylab(expression(paste(delta^{15}, "N (\u2030)")))+
xlab(expression(paste(delta^{13}, "C (\u2030)"))) +
theme(text = element_text(size=15))
print(first.plot)
Now we can try to add the posterior ellipses on top and facet by group
second.plot <- first.plot + facet_wrap(~factor(group):factor(community))
print(second.plot)
# rename columns of ellipse_df to match the aesthetics
third.plot <- second.plot +
geom_polygon(data = ellipse_df,
mapping = aes(iso1, iso2,
group = rep,
color = factor(group):factor(community),
fill = NULL),
fill = NA,
alpha = 0.2)
print(third.plot)
Facet-wrapped plot of sample of posterior ellipses by group

How to make plot scales the same or turn them into log scales in ggplot

I am using this script to plot chemical elements using ggplot2 in R:
# Load ggplot2 for plotting and reshape2 for melt() (assumed here; the original excerpt did not show its library calls)
library(ggplot2)
library(reshape2)
# Load the same data set but under a different name, because it is just for plotting elements as a well log:
Core31B1 <- read.csv('OilSandC31B1BatchResultsCr.csv', header = TRUE)
#
# Calculating the ratios of Ca.Ti, Ca.K, Ca.Fe:
C31B1$Ca.Ti.ratio <- (C31B1$Ca/C31B1$Ti)
C31B1$Ca.K.ratio <- (C31B1$Ca/C31B1$K)
C31B1$Ca.Fe.ratio <- (C31B1$Ca/C31B1$Fe)
C31B1$Fe.Ti.ratio <- (C31B1$Fe/C31B1$Ti)
#C31B1$Si.Al.ratio <- (C31B1$Si/C31B1$Al)
#
# Create a subset of ratios and depth
core31B1_ratio <- C31B1[-2:-18]
#
# Removing the totCount column:
Core31B1 <- Core31B1[-9]
#
# Melting the data set based on the depth values, to have only three columns: depth, element and count
C31B1_melted <- melt(Core31B1, id.vars="depth")
#ratio melted
C31B1_ra_melted <- melt(core31B1_ratio, id.vars="depth")
#
# Eliminating the NA data from the data set
C31B1_melted<-na.exclude(C31B1_melted)
# ratios
C31B1_ra_melted <-na.exclude(C31B1_ra_melted)
#
# Rename the columns:
colnames(C31B1_melted) <- c("depth","element","counts")
# ratios
colnames(C31B1_ra_melted) <- c("depth","ratio","percentage")
#
# Plotting the data in well-log format using ggplot2:
Core31B1_Sp <- ggplot(C31B1_melted, aes(x=counts, y=depth)) +
theme_bw() +
geom_path(aes(linetype = element))+ geom_path(size = 0.6) +
labs(title='Core 31 Box 1 Bioturbated sediments') +
scale_y_reverse() +
facet_grid(. ~ element, scales='free_x') #rasterImage(Core31Image, 0, 1515.03, 150, 0, interpolate = FALSE)
#
# View the plot:
Core31B1_Sp
I got the following image (as you can see, the plot has seven element plots, and each one has its own scale; please ignore the shadings and the image at the far left):
My question is: is there a way to make these scales the same, for example by using log scales? If yes, what should I change in my code to change the scales?
It is not clear what you mean by "the same", because that will not give you the same result as log transforming the values. Here is how to get the log transformation, which, when combined with not using free_x, will give you the plot I think you are asking for.
First, since you didn't provide any reproducible data (see here for more on how to ask good questions), here is some that gives at least some of the features that I think your data has. I am using tidyverse (specifically dplyr and tidyr) to do the construction:
forRatios <-
names(iris)[1:3] %>%
combn(2, paste, collapse = " / ")
toPlot <-
iris %>%
mutate_(.dots = forRatios) %>%
select(contains("/")) %>%
mutate(yLocation = 1:n()) %>%
gather(Comparison, Ratio, -yLocation) %>%
mutate(logRatio = log2(Ratio))
Note that the last line takes the log base 2 of the ratio. This allows ratios in each direction (above and below 1) to plot meaningfully. I think that step is what you need. You can accomplish something similar with myDF$logRatio <- log2(myDF$ratio) if you don't want to use dplyr.
Then, you can just plot that:
ggplot(
toPlot
, aes(x = logRatio
, y = yLocation) ) +
geom_path() +
facet_wrap(~Comparison)
Gives:

Trying to vertically scale the graph of a data set with R, ggplot2

I'm working with a data frame of size 2 x 400. I need to graph this (let's call it data set A) on the same graph as the main data set for my project.
All I need is the general shape of data set A's graph, i.e. I only need to see the trend.
The scale that data set A takes place on happens to be much smaller than that of the main graph. So dataset A just looks like a horizontal line.
I decided to scale data set A by multiplying it by a factor of... I tried various values to get the optimum vertical scaling, which leads me to the problem I'm having.
When trying to find the ideal multiplicative factor by trial and error, I expected data set A's graph to retain its general shape and only vary in its vertical position, i.e. the horizontal coordinates of all maxes and mins shouldn't move, and only the vertical values should change. But this wasn't happening, and I'd like to know why.
Here's the data set A (yellow), when multiplied by factor of 3:
factor of 5:
The yellow dots are the geom_point and the yellow curve is the corresponding geom_smooth.
EDIT:
here is my original code:
I haven't had much formal training with code. I apologize for any messiness!
library("ggplot2")
library("dplyr")
# READ IN DATA
temp_data <-read.table(col.names = "y",
"C:/Users/Ben/Documents/Visual Studio 2013/Projects/Home/Home/steamdata2.txt")
boilpoint <- which(temp_data$y == "boil") # JUST A MARKER..
temp_data <- filter(temp_data, y != "boil") # GETTING RID OF THE MARKER ENTRY
# DON'T KNOW WHY BUT I HAD TO DO THIS INTERMEDIATE STEP
# BEFORE I COULD CONVERT FROM FACTOR -> NUMERIC
temp_data$y <- as.character(temp_data$y)
# CONVERTING TO NUMERIC
temp_data$y <- as.numeric(temp_data$y)
# GETTING RID OF BASICALLY THE LAST ENTRY WHICH HAS THE LARGEST VALUE
temp_data <- filter(temp_data, y<max(temp_data$y))
# ADD ANOTHER COLUMN WITH THE ROW NUMBER,
# BECAUSE I DON'T KNOW HOW TO ACCESS THIS FOR GGPLOT
temp_data <- transform(temp_data, x = 1:nrow(temp_data))
n <- nrow(temp_data) # Num of readings
period <- temp_data[n,1] # (sec)
RpS <- n / period # Avg Readings per Second
MIN <- min(temp_data$y)
MAX <- max(temp_data$y)
# DERIVATIVE OF ORIGINAL
deriv <- data.frame(matrix(ncol=2, nrow=n))
# ADD ANOTHER COLUMN TO ACCESS ROW NUMBERS FOR GGPLOT LATER
colnames(deriv) <- c("y","x")
deriv <- transform(deriv, x = c(1:n))
# FILL DERIVATIVE DATAFRAME
deriv[1, 1] <- 0
for(i in 2:n){
deriv[i - 1, 1] <- temp_data[i, 1] - temp_data[i - 1, 1]
}
deriv <- filter(deriv, y != 0)
# DID THE SAME FOR SECOND DERIVATIVE
dderiv <- data.frame(matrix(ncol = 2, nrow = nrow(deriv)))
colnames(dderiv) <- c("y", "x")
dderiv <- transform(dderiv, x=rep(0, nrow(deriv)))
dderiv[1, 1] <- 0
for(i in 2:nrow(deriv)) {
dderiv$y[i - 1] <- (deriv$y[i] - deriv$y[i - 1]) /
(deriv$x[i] - deriv$x[i - 1])
dderiv$x[i - 1] <- deriv$x[i] + (deriv$x[i] - deriv$x[i - 1]) / 2
}
dderiv <- filter(dderiv, y!=0)
# HERE'S WHERE I FACTOR BY VARIOUS MULTIPLES
deriv <- MIN + deriv * 3
dderiv <- MIN + dderiv * 3
graph <- ggplot(temp_data, aes(x, y)) + geom_smooth()
graph <- graph + geom_point(data = deriv, color = "yellow")
graph <- graph + geom_smooth(data = deriv, color = "yellow")
graph <- graph + geom_point(data = dderiv, color = "green")
graph <- graph + geom_smooth(data = dderiv, color = "green")
graph <- graph + geom_vline(xintercept = boilpoint, color = "red")
graph <- graph + xlab("Readings (n)") +
ylab(expression(paste("Temperature (",degree,"C)")))
graph <- graph + xlim(c(0,n)) + ylim(c(MIN, MAX))
It's hard to check without your raw data, but I'm 99% sure that your main problem is that you're hard-coding the y limits with ylim(c(MIN, MAX)). This is exacerbated by accidentally scaling both variables in your deriv and dderiv data frame, not just y.
I was able to debug the problem when I noticed that your top "scale by 3" graph has a lot more yellow points than your bottom "scale by 5" graph.
The quick fix is don't scale the row numbers, only scale the y values, which is to say, replace this
# scales entire data frame: bad!
deriv <- MIN + deriv * 3
dderiv <- MIN + dderiv * 3
with this:
# only scale y
deriv$y <- MIN + deriv$y * 3
dderiv$y <- MIN + dderiv$y * 3
I think there is another problem too: even with my correction above, negative values of your derivatives will be excluded. If deriv$y or dderiv$y is ever negative, then MIN + deriv$y * 3 will be less than MIN, and since your y axis begins at MIN it won't be plotted.
So I think the whole fix would be to instead do something like
# keep the original y values around so we can experiment with scaling
# without running *all* the code again
deriv$y_orig <- deriv$y
# multiplicative scale
# fill in the value of `prop` to be the proportion of the vertical plot area
# that you want taken up by the derivative
deriv$y <- deriv$y_orig * diff(c(MIN, MAX)) / diff(range(deriv$y_orig)) * prop
# shift into plot range
# fill in the value of `intercept` to be the y value of the
# lowest point of this line
deriv$y <- deriv$y + MIN - min(deriv$y) + 1
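For example, to have the derivative occupy roughly half of the plot's vertical range (0.5 is just an illustrative choice for prop):
prop <- 0.5
deriv$y <- deriv$y_orig * diff(c(MIN, MAX)) / diff(range(deriv$y_orig)) * prop
deriv$y <- deriv$y + MIN - min(deriv$y) + 1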
I normally don't answer questions that aren't reproducible with data because I hate lack of clarity and I hate the inability to test. However, your question was very clear and I'm pretty sure this will work even without testing. Fingers crossed!
A few other, more general comments:
It's good you know that to convert factor to numeric you need to go via character. It's an annoyance, but if you want to understand more here's the r-faq on it.
I'm not sure why you bother with (deriv$x[i] - deriv$x[i - 1]) in your for loop. Since you define x to be 1, 2, 3, ... the difference is always 1. I'm more confused by why you divide by 2 in the second derivative.
Your for loop can probably be replaced by the diff() function. (See below.)
You seem to have just gotten your foot in the dplyr door, so I used base functions in my recommendation. Keep working with dplyr, I think you'll like it. The big dplyr function you're not using is mutate. It works like base::transform for adding new columns.
I dislike that you've created all these different data frames; it clutters things up. I think your code could be simplified to something like this
all_data = filter(temp_data, y != "boil") %>%
mutate(y = as.numeric(as.character(y))) %>%
filter(y < max(y)) %>%
mutate(
x = 1:n(),
deriv = c(NA, diff(y)) / c(NA, diff(x)),
dderiv = c(NA, diff(deriv)) / 2
)
Rather than having separate data frames for the original data, first derivative and second derivative, this puts them all in the same data frame.
The big benefit of having things in one data frame is that you could then "gather" it into a nice, long (rather than wide) tidy format and simplify your plotting call:
library(tidyr)
long_data = gather(all_data, key = "function", value = "y", y, deriv, dderiv)
Then your ggplot call would look more like this:
graph <- ggplot(long_data, aes(x, y, color = `function`)) +
geom_smooth() +
geom_point() +
geom_vline(xintercept = boilpoint, color = "red") +
scale_color_manual(values = c("green", "yellow", "blue")) +
xlab("Readings (n)") +
ylab(expression(paste("Temperature (",degree,"C)"))) +
xlim(c(0,n)) + ylim(c(MIN, MAX))
With data in long format, you'd have a column of your data (I've named it "function") that maps to color, so you don't have to add all the layers one at a time, and you get a nicely generated legend!
