ggplot2 stat_density_2d not working properly when grouping - r

I have a dataset of houses in a city. Each house belongs to one region. The dataset is available here, and below is a plot of the city with its regions.
raw_csv = read.csv("melb_data.csv")
ggplot(raw_csv, aes(Lattitude, Longtitude)) + geom_point(aes(color = Regionname))
When I use stat_density_2d it works OK. Here is a picture of the result.
ggplot(raw_csv, aes(Lattitude, Longtitude)) + stat_density_2d()
The problem appears when I group stat_density_2d by region: it does not work properly. I want the density of each region separately (something like this, but it doesn't work).
Here is the weird result of grouping it.
ggplot(raw_csv, aes(Lattitude, Longtitude)) + stat_density_2d(aes(group = Regionname))
What am I doing wrong?
UPDATE:
This is very strange: when I excluded the region "Western Victoria" from the map, the other regions were drawn correctly. I still don't understand what the problem is.

As I'm not familiar with stat_density_2d, I can't tell you what's going wrong with the grouping. However, as a workaround you could split your data frame by region and add a separate density layer for each region. Here I use lapply to loop over the split data frames:
library(ggplot2)
split_csv <- split(raw_csv, raw_csv$Regionname)
ggplot(mapping = aes(Lattitude, Longtitude, color = Regionname)) +
lapply(split_csv, function(x) stat_density_2d(data = x))
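The UPDATE in the question reports that dropping "Western Victoria" makes the other regions draw correctly. One way to investigate is to count the rows per region and leave out very small groups before computing the densities. The following is a minimal sketch under two assumptions that the thread does not confirm: that the problematic region is sparse, and that a threshold of 30 rows (the hypothetical `min_points`) is sensible. The data frame `houses` is synthetic and only mimics the column names of melb_data.csv.

```r
library(ggplot2)

# Synthetic stand-in for melb_data.csv (assumption: same column names as the
# real data; all values are made up).
set.seed(1)
houses <- data.frame(
  Lattitude  = c(rnorm(200, -37.80, 0.05), rnorm(200, -37.90, 0.05), rnorm(3, -37.60, 0.01)),
  Longtitude = c(rnorm(200, 145.00, 0.05), rnorm(200, 144.80, 0.05), rnorm(3, 144.50, 0.01)),
  Regionname = rep(c("Region A", "Region B", "Sparse region"), times = c(200, 200, 3))
)

# Count houses per region to spot groups that may be too small for a 2D density.
region_counts <- table(houses$Regionname)

# Keep only regions with at least `min_points` rows (arbitrary threshold).
min_points <- 30
dense_regions <- names(region_counts)[as.vector(region_counts) >= min_points]
houses_dense <- houses[houses$Regionname %in% dense_regions, ]

# One set of density contours per remaining region.
p <- ggplot(houses_dense, aes(Lattitude, Longtitude, color = Regionname)) +
  stat_density_2d()
```

Printing `region_counts` for the real data would show whether "Western Victoria" is in fact much smaller than the other regions.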

Related

using ggridges to compare to numerical variables for different countries

I've been working with a tibble I created and I was wanting to create a ridge plot with ggplot2.
I have a list of regions, GDP and Infant Mortality.
I was hoping to compare each region as a coloured ridge showing GDP as x and Infant Mortality as y.
This is as far as I've gotten but it's not working (probably because I don't think I'm quite understanding each part).
library(ggplot2)
install.packages("ggridges")
library(ggridges)
as.tibble(Region_Compare)
colnames(Region_Compare)
ggplot(Region_Compare, aes(x = `GDP`, y = `Infant mortality`, fill = cut)) +
geom_density_ridges() +
theme_ridges() +
theme()
I get the following error:
Error: Aesthetics must be valid data columns. Problematic aesthetic(s): fill = cut. Did you mistype the name of a data column or forget to add after_stat()?
Could someone please help me understand where I'm going wrong and what I might need to add?
I was hoping to do something like the picture attached, with x showing labelled regions and y showing Infant mortality, and with the peaks or density showing GDP. Is that possible, or should I be looking at something else?
Wishful thinking pic:
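The error message says that `cut` is not a column of `Region_Compare`, so `fill = cut` cannot be mapped. A ridge plot also needs one continuous variable on x and a grouping variable on y, with one ridge per group. The sketch below shows that layout with made-up data. The column names `Region` and `GDP` are assumptions, since the real structure of `Region_Compare` is not shown in the question.

```r
library(ggplot2)
library(ggridges)

# Made-up data standing in for Region_Compare; `Region` and `GDP` are
# assumed column names.
set.seed(42)
region_compare <- data.frame(
  Region = rep(c("Asia", "Europe", "Africa"), each = 50),
  GDP    = c(rlnorm(50, 8.5, 0.8), rlnorm(50, 10, 0.5), rlnorm(50, 7.5, 0.7))
)

# One ridge per region: x is the continuous variable, y is the grouping variable.
p <- ggplot(region_compare, aes(x = GDP, y = Region, fill = Region)) +
  geom_density_ridges(alpha = 0.7) +
  theme_ridges()
```

Each ridge shows the distribution of a single variable, so comparing GDP against Infant mortality directly would probably call for a different plot type, for example a scatterplot coloured by region.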

Labels right next to points in ggplot

I've tried to google my way to the answer to this question, but nothing I found does what I'm trying to do.
My goal is to add labels right beside the observations within the plot to show the name of each observation. The names of the observations are located in the first column of my data frame.
ggplot(data = coef.vec)+aes(x = coef.x, y = variable.mean) +
geom_point()
You can add labels with geom_text() as follows. I have used simulated data:
library(tidyverse)
#Code
data <- data.frame(group=paste0('Obs',1:10),
coef.x=rnorm(10,0,1),
variable.mean=runif(10,0.015,0.05),stringsAsFactors = F)
#Plot
ggplot(data,aes(x=coef.x,y=variable.mean))+
geom_point()+
geom_text(aes(label=group),hjust=-0.15)

barplot 2 bars one stacked the other not

Despite some similar questions and my own research, I cannot seem to solve my little problem. Please forgive me if the answer is very easy and I am being silly. I have a data frame:
df<-data.frame(X = c("Germany", "Chile","Netherlands","Papua New Guinea","Cameroon"), R_bar_Ger = c(1300000000, 620000, 550000, 400000, 320000))
I would like to produce a barplot with 2 bars (country names on the x-axis, amounts on the y-axis).
The left bar should show Germany; the right one should be stacked with the remaining 4 countries.
Please help and Thank you very much in advance!
One way to solve this is by using ggplot2 and a little bit of manipulating your data frame.
First, add a column to your data frame that indicates which bar a country should be plotted in (Germany or Not-Germany):
df$bar <- ifelse(df$X == "Germany", 1, 0)
Now, create the plot:
ggplot(data = df, aes(x = factor(bar), fill = factor(X), y = R_bar_Ger)) +
geom_bar(stat = "identity") +
scale_y_sqrt() +
labs(x = "Country Group", title = "Square Root Scale", fill = "Country") +
scale_x_discrete(labels = c("Not Germany", "Germany"))
Note that if you're not familiar with ggplot2, only the first two lines are necessary for creating the plot; the others are there to make it look nice. Since Germany is orders of magnitude larger than your other countries, this isn't going to look very good without some sort of scaling. ggplot2 has a number of built-in scaling commands that might be worth exploring. Here, I've added the square root scale so you can see that the non-Germany countries actually do get stacked as desired.
The documentation for ggplot2 bar charts can be found here - it's definitely worth a read if you're looking for a powerful plotting tool.
There are a number of ways to skin a cat, and your exact question will often change as you learn new tools. I probably wouldn't have set the problem specification up this way, but sticking as close to your data and barplot as possible, one way to achieve what I think you want is:
with(aggregate(R_bar_Ger ~ X=="Germany", data=df, sum), barplot(R_bar_Ger, names.arg=c("Other", "Germany")))
So what we're doing here is aggregating Germany and non-Germany figures by addition, and then passing those values to the barplot function along with sensible x-axis labels.
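The `aggregate` call above sums the four remaining countries into a single bar. If the right-hand bar should stay visibly stacked by country in base graphics, one option is to pass `barplot` a matrix with one column per bar and one row per country. This is a sketch, not part of the original answer, and the helper names `bar_of` and `heights` are made up for illustration.

```r
df <- data.frame(
  X = c("Germany", "Chile", "Netherlands", "Papua New Guinea", "Cameroon"),
  R_bar_Ger = c(1300000000, 620000, 550000, 400000, 320000)
)

# One row per country, one column per bar. A country contributes only to
# the bar it belongs to and is zero in the other column.
bar_of <- ifelse(df$X == "Germany", "Germany", "Other")
heights <- matrix(0, nrow = nrow(df), ncol = 2,
                  dimnames = list(df$X, c("Germany", "Other")))
heights[cbind(seq_len(nrow(df)), match(bar_of, colnames(heights)))] <- df$R_bar_Ger

# Given a matrix, barplot() stacks the rows within each column.
barplot(heights, col = rainbow(nrow(df)), legend.text = rownames(heights))
```

On a linear axis the "Other" bar will be barely visible next to Germany, which is the same scaling issue the ggplot2 answer handles with a square root scale.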
You'll need to add an additional column to your data first:
df$group <- ifelse(df$X=="Germany","Germany","Other")
Then we can use the following ggplot approach
library(ggplot2)
# qplot() no longer accepts stat = "identity", so use ggplot() with geom_col()
ggplot(df, aes(x = factor(group), y = R_bar_Ger, fill = factor(X))) + geom_col()

Filling cross over under a Cumulative Frequency plot using ggplot in R

I am trying to plot two Cumulative Frequency curves in ggplot, and shade the cross over at a certain cut off. I haven't been using ggplot for long, so I was hoping someone might be able to help me with this one.
The plot without filled regions, looks like this...
Which I have created using the following code...
library(ggplot2) # required
library(plyr) # provides ddply(), used below
north <- rnorm(3060, mean=277,sd=3.01) # to create synthetic data
south <- rnorm(3060, mean=278, sd=3.26) # in place of my real data.
#placing in dataframe
df_temp <- data.frame(temp=c(north,south),
region=c(rep("north",length=3060),rep("south",length=3060)))
#manipulating into cdf, as I've seen in other examples
temp.regions <- ddply(df_temp, .(region), summarize,
temp = unique(temp),
ecdf = ecdf(temp)(unique(temp)))
# feeding into ggplot.
ggplot(temp.regions,aes(x=temp, y=ecdf, color = region)) +
geom_line(aes(x=temp,color=region))+
scale_colour_manual(values = c("blue","red"))
What I would then like, would be to shade both curves for temperatures below 0.2 on the y axis. Ideally I'd like to see the blue one shaded in blue, and the red one shaded in red. Then, where they cross over in purple.
However, the closest I have managed is as follows...
Which I have achieved using the following additions to my code.
# creating a dataframe with just the temperatures for below 0.2
# to try and aid control when plotting
temp.below <- temp.regions[which(temp.regions$ecdf<0.2),]
# plotting routine again.
ggplot(temp.regions, aes(x=temp, y=ecdf, color = region)) +
geom_line(aes(x=temp,color=region))+
scale_colour_manual(values = c("blue","red"))+
# with additional line for shading.
geom_ribbon(data=temp.below,
aes(x=temp,ymin=0,ymax=0.2), alpha=0.5)
I've seen a few examples of people shading for a normal distribution density plot, which is where I have adapted my code from. But for some reason my boxes don't seem to want anything to do with the temperature curve.
Please help! I'm sure it's quite simple, I'm just really lost and have tried a few, producing less convincing results than these.
Thank you so much for taking a look.
PROBLEM SOLVED THANKS TO HELP BELOW...
running suggested code from below
geom_ribbon(aes(ymin=0,ymax=ecdf, fill=region), alpha=0.5)
gives...
which is so very almost the solution I'm after, but with one final addition... like so
#geom_ribbon(aes(ymin=0,ymax=ecdf, fill=region), alpha=0.5)
geom_ribbon(data=temp.below, aes(ymin=0,ymax=ecdf, fill=region), alpha=0.5)
I get what I'm after...
The reason I set the data again is so that it only fills the lowest 20% of the two regions.
Thank you so much for the help :-)
Looks like you're thinking about it in the right way.
With geom_ribbon I don't think you need to set data to anything else. Just set aes(ymin = 0, ymax = ecdf, fill = region). I think that should do it.
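Putting the pieces of this thread together, the whole plot can be sketched as below. It is a self-contained variant, not the asker's exact code: the empirical CDF is computed with base R `split`/`lapply` instead of `ddply` (a swapped-in technique, so plyr is not needed), and `scale_fill_manual` is added on the assumption that the fill colours should match the line colours.

```r
library(ggplot2)

# Synthetic data, as in the question.
set.seed(1)
df_temp <- data.frame(
  temp   = c(rnorm(3060, mean = 277, sd = 3.01), rnorm(3060, mean = 278, sd = 3.26)),
  region = rep(c("north", "south"), each = 3060)
)

# Empirical CDF per region, computed with base R instead of plyr::ddply.
temp.regions <- do.call(rbind, lapply(split(df_temp, df_temp$region), function(d) {
  x <- sort(unique(d$temp))
  data.frame(region = d$region[1], temp = x, ecdf = ecdf(d$temp)(x))
}))

# Only the lowest 20% of each curve gets shaded.
temp.below <- temp.regions[temp.regions$ecdf < 0.2, ]

p <- ggplot(temp.regions, aes(x = temp, y = ecdf, color = region)) +
  geom_line() +
  # The ribbon follows each curve (ymax = ecdf) rather than a fixed box.
  geom_ribbon(data = temp.below, aes(ymin = 0, ymax = ecdf, fill = region),
              alpha = 0.5, color = NA) +
  scale_colour_manual(values = c("blue", "red")) +
  scale_fill_manual(values = c("blue", "red"))
```

Because both ribbons are semi-transparent, the area where they overlap should blend towards purple without any extra layer.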

ggplot boxplots with scatterplot overlay (same variables)

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
f2=factor(data$Station),data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
position = position_dodge(height = 0, width = 0.75), size = 3)
plot1+xlab("MPS Station") + ylab("Depth(m)") +
theme(legend.title=element_blank()) + scale_y_reverse() +
coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2+scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to to combine those two plots, by adding "plot2" (Station depth scatterplot) to "plot1" boxplots (Detection depths). They are both looking at the same variables (depth and station), and must be the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the `col=f1` argument. We'll plot the detection data separately, since that requires a boxplot, and then we'll plot the depths of the stations with colored points (assuming each station only has one depth). We specify the two different plots by referencing the data from within the 'geom' functions, instead of specifying the data inside the main 'ggplot' function. It should look something like this:
ggplot()+geom_boxplot(data=df, aes(x=f2, y=depth, col=f1)) + geom_point(data=df2, aes(x=f2, y=depth), colour="blue") + scale_y_reverse()
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
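Since the original chart is not shown here, the following sketch recreates the idea with random dummy data. The column names `f1`, `f2` and `depth` follow the answer's `df` and `df2`. The station names, site labels and depth values are invented for illustration.

```r
library(ggplot2)

# Dummy data; column names follow the answer's df and df2, values are random.
set.seed(7)
stations <- paste0("S", 1:4)
df <- data.frame(
  f1    = factor(sample(c("Site A", "Site B"), 200, replace = TRUE)),
  f2    = factor(sample(stations, 200, replace = TRUE), levels = stations),
  depth = runif(200, min = 0, max = 120)
)
# One fixed depth per receiver station.
df2 <- data.frame(
  f2    = factor(stations, levels = stations),
  depth = c(60, 90, 120, 140)
)

# Each geom gets its own data frame; both share the same x and y scales.
p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, col = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue", size = 3) +
  scale_y_reverse() +
  labs(x = "MPS Station", y = "Depth (m)", colour = NULL)
```

Giving `f2` the same factor levels in both data frames keeps the stations in the same order on the shared x-axis.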
