Different rendering between choroplethr package and ggplot2 - r

I have a visualisation problem that I can't get my head around. I want to create a map from some data that includes position information. I found the great choroplethr package, which was a great starting point and really helped me understand how to process data for meaningful results. Here is the map the way I would like to have it:
But when I try to replicate the steps with ggplot2 (cf. self$render in the choroplethr package), I get the following result:
Does anyone have an idea where my parameters are wrong/lacking something? Here is the code:
fig <- ggplot(data=merge.shp, aes(x = long, y = lat, group = group)) +
geom_polygon(aes(fill = sqm_cat), na.rm=FALSE, rule="evenodd", position="identity") +
coord_equal() +
scale_fill_brewer("", drop = FALSE, na.value = "black") +
ggplot2::theme_void() +
ggtitle("People per square kilometers")
Edit: I think I found the problem. If I just plot the map for one region
merge.shp %>% filter(plz %in% c("645")) %>%
ggplot(aes(x=long, y=lat, group=group)) + geom_path()
I get the following result:
So everything may be related to the "wrong" connection of coordinates. If I replace geom_path with geom_point there are some reasonable outlines. But how do I translate this to the map?

The solution is actually quite simple. Row order is essential for geom_polygon, which connects the vertices in the order in which they appear in the data. I was mistakenly assuming my data frame merge.shp was sorted in ascending order, which it wasn't. Introducing
merge.shp = merge.shp[order(merge.shp$order), ]
made it work.
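To see why row order matters, here is a minimal sketch. The unit square, its column values, and the scrambled row sequence are invented for illustration; only the column names (long, lat, group, order) follow the question.

```r
library(ggplot2)

# Hypothetical unit square; `order` mimics the vertex-order column in merge.shp.
square <- data.frame(
  long  = c(0, 1, 1, 0),
  lat   = c(0, 0, 1, 1),
  group = "a",
  order = 1:4
)

# Scrambled rows (as a merge or join can leave them) make the outline cross itself.
scrambled <- square[c(1, 3, 2, 4), ]

# Sorting on `order` restores the intended vertex sequence.
restored <- scrambled[order(scrambled$order), ]

p_bad  <- ggplot(scrambled, aes(long, lat, group = group)) + geom_polygon()
p_good <- ggplot(restored,  aes(long, lat, group = group)) + geom_polygon()
```

Printing p_bad should show a bow-tie shape, while p_good should show the square.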

Related

Issue adding second variable to scatter plot in R

I've been set this question for an assignment, but I've never used R before. Any help is appreciated.
Many thanks.
Question:
Produce a scatter plot to compare CO2 emissions from Brazil and Argentina between 1950 and 2019....
I can get it for Brazil but cannot figure out how to add Argentina.
I think I have to do something with geom_point and filter?
df%>%
filter(Country=="Brazil", Year<=2019 & Year>=1950) %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(na.rm =TRUE, shape=20, size=2, colour="green") +
labs(x = "Year", y = "CO2 emissions (tonnes)")
The answer depends on what you're looking to do, but generally adding another dimension to a scatter plot where you already have clear x and y dimensions is done by applying an aesthetic (color, shape, etc) or via faceting.
In both approaches, you don't actually want to filter the data down to a single country. Instead, aesthetics or faceting "filter" for you by splitting and mapping the data according to the Country column in the dataset. If your dataset contains more countries than Argentina and Brazil, you will want to filter it to include only those two, so:
your_filtered_df <- your_df %>%
dplyr::filter(Country %in% c("Argentina", "Brazil"))
Faceting
Faceting is another way of saying you want to split up your one plot into two separate plots (one for Argentina, one for Brazil). Each plot will have the same aesthetics (look the same), but will have the appropriate "filtered" dataset.
In your case, you can try:
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(na.rm =TRUE, shape=20, size=2, colour="green") +
facet_wrap(~Country)
Aesthetics
Here, you have a lot of options. The idea is that you tell ggplot2 to map the appearance of individual points in the point geom to the value specified in your_filtered_df$Country. You do this by placing one of the aesthetic arguments for geom_point() inside of aes(). If you use shape=, for example it might look like this:
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(aes(shape=Country), na.rm =TRUE, size=2, colour="green")
This should produce a plot with a legend and two different point shapes, one for each country name. It's very important to remember that when you put an aesthetic like shape or color or size inside of aes(), you must not also have it outside. So, this will behave correctly:
geom_point(aes(colour=Country), ...)
But this will not:
geom_point(aes(colour=Country), colour="green", ...)
When the same aesthetic is also set outside aes(), the fixed value overrides the mapping, so the second one will still show all points as green.
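A quick way to check this yourself is sketched below. The data values are invented for illustration (only the column names follow the question), and layer_data() is used simply to inspect the colours ggplot2 computed for each point.

```r
library(ggplot2)

# Invented values for illustration only; the column names follow the question.
co2 <- data.frame(
  Country = rep(c("Argentina", "Brazil"), each = 3),
  Year = rep(c(1950, 1985, 2019), times = 2),
  CO2_annual_tonnes = c(1, 2, 3, 2, 4, 6)
)

base <- ggplot(co2, aes(x = Year, y = CO2_annual_tonnes))

# Mapped only: one colour per country, plus a legend.
p_mapped <- base + geom_point(aes(colour = Country))

# Mapped and fixed: the fixed value wins and every point is green.
p_fixed <- base + geom_point(aes(colour = Country), colour = "green")

# layer_data() returns the values ggplot2 computed for drawing each point.
mapped_colours <- unique(layer_data(p_mapped)$colour)
fixed_colours  <- unique(layer_data(p_fixed)$colour)
```

Here mapped_colours should hold two distinct colours, while fixed_colours should contain only "green".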
Don't Do this... but it works
OP posted a comment that indicated some additional hints from the professor, which was:
We were given the hint in the question "you can embed piped filter
functions within geom_point objects"
I believe they are referring to a final... very bad way of generating the points. This method would require you to have two geom_point() objects, and send each one a different filtered dataset. You can do this by accessing the data= argument within each geom_point() object. There are many problems with this approach, including the lack of a legend being generated, but if you simply must do it this way... here it is:
# painful to write this. it goes against all good practices with ggplot
your_filtered_df %>%
ggplot(aes(x = Year, y = CO2_annual_tonnes)) +
geom_point(data=your_filtered_df %>% dplyr::filter(Country=="Argentina"),
color="green", shape=20) +
geom_point(data=your_filtered_df %>% dplyr::filter(Country=="Brazil"),
color="red", shape=20)
You can probably see why this is not a good convention. Think about what you would do to represent 50 different countries... the approaches above would still work unchanged, but with this method you would have 50 individual geom_point() objects in your plot... ugh. Don't make a typo!

Identifying values in R Plot

I have been trying to identify extreme values in an R ggplot2 plot.
Is there any way to have a plot where besides the point (or instead of it) representing the values, it also shows the index? Or any other thing that allows you to quickly identify it?
The closest thing I found was with the identify() function, but it didn't work very well for me.
Any recommendations?
I'll give a basic ggplot plot:
df = data.frame(x = runif(10,0,1), y = runif(10,0,1))
ggplot(df, aes(x,y)) +
geom_point(col="red") + theme_bw()
Update:
I've been trying new things. I finally got exactly what I wanted.
df = data.frame(x = runif(10,0,1), y = runif(10,0,1))
ggplot(df, aes(x,y, label = rownames(df))) +
geom_point() + geom_text() + theme_bw()
Now I can easily identify the values that I want. Hope it helps other people that are new to ggplot.
If anyone knows ways to improve it, feel free to do so.
I'd suggest installing the plotly package and then running:
plotly::ggplotly(.Last.value)
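If only the extreme values need labels, the rownames approach from the update can be narrowed by passing a subset to geom_text(). This is a sketch under an assumed definition of "extreme" (y outside the 10%-90% quantile range); the id column and the threshold are illustrative choices, not part of the original post.

```r
library(ggplot2)

set.seed(1)  # only so the sketch is reproducible
df <- data.frame(x = runif(10, 0, 1), y = runif(10, 0, 1))
df$id <- rownames(df)

# Assumed definition of "extreme": y outside the 10%-90% quantile range.
limits <- quantile(df$y, c(0.1, 0.9))
extremes <- df[df$y < limits[1] | df$y > limits[2], ]

p <- ggplot(df, aes(x, y)) +
  geom_point() +
  # Labels are drawn only for the rows kept in `extremes`.
  geom_text(data = extremes, aes(label = id), vjust = -0.8) +
  theme_bw()
```

All points are still drawn; only the rows in extremes get a text label.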

ggplot2 heatmap plot not equally spread onto background theme

I want to make a simple heat map using ggplot, but the heat map I get is weird: it spreads unequally over the background, and I don't know how to fix this.
I have already tried the following code, which gave me a heat map, but with a weird result.
ggplot(data = vmd, aes(x = x, y = y)) +
geom_tile(aes(fill = val)) +
scale_fill_gradientn(colours = mycol)
Someone already said that you can add + coord_equal() to fix the ratio of the x and y axes.
If you mean the NA values, what I usually do in this case is clean the data before plotting it.
vmd_clean = vmd[complete.cases(vmd),]
You can also look into the subset(...) function.
There might be a solution built into ggplot2, but I don't know of one. If someone does, I'd love to learn about it.
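As a small illustration of the cleaning step, here is a sketch in base R. The data frame below is an invented stand-in for vmd; only the column names (x, y, val) follow the question.

```r
# Invented stand-in for `vmd`; only the column names follow the question.
vmd <- data.frame(
  x   = c(1, 2, 3, 4),
  y   = c(1, 1, 2, 2),
  val = c(0.5, NA, 1.5, 2.0)
)

# Keep only the rows that have no NA in any column.
vmd_clean <- vmd[complete.cases(vmd), ]

# Equivalent filter on a single column using subset().
vmd_subset <- subset(vmd, !is.na(val))
```

Both results drop the second row. ggplot2 fill scales also take an na.value argument (the first question above uses it in scale_fill_brewer), which controls how missing values are coloured if you would rather keep those rows.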

R geom_line not plotting as expected

I am using the following code to plot a stacked area graph and I get the expected plot.
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) + #ggplot initial parameters
geom_ribbon(position='fill', aes(ymin=0, ymax=1))
But when I add lines that read the same data source, I get misaligned results towards the right side of the graph
P + geom_line(position='fill', aes(group=model, ymax=1))
Does anyone know why this may be? Both plots are reading the same data source, so I can't figure out what the problem is.
Actually, if all you wanted to do was draw an outline around the areas, then you could do the same using the colour aesthetic.
ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(position='fill', aes(ymin=0, ymax=1), colour = "black")
I have an answer that I hope works for you. It looks good, but very different from your original graph:
library(ggplot2)
DATA2 <- read.csv("C:/Users/corcoranbarriosd/Downloads/porsche model volumes.csv", header = TRUE, stringsAsFactors = FALSE)
In my experience you want X to be a numeric variable, and you have it as a string. If that is not what you want, I can change it, but this will transform your bucket into a numeric vector:
bucket.list <- strsplit(unlist(DATA2$bucket), "[^0-9]+")
x <- numeric()
for (i in seq_along(bucket.list)) {
x[i] <- bucket.list[[i]][2]
}
DATA2$bucket <- as.numeric(x)
P <- ggplot(DATA2, aes(x=bucket,y=volume, group=model, fill=model,label=volume)) +
geom_ribbon(aes(ymin=0, ymax=volume))+ geom_line(aes(group=model, ymax=volume))
It gives me the area and the line tracking each other. Hope that's what you needed.
If you switch to using geom_path in place of geom_line, it all seems to work as expected. I don't think geom_line orders the data the same way geom_ribbon does (and I suspect that geom_line -- like geom_area -- assumes a zero base y value).
ggplot(DATA2, aes(x=bucket, y=volume, ymin=0, ymax=1,
group=model, fill=model, label=volume)) +
geom_ribbon(position='fill') +
geom_path(position='fill')
This should give you the expected plot.
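One documented difference between the two geoms is that geom_line() connects observations in order of x, while geom_path() connects them in the order they appear in the data. The sketch below uses invented data to show this; it may or may not be the whole explanation for the misalignment with position='fill' in the question.

```r
library(ggplot2)

# Invented data with x deliberately out of order.
d <- data.frame(x = c(3, 1, 2), y = c(1, 2, 3))

p_line <- ggplot(d, aes(x, y)) + geom_line()
p_path <- ggplot(d, aes(x, y)) + geom_path()

# geom_line() sorts by x before drawing; geom_path() keeps the row order.
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x
```

Comparing line_x with path_x shows the reordering that geom_line() applies.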

Adding legend (ggplot) doesn't work

I feel like I am asking a totally silly question, but I can't force ggplot to show the legend for line colours.
The thing is that I have two data frames with the same data; the first data.frame represents new data (plus additional numbers) and the second represents the old data. I am trying to compare new and old data, so to tell which is which I need to see the legend. I have tried to use scale_colour_manual, but the legend still doesn't appear.
I have read a number of answers to similar questions, and none of them worked or led to a better result. You can see a simple example of my problem below:
rm(list = ls())
library(ggplot2)
xnew<-3:10
y<-5:12
xold<-4:11
years<-2000:2007
xfact<-rep("x", times=8)
yfact<-rep("y", times=8)
Newdata<-data.frame(indicator=c(xfact,yfact),Years=c(years,years), data=c(xnew,y))
Olddata<-data.frame(indicator=xfact,Years=c(years), data=xold)
graph<-ggplot(mapping=aes(Years, data, group=1)) +
geom_line(data=Newdata[Newdata$indicator=="x",], size=1.5, colour="lightblue")+
geom_line(data=Olddata[Olddata$indicator=="x",], size=1.5, colour="orange")+
ggtitle("OLD vs NEW")+
scale_colour_manual(name="Legend", values=c("New"="lightblue", "Old"="orange"))
The result is without the legend.
Thanks for all the help I have already found on this website and thank you in advance for helping to solve this problem.
Legends are created in ggplot by mapping aesthetics to a single variable. Your mistake is that you're trying to set colors manually in each layer.
Newdata$type <- "New"
Olddata$type <- "Old"
all_data <- rbind(Newdata,Olddata)
ggplot(data = all_data[all_data$indicator == 'x',],aes(x = Years,y = data,colour = type)) +
geom_line() +
ggtitle("OLD vs NEW") +
scale_colour_manual(name="Legend", values=c("New"="lightblue", "Old"="orange"))
There are countless examples illustrating this basic technique in ggplot here.
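If combining the data frames is not convenient, a common alternative is to keep them separate and map colour to a constant string inside aes() in each layer, so that scale_colour_manual has something to build a legend from. This is a sketch of that swapped-in technique, not the answer's method; the data is cut down to the "x" series from the question.

```r
library(ggplot2)

years <- 2000:2007
Newdata <- data.frame(Years = years, data = 3:10)
Olddata <- data.frame(Years = years, data = 4:11)

p <- ggplot(mapping = aes(Years, data)) +
  # "New" and "Old" are mapped inside aes(), not set, so a legend is built.
  geom_line(data = Newdata, aes(colour = "New")) +
  geom_line(data = Olddata, aes(colour = "Old")) +
  ggtitle("OLD vs NEW") +
  scale_colour_manual(name = "Legend",
                      values = c("New" = "lightblue", "Old" = "orange"))

# Colours ggplot2 resolved for each layer via the manual scale.
new_colour <- unique(layer_data(p, 1)$colour)
old_colour <- unique(layer_data(p, 2)$colour)
```

The names in values must match the strings used inside aes(); otherwise the scale cannot assign those colours.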
