Hey, I'm relatively new to R and I have the following problem that I could not solve using the search function. I have an Excel file that I created with data from the World Bank. It's a simple sheet of GDP by year and country for 3 countries: Switzerland, Burkina Faso and the United States. The file converted to CSV looks like this:
year;Burkina Faso ;Switzerland;United States
1990;351.9793229;38332.15172;23954.47935
2000;226.4759814;37813.23426;36449.85512
2007;475.1100122;63223.46778;48061.53766
2008;569.7612784;72119.56087;48401.42734
2009;552.7455521;69672.00471;47001.55535
2010;575.4464527;74276.71842;48373.87882
2011;666.8402783;87998.44468;49790.66548
2012;673.8227;83164.38795;51450.1223
2013;699.0452847;84658.88768;52787.02695
2014;705.1464113;85814.58857;54598.55069
2015;615.592225;80989.84024;56207.03675
2016;649.7304837;78812.65069;57466.78711
I tried to plot it with ggplot2 the following way:
qplot(year, Switzerland, data = DATA_WORLD_CSV, xlab= "Year", geom = c("point", "smooth"))
but I always get an error message and I don't know why. Also, does anyone have an idea how to get those 3 countries into one plot?
Thanks in advance
I'm guessing the error may be because you're trying to plot Burkina Faso and the United States by doing something like this:
qplot(year, Burkina Faso, data = DATA_WORLD_CSV, xlab = "Year", geom = c("point", "smooth"))
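This will fail because of the space in the column name (the same applies to "United States"). When the file is read with read.csv(), R makes the column names syntactically valid by replacing the spaces with periods (check.names = TRUE is the default); you can see the actual names with names(DATA_WORLD_CSV). So, instead, try: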
qplot(year, Burkina.Faso, data = DATA_WORLD_CSV, xlab = "Year", geom = c("point", "smooth"))
To plot multiple lines on one graph, use ggplot() instead of qplot(). See for instance: Plotting two variables as lines using ggplot2 on the same graph
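As a minimal sketch of the "one plot" part: the file is semicolon-separated, so it has to be read with sep = ";", and the three country columns can then be stacked into a long format with one colour per country. The object names gdp_wide and gdp_long are made up for this example, and only three rows of the data from the question are used.

```r
# A few rows of the semicolon-separated data from the question
csv_text <- "year;Burkina Faso ;Switzerland;United States
1990;351.9793229;38332.15172;23954.47935
2000;226.4759814;37813.23426;36449.85512
2016;649.7304837;78812.65069;57466.78711"

# sep = ";" because the fields are separated by semicolons;
# check.names = FALSE keeps the spaces in the country names
gdp_wide <- read.csv(text = csv_text, sep = ";", check.names = FALSE)
names(gdp_wide) <- trimws(names(gdp_wide))  # drop stray spaces around names

# Stack the country columns: one row per year and country
gdp_long <- data.frame(
  year    = rep(gdp_wide$year, times = ncol(gdp_wide) - 1),
  country = rep(names(gdp_wide)[-1], each = nrow(gdp_wide)),
  gdp     = unlist(gdp_wide[-1], use.names = FALSE)
)

# Draw all countries in one plot (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(gdp_long, aes(x = year, y = gdp, colour = country)) +
    geom_point() +
    geom_line() +
    labs(x = "Year", y = "GDP")
  print(p)
}
```

With the real file you would replace text = csv_text with the path to your CSV file.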
Related
I am trying to make a plot of GDP vs CO2 emissions globally. Two countries have values that are a lot larger than the rest of the data, so I am trying to separate them with facet_wrap so that I have one graph of the two outlier countries and one graph with the rest of the data.
My code thus far is
ggplot(CO2_GDP, aes(x= GDP, y=value)) +
geom_point(size=1)+
labs(title = "GDP and CO2 Emissions", y= "CO2 Emissions in Tons", x= "GDP in Billions of USD") +
facet_wrap(~country_name==c("China", "United States"))
This gives me one graph with all of the countries including China and the United States and another graph of just China and United States. I need to find a way to remove China and United States from the first graph but have just that data on the second graph.
I thought that adding the comma between China and United States in the last line would remove them from the first graph and show them only on the second, but that's not the case: as you can see in this image, the data on the "TRUE" graph is still on the "FALSE" graph, and it's not supposed to be.
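One likely cause, shown here as a sketch on made-up toy data (the numbers below are not real GDP or CO2 values): == compares the column with the two-element vector element by element and recycles the vector, so a row only matches if the country happens to line up with its position. %in% tests each row against the whole set. The names co2_gdp_toy, recycled and is_outlier are invented for this example.

```r
# Toy data, values invented purely for illustration
co2_gdp_toy <- data.frame(
  country_name = c("China", "United States", "United States",
                   "China", "India", "Brazil"),
  GDP   = c(10, 20, 21, 11, 1, 2),
  value = c(100, 200, 210, 110, 10, 20),
  stringsAsFactors = FALSE
)

# == recycles c("China", "United States") along the rows:
# row 3 (United States) is compared with "China", row 4 (China) with
# "United States", so both are FALSE
recycled <- co2_gdp_toy$country_name == c("China", "United States")

# %in% checks every row against both names
co2_gdp_toy$is_outlier <- co2_gdp_toy$country_name %in% c("China", "United States")

# Facet on the logical column (only if ggplot2 is installed)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(co2_gdp_toy, aes(x = GDP, y = value)) +
    geom_point(size = 1) +
    facet_wrap(~ is_outlier, scales = "free")
  print(p)
}
```

scales = "free" is optional; it lets each panel use its own axis range, which is usually the point of separating the outliers.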
I have plotted a figure of the US states in R.
Here is the very simple code:
library(usmap)
library(ggplot2)
plot_usmap(region = 'states')
And here is the resulting figure:
Figure of US states in R - states are not colored
Furthermore, I have a CSV file containing the names of the US states and a color value, which is red if the state voted for the Republicans or blue if it voted for the Democrats. Here are the first few rows of the CSV file:
State     Color
Alabama   #E81B23
Alaska    #E81B23
Arizona   #1405bd
Arkansas  #E81B23
How can I fill the states of my figure based on the colors in the CSV file?
To color the regions specified in the plot_usmap() function, you can provide your data via data= and then set the values= argument to the column in your data used for mapping the colors.
Here's an example with some randomly-generated data. The plot_usmap() is using a dataset that includes the 50 US states + the District of Columbia, so you'll want to make sure they are all in your dataset or you may get some NA labels.
library(usmap)
library(ggplot2)
set.seed(1234)
color_data <- data.frame(
state = c(state.name, "District of Columbia"),
the_colors = sample(c("A", "B"), size=51, replace=TRUE)
)
plot_usmap(
region = "states",
data = color_data,
values = "the_colors",
color="white"
) +
scale_fill_manual(values=c("#E81B23", "#1405bd"))
Note that I think the lines between the states look good in white, so color="white" fixes that. You may also notice that you typically don't specify the actual color in the dataframe - you can specify that via scale_fill_manual(values=...). In your case, you can use scale_fill_identity().
For your data, just make sure the "State" column in your dataset is renamed "state" and it should work.
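A rough sketch of the scale_fill_identity() variant, using only the four rows shown in the question (the data frame name vote_colors is made up here). It assumes the hex codes sit in a column called Color; states that are missing from the data have no colour value assigned.

```r
# The four rows shown in the question
vote_colors <- data.frame(
  State = c("Alabama", "Alaska", "Arizona", "Arkansas"),
  Color = c("#E81B23", "#E81B23", "#1405bd", "#E81B23"),
  stringsAsFactors = FALSE
)

# plot_usmap() matches on a column called "state"
names(vote_colors)[names(vote_colors) == "State"] <- "state"

# Draw the map only if both packages are installed
if (requireNamespace("usmap", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE)) {
  p <- usmap::plot_usmap(
    regions = "states",
    data = vote_colors,
    values = "Color",   # column holding the hex codes
    color = "white"     # outline colour between states
  ) +
    ggplot2::scale_fill_identity()  # use the hex codes as they are
  print(p)
}
```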
I am trying to plot a data frame like this:
code  name             description  estimate
0     Australia        Vegetables   854658
0     Australia        Fruit        667541
1     New South Wales  Vegetables   45751
1     New South Wales  Fruit        77852
2     Victoria         Vegetables   66211
2     Victoria         Fruit        66211
.
.
.
For each region in Australia there are multiple rows with different descriptions. What packages can I use to plot a map of the estimate without coordinates?
I tried ggplot and ozmaps with sf, as mentioned in the ggplot2 tutorial, and I filtered the data frame for fruit only, but I get this error message:
stat_sf requires the following missing aesthetics: geometry
The code I tried:
ggplot() +
geom_sf(oz_states,mapping=aes())+
geom_sf(df,mapping=aes()) +
coord_sf()
The methods I found all require longitude and latitude to plot the data on a map. I tried ggmap and geom_polygon but didn't figure out the correct way to do it. Is there a way to plot a map with only region labels?
This is what I plotted with Tableau, and it is the plot I expect to get with R as well:
So essentially, your first problem is that you're calling the wrong object from the ozmaps package. It's ozmap_states, whereas you called yours oz_states.
I came up with this solution, which I think does what you want and takes it a step further.
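# Sample data mirroring the question: one row per region and description
df <- data.frame(
  code = rep(c(0, 1, 2), 2),
  name = rep(c("Australia", "New South Wales", "Victoria"), 2),
  # each = 3 so that the first three counts are Vegetables and the last three Fruit
  description = rep(c("Vegetables", "Fruit"), each = 3),
  count = c(854658, 45751, 66211, 667541, 77852, 66211)
)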
library(tidyverse)
library(sf)
library(ozmaps)
library(leaflet)
library(tmap)
states_full <- right_join(df, ozmap_states, by = c("name" = "NAME"))
data <- states_full %>%
filter(description == "Fruit") %>%
select(name, geometry, count)
ozmap1 = tm_shape(ozmap_states) +tm_polygons()
tmap_mode("view")
ozmap1 + tm_shape(st_as_sf(data)) + tm_fill(col = "count")
Basically, instead of using the sample data frame that I created from your data, you would just use your own data in the right join. You can also choose whether you want fruit or vegetables in the filter() call.
The tmap package is a mapping package that can make interactive, Leaflet-like maps.
You can look at some tutorials here: https://geocompr.robinlovelace.net/adv-map.html
End solution looks something like this.
Note: This solution uses lng/lat, but it pulls it directly from the shape file for oz state maps in the ozmaps package, therefore fulfilling the need of the question.
When you add in more data, more of Australia will be colored in depending on their count.
So I'm plotting a shape file (from the ONS) of Great Britain split into 11 regions with the hope of creating a choropleth map based on COVID-19 cases.
I join the covid data with the shape file so that I can work within 1 data frame, joining on the region name.
I've used the longitude and latitude fields of the shape file for the x and y values within the aesthetics.
covid <- data.frame(Name = c("Scotland","Eastern","West Midlands","Yorkshire and the Humber","East Midlands","London","South West","South East","North West","North East","Wales"),
Cases = c(20,50,45,30,25,75,100,5,60,35,80))
#'greatb' is the name of the shape file
join <- merge(greatb,covid,by=c("NAME","Name"),by.x=c("NAME"),by.y=c("Name"), all=TRUE)
ggplot()+
geom_polygon(data=join, aes(x=long, y=lat, group=group, fill=Cases))
However, it seems that once I do this I can't use a variable name to fill the regions of the map. I get confronted with the error message: object 'Cases' not found
I'm unsure why I get this message, though, as 'covid$data' is clearly an object and therefore so is 'join$data'. Can anyone help me with this?
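One thing that may be worth checking, sketched here with plain data frames because the class of 'greatb' isn't shown in the question: whether the merged object really has a Cases column. merge() needs either by or the by.x/by.y pair, not both. If 'greatb' is a spatial object rather than a data frame, ggplot may convert it to plain long/lat rows and drop the attribute columns, which could explain the error; that part is a guess. The object shape_attrs below is a hypothetical stand-in for the attribute table of the shape file.

```r
# Hypothetical stand-in for the shape file's attribute table
shape_attrs <- data.frame(
  NAME = c("London", "Wales", "Scotland"),
  id   = 1:3,
  stringsAsFactors = FALSE
)

# Three rows of the covid data from the question
covid <- data.frame(
  Name  = c("Scotland", "London", "Wales"),
  Cases = c(20, 75, 80),
  stringsAsFactors = FALSE
)

# by.x / by.y name the key column on each side; no extra by = needed
joined <- merge(shape_attrs, covid, by.x = "NAME", by.y = "Name", all.x = TRUE)

# The fill aesthetic can only find Cases if it is a column of the plotted data
has_cases <- "Cases" %in% names(joined)
print(names(joined))
print(has_cases)
```

If the same check on your own 'join' object comes back FALSE, or 'join' is not a data frame, the problem is in the join step rather than in ggplot.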
I have a dataset of houses in a city. Each house is in one region. You can get the dataset here, and below is a graph of the city with its regions.
raw_csv = read.csv("melb_data.csv")
ggplot(raw_csv, aes(Lattitude, Longtitude)) + geom_point(aes(color = Regionname))
When I use stat_density_2d it works OK. Here is a picture of the result.
ggplot(raw_csv, aes(Lattitude, Longtitude)) + stat_density_2d()
But the problem appears when I group stat_density_2d by region: it does not work properly. I want the density of each region separately (something like this, but it doesn't work).
Here is the weird result of grouping it.
ggplot(raw_csv, aes(Lattitude, Longtitude)) + stat_density_2d(aes(group = Regionname))
What am I doing wrong?
UPDATE:
It is very strange, but when I excluded the region "Western Victoria" from the map, the others were OK. I still don't understand what the problem is here.
As I'm not familiar with stat_density_2d, I can't tell you what's going wrong with the grouping. However, as a workaround you could split your data frame by region and add a density layer for each region separately. Here I use lapply to loop over the split data frames:
library(ggplot2)
split_csv <- split(raw_csv, raw_csv$Regionname)
ggplot(mapping = aes(Lattitude, Longtitude, color = Regionname)) +
lapply(split_csv, function(x) stat_density_2d(data = x))