I am trying to make a plot of GDP vs CO2 emissions globally. I have found that I have two countries that have data that is a lot larger than the rest of the data so I am trying to separate it with facet_wrap so I have one graph of the two outlier countries and one graph with the rest of the data.
My code thus far is
ggplot(CO2_GDP, aes(x= GDP, y=value)) +
geom_point(size=1)+
labs(title = "GDP and CO2 Emissions", y= "CO2 Emissions in Tons", x= "GDP in Billions of USD") +
facet_wrap(~country_name==c("China", "United States"))
This gives me one graph with all of the countries including China and the United States and another graph of just China and United States. I need to find a way to remove China and United States from the first graph but have just that data on the second graph.
I thought that adding the comma between China and United States in the last line would remove them from the first graph and show them only on the second, but that's not the case. As you can see in this image, the data on the "True" graph is still on the "False" graph, and it's not supposed to be.
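One likely cause is that == recycles the two-element vector along the rows (row 1 is compared with "China", row 2 with "United States", row 3 with "China" again, and so on), so only rows that happen to line up are flagged. A minimal sketch using %in% instead follows; the small CO2_GDP data frame, its numbers, and the group column are made up purely for illustration, and scales = "free" is an optional choice so each panel gets its own axis range.

```r
library(ggplot2)

# Made-up stand-in for CO2_GDP (illustrative numbers only, not real data)
CO2_GDP <- data.frame(
  country_name = c("China", "China", "United States", "United States",
                   "France", "France", "Brazil", "Brazil"),
  GDP   = c(9000, 12000, 16000, 19000, 2500, 2700, 1800, 2000),
  value = c(9500, 10000, 5200, 5000, 330, 320, 450, 470)
)

outliers <- c("China", "United States")

# == recycles `outliers` row by row; %in% tests membership for every row
n_flagged_eq <- sum(CO2_GDP$country_name == outliers)
n_flagged_in <- sum(CO2_GDP$country_name %in% outliers)

# Hypothetical helper column used only for faceting
CO2_GDP$group <- ifelse(CO2_GDP$country_name %in% outliers,
                        "China and United States", "Other countries")

p <- ggplot(CO2_GDP, aes(x = GDP, y = value)) +
  geom_point(size = 1) +
  labs(title = "GDP and CO2 Emissions",
       y = "CO2 Emissions in Tons", x = "GDP in Billions of USD") +
  facet_wrap(~group, scales = "free")
print(p)
```

With the made-up rows above, == flags only some of the China and United States rows, while %in% flags all of them, so each country ends up in exactly one panel.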
I have a Count for each Site (which corresponds with a country), and each Site belongs to a Region. The data looks like this:
> summary_data
Site Count Region
1 Chad 5 Africa
2 Angola 1 Africa
3 France 10 Europe
4 USA 6 Americas
5 Bolivia 3 Americas
6 Chile 4 Americas
I would like to generate a bar graph that:
Has a bar per country
The bars for a region are all next to each other in the bar graph
Per region, the bars appear in descending order
The bars are all the same width, but the heights are all on the same scale
Can be generalized (in particular: arbitrary regions, arbitrary countries per region)
I do not want to use fill color to represent the region (I want to use color to represent another characteristic eventually)
I want to have some visual representation to group the columns. For instance, having a gray background behind all the columns for the Americas region, a blue background behind all the columns for the Africa region, etc). I actually would be open to other approaches (perhaps a line at the top spanning all of Africa with "Africa" as a label or something).
Obviously each region can have a different number of country sites, and no country site spans two regions (I tried using facets but quickly realized that was not the right route). I also tried looping through all the regions to generate separate graphs per region and then put them together but that didn't quite seem the right approach either.
I have generated a graph like this (Closest I have gotten):
Using this code:
library("dplyr")
library(ggplot2)
sorted <- arrange(summary_data,Region,-Count)
sorted$Site <- factor(sorted$Site, levels = sorted$Site)
bar = ggplot(sorted,
aes(
x = Site,
y = Count,
fill = Region
)) +
geom_col()
print(bar)
But this does not meet the last two requirements I set above (I specifically do not want to use fill to represent region). I started down the path of geom_rect() but did not understand the coordinate system for discrete x values rather than continuous (I did find Stackoverflow questions / answers on continuous but didn't see how to translate to this). I think having shaded rectangles behind the columns is probably the best approach, but I would appreciate any input in general approach as well as how to pull it off.
You could consider defining a separate panel for each region using facet_grid. If you want the bars to all be the same color, just remove the aes(fill = Site) argument inside geom_bar.
The argument space = "free_x" ensures that the bars all have the same width, and with scales = "free" only the axis values belonging to each region are shown.
ggplot(sorted, aes(x = Site, y = Count)) +
geom_bar(position = "dodge", stat = "identity", aes(fill = Site)) +
facet_grid(. ~ Region, scales = "free", space = "free_x")
Suppose we have a set of commodities (apples, bananas, potatoes etc.) distributed over different continents. We visualize their distribution across continents with faceted bar charts in the ggplot2 package, and these commodities (the "stuff" field in what follows) act as the factor displayed on the x axis. Each continent has its own set of stuff, as shown in the data, although certain commodities (bananas) can be common to two or more continents. Here is an example of the data in wide format. The fields "average" and "giant" additionally subdivide the market into average and giant sizes (to be plotted in different colours).
data<-read.csv(text="continent,stuff,average,giant
North America,apples,20,30
North America,bananas,25,32
Europe,bananas,15,25
Europe,potatoes,10,20
Europe,mosquitoes,13,17
Asia,snakes,26,35
Asia,snails,7,15
Asia,pandas,10,20")
First we reduce the data to long format, and next plot it via geom_col() and faceting technique:
library(dplyr)
library(tidyr)
library(ggplot2)
data.tidied<-data %>%
gather(key=size, value=val,-continent,-stuff)
ggplot(data.tidied,aes(x=stuff,y=val,fill=size))+
geom_col(position="dodge")+
facet_grid(~continent)+coord_flip()
All levels of stuff are aligned across all continents, although most of them are not needed in a given facet, so there are many gaps. We don't need any snails in North America or Europe; it is natural to have this level only in the Asia facet, and so on. (To make things clearer, you may think of apples/bananas/potatoes as geographical localities unique to a continent: we do not have any California in Europe.) So: how can this situation be displayed while still using ggplot's faceting (or any alternative)? That is: how to draw a unique set of factor levels for each facet?
You can use facet_wrap instead of facet_grid and specify scales = "free_y" (has to be free_y as you flipped the axes). But it makes the charts look a little odd, in my opinion.
data %>%
gather(size, val, -continent, -stuff) %>%
ggplot(aes(stuff, val)) +
geom_col(aes(fill = size), position = "dodge") +
facet_wrap(~continent, scales = "free_y") +
coord_flip()
Hey, I'm relatively new to R and I have the following problem that I could not solve using the search function. I have this Excel file I created with data from the World Bank. It's a simple sheet of GDP by year and country for 3 countries: Switzerland, Burkina Faso and the United States. The file converted to CSV looks like this
year;Burkina Faso ;Switzerland;United States
1990;351.9793229;38332.15172;23954.47935
2000;226.4759814;37813.23426;36449.85512
2007;475.1100122;63223.46778;48061.53766
2008;569.7612784;72119.56087;48401.42734
2009;552.7455521;69672.00471;47001.55535
2010;575.4464527;74276.71842;48373.87882
2011;666.8402783;87998.44468;49790.66548
2012;673.8227;83164.38795;51450.1223
2013;699.0452847;84658.88768;52787.02695
2014;705.1464113;85814.58857;54598.55069
2015;615.592225;80989.84024;56207.03675
2016;649.7304837;78812.65069;57466.78711
I tried to plot it with ggplot2 the following way:
qplot(year, Switzerland, data = DATA_WORLD_CSV, xlab= "Year", geom = c("point", "smooth"))
but I always get an error message and I don't know why. Also does anyone have an idea how to get those 3 countries into one plot.
Thanks in advance
I'm guessing the error may be because you're trying to plot Burkina Faso and the United States by doing something like this:
qplot(year, Burkina Faso, data = DATA_WORLD_CSV, xlab = "Year", geom = c("point", "smooth"))
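This will fail because of the space in the column name, and the same goes for "United States": a bare name containing a space is not valid R syntax. If the file was read with read.csv(), the column names will already have been converted by replacing the spaces with periods (this is done by read.csv's default check.names = TRUE, not by ggplot2). You can confirm the exact names with names(DATA_WORLD_CSV). So, instead, try: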
qplot(year, Burkina.Faso, data = DATA_WORLD_CSV, xlab = "Year", geom = c("point", "smooth"))
To plot multiple lines on one graph, use ggplot() instead of qplot(). See for instance: Plotting two variables as lines using ggplot2 on the same graph
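As a rough sketch of the ggplot() route for all three countries, assuming the file really is semicolon-separated with "." as the decimal mark as shown in the question: the names csv_text, gdp, gdp_long, country and gdp_value are made up for illustration, and only the first three rows of the data are repeated here.

```r
library(tidyr)
library(ggplot2)

# First rows of the CSV from the question, inlined so the sketch is self-contained
csv_text <- "year;Burkina Faso ;Switzerland;United States
1990;351.9793229;38332.15172;23954.47935
2000;226.4759814;37813.23426;36449.85512
2007;475.1100122;63223.46778;48061.53766"

# sep = ";" because the file is semicolon-separated;
# check.names = FALSE keeps the spaces in the country names
gdp <- read.csv(text = csv_text, sep = ";", check.names = FALSE)
names(gdp) <- trimws(names(gdp))  # guard against stray whitespace in the header

# One row per year/country pair, so country can be mapped to colour
gdp_long <- pivot_longer(gdp, cols = -year,
                         names_to = "country", values_to = "gdp_value")

p <- ggplot(gdp_long, aes(x = year, y = gdp_value, colour = country)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "GDP")
print(p)
```

With a real file, read.csv("your_file.csv", sep = ";", check.names = FALSE) would replace the text = argument (the file name here is a placeholder).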
I'm sure this has been done many times, but clearly I'm not searching using the correct terms.
I have some time series data in R with columns like this:
country year deaths region global.region
1 Afghanistan 2006 0.095830775 Asia & Pacific Global South
2 Afghanistan 1994 0.127597064 Asia & Pacific Global South
3 Algeria 2000 0.003278038 Arab States Global South
4 Algeria 2001 0.003230578 Arab States Global South
5 Algeria 1998 0.006746176 Arab States Global South
6 Algeria 1999 0.019952364 Arab States Global South
...
Basically, I want to plot all the lines by country, but I want them colored (and labeled in the legend) by region. I'm hoping to look at some regional trends in the data without trying build an average model (partly because I want to see outliers, partly because a lot of the countries have missing data and I think a good regional model might be difficult for me to make at this point, at best just misleading).
So in the end I'll have, for example, separate lines for Burkina Faso, Algeria, and Cote d'Ivoire plotted, but they'll all be orange. And I'll have separate lines for Afghanistan, Pakistan, and Iran, but they'll all be blue.
It is preferable that it's done with ggplot2 since that's the plotting library I am learning at the moment. But maybe there's a standard way of doing this in R that works across all (most) plot libraries?
Edit: Final solution: Group aesthetic. (Thanks @baptiste)
qplot(data=df, x=year, y=deaths, color=region, group=country) +
geom_line() +
xlab('Year') + ylab('Deaths per 100,000') + ggtitle('Deaths per 100,000 by country (WHO)')
Which makes:
Slightly different than your desired result, but here it goes..
ggplot(df, aes(x = year, y = deaths)) +
geom_line(aes(color = country, linetype = region))
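For comparison, the group aesthetic from the question's edit can also be written with ggplot() rather than qplot(). This is only a sketch: df here is rebuilt from the six rows shown in the question, so the resulting plot is not representative of the full data.

```r
library(ggplot2)

# The six rows shown in the question (the real data has many more)
df <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Algeria", "Algeria", "Algeria", "Algeria"),
  year    = c(2006, 1994, 2000, 2001, 1998, 1999),
  deaths  = c(0.095830775, 0.127597064, 0.003278038,
              0.003230578, 0.006746176, 0.019952364),
  region  = c("Asia & Pacific", "Asia & Pacific", "Arab States",
              "Arab States", "Arab States", "Arab States")
)

# group = country draws one line per country;
# colour = region colours the lines (and builds the legend) by region
p <- ggplot(df, aes(x = year, y = deaths, colour = region, group = country)) +
  geom_line() +
  labs(x = "Year", y = "Deaths per 100,000",
       title = "Deaths per 100,000 by country (WHO)")
print(p)
```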
I have a data frame containing order data for each of 20+ products from each of 20+ countries. I have put it in a highlight table using ggplot2 with code similar to this:
require(ggplot2)
require(reshape)
require(scales)
mydf <- data.frame(industry = c('all industries','steel','cars'),
'all regions' = c(250,150,100), americas = c(150,90,60),
europe = c(150,60,40), check.names = FALSE)
mydf
mymelt <- melt(mydf, id.var = c('industry'))
mymelt
ggplot(mymelt, aes(x = industry, y = variable, fill = value)) +
geom_tile() + geom_text(aes(label = value))
Which produces a plot like this:
In the real plot, the 450 cell table very nicely shows the 'hotspots' where orders are concentrated. The last refinement I want to implement is to arrange the items on both the x-axis and y-axis in alphabetical order. So in the plot above, the y-axis (variable) would be ordered as all regions, americas, then europe and the x-axis (industry) would be ordered all industries, cars and steel. In fact the x-axis is already ordered alphabetically, but I wouldn't know how to achieve that if it were not already the case.
I feel somewhat embarrassed about having to ask this question as I know there are many similar on SO, but sorting and ordering in R remains my personal bugbear and I cannot get this to work. Although I do try, in all except the simplest cases I got lost in a welter of calls to factor, levels, sort, order and with.
Q. How can I arrange the above highlight table so that both y-axis and x-axis are ordered alphabetically?
EDIT: The answers from smillig and joran below do resolve the question with the test data but with the real data the problem remains: I can't get an alphabetical sort. This leaves me scratching my head as the basic structure of the data frame looks the same. Clearly I have omitted something, but what??
> str(mymelt)
'data.frame': 340 obs. of 3 variables:
$ Industry: chr "Animal and vegetable products" "Food and beverages" "Chemicals" "Plastic and rubber goods" ...
$ variable: Factor w/ 17 levels "Other areas",..: 17 17 17 17 17 17 17 17 17 17 ...
$ value : num 0.000904 0.000515 0.007189 0.007721 0.000274 ...
However, applying the with statement doesn't result in levels with an alphabetical sort.
> with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))
[1] USA USA USA
[4] USA USA USA
[7] USA USA USA
[10] USA USA USA
[13] USA USA USA
[16] USA USA USA
[19] USA USA Canada
[22] Canada Canada Canada
[25] Canada Canada Canada
[28] Canada Canada Canada
All the way down to:
[334] Other areas Other areas Other areas
[337] Other areas Other areas Other areas
[340] Other areas
And if you do a levels() it seems to show the same thing:
[1] "Other areas" "Oceania" "Africa"
[4] "Other Non-Eurozone" "UK" "Other Eurozone"
[7] "Holland" "Germany" "Other Asia"
[10] "Middle East" "ASEAN-5" "Singapore"
[13] "HK/China" "Japan" "South Central America"
[16] "Canada" "USA"
That is, the non-reversed version of the above.
The following shot shows what the plot of the real data looks like. As you can see, the x-axis is sorted and the y-axis is not. I'm perplexed. I'm missing something but can't see what it is.
The y-axis on your chart is also already ordered alphabetically, but from the origin. I think you can achieve the order of the axes that you want by using xlim and ylim. For example:
ggplot(mymelt, aes(x = industry, y = variable, fill = value)) +
geom_tile() + geom_text(aes(label = value)) +
ylim(rev(levels(mymelt$variable))) + xlim(levels(mymelt$industry))
will order the y-axis from all regions at the top, followed by americas, and then europe at the bottom (which is reverse alphabetical order, technically). The x-axis is alphabetically ordered from all industries to steel with cars in between.
As smillig says, the default is already to order the axes alphabetically, but the y axis will be ordered from the lower left corner up.
The basic rule with ggplot2 that applies to almost anything that you want in a specific order is:
If you want something to appear in a particular order, you must make the corresponding variable a factor, with the levels sorted in your desired order.
In this case, all you should need to do is this:
mymelt$variable <- with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))
which should work regardless of whether you're running R with stringsAsFactors = TRUE or FALSE.
This principle applies to ordering axis labels, ordering bars, ordering segments within bars, ordering facets, etc.
For continuous variables there is a convenient scale_*_reverse() but apparently not for discrete variables, which would be a nice addition, I think.
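In more recent ggplot2 releases the limits argument of a discrete scale can, as far as I know, be given a function (I believe this arrived around version 3.3.0, so treat that as an assumption), which makes reversing a discrete axis a one-liner. A sketch with the test data from the question, typed out in long form:

```r
library(ggplot2)

# Long-format version of the question's test data
mymelt <- data.frame(
  industry = rep(c("all industries", "steel", "cars"), times = 3),
  variable = rep(c("all regions", "americas", "europe"), each = 3),
  value    = c(250, 150, 100, 150, 90, 60, 150, 60, 40)
)

# limits = rev reverses the default (alphabetical) order of the y axis,
# so "all regions" ends up at the top instead of at the origin
p <- ggplot(mymelt, aes(x = industry, y = variable, fill = value)) +
  geom_tile() +
  geom_text(aes(label = value)) +
  scale_y_discrete(limits = rev)
print(p)
```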
Another possibility is to use fct_reorder from the forcats package. Note that this orders the y-axis levels by their value rather than alphabetically.
library(dplyr)
library(tidyr)
library(forcats)
library(ggplot2)
mydf %>%
pivot_longer(cols=c('all regions', 'americas', 'europe')) %>%
mutate(name1=fct_reorder(name, value, .desc=FALSE)) %>%
ggplot( aes(x = industry, y = name1, fill = value)) +
geom_tile() + geom_text(aes( label = value))
Maybe a little bit late, but:
with(mymelt,factor(variable,levels = rev(sort(unique(variable)))))
This call doesn't sort alphabetically, because "variable" is already a factor, and sort() on a factor follows the existing order of its levels rather than the alphabetical order of its labels.
You should first convert the variable to character with the as.character function, like so:
with(mymelt,factor(variable,levels = rev(sort(unique(as.character(variable))))))
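A small base R illustration of the difference, using a made-up three-level factor whose levels are deliberately stored in non-alphabetical order (the names by_level, by_name and relevelled are just for this sketch):

```r
# Toy factor with levels stored in non-alphabetical order
variable <- factor(c("USA", "Canada", "Other areas"),
                   levels = c("Other areas", "Canada", "USA"))

# sort() on the factor follows the stored level order, not the alphabet
by_level <- rev(sort(unique(variable)))

# Converting to character first gives a genuinely alphabetical sort
by_name <- rev(sort(unique(as.character(variable))))

# Rebuild the factor with reverse-alphabetical levels
relevelled <- factor(variable, levels = by_name)

print(as.character(by_level))  # "USA" "Canada" "Other areas"
print(levels(relevelled))      # "USA" "Other areas" "Canada"
```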
maybe this StackOverflow question can help:
Order data inside a geom_tile
specifically the first answer by Brandon Bertelsen:
"Note it's not an ordered factor, it's a factor in the right order"
It helped me to get the right order of the y-axis in a ggplot2 geom_tile plot.