Good evening, can anybody help with plotting? I have some data like this:
DF <- data.frame(
country_dest=c("Russia", "Germany", "Ukraine" ,"Kazakhstan",
"United States", "Italy", "Israel", "Belarus"),
SumTotal=c(7076562,2509617,1032325,680137,540630,359030,229186,217623)
)
It is not a big deal to plot it with 8 separate bars, but I am wondering whether it is possible to make a plot with 3 bars, where the first bar holds the data for Russia (for example), the second is a stacked bar of Germany, Ukraine, Kazakhstan, the US and Italy (maybe with a legend to show who is who), and the third is a stacked bar of Belarus and Israel.
On the internet I found a solution that creates a new DF with 0 values, but I didn't quite understand it.
Thank you in advance!
Well, you will need to add grouping information to your data. Then it becomes straightforward. Here's one strategy:
#define groups
grp <-c("Russia"=1, "Germany"=2, "Ukraine"=2,"Kazakhstan"=2,
"United States"=2, "Italy"=2, "Israel"=3, "Belarus"=3)
DF$grp<-grp[as.character(DF$country_dest)]
#ensure plotting order
DF$country_dest <- factor(DF$country_dest, levels=DF$country_dest)
#draw plot
library(ggplot2)
ggplot(DF, aes(x=grp, y=SumTotal, fill=country_dest)) +
geom_bar(stat="identity")
This will give you three bars: Russia on its own, then two stacked bars coloured by country, with a legend.
You might wish to give your groups a more descriptive label.
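As a rough sketch of that last suggestion, the numeric group codes can be turned into a labelled factor. The label strings here are made up for illustration; everything else reuses the data and the grp lookup from above.

```r
DF <- data.frame(
  country_dest = c("Russia", "Germany", "Ukraine", "Kazakhstan",
                   "United States", "Italy", "Israel", "Belarus"),
  SumTotal = c(7076562, 2509617, 1032325, 680137, 540630, 359030, 229186, 217623)
)
grp <- c("Russia" = 1, "Germany" = 2, "Ukraine" = 2, "Kazakhstan" = 2,
         "United States" = 2, "Italy" = 2, "Israel" = 3, "Belarus" = 3)

# Hypothetical label text: replace with whatever describes your groups
DF$grp <- factor(grp[as.character(DF$country_dest)],
                 levels = 1:3,
                 labels = c("Russia", "Germany to Italy", "Israel and Belarus"))

# Total per group, i.e. the height each stacked bar will reach
totals <- tapply(DF$SumTotal, DF$grp, sum)
totals
```

With grp stored as a factor, the same ggplot() call should print these labels on the x axis instead of 1, 2, 3.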
I am trying to make a plot of GDP vs CO2 emissions globally. Two countries have values that are much larger than the rest of the data, so I am trying to separate them with facet_wrap, giving one graph for the two outlier countries and one graph for the rest of the data.
My code thus far is
ggplot(CO2_GDP, aes(x= GDP, y=value)) +
geom_point(size=1)+
labs(title = "GDP and CO2 Emissions", y= "CO2 Emissions in Tons", x= "GDP in Billions of USD") +
facet_wrap(~country_name==c("China", "United States"))
This gives me one graph with all of the countries, including China and the United States, and another graph with just China and the United States. I need to find a way to remove China and the United States from the first graph and show that data only on the second graph.
I thought that putting the comma between China and United States in the last line would remove them from the first graph and show them only on the second, but that's not the case: as you can see in this image, the data on the "TRUE" graph is still on the "FALSE" graph, where it's not supposed to be.
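One likely cause, sketched here on made-up values rather than the real CO2_GDP data: == compares element by element and recycles the two-element vector along the column, so a row only matches when the country happens to line up with its position. %in% tests set membership instead.

```r
# Hypothetical stand-in for the country_name column
country_name <- c("United States", "China", "India", "China")
outliers <- c("China", "United States")

# == recycles outliers as China, United States, China, United States
by_equals <- country_name == outliers
# %in% asks, for each row, whether the country is one of the outliers
by_in <- country_name %in% outliers

by_equals
by_in
```

If that is what is happening, writing the facet as facet_wrap(~ country_name %in% c("China", "United States")) should split the two groups cleanly; adding scales = "free" is an optional extra so each panel gets its own axes.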
I'm trying to make a bar plot in base R (so not with ggplot) that is grouped and reversed. I found a lot of similar questions and answers, but none of them seems to work for me.
My dataset is about the Eurovision Song Contest 2007; here is the link to it: https://www.kaggle.com/datagraver/eurovision-song-contest-scores-19752019
This is the code for cleaning the data and getting the data frame I'm working with:
cela_baza <- read.csv("eurovision_song_contest_1975_2019.csv", stringsAsFactors = FALSE)
evro2007_pocetna<-cela_baza[cela_baza$Year=='2007',]
evro2007_aggr<-aggregate(evro2007_pocetna$Points~evro2007_pocetna$From.country+
evro2007_pocetna$To.country,FUN=mean)
colnames(evro2007_aggr) <- c('From country', 'To country','Points')
evro2007<-evro2007_aggr[!(evro2007_aggr$`From country`==evro2007_aggr$`To country`),]
nrow(subset(evro2007, evro2007$Points== 0 ))
evro2007_zero<- subset(evro2007, evro2007$Points> 0 )
What I need is a bar plot with the number of points on the x-axis and the participating countries on the y-axis. Each country gets three grouped bars of different colors: the first shows how many points that country gave to Serbia (winner), the second the points to Ukraine (2nd place), and the third the points to Russia (3rd place). So it is grouped and reversed. I found code for that, but my problem is that not all of the participating countries gave points to these three countries, so some error always occurs.
Code for ggplot will work too. I can't install it on my old PC, but I will ask someone to run it for me as long as I have the code. Thanks in advance for all the help!
It's possible to do what you're describing, but the resulting plot is horrible because of the number of countries on the x axis (42, with three bars each), and the limitations of base R's barplot.
Here's how we can get the data in the correct format:
winners <- evro2007[evro2007$`To country` == "Ukraine" |
evro2007$`To country` == "Russia" |
evro2007$`To country` == "Serbia",]
self <- data.frame(`From country` = c("Serbia", "Ukraine", "Russia"),
`To country` = c("Serbia", "Ukraine", "Russia"),
Points = c(0, 0, 0), stringsAsFactors = FALSE)
names(self) <- names(winners)
winners <- rbind(winners, self)
winners <- winners[order(winners$`From country`, winners$`To country`),]
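As an alternative sketch (not the approach above), xtabs() can build the points matrix directly and fills any missing from/to combination with 0, which also addresses the "not every country gave points" problem. The votes data frame and its column names are invented toy data standing in for evro2007.

```r
# Hypothetical toy votes; the real rows would come from evro2007
votes <- data.frame(
  from   = c("Albania", "Albania", "Andorra", "Armenia", "Armenia", "Armenia"),
  to     = c("Serbia", "Russia", "Ukraine", "Serbia", "Ukraine", "Russia"),
  points = c(5, 2, 12, 7, 3, 10)
)
# Fix the bar order within each group: winner, 2nd place, 3rd place
votes$to <- factor(votes$to, levels = c("Serbia", "Ukraine", "Russia"))

# Rows = receiving country, columns = giving country; absent pairs become 0
pts <- xtabs(points ~ to + from, data = votes)
pts

# Grouped, horizontal bars: one group of three per giving country
barplot(pts, beside = TRUE, horiz = TRUE, las = 1,
        legend.text = TRUE, xlab = "Points")
```

The legibility problem with 42 countries remains, so this only fixes the data shape, not the crowding.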
However, the base R barplot looks like this:
barplot(Points ~ `To country` + `From country`,
data = winners, beside = TRUE, cex.names = 0.3)
The countries are illegible, and the plot difficult to interpret.
Whereas, using ggplot:
library(ggplot2)
winners$`To country` <- factor(winners$`To country`,
levels = c("Serbia", "Ukraine", "Russia"))
ggplot(winners, aes(`To country`, Points, fill = `To country`)) +
geom_col() +
facet_wrap(.~`From country`) +
theme(axis.text.x = element_blank())
We get one small panel per voting country, each with three coloured bars.
I'm trying to plot the GDP for every country on a bar chart. The country name goes on the x-axis and the GDP value on the y. However, there are a lot of countries, and I'd like the bar chart to just show the top 3 GDPs, and the bottom 3 GDPs and in between I want perhaps some dots or something to indicate that there are other countries in between. How do I go about this?
Building on @steveLangsford's solution, here is a (possibly) slightly more principled way of doing things.
There might be a more "tidy" way to do this part:
Find breakpoints for the GDP categories:
GDP_sorted <- sort(toydata$GDP)
GDP_breaks <- c(-Inf,GDP_sorted[hm_selected],
GDP_sorted[hm_rows-hm_selected],
Inf)
Use cut() to define GDP categories, and order countries by GDP:
toydata <- toydata %>%
mutate(GDP_cat=cut(GDP,breaks=GDP_breaks,labels=
c("Lowest","Mid","Highest")),
country=reorder(factor(country),GDP)) %>%
filter(GDP_cat != "Mid") %>%
droplevels()
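To see what those breakpoints do, here is a small base-R sketch on a plain vector. It assumes the breaks are computed on the full, unfiltered set of 50 values; the seed is arbitrary.

```r
set.seed(1)            # arbitrary seed, only for reproducibility
hm_rows <- 50
hm_selected <- 3
GDP <- runif(hm_rows, 0, 5000)

GDP_sorted <- sort(GDP)
# Intervals are right-closed: (-Inf, 3rd lowest], (3rd lowest, 4th highest], (4th highest, Inf)
GDP_breaks <- c(-Inf, GDP_sorted[hm_selected],
                GDP_sorted[hm_rows - hm_selected], Inf)

GDP_cat <- cut(GDP, breaks = GDP_breaks,
               labels = c("Lowest", "Mid", "Highest"))
counts <- table(GDP_cat)
counts
```

Because cut() uses right-closed intervals by default, the third-lowest value lands in "Lowest" and the three largest values land in "Highest", leaving everything else in "Mid" to be filtered out.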
Plot with facets (add a bit of extra space between the panels to emphasize the axis break):
ggplot(toydata,aes(x=country,y=GDP,fill=GDP_cat))+
geom_bar(stat="identity")+
theme_bw()+
theme(legend.position="none",
panel.spacing.x=grid::unit(5,"lines")
)+xlab("")+
scale_fill_brewer(palette="Dark2")+
facet_wrap(~GDP_cat,scales="free_x")
1) You will get faster, better answers if you give a toy data set.
2) Putting "dots or something" on your plot is likely to make data visualization people wince a little. You're basically suggesting a discontinuity in the x-axis, which is a common enough thing to do, but specifically excluded from ggplot
(see here:
Using ggplot2, can I insert a break in the axis?
and here:
https://groups.google.com/forum/#!topic/ggplot2/jSrL_FnS8kc)
But, this same discussion suggests facets as a solution to your problem. One way to do that might be something like this:
library(tidyverse)
library(patchwork)
hm_rows <- 50
hm_selected <- 3
toydata <- data.frame(country=paste("Country",1:hm_rows) ,GDP=runif(hm_rows,0,5000))%>%
arrange(desc(GDP))%>%
filter(row_number()<=hm_selected | row_number()>(hm_rows-hm_selected))%>%droplevels
toydata$status <- rep(c("Highest","Lowest"),each=hm_selected)
ggplot(toydata%>%filter(status=="Highest"),aes(x=country,y=GDP))+
geom_bar(stat="identity")+
ggtitle("Highest")+
ylim(c(0,max(toydata$GDP)))+
ggplot(toydata%>%filter(status=="Lowest"),aes(x=country,y=GDP))+
geom_bar(stat="identity")+
ggtitle("Lowest")+
ylim(c(0,max(toydata$GDP)))+
theme(#possibly questionable, but tweaks the results closer to the single-graph requested:
axis.text.y=element_blank(),
axis.ticks=element_blank()
)+ylab("")
Despite some similar questions and my own research, I cannot seem to solve my little problem. Please forgive me if the answer is very easy and I am being silly. I have a data frame:
df<-data.frame(X = c("Germany", "Chile","Netherlands","Papua New Guinea","Cameroon"), R_bar_Ger = c(1300000000, 620000, 550000, 400000, 320000))
I would like to produce a barplot with 2 bars (country names on the x-axis, amounts on the y-axis).
The left bar should show Germany; the right one should be stacked with the remaining 4 countries.
Please help, and thank you very much in advance!
One way to solve this is by using ggplot2 and a little bit of manipulating your data frame.
First, add a column to your data frame that indicates which bar a country should be plotted in (Germany or Not-Germany):
df$bar <- ifelse(df$X == "Germany", 1, 0)
Now, create the plot:
ggplot(data = df, aes(x = factor(bar), fill = factor(X), y = R_bar_Ger)) +
geom_bar(stat = "identity") +
scale_y_sqrt() +
labs(x = "Country Group", title = "Square Root Scale", fill = "Country") +
scale_x_discrete(labels = c("Not Germany", "Germany"))
Note that if you're not familiar with ggplot2, only the first two lines are necessary for creating the plot; the others are there to make it look nice. Since Germany is orders of magnitude larger than your other countries, this isn't going to look very good without some sort of scaling. ggplot2 has a number of built-in scaling commands that might be worth exploring. Here, I've added the square root scale so you can see that the non-Germany countries actually do get stacked as desired.
The documentation for ggplot2 bar charts can be found here - it's definitely worth a read if you're looking for a powerful plotting tool.
There are a number of ways to skin a cat, and your exact question will often change as you learn new tools. I probably wouldn't have set the problem specification up this way, but sticking as close to your data and barplot as possible, one way to achieve what I think you want is:
with(aggregate(R_bar_Ger ~ X=="Germany", data=df, sum), barplot(R_bar_Ger, names.arg=c("Other", "Germany")))
So what we're doing here is aggregating Germany and non-Germany figures by addition, and then passing those values to the barplot function along with sensible x-axis labels.
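To make the intermediate numbers visible, here is the same grouping done with tapply() instead of aggregate(). This is only a sketch to show the two sums that end up as bar heights.

```r
df <- data.frame(
  X = c("Germany", "Chile", "Netherlands", "Papua New Guinea", "Cameroon"),
  R_bar_Ger = c(1300000000, 620000, 550000, 400000, 320000)
)

# Group by the logical test: FALSE = every other country, TRUE = Germany
totals <- tapply(df$R_bar_Ger, df$X == "Germany", sum)
totals

# FALSE sorts before TRUE, so the labels go "Other" first, then "Germany"
barplot(totals, names.arg = c("Other", "Germany"))
```

The "Other" bar is 620000 + 550000 + 400000 + 320000 = 1890000, which is tiny next to Germany's 1.3 billion, so expect it to be barely visible on a linear axis.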
You'll need to add an additional column to your data first:
df$group <- ifelse(df$X=="Germany","Germany","Other")
Then we can use the following ggplot2 approach:
library(ggplot2)
ggplot(df, aes(x = factor(group), y = R_bar_Ger, fill = factor(X))) +
geom_col()
I have a data set as below
data=data.frame(Country=c("China","United States",
"United Kingdom",
"Brazil",
"Indonesia",
"Germany"),
percent=c(85,15,25,55,75,90))
and code for the same is
names = data$Country
barplot(data$percent,main="data1", horiz=TRUE,names.arg=names,
col="red")
I would like to fill the rest of each bar in grey after the given value is plotted.
For example, for China, once the bar is plotted for 85, the remaining 15 should be plotted in grey. Similarly, for the United States, once the bar is plotted for the value 15 in column percent, the remaining 85 should be grey.
Any help on this would be very helpful.
Thanks
You can do:
# create a variable containing the "complementary percentage"
data$compl <- 100 - data$percent
# plot both the actual and complementary percentages at once, with desired colors (red and grey)
barplot(as.matrix(t(data[, c("percent","compl")])), main="data1", horiz=TRUE, names.arg=names, col=c("red","grey"))
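A quick sketch of why this works: barplot() stacks the rows of a matrix within each bar, so transposing the two columns gives one red segment and one grey segment per country. The colnames() line is an addition for readability and is not part of the code above.

```r
data <- data.frame(
  Country = c("China", "United States", "United Kingdom",
              "Brazil", "Indonesia", "Germany"),
  percent = c(85, 15, 25, 55, 75, 90)
)
data$compl <- 100 - data$percent

# 2 x 6 matrix: row 1 = percent (red), row 2 = compl (grey)
m <- as.matrix(t(data[, c("percent", "compl")]))
colnames(m) <- data$Country   # added here only to make the printout readable
m

# Each column sums to 100, so every bar has the same total length
colSums(m)
```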
EDIT
Based on this post, here is a way to do it with ggplot2
library(reshape)
library(ggplot2)
# reshape to long format; relies on the compl column created above
melt_data <- melt(data,id="Country")
ggplot(melt_data, aes(x=Country, y=value, fill=variable)) + geom_bar(stat="identity")
I didn't see a built-in method of doing this. Here's a hack version.
data$grey = c(rep(100,6)) #new data to fill out the lines
par(mar=c(3,7.5,3,3)) #larger margin on the left for names
barplot(data$grey, horiz=TRUE, #add barplot with grey filling
xlim=c(0,100),las=1,xaxt="n", #no axis
col="grey") #grey
par(new=TRUE) #plot again
barplot(data$percent,main="data1", horiz=TRUE,names.arg=names, #plot data
xlim=c(0,100),las=1, #change text direction and set x limits to 0-100
col="red")