I'm trying to plot this dataset with ggplot2 using geom_line(), with Year on the x axis, each country's values on the y axis, and the name of each country shown on its line.
The DataSet to Edit
This is what I have so far. I wanted to include the name of the country in each line. The problem is that each country has its data in a separate column.
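To plot this with ggplot2, first reshape the data into a "longer" format, with one row per Year and Country. Using the tidyr package:
library(tidyr)
# Stack every column except Year into Country/Value pairs
df <- pivot_longer(df,
                   cols = -Year,
                   names_to = "Country",
                   values_to = "Value")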
gives you
# A tibble: 108 x 3
Year Country Value
<dbl> <chr> <dbl>
1 1995 Argentina 4122262
2 1995 Bolivia 3409890
3 1995 Brazil 36276255
4 1995 Chile 2222563
5 1995 Colombia 10279222
6 1995 Costa_Rica 1611055
7 1997 Argentina 4100563
8 1997 Bolivia 3391943
9 1997 Brazil 35718095
10 1997 Chile 2208382
Based on this it is easy to plot a line for each country using ggplot2:
ggplot(df, aes(x=Year, y=Value, color=Country)) +
geom_line()
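The question also asked for each country's name to appear on its line rather than only in a legend. A minimal sketch of one way to do this, assuming a small wide data frame rebuilt from a few of the values shown above (the names `wide`, `long` and `line_ends` are illustrative only): label each line at its final year with geom_text().

```r
library(tidyr)
library(ggplot2)

# Small wide table, one column per country (values copied from the output above)
wide <- data.frame(
  Year      = c(1995, 1997),
  Argentina = c(4122262, 4100563),
  Bolivia   = c(3409890, 3391943),
  Chile     = c(2222563, 2208382)
)

# One row per Year/Country pair
long <- pivot_longer(wide, cols = -Year, names_to = "Country", values_to = "Value")

# Rows for the final year give the position of each label
line_ends <- long[long$Year == max(long$Year), ]

p <- ggplot(long, aes(x = Year, y = Value, color = Country)) +
  geom_line() +
  geom_text(data = line_ends, aes(label = Country), hjust = 0, nudge_x = 0.05) +
  expand_limits(x = max(long$Year) + 0.5) +  # leave room for the labels
  theme(legend.position = "none")            # the labels replace the legend
print(p)
```

With many countries the labels may overlap, in which case keeping the colour legend is probably the simpler option.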
You have more or less answered your own question. You need the reshape2 package to bring all countries into a single column.
Year<-c(1991,1992,1993,1994,1995,1996)
Argentina<-c(235,531,3251,3153,13851,16513)
Mexico<-c(16503,16035,3516,3155,30351,16513)
Japan<-c(1651,868416,68165,35135,03,136816)
df<-data.frame(Year,Argentina,Mexico,Japan)
library(reshape2)
df2 <- melt(data = df, id.vars = "Year", measure.vars = c("Argentina", "Mexico", "Japan"))
library(ggplot2)
ggplot(df2, aes(x=Year, y=value, group=variable, color=variable))+
geom_line()
Good morning,
I'm trying to use ggplot with a data frame, but I've run into an issue: ggplot ignores the ordering I apply to the data frame with arrange().
Here is my code :
library(tidyverse)  # loads ggplot2, dplyr and tidyr (which provides the population dataset)
pop <- population[population$year == 1995, ]
pop <- pop[1:10, ]
pop %>%
ggplot(aes(x = country, y = population)) +
geom_point()
pop <- pop %>%
arrange(population)
pop %>%
ggplot(aes(x = country, y = population)) +
geom_point()
I would like the graph to be ordered by population: the country with the lowest population first, the second lowest next, and so on. But ggplot doesn't order the graph that way.
I have this data frame :
country year population
<chr> <int> <int>
1 Anguilla 1995 9807
2 American Samoa 1995 52874
3 Andorra 1995 63854
4 Antigua and Barbuda 1995 68349
5 Armenia 1995 3223173
6 Albania 1995 3357858
7 Angola 1995 12104952
8 Afghanistan 1995 17586073
9 Algeria 1995 29315463
10 Argentina 1995 34833168
But my graph is ordered alphabetically:
Do you have any idea how to order it by population?
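For context: ggplot2 orders a discrete axis by factor levels (alphabetical for a character column), not by row order, so arrange() alone has no effect on the plot. A minimal sketch of one common fix, assuming the ten rows shown above are rebuilt by hand so the example is self-contained: use reorder() to sort the levels of country by population.

```r
library(ggplot2)

# The ten rows shown above, rebuilt by hand
pop <- data.frame(
  country = c("Anguilla", "American Samoa", "Andorra", "Antigua and Barbuda",
              "Armenia", "Albania", "Angola", "Afghanistan", "Algeria", "Argentina"),
  population = c(9807, 52874, 63854, 68349, 3223173,
                 3357858, 12104952, 17586073, 29315463, 34833168)
)

# reorder() turns country into a factor whose levels are sorted by population
pop$country <- reorder(pop$country, pop$population)

p <- ggplot(pop, aes(x = country, y = population)) +
  geom_point() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # readable labels
print(p)
```

The same idea can be written inline as aes(x = reorder(country, population), y = population) if you prefer not to modify the data frame.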
I have this data:
country name value
<chr> <chr> <dbl>
1 Germany Jd 7.1
2 Germany Jc 8.4
3 Germany Ne 1.3
4 France Jd 8.3
5 France Jc 12
6 France Ne 3.7
and I would like to plot it as two groups of bars (three bars each), in the same order as in the data frame: Germany first, then France, with the bars ordered Jd, Jc, Ne.
I did:
p <- ggplot(data, aes(x = country, y = value)) +
geom_bar(aes(fill = name), width=0.7, position = position_dodge(width=0.7), stat='identity')
but I get the plot in a different order: first France, then Germany and the order of the columns Jc, Jd, Ne. (seems to be ordered alphabetically).
How can I order the bars in the way I want?
Probably one of the simplest ways to take control of the ordering is to convert the relevant columns to factors with factor() and define their levels explicitly; this overrides the default alphabetical ordering:
library(ggplot2)
data$country <- factor( data$country, levels = c("Germany", "France"))
data$name <- factor( data$name, levels = c("Jd", "Jc", "Ne"))
ggplot(data, aes(x = country, y = value, fill = name)) +
  # all aes() mappings are now in ggplot(); this is unrelated to the question
  geom_bar(width = 0.7, position = position_dodge(width = 0.7), stat = 'identity')
With data:
data <- read.table(text = "
country name value
Germany Jd 7.1
Germany Jc 8.4
Germany Ne 1.3
France Jd 8.3
France Jc 12
France Ne 3.7",header = T)
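If the desired order is simply the order in which values first appear in the data frame, the levels do not have to be typed out by hand. A minimal sketch, assuming the same six rows as above: unique() returns values in order of first appearance, so it can supply the levels.

```r
data <- data.frame(
  country = c("Germany", "Germany", "Germany", "France", "France", "France"),
  name    = c("Jd", "Jc", "Ne", "Jd", "Jc", "Ne"),
  value   = c(7.1, 8.4, 1.3, 8.3, 12, 3.7)
)

# unique() keeps first-appearance order, so the factor levels
# follow the row order of the data frame
data$country <- factor(data$country, levels = unique(data$country))
data$name    <- factor(data$name,    levels = unique(data$name))

levels(data$country)
levels(data$name)
```

The ggplot() call from the answer can then be used unchanged.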
I have a tibble that I melted into long format with melt (using id.vars = Country), so that the original column names are now the values of the variable column. The goal is to plot AGR_LogLabProd, MIN_LogLabProd and MAN_LogLabProd by year on the same axes.
CHN4
Country Year variable value
---------------------------
1 CHN 1958 AGR_LogLabProd 14.81782
2 CHN 1959 AGR_LogLabProd 14.61870
3 CHN 1960 AGR_LogLabProd 14.41969
4 CHN 1961 AGR_LogLabProd 14.28257
5 CHN 1958 MIN_LogLabProd 13.67850
6 CHN 1959 MIN_LogLabProd 14.24685
7 CHN 1960 MIN_LogLabProd 14.57734
8 CHN 1961 MIN_LogLabProd 14.59046
9 CHN 1958 MAN_LogLabProd 13.29359
10 CHN 1959 MAN_LogLabProd 13.86194
11 CHN 1960 MAN_LogLabProd 14.19243
12 CHN 1961 MAN_LogLabProd 14.20556
I use ggplot(CHN4, aes(x=Year, y=value)) + geom_line(), but it gives me a strange plot (shown in the attached image) rather than a separate line for each level of the variable column, as I expected. Any clue as to what is going wrong?
This is a pretty common problem. You need to include a grouping variable. If you want to use color for every different level, you would use
library(ggplot2)
ggplot(CHN4, aes(x=Year, y=value, color = variable)) +
geom_line()
but if you don't care for colors, you can do
library(ggplot2)
ggplot(CHN4, aes(x=Year, y=value, group = variable)) +
geom_line()
I'm trying to create a dotplot where countries are listed on my Y axis from A-Z top to bottom. The medal count will be the X axis for each of the four plots, one each for gold, silver, bronze, and total. Of course, ggplot prefers to plot countries from Z-A and despite reading all about the problem, I haven't resolved the issue. I appreciate any straightforward help on both the coding and comprehension fronts.
mdat <- melt(raw, value.name = "Count", variable.name = "Place", id.var = "Country")
mdat[, "Place"] <- factor(mdat[, "Place"], levels=c("Gold", "Silver", "Bronze", "Total"))
##I know my problem is likely on or around the above line ##
plot1 <- ggplot(mdat, aes(x = Count, y = Country, colour = Place)) +
geom_point() +
facet_grid(.~Place) + theme_bw()+
scale_colour_manual(values=c("#FFCC33", "#999999", "#CC6600", "#000000"))
print(plot1)
Algeria Gold 4
Argentina Gold 5
Armenia Gold 1
Algeria Silver 2
Argentina Silver 5
Armenia Silver 2
Algeria Bronze 4
Argentina Bronze 2
Armenia Bronze 0
You have to sort the levels of Country before you plot. Also, there is no Total level in the data you provided. The following approach should give you the desired result:
Reading the data (including a Total level for the Place variable):
mdat <- read.table(text="Country Place Count
Algeria Gold 4
Argentina Gold 5
Armenia Gold 1
Algeria Silver 2
Argentina Silver 5
Armenia Silver 2
Algeria Bronze 4
Argentina Bronze 2
Armenia Bronze 0
Algeria Total 10
Argentina Total 12
Armenia Total 3", header=TRUE)
Sorting the levels of the Country variable:
mdat$Country <- factor(mdat$Country,levels=sort(unique(mdat$Country),decreasing=TRUE))
Getting your Place variable in the correct order:
# Declare the level order explicitly so facets and colours follow Gold, Silver, Bronze, Total
mdat$Place <- factor(mdat$Place, levels = c("Gold", "Silver", "Bronze", "Total"))
Creating the plot:
ggplot(mdat, aes(x = Count, y = Country, colour = Place)) +
geom_point(size=4) +
facet_grid(.~Place) + theme_bw()+
scale_colour_manual(values=c("#FFCC33","#999999","#CC6600","#000000"))
which gives the following plot:
Since you have already melted your data, I suspect that there is no Total column in the raw data frame. You can calculate it with:
raw$Total <- rowSums(..specify the Gold, Silver & Bronze columns here..)
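As a concrete illustration of that rowSums() call, here is a minimal sketch. The original `raw` data frame was not shown, so the wide table below is a hypothetical reconstruction that assumes one column each named Gold, Silver and Bronze, filled with the counts from the question.

```r
# Hypothetical wide table; the real `raw` was not shown in the question
raw <- data.frame(
  Country = c("Algeria", "Argentina", "Armenia"),
  Gold    = c(4, 5, 1),
  Silver  = c(2, 5, 2),
  Bronze  = c(4, 2, 0)
)

# Sum the three medal columns row by row
raw$Total <- rowSums(raw[, c("Gold", "Silver", "Bronze")])
raw$Total
```

If your columns are named differently, adjust the names in the subset accordingly.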
I'm a ggplot2 newbie and have a rather simple question regarding time-series plots.
I have a data set in which the data is structured as follows.
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9
How do I generate a time series plot when the data is structured in this format?
With the reshape package, I could just alter the data to look like:
totmidc <- melt(totmidb, id="Area")
totmidc
Area variable value
1 MIDWEST 1998 10
2 MIDWEST 1999 6
3 MIDWEST 2000 13
4 MIDWEST 2001 14
5 MIDWEST 2002 12
6 MIDWEST 2003 8
7 MIDWEST 2004 10
8 MIDWEST 2005 10
9 MIDWEST 2006 6
10 MIDWEST 2007 9
Then run the following code to get the desired plot.
ggplot(totmidc, aes(variable, value, group = 1)) + geom_line() + xlab("") + ylab("")
However, is it possible to generate a time series plot from the first object, in which the columns represent the years?
What is the error that ggplot2 gives you? The following seems to work on my machine:
Area <- as.numeric(unlist(strsplit("1998 1999 2000 2001 2002 2003 2004 2005 2006 2007", "\\s+")))
MIDWEST <-as.numeric(unlist(strsplit("10 6 13 14 12 8 10 10 6 9", "\\s+")))
qplot(Area, MIDWEST, geom = "line") + xlab("") + ylab("")
#Or in a dataframe
df <- data.frame(Area, MIDWEST)
qplot(Area, MIDWEST, data = df, geom = "line") + xlab("") + ylab("")
You may also want to check out the ggplot2 website for details on scale_x_date and related scales.
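On the original question of plotting straight from the wide object: ggplot2 maps one column to each aesthetic, so a layout with one column per year generally has to be reshaped first. A minimal sketch of that step using tidyr instead of reshape, assuming the wide table from the question is rebuilt by hand (the column names Year and Value are my own choice):

```r
library(tidyr)
library(ggplot2)

# check.names = FALSE keeps "1998" etc. as column names instead of "X1998"
totmidb <- data.frame(
  Area = "MIDWEST",
  `1998` = 10, `1999` = 6, `2000` = 13, `2001` = 14, `2002` = 12,
  `2003` = 8, `2004` = 10, `2005` = 10, `2006` = 6, `2007` = 9,
  check.names = FALSE
)

# One row per Area/Year pair
totmidc <- pivot_longer(totmidb, cols = -Area, names_to = "Year", values_to = "Value")
totmidc$Year <- as.integer(totmidc$Year)  # year names arrive as character

p <- ggplot(totmidc, aes(x = Year, y = Value)) + geom_line() + xlab("") + ylab("")
print(p)
```

Converting Year to a number gives a continuous x axis, so geom_line() connects the points without needing an explicit group.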
I am guessing that by "time series plot" you mean you want a bar chart instead of a line chart?
In that case, you only need to modify your code slightly to pass the correct parameters to geom_bar(). The default stat for geom_bar is stat_count (stat_bin in ggplot2 versions before 2.0.0), which counts the observations in each category on the x scale. With your data you want to override this behaviour and use stat_identity, so that the bar heights come from the value column.
library(ggplot2)
# Recreate data
totmidc <- data.frame(
Area = rep("MIDWEST", 10),
variable = 1998:2007,
value = round(runif(10)*10+1)
)
# Line plot
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
# Bar plot
# Note that the parameter stat="identity" passed to geom_bar()
ggplot(totmidc, aes(x=variable, y=value)) + geom_bar(stat="identity") + xlab("") + ylab("")
This produces the following bar plot:
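In more recent ggplot2 versions, geom_col() is a shorthand for geom_bar(stat = "identity"). A minimal sketch, assuming the fixed values from the question rather than the random ones generated above:

```r
library(ggplot2)

# Fixed values from the question instead of random ones
totmidc <- data.frame(
  Area = rep("MIDWEST", 10),
  variable = 1998:2007,
  value = c(10, 6, 13, 14, 12, 8, 10, 10, 6, 9)
)

# geom_col() uses the y values as bar heights; no stat argument is needed
p <- ggplot(totmidc, aes(x = variable, y = value)) + geom_col() + xlab("") + ylab("")
print(p)
```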