This question already has answers here:
ggplot2: sorting a plot
How to force specific order of the variables on the X axis?
Closed last month.
Good morning,
I'm trying to use ggplot with a data frame, but I ran into an issue: my plot ignores the order I set with arrange() on the data frame.
Here is my code:
library(dplyr)
library(ggplot2)
library(tidyr)  # provides the `population` dataset
pop <- population[population$year == 1995, ]
pop <- pop[1:10, ]
pop %>%
  ggplot(aes(x = country, y = population)) +
  geom_point()
pop <- pop %>%
  arrange(population)
pop %>%
  ggplot(aes(x = country, y = population)) +
  geom_point()
I would like the graph to be ordered by population: first the country with the lowest population, then the country with the second-lowest population, and so on. But ggplot doesn't order the graph as expected.
I have this data frame:
country year population
<chr> <int> <int>
1 Anguilla 1995 9807
2 American Samoa 1995 52874
3 Andorra 1995 63854
4 Antigua and Barbuda 1995 68349
5 Armenia 1995 3223173
6 Albania 1995 3357858
7 Angola 1995 12104952
8 Afghanistan 1995 17586073
9 Algeria 1995 29315463
10 Argentina 1995 34833168
But my graph is in alphabetical order.
Do you have any idea how to order it by population instead?
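ggplot2 places the values of a character x variable according to its factor levels, which default to alphabetical order, so sorting the rows with arrange() has no effect on the axis. A minimal sketch of the usual fix, using the ten rows shown above typed in by hand, is to turn country into a factor whose levels follow population with base R's reorder():

```r
library(ggplot2)

# The ten 1995 rows from the question, entered by hand
pop <- data.frame(
  country = c("Anguilla", "American Samoa", "Andorra", "Antigua and Barbuda",
              "Armenia", "Albania", "Angola", "Afghanistan", "Algeria",
              "Argentina"),
  year = 1995L,
  population = c(9807, 52874, 63854, 68349, 3223173, 3357858, 12104952,
                 17586073, 29315463, 34833168)
)

# A discrete axis follows the factor levels of `country`, not the row order.
# reorder() makes `country` a factor whose levels are sorted by population.
pop$country <- reorder(pop$country, pop$population)

p <- ggplot(pop, aes(x = country, y = population)) +
  geom_point()
```

Equivalently, after arrange(population) you can set pop$country <- factor(pop$country, levels = unique(pop$country)), or write reorder(country, population) directly inside aes().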
Related
I'm currently trying to make a scatter plot of child mortality rate against child labor. My problem is that I don't have much data: some countries only have values for some years, and other countries only for other years, so I can't plot all the data together, and no single year has enough data to restrict the plot to that year alone. Is there a function that takes the last available value of a given variable? For instance, if my last child labor value for Germany is from 2015 and my last value for Italy is from 2014, and so on for the other countries, is there a way to plot the last value for each country?
My data look like this:
head(data2)
# A tibble: 6 x 5
Entity Code Year mortality labor
<chr> <chr> <dbl> <dbl> <dbl>
1 Afghanistan AFG 1962 34.5 NA
2 Afghanistan AFG 1963 33.9 NA
3 Afghanistan AFG 1964 33.3 NA
4 Afghanistan AFG 1965 32.8 NA
5 Afghanistan AFG 1966 32.2 NA
6 Afghanistan AFG 1967 31.7 NA
Never mind those NAs; the labor data just don't go back that far, but I do have labor values in the dataset for more recent years. The child mortality data, on the other hand, are fairly complete.
Thanks.
I can't tell which variable you want to plot, but the following code keeps only the last year for each country:
library(dplyr)
data2 %>%
  group_by(Entity) %>%
  filter(Year == max(Year)) %>%
  ungroup()
The result looks like this:
Entity Code Year mortality labor
<chr> <chr> <dbl> <dbl> <lgl>
1 Afghanistan AFG 1967 31.7 NA
Now you can plot whichever variable you need. Note that this keeps each country's most recent row even if labor is NA in that row.
You might want to define what you mean by the 'last' value per group: the most recent, the last occurrence in the data, or something else?
dplyr::last picks out the last occurrence in the data, so you can combine it with arrange to order your data first. In this example we sort the data by Year (ascending by default), so the last observation is the most recent one. Assuming you don't want to include NA values, we also use filter to remove them from the data.
library(dplyr)
data2 %>%
# first remove NAs from the data
filter(
!is.na(labor)
) %>%
# then sort the data by Year
arrange(Year) %>%
# then extract the last observation per country
group_by(Entity) %>%
summarise(
last_record = last(labor)
)
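For the scatter plot itself you need mortality and labor from the same row. A minimal sketch, extending the summarise() step above, takes the most recent year with a non-missing labor value per country and keeps that row's mortality as well. The data below are hypothetical toy values in the shape of data2, not real figures:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data shaped like data2 (values are made up)
data2 <- tibble(
  Entity    = rep(c("Germany", "Italy"), each = 3),
  Code      = rep(c("DEU", "ITA"), each = 3),
  Year      = c(2013, 2014, 2015, 2013, 2014, 2015),
  mortality = c(4.1, 4.0, 3.9, 3.6, 3.5, 3.4),
  labor     = c(NA, 1.2, 1.1, 2.0, 1.8, NA)
)

latest <- data2 %>%
  filter(!is.na(labor)) %>%        # keep only rows where labor was measured
  arrange(Year) %>%                # oldest to newest, so last() is most recent
  group_by(Entity) %>%
  summarise(
    Year      = last(Year),        # year of the most recent labor value
    mortality = last(mortality),   # mortality from that same row
    labor     = last(labor)
  )

# One point per country, labelled with the country name
p <- ggplot(latest, aes(x = labor, y = mortality, label = Entity)) +
  geom_point() +
  geom_text(vjust = -0.5)
```

Here Germany contributes its 2015 values and Italy its 2014 values, since Italy has no labor figure for 2015.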
I'm trying to plot this dataset with ggplot2 using geom_line(), with Year on the x axis, each country's values on the y axis, and the name of each country on its line.
This is what I have so far. I want to include the name of the country on each line. The problem is that each country's data are in a separate column.
If you want to use ggplot, you should bring your data into a "longer" format, for example with the tidyr package:
library(tidyr)
df <- df %>%
  pivot_longer(cols = -Year,
               names_to = "Country",
               values_to = "Value")
which gives you
# A tibble: 108 x 3
Year Country Value
<dbl> <chr> <dbl>
1 1995 Argentina 4122262
2 1995 Bolivia 3409890
3 1995 Brazil 36276255
4 1995 Chile 2222563
5 1995 Colombia 10279222
6 1995 Costa_Rica 1611055
7 1997 Argentina 4100563
8 1997 Bolivia 3391943
9 1997 Brazil 35718095
10 1997 Chile 2208382
Based on this, it is easy to plot a line for each country using ggplot2:
library(ggplot2)
ggplot(df, aes(x = Year, y = Value, color = Country)) +
  geom_line()
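The question also asked for each country's name on its line. One way to do that, sketched here with the 1995 and 1997 values for three countries from the output above, is to draw a geom_text() label at each country's last year and drop the legend:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Wide data, one column per country (values from the long output above)
df <- data.frame(
  Year      = c(1995, 1997),
  Argentina = c(4122262, 4100563),
  Bolivia   = c(3409890, 3391943),
  Chile     = c(2222563, 2208382)
)

long <- df %>%
  pivot_longer(cols = -Year, names_to = "Country", values_to = "Value")

# One row per country: the point where its line ends
line_ends <- long %>%
  group_by(Country) %>%
  filter(Year == max(Year)) %>%
  ungroup()

p <- ggplot(long, aes(x = Year, y = Value, color = Country)) +
  geom_line() +
  geom_text(data = line_ends, aes(label = Country), hjust = -0.1) +
  theme(legend.position = "none")
```

Because the labels sit just past the last data point, you may need to widen the x axis (for example with expand_limits()) so they are not cut off.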
You've more or less answered your own question: you need to reshape the data so that all countries are in a single column, for example with the reshape2 package.
Year<-c(1991,1992,1993,1994,1995,1996)
Argentina<-c(235,531,3251,3153,13851,16513)
Mexico<-c(16503,16035,3516,3155,30351,16513)
Japan<-c(1651,868416,68165,35135,03,136816)
df<-data.frame(Year,Argentina,Mexico,Japan)
library(reshape2)
# stack the country columns into `variable`/`value` columns
df2 <- melt(data = df, id.vars = "Year", measure.vars = c("Argentina", "Mexico", "Japan"))
library(ggplot2)
ggplot(df2, aes(x=Year, y=value, group=variable, color=variable))+
geom_line()
I am working with data that look like this:
Country Year Aid
Angola 1995 416420000
Angola 1996 459310000
Angola 1997 354660000
Angola 1998 335270000
Angola 1999 387540000
Angola 2000 302210000
I want to create a lagged variable by adding up the Aid values from the previous five years,
so that the observation for 2000 looks like this:
Country Year Aid Lagged5
Angola 2000 302210000 1953200000
This is derived by adding the Aid observations from 1995 to 1999:
416420000 + 459310000 + 354660000 + 335270000 + 387540000 = 1953200000
I also need to do this separately for each country.
Thank you!
You could do:
library(dplyr)
df %>%
group_by(Country) %>%
mutate(Lagged5 = sapply(Year, function(x) sum(Aid[between(Year, x - 5, x - 1)])))
Output:
# A tibble: 6 x 4
# Groups: Country [1]
Country Year Aid Lagged5
<chr> <int> <int> <int>
1 Angola 1995 416420000 0
2 Angola 1996 459310000 416420000
3 Angola 1997 354660000 875730000
4 Angola 1998 335270000 1230390000
5 Angola 1999 387540000 1565660000
6 Angola 2000 302210000 1953200000
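For comparison, the same year-window sum can be written in base R without dplyr. This is a sketch, assuming a second country (Belize, a copy of Angola) as in the Note further below. Like the dplyr version, it selects rows by Year value rather than by position, so a missing year simply contributes nothing. Aid is stored as double, since summing several values of this size can exceed R's 32-bit integer range.

```r
# Angola data from the question; Aid stored as double to avoid integer overflow
DF <- data.frame(
  Country = "Angola",
  Year    = 1995:2000,
  Aid     = c(416420000, 459310000, 354660000, 335270000, 387540000, 302210000)
)
DF <- rbind(DF, transform(DF, Country = "Belize"))  # second country for grouping

# For each row of one country, sum Aid over years Year-5 .. Year-1
lag5 <- function(d) {
  sapply(d$Year, function(y) sum(d$Aid[d$Year >= y - 5 & d$Year <= y - 1]))
}

# Apply per country, then put the results back in the original row order
DF$Lagged5 <- unsplit(lapply(split(DF, DF$Country), lag5), DF$Country)
```

The first year of each country gets 0 (the sum of no values), matching the dplyr output above.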
Using the input DF shown reproducibly in the Note at the end, define a roll function that sums the prior 5 rows, and use ave to run it for each Country. The width argument list(-seq(5)) to rollapplyr means use offsets -1, -2, -3, -4, -5 in the sum, i.e. the values in the prior 5 rows.
The question did not say what to do with the initial rows of each country, so we put in NA values; if you want partial sums, add the partial = TRUE argument to rollapplyr. You can also change fill = NA to some other value, so the approach is quite flexible.
library(zoo)
roll <- function(x) rollapplyr(x, list(-seq(5)), sum, fill = NA)
transform(DF, Lag5 = ave(Aid, Country, FUN = roll))
Note
The input was assumed to be the following. We added a second country.
Lines <- "Country Year Aid
Angola 1995 416420000
Angola 1996 459310000
Angola 1997 354660000
Angola 1998 335270000
Angola 1999 387540000
Angola 2000 302210000"
DF <- read.table(text = Lines, header = TRUE, strip.white = TRUE,
colClasses = c("character", "integer", "numeric"))
DF <- rbind(DF, transform(DF, Country = "Belize"))
I have a data frame with GDP values for 12 South American countries over ~40 years. A snippet of the frame is as follows:
168 Chile 1244.1799 1972
169 Chile 4076.3207 1994
170 Chile 3474.7172 1992
171 Chile 2928.1562 1991
172 Chile 6143.7276 2004
173 Colombia 882.5687 1976
174 Colombia 1094.8795 1977
175 Colombia 5403.4557 2008
176 Colombia 2376.8022 2002
177 Colombia 2047.9784 1993
1) I want to order the data frame by country: the first ~40 rows should pertain to Argentina, the next ~40 to Bolivia, etc.
2) Within each country group, I want to order by year, so the first 3 rows should be Argentina 2012, Argentina 2011, Argentina 2010, etc.
I can grab the data for each country individually using subset() and then order it with order(). Surely I don't have to do this for every country and then use rbind()? How do I do it in one fell swoop?
3) Once I have the final product, I'd like to create 12 small, individual line graphs stacked vertically, one per country, each showing the trend of that country's GDP over the ~40 years. How do I create such a plot?
I'm sure I could find info on the 3rd question myself, but, well, I don't even know what such a graph is called in the first place.
Here is a solution with ggplot2. Assuming your data is in df:
library(ggplot2)
df$year.as.date <- as.Date(paste0(df$year, "-01-01")) # convert year to date
ggplot(df, aes(x=year.as.date, y=gdp)) +
geom_line() + facet_grid(country ~ .)
You don't actually need to sort by year and country; ggplot will handle that for you. Here is the data (only 5 countries and 12 years, but this will work for your data too). I also show how to sort by two columns on the third line:
countries <- c("ARG", "BRA", "CHI", "PER", "URU")
df <- data.frame(country=rep(countries, 12), year=rep(2001:2012, each=5), gdp=runif(60))
df <- df[order(df$country, df$year),] # <- we sort here
df$gdp <- df$gdp + 1:12 / 2
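The question asked for years in descending order within each country (2012, 2011, 2010, ...), whereas order(df$country, df$year) sorts years ascending. A minimal sketch, on simulated data like the above, negates the numeric key; it also uses facet_wrap() with one column and free y scales, so each country's trend is readable even when GDP levels differ a lot. Plots like this are commonly called faceted plots or "small multiples", and a numeric year works directly on the x axis.

```r
library(ggplot2)

# Simulated data in the same shape as above (random values, illustration only)
set.seed(1)
countries <- c("ARG", "BRA", "CHI", "PER", "URU")
df <- data.frame(country = rep(countries, 12),
                 year    = rep(2001:2012, each = 5),
                 gdp     = runif(60))

# Country ascending, then year descending within each country
df <- df[order(df$country, -df$year), ]

# One small panel per country, stacked vertically, each with its own y range
p <- ggplot(df, aes(x = year, y = gdp)) +
  geom_line() +
  facet_wrap(~ country, ncol = 1, scales = "free_y")
```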
Hi, I have panel data and would like to reshape (cast) my Indicator Name column from long to wide format. Currently all the columns are in long format: Year (1960-2011), Country Name (all the countries in the world), Indicator Name (various indicators) and Value (the value for each combination of year, indicator and country). I would like each indicator to become its own column holding the corresponding values, alongside the Year and Country Name columns. How can I do this? Currently the data look like this (excerpt):
Indicator.Name Year Country
GDP 1960 USA
GDP 1960 UK
What I would like:
Country Name Year GDP PPP HHH
USA 1960 7 9 10
Uk 1960 9 10 NA
World 1960 7 5 3
Africa 1960 3 7 NA
Try using dcast from reshape2, like below:
library(reshape2)
indicator <- c('PPP','PPP','GDP','GDP')
country.name <- c('USA','UK','USA','UK')
year <- c(1960,1961,1960,1961)
value <- c(5,7,8,9)
d <- data.frame(indicator, country.name, year, value)
d1 <- dcast(d, country.name + year ~ indicator, value.var = "value")
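If you already use tidyr, pivot_wider() does the same reshaping. A minimal sketch with the same toy data: every column not named in names_from or values_from (here country.name and year) identifies a row.

```r
library(tidyr)

d <- data.frame(indicator    = c("PPP", "PPP", "GDP", "GDP"),
                country.name = c("USA", "UK", "USA", "UK"),
                year         = c(1960, 1961, 1960, 1961),
                value        = c(5, 7, 8, 9))

# One column per indicator, filled from `value`
wide <- pivot_wider(d, names_from = indicator, values_from = value)
```

If a country/year/indicator combination occurs more than once, pivot_wider() warns and returns list-columns; its values_fn argument lets you aggregate the duplicates instead.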