I have a dataset in which the first column, named central_government_debt_percent_of_gdp, contains a list of years, followed by several columns named after countries that contain each country's debt/GDP ratio for every year.
You can see some of the data at this link:
I want to create a graph that shows the evolution of the ratio for each country, with separate lines. How can I do it with ggplot?
Do I have to add a geom_line for each country?
Should I do some data manipulation first?
As some have already mentioned, it would be appreciated if you provided a reproducible example. I will still try to answer your question, based on the link you included.
You need to do some data transformation, as your data is not yet in "tidy" format. In tidy data, each variable is a column, each observation is a row, and each cell holds a single value. To get there, use the pivot_longer() function, which stacks the country columns into one column of country names and one column of values.
library(tidyverse)
data %>%
  pivot_longer(
    cols = austria:germania,   # all country columns, first to last
    names_to = "countries",
    values_to = "values") %>%
  ggplot(aes(x = central_government_debt_percent_of_gdp,
             y = values,
             color = countries)) +
  geom_line()
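To answer the geom_line() question directly: a single geom_line() is enough, because mapping color to the country column draws one line per country. Below is a minimal self-contained sketch; the data frame debt_df and its numbers are hypothetical stand-ins for the linked data, and selecting with -central_government_debt_percent_of_gdp (every column except the year column) is an assumed alternative to austria:germania that avoids naming the first and last country.

```r
library(tidyverse)

# Hypothetical data in the question's wide layout: years in the first
# column, one debt/GDP column per country (values are made up)
debt_df <- tibble(
  central_government_debt_percent_of_gdp = 2000:2002,
  austria = c(60, 62, 65),
  germania = c(55, 57, 58)
)

# Every column except the year column holds one country's ratios
long <- debt_df %>%
  pivot_longer(-central_government_debt_percent_of_gdp,
               names_to = "countries", values_to = "values")

# One geom_line(); color = countries splits it into one line per country
p <- ggplot(long, aes(x = central_government_debt_percent_of_gdp,
                      y = values, color = countries)) +
  geom_line()
```

Adding a new country column to the wide data then needs no change to the plotting code.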
In my dataset I have two columns, named part_1 and part_2, that contain several numerical values.
I am required to create a graph that shows how the average varies in the two parts.
I think that the best way is to create a barplot with a bar for each part, but I'm not sure about it.
First, I created two new columns that contain the mean values for the two parts in each row:
averages <- my_data %>% mutate(avg_part1=mean(part_1,na.rm=T)) %>% mutate(avg_part2=mean(part_2,na.rm=T))
Then, I inserted the values in two new variables:
avg_part1 <- averages %>% slice(1) %>% pull(avg_part1)
avg_part2 <- averages %>% slice(1) %>% pull(avg_part2)
To create the plot I did:
to_graph <- c("First part" = avg_part1, "Second part" = avg_part2)
barplot(to_graph)
I obtained the graph I wanted, but it doesn't look very nice.
I feel like this process is too complex and that I should be able to do everything in a couple of lines without creating so many new variables. Do you have any suggestions?
Also, I would prefer to create the graph with ggplot because it makes it easier to improve the design, but I don't really know how to do it.
Thanks!
Using ggplot:
library(ggplot2)
library(dplyr)
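my_data %>%
  # stack() (base R) returns two columns: values, and ind (the source column name)
  stack(select = c(part_1, part_2)) %>%
  ggplot(aes(x = ind, y = values)) +
  geom_bar(stat = "summary", fun = mean)   # bar height = mean of each part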
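If you would rather compute the averages yourself and plot them as ready-made bar heights, a sketch along these lines should also work. my_data here is a small hypothetical data frame, and pivot_longer() plus summarise() take the place of the intermediate avg_part1/avg_part2 variables from the question:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for my_data
my_data <- data.frame(part_1 = c(1, 2, 3, NA), part_2 = c(4, 5, 6, 7))

# One row per part, holding that part's mean with NAs dropped
avgs <- my_data %>%
  pivot_longer(c(part_1, part_2), names_to = "part", values_to = "value") %>%
  group_by(part) %>%
  summarise(avg = mean(value, na.rm = TRUE))

# geom_col() uses the precomputed averages directly as bar heights
p <- ggplot(avgs, aes(x = part, y = avg)) +
  geom_col()
```

Keeping the averages in their own small data frame also makes it easy to print or reuse them outside the plot.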
This is my first post, so I have an absolute beginner question. I tried to find help elsewhere, but couldn't even take the first steps.
I have the following data frame:
My aim is to plot a geom_line or geom_smooth with a timeline on the x-axis taken from the column "Verstorbene" in my df (from 01-01 to 31-12), a value range of 0-1000 on the y-axis, and each of the years 2015-2022 as its own factor (line).
It is difficult without a reproducible example to know the exact format of your data, but from the picture of your data, the steps you need to take are:
Pivot your data into long format, so that there is a single column for the value, and a factor column telling you which year the observation came from
Turn your date strings into actual dates.
Plot the result
I have created an example data set that should have the same names and structure as your own data (see footnote), so that the following code should work for you too (as long as your data frame is called my_df).
If this does not work for you, please include a reproducible version of your data rather than an image. dput(my_df) will be helpful to create this.
library(tidyverse)
my_df %>%
  pivot_longer(-1, names_to = 'Year') %>%
  # paste a dummy year (2015) onto each date so all years share one axis;
  # %B matches month names in the current locale
  mutate(Verstorbene = as.POSIXct(strptime(paste(Verstorbene, 2015),
                                           '%e %B %Y'))) %>%
  ggplot(aes(Verstorbene, value, color = Year)) +
  geom_line() +
  scale_x_datetime(date_labels = '%e %B') +
  theme_minimal(base_size = 15)
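The question describes the dates as running from 01-01 to 31-12, whereas the '%e %B' pattern above expects a day followed by a month name. If your Verstorbene column actually holds day-month strings like "01-01", a variant along these lines may fit better. The tiny my_df below is hypothetical; it uses 2020, a leap year, as the dummy year so that a "29-02" row still converts to a valid date, and coord_cartesian() sets the 0-1000 y-range asked for in the question:

```r
library(tidyverse)

# Hypothetical data: "dd-mm" strings in Verstorbene, one column per year
my_df <- tibble(
  Verstorbene = c("28-02", "29-02", "01-03"),
  `2016` = c(900, 880, 910),
  `2020` = c(950, 940, 960)
)

long_df <- my_df %>%
  pivot_longer(-1, names_to = "Year") %>%
  # 2020 is only a dummy leap year so all years share one Jan-Dec axis
  mutate(Verstorbene = as.Date(paste0(Verstorbene, "-2020"),
                               format = "%d-%m-%Y"))

p <- ggplot(long_df, aes(Verstorbene, value, color = Year)) +
  geom_line() +
  scale_x_date(date_labels = "%d-%m") +
  coord_cartesian(ylim = c(0, 1000))
```

Swapping geom_line() for geom_smooth() gives the smoothed version mentioned in the question.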
df <- data.frame(Country = c("Indonesia","Indonesia","Brazil","Colombia","Mexico","Colombia","Costa Rica" ,"Mexico","Brazil","Costa Rica"),
Subject = c("Boys", "Girls","Boys","Boys","Boys","Girls","Boys","Girls","Girls","Girls"),
Value = c(358.000,383.000,400.000,407.000,415.000,417.000,419.000,426.000,426.000,434.000))
I'm trying to make a plot of Country vs Value, with the countries ordered by ascending Value of the Boys rows only. I know I can use something like:
df %>%
ggplot(aes(reorder(Country, Value), Value)) +
geom_point()
This does not restrict the ordering to the Boys rows of the Subject column. How do I go about doing this?
Edit: The ordering can be done outside ggplot as:
df <- df %>%
arrange(Value, Subject)
However, I just can't replicate this with reorder() inside ggplot. The data frame above is an example of the data in question.
Arranging your data frame does not change how the Country column is ordered on the x-axis. For a discrete variable, the axis order is determined as follows:
If you supply a reorder() or factor() specification in aes(), use that ordering
If the column is a factor, use the order of the levels of that factor
If the column is not a factor, order alphanumerically
As far as I know, reorder() only accepts a single column to order by, so the next step is to convert Country to a factor and specify the levels yourself. The row order of the data frame does not matter, because ggplot2 maps each column independently of the order of its rows. In fact, this is kind of the whole idea behind mapping.
Therefore, if you want this particular order, you'll have to convert the Country column into a factor and specify its levels. You can do that separately, or pipe it all together using mutate(). Note that we use unique() on the Country column so that each level is supplied only once, in the order in which it appears in the sorted data frame.
# color and size added for clarity on the sorting
df %>%
arrange(Subject, Value) %>%
mutate(Country=factor(Country, levels=unique(Country))) %>%
ggplot(aes(Country, Value, color=Subject)) + geom_point(size=3)
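If you would rather keep reorder(), one way to make it look only at the Boys rows is to give it a sort key that is NA everywhere else and pass na.rm = TRUE, which reorder() forwards to FUN. This is a sketch on the example data from the question, not the only approach:

```r
library(ggplot2)

# Example data from the question
df <- data.frame(
  Country = c("Indonesia", "Indonesia", "Brazil", "Colombia", "Mexico",
              "Colombia", "Costa Rica", "Mexico", "Brazil", "Costa Rica"),
  Subject = c("Boys", "Girls", "Boys", "Boys", "Boys",
              "Girls", "Boys", "Girls", "Girls", "Girls"),
  Value = c(358, 383, 400, 407, 415, 417, 419, 426, 426, 434)
)

# Sort key: Value on Boys rows, NA on Girls rows
boys_key <- ifelse(df$Subject == "Boys", df$Value, NA)

# reorder() passes na.rm = TRUE on to FUN, so only Boys values are averaged
df$Country <- reorder(df$Country, boys_key, FUN = mean, na.rm = TRUE)

p <- ggplot(df, aes(Country, Value, color = Subject)) +
  geom_point(size = 3)
```

A country with no Boys row at all would get a NaN score, and reorder() places such levels last.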
I have a data set like this one: Names of mutations and two numerical variables representing values in two conditions (CIP and TIG):
I was able to plot one variable (e.g. CIP) in these mutation using the following code:
The data frame is named Dotchart2:
dotchart(Dotchart2$`CIP resistance`,
         labels = rownames(Dotchart2), pch = 16, cex = 1, pt.cex = 2)
This appeared as follows:
Since I am comparing CIP vs TIG, I would like the same figure but with additional dots for TIG on the same mutations (i.e. on each horizontal mutation line there would be two dots of different colors, one for the CIP value and the other for the TIG value). It should look like this figure, for instance.
Could anyone provide simple code for this?
I think you'll find your answer here.
In the link provided, @JoshO'Brien creates a dot plot using lattice:
autos_data <- read.table("~/Documents/R/test.txt", header=F)
library(lattice)
dotplot(V1~V2, data=autos_data)
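This documentation does a thorough job of explaining the graph styles (graph_type), how the data are mapped (formula), and the data source (data=), which together follow this general pattern:
library(lattice)
# template only: replace graph_type with a lattice function such as dotplot,
# and formula/data with your own formula and data frame
graph_type(formula, data=)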
To do this easily in lattice or ggplot2 you first need to convert your data to long format. I don't have a data set handy in the right format, so I took the famous iris data set and converted it to a wide-format data set called iris_wide (see code at the bottom). I'm using tidyverse here: all of this can also be done in base R.
(To understand what's going on here you should definitely examine the iris_wide and iris_long objects.)
## convert from wide to long format
library(tidyverse)
iris_long <- iris_wide %>%
pivot_longer(cols=-id,names_to="species",values_to="value")
## lattice version
lattice::dotplot(id ~ value, data = iris_long, groups = species, pch = 16,
                 auto.key = TRUE)
## ggplot version
ggplot(iris_long, aes(value,id,colour=species))+geom_point()
## convert iris data from long to wide
## To match your example, I'm selecting only two categories (species) and one variable (sepal length)
iris_wide <- (iris
%>% filter(Species %in% c("setosa","virginica"))
%>% select(Sepal.Length, Species)
%>% group_by(Species)
%>% mutate(id=seq(n()))
%>% pivot_wider(names_from=Species, values_from=Sepal.Length)
%>% head(10)
%>% mutate(id=LETTERS[seq(n())])
)
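Applying the same long-format idea to the data from the question might look like the sketch below. The mutation names are assumed to be stored as row names (as the rownames() call in the question suggests), and both the TIG column name (TIG resistance) and all values are hypothetical:

```r
library(tidyverse)

# Hypothetical stand-in for Dotchart2: mutation names as row names,
# one numeric column per condition (the TIG column name is assumed)
Dotchart2 <- data.frame(
  `CIP resistance` = c(2.0, 0.5, 1.2),
  `TIG resistance` = c(1.1, 0.8, 3.0),
  row.names = c("mutA", "mutB", "mutC"),
  check.names = FALSE
)

dot_long <- Dotchart2 %>%
  rownames_to_column("mutation") %>%   # turn the row names into a column
  pivot_longer(-mutation, names_to = "condition", values_to = "value")

# One horizontal line per mutation, one colored dot per condition
p <- ggplot(dot_long, aes(value, mutation, colour = condition)) +
  geom_point(size = 3)
```

The same dot_long can be passed to lattice::dotplot(mutation ~ value, data = dot_long, groups = condition, auto.key = TRUE) if you prefer lattice.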
I want to do the opposite of this question, and sort of the opposite of this question, though that's about legends, not the plot itself.
The other SO questions seem to be asking about how to keep unused factor levels. I'd actually like mine removed. I have several name variables and several columns (wide format) of variable attributes that I'm using to create numerous bar plots. Here's a reproducible example:
library(ggplot2)
df <- data.frame(name=c("A","B","C"), var1=c(1,NA,2),var2=c(3,4,5))
ggplot(df, aes(x=name,y=var1)) + geom_bar(stat="identity")
I get this:
I'd like only the names that have a value for the plotted varn to show up in my bar plot (that is, there would be no empty space for B).
Reusing the base plot code will be quite easy if I can simply change my output file name and the y=var bit. If possible, I'd rather not have to subset my data frame and call droplevels() on the result for each plot!
Update based on the na.omit() suggestion
Consider a revised data set:
library(ggplot2)
df <- data.frame(name=c("A","B","C"), var1=c(1,NA,2),var2=c(3,4,5), var3=c(NA,6,7))
ggplot(df, aes(x=name,y=var1)) + geom_bar(stat="identity")
I need to use na.omit() for plotting var1 because there's an NA present. But na.omit() drops any row that has an NA in any column, so the plot also removes A, since it has an NA in var3. This is more analogous to my data: I have 15 responses in total, with NAs peppered about. I only want to remove factor levels that have no value in the y vector currently being plotted, not those that have an NA in any column of the whole data frame.
One easy option is to use na.omit() on your data frame df to remove the rows with NA:
ggplot(na.omit(df), aes(x=name,y=var1)) + geom_bar(stat="identity")
Given your update, the following
ggplot(df[!is.na(df$var1), ], aes(x=name,y=var1)) + geom_bar(stat="identity")
works OK and only considers NAs in var1. Alternatively, given that you are only plotting name and var1, apply na.omit() to a data frame containing only those variables:
ggplot(na.omit(df[, c("name", "var1")]), aes(x=name,y=var1)) + geom_bar(stat="identity")
Notice that when plotting you're using only two columns of your data frame. Rather than passing the whole data frame, you could take the relevant columns, x[, c("name", "var1")], apply na.omit() to remove the unwanted rows (as Gavin Simpson suggests), na.omit(x[, c("name", "var1")]), and then plot this data.
My R/ggplot is quite rusty, and I realise that there are probably cleaner ways to achieve this.
A lot of time has passed since this question was originally asked. In 2021, if I were handling this, I would use something like:
library(ggplot2)
library(tidyr)
df <- data.frame(name=c("A","B","C"), var1=c(1,NA,2),var2=c(3,4,5))
df %>%
drop_na(var1) %>%
ggplot(aes(name, var1)) +
geom_col()
Created on 2021-12-03 by the reprex package (v2.0.1)
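Since the question asks to reuse the same plotting code by changing only the y variable, one option is to wrap the per-column NA filter in a small function. plot_var() below is a hypothetical helper, not part of ggplot2; it picks the column by name with the .data pronoun and drops only the rows where that column is NA, so other columns' NAs are ignored:

```r
library(ggplot2)

df <- data.frame(name = c("A", "B", "C"), var1 = c(1, NA, 2),
                 var2 = c(3, 4, 5), var3 = c(NA, 6, 7))

# Hypothetical helper: keep only rows where the plotted column is not NA
plot_var <- function(data, var) {
  kept <- data[!is.na(data[[var]]), ]
  ggplot(kept, aes(x = name, y = .data[[var]])) +
    geom_col()
}

p1 <- plot_var(df, "var1")   # bars for A and C only
p3 <- plot_var(df, "var3")   # bars for B and C only
```

Saving one file per variable could then be a loop such as for (v in c("var1", "var2", "var3")) ggsave(paste0(v, ".png"), plot_var(df, v)).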