This is my first post, so I have an absolute beginner question. I tried to find help elsewhere but couldn't even take the first steps.
I have the following data frame:
My aim is to plot a geom_line or geom_smooth with these three features. The x-axis should be a timeline taken from the column "Verstorbene" in my df (from 01-01 to 31-12). The y-axis should cover a value range of 0 to 1000. Each of the years 2015 to 2022 should be its own factor (line).
It is difficult without a reproducible example to know the exact format of your data, but from the picture of your data, the steps you need to take are:
1. Pivot your data into long format, so that there is a single column for the value and a factor column telling you which year each observation came from.
2. Turn your date strings into actual dates.
3. Plot the result.
I have created an example data set that should have the same names and structure as your own data (see footnote), so that the following code should work for you too (as long as your data frame is called my_df).
If this does not work for you, please include a reproducible version of your data rather than an image. dput(my_df) will be helpful to create this.
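library(tidyverse)
# Pivot the year columns into long format, parse the day labels as dates
# in one dummy year so all years share the x-axis, then draw one line per year
my_df %>%
  pivot_longer(-1, names_to = 'Year') %>%
  # 2020 is a leap year, so a "29 February" row still parses
  # (with a non-leap dummy year such as 2015 it would become NA)
  mutate(Verstorbene = as.POSIXct(strptime(paste(Verstorbene, 2020),
                                           '%e %B %Y'))) %>%
  ggplot(aes(Verstorbene, value, color = Year)) +
  geom_line() +
  scale_x_datetime(date_labels = '%e %B') +
  theme_minimal(base_size = 15)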
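If the tidyverse is not available, the first two steps can also be sketched in base R. This is a minimal sketch under assumptions. The toy data frame below is hypothetical, with three made-up days and two year columns. It assumes the day labels look like "dd-mm", as the question's "01-01 to 31-12" suggests, rather than the month names used above. stack() does the pivot, and the labels are parsed in the leap year 2020 so that a 29 February row does not turn into NA.

```r
# Hypothetical toy data in the wide layout described in the question:
# one row per calendar day, one column per year (values are made up)
my_df <- data.frame(
  Verstorbene = c("01-01", "02-01", "29-02"),
  `2015` = c(10, 12, NA),   # 2015 has no 29 February
  `2016` = c(9, 14, 13),
  check.names = FALSE
)

# Step 1: pivot to long format. stack() returns a 'values' column and an
# 'ind' factor holding the name of the column each value came from
long <- stack(my_df[-1])
names(long) <- c("value", "Year")
# stack() stacks column by column, so the day labels repeat once per year
long$Verstorbene <- rep(my_df$Verstorbene, times = ncol(my_df) - 1)

# Step 2: parse the "dd-mm" labels as dates in the dummy leap year 2020,
# so every year is drawn against the same calendar axis
long$Date <- as.Date(paste(long$Verstorbene, 2020), format = "%d-%m %Y")
```

The resulting long data frame can then be passed to ggplot(long, aes(Date, value, color = Year)) + geom_line(). Adding scale_y_continuous(limits = c(0, 1000)) would fix the y-axis to the 0 to 1000 range asked for in the question. Note that values outside the limits are dropped.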
I have a dataset in which the first column, named central_government_debt_percent_of_gdp, contains a list of years. It is followed by several columns named after countries, which contain each country's debt/GDP ratio for every year.
You can see some of the data at this link:
I want to create a graph that shows the evolution of the ratio for each country, with separate lines. How can I do it with ggplot?
Do I have to add a geom_line for each country?
Should I do some data manipulation?
As some have already mentioned, it would be appreciated if you provided a reproducible example. I will still try to answer your question based on the link you included.
You need to do some data transformation, as your data is not yet in "tidy" format. In tidy format, every variable has its own column, every observation its own row, and every cell holds a single value. To get there, you need the pivot_longer() function.
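library(tidyverse)
# Reshape the country columns into two columns (country name, ratio),
# then map country to colour so ggplot draws one line per country
data %>%
  pivot_longer(
    cols = austria:germania,
    names_to = "countries",
    values_to = "values") %>%
  ggplot(aes(x = central_government_debt_percent_of_gdp,
             y = values,
             color = countries)) +
  geom_line()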
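The same reshaping can be sketched in base R with reshape(), which may make it clearer what "long format" means here. The data frame below is a hypothetical stand-in with placeholder numbers. Only the column names follow the question.

```r
# Hypothetical wide data: one row per year, one column per country
# (the numbers are placeholders, not real debt/GDP ratios)
data <- data.frame(
  central_government_debt_percent_of_gdp = c(2019, 2020, 2021),
  austria  = c(10, 20, 30),
  germania = c(40, 50, 60)
)

# Wide -> long: one row per (year, country) pair, with the country name
# in 'countries' and the ratio in 'values'
long <- reshape(
  data,
  direction = "long",
  varying   = c("austria", "germania"),
  v.names   = "values",
  timevar   = "countries",
  times     = c("austria", "germania"),
  idvar     = "central_government_debt_percent_of_gdp"
)
```

With ggplot2 loaded, ggplot(long, aes(central_government_debt_percent_of_gdp, values, color = countries)) + geom_line() then draws one line per country. There is no need for a separate geom_line() per country.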
I'm still new to R and Stack Overflow and was looking for help. My assignment is about the World Cup. I want to make a bar chart that shows the abbreviations of the country names on the x-axis and the attendance at their stadiums on the y-axis. I used the code below and got the graph in the attached image. The problem is that the x-axis shows all the countries in the data frame, and I only want about 10 selected countries. Is there anything I'm missing, and what can I do? Thank you
CODE:
WorldCupMatches %>%
ggplot(aes(x = Home.Team.Initials, y = Attendance)) +
geom_col()
OP, you can filter the dataset before you pipe it into ggplot(...). There are a few ways to do this, but I find the dplyr::filter() function one of the simplest. It keeps only the rows of your data frame that satisfy a particular condition:
WorldCupMatches %>%
dplyr::filter(Home.Team.Initials %in% c(...) ) %>%
ggplot(aes(x = Home.Team.Initials, y = Attendance)) +
geom_col()
Just specify c(...) to be a vector of the home team initials you want to see shown in the plot.
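For comparison, here is the same filtering step in base R, as a minimal sketch. The small data frame and its attendance figures are made up. The selected initials are placeholders for the roughly ten countries you want to keep. geom_col() stacks all matches of a team into one bar, so each bar's height is that team's total attendance. The tapply() call computes the same totals.

```r
# Hypothetical stand-in for WorldCupMatches (made-up attendance figures)
matches <- data.frame(
  Home.Team.Initials = c("BRA", "GER", "ITA", "FRA", "BRA", "URU"),
  Attendance = c(1000, 2000, 1500, 1200, 1800, 900)
)

# Placeholder selection: replace with the initials you want to show
selected <- c("BRA", "GER", "ITA")

# Keep only the rows whose initials appear in 'selected'
subset_matches <- matches[matches$Home.Team.Initials %in% selected, ]

# Total attendance per team, i.e. the heights of the stacked geom_col() bars
totals <- tapply(subset_matches$Attendance,
                 subset_matches$Home.Team.Initials, sum)
```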
I have a data set like this one: Names of mutations and two numerical variables representing values in two conditions (CIP and TIG):
I was able to plot one variable (e.g. CIP) for these mutations using the following code:
# data frame named "Dotchart2"
dotchart(Dotchart2$`CIP resistance`,
         labels = rownames(Dotchart2), pch = 16, cex = 1, pt.cex = 2)
This appeared as follows:
Since I am comparing CIP vs TIG, I would like the same figure but with additional dots for TIG on the same mutations. Each horizontal mutation line would then carry two dots of different colors, one for the CIP value and one for the TIG value. It should look like this figure, for instance.
Could any of you provide simple code for this?
I think you'll find your answer here.
In the link provided, Josh O'Brien creates a dot plot using lattice:
autos_data <- read.table("~/Documents/R/test.txt", header=F)
library(lattice)
dotplot(V1~V2, data=autos_data)
This documentation does a thorough job of explaining the general call pattern. It covers the graph style (graph_type), what to plot (formula), and the data source (data =):
library(lattice)
# general pattern, where graph_type is a lattice function such as dotplot():
# graph_type(formula, data = <your data frame>)
To do this easily in lattice or ggplot2 you first need to convert your data to long format. I don't have a data set handy in the right format, so I took the famous iris data set and converted it to a wide-format data set called iris_wide (see code at the bottom). I'm using tidyverse here: all of this can also be done in base R.
(To understand what's going on here you should definitely examine the iris_wide and iris_long objects.)
## convert from wide to long format
library(tidyverse)
iris_long <- iris_wide %>%
  pivot_longer(cols = -id, names_to = "species", values_to = "value")
## lattice version
lattice::dotplot(id ~ value, data = iris_long, groups = species, pch = 16,
                 auto.key = TRUE)
## ggplot version
ggplot(iris_long, aes(value, id, colour = species)) + geom_point()
## convert iris data from long to wide
To match your example, I'm selecting only two categories (species) and one variable (sepal length)
iris_wide <- (iris
%>% filter(Species %in% c("setosa","virginica"))
%>% select(Sepal.Length, Species)
%>% group_by(Species)
%>% mutate(id=seq(n()))
%>% pivot_wider(names_from=Species, values_from=Sepal.Length)
%>% head(10)
%>% mutate(id=LETTERS[seq(n())])
)
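Staying with base R's dotchart() from the question, one option is to draw the CIP values with dotchart() and then overlay the TIG values with points(). This relies on dotchart() placing an ungrouped vector at y positions 1, 2, ..., n, so points(tig, seq_along(tig)) lands on the same horizontal lines. The labels and values below are hypothetical placeholders. With the real data you would pass Dotchart2$`CIP resistance`, the TIG column and rownames(Dotchart2). The plot goes to a null PDF device only so the sketch runs non-interactively; drop the pdf(NULL) and dev.off() lines to see it on screen.

```r
# Hypothetical placeholder data: three mutations, one value per condition
mutations <- c("mutation A", "mutation B", "mutation C")
cip <- c(4, 8, 2)
tig <- c(1, 3, 5)

pdf(NULL)  # null device: nothing is written, so the sketch runs anywhere

# First series: dotchart() draws the labelled rows at y = 1..n;
# xlim is widened so both series fit
dotchart(cip, labels = mutations, pch = 16, color = "red",
         xlim = range(cip, tig), xlab = "Value")
# Second series: same y positions, different colour
points(tig, seq_along(tig), pch = 16, col = "blue")
legend("bottomright", legend = c("CIP", "TIG"),
       col = c("red", "blue"), pch = 16)

invisible(dev.off())
```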
I am a bit stuck with some code. Of course I would appreciate a piece of code that solves my dilemma, but I am also grateful for hints on how to sort it out.
Here goes:
First of all, I installed the packages (ggplot2, lubridate, and openxlsx).
The relevant part:
I extract a file from an Italian gas TSO's website:
Storico_G1 <- read.xlsx(xlsxFile = "http://www.snamretegas.it/repository/file/Info-storiche-qta-gas-trasportato/dati_operativi/2017/DatiOperativi_2017-IT.xlsx",sheet = "Storico_G+1", startRow = 1, colNames = TRUE)
Then I created a data frame with the variables I want to keep:
Storico_G1_df <- data.frame(Storico_G1$pubblicazione, Storico_G1$IMMESSO, Storico_G1$`SBILANCIAMENTO.ATTESO.DEL.SISTEMA.(SAS)`)
Then I changed the time format:
Storico_G1_df$pubblicazione <- ymd_h(Storico_G1_df$Storico_G1.pubblicazione)
Now the struggle begins. In this example I would like to chart the two time series with two different y-axes, because their ranges are very different. That in itself is not a problem, since I can achieve it with the melt function and ggplot. However, one column contains NAs, and I don't know how to work around that. In the incomplete (SAS) column I mainly care about the data point at 16:00. Ideally, I would have hourly data on one chart and only one data point per day (at 16:00) on the second chart. I attached an unrelated example picture of the chart style I mean. In that chart, however, both panels have the same number of data points, so it works fine.
Grateful for any hints.
Take care
library(lubridate)
library(ggplot2)
library(openxlsx)
library(dplyr)
# Use na.strings: missing values seem to be coded in several different ways in this dataset
storico.xl <- read.xlsx(xlsxFile = "http://www.snamretegas.it/repository/file/Info-storiche-qta-gas-trasportato/dati_operativi/2017/DatiOperativi_2017-IT.xlsx",
sheet = "Storico_G+1", startRow = 1,
colNames = TRUE,
na.strings = c("NA","N.D.","N.D"))
#Select and rename the crazy column names
storico.g1 <- data.frame(storico.xl) %>%
select(pubblicazione, IMMESSO, SBILANCIAMENTO.ATTESO.DEL.SISTEMA..SAS.)
names(storico.g1) <- c("date_hour", "immesso", "sas")
# The date column is in year-month-day-hour format, so parse it with ymd_h()
storico.g1 <- storico.g1 %>% mutate(date_hour = ymd_h(date_hour))
#Not sure exactly what you want to plot, but here is each point by hour
ggplot(storico.g1, aes(x= date_hour, y = immesso)) + geom_line()
# To aggregate by day, group on the calendar date of each timestamp
# (count should be 24 for each complete day),
# then feed the daily totals into ggplot
storico.g1 %>%
  group_by(date = as.Date(date_hour)) %>%
  summarise(count = n(),
            daily.immesso = sum(immesso, na.rm = TRUE)) %>%
  ggplot(aes(x = date, y = daily.immesso)) + geom_line()
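The code above does not cover the second part of the question: hourly values in one panel and only the 16:00 SAS value per day in another. Below is a minimal base-R sketch of that idea. It assumes a data frame shaped like storico.g1, with a date-time column, the hourly series and the sparse SAS column. The toy data are made up, and the column name sas is illustrative. Filtering to 16:00 and dropping NAs first means each panel simply gets its own number of points, while a shared xlim keeps the two time axes aligned.

```r
# Hypothetical hourly data for two days; SAS is only filled in at 16:00
date_hour <- seq(as.POSIXct("2017-01-01 00:00", tz = "UTC"),
                 by = "hour", length.out = 48)
hourly <- data.frame(
  date_hour = date_hour,
  immesso   = seq_along(date_hour) * 10,  # placeholder values
  sas       = NA_real_
)
hourly$sas[c(17, 41)] <- c(-5, 3)         # 16:00 on each day (placeholders)

# Keep one row per day: the 16:00 observation, and only if SAS is present
daily_sas <- subset(hourly, format(date_hour, "%H") == "16" & !is.na(sas))

# Two stacked panels sharing the same time range on the x-axis
pdf(NULL)  # null device so the sketch runs non-interactively
op <- par(mfrow = c(2, 1))
plot(hourly$date_hour, hourly$immesso, type = "l",
     xlab = "", ylab = "immesso (hourly)")
plot(daily_sas$date_hour, daily_sas$sas, type = "b",
     xlim = range(hourly$date_hour), xlab = "", ylab = "SAS at 16:00")
par(op)
invisible(dev.off())
```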
Here is what I have:
A data frame which contains a date field, and a number of summary statistics.
Here's what I want:
I want a chart that allows me to compare the time series week over week, to see how the performance of the process this week compares to the previous one, for example.
What I have done so far:
##Get the week day name to display
summaryData$WeekDay <- format(summaryData$Date, format = '%A')
##Get the week number to differentiate the weeks
summaryData$Week <- format(summaryData$Date, format = '%V')
summaryData %>%
ggvis(x = ~WeekDay, y = ~Referrers) %>%
  layer_lines(stroke = ~Week)
I expected it to create a chart with multiple coloured lines, each one representing a week in my data set, but it does not do what I expect.
Try looking at reshaper to convert your data so that there is a factor variable for each week, or split up the data with a dplyr::lag() command.
A general way of plotting multiple columns in ggvis is to add one layer per column:
summaryData %>%
  ggvis() %>%
  layer_lines(x = ~WeekDay, y = ~Referrers) %>%
  layer_lines(x = ~WeekDay, y = ~Other)
I hope this helps
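A base-R sketch of the week-over-week chart follows. It is an alternative to ggvis, not a fix of the ggvis code. The idea is to tabulate the series into a weekday-by-week matrix with tapply() and draw one line per column with matplot(). The 14 days of data are made up. The weekday is taken as a number (%u, 1 = Monday) so the x-axis follows calendar order rather than alphabetical order, and %V gives the ISO week number as in the question.

```r
# Hypothetical two weeks of data starting on Monday 2024-01-01
summaryData <- data.frame(
  Date = seq(as.Date("2024-01-01"), by = "day", length.out = 14),
  Referrers = c(5, 7, 6, 8, 9, 4, 3, 6, 8, 7, 9, 10, 5, 4)  # placeholders
)
summaryData$WeekDay <- as.integer(format(summaryData$Date, "%u"))  # 1 = Monday
summaryData$Week <- format(summaryData$Date, "%V")                 # ISO week

# Rows = weekday (1..7), columns = week; each cell sums that day's values
by_week <- tapply(summaryData$Referrers,
                  list(summaryData$WeekDay, summaryData$Week), sum)

pdf(NULL)  # null device so the sketch runs non-interactively
matplot(by_week, type = "l", lty = 1, col = seq_len(ncol(by_week)),
        xlab = "Day of week (1 = Monday)", ylab = "Referrers")
legend("topleft", legend = paste("Week", colnames(by_week)),
       col = seq_len(ncol(by_week)), lty = 1)
invisible(dev.off())
```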