I have a data frame of daily precipitation data that runs from Jan-1980 to Dec-2017. I have aggregated monthly averages (as seen in the image). How would I go about examining certain months (e.g., comparing all Decembers)?
You could select the Decembers and plot them:
library(dplyr)
library(ggplot2)
dd.agg %>%
# filter all Decembers
dplyr::filter(mo == "12") %>%
# change name of last column
dplyr::rename(precipitation = 3) %>%
# change year to integer just in case it is char
dplyr::mutate(yr = as.integer(yr)) %>%
# order by year in case it is unordered
dplyr::arrange(yr) %>%
# draw a bar chart
ggplot2::ggplot(aes(x = yr, y = precipitation)) +
ggplot2::geom_col()
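If you want to try the filtering step without the original data, here is a minimal base-R sketch. The `daily` data frame and its `date` and `precip` columns are made up for illustration; only the `yr` and `mo` column names follow the answer above.

```r
# Hypothetical daily data: one row per day from 1980-01-01 to 2017-12-31
set.seed(1)
daily <- data.frame(date = seq(as.Date("1980-01-01"), as.Date("2017-12-31"), by = "day"))
daily$precip <- rgamma(nrow(daily), shape = 0.5, scale = 4)

# Monthly means, with year and month kept as separate integer columns
daily$yr <- as.integer(format(daily$date, "%Y"))
daily$mo <- as.integer(format(daily$date, "%m"))
monthly <- aggregate(precip ~ yr + mo, data = daily, FUN = mean)

# Keep only the Decembers, ordered by year
decembers <- monthly[monthly$mo == 12, ]
decembers <- decembers[order(decembers$yr), ]
```

Here `mo` is an integer, so the comparison is `mo == 12`; if your month column is character, compare against `"12"` as in the answer.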
Related
I am trying to use ggplot to draw the data contained in the following data frame:
df <- data.frame( dress_id = c(1,2,3,4,5),
29/8/2013 = c(2000,150,6,1000,900),
31/8/2013 = c(2000,200,7,1100,1000),
2/9/2013 = c(2400,600,7,1350,1300),
4/9/2013 = c(2600,600,7,1500,1400),
style = c("Sexy", "Casual","vintage","Brief","cute"))
I want to have x-axis to be my date (29/8/2013...2/9/2013) and my y-axis to be the sales price of dates and finally my style.
Is this possible using ggplot?
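Here are the details of zx8754's answer.
First, note that I put an X in front of the date columns, because column names in R should not start with a number. data.frame() also converts the slashes in the names to dots, so the columns end up as X29.8.2013 and so on, which is why the date format further down is "X%d.%m.%Y".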
df <- data.frame( dress_id = c(1,2,3,4,5),
"X29/8/2013" = c(2000,150,6,1000,900),
"X31/8/2013" = c(2000,200,7,1100,1000),
"X2/9/2013" = c(2400,600,7,1350,1300),
"X4/9/2013" = c(2600,600,7,1500,1400),
style = c("Sexy", "Casual","vintage","Brief","cute"))
Next, I load the tidyverse package, which contains functions to work with data.frames and also includes ggplot2
library(tidyverse)
Finally, I transform your data from wide to long: this is done with the gather function. As a result, there is now a date column in your data.frame which contains all the present dates and a value column which contains the sales prices.
df %>%
gather(date, value, -dress_id, -style) %>%
mutate(date = as.Date(date, format = c("X%d.%m.%Y"))) %>%
ggplot(aes(x = date, y = value, colour = style)) +
geom_line()
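As a side note that is not part of the original answer: newer versions of tidyr offer pivot_longer() for the same wide-to-long reshaping. A minimal sketch on a shortened version of the data, assuming tidyr 1.0 or later is installed:

```r
library(tidyr)

# Shortened version of the data; data.frame() turns "X29/8/2013" into X29.8.2013
df <- data.frame(dress_id = c(1, 2, 3),
                 "X29/8/2013" = c(2000, 150, 6),
                 "X31/8/2013" = c(2000, 200, 7),
                 style = c("Sexy", "Casual", "vintage"))

# Everything except dress_id and style becomes a (date, value) pair
long <- pivot_longer(df, cols = -c(dress_id, style),
                     names_to = "date", values_to = "value")

# Parse the former column names into real dates
long$date <- as.Date(long$date, format = "X%d.%m.%Y")
```

The resulting `long` can be passed to ggplot() exactly like the output of gather() above.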
I have a large data set with repeated plot measurements (count data) covering a large time span in different research areas. I would now like to filter the data so that only complete field seasons (April-November) are left. Some areas are sampled in the same years, others in different years.
So far I have:
arthropods.all.sea <- with(arthropods.all, arthropods.all[month(Date) >= 4 & month(Date) < 12, ])
but I can't figure out how to include the condition that field seasons must be complete for each area.
Any help is very much appreciated.
I created a dummy data set to illustrate what my real data set looks like.
df1 <- data.frame(ID = c("Ki_1","Ki_2","Ki_2","Ki_3","Ho_1","Ho_2"),
Date = as.POSIXct(c('1999-06-23', '1998-09-25', '1998-08-22', '2000-08-22', '1990-05-01', '1991-07-06')),
Area = c("Kin", "Kin", "Kin", "Kin","Hohe", "Hohe"),
Species=c("Species1","Species1","Species2","Species1","Species10","Species11"),
Count=c(12,23,21,14,7,2))
You can keep only those Area and year combinations where all the months from April to November are present in the data.
library(dplyr)
library(lubridate)
result <- df1 %>%
mutate(year = year(Date), month = month(Date)) %>%
group_by(Area, year) %>%
filter(all(4:11 %in% month) & month %in% 4:11)
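The dummy data above contains no complete season, so the result is empty there. To see the filter keep something, here is a small sketch on made-up data (the `toy` data frame is hypothetical): area "A" is sampled March to November 1999, area "B" only April to October.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: "A" covers months 3-11, "B" covers months 4-10 (no November)
toy <- data.frame(
  Area = c(rep("A", 9), rep("B", 7)),
  Date = as.Date(c(sprintf("1999-%02d-15", 3:11), sprintf("1999-%02d-15", 4:10))),
  Count = 1:16
)

complete <- toy %>%
  mutate(year = year(Date), month = month(Date)) %>%
  group_by(Area, year) %>%
  # keep groups that contain every month 4..11, then drop out-of-season rows
  filter(all(4:11 %in% month) & month %in% 4:11) %>%
  ungroup()
```

Only the April to November rows of area "A" survive: "B" is dropped as incomplete, and the March visit in "A" is dropped as out of season.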
I have a dataset where different cities go in and out of a program, like this example dataset:
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
This gives you a data frame that looks like:
Population Enter.Program Leave.Program
1000 15-10-01 16-06-01
2000 16-05-01 16-10-01
3000 16-07-01 17-08-01
First, I'd like to create an output table like this:
Per.Begin Per.End Total.Pop.In
15-10-01 16-04-30 1000
16-05-01 16-05-30 3000
16-06-01 16-06-30 2000
16-07-01 16-09-30 5000
16-10-01 17-07-30 3000
17-08-01 18-04-26 0
And then plot this in ggplot as a graph that looks like either a step function or kind of like a jagged rectangular surface where the top edge is the running total, kind of like a cumulative density function but where the y-axis can go down as well as up, and where the x-axis goes in steps that are the width of time periods.
Here are the steps I've blocked out, but I don't know how to execute:
Make column of unique dates
Make column of period change end dates (i.e., next unique date minus one day)
Calculate running sum of the cities within each period (i.e., third column)
Plot in ggplot
Using dplyr (because you tagged the question with it) you can do what you want. The main things that need to happen are:
Break out your entries and exits making your population positive and negative.
Get all the dates from your earliest to your last so you can have the desired blocky lines. It is probably possible to do this without every date, but this is easy and requires less thinking.
Code is below
library(dplyr)
library(ggplot2)
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
changes = example.dat %>%
select("Population","Date"="Enter.Program") %>%
bind_rows(example.dat %>%
select("Population","Date"="Leave.Program") %>%
mutate(Population = -1*Population)) %>%
mutate(Date = as.Date(Date,"%y-%m-%d"))
startDate = min(changes$Date)
endDate = max(changes$Date)
final = tibble(Date = seq(startDate,endDate,1)) %>%
left_join(changes,by="Date") %>%
mutate(Population = cumsum(ifelse(is.na(Population),0,Population)))
ggplot(data = final,aes(x=Date,y=Population)) +
geom_line()
UPDATE
If you don't want to have every date from the earliest to the latest, you can use a blurgh for loop to add the needed rows to get a pretty result. Here we walk through and duplicate each date after the first with the preceding cumulative sum. It's not pretty, but it makes the graph.
library(dplyr)
library(ggplot2)
example.dat <- data.frame (c(1000, 2000, 3000), c("15-10-01", "16-05-01", "16-07-01"), c("16-06-01", "16-10-01", "17-08-01"))
colnames(example.dat) <- c("Population", "Enter.Program", "Leave.Program")
changes = example.dat %>%
select("Population","Date"="Enter.Program") %>%
bind_rows(example.dat %>%
select("Population","Date"="Leave.Program") %>%
mutate(Population = -1*Population)) %>%
mutate(Date = as.Date(Date,"%y-%m-%d")) %>%
arrange(Date) %>%
mutate(Population = cumsum(Population))
for(i in nrow(changes):2){
changes = bind_rows(changes[1:(i-1),],
tibble(Population = changes$Population[i-1],Date = changes$Date[i]),
changes[i:nrow(changes),])
}
ggplot(data = changes,aes(x=Date,y=Population)) +
geom_line()
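Another option, which is not in the original answer, is to let ggplot2 draw the steps itself with geom_step(). Then only the change dates and the running total are needed. A sketch under that assumption; the `steps` and `Change` names are made up here, and `Total.Pop.In` follows the table in the question:

```r
library(dplyr)
library(ggplot2)

example.dat <- data.frame(
  Population = c(1000, 2000, 3000),
  Enter.Program = c("15-10-01", "16-05-01", "16-07-01"),
  Leave.Program = c("16-06-01", "16-10-01", "17-08-01")
)

# Entries count as positive changes, exits as negative ones
steps <- bind_rows(
  transmute(example.dat, Date = Enter.Program, Change = Population),
  transmute(example.dat, Date = Leave.Program, Change = -Population)
) %>%
  mutate(Date = as.Date(Date, "%y-%m-%d")) %>%
  arrange(Date) %>%
  mutate(Total.Pop.In = cumsum(Change))

# geom_step() holds each total until the next change date
p <- ggplot(steps, aes(x = Date, y = Total.Pop.In)) +
  geom_step()
# print(p) to draw the plot
```

The `Total.Pop.In` column reproduces the running totals from the table in the question (1000, 3000, 2000, 5000, 3000, 0).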
I am a bit stuck with some code. Of course I would appreciate a piece of code which sorts my dilemma, but I am also grateful for hints of how to sort that out.
Here goes:
First of all, I installed the packages (ggplot2, lubridate, and openxlsx)
The relevant part:
I extract a file from the website of an Italian gas TSO:
Storico_G1 <- read.xlsx(xlsxFile = "http://www.snamretegas.it/repository/file/Info-storiche-qta-gas-trasportato/dati_operativi/2017/DatiOperativi_2017-IT.xlsx",sheet = "Storico_G+1", startRow = 1, colNames = TRUE)
Then I created a data frame with the variables I want to keep:
Storico_G1_df <- data.frame(Storico_G1$pubblicazione, Storico_G1$IMMESSO, Storico_G1$`SBILANCIAMENTO.ATTESO.DEL.SISTEMA.(SAS)`)
Then change the time format:
Storico_G1_df$pubblicazione <- ymd_h(Storico_G1_df$Storico_G1.pubblicazione)
Now the struggle begins. In this example I would like to chart the two time series with two different y-axes, because their ranges are very different. That is not really a problem as such, because I can achieve it with the melt function and ggplot. However, there are NAs in one column, and I don't know how to work around that. In the incomplete (SAS) column I mainly care about the data point at 16:00, so ideally I would have hourly plots on one chart and only one data point a day on the second chart (at said 16:00). I attached an unrelated example picture of the chart style I mean. In the attached chart, however, I have equally many data points on both charts, so it works fine.
Grateful for any hints.
Take care
library(lubridate)
library(ggplot2)
library(openxlsx)
library(dplyr)
# Use na.strings: missing values appear to be encoded in several ways in the dataset
storico.xl <- read.xlsx(xlsxFile = "http://www.snamretegas.it/repository/file/Info-storiche-qta-gas-trasportato/dati_operativi/2017/DatiOperativi_2017-IT.xlsx",
sheet = "Storico_G+1", startRow = 1,
colNames = TRUE,
na.strings = c("NA","N.D.","N.D"))
#Select and rename the crazy column names
storico.g1 <- data.frame(storico.xl) %>%
select(pubblicazione, IMMESSO, SBILANCIAMENTO.ATTESO.DEL.SISTEMA..SAS.)
names(storico.g1) <- c("date_hour","immesso","sads")
# the date column is in ymd_h format
storico.g1 <- storico.g1 %>% mutate(date_hour = ymd_h(date_hour))
#Not sure exactly what you want to plot, but here is each point by hour
ggplot(storico.g1, aes(x= date_hour, y = immesso)) + geom_line()
# To aggregate per day, group by the date part of date_hour
# (as.Date() on a date-time needs no format string)
# You can check there are 24 points per day with the count column
# Feed the new columns into ggplot
storico.g1 %>%
group_by(date = as.Date(date_hour)) %>%
summarise(count = n(),
daily.immesso = sum(immesso)) %>%
ggplot(aes(x = date, y = daily.immesso)) + geom_line()
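For the part of the question about showing only the 16:00 value of the sparse column, one possible approach is to filter that series down to hour 16 and put the two series in separate panels with free y-scales. This is only a sketch on made-up data: the `toy` data frame stands in for storico.g1, and the panel layout is an assumption about what is wanted.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for storico.g1: three days of hourly data,
# with the second series only filled in at 16:00
toy <- data.frame(
  date_hour = seq(ymd_h("2017-01-01 00"), by = "hour", length.out = 72)
)
toy$immesso <- 100 + seq_len(72)
toy$sads <- ifelse(hour(toy$date_hour) == 16, 5, NA)

# One row per day for the sparse series: the 16:00 observation
sas.16 <- toy %>%
  filter(hour(date_hour) == 16, !is.na(sads))

# Long format: all hourly immesso values plus the daily 16:00 values
long <- bind_rows(
  transmute(toy, date_hour, series = "immesso", value = immesso),
  transmute(sas.16, date_hour, series = "sads", value = sads)
)

# One panel per series, each with its own y-axis
p <- ggplot(long, aes(x = date_hour, y = value)) +
  geom_line() +
  geom_point(data = filter(long, series == "sads")) +
  facet_wrap(~ series, ncol = 1, scales = "free_y")
# print(p) to draw the plot
```

Stacked panels sidestep the need for a secondary y-axis, and the NAs never reach the plot because they are filtered out beforehand.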
Here is what I have:
A data frame which contains a date field, and a number of summary statistics.
Here's what I want:
I want a chart that allows me to compare the time series week over week, to see how the performance of the process this week compares to the previous one, for example.
What I have done so far:
##Get the week day name to display
summaryData$WeekDay <- format(summaryData$Date, format = '%A')
##Get the week number to differentiate the weeks
summaryData$Week <- format(summaryData$Date, format = '%V')
summaryData %>%
ggvis(x = ~WeekDay, y = ~Referrers) %>%
layer_lines(stroke = ~Week)
I expected it to create a chart with multiple coloured lines, each one representing a week in my data set. It does not do what I expect.
Try looking at reshaper to convert your data with a factor variable for each week, or split up the data with a dplyr::lag() command.
A general way of plotting multiple columns in ggvis is to use the following format:
summaryData %>%
ggvis() %>%
layer_lines(x = ~WeekDay, y = ~Referrers)%>%
layer_lines(x=~WeekDay, y= ~Other)
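If you are not tied to ggvis, the week-over-week comparison can also be sketched with ggplot2 (a swapped-in library, not what the question uses). The idea is to make the weekday an ordered factor and to group the lines by week. The `summaryData` below is made up for illustration, and the weekday labels are hard-coded to avoid locale issues.

```r
library(ggplot2)

# Hypothetical stand-in for summaryData: three full ISO weeks of daily values
summaryData <- data.frame(Date = seq(as.Date("2017-01-02"), by = "day", length.out = 21))
summaryData$Referrers <- seq_len(21)

# ISO weekday number (1 = Monday) as an ordered factor, so the x-axis runs Mon..Sun
summaryData$WeekDay <- factor(format(summaryData$Date, "%u"),
                              levels = as.character(1:7),
                              labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

# ISO week number, used to separate and colour the lines
summaryData$Week <- format(summaryData$Date, "%V")

p <- ggplot(summaryData, aes(x = WeekDay, y = Referrers, colour = Week, group = Week)) +
  geom_line()
# print(p) to draw the plot
```

Without the explicit `group = Week`, a discrete x-axis gives each weekday its own group and no lines are drawn between days.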
I hope this helps