I have a problem coding a plot for a large data set.
view(data)
Year Month Deaths
1998 1 200
1998 2 40
1998 3 185
1998 4 402
1998 5 20
1998 6 48
1998 7 290
1998 8 15
1998 9 252
1998 10 409
1998 11 233
1998 12 122
My data runs until 2014. I would like to plot it as a time series. The x-axis should be labelled only every 5 years, and the y-axis should show the deaths for every month across all the years. How can I code that?
I am not sure this is right because I had no data to test it with. I adapted it from a programming book:
library(ggplot2)
# Build a Date from Year and Month, using the first day of each month
data$date <- as.Date(paste(data$Year, data$Month, 1), format = "%Y %m %d")
ggplot(data, aes(x = date, y = Deaths)) +
  geom_line() +
  ggtitle("Time series") +
  xlab("Year") +
  ylab("Deaths")
Update: if you want a labelled break for every year and minor breaks for every month, you can add
scale_x_date(date_breaks = "year", date_labels = "%Y", date_minor_breaks = "month")
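Since the question asks for a label only every five years, the same scale can be given a wider break interval. The following is a minimal sketch, not a tested solution for the real data: the Deaths values are randomly generated placeholders, and only the Year/Month/Deaths column names are taken from the question.

```r
library(ggplot2)

# Placeholder data: monthly counts for 1998-2014 (the values are made up)
set.seed(1)
data <- expand.grid(Month = 1:12, Year = 1998:2014)
data$Deaths <- rpois(nrow(data), lambda = 200)

# First day of each month as a Date
data$date <- as.Date(paste(data$Year, data$Month, 1), format = "%Y %m %d")

p <- ggplot(data, aes(x = date, y = Deaths)) +
  geom_line() +
  # One labelled break every five years, showing the year only
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  labs(title = "Time series", x = "Year", y = "Deaths")
print(p)
```

Every monthly value is still drawn; only the axis labelling becomes coarser.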
I am trying to animate some plots in R using ggplot2 and gganimate.
I read some tutorials and successfully animated the gapminder data.
However, I have problems when I try to animate my own dataset. First I created the plot and tried to animate it like this:
animate_data_1 <- ggplot(data_1,
aes(x = Month, y = rain, colour = factor(Year))) +
geom_line(stat = "identity") +
labs(x = "Month", y = "Rain") +
geom_point(show.legend = FALSE, alpha = 0.7) +
scale_color_viridis_d()
animate_data_1
animate_data_1 + transition_time(Year) + labs(title = "Year: {frame_time}")
The above code creates the animated plot, but it shows one year at a time and then switches to the next year. Instead, I want the values for all years to appear gradually (growing from start to end), like the example in which the data gradually appears.
So, I made this change:
animate_data_1 +
geom_point(aes(group = seq_along(Month))) +
transition_reveal(Year) +
labs(title = "Year: {frame_time}")
Now it shows this error:
Error: Provided file does not exist In addition: There were 50 or more warnings (use warnings() to see the first 50)
What is the problem? How can I solve this?
My data:
tem Month Year rain
1 16.9760 1 1901 18.53560
2 19.9026 2 1901 16.25480
3 24.3158 3 1901 70.79810
4 28.1834 4 1901 66.16160
5 27.8892 5 1901 267.21500
6 28.8925 6 1901 341.04200
7 28.3327 7 1901 540.90700
8 27.9243 8 1901 493.21000
9 27.6057 9 1901 291.54900
10 27.0887 10 1901 199.17100
11 22.1671 11 1901 126.28500
12 18.5574 12 1901 1.69035
13 18.5455 1 1902 1.29152
14 20.1252 2 1902 0.14722
15 25.5508 3 1902 62.76860
16 26.5562 4 1902 229.58900
17 27.3165 5 1902 302.19700
18 28.2660 6 1902 528.77500
19 27.6247 7 1902 415.25700
20 28.1001 8 1902 435.16600
21 27.7271 9 1902 282.87200
22 26.0153 10 1902 76.65180
Try this, although the time frame is not shown in the title. Note, however, that this code does not give the gradually growing plot; it shows the years one after another:
animate_data_1 +
geom_point(aes(group = seq_along(Year)))+
transition_reveal(Year)
To get movement for each month, define a new variable that combines year and month, as shown below, and pass it to transition_time. You can change the fps value to speed up or slow down the animation.
data_1$Yearm <- data_1$Year + data_1$Month*0.08
p <- animate_data_1 + transition_time(Yearm) +
labs(title = "Year: {frame_time}")
animate(p, fps=5)
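To see what the combined variable looks like, here is a small base R sketch. The data frame only mimics the Year/Month layout of the question's data, and Yearm_exact is a hypothetical alternative name, not part of the answer above. The 0.08 factor is a rounded stand-in for 1/12, so the months are almost but not exactly evenly spaced across year boundaries.

```r
# Two years of Year/Month pairs in the same layout as the question's data
data_1 <- data.frame(
  Year  = rep(c(1901, 1902), each = 12),
  Month = rep(1:12, times = 2)
)

# Variable from the answer: month 1 -> +0.08, month 12 -> +0.96
data_1$Yearm <- data_1$Year + data_1$Month * 0.08

# Hypothetical alternative with evenly spaced months (January = .00)
data_1$Yearm_exact <- data_1$Year + (data_1$Month - 1) / 12

# Both variables increase strictly with time, which is what the
# transition needs in order to move month by month
all(diff(data_1$Yearm) > 0)
all(diff(data_1$Yearm_exact) > 0)
```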
I am trying to plot the monthly rainfall data from 1986 to 2016 using ggplot. My dataframe looks like this:
head(df)
Year Month Station Rainfall Remarks
1 1986 Jan stn1 0.0 Observed
2 1986 Feb stn1 10.4 Observed
3 1986 Mar stn1 16.5 Estimated
4 1986 Apr stn1 34.0 Observed
5 1986 May stn1 27.0 Observed
6 1986 Jun stn1 159.4 Observed
str(df)
'data.frame': 1488 obs. of 5 variables:
$ Year : chr "1986" "1986" "1986" "1986" ...
$ Month : Ord.factor w/ 12 levels "Jan"<"Feb"<"Mar"<..: 1 2 3 4 5 6 7 8 9 10 ...
$ Station : Factor w/ 4 levels "stn1","stn2",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Rainfall: num 0 10.4 16.5 34 27 ...
$ Remarks : Factor w/ 2 levels "Estimated","Observed": 2 2 1 2 2 2 2 2 2 2 ...
I tried the following code:
library(ggplot2)
ggplot(df, aes(x=Year, y=Rainfall, col=Station)) + geom_line()
However, the above code produces a plot of vertical lines, while I want smoothly varying lines.
I want to plot all four stations (stn1 to stn4) such that the color of each line is based on df$Remarks.
Also, is it possible to have a unique color for each station?
Your help would be appreciated.
Here is one approach if you create a month-year variable:
library(ggplot2)
library(zoo)
df$Mo_Yr <- as.yearmon(paste0(df$Year, '-', df$Month), "%Y-%b")
ggplot(df, aes(x=Mo_Yr, y=Rainfall, col=Station)) +
geom_line() +
scale_x_yearmon()
If you want to use different color points for Remarks (Observed and Estimated), for a single Station, you could try the following:
ggplot(df, aes(x=Mo_Yr, y=Rainfall)) +
geom_point(aes(col = Remarks)) +
geom_line() +
scale_x_yearmon()
If you want to plot 2 lines for Observed and Estimated, you could add col argument to geom_line as below. Note I added some example data to illustrate. Depending on what data you have available this may (or may not) be what you need.
ggplot(df, aes(x=Mo_Yr, y=Rainfall)) +
geom_line(aes(col=Remarks)) +
scale_x_yearmon()
Data (for last example)
df <- read.table(text =
"Year Month Station Rainfall Remarks
1986 Jan stn1 0.0 Observed
1986 Feb stn1 10.4 Observed
1986 Mar stn1 16.5 Estimated
1986 Apr stn1 34.0 Observed
1986 May stn1 27.0 Observed
1986 Jun stn1 159.4 Observed
1986 Jul stn1 83.1 Estimated
1986 Aug stn1 55.7 Observed
1986 Sep stn1 12.3 Estimated", header = T, stringsAsFactors = T)
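A variant of the month-year idea that avoids the zoo dependency is to build a regular Date column. This is only a sketch: the stn2 rows are made up so that two stations are visible, and showing Remarks through the point shape is one possible reading of the request, not the only one.

```r
library(ggplot2)

# A few rows in the shape of the question's data (the stn2 values are made up)
df <- data.frame(
  Year     = rep("1986", 8),
  Month    = rep(c("Jan", "Feb", "Mar", "Apr"), times = 2),
  Station  = rep(c("stn1", "stn2"), each = 4),
  Rainfall = c(0.0, 10.4, 16.5, 34.0, 2.1, 8.7, 20.3, 29.9),
  Remarks  = c("Observed", "Observed", "Estimated", "Observed",
               "Observed", "Estimated", "Observed", "Observed")
)

# match() against month.abb avoids locale-dependent parsing of "%b"
df$date <- as.Date(sprintf("%s-%02d-01", df$Year,
                           match(as.character(df$Month), month.abb)))

p <- ggplot(df, aes(x = date, y = Rainfall)) +
  geom_line(aes(colour = Station)) +          # one colour per station
  geom_point(aes(shape = Remarks), size = 2)  # shape marks Observed vs Estimated
print(p)
```

Because the x variable is now a date rather than a character year, each month gets its own position and the vertical lines disappear.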
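You might want to add a stat_smooth layer to draw a smoothed trend. Year is stored as character in the question's data, so it is converted to numeric here:
ggplot(df) +
  geom_line(aes(y = Rainfall, x = as.numeric(Year), color = Station)) +
  stat_smooth(aes(y = Rainfall, x = as.numeric(Year)), method = lm, formula = y ~ poly(x, 10), se = FALSE)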
I have a large dataframe (CO2_df) with many years for many countries, and tried to plot a graph with ggplot2. This graph will have 6 curves + an aggregate curve. However, my graph looks pretty "funny" and I have no idea why.
The data looks like this (excerpt):
x y x1 x2 x4 x6
1553 1993 0.00000 CO2 Austria 6 6 - Other Sector
1554 2006 0.00000 CO2 Austria 6 6 - Other Sector
1555 2015 0.00000 CO2 Austria 6 6 - Other Sector
2243 1998 12.07760 CO2 Austria 5 5 - Waste management
2400 1992 11.12720 CO2 Austria 5 5 - Waste management
2401 1995 11.11040 CO2 Austria 5 5 - Waste management
2402 2006 10.26000 CO2 Austria 5 5 - Waste management
2489 1998 0.00000 CO2 Austria 6 6 - Other Sector
I have used this code:
ggplot(data=CO2_df, aes(x=x, y=y, group=x6, colour=x6)) +
geom_line() +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("C02 Emissions") +
labs(colour = "Sectors") +
scale_color_brewer(palette="Dark2")
CO2_df %>%
group_by(x) %>%
mutate(sum.y = sum(y)) %>%
ggplot(aes(x=x, y=y, group=x6, colour=x6)) +
geom_line() +
geom_point() +
ggtitle("Austria") +
xlab("Year") +
ylab("C02 Emissions") +
labs(colour = "Sectors")+
scale_color_brewer(palette="Dark2")+
geom_line(aes(y = sum.y), color = "black")
My questions
1) Why does it look like this and how can I solve it?
2) I have no idea why the values on the y axis are close to zero. They are not...
3) How can I add an entry to the legend for the aggregate line?
Thank you for any sort of help!
Nordsee
What about something like this:
library(dplyr)
library(ggplot2)
CO2_df %>% # data
  group_by(x, x6) %>% # group by year and sector
  summarise(y = sum(y)) %>% # sum per year and sector
  ggplot(aes(x = x, y = y)) + # plot
  geom_line(aes(group = x6, color = x6)) + # lines take color, not fill
  # here you can put a summary line, like sum, or mean, and so on
  # (the argument is `fun` in ggplot2 >= 3.3.0; older versions call it `fun.y`)
  stat_summary(fun = sum, na.rm = TRUE, color = 'black', geom = 'line') +
  geom_point() +
  ggtitle("Austria") +
  xlab("Year") +
  ylab("C02 Emissions") +
  labs(colour = "Sectors") +
  scale_color_brewer(palette = "Dark2")
To make the behaviour easy to see, I used modified data in which both sectors share the same years and have very different values:
CO2_df <- read.table(text ="
x y x1 x2 x4 x6
1553 1993 20 CO2 'Austria' 6 '6 - Other Sector'
1554 1994 23 CO2 'Austria' 6 '6 - Other Sector'
1555 1995 43 CO2 'Austria' 6 '6 - Other Sector'
2243 1993 12.07760 CO2 'Austria' 5 '5 - Waste management'
2400 1994 11.12720 CO2 'Austria' 5 '5 - Waste management'
2401 1995 11.11040 CO2 'Austria' 5 '5 - Waste management'
2402 1996 10.26000 CO2 'Austria' 5 '5 - Waste management'
2489 1996 50 CO2 'Austria' 6 '6 - Other Sector'", header = T)
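For the third question (a legend entry for the aggregate line), one possible approach is to append the per-year totals as an extra "sector" so that ggplot2 builds the legend entry itself. This is a sketch under assumptions: it uses only the x, y and x6 columns of the modified data above, and the label "Total" and the hard-coded colours (the first two Dark2 colours plus black) are illustrative choices.

```r
library(ggplot2)

# x, y and x6 columns of the modified example data
CO2_df <- data.frame(
  x  = rep(1993:1996, times = 2),
  y  = c(20, 23, 43, 50, 12.0776, 11.1272, 11.1104, 10.26),
  x6 = rep(c("6 - Other Sector", "5 - Waste management"), each = 4)
)

# Per-year totals, labelled as an additional series
totals <- aggregate(y ~ x, data = CO2_df, FUN = sum)
totals$x6 <- "Total"
plot_df <- rbind(CO2_df, totals[, c("x", "y", "x6")])

p <- ggplot(plot_df, aes(x = x, y = y, colour = x6)) +
  geom_line() +
  geom_point() +
  scale_colour_manual(values = c("5 - Waste management" = "#1B9E77",
                                 "6 - Other Sector"     = "#D95F02",
                                 "Total"                = "black")) +
  labs(title = "Austria", x = "Year", y = "CO2 Emissions", colour = "Sectors")
print(p)
```

Because the total is just another level of the colour variable, it appears in the legend without any manual legend editing.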
I'm a new R user and I need help with a time series plot. I created a time series plot, but cannot figure out how to change my x-axis values to correspond to my sample dates. My data is as follows:
Year Month Level
2009 8 350
2009 9 210
2009 10 173
2009 11 166
2009 12 153
2010 1 141
2010 2 129
2010 3 124
2010 4 103
2010 5 69
2010 6 51
2010 7 49
2010 8 51
2010 9 51
Let's say this data is saved under the name "data.csv":
data = read.table("data.csv", sep = ",", header = T)
data.ts = ts(data, frequency = 1)
plot(data.ts[, 3], ylab = "level", main = "main", axes = T)
I've also tried passing start = c(2009, 8) to the ts function, but I still get the wrong values.
When I plot this, my x-axis does not correspond to August 2009 through September 2010. It either increases by year or by decimal. I've looked at many examples online and at the ? help in R, but cannot find a way to relabel my axis values. Any help would be appreciated.
Using base R, you can accomplish this in a few steps. As described in this SO answer, you can turn your "Month" and "Year" columns into a date by combining the as.Date and paste functions and supplying a day (here the first day of the month, "1"). For the purposes of this answer, I will simply refer to the data you provided as df:
df$date<-with(df,as.Date(paste(Year,Month,'1',sep='-'),format='%Y-%m-%d'))
df
Year Month Level date
1 2009 8 350 2009-08-01
2 2009 9 210 2009-09-01
3 2009 10 173 2009-10-01
4 2009 11 166 2009-11-01
5 2009 12 153 2009-12-01
6 2010 1 141 2010-01-01
7 2010 2 129 2010-02-01
8 2010 3 124 2010-03-01
9 2010 4 103 2010-04-01
10 2010 5 69 2010-05-01
11 2010 6 51 2010-06-01
12 2010 7 49 2010-07-01
13 2010 8 51 2010-08-01
14 2010 9 51 2010-09-01
Then you can use your basic plot, axis, and mtext functions to control how you want to visualize the data and your axes. For instance:
xmin<-min(df$date,na.rm=T);xmax<-max(df$date,na.rm=T) #ESTABLISH X-VALUES (MIN & MAX)
ymin<-min(df$Level,na.rm=T);ymax<-max(df$Level,na.rm=T) #ESTABLISH Y-VALUES (MIN & MAX)
xseq<-seq.Date(xmin,xmax,by='1 month') #CREATE DATE SEQUENCE THAT INCREASES BY MONTH FROM DATE MINIMUM TO MAXIMUM
yseq<-round(seq(0,ymax,by=50),0) # CREATE SEQUENCE FROM 0-350 BY 50
par(mar=c(1,1,0,0),oma=c(6,5,3,2)) #CONTROLS YOUR IMAGE MARGINS
plot(Level~date,data=df,type='b',ylim=c(0,ymax),axes=F,xlab='',ylab='');box() #PLOT LEVEL AS A FUNCTION OF DATE, REMOVE AXES FOR FUTURE CUSTOMIZATION
axis.Date(side=1,at=xseq,format='%Y-%m',labels=T,las=3) #ADD X-AXIS LABELS WITH "YEAR-MONTH" FORMAT
axis(side=2,at=yseq,las=2) #ADD Y-AXIS LABELS
mtext('Date (Year-Month)',side=1,line=5) #X-AXIS LABEL
mtext('Level',side=2,line=4) #Y-AXIS LABEL
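If you would rather stay with a ts object, the likely missing piece is frequency = 12 together with start = c(2009, 8); with frequency = 1 each observation is treated as one year. A minimal sketch, with the Level values typed in from the question rather than read from data.csv:

```r
# Level values from the question, August 2009 to September 2010
level <- c(350, 210, 173, 166, 153, 141, 129, 124, 103, 69, 51, 49, 51, 51)

# Monthly series: 12 observations per year, starting in month 8 of 2009
level_ts <- ts(level, start = c(2009, 8), frequency = 12)

start(level_ts)  # year and month of the first observation
end(level_ts)    # year and month of the last observation

plot(level_ts, ylab = "Level", main = "Monthly level")
```

The default axis is still labelled in decimal years, so the axis.Date approach above remains the way to get "Year-Month" labels.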
library(data.table)
library(ggplot2)
library(scales)
data<-data.table(datetime=seq(as.POSIXct("2009/08/01",format="%Y/%m/%d"),
as.POSIXct("2010/09/01",format="%Y/%m/%d"),by="1 month"),
Level=c(350,210,173,166,153,141,129,124,103,69,51,49,51,51))
ggplot(data)+
geom_point(aes(x=datetime,y=Level),col="brown1",size=1)+
scale_x_datetime(labels = date_format("%Y/%m"),breaks = "1 month")+
theme(axis.text.x = element_text(angle = 90, hjust = 1,vjust=0.3))
Example using xts package:
library(xts)
ts1 <- xts(data$Level, as.POSIXct(sprintf("%d-%d-01", data$Year, data$Month)))
# or ts1 <- xts(data$Level, as.yearmon(data$Year + (data$Month-1)/12))
plot(ts1)
If you are using ggplot2:
library(ggplot2)
autoplot(ts1)
I'm a ggplot2 newbie and have a rather simple question regarding time-series plots.
I have a data set in which the data is structured as follows.
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9
How do I generate a time series plot when the data is structured in this format?
With the reshape package, I could just alter the data to look like:
totmidc <- melt(totmidb, id="Area")
totmidc
Area variable value
1 MIDWEST 1998 10
2 MIDWEST 1999 6
3 MIDWEST 2000 13
4 MIDWEST 2001 14
5 MIDWEST 2002 12
6 MIDWEST 2003 8
7 MIDWEST 2004 10
8 MIDWEST 2005 10
9 MIDWEST 2006 6
10 MIDWEST 2007 9
Then run the following code to get the desired plot.
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
However, is it possible to generate a time series plot from the first object, in which the columns represent the years?
What is the error that ggplot2 gives you? The following seems to work on my machine:
Area <- as.numeric(unlist(strsplit("1998 1999 2000 2001 2002 2003 2004 2005 2006 2007", "\\s+")))
MIDWEST <-as.numeric(unlist(strsplit("10 6 13 14 12 8 10 10 6 9", "\\s+")))
qplot(Area, MIDWEST, geom = "line") + xlab("") + ylab("")
#Or in a dataframe
df <- data.frame(Area, MIDWEST)
qplot(Area, MIDWEST, data = df, geom = "line") + xlab("") + ylab("")
You may also want to check out the ggplot2 website for details on scale_date et al.
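ggplot2 generally expects one row per observation, so the wide object still has to be reshaped at some point, but that can be done in a couple of lines of base R without the reshape package. The sketch below assumes the wide object is called totmidb, as in the question, with year-named columns next to Area; the name long is made up for illustration.

```r
library(ggplot2)

# Wide data as in the question; check.names = FALSE keeps "1998" etc. as column names
totmidb <- read.table(text = "
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9", header = TRUE, check.names = FALSE)

year_cols <- setdiff(names(totmidb), "Area")

# unlist() walks the data frame column by column, so the years are
# repeated once per row and the areas once per year column
long <- data.frame(
  Area  = rep(totmidb$Area, times = length(year_cols)),
  year  = rep(as.numeric(year_cols), each = nrow(totmidb)),
  value = unlist(totmidb[year_cols], use.names = FALSE)
)

p <- ggplot(long, aes(x = year, y = value, group = Area)) +
  geom_line() + xlab("") + ylab("")
print(p)
```

Converting the column names to numeric years gives a continuous x-axis, so geom_line connects the points without extra grouping tricks.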
I am guessing that with "time series plot" you mean you want to get a bar chart instead of a line chart?
In that case, you only have to modify your code slightly to pass the correct parameters to geom_bar(). The geom_bar default stat is stat_count (stat_bin in older versions of ggplot2), which calculates a frequency count of your categories on the x-scale. With your data you want to override this behaviour and use stat_identity.
library(ggplot2)
# Recreate data
totmidc <- data.frame(
Area = rep("MIDWEST", 10),
variable = 1998:2007,
value = round(runif(10)*10+1)
)
# Line plot
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
# Bar plot
# Note that the parameter stat="identity" passed to geom_bar()
ggplot(totmidc, aes(x=variable, y=value)) + geom_bar(stat="identity") + xlab("") + ylab("")
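In ggplot2 2.2.0 and later, geom_col() is a shorthand for geom_bar(stat = "identity"). A short sketch, using the values from the question instead of random numbers:

```r
library(ggplot2)

# Long-format data with the values from the question
totmidc <- data.frame(
  Area     = rep("MIDWEST", 10),
  variable = 1998:2007,
  value    = c(10, 6, 13, 14, 12, 8, 10, 10, 6, 9)
)

# geom_col() uses the y values as bar heights directly
p_col <- ggplot(totmidc, aes(x = variable, y = value)) +
  geom_col() + xlab("") + ylab("")
print(p_col)
```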