Defining X axis by 2 parameters in a scatter plot - r

I am new to R, and I am working on graphing data that is spread out over the years 1963-2014. In my data, I have one column for the year (year), another for a month (month), and another for the concentration of magnesium in the water (Mg).
I am trying to make a scatter plot of how magnesium concentration has changed over time, but if I plot years on the x-axis and magnesium on the y, I end up with 12 points (one for each month) stacked on top of each other for every year. My data is called water2, and it produces
this graph.
Is there a way to ask R to spread these magnesium points out over the months and the years, essentially using two columns to define 1 x-axis? Alternatively, is there a way to create a new column that will define the years and months in one?

# dummy data
data <- data.frame(year = rep(1963:2014, each = 12),
month = rep(1:12, times = 52),
value = cumsum(rnorm(12*52)))
# convert it to a time-series object and plot it :
data.ts <- ts(data$value, start = 1963, frequency = 12)
plot.ts(data.ts, type = "p")
# Or you can ignore the time variables and just make an "index plot" with one variable:
plot(data$value, type = "p", xaxt = "n")
axis(1, at = seq(1, 12*52, by = 12), labels = 1963:2014)
# If you wanna merge year and month and generate a new variable :
data <- within(data, time <- paste(year, month, sep = "-"))
head(data)
year month value time
1 1963 1 -0.56389506 1963-1
2 1963 2 0.60636512 1963-2
3 1963 3 0.04645893 1963-3
4 1963 4 -0.76187300 1963-4
5 1963 5 -1.22781272 1963-5
6 1963 6 -2.33044086 1963-6
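The pasted "1963-1" strings above are plain character values, so they label the rows but cannot be used directly as a continuous x-axis. A minimal sketch of one alternative, assuming the column names year, month and Mg from the question (the date column name and the random Mg values here are made up for illustration), is to build a real Date from the two columns and plot against that:

```r
# Dummy data laid out like water2 in the question (the Mg values are random)
set.seed(1)
water2 <- data.frame(year  = rep(1963:2014, each = 12),
                     month = rep(1:12, times = 52),
                     Mg    = runif(12 * 52))

# Combine year and month into the first day of each month, stored as a Date
water2$date <- as.Date(sprintf("%d-%02d-01", water2$year, water2$month))

# One point per month, spread along a proper time axis
plot(Mg ~ date, data = water2, pch = 19, xlab = "Year")
```

Because date is of class Date, both base plot and ggplot2 treat it as a continuous time axis, so the twelve monthly points no longer stack on top of each other.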

Related

How do I plot 3 variables separately in ggplot?

I want to create a time series plot showing how two variables have changed over time, coloured by region.
I have 2 regions, England and Wales and for each I have calculated the total_tax and the total_income.
I want to plot these on a ggplot over the years, using the years variable.
How would I do this and colour the regions separately?
I have the year variable, which I will put on the x-axis; I then want to plot both incometax and taxpaid on the graph to show how they have both changed over time.
How would I add a third axis to plot how these two variables have changed over time?
I have tried this code, but it has not worked the way I wanted it to.
ggplot(tax_data, filter %>% aes(x=date)) +
geom_line(aes(y=incometax, color=region)) +
geom_line(aes(y=taxpaid, color=region))+
ggplot is a bit hard to grasp at first. I guess you're trying to achieve something like the following.
Assuming your data has one column each for date, incometax and taxpaid, here is an example dataset:
library(tidyverse)
dataset <- tibble(date = seq(from = as.Date("2015-01-01"), to = as.Date("2019-12-31"), by = "month"),
incometax = rnorm(60, 100, 10),
taxpaid = rnorm(60, 60, 5))
Now, to plot one line each for incometax and taxpaid, we need to reshape (or "tidy") the data into long format:
dataset <- dataset %>% pivot_longer(cols = c(incometax, taxpaid))
Now you have three columns like this; the former column names have become the values of the name column:
# A tibble: 6 x 3
date name value
<date> <chr> <dbl>
1 2015-01-01 incometax 106.
2 2015-01-01 taxpaid 56.9
3 2015-02-01 incometax 112.
4 2015-02-01 taxpaid 65.0
5 2015-03-01 incometax 95.8
6 2015-03-01 taxpaid 64.6
This now has the right format for ggplot, and you can map name to the colour of the lines:
ggplot(dataset, aes(x = date, y = value, colour = name)) + geom_line()
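The example above does not include the region column from the question. A hedged sketch of one way to handle it, assuming a data frame with columns year, region, incometax and taxpaid (the numbers below are simulated and the name tax_long is made up), is to map region to colour and the variable name to line type, so no third axis is needed:

```r
library(tidyr)
library(ggplot2)

# Simulated data: two regions, ten years each (values are made up)
set.seed(1)
tax_data <- data.frame(year      = rep(2010:2019, times = 2),
                       region    = rep(c("England", "Wales"), each = 10),
                       incometax = rnorm(20, 100, 10),
                       taxpaid   = rnorm(20, 60, 5))

# Long format: one row per year, region and variable
tax_long <- pivot_longer(tax_data, cols = c(incometax, taxpaid))

# Colour distinguishes the region, line type distinguishes the variable
p <- ggplot(tax_long, aes(x = year, y = value,
                          colour = region, linetype = name)) +
  geom_line()
print(p)
```

This gives four lines in one panel; adding facet_wrap(~ name) would instead give each variable its own panel.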

Importing/Plotting a Time Series in R with two columns

I have RStudio and want to import a time series data set. The column on the x-axis should be the year, however when I use the ts.plot command it just plots Time on the x-axis. How can I make the years from the data set appear on my plot?
The data set is for Water Usage in NYC from 1898 to 1968. There are two columns, The Year and Water Usage.
This is the link to the data I used (I have downloaded the .TSV file):
https://datamarket.com/data/set/22tl/annual-water-use-in-new-york-city-litres-per-capita-per-day-1898-1968#!ds=22tl&display=line
These are the commands for importing my data:
nyc <- read.csv("~/Desktop/annual-water-use-in-new-york-cit.tsv", sep="")
View(nyc)
ts.plot(nyc)
This is what I get:
There are several ways to do this. I used the CSV file from your link in this demonstration.
library(tidyverse)
nyc <- read_csv("annual-water-use-in-new-york-cit.csv")
head(nyc)
# A tibble: 6 x 2
Year `Annual water use in New York city, litres per capita per day, 1898-1968`
<chr> <chr>
1 1898 402.8
2 1899 421.3
3 1900 431.2
4 1901 426.2
5 1902 425.5
6 1903 423.6
Method 1
Create a time series object and plot this time series.
Firstly, let us fix the column name of the annual water use so that it is easier to call in our code.
nyc <- nyc %>%
rename(
water_use = `Annual water use in New York city, litres per capita per day, 1898-1968`
)
Make the time series object nyc.ts with the ts() function.
nyc.ts <- ts(as.numeric(nyc$water_use), start = 1898)
You can then use the generic plot function to plot the time series.
plot(nyc.ts, xlab = "Years")
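The reason the years now appear on the axis is that ts() stores the time index itself, built from start and the frequency. A small sketch using only the first six values shown in the head(nyc) output above illustrates this:

```r
# First six observations from the head(nyc) output above
water_use <- c(402.8, 421.3, 431.2, 426.2, 425.5, 423.6)

# Annual series (frequency defaults to 1) starting in 1898
nyc.ts <- ts(water_use, start = 1898)

# time() returns the index that plot() puts on the x-axis
years <- time(nyc.ts)
print(years)

plot(nyc.ts, xlab = "Years", ylab = "Litres per capita per day")
```

Calling ts.plot() on the whole data frame, as in the question, treats both columns as series to draw against a plain 1, 2, 3, ... index, which is why the years did not show up on the axis.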
Method 2
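Use the autoplot method for ts objects from the forecast package. Note that autoplot is a ggplot2 generic, so the result can be extended with the usual ggplot2 functions.
library(forecast)
autoplot(nyc.ts) + xlab("Years") + ylab("Amount in Litres")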
Method 3
With just ggplot2:
nyc$Year <- as.POSIXct(nyc$Year, format = "%Y")
nyc$water_use <- as.numeric(nyc$water_use)
ggplot(nyc, aes(x = Year, y = water_use)) + geom_line() + xlab("Years") + ylab("Amount in Litres")

How to plot different months as different series in the same graph in R

I have the following dataset
head(Data)
Fecha PriceStats
1 01-2002 45.2071
2 02-2002 46.6268
3 03-2002 48.4712
4 04-2002 53.5067
5 05-2002 55.6527
6 06-2002 57.6684
There's a total of 176 observations.
Every row corresponds to a different month.
I would like to create a graph with the 12 months of the year on the x-axis, where every year of the dataset (containing 12 months each) corresponds to a series in the graph, so that I can plot all the different years overlapping (in this case there would be 15 series).
Do I have to set levels on the dataset, or can ggplot do that directly?
This should do it:
library(ggplot2)
library(lubridate)
Data <- data.frame(date = seq(ymd('2014/01/01'), ymd('2016/12/01'), 30),
n = sample(1:50, 36))
Data$month <- month(Data$date)
Data$year <- year(Data$date)
ggplot(Data, aes(x = month, y = n, group = year)) +
geom_line(aes(colour = as.factor(year)))
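The answer above builds its own dates. For the "MM-YYYY" strings in the Fecha column of the question, a sketch along the same lines (the PriceStats values below are simulated, and 176 monthly rows starting in January 2002 is an assumption based on the question) could extract month and year directly from the text:

```r
library(ggplot2)

# Simulated stand-in for the question's data: 176 monthly rows from 01-2002
set.seed(1)
Data <- data.frame(
  Fecha = format(seq(as.Date("2002-01-01"), by = "month", length.out = 176),
                 "%m-%Y"),
  PriceStats = 45 + cumsum(runif(176))
)

# "MM-YYYY": characters 1-2 are the month, characters 4-7 the year
Data$month <- as.integer(substr(Data$Fecha, 1, 2))
Data$year  <- as.integer(substr(Data$Fecha, 4, 7))

# One line per year, months 1 to 12 on the x-axis
p <- ggplot(Data, aes(x = month, y = PriceStats,
                      group = year, colour = factor(year))) +
  geom_line() +
  scale_x_continuous(breaks = 1:12)
print(p)
```

No factor levels need to be set by hand: grouping by year is enough for ggplot to draw one series per year.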

How to draw time series plot for data in date format in R

I have data where there are dates of visits of children.
date
16.08.13
16.08.13
16.08.13
17.08.13
27.08.13
03.09.13
04.09.13
05.09.13
07.09.13
07.09.13
I want to draw a time series plot in R that shows the dates and corresponding number of visits. For example, above there are 3 children on 16.08.2013.
In addition, my data cover 3 years. So, I would like to see the seasonal change over 3 years.
First let us create a longer data set called r. Use table to compute the frequencies, convert to a zoo time series and plot. Then compute the mean of each year/month and create a monthplot. Finally plot the means over all months vs month.
# test data
set.seed(123)
r <- as.Date("2000-01-01") + cumsum(rpois(1000, 1))
library(zoo)
opar <- par(mfrow = c(2,2)) # create a 2x2 grid of plots - optional
# plot freq vs. time
tab <- table(r)
z <- zoo(c(tab), as.Date(names(tab)))
plot(z) # this will be the upper left plot
# plot each month separately
zm <- aggregate(z, as.yearmon, mean)
monthplot(zm) # upper right plot
# plot month means
# zc <- aggregate(zm, cycle(zm), mean) # alternative but not equivalent
zc <- aggregate(z, cycle(as.yearmon(time(z))), mean)
plot(zc) # lower plot
par(opar) # reset grid
Note: The sum of z for each year/month is zym and the average of those for all the January months, all the February months, ...., all December months is:
zym <- aggregate(z, as.yearmon(time(z)), sum)
aggregate(zym, cycle(as.yearmon(time(zym))), mean)
With the ggplot2 and scales packages you can try something like this (which is a piece of my own code that actually works):
library(ggplot2)
library(lubridate)
library(scales)
g_sm_ddply <- ggplot(final_data, aes(x = as.Date(dates), y = scon_me, fill = tipo))
g_sm_ddply + geom_bar(position = "dodge", stat = "identity") +
labs(title = "AVERAGE RECEIPT ACQ_ISS_KPMG NEW CLUSTERING", x = "date", y = "average receipt")+
scale_x_date(breaks = date_breaks("month"), labels = date_format("%Y/%m"))
I assume that you are already familiar with basic data manipulation in R.
One way to do what you want is to tabulate the date vector and create a proper time series object or a data.frame:
df <- as.data.frame(table(date)) ### tabulate
df$date <- as.Date(df$date, "%d.%m.%y") ### turn your date to Date class
df
## date Freq
## 1 2013-09-03 1
## 2 2013-09-04 1
## 3 2013-09-05 1
## 4 2013-09-07 2
## 5 2013-08-16 3
## 6 2013-08-17 1
## 7 2013-08-27 1
plot(Freq ~ date, data = df, pch = 19) ### plot
So far we are still missing the seasonal trend analysis the OP asked for. I think it is the more difficult part of the question.
If your data covers only 3 years, you may be able to observe the seasonal changes by simply looking at the monthly average of daily visits.
Depending on your needs, you can go with a simple monthly plot, or you might have to prepare your data further to compute the exact seasonal trend.
Below is a suggestion for computing and plotting the monthly average number of visits per day (counting only days with at least one visit):
library(ggplot2)
df<-read.table(text="
16.08.13
16.08.13
16.08.13
17.08.13
27.08.13
03.09.13
04.10.13
05.09.13
07.09.13
07.01.14
03.02.14
04.03.14
04.03.14
04.03.14
15.05.14
15.05.14
15.09.14
20.10.14
20.09.14 ", col.names="date")
df <- as.data.frame(table(day = df$date), stringsAsFactors = FALSE) # get the frequency count (daily)
df$date <- as.Date(df$day, "%d.%m.%y") # turn the date variable into Date class
df$year <- as.integer(format(df$date, "%Y")) # extract year of the visit
df$month <- as.integer(format(df$date, "%m")) # extract month of the visit
# plot daily frequency
ggplot(aes(x=date, y=Freq), data = df) +
geom_bar(stat = 'identity', position = 'dodge')+
ggtitle("Daily visits")
# compute monthly average visits per day (for days with at least one visit)
library(dplyr)
df2 <- df %>%
group_by(year, month) %>%
summarise(Freq = mean(Freq, na.rm = TRUE)) %>%
ungroup()
# recreate a date (first day of each month) for the graph
df2$date <- as.Date(sprintf("%d-%02d-01", df2$year, df2$month))
ggplot(aes(x=date, y=Freq), data = df2) +
geom_bar(stat = 'identity', position = 'dodge')+
ggtitle("Average daily visits per month")
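For a quick look at seasonality without extra packages, the same idea can be sketched in base R. This uses the ten sample dates from the question and counts total visits per calendar month (rather than the per-day average computed above); the variable names are made up:

```r
# The ten sample visit dates from the question, parsed as day.month.2-digit-year
visits <- as.Date(c("16.08.13", "16.08.13", "16.08.13", "17.08.13", "27.08.13",
                    "03.09.13", "04.09.13", "05.09.13", "07.09.13", "07.09.13"),
                  format = "%d.%m.%y")

# Label each visit with its year-month and count visits per label
monthly <- table(format(visits, "%Y-%m"))
print(monthly)

# Bar chart of visits per month
barplot(monthly, xlab = "Month", ylab = "Number of visits")
```

With three years of data, the bars for the same calendar month in different years can be compared directly to spot a seasonal pattern. Months with no visits at all do not appear in the table.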

Selecting and plotting months in ggplot2

I have a time series dataset with two columns: date (e.g. Jan 1980, Feb 1980, ..., Dec 2013) and its corresponding temperature. The dataset runs from 1980 to 2013. I am trying to subset and plot the time series in ggplot for each month separately (e.g. I want only the February values so that I can plot them using ggplot). I tried the following, but Feb1 is empty:
Feb1 <- subset(temp, date ==5)
The structure of my dataset is:
'data.frame': 408 obs. of 2 variables:
$ date :Class 'yearmon' num [1:359] 1980 1980 1980 1980 1980 ...
$ temp: int 16.9 12.7 13 6 6.0 5 6 10.9 0.9 16 ...
What about this?:
library(zoo)
library(ggplot2)
# Generating some data:
df <- data.frame(date = as.yearmon("1980-01") + 0:407/12, val = rnorm(408))
# Subsetting to get a specific month:
df.sub <- subset(df, format(df$date,"%b")=="Jan")
# The actual plot:
ggplot(df.sub) + geom_line(aes(x = as.Date(date), y = val))
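I believe a column of class 'yearmon' prints in the format "Mon YYYY" (e.g. "Feb 1980"), so its first three characters are the month abbreviation. Internally it is stored as a number (the year plus a fraction for the month), which is why subsetting with 'date==5' returns nothing. Below I try a method.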
temp$month<-substr(temp$date,1,3)
Feb1<-subset(temp,month=='Feb')
#more elegant
Feb1<-subset(temp,substr(temp$date,1,3)=='Feb')
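Matching on the month abbreviation depends on the locale's month names. A locale-independent sketch, assuming the zoo package and a yearmon date column as in the question (the temperatures below are simulated), uses cycle(), which returns the month number of each yearmon value:

```r
library(zoo)

# Simulated stand-in for the question's data: 408 months from Jan 1980
set.seed(1)
temp <- data.frame(date = as.yearmon(1980 + 0:407/12),
                   temp = rnorm(408, mean = 10, sd = 5))

# yearmon values are stored as year + (month - 1)/12, so none of them equals 5
print(as.numeric(temp$date[1:3]))

# cycle() gives the month number (1 = January, ..., 12 = December)
Feb1 <- subset(temp, cycle(date) == 2)
print(head(Feb1))
```

Feb1 then holds one row per year and can be passed to ggplot with as.Date(date) on the x-axis, as in the other answers.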
You can also directly plot the subset in ggplot2 without creating a new data frame.
Based on RStudent's solution:
library(zoo)
# Generating some data:
df <- data.frame(date = as.yearmon("1980-01") + 0:407/12, val = rnorm(408))
library(ggplot2)
ggplot(df[format(df$date,"%b")=="Jan", ], aes(x = as.Date(date), y = val))+
geom_line()
Convert the data to zoo, use cycle to split into months and autoplot.zoo to plot. Below we show four different ways to plot. First we plot just January. Then we plot all the months with each month in a separate panel and then we plot all months with each month as a separate series all in the same panel. Finally we use monthplot (not ggplot2) to plot them all in a single panel in a different manner.
library(zoo)
library(ggplot2)
# test data
set.seed(123)
temp <- data.frame(date = as.yearmon(1980 + 0:479/12), value = rnorm(480))
z <- read.zoo(temp, FUN = identity) # convert to zoo
# split into 12 series and cbind them together so zz480 is 480 x 12
# Then aggregate to zz which is 40 x 12
zz480 <- do.call(cbind, split(z, cycle(z)))
zz <- aggregate(zz480, as.numeric(trunc(time(zz480))), na.omit)
### now we plot this 4 different ways
#####################################
# 1. plot just January
autoplot(zz[, 1]) + ggtitle("Jan")
# 2. plot each in separate panel
autoplot(zz)
# 3. plot them all in a single panel
autoplot(zz, facets = NULL)
# 4. plot them all in a single panel in a different way (not using ggplot2)
monthplot(z)
Note that an alternative way to calculate zz would be:
zz <- zoo(matrix(coredata(z), 40, 12, byrow=TRUE), unique(as.numeric(trunc(time(z)))))
Update: Added plot types and improved the approach.
