R: reshape and dcast confusion - r

Having worked on this for a few hours and looked through all the dcast answers and tutorials I could find, I think I'd better ask instead.
Example data first:
xx = data.frame(SOIL = c(rep("kraz", 20), (rep("sodo", 20))),
DM = runif(0, 20,n=40),
cutnum = c(rep(1:4,10)))
Now, the required operation. I'd like to end up with a table with Soil names on the row, and Cutnum as column names, with the DM values in the columns under the cutnumbers.
# Soil 1 2 3 4
# kraz 1.2 19 12.1 9.9
# kraz 15.3 4.5 9.2 12.1
# kraz 14 15.2 5.2 15.4
# kraz 18.5 0.7 14.3 5
# kraz 17.1 15.8 2.9 9.5
# kraz 13 14.4 4.9 8.6
# kraz 3 10.2 3.5 14
# kraz 17.7 8.6 10.6 16.1
# kraz 12.6 1.7 2.2 17.5
# kraz 3.8 16.7 4.8 0.4
# kraz 4.1 17.1 12.5 14.5
# kraz 17.8 5.2 11.2 9.5
# kraz 12.3 2.2 4.8 8.7
# kraz 7.3 3 10.2 1.6
# kraz 11.3 12.2 13.4 10.2
# kraz 7.5 15.9 8.9 18.3
# kraz 15 5 19.6 16.5
# sodo 8.4 2.6 18.3 15.1
# sodo 6.9 19.7 6.5 8.4
# sodo 4 6.5 4.2 11.9
# sodo 0.8 12 18.3 15.4
# sodo 7.2 11.9 6.7 4.7
# sodo 2.6 4.4 13.8 13.7
# sodo 11.3 16.4 12.3 9.6
# sodo 5.6 17.1 11.4 16.7
# sodo 10.4 4.7 5.7 10.6
# sodo 8.7 5.6 1.1 4.8
# sodo 19.2 14.8 7 7
# sodo 18.6 9 14.9 5
# sodo 4.3 2.4 0.3 11.1
# sodo 4.9 18.4 19.5 9.7
# sodo 18.8 3.3 15.9 12.7
# sodo 19.7 0.1 13.6 3
# sodo 11.3 11.1 6.6 9.5
# sodo 8.1 11.3 10.1 3.5
# sodo 14.1 13.5 0.5 17.2
# sodo 16.8 15.6 16.2 17.3
I've tried:
require(reshape2)
dcast(xx, formula = SOIL ~ Cutnum, value.var = xx$DM)
Which produces the following:
Error: value.var (18.943128376267911.662011714652217.372458190657214.7615862498069.991016136482364.527641483582569.107870771549641.0582387680187810.695438273251115.275471545755917.17561007011680.9180781804025171.6045100009068812.556012449786118.57340626884257.465867823921144.576288489624868.3055530954152315.88032039348041.3668353855609916.888091783039310.544018512591714.95763068087410.46029578894380.95387450419366410.41133180726329.8472668603062618.449066961184111.24195748940114.0428098617121613.89849389437598.8408243283629416.336669707670818.53340925183156.113082133233555.875797253102060.06016504485160119.295095484703784.955938421189798.97169086616486) not found in input
In addition: Warning message:
In if (!(value.var %in% names(data))) { :
the condition has length > 1 and only the first element will be used
I'd greatly appreciate a suggestion that will get a more useful result.

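Based on the comments, it seems the OP needs a wide format that keeps the rows in their original order, with no aggregation function. The error itself arises because value.var must be the column name given as a string ("DM"), not the column xx$DM, and the formula must use the actual column name, cutnum.
This should do it: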
set.seed(123)
xx = data.frame(SOIL = c(rep("kraz", 20), (rep("sodo", 20))),
DM = runif(0, 20,n=40),
cutnum = c(rep(1:4,10)))
require(reshape2)
xx$t <- rep(1:10, each=4) # Add column to identify subset
dcast(xx, SOIL+t~cutnum, value.var="DM")[, -2] # Remove new column
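For comparison, the same reshaping can be sketched in base R without reshape2. This is only an illustration, not the answerer's method: rep_id is a hypothetical helper column, built with ave() so that it numbers the repeats within each SOIL/cutnum pair instead of relying on the rows arriving in blocks of four.

```r
set.seed(123)
xx <- data.frame(SOIL = rep(c("kraz", "sodo"), each = 20),
                 DM = runif(40, min = 0, max = 20),
                 cutnum = rep(1:4, times = 10))

# Number the repeats within each SOIL/cutnum combination (1..5 here)
xx$rep_id <- ave(seq_len(nrow(xx)), xx$SOIL, xx$cutnum, FUN = seq_along)

# One row per SOIL/rep_id, one DM column per cutnum (DM.1 ... DM.4)
wide <- reshape(xx, idvar = c("SOIL", "rep_id"), timevar = "cutnum",
                v.names = "DM", direction = "wide")
wide$rep_id <- NULL  # drop the helper column again
wide
```

Because no two rows share the same SOIL, rep_id and cutnum, nothing has to be aggregated, which is the same reason the extra t column makes the dcast call above work.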

Related

How can I export THESE 2 matrices in one excel file with proper spacing and formatting in Rstudio

Truck location coordinates
X[Now] Y[Now]
A 5.4 15.4
B 8.3 9.0
C 6.6 5.2
D 6.5 13.5
E 15.0 1.9
Load location coordinates
> print(Bcd)
Pick-up-X Pick-up-Y Drop-off-X Drop-off-Y
1 18.3 0.5 4.0 13.9
2 11.1 0.1 17.1 18.9
3 20.0 8.9 18.4 7.4
4 4.4 18.2 8.6 15.0
5 12.7 2.9 4.0 0.7
6 5.2 10.7 16.9 18.9
7 18.5 19.0 4.8 9.5
8 8.2 17.3 0.6 4.6
9 11.5 0.5 3.4 11.4
10 2.1 11.3 11.4 0.1

How to merge a particular row of a group of rows with the same reference #, with a df that contains a single row with the matching reference #

I'm still very new at Stack Overflow, so please let me know if there is a better way to include data or if there are other formatting issues with my question. Thanks!
I have 2 data frames. The first contains several rows per reference number, one for each depth. The second contains a single row for each unique reference number, with the latitude and longitude.
I need to merge the pH and DissolvedO2 from the first data frame into the one with latitude and longitude. But I only want the values from the last row of each unique reference number, in other words the deepest pH and DissolvedO2 values. The final data frame will only have one occurrence of each reference number. A sample of each data frame can be created with the following code (maybe there is a much easier way to input data into Stack Overflow?)...
sample.df <- readLines(textConnection("BBM2008050101 0.2 B 24.8 52.1 8.2 34.3 6.1
BBM2008050101 1.0 B 24.8 52.4 8.2 34.5 6.1
BBM2008050101 1.4 B 24.8 52.4 8.2 34.5 6.1
BBM2008050102 0.2 B 24.5 53.0 8.1 35.0 6.3
BBM2008050102 1.0 B 24.5 53.0 8.1 34.9 6.0
BBM2008050102 1.6 B 24.5 53.0 8.1 35.0 5.9
BBM2008050103 0.2 B 24.9 51.1 8.2 33.5 6.1
BBM2008050103 1.0 B 24.9 51.1 8.2 33.5 6.1
BBM2008050103 1.6 B 24.9 51.1 8.2 33.5 6.1
BBM2008050104 0.2 B 25.1 51.4 8.2 33.8 6.7
BBM2008050104 1.0 B 25.1 51.4 8.2 33.8 6.5
BBM2008050104 1.6 B 25.1 51.4 8.2 33.8 6.5
BBM2008050105 0.2 B 24.9 51.9 8.1 34.1 7.7
BBM2008050105 1.0 B 24.9 51.9 8.2 34.1 7.9
BBM2008050106 0.2 B 25.4 51.1 8.3 33.5 7.0
BBM2008050106 1.0 B 25.4 51.1 8.3 33.5 6.5
BBM2008050106 2.0 B 25.4 51.1 8.3 33.5 6.5
BBM2008050106 2.3 B 25.4 51.1 8.3 33.5 6.4 "))
sample.df <- strsplit(sample.df,"[[:space:]]+")
max.len <- max(sapply(sample.df, length))
corrected.list <- lapply(sample.df, function(x) {c(x, rep(NA, max.len - length(x)))})
df <- do.call(rbind, corrected.list)
colnames(df) <- c("Reference", "Depth", "Beg_end", "Temperature", "Conductivity", "pH", "Salinity", "DissolvedO2")
df <- as.data.frame(df)
sample.df2 <- readLines(textConnection("BBM2008050101 301 -83.44165 29.637633 1.6 D
BBM2008050102 301 -83.439717 29.630233 1.8 D
BBM2008050103 301 -83.434017 29.605567 1.8 D
BBM2008050104 301 -83.440067 29.596267 1.8 D
BBM2008050105 301 -83.4346 29.592667 1.2 D
BBM2008050106 300 -83.44555 29.596917 2.5 D"))
sample.df2 <- strsplit(sample.df2,"[[:space:]]+")
max.len2 <- max(sapply(sample.df2, length))
corrected.list2 <- lapply(sample.df2, function(x) {c(x, rep(NA, max.len2 - length(x)))})
df2 <- do.call(rbind, corrected.list2)
colnames(df2) <- c("Reference", "Gear", "Longitude", "Latitude", "StartDepth", "Zone")
df2 <- as.data.frame(df2)
The output would be sample.df3, with the deepest pH and DissolvedO2 values added as columns. It should look like the example below, but my real data frame is much larger, so I cannot do this manually.
sample.df3 <- readLines(textConnection("BBM2008050101 301 -83.44165 29.637633 1.6 D 8.2 6.1
BBM2008050102 301 -83.439717 29.630233 1.8 D 8.1 5.9
BBM2008050103 301 -83.434017 29.605567 1.8 D 8.2 6.1
BBM2008050104 301 -83.440067 29.596267 1.8 D 8.2 6.5
BBM2008050105 301 -83.4346 29.592667 1.2 D 8.2 7.9
BBM2008050106 300 -83.44555 29.596917 2.5 D 8.3 6.4"))
sample.df3 <- strsplit(sample.df3,"[[:space:]]+")
max.len3 <- max(sapply(sample.df3, length))
corrected.list3 <- lapply(sample.df3, function(x) {c(x, rep(NA, max.len3 - length(x)))})
df3 <- do.call(rbind, corrected.list3)
colnames(df3) <- c("Reference", "Gear", "Longitude", "Latitude", "StartDepth", "Zone", "pH", "Dissolved02")
df3 <- as.data.frame(df3)
The code below uses dplyr's group_by and summarise to find the last row in which each Reference occurs, filters the first data frame down to those rows, and finally merges their pH and DissolvedO2 columns into the second data frame. It assumes the rows for each Reference are already ordered by increasing depth, so that the last row is the deepest one.
library(dplyr)
df$id <- seq_len(nrow(df)) # Create ID column to store the row number
# Create a smaller df with just the references and the max row number (which should equal the last occurrence)
df1_last_references <- df %>%
group_by(Reference) %>%
summarise(id = max(id))
# Filter the original df on the row numbers matching the last references
df <- df[df$id %in% df1_last_references$id, ]
# Merge the deepest pH and DissolvedO2 values into df2
df3 <- merge(df2, df[, c("Reference", "pH", "DissolvedO2")], all.x = TRUE, by = "Reference")
head(df3)
Reference Gear Longitude Latitude StartDepth Zone pH DissolvedO2
1 BBM2008050101 301 -83.44165 29.637633 1.6 D 8.2 6.1
2 BBM2008050102 301 -83.439717 29.630233 1.8 D 8.1 5.9
3 BBM2008050103 301 -83.434017 29.605567 1.8 D 8.2 6.1
4 BBM2008050104 301 -83.440067 29.596267 1.8 D 8.2 6.5
5 BBM2008050105 301 -83.4346 29.592667 1.2 D 8.2 7.9
6 BBM2008050106 300 -83.44555 29.596917 2.5 D 8.3 6.4
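If the rows are not guaranteed to arrive sorted by depth, the "last row" and the "deepest row" can differ. A minimal base R sketch of picking the deepest row explicitly follows; the names profiles, stations and deepest are hypothetical, the data are a small excerpt of the sample above, and Depth is assumed to be numeric (in the question's df it is character, so it would need as.numeric() first).

```r
# Excerpt of the sample data: two references, several depths each
profiles <- data.frame(
  Reference   = c("BBM2008050101", "BBM2008050101", "BBM2008050101",
                  "BBM2008050105", "BBM2008050105"),
  Depth       = c(0.2, 1.0, 1.4, 0.2, 1.0),
  pH          = c(8.2, 8.2, 8.2, 8.1, 8.2),
  DissolvedO2 = c(6.1, 6.1, 6.1, 7.7, 7.9)
)
stations <- data.frame(
  Reference = c("BBM2008050101", "BBM2008050105"),
  Longitude = c(-83.44165, -83.4346),
  Latitude  = c(29.637633, 29.592667)
)

# Sort by reference and depth, then keep the last (deepest) row per reference
profiles <- profiles[order(profiles$Reference, profiles$Depth), ]
deepest <- profiles[!duplicated(profiles$Reference, fromLast = TRUE),
                    c("Reference", "pH", "DissolvedO2")]

# One row per station, with the deepest pH and DissolvedO2 attached
result <- merge(stations, deepest, by = "Reference", all.x = TRUE)
result
```

Sorting first makes the selection independent of the input order, at the cost of one extra order() call.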
An option using data.table:
DT2[, c("pH", "Dissolved02") :=
DT1[.SD, on=.(Reference), mult="last", .(pH, DissolvedO2)]
]
output (DT2):
Reference Gear Longitude Latitude StartDepth Zone pH Dissolved02
1: BBM2008050101 301 -83.44165 29.63763 1.6 D 8.2 6.1
2: BBM2008050102 301 -83.43972 29.63023 1.8 D 8.1 5.9
3: BBM2008050103 301 -83.43402 29.60557 1.8 D 8.2 6.1
4: BBM2008050104 301 -83.44007 29.59627 1.8 D 8.2 6.5
5: BBM2008050105 301 -83.43460 29.59267 1.2 D 8.2 7.9
6: BBM2008050106 300 -83.44555 29.59692 2.5 D 8.3 6.4
data:
library(data.table)
DT1 <- fread("Reference Depth Beg_end Temperature Conductivity pH Salinity DissolvedO2
BBM2008050101 0.2 B 24.8 52.1 8.2 34.3 6.1
BBM2008050101 1.0 B 24.8 52.4 8.2 34.5 6.1
BBM2008050101 1.4 B 24.8 52.4 8.2 34.5 6.1
BBM2008050102 0.2 B 24.5 53.0 8.1 35.0 6.3
BBM2008050102 1.0 B 24.5 53.0 8.1 34.9 6.0
BBM2008050102 1.6 B 24.5 53.0 8.1 35.0 5.9
BBM2008050103 0.2 B 24.9 51.1 8.2 33.5 6.1
BBM2008050103 1.0 B 24.9 51.1 8.2 33.5 6.1
BBM2008050103 1.6 B 24.9 51.1 8.2 33.5 6.1
BBM2008050104 0.2 B 25.1 51.4 8.2 33.8 6.7
BBM2008050104 1.0 B 25.1 51.4 8.2 33.8 6.5
BBM2008050104 1.6 B 25.1 51.4 8.2 33.8 6.5
BBM2008050105 0.2 B 24.9 51.9 8.1 34.1 7.7
BBM2008050105 1.0 B 24.9 51.9 8.2 34.1 7.9
BBM2008050106 0.2 B 25.4 51.1 8.3 33.5 7.0
BBM2008050106 1.0 B 25.4 51.1 8.3 33.5 6.5
BBM2008050106 2.0 B 25.4 51.1 8.3 33.5 6.5
BBM2008050106 2.3 B 25.4 51.1 8.3 33.5 6.4")
DT2 <- fread("Reference Gear Longitude Latitude StartDepth Zone
BBM2008050101 301 -83.44165 29.637633 1.6 D
BBM2008050102 301 -83.439717 29.630233 1.8 D
BBM2008050103 301 -83.434017 29.605567 1.8 D
BBM2008050104 301 -83.440067 29.596267 1.8 D
BBM2008050105 301 -83.4346 29.592667 1.2 D
BBM2008050106 300 -83.44555 29.596917 2.5 D")

estimate phase and amplitude of a seasonal cycle

I have the following data:
CET <- url("http://www.metoffice.gov.uk/hadobs/hadcet/cetml1659on.dat")
cet <- read.table(CET, sep = "", skip = 6, header = TRUE,
fill = TRUE, na.string = c(-99.99, -99.9))
names(cet) <- c(month.abb, "Annual")
cet <- cet[-nrow(cet), ]
rn <- as.numeric(rownames(cet))
Years <- rn[1]:rn[length(rn)]
annCET <- data.frame(Temperature = cet[, ncol(cet)],Year = Years)
cet <- cet[, -ncol(cet)]
cet <- stack(cet)[,2:1]
names(cet) <- c("Month","Temperature")
cet <- transform(cet, Year = (Year <- rep(Years, times = 12)),
nMonth = rep(1:12, each = length(Years)),
Date = as.Date(paste(Year, Month, "15", sep = "-"),format = "%Y-%b-%d"))
cet <- cet[with(cet, order(Date)), ]
idx <- cet$Year > 1900
cet <- cet[idx,]
cet <- cet[,c('Date','Temperature')]
plot(cet, type = 'l')
This demonstrates the monthly temperature cycle from 1900 to 2014 in England, UK.
I would like to evaluate the phase and amplitude of the seasonal cycle of temperature following the methods outlined in this paper. Specifically, they describe that given 12 monthly values (as we have here) we can estimate the yearly component as:
Y = \frac{2}{12} \sum_{t=0.5}^{11.5} e^{2\pi i t/12} \, x(t + t_0)
where X(t) represents 12 monthly values of surface temperature; x(t + t0), t = 0.5, ..., 11.5, are the 12 de-meaned monthly values; and the factor of two accounts for both positive and negative frequencies.
Then the amplitude and phase of the seasonal cycle can be calculated as
A = |Y|
and
\phi = \tan^{-1}\left( \mathrm{Im}(Y) / \mathrm{Re}(Y) \right)
They specify that, for each year of data, they calculate the yearly (one cycle per year) sinusoidal component using the Fourier transform, as in the equation shown above.
I'm a bit stuck on how to generate the time series they demonstrate here. Can anyone please provide some guidance on how I can reproduce these methods? Note that I also work in MATLAB, in case anyone has suggestions for how this could be achieved in that environment.
Here is a subset of the data.
Date Temperature
1980-01-15 2.3
1980-02-15 5.7
1980-03-15 4.7
1980-04-15 8.8
1980-05-15 11.2
1980-06-15 13.8
1980-07-15 14.7
1980-08-15 15.9
1980-09-15 14.7
1980-10-15 9
1980-11-15 6.6
1980-12-15 5.6
1981-01-15 4.9
1981-02-15 3
1981-03-15 7.9
1981-04-15 7.8
1981-05-15 11.2
1981-06-15 13.2
1981-07-15 15.5
1981-08-15 16.2
1981-09-15 14.5
1981-10-15 8.6
1981-11-15 7.8
1981-12-15 0.3
1982-01-15 2.6
1982-02-15 4.8
1982-03-15 6.1
1982-04-15 8.6
1982-05-15 11.6
1982-06-15 15.5
1982-07-15 16.5
1982-08-15 15.7
1982-09-15 14.2
1982-10-15 10.1
1982-11-15 8
1982-12-15 4.4
1983-01-15 6.7
1983-02-15 1.7
1983-03-15 6.4
1983-04-15 6.8
1983-05-15 10.3
1983-06-15 14.4
1983-07-15 19.5
1983-08-15 17.3
1983-09-15 13.7
1983-10-15 10.5
1983-11-15 7.5
1983-12-15 5.6
1984-01-15 3.8
1984-02-15 3.3
1984-03-15 4.7
1984-04-15 8.1
1984-05-15 9.9
1984-06-15 14.5
1984-07-15 16.9
1984-08-15 17.6
1984-09-15 13.7
1984-10-15 11.1
1984-11-15 8
1984-12-15 5.2
1985-01-15 0.8
1985-02-15 2.1
1985-03-15 4.7
1985-04-15 8.3
1985-05-15 10.9
1985-06-15 12.7
1985-07-15 16.2
1985-08-15 14.6
1985-09-15 14.6
1985-10-15 11
1985-11-15 4.1
1985-12-15 6.3
1986-01-15 3.5
1986-02-15 -1.1
1986-03-15 4.9
1986-04-15 5.8
1986-05-15 11.1
1986-06-15 14.8
1986-07-15 15.9
1986-08-15 13.7
1986-09-15 11.3
1986-10-15 11
1986-11-15 7.8
1986-12-15 6.2
1987-01-15 0.8
1987-02-15 3.6
1987-03-15 4.1
1987-04-15 10.3
1987-05-15 10.1
1987-06-15 12.8
1987-07-15 15.9
1987-08-15 15.6
1987-09-15 13.6
1987-10-15 9.7
1987-11-15 6.5
1987-12-15 5.6
1988-01-15 5.3
1988-02-15 4.9
1988-03-15 6.4
1988-04-15 8.2
1988-05-15 11.9
1988-06-15 14.4
1988-07-15 14.7
1988-08-15 15.2
1988-09-15 13.2
1988-10-15 10.4
1988-11-15 5.2
1988-12-15 7.5
1989-01-15 6.1
1989-02-15 5.9
1989-03-15 7.5
1989-04-15 6.6
1989-05-15 13
1989-06-15 14.6
1989-07-15 18.2
1989-08-15 16.6
1989-09-15 14.7
1989-10-15 11.7
1989-11-15 6.2
1989-12-15 4.9
1990-01-15 6.5
1990-02-15 7.3
1990-03-15 8.3
1990-04-15 8
1990-05-15 12.6
1990-06-15 13.6
1990-07-15 16.9
1990-08-15 18
1990-09-15 13.2
1990-10-15 11.9
1990-11-15 6.9
1990-12-15 4.3
1991-01-15 3.3
1991-02-15 1.5
1991-03-15 7.9
1991-04-15 7.9
1991-05-15 10.8
1991-06-15 12.1
1991-07-15 17.3
1991-08-15 17.1
1991-09-15 14.7
1991-10-15 10.2
1991-11-15 6.8
1991-12-15 4.7
1992-01-15 3.7
1992-02-15 5.4
1992-03-15 7.5
1992-04-15 8.7
1992-05-15 13.6
1992-06-15 15.7
1992-07-15 16.2
1992-08-15 15.3
1992-09-15 13.4
1992-10-15 7.8
1992-11-15 7.4
1992-12-15 3.6
1993-01-15 5.9
1993-02-15 4.6
1993-03-15 6.7
1993-04-15 9.5
1993-05-15 11.4
1993-06-15 15
1993-07-15 15.2
1993-08-15 14.6
1993-09-15 12.4
1993-10-15 8.5
1993-11-15 4.6
1993-12-15 5.5
1994-01-15 5.3
1994-02-15 3.2
1994-03-15 7.7
1994-04-15 8.1
1994-05-15 10.7
1994-06-15 14.5
1994-07-15 18
1994-08-15 16
1994-09-15 12.7
1994-10-15 10.2
1994-11-15 10.1
1994-12-15 6.4
1995-01-15 4.8
1995-02-15 6.5
1995-03-15 5.6
1995-04-15 9.1
1995-05-15 11.6
1995-06-15 14.3
1995-07-15 18.6
1995-08-15 19.2
1995-09-15 13.7
1995-10-15 12.9
1995-11-15 7.7
1995-12-15 2.3
1996-01-15 4.3
1996-02-15 2.5
1996-03-15 4.5
1996-04-15 8.5
1996-05-15 9.1
1996-06-15 14.4
1996-07-15 16.5
1996-08-15 16.5
1996-09-15 13.6
1996-10-15 11.7
1996-11-15 5.9
1996-12-15 2.9
1997-01-15 2.5
1997-02-15 6.7
1997-03-15 8.4
1997-04-15 9
1997-05-15 11.5
1997-06-15 14.1
1997-07-15 16.7
1997-08-15 18.9
1997-09-15 14.2
1997-10-15 10.2
1997-11-15 8.4
1997-12-15 5.8
1998-01-15 5.2
1998-02-15 7.3
1998-03-15 7.9
1998-04-15 7.7
1998-05-15 13.1
1998-06-15 14.2
1998-07-15 15.5
1998-08-15 15.9
1998-09-15 14.9
1998-10-15 10.6
1998-11-15 6.2
1998-12-15 5.5
1999-01-15 5.5
1999-02-15 5.3
1999-03-15 7.4
1999-04-15 9.4
1999-05-15 12.9
1999-06-15 13.9
1999-07-15 17.7
1999-08-15 16.1
1999-09-15 15.6
1999-10-15 10.7
1999-11-15 7.9
1999-12-15 5
2000-01-15 4.9
2000-02-15 6.3
2000-03-15 7.6
2000-04-15 7.8
2000-05-15 12.1
2000-06-15 15.1
2000-07-15 15.5
2000-08-15 16.6
2000-09-15 14.7
2000-10-15 10.3
2000-11-15 7
2000-12-15 5.8
2001-01-15 3.2
2001-02-15 4.4
2001-03-15 5.2
2001-04-15 7.7
2001-05-15 12.6
2001-06-15 14.3
2001-07-15 17.2
2001-08-15 16.8
2001-09-15 13.4
2001-10-15 13.3
2001-11-15 7.5
2001-12-15 3.6
2002-01-15 5.5
2002-02-15 7
2002-03-15 7.6
2002-04-15 9.3
2002-05-15 11.8
2002-06-15 14.4
2002-07-15 16
2002-08-15 17
2002-09-15 14.4
2002-10-15 10.1
2002-11-15 8.5
2002-12-15 5.7
2003-01-15 4.5
2003-02-15 3.9
2003-03-15 7.5
2003-04-15 9.6
2003-05-15 12.1
2003-06-15 16.1
2003-07-15 17.6
2003-08-15 18.3
2003-09-15 14.3
2003-10-15 9.2
2003-11-15 8.1
2003-12-15 4.8
2004-01-15 5.2
2004-02-15 5.4
2004-03-15 6.5
2004-04-15 9.4
2004-05-15 12.1
2004-06-15 15.3
2004-07-15 15.8
2004-08-15 17.6
2004-09-15 14.9
2004-10-15 10.5
2004-11-15 7.7
2004-12-15 5.4
2005-01-15 6
2005-02-15 4.3
2005-03-15 7.2
2005-04-15 8.9
2005-05-15 11.4
2005-06-15 15.5
2005-07-15 16.9
2005-08-15 16.2
2005-09-15 15.2
2005-10-15 13.1
2005-11-15 6.2
2005-12-15 4.4
2006-01-15 4.3
2006-02-15 3.7
2006-03-15 4.9
2006-04-15 8.6
2006-05-15 12.3
2006-06-15 15.9
2006-07-15 19.7
2006-08-15 16.1
2006-09-15 16.8
2006-10-15 13
2006-11-15 8.1
2006-12-15 6.5
2007-01-15 7
2007-02-15 5.8
2007-03-15 7.2
2007-04-15 11.2
2007-05-15 11.9
2007-06-15 15.1
2007-07-15 15.2
2007-08-15 15.4
2007-09-15 13.8
2007-10-15 10.9
2007-11-15 7.3
2007-12-15 4.9
2008-01-15 6.6
2008-02-15 5.4
2008-03-15 6.1
2008-04-15 7.9
2008-05-15 13.4
2008-06-15 13.9
2008-07-15 16.2
2008-08-15 16.2
2008-09-15 13.5
2008-10-15 9.7
2008-11-15 7
2008-12-15 3.5
2009-01-15 3
2009-02-15 4.1
2009-03-15 7
2009-04-15 10
2009-05-15 12.1
2009-06-15 14.8
2009-07-15 16.1
2009-08-15 16.6
2009-09-15 14.2
2009-10-15 11.6
2009-11-15 8.7
2009-12-15 3.1
2010-01-15 1.4
2010-02-15 2.8
2010-03-15 6.1
2010-04-15 8.8
2010-05-15 10.7
2010-06-15 15.2
2010-07-15 17.1
2010-08-15 15.3
2010-09-15 13.8
2010-10-15 10.3
2010-11-15 5.2
2010-12-15 -0.7
2011-01-15 3.7
2011-02-15 6.4
2011-03-15 6.7
2011-04-15 11.8
2011-05-15 12.2
2011-06-15 13.8
2011-07-15 15.2
2011-08-15 15.4
2011-09-15 15.1
2011-10-15 12.6
2011-11-15 9.6
2011-12-15 6
2012-01-15 5.4
2012-02-15 3.8
2012-03-15 8.3
2012-04-15 7.2
2012-05-15 11.7
2012-06-15 13.5
2012-07-15 15.5
2012-08-15 16.6
2012-09-15 13
2012-10-15 9.7
2012-11-15 6.8
2012-12-15 4.8
2013-01-15 3.5
2013-02-15 3.2
2013-03-15 2.7
2013-04-15 7.5
2013-05-15 10.4
2013-06-15 13.6
2013-07-15 18.3
2013-08-15 16.9
2013-09-15 13.7
2013-10-15 12.5
2013-11-15 6.2
2013-12-15 6.3
2014-01-15 5.7
2014-02-15 6.2
2014-03-15 7.6
2014-04-15 10.2
2014-05-15 12.2
2014-06-15 15.1
2014-07-15 17.7
2014-08-15 14.9
2014-09-15 15.1
2014-10-15 12.5
2014-11-15 8.6
2014-12-15 5.2
Taken literally, the formula for Y can be written in MATLAB as:
t = 0.5:1:11.5; %// 12 mid-month times, t = 0.5,...,11.5 as in the question
Y = 1/6 .* sum(exp(2*pi*1i.*t/12) .* X(t0+t)); %// X must be a function returning the de-meaned temperature
A = abs(Y); %// seasonal amplitude
phi = atan2(imag(Y), real(Y)); %// seasonal phase
Without knowing the function for X I can't be sure this can be vectorised. If it cannot, you'd have to loop, which can be done like:
t = 0.5:1:11.5; %// 12 mid-month times
Ytmp = zeros(numel(t),1); %// initialise output
for ii = 1:numel(t)
Ytmp(ii) = exp(2*pi*1i*t(ii)/12) * X(t0+t(ii));
end
Y = 1/6 * sum(Ytmp);
Just slot in any t0 you want, loop over the code above, and you have your time series.
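Since the question is posed in R, the same calculation can be sketched there as well. This is a minimal sketch, assuming the formula as described in the question and the MATLAB snippet above (a factor of 2/12, times t = 0.5, ..., 11.5, and one calendar year of de-meaned values). seasonal_cycle is a hypothetical helper name, and the sign and origin of the phase depend on that convention.

```r
# Hypothetical helper: amplitude and phase of the one-cycle-per-year component
seasonal_cycle <- function(temps) {
  stopifnot(length(temps) == 12)
  t <- seq(0.5, 11.5, by = 1)             # mid-month times, in months
  x <- temps - mean(temps)                # de-meaned monthly values
  Y <- (2 / 12) * sum(exp(2i * pi * t / 12) * x)
  c(amplitude = Mod(Y), phase = Arg(Y))   # phase in radians, in (-pi, pi]
}

# The 1980 values from the data subset in the question
temps_1980 <- c(2.3, 5.7, 4.7, 8.8, 11.2, 13.8, 14.7, 15.9, 14.7, 9.0, 6.6, 5.6)
seasonal_cycle(temps_1980)
```

Applied to the full series, something like sapply(split(cet$Temperature, format(cet$Date, "%Y")), seasonal_cycle) should give one amplitude and phase per year, provided every year contains all 12 months.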

Subsetting and Looping a Time Series Data in R

I have a time series dataset (30 years). I subset it for the month and day I want (shown in the code below). Is there a way to loop over each month and the days in those months? Also, is there a way to save the plots automatically, in different folders corresponding to each month? Right now I do it manually: I change the month and day in the line dfOct31all <- df[which(df$Month==10 & df$Day==31), ] in the code below, then plot and save the result. By the way, I'm using RStudio.
Can someone please guide me?
Thanks!
setwd("WDir")
df <- read.csv("Velocity.csv", header = TRUE)
attach(df)
#Day 31
dfOct31all <- df [ which(df$Month==10 & df$Day==31), ]
dfall31Mbs <- dfOct31all[c(-1,-2,-3)]
densities <- lapply(dfall31Mbs, density)
par(mfcol=c(5,5), oma=c(1,1,0,0), mar=c(1,1,1,0), tcl=-0.1, mgp=c(0,0,0))
plot(densities[[1]], col="black",main = "1000mb",xlab=NA,ylab=NA)
plot(densities[[2]], col="black",main="925mb",xlab=NA,ylab=NA)
plot(densities[[3]], col="black",main="850mb",xlab=NA,ylab=NA)
plot(densities[[4]], col="black",main="700mb",xlab=NA,ylab=NA)
plot(densities[[5]], col="black",main="600mb",xlab=NA,ylab=NA)
plot(densities[[6]], col="black",main="500mb",xlab=NA,ylab=NA)
plot(densities[[7]], col ="black",main="400mb",xlab=NA,ylab=NA)
plot(densities[[8]], col="black",main="300mb",xlab=NA,ylab=NA)
plot(densities[[9]], col="black",main="250mb",xlab=NA,ylab=NA)
plot(densities[[10]], col="black",main="200mb",xlab=NA,ylab=NA)
plot(densities[[11]], col= "black",main="150mb",xlab=NA,ylab=NA)
plot(densities[[12]], col= "black",main="100mb",xlab=NA,ylab=NA)
plot(densities[[13]], col = "black",main="70mb",xlab=NA,ylab=NA)
plot(densities[[14]], col="black",main="50mb",xlab=NA,ylab=NA)
plot(densities[[15]], col="black",main="30mb",xlab=NA,ylab=NA)
plot(densities[[16]], col = "black",main="20mb",xlab=NA,ylab=NA)
plot(densities[[17]], col="black",main="10mb",xlab=NA,ylab=NA)
Snippet of data is shown as well
Year Month Day 1000mb 925mb 850mb 700mb 600mb 500mb 400mb 300mb 250mb 200mb 150mb 100mb 70mb 50mb 30mb 20mb 10mb
1984 10 31 6 6.6 7.9 11.5 14.6 17 20.8 25.8 26.4 25.3 24.4 22.7 19.9 19.2 20.4 24.8 30.8
1985 10 31 5.8 7.1 7.7 11.5 14.7 17.3 25.3 32.6 32.9 32.4 27.1 20.9 14.2 9.7 6.4 7.3 7.4
1986 10 31 4.3 6.1 7.7 11.3 18.4 26.3 34.4 44.5 48.9 46.2 34.5 20.4 13.8 13.2 21.7 31 46.4
1987 10 31 2.2 2.9 4 7 9 13.9 19.9 25.8 26.6 23.7 17.3 12 7 3.1 1.7 5.8 14.1
1988 10 31 2.5 2.1 2.3 6.5 6.4 5.1 7.4 12.1 13.4 16.1 16.7 15.2 8.8 5 2.8 6.2 8.9
1989 10 31 3.4 4 4.7 4.4 4.1 4 4.6 4.8 5.9 5.6 10.9 13.9 12.3 10.4 8.1 8 8
1990 10 31 4 4.9 7.5 14.6 19 21.9 25.7 28.3 29.4 29.2 27.3 18 12.6 10.1 9 12 19.9
1991 10 31 2.8 3.2 4 10.8 12.1 11.2 9.9 9.1 9.9 12.8 18 17.5 10.4 6.3 4.2 7.6 11.7
1992 10 31 5.9 6.9 7.9 13.1 17.9 25.2 34.6 47.3 53.3 53 42.4 21.3 11.6 6 4.6 8.5 12.8
1993 10 31 2.3 1.5 0.4 3.6 6.3 10.1 14.3 19.1 21.6 21.8 18.4 13.6 12.3 9.5 6.9 11 18.1
1994 10 31 2 2.2 3.8 11.6 17 19.8 23.6 24.9 25.5 26.2 28.4 25.2 16.7 13.6 9.3 8.3 9.8
1995 10 31 1.5 2 3.4 7.6 9.1 11.2 13.7 17.9 20.3 21.7 21.1 16.7 13 12.1 14.9 21.4 27.3
1996 10 31 1.9 2.4 3.5 8 11.7 17.4 26.4 35.6 33.3 24.6 12.4 4.1 0.5 3.4 7.2 9.4 11.6
1997 10 31 3.7 4.8 7.8 19.2 24.6 29.6 35.6 41 41.8 42 37.9 23.7 11.2 8.6 4.2 3.8 7
1998 10 31 0.7 1.1 0.9 4.8 8.4 11.4 14 25.3 29.7 25.2 15.9 6.6 2.1 1 4.5 8.9 6.1
1999 10 31 1.9 1.6 2.4 10.7 15.3 19 23.2 29 32.4 31.9 28 20.3 10.8 9.4 12 14.5 16.9
2000 10 31 5.1 5.8 6.7 12.8 18.2 23.9 29.9 40.7 42.2 33.7 23.5 12.7 2.6 1.6 3.8 4.7 5.1
2001 10 31 5.7 6.1 7.1 10.1 10.8 14.7 18.3 22.8 22.3 22.2 22 14 9.5 6.6 5.2 6.5 8.6
2002 10 31 1.4 1.6 1.8 9.2 14.5 19.5 24.8 30 30.5 27.6 22.2 13.9 9.1 7.1 8.5 16.1 23.8
2003 10 31 1.5 1.3 0.7 1 3.5 6 11.7 21.5 21.9 22.9 23 20.7 15.8 12.5 14.5 20.1 26
2004 10 31 5.4 5.6 6.9 14.4 23.3 33.3 46.1 60.9 62.1 54.6 42.9 28 17.3 12.3 10.1 13.6 13.3
2005 10 31 1.7 1.3 3 10.3 15.8 19.5 21.1 22.8 24.1 24.5 24.5 20.6 13.5 10.7 10 10.7 10.4
2006 10 31 2.3 1.5 1.7 8.7 12.5 15.9 18.7 20.5 21.8 24.3 29.9 25.3 18.3 12.8 7.7 8.8 12.4
2007 10 31 3.7 2.7 2.3 2.2 2.6 4.2 6.5 11.9 15.9 19.6 17.2 9.5 6.9 5.7 4.9 5.8 11.7
2008 10 31 7.7 10.8 14.3 20.3 23 25.8 27.4 32.1 35.4 34.8 25.8 13.2 7.1 2.9 2.6 3.4 6
2009 10 31 0.5 0.2 2 9.3 13.5 17.6 18.8 20.8 21.4 21.2 18.9 14.2 11.1 6.4 1.9 3 8
2010 10 31 5.6 6.8 8.5 13.4 16.5 20.3 23.8 26.8 31 28.1 24 15.7 9.9 7 4.8 3.9 1.8
2011 10 31 5.9 6.7 5.6 7.9 10.3 11.8 12.5 16.2 19.5 21.4 17.9 13.2 9.6 7.9 8 8.3 10.8
2012 10 31 4.8 6.3 9.4 19.5 24.2 27.2 27.5 27.3 27.7 30.7 27.5 16.7 10 7.6 8 13.8 19.7
2013 10 31 1.4 1.9 3.9 9.1 13.1 17.3 22.9 29.7 30.4 27.3 23.5 18.2 13.1 6.3 4.4 2.4 9.4
I wrote it out for each day rather than doing a loop.
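One possible way to replace the manual steps is to split the data by month and day and write one file per group. The following is a minimal base R sketch, not a tested solution for the real Velocity.csv: save_daily_densities and the file naming are hypothetical, pdf() is used only because it is available on every R build (png() could be substituted), and the pressure-level columns are assumed to be every column other than Year, Month and Day. Note that read.csv renames a header such as 1000mb to X1000mb unless check.names = FALSE is set.

```r
# Hypothetical helper: one PDF of density panels per Month/Day,
# saved under out_dir/<Month>/
save_daily_densities <- function(df, out_dir) {
  level_cols <- setdiff(names(df), c("Year", "Month", "Day"))
  groups <- split(df, list(df$Month, df$Day), drop = TRUE)
  files <- character(0)
  for (g in groups) {
    if (nrow(g) < 2) next  # density() needs at least two points
    month_dir <- file.path(out_dir, sprintf("%02d", g$Month[1]))
    dir.create(month_dir, recursive = TRUE, showWarnings = FALSE)
    f <- file.path(month_dir,
                   sprintf("density_%02d_%02d.pdf", g$Month[1], g$Day[1]))
    pdf(f, width = 10, height = 10)
    par(mfcol = c(5, 5), mar = c(2, 2, 2, 1))
    for (col in level_cols) {
      plot(density(g[[col]], na.rm = TRUE), main = col, xlab = NA, ylab = NA)
    }
    dev.off()
    files <- c(files, f)
  }
  files  # paths of the files that were written
}

# Small synthetic example: two days, two levels, five years each
set.seed(1)
demo <- data.frame(Year = rep(1984:1988, times = 2),
                   Month = rep(c(10, 11), each = 5),
                   Day = 15,
                   X1000mb = runif(10, 0, 8),
                   X925mb = runif(10, 0, 10))
written <- save_daily_densities(demo, file.path(tempdir(), "density_plots"))
```

Looping over the column names also removes the need for the seventeen separate plot() calls, since each panel title is taken from the column it shows.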

Floating barcharts

I want to make bar charts where the bar minimum can be specified (much like the box in a box and whisker plot). Can barplot do that? I suspect the answer's in ggplot, but I can't find an example.
Here's some data:
X Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 Highest recorded 31.5 31.8 30.3 28.0 24.9 24.4 21.7 20.9 24.5 25.4 26.0 28.7
2 Mean monthly maximum 27.8 28.6 27.0 24.8 22.0 20.0 18.9 18.8 20.4 22.4 23.9 26.8
3 Mean daily maximum 24.2 24.8 23.1 20.9 18.4 16.3 15.5 15.7 16.9 18.3 20.0 22.4
4 Mean 19.1 19.8 18.1 16.2 13.8 11.9 11.2 11.6 12.7 14.1 15.7 17.7
5 Mean daily minimum 14.0 14.7 13.1 11.4 9.2 7.5 6.9 7.4 8.4 10.0 11.4 13.0
6 Mean monthly minimum 7.6 9.1 6.8 3.8 2.3 -0.5 -0.2 1.0 2.3 3.7 5.3 6.7
7 Lowest recorded 4.0 5.6 4.1 -1.3 0.0 -3.1 -2.6 -1.4 -0.8 2.0 2.7 4.1
xaxis =c("J" ,"F" ,"M" ,"A" ,"M" ,"J","J","A", "S", "O","N","D")
So ideally, I end up with a stacked bar for each month, that starts at the 'Lowest recorded' value, rather than at zero.
I've also had a try with superbarplot from the UsingR package. I can get the bars to start where I want, but can't move the x axis down out of the centre of the plot. Thanks in advance.
You can use geom_boxplot in ggplot2 to get what (I think) you want by specifying the precomputed values and stat = 'identity', and use geom_crossbar to put in the other values.
# first, your data
weather <- read.table(text = 'X Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 "Highest recorded" 31.5 31.8 30.3 28.0 24.9 24.4 21.7 20.9 24.5 25.4 26.0 28.7
2 "Mean monthly maximum" 27.8 28.6 27.0 24.8 22.0 20.0 18.9 18.8 20.4 22.4 23.9 26.8
3 "Mean daily maximum" 24.2 24.8 23.1 20.9 18.4 16.3 15.5 15.7 16.9 18.3 20.0 22.4
4 "Mean" 19.1 19.8 18.1 16.2 13.8 11.9 11.2 11.6 12.7 14.1 15.7 17.7
5 "Mean daily minimum" 14.0 14.7 13.1 11.4 9.2 7.5 6.9 7.4 8.4 10.0 11.4 13.0
6 "Mean monthly minimum" 7.6 9.1 6.8 3.8 2.3 -0.5 -0.2 1.0 2.3 3.7 5.3 6.7
7 "Lowest recorded" 4.0 5.6 4.1 -1.3 0.0 -3.1 -2.6 -1.4 -0.8 2.0 2.7 4.1', header =T)
library(reshape2)
library(ggplot2)
# reshape to wide format (basically transposing the data.frame)
w <- dcast(melt(weather), variable~X)
ggplot(w, aes(x=variable,ymin = `Lowest recorded`,
ymax = `Highest recorded`, lower = `Lowest recorded`,
upper = `Highest recorded`, middle = `Mean daily maximum`)) +
geom_boxplot(stat = 'identity') +
xlab('month') +
ylab('Temperature') +
geom_crossbar(aes(y = `Mean monthly maximum` ))+
geom_crossbar(aes(y = `Mean monthly minimum`)) +
geom_crossbar(aes(y = `Mean daily maximum` ))+
geom_crossbar(aes(y = `Mean daily minimum`))
This is partially described in an example in the help for geom_boxplot
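The question also asks whether base graphics can do this. barplot itself always starts bars at zero, but floating bars can be drawn with rect() on an empty plot. A minimal sketch, using only three of the rows from the data above (lowest recorded, mean, highest recorded); the bar half-width of 0.4 and the colours are arbitrary choices.

```r
month_labels <- c("J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D")
lowest  <- c(4.0, 5.6, 4.1, -1.3, 0.0, -3.1, -2.6, -1.4, -0.8, 2.0, 2.7, 4.1)
mean_t  <- c(19.1, 19.8, 18.1, 16.2, 13.8, 11.9, 11.2, 11.6, 12.7, 14.1, 15.7, 17.7)
highest <- c(31.5, 31.8, 30.3, 28.0, 24.9, 24.4, 21.7, 20.9, 24.5, 25.4, 26.0, 28.7)

x <- seq_along(month_labels)

# Empty plot with the right ranges; the x axis is drawn by hand
plot(NA, xlim = c(0.5, 12.5), ylim = range(lowest, highest),
     xaxt = "n", xlab = "Month", ylab = "Temperature")
axis(1, at = x, labels = month_labels)

# One floating bar per month, from the lowest to the highest recorded value
rect(x - 0.4, lowest, x + 0.4, highest, col = "grey80")

# Mark the mean inside each bar
segments(x - 0.4, mean_t, x + 0.4, mean_t, lwd = 2)
```

Further rect() calls with narrower ranges (for example mean daily minimum to mean daily maximum) could be layered on top to get the stacked appearance, and the x axis stays at the bottom of the plot because it is never tied to zero.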
