Read a time series table using read.zoo - r

I've looked all over the place, but I can't find where this question has been asked before.
What is a clean way to get this data into a proper zoo series? This version is a copy/paste to make this post easier, but it will always come in the following table form (from a text file). My read.zoo() statement reads the Year as the index but the quarters (Qtr1, Qtr2, etc) are read as column names. I've been trying to figure out a non-garbage way to read the columns as the "quarter" part of the index, but it's sloppy (too sloppy to post). I'm guessing this problem has already been solved, but I can't find it.
> texinp <- "
+ Year Qtr1 Qtr2 Qtr3 Qtr4
+ 1992 566 443 329 341
+ 1993 344 212 133 112
+ 1994 252 252 199 207"
> z <- read.zoo(textConnection(texinp), header=TRUE)
> z
From the as.yearqtr() documentation, the target would look like:
1992 Q1 1992 Q2 1992 Q3 1992 Q4 1993 Q1 1993 Q2 1993 Q3 1993 Q4
566 443 329 341 344 212 133 112
1994 Q1 1994 Q2 1994 Q3 1994 Q4
252 252 199 207

Read in the data using read.zoo and then convert it to a zooreg object with yearqtr time index:
texinp <- "Year Qtr1 Qtr2 Qtr3 Qtr4
1992 566 443 329 341
1993 344 212 133 112
1994 252 252 199 207"
library(zoo)
z <- read.zoo(text = texinp, header=TRUE)
zz <- zooreg(c(t(z)), start = yearqtr(start(z)), freq = 4)
The result looks like this:
> zz
1992 Q1 1992 Q2 1992 Q3 1992 Q4 1993 Q1 1993 Q2 1993 Q3 1993 Q4 1994 Q1 1994 Q2 1994 Q3 1994 Q4
566 443 329 341 344 212 133 112 252 252 199 207
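To see why the transpose is needed in c(t(z)), here is a minimal base R sketch on a plain matrix (the matrix m is a made-up stand-in for the year-by-quarter table, not the zoo object itself). R stores matrices column by column, so flattening without t() would interleave the years:

```r
# Toy matrix laid out like the table: one row per year, one column per quarter
m <- matrix(c(566, 443, 329, 341,
              344, 212, 133, 112),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("1992", "1993"), paste0("Qtr", 1:4)))

by_column <- c(m)     # column-major order: all Qtr1 values first, then Qtr2, ...
by_row    <- c(t(m))  # row-major order: 1992 Q1..Q4, then 1993 Q1..Q4
```

Only by_row is in chronological order, which is what zooreg() needs when it is given just a start and a frequency.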

read.zoo assumes your data has at most one time-index column, so you have to reshape the table yourself. First read it in using read.table:
zt <- read.table( textConnection( texinp ), header = TRUE)
then convert it to a "long table" using the melt function from the reshape package:
require(reshape)
zt.m <- melt( zt, id = 'Year', variable_name = 'Qtr')
> zt.m
Year Qtr value
1 1992 Qtr1 566
2 1993 Qtr1 344
3 1994 Qtr1 252
4 1992 Qtr2 443
5 1993 Qtr2 212
6 1994 Qtr2 252
7 1992 Qtr3 329
8 1993 Qtr3 133
9 1994 Qtr3 199
10 1992 Qtr4 341
11 1993 Qtr4 112
12 1994 Qtr4 207
and finally create your desired zoo object:
z <- with( zt.m, zoo( value, as.yearqtr(paste(Year, Qtr), format = '%Y Qtr%q')))
> z
1992 Q1 1992 Q2 1992 Q3 1992 Q4 1993 Q1 1993 Q2 1993 Q3 1993 Q4 1994 Q1 1994 Q2
566 443 329 341 344 212 133 112 252 252
1994 Q3 1994 Q4
199 207
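If the reshape package is not available, the melt step can probably be reproduced in base R. The following is only a sketch under that assumption; the names qtr_cols and long are illustrative and not part of the answer above:

```r
zt <- read.table(text = "Year Qtr1 Qtr2 Qtr3 Qtr4
1992 566 443 329 341
1993 344 212 133 112
1994 252 252 199 207", header = TRUE)

qtr_cols <- names(zt)[-1]  # every column except Year

# Stack the quarter columns one under another, repeating Year for each of them
long <- data.frame(
  Year  = rep(zt$Year, times = length(qtr_cols)),
  Qtr   = rep(qtr_cols, each = nrow(zt)),
  value = unlist(zt[qtr_cols], use.names = FALSE)
)
```

The resulting long has the same Year, Qtr and value columns as zt.m, so the final zoo() call should work on it unchanged.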

Related

construct a pseudo panel based on similar values for some variables in R

I have two questions in one. I have 20 data frames, each one for a given year (from 2000 to 2020). They all have the same columns. 1) I want to merge them on matching values of a list of variables (columns), so that I can construct a panel. 2) While merging, I want to rename the columns by adding a suffix indicating the year.
For example, take these 3 data frames:
df1
year_sample birth_date country work_establishment Wage
2014 1995 US X2134 1700
2014 1996 US X26 1232
2014 1992 CANADA X26 2553
2014 1990 FRANCE X4T346 6574
2014 1983 BELGIUM X2E43 1706
2014 1975 US X2134 1000
2014 1969 CHINA XXZT55 996
df2
year_sample birth_date country work_establishment Wage
2015 1995 US X2134 1756
2015 1996 US X26 1230
2015 1992 CANADA X26 2700
2015 1990 FRANCE X4T346 6574
2015 1975 US X2134 1000
2015 1979 GERMANY X35555 2435
df3
year_sample birth_date country work_establishment Wage
2016 1995 US X2134 1750
2016 1996 US X26 1032
2016 1992 CANADA X26 2353
2016 1990 FRANCE X4T346 6574
2016 1955 MALI X2244 1000
2016 1979 GERMANY X35555 2435
If observations have the same values for c(birth_date, country, work_establishment), then I will consider them to be the same person. I therefore want:
df_final
id birth_date country work_establishment Wage_2014 Wage_2015 Wage_2016
1 1995 US X2134 1700 1756 1750
2 1996 US X26 1232 1230 1032
3 1992 CANADA X26 2553 2700 2353
4 1990 FRANCE X4T346 6574 6574 6574
I know that if I had just two data frames I could do:
df_final <- transform(merge(df1,df2, by=c("birth_date", "country", "work_establishment"), suffixes=c("_2014", "_2015")))
But I can't manage to do it for several dataframes at once.
Thank you!
You can get all the dataframes in a list.
list_df <- mget(paste0('df', 1:3))
#OR
#list_df <- list(df1, df2, df3)
Then, in each data frame, add a suffix to the 'Wage' column taken from its year_sample value and drop the year column. Finally, use Reduce to merge the data frames into one.
result <- Reduce(function(x, y)
merge(x, y, by=c("birth_date", "country", "work_establishment")),
lapply(list_df, function(x)
{names(x)[5] <- paste('Wage', x$year_sample[1], sep = '_');x[-1]}))
result
# birth_date country work_establishment Wage_2014 Wage_2015 Wage_2016
#1 1990 FRANCE X4T346 6574 6574 6574
#2 1992 CANADA X26 2553 2700 2353
#3 1995 US X2134 1700 1756 1750
#4 1996 US X26 1232 1230 1032
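A variation on the same idea, as a sketch only: it renames Wage by name instead of by column position and adds the id column shown in the question's df_final. The helper make_df and the three-row data frames are made-up stand-ins built from a subset of the rows in the question:

```r
# Hypothetical helper building a small stand-in for df1, df2, df3
make_df <- function(year, wages) {
  data.frame(year_sample = year,
             birth_date = c(1995, 1996, 1992),
             country = c("US", "US", "CANADA"),
             work_establishment = c("X2134", "X26", "X26"),
             Wage = wages)
}
list_df <- list(make_df(2014, c(1700, 1232, 2553)),
                make_df(2015, c(1756, 1230, 2700)),
                make_df(2016, c(1750, 1032, 2353)))

keys <- c("birth_date", "country", "work_establishment")

# Rename Wage to Wage_<year> and drop year_sample in every data frame
prepped <- lapply(list_df, function(d) {
  names(d)[names(d) == "Wage"] <- paste("Wage", d$year_sample[1], sep = "_")
  d[setdiff(names(d), "year_sample")]
})

# Reduce applies merge pairwise: merge(merge(d1, d2), d3)
result <- Reduce(function(x, y) merge(x, y, by = keys), prepped)

# Add a running id, one per matched person
result <- cbind(id = seq_len(nrow(result)), result)
```

Because merge() defaults to an inner join, only people present in every year survive. Passing all = TRUE to merge() would keep the others with NA wages, if an unbalanced panel is acceptable.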

Counting number of consecutive years in a range in R

I need to count the number of contiguous years in a data frame. I want to filter data frames that have more than 30 years of consecutive records. Before I was doing:
length(unique(Daily_Streamflow$year)) > 30
But I realized that the number of years (unique years) could be more than 30 but not in a consecutive range, for example:
(unique(DSF_09494000$year))
[1] 1917 1918 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980
[27] 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
[53] 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
How can I count the number of years in a continuous range, without missing years? Is there a function similar to na.contiguous from the stats package, but for non-NA values?
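One possible base R approach, offered as a sketch rather than a definitive answer: label each run of consecutive years with diff() and cumsum(), then take the length of the longest run. The vector years below reproduces the values printed above; run_id, run_lengths and longest_run are illustrative names:

```r
# Sorted unique years, as in the DSF_09494000 example
years <- c(1917, 1918, 1957:2020)

# A new run starts wherever the gap to the previous year is not exactly 1
run_id <- cumsum(c(TRUE, diff(years) != 1))

# Number of years in each consecutive run, and the longest one
run_lengths <- tabulate(run_id)
longest_run <- max(run_lengths)

longest_run > 30
```

For real data, the years would first need sort(unique(...)) applied so that diff() sees them in increasing order.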

Adding data to data table dependent on prior data

My first post so hopefully I am doing it right.
I have a table as below:
Year Day Amount
1990 1 200
1990 363 2058
1993 1 10
1993 71 564
1993 360 931
I would like to add rows of data to this table such that there is a row entry for all numbers between the maximum 'Day' of each 'Year' in the table and 364, and the corresponding value in 'Amount' would be the maximum 'Amount' for each Year. The resulting data should be:
Year Day Amount
1990 1 200
1990 363 2058
1993 1 10
1993 71 564
1993 360 931
1990 364 2058
1993 361 931
1993 362 931
1993 363 931
1993 364 931
Any ideas?
Taking advantage of how data.table[i, j, by] lets us evaluate expressions in j for each group of by:
library(data.table)
DT <- data.table(
Year = c(1990, 1990, 1993, 1993, 1993),
Day = c(1, 363, 1, 71, 360),
Amount = c(200, 2058, 10, 564, 931)
)
DT[
order(Day),
{
extended_days <- seq(max(Day) + 1, 364)
extended_amounts <- rep(max(Amount), length(extended_days))
list(
Day = c(Day, extended_days),
Amount = c(Amount, extended_amounts)
)
},
keyby = Year
]
# Year Day Amount
# 1: 1990 1 200
# 2: 1990 363 2058
# 3: 1990 364 2058
# 4: 1993 1 10
# 5: 1993 71 564
# 6: 1993 360 931
# 7: 1993 361 931
# 8: 1993 362 931
# 9: 1993 363 931
# 10: 1993 364 931
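One edge case worth checking in the seq() call above: if a year's largest Day is already 364, seq(max(Day) + 1, 364) counts downward and returns 365 and 364 instead of nothing. A small guard, sketched here in base R with a hypothetical helper extend_days, avoids that:

```r
# Days strictly after max_day up to last_day; empty when max_day >= last_day
extend_days <- function(max_day, last_day = 364) {
  max_day + seq_len(max(last_day - max_day, 0))
}

extend_days(360)  # days 361 to 364
extend_days(364)  # zero-length vector, so no rows are added
seq(364 + 1, 364) # the unguarded call counts down instead
```

Inside the data.table expression, extended_days could then be computed as extend_days(max(Day)), assuming the Day values are whole numbers.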

Convert rows to Columns in R

My Dataframe:
> head(scotland_weather)
JAN Year.1 FEB Year.2 MAR Year.3 APR Year.4 MAY Year.5 JUN Year.6 JUL Year.7 AUG Year.8 SEP Year.9 OCT Year.10
1 293.8 1993 278.1 1993 238.5 1993 191.1 1947 191.4 2011 155.0 1938 185.6 1940 216.5 1985 267.6 1950 258.1 1935
2 292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954
3 275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014
4 252.3 2015 227.9 1989 200.2 1967 142.1 1949 149.5 2015 137.7 1931 165.8 2010 191.4 1962 189.7 2011 247.7 1938
5 246.2 1974 224.9 2014 180.2 1979 133.5 1950 137.4 2003 135.0 1966 162.9 1956 190.3 2014 189.7 1927 242.3 1983
6 245.0 1975 195.6 1995 180.0 1989 132.9 1932 129.7 2007 131.7 2004 159.9 1985 189.1 2004 189.6 1985 240.9 2001
NOV Year.11 DEC Year.12 WIN Year.13 SPR Year.14 SUM Year.15 AUT Year.16 ANN Year.17
1 262.0 2009 300.7 2013 743.6 2014 409.5 1986 455.6 1985 661.2 1981 1886.4 2011
2 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
3 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
4 231.3 1917 265.4 2011 638.3 2007 393.2 1967 422.6 1956 594.5 1935 1735.8 1938
5 229.9 1981 264.0 2006 608.9 1990 391.7 1992 397.0 2004 590.6 1982 1720.0 2008
6 224.9 1951 261.0 1912 592.8 2015 389.1 1913 390.1 1938 589.2 2006 1716.5 1954
The Year.X columns are not ordered. I wish to convert this into the following format:
month year rainfall_mm
Jan 1993 293.8
Feb 1993 278.1
Mar 1993 238.5
...
Nov 2015 230.0
I tried t(), but it keeps the year columns separate.
I also tried reshape2's recast(data, formula, ..., id.var, measure.var), but something is missing, since the month columns are numeric and the Year.X columns are integer:
> str(scotland_weather)
'data.frame': 106 obs. of 34 variables:
$ JAN : num 294 292 276 252 246 ...
$ Year.1 : int 1993 1928 2008 2015 1974 1975 2005 2007 1990 1983 ...
$ FEB : num 278 259 245 228 225 ...
$ Year.2 : int 1990 1997 2002 1989 2014 1995 1998 2000 1920 1918 ...
$ MAR : num 238 233 201 200 180 ...
$ Year.3 : int 1994 1990 1992 1967 1979 1989 1921 1913 2015 1978 ...
$ APR : num 191 149 147 142 134 ...
The value columns and the 'Year.X' columns alternate in 'scotland_weather', so one way is to index with c(TRUE, FALSE), which is recycled to select every other column starting from the first, just like seq(1, ncol(scotland_weather), by = 2). Likewise, c(FALSE, TRUE) selects the same columns as seq(2, ncol(scotland_weather), by = 2). We extract each set of columns, transpose it (t) and concatenate (c) the result into a vector, so the values are read row by row. The month labels are the column names that do not contain 'Year', which grepl can pick out. Finally, data.frame binds the vectors together, recycling the month names to match.
res <- data.frame(month= names(scotland_weather)[!grepl('Year',
names(scotland_weather))], year=c(t(scotland_weather[c(FALSE,TRUE)])),
rainfall_mm= c(t(scotland_weather[c(TRUE,FALSE)])))
head(res,4)
# month year rainfall_mm
#1 JAN 1993 293.8
#2 FEB 1993 278.1
#3 MAR 1993 238.5
#4 APR 1947 191.1
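The recycling trick can be checked on a tiny example. This is a sketch on a made-up two-row excerpt (df, values and years are illustrative names), using the first two rows of the JAN and FEB columns from the question:

```r
# Two-row excerpt with the same alternating value/year column layout
df <- data.frame(JAN = c(293.8, 292.2), Year.1 = c(1993L, 1928L),
                 FEB = c(278.1, 258.8), Year.2 = c(1993L, 1997L))

values <- df[c(TRUE, FALSE)]  # recycled to columns 1 and 3: JAN, FEB
years  <- df[c(FALSE, TRUE)]  # recycled to columns 2 and 4: Year.1, Year.2

# c(t(x)) reads each data frame row by row, so values and years stay aligned
res <- data.frame(month = names(values),
                  year = c(t(years)),
                  rainfall_mm = c(t(values)))
```

Here month has only two entries, and data.frame() recycles it to the four rows, which lines up because the values were flattened row by row.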
The problem is not only that you need to reshape your data: the years for the first column are stored in the second column, the years for the third column are in the fourth, and so on.
Here is a solution using tidyr (with dplyr for filter and select).
library(tidyr)
library(dplyr)  # provides %>%, filter() and select()
cols <- names(scotland_weather)
years <- grep("Year", cols)
scotland_weather %>% gather("month", "rainfall_mm", -all_of(years)) %>%
  gather("yearname", "year", -c(month, rainfall_mm)) %>%
  # keep only the year column that sits directly after its month column
  filter(match(yearname, cols) - match(month, cols) == 1) %>%
  select(-yearname)

Aggregating monthly column values into quarterly values

Hello everybody, I am pretty much completely new to R, and any help is greatly appreciated. I have the following monthly data (called "depressionaggregate") from 2004 until 2013:
Month Year DepressionCount
1 01 2004 285
2 02 2004 323
3 03 2004 267
4 04 2004 276
5 05 2004 312
6 06 2004 232
7 07 2004 228
8 08 2004 280
9 09 2004 277
10 10 2004 335
11 11 2004 273
I am trying to create a new column with the values aggregated by quarter for each year (i.e. 2004 Q1, 2004 Q2, etc.). I have tried using the function aggregate but have not been successful. Hope you can help me! Regards
1) If DF is the input data.frame convert it to a zoo object z with a "yearmon" index and then aggregate that to "yearqtr":
library(zoo)
toYearmon <- function(y, m) as.yearmon(paste(y, m, sep = "-"))
z <- read.zoo(DF, index = 2:1, FUN = toYearmon)
ag <- aggregate(z, as.yearqtr, sum)
giving:
> ag
2004 Q1 2004 Q2 2004 Q3 2004 Q4
875 820 785 608
2) This would also work:
library(zoo)
yq <- as.yearqtr(as.yearmon(paste(DF$Year, DF$Month), "%Y %m"))
ta <- tapply(DF$DepressionCount, yq, sum)
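For comparison, the quarter can also be derived without zoo using integer division on the month number. This is only a sketch under the assumption that Month holds values such as "01" to "12"; DF below is rebuilt from the eleven rows shown in the question, and qtr and label are illustrative names:

```r
DF <- data.frame(
  Month = sprintf("%02d", 1:11),
  Year = 2004,
  DepressionCount = c(285, 323, 267, 276, 312, 232, 228, 280, 277, 335, 273)
)

# Months 1-3 map to quarter 1, months 4-6 to quarter 2, and so on
qtr <- (as.integer(DF$Month) - 1) %/% 3 + 1
label <- paste0(DF$Year, " Q", qtr)

ag <- tapply(DF$DepressionCount, label, sum)
```

The sums match the zoo output above (Q4 covers only October and November in the data shown), but the labels are plain strings rather than "yearqtr" values, so they do not carry time-series ordering beyond alphabetical sorting.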
