Adding data to data table dependent on prior data - r

My first post so hopefully I am doing it right.
I have a table as below:
Year Day Amount
1990 1 200
1990 363 2058
1993 1 10
1993 71 564
1993 360 931
I would like to add rows of data to this table such that there is a row entry for all numbers between the maximum 'Day' of each 'Year' in the table and 364, and the corresponding value in 'Amount' would be the maximum 'Amount' for each Year. The resulting data should be:
Year Day Amount
1990 1 200
1990 363 2058
1993 1 10
1993 71 564
1993 360 931
1990 364 2058
1993 361 931
1993 362 931
1993 363 931
1993 364 931
Any ideas?

Taking advantage of how data.table[i, j, by] lets us evaluate expressions in j for each group of by:
library(data.table)

DT <- data.table(
  Year = c(1990, 1990, 1993, 1993, 1993),
  Day = c(1, 363, 1, 71, 360),
  Amount = c(200, 2058, 10, 564, 931)
)

DT[
  order(Day),
  {
    extended_days <- seq(max(Day) + 1, 364)
    extended_amounts <- rep(max(Amount), length(extended_days))
    list(
      Day = c(Day, extended_days),
      Amount = c(Amount, extended_amounts)
    )
  },
  keyby = Year
]
# Year Day Amount
# 1: 1990 1 200
# 2: 1990 363 2058
# 3: 1990 364 2058
# 4: 1993 1 10
# 5: 1993 71 564
# 6: 1993 360 931
# 7: 1993 361 931
# 8: 1993 362 931
# 9: 1993 363 931
# 10: 1993 364 931
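One caveat about the grouped expression above: seq(max(Day) + 1, 364) counts downwards when a year already contains day 364 (seq(365, 364) yields c(365, 364)), which would add spurious rows. A defensive sketch of the same idea, using the question's data:

```r
library(data.table)

DT <- data.table(
  Year = c(1990, 1990, 1993, 1993, 1993),
  Day = c(1, 363, 1, 71, 360),
  Amount = c(200, 2058, 10, 564, 931)
)

res <- DT[
  order(Day),
  {
    last_day <- max(Day)
    # Empty extension when the year already reaches day 364
    extended_days <- if (last_day < 364) seq(last_day + 1, 364) else integer(0)
    list(
      Day = c(Day, extended_days),
      Amount = c(Amount, rep(max(Amount), length(extended_days)))
    )
  },
  keyby = Year
]
res
```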

Related

Create a new variable in data frame that contains the sum of the values of all other groups

I have data similar to this
example_data <- data.frame(
  company = c(rep("A", 6),
              rep("B", 6),
              rep("C", 6)),
  year = c(rep(c(rep(c(2019), 3), rep(2020, 3)), 3)),
  country = c(rep(c("Australia", "Tanzania", "Nepal"), 3)),
  sales = c(sample(1000:2000, 18)),
  employees = c(sample(100:200, 18)),
  profit = c(sample(500:1000, 18))
)
which when printed out looks like this:
> example_data
company year country sales employees profit
1 A 2019 Australia 1815 138 986
2 A 2019 Tanzania 1183 126 907
3 A 2019 Nepal 1159 155 939
4 A 2020 Australia 1873 183 866
5 A 2020 Tanzania 1858 198 579
6 A 2020 Nepal 1841 184 601
7 B 2019 Australia 1989 160 595
8 B 2019 Tanzania 1162 151 520
9 B 2019 Nepal 1470 187 670
10 B 2020 Australia 1013 128 945
11 B 2020 Tanzania 1718 123 886
12 B 2020 Nepal 1135 149 778
13 C 2019 Australia 1846 188 755
14 C 2019 Tanzania 1445 194 916
15 C 2019 Nepal 1029 145 903
16 C 2020 Australia 1737 161 578
17 C 2020 Tanzania 1489 141 859
18 C 2020 Nepal 1350 167 536
The unit of observation for the three variables of interest sales, employees, profit is a unique combination of company, year, and country.
What I need is a column in the data frame for each of these three variables, named other_sales, other_employees, and other_profit. (In my actual data, I have not only three but closer to 40 such variables of interest.) These should be the sum over the other companies in that year and country for that variable. So for instance, example_data$other_sales[1] should be the sum of the two values 1989 and 1846, which are the sales for company B and the sales for company C in that year and country, respectively.
I am familiar with dplyr::group_by() and dplyr::mutate(), but I struggle to come up with a way to solve this problem. What I would like to do is something like this:
library(dplyr)
example_data %>%
  group_by(company, year, country) %>%
  mutate(other_sales = sum(
    example_data %>% filter(company != "this") %>% .$sales)
  )
# "this" should be the value of 'company' in the current group
Obviously, this code doesn't work. Even if it did, it would not accomplish the goal of creating these other_* variables automatically for every specified column in the data frame. I've thought about creating a complicated for loop, but I figured before I go down that most likely wrong route, it's better to ask here. Finally, while it would be possible to construct a solution based purely on column indices (i.e., for example_data[1,7] calculate the sum of [7,4] and [13,4]), this would not work in my real data because the number of observations per company can differ.
EDIT: small correction in the code
--- SOLUTION ---
Based on the comment under this question, I was able to figure out a solution that solves both issues in the question:
example_data %>%
  group_by(year, country) %>%
  mutate(across(sales:profit, .names = "other_{.col}", function(x) sum(x) - x))
I think this will solve your problem
example_data %>%
  group_by(country, year) %>%
  mutate(other_sales = sum(sales) - sales)
To generalise it for all variables, i.e. sales, profit and employees:
(arrange is not necessary, but helps when checking.)
library(tidyverse)
set.seed(123)
example_data <- data.frame(
  company = c(rep("A", 6),
              rep("B", 6),
              rep("C", 6)),
  year = c(rep(c(rep(c(2019), 3), rep(2020, 3)), 3)),
  country = c(rep(c("Australia", "Tanzania", "Nepal"), 3)),
  sales = c(sample(1000:2000, 18)),
  employees = c(sample(100:200, 18)),
  profit = c(sample(500:1000, 18))
)

example_data |>
  arrange(country, year, company) |> # Optional
  group_by(country, year) |>
  mutate(across(sales:profit, ~ sum(.) - ., .names = "other_{.col}"))
#> # A tibble: 18 × 9
#> # Groups: country, year [6]
#> company year country sales employees profit other_sales other_em…¹ other…²
#> <chr> <dbl> <chr> <int> <int> <int> <int> <int> <int>
#> 1 A 2019 Australia 1414 190 989 3190 302 1515
#> 2 B 2019 Australia 1817 125 522 2787 367 1982
#> 3 C 2019 Australia 1373 177 993 3231 315 1511
#> 4 A 2020 Australia 1525 108 892 2830 372 1524
#> 5 B 2020 Australia 1228 197 808 3127 283 1608
#> 6 C 2020 Australia 1602 175 716 2753 305 1700
#> 7 A 2019 Nepal 1178 191 762 2899 283 1608
#> 8 B 2019 Nepal 1298 141 943 2779 333 1427
#> 9 C 2019 Nepal 1601 142 665 2476 332 1705
#> 10 A 2020 Nepal 1937 171 829 2721 266 1967
#> 11 B 2020 Nepal 1013 135 991 3645 302 1805
#> 12 C 2020 Nepal 1708 131 976 2950 306 1820
#> 13 A 2019 Tanzania 1462 156 608 2781 286 1633
#> 14 B 2019 Tanzania 1117 106 910 3126 336 1331
#> 15 C 2019 Tanzania 1664 180 723 2579 262 1518
#> 16 A 2020 Tanzania 1194 192 924 3010 296 1423
#> 17 B 2020 Tanzania 1243 182 634 2961 306 1713
#> 18 C 2020 Tanzania 1767 114 789 2437 374 1558
#> # … with abbreviated variable names ¹​other_employees, ²​other_profit
Created on 2022-12-08 with reprex v2.0.2
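The across() solution above rests on the identity that sum(x) - x gives, for each element, the sum of all the other elements in the group. A standalone base-R illustration with made-up numbers:

```r
# For each element, sum(x) - x is the total of the remaining elements:
x <- c(10, 20, 30)
sum(x) - x
#> [1] 50 40 30
```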

Rolling average for aggregated results in R

I have a database with sales value for individual firms that belong to different industries.
In the example dataset below:
set.seed(123)
df <- data.table(year=rep(1980:1984,each=4),sale=sample(100:150,20),ind=sample(LETTERS[1:2],20,replace = TRUE))
df[order(year,ind)]
year sale ind
1: 1980 114 A
2: 1980 102 A
3: 1980 130 B
4: 1980 113 B
5: 1981 136 A
6: 1981 148 A
7: 1981 141 B
8: 1981 142 B
9: 1982 124 A
10: 1982 125 A
11: 1982 104 A
12: 1982 126 B
13: 1983 108 A
14: 1983 128 A
15: 1983 140 B
16: 1983 127 B
17: 1984 134 A
18: 1984 107 A
19: 1984 106 A
20: 1984 146 B
The column "ind" represents industry and I have omitted the firm identifiers (no use in this example).
I want an average defined as follows:
For each year, the desired average is the average over all firms within the industry over the past three years. If three prior years are not available, a minimum of two prior years is also acceptable.
For example, in the above dataset, if year=1982, and ind=A, there are only two observations for past years (which is still acceptable), so the desired average is the average of all sale values in years 1980 and 1981 for industry A.
If year=1983, and ind=A, we have three prior years, and the desired average is the average of all sale values in years 1980, 1981, and 1982 for industry A.
If year=1984, and ind=A, we have three prior years, and the desired average is the average of all sale values in years 1981, 1982, and 1983 for industry A.
The desired output, thus, will be as follows:
year sale ind mymean
1: 1980 130 B NA
2: 1980 114 A NA
3: 1980 113 B NA
4: 1980 102 A NA
5: 1981 141 B NA
6: 1981 142 B NA
7: 1981 136 A NA
8: 1981 148 A NA
9: 1982 124 A 125.0000
10: 1982 125 A 125.0000
11: 1982 126 B 131.5000
12: 1982 104 A 125.0000
13: 1983 140 B 130.4000
14: 1983 127 B 130.4000
15: 1983 108 A 121.8571
16: 1983 128 A 121.8571
17: 1984 134 A 124.7143
18: 1984 107 A 124.7143
19: 1984 146 B 135.2000
20: 1984 106 A 124.7143
A data.table solution is much preferred for fast implementation.
Many thanks in advance.
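Before reaching for an implementation, the definition can be checked by hand. For year = 1982 and ind = A, the prior years 1980 and 1981 contribute the sales 114, 102, 136 and 148 (read off the table above), so the desired value is their plain mean, matching the 125.0000 shown in the desired output:

```r
# Industry A sales over 1980-1981, pooled across firms
prior_sales <- c(114, 102, 136, 148)
mean(prior_sales)
#> [1] 125
```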
I am not very good with data.table. Here is a tidyverse solution, if you like it or can translate it to data.table:
library(tidyverse)
df %>%
  group_by(ind, year) %>%
  summarise(ds = sum(sale),
            dn = n()) %>%
  mutate(ds = (lag(ds, 1) + lag(ds, 2) + ifelse(is.na(lag(ds, 3)), 0, lag(ds, 3))) /
              (lag(dn, 1) + lag(dn, 2) + ifelse(is.na(lag(dn, 3)), 0, lag(dn, 3)))) %>%
  select(ind, year, mymean = ds) %>%
  right_join(df, by = c("ind", "year"))
`summarise()` regrouping output by 'ind' (override with `.groups` argument)
# A tibble: 20 x 4
ind year mymean sale
<chr> <int> <dbl> <int>
1 A 1980 NA 114
2 A 1980 NA 102
3 A 1981 NA 136
4 A 1981 NA 148
5 A 1982 125 124
6 A 1982 125 125
7 A 1982 125 104
8 A 1983 122. 108
9 A 1983 122. 128
10 A 1984 125. 134
11 A 1984 125. 107
12 A 1984 125. 106
13 B 1980 NA 130
14 B 1980 NA 113
15 B 1981 NA 141
16 B 1981 NA 142
17 B 1982 132. 126
18 B 1983 130. 140
19 B 1983 130. 127
20 B 1984 135. 146
You can use zoo's rollapply function to perform this rolling calculation. Note that there are dedicated functions to calculate rolling mean like frollmean in data.table and rollmean in zoo but they lack the argument partial = TRUE present in rollapply. partial = TRUE is useful here since you want to calculate the mean even if the window size is less than 3.
We can first calculate mean of sale value for each ind and year, then perform rolling mean calculation with window size of 3 and join this data with the original dataframe to get all the rows of original dataframe back.
library(data.table)
library(zoo)
df1 <- df[, .(sale = mean(sale)), .(ind, year)]
df2 <- df1[, my_mean := shift(rollapplyr(sale, 3, function(x)
if(length(x) > 1) mean(x, na.rm = TRUE) else NA, partial = TRUE)), ind]
df[df2, on = .(ind, year)]
This can be written using dplyr as :
library(dplyr)
df %>%
group_by(ind, year) %>%
summarise(sale = mean(sale)) %>%
mutate(avg_mean = lag(rollapplyr(sale, 3, partial = TRUE, function(x)
if(length(x) > 1) mean(x, na.rm = TRUE) else NA))) %>%
left_join(df, by = c('ind', 'year'))
Building on Ronak's answer (which averages the previous yearly means), a more general approach (the mean of all previous values, weighting each year by its number of observations) as a data.table solution:
library(data.table)
library(roll)

df1 <- df[, .(sum_1 = sum(sale), n = length(sale)), .(ind, year)]
df1[, `:=`(
  my_sum = roll_sum(shift(sum_1), 3, min_obs = 2),
  my_n = roll_sum(shift(n), 3, min_obs = 2)
), by = .(ind)]
df1[, `:=`(my_mean = my_sum / my_n)]
df[df1[, !c("sum_1", "n", "my_sum", "my_n")], on = .(ind, year)]
year sale ind my_mean
1: 1980 130 B NA
2: 1980 113 B NA
3: 1980 114 A NA
4: 1980 102 A NA
5: 1981 141 B NA
6: 1981 142 B NA
7: 1981 136 A NA
8: 1981 148 A NA
9: 1982 124 A 125.0000
10: 1982 125 A 125.0000
11: 1982 104 A 125.0000
12: 1982 126 B 131.5000
13: 1983 140 B 130.4000
14: 1983 127 B 130.4000
15: 1983 108 A 121.8571
16: 1983 128 A 121.8571
17: 1984 134 A 124.7143
18: 1984 107 A 124.7143
19: 1984 106 A 124.7143
20: 1984 146 B 135.2000

Running a function in dplyr gives wrong output

My sample data consists of daily rainfall and temperature from day 1 to 365 for year 1981 and 1982
set.seed(0)
df <- data.frame(year = rep(1981:1982, each = 365),
                 doy = rep(1:365, times = 2),
                 rainfall = sample(0:30, 730, replace = T),
                 tmax = sample(25:35, 730, replace = T))
Each year I have two days of the year called ref.doy and for each ref.doy, I have corresponding doy.first, doy.second.
my.df <- data.frame(year = c(1981, 1981, 1982, 1982),
                    ref.doy = c(250, 260, 230, 240),
                    doy.first = c(280, 300, 290, 310),
                    doy.second = c(310, 330, 340, 350))
What I want to do is, for each year, take each ref.doy and its corresponding doy.first and doy.second, and calculate total rainfall and mean temperature over ref.doy:doy.first and over doy.first:doy.second. I wrote a function to do this:
my.func <- function(x) {
  dat <- x %>%
    dplyr::summarise(tot.rain.val1 = sum(rainfall[doy >= ref.doy & doy <= doy.first]),
                     tot.rain.val2 = sum(rainfall[doy >= doy.first & doy <= doy.second]),
                     mean.tmax.val1 = mean(tmax[doy >= ref.doy & doy <= doy.first]),
                     mean.tmax.val2 = sum(tmax[doy >= doy.first & doy <= doy.second]))
  return(dat)
}
The approach I took is to first join the two data and then run my function
df <- df %>% left_join(my.df)
results <- df %>%
  dplyr::group_by(year, ref.doy) %>%
  dplyr::summarise(results = paste(my.func(.), collapse = ","))
However, the results look a bit funny and the format is not correct. I need the results in the following format:
year ref.doy tot.rain.val1 tot.rain.val2 mean.tmax.val1 mean.tmax.val2
1981 250
1981 260
1982 230
1982 240
Your function returns a dataframe in the format you want, so you don't need paste; instead, save those outputs in a list column and then unnest.
library(tidyverse)
df <- df %>% left_join(my.df)
df %>%
  group_by(year, ref.doy) %>%
  summarise(results = list(my.func(.))) %>%
  unnest() %>%
  ungroup() %>%
  select(-year, -ref.doy)
# # A tibble: 16 x 6
# year1 ref.doy1 tot.rain.val1 tot.rain.val2 mean.tmax.val1 mean.tmax.val2
# <dbl> <dbl> <int> <int> <dbl> <int>
# 1 1981 250 396 365 29.6 939
# 2 1981 260 429 489 29.8 926
# 3 1982 230 994 805 29.3 1515
# 4 1982 240 1140 653 29.7 1224
# 5 1981 250 396 365 29.6 939
# 6 1981 260 429 489 29.8 926
# 7 1982 230 994 805 29.3 1515
# 8 1982 240 1140 653 29.7 1224
# 9 1981 250 396 365 29.6 939
#10 1981 260 429 489 29.8 926
#11 1982 230 994 805 29.3 1515
#12 1982 240 1140 653 29.7 1224
#13 1981 250 396 365 29.6 939
#14 1981 260 429 489 29.8 926
#15 1982 230 994 805 29.3 1515
#16 1982 240 1140 653 29.7 1224
What about something like this, if you want it in a function:
library(dplyr)
fun <- function(x, y) {
  df1 <- x %>%
    left_join(y) %>%
    group_by(year, ref.doy) %>%
    summarise(tot.rain.val1 = sum(rainfall[doy >= ref.doy & doy <= doy.first]),
              tot.rain.val2 = sum(rainfall[doy >= doy.first & doy <= doy.second]),
              mean.tmax.val1 = mean(tmax[doy >= ref.doy & doy <= doy.first]),
              mean.tmax.val2 = sum(tmax[doy >= doy.first & doy <= doy.second]))
  print(df1)
}
fun(df, my.df)
Joining, by = "year"
# A tibble: 4 x 6
# Groups: year [?]
year ref.doy tot.rain.val1 tot.rain.val2 mean.tmax.val1 mean.tmax.val2
<dbl> <dbl> <int> <int> <dbl> <int>
1 1981 250 396 365 29.6 939
2 1981 260 429 489 29.8 926
3 1982 230 994 805 29.3 1515
4 1982 240 1140 653 29.7 1224
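The root of the strange output in the question is that, inside a magrittr pipeline, `.` refers to the entire piped data frame rather than the current group, so my.func(.) computed the same four rows for every group, whereas bare column references are evaluated within each group. A minimal sketch of the difference on toy data (hypothetical values):

```r
library(dplyr)

d <- data.frame(g = c("a", "a", "b"), v = c(1, 2, 10))

# `.` is the entire piped data frame, so both groups get the global sum:
whole <- d %>% group_by(g) %>% summarise(whole = sum(.$v))
# whole$whole is 13 for both groups

# A bare column reference is evaluated within the current group:
per_group <- d %>% group_by(g) %>% summarise(per_group = sum(v))
# per_group$per_group is 3 for "a" and 10 for "b"
```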

Creating new column based on row values of multiple data subsetting conditions

I have a dataframe that looks more or less like follows (the original one has 12 years of data):
Year Quarter Age_1 Age_2 Age_3 Age_4
2005 1 158 120 665 32
2005 2 257 145 121 14
2005 3 68 69 336 65
2005 4 112 458 370 101
2006 1 75 457 741 26
2006 2 365 134 223 45
2006 3 257 121 654 341
2006 4 175 124 454 12
2007 1 697 554 217 47
2007 2 954 987 118 54
2007 4 498 235 112 65
Where the numbers in the age columns represents the amount of individuals in each age class for a specific quarter within a specific year. It is noteworthy that sometimes not all quarters in a specific year have data (e.g., third quarter is not represented in 2007). Also, each row represents a sampling event. Although not shown in this example, in the original dataset I always have more than one sampling event for a specific quarter within a specific year. For example, for the first quarter in 2005 I have 47 sampling events, leading therefore to 47 rows.
What I'd like to have now is a dataframe structured like this:
Year Quarter Age_1 Age_2 Age_3 Age_4 Cohort
2005 1 158 120 665 32 158
2005 2 257 145 121 14 257
2005 3 68 69 336 65 68
2005 4 112 458 370 101 112
2006 1 75 457 741 26 457
2006 2 365 134 223 45 134
2006 3 257 121 654 341 121
2006 4 175 124 454 12 124
2007 1 697 554 217 47 47
2007 2 954 987 118 54 54
2007 4 498 235 112 65 65
In this case, I want to create a new column (Cohort) in my original dataset which basically follows my cohorts along the dataset. In other words, when I'm in my first year of data (2005 with all quarters), I take the row values of Age_1 and paste them into the new column. When I move to the next year (2006), I take all the row values related to Age_2 and paste them into the new column, and so on and so forth.
I have tried to use the following function, but somehow it only works for the first couple of years:
extract_cohort_quarter <- function(d, yearclass = 2005, quarterclass = 1) {
  ny <- 1:nlevels(d$Year)   # no. of Year levels in the dataset
  nq <- 1:nlevels(d$Quarter)
  age0 <- paste("age", ny, sep = "_")
  year0 <- as.character(yearclass + ny - 1)
  quarter <- as.character(rep(1:4, length(age0)))
  age <- rep(age0, each = 4)
  year <- rep(year0, each = 4)
  df <- data.frame(year, age, quarter, stringsAsFactors = FALSE)
  n <- nrow(df)
  dnew <- NULL
  for (i in 1:n) {
    tmp <- subset(d, Year == df$year[i] & Quarter == df$quarter[i])
    tmp$Cohort <- tmp[[age[i]]]
    dnew <- rbind(dnew, tmp)
  }
  levels(dnew$Year) <- paste("Yearclass_", yearclass, ":",
                             year, ":", quarter, ":", age, sep = "")
  dnew
}
I have plenty of data from age_1 to age_12 for all the years and quarters, so I don't think it's something related to the data structure itself.
Is there an easier way to solve this problem? Or is there a way to improve my extract_cohort_quarter() function? Any help will be much appreciated.
-M
I have a simple solution, but it demands a bit of knowledge of the data.table library. I think you can easily adapt it to your further needs.
Here is the data:
DT <- as.data.table(list(
  Year = c(2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2007, 2007, 2007),
  Quarter = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 4),
  Age_1 = c(158, 257, 68, 112, 75, 365, 257, 175, 697, 954, 498),
  Age_2 = c(120, 145, 69, 458, 457, 134, 121, 124, 554, 987, 235),
  Age_3 = c(665, 121, 336, 370, 741, 223, 654, 454, 217, 118, 112),
  Age_4 = c(32, 14, 65, 101, 26, 45, 341, 12, 47, 54, 65)
))
Here is the code:
DT[,index := .GRP, by = Year]
DT[,cohort := get(paste0("Age_",index)),by = Year]
and the output:
> DT
Year Quarter Age_1 Age_2 Age_3 Age_4 index cohort
1: 2005 1 158 120 665 32 1 158
2: 2005 2 257 145 121 14 1 257
3: 2005 3 68 69 336 65 1 68
4: 2005 4 112 458 370 101 1 112
5: 2006 1 75 457 741 26 2 457
6: 2006 2 365 134 223 45 2 134
7: 2006 3 257 121 654 341 2 121
8: 2006 4 175 124 454 12 2 124
9: 2007 1 697 554 217 47 3 217
10: 2007 2 954 987 118 54 3 118
11: 2007 4 498 235 112 65 3 112
What it does:
DT[,index := .GRP, by = Year]
creates an index for each distinct year in your table (by = Year performs the operation per group of Year; .GRP creates an index following the grouping sequence).
I then use this index to pick the column named Age_ followed by that number:
DT[,cohort := get(paste0("Age_",index)),by = Year]
You can even do everything in a single line:
DT[,cohort := get(paste0("Age_",.GRP)),by = Year]
I hope it helps
Here is an option using tidyverse
library(dplyr)
library(tidyr)
df1 %>%
  gather(key, Cohort, -Year, -Quarter) %>%
  separate(key, into = c('key1', 'key2')) %>%
  mutate(ind = match(Year, unique(Year))) %>%
  group_by(Year) %>%
  filter(key2 == Quarter[ind]) %>%
  mutate(newcol = paste(Year, Quarter, paste(key1, ind, sep = "_"), sep = ":")) %>%
  ungroup %>%
  select(Cohort, newcol) %>%
  bind_cols(df1, .)
# Year Quarter Age_1 Age_2 Age_3 Age_4 Cohort newcol
#1 2005 1 158 120 665 32 158 2005:1:Age_1
#2 2005 2 257 145 121 14 257 2005:2:Age_1
#3 2005 3 68 69 336 65 68 2005:3:Age_1
#4 2005 4 112 458 370 101 112 2005:4:Age_1
#5 2006 1 75 457 741 26 457 2006:1:Age_2
#6 2006 2 365 134 223 45 134 2006:2:Age_2
#7 2006 3 257 121 654 341 121 2006:3:Age_2
#8 2006 4 175 124 454 12 124 2006:4:Age_2
#9 2007 1 697 554 217 47 47 2007:1:Age_3
#10 2007 2 954 987 118 54 54 2007:2:Age_3
#11 2007 4 498 235 112 65 65 2007:4:Age_3

Read a time series table using read.zoo

I've looked all over the place, but I can't find where this question has been asked before.
What is a clean way to get this data into a proper zoo series? This version is a copy/paste to make this post easier, but it will always come in the following table form (from a text file). My read.zoo() statement reads the Year as the index but the quarters (Qtr1, Qtr2, etc) are read as column names. I've been trying to figure out a non-garbage way to read the columns as the "quarter" part of the index, but it's sloppy (too sloppy to post). I'm guessing this problem has already been solved, but I can't find it.
> texinp <- "
+ Year Qtr1 Qtr2 Qtr3 Qtr4
+ 1992 566 443 329 341
+ 1993 344 212 133 112
+ 1994 252 252 199 207"
> z <- read.zoo(textConnection(texinp), header=TRUE)
> z
From the as.yearqtr() documentation, the target would look like:
1992 Q1 1992 Q2 1992 Q3 1992 Q4 1993 Q1 1993 Q2 1993 Q3 1993 Q4
566 443 329 341 344 212 133 112
1994 Q1 1994 Q2 1994 Q3 1994 Q4
252 252 199 207
Read in the data using read.zoo and then convert it to a zooreg object with yearqtr time index:
texinp <- "Year Qtr1 Qtr2 Qtr3 Qtr4
1992 566 443 329 341
1993 344 212 133 112
1994 252 252 199 207"
library(zoo)
z <- read.zoo(text = texinp, header=TRUE)
zz <- zooreg(c(t(z)), start = yearqtr(start(z)), freq = 4)
The result looks like this:
> zz
1992 Q1 1992 Q2 1992 Q3 1992 Q4 1993 Q1 1993 Q2 1993 Q3 1993 Q4 1994 Q1 1994 Q2 1994 Q3 1994 Q4
566 443 329 341 344 212 133 112 252 252 199 207
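The c(t(z)) step above works because c() reads a matrix column-major; transposing first makes that read row by row, i.e. year by year, quarter by quarter. A tiny standalone illustration:

```r
# Two "years" by three "quarters"; transpose so c() reads row-wise
m <- matrix(1:6, nrow = 2, byrow = TRUE)
c(m)     # column-major order: 1 4 2 5 3 6
c(t(m))  # row-major order:    1 2 3 4 5 6
```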
read.zoo assumes your data has at most one time-index column, so you have to process this yourself. First read it in using read.table
zt <- read.table( textConnection( texinp ), header = TRUE)
then convert it to a "long table" using the melt function from the reshape package:
require(reshape)
zt.m <- melt( zt, id = 'Year', variable_name = 'Qtr')
> zt.m
Year Qtr value
1 1992 Qtr1 566
2 1993 Qtr1 344
3 1994 Qtr1 252
4 1992 Qtr2 443
5 1993 Qtr2 212
6 1994 Qtr2 252
7 1992 Qtr3 329
8 1993 Qtr3 133
9 1994 Qtr3 199
10 1992 Qtr4 341
11 1993 Qtr4 112
12 1994 Qtr4 207
and finally create your desired zoo object:
z <- with( zt.m, zoo( value, as.yearqtr(paste(Year, Qtr), format = '%Y Qtr%q')))
> z
1992 Q1 1992 Q2 1992 Q3 1992 Q4 1993 Q1 1993 Q2 1993 Q3 1993 Q4 1994 Q1 1994 Q2
566 443 329 341 344 212 133 112 252 252
1994 Q3 1994 Q4
199 207
