restart count if NA appears - r

This is the same as this question, but I want to preserve the date. Please read that first.
library(dplyr)
library(tidyverse)
df <- tibble(mydate = as.Date(c("2019-05-11 23:01:00", "2019-05-11 23:02:00", "2019-05-11 23:03:00", "2019-05-11 23:04:00",
"2019-05-12 23:05:00", "2019-05-12 23:06:00", "2019-05-12 23:07:00", "2019-05-12 23:08:00",
"2019-05-13 23:09:00", "2019-05-13 23:10:00", "2019-05-13 23:11:00", "2019-05-13 23:12:00",
"2019-05-14 23:13:00", "2019-05-14 23:14:00", "2019-05-14 23:15:00", "2019-05-14 23:16:00",
"2019-05-15 23:17:00", "2019-05-15 23:18:00", "2019-05-15 23:19:00", "2019-05-15 23:20:00")),
myval = c(0, NA, 1500, 1500,
1500, 1500, NA, 0,
0, 0, 1100, 1100,
1100, 0, 200, 200,
1100, 1100, 1100, 0
))
# just replace values [0,1] with NA
df$myval[df$myval >= 0 & df$myval <= 1] <- NA
df <- df %>%
group_by(myval) %>%
mutate(counts = sum(myval == myval)) %>%
mutate(result = (myval / counts))
Right now the result is:
mydate myval counts result
<date> <dbl> <int> <dbl>
1 2019-05-11 NA NA NA
2 2019-05-11 NA NA NA
3 2019-05-11 1500 4 375
4 2019-05-11 1500 4 375
5 2019-05-12 1500 4 375
6 2019-05-12 1500 4 375
7 2019-05-12 NA NA NA
8 2019-05-12 NA NA NA
9 2019-05-13 NA NA NA
10 2019-05-13 NA NA NA
11 2019-05-13 1100 6 183.
12 2019-05-13 1100 6 183.
13 2019-05-14 1100 6 183.
14 2019-05-14 NA NA NA
15 2019-05-14 200 2 100
16 2019-05-14 200 2 100
17 2019-05-15 1100 6 183.
18 2019-05-15 1100 6 183.
19 2019-05-15 1100 6 183.
20 2019-05-15 NA NA NA
I want to preserve the above dataframe, with the dates column, and get the correct result.
I need to restart the count whenever an NA appears before or after a run of values.
So, for 1100, I should get a count of 3 twice, not a count of 6.

You can create groups with data.table rleid :
library(dplyr)
df %>%
group_by(grp = data.table::rleid(myval)) %>%
mutate(counts = n(),
result= myval/counts)
# mydate myval grp counts result
# <date> <dbl> <int> <int> <dbl>
# 1 2019-05-11 NA 1 2 NA
# 2 2019-05-11 NA 1 2 NA
# 3 2019-05-11 1500 2 4 375
# 4 2019-05-11 1500 2 4 375
# 5 2019-05-12 1500 2 4 375
# 6 2019-05-12 1500 2 4 375
# 7 2019-05-12 NA 3 4 NA
# 8 2019-05-12 NA 3 4 NA
# 9 2019-05-13 NA 3 4 NA
#10 2019-05-13 NA 3 4 NA
#11 2019-05-13 1100 4 3 367.
#12 2019-05-13 1100 4 3 367.
#13 2019-05-14 1100 4 3 367.
#14 2019-05-14 NA 5 1 NA
#15 2019-05-14 200 6 2 100
#16 2019-05-14 200 6 2 100
#17 2019-05-15 1100 7 3 367.
#18 2019-05-15 1100 7 3 367.
#19 2019-05-15 1100 7 3 367.
#20 2019-05-15 NA 8 1 NA

With data.table
library(data.table)
setDT(df)[, counts := .N, rleid(myval)][, result := myval/counts]
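
Both answers depend on data.table for rleid. If that dependency is unwanted, the grouping can be sketched in base R. The helper name run_id is hypothetical (it is not part of any package), and the vector below is the question's myval after the [0,1] values have been replaced with NA. Unlike base rle, which treats every NA as its own run, this sketch keeps consecutive NAs in one run, which is what the output above shows for rleid.

```r
# Hypothetical helper: run ids where consecutive NAs share one run.
run_id <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  prev <- x[-n]
  curr <- x[-1L]
  # TRUE where an element continues the run of the element before it
  same <- (is.na(prev) & is.na(curr)) |
    (!is.na(prev) & !is.na(curr) & prev == curr)
  cumsum(c(TRUE, !same))
}

myval <- c(NA, NA, 1500, 1500, 1500, 1500, NA, NA, NA, NA,
           1100, 1100, 1100, NA, 200, 200, 1100, 1100, 1100, NA)

grp <- run_id(myval)
# length of the run each row belongs to
counts <- ave(seq_along(myval), grp, FUN = length)
result <- myval / counts
```

Because grp, counts and result have one element per row, they can be attached to the original data frame as new columns, so the dates are preserved.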

Related

Group two dfs based on dates that closely match

These are subsets of two dataframes.
df1:
plot mean_first_flower_date gdd
1 2019-07-15 60
1 2019-07-21 50
1 2019-07-23 78
2 2019-05-13 100
2 2019-05-22 173
2 2019-05-25 245
(cont.)
df2:
plot date flowers
1 2019-07-12 2
1 2019-07-13 9
1 2019-07-14 3
1 2019-07-15 3
2 2019-05-12 10
2 2019-05-13 10
2 2019-05-14 14
2 2019-05-15 17
(cont.)
df2 has some matching dates with df1 but sometimes the dates are off for one or a couple days (highlighted in bold).
I would like to group both dfs based on both 'date' and 'plot', keeping df2, without losing 'gdd' data from df1.
This will happen if, for example, I inner_join both dfs because the dates will not match.
So if a date in df1 is one to three days earlier or later than the closest date it could match in df2, that is fine, because the dates are relatively close. This is tricky because I want this replacement only if there is no data available in df1 for that date range.
My goal is to have something like this:
plot date flowers gdd
1 2019-07-12 2 60
1 2019-07-13 9 60
1 2019-07-14 3 60
1 2019-07-15 3 60
2 2019-05-12 10 100
2 2019-05-13 10 100
2 2019-05-14 14 100
2 2019-05-15 17 100
Is it possible to do?
I greatly appreciate any help!
Thanks!
I think a 'rolling join' from the data.table package can handle this:
library(data.table)
setDT(df1)
setDT(df2)
df1[, mean_first_flower_date := as.Date(mean_first_flower_date)]
df2[, date := as.Date(date)]
df1[df2, on=c("plot","mean_first_flower_date==date"), roll=3, rollends=TRUE]
# plot mean_first_flower_date gdd flowers
#1: 1 2019-07-12 60 2
#2: 1 2019-07-13 60 9
#3: 1 2019-07-14 60 3
#4: 1 2019-07-15 60 3
#5: 2 2019-05-12 100 10
#6: 2 2019-05-13 100 10
#7: 2 2019-05-14 100 14
#8: 2 2019-05-15 100 17
Using this data:
df1 <- read.table(text="plot mean_first_flower_date gdd
1 2019-07-15 60
1 2019-07-21 50
1 2019-07-23 78
2 2019-05-13 100
2 2019-05-22 173
2 2019-05-25 245", header=TRUE)
df2 <- read.table(text="plot date flowers
1 2019-07-12 2
1 2019-07-13 9
1 2019-07-14 3
1 2019-07-15 3
2 2019-05-12 10
2 2019-05-13 10
2 2019-05-14 14
2 2019-05-15 17", header=TRUE)
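
For comparison, the matching rule from the question (take the closest df1 date within three days, per plot) can be sketched in base R without a rolling join. This is not the same algorithm as roll=3: it is a symmetric nearest-date lookup. The helper name nearest_gdd and its max_days parameter are hypothetical, and the data frames are the sample data with the dates already converted to Date.

```r
df1 <- data.frame(
  plot = c(1, 1, 1, 2, 2, 2),
  mean_first_flower_date = as.Date(c("2019-07-15", "2019-07-21", "2019-07-23",
                                     "2019-05-13", "2019-05-22", "2019-05-25")),
  gdd = c(60, 50, 78, 100, 173, 245)
)
df2 <- data.frame(
  plot = c(1, 1, 1, 1, 2, 2, 2, 2),
  date = as.Date(c("2019-07-12", "2019-07-13", "2019-07-14", "2019-07-15",
                   "2019-05-12", "2019-05-13", "2019-05-14", "2019-05-15")),
  flowers = c(2, 9, 3, 3, 10, 10, 14, 17)
)

# Hypothetical helper: gdd of the closest df1 date in the same plot,
# or NA when the closest date is more than max_days away.
nearest_gdd <- function(i, max_days = 3) {
  cand <- df1[df1$plot == df2$plot[i], ]
  if (nrow(cand) == 0L) return(NA_real_)
  gap <- abs(as.numeric(difftime(cand$mean_first_flower_date,
                                 df2$date[i], units = "days")))
  j <- which.min(gap)
  if (gap[j] <= max_days) cand$gdd[j] else NA_real_
}

df2$gdd <- vapply(seq_len(nrow(df2)), nearest_gdd, numeric(1))
```

The row-by-row loop is easy to read but slow on large tables, where the rolling join above is likely the better choice.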
Try fill from tidyr (left_join comes from dplyr). Use this syntax:
df2 %>% left_join(df1, by = c("plot" = "plot", "date" = "mean_first_flower_date")) %>%
fill(gdd, .direction = "up")
plot date flowers gdd
1 1 2019-07-12 2 60
2 1 2019-07-13 9 60
3 1 2019-07-14 3 60
4 1 2019-07-15 3 60
5 2 2019-05-12 10 100
6 2 2019-05-13 10 100
7 2 2019-05-14 14 NA
8 2 2019-05-15 17 NA
Notice the two NAs in the last two rows. They should not appear when you join your actual df2, because those rows will be filled with 173 once there is a match for 2019-05-22. If any trailing NA rows remain and you still want to fill them, you can call fill again with .direction = "down":
df2 %>% left_join(df1, by = c("plot" = "plot", "date" = "mean_first_flower_date")) %>%
fill(gdd, .direction = "up") %>% fill(gdd, .direction = "down")
plot date flowers gdd
1 1 2019-07-12 2 60
2 1 2019-07-13 9 60
3 1 2019-07-14 3 60
4 1 2019-07-15 3 60
5 2 2019-05-12 10 100
6 2 2019-05-13 10 100
7 2 2019-05-14 14 100
8 2 2019-05-15 17 100
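
One caveat with an ungrouped fill is that a value can be carried from one plot into another. A possible variation, sketched here on the sample data with the dates kept as character strings in both tables, groups by plot first and uses .direction = "updown" so that a single fill call covers both directions. The object name filled is only for illustration.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  plot = c(1, 1, 1, 2, 2, 2),
  mean_first_flower_date = c("2019-07-15", "2019-07-21", "2019-07-23",
                             "2019-05-13", "2019-05-22", "2019-05-25"),
  gdd = c(60, 50, 78, 100, 173, 245),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  plot = c(1, 1, 1, 1, 2, 2, 2, 2),
  date = c("2019-07-12", "2019-07-13", "2019-07-14", "2019-07-15",
           "2019-05-12", "2019-05-13", "2019-05-14", "2019-05-15"),
  flowers = c(2, 9, 3, 3, 10, 10, 14, 17),
  stringsAsFactors = FALSE
)

filled <- df2 %>%
  left_join(df1, by = c("plot" = "plot", "date" = "mean_first_flower_date")) %>%
  group_by(plot) %>%                      # keep the fill inside each plot
  fill(gdd, .direction = "updown") %>%    # fill up first, then down
  ungroup()
```

Note that fill has no distance limit, so unlike the three-day rule in the question it will carry a value across any gap within a plot.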

average in previous group at the same place with another column

I have some data in which I divide each mdo value by the number of mdo rows in the previous group.
I am also calculating the average of sog.
However, I want the sog average to be computed over the same rows that supply the count used in the result (mdo / count) value, i.e. the rows of the previous group.
library(dplyr)
library(lubridate)
library(purrr)
df <- tibble(mydate = as.Date(c("2019-05-11 23:01:00", "2019-05-11 23:02:00", "2019-05-11 23:03:00", "2019-05-11 23:04:00",
"2019-05-12 23:05:00", "2019-05-12 23:06:00", "2019-05-12 23:07:00", "2019-05-12 23:08:00",
"2019-05-13 23:09:00", "2019-05-13 23:10:00", "2019-05-13 23:11:00", "2019-05-13 23:12:00",
"2019-05-14 23:13:00", "2019-05-14 23:14:00", "2019-05-14 23:15:00", "2019-05-14 23:16:00",
"2019-05-15 23:17:00", "2019-05-15 23:18:00", "2019-05-15 23:19:00", "2019-05-15 23:20:00",
"2019-05-15 23:21:00", "2019-05-15 23:22:00", "2019-05-15 23:23:00", "2019-05-15 23:24:00",
"2019-05-15 23:25:00")),
mdo = c(1500, 1500, 1500, 1500,
1500, 1500, NA, 0,
0, 0, 900, 900, NA, NA, 1100, 1100,
1100, 200, 200, 200,200,
1100, 1100, 1100, 0
),
sog = c(12, 12, 12, 11, 10,9,
2,8.8, 8.7, 7.8, 11, 11, 12, 11,
9.54, 9.8, 10.4,4, 4, 4.5, 3.6,
7, 8, 9, 0))
df1 <- df %>%
mutate(grp = data.table::rleid(mdo))
df1 <- df1 %>%
#Keep only non-NA value
filter(!is.na(mdo)) %>%
#count occurrences of each grp
count(grp, name = 'count') %>%
#Shift the count to the previous group
mutate(count = lag(count)) %>%
#Join with the original data
right_join(df1, by = 'grp') %>%
arrange(grp)
group_mdo <- df1 %>%
select(grp, mdo) %>%
unique() %>%
mutate(prev_mdo = lag(mdo)) %>% # lag() has no na.rm argument; recent dplyr versions reject it
select(-mdo) %>%
tidyr::fill(prev_mdo, .direction = "down")
df1 <- df1 %>%
left_join(group_mdo, by = "grp") %>%
mutate(result = ifelse(prev_mdo != 0, mdo / count, 0)) %>%
mutate(sog_avg = ifelse(prev_mdo != 0, map_dbl(.x = grp - 1, ~ mean(sog[grp == .x], na.rm=TRUE), na.rm=TRUE), NA))
The result right now is:
grp count mydate mdo sog prev_mdo result sog_avg
1 NA 2019-05-11 1500 12 NA NA NA
1 NA 2019-05-11 1500 12 NA NA NA
1 NA 2019-05-11 1500 12 NA NA NA
1 NA 2019-05-11 1500 11 NA NA NA
1 NA 2019-05-12 1500 10 NA NA NA
1 NA 2019-05-12 1500 9 NA NA NA
2 NA 2019-05-12 NA 2 1500 NA 11
3 6 2019-05-12 0 8.8 1500 0 2
3 6 2019-05-13 0 8.7 1500 0 2
3 6 2019-05-13 0 7.8 1500 0 2
4 3 2019-05-13 900 11 0 0 NA
4 3 2019-05-13 900 11 0 0 NA
5 NA 2019-05-14 NA 12 900 NA 11
5 NA 2019-05-14 NA 11 900 NA 11
6 2 2019-05-14 1100 9.54 900 550 11.5
6 2 2019-05-14 1100 9.8 900 550 11.5
6 2 2019-05-15 1100 10.4 900 550 11.5
7 3 2019-05-15 200 4 1100 66.7 9.91
7 3 2019-05-15 200 4 1100 66.7 9.91
7 3 2019-05-15 200 4.5 1100 66.7 9.91
7 3 2019-05-15 200 3.6 1100 66.7 9.91
8 4 2019-05-15 1100 7 200 275 4.03
8 4 2019-05-15 1100 8 200 275 4.03
8 4 2019-05-15 1100 9 200 275 4.03
9 3 2019-05-15 0 0 1100 0 8
My desired result:
grp count mydate mdo sog prev_mdo result sog_avg
1 NA 2019-05-11 1500 12 NA NA NA
1 NA 2019-05-11 1500 12 NA NA NA
1 NA 2019-05-11 1500 12 NA NA NA
1 NA 2019-05-11 1500 11 NA NA NA
1 NA 2019-05-12 1500 10 NA NA NA
1 NA 2019-05-12 1500 9 NA NA NA
2 NA 2019-05-12 NA 2 1500 NA NA
3 6 2019-05-12 0 8.8 1500 0 0
3 6 2019-05-13 0 8.7 1500 0 0
3 6 2019-05-13 0 7.8 1500 0 0
4 3 2019-05-13 900 11 0 0 0
4 3 2019-05-13 900 11 0 0 0
5 NA 2019-05-14 NA 12 900 NA NA
5 NA 2019-05-14 NA 11 900 NA NA
6 2 2019-05-14 1100 9.54 900 550 11
6 2 2019-05-14 1100 9.8 900 550 11
6 2 2019-05-15 1100 10.4 900 550 11
7 3 2019-05-15 200 4 1100 66.7 9.91
7 3 2019-05-15 200 4 1100 66.7 9.91
7 3 2019-05-15 200 4.5 1100 66.7 9.91
7 3 2019-05-15 200 3.6 1100 66.7 9.91
8 4 2019-05-15 1100 7 200 275 4.03
8 4 2019-05-15 1100 8 200 275 4.03
8 4 2019-05-15 1100 9 200 275 4.03
9 3 2019-05-15 0 0 1100 0 0
Where result is zero, sog_avg should be zero; where result is NA, sog_avg should be NA.
Where result is computed from the previous group's count, sog_avg should be computed from that same previous group's sog values.
So, for example:
mdo = 1100 , result is 550 because counts in previous non null group are 2 (mdo value 900).
1100 / 2 = 550 . At this point sog avg should be (11 + 11) / 2 = 11 because counts were 2 in the previous non null group.
Here is a data.table approach. It extensively uses the idea of making groups by using base table or tapply and then lags those results. Note, this answer would fail if mdo is not constant throughout a group.
library(data.table)
dt = as.data.table(df)
dt[, grp := rleid(mdo)]
dt[!is.na(mdo),
count := {
cnt = table(grp)
rep(shift(cnt), cnt)
}
]
setcolorder(dt, c("grp", "count", "mydate", "mdo", "sog"))
dt[,
prev_mdo := {
ord = table(grp)
nafill(rep(shift(mdo[cumsum(ord)]), ord), "locf")
}
]
dt[, result := fifelse(prev_mdo != 0L, mdo / count, 0)]
dt[!is.na(result),
sog_avg := {
mn = tapply(sog, grp, mean)
rep(shift(mn), table(grp))
}]
dt[result == 0 | is.na(result), sog_avg := result]
dt
#> grp count mydate mdo sog prev_mdo result sog_avg
#> 1: 1 NA 2019-05-11 1500 12.00 NA NA NA
#> 2: 1 NA 2019-05-11 1500 12.00 NA NA NA
#> 3: 1 NA 2019-05-11 1500 12.00 NA NA NA
#> 4: 1 NA 2019-05-11 1500 11.00 NA NA NA
#> 5: 1 NA 2019-05-12 1500 10.00 NA NA NA
#> 6: 1 NA 2019-05-12 1500 9.00 NA NA NA
#> 7: 2 NA 2019-05-12 NA 2.00 1500 NA NA
#> 8: 3 6 2019-05-12 0 8.80 1500 0.00000 0.000000
#> 9: 3 6 2019-05-13 0 8.70 1500 0.00000 0.000000
#> 10: 3 6 2019-05-13 0 7.80 1500 0.00000 0.000000
#> 11: 4 3 2019-05-13 900 11.00 0 0.00000 0.000000
#> 12: 4 3 2019-05-13 900 11.00 0 0.00000 0.000000
#> 13: 5 NA 2019-05-14 NA 12.00 900 NA NA
#> 14: 5 NA 2019-05-14 NA 11.00 900 NA NA
#> 15: 6 2 2019-05-14 1100 9.54 900 550.00000 11.000000
#> 16: 6 2 2019-05-14 1100 9.80 900 550.00000 11.000000
#> 17: 6 2 2019-05-15 1100 10.40 900 550.00000 11.000000
#> 18: 7 3 2019-05-15 200 4.00 1100 66.66667 9.913333
#> 19: 7 3 2019-05-15 200 4.00 1100 66.66667 9.913333
#> 20: 7 3 2019-05-15 200 4.50 1100 66.66667 9.913333
#> 21: 7 3 2019-05-15 200 3.60 1100 66.66667 9.913333
#> 22: 8 4 2019-05-15 1100 7.00 200 275.00000 4.025000
#> 23: 8 4 2019-05-15 1100 8.00 200 275.00000 4.025000
#> 24: 8 4 2019-05-15 1100 9.00 200 275.00000 4.025000
#> 25: 9 3 2019-05-15 0 0.00 1100 0.00000 0.000000
#> grp count mydate mdo sog prev_mdo result sog_avg
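
The core of this approach (summarise each non-NA run, then lag the summaries) can also be sketched with dplyr. This is only a partial sketch under the same assumption that mdo is constant within a run: it builds the per-run lookup table and stops there. The names runs, prev_n and prev_sog_mean are hypothetical, and the zero/NA rule from the question would still have to be applied after joining the table back on grp.

```r
library(dplyr)

df <- tibble(
  mdo = c(1500, 1500, 1500, 1500, 1500, 1500, NA, 0, 0, 0, 900, 900, NA, NA,
          1100, 1100, 1100, 200, 200, 200, 200, 1100, 1100, 1100, 0),
  sog = c(12, 12, 12, 11, 10, 9, 2, 8.8, 8.7, 7.8, 11, 11, 12, 11,
          9.54, 9.8, 10.4, 4, 4, 4.5, 3.6, 7, 8, 9, 0)
)

runs <- df %>%
  mutate(grp = data.table::rleid(mdo)) %>%
  filter(!is.na(mdo)) %>%                 # NA runs are skipped, as in the answer
  group_by(grp) %>%
  summarise(n = n(), sog_mean = mean(sog), .groups = "drop") %>%
  mutate(prev_n = lag(n),                 # count of the previous non-NA run
         prev_sog_mean = lag(sog_mean))   # mean sog of the previous non-NA run
```

For grp 6 (mdo = 1100) this gives prev_n = 2 and prev_sog_mean = 11, which matches the worked example in the question: 1100 / 2 = 550 and (11 + 11) / 2 = 11.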

count rows from every previous value

This question is the same as here but this time I want to divide every value by the previous count, not itself. So, for the first value (1500) we will have NA because there is no other value before that. Then, we will divide 1100 by 4 because the count of previous value (1500) is 4. Then, we will divide 200 by 3 because the previous value (1100) has count 3. Last, divide 1100 by 2 because 200 has count 2. I tried to use shift/lag but can't succeed!
This is the code that divides every value with its own count.
library(dplyr)
library(tidyverse)
df <- tibble(mydate = as.Date(c("2019-05-11 23:01:00", "2019-05-11 23:02:00", "2019-05-11 23:03:00", "2019-05-11 23:04:00",
"2019-05-12 23:05:00", "2019-05-12 23:06:00", "2019-05-12 23:07:00", "2019-05-12 23:08:00",
"2019-05-13 23:09:00", "2019-05-13 23:10:00", "2019-05-13 23:11:00", "2019-05-13 23:12:00",
"2019-05-14 23:13:00", "2019-05-14 23:14:00", "2019-05-14 23:15:00", "2019-05-14 23:16:00",
"2019-05-15 23:17:00", "2019-05-15 23:18:00", "2019-05-15 23:19:00", "2019-05-15 23:20:00")),
myval = c(0, NA, 1500, 1500,
1500, 1500, NA, 0,
0, 0, 1100, 1100,
1100, 0, 200, 200,
1100, 1100, 1100, 0
))
# just replace values [0,1] with NA
df$myval[df$myval >= 0 & df$myval <= 1] <- NA
df <- df %>%
group_by(grp = data.table::rleid(myval)) %>%
mutate(counts = n(),
result= myval/counts)
# mydate myval grp counts result
# <date> <dbl> <int> <int> <dbl>
# 1 2019-05-11 NA 1 2 NA
# 2 2019-05-11 NA 1 2 NA
# 3 2019-05-11 1500 2 4 375
# 4 2019-05-11 1500 2 4 375
# 5 2019-05-12 1500 2 4 375
# 6 2019-05-12 1500 2 4 375
# 7 2019-05-12 NA 3 4 NA
# 8 2019-05-12 NA 3 4 NA
# 9 2019-05-13 NA 3 4 NA
#10 2019-05-13 NA 3 4 NA
#11 2019-05-13 1100 4 3 367.
#12 2019-05-13 1100 4 3 367.
#13 2019-05-14 1100 4 3 367.
#14 2019-05-14 NA 5 1 NA
#15 2019-05-14 200 6 2 100
#16 2019-05-14 200 6 2 100
#17 2019-05-15 1100 7 3 367.
#18 2019-05-15 1100 7 3 367.
#19 2019-05-15 1100 7 3 367.
#20 2019-05-15 NA 8 1 NA
I want to preserve the above dataframe, with the dates column and the correct result.
Here is one way :
library(dplyr)
#Create a group number
df1 <- df %>% mutate(grp = data.table::rleid(myval))
df1 %>%
#Keep only non-NA value
filter(!is.na(myval)) %>%
#count occurrences of each grp
count(grp, name = 'count') %>%
#Shift the count to the previous group
mutate(count = lag(count)) %>%
#Join with the original data
right_join(df1, by = 'grp') %>%
#divide the count to get final result
mutate(result = myval/count) %>%
arrange(grp)
which returns
# A tibble: 20 x 5
# grp count mydate myval result
# <int> <int> <date> <dbl> <dbl>
# 1 1 NA 2019-05-11 NA NA
# 2 1 NA 2019-05-11 NA NA
# 3 2 NA 2019-05-11 1500 NA
# 4 2 NA 2019-05-11 1500 NA
# 5 2 NA 2019-05-12 1500 NA
# 6 2 NA 2019-05-12 1500 NA
# 7 3 NA 2019-05-12 NA NA
# 8 3 NA 2019-05-12 NA NA
# 9 3 NA 2019-05-13 NA NA
#10 3 NA 2019-05-13 NA NA
#11 4 4 2019-05-13 1100 275
#12 4 4 2019-05-13 1100 275
#13 4 4 2019-05-14 1100 275
#14 5 NA 2019-05-14 NA NA
#15 6 3 2019-05-14 200 66.7
#16 6 3 2019-05-14 200 66.7
#17 7 2 2019-05-15 1100 550
#18 7 2 2019-05-15 1100 550
#19 7 2 2019-05-15 1100 550
#20 8 NA 2019-05-15 NA NA
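
The same steps (number the runs, count the non-NA runs, shift the counts, join back) might look like this in data.table. This is a sketch rather than a tested drop-in replacement; the column name prev_count and the object name runs are hypothetical, and myval is the question's vector after the [0,1] values have been replaced with NA.

```r
library(data.table)

dt <- data.table(
  myval = c(NA, NA, 1500, 1500, 1500, 1500, NA, NA, NA, NA,
            1100, 1100, 1100, NA, 200, 200, 1100, 1100, 1100, NA)
)
dt[, grp := rleid(myval)]

# One row per non-NA run, with the size of the run that precedes it
runs <- dt[!is.na(myval), .(count = .N), by = grp]
runs[, prev_count := shift(count)]

# Update join: copy prev_count onto every row of the matching run
dt[runs, on = "grp", prev_count := i.prev_count]
dt[, result := myval / prev_count]
```

Rows in NA runs have no match in runs, so their prev_count and result stay NA, as in the output above.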

count number of occurrences in different timeline

I have this kind of data.
library(dplyr)
library(tidyverse)
df <- tibble(mydate = as.Date(c("2019-05-11 23:01:00", "2019-05-11 23:02:00", "2019-05-11 23:03:00", "2019-05-11 23:04:00",
"2019-05-12 23:05:00", "2019-05-12 23:06:00", "2019-05-12 23:07:00", "2019-05-12 23:08:00",
"2019-05-13 23:09:00", "2019-05-13 23:10:00", "2019-05-13 23:11:00", "2019-05-13 23:12:00",
"2019-05-14 23:13:00", "2019-05-14 23:14:00", "2019-05-14 23:15:00", "2019-05-14 23:16:00",
"2019-05-15 23:17:00", "2019-05-15 23:18:00", "2019-05-15 23:19:00", "2019-05-15 23:20:00")),
myval = c(0, NA, 1500, 1500,
1500, 1500, NA, 0,
0, 0, 1100, 1100,
1100, 0, 200, 200,
1100, 1100, 1100, 0
))
I want to divide each value by the number of times it appears consecutively. If a value (e.g. 1100) is interrupted by another number (or NA) and then reappears, I want each run to be counted separately.
# just replace values [0,1] with NA
df$myval[df$myval >= 0 & df$myval <= 1] <- NA
df <- df %>%
group_by(myval) %>%
mutate(counts = sum(myval == myval)) %>%
mutate(result = (myval / counts))
Right now the result is:
mydate myval counts result
<date> <dbl> <int> <dbl>
1 2019-05-11 NA NA NA
2 2019-05-11 NA NA NA
3 2019-05-11 1500 4 375
4 2019-05-11 1500 4 375
5 2019-05-12 1500 4 375
6 2019-05-12 1500 4 375
7 2019-05-12 NA NA NA
8 2019-05-12 NA NA NA
9 2019-05-13 NA NA NA
10 2019-05-13 NA NA NA
11 2019-05-13 1100 6 183.
12 2019-05-13 1100 6 183.
13 2019-05-14 1100 6 183.
14 2019-05-14 NA NA NA
15 2019-05-14 200 2 100
16 2019-05-14 200 2 100
17 2019-05-15 1100 6 183.
18 2019-05-15 1100 6 183.
19 2019-05-15 1100 6 183.
20 2019-05-15 NA NA NA
But as you can see, the value 1100, which appears in two separate runs, is counted 6 times.
I want it to be counted 3 times, and then 3 times again.
So, for example value 1500 appears 4 times, so I divide 1500/4.
1100 should be divided by 3 and then again by 3.
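You can do that using run-length encoding, which records each run of consecutive identical values together with its length. Note that rle treats every NA as its own run, and that the %$% pipe comes from magrittr.
library(magrittr) # provides the %$% exposition pipe
rle(df$myval) %$%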
tibble(rle = lengths,
myval = values,
avg = values / rle)
# A tibble: 10 x 3
# rle myval avg
# <int> <dbl> <dbl>
# 1 1 0 0
# 2 1 NA NA
# 3 4 1500 375
# 4 1 NA NA
# 5 3 0 0
# 6 3 1100 367.
# 7 1 0 0
# 8 2 200 100
# 9 3 1100 367.
# 10 1 0 0
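
The rle output has one row per run, so the dates are lost. To get one value per original row, the run lengths can be expanded back with rep. A minimal base R sketch, assuming myval is the question's vector after the [0,1] values have been replaced with NA:

```r
myval <- c(NA, NA, 1500, 1500, 1500, 1500, NA, NA, NA, NA,
           1100, 1100, 1100, NA, 200, 200, 1100, 1100, 1100, NA)

r <- rle(myval)
# Repeat each run length once per element of that run,
# giving a vector as long as myval.
counts <- rep(r$lengths, r$lengths)
result <- myval / counts
```

Each NA forms its own run of length 1 here, but that does not matter for result, because NA divided by anything is still NA. Both vectors line up with the rows of df, so they can be added as columns next to mydate.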

Merging data frames with different number of rows and different columns

I have two data frames with different number of columns and rows. I want to combine them into one data frame.
> month.saf
Name NCDC Year Month Day HrMn Temp Q
244 AP 99999 2014 2 1 0 12 1
245 AP 99999 2014 2 1 300 12.2 1
246 AP 99999 2014 2 1 600 14.4 1
247 AP 99999 2014 2 1 900 18.6 1
248 AP 99999 2014 2 1 1200 18 1
249 AP 99999 2014 2 1 1500 13.6 1
250 AP 99999 2014 2 1 1800 11.8 1
251 AP 99999 2014 2 1 2100 10.8 1
252 AP 99999 2014 2 2 0 8.4 1
253 AP 99999 2014 2 2 300 8.6 1
254 AP 99999 2014 2 2 600 19.8 2
255 AP 99999 2014 2 2 900 22.8 1
256 AP 99999 2014 2 2 1200 20.8 1
257 AP 99999 2014 2 2 1500 16.4 1
258 AP 99999 2014 2 2 1800 13.4 1
259 AP 99999 2014 2 2 2100 12.4 1
> T2Mdf
V1 V2
0 293.494262695312 291.642639160156
300 294.003479003906 292.375091552734
600 296.809997558594 295.207885742188
900 298.287811279297 297.181549072266
1200 298.317565917969 297.725708007813
1500 298.134002685547 296.226165771484
1800 296.006805419922 293.354248046875
2100 293.785491943359 293.547210693359
0.1 294.638732910156 293.019866943359
300.1 292.179992675781 291.256958007812
The output that I want is like this:
Name NCDC Year Month Day HrMn Temp Q V1 V2
244 AP 99999 2014 2 1 0 12 1 293.4942627 291.6426392
245 AP 99999 2014 2 1 300 12.2 1 294.003479 292.3750916
246 AP 99999 2014 2 1 600 14.4 1 296.8099976 295.2078857
247 AP 99999 2014 2 1 900 18.6 1 298.2878113 297.1815491
248 AP 99999 2014 2 1 1200 18 1 298.3175659 297.725708
249 AP 99999 2014 2 1 1500 13.6 1 298.1340027 296.2261658
250 AP 99999 2014 2 1 1800 11.8 1 296.0068054 293.354248
251 AP 99999 2014 2 1 2100 10.8 1 293.7854919 293.5472107
252 AP 99999 2014 2 2 0 8.4 1 294.6387329 293.0198669
253 AP 99999 2014 2 2 300 8.6 1 292.1799927 291.256958
254 AP 99999 2014 2 2 600 19.8 2 292.2477417 291.3471069
255 AP 99999 2014 2 2 900 22.8 1 294.2276306 294.2766418
256 AP 99999 2014 2 2 1200 20.8 1 NA NA
257 AP 99999 2014 2 2 1500 16.4 1 NA NA
258 AP 99999 2014 2 2 1800 13.4 1 NA NA
259 AP 99999 2014 2 2 2100 12.4 1 NA NA
I tried cbind but it gives me an error
Error in data.frame(..., check.names = FALSE) : arguments imply
differing number of rows: 216, 220
And using rbind.fill() but it gives me something like
V1 V2 Name USAF NCDC Year Month Day HrMn I Type QCP Temp Q
1 293.494262695312 291.642639160156 <NA> NA NA NA NA NA NA NA <NA> NA <NA> NA
2 294.003479003906 292.375091552734 <NA> NA NA NA NA NA NA NA <NA> NA <NA> NA
3 296.809997558594 295.207885742188 <NA> NA NA NA NA NA NA NA <NA> NA <NA> NA
4 298.287811279297 297.181549072266 <NA> NA NA NA NA NA NA NA <NA> NA <NA> NA
5 298.317565917969 297.725708007813 <NA> NA NA NA NA NA NA NA <NA> NA <NA> NA
6 <NA> <NA> AP 421820 99999 2014 2 1 0 4 FM-12 NA 12 1
7 <NA> <NA> AP 421820 99999 2014 2 1 300 4 FM-12 NA 12.2 1
8 <NA> <NA> AP 421820 99999 2014 2 1 600 4 FM-12 NA 14.4 1
9 <NA> <NA> AP 421820 99999 2014 2 1 900 4 FM-12 NA 18.6 1
10 <NA> <NA> AP 421820 99999 2014 2 1 1200 4 FM-12 NA 18 1
How is it possible to do this in R?
If A and B are the two input data frames, here are some solutions:
1) merge This solution works regardless of whether A or B has more rows.
merge(data.frame(A, row.names=NULL), data.frame(B, row.names=NULL),
by = 0, all = TRUE)[-1]
The first two arguments could be replaced with just A and B respectively if A and B have default rownames, i.e. 1, 2, ..., or if they have consistent rownames. That is, merge(A, B, by = 0, all = TRUE)[-1] .
For example, if we have this input:
# test inputs
A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))
then:
merge(data.frame(A, row.names=NULL), data.frame(B, row.names=NULL),
by = 0, all = TRUE)[-1]
gives:
Time demand X Y
1 1 8.3 2 16.6
2 2 10.3 4 20.6
3 3 19.0 NA NA
4 4 16.0 NA NA
5 5 15.6 NA NA
6 7 19.8 NA NA
1a) An equivalent variation is:
do.call("merge", c(lapply(list(A, B), data.frame, row.names=NULL),
by = 0, all = TRUE))[-1]
2) cbind.zoo This solution assumes that A has more rows and that B's entries are all of the same type, e.g. all numeric. A is not restricted. These conditions hold in the data of the question.
library(zoo)
data.frame(A, cbind(zoo(, 1:nrow(A)), as.zoo(B)))
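
A further possibility, not part of the solutions above, is to pad the shorter data frame with NA rows and then use plain cbind. The sketch below relies on the fact that indexing a data frame past its last row yields NA rows; the function name cbind_pad is hypothetical, and the test inputs are the same A and B as in 1).

```r
# Hypothetical helper: pad both data frames to the same number of rows,
# then bind them column-wise.
cbind_pad <- function(a, b) {
  n <- max(nrow(a), nrow(b))
  pad <- function(d) {
    d <- data.frame(d, row.names = NULL)
    d[seq_len(n), , drop = FALSE]  # rows beyond nrow(d) come back as NA
  }
  out <- cbind(pad(a), pad(b))
  rownames(out) <- NULL
  out
}

A <- data.frame(BOD, row.names = letters[1:6])
B <- setNames(2 * BOD[1:2, ], c("X", "Y"))
out <- cbind_pad(A, B)
```

Like merge with by = 0, this pairs rows purely by position, so it is only appropriate when the first row of one data frame really corresponds to the first row of the other.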

Resources