Extract data based on a time series column in R

I have one year of daily time-series pixel data in a data frame, so each date occurs multiple times, once for each pixel. I would like to extract/subset this data based on a set of dates stored in another data frame. How can I do this in R using dplyr?
Sample data
X Y T Value
X1 Y1 1/1/2004 1
X2 Y2 1/1/2004 2
X3 Y3 1/1/2004 3
X1 Y1 1/2/2004 4
X2 Y2 1/2/2004 5
X3 Y3 1/2/2004 6
X1 Y1 1/3/2004 7
X2 Y2 1/3/2004 8
X3 Y3 1/3/2004 9
Dates of interest
1/1/2004
1/2/2004
Code
library(dplyr)
X = c("X1", "X2", "X3", "X1", "X2", "X3", "X1", "X2", "X3")
Y = c("Y1", "Y2", "Y3", "Y1", "Y2", "Y3", "Y1", "Y2", "Y3")
T = c("1/1/2004", "1/2/2004", "1/3/2004", "1/1/2004", "1/2/2004", "1/3/2004","1/1/2004", "1/2/2004", "1/3/2004")
Value = c("1", "2", "3", "4", "5", "6", "7", "8", "9")
df = data.frame(X, Y, T, Value)
# Desired dates
TS = read.csv("TS.csv")
TS
"1/1/2004", "1/2/2004"
# stuck...

If your TS is a character vector, TS = c("1/1/2004", "1/2/2004"), simply use filter:
library(dplyr)
df %>%
filter(T %in% TS)
X Y T Value
1 X1 Y1 1/1/2004 1
2 X2 Y2 1/2/2004 2
3 X1 Y1 1/1/2004 4
4 X2 Y2 1/2/2004 5
5 X1 Y1 1/1/2004 7
6 X2 Y2 1/2/2004 8
If your TS is a single comma-separated string, TS = ("1/1/2004, 1/2/2004"), split it first:
library(stringr)
df %>%
filter(T %in% str_split(gsub("\\s+", "", TS), ",", simplify = TRUE))
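Both versions above compare the dates as text, so "01/01/2004" would not match "1/1/2004". Note also that read.csv returns a data frame, so the date column has to be pulled out before using %in%. A minimal base R sketch of a more defensive variant, assuming the dates are month/day/year and that TS.csv has a single column (called date here, a hypothetical name; the file is simulated with read.csv(text = ...)):

```r
# Data as built in the question
df <- data.frame(
  X = rep(c("X1", "X2", "X3"), times = 3),
  Y = rep(c("Y1", "Y2", "Y3"), times = 3),
  T = rep(c("1/1/2004", "1/2/2004", "1/3/2004"), times = 3),
  Value = as.character(1:9),
  stringsAsFactors = FALSE
)

# Stand-in for read.csv("TS.csv"); the column name `date` is an assumption
TS <- read.csv(text = "date\n01/01/2004\n1/2/2004", stringsAsFactors = FALSE)

# Parse both sides as Date (assumed format: month/day/year)
df_dates <- as.Date(df$T, format = "%m/%d/%Y")
wanted   <- as.Date(TS$date, format = "%m/%d/%Y")

# Keep the rows whose date is among the wanted dates
res <- df[df_dates %in% wanted, ]
res
```

The same logical condition can be used inside dplyr::filter() if you prefer to stay in a pipeline.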

Base R:
> df[df$T %in% TS,]
X Y T Value
1 X1 Y1 1/1/2004 1
2 X2 Y2 1/2/2004 2
4 X1 Y1 1/1/2004 4
5 X2 Y2 1/2/2004 5
7 X1 Y1 1/1/2004 7
8 X2 Y2 1/2/2004 8
>
If TS is
"1/1/2004, 1/2/2004", use stringr:
> df[df$T %in% stringr::str_split(TS, ", ", simplify=TRUE),]
X Y T Value
1 X1 Y1 1/1/2004 1
2 X2 Y2 1/2/2004 2
4 X1 Y1 1/1/2004 4
5 X2 Y2 1/2/2004 5
7 X1 Y1 1/1/2004 7
8 X2 Y2 1/2/2004 8
>

Related

Creating new column in data frame based on value matched to participant ID

I know there is a simple solution to this problem, as I solved it a couple of months ago, but have since lost the relevant file, and cannot for the life of me work out how I did it.
My data is in a long form, where each row represents a participant's answer to one question, with all rows for one participant sharing a common participant ID - e.g.
ParticipantID Question Resp
1 Age x1
1 Gender x2
1 Education x3
1 Q1 x4
1 Q2 x5
...
2 Age y1
2 Gender y2
...
etc
I want to add new columns to the data to associate the various demographic values with each answer provided by a given participant. So in the example above, I would have a new column "Age" which would take the value x1 for all rows where ParticipantID = 1, y1 for all rows where ParticipantID = 2, etc., like so:
ParticipantID Question Resp Age Gender ...
1 Age x1 x1 x2
1 Gender x2 x1 x2
1 Education x3 x1 x2
1 Q1 x4 x1 x2
1 Q2 x5 x1 x2
...
2 Age y1 y1 y2
2 Gender y2 y1 y2
...
etc
Importantly, I can't just rotate the table from long to wide, because I need the study questions (represented as Q1, Q2, ... above) to remain in long form.
Any help that can be offered is greatly appreciated!
As long as each participant has the same questions in the same order, you can do
cbind(df, do.call(rbind, lapply(split(df, df$ParticipantID), function(x) {
setNames(as.data.frame(t(x[-1])[rep(2, nrow(x)),]), x[[2]])
})), row.names = NULL)
#> ParticipantID Question Resp Age Gender Education Q1 Q2
#> 1 1 Age x1 x1 x2 x3 x4 x5
#> 2 1 Gender x2 x1 x2 x3 x4 x5
#> 3 1 Education x3 x1 x2 x3 x4 x5
#> 4 1 Q1 x4 x1 x2 x3 x4 x5
#> 5 1 Q2 x5 x1 x2 x3 x4 x5
#> 6 2 Age y1 y1 y2 y3 y4 y5
#> 7 2 Gender y2 y1 y2 y3 y4 y5
#> 8 2 Education y3 y1 y2 y3 y4 y5
#> 9 2 Q1 y4 y1 y2 y3 y4 y5
#> 10 2 Q2 y5 y1 y2 y3 y4 y5
Data used
df <- structure(list(ParticipantID = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L), Question = c("Age", "Gender", "Education", "Q1",
"Q2", "Age", "Gender", "Education", "Q1", "Q2"), Resp = c("x1",
"x2", "x3", "x4", "x5", "y1", "y2", "y3", "y4", "y5")), class = "data.frame",
row.names = c(NA, -10L))
df
#> ParticipantID Question Resp
#> 1 1 Age x1
#> 2 1 Gender x2
#> 3 1 Education x3
#> 4 1 Q1 x4
#> 5 1 Q2 x5
#> 6 2 Age y1
#> 7 2 Gender y2
#> 8 2 Education y3
#> 9 2 Q1 y4
#> 10 2 Q2 y5
Created on 2022-09-19 with reprex v2.0.2
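If participants do not all have the same questions in the same order, a per-variable lookup avoids that requirement. A minimal base R sketch, assuming the demographic questions are Age, Gender and Education (demo_vars is a name introduced here, not part of the original answer):

```r
df <- data.frame(
  ParticipantID = rep(1:2, each = 5),
  Question = rep(c("Age", "Gender", "Education", "Q1", "Q2"), times = 2),
  Resp = c("x1", "x2", "x3", "x4", "x5", "y1", "y2", "y3", "y4", "y5"),
  stringsAsFactors = FALSE
)

# Assumed list of demographic questions to spread into their own columns
demo_vars <- c("Age", "Gender", "Education")

for (v in demo_vars) {
  # One row per participant holding that participant's answer to question v
  lookup <- df[df$Question == v, c("ParticipantID", "Resp")]
  # Copy the answer onto every row of the same participant (NA if missing)
  df[[v]] <- lookup$Resp[match(df$ParticipantID, lookup$ParticipantID)]
}
df
```

The study questions stay in long form, and the row order of df is unchanged.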

Finding ALL indices of positions in vector matching columns of DF in R

I have a dataframe A whose columns I want to match with the row.names of another dataframe B.
# A
v1 v2
X1 X3
X1 X5
X1 X15
X2 X3
X2 X4
...
# row.names of B (some values are duplicated)
row_names_B=c('X17', 'X1', 'X2', 'X15', 'X3', 'X3', 'X1', 'X5', 'X4', ...)
I want to match the columns of A with the positions of row_names_B, such that I can return a list of ALL positions in B for each row in A.
#my results:
v1_index v2_index
2 5 #matches X1 in pos 2, X3 in pos 5
2 6 #matches X1 in pos 2, X3 in pos 6
7 5 #matches X1 in pos 7, X3 in pos 5
7 6 #matches X1 in pos 7, X3 in pos 6
2 8 #matches X1 in pos 2, X5 in pos 8
7 8 #matches X1 in pos 7, X5 in pos 8
...
Note that I want to find all possible solutions.
I understand that this should be with some variant of match or which as given in this example, but I'm not sure how to do the explosion for each of the matches. The way I see it is by running it through for loops, row by row, but perhaps there is a better way to do this?
You could create a list of positions keyed by name, and then, for each value in data frame A, randomly pick one position from that list.
C <- A
ref <- split(seq_along(row_names_B), row_names_B)
C[] <- lapply(A, function(y) sapply(ref[y],
function(x) if(length(x) == 1) x else sample(x, 1)))
C
# v1 v2
#1 2 5
#2 2 8
#3 7 4
#4 3 5
#5 3 9
data
A <- structure(list(v1 = c("X1", "X1", "X1", "X2", "X2"), v2 = c("X3",
"X5", "X15", "X3", "X4")), class = "data.frame", row.names = c(NA, -5L))
row_names_B <- c("X17", "X1", "X2", "X15", "X3", "X3", "X1", "X5", "X4")
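The code above keeps only one randomly chosen position per name. To list every combination, as the question asks, one option is to expand each row of A with expand.grid. A minimal sketch, assuming every name in A occurs at least once in row_names_B (all_matches and a_row are names introduced here):

```r
A <- data.frame(
  v1 = c("X1", "X1", "X1", "X2", "X2"),
  v2 = c("X3", "X5", "X15", "X3", "X4"),
  stringsAsFactors = FALSE
)
row_names_B <- c("X17", "X1", "X2", "X15", "X3", "X3", "X1", "X5", "X4")

# Named list: every name maps to all of its positions in row_names_B
ref <- split(seq_along(row_names_B), row_names_B)

# For each row of A, build all pairs of (v1 position, v2 position)
all_matches <- do.call(rbind, lapply(seq_len(nrow(A)), function(i) {
  expand.grid(
    a_row    = i,                 # which row of A the pair belongs to
    v1_index = ref[[A$v1[i]]],
    v2_index = ref[[A$v2[i]]],
    KEEP.OUT.ATTRS = FALSE
  )
}))
all_matches
```

With the sample data this gives 11 rows: four for the first row of A (X1 at 2 and 7, X3 at 5 and 6), two each for rows 2 to 4, and one for row 5.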

Merging all the columns in R with column names

I have the following data
> df
X1 X2 X3
1 3 4
1 0 0
1 1 0
and I want to merge all the columns so that the final output will be
new colName
1 X1
1 X1
1 X1
3 X2
0 X2
1 X2
4 X3
0 X3
0 X3
You can try stack
> setNames(stack(df),c("new","colName"))
new colName
1 1 X1
2 1 X1
3 1 X1
4 3 X2
5 0 X2
6 1 X2
7 4 X3
8 0 X3
9 0 X3
Data
> dput(df)
structure(list(X1 = c(1L, 1L, 1L), X2 = c(3L, 0L, 1L), X3 = c(4L,
0L, 0L)), class = "data.frame", row.names = c(NA, -3L))
library (tidyverse)
pivot_longer(df,X1:X3)
You can try gathering the column names with tidyr
library(tidyr)
X1 <- c(1,1,1)
X2 <- c(3,0,1)
X3 <- c(4,0,0)
df <- data.frame(X1, X2, X3)
# key (the old column names) goes to colName, the values go to new
df <- df %>%
gather(colName, new, X1, X2, X3)
print(df)
colName new
1 X1 1
2 X1 1
3 X1 1
4 X2 3
5 X2 0
6 X2 1
7 X3 4
8 X3 0
9 X3 0
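For reference, the column-by-column stacking can also be sketched in base R without any helper function. This assumes all columns have the same type; res is a name introduced here:

```r
df <- data.frame(X1 = c(1L, 1L, 1L), X2 = c(3L, 0L, 1L), X3 = c(4L, 0L, 0L))

res <- data.frame(
  new     = unlist(df, use.names = FALSE),     # values, one column after another
  colName = rep(names(df), each = nrow(df)),   # source column of each value
  stringsAsFactors = FALSE
)
res
```

Unlike stack, which returns the column names as a factor, this keeps colName as a character vector.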

Paste 2 data frames side by side without any key

I have two data frames
A B E H
x1 x2 x3 x6
x1 x2 x4 x7
x1 x2 x5 x8
and
A B
y1 y2
y1 y2
and this is what I would like to achieve with dplyr or reshape2
A B E H A B
x1 x2 x3 x6 y1 y2
x1 x2 x4 x7 y1 y2
x1 x2 x5 x8
Thanks
If the number of rows is the same, use
cbind(df1, df2)
# A B E H A B
#1 x1 x2 x3 x6 y1 y2
#2 x1 x2 x4 x7 y1 y2
#3 x1 x2 x5 x8 y1 y2
Or in dplyr
library(dplyr)
library(stringr)
df2 %>%
rename_all(~ str_c(., ".1")) %>%
bind_cols(df1, .)
In some versions of dplyr (0.8.5), bind_cols renames the columns itself when there are duplicate column names
bind_cols(df1, df2)
NOTE: Having the same column name more than once in a data.frame is not recommended; the column names can be changed with make.unique
If the two datasets have an unequal number of rows
library(rowr)
cbind.fill(df1, df2new, fill = NA)
# A B E H A B
#1 x1 x2 x3 x6 y1 y2
#2 x1 x2 x4 x7 y1 y2
#3 x1 x2 x5 x8 <NA> <NA>
Or with base R
mxn <- max(nrow(df1), nrow(df2new))
df2new[(nrow(df2new)+1):mxn,] <- NA
cbind(df1, df2new)
# A B E H A B
#1 x1 x2 x3 x6 y1 y2
#2 x1 x2 x4 x7 y1 y2
#3 x1 x2 x5 x8 <NA> <NA>
data
df1 <- structure(list(A = c("x1", "x1", "x1"), B = c("x2", "x2", "x2"
), E = c("x3", "x4", "x5"), H = c("x6", "x7", "x8")),
class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(A = c("y1", "y1", "y1"), B = c("y2", "y2", "y2"
)), class = "data.frame", row.names = c(NA, -3L))
df2new <- structure(list(A = c("y1", "y1"), B = c("y2", "y2")), class = "data.frame", row.names = c(NA,
-2L))
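The base R padding above assumes df2new is the shorter data frame. A small helper that pads whichever input is shorter could look like this; cbind_pad is a hypothetical name, not a function from any package:

```r
df1 <- data.frame(A = c("x1", "x1", "x1"), B = c("x2", "x2", "x2"),
                  E = c("x3", "x4", "x5"), H = c("x6", "x7", "x8"),
                  stringsAsFactors = FALSE)
df2new <- data.frame(A = c("y1", "y1"), B = c("y2", "y2"),
                     stringsAsFactors = FALSE)

cbind_pad <- function(a, b) {
  n <- max(nrow(a), nrow(b))
  pad <- function(d) {
    # Indexing past the last row yields rows filled with NA
    d <- d[seq_len(n), , drop = FALSE]
    rownames(d) <- NULL
    d
  }
  # cbind keeps the duplicate column names A and B as they are
  cbind(pad(a), pad(b))
}

res <- cbind_pad(df1, df2new)
res
```

Because both inputs are padded to the same length, the argument order does not matter for the padding itself.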

Subset data based on another data in R

I have two data sets dat1 and dat2, that look like:
a<-c(rep(1,5), rep(2,3), rep(1,2), rep(2,4), rep(1,2))
b<-c(rep("AA", 8), rep("BB", 6), rep("CC", 2))
v<-c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8",
"x4", "x5", "x6", "x7", "x8", "x9", "x5", "x8")
ab<-c(1,2,5,6,58,2,4,14,2,25,23,1,12,14,15,14)
dat1<-data.frame(a,b,v,ab)
names(dat1)<-c("loc", "point", "sp", "ab")
a<-c(rep(1,8), rep(2,4), rep(3, 2), rep(1,4))
b<-c(rep("AA", 8), rep("BB", 6), rep("DD", 4))
v<-c("y1", "y2", "y3", "y4", "y6", "y7", "y8", "y12",
"y1", "y2", "y3", "y4", "y5", "y6", "y1", "y2", "y3", "y6")
ab<-c(1,2,45,14,1,12,14,15,10,2,32,14,1,12,18,9,6,7)
dat2<-data.frame(a,b,v,ab)
names(dat2)<-c("loc", "point", "sp", "ab")
and I need to make subsets of these dataframes, where each subset contains only combinations of loc and point which are in dat1 and dat2.
My result should look like:
res1
loc point sp ab
1 1 AA x1 1
2 1 AA x2 2
3 1 AA x3 5
4 1 AA x4 6
5 1 AA x5 58
11 2 BB x6 23
12 2 BB x7 1
13 2 BB x8 12
14 2 BB x9 14
res2
loc point sp ab
1 1 AA y1 1
2 1 AA y2 2
3 1 AA y3 45
4 1 AA y4 14
5 1 AA y6 1
6 1 AA y7 12
7 1 AA y8 14
8 1 AA y12 15
9 2 BB y1 10
10 2 BB y2 2
11 2 BB y3 32
12 2 BB y4 14
I have tried merge() and then dividing the result into two data frames, but the two do not have the same number of rows, so the rows of the smaller data set were multiplied to fill the gaps. My attempts with subset() also failed.
This is similar to "Subset a data frame based on another", but I haven't succeeded even when I tried the solutions given there (i.e. intersect).
Thanks for the help!
IMHO you can try:
merge(dat1, unique(dat2[,1:2]))
merge(dat2, unique(dat1[,1:2]))
semi_join in the dplyr package is designed for this:
library(dplyr)
# get just the rows in dat1 that have matches in dat2
dat1 %>% semi_join(dat2, by=c('loc', 'point'))
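For reference, the same filtering can be sketched in base R by comparing a combined loc/point key in both directions. This is an alternative to the merge and semi_join answers; the helper name key is introduced here, and the sketch assumes neither column contains the separator "|":

```r
dat1 <- data.frame(
  loc   = c(rep(1, 5), rep(2, 3), rep(1, 2), rep(2, 4), rep(1, 2)),
  point = c(rep("AA", 8), rep("BB", 6), rep("CC", 2)),
  sp    = c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8",
            "x4", "x5", "x6", "x7", "x8", "x9", "x5", "x8"),
  ab    = c(1, 2, 5, 6, 58, 2, 4, 14, 2, 25, 23, 1, 12, 14, 15, 14),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  loc   = c(rep(1, 8), rep(2, 4), rep(3, 2), rep(1, 4)),
  point = c(rep("AA", 8), rep("BB", 6), rep("DD", 4)),
  sp    = c("y1", "y2", "y3", "y4", "y6", "y7", "y8", "y12",
            "y1", "y2", "y3", "y4", "y5", "y6", "y1", "y2", "y3", "y6"),
  ab    = c(1, 2, 45, 14, 1, 12, 14, 15, 10, 2, 32, 14, 1, 12, 18, 9, 6, 7),
  stringsAsFactors = FALSE
)

# One string per row identifying its loc/point combination
key <- function(d) paste(d$loc, d$point, sep = "|")

# Keep rows whose combination also occurs in the other data frame
res1 <- dat1[key(dat1) %in% key(dat2), ]
res2 <- dat2[key(dat2) %in% key(dat1), ]
```

Unlike merge, this keeps the original row names and row order, which matches the res1 and res2 shown in the question.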
