How to look up values between dates in R

The table below is a reference table. Column a (far left column) represents start dates. Column b (middle column) represents end dates. Column d (far right column) represents a "unique value" that corresponds to each of the time periods on the left.
a b d
1/1/07 1/1/08 a
1/1/08 1/1/09 b
1/1/09 1/1/10 c
1/1/10 1/1/11 d
1/1/11 1/1/12 e
Using the table above, I have a list of dates (shown below). I would like to populate the "unique values" that correspond to these dates: if a date falls between the start and end dates of a row in the reference table, that row's "unique value" is identified and populated below. Column e is the input. Column f is the output.
e f
2/2/09 c
8/8/07 a
8/7/10 d
1/1/11 e
I am able to do the calculation in Excel using vlookups, min and the array function, but I have no clue how to do it in R.
I tried using the merge function, but it seems to require an exact match. I also tried the following code without success:
Ifelse ( e >= x$a & e < x$b, d, "")
x is the name of the data frame with columns a, b, d. FYI, the dates were formatted for use in R and converted to numeric.
Thank you

Using sqldf package:
library(sqldf)
#reference data
df1 <- read.table(text="
a b d
1/1/07 1/1/08 a
1/1/08 1/1/09 b
1/1/09 1/1/10 c
1/1/10 1/1/11 d
1/1/11 1/1/12 e", header=TRUE, as.is=TRUE)
#data
df2 <- read.table(text="
e
2/2/09
8/8/07
8/7/10
1/1/11", header=TRUE, as.is=TRUE)
#convert to numeric
df1$a <- as.numeric(as.Date(df1$a,format="%d/%m/%y"))
df1$b <- as.numeric(as.Date(df1$b,format="%d/%m/%y"))
df2$e <- as.numeric(as.Date(df2$e,format="%d/%m/%y"))
#data
df1
# a b d
# 1 13514 13879 a
# 2 13879 14245 b
# 3 14245 14610 c
# 4 14610 14975 d
# 5 14975 15340 e
df2
# e
# 1 14277
# 2 13733
# 3 14798
# 4 14975
#output
sqldf("select e,d
from df1, df2
where df2.e >= df1.a and df2.e < df1.b")
# e d
# 1 13733 a
# 2 14277 c
# 3 14798 d
# 4 14975 e
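If the intervals are contiguous and sorted by start date, as they are in the reference table here, base R's findInterval can probably do the same lookup without SQL. The following is only a sketch under that assumption; the names ref, dates and lookup are made up for illustration, and the dates are written in ISO format (read as day/month, matching the conversion above).

```r
# Reference intervals [a, b): contiguous and sorted by start date (assumption)
ref <- data.frame(
  a = as.Date(c("2007-01-01", "2008-01-01", "2009-01-01", "2010-01-01", "2011-01-01")),
  b = as.Date(c("2008-01-01", "2009-01-01", "2010-01-01", "2011-01-01", "2012-01-01")),
  d = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Dates to look up (hypothetical vector name)
dates <- as.Date(c("2009-02-02", "2007-08-08", "2010-07-08", "2011-01-01"))

# findInterval returns i such that ref$a[i] <= date < ref$a[i + 1]
idx <- findInterval(as.numeric(dates), as.numeric(ref$a))

# Dates before the first start or on/after the last end have no interval
idx[idx == 0 | dates >= max(ref$b)] <- NA

lookup <- ref$d[idx]
lookup
```

Unlike the SQL join, this keeps the results in the same order as the input dates.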

Here is an answer that uses a loop (as others pointed out, you should get this part right first). I generated monthly start dates d1 and end dates d2, plus the dates you are interested in, by week, as e. Then I created some random numbers in f and checked which ones fit the criteria.
d1 <- seq(from=as.Date('2013-01-01'), to=as.Date('2013-11-12'), by='months')
d2 <- seq(from=as.Date('2013-02-01'), to=as.Date('2013-12-12'), by='months')
e <- seq(from=as.Date('2013-01-01'), to=as.Date('2013-12-13'), by='weeks')
f <- runif(length(e), 1, 10)
output <- NULL
i <- 1
j <- 1
while (i <= length(e) && j <= length(d1)) {
  if (e[i] >= d1[j] && e[i] <= d2[j]) {
    output[i] <- f[i]
    i <- i + 1
  } else {
    j <- j + 1
  }
}
output
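The loop above relies on both the dates and the intervals being sorted. If that cannot be assumed, or if the intervals have gaps, the interval test can be applied to each date independently. This is a rough sketch rather than a tuned solution; lookup_interval and its arguments are hypothetical names, and intervals are treated as half-open, [start, end).

```r
# For each date in x, return the value of the first interval containing it,
# or NA when no interval matches. Intervals need not be sorted or contiguous.
lookup_interval <- function(x, start, end, value) {
  vapply(seq_along(x), function(i) {
    hit <- which(x[i] >= start & x[i] < end)
    if (length(hit) == 0) NA_character_ else value[hit[1]]
  }, character(1))
}

# Made-up example data: two unsorted intervals with a gap in February
start <- as.Date(c("2013-03-01", "2013-01-01"))
end   <- as.Date(c("2013-04-01", "2013-02-01"))
value <- c("mar", "jan")
x     <- as.Date(c("2013-01-15", "2013-02-10", "2013-03-01"))

res <- lookup_interval(x, start, end, value)
res
```

This does one comparison per date per interval, so for large inputs a sorted approach or a join is likely to be faster.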

Related

How to use a vector for creating a logical expression for subsetting a data frame?

I am trying to use a vector of logical expressions to subset a data frame. I have a data frame I want to subset based on several columns, where I want to exclude "B" each time. First I want to define a vector of logical expressions based on the data frame's column names.
set.seed(42)
n <- 24
dataframe <- data.frame(column1=as.character(factor(paste("obs",1:n))),
rand1=rep(LETTERS[1:4], n/4),
rand2=rep(LETTERS[1:6], n/6),
rand3=rep(LETTERS[1:3], n/3),
x=rnorm(n))
columns <- colnames(dataframe)[2:4]
criteria <- quote(rep(paste0(columns[1:3], " != ", quote("B")), length(columns)))
What I want to achieve is a vector criteria containing
rand1 != "B" rand2 != "B" rand3 != "B" so I can use it to subset data frame based on columns like
dfs1 <- subset(dataframe, criteria[1])
dfs2 <- subset(dataframe, criteria[2])
dfs3 <- subset(dataframe, criteria[3])
I might be misunderstanding your question, but it seems like you want a collection of data.frames where each one excludes rows where a given column = 'B'.
Assuming this is what you want:
cols <- c('rand1', 'rand2', 'rand3')
result <- lapply(dataframe[, cols], function(x) dataframe[x!='B',])
will create a list of data.frames, each of which has the result of excluding rows where the indicated column == 'B'.
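If you do want the conditions themselves as unevaluated expressions, closer to the criteria vector in the question, one possible base R sketch is to build each call with bquote and evaluate it against the data frame. The list name dfs is made up here, and stringsAsFactors = FALSE is set only to keep the example independent of the R version.

```r
set.seed(42)
n <- 24
dataframe <- data.frame(column1 = paste("obs", 1:n),
                        rand1 = rep(LETTERS[1:4], n/4),
                        rand2 = rep(LETTERS[1:6], n/6),
                        rand3 = rep(LETTERS[1:3], n/3),
                        x = rnorm(n),
                        stringsAsFactors = FALSE)
columns <- colnames(dataframe)[2:4]

# One unevaluated call per column, e.g. rand1 != "B"
criteria <- lapply(columns, function(col) bquote(.(as.name(col)) != "B"))

# Evaluate each condition with the data frame's columns in scope
dfs <- lapply(criteria, function(cond) dataframe[eval(cond, dataframe), ])
names(dfs) <- columns

sapply(dfs, nrow)
```

Here dfs$rand1, dfs$rand2 and dfs$rand3 play the role of dfs1, dfs2 and dfs3 in the question.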
Based on "Using tidy eval for multiple, arbitrary filter conditions":
library(dplyr)
library(purrr)
filter_fun <- function(df, cols, conds){
fp <- map2(cols, conds, function(x, y) quo((!!(as.name(x))) != !!y))
filter(df, !!!fp)
}
filter_col <- columns[1:3] %>% as.list()
cond_list <- rep(list("B"), length(columns[1:3]))
filter_fun(dataframe, cols = filter_col,
conds = cond_list)
column1 rand1 rand2 rand3 x
1 obs 1 A A A 1.3709584
2 obs 3 C C C 0.3631284
3 obs 4 D D A 0.6328626
4 obs 7 C A A 1.5115220
5 obs 9 A C C 2.0184237
6 obs 12 D F C 2.2866454
7 obs 13 A A A -1.3888607
8 obs 15 C C C -0.1333213
9 obs 16 D D A 0.6359504
10 obs 19 C A A -2.4404669
11 obs 21 A C C -0.3066386
12 obs 24 D F C 1.2146747

R - Merging and aligning two CSVs using common values in multiple columns

I currently have two .csv files that look like this:
File 1:
Attempt        Result
Intervention 1 B
Intervention 2 H
and File 2:
Name     Outcome 1 Outcome 2 Outcome 3
Sample 1 A         B         C
Sample 2 D         E         F
Sample 3 G         H         I
I would like to merge and align the two .csvs so that each row of File 1 is aligned, by its "Result" cell, against whichever row of File 2 contains that value in any of the three "Outcome" columns, leaving blanks or "NA"s where there is no match.
Ideally, would look like this:
Attempt        Result Name     Outcome 1 Outcome 2 Outcome 3
Intervention 1 B      Sample 1 A         B         C
                      Sample 2 D         E         F
Intervention 2 H      Sample 3 G         H         I
I've looked and only found answers when merging two .csv files with one common column. Any help would be very appreciated.
I will assume that "Result" in File 1 is unique, since several File 1 rows with the same Result value (e.g. "B") would force us to add new columns to the final data frame.
Under that assumption:
Attempt <- c("Intervention 1","Intervention 2")
Result <- c("B","H")
df1 <- data.frame(Attempt, Result, stringsAsFactors = FALSE) # keep strings as character, not factor
one <- c("Sample 1","A","B","C")
two <- c("Sample 2","D","E","F")
three <- c("Sample 3","G","H","I")
df2 <- as.data.frame(rbind(one,two,three), stringsAsFactors = FALSE)
row.names(df2) <- 1:3
colnames(df2) <- c("Name","Outcome 1","Outcome 2","Outcome 3")
vec_at <- rep(NA,nrow(df2));vec_res <- rep(NA,nrow(df2)); # Define NA vectors
for (j in 1:nrow(df2)){
a <- which(df1$Result %in% unlist(df2[j,2:4])) # Rows of df1 whose Result appears among this row's outcomes
if (length(a) >= 1){ # Don't forget that "a" is empty (not a valid index) if no element satisfies the condition
vec_at[j] <- df1$Attempt[a] #just create a vector with wanted information
vec_res[j] <- df1$Result[a]
}
}
desired_df <- as.data.frame(cbind(vec_at,vec_res,df2)) # define your wanted data frame
Output:
vec_at vec_res Name Outcome 1 Outcome 2 Outcome 3
1 Intervention 1 B Sample 1 A B C
2 <NA> <NA> Sample 2 D E F
3 Intervention 2 H Sample 3 G H I
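The same matching can also be written without an explicit loop. The following is a sketch under the same uniqueness assumption for "Result"; the names outcome_cols, hit and merged are made up for illustration, and the data frames are rebuilt here so the example runs on its own.

```r
df1 <- data.frame(Attempt = c("Intervention 1", "Intervention 2"),
                  Result = c("B", "H"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Name = c("Sample 1", "Sample 2", "Sample 3"),
                  `Outcome 1` = c("A", "D", "G"),
                  `Outcome 2` = c("B", "E", "H"),
                  `Outcome 3` = c("C", "F", "I"),
                  check.names = FALSE, stringsAsFactors = FALSE)

outcome_cols <- c("Outcome 1", "Outcome 2", "Outcome 3")

# For each row of df2: index of the first df1 row whose Result is among
# that row's outcomes, or NA when there is none
hit <- unname(apply(df2[, outcome_cols], 1, function(outcomes) {
  match(TRUE, df1$Result %in% outcomes)
}))

# Indexing df1 with NA yields an all-NA row, which gives the blank alignment
merged <- cbind(df1[hit, ], df2)
rownames(merged) <- NULL
merged
```

If "Result" values were not unique, only the first matching File 1 row would be kept for each sample.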
I wonder if you could use fuzzyjoin for something like this.
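Here, you can provide a custom function for matching between the two data.frames. Note that this example assumes the column names and values use underscores instead of spaces (Outcome_1, Sample_1), as in the output below.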
library(fuzzyjoin)
fuzzy_left_join(
df2,
df1,
match_fun = NULL,
multi_by = list(x = paste0("Outcome_", 1:3), y = "Result"),
multi_match_fun = function(x, y) {
y == x[, "Outcome_1"] | y == x[, "Outcome_2"] | y == x[, "Outcome_3"]
}
)
Output
Name Outcome_1 Outcome_2 Outcome_3 Attempt Result
1 Sample_1 A B C Intervention_1 B
2 Sample_2 D E F <NA> <NA>
3 Sample_3 G H I Intervention_2 H

Dynamic column rename based on a separate data frame in R

Generate df1 and df2 like this
pro <- c("Hide-Away", "Hide-Away")
sourceName <- c("New Rate2", "FST")
standardName <- c("New Rate", "SFT")
df1 <- data.frame(pro, sourceName, standardName, stringsAsFactors = F)
A <- 1; B <- 2; C <-3; D <- 4; G <- 5; H <- 6; E <-7; FST <-8; Z <-8
df2<- data.frame(A,B,C,D,G,H,E,FST)
colnames(df2)[1]<- "New Rate2"
Then run this code.
df1 <- df1[,c(2,3)]
index<-which(colnames(df2) %in% df1[,1])
index2<-which(df1[,1] %in% colnames(df2) )
colnames(df2)[index] <- df1[index2,2]
The input of DF2 will be like
New Rate2 B C D G H E FST
1 2 3 4 5 6 7 8
The output of DF2 will be like
New Rate B C D G H E SFT
1 2 3 4 5 6 7 8
So clearly the code worked and swapped the names correctly. But now create df2 with the code below instead, and make sure to regenerate df1 as it was before.
df2<- data.frame(FST,B,C,D,G,H,E,Z)
colnames(df2)[8]<- "New Rate2"
and then run
df1 <- df1[,c(2,3)]
index<-which(colnames(df2) %in% df1[,1])
index2<-which(df1[,1] %in% colnames(df2) )
colnames(df2)[index] <- df1[index2,2]
The input of df2 will be
FST B C D G H E New Rate2
8 2 3 4 5 6 7 8
The output of df2 will be
New Rate B C D G H E SFT
8 2 3 4 5 6 7 8
So the order of the columns has not been preserved. I know this is because of the %in% code, but I am not sure of an easy fix to make the column swapping more dynamic.
I am not totally sure about the question, as it seems a little vague. I'll try my best though--the best way I know to dynamically set column names is setnames from the data.table package. So let's say that I have a set of source names and a set of standard names, and I want to swap the source for the standard (which I take to be the question).
Given the data above, I have a data.frame structured like so:
> df2
A B C D G H E FST
1 1 2 3 4 5 6 7 8
as well as two vectors, sourceName and standardName.
sourceName <- c("A", "FST")
standardName <- c("New A", "FST 2: Electric Boogaloo")
I want to dynamically swap sourceName for standardName, and I can do this with setnames like so:
library(data.table)
df3 <- as.data.table(df2)
setnames(df3, sourceName, standardName)
> df3
New A B C D G H E FST 2: Electric Boogaloo
1: 1 2 3 4 5 6 7 8
Trying to follow your example, in your second pass I get an index value of 0,
> df2
New Rate B C D G H E SFT
1 8 2 3 4 5 6 7 8
> df1
sourceName standardName
1 New Rate2 New Rate
2 FST SFT
> index<-which(colnames(df2) %in% df1[,1])
> index
integer(0)
which would account for your expected ordering on assignment to column names.
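For a base R route that does not depend on column order, one option is to look up each column name in the mapping with match, so every column receives the standard name that belongs to its own source name. This is a sketch, assuming that is the intended behaviour; name_map and pos are hypothetical names.

```r
# Mapping from source names to standard names (same values as df1 above)
name_map <- data.frame(sourceName = c("New Rate2", "FST"),
                       standardName = c("New Rate", "SFT"),
                       stringsAsFactors = FALSE)

# Second layout from the question: FST first, "New Rate2" last
df2 <- data.frame(FST = 8, B = 2, C = 3, D = 4, G = 5, H = 6, E = 7, Z = 8)
colnames(df2)[8] <- "New Rate2"

# Position of each column name in the mapping (NA when it is not listed)
pos <- match(colnames(df2), name_map$sourceName)

# Rename only the mapped columns, each with its own standard name
colnames(df2)[!is.na(pos)] <- name_map$standardName[pos[!is.na(pos)]]
colnames(df2)
```

With this layout, FST becomes SFT in the first position and "New Rate2" becomes "New Rate" in the last, regardless of the row order in the mapping.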

Sum the values of a 2 dimensional table according to labels in R

Coming from Sum the values according to labels in R.
I've been told that working with 2-dimensional tables is significantly different from working with 1-dimensional ones. For example, given:
a a,b a,b,c c
d 5 2 1 2
d,e 2 1 1 1
And we want to achieve:
a b c
d 12 5 5
e 4 2 2
So how can this be achieved using R?
A little bit convoluted, but it should work :
m <- as.matrix(data.frame('a'=c(5,2),'a,b'=c(2,1),
'a,b,c'=c(1,1),'c'=c(2,1),
check.names = FALSE,row.names=c('d','d,e')))
colNamesSplits <- strsplit(colnames(m),',')
rowNamesSplits <- strsplit(rownames(m),',')
colNms <- unique(unlist(colNamesSplits))
rowNms <- unique(unlist(rowNamesSplits))
colIdxs <- unlist(sapply(1:length(colNamesSplits),
function(i) rep.int(i,length(colNamesSplits[[i]]))))
rowIdxs <- unlist(sapply(1:length(rowNamesSplits),
function(i) rep.int(i,length(rowNamesSplits[[i]]))))
colIdxsMapped <- unlist(sapply(colNamesSplits, function(n) match(n,colNms)))
rowIdxsMapped <- unlist(sapply(rowNamesSplits, function(n) match(n,rowNms)))
# let's create the fully expanded matrix
expanded <- as.matrix(m[rowIdxs,colIdxs])
rownames(expanded) <- rowNms[rowIdxsMapped]
colnames(expanded) <- colNms[colIdxsMapped]
# aggregate expanded by cols :
expanded <- do.call(cbind,lapply(split(1:ncol(expanded),colnames(expanded)),
function(ii) rowSums(expanded[,ii,drop=FALSE])))
# aggregate expanded by rows :
expanded <- do.call(rbind,lapply(split(1:nrow(expanded),rownames(expanded)),
function(ii) colSums(expanded[ii,,drop=FALSE])))
> expanded
a b c
d 12 5 5
e 4 2 2
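A more compact way to get the same totals is to express label membership as 0/1 indicator matrices and multiply. This is a sketch of that idea rather than a drop-in replacement; the helper name membership is made up, and it assumes labels are separated by commas with no surrounding spaces.

```r
m <- matrix(c(5, 2,  2, 1,  1, 1,  2, 1), nrow = 2,
            dimnames = list(c("d", "d,e"), c("a", "a,b", "a,b,c", "c")))

# Indicator matrix: one row per single label, one column per combined label,
# 1 where the combined label contains the single label
membership <- function(labels) {
  parts <- strsplit(labels, ",")
  keys <- unique(unlist(parts))
  ind <- sapply(parts, function(p) as.numeric(keys %in% p))
  matrix(ind, nrow = length(keys), dimnames = list(keys, labels))
}

# Sum over every original row/column that contains each label
result <- membership(rownames(m)) %*% m %*% t(membership(colnames(m)))
result
```

Each cell of m is counted once for every row label and column label it belongs to, which is what the expand-then-aggregate code above does in separate steps.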

Applying a function to get comma separated string using multiple columns in DataFrame and create a third column

I am trying to sort names in a row and create a comma separated string which would create another column.
This being my sample data.frame .
df=data.frame(A=c("A","K","B","D","F"),B =c("E","C","D","A","K"))
A B
1 A E
2 K C
3 B D
4 D A
5 F K
The Output I am trying to get would be like this
A B C
1 A E A , E
2 K C C , K
3 B D B , D
4 D A A , D
5 F K F , K
So far I have tried this :
lapply(df,FUN=paste(sort(df$A,df$B),collapse=" , "))
mapply(FUN= function(x,y)paste(sort(x,y),collapse=" , "),df$A,df$B)
Here I am trying to sort column values and paste them using ',' to create a unique pair name.
Any help is appreciated.
If you only have 2 columns, you can use pmax and pmin to avoid any costly looping code. E.g.:
with(lapply(df, as.character), paste(pmin(A,B),pmax(A,B),sep=",") )
#[1] "A,E" "C,K" "B,D" "A,D" "F,K"
You can do it with mapply, but since your data are factors, you need to coerce to character so they sort properly:
df$C <- mapply(function(x, y){paste(sort(c(as.character(x), as.character(y))),
collapse = ',')}, df$A, df$B)
df
# A B C
# 1 A E A,E
# 2 K C C,K
# 3 B D B,D
# 4 D A A,D
# 5 F K F,K
To simplify a bit, you can just use apply to iterate over the rows:
apply(df, 1, function(x){paste(sort(x), collapse = ',')})
Since it treats df as a matrix, it converts everything to character, which happens to be what you want for the sample data.
Also see tidyr::unite for pasting two columns together, though it can't easily sort.
Try this
> for( i in 1:nrow(df)){
+ df$C[i]<-paste0(as.character(unlist(sort(df[i,1:2]))),collapse=" , ")
+ }
> df
A B C
1 A E A , E
2 K C C , K
3 B D B , D
4 D A A , D
5 F K F , K
