Need help concatenating column names in R

I am generating 5 different predictions and adding those predictions to an existing data frame. My code is:
for (j in i) {
…
actual.predicted <- data.frame(test_data, predicted)
}
I am trying to concatenate words together to create new column names inside the loop. Specifically, I have a column named "predicted" and I am generating predictions in each iteration of the loop. So, in the first iteration I want the new column name to be "predicted.1", in the second iteration the new column name should be "predicted.2", and so on.
Any thoughts would be greatly appreciated.

You may not even need a loop here, but assuming you do, one pattern that might work well is to use a list:
results <- list()
for (j in i) {
# do something involving j
name <- paste0("predicted.", j)
results[[name]] <- data.frame(test_data, predicted)
}
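If the end goal is a single data frame with one prediction column per iteration, a minimal variation of this sketch (reusing the question's test_data, predicted, and loop index, which are assumptions here) stores just the prediction vectors in the list and binds them on once at the end:
results <- list()
for (j in i) {
  # ... fit the model and compute `predicted` for iteration j (assumed from the question) ...
  results[[paste0("predicted.", j)]] <- predicted
}
actual.predicted <- data.frame(test_data, results)  # the list names become the column names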

One option is to set the names after assigning the new columns:
actual.predicted <- data.frame(orig_col = sample(10))
for (j in 1:5) {
  new_col <- sample(10)
  actual.predicted <- cbind(actual.predicted, new_col)
  names(actual.predicted)[length(actual.predicted)] <- paste0('predicted.', j)
}
actual.predicted
# orig_col predicted.1 predicted.2 predicted.3 predicted.4 predicted.5
# 1 1 4 4 9 1 5
# 2 10 2 3 7 5 9
# 3 8 6 5 4 2 3
# 4 5 9 9 10 7 7
# 5 2 1 10 8 3 10
# 6 9 7 6 6 8 6
# 7 7 8 7 2 4 2
# 8 3 3 1 1 6 8
# 9 6 10 2 3 9 4
# 10 4 5 8 5 10 1
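A slight variant of the same idea, shown as a sketch with the same toy data, is to build the name first and assign the column directly by name, which skips the separate names() step:
set.seed(1)  # only to make the random example reproducible
actual.predicted <- data.frame(orig_col = sample(10))
for (j in 1:5) {
  new_col <- sample(10)
  actual.predicted[[paste0("predicted.", j)]] <- new_col
}
names(actual.predicted)
# [1] "orig_col"    "predicted.1" "predicted.2" "predicted.3" "predicted.4" "predicted.5"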

Related

Placing multiple outputs from each function call using apply into a row in a dataframe in R

I have a function that I repeat, changing the argument each time, using apply/sapply/lapply.
Works great.
I want to return a data set, where each row contains two (or more) variables from each iteration of the function.
Instead I get an unusable list.
do <- function(x) {
  a <- x + 1
  b <- x + 2
  cbind(a, b)
}
over <- 1:6
final <- lapply(over, do)
Any suggestions?
Without changing your function do, you can use sapply and transpose it.
data.frame(t(sapply(over, do)))
# X1 X2
#1 2 3
#2 3 4
#3 4 5
#4 5 6
#5 6 7
#6 7 8
If you want to use do in its current form with lapply, we can do
do.call(rbind.data.frame, lapply(over, do))
You could also try
as.data.frame(Reduce(rbind, final))
# a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8
See ?Reduce and ?rbind for information about what they'll do.
You could also modify your final expression as
final <- as.data.frame(Reduce(rbind, lapply(over, do)))
#final
# a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8
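If keeping the a/b column names matters, another sketch (a hypothetical variant of the question's do, not the original) is to return a one-row data frame instead of a matrix, so that row-binding preserves the names:
do_df <- function(x) data.frame(a = x + 1, b = x + 2)  # hypothetical variant of `do`
do.call(rbind, lapply(over, do_df))
#   a b
# 1 2 3
# 2 3 4
# 3 4 5
# 4 5 6
# 5 6 7
# 6 7 8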

How do I add observations to an existing data frame column?

I have a data frame. Let's say it looks like this:
  a b c
1 2 5 6
2 3 1 4
3 1 2 7
I have simulated some values and put them into a vector c(4,5,8,8). I want to add these simulated values to columns a, b and c.
I have tried rbind or inserting the vector into the existing data frame, but that replaced the existing values with the simulated ones, instead of adding the simulated values below the existing ones.
x <- data.frame("a" = c(2,3,1), "b" = c(5,1,2), "c" = c(6,4,7))
y <- c(4,5,8,8)
This is the output I expect to see:
  a b c
1 2 5 6
2 3 1 4
3 1 2 7
4 4 4 4
5 5 5 5
6 8 8 8
7 8 8 8
Help would be greatly appreciated. Thank you.
Can do:
as.data.frame(sapply(x, function(z) append(z, y)))
a b c
1 2 5 6
2 3 1 4
3 1 2 7
4 4 4 4
5 5 5 5
6 8 8 8
7 8 8 8
An option is assignment
n <- nrow(x)
x[n + seq_along(y), ] <- y
x
# a b c
#1 2 5 6
#2 3 1 4
#3 1 2 7
#4 4 4 4
#5 5 5 5
#6 8 8 8
#7 8 8 8
Another option is to replicate 'y' across the columns and rbind:
rbind(x, `colnames<-`(replicate(ncol(x), y), names(x)))
Or, equivalently, assign into the rows just past the current end of x:
x[(nrow(x) + 1):(nrow(x) + length(y)), ] <- y
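Both routes append the same rows: replicate(ncol(x), y) just builds a length(y)-by-ncol(x) matrix of the recycled vector for rbind(), while the assignment form grows x in place. A quick sketch of a check:
a1 <- rbind(x, `colnames<-`(replicate(ncol(x), y), names(x)))
a2 <- x
a2[nrow(a2) + seq_along(y), ] <- y
all(a1 == a2)  # TRUE; only the row names can differ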

Build a data frame with overlapping observations

Let's say I have a data frame with the following structure:
> DF <- data.frame(x=1:5, y=6:10)
> DF
x y
1 1 6
2 2 7
3 3 8
4 4 9
5 5 10
I need to build a new data frame with overlapping observations from the first data frame to be used as an input for building the A matrix for the Rglpk optimization library. I would use n-length observation windows, so that if n=2 the resulting data frame would join rows 1&2, 2&3, 3&4, and so on. The length of the resulting data frame would be
(numberOfObservations-windowSize+1)*windowSize
The result for this example with windowSize=2 would be a structure like
x y
1 1 6
2 2 7
3 2 7
4 3 8
5 3 8
6 4 9
7 4 9
8 5 10
I could do a loop like
DFResult <- NULL
numBlocks <- nrow(DF) - windowSize + 1
for (i in 1:numBlocks) {
  DFResult <- rbind(DFResult, DF[i:(i + windowSize - 1), ])
}
But this seems very inefficient, especially for very large data frames.
I also tried
rollapply(data=DF, width=windowSize, FUN=function(x) x, by.column=FALSE, by=1)  # rollapply is from the zoo package
x y
[1,] 1 6
[2,] 2 7
[3,] 2 7
[4,] 3 8
where I was trying to repeat a block of rows without applying any aggregate function. This does not work, since I am missing some rows.
I am a bit stumped by this and have looked around for similar problems but could not find any. Does anyone have any better ideas?
We could do a vectorized approach
i1 <- seq_len(nrow(DF))
res <- DF[c(rbind(i1[-length(i1)], i1[-1])),]
row.names(res) <- NULL
res
# x y
#1 1 6
#2 2 7
#3 2 7
#4 3 8
#5 3 8
#6 4 9
#7 4 9
#8 5 10
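The same indexing idea generalizes to any window size: build the row index for every window at once and subset a single time. A sketch (assuming windowSize is at most nrow(DF)):
windowSize <- 2   # or any n <= nrow(DF)
numBlocks  <- nrow(DF) - windowSize + 1
# column k of the outer() result holds the rows of window k: k, k+1, ..., k+windowSize-1
idx <- as.vector(outer(seq_len(windowSize) - 1L, seq_len(numBlocks), "+"))
res <- DF[idx, ]
row.names(res) <- NULL
nrow(res) == numBlocks * windowSize  # matches the formula in the question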

R: merge tables on one identifier, summing other columns with the same name

I currently have multiple tables that need to be merged. For example, I have tbl_1, tbl_2, and tbl_3, and I want to reach the final result shown in the result table below.
tbl_1:
ID trx_1 Cre_counts Deb_counts
1 10 9 8
2 5 6 5
3 10 4 3
tbl_2:
ID trx_2 Unk_counts Deb_counts
1 10 1 2
2 5 6 5
3 10 3 7
tbl_3:
ID trx_3 Unk_counts Ckc_counts
1 3 4 4
2 2 4 3
3 8 7 6
result:
ID trx_1 trx_2 trx_3 Cre_counts Deb_counts Unk_counts Ckc_counts
1 10 10 3 9 10 5 4
2 5 5 2 6 10 10 3
3 10 10 8 4 10 10 6
I have tried merging the three tables by "ID", but the shared column names change to Deb_counts.x, Deb_counts.y, and so on. I can use transform() and rowSums() with some extra steps to make it work, but I am wondering whether there is an easier way to do it. Thank you!
Maybe not the most elegant, but here is a way:
First, you need to put your tables into a list:
l_tbl <- mget(ls(pattern="^tbl"))
Then you go through the list, working with 2 tables at a time, thanks to Reduce, first adding the common columns, then merging:
Reduce(function(x, y) {
  col_com <- setdiff(intersect(names(x), names(y)), "ID")
  if (length(col_com)) {
    x[, col_com] <- x[, col_com] + y[, col_com]
    y <- y[, !(names(y) %in% col_com)]  # keep only the non-common columns in the second table
  }
  return(merge(x, y, by = "ID"))
}, l_tbl)
ID trx_1 Cre_counts trx_3 Ckc_counts trx_2 Deb_counts Unk_counts
1 1 10 9 3 4 10 10 5
2 2 5 6 2 3 5 10 10
3 3 10 4 8 6 10 10 10
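The merged result contains all the summed columns, just not in the column order shown in the question; if the order matters, assign the Reduce() result and reorder once at the end (a sketch using the question's column names):
res <- Reduce(function(x, y) {
  col_com <- setdiff(intersect(names(x), names(y)), "ID")
  if (length(col_com)) {
    x[, col_com] <- x[, col_com] + y[, col_com]
    y <- y[, !(names(y) %in% col_com)]
  }
  merge(x, y, by = "ID")
}, l_tbl)
res[, c("ID", "trx_1", "trx_2", "trx_3",
        "Cre_counts", "Deb_counts", "Unk_counts", "Ckc_counts")]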

R Looking up closest value in a data.frame less than or equal to another value

I have two data.frames, lookup_df and values_df. For each row in lookup_df, I want to look up the value in values_df whose idx is closest to, but not greater than, that row's ids value.
Here's my code so far:
lookup_df <- data.frame(ids = 1:10)
values_df <- data.frame(idx = c(1,3,7), values = c(6,2,8))
What I'm wanting for the result_df is the following:
> result_df
ids values
1 1 6
2 2 6
3 3 2
4 4 2
5 5 2
6 6 2
7 7 8
8 8 8
9 9 8
10 10 8
I know how to do this with SQL fairly easily, but I'm curious whether there is a straightforward way in R. I could iterate over the rows of the lookup_df and then loop through the rows of the values_df, but that is not computationally efficient. I'm open to using the dplyr library if someone knows how to use it to solve this problem.
If values_df is sorted by idx ascending, then findInterval will work:
lookup_df <- data.frame(ids = 1:10)
values_df <- data.frame(idx = c(1,3,7), values = c(6,2,8))
lookup_df$values <- values_df$values[findInterval(lookup_df$ids,values_df$idx)]
lookup_df
   ids values
1 1 6
2 2 6
3 3 2
4 4 2
5 5 2
6 6 2
7 7 8
8 8 8
9 9 8
10 10 8
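One caveat with findInterval(): any ids value smaller than the smallest idx returns position 0, and indexing with 0 silently drops that element rather than producing NA. A small guard, sketched with the same lookup_df and values_df (only relevant if such ids can occur):
pos <- findInterval(lookup_df$ids, values_df$idx)
lookup_df$values <- ifelse(pos == 0, NA, values_df$values[pmax(pos, 1L)])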
