Copying multiple columns from one data.frame to another - r

Is there a simple way to make R automatically copy columns from a data.frame to another?
I have something like:
>DF1 <- data.frame(a=1:3, b=4:6)
>DF2 <- data.frame(c=-2:0, d=3:1)
and I want to get something like
>DF1
a b c d
1 -2 4 -2 3
2 -1 5 -1 2
3 0 6 0 1
I'd normally do it by hand, as in
DF1$c <- DF2$c
DF1$d <- DF2$d
and that's fine as long as I have few variables, but it becomes very time consuming and prone to error when dealing with several variables. Any idea on how to do this efficiently? It's probably quite simple but I swear I wasn't able to find an answer googling, thank you!

The result from your example is not correct, it should be:
> DF1$c <- DF2$c
> DF1$d <- DF2$d
> DF1
a b c d
1 1 4 -2 3
2 2 5 -1 2
3 3 6 0 1
Starting from the original DF1 and DF2, cbind produces the same result:
> cbind(DF1, DF2)
a b c d
1 1 4 -2 3
2 2 5 -1 2
3 3 6 0 1

(I was going to add this as a comment to Jilber's now deleted and then undeleted post.) Might be safer to recommend something like
DF1 <- cbind(DF1, DF2[!names(DF2) %in% names(DF1)])
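A minimal sketch of the same guard using indexed assignment instead of cbind, assuming both data frames have the same number of rows. The name new_cols is only illustrative. Because only columns missing from DF1 are copied, running the assignment a second time does not create duplicate column names.

```r
DF1 <- data.frame(a = 1:3, b = 4:6)
DF2 <- data.frame(c = -2:0, d = 3:1)

# Columns of DF2 that DF1 does not have yet (new_cols is a hypothetical name)
new_cols <- setdiff(names(DF2), names(DF1))

# Copy all of them in one step; assigning to new names appends the columns
DF1[new_cols] <- DF2[new_cols]

print(DF1)
```

Unlike a plain cbind(DF1, DF2), this leaves any column that already exists in DF1 untouched.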

Related

Rolling cumulative product based on elseif condition

I am looking for a way to compute a rolling product inside an ifelse statement, where the condition depends on an additional column.
My data looks like this
A B C
1 1 1
2 3 1
3 5 0
4 7 0
The excel formula equivalent would be
C3 = IF(B3=0,(1+A3/10)*C2,1)
I tried using
ifelse(B==0,cumprod(c(1,(A[-1]/10+1))),1)
I couldn't get it working for this case as it is always referring to just the data in column A.
I would expect the following results
A B C
1 1 1 1
2 3 1 1
3 5 0 1.5
4 7 0 2.55
thanks in advance
Try this:
df$C <- cumprod(with(df, ifelse(B==0, A/10+1, 1)))
Or using Reduce:
df$C <- Reduce('*', with(df, ifelse(B==0, A/10+1, 1)), accumulate = TRUE)
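One difference from the Excel formula may matter: the formula restarts at 1 whenever B is not 0, while a plain cumprod keeps carrying the running product forward. For the sample data the two agree. The sketch below shows both, assuming the data frame is called df as in the answers; factors, grp and C_reset are illustrative names.

```r
df <- data.frame(A = c(1, 3, 5, 7), B = c(1, 1, 0, 0))

# Per-row multiplier: (1 + A/10) where B == 0, otherwise 1
factors <- with(df, ifelse(B == 0, A / 10 + 1, 1))

# Plain cumulative product, as in the answers above
df$C <- cumprod(factors)

# Variant that restarts the product at every row where B != 0,
# which mirrors IF(B3=0, (1+A3/10)*C2, 1)
grp <- cumsum(df$B != 0)
df$C_reset <- ave(factors, grp, FUN = cumprod)

print(df)
```

The two columns only differ when a nonzero B appears after a run of zeros.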

R - Using for loop to conditionally change values in a dataframe

All of the variables in the data.frame are on the same 1-5 scale.
Example of data.frame
rpi_invert
A B C D
5 2 4 1
3 5 5 2
1 1 3 4
For all values that equal 5 I would like to change it to 1.
for 4 change to 2.
for 2 change to 4.
for 1 change to 5.
Example of data.frame after values have been changed.
rpi_invert
A B C D
1 4 2 5
3 1 1 4
5 5 3 2
What I have tried:
for(b in colnames(rpi_invert)){
rpi_invert[[b]][rpi_invert[[b]] == 5] <- 1
rpi_invert[[b]][rpi_invert[[b]] == 4] <- 2
rpi_invert[[b]][rpi_invert[[b]] == 2] <- 4
rpi_invert[[b]][rpi_invert[[b]] == 1] <- 5
}
This will only change the values in the first row and not the second column.
for(b in colnames(rpi_invert)){
rpi_invert <- ifelse(rpi_invert[[b]] == 5,1,
ifelse(rpi_invert[[b]] == 4,2,
ifelse(rpi_invert[[b]] == 2,4,
ifelse(rpi_invert[[b]] == 1,5,rpi_invert[[b]]))))
}
But this gives me the error:
Error in rpi_invert[[b]] : subscript out of bounds
If I try the same methods on an individual column instead of looping through the data.frame, then both methods work, so I am not sure what the problem is.
I am sure what I am trying to do can be done more efficiently without a for loop probably with some type of apply function but I am not sure how.
Any help will be appreciated please let me know if further information is needed.
You can try (if your data.frame is df):
3-(df-3)
# A B C D
#1 1 4 2 5
#2 3 1 1 4
#3 5 5 3 2
or, same but written a bit differently: 6-df
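A short sketch of why the arithmetic works and of a more general alternative. The first loop in the question replaces values one after another, so a 5 that was just turned into 1 is turned back into 5 by the last line; doing the whole mapping in one step avoids that. The lookup vector below is an illustrative addition for mappings that are not a simple reversal, and it assumes every value is an integer from 1 to 5.

```r
rpi_invert <- data.frame(A = c(5, 3, 1), B = c(2, 5, 1),
                         C = c(4, 5, 3), D = c(1, 2, 4))

# Reversing a 1-5 scale is the same as subtracting from 6
inverted <- 6 - rpi_invert

# General alternative: position i of lookup holds the new value for old value i
lookup <- c(5, 4, 3, 2, 1)
inverted2 <- rpi_invert
inverted2[] <- lapply(rpi_invert, function(col) lookup[col])

print(inverted)
print(inverted2)
```

Assigning to inverted2[] keeps the data.frame structure while replacing every column.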

Working with long data format in R

Good day,
d <- c(1,1,1,2,2,2,3,3,3)
e <- c(5,6,7,5,6,7,5,6,7)
f <- c(0,0,1,0,1,0,0,0,1)
df <- data.frame(d,e,f)
I have data that looks like the above. What I need to do is, for each unique element of d, find the first non-zero value in f and then the corresponding value in e. To be specific, I want another vector g so it looks like this:
d <- c(1,1,1,2,2,2,3,3,3)
e <- c(5,6,7,5,6,7,5,6,7)
f <- c(0,0,1,0,1,0,0,0,1)
g <- c(7,7,7,6,6,6,7,7,7)
df <- data.frame(d,e,f,g)
Suggestions to do this easily? I thought I could use split(), but I am having trouble using which() after the split. I can use ave like this:
foo <- function(x){which(x>0)[1]}
df$t <- ave(df$f,df$d,FUN=foo)
But I am having trouble finding the value of e. Any help is appreciated.
Someone else can provide a base R solution, but here's a way to do this using plyr:
> ddply(df,.(d),transform,g = head(e[f != 0],1))
d e f g
1 1 5 0 7
2 1 6 0 7
3 1 7 1 7
4 2 5 0 6
5 2 6 1 6
6 2 7 0 6
7 3 5 0 7
8 3 6 0 7
9 3 7 1 7
Note that I took your note about the "first nonzero element" literally, even though your example data only had a single unique nonzero element in the column (by group).
Here's a way in base R (it assumes d is sorted and each group has exactly one nonzero value in f):
g <- inverse.rle(list(lengths=rle(d)$lengths, values=e[f != 0]))
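A base R sketch that follows the ave attempt from the question more closely. Instead of passing f to ave, it passes row indices, so the function can look at both e and f for each group. It takes the first nonzero f per group and does not rely on d being sorted; a group with no nonzero f would get NA.

```r
d <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
e <- c(5, 6, 7, 5, 6, 7, 5, 6, 7)
f <- c(0, 0, 1, 0, 1, 0, 0, 0, 1)
df <- data.frame(d, e, f)

# ave() splits the row indices by d; for each group, pick the value of e
# at the first row where f is nonzero. The single value is recycled
# across all rows of the group.
df$g <- ave(seq_along(df$d), df$d,
            FUN = function(i) df$e[i][df$f[i] != 0][1])

print(df)
```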

Data frame "expand" procedure in R?

This is not a real statistical question, but rather a data preparation question before performing the actual statistical analysis. I have a data frame which consists of sparse data. I would like to "expand" this data to include zeroes for missing values, group by group.
Here is an example of the data (a and b are two factors defining the group, t is the sparse timestamp and x is the value):
test <- data.frame(
a=c(1,1,1,1,1,1,1,1,1,1,1),
b=c(1,1,1,1,1,2,2,2,2,2,2),
t=c(0,2,3,4,7,3,4,6,7,8,9),
x=c(1,2,1,2,2,1,1,2,1,1,3))
Assuming I would like to expand the values between t=0 and t=9, this is the result I'm hoping for:
test.expanded <- data.frame(
a=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1),
b=c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
t=c(0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9),
x=c(1,0,2,1,2,0,0,2,0,0,0,0,0,1,1,0,2,1,1,3))
Zeroes have been inserted for all missing values of t. This makes it easier to use.
I have a quick and dirty implementation which sorts the dataframe and loops through each of its lines, adding missing lines one at a time. But I'm not entirely satisfied by the solution. Is there a better way to do it?
For those who are familiar with SAS, it is similar to the proc expand.
Thanks!
As you noted in a comment to the other answer, doing it by group is easy with plyr which just leaves how to "fill in" the data sets. My approach is to use merge.
library("plyr")
test.expanded <- ddply(test, c("a","b"), function(DF) {
DF <- merge(data.frame(t=0:9), DF[,c("t","x")], all.x=TRUE)
DF[is.na(DF$x),"x"] <- 0
DF
})
merge with all.x=TRUE will make the missing values NA, so the second line of the function is needed to replace those NAs with 0's.
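The same merge idea can be sketched without plyr, assuming the range 0 to 9 is known in advance. The helper names groups and grid are illustrative. The first merge has no columns in common, so it returns every combination of group and timestamp; the second merge attaches the observed values.

```r
test <- data.frame(
  a = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  b = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  t = c(0, 2, 3, 4, 7, 3, 4, 6, 7, 8, 9),
  x = c(1, 2, 1, 2, 2, 1, 1, 2, 1, 1, 3))

# Every (a, b) group that occurs in the data
groups <- unique(test[c("a", "b")])

# Cartesian product of groups and the full range of t
grid <- merge(groups, data.frame(t = as.numeric(0:9)))

# Left join the observed values, then replace the gaps with 0
test.expanded <- merge(grid, test, all.x = TRUE)
test.expanded$x[is.na(test.expanded$x)] <- 0

# Sort by group and time, and reset the row names
test.expanded <- test.expanded[order(test.expanded$a,
                                     test.expanded$b,
                                     test.expanded$t), ]
rownames(test.expanded) <- NULL

print(test.expanded)
```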
This is convoluted but works fine:
test <- data.frame(
a=c(1,1,1,1,1,1,1,1,1,1,1),
b=c(1,1,1,1,1,2,2,2,2,2,2),
t=c(0,2,3,4,7,3,4,6,7,8,9),
x=c(1,2,1,2,2,1,1,2,1,1,3))
my.seq <- seq(0,9)
not.t <- !(my.seq %in% test$t)
test[nrow(test)+seq(length(my.seq[not.t])),"t"] <- my.seq[not.t]
test
#------------
a b t x
1 1 1 0 1
2 1 1 2 2
3 1 1 3 1
4 1 1 4 2
5 1 1 7 2
6 1 2 3 1
7 1 2 4 1
8 1 2 6 2
9 1 2 7 1
10 1 2 8 1
11 1 2 9 3
12 NA NA 1 NA
13 NA NA 5 NA
Not sure if you want it sorted by t afterwards or not. If so, easy enough to do:
https://stackoverflow.com/a/6871968/636656

Create a vector listing run length of original vector with same length as original vector

This problem seems trivial but I'm at my wits' end after hours of reading.
I need to generate a vector of the same length as the input vector that lists for each value of the input vector the total count for that value. So, by way of example, I would want to generate the last column of this dataframe:
> df
customer.id transaction.count total.transactions
1 1 1 4
2 1 2 4
3 1 3 4
4 1 4 4
5 2 1 2
6 2 2 2
7 3 1 3
8 3 2 3
9 3 3 3
10 4 1 1
I realise this could be done two ways, either by using run lengths of the first column, or grouping the second column using the first and applying a maximum.
I've tried both tapply:
> tapply(df$transaction.count, df$customer.id, max)
And rle:
> rle(df$customer.id)
But both return a vector of shorter length than the original:
[1] 4 2 3 1
Any help gratefully accepted!
You can do it without creating transaction counter with:
df$total.transactions <- with( df,
ave( transaction.count , customer.id , FUN=length) )
You can use rle with rep to get what you want:
> x <- rep(1:4, 4:1)
> x
[1] 1 1 1 1 2 2 2 3 3 4
> rep(rle(x)$lengths, rle(x)$lengths)
[1] 4 4 4 4 3 3 3 2 2 1
For performance purposes, you could store the rle object separately so it is only called once.
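A minimal sketch of storing the rle result once, applied to the customer.id values from the question. The column names total.by.rle and total.by.ave are illustrative. Note that rle counts consecutive runs, so it only gives per-customer totals when the rows are grouped by customer; the ave version does not depend on row order.

```r
df <- data.frame(customer.id = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4))

# Compute the run-length encoding once and reuse it
runs <- rle(df$customer.id)
df$total.by.rle <- rep(runs$lengths, runs$lengths)

# Order-independent alternative: group size per customer
df$total.by.ave <- ave(df$customer.id, df$customer.id, FUN = length)

print(df)
```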
Or as Karsten suggested with ddply from plyr:
require(plyr)
#Expects data.frame
dat <- data.frame(x = rep(1:4, 4:1))
ddply(dat, "x", transform, total = length(x))
You are probably looking for a split-apply-combine approach; have a look at ddply in the plyr package or the split function in base R.
