I am facing a problem with the amount of time needed to run my code. I have several columns plus a key value in the last column (the mean in the reproducible example). I want each value in the other columns to become 1 when it is below the key value and 2 when it is above.
Is there an easier way to do this?
a <- c(1,3,5,6,4)
b <- c(10,4,24,5,3)
df <- data.frame (a,b)
df$mean <- rowMeans (df)
for (i in 1:5){
df[i,1:2] [df[i,1:2]<df$mean[i]] <- 1
df[i,1:2] [df[i,1:2]>df$mean[i]] <- 2
}
Thank you in advance
You can simply do,
df[1:2] <- (df[1:2] > df$mean) + 1 # TRUE/FALSE + 1 gives 2/1, so as.integer is not needed
Which gives,
a b mean
1 1 2 5.5
2 1 2 3.5
3 1 2 14.5
4 2 1 5.5
5 2 1 3.5
Prefer vectorized operations over explicit loops in R whenever one is available!
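As a quick check that the vectorized comparison matches a row-by-row loop, here is a minimal sketch using the question's data. The names vals, looped and vectorized are introduced only for this illustration, and it assumes that values exactly equal to the row mean should map to 1.

```r
a <- c(1, 3, 5, 6, 4)
b <- c(10, 4, 24, 5, 3)
df <- data.frame(a, b)
df$mean <- rowMeans(df)

# Work on a numeric matrix copy of the two data columns
vals <- as.matrix(df[1:2])

# Row-by-row version: 2 when the value is above the row mean, otherwise 1
looped <- vals
for (i in seq_len(nrow(vals))) {
  looped[i, ] <- ifelse(vals[i, ] > df$mean[i], 2, 1)
}

# Vectorized version: df$mean is recycled down each column,
# so every element is compared with the mean of its own row
vectorized <- (vals > df$mean) + 1

all(looped == vectorized)
```

Both versions compare each element with the mean of its own row, which is why the one-liner can replace the loop.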
Alternative Solution using mutate_each from dplyr
library(dplyr)
df %>% mutate_each(funs(ifelse(mean>.,1,2)), 1:2)
Also gives
a b mean
1 1 2 5.5
2 1 2 3.5
3 1 2 14.5
4 2 1 5.5
5 2 1 3.5
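Note that mutate_each() and funs() have since been deprecated in dplyr. Assuming dplyr 1.0.0 or later is installed, a roughly equivalent call can be sketched with across(); the name res is used only for this illustration.

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across()

a <- c(1, 3, 5, 6, 4)
b <- c(10, 4, 24, 5, 3)
df <- data.frame(a, b)
df$mean <- rowMeans(df)

# Apply the same rule to the first two columns; .x is the current column
# and `mean` refers to the data frame's own mean column
res <- df %>% mutate(across(1:2, ~ ifelse(mean > .x, 1, 2)))
res
```

As in the mutate_each version, a value exactly equal to the mean ends up as 2 under this rule.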
I am looking for a way to compute a rolling product inside an ifelse statement, where the condition is based on an additional column.
My data looks like this
A B C
1 1 1
2 3 1
3 5 0
4 7 0
The excel formula equivalent would be
C3 = IF(B3=0,(1+A3/10)*C2,1)
I tried using
ifelse(B==0,cumprod(c(1,(A[-1]/10+1))),1)
I couldn't get it working for this case as it is always referring to just the data in column A.
I would expect the following results
A B C
1 1 1 1
2 3 1 1
3 5 0 1.5
4 7 0 2.55
thanks in advance
Try this:
df$C <- cumprod(with(df, ifelse(B==0, A/10+1, 1)))
Or using Reduce:
df$C <- Reduce('*', with(df, ifelse(B==0, A/10+1, 1)), accumulate = TRUE)
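One difference from the Excel formula is worth noting: the formula resets C to 1 whenever B is non-zero, while a plain cumprod keeps multiplying once B returns to a non-zero value. The two agree on the question's data. If the reset matters, a minimal sketch is to run the cumulative product within groups that restart at each non-zero B. The extra rows and the names mult and grp are assumptions made for this illustration.

```r
# The question's four rows plus two hypothetical rows where B goes back to 1
df <- data.frame(A = c(1, 3, 5, 7, 2, 4),
                 B = c(1, 1, 0, 0, 1, 0))

# Multiplier per row: 1 + A/10 where B is 0, otherwise 1
mult <- ifelse(df$B == 0, df$A / 10 + 1, 1)

# Start a new group at every row where B is non-zero
grp <- cumsum(df$B != 0)

# Cumulative product within each group, so the product resets with B
df$C <- ave(mult, grp, FUN = cumprod)
df
```

For the first four rows this gives 1, 1, 1.5, 2.55, the same as the cumprod answer; the fifth row resets to 1 and the sixth becomes 1.4.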
Is there a simple way to make R automatically copy columns from a data.frame to another?
I have something like:
> DF1 <- data.frame(a=1:3, b=4:6)
> DF2 <- data.frame(c=-2:0, d=3:1)
and I want to get something like
> DF1
a b c d
1 -2 4 -2 3
2 -1 5 -1 2
3 0 6 0 1
I'd normally do it by hand, as in
DF1$c <- DF2$c
DF1$d <- DF2$d
and that's fine as long as I have few variables, but it becomes very time-consuming and error-prone when dealing with many variables. Any idea how to do this efficiently? It's probably quite simple, but I wasn't able to find an answer by googling. Thank you!
The result from your example is not correct; it should be:
> DF1$c <- DF2$c
> DF1$d <- DF2$d
> DF1
a b c d
1 1 4 -2 3
2 2 5 -1 2
3 3 6 0 1
Then cbind does exactly the same:
> cbind(DF1, DF2)
a b c d
1 1 4 -2 3
2 2 5 -1 2
3 3 6 0 1
(I was going to add this as a comment to Jilber's now deleted and then undeleted post.) Might be safer to recommend something like
DF1 <- cbind(DF1, DF2[!names(DF2) %in% names(DF1)])
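To see why filtering on names can be safer, here is a small sketch in which DF2 also contains a column b, which is an assumption added for illustration and not part of the question. A plain cbind keeps both b columns, while the filtered version adds only the columns DF1 does not have yet.

```r
DF1 <- data.frame(a = 1:3, b = 4:6)
# Hypothetical DF2 that shares the column name "b" with DF1
DF2 <- data.frame(b = 7:9, c = -2:0, d = 3:1)

# Plain cbind: the result has two columns called "b"
plain <- cbind(DF1, DF2)
names(plain)

# Filtered cbind: only columns whose names are new to DF1 are added
safe <- cbind(DF1, DF2[!names(DF2) %in% names(DF1)])
names(safe)
```

Note that the filtered version silently drops DF2$b, so it is only appropriate when the columns already in DF1 should win.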
If I have the following data frame G:
z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4
I am trying to get:
z type x y
3 a 6 3
2 a 5 2
1 a 4 1
4 b 1 2
5 b 0.9 1
6 c 4 1
That is, I want to sort the whole data frame within the levels of the factor type based on the vector x, get the length of each level (a = 3, b = 2, c = 1), and then number the rows in decreasing order in a new vector y.
My starting place is currently with sort()
tapply(y, x, sort)
Would it be best to use sapply to split everything first?
There are many ways to skin this cat. Here is one solution using base R and vectorized code in two steps (without any apply):
1. Sort the data using order and xtfrm.
2. Use rle and sequence to generate the sequence.
Replicate your data:
dat <- read.table(text="
z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4
", header=TRUE, stringsAsFactors=FALSE)
Two lines of code:
r <- dat[order(dat$type, -xtfrm(dat$x)), ]
r$y <- sequence(rle(r$type)$lengths)
Results in:
r
z type x y
3 3 a 6.0 1
2 2 a 5.0 2
1 1 a 4.0 3
4 4 b 1.0 1
5 5 b 0.9 2
6 6 c 4.0 1
The call to order is slightly complicated. Since you are sorting one column in ascending order and a second in descending order, use the helper function xtfrm. See ?xtfrm for details, but it is also described in ?order.
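The numbering above increases within each type, whereas the desired output in the question counts down from the group size. A minimal sketch of one way to get the countdown, using ave from base R (a swapped-in technique, not part of the answer above), could look like this:

```r
dat <- data.frame(z = 1:6,
                  type = c("a", "a", "a", "b", "b", "c"),
                  x = c(4, 5, 6, 1, 0.9, 4),
                  stringsAsFactors = FALSE)

# Sort by type ascending, then by x descending within each type
r <- dat[order(dat$type, -dat$x), ]

# Within each type, number the rows from the group size down to 1
r$y <- ave(r$x, r$type, FUN = function(v) rev(seq_along(v)))
r
```

With this data, y comes out as 3, 2, 1 for type a, 2, 1 for type b and 1 for type c.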
I like Andrie's better:
dat <- read.table(text="z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4", header=TRUE)
Three lines of code:
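dat <- dat[order(dat$type, -dat$x), ]        # sort by type, then by x descending
x <- by(dat, dat$type, nrow)                 # number of rows in each type
dat$y <- unlist(sapply(x, function(z) z:1))  # count down from the group size to 1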
I edited my response to address the comments Andrie mentioned. This works, but if you went this route instead of Andrie's, you're crazy.
This problem seems trivial, but I'm at my wits' end after hours of reading.
I need to generate a vector of the same length as the input vector that lists for each value of the input vector the total count for that value. So, by way of example, I would want to generate the last column of this dataframe:
> df
customer.id transaction.count total.transactions
1 1 1 4
2 1 2 4
3 1 3 4
4 1 4 4
5 2 1 2
6 2 2 2
7 3 1 3
8 3 2 3
9 3 3 3
10 4 1 1
I realise this could be done two ways, either by using run lengths of the first column, or grouping the second column using the first and applying a maximum.
I've tried both tapply:
> tapply(df$transaction.count, df$customer.id, max)
And rle:
> rle(df$customer.id)
But both return a vector of shorter length than the original:
[1] 4 2 3 1
Any help gratefully accepted!
You can do it without relying on the values of the transaction counter; ave only needs a vector of the right length as its first argument:
df$total.transactions <- with( df,
ave( transaction.count , customer.id , FUN=length) )
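As a self-contained sketch of the same idea, the example below rebuilds only the customer.id column from the question and derives both the running counter and the per-customer total with ave. Using seq_along as a stand-in first argument is an assumption made for illustration.

```r
# Customer ids as in the question: 4, 2, 3 and 1 transactions
df <- data.frame(customer.id = rep(1:4, c(4, 2, 3, 1)))

# Any vector of the right length works as the first argument of ave;
# here it is simply the row positions
idx <- seq_along(df$customer.id)

# Running count within each customer
df$transaction.count <- ave(idx, df$customer.id, FUN = seq_along)

# Total count per customer, repeated on every row of that customer
df$total.transactions <- ave(idx, df$customer.id, FUN = length)
df
```

Because ave returns a vector as long as its input, the result can be assigned straight back as a column, unlike the tapply result.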
You can use rle with rep to get what you want:
> x <- rep(1:4, 4:1)
> x
[1] 1 1 1 1 2 2 2 3 3 4
> rep(rle(x)$lengths, rle(x)$lengths)
[1] 4 4 4 4 3 3 3 2 2 1
For performance purposes, you could store the rle object separately so it is only called once.
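A minimal sketch of storing the rle result once is shown below. It also illustrates a caveat: rle counts runs of consecutive equal values, so it only gives per-value totals when equal values sit next to each other. The names runs, totals, y and counts are introduced for this illustration, and the table lookup is a swapped-in alternative for unsorted input.

```r
x <- rep(1:4, 4:1)

# Compute the run-length encoding once and reuse it
runs <- rle(x)
totals <- rep(runs$lengths, runs$lengths)
totals

# If equal values are not adjacent, rle sees several short runs,
# so look the counts up in a frequency table instead
y <- c(1, 2, 1)
counts <- as.vector(table(y)[as.character(y)])
counts
```

For y, the table lookup gives 2, 1, 2, whereas the rle approach would report a run length of 1 for every element.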
Or as Karsten suggested with ddply from plyr:
library(plyr)
# ddply expects a data.frame
dat <- data.frame(x = rep(1:4, 4:1))
ddply(dat, "x", transform, total = length(x))
You are probably looking for a split-apply-combine approach; have a look at ddply in the plyr package or the split function in base R.
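For the base R route, a minimal split-apply-combine sketch could look like the following. The names pieces and result are introduced here for illustration, and unsplit is used as one possible way to recombine the pieces in the original row order.

```r
dat <- data.frame(x = rep(1:4, 4:1))

# Split: one data frame per distinct value of x
pieces <- split(dat, dat$x)

# Apply: add the group size to every row of each piece
pieces <- lapply(pieces, function(p) {
  p$total <- nrow(p)
  p
})

# Combine: put the pieces back together in the original row order
result <- unsplit(pieces, dat$x)
result
```

This is essentially what the ddply call does in one step, at the cost of a few more lines.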
I have a dataset that looks like this:
x y z
1.0 1 0.2
1.1 1 1.5
1.2 1 3.0
1.0 2 8.1
1.1 2 1.0
1.2 2 0.6
I would like to sort the dataset first by x in increasing order and then by y, so that it looks like this:
x y z
1.0 1 0.2
1.0 2 8.1
1.1 1 1.5
1.1 2 1.0
1.2 1 3.0
1.2 2 0.6
I know that the apply, mapply, tapply, etc. functions reorganise datasets, but I must admit that I don't really understand the differences between them, nor which one to use when.
Thank you for your suggestions.
You can order your data using the order function. There is no need for any apply family function.
Assuming your data is in a data.frame called df:
df[order(df$x, df$y), ]
x y z
1 1.0 1 0.2
4 1.0 2 8.1
2 1.1 1 1.5
5 1.1 2 1.0
3 1.2 1 3.0
6 1.2 2 0.6
See ?order for more help.
On a side note: reshaping in general refers to changing the shape of a data.frame, e.g. converting it from wide to tall format. This is not what is required here.
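For completeness, here is a self-contained sketch of the order call on the question's data, together with two ways of sorting one key in descending order. The names sorted, mixed and mixed2 are introduced for this illustration; the negation trick assumes a numeric column, and the vector form of decreasing assumes method = "radix".

```r
df <- data.frame(x = c(1, 1.1, 1.2, 1, 1.1, 1.2),
                 y = c(1, 1, 1, 2, 2, 2),
                 z = c(0.2, 1.5, 3, 8.1, 1, 0.6))

# Both keys ascending, as in the answer above
sorted <- df[order(df$x, df$y), ]

# x ascending, y descending: negate the numeric key
mixed <- df[order(df$x, -df$y), ]

# Same ordering using one decreasing flag per key (radix method only)
mixed2 <- df[order(df$x, df$y, decreasing = c(FALSE, TRUE), method = "radix"), ]

sorted
mixed
```

The row names in the output show which original rows ended up where, which is a quick way to confirm the ordering.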
You can also use the arrange() function in plyr for this. Wrap any variable that you want sorted in descending order in desc().
> library(plyr)
> dat <- head(ChickWeight)
> arrange(dat,weight,Time)
weight Time Chick Diet
1 42 0 1 1
2 51 2 1 1
3 59 4 1 1
4 64 6 1 1
5 76 8 1 1
6 93 10 1 1
This is the fastest way to do this that's still readable, if speed matters in your application. Benchmarks here:
How to sort a dataframe by column(s)?