I would like to refer to values in a data frame column with the row index being dependent on the value of another column.
Example:
value lag laggedValue
1 1 2
2 2 4
3 3 6
4 2 6
5 1 6
6 3 9
7 3 10
8 1 9
9 1 10
10 2
In Excel I use this formula in column "laggedValue":
=INDIRECT("B"&(ROW(B2)+C2))
How can I do this in an R data frame?
Thanks!
For row r with associated lag value lag[r], it looks like you're trying to create a new column that holds the (r+lag[r])th element of value (or a missing value if that index is out of bounds). You can do this with:
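dat <- data.frame(value = 1:10, lag = c(1, 2, 3, 2, 1, 3, 3, 1, 1, 2))
# row to look up = current row number + that row's lag; indices past nrow(dat) give NA
dat$laggedValue <- dat$value[seq(nrow(dat)) + dat$lag]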
dat
value lag laggedValue
1 1 1 2
2 2 2 4
3 3 3 6
4 4 2 6
5 5 1 6
6 6 3 9
7 7 3 10
8 8 1 9
9 9 1 10
10 10 2 NA
Other commenters have pointed out that you may simply be adding the value and lag columns, since your value column holds the elements 1 through 10, but this solution also works when the value column contains other data.
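A minimal self-contained sketch of the same indexing idea, using made-up values (the data below is hypothetical, not from the question) so that value is clearly not just the row number:

```r
# Hypothetical data: 'value' deliberately differs from the row number
dat <- data.frame(value = c(10, 20, 30, 40, 50),
                  lag   = c(1, 2, 1, 1, 3))

# Row to look up for each row: its own position plus its lag
target <- seq_len(nrow(dat)) + dat$lag

# Subsetting a vector past its end returns NA, so out-of-range lags become NA
dat$laggedValue <- dat$value[target]
dat$laggedValue
```

Here target is 2, 4, 4, 5, 8, so the last row has no row to point to and gets NA.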
Assuming the same thing as @rawr here (that value is simply the row number):
dat <- data.frame(value=c(1:10),
lag=c(1,2,3,2,1,3,3,1,1,2))
dat$laggedValue <- dat$value + dat$lag
dat
value lag laggedValue
1 1 1 2
2 2 2 4
3 3 3 6
4 4 2 6
5 5 1 6
6 6 3 9
7 7 3 10
8 8 1 9
9 9 1 10
10 10 2 12
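The addition shortcut only coincides with the indexing approach while value equals the row number, and it does not produce NA for the last row. A quick sketch comparing the two on the question's data:

```r
dat <- data.frame(value = 1:10,
                  lag   = c(1, 2, 3, 2, 1, 3, 3, 1, 1, 2))

added   <- dat$value + dat$lag                        # arithmetic shortcut
indexed <- dat$value[seq_len(nrow(dat)) + dat$lag]    # look up row r + lag[r]

# Rows 1-9 agree; row 10 points past the last row, so 'indexed' is NA there
data.frame(added, indexed)
```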
I would like to create a data frame that essentially looks something like this:
the periods 1 to 10 repeated for each of 42,574 IDs (ID 1 gets periods 1 to 10, then ID 2 gets periods 1 to 10, and so on),
so that I would end up with a 425,740-row data frame.
I tried to create a dataframe using the following code
periodstring <- as.numeric(gl(10, 42574))
periods <- as.data.frame(periodstring)
but that sorts the numbers (all the 1s come first, then all the 2s, and so on), and other approaches did not quite work either. Is there a simple way to do this?
Thanks in advance.
Another option using rep:
data.frame(Period=rep(1:10,times=42574),
ID=rep(1:42574,each=10))
Output sample:
Period ID
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 8 1
9 9 1
10 10 1
11 1 2
12 2 2
13 3 2
14 4 2
15 5 2
16 6 2
17 7 2
18 8 2
19 9 2
20 10 2
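Two other base R routes to the same layout, sketched here as alternatives (neither is from the original answer): expand.grid() varies its first argument fastest, and the gl() call from the question works once each level is repeated only once (k = 1) and the total length is given:

```r
n_id <- 42574

# expand.grid(): Period cycles 1..10 within each ID
periods <- expand.grid(Period = 1:10, ID = seq_len(n_id))

# gl(n, k, length): levels 1..10, each repeated k = 1 time, recycled to the full length
period_gl <- as.integer(gl(10, 1, 10 * n_id))

nrow(periods)                            # 425740
identical(periods$Period, period_gl)     # TRUE
```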
I have this set of values in the column Num. I want to create another column that ranks them by size, similar to rankt below, but I don't like how rank() handles the ties.
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- rank(x$Num)
Num rankt
1 2 2
2 5 5
3 2 2
4 7 8
5 7 8
6 7 8
7 2 2
8 5 5
9 5 5
Desired outcome for rankt:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
Well, a crude approach is to turn them into factors, which are just increasing integer codes with labels, and then extract those codes:
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- as.numeric(as.factor( rank(x$Num) ))
x
It produces:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
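Why the factor trick works, in a tiny sketch: as.factor() sorts the distinct values into levels, and the underlying integer codes are the positions of those levels. Because rank() preserves order, applying it first gives the same codes here.

```r
f <- as.factor(c(2, 5, 2, 7))
levels(f)        # "2" "5" "7"
as.integer(f)    # 1 2 1 3

x <- data.frame(Num = c(2, 5, 2, 7, 7, 7, 2, 5, 5))
# Same codes whether or not rank() is applied first
identical(as.integer(as.factor(x$Num)),
          as.integer(as.factor(rank(x$Num))))   # TRUE
```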
A solution with dplyr
library(dplyr)
x1 <- x %>%
mutate(rankt = dense_rank(Num))   # ties share a rank, with no gaps between ranks
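A base R sketch that avoids both the factor conversion and dplyr (an alternative, not from the answers above): match() each value against the sorted distinct values, which gives a dense rank.

```r
x <- data.frame(Num = c(2, 5, 2, 7, 7, 7, 2, 5, 5))

# Position of each value among the sorted distinct values = dense rank
x$rankt <- match(x$Num, sort(unique(x$Num)))
x$rankt
# [1] 1 2 1 3 3 3 1 2 2
```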
I have a factor variable with 6 levels, which simplified looks like:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 1 1 1 2 2 2 2... 1 1 1 2 2... (with n = 78)
Note that each number is usually, but not always, repeated three times.
I need to transform this variable into the following pattern:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8...
where each repetition of the 6 levels continues counting upward.
Is there any way / any function that lets me do that?
Sorry for my bad description!
Assuming that you have a numeric vector representing the simplified version you posted, i.e. x = c(1,1,1,2,2,3,3,3,1,1,2,2), you can use this:
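library(dplyr)
x <- c(1,1,1,2,2,3,3,3,1,1,2,2)
# TRUE wherever a value differs from the previous one; the first element is compared with the default 0
cumsum(x != lag(x, default = 0))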
# [1] 1 1 1 2 2 3 3 3 4 4 5 5
This compares each value with the previous one and adds 1 whenever they differ (starting from 1).
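The same comparison can be sketched in base R without dplyr (an alternative, not part of the answer): compare every element with its predecessor and treat the first element as the start of a group, which also avoids relying on a sentinel default such as 0 that could collide with real data.

```r
x <- c(1, 1, 1, 2, 2, 3, 3, 3, 1, 1, 2, 2)

# TRUE for the first element and wherever a value differs from the previous one
starts <- c(TRUE, x[-1] != x[-length(x)])
cumsum(starts)
# [1] 1 1 1 2 2 3 3 3 4 4 5 5
```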
Maybe you can try rle():
r <- rle(x)
v <- rep(seq_along(r$values), r$lengths)
Example with dummy data
x = c(1,1,1,2,2,3,3,3,4,4,5,6,1,1,2,2,3,3,3,4,4)
then we can get
> v
[1] 1 1 1 2 2 3 3 3 4 4 5 6 7 7 8 8 9 9
[19] 9 10 10
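A variant of the rle() idea (a sketch, not from the answer) relabels the runs and rebuilds the vector with inverse.rle(); with the same dummy data it gives the same v as above:

```r
x <- c(1,1,1,2,2,3,3,3,4,4,5,6,1,1,2,2,3,3,3,4,4)

r <- rle(x)                        # run lengths and run values
r$values <- seq_along(r$values)    # replace each run's value with its run number
v <- inverse.rle(r)                # expand back to the original length
v
```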
In base R you can use diff() and cumsum():
c(1, cumsum(diff(x)!=0)+1)
# [1] 1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8
Data:
x <- c(1,1,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,6,1,1,1,2,2,2,2)
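The answers above are written for a numeric vector, while the question describes a factor. One option, sketched here with a hypothetical factor f, is to run the same diff()/cumsum() idea on the factor's integer codes, since equal labels always share a code:

```r
# Hypothetical factor standing in for the real variable
f <- factor(c(1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 2))

codes <- as.integer(f)   # integer codes of the levels
grp <- c(1, cumsum(diff(codes) != 0) + 1)
grp
# [1] 1 1 2 2 2 3 3 3 4 4 4 5 5
```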
For instance, I have a data frame like this; there are no duplicated numbers within a row, and the numbers in each row are sorted.
W1 W2 W3 W4
1 1 3 4 7
2 4 5 6 7
3 1 2 5 8
4 2 5 9 10
5 4 7 10 13
6 1 2 6 9
I want to get the row IDs in which each of 1, 2, 3, ... appears: 1 is in rows 1, 3, 6; 2 in rows 3, 4, 6; 3 in row 1 only; and so on. So my result would look like this:
1 1 3 6
2 3 4 6
3 1
4 1 2 5
5 2 3 4
......
I would do:
split(t(row(df)), unlist(t(df)))
And if you need empty levels to show up:
split(t(row(df)), factor(unlist(t(df)), 1:max(df)))
This should be a lot faster than looping, for example with:
lapply(1:max(df), function(i) which(rowSums(df == i) > 0))
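A self-contained sketch using the data frame from the question (typed in from the table above) that shows what the split() call returns, including the empty groups for numbers that never appear:

```r
df <- data.frame(W1 = c(1, 4, 1, 2, 4, 1),
                 W2 = c(3, 5, 2, 5, 7, 2),
                 W3 = c(4, 6, 5, 9, 10, 6),
                 W4 = c(7, 7, 8, 10, 13, 9))

# t() makes the traversal row by row, so the row IDs in each group come out in ascending order
res <- split(t(row(df)), factor(unlist(t(df)), levels = 1:max(df)))

res[["1"]]    # 1 3 6
res[["4"]]    # 1 2 5
res[["11"]]   # integer(0): 11 never occurs
```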
Apologies if this is posted elsewhere. I searched here and elsewhere and found things that were close, but not quite what I needed. After sinking a couple of hours into this, I'm posting!
I need to remove rows from a data set that have duplicate values in value1 within the same id. So in the following data frame I'd only want to remove row 3. I do not want to remove row 9 or row 10. If it makes a difference, in the actual data the values are dates.
I know the solution is probably very simple but I've yet to get it exactly right. Thanks!
x <- data.frame(id = c(1,2,2,2,3,3,4,5,6,6), value1 = c(6,8,8,1,9,5,4,3,8,4), value2 = 1:10)
> x
id value1 value2
1 1 6 1
2 2 8 2
3 2 8 3
4 2 1 4
5 3 9 5
6 3 5 6
7 4 4 7
8 5 3 8
9 6 8 9
10 6 4 10
I want to end up with:
> x
id value1 value2
1 1 6 1
2 2 8 2
4 2 1 4
5 3 9 5
6 3 5 6
7 4 4 7
8 5 3 8
9 6 8 9
10 6 4 10
Try duplicated:
> x[!duplicated(x[1:2]), ]
id value1 value2
1 1 6 1
2 2 8 2
4 2 1 4
5 3 9 5
6 3 5 6
7 4 4 7
8 5 3 8
9 6 8 9
10 6 4 10
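Since the real value1 column holds dates, here is a small sketch with hypothetical Date values showing that duplicated() handles them the same way; selecting the columns by name rather than position is just a readability choice:

```r
x <- data.frame(id     = c(1, 2, 2, 2, 3),
                value1 = as.Date(c("2020-01-06", "2020-01-08", "2020-01-08",
                                   "2020-01-01", "2020-01-09")),
                value2 = 1:5)

# TRUE for rows whose (id, value1) pair already appeared in an earlier row
dup <- duplicated(x[c("id", "value1")])
x[!dup, ]   # keeps rows 1, 2, 4 and 5
```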