I have a factor variable with 6 levels, which simplified looks like:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 1 1 1 2 2 2 2... 1 1 1 2 2... (with n = 78)
Note that each number is usually, but not always, repeated three times.
I need to transform this variable into the following pattern:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8...
where the count keeps increasing with each repetition of the 6 levels instead of starting over at 1.
Is there any way / any function that lets me do that?
Sorry for my bad description!
Assuming you have a numeric vector like the simplified version you posted, e.g. x = c(1,1,1,2,2,3,3,3,1,1,2,2), you can use this:
library(dplyr)
cumsum(x != lag(x, default = 0))
# [1] 1 1 1 2 2 3 3 3 4 4 5 5
which compares each value to its previous one and if they are different it adds 1 (starting from 1).
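The question mentions a factor, and comparing a factor against a numeric default of 0 may not work as intended. A minimal base R sketch of the same idea, assuming a small made-up factor f (not the asker's real data) that is first converted to its integer codes:

```r
# Hypothetical factor standing in for the real 6-level variable
f <- factor(c(1, 1, 2, 2, 2, 3, 3, 3, 1, 1, 2))

# Integer codes of the factor levels
x <- as.integer(f)

# TRUE wherever a value differs from its predecessor; the first
# element always starts a new run
new_run <- c(TRUE, x[-1] != x[-length(x)])

# The running count of run starts is the consecutive group number
grp <- cumsum(new_run)
grp
# [1] 1 1 2 2 2 3 3 3 4 4 5
```

This avoids the dplyr dependency and does not rely on 0 being absent from the data.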
Maybe you can try rle, i.e.,
r <- rle(x)
v <- rep(seq_along(r$values), r$lengths)
Example with dummy data
x = c(1,1,1,2,2,3,3,3,4,4,5,6,1,1,2,2,3,3,3,4,4)
then we can get
> v
[1] 1 1 1 2 2 3 3 3 4 4 5 6 7 7 8 8 9 9
[19] 9 10 10
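To see why this works, it can help to look at what rle returns before the runs are renumbered. A small sketch, reusing the short example vector from the first answer:

```r
x <- c(1, 1, 1, 2, 2, 3, 3, 3, 1, 1, 2, 2)

# rle() splits x into runs of equal adjacent values
r <- rle(x)
r$values   # value of each run:  1 2 3 1 2
r$lengths  # length of each run: 3 2 3 2 2

# Number the runs 1, 2, 3, ... and repeat each number by its run length
v <- rep(seq_along(r$values), r$lengths)
v
# [1] 1 1 1 2 2 3 3 3 4 4 5 5
```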
In base you can use diff and cumsum.
c(1, cumsum(diff(x)!=0)+1)
# [1] 1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8
Data:
x <- c(1,1,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,6,1,1,1,2,2,2,2)
I have this set of values in the column Num. I want to create another column that ranks them by size, similar to rankt below, but I don't like the values that rank produces.
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- rank(x$Num)
Num rankt
1 2 2
2 5 5
3 2 2
4 7 8
5 7 8
6 7 8
7 2 2
8 5 5
9 5 5
Desired outcome for rankt:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
Well, a crude approach is to turn the ranks into a factor, which is stored as increasing integer codes with labels, and then fetch those codes:
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- as.numeric(as.factor( rank(x$Num) ))
x
It produces:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
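A different base R route, shown here only as a sketch and not as what the answer above does, is to look up each value's position among the sorted distinct values with match:

```r
x <- data.frame(Num = c(2, 5, 2, 7, 7, 7, 2, 5, 5))

# Position of each value among the sorted distinct values of Num
x$rankt <- match(x$Num, sort(unique(x$Num)))
x$rankt
# [1] 1 2 1 3 3 3 1 2 2
```

This skips the intermediate rank and factor conversion and returns integers directly.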
A solution with dplyr
library(dplyr)
x1 <- x %>%
  mutate(rankt = dense_rank(Num))
I want to replicate a vector several times, with a different value left out of each copy (sequentially).
For example, my vector is
value <- 1:7
The first copy is the series without 1, the second without 2, and so on. In the end, all copies are joined into one vector.
The intended output looks like
2 3 4 5 6 7 1 3 4 5 6 7 1 2 4 5 6 7 1 2 3 5 6 7 1 2 3 4 6 7 1 2 3 4 5 7 1 2 3 4 5 6
Is there any smart way to do this?
You could use the diagonal matrix to set up a logical vector, using it to remove the appropriate values.
n <- 7
rep(1:n, n)[!diag(n)]
# [1] 2 3 4 5 6 7 1 3 4 5 6 7 1 2 4 5 6 7 1 2 3 5 6 7 1 2 3 4 6 7 1 2 3 4 5
# [36] 7 1 2 3 4 5 6
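The trick is easier to follow on a smaller case. A sketch with n = 3, where keep is just an illustrative name for the logical mask:

```r
n <- 3

# Negating the identity matrix gives FALSE on the diagonal, TRUE elsewhere
keep <- !diag(n)

# Matrices are indexed column by column, so the k-th block of 1:n
# loses its k-th element
rep(1:n, n)
# [1] 1 2 3 1 2 3 1 2 3
rep(1:n, n)[keep]
# [1] 2 3 1 3 1 2
```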
Well, you can certainly do it as a one-liner but I am not sure it qualifies as smart. For example:
x <- 1:7
do.call("c", lapply(as.list(-1:-length(x)), function(a)x[a]))
This simply uses lapply to create a list of copies of x, each with a different entry deleted, and then concatenates them using c. The do.call function applies its first argument (a function) to its second argument (a list of arguments to the function).
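A slightly shorter variant of the same idea, sketched here with unlist in place of do.call and with made-up values to show that it does not depend on x being 1:n:

```r
# Hypothetical values; any vector works
x <- c(10, 20, 30)

# For each position i, take a copy of x without its i-th element,
# then flatten the list of copies into one vector
res <- unlist(lapply(seq_along(x), function(i) x[-i]))
res
# [1] 20 30 10 30 10 20
```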
For fun, it's also possible to just use rep:
> n <- 7
> rep(1:n, n)[rep(c(FALSE, rep(TRUE, n)), length.out=n^2)]
[1] 2 3 4 5 6 7 1 3 4 5 6 7 1 2 4 5 6 7 1 2 3 5 6 7 1 2 3 4 6 7 1 2 3 4 5 7 1 2
[39] 3 4 5 6
But lapply is cleaner, I think.
You could also do:
n <- 7
rep(seq(n), n)[-seq(1,n*n,n+1)]
#[1] 2 3 4 5 6 7 1 3 4 5 6 7 1 2 4 5 6 7 1 2 3 5 6 7 1 2 3 4 6 7 1 2 3 4 5 7 1 2 3 4 5 6
After looking at another post about column names and the combn function here, consider the same data.frame. We call combn to get all possible combinations of 2 columns:
foo <- data.frame(x=1:5,y=4:8,z=10:14, w=8:4)
all_comb <- combn(foo,2)
Is there a way to keep the column names after the combn call, so that in this case we get "x y" instead of "X1.5 X4.8" as shown below?
comb_df <- data.frame(all_comb[1,1],all_comb[2,1])
print(comb_df)
X1.5 X4.8
1 1 4
2 2 5
3 3 6
4 4 7
5 5 8
I suspect you really want to use expand.grid() instead.
Try this:
head(expand.grid(foo))
x y z w
1 1 4 10 8
2 2 4 10 8
3 3 4 10 8
4 4 4 10 8
5 5 4 10 8
6 1 5 10 8
or
head(expand.grid(foo[, 1:2]))
x y
1 1 4
2 2 4
3 3 4
4 4 4
5 5 4
6 1 5
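If the goal really is every pair of columns with the original names kept, one possible approach (a sketch, not part of the answer above) is to run combn over the column names and subset foo by each pair:

```r
foo <- data.frame(x = 1:5, y = 4:8, z = 10:14, w = 8:4)

# Combine the column names rather than the columns themselves, then
# subset foo by each pair of names so the names are preserved
pairs <- combn(names(foo), 2, FUN = function(nm) foo[nm], simplify = FALSE)

length(pairs)      # 6 pairs of columns
names(pairs[[1]])  # "x" "y"
```

Each element of pairs is a two-column data.frame, so pairs[[1]] prints with the headers x and y.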
I would like to refer to values in a data frame column with the row index being dependent on the value of another column.
Example:
value lag laggedValue
1 1 2
2 2 4
3 3 6
4 2 6
5 1 6
6 3 9
7 3 10
8 1 9
9 1 10
10 2
In Excel I use this formula in column "laggedValue":
=INDIRECT("B"&(ROW(B2)+C2))
How can I do this in an R data frame?
Thanks!
For row r with associated lag value lag[r] it looks like you're trying to create a new column that is the (r+lag[r])th element of value (or a missing value if this is out of bounds). You can do this with:
dat$laggedValue <- dat$value[seq(nrow(dat)) + dat$lag]
dat
value lag laggedValue
1 1 1 2
2 2 2 4
3 3 3 6
4 4 2 6
5 5 1 6
6 6 3 9
7 7 3 10
8 8 1 9
9 9 1 10
10 10 2 NA
Other commenters are mentioning that it looks like you're just adding the value and lag columns because your value column has the elements 1 through 10, but this solution will work even when your value column has other data stored in it.
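A small sketch of that difference, using made-up data where value is not simply the row number:

```r
# Hypothetical data: value is no longer 1, 2, 3, ...
dat <- data.frame(value = c(10, 20, 30, 40, 50),
                  lag   = c(1, 2, 1, 1, 1))

# Row r looks up the value stored in row r + lag[r];
# an index past the last row yields NA
idx <- seq_len(nrow(dat)) + dat$lag
dat$laggedValue <- dat$value[idx]
dat$laggedValue
# [1] 20 40 40 50 NA

# Simply adding the two columns gives a different result here
dat$value + dat$lag
# [1] 11 22 31 41 51
```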
Assuming the same thing as @rawr here:
dat <- data.frame(value=c(1:10),
lag=c(1,2,3,2,1,3,3,1,1,2))
dat$laggedValue <- dat$value + dat$lag
dat
value lag laggedValue
1 1 1 2
2 2 2 4
3 3 3 6
4 4 2 6
5 5 1 6
6 6 3 9
7 7 3 10
8 8 1 9
9 9 1 10
10 10 2 12
Suppose I have a vector of size n=8, v=(5,8,2,7,9,12,2,1). I would like to know how to build an n x n matrix that compares every pair of values of v and returns the minimum of each comparison. In this example, it would look like this:
5 5 2 5 5 5 2 1
5 8 2 7 8 8 2 1
2 2 2 2 2 2 2 1
5 7 2 7 7 7 2 1
5 8 2 7 9 9 2 1
5 8 2 7 9 12 2 1
2 2 2 2 2 2 2 1
1 1 1 1 1 1 1 1
Could you help me with this, please?
outer(v, v, pmin)
Notice the use of pmin, not min, as the former is vectorised but not the latter.
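A short sketch of the difference, using the vector from the question:

```r
v <- c(5, 8, 2, 7, 9, 12, 2, 1)

# pmin() compares element by element; min() collapses all of its
# arguments to a single number
pmin(c(5, 8), c(2, 9))  # 2 8
min(c(5, 8), c(2, 9))   # 2

# outer() passes two long vectors to the function, so it needs the
# element-wise version
m <- outer(v, v, pmin)
m[2, ]
# [1] 5 8 2 7 8 8 2 1
```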