Repetitive vectors in R [duplicate] - r

This question already has an answer here:
Closed 11 years ago.
Possible Duplicate:
R: generate a repeating sequence based on vector
Creating the vector 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 is easy in one line. Type this at the console and the expected output appears immediately:
c(rep(1:3, 5))
But is there a similarly easy way to produce the vector 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5?
The repetition pattern is different, but I don't see why it shouldn't have an equally simple solution. It can be done with a "for" loop without much difficulty, but can it be compressed into one line?

You need the each argument of rep:
> rep(1:5, each = 3)
[1] 1 1 1 2 2 2 3 3 3 4 4 4 5 5 5
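For comparison, here is a small sketch of how the times and each arguments of rep differ and how they combine. The variable names are illustrative only.

```r
# times repeats the whole vector; each repeats every element in place
cycled  <- rep(1:3, times = 5)   # 1 2 3 1 2 3 ...
blocked <- rep(1:5, each = 3)    # 1 1 1 2 2 2 ...

# With both given, each is applied first and the result is then repeated
both <- rep(1:2, each = 2, times = 2)   # 1 1 2 2 1 1 2 2
print(both)
```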


R Create column that provides grouping number for each distinct group [duplicate]

This question already has an answer here:
get sequence of group in R
(1 answer)
Closed 2 years ago.
I need to add a column to my data that contains a number grouping for each distinct combination of other columns. It will likely be more clear with this example:
# Make data
df <- data.frame(x = c(1,1,2,3,4,5,2,3,4,5),
y = c(2, 2,3,4,5,1,3,4,5,1),
value = c(1,2,3,4,5,6,7,8,9,10))
# Print the data
df
x y value
1 1 2 1
2 1 2 2
3 2 3 3
4 3 4 4
5 4 5 5
6 5 1 6
7 2 3 7
8 3 4 8
9 4 5 9
10 5 1 10
I need to add a "Location" column that numbers each unique (or distinct) combination of x and y. Duplicated x and y combinations should all use the same number. In my example there are 5 unique combinations of x and y, so there are at most 5 Locations. My goal output is this:
x y value Location
1 1 2 1 1
2 1 2 2 1
3 2 3 3 2
4 3 4 4 3
5 4 5 5 4
6 5 1 6 5
7 2 3 7 2
8 3 4 8 3
9 4 5 9 4
10 5 1 10 5
I imagine doing something like this:
df <- df %>%
group_by(x,y) %>%
mutate(Location = ndistinct(x,y))
But this doesn't work. Any help is appreciated!
Thanks!
df %>% mutate(., Location=group_indices(., x,y))
x y value Location
1 1 2 1 1
2 1 2 2 1
3 2 3 3 2
4 3 4 4 3
5 4 5 5 4
6 5 1 6 5
7 2 3 7 2
8 3 4 8 3
9 4 5 9 4
10 5 1 10 5
See here and here.
Not quite as straightforward as I thought to start with.
Update
To answer OP's question: the dot . is a placeholder for "the object on the left-hand side of the pipe" (%>%). Normally you don't need it because, by default, magrittr (the package that defines the pipe) passes the left-hand side as the first argument of the function on the right-hand side. This fits the tidyverse well, because its functions are designed to take the data as their first argument, so you rarely have to write the dot yourself.
If you use functions that don't belong to the tidyverse, you sometimes need the dot to override magrittr's default behaviour.
I wrote my first version of this answer without testing the code because the solution seemed "obvious". But I did test it afterwards (at the same time as OP reported the error) and found that it didn't work. A quick Google search brought me to the GitHub issue in the second link above, and hence to the correct answer.
I don't yet understand why, in this particular case, a tidyverse function doesn't work as I expect. (Other than taking the easy way out and saying that my expectation was wrong!)
In base R we can use:
df$location <- as.numeric(factor(paste(df$x,df$y)))
x y value location
1 1 2 1 1
2 1 2 2 1
3 2 3 3 2
4 3 4 4 3
5 4 5 5 4
6 5 1 6 5
7 2 3 7 2
8 3 4 8 3
9 4 5 9 4
10 5 1 10 5
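One caveat with the paste/factor approach: factor sorts its levels as strings, so the numbering follows sorted order rather than order of appearance (the two happen to coincide in this example). Below is a minimal base R sketch of an alternative, assuming you want groups numbered by first appearance. The names key and location2 are made up for illustration.

```r
df <- data.frame(x = c(1, 1, 2, 3, 4, 5, 2, 3, 4, 5),
                 y = c(2, 2, 3, 4, 5, 1, 3, 4, 5, 1),
                 value = 1:10)

# Build one key per row, then number the keys by where they first occur
key <- paste(df$x, df$y)
df$location2 <- match(key, unique(key))
df$location2
```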

R repeating sequence add 1 each repeat

I have a workbook problem for my R class I can't figure out. I need to "write an R command that uses rep() to create a vector with elements 1 2 3 4 2 3 4 5 3 4 5 6 4 5 6 7"
It seems to be a repeating sequence of 1 to 4, repeating 4 times and on each repeat adding 1 to the starting element. I'm very very new to R so I'm stumped. Any help would be appreciated.
We can use rep to build the offsets and add them to the initial vector:
v1 + rep(0:3, each = length(v1))
#[1] 1 2 3 4 2 3 4 5 3 4 5 6 4 5 6 7
Or using sapply
c(sapply(v1, `+`, 0:3))
Or using outer
c(outer(v1, 0:3, `+`))
data
v1 <- 1:4
Another option is to use sequence (its from argument, used here, needs R 4.0.0 or later):
sequence(rep(4, 4), 1:4)
#[1] 1 2 3 4 2 3 4 5 3 4 5 6 4 5 6 7
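To see why these one-liners agree, it may help to look at the intermediate objects. The sketch below (variable names are illustrative) builds the offsets explicitly and checks that the rep and outer versions give the same vector.

```r
v1 <- 1:4

# rep(0:3, each = 4) is 0 0 0 0 1 1 1 1 ...; v1 is recycled against it
offsets <- rep(0:3, each = length(v1))
by_rep  <- v1 + offsets

# outer() builds a 4 x 4 matrix whose j-th column is v1 + (j - 1);
# c() then reads that matrix column by column
m        <- outer(v1, 0:3, `+`)
by_outer <- c(m)

identical(by_rep, by_outer)
```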

Formatting R combn output

As a short example, when running combn(1:5,2), I get a matrix of 2 rows and 10 columns.
I know I can convert the output matrix to a data frame, but is there an option inside combn to get the output directly as a vertical data frame of 2 columns and 10 rows?
Thanks.
Simply transpose the matrix with t():
data.frame(t(combn(1:5, 2)))
Yields:
X1 X2
1 1 2
2 1 3
3 1 4
4 1 5
5 2 3
6 2 4
7 2 5
8 3 4
9 3 5
10 4 5
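As far as I know, combn has no argument that returns the combinations as rows, so transposing is the usual route. Here is a small sketch, assuming you also want meaningful column names (first and second are arbitrary choices):

```r
# combn() returns one combination per column; t() turns them into rows
pairs <- t(combn(1:5, 2))

# Name the columns while converting to a data frame
pairs_df <- data.frame(first = pairs[, 1], second = pairs[, 2])
dim(pairs_df)
```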

Group values by unique elements [duplicate]

This question already has answers here:
How to create a consecutive group number
(13 answers)
Create group number for contiguous runs of equal values
(4 answers)
Closed 1 year ago.
I have a vector that looks like this:
a <- c("A110","A110","A110","B220","B220","C330","D440","D440","D440","D440","D440","D440","E550")
I would like to create another vector, based on a, that should look like:
b <- c(1,1,1,2,2,3,4,4,4,4,4,4,5)
In other words, b should assign a value (starting from 1) to each different element of a.
First of all, (I assume) this is your vector
a <- c("A110","A110","A110","B220","B220","C330","D440","D440","D440","D440","D440","D440","E550")
As for possible solutions, here are a few (I can't find a good dupe right now):
as.integer(factor(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or
cumsum(!duplicated(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or
match(a, unique(a))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Also, rle will work similarly in your specific scenario:
with(rle(a), rep(seq_along(values), lengths))
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Or (which is practically the same)
data.table::rleid(a)
# [1] 1 1 1 2 2 3 4 4 4 4 4 4 5
Be advised, though, that these solutions behave differently in other scenarios. Consider the following vector:
a <- c("B110","B110","B110","A220","A220","C330","D440","D440","B110","B110","E550")
And the results of the 4 different solutions:
1.
as.integer(factor(a))
# [1] 2 2 2 1 1 3 4 4 2 2 5
The factor solution begins with 2 because factor sorts its levels alphabetically, so "A220" becomes level 1 and "B110" level 2 even though "B110" appears first. This solution therefore numbers groups in order of appearance only if your vector is sorted, so don't use it otherwise.
2.
cumsum(!duplicated(a))
# [1] 1 1 1 2 2 3 4 4 4 4 5
This cumsum/duplicated solution goes wrong here because "B110" was already present at the beginning: the later "B110" values are flagged as duplicates, the counter does not advance, and "D440","D440","B110","B110" all end up in the same group.
3.
match(a, unique(a))
# [1] 1 1 1 2 2 3 4 4 1 1 5
This match/unique solution puts ones at the end because it assigns one number per distinct value (via unique), so "B110" gets the same group wherever it appears, even when it shows up in more than one run.
4.
with(rle(a), rep(seq_along(values), lengths))
# [1] 1 1 1 2 2 3 4 4 5 5 6
This solution only cares about runs of consecutive equal values, so the two separate runs of "B110" are put into different groups.
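The four behaviours can also be compared side by side. This sketch wraps each approach in a small helper (the function names are made up for illustration) and runs them on the same unsorted vector.

```r
a <- c("B110","B110","B110","A220","A220","C330",
       "D440","D440","B110","B110","E550")

# One helper per approach discussed above
by_sorted_level <- function(x) as.integer(factor(x))    # ids follow alphabetical levels
by_new_value    <- function(x) cumsum(!duplicated(x))    # counter advances on first sightings only
by_first_seen   <- function(x) match(x, unique(x))       # one id per distinct value
by_run          <- function(x) with(rle(x), rep(seq_along(values), lengths))  # one id per run

# One row per approach, one column per element of a
results <- rbind(factor = by_sorted_level(a),
                 cumsum = by_new_value(a),
                 match  = by_first_seen(a),
                 rle    = by_run(a))
results
```

Roughly speaking, by_first_seen suits the case where equal values should always share an id, while by_run suits the case where each contiguous block should get its own id.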

Creating a combination data.table in R

I would like to do the following:
A B
1 2
1 3
1 4
2 3
2 4
3 4
using data.table, but I am not sure how to exclude the already used numbers cumulatively.
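One way to read "excluding the already used numbers cumulatively" is to keep only the pairs where A is smaller than B. The base R sketch below takes that reading; it is not a confirmed answer to the question, and the names grid and combos are illustrative.

```r
# Every (A, B) pairing of 1:4, including repeats and reversed pairs
grid <- expand.grid(A = 1:4, B = 1:4)

# Keep each unordered pair once by requiring A < B, then sort by A and B
combos <- grid[grid$A < grid$B, ]
combos <- combos[order(combos$A, combos$B), ]
rownames(combos) <- NULL
combos

# Assumption: with the data.table package installed, the analogous
# cross join would be data.table::CJ(A = 1:4, B = 1:4)[A < B]
```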
