Missing values in sequence into actual sequence in R? [duplicate]

This question already has answers here:
How to create a consecutive group number
(13 answers)
Closed 5 years ago.
I have a vector of integers, for example, v <- c(1,5,1,2,2,4,7,5,7). If I sort(unique(v)), the values 3 and 6 are missing from the sequence. How can I transform v into a vector where sort(unique(v)) is an actual sequence of consecutive integers? That is, transforming v into c(1,4,1,2,2,3,5,4,5) (in general, of course).

Converting v to factor and back to numeric could do the trick
as.numeric(as.factor(v))
#[1] 1 4 1 2 2 3 5 4 5

Using OP's method, we get the expected output with match
match(v, sort(unique(v)))
#[1] 1 4 1 2 2 3 5 4 5
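As a quick check, the two approaches above can be compared on the example vector. This is only a sketch; the names by_factor and by_match are introduced here for illustration and do not come from the answers.

```r
v <- c(1, 5, 1, 2, 2, 4, 7, 5, 7)

# Approach 1: the factor's integer codes follow the sorted unique values
by_factor <- as.integer(factor(v))

# Approach 2: position of each element in the sorted unique values
by_match <- match(v, sort(unique(v)))

# Both give the same consecutive group numbers
identical(by_factor, by_match)

# The distinct values of the result now form 1, 2, ..., k with no gaps
all(sort(unique(by_match)) == seq_along(unique(v)))
```

Using as.integer() rather than as.numeric() keeps the result as an integer vector, which is what match() returns as well.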

Related

How to create a vector with incremental values based on values of another vector? [duplicate]

This question already has answers here:
Generate an incrementally increasing sequence like 112123123412345
(4 answers)
Closed 1 year ago.
I have a vector v1 and using it, I want to create another vector v2
Here, v1 = c(7,6,5), v2 = c(1,2,3,4,5,6,7,1,2,3,4,5,6,1,2,3,4,5)
I want to get v2 with and without loops, both. How is it done?
You can use sequence to generate the numbers.
sequence(v1)
# [1] 1 2 3 4 5 6 7 1 2 3 4 5 6 1 2 3 4 5
And with a loop using lapply:
unlist(lapply(v1, seq))
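Since the question also asks for a version with an explicit loop, here is a minimal sketch with a for loop. The variable pos is a helper introduced for this example; it tracks how many slots of v2 have been filled so far.

```r
v1 <- c(7, 6, 5)

# Preallocate the result: its length is the sum of the run lengths
v2 <- integer(sum(v1))
pos <- 0  # number of elements already written

for (n in v1) {
  # Write 1..n into the next n slots
  v2[pos + seq_len(n)] <- seq_len(n)
  pos <- pos + n
}

v2
```

seq_len(n) is used instead of seq(n) because it also behaves sensibly when an element of v1 is 0, yielding an empty run rather than c(1, 0).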

R won't interpret table column as numerical values [duplicate]

This question already has answers here:
How to convert a factor to integer\numeric without loss of information?
(12 answers)
Count number of occurences for each unique value
(14 answers)
Closed 2 years ago.
I am just trying to convert a column of numbers, which R thinks are characters, to numerical values.
I have the following table:
> longtab=as.data.frame(table(long));head(longtab)
long Freq
1 189485 1
2 189486 1
3 189487 1
4 189488 1
5 189489 1
6 189490 1
I've created a new table from those data as follows:
> q=head(longtab);q
long Freq
1 189485 1
2 189486 1
3 189487 1
4 189488 1
5 189489 1
6 189490 1
When I test whether the "long" column is numeric, R tells me that it is not.
> is.numeric(q$long)
[1] FALSE
When I try to coerce "long" values to be numeric using as.numeric(), I get the following:
> as.numeric(q$long)
[1] 1 2 3 4 5 6
But these are the row numbers not the values in the "long" column. This seems like it should be a simple problem to fix but I am struggling and have been at this a while. Any help would be greatly appreciated.
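The behaviour described above is what happens when the column is a factor rather than a character vector: as.data.frame(table(...)) stores the tabulated values as factor levels, and as.numeric() on a factor returns the internal level codes. A minimal sketch of the usual conversion follows, assuming a made-up input vector long, since the original data is not shown.

```r
# Hypothetical input; the real 'long' vector is not given in the question
long <- c(189485, 189486, 189487, 189488, 189489, 189490)

longtab <- as.data.frame(table(long))
q <- head(longtab)

is.factor(q$long)   # the column is a factor, not character
as.numeric(q$long)  # level codes 1..6, not the values

# Convert via the labels to recover the original numbers
long_num <- as.numeric(as.character(q$long))

# Equivalent alternative: index the numeric levels by the factor
long_num2 <- as.numeric(levels(q$long))[q$long]
```

Both conversions go through the level labels, so the original values are recovered instead of the codes.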

increasing value by one with each occurrence of non-repeated number [duplicate]

This question already has answers here:
Increment by 1 for every change in column
(6 answers)
Closed 2 years ago.
v <- c(1,1,2,3,3,3,1,1,3,4,4)
I'm trying to create a vector of elements in which the first occurrence of a non-repeated number always increases by one relative to the previous number.
This is the desired output
1,1,2,3,3,3,4,4,5,6,6
What would be an efficient way of doing this?
A base R option with rle
> with(rle(v),rep(seq_along(values),lengths))
[1] 1 1 2 3 3 3 4 4 5 6 6
or data.table::rleid
> data.table::rleidv(v)
[1] 1 1 2 3 3 3 4 4 5 6 6
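Another base R sketch, not taken from the answers above, counts the change points directly: flag every position whose value differs from the previous one, then take the cumulative sum. The name run_id is introduced here for illustration.

```r
v <- c(1, 1, 2, 3, 3, 3, 1, 1, 3, 4, 4)

# TRUE at the first element and wherever the value changes
is_new_run <- c(TRUE, v[-1] != v[-length(v)])

# The running count of changes is the run identifier
run_id <- cumsum(is_new_run)
run_id
```

Comparing neighbours with != rather than diff() means the same idea also works for character vectors.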

unique string count in a sequence [duplicate]

This question already has answers here:
transitions in a sequence
(2 answers)
Closed 2 years ago.
I am trying to get the unique counts of the strings in a sequence.
For example,
A<- c('CCE-CRE-DEE-DEE', 'FOE-FOE-GOE-GOE-GOE-ISE', 'ISE-PCE', 'ISE')
library('stringr')
B<- str_count(A, "-")
df<- data.frame(A, B)
I am expecting the output below, where C is the total diversity, i.e. the number of different states in the sequence. Any thoughts or suggestions? I looked around on SO but couldn't find a reasonable solution.
df$C
4
3
2
1
I would do this using unique:
df$res <- sapply(str_split(A,"-"),function(x) length(unique(x)))
df
A B res
1 CCE-CRE-DEE-DEE 3 3
2 FOE-FOE-GOE-GOE-GOE-ISE 5 3
3 ISE-PCE 1 2
4 ISE 0 1
I suppose that what you actually expect is 3 for CCE-CRE-DEE-DEE, since it contains only three distinct states (CCE, CRE, DEE).
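The same count can be sketched in base R without stringr, assuming the states are always separated by a literal hyphen. The names states and n_unique are introduced here for illustration.

```r
A <- c("CCE-CRE-DEE-DEE", "FOE-FOE-GOE-GOE-GOE-ISE", "ISE-PCE", "ISE")

# Split each sequence on the literal "-" (fixed = TRUE avoids regex handling)
states <- strsplit(A, "-", fixed = TRUE)

# Number of distinct states per sequence
n_unique <- vapply(states, function(x) length(unique(x)), integer(1))
n_unique
```

vapply() is used instead of sapply() so that the result is guaranteed to be an integer vector with one entry per input string.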

Use a 'for' loop for the 'aggregate' command? [duplicate]

This question already has answers here:
Mean per group in a data.frame [duplicate]
(8 answers)
Closed 7 years ago.
I've got a data frame as so,
Treatment Type Numerical Value
1 A 3
1 B 2
1 A 8
1 B 7
2 B 4
2 B 1
2 A 2
2 A 2
I want to make a table of means for each type and treatments.
Using aggregate, I have: aggregate(df[,3], list(Treatment) ,mean) which gives me the means for each treatment but not separated by type too. I was thinking this could be rectified by a for-loop.
Note: This is just a subset of the data, and the list of numerical values is hundreds for each type and treatment.
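Since I don't have enough reputation to comment: pass both grouping columns to aggregate, and aggregate only the numeric column.
aggregate(df[, 3], list(Treatment = df$Treatment, Type = df$Type), mean)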
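For a runnable illustration of grouping by both columns without a for loop, here is a sketch using the formula interface of aggregate on the sample rows from the question. The data frame is rebuilt by hand, and the column name Value is an assumption standing in for the "Numerical Value" header.

```r
# Sample rows from the question; 'Value' is an assumed column name
df <- data.frame(
  Treatment = c(1, 1, 1, 1, 2, 2, 2, 2),
  Type      = c("A", "B", "A", "B", "B", "B", "A", "A"),
  Value     = c(3, 2, 8, 7, 4, 1, 2, 2)
)

# Mean of Value for every Treatment/Type combination
res <- aggregate(Value ~ Treatment + Type, data = df, FUN = mean)
res
```

Each row of res holds one Treatment/Type combination and its mean, so no explicit loop over types is needed.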
