Assign order to list of numbers in R - r

Suppose I have a vector like this
lst <- c(2,3,4,6,7,9,10)
Is it possible to number the items in sequence?
Expected Output
lst.rank <- c(1,2,3,1,2,1,2)

unlist(lapply(split(lst, cumsum(c(1, diff(lst)) != 1)), seq_along), use.names = FALSE)
#OR
ave(cumsum(c(1, diff(lst)) != 1), cumsum(c(1, diff(lst)) != 1), FUN = seq_along)
#[1] 1 2 3 1 2 1 2

In the same spirit as d.b's answer, but using rle and sequence.
sequence(rle(cumsum(c(1, diff(lst)) != 1))$lengths)
[1] 1 2 3 1 2 1 2
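The one-liners above share one idea: flag where a run of consecutive integers breaks, then count positions inside each run. A minimal sketch that wraps this in a function and also handles empty input; the name `run_position` is hypothetical and not taken from any of the answers:

```r
# Hypothetical helper: position of each element within its run of
# consecutive integers (a new run starts whenever the step is not 1).
run_position <- function(v) {
  if (length(v) == 0) return(integer(0))
  new_run <- c(TRUE, diff(v) != 1)   # TRUE at the first element of every run
  run_id  <- cumsum(new_run)         # 1, 1, 1, 2, 2, 3, 3 for the example
  sequence(rle(run_id)$lengths)      # restart counting inside each run
}

run_position(c(2, 3, 4, 6, 7, 9, 10))
#[1] 1 2 3 1 2 1 2
```

Splitting the expression into named steps makes it easier to check each intermediate vector when the input is less regular than the example.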

lst <- c(2,3,4,6,7,9,10)
res <- integer(length(lst))
m <- 1
for (i in seq_along(lst)) {
  # restart the counter when the current value does not continue the run
  if (i > 1 && lst[i] != lst[i-1] + 1) m <- 1
  res[i] <- m
  m <- m + 1
}
res
#[1] 1 2 3 1 2 1 2

Related

How to get rid of that inner condition?

If argument negation is true then the condition should be negated. Is there a more convenient way to write this?
foo <- function (x, type, negation){
if(type == 1){
condition <- x > 1
if(negation){
condition <- !condition
}
}
if(type == 2){
condition <- x == 5
if(negation){
condition <- !condition
}
}
x[condition]
}
EDIT:
example:
x <- 1:10
foo(x, 1, T) # 1
foo(x, 1, F) # 2 3 4 5 6 7 8 9 10
foo(x, 2, T) # 1 2 3 4 6 7 8 9 10
foo(x, 2, F) # 5
If there will be many types in the future, consider using the S3 OOP system.
If not:
foo <- function(x, type, negation) {
condition <- switch(
type,
`1` = x > 1,
`2` = x == 5
)
x[xor(negation, condition)]
}
(after @PoGibas's comment):
foo <- function (x, type, negation){
if(type == 1){
condition <- x > 1
}
if(type == 2){
condition <- x == 5
}
if(negation){
condition <- !condition
}
x[condition]
}
Any other ideas to improve it further?
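One possible way to keep the function flat as the number of types grows is a lookup list of predicate functions combined with `xor()`. This is only a sketch: the list name `conditions` is an assumption introduced here, and `type` is assumed to be a valid position in that list.

```r
# Hypothetical lookup table: one predicate per type, indexed by position.
conditions <- list(
  function(x) x > 1,    # type 1
  function(x) x == 5    # type 2
)

foo <- function(x, type, negation = FALSE) {
  condition <- conditions[[type]](x)
  # xor() flips the condition only when negation is TRUE
  x[xor(negation, condition)]
}

x <- 1:10
foo(x, 1, TRUE)   # 1
foo(x, 2, FALSE)  # 5
```

Adding a new type then means adding one entry to the list, without touching the body of `foo`.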

Replacing the value in between vectors when there is a defined difference

I have a question about replacing values inside a vector.
The algorithm should find the value to replace when a certain condition is met. In this case, that is the value whose difference from the previous value is -20, so I would prefer to use the diff function.
Here is what I mean
x <- c(20,20,0,20,0,5)
> diff(x)
[1] 0 -20 20 -20 5
So in this case each 0 produces a difference of -20, and I want to change those 0s to 20.
I know the easiest solution is to assign directly, x[3] <- 20 or x[5] <- 20.
However, the location of the 0s is always different, so I need an automated process that can do that. Thanks!
EDIT:
What if we need to do this in a grouped data.frame?
> df
    x gr
1  20  1
2  20  1
3   0  1
4  20  1
5   0  1
6   5  1
7  33  2
8   0  2
9  20  2
10  0  2
11 20  2
12  0  2
How can we implement this? This is what I tried:
modify <- function(x){
value_search = c(0, 33)
value_replacement = c(20, 44)
for (k in 1:length(value_search)) {
index_position = which(x %in% value_search[k])
replacement = value_replacement[k]
for (i in index_position) {
x[i] = replacement
}
}
}
df%>%
group_by(gr)%>%
mutate(modif_x=modify(x))
Error in mutate_impl(.data, dots) :
Evaluation error: 'match' requires vector arguments.
You can do it using which to get the position, i.e.
x[which(diff(x) == -20)+1] <- 20
x
#[1] 20 20 20 20 20 5
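For the grouped data.frame from the question's edit, the same `diff`/`which` rule can be applied per group with base R's `ave()`. A sketch under assumptions: the helper name `fix_drops` and its `drop` and `replacement` parameters are hypothetical, and the data frame is rebuilt from the printout in the question.

```r
df <- data.frame(
  x  = c(20, 20, 0, 20, 0, 5, 33, 0, 20, 0, 20, 0),
  gr = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# Hypothetical helper: replace every value whose difference from its
# predecessor equals `drop`.
fix_drops <- function(x, drop = -20, replacement = 20) {
  x[which(diff(x) == drop) + 1] <- replacement
  x
}

# ave() splits x by gr, applies the helper and puts the pieces back in order
df$modif_x <- ave(df$x, df$gr, FUN = fix_drops)
df$modif_x
#[1] 20 20 20 20 20  5 33  0 20 20 20 20
```

Note that the 0 in row 8 is left alone: it follows 33, so its difference is -33 rather than -20. Because `diff` is computed inside each group, the jump between the last row of group 1 and the first row of group 2 is never considered.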
If you want a generic way to replace values of a vector based on particular values, I would approach it this way.
x = c(20,20,0,20,0,5)
value_search = 0
value_replacement = 20
index_position = which(x %in% value_search)
for (i in index_position) {
x[i] = value_replacement
}
But this only works for a single value. If you want to look for multiple values, you can use a nested loop as below:
x = c(20,20,0,20,0,5,33)
value_search = c(0, 33)
value_replacement = c(20, 44)
for (k in 1:length(value_search)) {
index_position = which(x %in% value_search[k])
replacement = value_replacement[k]
for (i in index_position) {
x[i] = replacement
}
}
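The nested loop can also be written without explicit loops by using `match()` to look up, for every element of `x`, which search value it equals. This is an alternative sketch rather than the answer's own method; the intermediate names `idx` and `hit` are introduced here for illustration.

```r
x <- c(20, 20, 0, 20, 0, 5, 33)
value_search <- c(0, 33)
value_replacement <- c(20, 44)

idx <- match(x, value_search)   # position in value_search, NA if no match
hit <- !is.na(idx)              # elements that need replacing
x[hit] <- value_replacement[idx[hit]]
x
#[1] 20 20 20 20 20  5 44
```

One difference in behaviour: `match()` replaces all values in a single pass, whereas the loop works through the search values one after another, so a value written in an early pass could be replaced again in a later one if it also appears in `value_search`.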
In response to OP's edits, there are any number of ways to do this:
x = c(20,20,0,20,0,5,33)
gr = c(1,1,1,1,2,2,2)
df = data.frame(x, gr)
func_replace <- function(source, value_search, value_replacement) {
  # loop over the search values, not over the elements of source
  for (k in seq_along(value_search)) {
    # look up positions in the vector that was passed in, not in the global x
    index_position = which(source %in% value_search[k])
    replacement = value_replacement[k]
    for (i in index_position) {
      source[i] = replacement
    } # for i loop
  } # for k loop
  return(source)
} # func_replace
value_search = c(0, 33)
value_replacement = c(20, 44)
gr_value = 1
# replaced values for rows in group gr_value, NA elsewhere
df$replacement = with(df, ifelse(gr == gr_value, func_replace(x, value_search, value_replacement), NA))

OR between two strings with 0, 1 's

Consider two strings of the form below:
101001
010001
How can I do an OR between these two and report the number of ones?
My goal is to just report 4 for the two strings above.
Thanks very much for your help
There's probably a more elegant way, but how about this:
x = "101001"
y = "010001"
dat = c(strsplit(x, split=""), strsplit(y, split=""))
sum(dat[[1]] == 1 | dat[[2]] == 1)
or this:
sum(unlist(strsplit(x, split="")) == 1 | unlist(strsplit(y, split="")) == 1)
or, per @jbaums's comment:
sum(as.numeric(strsplit(x, '')[[1]]) | as.numeric(strsplit(y, '')[[1]]))
If you're only dealing with binary, you can convert the strings to numerics, add them, and count the number of non-zeros. (Edited to incorporate Julius's recommendation)
x = "101001"
y = "010001"
xy <- as.numeric(x) + as.numeric(y)
length(gregexpr("(1|2)", xy)[[1]])
You can write this to run over a vector pretty easily too.
#* function to generate sample data
make_binary_string <- function(n = 10, len = 6)
{
  vapply(1:n,
         # draw len random bits and paste them into one string
         function(i, n, len) paste0(sample(0:1, len, replace = TRUE), collapse = ""),
         character(1),
         n = n,
         len = len)
}
set.seed(pi)
x <- make_binary_string(n = 10)
y <- make_binary_string(n = 10)
xy <- as.numeric(x) + as.numeric(y)
nchar(gsub("0", "", xy))
Here is what I tried.
str1 <- "101001"
str2 <- "010001"
df <- data.frame(strsplit(str1,split = ""), strsplit(str2,split = ""))
names(df) <- c('x1', 'x2')
This converts the strings into a data frame like this:
  x1 x2
1  1  0
2  0  1
3  1  0
4  0  0
5  0  0
6  1  1
And then count the number of rows which have at least one 1:
nrow(df[df$x1 == 1 | df$x2 == 1,])
Or
sum(bitwOr(as.numeric(strsplit(str1,split = "")[[1]]) , as.numeric(strsplit(str2,split = "")[[1]])))
We can define a function to.bool() that converts a string to a sequence of boolean values:
to.bool <- function(boolstr) as.logical(as.integer(unlist(strsplit(boolstr,""))))
sum(to.bool("101001") | to.bool("010001"))
#[1] 4
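If the strings are short enough to fit in an R integer, another option is to parse them as base-2 numbers and OR them bitwise. A sketch, assuming strings of at most 31 characters containing only 0 and 1; the variable names are illustrative:

```r
x <- "101001"
y <- "010001"

# strtoi() parses the string as a base-2 integer; bitwOr() ORs the bits
combined <- bitwOr(strtoi(x, base = 2), strtoi(y, base = 2))

# intToBits() returns the 32 bits of the integer as a raw vector
sum(as.integer(intToBits(combined)))
#[1] 4
```

This sidesteps splitting the strings into characters, but longer strings need one of the character-based approaches above.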

R: Remove repeated values and keep the first one in a binary vector

I would like to remove the repeated ones but keep the first in a binary vector:
x = c(0,0,1,1,0,1,0,1,1,1,0,1) # the input
y = c(0,0,1,0,1,0,1,0,1) # the desired output
i.e., one 1 and two 1's of the first and third set of 1's are removed, respectively, and the first in the set is kept.
I am trying to use rle with cumsum but have not yet figured it out. Any suggestion would be appreciated.
Using rle/inverse.rle
res <- rle(x)
res$lengths[res$values == 1] <- 1
inverse.rle(res)
## [1] 0 0 1 0 1 0 1 0 1
We can use diff:
x[c(1, diff(x)) == 1 | x == 0]
x = c(0,0,1,1,0,1,0,1,1,1,0,1)
x[!(x == 1 & #remove each value that is a 1
c(x[-1] == 1, FALSE) #followed by a 1 (never the case for the last value)
)]
#[1] 0 0 1 0 1 0 1 0 1
x = c(0,0,1,1,0,1,0,1,1,1,0,1)
x1 <- rle(x)
x1$lengths[x1$values==1] <- 1
inverse.rle(x1)
Depending on the vector size you could loop through it and use conditions for appending the value to the result. Here is a simple solution using your given input.
x <- c(0,0,1,1,0,1,0,1,1,1,0,1)
prev <- 0
y <- c()
for(i in x){
if (i == 1){
if (prev != 1){
y <- append(y,i)
}
}else{
y <- append(y,i)
}
prev <- i
}
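The loop above keeps a value unless it is a 1 whose predecessor was also a 1. The same rule can be sketched in vectorised form, which avoids growing `y` one element at a time; the name `prev_is_one` is introduced here for illustration:

```r
x <- c(0,0,1,1,0,1,0,1,1,1,0,1)

# TRUE where the previous element is 1 (the first element has no predecessor)
prev_is_one <- c(FALSE, head(x, -1) == 1)

# drop every 1 that directly follows another 1
y <- x[!(x == 1 & prev_is_one)]
y
#[1] 0 0 1 0 1 0 1 0 1
```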

Assign an element value based on element adjacencies in R

I have a data frame with {0,1} indicating whether a product was Small, Medium or Large.
dat <- data.frame(Sm = c(1,0,0), Med = c(0,1,0), Lg = c(0,0,1))
Sm Med Lg
1 1 0 0
2 0 1 0
3 0 0 1
I'm looking to assign 1's to the 0's leading up to a 1 in a given row. For example in row 2 the product is a "Med", so I'm looking to assign a 1 to the 0 in the "Sm" column.
Allocation size is a consideration so I'm looking for a vectorized approach without using a for loop please. The final solution should output the following:
Sm Med Lg
1 1 0 0
2 1 1 0
3 1 1 1
I've tried several variations of the code below, but the closest I can get is a ragged array which assigns all of the 1's correctly while dropping the elements that have legitimate 0's.
apply(dat, 1, function(x) {
x[1:which.max(x)] <- 1
})
[1] 1 1 1
And below, which gets close but without the needed trailing 0's
apply(dat, 1, function(x) {
temp <- x[1:which.max(x)]
unlist(lapply(temp, function(y) {
y <- 1
}))
})
[[1]]
Sm
1
[[2]]
Sm Med
1 1
[[3]]
Sm Med Lg
1 1 1
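The first attempt returns `1 1 1` because the last expression in the function is the assignment itself, and the value of an assignment is its right-hand side. A sketch of the same `apply` approach with the modified row returned explicitly; `t()` is needed because `apply` over rows returns its results as columns. The name `res` is introduced here for illustration.

```r
dat <- data.frame(Sm = c(1,0,0), Med = c(0,1,0), Lg = c(0,0,1))

res <- t(apply(dat, 1, function(x) {
  x[1:which.max(x)] <- 1   # fill everything up to the first 1
  x                        # return the whole row, not just the assigned value
}))
res
```

This still iterates row by row, so it shows why the attempt failed rather than offering the vectorised solution the question asks for.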
First, convert to matrix and use max.col to get the index of the 1 in each row:
mat <- as.matrix(dat)
mc <- max.col(mat)
logical construction
Overwrite the matrix:
mat = +(col(mat) <= mc)
or construct an index of matrix positions to change and change 'em:
logical indexing
mat[col(mat) < mc] <- 1L
# or
mat[which(col(mat) < mc)] <- 1L
matrix indexing
idx <- do.call( rbind, lapply( seq_along(mc), function(i)
  # rows whose 1 is already in the first column need no change
  if (mc[i]==1L) NULL
  else cbind(i,seq_len(mc[i]-1))
))
mat[idx] <- 1L
vector indexing
nr <- nrow(mat)
idx <- unlist( lapply( seq_along(mc), function(i)
if (mc[i]==1L) NULL
else seq(from = i, by = nr, length.out = mc[i]-1L)
))
mat[idx] <- 1L
The help for all three indexing methods can be found at help("[<-").
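Putting the logical-construction variant together end to end, as a sketch: `ties.method = "first"` and the name `filled` are choices made here, not part of the answer. With exactly one 1 per row there are no ties, so the choice only matters for other inputs, where the default breaks ties at random.

```r
dat <- data.frame(Sm = c(1,0,0), Med = c(0,1,0), Lg = c(0,0,1))

mat <- as.matrix(dat)
mc  <- max.col(mat, ties.method = "first")   # column index of the 1 in each row

# col(mat) <= mc compares each cell's column index with its row's mc value;
# the unary + turns the logical matrix into 0/1 integers
filled <- +(col(mat) <= mc)
dimnames(filled) <- dimnames(mat)            # restore the column names
as.data.frame(filled)
```

The comparison works because `mc` is recycled down the columns of `col(mat)`, so element `i` of `mc` lines up with row `i`.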
This will do what you want.
dat[which(dat$Med==1),]$Sm = 1
dat[which(dat$Lg==1),]$Med = 1
dat[which(dat$Lg==1),]$Sm = 1

Resources