OR between two strings of 0s and 1s in R

Consider two strings of the form below:
101001
010001
How can I OR these two strings position by position and report the number of ones in the result?
For the two strings above, the result should simply be 4.
Thanks very much for your help

There's probably a more elegant way, but how about this:
x = "101001"
y = "010001"
dat = c(strsplit(x, split=""), strsplit(y, split=""))
sum(dat[[1]] == 1 | dat[[2]] == 1)
or this:
sum(unlist(strsplit(x, split="")) == 1 | unlist(strsplit(y, split="")) == 1)
or, per @jbaums's comment:
sum(as.numeric(strsplit(x, '')[[1]]) | as.numeric(strsplit(y, '')[[1]]))

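If you're only dealing with binary strings, you can convert them to numerics, add them, and count the digits of the sum that are non-zero (1 or 2). This assumes the strings are short enough (up to about 15 digits) to be represented exactly as doubles. (Edited to incorporate Julius's recommendation)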
x = "101001"
y = "010001"
xy <- as.numeric(x) + as.numeric(y)
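# count the 1s and 2s; `> 0` guards against the -1 that gregexpr returns when nothing matches
sum(gregexpr("[12]", format(xy, scientific = FALSE))[[1]] > 0)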
You can write this to run over a vector pretty easily too.
#* function to generate sample data
make_binary_string <- function(n = 10, len = 6)
{
  vapply(seq_len(n),
         # use `len` (not a hard-coded 6) so the argument actually takes effect
         function(i) paste0(sample(0:1, len, replace = TRUE), collapse = ""),
         character(1))
}
set.seed(pi)
x <- make_binary_string(n = 10)
y <- make_binary_string(n = 10)
xy <- as.numeric(x) + as.numeric(y)
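# keep only the 1s and 2s; format() avoids scientific notation such as "1e+05"
nchar(gsub("[^12]", "", format(xy, scientific = FALSE)))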

Here is what I tried.
str1 <- "101001"
str2 <- "010001"
df <- data.frame(strsplit(str1,split = ""), strsplit(str2,split = ""))
names(df) <- c('x1', 'x2')
This converts the strings into a data frame like this:
  x1 x2
1  1  0
2  0  1
3  1  0
4  0  0
5  0  0
6  1  1
Then count the number of rows that have at least one 1:
nrow(df[df$x1 == 1 | df$x2 == 1,])
Or
sum(bitwOr(as.numeric(strsplit(str1,split = "")[[1]]) , as.numeric(strsplit(str2,split = "")[[1]])))

We can define a function to.bool() that converts a string to a sequence of boolean values:
to.bool <- function(boolstr) as.logical(as.integer(unlist(strsplit(boolstr,""))))
sum(to.bool("101001") | to.bool("010001"))
#[1] 4
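The answers above all split each string into characters and OR them position by position. As a minimal sketch of the same idea that also runs over vectors of strings, the helper below compares raw bytes instead of converting to numbers, so it is not limited by numeric precision. The name count_or_ones is hypothetical, and the sketch assumes both inputs have the same length and that paired strings have equal width.

```r
# Hypothetical helper: count positions where either string has a "1".
# Assumes a and b are character vectors of equal length whose paired
# elements have the same number of characters.
count_or_ones <- function(a, b) {
  stopifnot(length(a) == length(b), all(nchar(a) == nchar(b)))
  one <- charToRaw("1")
  mapply(function(s, t) sum(charToRaw(s) == one | charToRaw(t) == one),
         a, b, USE.NAMES = FALSE)
}

count_or_ones("101001", "010001")
count_or_ones(c("101001", "000000"), c("010001", "000000"))
```

The first call should give 4 for the strings in the question; the second shows that a pair with no ones yields 0 rather than a spurious count.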

Related

Replace integers in a data frame column with other integers in R?

I have a data frame column that contains only four distinct numbers, and I want to replace each of them with a specific value, as shown below:
tt <- rep(c(1,2,3,4), each = 10)
df <- data.frame(tt)
I want to replace 1 with 10, 2 with 200, 3 with 458, and 4 with -0.1.
You could use recode from dplyr. Note that the old values are written as character strings, and the new values are numeric (doubles), matching the type of the original column:
library(dplyr)
df %>%
mutate(tt = recode(tt, '1'= 10, '2' = 200, '3' = 458, '4' = -0.1))
tt
1 10.0
2 10.0
3 200.0
4 200.0
5 458.0
6 458.0
7 -0.1
8 -0.1
To correct the error in the code in the question and to provide a shorter example, we use the input in the Note at the end. Here are several alternatives. nos, defined in (1), is used in some of the others too. No packages are used.
1) indexing Since the input values are 1 to 4, we can index directly into nos. This is probably the simplest solution given that the original values of tt are in 1:4.
nos <- c(10, 200, 458, -0.1)
transform(df, tt = nos[tt])
## tt
## 1 10.0
## 2 10.0
## 3 200.0
## 4 200.0
## 5 458.0
## 6 458.0
## 7 -0.1
## 8 -0.1
1a) If the input is not necessarily in 1:4 then we could use this generalization
transform(df, tt = nos[match(tt, 1:4)])
2) arithmetic Another approach is to use arithmetic:
transform(df, tt = 10 * (tt == 1) +
200 * (tt == 2) +
458 * (tt == 3) +
-0.1 * (tt == 4))
3) outer/matrix multiplication This would also work:
transform(df, tt = c(outer(tt, 1:4, `==`) %*% nos))
3a) This is the same except we use model.matrix instead of outer.
transform(df, tt = c(model.matrix(~ factor(tt) + 0, df) %*% nos))
4) factor The levels of the factor are 1:4 and the corresponding labels are defined by nos. Extract the labels using format and then convert them to numeric.
transform(df, tt = as.numeric(format(factor(tt, levels = 1:4, labels = nos))))
4a) or as a pipeline
transform(df, tt = tt |>
factor(levels = 1:4, labels = nos) |>
format() |>
as.numeric())
5) loop We can use a simple loop. Nulling out i at the end is so that it is not made into a column.
within(df, { for(i in 1:4) tt[tt == i] <- nos[i]; i <- NULL })
6) Reduce This is somewhat similar to (5) but implements the loop using Reduce.
fun <- function(tt, i) replace(tt, tt == i, nos[i])
transform(df, tt = Reduce(fun, init = tt, 1:4))
Note
df <- data.frame(tt = c(1, 1, 2, 2, 3, 3, 4, 4))
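Alternatives (1) and (1a) both map old values to new ones through a lookup. A related sketch, assuming the old values can be turned into strings unambiguously, is a named vector whose names are the old values; the name lookup is an illustration and not part of the original answers.

```r
df <- data.frame(tt = c(1, 1, 2, 2, 3, 3, 4, 4))

# Named lookup: names are the old values (as strings), elements the new values.
lookup <- c("1" = 10, "2" = 200, "3" = 458, "4" = -0.1)

# Index by name, then drop the names so the column is a plain numeric vector.
df$tt <- unname(lookup[as.character(df$tt)])
df$tt
```

Unlike positional indexing, this form does not require the old values to be 1:4; any value missing from lookup becomes NA.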

Thousand separator to numeric columns in R

I am trying to format numbers as shown (adding thousand separator). The function works fine, but after formatting, the numeric column no longer sorts numerically because its values are now character strings.
df <- data.frame(x = c(12345,35666,345,5646575))
format_numbers <- function (df, column_name){
df[[column_name]] <- ifelse(nchar(df[[column_name]]) <= 5, paste(format(round(df[[column_name]] / 1e3, 1), trim = TRUE), "K"),
paste(format(round(df[[column_name]] / 1e6, 1), trim = TRUE), "M"))
}
df$x <- format_numbers(df,"x")
> df
x
1 12.3 K
2 35.7 K
3 0.3 K
4 5.6 M
Can we make sure the numbers sort correctly in ascending/descending order after formatting?
Note: this data frame df is to be displayed in a DT table.
The problem is the formatting part. If you do it correctly, i.e. while keeping your data numeric, everything else will fall into place. Here I will demonstrate using an S3 class:
my_numbers <- function(x) structure(x, class = c('my_numbers', 'numeric'))
format.my_numbers <- function(x,..., d = 1, L = c('', 'K', 'M', 'B', 'T')){
ifelse(abs(x) >= 1000, Recall(x/1000, d = d + 1),
sprintf('%.1f%s', x, L[d]))
}
print.my_numbers <- function(x, ...) print(format(x), quote = FALSE)
'[.my_numbers' <- function(x, ..., drop = FALSE) my_numbers(NextMethod('['))
Now you can run your code:
df <- data.frame(x = c(12345,35666,345,5646575))
df$x <- my_numbers(df$x)
df
x
1 12.3K
2 35.7K
3 345.0
4 5.6M
You can use any mathematical operation on column x, as it is still numeric.
For example, cbind it with its double and order from smallest to largest:
cbind(x = df, y = df*2)[order(df$x),]
x x
3 345.0 690.0 # smallest
1 12.3K 24.7K
2 35.7K 71.3K
4 5.6M 11.3M # largest ie Millions
Note that under the hood, x does not change:
unclass(df$x)
[1] 12345 35666 345 5646575 # Same as given
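The key point of the answer is that sorting should happen on the numeric values, not on the formatted labels. A simpler sketch of the same principle, without an S3 class, keeps the numeric column and adds a separate display column. The names format_si and x_label are hypothetical, and how the label column is wired into a DT table is outside the scope of this sketch.

```r
# Hypothetical helper: abbreviate numbers with K/M/B/T suffixes.
format_si <- function(x, digits = 1) {
  units <- c("", "K", "M", "B", "T")
  # Power of 1000 for each value: 0 below 1e3, 1 for thousands, 2 for millions, ...
  idx <- pmin(floor(log10(pmax(abs(x), 1)) / 3), length(units) - 1)
  paste0(formatC(x / 1000^idx, format = "f", digits = digits), units[idx + 1])
}

df <- data.frame(x = c(12345, 35666, 345, 5646575))
df$x_label <- format_si(df$x)   # character column, for display only

# Sort on the numeric column; the labels follow their rows.
sorted <- df[order(df$x), ]
sorted
```

Values that round up to the next unit (for example 999950) are not handled specially here, so treat this as a starting point rather than a finished formatter.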

Sum all integers > 9 individually in R. E.g. 10 = 1+0, 11 = 1+1

I'm trying to write a function based on the Luhn algorithm (mod 10 algorithm), and I need a function that replaces every integer > 9 in my number vector with the sum of its digits before summing the vector. E.g. 10 should become 1+0=1, and 19 should become 1+9=10. Example code:
nmr <- ("1_9_8_2_0_5_0_1_3_3_4_8")
nmr <- strsplit(nmr, "_")
nmr <- as.numeric(as.character(unlist(nmr[[1]])))
luhn_alg <- c(0,0,2,1,2,1,2,1,2,1,2,0)
x <- nmr*luhn_alg
x
[1] 0 0 16 2 0 5 0 1 6 3 8 0
sum(x)
[1] 41
I don't want the sum of x to equal 41. Instead I want the sum to equal 0+0+1+6+2+0+5+0+1+6+3+8+0=32. I tried a for loop but can't seem to get it right. Any help is much appreciated.
You may need to split the data again after multiplying it with luhn_alg.
Luhn_sum <- function(x, y) {
nmr <- as.numeric(unlist(strsplit(x, "_")))
x1 <- nmr*y
x1 <- as.numeric(unlist(strsplit(as.character(x1), '')))
sum(x1)
}
nmr <- ("1_9_8_2_0_5_0_1_3_3_4_8")
luhn_alg <- c(0,0,2,1,2,1,2,1,2,1,2,0)
Luhn_sum(nmr, luhn_alg)
#[1] 32
You can use substring and seq to create a vector of single digit numbers, then you only need to do a sum over them:
sum(
as.numeric(
substring(
paste(x, collapse = ""),
seq(1, sum(nchar(x)), 1),
seq(1, sum(nchar(x)), 1)
)
)
)
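Both answers above get the digits by going through strings. Since each product here is a single digit times a weight of at most 2, no element exceeds 18, so a purely arithmetic sketch is also possible: the tens digit is x %/% 10 and the units digit is x %% 10. This assumes every element of x is a non-negative integer below 100.

```r
nmr <- as.numeric(unlist(strsplit("1_9_8_2_0_5_0_1_3_3_4_8", "_")))
luhn_alg <- c(0, 0, 2, 1, 2, 1, 2, 1, 2, 1, 2, 0)
x <- nmr * luhn_alg

# Digit sum of each element (valid for 0..99), then the total.
digit_sums <- x %/% 10 + x %% 10
luhn_total <- sum(digit_sums)
luhn_total
```

For the example in the question this gives 32, matching the string-based answers.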

Assign order to list of numbers in R

Suppose I have a vector like this
lst <- c(2,3,4,6,7,9,10)
Is it possible to number the items in sequence?
Expected Output
lst.rank <- c(1,2,3,1,2,1,2)
unlist(lapply(split(lst, cumsum(c(1, diff(lst)) != 1)), seq_along), use.names = FALSE)
#OR
ave(cumsum(c(1, diff(lst)) != 1), cumsum(c(1, diff(lst)) != 1), FUN = seq_along)
#[1] 1 2 3 1 2 1 2
In the same spirit as d.b's answer, but using rle and sequence.
sequence(rle(cumsum(c(1, diff(lst)) != 1))$lengths)
[1] 1 2 3 1 2 1 2
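lst <- c(2,3,4,6,7,9,10)
m = 1
for (i in 1:(length(lst)-1) ){
  if (lst[i+1] == lst[i]+1){
    lst[i]=m
    if(i == length(lst)-1) lst[i+1] = m + 1
    m = m+1
  }
  else{
    lst[i]=m
    # a final element that does not continue the run starts a new run at 1
    if(i == length(lst)-1) lst[i+1] = 1
    m = 1
  }
}
lst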
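The one-liners above are compact, so here is a step-by-step sketch of the idea they share: mark where a new run of consecutive numbers starts, turn those marks into a run id, and number the elements within each run. The intermediate names new_run and run_id are introduced only for illustration.

```r
lst <- c(2, 3, 4, 6, 7, 9, 10)

# TRUE where an element does not follow its predecessor by exactly 1
# (the first element always starts a run).
new_run <- c(TRUE, diff(lst) != 1)

# Running count of run starts gives each element its run id: 1 1 1 2 2 3 3
run_id <- cumsum(new_run)

# Number the elements within each run.
lst.rank <- ave(lst, run_id, FUN = seq_along)
lst.rank
```

This should reproduce the expected output 1 2 3 1 2 1 2 from the question.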

R: looping through list of variable names in data.frame to create new variables

I am trying to write a function that takes a data.frame and a list (or character vector) of its variable names, and creates new variables whose names are derived from those names and whose values are computed from the corresponding variables.
For example, if data.frame d has variables x, y, z, w and the list of names is c('x', 'z'), the output may be vectors named x.cat and z.cat with values based on the values of d$x and d$z.
I can do this with a loop
df <- data.frame(x = c(1 : 10), y = c(11 : 20), z = c(21 : 30), w = c(41: 50))
vnames <- c("x", "w")
loopfunc <- function(dat, vlst){
s <- paste(vlst, "cat", sep = ".")
for (i in 1:length(vlst)){
dat[s[i]] <- NA
dat[s[i]][dat[vlst[i]] %% 4 == 0 ] <- 0
dat[s[i]][dat[vlst[i]] %% 4 == 1 | dat[vlst[i]] %%4 == 3] <- 1
dat[s[i]][dat[vlst[i]] %% 4 == 2 ] <- 2
}
dat[s]
}
dout <- loopfunc(df, vnames)
This outputs a 10x2 data.frame with columns x.cat and w.cat, whose values are 0, 1, or 2 depending on the remainder of the corresponding values of df$x and df$w mod 4.
I would like to find a way to do something like this without a loop, maybe using the apply functions?
Here is a failed attempt
noloopfunc <- function(dat, l){
assign(l[2], NA)
assign(l[2][d[l[1]] %% 4 == 0], 0)
assign(l[2][d[l[1]] %% 4 == 2], 2)
assign(l[2][(d[l[1]] %% 4 == 1) | (d[l[1]] %% 4 == 3)], 1)
as.name(l[2])
}
newvnames <- sapply(vnames, function(x){paste(x, "cat", sep = ".")})
vpairs <- mapply(c, vnames, newvnames, SIMPLIFY = F)
lapply(vpairs, noloopfunc, d <- df)
Here the formal argument l is supposed to represent vpairs[[1]] or vpairs[[2]], both string vectors of length 2.
I found several threads on Stack Overflow about converting strings to variable names, but I couldn't find anything where it is used in this way, where the variables have to be referred to subsequently and assigned values non-interactively.
Thanks for any help.
You can replace your loop with an apply variant
dout <- as.data.frame(sapply(vnames, function(x) {
out <- rep(NA, nrow(df))
out[df[,x] %% 4 == 0] <- 0
out[df[,x] %% 4 == 1 | df[,x] %% 4 == 3] <- 1
out[df[,x] %% 4 == 2] <- 2
out
}))
names(dout) <- paste(vnames, "cat", sep=".")
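Because the category depends only on the remainder mod 4, the three assignments can also be collapsed into a single lookup. This is a sketch under that assumption; the vector codes is hypothetical and lists the category for remainders 0, 1, 2, 3 in that order.

```r
df <- data.frame(x = 1:10, y = 11:20, z = 21:30, w = 41:50)
vnames <- c("x", "w")

# Category for remainder 0, 1, 2, 3 (R indexes from 1, hence the + 1 below).
codes <- c(0, 1, 2, 1)

# df[vnames] is a list of columns, so lapply visits each selected column once.
dout <- as.data.frame(lapply(df[vnames], function(v) codes[v %% 4 + 1]))
names(dout) <- paste(vnames, "cat", sep = ".")
dout
```

Selecting columns with df[vnames] avoids building variable names from strings with assign, which is what made the attempt in the question hard to get working.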
