How to replace a string with a numeric value in R?

I have a dataset that contains the number of app installers, recorded with the unit suffixes M and K, for example 19M and 199K.
How can I replace the suffixes with their values so that the strings can be converted to numeric?
K to e+3
M to e+6

Edit: This version also works for non-integer values.
x <- c("19M","20K","1K", "1.25M", "1.5K"); x
x <- sub("M", "e6", x); x
x <- sub("K", "e3", x); x
as.numeric(x)
[1] 19000000 20000 1000 1250000 1500
For integer values, the following is sufficient.
x <- c("19M","20K","1K")
x <- sub("M","000000", x)
x <- sub("K","000", x)
as.numeric(x)
[1] 1.9e+07 2.0e+04 1.0e+03
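The two sub() calls can also be folded into one small helper. The sketch below is only an illustration: the function name convert_units and the multiplier table are assumptions, not part of the answer above, and it assumes every value is a number optionally followed by a single K or M (upper or lower case).

```r
# Hypothetical helper: the name and the multiplier table are illustrative assumptions.
convert_units <- function(x) {
  multipliers <- c(K = 1e3, M = 1e6)
  unit <- toupper(sub("^[0-9.]+", "", x))         # trailing letter, "" if absent
  number <- as.numeric(sub("[A-Za-z]+$", "", x))  # numeric part without the unit
  scale_by <- ifelse(unit == "", 1, multipliers[unit])
  unname(number * scale_by)
}

installs <- convert_units(c("19M", "199K", "1.5k", "42"))
installs
```

Multiplying by a looked-up factor, rather than pasting zeros, keeps non-integer inputs such as 1.5k correct.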

Related

Division between columns returns a number with no decimals

In R, I am trying to divide two numerical values. In this example, z returns a number with one decimal place:
x <- 10
y <- 3
z <- x/y
z
[1] 3.3
Without changing anything else in this example, R returns an integer, with no decimal places:
x <- 2966634
y <- 205.9
z <- x/y
z
[1] 14408.
I want to display decimal places and control how many are shown, but I could not find an answer.
options(digits = ) lets you set the number of significant digits that R shows when printing:
options(digits = 5)
x <- 2966634
y <- 205.9
z <- x/y
z
#[1] 14408
options(digits = 15)
z <- x/y
z
#[1] 14408.130160272
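Note that options(digits = ) is a global setting and counts significant digits, not decimal places. To fix the number of decimal places for a single value, base R's round(), formatC() and sprintf() are alternatives. A minimal sketch with the same numbers (the variable names two_dp and three_dp are only illustrative):

```r
z <- 2966634 / 205.9

rounded <- round(z, 2)                           # still numeric
two_dp <- formatC(z, format = "f", digits = 2)   # character, 2 decimal places
three_dp <- sprintf("%.3f", z)                   # character, 3 decimal places

two_dp    # "14408.13"
three_dp  # "14408.130"
```

formatC() and sprintf() return character strings, so they are meant for display rather than further arithmetic.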

Tensorflow: Find greater than pairs and stack along axis

The problem I have using tensorflow is as follows:
For one tensor X with dims n X m
X = [[x11,x12...,x1m],[x21,x22...,x2m],...[xn1,xn2...,xnm]]
I want to get an n X m X m tensor, i.e. n stacked m X m matrices.
Each m X m matrix is the result of:
tf.math.greater(tf.reshape(x,(-1,1)), x) where x is a row of X
In words, for every row k in X, I'm trying to get the pairs i,j where xki > xkj. This gives me a matrix, and then I want to stack those matrices along the first axis to get an n X m X m cube.
Example:
X = [[1,2],[4,3], [5,7]]
Result = [[[False, False],[True, False]],[[False, True],[False, False]], [[False, False],[True, False]]]
Result has shape 3 X 2 X 2
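Reshaping each row is the same as reshaping all rows at once, so broadcasting can do the pairwise comparison. Try this:
import tensorflow as tf
def fun(X):
    n, m = X.shape
    X1 = tf.expand_dims(X, -1)      # shape (n, m, 1): x_ki down the rows
    X2 = tf.reshape(X, (n, 1, m))   # shape (n, 1, m): x_kj across the columns
    return tf.math.greater(X1, X2)  # broadcasts to shape (n, m, m)
X = tf.Variable([[1,2],[4,3], [5,7]])
print(fun(X))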
Output:
tf.Tensor(
[[[False False]
[ True False]]
[[False True]
[False False]]
[[False False]
[ True False]]], shape=(3, 2, 2), dtype=bool)

Split a 24 X 24 matrix in 5 X 5 matrixes to get all possible combinations of rows and columns in R

I need to split a 24 X 24 dataframe into 5 X 5 dataframes so that all possible combinations of rows and columns from the original 24 X 24 dataframe are included. Anyone up for the task?
There are choose(24, 5) = 42504 ways to pick 5 of the 24 columns, and the same number of ways to pick 5 of the 24 rows, so that's 42504^2 = 1806590016 matrices, each with 5*5 elements. If they are of class "integer" (32 bits, i.e. 4 bytes per element), you'll need 168.2518 GiB to store the result.
choose(24, 5)^2 * 5^2 * 4 / 1024/1024/1024
#[1] 168.2518
If this is really needed, here is a function to create sub-data.frames or sub-matrices where all possible combinations of rows and columns from the original data.frame or matrix are included.
sub_df <- function(x, n, output = c("data.frame", "matrix")){
  nr <- nrow(x)
  nc <- ncol(x)
  out <- match.arg(output)
  if(nr != nc){
    warn <- sprintf("the data does not have an equal numbers of rows (%d) and columns (%d)", nr, nc)
    warning(warn)
  }
  if(out == "matrix"){
    combn(nr, n, \(i){
      combn(nc, n, \(j) x[i, j])
    })
  } else {
    combn(nr, n, \(i){
      combn(nc, n, \(j) as.data.frame(x[i, j]), simplify = FALSE)
    }, simplify = FALSE)
  }
}
m <- matrix(1:25, nrow = 5)
sub_df(m, 2)
sub_df(m[-1,], 2) # gives a warning
sub_df(m, 2, "matrix")
sub_df(m, 2, "list") # gives an error
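To see what is being enumerated, the index combinations can be inspected on the small 5 X 5 example before attempting anything with 24 X 24 data. This is only a sketch; the names row_sets, col_sets, n_sub and first_sub are illustrative assumptions and not part of the function above.

```r
m <- matrix(1:25, nrow = 5)
n <- 2

# Every way to choose n row indices and n column indices.
row_sets <- combn(nrow(m), n, simplify = FALSE)
col_sets <- combn(ncol(m), n, simplify = FALSE)

# Number of sub-matrices: choose(5, 2)^2
n_sub <- length(row_sets) * length(col_sets)
n_sub

# One sub-matrix: the first row set crossed with the first column set.
first_sub <- m[row_sets[[1]], col_sets[[1]]]
first_sub
```

Keeping only the index sets and extracting a sub-matrix on demand avoids holding every sub-matrix in memory at once.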

Specifying R to take one argument at a time when passing multiple arguments using '...'

I am a novice in R required by my superior to do things a certain way. I need to compute two descriptive statistics: setup count and heavy-dominance setup count. Setup count is the number of setups found within a location, while heavy-dominance setup count is the number of setups within that location that have dominance values of x population ≥ 50%. This is how I would normally calculate these statistics:
##Normal Approach
#Sample Data 1
v <- c(53, 2, 97) #let vector "v" represent Location 1
w <- c(7, 16, 31, 44, 16) #let vector "w" represent Location 2
#Setup Count
sc_v <- length(v)
sc_w <- length(w)
sc <- c(sc_v, sc_w)
sc
#Heavy-Dominance Setup Count
hd_v <- length(which(v >= 50))
hd_w <- length(which(w >= 50))
hd <- c(hd_v, hd_w)
hd
I am tasked with developing a function that can both determine said statistical values from raw data and concatenate the outputs into a single vector. Here are the working functions I developed:
#Setup Count (2 vectors at a time only)
setup.count <- function(x, y){
  a <- length(x)
  b <- length(y)
  d <- c(a, b)
  d
}
#Heavy-Dominance Setup Count (2 vectors at a time only)
heavy.dominance <- function(x, y){
  a <- length(which(x >= 50))
  b <- length(which(y >= 50))
  d <- c(a, b)
  d
}
y <- setup.count(v, w)
y
z <- heavy.dominance(v, w)
z
Suppose there are more than two locations:
#Sample Data 2
v <- c(53, 2, 97)
w <- c(7, 16, 31, 44, 16)
x <- c(45, 22, 96, 74) #let vector "x" represent the additional Location 3
How can I make R process one argument at a time when multiple arguments are passed through '...'? Here are my failed attempts to revise the functions above, to give an idea:
##Attempt 1
#Setup Count (incorrect v1)
setup.count <- function(x, ...){
  data <- list(...)
  a <- length(x)
  b <- length(data) #will return the number of locations other than x, not the separate number of setups within each of these locations
  d <- c(a, b)
  d
}
#Heavy-Dominance Setup Count (incorrect v1)
heavy.dominance <- function(x, ...){
  data <- list(...)
  a <- length(which(x >= 50))
  b <- length(which(data >= 50)) #will return the error "'list' object cannot be coerced to type 'double'"
  d <- c(a, b)
  d
}
y <- setup.count(v, w, x)
y
z <- heavy.dominance(v, w, x)
z
##Attempt 2
#Setup Count (incorrect v2)
setup.count <- function(x, ...){
  data <- list(...)
  a <- length(x)
  b <- length(unlist(data)) #will return the total number of setups in all locations other than x, not as separate values
  d <- c(a, b)
  d
}
#Heavy-Dominance Setup Count (incorrect v2)
heavy.dominance <- function(x, ...){
  data <- list(...)
  a <- length(which(x >= 50))
  b <- length(which(unlist(data) >= 50)) #will return the total number of setups with dominance ≥ 50% in all locations other than x, not as separate values
  d <- c(a, b)
  d
}
y <- setup.count(v, w, x)
y
z <- heavy.dominance(v, w, x)
z
You can capture the arguments passed through the ellipsis with list(...) and use sapply() to loop over the list elements. Add a type= argument to have one function for both purposes, and a thresh= argument for the dominance threshold.
setup.fun <- function(..., type=c('count', 'dominance'), thresh=50) {
  x <- list(...)
  type <- match.arg(type)
  if (type == 'count') sapply(x, length)
  else sapply(x, function(x) length(which(x >= thresh)))
}
setup.fun(v, w, x)
# [1] 3 5 4
setup.fun(v, w, x, type='count')
# [1] 3 5 4
setup.fun(v, w, x, type='dominance')
# [1] 2 0 2
setup.fun(v, w, x, type='d')
# [1] 2 0 2
setup.fun(v, w, x, v)
# [1] 3 5 4 3
setup.fun(v)
# [1] 3
setup.fun(v, w, x, type='dominance', thresh=40)
# [1] 2 1 3
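If the locations are kept in a named list rather than passed one by one, the same per-location results can be obtained without '...'. The sketch below is an alternative under that assumption, using base R's lengths() and vapply(); the names locations, setup_count and heavy_dominance are illustrative, not from the answer above.

```r
v <- c(53, 2, 97)
w <- c(7, 16, 31, 44, 16)
x <- c(45, 22, 96, 74)

# Assumption: all locations are collected in one named list.
locations <- list(v = v, w = w, x = x)

# Setup count: number of setups per location.
setup_count <- lengths(locations)

# Heavy-dominance setup count: setups with dominance >= 50 per location.
heavy_dominance <- vapply(locations, function(loc) sum(loc >= 50), integer(1))

setup_count
heavy_dominance
```

Both results are named by location, and vapply() checks that each element yields exactly one integer.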

Matching interacting terms to a vector

I am trying to combine the vector z with the interaction terms in xy whose two terms both appear in z, and then recode the result as Q1, Q2, ..., Q1*Q2.
I have two vectors, x and y, that are paired element-wise to form the vector xy:
x<-c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,4,6,6,9,10,16,21)
y<-c(1,2,3,5,6,8,18,1,2,5,6,7,8,12,15,16,11,17,18,19,20,21)
I want interactions such as 2*6 or 6*11 to be added to the vector z, because according to vector xy there are interactions between the elements 2, 6 and 11 of z.
xy=paste0(x,"*",y,collapse=",")
xy
# [1] "1*1,1*2,1*3,1*5,1*6,1*8,1*18,2*1,2*2,2*5,2*6,2*7,3*8,3*12,3*15,4*16,6*11,6*17,9*18,10*19,16*20,21*21"
z<-c(2,6,11)
z
#[1] 2 6 11
I want a fourth vector, xyz, that combines the elements of z with all of their interactions found in xy:
xyz<-print("2+6+11+2*6+6*11")
#[1] "2+6+11+2*6+6*11"
xyz
#[1] "2+6+11+2*6+6*11"
Then each variable 2, 6, 11 should be converted to Q1, Q2, Q3, so that the end product looks like:
xyz<-print("Q1+Q2+Q3+Q1*Q2+Q2*Q3")
#[1] "Q1+Q2+Q3+Q1*Q2+Q2*Q3"
Just loop through x and y and add an interaction whenever both values are elements of z and differ from each other:
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,4,6,6,9,10,16,21)
y <- c(1,2,3,5,6,8,18,1,2,5,6,7,8,12,15,16,11,17,18,19,20,21)
z <- c(2,6,11)
xyz <- as.character(z)
for(i in seq_along(x)){
  # both terms must be in z, and self-interactions such as 2*2 are skipped
  if(x[i] %in% z && y[i] %in% z && x[i] != y[i]){
    xyz <- c(xyz, paste0(x[i], "*", y[i]))
  }
}
xyz <- paste(xyz, collapse = "+")
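Then replace these numbers with any coding you define:
z_map <- c("Q1", "Q2", "Q3")
for(i in seq_along(z)){
  # \\b matches whole numbers only, so e.g. a 1 in z would not also hit 11 or Q1
  xyz <- gsub(paste0("\\b", z[i], "\\b"), z_map[i], xyz)
}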
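The string replacement step can be avoided altogether by mapping the values to their labels before pasting. This is a vectorised sketch of the same idea, assuming the labels are simply Q1, Q2, ... in the order of z; the names labels, keep and interactions are illustrative.

```r
x <- c(1,1,1,1,1,1,1,2,2,2,2,2,3,3,3,4,6,6,9,10,16,21)
y <- c(1,2,3,5,6,8,18,1,2,5,6,7,8,12,15,16,11,17,18,19,20,21)
z <- c(2, 6, 11)

# Assumed labelling: the i-th element of z becomes "Qi".
labels <- paste0("Q", seq_along(z))

# Keep pairs where both terms are in z and the terms differ.
keep <- x %in% z & y %in% z & x != y

# Look up each term's label by its position in z.
interactions <- paste0(labels[match(x[keep], z)], "*", labels[match(y[keep], z)])

xyz <- paste(c(labels, interactions), collapse = "+")
xyz
# [1] "Q1+Q2+Q3+Q1*Q2+Q2*Q3"
```

Because the labels are looked up with match() rather than substituted into a string, overlapping numbers such as 1 and 11 cannot interfere with each other.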
