I would like to calculate the name number for a set of given names.
The name number is calculated by summing the values assigned to each letter and then adding the digits of that sum until a single digit remains. The values are given below:
a=i=j=q=y=1
b=k=r=2
c=g=l=s=3
d=m=t=4
h=e=n=x=5
u=v=w=6
o=z=7
p=f=8
Example: Name number of David can be calculated as follows:
D+a+v+i+d
4+1+6+1+4
16=1+6=7
Name number of David is 7.
I would like to write a function in R for doing this.
I am thankful for any directions or tips or package suggestions that I should look into.
This code snippet will accomplish what you want:
# Name for which the number should be computed.
name <- "David"
# Prepare letter scores array. In this case, the score for each letter will be the array position of the string it occurs in.
val <- c("aijqy", "bkr", "cgls", "dmt", "henx", "uvw", "oz", "pf")
# Convert name to lowercase.
lName <- tolower(name)
# Compute the sum of letter scores.
s <- sum(sapply(unlist(strsplit(lName,"")), function(x) grep(x, val)))
# Compute the "number" for the sum of letter scores. Repeatedly summing the digits is equivalent to taking the sum mod 9, with a small correction when the sum is a multiple of 9 (the result is then 9, not 0).
n <- (s %% 9)
n <- ifelse(n==0, 9, n)
'n' is the result that you want for any 'name'
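The snippet above can be wrapped into a reusable function. This is only a sketch: the names `letter_values` and `name_number` are made up for illustration, and it assumes that characters without an assigned value (spaces, hyphens, digits) should simply be ignored. It uses the same mod-9 shortcut, written as `1 + (total - 1) %% 9` so that multiples of 9 map to 9 directly.

```r
# Letter values from the question, stored as a named vector.
letter_values <- c(a = 1, i = 1, j = 1, q = 1, y = 1,
                   b = 2, k = 2, r = 2,
                   c = 3, g = 3, l = 3, s = 3,
                   d = 4, m = 4, t = 4,
                   h = 5, e = 5, n = 5, x = 5,
                   u = 6, v = 6, w = 6,
                   o = 7, z = 7,
                   p = 8, f = 8)

# Hypothetical helper: name number of a single name.
name_number <- function(name, values = letter_values) {
  chars <- strsplit(tolower(name), "")[[1]]
  # Assumption: characters without a value are skipped.
  chars <- chars[chars %in% names(values)]
  total <- sum(values[chars])
  if (total == 0) return(0)
  # Digital root: same result as summing the digits repeatedly.
  1 + (total - 1) %% 9
}

sapply(c("David", "Betty", "joe"), name_number)
```

For a vector of names, `sapply()` as in the last line applies the function to each element.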
You will want to create a vector of values, in alphabetical order, then use match to get their indices. Something like this:
a <- i <- j <- q <- y <- 1
b <- k <- r <- 2
c <- g <- l <- s <- 3
d <- m <- t <- 4
h <- e <- n <- x <- 5
u <- v <- w <- 6
o <- z <- 7
p <- f <- 8
vals <- c(a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z)
sum(vals[match(c("d","a","v","i","d"), letters)])
# [1] 16
# This is the raw sum; it still has to be reduced to a single digit (1 + 6 = 7).
I'm sure there are several ways to do this, but here's an approach using a named vector:
x <- c(
"a"=1,"i"=1,"j"=1,"q"=1,"y"=1,
"b"=2,"k"=2,"r"=2,
"c"=3,"g"=3,"l"=3,"s"=3,
"d"=4,"m"=4,"t"=4,
"h"=5,"e"=5,"n"=5,"x"=5,
"u"=6,"v"=6,"w"=6,
"o"=7,"z"=7,
"p"=8,"f"=8)
##
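name_val <- function(Name, mapping=x){
  split <- tolower(unlist(strsplit(Name,"")))
  total <- sum(mapping[split])
  ## keep summing the digits until a single digit remains
  ## (a single pass is not enough for totals such as 29 -> 11 -> 2)
  while (total > 9) {
    total <- sum(as.numeric(unlist(strsplit(as.character(total),split=""))))
  }
  total
}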
##
Names <- c("David","Betty","joe")
##
R> name_val("David")
[1] 7
R> sapply(Names,name_val)
David Betty joe
7 7 4
I want to create a function which replaces a chosen row of a matrix with zeros. The function should work for an arbitrary matrix, but for this example I have used a sample 3x3 matrix with the numbers 1-9, called a_matrix
1 4 7
2 5 8
3 6 9
I have done:
zero_row <- function(M, n){
n <- c(0,0,0)
M*n
}
And then I have set the matrix and tried to get my desired result by using my zero_row function
mat1 <- a_matrix
zero_row(M = mat1, n = 1)
zero_row(M = mat1, n = 2)
zero_row(M = mat1, n = 3)
However, right now all I get is a matrix with only zeros, and I understand why. But if I instead change the vector n to one of the following
n <- c(0,1,1)
n <- c(1,0,1)
n <- c(1,1,0)
I get my desired result for n=1, n=2 and n=3 separately. But what I want is a single function that sets row n to zero for whichever n I pass in, instead of having to change the vector by hand for every n. For n=2, for example, I would get
1 4 7
0 0 0
3 6 9
And is it better to do it in another form, instead of using vectors?
Here is a way.
zero_row <- function(M, n){
stopifnot(n <= nrow(M))
M[n, ] <- 0
M
}
A <- matrix(1:9, nrow = 3)
zero_row(A, 1)
zero_row(A, 2)
zero_row(A, 3)
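For comparison, the multiplication idea from the question can also be made to work. The sketch below (the name `zero_row_mask` is made up for illustration) builds the 0/1 vector from n instead of hard-coding it. It relies on R recycling the vector down the columns, so element [i, j] is multiplied by mask[i].

```r
# Hypothetical variant of zero_row() that uses a 0/1 mask vector.
zero_row_mask <- function(M, n) {
  stopifnot(n >= 1, n <= nrow(M))
  mask <- rep(1, nrow(M))  # one entry per row
  mask[n] <- 0             # the row to be zeroed
  M * mask                 # mask is recycled column by column
}

A <- matrix(1:9, nrow = 3)
zero_row_mask(A, 2)
```

Direct assignment with `M[n, ] <- 0` is usually the simpler choice; the mask version mainly shows why the vectors c(0,1,1), c(1,0,1) and c(1,1,0) gave the right answer in the question.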
I have data where rows are points and columns are coordinates x,y,z.
I'd like to calculate the Euclidean distance between pairs of consecutive points, such as 3-4, 11-12, 18-19 and so on. For example, I don't need the distance between 3 and 11, 12 or 18.
The problem is that I have to analyze 1074 tables with 1000 rows or more, so I'm looking for a way to do it automatically, perhaps using the fact that I want the distance between each odd-numbered row and the even-numbered row that follows it. I don't care too much about the output format, but please consider that afterwards I have to select only distances < 3.2, so a data frame would be great.
THANK YOU! :*
How about something like this:
First, I'll make some fake data
set.seed(4304)
df <- data.frame(
x = runif(1000, -1, 1),
y = runif(1000, -1, 1),
z = runif(1000, -1,1)
)
Make a sequence of values from 1 to the number of rows of your dataset by 2s.
s <- seq(1, nrow(df), by=2)
Use sapply() to compute the distance between each pair of points.
out <- sapply(s, function(i){
sqrt(sum((df[i,] - df[(i+1), ])^2))
})
Organize the distances into a data frame
res <- data.frame(
pair = paste(rownames(df)[s], rownames(df)[(s+1)], sep="-"),
dist=out)
head(res)
# pair dist
# 1 1-2 1.379992
# 2 3-4 1.303511
# 3 5-6 1.242302
# 4 7-8 1.257228
# 5 9-10 1.107484
# 6 11-12 1.392247
Here is a function that can be applied to a data.frame or matrix holding the data.
DistEucl <- function(X){
  # split() only splits by rows for a data.frame, so convert a matrix first
  X <- as.data.frame(X)
  i <- cumsum(seq_len(nrow(X)) %% 2 == 1)
  sapply(split(X, i), function(Y){
    sqrt(sum((Y[1, ] - Y[2, ])^2))
  })
}
DistEucl(df1)
# 1 2 3 4
#1.229293 1.234273 1.245567 1.195319
With the data in DaveArmstrong's answer, the results are the same except for a names attribute in the above function's return value.
out2 <- DistEucl(df)
all.equal(out, out2)
#[1] "names for current but not for target"
identical(out, unname(out2))
#[1] TRUE
Data in the question
x <- c(13.457, 13.723, 15.319, 15.713, 18.446, 19.488, 19.762, 19.743)
y <- c(28.513, 29.656, 28.510, 27.342, 28.827, 28.24, 29.841, 30.942)
z <- c(40.513, 40.147, 43.281, 43.218, 43.095, 43.443, 40.094, 40.559)
df1 <- data.frame(x, y, z)
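A loop-free variant is also possible. The sketch below uses the data from the question and subtracts the even-numbered rows from the odd-numbered ones in a single step, then keeps only the distances below 3.2 as asked. The names `odd`, `res1` and `close_pairs` are made up for illustration, and a trailing unpaired row (odd number of rows) is assumed to be ignored.

```r
x <- c(13.457, 13.723, 15.319, 15.713, 18.446, 19.488, 19.762, 19.743)
y <- c(28.513, 29.656, 28.510, 27.342, 28.827, 28.24, 29.841, 30.942)
z <- c(40.513, 40.147, 43.281, 43.218, 43.095, 43.443, 40.094, 40.559)
df1 <- data.frame(x, y, z)

# Indices of the first point of every pair: 1, 3, 5, ...
odd <- seq(1, nrow(df1) - 1, by = 2)

# Row-wise Euclidean distance between rows odd and odd + 1.
pair_dist <- sqrt(rowSums((df1[odd, ] - df1[odd + 1, ])^2))

res1 <- data.frame(pair = paste(odd, odd + 1, sep = "-"),
                   dist = unname(pair_dist))

# Keep only the pairs closer than 3.2.
close_pairs <- subset(res1, dist < 3.2)
close_pairs
```

To run this over many tables, the same steps could be put in a function and applied to a list of data frames with `lapply()`.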
I have an ini-file, read as a list by R (in the example l). Now I want to add further sub-lists along a vector (m) and assign always the same constant to them. My attempt so far:
l <- list("A")
m <- letters[1:5]
n <- 5
for (i in 1:5){
assign(paste0("l$A$",m[i]), n)
}
# which does not work
# example of the desired outcome:
> l$A$e
[1] 5
I don't think that I have fully understood how lists work yet...
Try
L[["A"]][m] <- n
L$A$e
# [1] 5
Data:
L <- list(A = list())
m <- letters[1:5]
n <- 5
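To see why the loop in the question has no effect on the list: `assign()` treats its first argument as a plain variable name, so it creates a new variable that is literally called `l$A$a` rather than modifying `l`. A minimal sketch of this, together with a loop version that does work using `[[<-` (the loop variable `key` is just an illustrative name):

```r
l <- list(A = list())
m <- letters[1:5]
n <- 5

# assign() creates a variable whose *name* is the string "l$A$e" ...
assign("l$A$e", n)
exists("l$A$e")   # TRUE
is.null(l$A$e)    # TRUE: the list itself is untouched

# ... whereas indexing with [[ ]] modifies the list in place.
for (key in m) {
  l[["A"]][[key]] <- n
}
l$A$e
```

The one-liner `L[["A"]][m] <- n` does the same thing without a loop, because `n` is recycled across all names in `m`.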
I am trying to calculate the combinations of elements of a matrix but each element should appear only once.
The (real) matrix is symmetric, and can have more than 5 elements (up to ~2000):
o <- matrix(runif(25), ncol = 5, nrow = 5)
dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])
# A B C D E
# A 0.4400317 0.1715681 0.7319108946 0.3994685 0.4466997
# B 0.5190471 0.1666164 0.3430245044 0.3837903 0.9322599
# C 0.3249180 0.6122229 0.6312876740 0.8017402 0.0141673
# D 0.1641411 0.1581701 0.0001703419 0.7379847 0.8347536
# E 0.4853255 0.5865909 0.6096330935 0.8749807 0.7230507
I want to calculate, for every way of grouping the elements into pairs, the product of the matrix values of those pairs. Each combination should cover all elements if possible (AB, CD, EF if the matrix has 6 elements), and for each pair one letter is the row and the other one is the column. Here are some combinations:
AB, CD, E
AC, BD, E
AD, BC, E
AE, BC, D
AE, BD, C
Where the value of the single element is just 1.
Combinations not desired:
AB, BC: Element B appears twice
AB, AC: Element A appears twice
Things I tried:
I thought about removing the unwanted part of the matrix:
out <- which(upper.tri(o), arr.ind = TRUE)
out <- cbind.data.frame(out, value = o[upper.tri(o)])
out[, 1] <- colnames(o)[out[, 1]]
out[, 2] <- colnames(o)[out[, 2]]
# row col value
# 1 A B 0.1715681
# 2 A C 0.7319109
# 3 B C 0.3430245
# 4 A D 0.3994685
# 5 B D 0.3837903
# 6 C D 0.8017402
# 7 A E 0.4466997
# 8 B E 0.9322599
# 9 C E 0.0141673
# 10 D E 0.8347536
My attempt involves the following process:
1. Make a copy of the matrix (out).
2. Store the first value of the first row.
3. Remove all the pairs that involve either element of that pair.
4. Select the next pair of the resulting matrix.
5. Repeat until all rows of the matrix are removed.
6. Repeat steps 2 to 5 starting from a different row.
However, this method has one big problem: it doesn't guarantee that all the combinations are stored, and it could store the same combination several times.
My expected output is a vector, where each element is the product of the values in the cell selected by the combination:
AB, CD: 0.137553
How can I extract all those combinations efficiently?
This might work. I tested this on N elements = 5 and 6.
Note that this is not optimised, and hopefully can provide a framework for you to work from. With a much larger array, I can see steps involving apply and combn being a bottleneck.
The idea here is to generate a collection of unique sets first before calculating the product of the sets from another data.frame that stores values of sets.
Unique sets are identified by counting the number of unique elements in all combination pairs. For example, if N elements = 6, we expect length(unlist(combination)) == 6. The same is true if N elements = 7 (there will only be 3 pairs plus a remainder element). In cases where N elements is odd, we can ignore the remaining, unpaired element since it is constrained by the other elements.
library(dplyr)
library(reshape2)
## some functions
unique_by_n <- function(inlist, N){
## select unique combinations by count
## (if all pairs are disjoint, expect N unique elements, e.g. 6 when N = 6)
if(N %% 2) N <- N - 1 ## for odd numbers
return(length(unique(unlist(inlist))) == N)
}
get_combs <- function(x,xall){
## format and catches remainder if matrix of odd elements
xu <- unlist(x)
remainder <- setdiff(xall,xu) ## catch remainder if any
xset <- unlist(lapply(x, paste0, collapse=''))
finalset <- c(xset, remainder)
return(finalset)
}
## make dataset
set.seed(0) ## set reproducible example
#o <- matrix(runif(25), ncol = 5, nrow = 5) ## uncomment to test 5
#dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])
o <- matrix(runif(36), ncol = 6, nrow = 6)
dimnames(o) <- list(LETTERS[1:6], LETTERS[1:6])
o[lower.tri(o)] <- t(o)[lower.tri(o)] ## make matrix symmetric
n_elements = nrow(o)
#### get matrix
dat <- melt(o, varnames = c('Rw', 'Cl'), as.is = TRUE)
dat$Set <- apply(dat, 1, function(x) paste0(sort(unique(x[1:2])), collapse = ''))
## get unique sets (since your matrix is symmetric)
dat <- subset(dat, !duplicated(Set))
#### get sets
elements <- rownames(o)
allpairs <- expand.grid(Rw = elements, Cl = elements) %>%
filter(Rw != Cl) ## get all pairs
uniqpairsgrid <- unique(t(apply(allpairs,1,sort)))
uniqpairs <- split(uniqpairsgrid, seq(nrow(uniqpairsgrid))) ## get unique pairs
allpaircombs <- combn(uniqpairs,floor(n_elements/2)) ## get combinations of pairs
uniqcombs <- allpaircombs[,apply(allpaircombs, 2, unique_by_n, N = n_elements)] ## remove pairs with repeats
finalcombs <- apply(uniqcombs, 2, get_combs, xall=elements)
#### calculate results
res <- apply(finalcombs, 2, function(x) prod(subset(dat, Set %in% x)$value)) ## calculate product
names(res) <- apply(finalcombs, 2, paste0, collapse=',') ## add names
resdf <- data.frame(Sets = names(res), Products = res, stringsAsFactors = FALSE, row.names = NULL)
print(resdf)
#> Sets Products
#> 1 AB,CD,EF 0.130063454
#> 2 AB,CE,DF 0.171200062
#> 3 AB,CF,DE 0.007212619
#> 4 AC,BD,EF 0.012494787
#> 5 AC,BE,DF 0.023285088
#> 6 AC,BF,DE 0.001139712
#> 7 AD,BC,EF 0.126900247
#> 8 AD,BE,CF 0.158919605
#> 9 AD,BF,CE 0.184631344
#> 10 AE,BC,DF 0.042572488
#> 11 AE,BD,CF 0.028608495
#> 12 AE,BF,CD 0.047056905
#> 13 AF,BC,DE 0.003131029
#> 14 AF,BD,CE 0.049941770
#> 15 AF,BE,CD 0.070707311
Created on 2018-07-23 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0.9000).
Maybe the following does what you want.
Note that I was more interested in being right than in performance.
Also, I have set the RNG seed, to have reproducible results.
set.seed(9840) # Make reproducible results
o <- matrix(runif(25), ncol = 5, nrow = 5)
dimnames(o) <- list(LETTERS[1:5], LETTERS[1:5])
cmb <- combn(LETTERS[1:5], 2)
n <- ncol(cmb)
res <- NULL
nms <- NULL
for(i in seq_len(n)){
for(j in seq_len(n)[-seq_len(i)]){
x <- unique(c(cmb[, i], cmb[, j]))
if(length(x) == 4){
res <- c(res, o[cmb[1, i], cmb[2, i]] * o[cmb[1, j], cmb[2, j]])
nms <- c(nms, paste0(cmb[1, i], cmb[2, i], '*', cmb[1, j], cmb[2, j]))
}
}
}
names(res) <- nms
res
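Another option is to build the pairings recursively, so that invalid combinations (with a repeated element) are never generated in the first place. The sketch below is not taken from either answer; the name `pair_products` is made up for illustration. It always pairs the first remaining element with each of the others in turn, and for an odd number of elements it also lets that element stay unpaired, contributing a factor of 1 as described in the question.

```r
# Hypothetical helper: products over all ways to split the elements into pairs.
pair_products <- function(m) {
  res <- numeric(0)
  recurse <- function(remaining, label, value) {
    if (length(remaining) == 0) {
      # A complete combination: store its product under a readable name.
      res <<- c(res, setNames(value, paste(label, collapse = ",")))
      return(invisible(NULL))
    }
    first <- remaining[1]
    rest <- remaining[-1]
    if (length(remaining) %% 2 == 1) {
      # Odd count: 'first' may be the single unpaired element (value 1).
      recurse(rest, c(label, first), value)
    }
    for (k in seq_along(rest)) {
      # Pair 'first' with rest[k] and continue with what is left.
      recurse(rest[-k], c(label, paste0(first, rest[k])),
              value * m[first, rest[k]])
    }
  }
  recurse(rownames(m), character(0), 1)
  res
}

set.seed(0)
o <- matrix(runif(36), ncol = 6, nrow = 6)
dimnames(o) <- list(LETTERS[1:6], LETTERS[1:6])
o[lower.tri(o)] <- t(o)[lower.tri(o)]  # make the matrix symmetric
res <- pair_products(o)
length(res)
head(names(res))
```

Note that the number of such pairings is 1 * 3 * 5 * ... * (n - 1) for n elements, so enumerating all of them is only feasible for small matrices, whichever method is used.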
Sometimes I want to transform several data columns (usually character or factor) into one new column (usually a number). I try to do this using a lookup matrix. For example, my dataset is
dset <- data.frame(
x=c("a", "a", "b"),
y=c("v", "w", "w"),
stringsAsFactors=FALSE
)
lookup <- matrix(c(1:4), ncol=2)
rownames(lookup) <- c("a", "b")
colnames(lookup) <- c("v", "w")
Ideally (for my purpose here), I would now do
transform(dset, z=lookup[x,y])
and get my new data column. While this works in the one-dimensional case, this fails here, as lookup[x,y] returns a matrix. I came up with this function, which looks rather slow:
fill_from_matrix <- function(m, ...) {
arg <- list(...)
len <- sapply(arg, length)
if(sum(diff(len))!=0) stop("differing lengths in fill_from_matrix")
if(length(arg)!=length(dim(m))) stop("differing dimensions in fill_from_matrix")
n <- len[[1]]
dims <- length(dim(m))
res <- rep(NA, n)
for (i in seq(1,n)) {
one_arg <- list(m)
for (j in seq(1,dims)) one_arg[[j+1]] <- arg[[j]][[i]]
res[i] <- do.call("[", one_arg)
}
return(res)
}
With this function, I can call transform and get the result I wanted:
transform(dset, z=fill_from_matrix(lookup,x,y))
# x y z
# 1 a v 1
# 2 a w 3
# 3 b w 4
However, I am not satisfied with the code and wonder if there is a more elegant (and faster) way to perform this kind of transformation. How do I get rid of the for loops?
This is really quite easy and I suspect fast with base R indexing because the "[" function accepts a two-column matrix for this precise purpose:
> dset$z <- lookup[ with(dset, cbind(x,y)) ]
> dset
x y z
1 a v 1
2 a w 3
3 b w 4
If you needed it as a specific function then:
lkup <- function(tbl, rowidx, colidx){ tbl[ cbind(rowidx, colidx)]}
zvals <- lkup(lookup, dset$x, dset$y)
zvals
#[1] 1 3 4
(This generalises to higher dimensions: an array with k dimensions can be indexed with a k-column matrix, one row per element to extract.)
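As a small illustration of the same idea with more than two dimensions, here is a sketch using a made-up 2x2x2 array `arr`; the third dimension names "p" and "q" are invented for the example.

```r
# A 3-dimensional lookup array with named dimensions.
arr <- array(1:8, dim = c(2, 2, 2),
             dimnames = list(c("a", "b"), c("v", "w"), c("p", "q")))

# One row per element to extract, one column per dimension.
idx <- cbind(c("a", "b"), c("w", "v"), c("q", "p"))

arr[idx]
# [1] 7 2
```

Each row of `idx` is looked up against the dimnames, so the first result is arr["a", "w", "q"] and the second is arr["b", "v", "p"].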
You can use inner_join from the dplyr package, with a data.frame instead of a matrix as the lookup table:
library(dplyr)
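# stringsAsFactors = FALSE keeps the key columns as character, matching dset
lookup = transform(expand.grid(c('a','b'),c('v','w'), stringsAsFactors=FALSE), v=1:4) %>%
setNames(c('x','y','val'))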
inner_join(dset, lookup, by=c('x','y'))
# x y val
#1 a v 1
#2 a w 3
#3 b w 4
Another fast option is the data.table package, using my definition of lookup:
library(data.table)
setDT(lookup)
setDT(dset)
setkey(lookup, x ,y)[dset]
# x y val
#1: a v 1
#2: a w 3
#3: b w 4
If for any reason you have the lookup matrix as input, convert it to a data frame:
lookup = transform(expand.grid(rownames(lookup), colnames(lookup)), v=c(lookup))
names(lookup) = c('x','y','val')