using a "pasted" name inside a function - r

I have a function that computes some things and then assigns that to a matrix. This matrix receives its name from a paste statement (based on some other current values). I then want to assign the dimnames to the matrix, but don't know how to make the pasted name be understood.
Here is what is going on:
someComputations <- function(labs) {
### bunch of computations, leading to X, Y, and Z:
matName <- paste("rhoMat_", X, sep = "") # this yields rhoMat_15 if X equals 15
assign(matName, Y %*% Z)
assign(dimnames(matName), labs) # labs is a list of row labels and column labels
return(matName)
}
This works well, including the first assign statement, and then it breaks down.
I have tried all kinds of approaches, such as eval(parse(text = matName)), as.name(matName), substitute(matName), but to no avail.
Since I don't know the actual name of the matrix in advance (because X is not known beforehand), I can't hardcode the name into the function, so I am stuck with its character name matName. How can I make R understand that I want to set the dimnames of the matrix rhoMat_15, rather than of matName?
Thanks, Peter

dimnames(get(matName)) <- labs
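Note that a replacement call such as dimnames(...) <- cannot be applied directly to the result of get(), because R has no `get<-` function to write the modified value back. A minimal sketch of one way around this, assuming it is acceptable to set the dimnames on a temporary object before assigning it under the pasted name (the extra arguments X, Y, Z and envir are hypothetical additions for illustration, not part of the original function):

```r
# Sketch: build the matrix, label it, then assign it under the pasted name.
# X, Y, Z and envir are illustrative arguments, not from the original post.
someComputations <- function(labs, X, Y, Z, envir = parent.frame()) {
  matName <- paste0("rhoMat_", X)      # e.g. "rhoMat_15" when X is 15
  mat <- Y %*% Z                       # the computed matrix
  dimnames(mat) <- labs                # set labels on the temporary object
  assign(matName, mat, envir = envir)  # create the named matrix in the caller
  invisible(matName)                   # return the name as a string
}

labs <- list(c("r1", "r2"), c("c1", "c2"))
someComputations(labs, X = 15, Y = diag(2), Z = matrix(1:4, 2))
dimnames(rhoMat_15)
```

Returning the matrix itself (or storing results in a named list) is usually easier to work with than creating variables by name, but the sketch keeps the naming scheme from the question.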

Related

Subtracting list elements from another list in R

I have two lists and I want to subtract one from the other element-wise, in order to replicate the Matlab call bsxfun(@minus, lt, lt2). The two lists look something like the below (edit: now works without the pracma package):
# Code
# First list
lt = c(list())
# I use these lines to pre-dim the list...
lt[[1]] = c(rep(list(1)))
lt[[2]] = c(rep(list(1)))
# ... such that I can add matrices to it this way:
lt[[1]][[1]] = matrix(c(3),nrow=1, ncol=1,byrow=TRUE)
lt[[2]][[1]] = matrix(c(1),nrow=1, ncol=1, byrow=TRUE)
# Same with the second list:
lt2 = c(list())
lt2[[1]] = c(rep(list(1)))
lt2[[2]] = c(rep(list(1)))
lt2[[1]][[1]] = matrix(c(2,2,2),nrow=3, ncol=1,byrow=TRUE)
lt2[[2]][[1]] = matrix(c(1,1,1),nrow=3, ncol=1,byrow=TRUE)
Element-wise subtraction here means that the respective element of lt is subtracted from each row of the corresponding element of lt2, i.e., 3 is subtracted from each row of lt2[[1]][[1]], resulting in t(c(-1, -1, -1)), and 1 is subtracted from each row of lt2[[2]][[1]], resulting in t(c(0, 0, 0)). It is important to me that the list structure is maintained in the results.
Now I tried using lapply(lt2,"-",lt) but it does not work. Any suggestions?
I suspect you are looking for something like this skeleton code which subtracts 2 lists element-wise...
x <- list(1,2,3)
y <- list(4,5,6)
mapply('-', y, x, SIMPLIFY = FALSE)
but as noted, you need two lists of the same length (or at least lengths for which R's recycling rules make sense), as for example...
z <- list(4,5,6,7,8,9)
mapply('-',z,x,SIMPLIFY = FALSE)
You might be looking for something like this where you subtract a constant from each member of the list...
mapply('-',y,2, SIMPLIFY= FALSE)
I figured it out - I had another mistake in the question :/
Converting the second argument with as.numeric worked:
lt3 = lapply(lt2[[1]],"-",as.numeric(lt[[1]]))
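The line above handles only the first element of each list. A minimal sketch of how the same idea could be extended to every element while keeping the nested list structure, assuming each inner matrix of lt is 1x1 as in the question (the nested Map call is an illustration, not the asker's code):

```r
# Rebuild the two nested lists from the question in a compact form.
lt  <- list(list(matrix(3)), list(matrix(1)))
lt2 <- list(list(matrix(c(2, 2, 2), ncol = 1)),
            list(matrix(c(1, 1, 1), ncol = 1)))

# Outer Map walks the top-level lists in parallel, inner Map walks the
# matrices inside each pair; as.numeric() turns the 1x1 matrix into a scalar
# so that it is recycled across all rows of the lt2 matrix.
res <- Map(function(a, b) {
  Map(function(m2, m1) m2 - as.numeric(m1), a, b)
}, lt2, lt)

str(res)
```

The result has the same two-level list shape as lt2, with a 3x1 matrix in each inner slot.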

How to concatenate NOT as character in R?

I want to build the expression iris$Sepal.Length by concatenation, so I can use it in a function to get the Sepal Length column from the iris data frame. But when I use the paste function, as in paste("iris$", colnames(iris[3])), the result is a character string (with quotes), such as "iris$Sepal.Length". I need the result not as a character. I have tried noquote(), as.data.frame() etc. but it doesn't work.
freq <- function(y) {
for (i in iris) {
count <-1
y <- paste0("iris$",colnames(iris[count]))
data.frame(as.list(y))
print(y)
span = seq(min(y),max(y), by = 1)
freq = cut(y, breaks = span, right = FALSE)
table(freq)
count = count +1
}
}
freq(1)
The crux of your problem isn't making that object not be a string, it's convincing R to do what you want with the string. You can do this with, e.g., eval(parse(text = foo)). Isolating out a small working example:
y <- "iris$Sepal.Length"
data.frame(as.list(y)) # does not display iris$Sepal.Length
data.frame(as.list(eval(parse(text = y)))) # DOES display iris$Sepal.Length
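As an alternative to eval(parse()), a column can be looked up directly from its name with the [[ operator, which avoids building code as a string. A minimal sketch, assuming the goal is simply to fetch a column whose name is held in a variable (col_name is an illustrative variable name):

```r
# Take the column name from the data frame rather than pasting "iris$" onto it.
col_name <- colnames(iris)[1]   # first column name of iris
y <- iris[[col_name]]           # numeric vector, same values as iris$Sepal.Length

range(y)
```

Because y is now the numeric column itself, functions such as min(), max() and cut() can be applied to it directly.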
That said, I wanted to point out some issues with your function:
The input variable appears to not do anything (because it is immediately overwritten), which may not have been intended.
The for loop seems broken, since it resets count to 1 on each pass, which I think you didn't mean. Relatedly, it iterates over all i in iris, but then it doesn't use i in any meaningful way other than to keep a count. Instead, you could do something like for (count in 1:length(iris)), which would establish the count variable and iterate it for you as well.
It's generally better to avoid explicit for loops in R where a vectorized alternative exists; the apply family of functions can apply a function to (e.g.) every column of a data frame. As a very simple version of this, something like apply(iris, 2, table) will apply the table function along margin 2 (the columns) of iris and, in this case, place the results in a list. The idea would be to build your function to do what you want to a single vector, then pass each vector through the function with something from the apply() family. For instance:
cleantable <- function(x) {
myspan = seq(min(x), max(x)) # if unspecified, by = 1
myfreq = cut(x, breaks = myspan, right = FALSE)
table(myfreq)
}
apply(iris[1:4], 2, cleantable) # can only use first 4 columns since 5th isn't numeric
would do what I think you were trying to do on the first 4 columns of iris. This way of programming will be generally more readable and less prone to mistakes.
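One caveat with breaks built as seq(min(x), max(x)): when the minimum is not a whole number, or the maximum falls at or beyond the last break, some values land outside every interval and cut() returns NA for them, so they are dropped from the table. A hedged variant, assuming whole-number bin edges are what is wanted (cleantable2 is a hypothetical name for illustration):

```r
# Variant of cleantable() whose breaks always cover the full range of x.
cleantable2 <- function(x) {
  # Start at the whole number at or below the minimum and end one past the
  # whole number at or below the maximum, so every value falls in a bin.
  breaks <- seq(floor(min(x)), floor(max(x)) + 1)
  table(cut(x, breaks = breaks, right = FALSE))
}

# lapply over the numeric columns keeps them numeric (apply() would first
# convert the data frame to a matrix).
res <- lapply(iris[1:4], cleantable2)
res$Sepal.Length
```

Each table in res should then account for every row of iris.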

Expand grid in R with paste

I am trying to analyse a dataframe using hierarchical clustering hclust function in R.
I would like to pass in a vector of p values I'll write beforehand (maybe something like c(5/4, 3/2, 7/4, 9/4)) and be able to have these specified as the different p value options with Minkowski distance when I use expand.grid. Ideally, when hyperparams is viewed, it would also be clear which value of p has been used for each minkowski, i.e. they should be labelled. So for example, where (if you run my code for hyperparams) there would currently just be one minkowski under Dists, for each of the methods in Meths, there would be, if I supplied the p vector as c(5/4, 3/2, 7/4, 9/4), now instead 4 rows for Minkowski distance: minkowski, p=5/4, minkowski, p=3/2, minkowski, p=7/4, minkowski, p=9/4 (or looking something like that, making the p values clear). Any ideas?
(Note: no packages please, only base R!)
Edit: I worded it poorly before, now rewritten. Let's take the following example instead:
acc <- function(x){
first = sum(x)
second = sum(x^2)
return(list(First=first,Second=second))
}
iris0 <- iris
iris1 <- cbind(log(iris[,1:4]),iris[5])
iris2 <- cbind(sqrt(iris[,1:4]),iris[5])
Now the important bit:
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary"),
DS=c("iris0","iris1","iris2"))
Table <- Map(function(x, ds){acc(table(get(ds)$Species, cutree(hclust(dist(get(ds)[,1:4], method=x)),3)))}, as.character(tests[[1]]), as.character(tests[[2]]))
This will work. But now if I want to include a term like "minkowski",p=3 in expand.grid, how would I do it?
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary","minkowski,p=3"),
DS=c("iris0","iris1","iris2"))
Table <- Map(function(x, ds){acc(table(get(ds)$Species, cutree(hclust(dist(get(ds)[,1:4], method=x)),3)))}, as.character(tests[[1]]), as.character(tests[[2]]))
This gives an error.
In reality there should be no p argument unless method="minkowski". I have tried to use strsplit to get the first part of the expression into ds, and a switch with strsplit to get the second part and then use parse (it would return NULL if the length of the strsplit result was not 2, which should pass no argument, I think). The issue seems to be that strsplit(x, ",") does not evaluate the vectorized x, but instead tries to evaluate a single x that is not a character string. Can anyone suggest a workaround, fix, or other method for including terms such as minkowski,p=1.6?
We can create a 'p' value column
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary",
"minkowski3", "minkowski4", "minkowski5"),
DS=c("iris0","iris1","iris2"))
Suppose, we have another column of 'p' values in 'tests', the above solution can be changed to
tests$p <- as.list(args(dist))$p # default value
i1 <- grepl("minkowski", tests$Dists)
tests$Dists <- sub("[0-9.]+$", "", tests$Dists)
tests$p[i1] <- rep(3:5, length.out = sum(i1))
Map(function(x, ds, p){
dist1 <- dist(get(ds)[, 1:4], method = x, p = p)
ct <- cutree(hclust(dist1), 3)
acc(table(get(ds)$Species, ct))},
as.character(tests[[1]]), as.character(tests[[2]]), tests$p )
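The same idea can be driven directly by a user-supplied vector of p values, with a readable label for each Minkowski row. A minimal base-R sketch, assuming an abbreviated method list and that the data sets are kept in a named list rather than fetched with get() (p_vals, methods, datasets and hyperparams are illustrative names, not from the posts above):

```r
p_vals <- c(5/4, 3/2, 7/4, 9/4)                      # p values to try
base_methods <- c("euclidean", "maximum", "manhattan")  # abbreviated list

# One row per distance setting; p is only meaningful for "minkowski"
# (dist() ignores it for the other methods, so the default 2 is a filler).
methods <- data.frame(
  Dists = c(base_methods, rep("minkowski", length(p_vals))),
  p     = c(rep(2, length(base_methods)), p_vals),
  stringsAsFactors = FALSE
)
methods$Label <- ifelse(methods$Dists == "minkowski",
                        paste0("minkowski, p=", methods$p),
                        methods$Dists)

datasets <- list(iris0 = iris,
                 iris1 = cbind(log(iris[, 1:4]), iris[5]),
                 iris2 = cbind(sqrt(iris[, 1:4]), iris[5]))

# Cross the method rows with the data set names.
idx <- expand.grid(m = seq_len(nrow(methods)), DS = names(datasets),
                   stringsAsFactors = FALSE)
hyperparams <- data.frame(methods[idx$m, ], DS = idx$DS, row.names = NULL)

clusters <- Map(function(method, p, ds) {
  d <- dist(datasets[[ds]][, 1:4], method = method, p = p)
  table(datasets[[ds]]$Species, cutree(hclust(d), 3))
}, hyperparams$Dists, hyperparams$p, hyperparams$DS)
names(clusters) <- paste(hyperparams$Label, hyperparams$DS, sep = " / ")
```

Viewing hyperparams then shows one labelled row per Minkowski p value for each data set, and the acc() function from the question can be applied to each table in clusters.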

R assign a list of values to a list of objects

Thank you for trying to help. I am happy to be corrected on all R misdemeanors.
I am not sure that I was entirely clear with my earlier post as below, so I will hope to clarify:
In the R console, I use source() (etc.) to call a .R file.
Code within the .R file uses variables (e.g. for 'extracted info') ex1, ex2, ex3. These may hold strings or (a string of) numbers pulled from text.
In line with your guidance I've renamed my function to 'reset' (and ?reset indicates no other occurrences are in scope). I'm passing both x and y, which come from outside the function:
#send variables ex1, ex2, ex3 together with location, loc and parse, prs to be reset with 0
reset(x<-c(loc,prs,ex1,ex2,ex3),y<-rep(c(0),length(x))) #repeats 0 in y variable as many times as there are entries for x
reset<-function(x,y){
print(c("resetting ",x," with ", y))
if (length(x) == length(y)) {x <- y
print(paste(x,"=",y),sep="") #both x and y should now be equal (to y)
} else {
paste("list lengths differ: x=",length(x)," y=",length(y),sep="")
}
}
Now both x and y are 0 but ex1, ex2 and ex3 still contain the previous values
I would like ex1, ex2 and ex3 all to be 0 before they are used in a subsequent section of code, so they don't contaminate extracted data with previous values such as:
loc<-str_locate(data[i],"=")
prs<-str_locate(data[i],",")
#extract data from the end of loc to before the occurrence of prs
ex1<-str_sub(data[i],loc[2]+1,prs[1]-1)
#cleanup
#below is simplified for example;
#in reality I wish to send ex1:ex(n) to be reset with values val1:val(n)
The desired outcome would be that back in the Rconsole >ex1 should now return 0.
Hope you can understand my dilemma and possibly help.
Say my code uses some variables to hold data extracted from a string using Stringr str_sub. The variables are temporary in that I use the values to construct other strings then they should be freed up to be used in an upcoming test: i.e. if (test==true){extract<-str_sub(string, start, end)}
For a later test, I would like extract==0; simple enough, but I have a few of these and would like to do it in one fell swoop.
I've used a for loop, but if there is a simpler way, please identify this.
My attempt is using a function:
#For variables loc, prs, ex1 and ex2, set all values to 0
x<-assign(x<-c(loc, prs, ex1, ex2),y<-rep(c(0),length(x)))
#Function
assign <- function(x, y) {
if(length(x)==length(y)){
for (i in 1:length(x)){x[i]<-y[i]}
print(c("Assigned",x[i]))
return (x)
} else { print (c("list lengths differ: x=",length(x)," y=",length(y)))
}
}
The problem being that this returns x as 0, but the list of variables retain their values.
I'm a bit of a noob to both r and SO, so although I've benefitted from SO's bountiful advice on numerous occasions, this is my first question, so please be gentle. I have searched this issue, but have not found what I need in a few hours now. Hope you can help.
Beware of naming a function assign. There is already one in base-r and you will create confusion.
There are a couple of problems with your function besides its name. First, you do not need the for-loop to replace x by y, as this is a basic vectorized operation. Just use x <- y. Second, you should wrap your message in paste.
asgn <- function(x, y) {
if(length(x)==length(y)){
## This step is not needed, return(y) is better as @Rick proposed in their now deleted answer
## I am leaving it to show you how the for-loop is not needed
x<-y
return (x)
} else {
print (paste("list lengths differ: x=",length(x)," y=",length(y)))
return(x)
}
}
Then, there are a couple of problems with your function call. You use <- instead of = to specify the arguments. The two are only somewhat synonymous for assigning variables; naming a function argument is another matter. Finally, you are trying to use x in the definition of y in the arguments (length(x)), but this is not possible, because x is not yet defined at that point, so R looks for x in the calling environment. Build the vector first and then use its length instead:
vars <- c(loc, prs, ex1, ex2)
x <- asgn(x = vars, y = rep(0, length(vars)))
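Because R passes values rather than references, a function that receives c(ex1, ex2, ...) only ever sees copies, so the original variables keep their values. A minimal sketch of one way to reset the variables themselves, assuming they are identified by name and live in the calling environment (reset_vars is a hypothetical helper, not part of the answer above):

```r
# Reset several variables, given by name, in the caller's environment.
reset_vars <- function(var_names, values = 0, envir = parent.frame()) {
  # Recycle the replacement values so there is one per variable name.
  values <- rep_len(as.list(values), length(var_names))
  for (i in seq_along(var_names)) {
    assign(var_names[i], values[[i]], envir = envir)
  }
  invisible(var_names)
}

ex1 <- "abc"; ex2 <- "12"; ex3 <- "x"
reset_vars(c("ex1", "ex2", "ex3"))   # all three are now 0
ex1
```

Keeping the extracted pieces in a single named list and overwriting that list is often simpler than resetting separate variables, but the sketch matches the workflow described in the question.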

Project Euler #22, off by 158,055

I'm currently working through Project Euler problem 22 which has the following challenge:
Using names.txt (right click and 'Save Link/Target As...'), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.
For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.
What is the total of all the name scores in the file?
The file can be downloaded using the above link. I've written the below code to solve the problem:
rm(list=ls())
library(splitstackshape)
#read in data from http://projecteuler.net/problem=22
names=sort(t(read.table("names.txt",sep=",")))
#letters to numbers conversion vectors
from=LETTERS[seq(1,26)]
to=as.character(seq(1,26))
#function to replace all letters with corresponding numbers
gsub2 = function(pattern, replacement, x, ...){
for(i in 1:length(pattern))
x = gsub(pattern[i],paste(replacement[i]," ",sep=""), x, ...)
x
}
#create df, run function, create row number var for later calculation
df=data.frame(names=names)
df$name.num = gsub2(from,to,df$names)
df$rownum=seq(1,nrow(df))
#split letter values, add across rows, multiply by row number to get name score and sum
df=concat.split(df,"name.num"," ")
df$name.sum=rowSums(df[,4:15],na.rm=TRUE)
df$name.score=df$name.sum*df$rownum
print(sum(df$name.score,na.rm=TRUE))
My result appears to be off by 158,055 (I get 871040227 where it should be 871198282). I've spot checked parts of it, and it appears that the list of names is sorted correctly, and that the name scores are computed correctly (for instance, I also get COLIN=49714). I've also read other threads troubleshooting this problem on SO, but they're mostly in Python and the problems seem to be different from mine. My suspicion is that either the names.txt file is somehow not being read in right or that perhaps the method I'm using (concat.split from the splitstackshape package) to split the df$name.num is incorrect, though it seems to be working correctly.
Any ideas?
Also, any suggestions on how to improve/simplify my code are more than welcome!
I used to have fun doing the Euler problems in R. Here's my solution to 22.
namesscore<-function(name) {
score<-0;
for(s in 1:nchar(name)) {
score<-score + which(substr(name,s,s)==LETTERS[1:26])
}
score
}
names<-scan("prob022.txt", "character", sep=",", quote="\"", na.strings="")
name.pos <- rank(names)
name.val <- sapply(names,namesscore)
sum(name.pos*name.val)
# [1] 871198282
There is a name "NA" in the list which may cause you problems.
As pointed out by @MrFlick, there's a 'NA' in the names list, so you need to treat it.
x = sort(scan('http://projecteuler.net/project/names.txt', what = '', sep =',', na.strings = ""))
s = sapply(x, function(w){
match(w, x) * sum(match(strsplit(w, '')[[1]], LETTERS))
})
print(sum(s))
# 871198282
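The effect of the "NA" name can be reproduced on a tiny inline sample without downloading the file. A minimal sketch, assuming a three-name sample string in the same quoted, comma-separated layout as names.txt (raw and name_value are illustrative names):

```r
raw <- '"MARY","NA","COLIN"'   # tiny stand-in for names.txt

# With the default na.strings = "NA", the name "NA" is read as a missing value.
default_read <- scan(text = raw, what = "", sep = ",", quiet = TRUE)

# With na.strings = "", it is kept as an ordinary two-letter name.
safe_read <- scan(text = raw, what = "", sep = ",", na.strings = "",
                  quiet = TRUE)

# Alphabetical value of one name: A = 1, ..., Z = 26.
name_value <- function(w) sum(match(strsplit(w, "")[[1]], LETTERS))

sorted <- sort(safe_read)
scores <- seq_along(sorted) * sapply(sorted, name_value)
sum(scores)
```

On the real file the same na.strings = "" argument applies, which is what both answers above rely on.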
