I'm trying to create a y-axis label that is generated by pasting together two vectors that are the same length. The catch is that the first element needs to be italicized. Here's an example...
n <- 1:5
t <- LETTERS[1:5]
together <- paste(t, n)
plot(x=1:5, y=1:5, yaxt="n")
axis(2, at=1:5, label=together, las=2)
So, I'd like the t elements italicized. I've looked around expression, bquote, and substitute and am not making much progress. Anyone got a hint to help me here?
This is a bit tricky because the axis labels have to be supplied as a vector of expressions, not as character strings. Therefore you need to convert the strings returned by paste into unevaluated expressions. One way is like this:
together <- do.call(expression, as.list(parse(text = paste0("italic(", t, ")~", n))))
You could use bquote
together <- as.expression(sapply(seq_along(t), function(i)
bquote(italic(.(t[i]))*.(n[i]))))
Or using a for loop:
v1 <- c()
for(i in seq_along(t)){
v1 <- c(v1, bquote(italic(.(t[i]))*.(n[i])))
}
together <- as.expression(v1)
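For reference, a minimal end-to-end sketch of the parse-based approach, assuming the same n and t as in the question. It relies on parse(text = ...) returning an expression vector with one element per input string, so the labels can be handed to axis directly; the name labels is only illustrative.

```r
n <- 1:5
t <- LETTERS[1:5]

# Build plotmath strings such as "italic(A)~1"; "~" inserts a space,
# whereas "*" would juxtapose the two parts without one.
label_strings <- paste0("italic(", t, ")~", n)

# parse() returns an expression vector, one element per string.
labels <- parse(text = label_strings)

plot(x = 1:5, y = 1:5, yaxt = "n")
axis(2, at = 1:5, labels = labels, las = 2)
```

If the letters could contain characters that are not valid R syntax, the bquote variants above are likely the safer choice, since they do not parse the text.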
I am trying to analyse a data frame using the hierarchical clustering function hclust in R.
I would like to pass in a vector of p values I'll write beforehand (maybe something like c(5/4, 3/2, 7/4, 9/4)) and be able to have these specified as the different p value options with Minkowski distance when I use expand.grid. Ideally, when hyperparams is viewed, it would also be clear which value of p has been used for each minkowski, i.e. they should be labelled. So for example, where (if you run my code for hyperparams) there would currently just be one minkowski under Dists, for each of the methods in Meths, there would be, if I supplied the p vector as c(5/4, 3/2, 7/4, 9/4), now instead 4 rows for Minkowski distance: minkowski, p=5/4, minkowski, p=3/2, minkowski, p=7/4, minkowski, p=9/4 (or looking something like that, making the p values clear). Any ideas?
(Note: no packages please, only base R!)
Edit: I worded it poorly before, now rewritten. Let's take the following example instead:
acc <- function(x){
first = sum(x)
second = sum(x^2)
return(list(First=first,Second=second))
}
iris0 <- iris
iris1 <- cbind(log(iris[,1:4]),iris[5])
iris2 <- cbind(sqrt(iris[,1:4]),iris[5])
Now the important bit:
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary"),
DS=c("iris0","iris1","iris2"), stringsAsFactors=FALSE)
Table <- Map(function(x, ds){acc(table(get(ds)$Species, cutree(hclust(dist(get(ds)[,1:4], method=x)),3)))},tests[[1]], tests[[2]])
This will work. But now if I want to include a term like "minkowski",p=3 in expand.grid, how would I do it?
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary","minkowski,p=3"),
DS=c("iris0","iris1","iris2"), stringsAsFactors=FALSE)
Table <- Map(function(x, ds){acc(table(get(ds)$Species, cutree(hclust(dist(get(ds)[,1:4], method=x)),3)))},tests[[1]], tests[[2]])
This gives an error.
In reality there should be no p argument unless method="minkowski". I have tried to use strsplit to get the first part of the expression into ds, and a switch with strsplit to get the second part and then use parse (it would return NULL if the length of the strsplit result was not 2, which should pass no argument, I think). The issue seems to be that strsplit(x, ",") does not evaluate the vectorized x but rather tries to evaluate the character x, which is not a string. Can anyone suggest a workaround, fix, or other method for including terms like minkowski,p=1.6?
We can encode the 'p' value as a numeric suffix on the method name and later move it into its own column:
tests <- expand.grid(Dists=c("euclidean","maximum","manhattan","canberra","binary",
"minkowski3", "minkowski4", "minkowski5"),
DS=c("iris0","iris1","iris2"))
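To get a separate column of 'p' values in 'tests', start from the default value of p, strip the numeric suffix from the method names, and fill in p for the Minkowski rows. The Map call then passes p along to dist: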
tests$p <- as.list(args(dist))$p # default value
i1 <- grepl("minkowski", tests$Dists)
tests$Dists <- sub("[0-9.]+$", "", tests$Dists)
tests$p[i1] <- rep(3:5, length.out = sum(i1))
Map(function(x, ds, p){
dist1 <- dist(get(ds)[, 1:4], method = x, p = p)
ct <- cutree(hclust(dist1), 3)
acc(table(get(ds)$Species, ct))},
as.character(tests[[1]]), as.character(tests[[2]]), tests$p )
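The code above fills in p with rep(3:5, ...), which depends on the row order produced by expand.grid. As a hedged alternative sketch, the value can be read straight from the suffix of each label, which also handles non-integer values such as 1.25. The column name Method and the result name res are illustrative assumptions, and the grid is shortened to keep the example small.

```r
# Helper and data sets as defined in the question (shortened).
acc <- function(x) list(First = sum(x), Second = sum(x^2))
iris0 <- iris
iris1 <- cbind(log(iris[, 1:4]), iris[5])

tests <- expand.grid(Dists = c("euclidean", "minkowski1.25", "minkowski3"),
                     DS = c("iris0", "iris1"),
                     stringsAsFactors = FALSE)

# Split each label into the method name and the p value.
is_mink <- grepl("^minkowski", tests$Dists)
tests$Method <- sub("[0-9.]+$", "", tests$Dists)
tests$p <- 2  # default p of dist(); ignored unless method is "minkowski"
tests$p[is_mink] <- as.numeric(sub("^minkowski", "", tests$Dists[is_mink]))

res <- Map(function(method, ds, p) {
  d <- get(ds)
  clusters <- cutree(hclust(dist(d[, 1:4], method = method, p = p)), 3)
  acc(table(d$Species, clusters))
}, tests$Method, tests$DS, tests$p)

# Keep the original labels so it stays clear which p was used.
names(res) <- paste(tests$Dists, tests$DS, sep = ":")
```

Keeping the labelled Dists column alongside Method and p means the grid itself documents which p belongs to each Minkowski row.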
I'm trying to replicate the solution for applying multiple functions in sapply posted on R-Bloggers, but I can't get it to work in the desired manner. I'm working with a simple data set, similar to the one generated below:
require(datasets)
crs_mat <- cor(mtcars)
# Triangle function
get_upper_tri <- function(cormat){
cormat[lower.tri(cormat)] <- NA
return(cormat)
}
require(reshape2)
crs_mat <- melt(get_upper_tri(crs_mat))
I would like to replace some text values across columns Var1 and Var2. The erroneous syntax below illustrates what I am trying to achieve:
crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
# Replace first phrase
gsub("mpg","MPG",x),
# Replace second phrase
gsub("gear", "GeArr",x)
# Ideally, perform other changes
})
Naturally, the code is not syntactically correct and fails. To summarise, I would like to do the following:
Go through all the values in first two columns (Var1 and Var2) and perform simple replacements via gsub.
Ideally, I would like to avoid defining a separate function, as discussed in the linked post and keep everything within the sapply syntax
I don't want a nested loop
I had a look at the broadly similar subject discussed here and here but, if possible, I would like to avoid making use of plyr. I'm also interested in replacing the column values not in creating new columns and I would like to avoid specifying any column names. While working with my existing data frame it is more convenient for me to use column numbers.
Edit
Following very useful comments, what I'm trying to achieve can be summarised in the solution below:
fun.clean.columns <- function(x, str_width = 15) {
# Make character
x <- as.character(x)
# Replace various phrases
x <- gsub("perc85","something else", x)
x <- gsub("again", x)
x <- gsub("more","even more", x)
x <- gsub("abc","ohmg", x)
# Clean spaces
x <- trimws(x)
# Wrap strings (str_wrap comes from the stringr package)
x <- stringr::str_wrap(x, width = str_width)
# Return object
return(x)
}
mean_data[,1:2] <- sapply(mean_data[,1:2], fun.clean.columns)
I don't need this function in my global.env so I can run rm after this but even nicer solution would involve squeezing this within the apply syntax.
We can use mgsub from the qdap package to replace multiple patterns. Here, I loop over the first and second columns with lapply and assign the results back to crs_mat[,1:2]. Note that I use lapply rather than sapply because lapply returns a list with one element per column, which keeps the data frame structure intact, whereas sapply would simplify the result to a matrix.
library(qdap)
crs_mat[,1:2] <- lapply(crs_mat[,1:2], mgsub,
pattern=c('mpg', 'gear'), replacement=c('MPG', 'GeArr'))
Here is the start of a solution for you; I think you're capable of extending it yourself. There are probably more elegant approaches available, but I don't see them at the moment.
crs_mat[,1:2] <- sapply(crs_mat[,1:2], function(x) {
# Replace first phrase
step1 <- gsub("mpg","MPG",x)
# Replace second phrase. Note that this operates on the output of step 1, not on the original column.
step2 <- gsub("gear", "GeArr",step1)
# Ideally, perform other changes
return(step2)
#or one nested line, not practical if more needs to be done
#return(gsub("gear", "GeArr",gsub("mpg","MPG",x)))
})
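A base R sketch of the same idea without qdap, assuming the patterns and replacements are kept in two parallel vectors. Reduce applies the gsub calls one after another, so everything stays inside a single lapply call and no named helper is left in the global environment. The small data frame df and the vector names are made up for illustration; fixed = TRUE is an assumption that the patterns are literal strings.

```r
# Small stand-in for the melted correlation matrix.
df <- data.frame(Var1 = c("mpg", "gear", "cyl"),
                 Var2 = c("gear", "mpg", "hp"),
                 value = c(1, 0.5, -0.2),
                 stringsAsFactors = FALSE)

patterns     <- c("mpg", "gear")
replacements <- c("MPG", "GeArr")

df[, 1:2] <- lapply(df[, 1:2], function(x) {
  # Start from the column as character and apply each replacement in turn.
  Reduce(function(txt, i) gsub(patterns[i], replacements[i], txt, fixed = TRUE),
         seq_along(patterns),
         as.character(x))
})
```

Further cleaning steps such as trimws can be chained onto the result of Reduce inside the same anonymous function.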
In R, I want to find out the effect of character string length on computation time of a certain operation. For this, I need random character strings of different lengths. All I can think of now is:
cases1 <- letters[sample(15)]
cases2 <- paste(letters[sample(15)], letters[sample(15)], sep="")
cases3 <- paste(letters[sample(15)], letters[sample(15)], letters[sample(15)], sep="")
How do I automate that?
I don't want to keep copypasting...
Or does anyone have a better idea?
Try
n <- 3
do.call(`paste0`,as.data.frame(replicate(n, letters[sample(15)])))
If you want say 1:3
n1 <- 1:3
lapply(n1, function(.n) do.call(`paste0`,
as.data.frame(replicate(.n, letters[sample(15)]))))
Or as @Berry showed in the comments
apply(replicate(3, letters[sample(15)]), MARGIN=1, paste, collapse="")
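If the goal is simply a set of random strings for each length, a small helper may be easier to reuse. This is a sketch under two assumptions: characters are drawn with replacement from the full alphabet (unlike letters[sample(15)], which permutes the first 15 letters), and the names random_strings and cases are hypothetical.

```r
set.seed(42)  # only to make the example reproducible

# Return n_strings random strings, each len characters long.
random_strings <- function(n_strings, len, alphabet = letters) {
  vapply(seq_len(n_strings),
         function(i) paste(sample(alphabet, len, replace = TRUE), collapse = ""),
         character(1))
}

# One set of 15 strings for each length from 1 to 3.
cases <- lapply(1:3, function(len) random_strings(15, len))
```

Changing 1:3 to any other vector of lengths produces the matching list of test cases for the timing experiment.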
I have 11 lists of different length, imported into R as p1,p2,p3,...,p11. Now I want to get the rollmean (library TTR) from all lists and name the result p1y,p2y,...,p11y.
This seems to be the job for a loop, but I read that this is often not good practice in R. I tried something (foolish) like
sample=10
for (i in 1:11){
paste("p",i,"y",sep="")<-rollmean(paste("p",i,sep=""),sample)
}
which does not work.
I also tried to use it in combination with assign(), but as I understand assign can only take a variable and a single value.
As always it strikes me that I am missing some fundamental function of R.
As Manuel pointed out, your life will be easier if you combine the variables into a list. For this, you want mget (short for "multiple get").
var_names <- paste("p", 1:11, sep = "")
p_all <- mget(var_names, envir = globalenv())
Now simply use lapply to call rollmean on each element of your list.
sample <- 10
rolling_means <- lapply(p_all, rollmean, sample)
(Also, consider renaming the sample to something that isn't already a function name.)
I suggest leaving the answers as a list, but if you really like the idea of having separate rolling mean variables to match the separate p1, ..., p11 variables, then use list2env.
names(rolling_means) <- paste(var_names, "y", sep = "")
list2env(rolling_means, envir = globalenv())
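A self-contained sketch of the whole mget / lapply / list2env round trip, using two short made-up vectors instead of the eleven lists. To avoid depending on a package here, roll_mean is a hypothetical stand-in built on embed and rowMeans; in the real script rollmean would take its place. The variable env is also an assumption: it refers to the current environment, which is the global one when run at top level.

```r
# Stand-in rolling mean: mean of every window of k consecutive values.
roll_mean <- function(x, k) rowMeans(embed(x, k))

p1 <- c(1, 2, 3, 4, 5)
p2 <- c(10, 20, 30, 40)

env <- environment()                   # current environment
var_names <- paste0("p", 1:2)
p_all <- mget(var_names, envir = env)  # collect p1, p2 into a named list

rolling <- lapply(p_all, roll_mean, k = 3)

# Optional: write the results back as separate variables p1y, p2y.
names(rolling) <- paste0(var_names, "y")
invisible(list2env(rolling, envir = env))
```

Keeping the results in the list rolling is usually enough; the last two lines are only needed if separate variables are really wanted.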
You could group your lists into one and do the following
sample <- 10
mylist <- list(p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, p11)
for(i in 1:11) assign(paste('p',i,'y',sep=''), rollmean(mylist[[i]], sample))
This can be done with get and do.call (see ?get and ?do.call).
x1<-1:3
x2 <- seq(3.5,5.5,1)
for (i in 1:2) {
sx<- (do.call("sin",list(c(get(paste('x',i,sep='',collapse=''))))))
cat(sx)
}
Sloppy example, but you get the idea, I hope.
R's named vectors are incredibly handy, however, I want to combine two vectors which contain the estimates of coefficients and the standard errors for those estimates, and both vectors have the same names:
> coefficients_estimated
y0 Xit (Intercept)
1.1 2.2 3.3
> ses_estimated
y0 Xit (Intercept)
.04 .11 .007
This would be easy to solve if I knew what order the elements were in for sure, but this isn't guaranteed in my script, so I can't simply do names(ses_estimated) <- whatever. All I want to do is add either "coef" or "se" to the end of each name, and to do this, I've come up with what I think is a pretty ugly hack:
names(coefficients_estimated) <- sapply(names(coefficients_estimated),
function(name)return(paste(name,"coef",sep=""))
)
names(ses_estimated) <- sapply(names(ses_estimated),
function(name)return(paste(name,"se",sep=""))
)
Is there an idiomatic way to do this? Or am I going to have to stick with what I've written?
Assuming you're combining the vectors using c(), I don't believe there's a "pure" way to do this.
In your code above, you don't even need to use sapply. Just paste(names(coefficients_estimated), "coef", sep="") will get you the same thing. You can get a little simpler still by applying the names to the combined vector vs. separately.
If these were data frames, the suffixes argument of merge would be exactly what you want.
setNames is helpful here:
# Make fake data for test:
namedData <- function(x) setNames(x, c("y0", "Xit", "(Intercept)"))
coefficients_estimated <- namedData(c(1.1, 2.2, 3.3))
ses_estimated <- namedData(c(.04, .11, .007))
# Do it:
withNameSuffix <- function(obj, suffix) setNames(obj, paste(names(obj), suffix, sep=""))
combined <- c(withNameSuffix(coefficients_estimated, "coef"),
withNameSuffix(ses_estimated, "se"))
coef_ses_estimated <- c(coefficients_estimated,ses_estimated)
names(coef_ses_estimated) <- as.vector(outer(names(coefficients_estimated),
c("coef","se"),paste,sep="_"))
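Since the question says the element order is not guaranteed, here is a hedged sketch that first aligns the standard errors to the coefficients by name and only then appends the suffixes. The "_coef" and "_se" suffixes and the names ses_aligned and combined are illustrative choices.

```r
coefficients_estimated <- c(y0 = 1.1, Xit = 2.2, "(Intercept)" = 3.3)
# Same names, deliberately in a different order.
ses_estimated <- c("(Intercept)" = 0.007, y0 = 0.04, Xit = 0.11)

# Reorder the standard errors by name so both vectors line up.
ses_aligned <- ses_estimated[names(coefficients_estimated)]

combined <- c(
  setNames(coefficients_estimated, paste0(names(coefficients_estimated), "_coef")),
  setNames(ses_aligned, paste0(names(ses_aligned), "_se"))
)
```

Indexing by name is what makes this independent of the original order; the suffixing itself is the same paste / setNames idea as above.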