Let us say I have a data frame indicating the factor level for each individual:
I.df = data.frame(variant = sample(x=c(0,1,2), size=30, replace = TRUE), tissue = sample(x=as.factor(c('cereb','hipo','arc')), size=30, replace = TRUE))
And I also have a vector with the means for each factor:
means.tissues = c(1.2, 3, 0.5)
names(means.tissues) = c('cereb', 'hipo', 'arc')
Then I want to create a vector whose length equals the number of rows of I.df, where each value is the mean of the tissue in the corresponding row. I.e.,
ind.tissues = rep(NA, nrow(I.df))
for(i in 1:nrow(I.df))
{
ind.tissues[i] = means.tissues[names(means.tissues) == I.df$tissue[i]]
}
I think the for loop is a rather inefficient way to do this, especially for data frames with very large n. Is there a better, more efficient way to do this using vectorized code in R?
You can use match:
ind.tissues = means.tissues[match(I.df$tissue, names(means.tissues))]
The match function returns the position in argument 2 of each element in argument 1. We then use those indices to grab the correct elements in means.tissues.
Edit: As mentioned by @Joran in the comments, since means.tissues is a named vector, you can look it up by name instead of using match:
ind.tissues <- means.tissues[as.character(I.df$tissue)]
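To see why the as.character() call matters, here is a minimal sketch on a small made-up sample (the seed and the sample size of 10 are assumptions for illustration). Indexing a vector directly with a factor uses the factor's integer codes, which follow the alphabetical level order rather than the order of names(means.tissues), so it can silently return the wrong means:

```r
set.seed(1)
tissue <- factor(sample(c("cereb", "hipo", "arc"), 10, replace = TRUE))
means.tissues <- c(cereb = 1.2, hipo = 3, arc = 0.5)

# Lookup by position (match) and lookup by name give the same values
by.match <- means.tissues[match(tissue, names(means.tissues))]
by.name  <- means.tissues[as.character(tissue)]

# Indexing with the factor itself uses its integer codes
# (levels are sorted: arc = 1, cereb = 2, hipo = 3), so the values are shifted
by.factor <- means.tissues[tissue]

identical(unname(by.match), unname(by.name))   # TRUE
identical(unname(by.factor), unname(by.name))  # FALSE
```

Use unname() on the result if you do not want the tissue names carried along.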
I am simulating the number of events from a Poisson distribution (with parameter 9). For each event, I am simulating the price of the event using a lognormal distribution with parameters 2 and 1.1
I am running 100 simulations (each simulation represents a year). The code (which I am happy with) is:
simul <- list()
for(i in 1:100) {simul[[i]] <- rlnorm (rpois(1, 9), meanlog = 2, sdlog = 1.1) }
My issue is that the output "simul" is a list of lists and I don't know how to apply basic operations to it.
I want to be able to:
1. cap each individual simulated value (due to budget constraints)
2. obtain the total of all the simulated values, separately for each year (with and without capping)
3. obtain the mean value for each individual year (with and without capping)
4. calculate the 95th percentile for each year (with and without capping)
5. output the results into a dataframe (so that one column represents the total, one the mean, one the percentile etc) and each row represents a year
Something that seems to work is me pulling out individual lists:
sim1 <- simul[1]
I can now use "unlist" to flatten the list and apply any operations I want.
sims1 <- data.frame(unlist(sim1), nrow=length(sim1), byrow=F)
sims1 <- subset(sims1, select = c(unlist.sim1.))
colnames(sims1) <- "sim1"
quantile <- data.frame(quantile(sims1$sim1, probs = c(0.95)))
But I don't want to write the above logic 100 times, once for each sublist in the list. Is there a way around it?
Any help on this would be much appreciated.
You actually don't have a list of lists, you have a list of vectors. With lists, a single bracket subset will return a list, e.g. simul[1:3] will return a list of the first 3 items of simul, and for consistency simul[1] returns a list of the first item of simul.
To extract an element, rather than a length-1 list, use [[. simul[[1]] is a vector you don't need to run unlist() on.
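A quick way to see the difference, as a sketch on a shortened version of the simulation (three years instead of 100, and the seed, are assumptions for illustration):

```r
set.seed(42)
simul <- lapply(1:3, function(i) rlnorm(rpois(1, 9), meanlog = 2, sdlog = 1.1))

class(simul[1])    # "list": a length-1 list that still wraps the vector
class(simul[[1]])  # "numeric": the vector itself

# Functions such as mean() work directly on the extracted vector
mean(simul[[1]])
```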
The nice thing about lists is you can work on them with for loops or with lapply/sapply functions. For example
raw_means = sapply(simul, mean)
raw_sums = sapply(simul, sum)
raw_95 = sapply(simul, quantile, probs = 0.95)
result = data.frame(raw_means, raw_sums, raw_95)
Or with a loop,
raw_means = raw_sums = raw_95 = numeric(length(simul))
for (i in seq_along(simul)) {
raw_means[i] = mean(simul[[i]])
raw_sums[i] = sum(simul[[i]])
raw_95[i] = quantile(simul[[i]], probs = 0.95)
}
result = data.frame(raw_means, raw_sums, raw_95)
When you say "cap", I'm not sure if you mean subset or reduce values (e.g., with pmin), so I'll leave that to you. But I'd recommend making a new list, like simul_cap = lapply(simul, pmin, 9) (if that's the operation you want) and running the same code on your capped list. You could even make the summary statistics a function, so instead of copy-pasting a bunch you end up doing raw_result = foo(simul) and cap_result = foo(simul_cap).
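The function idea can be sketched as follows. The name summarise_sims, the cap of 20, and the seed are hypothetical choices for illustration, and capping is assumed to mean limiting each value with pmin:

```r
set.seed(123)
simul <- lapply(1:100, function(i) rlnorm(rpois(1, 9), meanlog = 2, sdlog = 1.1))

# One row per year: total, mean and 95th percentile of that year's values
summarise_sims <- function(sims) {
  data.frame(
    year  = seq_along(sims),
    total = sapply(sims, sum),
    mean  = sapply(sims, mean),
    p95   = sapply(sims, quantile, probs = 0.95, names = FALSE)
  )
}

cap <- 20                              # assumed budget cap per event
simul_cap <- lapply(simul, pmin, cap)  # cap every simulated value

raw_result <- summarise_sims(simul)
cap_result <- summarise_sims(simul_cap)

# Raw and capped statistics side by side, one row per year
result <- merge(raw_result, cap_result, by = "year", suffixes = c("_raw", "_cap"))
head(result)
```

A year with zero events gives a total of 0 and a mean of NaN, so you may want to handle that case explicitly.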
I have a vector of numeric values (vals.to.convert in the example code below) representing elevations in meters. I need to replace each value with a related metric that is associated with 1-meter bins (the 'becomes' column of the conversion.df data.frame below).
Right now I'm using cut() with conversion.df$becomes as the labels then coercing with as.character() and as.numeric() to get the binned numeric conversion.
Can anyone recommend a more efficient and elegant way to do this?
For example, with a raster, you can use raster::reclassify and a data.frame structured like conversion.df to make the substitution.
Here is example code:
vals.to.convert <- sample(1:80, 500, replace = T)
conversion.df <- data.frame(from = 0:79,
to = 1:80,
becomes = runif(80))
converted <- as.numeric(as.character(cut(vals.to.convert, 0:nrow(conversion.df), labels = conversion.df$becomes)))
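You could use findInterval. Pass left.open = TRUE so that the bins are right-closed, (from, to], which matches the default behaviour of cut:
converted <- conversion.df$becomes[
findInterval(vals.to.convert, conversion.df$from, left.open = TRUE)]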
or cut
converted <- conversion.df$becomes [cut(vals.to.convert, 0:80)]
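Whichever shortcut you choose, it is worth checking it against the original cut()-based conversion. The sketch below does that on the example data (the seed is an assumption), using findInterval with left.open = TRUE so that its bins are right-closed like those of cut:

```r
set.seed(7)
vals.to.convert <- sample(1:80, 500, replace = TRUE)
conversion.df <- data.frame(from = 0:79, to = 1:80, becomes = runif(80))

# Original approach: labels round-trip through character, so compare with all.equal
reference <- as.numeric(as.character(
  cut(vals.to.convert, 0:nrow(conversion.df), labels = conversion.df$becomes)))

# Index 'becomes' by bin number instead of converting labels
by.interval <- conversion.df$becomes[
  findInterval(vals.to.convert, conversion.df$from, left.open = TRUE)]
by.cut <- conversion.df$becomes[cut(vals.to.convert, 0:80)]

identical(by.interval, by.cut)             # TRUE
isTRUE(all.equal(by.interval, reference))  # TRUE
```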
I have a data set that contains multiple attributes with integer values from 1 to 5 and I would like to rescale these attributes so that their values range from -1 to 1. My current code that I have is
newdata$Rats = rescale(newdata$Rats, to = c(-1,1), from=c(1,5))
Where newdata is my dataset and Rats is one of my attributes. If I only had a few attributes to change that would be fine, but I have about 30 or so to change. Is there a way to use a for loop to do this or use the select function that R has or possibly another way?
Use lapply():
newdata[, c(1:30)] <- lapply(newdata[, c(1:30)],
function(x) rescale(x, to = c(-1, 1), from = c(1, 5)))
For the c(1:30), insert a vector of either positions of your variables within your dataframe, or a vector of the names of your variables as strings.
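Here is a small sketch of the same pattern selecting columns by name. To keep it free of package dependencies it uses rescale_linear, a hypothetical base-R stand-in for rescale that applies the same linear map. The toy data frame and its Cats column are made up:

```r
# Hypothetical stand-in for rescale(x, to, from): a plain linear map
rescale_linear <- function(x, to = c(-1, 1), from = c(1, 5)) {
  (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1]) + to[1]
}

newdata <- data.frame(id = 1:4, Rats = c(1, 3, 5, 2), Cats = c(5, 4, 1, 3))

# Only the named columns are rescaled; 'id' is left untouched
cols <- c("Rats", "Cats")
newdata[cols] <- lapply(newdata[cols], rescale_linear)

newdata$Rats  # -1.0  0.0  1.0 -0.5
```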
I am trying to write a for loop where if the cell of one matrix matches a letter it then fills a blank matrix with the entire row that matched. Here is my code
mets<-data.frame(read.csv(file="Metabolite_data.csv",header=TRUE))
full<-length(mets[,6])
A=matrix(,nrow=4930,ncol=8, byrow=T)
for (i in 1:full){
if (mets[i,6]=="A") (A[i,]=(mets[i,]))
}
If I replace the i in the if statement with a single number, it works and fills that row of matrix A; however, it will not fill more than one row. TIA
You might be getting problems going from data frame to matrix. It could be that just using "mets" as a matrix instead of a data frame could solve your problem, or you could use as.matrix within your for loop. An example of the latter with made-up data since I don't have your "metabolite_data.csv":
mets <- matrix(sample(LETTERS[1:4], 80, replace = TRUE), nrow = 10, ncol = 8)
mets <- as.data.frame(mets)
A <- matrix(nrow = nrow(mets), ncol = ncol(mets), byrow = TRUE)
for(i in 1:nrow(mets)){
if(mets[i,6] == "A"){
A[i,] = as.matrix(mets[i,])
}
}
print(A)
You may want to specify ncol=dim(mets)[2] to make sure the matrix has the same number of columns as the rows you are copying into it.
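If the goal is simply to collect the rows whose sixth column equals "A", a loop-free sketch is to subset with a logical vector. The made-up data below mirrors the example above. Note that this keeps only the matching rows rather than leaving NA rows in between, which is an assumption about what you want:

```r
set.seed(1)
mets <- as.data.frame(
  matrix(sample(LETTERS[1:4], 80, replace = TRUE), nrow = 10, ncol = 8))

# TRUE for every row whose 6th column is "A"
keep <- mets[, 6] == "A"

# drop = FALSE keeps the result two-dimensional even with a single match
A <- as.matrix(mets[keep, , drop = FALSE])
print(A)
```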
I have a time series with multiple columns, some have NAs in them, for example:
library(xts)
date.a<-seq(as.Date('2014-01-01'),as.Date('2014-02-01'),by = 2)
date.b<-seq(as.Date('2014-01-01'),as.Date('2014-02-15'),by = 3)
df.a <- data.frame(time=date.a, A=sin((1:16)*pi/8))
df.b <- data.frame(time=date.b, B=cos((1:16)*pi/8))
my.ts <- merge(xts(df.a$A,df.a$time),xts(df.b$B,df.b$time))
I'd like to apply a function to each of the rows, in particular:
prices2percreturns <- function(x){100*diff(x)/x}
I think that sapply should do the trick, but
sapply(my.ts, prices2percreturns)
gives Error in array(r, dim = d, dimnames = if (!(is.null(n1 <- names(x[[1L]])) & :
length of 'dimnames' [1] not equal to array extent. I suspect that this is due to the NAs when merging, but maybe I'm just doing something wrong. Do I need to remove the NAs or is there something wrong with the length of the vector returned by the function?
Per the comments, you don't actually want to apply the function to each row. Instead you want to leverage the vectorized nature of R. i.e. you can simply do this
100*diff(my.ts)/my.ts
If you do want to apply a function to each row of a matrix (which is what an xts object is), you can use apply with MARGIN=1. i.e. apply(my.ts, 1, myFUN).
sapply(my.ts, myFUN) would work like apply(my.ts, 2, myFUN) in this case -- applying a function to each column.
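The difference between the two margins is easiest to see on a tiny plain matrix (a made-up example, not the xts object from the question):

```r
m <- matrix(1:6, nrow = 2)  # columns are (1,2), (3,4), (5,6)

apply(m, 1, sum)  # per row:    9 12
apply(m, 2, sum)  # per column: 3 7 11

# sapply over a data frame also works column by column
sapply(as.data.frame(m), sum)
```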
Your diff(x) will be one element shorter than x, so the division does not line up. You also want returns relative to the starting price, not the ending price. Here I change the function to reflect that and apply it per column.
prices2percreturns <- function(x){100*diff(x)/x[-length(x)]}
prcRets = apply(my.ts, 2, prices2percreturns)
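A worked example on a plain numeric vector (made-up prices) shows the difference between dividing by the starting price and dividing by the ending price:

```r
prices <- c(100, 110, 99)

# Change relative to the price at the start of each period
start_based <- 100 * diff(prices) / prices[-length(prices)]
start_based  # 10 -10

# Change relative to the price at the end of each period
end_based <- 100 * diff(prices) / prices[-1]
round(end_based, 2)  # 9.09 -11.11
```

Both results have length(prices) - 1 elements, one per period.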