I have a data set that contains multiple attributes with integer values from 1 to 5, and I would like to rescale these attributes so that their values range from -1 to 1. My current code is
newdata$Rats = rescale(newdata$Rats, to = c(-1,1), from=c(1,5))
Here newdata is my dataset and Rats is one of my attributes. That would be fine if I only had a few attributes to change, but I have about 30. Is there a way to do this with a for loop, with the select function that R has, or in some other way?
Use lapply():
library(scales)  # provides rescale()
newdata[, c(1:30)] <- lapply(newdata[, c(1:30)],
                             function(x) rescale(x, to = c(-1, 1), from = c(1, 5)))
For the c(1:30), insert a vector of either positions of your variables within your dataframe, or a vector of the names of your variables as strings.
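A minimal sketch of the name-based variant. The toy data frame `ratings`, its column names, and the helper `rescale_1_5` are hypothetical; the helper is a base-R stand-in for `scales::rescale(x, to = c(-1, 1), from = c(1, 5))` so the example runs without extra packages:

```r
# Toy data: two 1-5 rating columns plus an id column that must stay untouched.
# `ratings`, its column names, and `rescale_1_5` are made up for illustration.
ratings <- data.frame(id = 1:3, Rats = c(1, 3, 5), Cats = c(2, 4, 5))

# Linear map sending 1 to -1 and 5 to 1.
rescale_1_5 <- function(x) (x - 1) / (5 - 1) * 2 - 1

# Select the columns by name instead of by position.
cols <- c("Rats", "Cats")
ratings[cols] <- lapply(ratings[cols], rescale_1_5)
ratings
```

Selecting by name keeps the code correct even if the column order of the data frame changes later.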
Related
I am trying to automatically generate variable names (that all follow the same naming pattern x1,x2,...) within the expand.grid function in R.
Here is a miniature version of my problem:
I have a list of 25 variable names and a matrix with 25 columns and 5 rows. I have to assign the names in the list to the corresponding matrix columns, but this has to be done within expand.grid (my ultimate goal is to generate the grid of data, but I also need the variable names for further processing).
Here is some code:
## Create list holding 25 variable names- I actually have way more than 25
ln <- vector(mode="list", length=25)
for (i in 1:25){
ln[[i]]<-paste("x", i, sep="")
}
ln
ln[[1]]
## Make matrix dataset without variable/column names
sampl<- matrix(rnorm(25) , nrow = 5, ncol=25)
sampl
require(MASS)
## WHAT I WANT
## Automatically assign variable names from list ln within expand.grid
test <- expand.grid(x1=seq(min(sampl[,1]-1), max(sampl[,1]+1),
by=0.1),
x2=seq(min(sampl[,2]-1), max(sampl[,2]+1),
by=0.1),
x3=seq(min(sampl[,3]-1), max(sampl[,3]+1),
by=0.1),
\\...and so on ...\\
x25=seq(min(sampl[,25]-1), max(sampl[,25]+1),
by=0.1)
)
Alternatively, is there any way to generate the data grid that I want from the matrix and then assign names from the list without using expand.grid?
We can use apply to loop over the columns, build the sequence from min - 1 to max + 1 in steps of 0.1, name the resulting list with the values in 'ln', and pass it to expand.grid. Note that the number of combinations will be huge, and that apply returns a matrix instead of a list if every sequence happens to have the same length.
test2 <- expand.grid(setNames(apply(sampl, 2, \(x) seq(min(x-1),
max(x + 1), by = 0.1)), unlist(ln)))
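As a sketch of the same idea on a grid small enough to inspect, the example below uses lapply() over the column indices, which always returns a list regardless of the sequence lengths. The matrix `small`, the coarse step of 1, and the names `var_names` and `grid_axes` are assumptions made for illustration:

```r
# Hypothetical 3-column example with a coarse step so the grid stays small.
small <- matrix(c(0, 1,  0, 2,  0, 3), nrow = 2)  # columns: (0,1), (0,2), (0,3)
var_names <- paste0("x", seq_len(ncol(small)))

# One axis per column: from (min - 1) to (max + 1) in steps of 1.
# lapply() returns a list even when all sequences have equal length.
grid_axes <- lapply(seq_len(ncol(small)), function(j) {
  seq(min(small[, j]) - 1, max(small[, j]) + 1, by = 1)
})

# Name the axes, then build every combination.
grid <- expand.grid(setNames(grid_axes, var_names))
dim(grid)  # rows = product of the axis lengths, one column per variable
```

The row count is the product of the axis lengths, which is why 25 variables with a step of 0.1 quickly become infeasible.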
I have a vector of numeric values (vals.to.convert in the example code below) representing elevations (in meters). I need to replace each value with a related metric that is associated with 1-meter bins (the data in the 'becomes' column of the conversion.df data.frame below).
Right now I'm using cut() with conversion.df$becomes as the labels then coercing with as.character() and as.numeric() to get the binned numeric conversion.
Can anyone recommend a more efficient and elegant way to do this?
For example, with a raster, you can use raster::reclassify and a data.frame structured like conversion.df to make the substitution.
Here is example code:
vals.to.convert <- sample(1:80, 500, replace = T)
conversion.df <- data.frame(from = 0:79,
to = 1:80,
becomes = runif(80))
converted <- as.numeric(as.character(cut(vals.to.convert, 0:nrow(conversion.df), labels = conversion.df$becomes)))
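You could use findInterval. The question's cut() call uses right-closed bins, (from, to], so set left.open = TRUE to match it:
converted <- conversion.df$becomes[
  findInterval(vals.to.convert, conversion.df$from, left.open = TRUE)]
Or use cut and index with the resulting factor, which is treated as its integer codes:
converted <- conversion.df$becomes[cut(vals.to.convert, 0:80)]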
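A quick consistency check, assuming the bins are meant to be right-closed, (from, to], as in the question's cut() call. The seed and the names `via_labels`, `via_cut`, and `via_find` are arbitrary choices for this sketch:

```r
set.seed(1)  # arbitrary seed so the check is reproducible
vals.to.convert <- sample(1:80, 500, replace = TRUE)
conversion.df <- data.frame(from = 0:79, to = 1:80, becomes = runif(80))

# Question's approach: labels -> character -> numeric.
via_labels <- as.numeric(as.character(
  cut(vals.to.convert, 0:nrow(conversion.df), labels = conversion.df$becomes)))

# Index lookup with cut(): the factor is used as integer bin numbers.
via_cut <- conversion.df$becomes[cut(vals.to.convert, 0:nrow(conversion.df))]

# Index lookup with findInterval(), using right-closed bins.
via_find <- conversion.df$becomes[
  findInterval(vals.to.convert, conversion.df$from, left.open = TRUE)]

identical(via_cut, via_find)
all.equal(via_labels, via_find)
```

The comparison with the label-based result uses all.equal() because the round trip through character can drop the last digits of a double; the index lookups return the stored values unchanged.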
I have a data set x that consists of 4 columns. When I apply range(x), I get a single answer for the whole data set. How can I get the range for each row of the 4 columns without using loops?
This is a typical case for functions of the *apply-family, which are technically loops with a special syntax. In your case, you can use
apply(X = x, MARGIN = 1, FUN = range)
This tells R to apply the function range() to each row, as expressed by MARGIN = 1 (MARGIN = 2 would do the same for each column). The result is a matrix with two rows, the minima and the maxima, and one column per row of x.
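A small sketch of the shape of the result, using a hypothetical 3 x 4 matrix `x`; transposing gives one row of output per row of input:

```r
# Hypothetical data: 3 rows, 4 columns.
x <- matrix(c(1, 5, 3, 2,
              9, 4, 7, 8,
              6, 6, 0, 2), nrow = 3, byrow = TRUE)

r <- apply(X = x, MARGIN = 1, FUN = range)
dim(r)  # 2 x nrow(x): first row holds the minima, second row the maxima

# Transpose so that each row of the result corresponds to a row of x.
row_ranges <- t(r)
colnames(row_ranges) <- c("min", "max")
row_ranges
```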
I am trying to create new variables in a dataframe that represent multiple lags. I have one time series in it right now "series" and I would like to create 10 different variables, each representing a certain lag of "series". So the resulting data frame would have the original variable "series," plus 10 variables named (1, 2, 3, 4, ... 10) that would represent that number of lags. I am currently trying this on a for loop:
for (i in 1:max.lag){
lag.death$"i" <- lag(tscampos, i)
}
But after reading here, I suspect I might want to use one of the apply functions? Any ideas?
Here you go: this function gives you a lagged version of your series whenever you need it (which I find better than storing each lagged copy of the same series in 10 different columns).
lag.death = data.frame(series = floor(runif(10,0,100)));
lag.death$series
lagit4me = function(serie,lag){
n = length(serie);
pad = rep(0,lag);
return(c(pad,serie)[1:n]);
}
lagit4me(lag.death$series,1);
lagit4me(lag.death$series,3);
You can then tweak it to allow negative lags, and so on.
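One possible tweak, sketched under assumptions: a hypothetical helper `shift_vec` (not part of the answer above) that pads with NA instead of 0 and treats a negative `k` as a lead:

```r
# Hypothetical helper: shift x by k positions, padding with `fill`.
# k > 0 lags the series, k < 0 leads it, k = 0 returns it unchanged.
shift_vec <- function(x, k, fill = NA) {
  n <- length(x)
  if (abs(k) >= n) return(rep(fill, n))  # nothing of x is left after the shift
  if (k >= 0) {
    c(rep(fill, k), x[seq_len(n - k)])
  } else {
    c(x[seq(from = 1 - k, to = n)], rep(fill, -k))
  }
}

shift_vec(1:5, 2)   # lag by two positions
shift_vec(1:5, -2)  # lead by two positions
```

Padding with NA rather than 0 keeps the missing observations distinguishable from real zeros in the series.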
(But if you really need it:)
allIn1 = lapply(0:10,lagit4me,serie=lag.death$series);
allIn1 = data.frame(allIn1);
names(allIn1) = 0:10;
allIn1
Enjoy :)
You can also use purrr::map(), similar to lapply() above. This uses dplyr::lag(), instead of lagit4me()
library(dplyr)
library(purrr)
num.lags <- 0:10
list.lags <-
purrr::map(
.x = num.lags,
.f = ~ dplyr::lag(lag.death$series, .x)  # the series column from the data frame above
)
Note that you need to name the list elements before coercing the list to a tibble:
chr.lags <- paste0("lag_", num.lags)
names(list.lags) <- chr.lags
tbl.model.subset.lags <-
  dplyr::bind_rows(list.lags)
This produces a tbl with 11 variables, the input variable (lag_0) and 10 lagged variables (with NAs)
print(tbl.model.subset.lags)
Let us say I have a data frame indicating the factor level for each individual:
I.df = data.frame(variant = sample(x=c(0,1,2), size=30, replace = TRUE), tissue = sample(x=as.factor(c('cereb','hipo','arc')), size=30, replace = TRUE))
And I also have a vector with the means for each factor:
means.tissues = c(1.2, 3, 0.5)
names(means.tissues) = c('cereb', 'hipo', 'arc')
Then I want to create a vector whose length equals the number of rows of I.df, where each value is the mean of the respective tissue for a given row, i.e.,
ind.tissues = rep(NA, nrow(I.df))
for(i in 1:nrow(I.df))
{
ind.tissues[i] = means.tissues[names(means.tissues) == I.df$tissue[i]]
}
I think the for loop is a rather inefficient way to do this, especially for matrices with very large n. Is there a better, more efficient way to do this using vectorized code in R?
You can use match:
ind.tissues = means.tissues[match(I.df$tissue, names(means.tissues))]
The match function returns the position in argument 2 of each element in argument 1. We then use those indices to grab the correct elements in means.tissues.
Edit: As mentioned by @Joran in the comments, since means.tissues is a named vector, you can look it up by name instead of using match:
ind.tissues <- means.tissues[as.character(I.df$tissue)]
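A small deterministic sketch of both lookups, with a hypothetical four-element factor `tissue`. It also shows why the as.character() call matters: indexing with the factor itself uses its integer codes, which follow the alphabetical level order rather than the order of means.tissues:

```r
# Hypothetical data; levels are sorted alphabetically: arc, cereb, hipo.
tissue <- factor(c("hipo", "arc", "cereb", "arc"))
means.tissues <- c(cereb = 1.2, hipo = 3, arc = 0.5)

by_match  <- means.tissues[match(tissue, names(means.tissues))]
by_name   <- means.tissues[as.character(tissue)]

# Pitfall: a factor index is treated as integer codes (3, 1, 2, 1 here),
# so this picks elements by position, not by name.
by_factor <- means.tissues[tissue]

unname(by_match)   # 3.0 0.5 1.2 0.5
unname(by_factor)  # 0.5 1.2 3.0 1.2
```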