Alternative to for loop in R - r

I have this script:
library(dplyr) # needed for filter() in the loop below
x<-seq(1,5)
y<-seq(6,10)
z<-sample(25)
x.range <- range(x)
y.range <- range(y)
df <- expand.grid(x = seq(from = x.range[1], to = x.range[2], by = 1), y = seq(from = y.range[1],
to = y.range[2], by = 1))
df$z<-z
x1<-c(1,2,3)
y1<-c(6,7,8)
z1<-c(10,12,13)
df_1<-data.frame(x1,y1,z1)
n<-length(df_1$x1)
df_pred<-data.frame(0,0,0)
names(df_pred)[1:3] <- c("x", "y", "z_pred")
for(i in 1:n)
{df_pred[i,]<-filter(df, x==df_1$x1[i], y==df_1$y1[i])}
sqm <- mean((df_pred[,3]-df_1[,3])^2)
I want to calculate the mean squared error between the z values of df and the z1 values of df_1. To do this, I use a for loop to extract the rows I need from df, based on the x1 and y1 values of df_1.
Is there an alternative to this for loop that does the same thing (for example, using the dplyr package)? Thanks.

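If you name the columns of df_1 "x", "y" and "z", as in df, you can join the two data frames on x and y. Because both have a z column, the result contains them as z.x (from df) and z.y (from df_1):
df_1 <- data.frame(x=x1,y=y1,z=z1)
library(dplyr)
inner_join(df,df_1,by=c("x","y"))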

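I am not sure what your loop is for, but you could try this as a replacement:
df_pred <- subset(df, x %in% df_1$x1 & y %in% df_1$y1)
Note that %in% tests x and y independently, so this keeps every combination of the listed x and y values rather than only the exact (x1, y1) pairs. Let me know if it solves your problem.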
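For comparison, the same lookup can be sketched in base R without a loop. This is a minimal example assuming the data frames from the question; merge() matches rows on the exact (x, y) pairs, and set.seed() is an addition here, used only to make the run reproducible.

```r
set.seed(1)  # assumption: not in the original script, added for reproducibility
df <- expand.grid(x = 1:5, y = 6:10)
df$z <- sample(25)
df_1 <- data.frame(x1 = c(1, 2, 3), y1 = c(6, 7, 8), z1 = c(10, 12, 13))

# Keep only the rows of df whose (x, y) pair appears in df_1;
# the result has the columns x, y, z and z1 side by side
matched <- merge(df, df_1, by.x = c("x", "y"), by.y = c("x1", "y1"))

# Mean squared error between the z values of df and the z1 values of df_1
sqm <- mean((matched$z - matched$z1)^2)
```

Because the key columns keep different names on each side, there is no z.x / z.y clash to clean up afterwards.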

Related

How to create a single data frame with multiple vectors result of a loop operation?

I have a .wav file and want to get power spectra for successive non-overlapping time windows.
The power spectrum data is obtained with the following code, once the seewave and tuneR libraries are loaded:
n <- 0:1
sound1 <- readWave("D:\\sound.wav")
result <- do.call(cbind, lapply(n, function(x)
meanspec(sound1,from=x,to=x+1,wl=16,plot=FALSE)))
result1 <- data.frame(result)
The output will be
structure(list(x = c(0, 2.75625, 5.5125, 8.26875, 11.025, 13.78125,
16.5375, 19.29375), y = c(1, 0.551383594277632, 0.0742584974502194,
0.0399059818168578, 0.0218500553648978, 0.0176655910374274,
0.00904887363707214,
0.00333698474894753), x.1 = c(0, 2.75625, 5.5125, 8.26875, 11.025,
13.78125, 16.5375, 19.29375), y.1 = c(1, 0.558106398109396,
0.145460335046358,
0.0804097312947365, 0.0476025570412434, 0.0393549921764155,
0.0203584314573552,
0.00737927765210362)), class = "data.frame", row.names = c(NA,
But in the resulting data frame I only need y and y.1, not x and x.1. As you may notice, x and x.1 hold the same data, so that information is redundant. In short: I only need the y data.
Thank you for your suggestions!
There are several ways to do this. I don't know the length of the vector involved or the format in which meanspec returns its data, so you will have to fill that in yourself:
vec_length <- length(amplitude_vector)
wav_df <- data.frame(matrix(nrow = 0, ncol = vec_length + 1))
for(i in 0:(end-1)){
#Add relevant code to get the amplitude vector from the function below
amp_vec <- meanspec(sound1, from = i, to = i+1, plot = FALSE)...
wav_df <- rbind(wav_df,c(i,amp_vec))
}
colnames(wav_df) <- c("start-time",...)#Add in the other column names that you want
wav_df should then have the information you want.
You may use lapply -
n <- 0:9 #to end at 9-10;change as per your preference
sound1 <- readWave("D:\\Sound.wav")
result <- do.call(rbind, lapply(n, function(x)
meanspec(sound1,from=x,to=x+1,plot=FALSE)))
result
#to get dataframe as output
#result <- data.frame(result)
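If the combined data frame has already been built with cbind(), the redundant frequency columns can also be dropped afterwards by name. The sketch below uses a small hypothetical stand-in for the result (the values are illustrative only) and assumes the column names follow the x, y, x.1, y.1 pattern shown in the question.

```r
# Hypothetical stand-in for the cbind() result: two windows, each with x and y
result1 <- data.frame(
  x   = c(0, 2.75625, 5.5125),
  y   = c(1, 0.55, 0.07),
  x.1 = c(0, 2.75625, 5.5125),
  y.1 = c(1, 0.56, 0.15)
)

# Select every amplitude column (y, y.1, ...) by its name
is_y <- grepl("^y", names(result1))
amplitudes <- result1[, is_y, drop = FALSE]

# Optionally keep the shared frequency axis once, as the first column
spectra <- cbind(freq = result1$x, amplitudes)
```

Selecting by name pattern keeps working when more windows are added, since each extra window only contributes another y.N column.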

Adding a column to a data frame by calculating each value to be added

Good evening,
I asked a question earlier and found the solution hard to implement, so I am asking again more clearly.
I want to add a column to a data frame of daily stock returns. Let's say the returns are normally distributed, and I would like to add a column containing the historical value at risk, computed by a function I wrote myself.
The restriction is that, for each observation, my function should use that observation together with the previous 249 observations.
When the next observation is calculated, it should again use only the 249 observations of the days before it, so the input window moves forward with time. In other words, values from 251 days ago should be excluded. Hopefully I explained myself well enough. If not, maybe the code speaks for me:
df<- data.frame(Date=seq(ISOdate(2000,1,1), by = "days", length.out = 500), Returns=rnorm(500))
#function
VaR.hist<- function(x, n=250, hd=20, q=0.05){
width<-nrow(x)
NA.x<-na.omit(x)
quantil<-quantile(NA.x[(width-249):width],probs=q)
VaR<- quantil*sqrt(hd)%>%
return()
}
# Run the function on the dataframe
df$VaR<- df$Returns%>%VaR.hist()
Error in (width - 249):width : argument of length 0
This is the error I get instead of my new variable...
Thanks !!
As wibom wrote in the comments, nrow(x) does not work for vectors; you need length() instead. You also do not need return() on the last line, because R automatically returns the value of the last expression in a function unless an earlier return() is reached.
library(dplyr)
df<- data.frame(Date=seq(ISOdate(2000,1,1), by = "days", length.out = 500), Returns=rnorm(500))
#function
VaR.hist <- function(x, n=250, hd=20, q=0.05){
width <- length(x) # x is a vector, so use length(); nrow() only works for data frames and matrices
NA.x <- na.omit(x)
quantil <- quantile(NA.x[(width-249):width], probs = q)
quantil*sqrt(hd)
}
# Run the function on the dataframe
df$VaR <- df$Returns %>% VaR.hist()
It's a bit hard to understand exactly what you want to do.
My understanding is that you wish to compute a new variable VaR, calculated from the current and previous 249 observations of df$Returns, right?
Is this roughly what you want to do?
library(tidyverse)
set.seed(42)
df <- tibble(
Date = seq(ISOdate(2000, 1, 1), by = "days", length.out = 500),
Returns=rnorm(500)
)
the_function <- function(i, mydata, hd = 20, q = .05) {
r <-
mydata %>%
filter(ridx <= i, ridx > i - 250) %>% # current row plus the previous 249
pull(Returns)
quantil <- quantile(r, probs = q)
VaR <- quantil*sqrt(hd)
}
df <-
df %>%
mutate(ridx = row_number()) %>%
mutate(VaR = map_dbl(ridx, the_function, mydata = .))
If you are looking for a base-R solution:
set.seed(42)
df <- data.frame(
Date = seq(ISOdate(2000, 1, 1), by = "days", length.out = 500),
Returns = rnorm(500)
)
a_function <- function(i, mydata, hd = 20, q = .05) {
r <- mydata$Returns[mydata$ridx <= i & mydata$ridx > (i - 250)] # current row plus the previous 249
quantil <- quantile(r, probs = q)
VaR <- quantil*sqrt(hd)
}
df$ridx <- 1:nrow(df) # add index
df$VaR <- sapply(df$ridx, a_function, mydata = df)
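The moving window can also be written directly on the returns vector, without an index column. The following is a rough sketch, assuming a window of n = 250 observations (the current one plus the previous 249) and returning NA until a full window is available; rolling_var is a hypothetical helper name, not part of any package.

```r
set.seed(42)
returns <- rnorm(500)

# Rolling historical VaR: for each day i, use the n most recent returns,
# including day i itself, and scale the quantile by sqrt(hd)
rolling_var <- function(x, n = 250, hd = 20, q = 0.05) {
  vapply(seq_along(x), function(i) {
    if (i < n) return(NA_real_)  # not enough history yet
    window <- x[(i - n + 1):i]
    unname(quantile(window, probs = q, na.rm = TRUE)) * sqrt(hd)
  }, numeric(1))
}

var_hist <- rolling_var(returns)
```

The result has one value per observation, so it can be assigned straight to a column, for example df$VaR <- rolling_var(df$Returns). Unlike the versions above, the first 249 entries are NA rather than being computed from a shorter window; which behaviour is preferable depends on the use case.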

Function to change all variables of factor type to lower case

I need to create a function that changes all my factor variables to lower case.
This is what I have so far:
change_lower=function(x){if(is.factor(x)) tolower(x)}
But I think I'm doing something wrong; maybe the if isn't right for what I want. Any ideas?
You can use mutate_if if you want to automatically convert a large number of columns. Be sure to convert to character first (as @DanY pointed out):
library(dplyr)
# stringsAsFactors = TRUE is needed from R 4.0 on, where strings are no longer factors by default
df <- data.frame(x = c(1,2,3), y = c("A","B","C"), z = c("i","K","l"), stringsAsFactors = TRUE)
df <- df %>% mutate_if(is.factor, function(x) tolower(as.character(x)))
In base R:
df <- data.frame(x = c(1,2,3), y = c("A","B","C"), z = c("i","K","l"), stringsAsFactors = TRUE)
ind <- names(df)[sapply(df, is.factor)]
for (i in ind){
df[[i]] <- tolower(as.character(df[[i]]))
}
or
df[,ind] <- lapply(ind, function(x) tolower(as.character(df[[x]])))
# Input data:
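df <- data.frame(x = c(1,2,3), y = c("A","B","C"), z = c("i","K","l"), stringsAsFactors = TRUE)
# Convert factors to lowercase (df[] keeps the data frame structure; plain df <- lapply(...) would return a list):
df[] <- lapply(df, function(x){if(is.factor(x)) as.factor(tolower(as.character(x))) else x})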
# Proof:
str(df)
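Another option is to lowercase the factor levels directly, which keeps the columns as factors and avoids the round trip through character. This is a minimal sketch; lower_factor_levels is a hypothetical helper name introduced for illustration.

```r
# Lowercase the levels of every factor column; other columns are left untouched
lower_factor_levels <- function(data) {
  data[] <- lapply(data, function(col) {
    if (is.factor(col)) levels(col) <- tolower(levels(col))
    col
  })
  data
}

df <- data.frame(x = c(1, 2, 3), y = c("A", "B", "C"), z = c("i", "K", "l"),
                 stringsAsFactors = TRUE)
df_lower <- lower_factor_levels(df)
str(df_lower)
```

If two levels differ only by case (for example "A" and "a"), assigning the lowercased levels merges them into a single level.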

How to filter values if are within ranges to add as a new column in data frame

I have two big data frames named "X" and "Y" that look like this:
Y <- data.frame(chrom = c(21,21,21,21,21),
chromStart = c(14720086, 14759761, 14799594, 14847192, 14860997),
chromEnd = c(14722086, 14761761, 14801594, 14849192, 14862997),
TargetGenes = c("ENSG00000185390", "ENSG00000175302",
"ENSG00000175302", "ENSG00000219280", "ENSG00000226930"))
X <- data.frame(POS = c(14720573, 14720652, 14721241, 14721279, 14721280))
and here is what I have tried:
X$TargetGene <- apply(X$POS, 1, function(v) ifelse(length(k <- which(v >= Y$chromStart & v <= Y$chromEnd))>0,
as.character(Y$TargetGenes[k]), NA))
I am looking for an R solution that checks, for every row in "X", whether X$POS lies between Y$chromStart and Y$chromEnd, and if so adds the corresponding Y$TargetGenes value to "X" in a new column named "TargetGene".
Several values of X$POS may fall into the same range in the Y data frame. It is also possible that some X$POS values do not match any range in Y.
P.S. All values of Y$chrom are the same.
Desired output:
> X
POS TargetGene
14720573 ENSG0000017
14720652 ENSG0000017
14721241 <NA>
15221776 ENSG0000022
I would appreciate any insight.
Try with:
X$TargetGenes <- apply(X, 1,
function(v) {
ifelse(length(k <- which(v >= Y$chromStart & v <= Y$chromEnd)) > 0,
as.character(Y$TargetGenes[k]),
NA)
})
You can have a look at fuzzyjoin, shown here on the built-in iris data:
library(dplyr) # provides %>% and tibble()
data <- tibble(X = 1:5)
iris %>%
fuzzyjoin::fuzzy_left_join(data,
by = c("Sepal.Length" = "X",
"Sepal.Width" = "X"),
match_fun = c(`>`, `<`))
However, this might be rather slow, depending on your situation.
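For a modest number of ranges, the lookup can also be sketched in base R on the question's own data. The helper name first_gene is hypothetical, and taking only the first matching range is an assumption made here to keep one value per position.

```r
Y <- data.frame(chrom = c(21, 21, 21, 21, 21),
                chromStart = c(14720086, 14759761, 14799594, 14847192, 14860997),
                chromEnd = c(14722086, 14761761, 14801594, 14849192, 14862997),
                TargetGenes = c("ENSG00000185390", "ENSG00000175302",
                                "ENSG00000175302", "ENSG00000219280", "ENSG00000226930"),
                stringsAsFactors = FALSE)
X <- data.frame(POS = c(14720573, 14720652, 14721241, 14721279, 14721280))

# Return the gene of the first range containing pos, or NA if no range matches
first_gene <- function(pos, ranges) {
  hit <- which(pos >= ranges$chromStart & pos <= ranges$chromEnd)
  if (length(hit) > 0) as.character(ranges$TargetGenes[hit[1]]) else NA_character_
}

# vapply works on the POS vector directly; apply() would need a matrix or data frame
X$TargetGene <- vapply(X$POS, first_gene, character(1), ranges = Y)
```

With the sample data, all five positions fall inside the first range. This scans every range for every position, so for very large inputs a dedicated interval join is likely to be faster.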

r - Reordering only certain parts of a column in a data frame

I want to create a data frame of coordinates which lists points in the following order:
i.e. left to right in the first row, then right to left in the second row, etc.
I can generate all the points by doing this:
x_val <- -3:3; x_len <- length(x_val)
y_val <- -2:2; y_len <- length(y_val)
expand.grid(x=x_val, y=rev(y_val))
But it's not in the right order so I tried this:
df <- lapply(1:y_len, function(i){
data.frame(x=(-1)^(1+i)*x_val,
y=rep(-y_val[i],x_len))})
df <- do.call("rbind", df)
Although it works, is there a cleaner way to achieve what I want to do? Thanks.
I am not sure if it is cleaner, but here it goes:
x_val <- -3:3; x_len <- length(x_val)
y_val <- -2:2; y_len <- length(y_val)
n <- x_len*y_len
x <- rep(c(x_val, rev(x_val)), n, length.out = n)
y <- sort(rep(c(y_val), n, length.out = n), decreasing =TRUE)
df <- data.frame(x, y)
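A variant that starts from expand.grid() and flips every second row may read more directly. This is a sketch under the same x_val and y_val as above; it relies only on positions within x_val, so it does not require the x values to be symmetric around zero.

```r
x_val <- -3:3
y_val <- -2:2

# Rows run from the largest y to the smallest; x varies fastest within each row
grid <- expand.grid(x = x_val, y = rev(y_val))

# Number the rows of the grid, then reverse x in every second row
row_id <- rep(seq_along(y_val), each = length(x_val))
grid$x <- ifelse(row_id %% 2 == 1,
                 grid$x,
                 rev(x_val)[match(grid$x, x_val)])
```

Odd-numbered rows keep the left-to-right order, while even-numbered rows are mapped to the mirrored position in x_val, which gives the back-and-forth path described in the question.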

Resources