I'm exercising my function-writing skills today. Can someone explain why the function I wrote doesn't remove columns 2 and 3 from the data frame?
data <- data.frame(x = 2, y = 3, z = 4)
rmvar <- function(x){
  lapply(X = x, FUN = function(x){
    x <- NULL
  })
}
rmvar(data[,2:3])
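Your version assigns NULL to x, which is only the anonymous function's local argument, so nothing outside that function changes: lapply just returns a list of NULL values and the data frame keeps all its columns. What drops columns is assigning NULL to them through the data frame itself, with x[indx] <- .... You could modify it like this: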
rmvar <- function(x, indx){
  # Assigning a list of NULLs to x[indx] drops those columns
  x[indx] <- lapply(x[indx], FUN = function(col) NULL)
  x
}
rmvar(data, 2:3)
#   x
# 1 2
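As @nico mentioned in the comments, this is easier with just data[-(2:3)]. But I guess you want to do this with lapply/NULL. Either way, rmvar returns a modified copy, so reassign the result (data <- rmvar(data, 2:3)) if you want to keep it.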
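A minimal sketch of the base-R alternatives, using the question's data frame. Neither changes data itself unless the result is assigned back; kept and data2 are illustrative names only.

```r
data <- data.frame(x = 2, y = 3, z = 4)

# Negative indexing returns a copy without columns 2 and 3
kept <- data[-(2:3)]

# Assigning NULL through [<- drops the columns from the object being modified.
# Work on a copy here so that 'data' itself keeps all three columns.
data2 <- data
data2[2:3] <- NULL
```

Both kept and data2 end up with the single column x, while data still has x, y and z.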
I thought that the following problem must have been answered or a function must exist to do it, but I was unable to find an answer.
I have a nested loop that takes a row from one 3-column data frame and places it next to each of the rows (including itself), forming a 6-column data frame with all possible combinations. This works fine, but with a medium-sized data set (800 rows) the loops take forever to complete.
I will demonstrate on a sample data set:
Sdat <- data.frame(
x = c(10,20,30,40),
y = c(15,25,35,45),
ID =c(1,2,3,4)
)
compar <- data.frame(matrix(nrow=0, ncol=6)) # to contain all combinations
names(compar) <- c("x","y", "ID", "x","y", "ID")
N <- nrow(Sdat) # how many different points we have
for (i in 1:N)
{
for (j in 1:N)
{
Temp1 <- Sdat[i,] # data from 1st point
Temp2 <- Sdat[j,] # data from 2nd point
C <- cbind(Temp1, Temp2)
compar <- rbind(C,compar)
}
}
These loops provide exactly the output that I need for further analysis. Any suggestion for vectorizing this section?
You can do:
ind <- seq_len(nrow(Sdat))
grid <- expand.grid(ind, ind)
compar <- cbind(Sdat[grid[, 1], ], Sdat[grid[, 2], ])
A naive solution using rep (assuming you are happy with a data frame output):
N <- nrow(Sdat)  # number of points
compar <- data.frame(x = rep(Sdat$x, each = N),
                     y = rep(Sdat$y, each = N),
                     id = rep(Sdat$ID, each = N),
                     x1 = rep(Sdat$x, N),
                     y1 = rep(Sdat$y, N),
                     id_1 = rep(Sdat$ID, N))
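Another option, offered as an alternative rather than something from the answers above: merge() with by = NULL returns the Cartesian product of its two arguments, which is the all-pairs table the loop builds. Columns present in both inputs get the default suffixes .x and .y. The row order may differ from the loop's output, since the loop prepends each new row.

```r
Sdat <- data.frame(x = c(10, 20, 30, 40), y = c(15, 25, 35, 45), ID = c(1, 2, 3, 4))

# A zero-length 'by' makes merge() pair every row of the first data frame
# with every row of the second (a cross join)
compar <- merge(Sdat, Sdat, by = NULL)
```

For 800 rows this gives 640,000 rows in a single vectorized call, with no repeated rbind().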
I am writing a function to build new data frames based on existing data frames. So I essentially have
f1 <- function(x,y) {
x_adj <- data.frame("DID*"= df.y$`DM`[x], "LDI"= df.y$`DirectorID*`[-(x)], "LDM"= df.y$`DM`[-(x)], "IID*"=y)
}
I have 4,000 data frames whose names start with df., so I really need to use this, but R returns an error saying that df.y is not found. y is meant to run through a list of all 4,000 names of the different df. data frames. I am very new to R, so any help would be really appreciated.
In case more specifics are needed, I essentially have something like
df.1 <- data.frame(x = 1:3, b = 5)
and I need a function that produces the following result:
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
The OP seems to want to access a data.frame by a dynamically built name.
One option is to use get:
get(paste("df",y,sep = "."))
With y = 1, the get call above returns df.1.
Hence, the function can be modified as:
f1 <- function(x,y) {
temp_df <- get(paste("df",y,sep = "."))
x_adj <- data.frame("DID*"= temp_df$`DM`[x], "LDI"= temp_df$`DirectorID*`[-(x)],
"LDM"= temp_df$`DM`[-(x)], "IID*"=y)
}
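If the 4,000 data frames already exist as separate objects, it can be simpler to gather them into one named list and work on that, instead of calling get() inside the function. The sketch below assumes every data frame has the x and b columns of the df.1 example; split_rows is a hypothetical helper name. For each data frame it returns a list of the df.11/df.12/df.13-style results rather than creating thousands of new global objects.

```r
# Example input from the question; in practice there would be many df.* objects
df.1 <- data.frame(x = 1:3, b = 5)

# Collect every object whose name starts with "df." into one named list
dfs <- mget(ls(pattern = "^df\\."))

# Hypothetical helper: for row i, x is the i-th x value,
# c holds all the other x values, and b is carried over
split_rows <- function(d) {
  lapply(seq_len(nrow(d)), function(i) {
    data.frame(x = d$x[i], c = d$x[-i], b = d$b[i])
  })
}

result <- lapply(dfs, split_rows)
```

Here result$df.1[[1]] matches df.11 from the question, result$df.1[[2]] matches df.12, and so on.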
My situation is that I want to "tack on" a number of columns to an existing data.frame, where each new column is computed by a function that does math on other columns. My goals are:
1. I want to specify the functions once.
2. I don't want to worry about passing arguments in the right order and/or matching them by name.
3. I want to specify the order in which the functions are applied once.
4. I want the new column names to be the function names.
Ideally I want something like:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) a + b
z <- function (x) b * y
df2 <- lapply (list (y, z), df)
where df2 is a data.frame with 4 columns: a, b, y and z. I think this achieves the goals.
The closest I've gotten to this is the following:
df <- data.frame(a = rnorm(10), b = rnorm(10))
y <- function (x) x$a + x$b
z <- function (x) x$b * x$y
funs <- list (
y = y,
z = z
)
df2 <- df
df2$y <- funs$y(df2)
df2$z <- funs$z(df2)
This achieves goals 1 and 2, but not 3 and 4.
Thanks in advance for the help.
This may be what you want. Once dfapply is defined, you can use it much as you originally intended, without writing things like x$a; the only difference is that you pass expressions instead of functions.
dfapply <- function(exprs, df){
for (expr in exprs) {
df <- within(df, eval(expr))
}
df
}
df <- data.frame(a = rnorm(10), b = rnorm(10))
expr1 <- expression(y <- a + b)
expr2 <- expression(z <- b * y)
df2 <- dfapply(c(expr1, expr2), df)
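For comparison, a sketch that keeps plain functions (goals 1 and 2) and lets a named list fix both the order of application and the new column names (goals 3 and 4). add_columns is a hypothetical helper name, and set.seed() is only there to make the example reproducible.

```r
set.seed(42)  # only for reproducibility
df <- data.frame(a = rnorm(10), b = rnorm(10))

# Each function receives the data frame built so far and returns one column.
# List order = order of application; list names = new column names.
funs <- list(
  y = function(d) d$a + d$b,
  z = function(d) d$b * d$y  # can use y because y is added first
)

add_columns <- function(df, funs) {
  for (nm in names(funs)) {
    df[[nm]] <- funs[[nm]](df)
  }
  df
}

df2 <- add_columns(df, funs)
```

Because each function sees the columns added before it, z can depend on y without any manual bookkeeping.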
How can I create a data frame with 2 rows with this structure?
X1 Y1 Calc1 X2 Y2 Calc2 … Xn Yn  Calcn
1  4  0.25  2  5  0.4   … n  n+3 n/(n+3)
I tried using this code:
dataRowTemp<-numeric(length = 0)
dataRow<-numeric(length = 0)
headerRowTemp<-character(length = 0)
headerRow<-character(length = 0)
for (i in 1:150){
X<- i
Y<- i+3
Calc <- X/Y
dataRowTemp <- c(X,Y,Calc)
dataRow<-c(dataRow,dataRowTemp)
headerRowTemp <- paste(c("X", i),c("Y", i),c("Calc", i),sep='')
headerRow<-c(headerRow,headerRowTemp)
}
Unfortunately, I can't create a correct header (headerRow). And how can I combine the header and the data into a data.frame afterwards?
Is there an elegant and better way to do so?
Build a function to be used in each iteration.
myfun <- function(i) {
X <- i
Y <- i + 3
c(X = X, Y = Y, Calc = X/Y)
}
Set the number of iterations.
n <- 150
Apply the function to the numbers from 1 to n, use matrix(..., nrow = 1) to store the output in a one-row matrix, and convert it into a data.frame (since that is what you are aiming for).
mydf <- data.frame(matrix(sapply(seq_len(n), myfun), nrow = 1))
Use paste0 inside sapply to build the column names of your data.frame, one X/Y/Calc triple per i.
names(mydf) <- c(sapply(seq_len(n), function(i) paste0(c('X', 'Y', 'Calc'), i)))
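The same one-row data frame can also be built without a per-iteration function, by computing all values as vectors and interleaving them. This is a sketch; vals, nms and mydf2 are illustrative names.

```r
n <- 150
i <- seq_len(n)

# rbind() stacks X, Y and Calc into a 3 x n matrix; c() reads it column by
# column, giving X1, Y1, Calc1, X2, Y2, Calc2, ...
vals <- c(rbind(i, i + 3, i / (i + 3)))
nms  <- paste0(rep(c("X", "Y", "Calc"), times = n), rep(i, each = 3))

# One-row data frame with the interleaved values and matching names
mydf2 <- as.data.frame(matrix(vals, nrow = 1, dimnames = list(NULL, nms)))
```

The interleaving trick relies on R storing matrices column by column, which is also why matrix(..., nrow = 1) in the answer above puts each X/Y/Calc triple next to each other.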
I have this script:
library(dplyr)  # provides filter() used in the loop below
x<-seq(1,5)
y<-seq(6,10)
z<-sample(25)
x.range <- range(x)
y.range <- range(y)
df <- expand.grid(x = seq(from = x.range[1], to = x.range[2], by = 1), y = seq(from = y.range[1],
to = y.range[2], by = 1))
df$z<-z
x1<-c(1,2,3)
y1<-c(6,7,8)
z1<-c(10,12,13)
df_1<-data.frame(x1,y1,z1)
n<-length(df_1$x1)
df_pred<-data.frame(0,0,0)
names(df_pred)[1:3] <- c("x", "y", "z_pred")
for(i in 1:n)
{df_pred[i,]<-filter(df, x==df_1$x1[i], y==df_1$y1[i])}
sqm <- mean((df_pred[,3]-df_1[,3])^2)
I want to calculate the mean squared error between the z values of df and the z1 values of df_1. To do this, I use a for loop to extract the rows I need from df, based on the x1 and y1 values of df_1.
Is there an alternative to this for loop that does the same thing (using, for example, the dplyr package)? Thanks.
If you name the columns of df_1 "x", "y" and "z", as in df, then you can use
df_1 <- data.frame(x=x1,y=y1,z=z1)
library(dplyr)
inner_join(df,df_1,by=c("x","y"))
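I am not sure what your loop is for, but you may want to try this as a replacement for it:
df_pred <- subset(df, x %in% df_1$x1 & y %in% df_1$y1)
Note that this tests x and y separately, so it keeps every combination of the x1 and y1 values (9 rows here, rather than the 3 rows your loop produces). If you need the exact (x1, y1) pairs, use a join such as the inner_join above. Let me know if it solves your problem.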
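A base-R sketch of the same pairwise lookup using merge() instead of dplyr, followed by the mean squared error. It assumes the columns of df_1 are named x, y and z as suggested above; set.seed() is added only to make the random z values reproducible, and matched is an illustrative name.

```r
set.seed(1)  # only for reproducibility of sample()
df <- expand.grid(x = seq(1, 5), y = seq(6, 10))
df$z <- sample(25)
df_1 <- data.frame(x = c(1, 2, 3), y = c(6, 7, 8), z = c(10, 12, 13))

# Inner join on the (x, y) pairs: one row per pair present in both data frames.
# The shared non-key column z gets the suffixes _pred (from df) and _obs (from df_1).
matched <- merge(df, df_1, by = c("x", "y"), suffixes = c("_pred", "_obs"))

# Mean squared error between the looked-up z and the observed z
sqm <- mean((matched$z_pred - matched$z_obs)^2)
```

Unlike the subset() approach, the join matches x and y together, so it returns exactly one row per (x1, y1) pair.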