I am trying to create a data frame whose size I do not know in advance. Is there a way to create a data frame that grows to fit the values I compute?
Am I able to do something like this?
df <- function(n){
  x <- numeric(0)
  y <- numeric(0)
  z <- numeric(0)
  i <- 0
  repeat{
    i <- i + 1  # increment first: R vectors are indexed from 1, so x[0] would store nothing
    x[i] <- value1(...)
    y[i] <- value2(...)
    z[i] <- value3(...)
    if(i >= n){
      break
    }
  }
  data.frame(val1 = x, val2 = y, val3 = z)
}
For the sake of this question, let's assume that value1(), value2(), and value3() each just return a numeric value.
I think you can do it this way:
initialize your data frame as empty, using empty vectors:
df <- data.frame(val1=value1(),
val2=value2(),
val3=value3())
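A minimal sketch of the empty-vector approach, assuming hypothetical stand-ins for value1(), value2() and value3() that simply return runif(1): start from a zero-row data frame built from empty numeric vectors and append one row per iteration with rbind(). The helper name build_df is also hypothetical.

```r
# Hypothetical stand-ins: the question only says these return a numeric value
value1 <- function() runif(1)
value2 <- function() runif(1)
value3 <- function() runif(1)

build_df <- function(n) {
  # Zero-row data frame whose columns are empty numeric vectors
  df <- data.frame(val1 = numeric(0), val2 = numeric(0), val3 = numeric(0))
  for (i in seq_len(n)) {
    # Append one row per iteration; rbind() matches the columns by name
    df <- rbind(df, data.frame(val1 = value1(), val2 = value2(), val3 = value3()))
  }
  df
}

df <- build_df(5)
str(df)
```

Each rbind() call copies the whole data frame, so for large n it is usually faster to preallocate vectors with numeric(n), fill them in the loop, and call data.frame() once at the end.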
I made a user-defined function...
Given vectors x and y, f(x,y) returns a list of (x,y,z)...
Now I want to do iterations of
data1 <- f(x,y)
data2 <- f(data1$x, data1$y)
data3 <- f(data2$x, data2$y)
data4 <- f(data3$x, data3$y)
and so on...
Is there a way to make a loop for this?
I tried to use the paste function:
data1 <- f(x,y)
for (i in 2:10) {
  assign(paste("data", i, sep=""), f(paste("data", i-1, "$x", sep=""), paste("data", i-1, "$y", sep="")))
}
but it gives an error, because the input becomes the string "data1$x" rather than the numeric vector data1$x.
As Vincent just replied, you can make a list (or a list of lists, etc.). This makes it easier to produce what you want.
I made an example for you:
x <- 1:10; y <- 11:20
f <- function(x, y) {return(list(x = x+1, y = y+1))}
data <- vector("list", 10)  # preallocate one list slot per iteration
data[[1]] <- f(x, y)
for(i in 2:10){
  data[[i]] <- f(data[[i-1]]$x, data[[i-1]]$y)
}
You can then get x from iteration i with data[[i]]$x.
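If you only need the chain of results, base R's Reduce() with accumulate = TRUE is an alternative to the explicit loop (a swapped-in technique, not what the answer above uses). A sketch with the same toy f: the iteration counter i is ignored, and each call receives the previous result.

```r
x <- 1:10; y <- 11:20
f <- function(x, y) list(x = x + 1, y = y + 1)

# Reduce() passes each result back into the next call; init supplies the first
# result f(x, y), and accumulate = TRUE keeps every intermediate list
data <- Reduce(function(prev, i) f(prev$x, prev$y), 2:10,
               init = f(x, y), accumulate = TRUE)

length(data)   # one list per iteration, as with the loop
data[[3]]$x    # x after three applications of f
```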
I was wondering if I could create a for loop where i goes up by decimals. I have tried writing:
result <- list()
for (i in seq(2, 6, .1)) {
  data1 <- data[data$x1 > i, ]
  model <- lm(y ~ x1, data = data1)
  r <- summary(model)$r.squared
  result[[i]] <- r
}
but result ends up with only 5 values, as if only the integers from 2 to 6 were used.
Is there a way to get around this?
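result[[i]] inside your loop will never work with decimal values of i,
because list indices must be integers: R truncates a non-integer index, so 2.1, 2.2, ..., 2.9 all overwrite result[[2]].
Other than that, you can loop in increments of .1 if you change the way you think about them: loop over integers and divide by 10.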
result <- list()
for (i in seq(20, 60)) {
  div <- i / 10                      # 2.0, 2.1, ..., 6.0
  data1 <- data[data$x1 > div, ]
  model <- lm(y ~ x1, data = data1)
  result[[i - 19]] <- summary(model)$r.squared  # fills slots 1 to 41
}
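To see why decimal indices misbehave, here is a tiny sketch: R truncates a non-integer index towards zero, so 2.5 and 2.9 both land in slot 2 and overwrite each other.

```r
result <- list()
for (i in c(2, 2.5, 2.9)) {
  result[[i]] <- i   # 2, 2.5 and 2.9 are all truncated to index 2
}
length(result)       # 2: slot 1 stays NULL
result[[2]]          # only the last value written survives
```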
Looping like this is not idiomatic R. The approach below is better (though still not the best), and it stays close to your original code.
data <- data.frame(y = runif(1000), x1 = runif(1000, 1, 7))
f <- function(x, data)
{
  data1 <- data[data$x1 > x, ]
  model <- lm(y ~ x1, data = data1)
  r <- summary(model)$r.squared
  return(r)
}
results = lapply(seq(2,6,.1), f, data)
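A variant of the same idea, sketched with sapply() instead of lapply() so the R-squared values come back as a plain numeric vector, named by threshold. The seed and simulated data are illustrative only, similar to the answer above.

```r
set.seed(1)  # illustrative simulated data
data <- data.frame(y = runif(1000), x1 = runif(1000, 1, 7))

# R-squared of y ~ x1 fitted only on rows where x1 exceeds the threshold x
f <- function(x, data) {
  data1 <- data[data$x1 > x, ]
  summary(lm(y ~ x1, data = data1))$r.squared
}

thresholds <- seq(2, 6, by = 0.1)   # 41 thresholds: 2.0, 2.1, ..., 6.0
r2 <- sapply(thresholds, f, data = data)
names(r2) <- thresholds             # look values up by threshold, e.g. r2["3.5"]
r2
```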
Can anyone tell me what’s preventing this loop from running?
For each row i, the loop takes the value in column 3 of the data frame 'depth.df' and applies a mathematical function using a second data frame, 'linker.df': it multiplies that value by a constant divided by a value from linker.df, which is found by matching the value in column 3.
If I run the loop for a single value (let's say it is 50), it runs fine:
cor.depth <- function(depth.df){
  result <- seq(from=1, to=(nrow(depth.df)))
  x <- 8971
  for(i in 1:nrow(depth.df)){
    result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] == 50]))
    return(result)
  }
}
>97,331
but if I run it to loop over each instance of i, it always returns an error:
cor.depth <- function(depth.df){
  result <- seq(from=1, to=(nrow(depth.df)))
  x <- 8971
  for(i in 1:nrow(depth.df)){
    result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] %in% depth.df[i,3]]))
    return(result)
  }
}
Error in result[i] <- depth.df[i, 3] * (all_SC_bins/(depth.ea.bin.all[, :
replacement has length zero
EDIT
Here is a reproducible data set that illustrates the data structure and the issue:
#make some data as an example
linker.data <- sample(x=40:50, replace = FALSE)
linker.df <- data.frame(
X = linker.data
, Y = sample(x=2000:3000, size = 11, replace = TRUE)
)
depth.df <- data.frame(
X = sample(x=9000:9999, size = 300, replace = TRUE)
, Y = sample(x=c("A","G","T","C"), size = 300, replace = TRUE)
, Z = sample(linker.data, size = 300, replace = TRUE)
)
cor.depth <- function(depth.df){
  result <- seq(from=1, to=(nrow(depth.df)))
  x <- 8971
  for(i in 1:nrow(depth.df)){
    result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] %in% depth.df[i,3]]))
    return(result)
  }
}
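The error occurs because on most rows the denominator evaluates to a zero-length vector (integer(0) or numeric(0)): the logical index is FALSE, so nothing is selected, and a zero-length value cannot be assigned to result[i]. Your loop only compares row i of linker.df with row i of depth.df, so it finds a match only when X and Z happen to be equal in the same row number. Most likely you intended to match against any row of linker.df, which requires a second, nested loop with an if condition on matches.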
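A minimal reproduction of the failure, with made-up numbers for illustration: when the logical index selects nothing, the denominator is a zero-length vector, and assigning the resulting zero-length product into result[i] raises the same error as in the question.

```r
# A logical index that is all FALSE selects nothing: a zero-length vector
v <- c(2500, 2600)[c(FALSE, FALSE)]
length(v)

result <- numeric(3)
# 50 * (8971 / v) is also zero-length, so the element assignment fails
msg <- tryCatch({
  result[1] <- 50 * (8971 / v)
  "no error"
}, error = conditionMessage)
msg
```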
cor.depth <- function(depth.df){
  result <- seq(from=1, to=(nrow(depth.df)))
  x <- 8971
  for(i in 1:nrow(depth.df)){
    for (j in 1:nrow(linker.df)){
      if (linker.df[j,1] == depth.df[i,3]) {
        result[i] <- depth.df[i,3]*(x /( linker.df[j,2]))
      }
    }
  }
  return(result)
}
Nonetheless, consider merge(), a more efficient, vectorized approach that matches rows between both sets on their ids. The setNames() calls below rename the columns to avoid duplicate headers:
mdf <- merge(setNames(linker.df, paste0(names(linker.df), "_l")),
setNames(depth.df, paste0(names(depth.df), "_d")),
by.x="X_l", by.y="Z_d")
mdf$result <- mdf$X_l * (8971 / mdf$Y_l)
For comparison, both approaches give the same results:
depth.df$result <- cor.depth(depth.df)
depth.df <- with(depth.df, depth.df[order(Z),]) # ORDER BY Z
mdf <- with(mdf, mdf[order(X_l),]) # ORDER BY X_L
all.equal(depth.df$result, mdf$result)
# [1] TRUE
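Because merge() sorts its output by the key, the comparison above has to reorder both results. As a hedged alternative (base R's match(), a different technique from the merge() answer), you can look up the linker value for every row at once and keep depth.df in its original order. A sketch using the example data, with an added seed:

```r
set.seed(1)
linker.data <- sample(x = 40:50, replace = FALSE)
linker.df <- data.frame(
  X = linker.data,
  Y = sample(x = 2000:3000, size = 11, replace = TRUE)
)
depth.df <- data.frame(
  X = sample(x = 9000:9999, size = 300, replace = TRUE),
  Y = sample(x = c("A", "G", "T", "C"), size = 300, replace = TRUE),
  Z = sample(linker.data, size = 300, replace = TRUE)
)

# For each Z, match() returns the row of linker.df whose X equals it (NA if none)
idx <- match(depth.df$Z, linker.df$X)
depth.df$result <- depth.df$Z * (8971 / linker.df$Y[idx])
head(depth.df)
```

Rows of depth.df with no matching X in linker.df get NA instead of raising an error.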
I am experimenting with R and I can't find a way to do the following:
1- If x == 3, multiply x by the "y" value of the same row.
2- Add up all the products computed in step 1.
x <- 3426278722533992028364647392927338
y <- 7479550949037487987438746984798374
x <- as.numeric(strsplit(as.character(x), "")[[1]])
y <- as.numeric(strsplit(as.character(y), "")[[1]])
Table <- table(x,y)
Table <- data.frame(Table)
Table$Freq <- NULL
So I tried creating a function:
Calculation <- function (x,y) {
  z <- if(x == 3){ x * y }
  w <- sum(z)
}
Here x and y are the columns of the data frame.
This gives an error that I am struggling to solve...
Thanks for your time,
Kylian Pattje
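Two things here:
1. Use ifelse() in your function: if() checks a single condition, while ifelse() works element by element on a vector,
Calculation <- function (x,y) {
  z <- ifelse(x == 3, x * y, NA)
  w <- sum(z, na.rm = TRUE)
  return(w)
}
2. Make sure your variables are NOT factors (converting the table to a data frame turns x and y into factors),
Table[] <- lapply(Table, function(i) as.numeric(as.character(i)))
Calculation(Table$x, Table$y)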
#[1] 84
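If what you actually want is to pair the digits position by position (the i-th digit of x with the i-th digit of y) rather than the level combinations in Table, a vectorized sketch needs no ifelse() at all. It also keeps the numbers as strings: typed as numeric literals, 34-digit numbers are stored as doubles (about 15 to 17 significant digits), so as.character() returns a rounded value in scientific notation, and splitting it picks up ".", "e" and "+". The position-by-position interpretation is an assumption.

```r
# Keep the numbers as strings so every digit survives
x_str <- "3426278722533992028364647392927338"
y_str <- "7479550949037487987438746984798374"

x <- as.numeric(strsplit(x_str, "")[[1]])
y <- as.numeric(strsplit(y_str, "")[[1]])

# Keep the positions where x is 3, multiply by y at the same positions, then sum
keep <- x == 3
total <- sum(x[keep] * y[keep])
total
```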
I'm writing some code in R and I came across the following problem:
Basically, I want to calculate a variable X[k], which takes a value for each k, like this:
$$X_k = \frac{1}{k}\sum_{j=1}^{k}\frac{A_{n-k}}{A_{n-j+1}}$$
where A is a known variable that takes a different value at each index.
For the moment, I have something like this:
k <- NULL
X <- NULL
z <- 1:n
for (k in seq(along = z)){
  for (j in seq(along = 1:k)){
    X[k] = 1/k*sum(A[n-k]/A[n-j+1])
  }
}
which can't be right. Any idea on how to fix this one?
As always, any help would be dearly appreciated.
Try this
# define A
A <- c(1,2,3,4)
n <- length(A)
z <- 1:n
# predefine X (don't worry, all values will be overwritten, but it will have the same length as A)
X <- A
for(k in z){
  for(j in 1:k){
    X[k] = 1/k*sum(A[n-k]/A[n-j+1])
  }
}
You don't need to define z; it is only used inside the for loop. In this case, just write for(k in 1:n){
You can do the following
set.seed(42)
A <- rnorm(10)
k <- sample(length(A), 4)
calc_x <- function(A, k){
  n <- length(A)
  c_sum <- cumsum(1/rev(A)[1:max(k)])
  A[n-k]/k * c_sum[k]
}
calc_x(A,k)
which returns:
[1] 0.07775603 2.35789999 -0.45393983 0.13323284
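Assuming the intended formula is the one calc_x() implements, X[k] = (1/k) * sum over j = 1..k of A[n-k] / A[n-j+1], a nested loop has to accumulate the inner sum instead of overwriting X[k] on every j. The sketch below (hypothetical helper names x_loop and x_vec) compares such a loop with the cumsum() idea from calc_x(), for k = 1, ..., n-1, since A[n-k] needs n - k >= 1.

```r
# Explicit loops: accumulate the inner sum over j, then divide by k
x_loop <- function(A) {
  n <- length(A)
  X <- numeric(n - 1)
  for (k in seq_len(n - 1)) {
    s <- 0
    for (j in seq_len(k)) {
      s <- s + A[n - k] / A[n - j + 1]
    }
    X[k] <- s / k
  }
  X
}

# Vectorized: cumsum() of the reversed reciprocals gives every inner sum at once
x_vec <- function(A) {
  n <- length(A)
  k <- seq_len(n - 1)
  A[n - k] / k * cumsum(1 / rev(A))[k]
}

A <- c(1, 2, 3, 4)
x_loop(A)
x_vec(A)
```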