Creating a Data Frame with unknown size in R


I am trying to create a data frame that I do not know the size of. Is there a way to create a data frame that adapts to your variables?
Am I able to do something like this?
df <- function(n){
  x <- numeric(0)
  y <- numeric(0)
  z <- numeric(0)
  i <- 1                      # R vectors are indexed from 1
  repeat{
    x[i] <- value1(...)
    y[i] <- value2(...)
    z[i] <- value3(...)
    if(i >= n){
      break
    }
    i <- i + 1
  }
  data.frame(val1 = x, val2 = y, val3 = z)
}
For the sake of this example, let's assume that value1(), value2(), and value3() each return a single numeric value.

I think you can do it this way:
initialize your data frame as empty, using empty vectors:
df <- data.frame(val1=value1(),
val2=value2(),
val3=value3())
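A minimal sketch of one common way to handle an unknown number of rows: collect each row in a list and bind everything once at the end, rather than growing the data frame itself. The bodies of value1(), value2(), and value3() below are hypothetical stand-ins, and collect_rows() and its stopping rule (max_rows) are assumptions made only for illustration.

```r
set.seed(1)

# Hypothetical stand-ins for the value functions from the question
value1 <- function() runif(1)
value2 <- function() rnorm(1)
value3 <- function() rpois(1, lambda = 3)

# An empty data frame with the right column types, if one is needed up front
empty_df <- data.frame(val1 = numeric(0), val2 = numeric(0), val3 = numeric(0))

collect_rows <- function(max_rows) {
  rows <- list()
  repeat {
    # Append one single-row data frame per iteration
    rows[[length(rows) + 1]] <- data.frame(val1 = value1(),
                                           val2 = value2(),
                                           val3 = value3())
    # Illustrative stopping rule; any condition could go here
    if (length(rows) >= max_rows) break
  }
  # Bind all collected rows into one data frame
  do.call(rbind, rows)
}

df <- collect_rows(5)
```

Binding once at the end avoids copying the whole data frame on every iteration, which is what repeated rbind() calls inside the loop would do.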

Related

R iterative assign using different data frames

I made a user defined function...
From a vector x, y, f(x,y) returns list of (x,y,z)...
Now I want to do iterations of
data1 <- f(x,y)
data2 <- f(data1$x, data1$y)
data3 <- f(data2$x, data2$y)
data4 <- f(data3$x, data3$y)
and so on...
Is there a way to make a loop for this?
I tried to use the paste function:
data1 <- f(x,y)
for (i in 2:10) {
  assign(paste("data",i,sep=""), f(paste("data",i-1,"$x",sep=""), paste("data",i-1,"$y",sep="")))
}
but it throws an error, since the input becomes the string "data1$x" rather than a numeric vector.

As Vincent just replied, you can make a list, a list of lists, and so on. This makes it easier to produce what you want.
I made an example for you:
x <- 1:10; y <- 11:20
f <- function(x, y) {return(list(x = x+1, y = y+1))}
data <- list()                 # one list element per iteration
data[[1]] <- f(x, y)
for(i in 2:10){
  # feed the previous result back into f
  data[[i]] <- f(data[[i-1]]$x, data[[i-1]]$y)
}
You can then get x at iteration i with data[[i]]$x.
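The same chained computation can also be sketched with base R's Reduce() and accumulate = TRUE, which keeps every intermediate result. This is only an alternative illustration; f is the toy function from the answer above, and the name steps is made up here.

```r
x <- 1:10
y <- 11:20
f <- function(x, y) list(x = x + 1, y = y + 1)

# Start from f(x, y), then apply f to the previous result nine more times.
# The second argument of the folding function (the step counter) is unused.
steps <- Reduce(function(prev, i) f(prev$x, prev$y),
                2:10,
                init = f(x, y),
                accumulate = TRUE)

length(steps)    # one element per iteration
steps[[3]]$x     # x after the third application of f
```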

For loop using decimals

I was wondering if I could create a for loop where i goes up by decimals. I have tried writing:
for (i in seq(2,6,.1))
{
  data1 <- data[data$x1 > i,]
  model <- lm(y~x1, data = data1)
  r = summary(model)$r.squared
  result[[i]] = r
}
but the result only contains 5 observations, one for each of the integers from 2 to 6.
Is there a way to get around this?

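result[[i]] inside your loop will not work as intended with decimal values of i,
because list indexes must be integers: R truncates a fractional index, so 2.1, 2.2, ... all write to element 2.
Other than that, you can loop in increments of .1 if you change the way you think about the increments, looping over the integers 20 to 60 and dividing by 10:
for (i in seq(20, 60)) {
  div <- i / 10
  data1 <- data[data$x1 > div,]
  model <- lm(y~x1, data = data1)
  # indexes 20 to 60 are used; positions 1 to 19 are left unfilled
  result[[i]] = summary(model)$r.squared
}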
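A small sketch of the truncation behaviour, followed by one way to keep the decimal values while indexing with integers via seq_along(). The names thresholds and squares are illustrative only, and squaring stands in for the real per-step computation.

```r
# Fractional indexes are truncated, so both writes land in element 2
result <- list()
result[[2.1]] <- "first"
result[[2.9]] <- "second"
length(result)   # 2
result[[2]]      # "second"

# Loop over positions, and look the decimal value up by position
thresholds <- seq(2, 6, by = 0.1)
squares <- numeric(length(thresholds))
for (k in seq_along(thresholds)) {
  squares[k] <- thresholds[k]^2
}
```

Indexing by position keeps the result the same length as the sequence of decimals, with no unused leading elements.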

An explicit loop like yours is not the most idiomatic approach in R. The version below is better (though still not the best) and stays close to your original code.
data = data.frame(y = runif(1000), x1 = runif(1000, 1,7))
f = function(x, data)
{
  # keep only rows above the threshold x, then fit and extract R squared
  data1 <- data[data$x1 > x,]
  model <- lm(y ~ x1, data = data1)
  r <- summary(model)$r.squared
  return(r)
}
results = lapply(seq(2,6,.1), f, data)
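As a possible refinement, vapply() returns a plain numeric vector instead of a list, which can then be stored next to the thresholds. This is a sketch on simulated data; the column names threshold and r.squared are chosen here for illustration.

```r
set.seed(1)
data <- data.frame(y = runif(1000), x1 = runif(1000, 1, 7))

thresholds <- seq(2, 6, by = 0.1)

# One R squared per threshold, returned as a numeric vector
r2 <- vapply(thresholds, function(th) {
  subset_data <- data[data$x1 > th, ]
  summary(lm(y ~ x1, data = subset_data))$r.squared
}, numeric(1))

# Keep each result paired with the threshold that produced it
results <- data.frame(threshold = thresholds, r.squared = r2)
head(results)
```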

Loop for value matching won't work across data frames for multiple instances

Can anyone tell me what's preventing this loop from running?
For each row i, in column 3 of the data frame 'depth.df', the loop performs a calculation using a second data frame, 'linker.df': it multiplies the value in row i by a constant divided by a value from linker.df, which is found by matching the value in row i.
If I run the loop for a single value (let's say it equals 50), it runs fine:
cor.depth <- function(depth.df){
result <- seq(from=1, to=(nrow(depth.df)))
x <- 8971
for(i in 1:nrow(depth.df)){
result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] == 50]))
return(result)
}
}
>97,331
but if I run it to loop over each instance of i, it always returns an error:
cor.depth <- function(depth.df){
result <- seq(from=1, to=(nrow(depth.df)))
x <- 8971
for(i in 1:nrow(depth.df)){
result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] %in% depth.df[i,3]]))
return(result)
}
}
Error in result[i] <- depth.df[i, 3] * (all_SC_bins/(depth.ea.bin.all[, :
replacement has length zero
EDIT
Here is a reproducible data set provided to illustrate data structure and issue
#make some data as an example
linker.data <- sample(x=40:50, replace = FALSE)
linker.df <- data.frame(
X = linker.data
, Y = sample(x=2000:3000, size = 11, replace = TRUE)
)
depth.df <- data.frame(
X = sample(x=9000:9999, size = 300, replace = TRUE)
, Y = sample(x=c("A","G","T","C"), size = 300, replace = TRUE)
, Z = sample(linker.data, size = 300, replace = TRUE)
)
cor.depth <- function(depth.df){
result <- seq(from=1, to=(nrow(depth.df)))
x <- 8971
for(i in 1:nrow(depth.df)){
result[i] <- depth.df[i,3]*(x /( linker.df [i,2][ linker.df [i,1] %in% depth.df[i,3]]))
return(result)
}
}

The error emerges because the denominator evaluates to integer(0) or numeric(0) on most rows. Your loop only compares row i of linker.df with row i of depth.df, so it finds a value only when X and Z happen to match at the same row number. Note also that return(result) sits inside the for loop, so the function exits after the first iteration. Likely, you intended to match against any row of linker.df, which would entail a second, nested loop with an if conditional on matches.
cor.depth <- function(depth.df){
  result <- seq(from=1, to=(nrow(depth.df)))
  x <- 8971
  for(i in 1:nrow(depth.df)){
    for (j in 1:nrow(linker.df)){
      if (linker.df[j,1] == depth.df[i,3]) {
        result[i] <- depth.df[i,3]*(x /( linker.df[j,2]))
      }
    }
  }
  return(result)
}
Nonetheless, consider merge(), a more efficient, vectorized approach that matches rows between both sets on their ids. The setNames() calls below rename the columns to avoid duplicate headers:
mdf <- merge(setNames(linker.df, paste0(names(linker.df), "_l")),
setNames(depth.df, paste0(names(depth.df), "_d")),
by.x="X_l", by.y="Z_d")
mdf$result <- mdf$X_l * (8971 / mdf$Y_l)
As a comparison, the two approaches are equivalent:
depth.df$result <- cor.depth(depth.df)
depth.df <- with(depth.df, depth.df[order(Z),]) # ORDER BY Z
mdf <- with(mdf, mdf[order(X_l),]) # ORDER BY X_L
all.equal(depth.df$result, mdf$result)
# [1] TRUE
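Another vectorized option, sketched here under the assumption that every Z value in depth.df appears exactly once in linker.df$X (as in the example data), is to look up the matching row positions with match(). Unlike merge(), this keeps depth.df in its original row order.

```r
set.seed(123)

# Example data with the same structure as in the question
linker.df <- data.frame(
  X = sample(40:50),
  Y = sample(2000:3000, size = 11, replace = TRUE)
)
depth.df <- data.frame(
  X = sample(9000:9999, size = 300, replace = TRUE),
  Y = sample(c("A", "G", "T", "C"), size = 300, replace = TRUE),
  Z = sample(linker.df$X, size = 300, replace = TRUE)
)

# Position in linker.df of the row whose X equals each depth.df$Z
idx <- match(depth.df$Z, linker.df$X)

# Same formula as the loop, applied to whole columns at once
depth.df$result <- depth.df$Z * (8971 / linker.df$Y[idx])
```

If some Z values had no counterpart in linker.df, match() would return NA for them, and so would the result.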

How to condition a computation and then add all computations done in R?

I am experimenting with R and I can't find a way to do the following:
1- If x == 3, multiply x by the "y" value of the same row
2- Add up all the products computed in step 1.
x <- 3426278722533992028364647392927338
y <- 7479550949037487987438746984798374
x <- as.numeric(strsplit(as.character(x), "")[[1]])
y <- as.numeric(strsplit(as.character(y), "")[[1]])
Table <- table(x,y)
Table <- data.frame(Table)
Table$Freq <- NULL
So I tried creating a function:
Calculation <- function (x,y) {
z <- if(x == 3){ x * y }
w <- sum(z)
}
x and y are the columns of the data.frame
This prints an error which I struggle to solve...
Thanks for your time,
Kylian Pattje

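Two things here:
1. Use ifelse in your function, because if() only evaluates a single condition, while ifelse() works element-wise on the whole vector,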
Calculation <- function (x,y) {
z <- ifelse(x == 3, x * y, NA)
w <- sum(z, na.rm = TRUE)
return(w)
}
2. Make sure your variables are NOT factors,
Table[] <- lapply(Table, function(i) as.numeric(as.character(i)))
Calculation(Table$x, Table$y)
#[1] 84
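The same conditional sum can also be sketched with logical indexing, which avoids the intermediate NA values. The small vectors below are made up for illustration and are not the data from the question.

```r
x <- c(3, 1, 3, 2)
y <- c(4, 5, 6, 7)

# TRUE for the rows where the condition holds
keep <- x == 3

# Multiply only the selected rows, then add the products: 3*4 + 3*6
total <- sum(x[keep] * y[keep])
total   # 30
```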

General code for a summation in R

I'm writing some code in R and I came across following problem:
Basically, I want to calculate a variable X[k], where X takes on a value for each k, like this:
X[k] = (1/k) * sum_{j=1}^{k} A[n-k] / A[n-j+1]
where A is a known variable which takes on different values for each index.
For the moment, I have something like this:
k <- NULL
X <- NULL
z<- 1: n
for (k in seq(along =z)){
for (j in seq (along = 1:k)){
X[k] = 1/k*sum(A[n-k]/A[n-j+1])
}
}
which can't be right. Any idea on how to fix this one?
As always, any help would be dearly appreciated.

Try this:
# define A
A <- c(1,2,3,4)
n <- length(A)
# predefine X with the same length as A; X[n] stays NA because A[n-k] does not exist for k = n
X <- rep(NA_real_, n)
for(k in 1:(n-1)){
  j <- 1:k
  # sum over all j at once instead of overwriting X[k] in an inner loop
  X[k] = 1/k*sum(A[n-k]/A[n-j+1])
}
You don't need to define z, since it is only used inside the for. Loop over the indexes directly, as in for(k in 1:(n-1)){ above.

You can do the following:
set.seed(42)
A <- rnorm(10)
k <- sample(length(A), 4)
calc_x <- function(A, k){
n <- length(A)
c_sum <- cumsum(1/rev(A)[1:max(k)])
A[n-k]/k * c_sum[k]
}
calc_x(A,k)
which returns:
[1] 0.07775603 2.35789999 -0.45393983 0.13323284
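A quick way to gain confidence in the cumsum() version is to compare it with a direct evaluation of the sum on a small input. This sketch assumes k runs from 1 to n - 1, since A[n - k] is not defined for k = n; the names loop_x and vec_x are illustrative.

```r
A <- c(1, 2, 3, 4)
n <- length(A)
k <- 1:(n - 1)

# Direct evaluation: for each k, sum A[n-k] / A[n-j+1] over j = 1..k, divided by k
loop_x <- sapply(k, function(kk) {
  j <- seq_len(kk)
  sum(A[n - kk] / A[n - j + 1]) / kk
})

# Vectorized evaluation: cumulative sums of 1 / A taken from the end of A
vec_x <- A[n - k] / k * cumsum(1 / rev(A))[k]

all.equal(loop_x, vec_x)   # TRUE
```

The vectorized form works because A[n-k] does not depend on j, so it can be pulled out of the sum, leaving a running total of 1 / A[n-j+1].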
