Creating an empty data frame:
data <- data.frame(ticks = numeric(0), identity = numeric(0), p_h = numeric(0), p_y = numeric(0), v_x = numeric(0), v_y = numeric(0), size = numeric(0), homo = numeric(0))
Adding data to the data frame:
while (x < timeStepsToRun)
{
.....
data[i, ] <- c(ag$ticks, ag$who, ag$xcor, ag$ycor, ag$v-x, ag$v-y,"10","1")
i=i+1;
...
}
However, I get the following error when adding data:
Error in value[[jvseq[[jjj]]]] : subscript out of bounds
In addition: Warning message:
In matrix(value, n, p) : data length exceeds size of matrix
Please suggest a better strategy or help me in correcting the above.
Thanks in advance!
If you know how large your data.frame needs to be, prespecify the size; then you won't encounter this kind of error:
rows <- 1e4 # it's not clear how many you actually need from your example
data <- setNames(as.data.frame(matrix(nrow = rows, ncol = 8)),
                 c('ticks', 'identity', 'p_h', 'p_y', 'v_x', 'v_y', 'size', 'homo'))
Then you can fill it in the way that you describe. Even creating a dataframe larger than the one you need and cutting it down to size later is more efficient than growing it row by row.
If you know the classes of the columns you are going to create it can also be performance-enhancing to prespecify the column classes:
rows <- 1e4
data <- data.frame(ticks = integer(rows),
identity = character(rows),
p_h = numeric(rows),
p_y = numeric(rows),
v_x = numeric(rows),
v_y = numeric(rows),
size = numeric(rows),
homo = numeric(rows))
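A minimal sketch of the preallocate, fill, and trim pattern described above. The row bound, the three-column subset, and the stand-in tick/agent loops are assumptions for illustration, not part of the original simulation:

```r
# Assumed upper bound on the number of rows; the real bound depends on the model.
max_rows <- 100

# Preallocate typed columns (a 3-column subset of the original 8, for brevity).
data <- data.frame(ticks = integer(max_rows),
                   identity = integer(max_rows),
                   v_x = numeric(max_rows))

i <- 0
for (tick in 1:5) {
  for (agent in 1:2) {   # stand-in for the real agents
    i <- i + 1
    # list() keeps each value's type; c() would coerce them to one common type.
    data[i, ] <- list(tick, agent, tick * 0.1)
  }
}

# Cut the data frame down to the rows that were actually filled.
data <- data[seq_len(i), ]
```

Assigning with list() rather than c() matters once a row mixes numbers and strings, because c() would turn every value into a character string.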
I gotchu:
data <- data.frame(ticks=NA, identity=NA, p_h=NA, p_y=NA, v_x=NA, v_y=NA,
size=NA, homo=NA)
timeStepstoRun <- 10
x <- 0 # placeholder: use your actual starting value
i <- 1
while (x < timeStepstoRun) {
  data[i,] <- 1:8
  i <- i + 1
  x <- x + 1 # placeholder: advance x however your model does
}
Just replace timeStepstoRun, x, and data[i,] <- ... with whatever you actually have. This is never gonna be a good way to do what you're trying to do, but I thought I'd just throw it out there.
My personal favorite solution:
# Create data frame with 0 rows and 3 columns.
df <- data.frame(matrix(ncol = 3, nrow = 0))
# Provide column names.
colnames(df) <- c('var1', 'var2', 'var3')
# Add the row.
df[nrow(df) + 1,] = c("a", "b", "c")
Once you have this simplified version working, you can adapt it to your while loop.
Source: https://www.statology.org/create-empty-data-frame-in-r/
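One way the row-append idiom above might be adapted to a while loop is sketched below. The column names, the loop bound, and the values written are made up for illustration. The empty data frame is given typed columns, and each row is assigned with list() so that numbers stay numeric instead of being coerced to character as c("a", "b", "c") would do:

```r
# Empty data frame with typed columns (hypothetical names).
df <- data.frame(tick = integer(0), label = character(0), value = numeric(0),
                 stringsAsFactors = FALSE)

x <- 0
timeStepsToRun <- 3   # assumed loop bound
while (x < timeStepsToRun) {
  x <- x + 1
  # Append one row; list() preserves the type of each element.
  df[nrow(df) + 1, ] <- list(x, paste0("agent", x), x / 2)
}

str(df)
```

This still grows the data frame one row at a time, so for long loops preallocating is likely to be faster.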
I am working on an assignment that tasks me with generating a list of datasets using the code below.
library(caret) # provides createDataPartition()
library(dplyr) # provides %>% and slice()
## Use the make_data function to generate 25 different datasets, with mu_1 being a vector
x <- seq(0, 3, len=25)
make_data <- function(a){
n = 1000
p = 0.5
mu_0 = 0
mu_1=a
sigma_0 = 1
sigma_1 = 1
y <- rbinom(n, 1, p)
f_0 <- rnorm(n, mu_0, sigma_0)
f_1 <- rnorm(n, mu_1, sigma_1)
x <- ifelse(y == 1, f_1, f_0)
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
list(train = data.frame(x = x, y = as.factor(y)) %>% slice(-test_index),
test = data.frame(x = x, y = as.factor(y)) %>% slice(test_index))
}
dat <- sapply(x,make_data)
The code looks good to go, and 'dat' appears to be a table with 25 columns and 2 rows, where each cell holds its own data frame.
Now, each data frame within a cell has 2 columns.
And this is where I get stuck.
While I can get to the data frame in row 1, column 1, just fine (i.e. just use dat[1,1]), I can't reach the column of 'x' values within dat[1,1]. I've experimented with
dat[1,1]$x
dat[1,1][1]
But they only throw weird responses: error/null.
Any idea how I can pull the column? Thanks.
dat[1, 1] is a list.
class(dat[1, 1])
#[1] "list"
So to reach to x you can do
dat[1, 1]$train$x
Or
dat[1, 1][[1]]$x
As a side note, instead of having this 2 x 25 list matrix as the output in dat, I would actually prefer a nested list.
dat <- lapply(x,make_data)
#Access `x` column of first list from `train` dataset.
dat[[1]]$train$x
However, this is quite subjective and you can choose whichever format you like best.
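The difference between single and double brackets on a list matrix can be sketched as follows. make_data_sketch is a simplified, hypothetical stand-in for make_data that uses only base R and a fixed split instead of createDataPartition, so the numbers will not match the assignment:

```r
# Simplified stand-in for make_data(): base R only, first half used as test set.
make_data_sketch <- function(a, n = 10) {
  y <- rbinom(n, 1, 0.5)
  x <- rnorm(n, mean = ifelse(y == 1, a, 0))
  d <- data.frame(x = x, y = as.factor(y))
  test_index <- seq_len(n / 2)
  list(train = d[-test_index, ], test = d[test_index, ])
}

set.seed(1)   # arbitrary seed for reproducibility
dat <- sapply(c(0, 1.5, 3), make_data_sketch)
dim(dat)                 # 2 rows (train, test) by 3 columns

cell <- dat[1, 1]        # single brackets: a list of length 1 wrapping the data frame
train1 <- dat[[1, 1]]    # double brackets: the data frame itself
train1$x                 # the x column of the first training set

# Nested-list alternative, as suggested above.
dat_list <- lapply(c(0, 1.5, 3), make_data_sketch)
dat_list[[1]]$train$x
```

With double brackets there is no need to know the element name or to add a second [[1]] step.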
I thought that the following problem must have been answered or a function must exist to do it, but I was unable to find an answer.
I have a nested loop that takes a row from one 3-col. data frame and copies it next to each of the other rows, to form a 6-col. data frame (with all possible combinations). This works fine, but with a medium sized data set (800 rows), the loops take forever to complete the task.
I will demonstrate on a sample data set:
Sdat <- data.frame(
x = c(10,20,30,40),
y = c(15,25,35,45),
ID =c(1,2,3,4)
)
compar <- data.frame(matrix(nrow=0, ncol=6)) # to contain all combinations
names(compar) <- c("x","y", "ID", "x","y", "ID")
N <- nrow(Sdat) # how many different points we have
for (i in 1:N)
{
for (j in 1:N)
{
Temp1 <- Sdat[i,] # data from 1st point
Temp2 <- Sdat[j,] # data from 2nd point
C <- cbind(Temp1, Temp2)
compar <- rbind(C,compar)
}
}
These loops provide exactly the output that I need for further analysis. Any suggestion for vectorizing this section?
You can do:
ind <- seq_len(nrow(Sdat))
grid <- expand.grid(ind, ind)
compar <- cbind(Sdat[grid[, 1], ], Sdat[grid[, 2], ])
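As a rough check, the expand.grid approach can be run on the sample data from the question. The column names first, second, x1, y1, and so on are chosen here for illustration; giving the six columns distinct names makes later column selection unambiguous:

```r
Sdat <- data.frame(x = c(10, 20, 30, 40),
                   y = c(15, 25, 35, 45),
                   ID = c(1, 2, 3, 4))

ind <- seq_len(nrow(Sdat))
grid <- expand.grid(first = ind, second = ind)   # all index pairs

compar <- cbind(Sdat[grid$first, ], Sdat[grid$second, ])
names(compar) <- c("x1", "y1", "ID1", "x2", "y2", "ID2")
rownames(compar) <- NULL

nrow(compar)   # one row per pair of points
head(compar)
```

The rows come out in a different order than in the nested loops, which should not matter if every combination is analysed anyway.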
A naive solution using rep (assuming you are happy with a data frame output):
compar <- data.frame(x = rep(Sdat$x, each = N),
                     y = rep(Sdat$y, each = N),
                     id = rep(Sdat$ID, each = N),
                     x1 = rep(Sdat$x, N),
                     y1 = rep(Sdat$y, N),
                     id_1 = rep(Sdat$ID, N))
I am writing a function to build new data frames based on existing data frames. So I essentially have
f1 <- function(x,y) {
x_adj <- data.frame("DID*"= df.y$`DM`[x], "LDI"= df.y$`DirectorID*`[-(x)], "LDM"= df.y$`DM`[-(x)], "IID*"=y)
}
I have 4,000 data frames df., so I really need to use this and R is returning an error saying that df.y is not found. y is meant to be used through a list of all the 4000 names of the different df. I am very new at R so any help would be really appreciated.
In case more specifics are needed I essentially have something like
df.1 <- data.frame(x = 1:3, b = 5)
And I need the following as a result using a function
df.11 <- data.frame(x = 1, c = 2:3, b = 5)
df.12 <- data.frame(x = 2, c = c(1,3), b = 5)
df.13 <- data.frame(x = 3, c = 1:2, b = 5)
Thanks in advance!
The OP seems to want to access a data.frame through a dynamically built name.
One option is to use get:
get(paste("df",y,sep = "."))
With y = 1, the call above returns df.1.
Hence, the function can be modified as:
f1 <- function(x,y) {
temp_df <- get(paste("df",y,sep = "."))
x_adj <- data.frame("DID*"= temp_df$`DM`[x], "LDI"= temp_df$`DirectorID*`[-(x)],
"LDM"= temp_df$`DM`[-(x)], "IID*"=y)
}
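Applied to the small df.1 example from the question, the get() idea might look like the sketch below. The helper name leave_one_out and the choice to collect the results in a named list (rather than creating df.11, df.12, df.13 as separate variables) are assumptions made for illustration:

```r
df.1 <- data.frame(x = 1:3, b = 5)

# Hypothetical helper: for row i of the data frame called `name`, pair that
# row's x value with the x values of all other rows.
leave_one_out <- function(i, name) {
  d <- get(name)   # look the data frame up by its name
  data.frame(x = d$x[i], c = d$x[-i], b = d$b[-i])
}

pieces <- lapply(seq_len(nrow(df.1)), leave_one_out, name = "df.1")
names(pieces) <- paste0("df.1", seq_along(pieces))

pieces$df.12   # x = 2 paired with c = 1 and 3
```

Keeping the results in one list tends to scale better to thousands of data frames than creating thousands of separately named objects.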
My problem is similar to the following question: the problem of R-input Format.
I tried the code from that link and revised parts of it to suit my data.
I want my data to be turned into a data frame with 4 variable vectors. The code I revised is:
formatMhsmm <- function(data){
nb.sequences = nrow(data)
nb.variables = ncol(data)
data_df <- data.frame(matrix(unlist(data), ncol = 4, byrow = TRUE))
# iterate over these in loops
rows <- 1: nb.sequences
# build vector with id value
id = numeric(length = nb.sequences)
for( i in rows)
{
id[i] = data_df[i,2]
}
# build vector with time value
time = numeric (length = nb.sequences)
for( i in rows)
{
time[i] = data_df[i,3]
}
# build vector with observation values
sequences = numeric(length = nb.sequences)
for(i in rows)
{
sequences[i] = data_df[i, 4]
}
data.df = data.frame(id,time,sequences)
# creation of hsmm data object need for training
N <- as.numeric(table(data.df$id))
train <- list(x = data.df$sequences, N = N)
class(train) <- "hsmm.data"
return(train)
}
library(mhsmm)
dataset <- read.csv("location.csv", header = TRUE)
train <- formatMhsmm(dataset)
print(train)
The output observations are not the data from the 4th column; instead I get a list like (4, 8, 12, ..., 396, 1, 1, ..., 56, 192, ..., 6550, 68, NA, NA, ...). It seems to have picked up a quarter of the data from each column. Why does this happen?
Thank you very much!!!!
Why don't you simply count your observations by id and create the hsmm.data object directly? Supposing your data frame is called "data", we have:
N <- as.numeric(table(data$id))
train <- list(x=data$location, N = N)
class(train) <- "hsmm.data"
Extracted from http://www.jstatsoft.org/v39/i04/paper
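A likely explanation for the scrambled output in the question is the combination of unlist(), which flattens a data frame column by column, with byrow = TRUE, which then refills the matrix row by row. The tiny data frame below is a made-up stand-in for location.csv, assuming its first column is a row index:

```r
# Hypothetical stand-in for the CSV: 3 rows, 4 columns.
data <- data.frame(row = 1:3,
                   id = c(1, 1, 2),
                   time = c(1, 2, 1),
                   location = c(7, 8, 9))

# unlist() walks the columns one after another.
flat <- unlist(data, use.names = FALSE)

# Refilling by row mixes the columns: column 4 now holds every 4th value of flat.
scrambled <- matrix(flat, ncol = 4, byrow = TRUE)
scrambled[, 4]

# Converting directly keeps the columns intact.
intact <- as.matrix(data)
intact[, 4]
```

Reading the columns straight from the data frame, as in the answer above, avoids the reshaping step entirely.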
I am trying to vectorize my nested for loop code using apply/mapply/lapply/sapply or any other way to reduce the running time. My code is as follows:
for (i in 1:dim){
for (j in i:dim){
if(mydist.fake[i,j] != d.hat.fake[i,j]){
if((mydist.fake[i,j]/d.hat.fake[i,j] > 1.5)|(d.hat.fake[i,j]/mydist.fake[i,j]>1.5)){
data1 = cbind(rowNames[i],rowNames[j], mydist.fake[i,j], d.hat.fake[i,j], 1)
colnames(data1) = NULL
row.names(data1) = NULL
data = rbind(data, data1)
}else{
data1 = cbind(rowNames[i],rowNames[j], mydist.fake[i,j], d.hat.fake[i,j], 0)
colnames(data1) = NULL
row.names(data1) = NULL
data = rbind(data, data1)
}
}
}
}
write.table(data, file = "fakeTest.txt", sep ="\t", col.names = FALSE, row.names = FALSE)
rowNames is the vector of rownames of all data points
data is a dataframe
mydist.fake and d.hat.fake are distance matrices (the diagonal is zero and the upper and lower triangles hold the same values), so I am only interested in traversing the lower triangle (skipping the diagonal as well).
Both matrices have the same dimensions.
The major problem I am facing is the vectorization of the j loop where j is initialized as i.
A vectorized version of your code is:
dist1 <- mydist.fake
dist2 <- d.hat.fake
data <- data.frame(i = rowNames[row(dist1)[lower.tri(dist1)]],
j = rowNames[col(dist1)[lower.tri(dist1)]],
d1 = dist1[lower.tri(dist1)],
d2 = dist2[lower.tri(dist2)])
data <- transform(data, outcome = d1/d2 > 1.5 | d2/d1 > 1.5)
I tested it successfully using the following sample data:
X <- matrix(runif(200), 20, 10)
Y <- matrix(runif(200), 20, 10)
rowNames <- paste0("var", seq_len(nrow(X)))
mydist.fake <- as.matrix(dist(X))
d.hat.fake <- as.matrix(dist(Y))
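To compare the vectorized version with an explicit loop, something like the following sketch could be used. The matrix size, the seed, and the names vec and loop_outcome are arbitrary choices. The outcome is stored as 1/0 as in the question, and pairs with equal distances are dropped to mirror the != test in the original loop:

```r
set.seed(42)   # arbitrary seed so the sketch is reproducible
n <- 6
X <- matrix(runif(n * 4), n, 4)
Y <- matrix(runif(n * 4), n, 4)
rowNames <- paste0("var", seq_len(n))
dist1 <- as.matrix(dist(X))
dist2 <- as.matrix(dist(Y))

# Vectorized version over the lower triangle.
lt <- lower.tri(dist1)
vec <- data.frame(i = rowNames[row(dist1)[lt]],
                  j = rowNames[col(dist1)[lt]],
                  d1 = dist1[lt],
                  d2 = dist2[lt])
vec$outcome <- as.integer(vec$d1 / vec$d2 > 1.5 | vec$d2 / vec$d1 > 1.5)
vec <- vec[vec$d1 != vec$d2, ]   # the original loop skipped equal distances

# Reference loop over the same lower triangle, in column-major order.
loop_outcome <- integer(0)
for (j in 1:(n - 1)) {
  for (i in (j + 1):n) {
    a <- dist1[i, j]
    b <- dist2[i, j]
    if (a != b) {
      loop_outcome <- c(loop_outcome, as.integer(a / b > 1.5 | b / a > 1.5))
    }
  }
}

stopifnot(identical(vec$outcome, loop_outcome))
```

The resulting vec can then be passed to write.table as in the question.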