What I want is to create 60 data frames with 500 rows each. I tried the code below and, although I get no errors, I don't get the data frames. When I call View() on the as.data.frame result, I see the data, but no data frame appears in my environment. I've been trying various versions of this code for three days:
getDS <- function(x){
  for(i in 1:3){
    for(j in 1:30000){
      ID_i <- data.table(x$ID[j: (j+500)])
    }
  }
  as.data.frame(ID_i)
}
getDS(DATASETNAME)
We can use outer (on a small example)
out1 <- c(outer(1:3, 1:3, Vectorize(function(i, j) list(x$ID[j:(j + 5)]))))
lapply(out1, as.data.table)
--
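The issue in the OP's function is that ID_i is overwritten on every iteration of the loop, so earlier results are never stored. To keep them, we can initialize a list and store each result in it. Note also that objects created inside a function are local to it, so assign the returned value (e.g. res <- getDS(x)) if you want it to appear in your environment.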
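getDS <- function(x) {
  ID_i <- vector('list', 3)  # one slot per outer iteration
  for(i in 1:3) {
    for(j in 1:3) {
      # store each chunk instead of overwriting it
      ID_i[[i]][[j]] <- data.table(x$ID[j:(j + 5)])
    }
  }
  ID_i
}
do.call(c, getDS(x))  # flatten the nested list into one list of data.tables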
Example data:
library(data.table)
x <- data.table(ID = 1:50)
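For the 60 data frames of 500 rows described in the question, a minimal base R sketch might look like the following. It assumes a hypothetical data frame `dat` with exactly 30000 rows (60 * 500) and that consecutive blocks of rows are wanted; `split()` returns one data frame per block, collected in a list called `chunks` (an illustrative name).

```r
# Hypothetical example data: 30000 rows, one ID column
dat <- data.frame(ID = 1:30000)

chunk_size <- 500
# Block number for each row: rows 1-500 -> 1, rows 501-1000 -> 2, ...
group <- ceiling(seq_len(nrow(dat)) / chunk_size)

# split() on a data frame returns a list of data frames, one per block
chunks <- split(dat, group)

length(chunks)     # number of data frames
nrow(chunks[[1]])  # rows in the first one
```

Keeping the pieces in one list, rather than creating 60 separate variables, lets you loop over them later with lapply().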
I'm not sure the description matches the code, so I'm a little unsure what result you want. That said, splitting a data.table is usually unhelpful, because its built-in grouping with `by` makes it unnecessary. If for some reason you do want to split into a list of data.tables, you might consider something along these lines:
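getDS <- function(x, n = 5, size = nrow(x)/n, column = "ID", reps = 3) {
  x <- x[1:(n*size), ..column]      # first n*size rows of the chosen column
  index <- rep(1:n, each = size)    # block label for each row
  replicate(reps, split(x, index),  # list of n data.tables, repeated reps times
            simplify = FALSE)
}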
getDS(data.table(ID = 1:20), n = 5)
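To illustrate the point about `by` grouping, here is a sketch (assuming the data.table package is installed) that labels hypothetical blocks of 5 rows and summarises each block without splitting the table. The column names `block`, `n` and `total` are illustrative only.

```r
library(data.table)

x <- data.table(ID = 1:20)
# Label consecutive blocks of 5 rows: rows 1-5 -> 1, rows 6-10 -> 2, ...
x[, block := ceiling(.I / 5)]
# One summary row per block; no split() needed
res <- x[, .(n = .N, total = sum(ID)), by = block]
res
```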
Related
I need to find the rows that two datasets have in common without using functions such as merge or packages like dplyr. Basically, I can only use for loops and if statements.
I've come up with this solution:
#since the two data frames are really big, I've reduced them using:
tab1 <- tab1[seq(800,1000),]
tab2 <- tab2[seq(800,1000),]
rname1 <- rownames(tab1)
rname2 <- rownames(tab2)
vecres <- c()
#since I need the results from only the first 3 columns of datasets:
for (i in rname1) {
  a <- tab1[i,c(1,2,3)]
  for (j in rname2) {
    b <- tab2[j,c(1,2,3)]
    cond <- a == b
    singlecond <- all(cond)
    if (singlecond) {vecres[i] <- c(a[i,c(1,2,3)])}
  }
}
I don't know how to proceed or where I'm making mistakes. Please help!
You can try the code below
tab1[do.call(paste, tab1[1:3]) %in% do.call(paste, tab2[1:3]), ]
If you really want to use for loops, you can try:
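vecres <- c()
for (i in rname1) {
  a <- tab1[i, c(1, 2, 3)]
  for (j in rname2) {
    b <- tab2[j, c(1, 2, 3)]
    cond <- a == b
    singlecond <- all(cond)
    # isTRUE() guards against NA comparisons
    if (isTRUE(singlecond)) {
      vecres <- c(vecres, i)  # keep the row name of the matching tab1 row
      break                   # one match is enough; avoids adding a row twice
    }
  }
}
tab1[vecres,]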
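A self-contained check of both approaches on two small hypothetical data frames (the values are made up for illustration). Rows 1 and 3 of tab1 also appear in tab2, so the loop version, which uses only for and if, should agree with the paste()/%in% one-liner.

```r
# Hypothetical data; stringsAsFactors = FALSE keeps text columns comparable with ==
tab1 <- data.frame(a = c(1, 2, 3, 4), b = c("x", "y", "z", "w"),
                   c = c(TRUE, FALSE, TRUE, FALSE), stringsAsFactors = FALSE)
tab2 <- data.frame(a = c(3, 9, 1), b = c("z", "q", "x"),
                   c = c(TRUE, TRUE, TRUE), stringsAsFactors = FALSE)

# Loop version: only for and if
vecres <- c()
for (i in seq_len(nrow(tab1))) {
  for (j in seq_len(nrow(tab2))) {
    if (isTRUE(all(tab1[i, 1:3] == tab2[j, 1:3]))) {
      vecres <- c(vecres, i)  # remember the matching row of tab1
      break                   # stop at the first match for this row
    }
  }
}

# One-liner: build a text key from the first three columns of each table
keyed <- which(do.call(paste, tab1[1:3]) %in% do.call(paste, tab2[1:3]))

tab1[vecres, ]
```

One caveat of the paste() key: with the default space separator, two different rows could produce the same key if the columns themselves contain spaces, so a separator that cannot occur in the data is safer.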
I want to perform an operation on each data frame in a list: specifically, a Kolmogorov–Smirnov (KS) test on one column of each data frame. I am using the code below, but it is not working:
PDF_mean <- matrix(nrow = length(siteNumber), ncol = 4)
PDF_mean <- data.frame(PDF_mean)
names(PDF_mean) <- c("station","normal","gamma","gev")
listDF <- mget(ls(pattern="DSF_moments_"))
length(listDF)
i <- 1
for (i in length(listDF)) {
  PDF_mean$station[i] <- siteNumber[i]
  PDF_mean$normal[i] <- ks.test(list[i]$mean,"pnorm")$p.value
  PDF_mean$gev[i] <- ks.test(list[i]$mean,"pgev")$p.value
  PDF_mean$gamma[i] <- ks.test(list[i]$mean,"gamma")$p.value
}
Any help?
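The loop should not iterate over length(listDF): length() returns a single value, so the loop body runs only once, for the last element. Use seq_along(listDF) or 1:length(listDF) instead; seq_along is preferable because it also behaves correctly for an empty list (1:length(x) gives c(1, 0) when x is empty). Also extract each data frame with listDF[[i]] rather than list[i]: list is the base R function, not your list, and single brackets would return a sub-list rather than the data frame itself.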
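for(i in seq_along(listDF)) {
  m <- listDF[[i]]$mean  # [[ ]] extracts the i-th data frame
  PDF_mean$station[i] <- siteNumber[i]
  PDF_mean$normal[i] <- ks.test(m, "pnorm")$p.value
  # pgev() is not in base R; it comes from an add-on package (e.g. evd)
  PDF_mean$gev[i] <- ks.test(m, "pgev")$p.value
  # "gamma" is the gamma function, not a CDF: use "pgamma" with a shape
  # (here a method-of-moments estimate from the data)
  PDF_mean$gamma[i] <- ks.test(m, "pgamma", shape = mean(m)^2 / var(m),
                               rate = mean(m) / var(m))$p.value
}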
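A self-contained sketch of the corrected loop on simulated data. The list contents, the station labels and the method-of-moments gamma parameters are assumptions for illustration, and pgev is left out because it needs an add-on package. With `for (i in length(listDF))` only the last row would be filled; with seq_along every row is. Note that ks.test p-values are only approximate when the distribution's parameters are estimated from the same data.

```r
set.seed(1)
# Hypothetical list of data frames, each with a 'mean' column
listDF <- list(
  site_a = data.frame(mean = rgamma(50, shape = 2, rate = 1)),
  site_b = data.frame(mean = rgamma(50, shape = 5, rate = 2))
)
siteNumber <- c("A", "B")  # hypothetical station labels

PDF_mean <- data.frame(station = siteNumber, normal = NA_real_, gamma = NA_real_)

for (i in seq_along(listDF)) {
  m <- listDF[[i]]$mean  # [[ ]] returns the data frame itself, not a sub-list
  # Compare with a normal distribution using the sample mean and sd
  PDF_mean$normal[i] <- ks.test(m, "pnorm", mean(m), sd(m))$p.value
  # Method-of-moments estimates for the gamma shape and rate
  PDF_mean$gamma[i] <- ks.test(m, "pgamma", shape = mean(m)^2 / var(m),
                               rate = mean(m) / var(m))$p.value
}
PDF_mean
```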
I thought that the following problem must have been answered or a function must exist to do it, but I was unable to find an answer.
I have a nested loop that takes a row from a 3-column data frame and places it next to every row (including itself), forming a 6-column data frame with all possible combinations. This works fine, but with a medium-sized data set (800 rows) the loops take a very long time to finish.
I will demonstrate on a sample data set:
Sdat <- data.frame(
x = c(10,20,30,40),
y = c(15,25,35,45),
ID =c(1,2,3,4)
)
compar <- data.frame(matrix(nrow=0, ncol=6)) # to contain all combinations
names(compar) <- c("x","y", "ID", "x","y", "ID")
N <- nrow(Sdat) # how many different points we have
for (i in 1:N)
{
  for (j in 1:N)
  {
    Temp1 <- Sdat[i,] # data from 1st point
    Temp2 <- Sdat[j,] # data from 2nd point
    C <- cbind(Temp1, Temp2)
    compar <- rbind(C,compar)
  }
}
These loops provide exactly the output that I need for further analysis. Any suggestion for vectorizing this section?
You can do:
ind <- seq_len(nrow(Sdat))
grid <- expand.grid(ind, ind)
compar <- cbind(Sdat[grid[, 1], ], Sdat[grid[, 2], ])
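Applied to the sample Sdat from the question, a quick check of the expand.grid approach (resetting the row names is an optional tidy-up, not part of the answer). Because the first column of expand.grid varies fastest, the first point's ID cycles fastest in the result. With 800 rows the result has 800 * 800 = 640,000 rows.

```r
Sdat <- data.frame(x = c(10, 20, 30, 40), y = c(15, 25, 35, 45), ID = c(1, 2, 3, 4))

ind  <- seq_len(nrow(Sdat))
grid <- expand.grid(ind, ind)   # all 16 index pairs; the first column varies fastest
compar <- cbind(Sdat[grid[, 1], ], Sdat[grid[, 2], ])
rownames(compar) <- NULL        # optional: drop row names such as "1.1" left by indexing

dim(compar)
head(compar, 5)
```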
A naive solution using rep (assuming you are happy with a data frame output):
N <- nrow(Sdat)  # number of points
compar <- data.frame(x = rep(Sdat$x, each = N),
                     y = rep(Sdat$y, each = N),
                     id = rep(Sdat$ID, each = N),
                     x1 = rep(Sdat$x, N),
                     y1 = rep(Sdat$y, N),
                     id_1 = rep(Sdat$ID, N))
I have multiple columns in a table called "Gr1","Gr2",...,"Gr10".
I want to convert their class from character to integer, and I want to do it dynamically. I'm trying this, but it doesn't work:
for (i in 1:10) {
  Col <- paste0('Students1$Gr',i)
  Col <- as.integer(Col)
}
My objective is to learn how to add the loop variable to a column name dynamically. Something like:
for (i in 1:10) {
  Students1$Gr(i) <- as.integer(Students1$Gr(i))
}
Any idea is welcome.
Thank you very much,
Matias
# Example matrix
xm <- matrix(as.character(1:100), ncol = 10)
colnames(xm) <- paste0('Gr', 1:10)
# Example data frame
xd <- as.data.frame(xm, stringsAsFactors = FALSE)
# For matrices, this works
xm <- apply(X = xm, MARGIN = 2, FUN = as.integer)
# For data frames, this works
for (i in 1:10) {
  xd[ , paste0('Gr', i)] <- as.integer(xd[ , paste0('Gr', i)])
}
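The loop in the question fails because paste0('Students1$Gr', i) only builds a character string; it never refers to the column. `$` needs a literal name, whereas `[[ ]]` accepts a name computed at run time. A sketch on a small hypothetical Students1 (the Name column and the values are made up), with a loop-free lapply() variant applied to a copy called Students2 (also a hypothetical name):

```r
# Hypothetical stand-in for the question's Students1 table
Students1 <- data.frame(Name = c("Ana", "Ben"),
                        Gr1 = c("7", "9"), Gr2 = c("8", "6"), Gr3 = c("5", "10"),
                        stringsAsFactors = FALSE)
Students2 <- Students1  # copy for the loop-free approach

# Loop version: [[ ]] takes a column name built with paste0()
for (i in 1:3) {
  col <- paste0("Gr", i)
  Students1[[col]] <- as.integer(Students1[[col]])
}

# Loop-free version: convert every column named Gr followed by digits
gr_cols <- grep("^Gr[0-9]+$", names(Students2), value = TRUE)
Students2[gr_cols] <- lapply(Students2[gr_cols], as.integer)

all.equal(Students1, Students2)  # TRUE
```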
I'd like to be able to use a for() loop to automate the same operation across many variables, modifying each of them.
Here's the simplest example I could design:
varToChange = list( 1:10, iris$Species[1:10], letters[1:10]) # assume that it has many more than just 3 elements
varToChange
for (i in varToChange ) {
  if (is.character(i)) i <- as.integer(as.ordered(i))
  if (is.factor(i)) i <- as.integer(i)
}
varToChange # <-- Here I want to see my elements as integers now
Here's the actual example that led me to this question, taken from "Best way to plot automatically all data.table columns using ggplot2". Consider the following function:
library(ggplot2)  # provides ggplot() and the diamonds data
f <- function(dt, x,y,k) {
  if (is.numeric(x)) x <- names(dt)[x]
  if (is.numeric(y)) y <- names(dt)[y]
  if (is.numeric(k)) k <- names(dt)[k]
  ggplot(dt, aes_string(x,y, col=k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2)
Instead of repeating the same line many times, as a programmer I would prefer a loop that does it for me.
Something like this one:
for (i in c(x,y,k)) {
  if (is.numeric(i)) i <- names(dt)[i]
}
In C/C++ this would be done using pointers. Is this possible at all in R?
UPDATE: Using Map, as suggested below, is a very nice idea. However, it does not work for this example:
getColName <- function(dt, x) {
  if (is.numeric(x)) {
    x <- names(dt)[x]
  }
  x
}
f<- function(dt, x,y,k) {
  list(x,y,k) <- Map(getColName, list(x,y,k), dt)
  # if (is.numeric(x)) x <- names(dt)[x]
  # if (is.numeric(y)) y <- names(dt)[y]
  # if (is.numeric(k)) k <- names(dt)[k]
  ggplot(dt, aes_string(x,y, col=k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2) # Brrr..
There's no need for a for loop; just Map a function over each of your list items:
varToChange = list( 1:10, iris$Species[1:10], letters[1:10])
myfun <- function(y) {
  if (is.character(y)) y <- as.integer(as.ordered(y))
  if (is.factor(y)) y <- as.integer(y)
  y
}
varToChange <- Map(myfun, varToChange)
UPDATE: Map never modifies variables in place, and R functions in general don't: when a function modifies its argument, it works on a copy. Use the new values returned by Map instead:
f<- function(dt, x, y, k) {
  args <- Map(function(x) getColName(dt, x), list(x=x,y=y,k=k))
  ggplot(dt, aes_string(args$x,args$y, col=args$k)) + geom_jitter(alpha=0.1)
}
f(diamonds, 1,7,2)
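A quick way to see that Map() leaves its input untouched: run the answer's myfun over the example list and compare classes before and after. Only the returned list, stored here in `converted` (an illustrative name), holds the integers.

```r
varToChange <- list(1:10, iris$Species[1:10], letters[1:10])

myfun <- function(y) {
  if (is.character(y)) y <- as.integer(as.ordered(y))
  if (is.factor(y)) y <- as.integer(y)
  y
}

converted <- Map(myfun, varToChange)

sapply(varToChange, class)  # the original list still has its old classes
sapply(converted, class)    # every element of the returned list is integer
```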
You have two choices for iteration in R: iterate over the values themselves, or over their indices. I generally recommend iterating over indices. This case illustrates a strong advantage of that approach: your problem disappears when you use indices, because you can assign each converted element back into the list by position (varToChange[[i]]).
varToChange = list( 1:10, iris$Species[1:10], letters[1:10])
for (i in seq_along(varToChange)) {
  if (is.character(varToChange[[i]])) varToChange[[i]] <- as.integer(as.factor(varToChange[[i]]))
  if (is.factor(varToChange[[i]])) varToChange[[i]] <- as.integer(varToChange[[i]])
}
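I also replaced as.ordered() with as.factor(). For modeling, the main difference between an ordered factor and a regular factor is the default contrasts (polynomial rather than treatment contrasts). Since you are just coercing to integer, it doesn't matter: both functions sort the unique values the same way to build the levels, so the integer codes are identical.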
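A small check of that last point, using a made-up character vector: as.factor() and as.ordered() build the same alphabetically sorted levels, so as.integer() returns the same codes.

```r
chars <- c("pear", "apple", "fig", "apple")  # hypothetical values

# Both sort the unique values (apple, fig, pear) to form the levels
codes_factor  <- as.integer(as.factor(chars))
codes_ordered <- as.integer(as.ordered(chars))

identical(codes_factor, codes_ordered)  # TRUE
codes_factor                            # 3 1 2 1
```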