I am looking to reorder my data into a new data frame (a vector in the example below) in which the first observation comes first and the last observation comes second; both observations are then removed from the initial data and the process repeats.
data <- seq(1,12,1)
i <- 1
ii <- 1:length(data)
newData <- seq(1,12,1)
for (i in ii){
a <- 1
newData[i] <- data[a]
i <- i + 1
b <- as.numeric(length(data))
newData[i]<- data[b]
index <- c(a, b)
data <- data[-index]
i <- i + 1
}
I receive the error: "Error in newData[i] <- data[b] : replacement has length zero" and the loop stops at i = 8, and the list "data" is empty.
If I run the contents of the loop, but not the loop itself, I get my desired outcome both in this example and my task; but obviously I want to run the loop given the size of my data.
As MrFlick mentioned, you can't modify the index in a for loop: any change to i inside the body is overwritten at the start of the next iteration. But given that you only need every second index, you can specify that in your loop definition by using
ii <- seq(1,length(data),2)
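For illustration, here is a sketch of the original loop rewritten with that index sequence. It assumes data has an even number of elements (an odd length would need a special case for the last element). The sequence is computed once, before data starts shrinking.

```r
data <- seq(1, 12, 1)
newData <- numeric(length(data))

# seq() is evaluated once here, so shrinking data later does not affect it
for (i in seq(1, length(data), 2)) {
  newData[i] <- data[1]                  # current first element
  newData[i + 1] <- data[length(data)]   # current last element
  data <- data[-c(1, length(data))]      # drop both before the next pass
}

newData
```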
However, you don't need a for loop to rearrange the elements of your vector data. You only need an index vector of the form (first, last, second, second to last, etc.):
m = matrix(c(1:6,12:7), ncol=2)
i = as.vector(t(m))
newdata = data[i]
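The index vector above is hard-coded for 12 elements. A possible generalization, sketched under the assumption that any vector length should be handled: interleave the forward and reversed positions with rbind() and keep only the first n of them. The names n and idx are illustrative.

```r
data <- seq(1, 12, 1)
n <- length(data)

# rbind() stacks forward and reversed positions as two rows; as.vector()
# reads the matrix column by column, giving 1, n, 2, n-1, ...
idx <- as.vector(rbind(seq_len(n), rev(seq_len(n))))[seq_len(n)]

newData <- data[idx]
newData
```

Truncating to the first n positions is what keeps the middle element from being repeated when n is odd.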
I have 2 data.frames that store values that will be retrieved and used in an equation. I would like to calculate the result for each row in df1 using each row in df2... basically iterating through the first df rows with each row of the second df.
df1 <- tibble(Type=c("atype","btype","ctype"),
h=c("1","2","3"),
w=c("4","5","6"),
ED=c("101","102","103"))
df2 <- tibble(Item=c("123-htc","but-456","xtc","newID"),
limit=c("rnorm(1)","12","13","14"),
zone=c("30","40","30","11"))
#Note: values in the dfs are stored as character strings so that they can also hold calls to random number functions
resItm <- list()
resTyp <- list()
for (i in 1:length(df1$Type)) {
h <- NULL
w <- NULL
ED <- NULL
limit <- NULL
zone <- NULL
for (j in 1:length(df2$Item)) {
Typ<-df1[i,]
Itm<-df2[j,]
h<-eval(str2lang(Typ$h))
w<-eval(str2lang(Typ$w))
ED<-eval(str2lang(Typ$ED))
limit<-eval(str2lang(Itm$limit))
zone<-eval(str2lang(Itm$zone))
res1 <- (h * limit * ED) / (zone)
resItm[[j]] <- list(Item=Itm$Item, Result=res1)
}
resTyp[[i]] <- list(Type=Typ$Type, Res_Item = resItm)
}
str(resTyp)
This worked fine with nested for loops and got me the expected results. However, I was also trying to do this with purrr::map, and that's when it didn't work so well:
map2(df1,df2,function(.x,.y) {(eval(str2lang(.x$h))*eval(str2lang(.y$limit))*eval(str2lang(.x$ED)))/eval(str2lang(.y$zone))})
#Error in `map2()`:
#! Can't recycle `.x` (size 4) to match `.y` (size 3).
#Run `rlang::last_error()` to see where the error occurred.
How do I get this to work? I also tried nested map() calls, and tried putting the calculation in a named function, map(df1,~map(df2,~c(...))), but that doesn't let me get at the values in each column.
Surely passing two dfs row-wise to a function, where each row in one df is iterated with every row of the other, has been tried before? Any help appreciated!
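One reason the map2() call fails is that a data frame is a list of columns, so map2(df1, df2, ...) pairs up columns (4 in df1, 3 in df2) rather than rows. Below is a minimal base R sketch of one possible alternative, assuming it is acceptable to enumerate every row pair with expand.grid(). The helper ev() and the pairs table are hypothetical names, the w column is left out because the formula does not use it, and rnorm(1) is replaced by a fixed "11" so the result is reproducible.

```r
df1 <- data.frame(Type = c("atype", "btype", "ctype"),
                  h = c("1", "2", "3"),
                  ED = c("101", "102", "103"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Item = c("123-htc", "but-456", "xtc", "newID"),
                  limit = c("11", "12", "13", "14"),   # "rnorm(1)" swapped for a fixed value
                  zone = c("30", "40", "30", "11"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: parse and evaluate one stored string
ev <- function(s) eval(str2lang(s))

# One row per (df1 row, df2 row) combination
pairs <- expand.grid(i = seq_len(nrow(df1)), j = seq_len(nrow(df2)))
pairs$Type <- df1$Type[pairs$i]
pairs$Item <- df2$Item[pairs$j]
pairs$Result <- mapply(function(i, j) {
  (ev(df1$h[i]) * ev(df2$limit[j]) * ev(df1$ED[i])) / ev(df2$zone[j])
}, pairs$i, pairs$j)

pairs
```

The same row-pair idea should carry over to purrr by mapping over the index columns of pairs instead of over the data frames themselves.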
I intend to fill a matrix I created that has 1000 rows and 2 columns. Here B is 1000.
resampled_ests <- matrix(NA, nrow = B, ncol = 2)
names(resampled_ests) <- c("Intercept_Est", "Slope_Est")
I want to fill it using a for loop looping from 1 to 1000.
ds <- diamonds[resampled_values[b,],]
Here, each ds (there should be 1000 versions of it in the for loop) is a data frame with 2 columns and 2000 rows, and I would like to use the lm() function to get the beta coefficients of the two columns of data.
for (b in 1:B) {
#Write code that fills in the matrix resampled_ests with coefficient estimates.
ds <- diamonds[resampled_values[b,],]
lm2 <- lm(ds$price~ds$carat, data = ds)
rowx <- coefficients(lm2)
resampled_ests <- rbind(rowx)
}
However, after I run the loop, resampled_ests, which is supposed to be a matrix of 1000 rows, shows only 1 row with 1 pair of coefficients. When I test the code outside of the loop by replacing b with numbers, I get different results, which are correct. But when I put it all in a for loop, I don't seem to be row-binding all of these different pairs of coefficients. Can someone explain why the result matrix resampled_ests shows only one row of data?
rbind(x) returns x because you're not binding it to anything. If you want to build a matrix row by row, you need something like
resampled_ests <- rbind(resampled_ests, rowx)
This also means you need to initialize resampled_ests before the loop.
If you're doing that anyway, I would just make a 1000 x 2 matrix of zeros and fill in the rows in the loop. Something like:
resampled_ests <- matrix(rep(0, 2*B), nrow=B)
for (b in 1:B) {
ds <- diamonds[resampled_values[b,],]
lm2 <- lm(price ~ carat, data = ds)
rowx <- coefficients(lm2)
resampled_ests[b,] <- rowx
}
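A self-contained sketch of the same fill-by-row pattern, assuming the built-in mtcars data as a stand-in for diamonds, and a made-up resampled_values matrix of bootstrap row indices. Column labels on a matrix are set through dimnames (or colnames()), not names().

```r
set.seed(1)
B <- 200                       # fewer resamples than in the question, for speed
n <- nrow(mtcars)

# Assumed layout: one row of resampled row indices per bootstrap replicate
resampled_values <- matrix(sample(n, B * n, replace = TRUE), nrow = B)

# Preallocate the result and label the columns
resampled_ests <- matrix(NA_real_, nrow = B, ncol = 2,
                         dimnames = list(NULL, c("Intercept_Est", "Slope_Est")))

for (b in seq_len(B)) {
  ds <- mtcars[resampled_values[b, ], ]
  resampled_ests[b, ] <- coef(lm(mpg ~ wt, data = ds))  # write into row b
}

dim(resampled_ests)
head(resampled_ests, 3)
```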
I am trying to subset this data frame by predetermined row numbers.
# Make dummy data frame
df <- data.frame(data=1:200)
train.length <- 1:2
# Set pre determined row numbers for subsetting
train.length.1 = 1:50
test.length.1 = 50:100
train.length.2 = 50:100
test.length.2 = 100:150
train.list <- list()
test.list <- list()
# Loop for subsetting by row, using row numbers in variables above
for (i in 1:length(train.length)) {
# subset by row number, each row number in variables train.length.1,2etc..
train.list[[i]] <- df[train.length.[i],] # need to place the variable train.length.n here...
test.list[[i]] <- df[test.length.[i],] # place test.length.n variable here..
# save outcome to lists
}
My question is: if I have my row numbers stored in variables, how do I place the i-th one inside the subsetting code?
I have tried:
df[train.length.[i],]
also
df[paste0"train.length.",[i],]
However, that pastes as a character string and doesn't read my train.length.n variable, as shown below:
> train.list[[i]] <- df[c(paste0("train.length.",train.length[i])),]
> train.list
[[1]]
data data1
NA NA NA
If I put the variable in there by itself, it works as intended. I just need it to work in a for loop.
Desired output - print those below
train.set.output.1 <- df[train.length.1,]
test.set.output.1 <- df[test.length.1,]
train.set.output.2 <- df[train.length.2,]
test.set.output.2 <- df[test.length.2,]
I can do this manually, but it's cumbersome for lots of train/test sets, hence the for loop.
Consider a staggered seq() and pass the resulting start positions to lapply to slice by rows. Also, for equal-length data frames, you likely intended the starts to be 1, 51, 101, ...
train_num_set <- seq(1, 200, by=50)
train.list <- lapply(train_num_set, function(i) df[c(i:(i+49)),])
test_num_set <- seq(51, 200, by=50)
test.list <- lapply(test_num_set, function(i) df[c(i:(i+49)),])
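If the exact row ranges from the question should be kept, one option is to store the index vectors in lists instead of numbered variables, so they can be addressed by position. A minimal sketch, with train.idx and test.idx as hypothetical names; the last lines show that get() can look up a variable by a pasted name, which is closer to what the question attempted.

```r
df <- data.frame(data = 1:200)

# Index vectors kept in lists rather than train.length.1, train.length.2, ...
train.idx <- list(1:50, 50:100)
test.idx  <- list(50:100, 100:150)

# drop = FALSE keeps a one-column data frame from collapsing to a vector
train.list <- lapply(train.idx, function(rows) df[rows, , drop = FALSE])
test.list  <- lapply(test.idx,  function(rows) df[rows, , drop = FALSE])

# Alternative: look up an existing numbered variable by its pasted name
train.length.1 <- 1:50
via_get <- df[get(paste0("train.length.", 1)), , drop = FALSE]
```

The list version tends to be easier to extend, since adding a split only means adding one element to each list.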
Create a function that splits your data frame into different chunks:
split_frame_by_chunks <- function(data_frame, chunk_size) {
n <- nrow(data_frame)
r <- rep(1:ceiling(n/chunk_size),each=chunk_size)[1:n]
sub_frames <- split(data_frame,r)
return(sub_frames)
}
Call your function with your data frame and chunk size. In your case, you are splitting your data frame df into chunks of 50:
chunked_frames <- split_frame_by_chunks(df, 50)
Decide the number of train/test splits to create in the loop:
num_splits <- 2
Create the appropriate train and test sets inside your loop. In this case, I am creating the 2 you showed in your question (i.e. the first iteration creates a train and test set with rows 1-50 and 51-100 respectively). Note the double brackets: chunked_frames[i] would return a one-element list rather than the data frame itself.
for(i in 1:num_splits) {
this_train <- chunked_frames[[i]]
this_test <- chunked_frames[[i+1]]
}
Just do whatever you need with the dynamically created train and test frames inside your loop.
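Put together, the chunking approach might look like the sketch below, assuming the 200-row df from the question and that the train and test frames should be kept in lists after the loop (train.list and test.list are reused from the question).

```r
split_frame_by_chunks <- function(data_frame, chunk_size) {
  n <- nrow(data_frame)
  # Chunk label for every row: 1,1,...,2,2,... truncated to n rows
  r <- rep(seq_len(ceiling(n / chunk_size)), each = chunk_size)[seq_len(n)]
  split(data_frame, r)
}

df <- data.frame(data = 1:200)
chunked_frames <- split_frame_by_chunks(df, 50)

num_splits <- 2
train.list <- vector("list", num_splits)
test.list  <- vector("list", num_splits)

for (i in seq_len(num_splits)) {
  train.list[[i]] <- chunked_frames[[i]]      # chunk i is the train set
  test.list[[i]]  <- chunked_frames[[i + 1]]  # the next chunk is the test set
}
```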
I have a problem in R trying to accumulate data gathered inside a for loop, which will iterate 200+ times before it ends, into a numeric vector defined before the loop starts. When the function returns the results, it is apparent that the vector only holds data from the last iteration of the loop. See the following pseudocode-style example:
results <- numeric()
for(i in records)
a <- read a record in
b <- identify complete cases in a
c <- sum(b)
if(c < 10)
d <- strip out rows with NAs in a
results <- cor(d[3:4])
endif
print results
end
One thing I am fairly certain of is that I need to somehow define the length of "results", but the exact size is unknown until the function ends.
Any and all help will be appreciated.
As @Alex points out, you can extend the vector like this:
results <- c(results, cor(d[3:4]))
or you can extend it implicitly like this:
results[i] <- cor(d[3:4])
or you can use the above line in conjunction with initializing the full-length vector like this:
results <- numeric(length(records))
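A runnable sketch of the preallocate-and-fill variant, under several assumptions: records is a made-up list of small data frames standing in for the files being read, the threshold keeps records with at least 10 complete rows, and the two columns are passed to cor() separately, because cor() on a two-column data frame returns a 2 x 2 matrix rather than a single number.

```r
set.seed(42)

# Made-up stand-in for the records: 5 data frames with increasing numbers of NAs
records <- lapply(1:5, function(k) {
  d <- data.frame(id = k, x = rnorm(20), y = rnorm(20))
  d$y[sample(20, k * 3)] <- NA
  d
})

# One slot per record; NA marks records that are skipped
results <- rep(NA_real_, length(records))

for (i in seq_along(records)) {
  a <- records[[i]]
  ok <- complete.cases(a)          # rows without any NA
  if (sum(ok) >= 10) {             # assumed threshold
    d <- a[ok, ]
    results[i] <- cor(d$x, d$y)    # a single number per record
  }
}

results
```

Looping over seq_along(records) rather than over the records themselves is what makes results[i] a valid position to write into.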
I am trying to populate a data frame from within a for loop in R. The names of the columns are generated dynamically within the loop and the value of some of the loop variables is used as the values while populating the data frame. For instance the name of the current column could be some variable name as a string in the loop, and the column can take the value of the current iterator as its value in the data frame.
I tried to create an empty data frame outside the loop, like this
d = data.frame()
But I can't really do anything with it; the moment I try to populate it, I run into an error:
d[1] = c(1,2)
Error in `[<-.data.frame`(`*tmp*`, 1, value = c(1, 2)) :
replacement has 2 rows, data has 0
What would be a good way to achieve what I am looking to do? Please let me know if I wasn't clear.
It is often preferable to avoid loops and use vectorized functions. If that is not possible there are two approaches:
Preallocate your data.frame. This is not recommended because indexing is slow for data.frames.
Use another data structure in the loop and transform into a data.frame afterwards. A list is very useful here.
Example to illustrate the general approach:
mylist <- list() #create an empty list
for (i in 1:5) {
vec <- numeric(5) #preallocate a numeric vector
for (j in 1:5) { #fill the vector
vec[j] <- i^j
}
mylist[[i]] <- vec #put all vectors in the list
}
df <- do.call("rbind",mylist) #combine all vectors into a matrix
In this example it is not necessary to use a list, you could preallocate a matrix. However, if you do not know how many iterations your loop will need, you should use a list.
Finally here is a vectorized alternative to the example loop:
outer(1:5,1:5,function(i,j) i^j)
As you see it's simpler and also more efficient.
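For the dynamically named columns mentioned in the question, the list approach might be adapted as in the sketch below. The "var_" naming scheme and the column length of 4 are assumptions made for illustration.

```r
cols <- list()                       # empty list to collect the columns

for (i in 1:3) {
  col_name <- paste0("var_", i)      # hypothetical name generated in the loop
  cols[[col_name]] <- rep(i, 4)      # the iterator value fills the column
}

d <- as.data.frame(cols)             # convert once, after the loop
d
```

All list elements need the same length for as.data.frame() to succeed.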
You could do it like this:
iterations = 10
variables = 2
output <- matrix(ncol=variables, nrow=iterations)
for(i in 1:iterations){
output[i,] <- runif(2)
}
output
and then turn it into a data.frame
output <- data.frame(output)
class(output)
What this does:
1. Creates a matrix with rows and columns according to the expected growth.
2. Inserts 2 random numbers into each row of the matrix.
3. Converts it into a data frame after the loop has finished.
This works too:
df = NULL
for (k in 1:10)
{
x = 1
y = 2
z = 3
df = rbind(df, data.frame(x,y,z))
}
The output will look like this (row names omitted; the loop produces 10 identical rows):
df #enter
x y z #col names
1 2 3
Thanks Notable1, this works for me with the tidytextr.
It creates a data frame with the file names in one column and the content in another.
diretorio <- "D:/base"
arquivos <- list.files(diretorio, pattern = "\\.PDF$") # pattern is a regular expression, not a glob
quantidade <- length(arquivos)
#
df = NULL
for (k in 1:quantidade) {
nome = arquivos[k]
print(nome)
Sys.sleep(1)
dados = read_pdf(file.path(diretorio, arquivos[k]), ocr = TRUE) # list.files() returns bare file names, so build the full path
print(dados)
Sys.sleep(1)
df = rbind(df, data.frame(nome,dados))
Sys.sleep(1)
}
Encoding(df$text) <- "UTF-8"
I had a case where I needed to use a data frame within a for loop. In this case it was efficient enough; however, keep in mind that the data set was small and the iterations in the loop were very simple. Maybe the code could be useful for someone with similar conditions.
The purpose of the for loop was to use the raster extract function across five locations (Tokyo, New York, São Paulo, Seoul, and Mexico City), each of which had its own raster grids. I had a spatial point database with more than 1000 observations spread across the 5 locations, and I needed to extract information from 10 different raster grids (two grids per location). Also, for the subsequent analysis, I needed not only the raster values but also the unique ID of each observation.
Preparing the spatial data included the following tasks:
Import points shapefile with the readOGR function (rgdal package)
Import raster files with the raster function (raster package)
Stack grids from the same location into one file, with the function stack (raster package)
Here is the for loop code that uses a data frame:
1. Add stacked rasters per location into a list
raslist <- list(LOC1,LOC2,LOC3,LOC4,LOC5)
2. Create an empty dataframe, this will be the output file
TB <- data.frame(VAR1=double(),VAR2=double(),ID=character())
3. Set up for loop function
L1 <- seq(1,5,1) # the location ID is a numeric variable with values from 1 to 5
for (i in 1:length(L1)) {
dat=subset(points,LOCATION==i) # select corresponding points for location [i]
t=data.frame(extract(raslist[[i]],dat),dat$ID) # run extract function with points & raster stack for location [i]
names(t)=c("VAR1","VAR2","ID")
TB=rbind(TB,t)
}
I was looking for the same thing, and the following may be useful as well.
a <- vector("list", 1)
for(i in 1:3){a[[i]] <- data.frame(x= rnorm(2), y= runif(2))}
a
rbind(a[[1]], a[[2]], a[[3]])
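Spelling out every list element in rbind() stops scaling once the number of iterations grows. A small variation, sketched with an added iter column (an assumption, included only to show which iteration each row came from), binds the whole list in one call with do.call():

```r
set.seed(1)
pieces <- vector("list", 3)          # preallocate one slot per iteration

for (i in seq_along(pieces)) {
  pieces[[i]] <- data.frame(iter = i, x = rnorm(2), y = runif(2))
}

# Bind all pieces at once, however many there are
combined <- do.call(rbind, pieces)
combined
```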