I know there are a lot of posts on how to save data out of loops to data frames, but I've been having some trouble making it work for me. Currently I can only get my data out with print, but I would like it to go into a data frame instead. I can't predict how many lines of data or responses per line it will give (although I just need a single TRUE/FALSE).
Any suggestions on how to get the P loop to output its data to a data frame?
max <- max(x$a)
for (n in 1:max) {
  print(n)
  # Right now I'm just printing the iteration and data to the console
  result <- x[x$a == n, "b"]
  test <- unique(as.numeric(unlist(result)))
  # Below is the loop I'd like to save the data from
  for (P in test)
    print({
      ar <- x[x$b == P & x$a != n, "a"]
      ar1 <- sapply(unique(as.numeric(unlist(ar))),
                    function(f) x[x$a == f & x$b != P, "b"])
      af <- sapply(ar1, function(f) any(match(f, result)))
    })
}
Thanks!
Initialize an empty data frame:
results <- data.frame(it=numeric(), P=numeric(), value=logical())
And then instead of printing, just add this inside your loop:
results[nrow(results)+1,] <- list( [your 3 values separated by ","] )
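For example, here is a sketch reusing the objects from the question; it assumes the single TRUE/FALSE you want per P is any(af), so adjust the third value if you need something else:
results <- data.frame(it = numeric(), P = numeric(), value = logical())
for (n in 1:max(x$a)) {
  result <- x[x$a == n, "b"]
  test <- unique(as.numeric(unlist(result)))
  for (P in test) {
    ar  <- x[x$b == P & x$a != n, "a"]
    ar1 <- sapply(unique(as.numeric(unlist(ar))),
                  function(f) x[x$a == f & x$b != P, "b"])
    af  <- sapply(ar1, function(f) any(match(f, result)))
    results[nrow(results) + 1, ] <- list(n, P, any(af))  # iteration, P, TRUE/FALSE
  }
}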
I am currently working on an imputation project where I need to evaluate my imputation methods. I have an incomplete data frame with NAs, from which I calculate the missing rate for every column/variable. My second data frame contains the complete cases, which I extracted from the first data frame. I now want to simulate the missingness structure of the real data in the frame containing the complete cases. The data frame with the generated NAs gets stored in the object "result", as you can see in the code. If I now want to replicate this code and thus generate 100 different data frames like "result", how do I replicate them and save them separately?
I'm a beginner and would be really thankful for your answers!
I tried to put my NA-generating loop inside another loop that uses the replicate() command, counts from 1:100, and saves the 100 replicated data frames, but it didn't work at all.
result = data.frame(res0 = rep(NA, dim(comp_cas)[1]))
for (i in 1:length(Z32_miss_item$miss_per_item)) {
  dat = comp_cas[, i]
  missRate = Z32_miss_item$miss_per_item[i]
  cat(i, " ", paste0(dat, collapse = ","), " ", missRate, "!\n")
  df <- data.frame("res" = GenMiss(x = dat, missrate = missRate), stringsAsFactors = FALSE)
  colnames(df) = gsub("res", paste0("Var", i), colnames(df))
  result = cbind(result, df)
}
result = result[, -1]
I expect every data frame from the 100 runs to be saved in a separate .rda file in my project folder.
Also, is imputation and evaluating how well it works beginner-level R, or at what level of proficiency am I, judging from the code I posted?
It is difficult to guess exactly what you are doing without some dummy data, but it is fine to have loops within loops and to save data frames. Firstly, I would avoid the replicate function here, as it has an odd syntax, and just stick with plain loops. Secondly, make sure the nested loops use different index variables (i.e. the for(i ...) loop should be surrounded by, say, a for(j ...) loop), since loop variables are not scoped to the loop in R. Finally, use saveRDS rather than save, as you can then have each object (data frame) saved in a separate .rds file; save is designed for saving your whole workspace so that you can pick up where you left off.
fun <- function(i) {
  df <- data.frame(x = rnorm(5))
  names(df) <- paste0("x", i)
  df
}

for (j in 1:100) {
  res <- data.frame(id = 1:5)
  for (i in 1:10) {
    res <- cbind(res, fun(i))
  }
  saveRDS(res, sprintf("replication_%s.rds", j))
}
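If you later want the replications back in memory, readRDS reverses saveRDS; for example, to collect all 100 into a list of data frames:
# Read the saved replications back in as a list of data frames
all_reps <- lapply(1:100, function(j) readRDS(sprintf("replication_%s.rds", j)))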
I have a list of locally saved HTML files. I want to extract multiple nodes from each HTML file, save the results in vectors, and afterwards combine them in a data frame. I have a piece of code for one node which works (see below), but it seems quite long and inefficient if I apply it to ~20 variables. Also, something really strange happens when saving to the vector (XXX_name): it starts with the last observation and then continues with the first, second, .... Do you have any suggestions for simplifying the code / making it more efficient?
# Extracts name variable and stores it in a vector
XXX_name <- c()
for (i in 1:216) {
  XXX_name <- c(XXX_name, name)
  mydata <- read_html(files[i], encoding = "latin-1")
  reads_name <- html_nodes(mydata, 'h1')
  name <- html_text(reads_name)
  #print(i)
  #print(name)
}
Many thanks!
You can put the workings inside a function, then apply that function to each of your variables with map.
First, create the function:
read_names <- function(var, node) {
  mydata <- read_html(files[var], encoding = "latin-1")
  reads_name <- html_nodes(mydata, node)
  html_text(reads_name)
}
Then we create a data frame with all possible combinations of the inputs and apply the function to it:
library(tidyverse)
inputs <- crossing(var = 1:216, node = vector_of_nodes)
output <- map2(inputs$var, inputs$node, read_names)
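If every node matches exactly one element per file, you can collapse the list returned by map2 into a single data frame; here is a hedged sketch (vector_of_nodes is still your own vector of CSS selectors, e.g. something like c("h1", "h2")):
# Sketch: one row per file, one column per node.
# Assumes each node matches exactly one element per file; if a node can match
# zero or several elements, keep the list output from map2 and inspect it first.
results <- inputs %>%
  mutate(text = map2_chr(var, node, read_names)) %>%
  pivot_wider(names_from = node, values_from = text)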
I'm trying to obtain GPS coordinate information for each species in a given data frame of species names, using a package-specific function (red::records) which pulls coordinate information from a database of species distributions.
My for loop is constructed below, where iterations is nrow(names) and the function records returns lat/long coordinates:
for (i in 1:iterations) {
  gbif[i, 1] <- names[i, ] ## grab names
  try(temp1 <- records(names[i, ]))
  try(temp1$scientificName <- names[i, ])
  try(temp2 <- merge(gbif, temp1, by.x = "V1", by.y = "scientificName"))
  datalist[[i]] <- temp2
}
After executing this loop, I am able to obtain data for species; however, it is not appropriately merged with the name list. For example, calling records("Agyneta flibuscrocus") correctly returns 5 unique lat/long coordinates, while calling records("Agyneta mongolica") produces an error with 0 records found (this is valid for each species when checked online).
After this loop, I bind all of the obtained records into a single data frame using:
dat = do.call(rbind, datalist) ## merge all occurrence data from GBIF into one data frame
dat <- unique(dat)
When I go to verify this data frame, I get the following sample data:
Agyneta flibuscrocus -115.58400 49.72
Agyneta flibuscrocus -117.58400 51.299
...
Agyneta mongolica -115.58400 49.72
Agyneta mongolica -117.58400 51.299
These erroneous replications are also repeated throughout the rest of the 200 names. As a side note, I wrapped everything in try statements because the code will not execute if it runs into a record that produces 0 results from the database.
I feel like I am overlooking something very obvious here?
Reproducible Data & Code:
install.packages("red")
library(red)
names = data.frame("Acantheis variatus", "Agyneta flibuscrocus", "Agyneta
mongolica", "Alpaida alticeps", "Alpaide venilliae", "Amaurobius
transversus", "Apochinomma nitidum")
iterations = nrow(names)
datalist = list()
temp1 <- data.frame() ## temporary data frame for joining occurrence data
from GBIF
for (i in 1:iterations) {
  gbif <- names[i, ] ## grab name
  try(temp1 <- records(gbif))
  try(temp1$V1 <- gbif)
  datalist[[i]] <- temp1
}
dat = do.call(rbind, datalist)
I adapted some parts of your script and now it seems to work properly. (With your example data, the function only successfully retrieves data for one species, the one that got replicated in your code, but that's not a coding issue.)
The main reason for the erroneous duplications was the variable temp1 being reused: try(temp1 <- records(gbif)) failed, but try(temp1$V1 <- gbif) did not, because both temp1 and gbif were still defined from the previous iteration. Make sure that variables defined in one iteration of a loop don't get carried over to the next iteration.
iterations = nrow(myNames)
datalist = list()
for (i in 1:iterations) {
  gbif <- myNames[i, ]              ## grab name
  try_result <- try(records(gbif))
  if (!inherits(try_result, "try-error")) {
    temp1 <- try_result
    temp1$V1 <- gbif
    datalist[[i]] <- temp1
    rm(temp1)                       ## so temp1 can't leak into the next iteration
  } else {
    datalist[[i]] <- NA
  }
  rm(try_result)
}
dat <- do.call(rbind, datalist[!is.na(datalist)])
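An equivalent pattern, if you prefer it, is tryCatch, which returns a fallback value directly instead of requiring a class check afterwards. A minimal sketch assuming the same myNames setup as above:
# Sketch: tryCatch returns NA for a species whenever records() errors out
datalist <- lapply(1:iterations, function(i) {
  gbif <- myNames[i, ]
  tryCatch({
    temp1 <- records(gbif)
    temp1$V1 <- gbif
    temp1
  }, error = function(e) NA)
})
dat <- do.call(rbind, datalist[!is.na(datalist)])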
I have written a loop in R (still learning). My purpose is to pick the max AvgConc and the max Roll_TotDep from each file in the loop, and then have two data frames, each containing all the max values picked from the individual files. The code I wrote only saves the last iteration's results (for a single file)... Can someone point me in the right direction to revise my code, so I can append the result of each new iteration to the previous ones? Thanks!
data.folder <- "D:\\20150804"
files <- list.files(path=data.folder)
for (i in 1:length(files)) {
sub <- read.table(file.path(data.folder, files[i]), header=T)
max1Conc <- sub[which.max(sub$AvgConc),]
maxETD <- sub[which.max(sub$Roll_TotDep),]
write.csv(max1Conc, file= "max1Conc.csv", append=TRUE)
write.csv(maxETD, file= "maxETD.csv", append=TRUE)
}
The problem is that max1Conc and maxETD only ever hold a single row: they are overwritten on every iteration instead of being accumulated in an object that can store more than one value (a list, data frame, or vector).
To fix this:
max1Conc <- data.frame()
maxETD <- data.frame()
for (i in 1:length(files)) {
  sub <- read.table(file.path(data.folder, files[i]), header = T)
  max1Conc <- rbind(max1Conc, sub[which.max(sub$AvgConc), ])
  maxETD <- rbind(maxETD, sub[which.max(sub$Roll_TotDep), ])
}
write.csv(max1Conc, file = "max1Conc.csv", row.names = FALSE)
write.csv(maxETD, file = "maxETD.csv", row.names = FALSE)
The difference here is that the two objects you wish to write out (max1Conc and maxETD) start as empty data frames, each iteration's maximum row is appended to them with rbind, and the files are written once after the loop. (write.csv ignores append = TRUE with a warning, so writing inside the loop would just overwrite the files each time.)
There are more idiomatic R ways of accomplishing your goal; personally, I suggest you look into learning the apply family of functions (http://adv-r.had.co.nz/Functionals.html).
I can't directly test the whole thing because I don't have a directory with files like yours, but I tested the parts, and I think this should work as an apply-driven alternative. It starts with a pair of functions, one to ingest a file from your directory and the other to make a one-row data frame out of the two max values from that file:
library(dplyr)

data.folder <- "D:\\20150804"

getfile <- function(filename) {
  sub <- read.table(file.path(data.folder, filename), header = TRUE)
  return(sub)
}

getmaxes <- function(df) {
  rowi <- data.frame(AvgConc.max = max(df[, "AvgConc"]),
                     Roll_TotDep.max = max(df[, "Roll_TotDep"]))
  return(rowi)
}
Then it uses a couple of rounds of lapply --- embedded in piping courtesy of dplyr --- to a) build a list with each data set as an item, b) build a second list of one-row data frames with the maxes from each item in the first list, c) rbind those rows into one big data frame, and d) cbind the filenames to that data frame for reference.
dfmax <- lapply(as.list(list.files(path = data.folder)), getfile) %>%
  lapply(., getmaxes) %>%
  Reduce(function(...) rbind(...), .) %>%
  data.frame(file = list.files(path = data.folder), .)
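As a side note (not part of the original answer), purrr's map_dfr collapses steps a) to c) into one call; a minimal sketch reusing the two helper functions above:
# Sketch: read each file, take the maxes, and row-bind the results in one step
library(purrr)
files <- list.files(path = data.folder)
dfmax <- map_dfr(files, ~ getmaxes(getfile(.x)))
dfmax$file <- files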
I have a data frame data with information on tiffs, including one column txt describing the content of each tiff. Unfortunately, txt is not always correct and we need to correct it by hand. Therefore I want to loop over each row in data, show the tiff, ask for feedback, and put that feedback into data$txt.cor.
setwd(file.choose())
Some test tiffs (with nonsense inside, but just to show the idea...):
txt <- sample(100:199, 5)
for (i in 1:length(txt)) {
  tiff(paste0(i, ".tif"))
  plot(txt[i], ylim = c(100, 200))
  dev.off()
}
and the dataframe:
pix.files <- list.files(getwd(), pattern = "*.tif", full.names = TRUE)
pix.file.info <- file.info(pix.files)
data <- cbind(txt, pix.file.info)
data$file <- row.names(pix.file.info)
data$txt.cor <- ""
data$txt[5] <- 200 # wrong one
My feedback function (error handling stripped):
read.number <- function() {
  n <- readline(prompt = "Enter the value: ")
  n <- as.character(n) # Yes, character. Sometimes we have alphanumerical data or leading zeros
  n
}
Now the loop, for which help would be very much appreciated:
for (i in nrow(data)) {
  file.show(data[i, "file"])          # show the image file
  data[i, "txt.cor"] <- read.number() # ask for the feedback and put it back into the data frame
}
In my very first attempts I was thinking of the plot.lm idea, where you step through the diagnostic plots by pressing return. I suspect that plot and tiffs are not big friends; file.show turned out to be easier. But now I am having a hard time with that loop...
Your problem is that you don't loop over the data; you only evaluate the last row. Simply write 1:nrow(data) to iterate over all rows.
To display your tiff images in R you can use the package rtiff:
library(rtiff)
for (i in 1:nrow(data)) {
  tif <- readTiff(data[i, "file"])    # read in the tiff data
  plot(tif)                           # plot the image
  data[i, "txt.cor"] <- read.number() # ask for the feedback and put it back into the data frame
}
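One optional safeguard (my addition, not part of the original answer): checkpoint the data frame after every correction, so the manual input isn't lost if the session dies mid-loop. The file name here is just a hypothetical example.
for (i in 1:nrow(data)) {
  tif <- readTiff(data[i, "file"])
  plot(tif)
  data[i, "txt.cor"] <- read.number()
  saveRDS(data, "txt_corrections_checkpoint.rds") # hypothetical checkpoint file
}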