R function displays object "obser" but won't write the .csv - r

My code works and displays the correct values on screen for print(obser), but it will not write the .csv, and when I tried str(obser) it gave the error: object 'obser' not found. I have checked various online help pages and books, and as far as I can tell the function is written correctly.
If, instead of running the function from the console in RStudio, I run it line by line in the script pane, will the csv then be created?
complete <- function(directory= "specdata", id = 1:332){
# directory <- "specdata"
# id <- 1:332
files_list <- list.files(path=directory,full.names=T)[id]
NumOfFiles <- length(files_list)
obser <- data.frame()
indivFile <-data.frame()
nobserv <- vector(mode= "integer", length = NumOfFiles)
for (i in 1:NumOfFiles){
indivFile <- read.csv(files_list[i]) # read data file into df inc NA's
indivFile <- na.omit(indivFile) # removes NA prev file
x <- nrow(indivFile[1])
nobserv[i] <- x
}
x_name <-"ID"
y_name <-"nobs"
obser <- data.frame(id, nobserv)
return(obser) # object returned
print(obser)
wd <- getwd()
setwd(wd)
write.csv(obser, file="Observations2.csv")
}

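The function exits at return(obser), so the print() and write.csv() calls after it never run. What you see on screen is the console auto-printing the returned value. obser also exists only inside the function, which is why str(obser) at the console reports that the object is not found. Save the object returned from the function and write that instead:
output<-complete("specdata",id=1:332)
write.csv(output,"Observations2.csv",row.names=FALSE)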
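To see the effect in isolation, here is a minimal sketch. The two functions (demo_return, demo_write_first) and their values are made up for illustration and are not part of the original code: anything placed after return() is skipped, while writing before returning creates the file.

```r
# Hypothetical function: the write comes after return(), so it is never reached
demo_return <- function(path) {
  out <- data.frame(id = 1:2, nobs = c(10L, 20L))  # made-up values
  return(out)                                      # the function exits here
  write.csv(out, path, row.names = FALSE)          # never runs
}

# Hypothetical function: write first, then return the object as the last step
demo_write_first <- function(path) {
  out <- data.frame(id = 1:2, nobs = c(10L, 20L))  # made-up values
  write.csv(out, path, row.names = FALSE)
  out
}

p1 <- tempfile(fileext = ".csv")
p2 <- tempfile(fileext = ".csv")

r1 <- demo_return(p1)
r2 <- demo_write_first(p2)

file.exists(p1)  # no file was written by demo_return
file.exists(p2)  # demo_write_first created its file
```

Both functions return the same data frame; only the second one also leaves a file behind.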

Related

Trouble using mutate within a for loop

I'm trying to write a function called complete that takes a file directory (which has csv files titled 1-332) and the title of the file as a number to print out the number of rows without NA in the sulfate or nitrate columns. I am trying to use mutate to add a column titled nobs which returns 1 if neither column is na and then takes the sum of nobs for my answer, but I get an error message that the object nob is not found. How can I fix this? The specific file directory in question is downloaded within this block of code.
library(tidyverse)
if(!file.exists("rprog-data-specdata.zip")) {
temp <- tempfile()
download.file("https://d396qusza40orc.cloudfront.net/rprog%2Fdata%2Fspecdata.zip",temp)
unzip(temp)
unlink(temp)
}
complete <- function(directory, id = 1:332){
#create a list of files
files_full <- list.files(directory, full.names = TRUE)
#create an empty data frame
dat <- data.frame()
for(i in id){
dat <- rbind(dat, read.csv(files_full[i]))
}
mutate(dat, nob = ifelse(!is.na(dat$sulfate) & !is.na(dat$nitrate), 1, 0))
x <- summarise(dat, sum = sum(nob))
return(x)
}
When one runs the following code nobs should be 117, but I get an error message instead
complete("specdata", 1)
Error: object 'nob' not found
I think the function below should get what you need. Rather than a loop, I prefer map (or apply) in this setting. It's difficult to say where your code went wrong without the error message or an example I can run on my machine, however.
Happy Coding,
Daniel
library(tidyverse)
complete <- function(directory, id = 1:332){
#create a list of files, keeping only the requested ids
files_full <- list.files(directory, full.names = TRUE)[id]
# cycle over each file to get the number of nonmissing rows
purrr::map_int(
files_full,
~ read.csv(.x) %>% # read in datafile
dplyr::select(sulfate, nitrate) %>% # select two columns of interest
tidyr::drop_na() %>% # drop missing observations
nrow() # get the number of rows with no missing data
) %>%
sum() # sum the total number of rows not missing among all files
}
As mentioned, avoid building objects in a loop. Instead, consider building a list of data frames from each csv then call rbind once. In fact, even consider base R (i.e., tinyverse) for all your needs:
complete <- function(directory, id = 1:332){
# create a list of files
files_full <- list.files(directory, full.names = TRUE)
# create a list of data frames
df_list <- lapply(files_full[id], read.csv)
# build a single data frame with nob column
dat <- transform(do.call(rbind, df_list),
nob = ifelse(!is.na(sulfate) & !is.na(nitrate), 1, 0)
)
return(sum(dat$nob))
}
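On the error in the question itself: mutate(), like base transform(), returns a modified copy and does not change dat in place. Since the result was never assigned, dat had no nob column when summarise() ran. Assigning it back (dat <- mutate(dat, ...)) should clear the error. A minimal sketch of the same behaviour using transform(), with made-up rows standing in for one monitor file:

```r
# Made-up rows standing in for one monitor file
dat <- data.frame(sulfate = c(1.2, NA, 3.1, NA),
                  nitrate = c(0.5, 0.7, NA, NA))

# The result is computed but thrown away, so dat is unchanged
transform(dat, nob = ifelse(!is.na(sulfate) & !is.na(nitrate), 1, 0))
had_nob_before <- "nob" %in% names(dat)

# Assigning the result back keeps the new column
dat <- transform(dat, nob = ifelse(!is.na(sulfate) & !is.na(nitrate), 1, 0))
total <- sum(dat$nob)

had_nob_before  # FALSE: the unassigned call did not add the column
total           # 1: only the first row has both values present
```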

R Project: Loop calculation on files in folder

I have a bunch of text files in a folder. The script should read the whole list and do a calculation on each file. The results should be written to a "results.txt" file. I also want the name of each processed file in the results table next to its result, but that part is still missing and I don't know how to add it.
I am pretty far, but now I am stuck:
library(data.table)
ldf <- list() # creates a list
list_txt <- dir(pattern = "*.txt")
for (k in 1:length(list_txt)){
ldf[[k]] <- fread(list[k], select = c("Count"))
br=c(0,1,3,9,15,500) #Set breaks
bins=c(0,1,2,3,4) #Set bins
freq=hist(ldf[[k]]$Count, breaks=br, plot=FALSE)
df=data.frame(bins, frequency=freq$counts)
df$pct <- df$frequency*100 / sum(df$frequency)
df$pct<-round(df$pct,digits=0)
df$hscore<-df$pct * df$bins
hscore=sum(df$hscore)
cat(df$hscore,file="results.txt",sep="\n")
}
The error code I get is:
Error in hist.default(ldf[[k]]$Count, breaks = br, plot = FALSE) :
some 'x' not counted; maybe 'breaks' do not span range of 'x'
Any suggestions?
I tried a bit more and came to this code, which works without error messages:
library(data.table)
ldf <- list() # creates a list
list_txt <- dir(pattern = "*.txt")
for (k in 1:length(list_txt)){
ldf[[k]] <- fread(list_txt[k], select = c("Count"))
br=c(-Inf,1,3,9,15,Inf) #Set breaks
bins=c(0,1,2,3,4) #Set bins
freq=hist(ldf[[k]]$Count, breaks=br, plot=FALSE)
df=data.frame(bins, frequency=freq$counts)
df$pct <- df$frequency*100 / sum(df$frequency)
df$pct<-round(df$pct,digits=0)
df$hscore<-df$pct * df$bins
hscore=sum(df$hscore)
cat(df$hscore,file="results.txt",sep="\n")
}
BUT, it creates a results.txt file with only 5 entries. When I call the list_txt, there are 164 files.
What could be the problem?
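One likely cause, as far as the code shows: cat() without append = TRUE overwrites results.txt on every pass, and it writes df$hscore (five values, one per bin) rather than the summed hscore, so only the last file's five numbers survive. A sketch of one way around it is to collect one score per file and write the table once after the loop. The helper name hscore_from_counts, the demo files, and the use of cut() with table() in place of hist() are my assumptions, and base read.delim() stands in for fread().

```r
# Hypothetical helper: score for one vector of counts, using the breaks and
# bins from the question. cut() + table() stands in for hist() here.
hscore_from_counts <- function(counts,
                               br = c(-Inf, 1, 3, 9, 15, Inf),
                               bins = c(0, 1, 2, 3, 4)) {
  freq <- as.vector(table(cut(counts, breaks = br)))
  pct <- round(freq * 100 / sum(freq), digits = 0)
  sum(pct * bins)
}

# Demo data so the sketch runs on its own: two small tab-separated files
# with a "Count" column, written to a temporary folder.
demo_dir <- file.path(tempdir(), "hscore_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(data.frame(Count = c(0, 2, 5, 10, 20)),
            file.path(demo_dir, "a.txt"), sep = "\t", row.names = FALSE)
write.table(data.frame(Count = c(0, 0, 0, 20)),
            file.path(demo_dir, "b.txt"), sep = "\t", row.names = FALSE)

list_txt <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# One score per file, collected first ...
scores <- vapply(list_txt, function(f) {
  hscore_from_counts(read.delim(f)$Count)
}, numeric(1))

results <- data.frame(file = basename(list_txt), hscore = unname(scores))

# ... and written once, outside the input folder so that a second run
# does not pick up the results file as input
out_file <- file.path(tempdir(), "results.txt")
write.table(results, out_file, sep = "\t", row.names = FALSE, quote = FALSE)
```

This also puts the file name next to each score. Keeping cat() inside the loop with append = TRUE would work too, but building the table first makes the file-name column straightforward.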

Error writing raster in for loop

I am a beginner in R and I have written a script that converts .grd files into .tif files in a for loop (I have quite a lot of .grd files that need to be converted). The script is shown below.
# Set your work directory
setwd("xxx")
library(rgdal)
library(sp)
library(raster)
# Data: .grd files in order to reproduce the code below
g1 <- raster(ncol=10, nrow=10)
vals <- 1:100
g1 <- setValues(g1, vals)
writeRaster(g1, filename="TEST_G1.grd", overwrite=TRUE)
g2 <- raster(ncol=50, nrow=50)
vals <- 1
g2 <- setValues(g2, vals)
writeRaster(g2, filename="TEST_G2.grd", overwrite=TRUE)
# Convert .grd to geotif in a for loop
rlist <- list.files(pattern=".grd$")
for (i in rlist)
{
#Read raster
x <- raster(rlist[i])
#make new file name
filename <- rlist[i]
n <- unlist(strsplit(filename, split='.', fixed=TRUE))[1]
name <- paste(n, ".tif", sep="")
#write the raster as GTiff
writeRaster(x, filename=name, format="GTiff", overwrite=TRUE)
}
I have run all the statements in the script separately and managed to get a geotif file. However, when I run the loop I get the following error:
Error in .local(.Object, ...) :
Error in .rasterObjectFromFile(x, band = band, objecttype = "RasterLayer", :
Cannot create a RasterLayer object from this file. (file does not exist)
The file does exist, and when I just run the last line separately the tif file is created... so I don't understand what is wrong with the loop.
If I write, e.g., "for (i in 1:2)" instead of "for (i in rlist)", it works. But I would prefer not to count the number of files in every directory, which is the reason I was trying to use "for (i in rlist)".
Thanks a lot for your help!
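A probable explanation: with "for (i in rlist)" the loop variable is the file name itself, not a position, so rlist[i] looks up an element by name, finds none, and hands NA to raster(). The sketch below shows this with a plain character vector standing in for the list.files() result. The raster calls are left as comments so it runs without the raster package, and tools::file_path_sans_ext() is my substitution for the strsplit() step.

```r
# Stand-in for list.files(pattern = "\\.grd$")
rlist <- c("TEST_G1.grd", "TEST_G2.grd")

# Indexing by a string looks up names, and rlist has none, so this is NA
lookup <- rlist["TEST_G1.grd"]
lookup

tif_names <- character(0)
for (f in rlist) {
  # f already is the file name, so use it directly
  name <- paste0(tools::file_path_sans_ext(f), ".tif")
  tif_names <- c(tif_names, name)
  # x <- raster(f)
  # writeRaster(x, filename = name, format = "GTiff", overwrite = TRUE)
}
tif_names
```

If a position is needed instead, "for (i in seq_along(rlist))" together with rlist[i] avoids counting the files by hand.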

undefined columns selected error - works with 1 csv file gives error with more

I have a few functions that I created to help me analyze some data. My main function starts by binding all .csv files that are in a folder and then calls other functions to perform various tasks. It looks like this:
x <- function (directory){
files <- list.files(directory, full.names = TRUE)
num_files <-length(files)
options(stringsAsFactors = TRUE)
df <- data.frame()
for (i in 1:num_files) {
df_data <- read.csv(files[i])
df <- rbind(df, df_data)
}
df$Status <- "ba"
bad_cid <- input() # simple input function, see below for the input function code
df$Status[df$cid %in% bad_cid] <- "ab"
df$Status <- as.factor(df$Status)
bad_var_list <- prep_dataset(df)
df <- df[,!(names(df) %in% bad_var_list)]
df
}
Here is the input function:
input <- function(){
x <- readline("Enter a comma seperated list of cids with ab status :")
x <- as.numeric(unlist(strsplit(x, ",")))
x
}
Another function is later called to clean up the data to meet some requirements that I have
The code in the prep_dataset function starts out like this, it gives me an error in the last line shown here:
prep_dataset(data){
df<- subset(df, Status == 'ab')
listfactors <- sapply(df2, is.factor)
df_factors <- df[,listfactors]
df_bad <- df_factors[,(colSums(df_factors == "") >= nrow(df_factors) * .20)]
......
}
When I run my function x('Folder Name') with one .csv file in the folder, it runs fine and I get the desired results. However, if there is more than one file I get this:
Error in `[.data.frame`(df_factors, , (colSums(df_factors == :
undefined columns selected
Called from: `[.data.frame`(df_factors, , (colSums(df_factors == "") >= nrow(df_factors)*0.2))
I took two csv files and manually combined them into one, then compared the resulting data frame with the one built in the for loop, and they look identical. I have no clue what's going on or why this error message keeps popping up.
So I discovered that when read.csv ran on a folder with 2 or more csv files, empty "" values were replaced with NAs, apparently only in columns that were entirely empty. This showed up once prep_dataset() ran df<- subset(df, Status == 'ab'). It only happened when the folder had multiple csv files, not with a single csv file, and I'm not really sure why.
To fix the issue I had to get rid of the NAs by doing the following:
char <- sapply(df_factors, as.character)
char[is.na(char)] <- ""
df_char <- as.data.frame(char)
Now when the function continues and runs
df_bad <- df_char[,(colSums(df_char == "") >= nrow(df_char) * .20)]
The undefined columns selected error does not happen anymore.
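The mechanism behind the error can be reproduced on a small scale. Comparing NA with "" gives NA, colSums() carries that NA through, and a logical column index that contains NA is what triggers "undefined columns selected". The sketch below uses a made-up three-column data frame. Treating NA as empty in the test is one possible alternative to converting everything to character first, not necessarily what the original data needs.

```r
# Made-up data: column b stands in for an all-empty column read in as NA
df_factors <- data.frame(
  a = c("x", "v", "y", "z", "w"),
  b = c(NA, NA, NA, NA, NA),
  c = c("", "", "p", "q", "r"),
  stringsAsFactors = TRUE
)

# NA == "" is NA, so the count for column b is NA as well
empty_share <- colSums(df_factors == "")
keep <- empty_share >= nrow(df_factors) * 0.20

# Indexing columns with a logical vector that contains NA raises the error
res <- tryCatch(df_factors[, keep], error = function(e) conditionMessage(e))
res

# Alternative: count NA as empty, so the index never contains NA
is_empty <- is.na(df_factors) | df_factors == ""
keep_fixed <- colSums(is_empty) >= nrow(df_factors) * 0.20
df_bad <- df_factors[, keep_fixed, drop = FALSE]
names(df_bad)
```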

Printing out the mean value of the specific column

I have the following code in R, which is meant to do the following: provide the mean of the column I specify for all .csv files in a directory.
meanColumn <- function(directory, pol, id=1:2){
getfiles <- list.files(directory, full.names=TRUE)
for(i in id) {
print("hello") #just to check whether looping goes fine
file<- read.csv(getfiles[i]) #store all results
colPol <- file[, pol] #get the second column of the .csv file
x <- mean(colPol) #get the mean of this column
print(x) #print it :)
}
getfiles #just for checking
}
When I run this function like this meanColumn("Assignment1", 2) I get the following output however.
Any thoughts on where I go wrong?
I already got the answer. I forgot to filter out the "NA" values. This works:
meanColumn <- function(directory, pol, id=1:2){
getfiles <- list.files(directory, full.names=TRUE)
for(i in id) {
print("hello")
file<- read.csv(getfiles[i])
col_mean_with_NA <- file[, pol]
colPol <- col_mean_with_NA[!is.na(col_mean_with_NA)]
x <- mean(colPol)
print(x)
}
getfiles
}
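As a shorter alternative to filtering by hand, mean() has an na.rm argument that drops missing values before averaging. A minimal sketch with made-up values standing in for one column of a file:

```r
# Made-up values standing in for file[, pol]
col_values <- c(2.5, NA, 4.5, NA, 5.0)

with_na <- mean(col_values)                   # NA, because of the missing values
without_na <- mean(col_values, na.rm = TRUE)  # mean of the three present values

with_na
without_na
```

Inside the loop this would replace the two filtering lines with x <- mean(file[, pol], na.rm = TRUE).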
