Printing out the mean value of a specific column - r

I have the following code in R, which is meant to return the mean of a column I specify for all .csv files in a directory:
meanColumn <- function(directory, pol, id = 1:2) {
  getfiles <- list.files(directory, full.names = TRUE)
  for (i in id) {
    print("hello")                 # just to check whether the loop runs
    file <- read.csv(getfiles[i])  # read the i-th file
    colPol <- file[, pol]          # get column `pol` of the .csv file
    x <- mean(colPol)              # get the mean of this column
    print(x)                       # print it :)
  }
  getfiles  # just for checking
}
However, when I run this function as meanColumn("Assignment1", 2), I get the following output:
Any thoughts on where I go wrong?

I found the answer: I forgot to filter out the NA values. This works:
meanColumn <- function(directory, pol, id = 1:2) {
  getfiles <- list.files(directory, full.names = TRUE)
  for (i in id) {
    print("hello")
    file <- read.csv(getfiles[i])
    col_with_NA <- file[, pol]
    colPol <- col_with_NA[!is.na(col_with_NA)]  # drop the NA values
    x <- mean(colPol)                           # mean, not sum, of the remaining values
    print(x)
  }
  getfiles
}
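A shorter sketch of the same idea, assuming every file has the column given by pol: mean() can drop the NAs itself via na.rm = TRUE, and sapply() collects one mean per file instead of only printing them. The name meanColumn2 is hypothetical.

```r
# Sketch: one mean per file; mean() removes the NAs itself.
meanColumn2 <- function(directory, pol, id = 1:2) {
  # only pick up .csv files, with their full paths
  getfiles <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  sapply(getfiles[id], function(f) {
    dat <- read.csv(f)
    mean(dat[, pol], na.rm = TRUE)  # na.rm = TRUE skips the NA values
  })
}
```

Calling meanColumn2("Assignment1", 2) returns a numeric vector named by file path, which can be stored or combined further rather than just printed.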

Related

How to change a variable in several data frames at once, given their names in a list

I have filelist <- c("file1", "file2","file4", "file4"); each name stands for a data frame in my environment, and each data frame has a variable score. I would like to change the score values in all of them at once.
What should I do so that I can use filelist to loop over the code that changes the values?
My code is below. It won't work, but it might give you an idea of what I am trying to do:
for (i in seq_along(filelist)) {
  paste0(filelist[i], "$score") <- mapvalues(paste0(filelist[i], "$score"),
                                             from = value$A,
                                             to = value$B)
}
Thanks. My problem is how to get at fileX$score itself instead of the string "fileX$score".
Try something like
fileReader <- function(filename) {
  # pick a reader based on the file extension
  if (grepl("csv$", filename)) return(read.csv(filename))
  # And so on for other formats...
}
fileSaver <- function(data, filename) {
  # pick a writer based on the file extension
  if (grepl("rds$", filename)) saveRDS(data, filename)
  # And so on for other formats...
}
changeFile <- function(df) {
  df$score <- correct_value  # correct_value: placeholder for the new scores
  return(df)
}
for (file in filelist) {
  temp <- fileReader(file)
  temp <- changeFile(temp)
  fileSaver(temp, file)
}
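The answer above works on files on disk. If, as in the question, file1, file2, ... are data frames already in the workspace, a base R sketch can use get() to fetch each data frame by its name and assign() to store the modified copy back under the same name. Here match() stands in for plyr::mapvalues(), and value is assumed to be a data frame whose columns A and B hold the old and new scores, as in the question. The example data is made up.

```r
# Hypothetical example data: two data frames in the current environment
file1 <- data.frame(score = c(1, 2, 3))
file2 <- data.frame(score = c(3, 3, 1))
filelist <- c("file1", "file2")

# Assumed lookup table: old values in A, replacement values in B
value <- data.frame(A = c(1, 2, 3), B = c(10, 20, 30))

for (nm in filelist) {
  df <- get(nm)                    # fetch the data frame named by the string
  hit <- match(df$score, value$A)  # position of each score in value$A (NA if absent)
  ok <- !is.na(hit)
  df$score[ok] <- value$B[hit[ok]] # recode only the scores found in value$A
  assign(nm, df)                   # store the modified copy under the same name
}
```

Keeping the data frames in a named list instead (for example dfs <- mget(filelist)) and modifying them with lapply() is usually easier to maintain than get() and assign().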

R function displays object "obser" but won't write.csv

My code works and displays the correct values on screen for print(obser), but it will not write the .csv file, and when I tried str(obser) it gave the error object 'obser' not found. I have tried various online help pages and books, and the function seems to be correctly written.
If, instead of running the function from the console in RStudio, I run it line by line in the script pane, will the .csv then be created?
complete <- function(directory = "specdata", id = 1:332) {
  # directory <- "specdata"
  # id <- 1:332
  files_list <- list.files(path = directory, full.names = TRUE)[id]
  NumOfFiles <- length(files_list)
  obser <- data.frame()
  indivFile <- data.frame()
  nobserv <- vector(mode = "integer", length = NumOfFiles)
  for (i in 1:NumOfFiles) {
    indivFile <- read.csv(files_list[i])  # read data file into df inc NA's
    indivFile <- na.omit(indivFile)       # removes NA prev file
    x <- nrow(indivFile[1])
    nobserv[i] <- x
  }
  x_name <- "ID"
  y_name <- "nobs"
  obser <- data.frame(id, nobserv)
  return(obser)  # object returned
  print(obser)
  wd <- getwd()
  setwd(wd)
  write.csv(obser, file = "Observations2.csv")
}
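return() exits the function immediately, so the print() and write.csv() lines after it never run; the values you see on screen are just the returned object being auto-printed. obser is also local to the function, which is why str(obser) at the console cannot find it. Save the object returned by the function and write it out afterwards:
output <- complete("specdata", id = 1:332)
write.csv(output, "Observations2.csv", row.names = FALSE)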
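Alternatively, the file can be written inside the function, as long as that happens before the function returns. A minimal sketch, assuming the same specdata layout; the outfile argument is a hypothetical addition so the output path can be chosen.

```r
complete <- function(directory = "specdata", id = 1:332,
                     outfile = "Observations2.csv") {
  files_list <- list.files(path = directory, full.names = TRUE)[id]
  # number of complete rows (no NA in any column) in each file
  nobs <- vapply(files_list, function(f) nrow(na.omit(read.csv(f))), integer(1))
  obser <- data.frame(id = id, nobs = unname(nobs))
  write.csv(obser, file = outfile, row.names = FALSE)  # write BEFORE returning
  obser  # the last value is returned; nothing after a return() would run
}
```

Ending the function with the object itself rather than return() makes it harder to accidentally place code after the exit point.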

How to calculate the mean of one column across multiple .csv files?

I am a newbie in R and need to calculate the mean of the sulfate column across 332 files. The mean formula below works well with one file. The problem comes when I try to calculate it across all the files.
Perhaps reading all the files and storing them in mydata does not work? Could you help me out?
Many thanks
pollutantmean <- function(specdata, pollutant = xor(sulf, nit), i = 1:332) {
  specdata <- getwd()
  pollutant <- c(sulf, nit)
  for (i in 1:332) {
    mydata <- read.csv(file_list[i])
  }
  sulfate <- (subset(mydata, select = c("sulfate")))
  sulf <- sulfate[!is.na(sulfate)]
  y <- mean(sulf)
  print(y)
}
This is not tested, but the steps are as follows. Note also that this kind of question is asked over and over again (e.g. here). Try searching for "work on multiple files", "batch processing", "import many files" or something similar.
lx <- list.files(pattern = "\\.csv$", full.names = TRUE)
# gives you a character vector of the .csv file paths in the working directory
xy <- sapply(lx, FUN = function(x) {
  out <- read.csv(x)
  out <- out[, "sulfate", drop = FALSE]  # do not drop to vector just for fun
  out <- out[!is.na(out[, "sulfate"]), , drop = FALSE]  # keep only the non-NA rows
  out
}, simplify = FALSE)
xy <- do.call(rbind, xy)  # combine the result for all files into one big data.frame
mean(xy[, "sulfate"])     # calculate the mean
# or
summary(xy)
If you are short on RAM, this can be optimized a bit.
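One way to save memory, sketched under the assumption that only the overall mean is needed: keep a running sum and count instead of binding all files into one data frame, so at most one file is held in memory at a time. The function name column_mean is hypothetical.

```r
# Sketch: mean of one column over many files without binding them together.
column_mean <- function(directory, column = "sulfate") {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  total <- 0
  n <- 0
  for (f in files) {
    x <- read.csv(f)[[column]]
    x <- x[!is.na(x)]        # drop missing values
    total <- total + sum(x)  # running sum of the non-NA values
    n <- n + length(x)       # running count of the non-NA values
  }
  total / n  # equals the mean of all non-NA values pooled across files
}
```

Pooling sums and counts gives the same result as the mean of the combined column; averaging the per-file means would not, unless every file has the same number of non-NA values.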
Thank you for your help.
I have sorted it out. The key was to use full.names = TRUE in list.files and rbind(mydata, ...), as otherwise the loop reads the files one by one and does not append them to each other, which is what I wanted.
See below. I am not sure it is the most "R" solution, but it works:
pollutantmean <- function(directory, pollutant, id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  mydata <- data.frame()
  for (i in id) {
    mydata <- rbind(mydata, read.csv(files_list[i]))
  }
  if (pollutant %in% "sulfate") {
    mean(mydata$sulfate, na.rm = TRUE)
  } else if (pollutant %in% "nitrate") {
    mean(mydata$nitrate, na.rm = TRUE)
  } else {
    "wrong pollutant"
  }
}
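A more compact sketch of the same function. Growing mydata with rbind() inside the loop copies the whole data frame on every pass; reading the selected files with lapply() and binding them once with do.call(rbind, ...) avoids that. mydata[[pollutant]] selects the column by name, so one branch covers both pollutants. Raising an error with stop() for an unknown pollutant, instead of returning a string, is a design choice of this sketch.

```r
pollutantmean <- function(directory, pollutant, id = 1:332) {
  if (!pollutant %in% c("sulfate", "nitrate")) stop("wrong pollutant")
  files_list <- list.files(directory, full.names = TRUE)[id]
  # read the selected files once and stack them into one data frame
  mydata <- do.call(rbind, lapply(files_list, read.csv))
  mean(mydata[[pollutant]], na.rm = TRUE)  # [[ ]] picks the column by name
}
```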

R iterate to read csv files

pollutantmean <- function(id) {
  n <- length(id)
  for (i in 1:n) {
    pol <- read.csv('id[i].csv')
  }
}
pollutantmean(150:160)
The CSV files are named like 001.csv, 002.csv, 100.csv, etc.
001, 002 and 100 are the ids, and each CSV has an id column that contains 1 if the file name is 001.csv.
When I run this code, the console tells me there is no such file id[i].csv.
First of all, you don't need a loop. And second, you need to think about how to represent ids.
ids <- sprintf("%03i", 1:999) # 0's are padded at the beginning
filenames <- paste0(ids, ".csv")
results <- lapply(filenames, read.csv) # you get a list of data frames
Alternatively you can read in all csv files in a certain folder using, say:
results <- lapply(dir(pattern="\\.csv$"), read.csv)
The "\\.csv$" pattern means that ".csv" has to be at the end of the file name (see ?regex for the details).
... and a function that takes a number and gives you back a data frame would look like this:
read.this <- function(i) read.csv(sprintf("%03i.csv", i))
... And now you can lapply it to your desired range:
lapply(101:150, read.this)
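A quick check of the two building blocks used above, the zero padding done by sprintf() and the "\\.csv$" pattern, with made-up file names:

```r
# %03d pads an integer with leading zeros to a width of 3
sprintf("%03d", c(1L, 25L, 150L))
# "001" "025" "150"

files <- c("001.csv", "002.csv", "notes.txt", "old.csv.bak")
# "\\.csv$": a literal dot, then csv, then the end of the string
grepl("\\.csv$", files)
# TRUE TRUE FALSE FALSE

# Without the escape and the anchor, "." matches any character and the
# match may occur anywhere, so a name like "datacsv.txt" slips through
grepl(".csv", "datacsv.txt")
# TRUE
```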
The first problem is line 4 (the read.csv call), which should be replaced by
pol <- read.csv(paste0(id[i], ".csv"))
If id[i] is inside quotes (single or double), read.csv takes it literally, i.e. the function looks for a file named id[i].csv, which explains your error message.
But with such a function, pol is overwritten at every step anyway, and nothing is returned.
If you really want to wrap these lines up into a function, you need to return a list:
pollutantmean <- function(id) {
  res <- vector("list", length(id))
  for (i in seq_along(id)) {
    res[[i]] <- read.csv(paste0(id[i], ".csv"))
  }
  res  # return the list of data frames
}
But a loop is not very elegant here, so we can simply write:
pollutantmean <- function(id) {
  lapply(id, function(i) read.csv(paste0(i, ".csv")))
}
Or even, without defining a function, this should work:
lapply(id, function(i) read.csv(paste0(i, ".csv")))
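One caveat, assuming the files are named 001.csv, 002.csv, ... as in the question: paste0(i, ".csv") only builds the right name for ids of 100 and above, because 1 becomes "1.csv", not "001.csv". A sketch that pads the ids with sprintf() and stacks the results into one data frame; the function name read_ids and its directory argument are hypothetical.

```r
read_ids <- function(id, directory = ".") {
  # build zero-padded file names such as 001.csv or 150.csv
  paths <- file.path(directory, sprintf("%03d.csv", id))
  # read each file and stack the data frames into one
  do.call(rbind, lapply(paths, read.csv))
}
```

For example, read_ids(150:160) reads 150.csv to 160.csv from the working directory, and the mean of a column can then be taken on the combined result.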

Why does this work in the R console but not in a function?

When I enter this code directly into the R console it works perfectly.
for(i in files) df<-rbind(df,read.csv(paste(directory,i,sep="/")))
If I put the same code inside the function below, it throws an "unexpected input" error:
pollutantmean <- function(directory, pollutant, id) {
  files <- list.files(directory)
  df <- data.frame()
  newdf <- data.frame()
  for (i in files) {
    df <- rbind(df, read.csv(paste(directory, i, sep = "/")))
  }
  for (j in id) {
    newdf <- rbind(df[which(df$ID == j), ])
  }
  return(mean(na.omit(newdf[, pollutant])))
}
Another post on Stack Overflow suggests it is caused by copy-pasting and a newline-character mismatch:
"Error only when running whole block of code"
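Since "unexpected input" is a parse error, a likely culprit is an invisible character picked up while pasting, such as a non-breaking space or a curly quote. A small sketch, assuming the function is saved in a script file (the path and the name find_non_ascii are hypothetical), that lists the line numbers containing non-ASCII characters; tools::showNonASCIIfile() in base R does a similar job.

```r
# Return the numbers of the lines in a script that contain non-ASCII characters
find_non_ascii <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  # [^\x01-\x7F] matches any character outside the ASCII range
  which(grepl("[^\\x01-\\x7F]", lines, perl = TRUE))
}
```

For example, find_non_ascii("pollutantmean.R") points at the lines worth retyping by hand.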
However, I wonder whether here it is caused by a wrong file path. Maybe the directory variable you are passing already ends in "/"?
I think your function is ok if you call it right. This works for me:
directory <- "~/Documents/temp/csv/"
pollutantmean <- function(directory, pollutant, id) {
  files <- list.files(directory)
  df <- data.frame()
  newdf <- data.frame()
  for (i in files) {
    df <- rbind(df, read.csv(paste(directory, i, sep = "/")))
  }
  for (j in id) {
    # append the rows for each id instead of overwriting newdf
    newdf <- rbind(newdf, df[which(df$ID == j), ])
  }
  return(mean(na.omit(newdf[, pollutant])))
}
pollutantmean(directory, 'val', c('x'))
[1] 2.5
for data that looks like this:
ID,val
x,1
y,2
z,3
and this:
ID,val
x,4
w,2
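In the question's version, newdf <- rbind(df[which(df$ID==j),]) replaces newdf on every pass of the loop, so only the last id is kept; a call with a single id does not reveal this. A sketch that avoids both the loop over ids and the manual path building, assuming the same ID column: list.files(full.names = TRUE) returns usable paths whether or not directory ends in "/", and %in% keeps the rows for all requested ids in one step.

```r
pollutantmean <- function(directory, pollutant, id) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # read every file once and stack them into one data frame
  df <- do.call(rbind, lapply(files, read.csv))
  # keep the rows for all requested ids at once
  newdf <- df[df$ID %in% id, ]
  mean(newdf[[pollutant]], na.rm = TRUE)
}
```

With the two example files above, pollutantmean(directory, "val", c("x", "w")) averages the values 1, 4 and 2.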
