How to find mean of values over multiple csv files in R

I am trying to write an R function that can loop through 300 .csv files and calculate the mean of columns. Can someone give me high-level guidance on how to do this? All the files are in the same directory and have the same column headings. The mechanics of it shouldn't be that hard, but I am having a hard time finding enough documentation on R syntax to do this. Please help. Thank you.

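There are a few ways to load files into R, but probably the easiest is to get the file names with the list.files function and then read each file in a loop. Your code would look something like this:
setwd("path/to/your/directory") # set to your directory
files <- list.files(pattern = "\\.csv$") # names of the .csv files in that directory
allMeans <- list() # one entry per file
for (i in seq_along(files)) {
  yourData <- read.csv(files[i])
  yourMeans <- colMeans(yourData, na.rm = TRUE) # mean of each column (assumes numeric columns)
  allMeans[[files[i]]] <- yourMeans # save the means from each csv, named by file
}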
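A minimal, self-contained sketch of both readings of "the mean of columns": one set of means per file, and one set pooled over all rows of all files. It assumes every column is numeric; the two tiny files written to a temporary folder (csv_dir) are stand-ins for the real 300 files, so point csv_dir at your own directory instead:

```r
# Two small example files in a fresh temporary folder (stand-ins for the real csv files)
csv_dir <- tempfile("csv_demo")
dir.create(csv_dir)
write.csv(data.frame(a = c(1, 2), b = c(10, 20)), file.path(csv_dir, "f1.csv"), row.names = FALSE)
write.csv(data.frame(a = c(3, 4), b = c(30, 40)), file.path(csv_dir, "f2.csv"), row.names = FALSE)

# Full paths of every .csv file in the folder, then read each one into a list
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
data_list <- lapply(files, read.csv)

# Column means of each file separately: a matrix with one row per column, one column per file
per_file <- sapply(data_list, colMeans, na.rm = TRUE)
colnames(per_file) <- basename(files)

# Column means pooled over the rows of all files together
all_rows <- do.call(rbind, data_list)
pooled <- colMeans(all_rows, na.rm = TRUE)
```

The pooled means weight every row equally, so they differ from the average of the per-file means whenever the files have different numbers of rows.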

Related

Reading multiple csv files using data.table doesn't work when given a file path, possible bug?

I want to read multiple csv files, reading only two columns from each. My code is:
library(data.table)
files <- list.files(pattern="C:\\Users\\XYZ\\PROJECT\\NAME\\venv\\RawCSV_firstBatch\\*.csv")
temp <- lapply(files, function(x) fread(x, select = c("screenNames", "retweetUserScreenName")))
data <- rbindlist(temp)
This yields character(0). However, when I move those csv files to where my script is and change files to this:
files <- list.files(pattern="*.csv")
#....
My dir() output is this:
[1] "adjaceny_list.R" "cleanusrnms_firstbatch"
[3] "RawCSV_firstBatch" "username_cutter.py"
everything gets read. Could you help me track down what exactly is going on, please? The folder that contains these csv files is in the same directory as the script, so even if I use pattern = "RawCSV_firstBatch\\*.csv" I get the same problem.
EDIT:
I also tried:
files <- list.files(path="C:\\Users\\XYZ\\PROJECT\\NAME\\venv\\RawCSV_firstBatch\\",pattern="*.csv")
#and
files <- list.files(pattern="C:/Users/XYZ/PROJECT/NAME/venv/RawCSV_firstBatch/*.csv")
Both yielded an empty data frame.
@NelsonGon mentioned a workaround in the comments:
Do something like: list.files("./path/folder",pattern="*.csv$"). Use .. or . as required (not sure about using an actual path). Can also utilise ~
So that works. Thank you. (Sorry, there is a 2-day limit before I can accept this as the answer.)
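Why the first attempts returned character(0): in list.files the pattern argument is a regular expression that is matched against the file names only, never against the folder part, so a pattern containing a path cannot match anything. The folder belongs in the path argument (or the first argument, as in the workaround), and "*.csv" is a shell glob rather than a regular expression ("\\.csv$" means "ends in .csv"). A minimal sketch, assuming a stand-in folder created under tempfile() in place of the real RawCSV_firstBatch folder, and using base R read.csv so it runs without data.table (the equivalent fread line is in a comment):

```r
# Stand-in folder and files (in the question the folder is RawCSV_firstBatch)
csv_dir <- file.path(tempfile("demo"), "RawCSV_firstBatch")
dir.create(csv_dir, recursive = TRUE)
for (k in 1:2) {
  write.csv(data.frame(screenNames = paste0("user", k),
                       retweetUserScreenName = paste0("rt", k),
                       other = k),
            file.path(csv_dir, paste0("batch", k, ".csv")), row.names = FALSE)
}

# 'path' is the folder; 'pattern' is a regular expression matched against file names only.
# full.names = TRUE returns paths that can be opened from any working directory.
files <- list.files(path = csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Keep only the two wanted columns from each file and stack the results.
# With data.table this would be:
#   rbindlist(lapply(files, fread, select = c("screenNames", "retweetUserScreenName")))
keep <- c("screenNames", "retweetUserScreenName")
data <- do.call(rbind, lapply(files, function(x) read.csv(x)[, keep]))
```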

Create data tables using SPSS in R

Using the expss package, I am creating cross tabs by reading SPSS files in R. This works, but loading takes a lot of time. I have a folder that contains several SPSS files (usually only three), and my R script picks the most recently modified one.
setwd('/file/path/for/this/file/SPSS')
library(expss)
expss_output_viewer()
#get all .sav files
all_sav <- list.files(pattern ='\\.sav$')
#use file.info to get the index of the file most recently modified
pass<-all_sav[with(file.info(all_sav), which.max(mtime))]
mydata = read_spss(pass,reencode = TRUE) # read SPSS file mydata
w <- data.frame(mydata)
args <- commandArgs(TRUE)
Everything works fine, but loading large files (e.g. 112 MB or 48 MB) generally takes too much time.
Is there a way to make this more time-efficient, so that creating the table takes less time? The dropdowns are created using PHP.
I have searched for this and found another package called 'haven', but I am not sure whether it can still give me significance tests. Can anyone help me with this? I would really appreciate it. Thanks in advance.
As described in the expss vignette (https://cran.r-project.org/web/packages/expss/vignettes/labels-support.html), you can read the file with haven and then use it with expss in the following way:
# we need to load packages strictly in this order to avoid conflicts
library(haven)
library(expss)
spss_data = haven::read_spss("spss_file.sav")
# add missing 'labelled' class
spss_data = add_labelled_class(spss_data)
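The snippet above reads a fixed file name. A minimal sketch of plugging it into the question's "most recently modified file" selection by wrapping that selection in a small helper; latest_file() and the demo file names are hypothetical, and the haven/expss calls are left as comments because they need those packages and a real .sav file:

```r
# Hypothetical helper: path of the most recently modified file in 'dir' whose
# name matches 'pattern' (the same file.info()/which.max() idea as in the question)
latest_file <- function(dir, pattern = "\\.sav$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) stop("no matching files in ", dir)
  files[which.max(file.info(files)$mtime)]
}

# Demo: two empty stand-in files with explicit modification times
demo_dir <- tempfile("sav_demo")
dir.create(demo_dir)
old_file <- file.path(demo_dir, "wave1.sav")
new_file <- file.path(demo_dir, "wave2.sav")
invisible(file.create(old_file, new_file))
Sys.setFileTime(old_file, as.POSIXct("2020-01-01 00:00:00", tz = "UTC"))
Sys.setFileTime(new_file, as.POSIXct("2021-01-01 00:00:00", tz = "UTC"))

newest <- latest_file(demo_dir)

# With a real .sav file (requires the haven and expss packages):
# library(haven)
# library(expss)
# spss_data <- add_labelled_class(haven::read_spss(newest))
```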

How can I perform the same set of commands separately for each file in the same file path?

I am trying to extract a subset from multiple .grb2 files in the same directory and write each subset to a csv. I am able to do it for one (or a few) files with the following set of commands:
GRIB <- brick("tmp2m.1989102800.time.grb2")
GRIB <- as.array(GRIB)
readGDAL("tmp2m.1989102800.time.grb2")
tmp2m.6hr <- GRIB[51,27,c(261:1232)]
str(tmp2m.6hr)
tmp2m.data <- data.frame(tmp2m.6hr)
write.csv(tmp2m.data,"tmp1.csv")
The commands above extract to csv the temperature values for the grid cell at latitude index 51 and longitude index 27 (row 51, column 27 of the array), over the time steps c(261:1232).
Now I have hundreds of these files (with different file names, of course) in the same directory, and I want to do the same for all of them. As you know better than me, I cannot do this one by one, changing the file name each time.
I have struggled a lot with this but have not managed to do it so far. Since I am new to R and my knowledge is limited, I would very much appreciate any help with this.
The simplest way would be to use a normal for loop:
library(raster) # provides brick()
path <- "your file path here"
input.file.names <- dir(path, pattern = "\\.grb2$", full.names = TRUE) # full paths of all .grb2 files
output.file.names <- paste0(tools::file_path_sans_ext(input.file.names), ".csv") # same names, .csv extension
for (i in seq_along(input.file.names)) {
  GRIB <- brick(input.file.names[i])
  GRIB <- as.array(GRIB)
  tmp2m.6hr <- GRIB[51, 27, c(261:1232)] # same cell and time range as for a single file
  str(tmp2m.6hr)
  tmp2m.data <- data.frame(tmp2m.6hr)
  write.csv(tmp2m.data, output.file.names[i])
}
You could of course turn the body of the for loop into a function and then use the standard lapply, or the map function from purrr.
Note that this code writes a separate CSV file for each input file. If you want to append all the data to a single file, check out write.table (its append argument).
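A minimal sketch of the function-plus-lapply version, which also shows collecting everything into one csv (a single write of the combined rows, an alternative to appending with write.table). extract_cell() and its reader argument are hypothetical names; so the example runs without the raster package or real .grb2 files, small arrays saved as .rds files stand in for the GRIB files and readRDS acts as the reader:

```r
# Hypothetical helper: read one file, pull one cell's time series, return a data frame.
# 'reader' turns a file name into a 3-d array (rows x cols x layers); the default
# assumes the raster package, as in the loop above.
extract_cell <- function(infile,
                         reader = function(f) as.array(raster::brick(f)),
                         row = 51, col = 27, layers = 261:1232) {
  arr <- reader(infile)
  data.frame(file = basename(infile), tmp2m.6hr = arr[row, col, layers])
}

# Demo: .rds files holding small 3 x 3 x 5 arrays stand in for the .grb2 files
demo_dir <- tempfile("grb_demo")
dir.create(demo_dir)
for (k in 1:2) {
  saveRDS(array(seq_len(45) * k, dim = c(3, 3, 5)),
          file.path(demo_dir, paste0("tmp2m.", k, ".rds")))
}
inputs <- list.files(demo_dir, pattern = "\\.rds$", full.names = TRUE)

# Apply the helper to every file (lapply() is the base R counterpart of purrr::map())
results <- lapply(inputs, extract_cell, reader = readRDS, row = 2, col = 3, layers = 2:4)

# Option 1: one csv per input file, written next to the input
outputs <- paste0(tools::file_path_sans_ext(inputs), ".csv")
for (i in seq_along(results)) write.csv(results[[i]], outputs[i], row.names = FALSE)

# Option 2: all rows in a single csv, written once instead of appended
combined <- do.call(rbind, results)
write.csv(combined, file.path(demo_dir, "all_files.csv"), row.names = FALSE)
```

For the real files, drop the reader, row, col and layers arguments so the defaults (raster::brick and the cell/time range from the question) are used.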

Using a for loop to write in multiple .grd files

I am working with very large data layers for an SDM class, and because of this I ended up breaking some of my layers into a bunch of blocks to avoid memory constraints. These blocks were written out as .grd files, and now I need to read them back into R and merge them together. I am extremely new to R and programming in general, so any help would be appreciated. What I have been trying so far looks like this:
merge.coarse=raster("coarseBlock1.grd")
for ("" in 2:nBlocks){
merge.coarse=merge(merge.coarse,raster(paste("coarseBlock", ".grd", sep="")))
}
where my files are named coarseBlock1.grd, coarseBlock2.grd, and so on, numbered sequentially from 1 to nBlocks (259).
Any feedback would be greatly appreciated.
Growing an object inside a for loop is generally slow in R. Also, calling functions like merge and rbind repeatedly in a for loop eats up a lot of memory, because each call creates a new, larger copy of the accumulated result.
A more efficient way to do this task would be to call lapply (see this tutorial on apply functions for details) to load the files into R (GRDFolder stands for the folder that holds the blocks). This will result in a list of rasters, which can then be combined in a single call with do.call and merge:
library(raster)
rasters <- lapply(list.files(GRDFolder, pattern = "\\.grd$", full.names = TRUE), FUN = raster)
merge.coarse <- do.call(merge, rasters) # merge combines rasters spatially; rbind does not
I'm not too familiar with .grd files, but this overall process should at least get you going in the right direction. Assuming all your .grd files (1 through 259) are stored in the same folder (which I will refer to as GRDFolder), you can also try this loop, which is closer to your original code:
library(raster)
grd.files <- list.files(GRDFolder, pattern = "\\.grd$", full.names = TRUE)
merge.coarse <- raster(grd.files[1]) # start from the first block
for (filename in grd.files[-1]) {
  temp <- raster(filename)
  merge.coarse <- merge(merge.coarse, temp) # add each remaining block
}
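The loop in the question never put the block number into the file name (paste("coarseBlock", ".grd", sep="") has no loop index). A minimal base R sketch of building the names in block order, assuming the blocks really are named coarseBlock1.grd through coarseBlock259.grd; it also shows that list.files() sorts names alphabetically, which only matters for merge where blocks overlap (there the values of the first raster in the argument list are kept). The listed vector is illustrative, and the raster calls are left as comments:

```r
nBlocks <- 259

# Names in numeric order, as paste("coarseBlock", i, ".grd", sep = "") in the loop intended
block.files <- paste0("coarseBlock", seq_len(nBlocks), ".grd")

# list.files() returns names alphabetically ("coarseBlock10.grd" before "coarseBlock2.grd");
# reorder by the number embedded in each name (illustrative vector)
listed <- c("coarseBlock10.grd", "coarseBlock2.grd", "coarseBlock1.grd")
block.num <- as.integer(gsub("\\D", "", listed)) # strip every non-digit character
listed.sorted <- listed[order(block.num)]

# With the raster package installed (GRDFolder = folder holding the blocks):
# rasters <- lapply(file.path(GRDFolder, block.files), raster::raster)
# merge.coarse <- do.call(raster::merge, rasters)
```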

raster images stacked recursively

I have the following problem.
I need to recursively read raster images, stack them, and store them in files with different names (e.g. name1.tiff, name2.tiff, ...).
I tried the following:
for (i in 10) {
fn <- system.file("external/test.grd", package = "raster")
fn <-stack (fn) # not sure if this idea can work.
fnSTACK[,, i] <-fn
}
Here I expected a result of the form:
dim (fnSTACK)
[1] 115 80 10
or something like that
but it didn't work.
Actually, I have around 300 images that I have to store with different names.
The purpose is to extract time series information (if you know another method or have suggestions, I would appreciate them).
Any suggestions are welcomed. Thank you in advance for your time.
What I would do first is put all your *.tiff files in a single folder, then read all their names into a list, stack them, and write a multi-layered raster. I'm assuming all the images have the same extent and projection.
### Load necessary packages
library(raster) # provides raster(), stack(), addLayer() and writeRaster()
############ function extracts the last n characters from a string
############ without counting the last m
subs <- function(x, n=1,m=0){
substr(x, nchar(x)-n-m+1, nchar(x)-m)
}
setwd("your working directory path") # set your wd to where all your images are
filez <- list.files() # creates a list with all the files in the wd
no <- length(filez) # amount of files found
imagestack <- stack() # you initialize your raster stack
for (i in 1:no){
if (subs(filez[i],4)=="tiff"){
image <- raster(filez[i]) # fill up raster stack with only the tiffs
imagestack <- addLayer(imagestack,image)
}
}
writeRaster(imagestack, filename = "path/to/output.tif", format = "GTiff",
            options = "INTERLEAVE=BAND", overwrite = TRUE) # write the stack as one multi-band GeoTIFF
I did not try this, but it should work.
Your question is rather vague, and it would have helped if you had provided a full example script so that it could be more easily understood. You say you need to read several (probably not recursively?) raster images (files, presumably) and create a stack. Then you need to store them in files with different names. That sounds like copying the files to new files with different names, and there are R functions for that, but that is probably not what you intended to ask.
If you have a bunch of files (with full path names or in the working directory), e.g. from list.files()
f <- system.file ("external/test.grd", package = "raster")
ff <- rep(f, 10)
you can do
library(raster)
s <- stack(ff)
I am assuming that you simply need this stack for operations in R (it is an object, but not a file). You can extract the values in many ways (see the help files and vignette of the raster package). If you want a three dimensional array, you can do
a <- as.array(s)
dim(a)
[1] 115 80 10
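Since the stated purpose is extracting time series, a minimal base R sketch of what the array from as.array(s) gives you: indexing [row, column, ] returns one pixel's values across all layers (dates), and apply() over the third dimension summarises each layer. A small stand-in array replaces as.array(s) so the example runs on its own; the raster-based alternative in the comment assumes the raster package's extract() and cellFromRowCol():

```r
# Stand-in for a <- as.array(s): 3 rows x 4 columns x 5 layers (dates)
a <- array(seq_len(3 * 4 * 5), dim = c(3, 4, 5))

# Time series of one pixel: fix row and column, keep every layer
pixel.ts <- a[2, 3, ]

# One summary value per layer, e.g. the mean over all pixels of each date
layer.means <- apply(a, 3, mean, na.rm = TRUE)

# Working on the stack directly (assumes the raster package):
# raster::extract(s, raster::cellFromRowCol(s, 2, 3))
```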
Thanks "JEquihua" for your suggestion; I just needed to initialize the variable before addLayer, i.e.:
imagestack <- stack() # initialize an empty stack before the loop
for (i in 1:no){
if (subs(filez[i],4)=="tiff"){
image <- raster(filez[i]) # fill up raster stack with only the tiffs
imagestack <- addLayer(imagestack,image)
}
}
And sorry "RobertH", I'm new to R. Next time I will ask more precisely.
Also, any suggestions for extracting data from a stacked time series of MODIS images? Or examples using the packages "rts", "ndvits" or "bfast"?
Greetings to the entire community.
Another method for stacking
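library(raster)
# all files under /PATH/of/DATA/ (including subfolders) whose names contain "NDVI";
# tighten the pattern (e.g. "NDVI.*\\.tif$") if other NDVI-named files sit alongside
ndvi.files <- list.files("/PATH/of/DATA/", pattern = "NDVI", recursive = TRUE, full.names = TRUE)
data_stack <- stack(ndvi.files)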
