R extract a single value from multiple csv files - r

I have multiple csv files (more than 100). Each file represents a time period. In each file, there are 29 lines that need to be skipped (text lines). At line 30, I have a matrix of temperature as a function of latitude and longitude coordinates. For example: at latitude 68.80 and longitude 48.40268, the temperature is 5.94.
So, I would like to extract the temperature at a specific combination of latitude and longitude coordinates for every time period I have (for every file).
I can write the code for a single file, but I'm afraid I don't know how to do it in a loop or how to make it faster.
Any help is appreciated, thank you. Sorry if this is similar to other questions; I read what I could find on the topic, but it did not seem to fit my problem.
The code for one file:
filenames <- list.files(path="E:/Documents...")
fileone <- read.csv(filenames[1], skip=29, header=T, sep=";")
names(fileone) <- c("Lat", "68.88", "68.86", "68.85", "68.83", "68.82", "68.80", "68.79", "68.77", "68.76", "68.74", "68.73", "68.71")
Tempone <- fileone[which(fileone$Lat==48.40268), "68.80"]

Assuming the data are small enough (relative to your system)
to fit into memory all together, you can accomplish this in one shot using lists:
## Grab the filenames, just like you're doing
## (full.names=TRUE so the files can be read from any working directory)
filenames <- list.files(path="E:/Documents...", full.names=TRUE)
## Assuming all files have the same columns
c.nms <- c("Lat", "68.88", "68.86", "68.85", "68.83", "68.82", "68.80", "68.79", "68.77", "68.76", "68.74", "68.73", "68.71")
## Import them all in one shot, as a list of data.frames
## (check.names=FALSE keeps column names such as "68.80" from being changed to "X68.80")
AllData <- lapply(filenames, read.table,
  skip=29, header=TRUE, sep=";", row.names=NULL, col.names=c.nms, check.names=FALSE)
## Then pull the value you want from each data.frame
PulledRows <-
  lapply(AllData, function(DF)
    DF[DF$Lat==48.40268, "68.80"]
  )
If you are pulling different lat/longs per file, you can use mapply with a vector/list of lat/longs
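As a rough sketch of the mapply idea, assuming one row key and one column name per file: the demo files, the `make_file` helper and the `lat_values` / `col_names` vectors below are made up for illustration and are not from the question. The row is matched with a small tolerance rather than `==`, since exact comparison of decimals can be fragile.

```r
## Hypothetical demo data: two small files with 29 text lines before the matrix
c.nms <- c("Lat", "68.80", "68.79")
make_file <- function(values) {
  f <- tempfile(fileext = ".csv")
  writeLines(c(rep("text line", 29),
               "Lat;68.80;68.79",
               paste(48.40268, values[1], values[2], sep = ";"),
               paste(48.41268, values[3], values[4], sep = ";")), f)
  f
}
filenames <- c(make_file(c(5.94, 6.01, 5.80, 5.85)),
               make_file(c(7.10, 7.20, 7.30, 7.40)))

## Import all files as a list of data.frames, as in the answer above
AllData <- lapply(filenames, read.table, skip = 29, header = TRUE, sep = ";",
                  col.names = c.nms, check.names = FALSE)

## One row key and one column name per file (hypothetical values)
lat_values <- c(48.40268, 48.41268)
col_names  <- c("68.80", "68.79")

## mapply walks over the three inputs in parallel: file i, row key i, column i
PulledValues <- mapply(function(DF, lat, col) {
  DF[abs(DF$Lat - lat) < 1e-6, col]
}, AllData, lat_values, col_names)
```

If every file uses the same coordinates, the plain lapply version is simpler; mapply only pays off when the coordinates change from file to file.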

Related

How do I merge RasterBrick columns and rename after extract from NetCDF data? [R language]

Hi everyone, I am just a beginner in R, so the following code or question might misuse R functions. I hope you don't mind.
I got hundreds of ERA Interim daily precipitation files from ECMWF.
Each monthly NetCDF file contains precipitation twice per day, and I extract the values by longitude and latitude.
Finally I convert them to a CSV file, but it stores two columns (12:00 and 00:00) for the same day's rainfall.
Image of output CSV
Ideally, I would like to obtain a single daily rainfall value for the whole year or month, but I don't know how to do it in R.
Q1: Is there any library or function that can merge the columns and keep the same dates?
Q2: I am not sure whether the lubridate library can accomplish this or whether others would be better.
Thanks for any answers.
If somebody wants to view the NetCDF file and the .csv file, you can get them from my Google Drive:
https://drive.google.com/drive/folders/1ogQp_-MlvmJCDmzm38dXHbUmWAM6hi1v?usp=sharing
P.S. In Google Drive, SumMonth.csv is the extraction result.
R Code:
library(raster)
library(ncdf4)
f <- list.files("C:/Users/asus/Downloads/extract",
                pattern = "\\.nc$", full.names = TRUE)
ncloop <- lapply(f, nc_open) # Open every .nc file in the folder
brick <- lapply(f, brick, varname = "tp")
brick2 <- stack(brick) # Combine all months
dim(brick2) # Check that the dimensions are right
##### Input location file with Lon & Lat
pointCoordinates <- read.table("C:/Users/asus/Downloads/pointFile.csv",
                               header = TRUE, sep = ",", stringsAsFactors = FALSE)
coordinates(pointCoordinates) <- ~Long + Lat # Assign XY for extract
rasValue <- extract(brick2, pointCoordinates) # Extract values at the points
combinePointValue <- cbind(pointCoordinates, rasValue) # Construct database
## Output as csv
cc <- as.data.frame(combinePointValue)
write.csv(cc, file = "SumMonth.csv")

How to write a loop for creating cropped raster for every id of a shapefile with a raster base?

I'm still new to R and don't know how to create a loop for my work process to make it more efficient.
I have a Digital Elevation Model (raster Barrow_5m.tif) and two shapefiles, one for lakes and one for buffers, each with 10 IDs (one per row of the attribute table).
In the script below I created a new raster file for all values of the lake and the buffer shapefile with the data from the DEM raster. This works fine.
library(raster)
library(sf)
setwd("...")
Barrow_5m <- raster("Barrow_5m.tif")
Barrow_DTLB <- st_read("Barrow_DTLB.shp")
Barrow_DTLB_Buffer <- st_read("Barrow_DTLB_BufferOUT.shp")
Barrow_lake <- crop(Barrow_5m, extent(Barrow_DTLB))
raster_lake <- rasterize(Barrow_DTLB, Barrow_lake, mask = TRUE)
# Barrow_5m is the only raster loaded above, so it is used here as well
Barrow_buffer <- crop(Barrow_5m, extent(Barrow_DTLB_Buffer))
raster_buffer <- rasterize(Barrow_DTLB_Buffer, Barrow_buffer, mask = TRUE)
writeRaster(raster_lake, "raster_lake.tif")
writeRaster(raster_buffer, "raster_buffer.tif")
But now I want to have a raster file for every ID of the lake and the buffer shapefile separately, so 2x10 files.
I thought it would be best to write a loop for this, but my skills are not good enough to do this yet.
Other questions didn't give me a solution either, although I tried to work it out with their help.
Alternatively, I could use my end product tif from the script above and split it into one file for every ID.
I want to write the loop and not do it by hand for all the IDs of the shapefiles, because afterwards I am going to do the same with an even bigger shapefile with more values.
I found a solution now, by extracting data by the ID.
It creates a large list with 11 elements holding all values of each ID, which is sufficient for my further work. You can also directly create the mean, max, min, etc. values of each element (so each ID).
k <- Barrow_DTLB$ID # k = vector of IDs, used below as row indices
LakesA <- extract(raster_lakeA, Barrow_DTLB[k, ])
LakesA_mean <- extract(raster_lakeA, Barrow_DTLB[k, ], fun=mean)
Maybe this solution is also helpful for a few, who already viewed the question.
I think this should work:
for (i in unique(raster_lake)){
r <- raster_lake
r[!(values(r) == i)] <- NA
r <- trim(r)
writeRaster(r, paste0("raster_lake_", i, ".tif"))
}

organising a .csv file in R

I'm programming a script for the calculation of cover around points in R.
I have two inputs: an IMG raster file, and a .csv with all the points.
I've used this script:
library(raster)
library(rgdal)
#load in raster and locality data
map <- raster('map.IMG')
sites <- read.csv('points.csv', header=TRUE)
#convert lat/lon to appropriate projection
coordinates(sites) <- c("X", "Y")
proj4string(sites) <- CRS("+init=epsg:27700")
#extract values to points
Landcover<-extract (map, sites, buffer=2000)
extraction <- lapply(Landcover, function(serial) prop.table(table(serial)))
# Write .csv file
lapply(extraction, function(x) write.table( data.frame(x), 'test2.csv' , append= T, sep=',' ))
I get a .csv file in my folder, but the data isn't organised in the way I would like it to be.
There are three columns in the csv file: one with 'x', one with 'Freq' (which I think is the code for every class in my image) and one with the cover fraction, somewhere between 0 and 1. See the included image.
I want the header row to contain the serial and the classes, and below that one row per serial with its coverage.
Also, the points aren't named, so I can't see which is which. In points.csv I have, for example, a 'serial' code for every point, which I would like to use for that.
Can somebody steer me in the right direction?
I hope I have been clear with my questions, thanks in advance!
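One possible direction, as a minimal sketch only: build one row per point with a common set of class columns, and write the table once instead of appending. The `Landcover` list below is a made-up stand-in for the output of extract(), and the `serial` vector is hypothetical (in the real script it would presumably come from the 'serial' column of points.csv).

```r
## Hypothetical stand-in for the output of extract(map, sites, buffer = 2000):
## one vector of land-cover class codes per point
Landcover <- list(c(1, 1, 2, 3), c(2, 2, 2, 4), c(1, 4))
serial <- c("A01", "A02", "A03")   # hypothetical serial codes, one per point

## Use one common set of classes so every point gets the same columns
classes <- sort(unique(unlist(Landcover)))

## One row per point: the proportion of each class inside the buffer
cover <- do.call(rbind, lapply(Landcover, function(v) {
  as.vector(prop.table(table(factor(v, levels = classes))))
}))
colnames(cover) <- classes

## Add the serial as the first column and write a single, rectangular csv
result <- data.frame(serial = serial, cover, check.names = FALSE)
write.csv(result, file.path(tempdir(), "cover_by_point.csv"), row.names = FALSE)
```

Converting each vector to a factor with fixed levels is what gives classes that are absent around a point a value of 0 instead of a missing column.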

Populating a matrix (or a DF or a DT) with a loop from a folder containing txt files

I wrote my first code in R for treating some spectra [basically .txt files with a Xcol (wavelength) and Ycol (intensity)].
The code works for single files, provided I write the file name in the code. Here the code working for the first file HKU47_PSG_1_LW_0.txt.
setwd("C:/Users/dd16722/R/Raman/Data")
# import Spectra
PSG1_LW<-read.table("HKU47_PSG_1_LW_0.txt")
colnames(PSG1_LW)[colnames(PSG1_LW)=="V2"] <- "PSG1_LW"
PSG2_LW<-read.table("HKU47_PSG_2_LW_all_0.txt")
colnames(PSG2_LW)[colnames(PSG2_LW)=="V2"] <- "PSG2_LW"
#Plot 2 spectra and define the Y range
plot(PSG1_LW$V1, PSG1_LW$PSG1_LW, type="l",xaxs="i", yaxs="i", main="Raman spectra", xlab="Raman shift (cm-1)", ylab="Intensity", ylim=range(PSG1_LW,PSG2_LW))
lines(PSG2_LW$V1, PSG2_LW$PSG2_LW, col=("red"), yaxs="i")
# Temperature-excitation line correction
laser = 532
PSG1_LW_corr <- PSG1_LW$PSG1_LW*((10^7/laser)^3*(1-exp(-6.62607*10^(-34)*29979245800*PSG1_LW$V1/(1.3806488*10^(-23)*293.15)))*PSG1_LW$V1/((10^7/laser)-PSG1_LW$V1)^4)
PSG1_Raw_Corr <-cbind (PSG1_LW,PSG1_LW_corr)
lines(PSG1_LW$V1, PSG1_LW_corr, col="red")
plot(PSG1_LW$V1, PSG1_Raw_Corr$PSG1_LW_corr, type="l",xaxs="i", yaxs="i", xlab="Raman shift (cm-1)", ylab="Intensity")
Now, it's time for another little step forward. In the folder, there are many spectra (in the code above I included the second one: HKU47_PSG_2_LW_all_0.txt), each again with 2 columns and the same length as the first file. I suppose I should merge all the files into a matrix (or DF or DT).
I probably need a loop, as I need code that can automatically check the number of files in the folder and ultimately create an object with several columns (i.e. twice the number of files).
So I started like this:
listLW <- list.files(path = ".", pattern = "LW")
numLW <- as.integer(length(listLW))
numLW represents the number of iterations I need to set. The question is: how can I populate a matrix (or DF or DT) so that the first 2 columns hold the first txt file in my folder, the 3rd and 4th columns hold the second file, etc.? Bear in mind that I need to perform some other operations, as shown in the code above.
I have been reading about loops in R since yesterday but could not find the best and easiest solution.
Thanks!
You could do something like
# Load data.table library
require(data.table)
# Import the first file
DT_final <- fread(file = listLW[1])
# Loop over the rest of the files and use cbind to merge them into 1 DT
for(file in setdiff(listLW, listLW[1])) {
DT_temp <- fread(file)
DT_final <- cbind(DT_final, DT_temp)
}
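A base R alternative, as a sketch: read every file into a list with lapply and bind the list column-wise in one call. The demo folder, file names and column names below are invented for illustration; only the `listLW` pattern comes from the question.

```r
## Hypothetical demo: write three small two-column spectra to a temporary folder
dir_demo <- tempfile("spectra")
dir.create(dir_demo)
for (k in 1:3) {
  write.table(data.frame(shift = c(100, 200, 300), intensity = k * c(1, 2, 3)),
              file.path(dir_demo, paste0("demo_", k, "_LW_0.txt")),
              row.names = FALSE, col.names = FALSE)
}
listLW <- list.files(path = dir_demo, pattern = "LW", full.names = TRUE)

## Read every file into a list of two-column data frames
spectra <- lapply(listLW, read.table)

## Give each pair of columns a distinct name so they stay identifiable
for (k in seq_along(spectra)) {
  names(spectra[[k]]) <- paste0(c("shift_", "intensity_"), k)
}

## Bind all data frames side by side: columns 1-2 are file 1, 3-4 are file 2, ...
all_spectra <- do.call(cbind, spectra)
```

Keeping the spectra in a list also makes it easy to apply the same correction to each file with another lapply before binding.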

Multiple csv files reading in a loop and calculating column wise average in R

I have 3 csv files, each with three columns (Maths, Physics and Chemistry) holding the marks of all the students. I created a loop to read all the files and save them in a data frame as follows. In every file, lines 1, 2, 4 and 5 need to be skipped.
files <- list.files(pattern = ".csv")
for(i in 1:length(files)){
data <- read.csv(files[i], header=F, skip=2) # by writing skip=2 I could only skip first two lines.
View(data)
mathavg[i] <- sum(as.numeric(data$math), na.rm=T)/nrow(data)
}
result <- cbind(files,mathavg)
write.csv(result,"result_mathavg.csv")
I was not able to calculate the average of the math column in all three files.
I need to calculate this for all three subjects across the three files.
Any help?
This should work:
files <- c("testa.csv","testb.csv","testc.csv")
list_files <- lapply(files,read.csv,header=F,stringsAsFactors=F)
list_files <- lapply(list_files, function(x) x[-c(1,2,4,5),])
mathav <- sapply(list_files,function(x) mean(as.numeric(x[,2]),na.rm=T))
result <- cbind(files,mathav)
write.csv(result,"result_mathavg.csv",row.names=F)
I didn't have access to your files, so I made up three and called them 'files'. I used the lapply function to load the files, then to remove the lines that you didn't want. I got the average using the sapply function then I went back to your code to get result, etc.
mathavg needs to be initialized before it can be operated on with []. To remove lines 4 and 5 you just need to perform a subsetting operation after reading the data. Lines 4 and 5 become rows 2 and 3 if you skip the first 2 lines when reading the data.
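files <- list.files(pattern = ".csv")
mathavg <- numeric(length(files)) # initialise as numeric so the means are not stored as text
for(i in seq_along(files)){
data <- read.csv(files[i], header=F, skip=2, stringsAsFactors=F) # skip=2 skips only the first two lines
data <- data[-c(2,3),] # drop original lines 4 and 5
# with header=F there is no column named math; column 2 is assumed to hold the maths marks, as in the answer above
mathavg[i] <- mean(as.numeric(data[,2]), na.rm=T) ## best to use R's builtin function to calculate the mean
}
result <- cbind(files,mathavg)
write.csv(result,"result_mathavg.csv")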
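To get the averages for all three subjects in one go, something along these lines might work. This is only a sketch: the demo files are made up, and it assumes that each file has exactly three columns in the order Maths, Physics, Chemistry and no header line, which the question does not confirm.

```r
## Hypothetical demo files: lines 1, 2, 4 and 5 are junk, the other lines hold marks
make_file <- function(marks) {
  f <- tempfile(fileext = ".csv")
  lines <- apply(marks, 1, paste, collapse = ",")
  writeLines(c("skip,skip,skip", "skip,skip,skip", lines[1],
               "skip,skip,skip", "skip,skip,skip", lines[-1]), f)
  f
}
files <- c(make_file(matrix(c(50, 60, 70,  40, 50, 60,  90, 80, 70), ncol = 3)),
           make_file(matrix(c(10, 20, 30,  20, 30, 40,  30, 40, 50), ncol = 3)))

subjects <- c("maths", "physics", "chemistry")   # assumed column order

## For each file: read, drop lines 1, 2, 4 and 5, convert to numeric, average each column
averages <- t(sapply(files, function(f) {
  x <- read.csv(f, header = FALSE, stringsAsFactors = FALSE)
  x <- x[-c(1, 2, 4, 5), ]
  x[] <- lapply(x, as.numeric)
  colMeans(x, na.rm = TRUE)
}))
colnames(averages) <- subjects

## One row per file, one column per subject
result <- data.frame(file = basename(files), averages, row.names = NULL)
write.csv(result, file.path(tempdir(), "result_avg.csv"), row.names = FALSE)
```

colMeans handles all columns at once, so no separate loop per subject is needed.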
