I built a function to seasonally adjust Brazilian economic data, accounting for the Carnival holiday.
As written, though, I can only adjust one series at a time, read from my clipboard.
I have been trying to adjust several series at once (copying them side by side), but without success.
Can you help me?
Thanks!
seasbrasil <- function(y0, m0, yT, mT) {
  library(seasonal)
  Sys.setenv(X13_PATH = "C:\\Users\\gfernandes\\Documents\\x13as")
  checkX13()
  carnaval <- as.Date(c(
    "2000-03-07", "2001-02-27", "2002-02-12", "2003-03-04",
    "2004-02-24", "2005-02-08", "2006-02-28", "2007-02-20",
    "2008-02-05", "2009-02-24", "2010-02-16", "2011-03-08",
    "2012-02-21", "2013-02-12", "2014-03-04", "2015-02-17",
    "2016-02-09"))
  data(holiday)
  carnaval.ts <- genhol(carnaval, start = -1, end = 2, center = "calendar")
  x <- read.table(file = "clipboard", sep = "\t", header = FALSE)
  x <- ts(x, start = c(y0, m0), end = c(yT, mT), frequency = 12)
  xsa <- seas(x, xreg = carnaval.ts, regression.usertype = "holiday", x11 = list())
  summary(xsa)
  plot(xsa)
  xsa <- final(xsa)
  write.csv(xsa, file = "C:\\Users\\gfernandes\\Documents\\ajuste.csv")
  getwd()
}
Using the clipboard to read data is not a scalable solution. Instead, I would suggest
creating a list of file names with list.files and applying your function to that list.
#Load all libraries first
library(seasonal)
#Define your data directory
DIR="C:\\path-to-your-dir\\"
#Replace .dat with the applicable file extension
#set recursive = TRUE if you have a nested directory structure
TS_fileList <- list.files(path=DIR,pattern="\\.dat$",full.names = TRUE,recursive=FALSE)
#define carnival dates
carnaval<-c(
"2000-03-07","2001-02-27","2002-02-12",
"2003-03-04","2004-02-24","2005-02-08",
"2006-02-28","2007-02-20","2008-02-05",
"2009-02-24","2010-02-16","2011-03-08",
"2012-02-21","2013-02-12","2014-03-04",
"2015-02-17","2016-02-09")
#format carnival variable as date
carnaval <- as.Date(carnaval,format="%Y-%m-%d")
data(holiday)
carnaval.ts <- genhol(carnaval, start = -1, end = 2, center = "calendar")
Function:
fn_adj_seasbrasil <-function(
filePath = "C:\\path-to-your-dir\\file1.dat",
carnivalTS = carnaval.ts,
y0,
m0,
yT,
mT) {
#a few operations were moved outside this function
#since they are common to all files;
#the carnival series is now passed in as a parameter
x <- read.table(file = filePath, sep = "\t", header=FALSE)
x <- ts(x,start=c(y0,m0),end=c(yT,mT),frequency=12)
xsa <-seas(x,xreg = carnivalTS,regression.usertype="holiday",x11=list())
summary(xsa)
plot(xsa)
xsa<-final(xsa)
#save the seasonally adjusted series under a prefixed file name
fileName = basename(filePath)
prefix = "adjuste"
#for the adjusted time series of file1.dat
#the name will be adjuste_file1.dat
newFilePath = dirname(filePath)
newFileName = paste0(newFilePath,"/",prefix,"_",fileName)
write.csv(xsa, file = newFileName)
cat(paste0("Saved file:",newFileName,"\n"))
}
#define y0, m0, yT, mT and then call the function for all files
lapply(TS_fileList, function(x) fn_adj_seasbrasil(filePath = x, carnivalTS = carnaval.ts, y0 = y0, m0 = m0, yT = yT, mT = mT))
This might not work for you on the first pass, but any remaining issues can be resolved
by working through tutorials such as the ATS UCLA R pages and by reading
the function help of ?read.table, ?list.files, ?strsplit, etc.
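If seas() fails on one of the series (the usual first-pass problem), wrapping the call in tryCatch keeps the rest of the batch running. A minimal sketch; safe_adjust is a hypothetical helper name:
#log the failing file and carry on with the next one
safe_adjust <- function(f, ...) {
  tryCatch(fn_adj_seasbrasil(filePath = f, ...),
           error = function(e) message("Skipping ", f, ": ", conditionMessage(e)))
}
lapply(TS_fileList, safe_adjust, carnivalTS = carnaval.ts,
       y0 = y0, m0 = m0, yT = yT, mT = mT)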
Related
This is probably a basic question, but I still need some help to figure out what I should do.
My code is very simple: a bunch of calculations produce raster files stored in variables (NDWI, NDVI, VSI, etc.), and each calculation is based on a satellite image whose file name contains the date. I will find a way to extract the date and store it in a variable.
In essence, what I want is a part of my code that goes over all the files created and saves them as rasters to a specific path on my laptop, in the format "variable name"_"date".tif.
What I have for now :
Dossier <- "C:/Users/Perrin/Desktop/INRA/Raster/sentinel/L1C_T31UDR_A019210_20190225T105315/S2A_MSIL1C_20190225T105021_N0207_R051_T31UDR_20190225T125616.SAFE/GRANULE/L1C_T31UDR_A019210_20190225T105315/IMG_DATA"
library(raster)
list.files(Dossier)
Bande1 <- raster(list.files(Dossier, pattern = "B01\\.jp2$", full.names = TRUE))
Bande2 <- raster(list.files(Dossier, pattern = "B02\\.jp2$", full.names = TRUE))
Bande3 <- raster(list.files(Dossier, pattern = "B03\\.jp2$", full.names = TRUE))
Bande4 <- raster(list.files(Dossier, pattern = "B04\\.jp2$", full.names = TRUE))
NDVI <- (Bande8-Bande4)/(Bande8+Bande4)
NDWI <- (Bande8A-Bande11)/(Bande8A+Bande11)
NDDI <- (NDVI-NDWI)/(NDVI+NDWI)
writeRaster(NDWI, "C:/Users/Perrin/Desktop/INRA/résultats R/NDWI.tif", overwrite = T)
writeRaster(NDVI, "C:/Users/Perrin/Desktop/INRA/résultats R/NDVI.tif", overwrite = T)
writeRaster(NDDI, "C:/Users/Perrin/Desktop/INRA/résultats R/NDDI.tif", overwrite = T)
Any help will be very much appreciated.
Create a list with your objects and then use Map to loop over it:
rasterList <- list(NDVI = NDVI, NDWI = NDWI, NDDI = NDDI)
filenames<-sprintf("C:/Users/Perrin/Desktop/INRA/résultats R/%s_%s.tif",
names(rasterList), format(Sys.Date(), "%Y%m%d"))
Map(writeRaster, rasterList, filenames, MoreArgs = list(overwrite = TRUE))
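The sprintf call above stamps today's date via Sys.Date(). Since the date you want is the acquisition date embedded in the image name, you can pull it out with a regular expression instead; a sketch, assuming the first eight-digit run in a Sentinel-2 product name is the acquisition date:
imgName <- "S2A_MSIL1C_20190225T105021_N0207_R051_T31UDR_20190225T125616.SAFE"
imgDate <- regmatches(imgName, regexpr("\\d{8}", imgName))  # "20190225"
filenames <- sprintf("C:/Users/Perrin/Desktop/INRA/résultats R/%s_%s.tif",
                     names(rasterList), imgDate)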
A machine I use spits out .csv files named by the time. But I need them named after the plate they were read from, which is contained within the file.
I created a list of files:
files <- list.files(path="", pattern="*.csv")
I then tried using a for-loop: first read each file's first row into a data frame, then pull the relevant piece of data (the desired name) into a variable, and finally rename the file.
for(x in files)
{
y <- read.csv(x, nrow = 1, header = FALSE, stringsAsFactors = TRUE)
z <- y[2, 2]
file.rename(x, z)
}
It didn't work. After 7 hours of trying (I'm new to R) I am here. Please give simple advice; I have basically zero R experience.
I believe the following for loop does what the question asks for if the new filename is the second column header value.
If it is not, change nmax to the appropriate column number.
fls <- list.files(pattern = '\\.csv')
for(f in fls){
x <- scan(file = f, what = character(), nmax = 2, nlines = 1, sep = ',')
g <- paste0(x[2], '.csv')
file.rename(f, g)
}
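One design note: file.rename's behaviour when the target already exists differs by platform, so if two plates can share a name it is safer to guard against collisions. A hedged variation of the loop above that keeps the original time-stamped name as a suffix:
for(f in fls){
  x <- scan(file = f, what = character(), nmax = 2, nlines = 1, sep = ',')
  g <- paste0(x[2], '.csv')
  if (file.exists(g)) g <- paste0(x[2], '_', f)  # avoid clobbering an earlier plate
  file.rename(f, g)
}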
I have around 100 csv files in a particular directory and I want to run a moving-average forecast on all of them. Following is the code I have written:
# load libraries once, outside the loop
library(stats)
library(graphics)
library(forecast)
library(TTR)
library(zoo)
library(tseries)
fileNames <- Sys.glob("*.csv")
for (fileName in fileNames) {
  abc <- read.csv(fileName, header = TRUE, sep = ",")
  abc1 <- as.vector(abc[, 1])
  abc2 <- ts(abc1, frequency = 12, start = c(2014, 1))
  abc_decompose <- decompose(abc2)
  plot(abc_decompose)
  forecast <- abc_decompose$trend
  x <- data.frame(abc, forecast)
  write.csv(x, file = fileName, row.names = FALSE)
}
Now this code works perfectly: it appends a column called forecast to each csv file and writes the forecast values into it. The problem is that some of those 100 csv files are too small, and R throws the following error:
Error in decompose(abc2) : time series has no or less than 2 periods
I'm not actually interested in files with fewer than 10 entries, but deleting them manually is difficult. Please help.
You can count the rows of each csv file before deciding whether to process it:
nrows <- sapply(fileNames, function(f) nrow(read.csv(f)))
Then do what you currently do only for the files where nrows >= 10.
To delete the files with fewer than 10 entries, use unlink:
unlink(fileNames[nrows < 10], recursive = FALSE, force = FALSE)
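Putting both pieces together inside the loop (a sketch based on the question's own code; the 10-row cutoff is the one from the question):
for (fileName in Sys.glob("*.csv")) {
  abc <- read.csv(fileName, header = TRUE, sep = ",")
  if (nrow(abc) < 10) {          # files the question wants discarded
    unlink(fileName)
    next
  }
  if (nrow(abc) < 24) next       # decompose() needs at least 2 full periods of 12
  abc2 <- ts(abc[, 1], frequency = 12, start = c(2014, 1))
  abc_decompose <- decompose(abc2)
  x <- data.frame(abc, forecast = abc_decompose$trend)
  write.csv(x, file = fileName, row.names = FALSE)
}
Note the second guard: decompose() requires at least two full periods (24 monthly values), so the 10-row cutoff alone would not silence the error for files with 10 to 23 rows.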
I have an R script that reads the lines of a .sam file produced after mapping. I want to parse the lines of the sam file into strings so that they are easier to manipulate, in order to create the wig files I want and to calculate the cov3 and cov5 values I need.
Can you help me make this script run faster? How can I parse the lines of a huge .sam file into a data frame more quickly? Here is my script:
gc()
rm(list = ls())
exptPath <- "/home/dimitris/INDEX3PerfectUnique31cov5.sam"
lines <- readLines(exptPath)
nn <- length(lines)
# parse lines of the sam file into strings (this part is very, very slow)
rr <- strsplit(lines, "\t", fixed = TRUE)
trr <- do.call(rbind.data.frame, rr)
pos <- as.numeric(as.character(trr[8:nn, 4]))
# for cov3
#pos <- pos + 25
chrom <- trr[8:nn, 3]
tab1 <- table(chrom, pos, exclude = "")
ftab1 <- as.data.frame(tab1)
ftab1 <- subset(ftab1, ftab1[3] != 0)
ftab1 <- subset(ftab1, ftab1[1] != "<NA>")
oftab1 <- ftab1[order(ftab1[, 1]), ]
final.ftab1 <- oftab1[, 2:3]
write.table(final.ftab1, "ind3_cov5_wig.txt", row.names = FALSE,
            sep = " ", quote = FALSE)
It's hard to provide a detailed answer without access to sample inputs and outputs (e.g., subsets of your data on Dropbox). The Bioconductor solution would be to convert the sam file to bam:
library(Rsamtools)
bam <- "/path/to/new.bam"
asBam("/path/to/old.sam", bam)
then read the data in, perhaps directly (see ?scanBam and ?ScanBamParam to import just the fields / regions of interest)
rr <- scanBam(bam)
or in the end more conveniently
library(GenomicAlignments)
aln <- readGAlignments(bam)
## maybe cvg <- coverage(bam) ?
There would be several steps to do your manipulations, ending with a GRanges object (sort of like a data.frame, but where the rows have genomic coordinates) or related object
## ...???
## gr <- GRanges(seqnames, IRanges(start, end), strand=..., score=...)
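For the coverage use case in the question, one route (a sketch, assuming the readGAlignments call above succeeded) is to coerce the coverage into a GRanges directly:
library(GenomicAlignments)
cvg <- coverage(aln)        # per-chromosome run-length encoded read depth
gr <- as(cvg, "GRanges")    # ranges with the depth in the 'score' column
gr <- gr[score(gr) > 0]     # keep only covered positions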
The end goal is to export to a wig / bigWig / bed file using
library(rtracklayer)
export(gr, "/path/to.wig")
There are extensive help resources, including package vignettes, man pages, and the Bioconductor mailing list.
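If you would rather stay in base R, the main bottleneck in the posted script is do.call(rbind.data.frame, rr), which builds the data frame one row at a time. Extracting only the two needed fields from the split lines avoids it entirely; a sketch, assuming fields 3 and 4 are RNAME and POS as in the SAM specification:
lines <- readLines(exptPath)
lines <- lines[!startsWith(lines, "@")]                # drop SAM header lines
rr <- strsplit(lines, "\t", fixed = TRUE)
chrom <- vapply(rr, `[`, character(1), 3)              # field 3: reference name
pos <- as.numeric(vapply(rr, `[`, character(1), 4))    # field 4: 1-based position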
I've written the following code to import data into R:
## specify where all the data files are stored
DataFolder <- "DataFolder"
## obtain the name of each file in DataFolder
files <- list.files(DataFolder)
## obtain name of each file
LocNames <- unique(sub("^([^.]*).*", "\\1", files)) # this removes the extension and keeps the unique names
for (i in 1:length(LocNames)){
#
car <- read.table(paste(DataFolder, paste(LocNames[i], ".car", sep=""), sep="/"),
header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
car <- aggregate(car[colnames(car)[2:length(colnames(car))]],list(dateTime = cut(car$dateTime,breaks = "hour")),mean, na.rm = TRUE)
#
light <- read.table(paste(DataFolder, paste(LocNames[i], ".light", sep=""), sep="/"),
header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
light <- aggregate(light[colnames(light)[2]],list(dateTime = cut(light$dateTime, breaks = "hour")),mean, na.rm = TRUE)
}
So, here I have a DataFolder where all of my files are stored. The files are named according to the location where the data was recorded, and the extension of each file gives the name of the variable measured; here we have car sales and light as examples.
From here I would like to shrink the loop body so that, instead of naming one variable after the other and repeating the same steps, I only have to write the variable name (e.g. car, light) and get the outcome of the script shown.
Please let me know if my intentions are not clear.
Just use a function. Something to the effect of
## specify where all the data files are stored
DataFolder <- "DataFolder"
## obtain the name of each file in DataFolder
files <- list.files(DataFolder)
readMyFiles <- function(DataFolder, LocName, extension){
  data <- read.table(paste(DataFolder, paste(LocName, ".", extension, sep=""), sep="/"),
                     header = TRUE, sep = "\t", colClasses=c(dateTime="POSIXct"))
  data <- aggregate(data[colnames(data)[2:length(colnames(data))]],
                    list(dateTime = cut(data$dateTime, breaks = "hour")),
                    mean, na.rm = TRUE)
  data
}
## obtain name of each file
LocNames <- unique(sub("^([^.]*).*", "\\1", files)) # this removes the extension and keeps the unique names
for (i in 1:length(LocNames)){
  car <- readMyFiles(DataFolder, LocNames[i], "car")
  light <- readMyFiles(DataFolder, LocNames[i], "light")
}