Import file from environment instead of read.table - r

I am using someone else's package. As you can see, there is an ImportHistData argument in the function. I want to import the file from the environment, under the name rainfall, instead of from rainfall.txt. When I replace rainfall.txt with rainfall, I get this error:
Error in read.table(x, header = FALSE, fill = TRUE, na.strings = y) :
'file' must be a character string or connection
So how can I import the data from the environment instead of from a text file?
The original function call:
DisagSimul(TimeScale = 1/4,
           BLpar = list(lambda = l, phi = f, kappa = k, alpha = a, v = v, mx = mx, sx = NA),
           CellIntensityProp = list(Weibull = FALSE, iota = NA),
           RepetOpt = list(DistAllowed = 0.1, FacLevel1Rep = 20, MinLevel1Rep = 50,
                           TotalRepAllowed = 5000),
           NumOfSequences = 10,
           Statistics = list(print = TRUE, plot = FALSE),
           ExportSynthData = list(exp = TRUE, FileContent = c("AllDays"), file = "15min.txt"),
           ImportHistData = list("rainfall.txt", na.values = "NA", FileContent = c("AllDays"),
                                 DaysPerSeason = length(rainfall$Day)),
           PlotHyetographs = FALSE, RandSeed = 5)
The relevant part of the function's source, where ImportHistData is used:
ImportHistDataFun(mode = 1, x = ImportHistData$file,
y = ImportHistData$na.values, z = ImportHistData$FileContent[1],
w = TRUE, s = ImportHistData$DaysPerSeason, timescale = 1)

First, check the package documentation (?DisagSimul) to see whether the ImportHistData argument accepts a data frame already in memory instead of a path to an external .txt file.
If the function can only read a file from disk and you do not want to save your rainfall data frame permanently as a file, consider using a temp file that exists only for the R session, or until you delete it with unlink():
# INITIALIZE TEMP FILE
tf <- tempfile(pattern = "", fileext = ".txt")
# EXPORT rainfall to FILE
write.table(rainfall, tf, row.names=FALSE)
...
# USE TEMPFILE IN METHOD (other arguments as in the original call;
# the element is named "file" to match ImportHistData$file in the source)
DisagSimul(...,
           ImportHistData = list(file = tf, na.values = "NA", FileContent = c("AllDays"),
                                 DaysPerSeason = length(rainfall$Day)),
           ...)
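When you are finished with it, the temp file can be removed explicitly, as mentioned above:
# CLEAN UP TEMP FILE (it is otherwise deleted when the R session ends)
unlink(tf)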

Related

Using tidyverse to read data from s3 bucket

I'm trying to read a .csv file stored in an s3 bucket, and I'm getting errors. I'm following the instructions here, but either it does not work or I am making a mistake, and I can't figure out what I'm doing wrong.
Here's what I'm trying to do:
# I'm working on a SageMaker notebook instance
library(reticulate)
library(tidyverse)
sagemaker <- import('sagemaker')
sagemaker.session <- sagemaker$Session()
region <- sagemaker.session$boto_region_name
bucket <- "my-bucket"
prefix <- "data/staging"
bucket.path <- sprintf("https://s3-%s.amazonaws.com/%s", region, bucket)
role <- sagemaker$get_execution_role()
client <- sagemaker.session$boto_session$client('s3')
key <- sprintf("%s/%s", prefix, 'my_file.csv')
my.obj <- client$get_object(Bucket=bucket, Key=key)
my.df <- read_csv(my.obj$Body) # This is where it all breaks down:
##
## Error: `file` must be a string, raw vector or a connection.
## Traceback:
##
## 1. read_csv(my.obj$Body)
## 2. read_delimited(file, tokenizer, col_names = col_names, col_types = col_types,
## . locale = locale, skip = skip, skip_empty_rows = skip_empty_rows,
## . comment = comment, n_max = n_max, guess_max = guess_max,
## . progress = progress)
## 3. col_spec_standardise(data, skip = skip, skip_empty_rows = skip_empty_rows,
## . comment = comment, guess_max = guess_max, col_names = col_names,
## . col_types = col_types, tokenizer = tokenizer, locale = locale)
## 4. datasource(file, skip = skip, skip_empty_rows = skip_empty_rows,
## . comment = comment)
## 5. stop("`file` must be a string, raw vector or a connection.",
## . call. = FALSE)
When working with Python, I can read a CSV file using something like this:
import pandas as pd
# ... Lots of boilerplate code
my_data = pd.read_csv(client.get_object(Bucket=bucket, Key=key)['Body'])
This is very similar to what I'm trying to do in R, and it works in Python... so why does it not work in R?
Can you point me in the right direction?
Note: Although I could use a Python kernel for this, I'd like to stick to R, because I'm more fluent with it than with Python, at least when it comes to dataframe crunching.
I'd recommend trying the aws.s3 package instead:
https://github.com/cloudyr/aws.s3
Pretty simple - set your env variables:
Sys.setenv("AWS_ACCESS_KEY_ID" = "mykey",
"AWS_SECRET_ACCESS_KEY" = "mysecretkey",
"AWS_DEFAULT_REGION" = "us-east-1",
"AWS_SESSION_TOKEN" = "mytoken")
and then once that is out of the way:
aws.s3::s3read_using(read.csv, object = "s3://bucket/folder/data.csv")
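Since s3read_using takes any reader function, the readr equivalent would be (bucket and object names below are placeholders):
# same one-liner with readr's parser, passing bucket and object separately
aws.s3::s3read_using(readr::read_csv, object = "folder/data.csv", bucket = "bucket")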
Update: I see you're already familiar with boto and trying to use reticulate, so I'm leaving this easy wrapper for that here:
https://github.com/cloudyr/roto.s3
It looks like it has a great API; for example, it matches the variable layout you're aiming to use:
download_file(
bucket = "is.rud.test",
key = "mtcars.csv",
filename = "/tmp/mtcars-again.csv",
profile_name = "personal"
)
read_csv("/tmp/mtcars-again.csv")

How do I get the file path of a file saved using write.xlsx or another function in R?

I am creating two dataframes and one graph in RStudio. I wrote code to transfer them to an Excel file on different sheets, but each time I have to choose the file path using file.choose(). Is it possible to assign the file path to a variable when saving the file for the first time? If such a method exists, how can it be done?
I would also welcome comments on how to export my dataframes to an Excel file more easily. My code is below.
Thank you to everyone.
dataframe1 <- data.frame("A"=1, "B"=2)
dataframe2 <- data.frame("C"=3,"D"=4)
list_of_datasets <- list("Name of DataSheet1" = dataframe1, "Name of Datasheet2" = dataframe2)
write.xlsx(list_of_datasets, file = "writeXLSX2.xlsx")
dflist <- list("Sonuçlar"=yazılacakdosya0, "Frame"=dtf, "Grafik"="")
edc <- write.xlsx(dflist, file.choose(new = T), colNames = TRUE,
borders = "surrounding",
firstRow = T,
headerStyle = hs)
require(ggplot2)
q1 <- qplot(hist(yazılacakdosya0$Puan))
print(q1)
insertPlot(wb=edc, sheet = "Grafik")
saveWorkbook(edc, file = file.choose(), overwrite = T)
Just save the file path in a variable before you call saveWorkbook:
file = file.choose()
saveWorkbook(edc, file = file, overwrite = T)
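Putting it together with the question's code (dflist and hs as defined there; this assumes the openxlsx package, whose write.xlsx returns a workbook object):
# ask for the path once and keep it
file <- file.choose(new = TRUE)
edc <- write.xlsx(dflist, file, colNames = TRUE,
                  borders = "surrounding", firstRow = TRUE, headerStyle = hs)
insertPlot(wb = edc, sheet = "Grafik")
# reuse the stored path instead of asking again
saveWorkbook(edc, file = file, overwrite = TRUE)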

How do I apply the same action to all Excel Files in the directory?

I need to reshape the data stored in Excel files and save it as new .csv files. I figured out what specific actions need to be done, but I can't understand how to use lapply.
All Excel files have the same structure. Each of the .csv files should have the name of its original file.
## the original actions successfully performed on a single file
library(readxl)
library("reshape2")
DataSource <- read_excel("File1.xlsx", sheet = "Sheet10")
DataShaped <- melt(subset(DataSource [-(1),], select = - c(ng)), id.vars = c ("itemname","week"))
write.csv2(DataShaped, "C:/Users/Ol/Desktop/Meta/File1.csv")
## my attempt to apply to the rest of the files in the directory
lapply(Files, function(i) {
  write.csv2(melt(subset(read_excel(i, sheet = "Sheet10")[-(1),], select = -c(ng)),
                  id.vars = c("itemname", "week")))
})
R returns the result to the console but doesn't create any files, although the output resembles the .csv structure.
Could anybody explain what I am doing wrong? I'm new to R and would be really grateful for the help.
Answer
Thanks to the prompt answer from @Parfait the code is working! So glad. Here it is:
library(readxl)
library(reshape2)
Files <- list.files(full.names = TRUE)
lapply(Files, function(i) {
write.csv2(
melt(subset(read_excel(i, sheet = "Decomp_Val")[-(1),],
select = -c(ng)),id.vars = c("itemname","week")),
file = paste0(sub(".xlsx", ".csv",i)))
})
It reads an Excel file in the directory, drops the first row (but keeps the headers) and the column named "ng", melts the data by the id variables "itemname" and "week", and writes the result as a .csv to the working directory under the name of the original file. And then: rinse and repeat.
Simply pass an actual file path to write.csv2. Otherwise, as noted in the docs (?write.csv), the default value of the file argument is the empty string "":
file: either a character string naming a file or a connection open for writing. "" indicates output to the console.
The code below concatenates each Excel file's stem to the specified directory path with a .csv extension:
path <- "C:/Users/Ol/Desktop/Meta/"
lapply(Files, function (i){
write.csv2(
melt(subset(read_excel(i, sheet = "Sheet10")[-(1),],
select = -c(ng)),
id.vars = c("itemname","week")),
file = paste0(path, sub(".xlsx", ".csv", i))
)
})
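One caveat: if Files was built with full.names = TRUE as in the accepted code above, each i already contains its directory, so strip it with basename() before prepending path; fixed = TRUE also stops sub from treating the dot as a regex wildcard. A quick illustration with a hypothetical path:
i <- "C:/Users/Ol/data/File1.xlsx"  # hypothetical input path
path <- "C:/Users/Ol/Desktop/Meta/"
paste0(path, sub(".xlsx", ".csv", basename(i), fixed = TRUE))
# -> "C:/Users/Ol/Desktop/Meta/File1.csv"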

Seasonally adjust several series on R

I built a function to seasonally adjust Brazilian economic data, due to Carnival.
But this way I can adjust only one series at a time, from my clipboard.
I have then been trying to adjust more series (copying several series one next to the other), but without success.
Can you help me?
Thanks!
seasbrasil <- function(y0, m0, yT, mT) {
carnaval <- as.Date(c("2000-03-07", "2001-02-27", "2002-02-12", "2003-03-04",
                      "2004-02-24", "2005-02-08", "2006-02-28", "2007-02-20",
                      "2008-02-05", "2009-02-24", "2010-02-16", "2011-03-08",
                      "2012-02-21", "2013-02-12", "2014-03-04", "2015-02-17",
                      "2016-02-09"))
library(seasonal)
Sys.setenv(X13_PATH = "C:\\Users\\gfernandes\\Documents\\x13as")
checkX13()
data(holiday)
carnaval.ts <- genhol(carnaval, start = -1, end = 2, center = "calendar")
x <- read.table(file = "clipboard", sep = "\t", header=FALSE)
x <-ts(x,start=c(y0,m0),end=c(yT,mT),frequency=12)
xsa <-seas(x,xreg=carnaval.ts,regression.usertype="holiday",x11=list())
summary(xsa)
plot(xsa)
xsa<-final(xsa)
write.csv(xsa, file = "C:\\Users\\gfernandes\\Documents\\ajuste.csv")
getwd()
}
Using the clipboard to read data is not a scalable solution. Instead, I would suggest creating a list of file names using list.files and applying your function over this list.
#Load all libraries first
library(seasonal)
#Define your data directory
DIR="C:\\path-to-your-dir\\"
#Replace .dat with file extension applicable
# set recursive = TRUE if you have tree directory structure
TS_fileList <- list.files(path=DIR,pattern=".dat",full.names = TRUE,recursive=FALSE)
#define carnival dates
carnaval<-c(
"2000-03-07","2001-02-27","2002-02-12",
"2003-03-04","2004-02-24","2005-02-08",
"2006-02-28","2007-02-20","2008-02-05",
"2009-02-24","2010-02-16","2011-03-08",
"2012-02-21","2013-02-12","2014-03-04",
"2015-02-17","2016-02-09")
#format carnival variable as date
carnaval <- as.Date(carnaval,format="%Y-%m-%d")
data(holiday)
carnaval.ts <- genhol(carnaval, start = -1, end = 2, center = "calendar")
Function:
fn_adj_seasbrasil <-function(
filePath = "C:\\path-to-your-dir\\file1.dat",
carnivalTS = carnaval.ts,
y0,
m0,
yT,
mT) {
#a few operations were moved outside this function,
#since they are common to all files;
#the carnival series is now passed in as a parameter
x <- read.table(file = filePath, sep = "\t", header=FALSE)
x <- ts(x,start=c(y0,m0),end=c(yT,mT),frequency=12)
xsa <-seas(x,xreg = carnivalTS,regression.usertype="holiday",x11=list())
summary(xsa)
plot(xsa)
xsa<-final(xsa)
#save seasonally adjusted file with different suffix
fileName = basename(filePath)  #file name without the directory
suffix = "adjuste"
#for adjusted time series of file1.dat
# the name will be adjuste_file1.dat
newFilePath = dirname(filePath)  #directory of the input file
newFileName = paste0(newFilePath,"/",suffix,"_",fileName)
write.csv(xsa, file = newFileName)
cat(paste0("Saved file:",newFileName,"\n"))
}
#define y0,m0,yT,mT and then for all files call the function
lapply(TS_fileList,function(x) fn_adj_seasbrasil(filePath = x,carnivalTS = carnaval.ts, y0,m0,yT,mT) )
This might not work for you on the first pass, but any issues can be resolved by familiarizing yourself with tutorials like these from ATS UCLA and by reading the function help pages: ?read.table, ?list.files, ?strsplit, etc.
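For concreteness, a sample invocation; the start and end dates are assumptions (monthly data from January 2000 through December 2016, matching the carnival dates above):
#y0/m0 = start year/month, yT/mT = end year/month (assumed values)
adjusted <- lapply(TS_fileList, function(x)
  fn_adj_seasbrasil(filePath = x, carnivalTS = carnaval.ts,
                    y0 = 2000, m0 = 1, yT = 2016, mT = 12))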

How to combine many csv files into a large csv without holding the whole object in RAM

I am working on combining csv files into one large csv file that will not fit into my machine's RAM. Is there any way to do that in R? I realize that I could load each individual csv file into R and append it to an existing database table, but for quirky reasons I'm looking to end up with a large csv file.
Try reading each csv file one by one and writing it out with write.table and the option append = TRUE.
Something like this:
read one csv file;
write.table(..., append = TRUE) to the final csv file;
remove the table with rm();
call gc().
Repeat until all files are written out.
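A minimal sketch of that loop (the file names are placeholders):
out <- "combined.csv"
if (file.exists(out)) file.remove(out)  # start fresh
files <- setdiff(list.files(pattern = "\\.csv$"), out)
for (f in files) {
  d <- read.csv(f)
  first <- !file.exists(out)  # write the header only on the first pass
  write.table(d, out, sep = ",", row.names = FALSE,
              col.names = first, append = !first)
  rm(d)  # remove the table from memory
  gc()   # and reclaim the memory
}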
You can use the option append = TRUE
first <- data.frame(x = c(1,2), y = c(10,20))
second <- data.frame(x = c(3,4), y = c(30,40))
write.table(first, "file.csv", sep = ",", row.names = FALSE)
write.table(second, "file.csv", append = TRUE, sep = ",", row.names = FALSE, col.names = FALSE)
First create 3 test files, and then create a variable Files containing their names. We used Sys.glob to get the vector of file names, but you may need to modify this statement. Then define outFile as the name of the output file. For each component of Files, read in the file with that name and write it out. If it is the first file, write it out in full; for subsequent files, write everything except the header, being sure to use append = TRUE. Note that L is overwritten each time a file is read in, so only one file takes up space at a time.
# create test files using built in data frame BOD
write.csv(BOD, "BOD1.csv", row.names = FALSE)
write.csv(BOD, "BOD2.csv", row.names = FALSE)
write.csv(BOD, "BOD3.csv", row.names = FALSE)
Files <- Sys.glob("BOD*.csv") # modify as appropriate
outFile <- "out.csv"
for(f in Files) {
L <- readLines(f)
if (f == Files[1]) cat(L, file = outFile, sep = "\n")
else cat(L[-1], file = outFile, sep = "\n", append = TRUE)
}
# check that the output file was written properly
file.show(outFile)
Alternatively, the loop could be replaced with this:
for(f in Files) {
d <- read.csv(f)
first <- f == Files[1]
write.table(d, outFile, sep = ",", row.names = FALSE, col.names = first, append = !first)
}
