I would like to make a function to import CSV-files:
# function to read csv
myfunc <- function(x){
paste0(x,"_RAW") <- read.csv(paste0("C:/Users/User/Documents/R/",x,".csv"))
}
myfunc("myvalue")
This does not work. What am I doing wrong?
The read.csv() function, which is a variant of read.table, returns a data frame. So, the following version of your function might make more sense:
myfunc <- function(x) {
read.csv(paste0("C:/Users/User/Documents/R/", x, ".csv"))
}
# calling returns a data frame representation of the input CSV file
df <- myfunc("myvalue")
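As for why the original attempt fails: R reads `paste0(x, "_RAW") <- value` as a call to a replacement function named `paste0<-`, which does not exist, so a variable name cannot be built on the left-hand side this way. If the goal is to end up with objects labelled `<name>_RAW`, one option is a named list. The following is only a sketch; `read_raw` and the temporary demo directory are made up for illustration and are not part of the original question.

```r
# Hypothetical helper: read several CSV files into one named list.
read_raw <- function(names, dir) {
  paths <- file.path(dir, paste0(names, ".csv"))
  out <- lapply(paths, read.csv)
  # Name each element "<name>_RAW" instead of creating separate variables
  setNames(out, paste0(names, "_RAW"))
}

# Demo with a temporary directory so the sketch is self-contained
demo_dir <- tempfile("csv_demo")
dir.create(demo_dir)
write.csv(data.frame(a = 1:3), file.path(demo_dir, "myvalue.csv"), row.names = FALSE)

raw <- read_raw("myvalue", demo_dir)
names(raw)
```

Individual data frames are then available as `raw$myvalue_RAW` or `raw[["myvalue_RAW"]]`.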
Takes directory path and filename, returns dataframe:
read_input_csv <- function(dirpath, filename) {
  # Append the .csv extension only if the file name does not already end in it
  if (!grepl("\\.csv$", filename)) {
    filename <- paste0(filename, ".csv")
  }
  # Import data with base R's read.csv(); no extra package is required
  read.csv(file.path(dirpath, filename), stringsAsFactors = FALSE)
}
# Works for both:
tmp <- read_input_csv("~/Example_dir", "example_filename")
tmp <- read_input_csv("~/Example_dir", "example_filename.csv")
Related
I want to export a couple of data frames to an excel file using the function write.xlsx() from openxlsx. So, for example the following:
library(openxlsx)
x <- c(1,2,3)
for (i in x) {
name <- paste("sheet", i, sep = "")
assign(name, data.frame(1:4, 2:3))
path <- paste("/some_directory/",name,".xlsx" , sep = "")
write.xlsx(name, file = path)
}
This creates three data frames with the right names, each holding the values 1 to 4 and 2 to 3, and it also creates three Excel files with the right names. However, the Excel files contain only the name instead of the values from the data frame. Does anyone know how to change that?
You need to pass the data frame itself, not its name, to write.xlsx(), so keep the data frame in a variable:
library(glue)
library(openxlsx)
x <- c(1,2,3)
for (i in x) {
name <- paste("sheet", i, sep = "")
df <- data.frame(1:4, 2:3) # This step is missing in your example
assign(name, df)
path <- glue("/some_directory/{name}.xlsx", name = name)
write.xlsx(df, file = path)
}
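The underlying issue is that `name` holds a character string, so writing `name` writes that string rather than the data frame. A minimal base-R sketch of the difference follows; the column names `a` and `b` and the list called `sheets` are illustrative assumptions, and nothing here touches openxlsx.

```r
name <- "sheet1"
assign(name, data.frame(a = 1:4, b = 2:3))

class(name)       # the variable itself holds a string
class(get(name))  # get() looks up the object that has that name

# Alternative: keep the data frames in a named list and skip assign()/get()
sheets <- list()
for (i in 1:3) {
  sheets[[paste0("sheet", i)]] <- data.frame(a = 1:4, b = 2:3)
}
names(sheets)
```

With the list approach, the loop can write `sheets[[name]]` directly, which avoids looking objects up by name.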
I have 1500+ .txt files called data_{date from 2015070918 to today} all with 7 columns worth of data and variable row amounts. I have managed to use the following code to extract and merge the data into one table:
files = list.files(pattern = ".txt")
myData <- lapply(files, function(x) {
tryCatch(read.table(x, header = F, sep = ','), error=function(e) NULL)
})
Note: there are no headers on the columns, currently I don't even know which variable is which!
At the moment the data only has the date in the file name and therefore it isn't possible to distinguish between each subset of daily data. I want to create an additional column to include the date which I can extract if I can include the filename in an additional column.
I searched on stackexchange and came across this possible solution: Importing multiple .csv files into R and adding a new column with file name
df <- do.call(rbind, lapply(files, function(x) cbind(read.csv(x, header = F, sep = ","), name=strsplit(x,'\\.')[[1]][1])))
However I get the following error:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
I have used read.csv on individual files and they have imported without any issues. Any ideas to resolve this would be greatly appreciated!
This should work, if your read.table command is correct:
myData_list <- lapply(files, function(x) {
out <- tryCatch(read.table(x, header = F, sep = ','), error = function(e) NULL)
if (!is.null(out)) {
out$source_file <- x
}
return(out)
})
myData <- data.table::rbindlist(myData_list)
In the past I found that you can spare yourself a lot of headache using data.table::fread instead of read.table. So you could consider this:
myData_list <- lapply(files, function(x) {
out <- data.table::fread(x, header = FALSE)
out$source_file <- x
return(out)
})
myData <- data.table::rbindlist(myData_list)
You can add the tryCatch part back if necessary. Depending on how the files vector looks, basename() might be interesting to use on the column source_file.
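Once the file name is stored in source_file, the date can be pulled out of it. The sketch below assumes names of the form data_YYYYMMDDHH.txt (inferred from the question, not confirmed) and uses UTC as a placeholder time zone; the example paths are made up.

```r
# Assumed naming scheme: data_YYYYMMDDHH.txt
source_file <- c("logs/data_2015070918.txt", "logs/data_2015071006.txt")

# Drop the directory part and extract the ten-digit stamp
stamp <- sub("^data_([0-9]{10})\\.txt$", "\\1", basename(source_file))

# Parse into a date-time; UTC is an assumption, adjust to the real time zone
when <- as.POSIXct(stamp, format = "%Y%m%d%H", tz = "UTC")
format(when, "%Y-%m-%d %H:%M")
```

The resulting vector can be stored as an extra column next to source_file.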
You could try using lapply with an index corresponding to each of the files:
files <- list.files(pattern = ".txt")
myData <- lapply(seq_along(files), function(x) {
tryCatch(
{
dt <- read.table(files[x], header = F, sep = ',')
dt$index <- x # or files[x] if you want to use the file name instead
dt
},
error=function(e) { NULL }
)
})
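The "no lines available in input" error is what read.table typically reports for an empty file, so another option is to filter such files out before reading instead of catching the error. This is a sketch under that assumption; the temporary directory and the two demo files are invented so the example runs on its own.

```r
# Self-contained demo: one file with data, one empty file
demo_dir <- tempfile("txt_demo")
dir.create(demo_dir)
writeLines(c("1,2", "3,4"), file.path(demo_dir, "data_2015070918.txt"))
file.create(file.path(demo_dir, "data_2015070919.txt"))

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# Keep only files that contain at least one byte
non_empty <- files[file.size(files) > 0]

parts <- lapply(non_empty, function(f) {
  out <- read.table(f, header = FALSE, sep = ",")
  out$source_file <- basename(f)  # record where each row came from
  out
})
myData <- do.call(rbind, parts)
```

Files that contain only whitespace would still need the tryCatch approach, since they are not zero bytes long.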
I'm trying to write a custom function to load multiple RDS files and assign them to separate objects within my environment. The code for the function is below:
read_multi_rds <- function(filepath, regrex) {
## grab all files in filepath with regrex provided
files <- list.files(path = filepath, pattern = regrex)
var_names <- character(0)
for(i in 1:length(files)){
name <- substr(files[i], 1, (nchar(files[i])-4)) ## -4 to remove the .rds from the var name
var_names[i] <- name
}
for(i in 1:length(files)){
file <- readRDS(paste0(filepath, files[i]))
assign(var_names[i], file)
}
}
When I test this function by running each bit of the function separately:
filepath <- "I:/Data Sets/"
regrex <- "^cleaned"
files <- list.files(path = filepath, pattern = regrex)
var_names <- character(0)
...followed by...
for(i in 1:length(files)){
name <- substr(files[i], 1, (nchar(files[i])-4)) ## -4 to remove the .rds from the var name
var_names[i] <- name
}
...and finally...
for(i in 1:length(files)){
file <- readRDS(paste0(filepath, files[i]))
assign(var_names[i], file)
}
...the objects are loaded into the environment.
But when I try to load the objects using the function:
read_multi_rds(filepath = "I:/Data Sets/", regrex = "^cleaned")
Nothing loads. I've added the line:
print('done')
at the end of the function to make sure it's running in its entirety, and it seems to be. I'm not getting any error messages or warnings, either.
Is there something I need to add into the function to properly load these items into my environment? Or is this just not possible to do as a function in R? I'm happy just using the code as is within my scripts, but being able to use it as a function would be much neater if I could pull it off.
assign, when used in a function, assigns in the environment of the function. You have to tell assign to assign in the global environment, as the following code illustrates:
data(mtcars)
tmp <- tempfile(fileext = ".csv")
write.csv(mtcars, tmp)
read_wrong <- function(file_name = tmp) {
f <- read.csv(file_name)
assign("my_data", f)
ls() # shows that my_data is in the current environment
}
read_correct <- function(file_name = tmp) {
f <- read.csv(file_name)
assign("my_data", f, envir = .GlobalEnv)
ls() # shows that my_data is not in the current environment
}
read_wrong()
# [1] "f" "file_name" "my_data"
ls() # no my_data
# [1] "mtcars" "read_correct" "read_wrong" "tmp"
read_correct()
# [1] "f" "file_name"
ls()
# [1] "mtcars" "my_data" "read_correct" "read_wrong" "tmp"
Having said that, I would not use assign in the first place but instead return a list of data frames from the function.
read_better <- function(file_name = tmp) {
parsed_name <- basename(file_name) # do some parsing here to get a proper object name
f <- read.csv(file_name)
setNames(list(f), parsed_name)
}
all_data <- read_better()
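Applied to the RDS case from the question, the list-returning idea could look roughly like this. It is a sketch, not a drop-in replacement: the demo directory and the two sample files are assumptions made so the example is runnable, and `full.names = TRUE` replaces the manual `paste0(filepath, ...)` step.

```r
# Sketch of the question's function rewritten to return a named list
read_multi_rds <- function(filepath, regrex) {
  files <- list.files(path = filepath, pattern = regrex, full.names = TRUE)
  objs <- lapply(files, readRDS)
  # Use the file name without its extension as the element name
  setNames(objs, tools::file_path_sans_ext(basename(files)))
}

# Demo data in a temporary directory
demo_dir <- tempfile("rds_demo")
dir.create(demo_dir)
saveRDS(mtcars, file.path(demo_dir, "cleaned_cars.rds"))
saveRDS(iris, file.path(demo_dir, "cleaned_iris.rds"))

cleaned <- read_multi_rds(demo_dir, "^cleaned")
names(cleaned)
```

If separate variables are really needed afterwards, `list2env(cleaned, envir = .GlobalEnv)` copies each list element into the global environment.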
I wrote a simple function:
myfunction <- function(fileName, stringsAsFactors=TRUE,
check.names=FALSE,
skip =1,...) {
Data <- read.delim(fileName, skip = skip,
stringsAsFactors=stringsAsFactors,
check.names = check.names, ...)
cb <- list()
Index <- as.numeric(as.factor(Data[,1]))
cb <- cbind(Data, Index)
return(cb)
}
This function reads the file into Data, creates an Index from the first column of Data, and then cbinds Data and the Index.
The function will be applied to files named myfile_00.txt, myfile_01.txt and so on. For a single file the call looks like:
myfunction (fileName = "myfile_00.txt")
myfunction (fileName = "myfile_01.txt")
.......
I have around 1000 files, so I suppose the loop can be written as in another post:
mytxt <- dir(pattern=".txt")
n <- length(mytxt)
mylist <- vector("list", n)
for(i in 1:n) {
mylist[[i]] <- read.delim(mytxt[i], header = F, skip = 1)
}
then:
d <- lapply(mylist, myfunction)
Unfortunately it does not work... When using lapply an error occurs:
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
'file' must be a character string or connection
Since I'm new to R, I'm probably making mistakes that I'm not able to figure out.
As @Arun pointed out, you are trying to run your function twice: once on the files and once on the data frames you have created. Instead, your code should look like this:
files <- list.files(pattern = "\\.txt$") # escape the dot and anchor at the end of the name
mylist <- lapply(files, myfunction)
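To see the pattern end to end, here is a small self-contained run. The function body is condensed from the question; the demo directory, the file contents, and the column names `id` and `value` are invented for illustration.

```r
# Function from the question, lightly condensed
myfunction <- function(fileName, stringsAsFactors = TRUE,
                       check.names = FALSE, skip = 1, ...) {
  Data <- read.delim(fileName, skip = skip,
                     stringsAsFactors = stringsAsFactors,
                     check.names = check.names, ...)
  Index <- as.numeric(as.factor(Data[, 1]))
  cbind(Data, Index)
}

# Two small tab-separated demo files; the first line is dropped by skip = 1
demo_dir <- tempfile("delim_demo")
dir.create(demo_dir)
for (f in c("myfile_00.txt", "myfile_01.txt")) {
  writeLines(c("first line, skipped", "id\tvalue", "b\t1", "a\t2", "b\t3"),
             file.path(demo_dir, f))
}

files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
mylist <- lapply(files, myfunction)
# Label each result with the file it came from
names(mylist) <- basename(files)
```

Naming the list elements makes it easy to retrieve one result, for example `mylist[["myfile_00.txt"]]`.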
I am new to R and trying to do some correlation analysis on multiple sets of data. I am able to do the analysis, but I am trying to figure out how I can output the results of my data. I'd like to have output like the following:
NAME,COR1,COR2
....,....,....
....,....,....
If I could write such a file to output, then I can post process it as needed. My processing script looks like this:
run_analysis <- function(logfile, name)
{
preds <- read.table(logfile, header=T, sep=",")
# do something with the data: create some_col, another_col, etc.
result1 <- cor(some_col, another_col)
result2 <- cor(some_col2, another_col2)
# somehow output name,result1,result2 to a CSV file
}
args <- commandArgs(trailingOnly = TRUE)
date <- args[1]
basepath <- args[2]
logbase <- paste(basepath, date, sep="/")
logfile_pattern <- paste( "*", date, "csv", sep=".")
logfiles <- list.files(path=logbase, pattern=logfile_pattern)
for (f in logfiles) {
name = unlist(strsplit(f,"\\."))[1]
logfile = paste(logbase, f, sep="/")
run_analysis(logfile, name)
}
Is there an easy way to create a blank data frame and then add data to it, row by row?
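Have you looked at the functions in R for writing data to files? For instance, write.csv and write.table. Note that write.csv ignores the append argument, so to add one row per call use write.table. Perhaps something like this:
rs <- data.frame(name = name, COR1 = result1, COR2 = result2)
# append = TRUE adds the row to an existing file; col.names = FALSE avoids repeating the header
write.table(rs, "path/to/file", sep = ",", append = TRUE, col.names = FALSE, row.names = FALSE)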
I like using the foreach library for this sort of thing:
library(foreach)
run_analysis <- function(logfile, name) {
preds <- read.table(logfile, header=T, sep=",")
# do something with the data: create some_col, another_col, etc.
result1 <- cor(some_col, another_col)
result2 <- cor(some_col2, another_col2)
# Return one row of results.
data.frame(name=name, cor1=result1, cor2=result2)
}
args <- commandArgs(trailingOnly = TRUE)
date <- args[1]
basepath <- args[2]
logbase <- paste(basepath, date, sep="/")
logfile_pattern <- paste( "*", date, "csv", sep=".")
logfiles <- list.files(path=logbase, pattern=logfile_pattern)
## Collect results from run_analysis into a table, by rows.
dat <- foreach (f=logfiles, .combine="rbind") %do% {
name = unlist(strsplit(f,"\\."))[1]
logfile = paste(logbase, f, sep="/")
run_analysis(logfile, name)
}
## Write output.
write.csv(dat, "output.dat", quote=FALSE, row.names=FALSE)
What this does is to generate one row of output on each call to run_analysis, binding them into a single table called dat (the .combine="rbind" part of the call to foreach causes row binding). Then you can just use write.csv to get the output you want.
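The same collect-then-write pattern also works in base R without foreach, using lapply and do.call(rbind, ...). In this sketch `run_one` is a hypothetical stand-in for run_analysis that computes correlations on made-up vectors, and the names in `log_names` are placeholders.

```r
# Stand-in for run_analysis(): returns a one-row data frame per input
run_one <- function(name) {
  x <- seq_len(10)
  data.frame(name = name, cor1 = cor(x, 2 * x), cor2 = cor(x, -x))
}

log_names <- c("a", "b", "c")  # hypothetical names

# One row per call, bound into a single table
rows <- lapply(log_names, run_one)
dat <- do.call(rbind, rows)

# Write without row names so the header is exactly name,cor1,cor2
out_file <- tempfile(fileext = ".csv")
write.csv(dat, out_file, quote = FALSE, row.names = FALSE)
readLines(out_file)[1]
```

Building the rows in a list and binding once at the end avoids growing a data frame row by row inside the loop.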