I want to create a config file. In an R file it would look like the following:
#file:config.R
min_birthday_year <- 1920
max_birthday <- Sys.Date() %m+% months(9)
min_startdate_year <- 2010
max_startdate_year <- 2022
And in the main script I would do: source("config.R").
However, I now want to read the config data from a .csv file instead. Does anyone know how to do this? The file could also be in .txt format.
First thing I would suggest is looking into the config package.
It allows you to specify variables in a yaml text file. I haven't used it but it seems pretty neat and looks like it may be a good solution.
If you don't want to use that, then if your csv is something like this, with var names in one column and values in the next:
min_birthday_year,1920
max_birthday,Sys.Date() %m+% months(9)
min_startdate_year,2010
max_startdate_year,2022
then you could do something like this:
# Read in the file,
# assuming that names are in one column and values in another.
# Will create vars using vals from second col with names from first.
# stringsAsFactors = FALSE keeps the values as character strings (before R 4.0 the default was factors)
config <- read.table("config.csv", sep = ",", stringsAsFactors = FALSE)
# mapply with assign, with var names in one vector and values in the other
# eval(parse()) call to evaluate value as an expression - needed to evaluate the Sys.Date() thing
# (%m+% comes from lubridate, so load that package first).
# tryCatch in case you add a string value to the csv at some point, which will throw an error in the `eval` call
invisible(mapply(
  function(x, y) {
    z <- tryCatch(
      eval(parse(text = y)),
      error = function(e) y
    )
    assign(x, z, inherits = TRUE)
  },
  config[[1]],
  config[[2]]
))
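If the config values are plain numbers and strings, a more cautious variant is to skip eval(parse()) and keep everything in a named list. This is only a minimal sketch in base R: the file contents, the "label" entry and the names cfg_path, raw and cfg are made up for illustration, and it will not evaluate expressions such as the Sys.Date() line.

```r
# Write a small example config file (hypothetical contents) to a temp location
cfg_path <- tempfile(fileext = ".csv")
writeLines(c("min_birthday_year,1920",
             "min_startdate_year,2010",
             "max_startdate_year,2022",
             "label,cohort_a"), cfg_path)

# Read both columns as character so nothing is converted prematurely
raw <- read.csv(cfg_path, header = FALSE, col.names = c("name", "value"),
                colClasses = "character")

# type.convert() turns "1920" into an integer and leaves "cohort_a" as a string
cfg <- lapply(raw$value, type.convert, as.is = TRUE)
names(cfg) <- raw$name

cfg$min_birthday_year
cfg$label
```

Values are then accessed as cfg$min_birthday_year rather than as free-floating global variables, which makes it easier to see where a setting came from.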
I am trying to do some R coding for my project, where I have to read some .csv files from one directory in R and assign each one to a data frame named like df_subject1_activity1. I have tried nested loops but it is not working.
ex:
My dir name is "Test" and I have six .csv files:
subject1activity1.csv,
subject1activity2.csv,
subject1activity3.csv,
subject2activity1.csv,
subject2activity2.csv,
subject2activity3.csv
Now I want to write code to load these .csv files in R and assign the data frame names as
ex:
subject1activity1 = df_subject1_activity1
subject1activity2 = df_subject1_activity2
.... and so on, using a for loop.
My expected output is:
df_subject1_activity1
df_subject1_activity2
df_subject1_activity3
df_subject2_activity1
df_subject2_activity2
df_subject2_activity3
I have tried the following code:
setwd(dirname(getActiveDocumentContext()$path))
new_path <- getwd()
new_path
data_files <- list.files(pattern=".csv") # Identify file names
data_files
for(i in 1:length(data_files)) {
for(j in 1:4){
assign(paste0("df_subj",i,"_activity",j),
read.csv2(paste0(new_path,"/",data_files[i]),sep=",",header=FALSE))
}
}
I am not getting the desired output.
I am new to R; can anyone please help?
Thanks
One solution is to use the vroom package (https://www.tidyverse.org/blog/2019/05/vroom-1-0-0/), e.g.
library(tidyverse)
library(vroom)
library(fs)
files <- fs::dir_ls(glob = "subject*.csv") # the example file names have no underscore after "subject"
data <- purrr::map(files, ~vroom::vroom(.x))
list2env(data, envir = .GlobalEnv)
# You can also combine all the dataframes if they have the same columns, e.g.
library(data.table)
concat <- data.table::rbindlist(data, fill = TRUE)
You are almost there. As always, if you are unsure, it is never a bad idea to code clearly using more lines.
data_files <- list.files(pattern="\\.csv$", full.names=TRUE) # Identify file names
for( data_file in data_files) {
  ## check that the data file matches our expected pattern:
  if(!grepl( "subject[0-9]activity[0-9]", basename(data_file) )) {
    warning( "skipping file ", basename(data_file) )
    next
  }
  ## start creating the variable name from the filename
  ## remove the .csv extension
  var.name <- sub( "\\.csv$", "", basename(data_file), ignore.case=TRUE )
  ## prepend 'df' and introduce underscores:
  var.name <- paste0(
    "df",
    gsub( "(subject|activity)", "_\\1", var.name ) ## this looks for literal 'subject' and 'activity' and, if found, adds an underscore in front of it
  )
  ## now read the file (comma-separated, no header row, as in the question's read.csv2 call)
  data.from.file <- read.csv2( data_file, sep=",", header=FALSE )
  ## and assign it to our variable name
  assign( var.name, data.from.file )
}
I don't have your files to test with, but should the above fail, you should be able to run the code line by line and easily see where it starts to go wrong.
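An alternative to creating one global variable per file is to keep the data frames in a single named list. The following is only a sketch in base R: it builds a throwaway directory with three made-up files so that it runs on its own, and demo_dir, df_names and dfs are names chosen for the example.

```r
# Build a throwaway directory with a few example files (hypothetical data)
demo_dir <- tempfile("Test")
dir.create(demo_dir)
for (f in c("subject1activity1.csv", "subject1activity2.csv", "subject2activity1.csv")) {
  write.table(data.frame(a = 1:2, b = 3:4), file.path(demo_dir, f),
              sep = ",", row.names = FALSE, col.names = FALSE)
}

# Only pick up files that follow the subject<N>activity<M>.csv pattern
data_files <- list.files(demo_dir, pattern = "^subject[0-9]+activity[0-9]+\\.csv$",
                         full.names = TRUE)

# Turn "subject1activity1.csv" into "df_subject1_activity1"
df_names <- sub("^subject([0-9]+)activity([0-9]+)\\.csv$",
                "df_subject\\1_activity\\2", basename(data_files))

# Read every file and label each list element with its derived name
dfs <- setNames(lapply(data_files, read.csv, header = FALSE), df_names)
names(dfs)
```

Individual data frames are then available as dfs$df_subject1_activity1 or dfs[["df_subject1_activity1"]], and the whole set can be processed with lapply().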
I have a list of files like:
nE_pT_sbj01_e2_2.csv,
nE_pT_sbj02_e2_2.csv,
nE_pT_sbj04_e2_2.csv,
nE_pT_sbj05_e2_2.csv,
nE_pT_sbj09_e2_2.csv,
nE_pT_sbj10_e2_2.csv
As you can see, the name of the files is the same with the exception of 'sbj' (the number of the subject) which is not consecutive.
I need to run a for loop, but I would like to retain the original number of the subject. How to do this?
I assume I need to replace length(file) with something that keeps the original number of the subject, but not sure how to do it.
setwd("/path")
file = list.files(pattern="\\.csv$")
for(i in 1:length(file)){
data=read.table(file[i],header=TRUE,sep=",",row.names=NULL)
source("functionE.R")
Output = paste("e_sbj", i, "_e2.Rdata")
save.image(Output)
}
The code above gives me as output:
e_sbj1_e2.Rdata,e_sbj2_e2.Rdata,e_sbj3_e2.Rdata,
e_sbj4_e2.Rdata,e_sbj5_e2.Rdata,e_sbj6_e2.Rdata.
Instead, I would like to obtain:
e_sbj01_e2.Rdata,e_sbj02_e2.Rdata,e_sbj04_e2.Rdata,
e_sbj05_e2.Rdata,e_sbj09_e2.Rdata,e_sbj10_e2.Rdata.
Drop the extension "csv", then add "Rdata", and use filenames in the loop, for example:
myFiles <- list.files(pattern = "\\.csv$")
for(i in myFiles){
myDf <- read.csv(i)
outputFile <- paste0(tools::file_path_sans_ext(i), ".Rdata")
outputFile <- gsub("nE_pT_", "e_", outputFile, fixed = TRUE)
save(myDf, file = outputFile)
}
Note: I changed your variable names. Try to avoid using function names as variable names.
If you use regular expressions and sprintf (or paste0), you can do it easily without a loop:
fls <- c('nE_pT_sbj01_e2_2.csv', 'nE_pT_sbj02_e2_2.csv', 'nE_pT_sbj04_e2_2.csv', 'nE_pT_sbj05_e2_2.csv', 'nE_pT_sbj09_e2_2.csv', 'nE_pT_sbj10_e2_2.csv')
sprintf('e_%s_e2.Rdata',regmatches(fls,regexpr('sbj\\d{2}',fls)))
[1] "e_sbj01_e2.Rdata" "e_sbj02_e2.Rdata" "e_sbj04_e2.Rdata" "e_sbj05_e2.Rdata" "e_sbj09_e2.Rdata" "e_sbj10_e2.Rdata"
You can easily feed the vector to a function (if possible) or feed the function to the vector with sapply or lapply:
fls_new <- sprintf('e_%s_e2.Rdata',regmatches(fls,regexpr('sbj\\d{2}',fls)))
res <- lapply(fls_new,function(x) yourfunction(x))
If I understood correctly, you only change the extension from .csv to .Rdata, remove the last "_2" and change the prefix from "nE_pT" to "e". If so, this should work:
Output = sub("_2\\.csv$", ".Rdata", sub("nE_pT", "e", file[i]))
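If you would rather keep the for loop, the subject number can be pulled out of each file name with a capture group instead of using the loop index. This is a small sketch with three of the file names from the question typed in by hand; subject_id and outputs are names invented for the example.

```r
# Three of the file names from the question, typed in so the sketch runs anywhere
files <- c("nE_pT_sbj01_e2_2.csv", "nE_pT_sbj04_e2_2.csv", "nE_pT_sbj10_e2_2.csv")

outputs <- character(length(files))
for (i in seq_along(files)) {
  # Keep only the digits that follow "sbj", e.g. "01", "04", "10"
  subject_id <- sub("^.*sbj([0-9]+).*$", "\\1", files[i])
  outputs[i] <- paste0("e_sbj", subject_id, "_e2.Rdata")
}
outputs
```

Because the ID is kept as a string, the leading zero in "01" survives, which would not be the case if it were converted to a number.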
I have this little problem in R: I loaded a dataset, modified it and stored it in the variable "mean". Then I used another variable, "dataset", also containing this dataset
data<-read.table()
[...modification on data...]
mean<-data
dataset<-mean
I used the variable "dataset" in some other functions of my script, etc., and at the end I want to store the result in a file named "table_mean.csv".
Of course, neither the command write.csv(tabCorr,file=paste("table_",dataset,".csv",sep=""))
nor the one with ...,quote(dataset)... does what I want.
Does anyone know how I can retrieve "mean" (as string) from "dataset" ?
(The aim would be that I could use this script for other purposes simply changing e.g. dataset<-variance)
Thank you in advance !
I think you are trying to do something like the following code does:
data1 <- 1:4
data2 <- 4:8
## Configuration ###
useThisDataSet <- "data2" # Change to "data1" to use other dataset.
currentDataSet <- get(x = useThisDataSet)
## Your data analysis.
result <- fivenum(currentDataSet)
## Save results.
write.csv(x = result, file = paste0("table_", useThisDataSet, ".csv"))
However, a better alternative would be to wrap your code into a function and pass in your data:
doAnalysis <- function(data, name) {
result <- fivenum(data)
write.csv(x = result, file = paste0("table_", name, ".csv"))
}
doAnalysis(data1, "data1")
If you always want to use the name of the object passed into the function as part of the filename, we can use non-standard evaluation to save some typing:
doAnalysisShort <- function(data) {
result <- fivenum(data)
write.csv(x = result, file = paste0("table_", deparse(substitute(data)), ".csv"))
}
doAnalysisShort(data1)
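To see what the non-standard evaluation part does on its own, here is a tiny sketch that only builds the file name. The function table_file_name and the variables mean_data and dataset are invented for the example.

```r
# Returns the file name that would be used for the object passed in
table_file_name <- function(data) {
  # substitute() captures the expression the caller typed; deparse() turns it into a string
  paste0("table_", deparse(substitute(data)), ".csv")
}

mean_data <- c(1, 2, 3)
dataset <- mean_data

table_file_name(mean_data)
table_file_name(dataset)
```

Note that the name is taken from whatever is written at the call site, so after dataset <- mean_data the second call yields "table_dataset.csv", not "table_mean_data.csv". R does not remember which variable a value was copied from, which is why the name has to be carried along as a string or captured at the call.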
I am running the following code in order to open up a set of CSV files that have temperature vs. time data
temp = list.files(pattern="*.csv")
for (i in 1:length(temp))
{
assign(temp[i], read.csv(temp[i], header=FALSE, skip =20))
colnames(as.data.frame(temp[i])) <- c("Date","Unit","Temp")
}
the data in the data frames looks like this:
V1 V2 V3
1 6/30/13 10:00:01 AM C 32.5
2 6/30/13 10:20:01 AM C 32.5
3 6/30/13 10:40:01 AM C 33.5
4 6/30/13 11:00:01 AM C 34.5
5 6/30/13 11:20:01 AM C 37.0
6 6/30/13 11:40:01 AM C 35.5
I am just trying to assign column names but am getting the following error message:
Error in `colnames<-`(`*tmp*`, value = c("Date", "Unit", "Temp")) :
'names' attribute [3] must be the same length as the vector [1]
I think it may have something to do with how my loop is reading the csv files. They are all stored in the same directory in R.
Thanks for your help!
I'd take a slightly different approach which might be more understandable:
temp = list.files(pattern="*.csv")
for (i in 1:length(temp))
{
tmp <- read.csv(temp[i], header=FALSE, skip =20)
colnames(tmp) <- c("Date","Unit","Temp")
# Now what do you want to do?
# For instance, use the file name as the name of a list element containing the data?
}
Update:
temp = list.files(pattern="*.csv")
stations <- vector("list", length(temp))
for (i in 1:length(temp)) {
tmp <- read.csv(temp[i], header=FALSE, skip =20)
colnames(tmp) <- c("Date","Unit","Temp")
stations[[i]] <- tmp
}
names(stations) <- temp # optional; could process file names too like using basename
station1 <- stations[[1]] # etc station1 would be a data.frame
This 2nd part could be improved as well, depending upon how you plan to use the data, and how much of it there is. A good command to know is str(some object). It will really help you understand R's data structures.
Update #2:
Getting individual data frames into your workspace will be quite hard - someone more clever than I may know some tricks. Since you want to plot these, I'd first make names more like you want with:
names(stations) <- paste(basename(temp), 1:length(stations), sep = "_")
Then I would iterate over the list created above as follows, creating your plots as you go:
for (i in 1:length(stations)) {
tmp <- stations[[i]]
# tmp is a data frame with columns Date, Unit, Temp
# plot your data using the plot commands you like to use, for example
p <- qplot(x = Date, y = Temp, data = tmp, geom = "smooth", main = names(stations)[i])
print(p)
# this is approx code, you'll have to play with it, and watch out for Dates
# I recommend the package lubridate if you have any troubles parsing the dates
# qplot is in package ggplot2
}
And if you want to save them in a file, use this:
pdf("filename.pdf")
# then the plotting loop just above
dev.off()
A multipage pdf will be created. Good Luck!
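For reference, the same multi-page idea can be sketched with base graphics only, so it runs without ggplot2. The two small data frames below are made up, and pdf_path points at a temporary file rather than "filename.pdf".

```r
# Two made-up stations standing in for the list built from the csv files
stations <- list(
  station_a = data.frame(Temp = c(32.5, 32.5, 33.5)),
  station_b = data.frame(Temp = c(34.5, 37.0, 35.5))
)

pdf_path <- tempfile(fileext = ".pdf")
pdf(pdf_path)                      # open the pdf device
for (nm in names(stations)) {
  # each call to plot() starts a new page in the pdf
  plot(stations[[nm]]$Temp, type = "b", main = nm,
       xlab = "Reading", ylab = "Temp (C)")
}
dev.off()                          # close the device so the file is written
```

Every high-level plot call between pdf() and dev.off() ends up on its own page, which is why the loop needs no extra bookkeeping.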
It is usually not recommended practice to use the 'assign' statement in R. (I should really find some resources on why this is so.)
You can do what you are trying using a function like this:
read.a.file <- function (f, cnames, ...) {
  my.df <- read.csv(f, ...)
  colnames(my.df) <- cnames
  ## Here you can add more preprocessing of your files.
  my.df ## return the data frame; without this line the function would return cnames
}
And loop over the list of files using this:
lapply(X=temp, FUN=read.a.file, cnames=c("Date", "Unit", "Temp"), skip=20, header=FALSE)
"read.csv" returns a data.frame so you don't need "as.data.frame" call;
You can use "col.names" argument to "read.csv" to assign column names;
I don't know what version of R you are using, but "colnames(as.data.frame(...)) <-" is just an incorrect call since it calls for "as.data.frame<-" function that does not exist, at least in version 2.14.
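As a rough illustration of the col.names suggestion, the sketch below writes one made-up file with 20 preamble lines to a temporary directory and reads it back. The file name station_a.csv and the variable names are assumptions for the example, not taken from the question.

```r
# Create one example file: 20 lines of preamble followed by two data rows
tmp_dir <- tempfile("stations")
dir.create(tmp_dir)
writeLines(c(paste("preamble line", 1:20),
             "6/30/13 10:00:01 AM,C,32.5",
             "6/30/13 10:20:01 AM,C,32.5"),
           file.path(tmp_dir, "station_a.csv"))

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# col.names sets the column names while reading, so no separate colnames<- step is needed
stations <- lapply(files, read.csv, header = FALSE, skip = 20,
                   col.names = c("Date", "Unit", "Temp"))
names(stations) <- basename(files)

str(stations[[1]])
```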
A short-term fix to your woes is the following, but you really need to read up more on using R as from what you did above I expect you'll get into another mess very quickly. Maybe start by never using assign.
lapply(list.files(pattern = "\\.csv$"), function (f) {
  df = read.csv(f, header = F, skip = 20)
  names(df) = c('Date', 'Unit', 'Temp')
  df
}) -> your_list_of_data.frames
Although more likely you want this (edited to preserve file name info):
df = do.call(rbind,
lapply(list.files(pattern = "*.csv"), function(f)
cbind(f, read.csv(f, header = F, skip = 20))))
names(df) = c('Filename', 'Date', 'Unit', 'Temp')
At a glance it appears that you are missing a set of subset brackets, [], around the elements of your temp list. Your names vector has three elements, but because you have temp[i] instead of temp[[i]], the for loop isn't actually accessing the elements of the list and so treats them as a single element of length one, as the error says.
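A quick check may help here. Assuming temp is the character vector that list.files() returns (so the file names below are made up), both temp[i] and temp[[i]] give the same single string, and the length-one object in the error appears to be the one-column data frame built from that string:

```r
temp <- c("station_a.csv", "station_b.csv")  # hypothetical file names

# For a character vector, [ and [[ return the same single string
identical(temp[1], temp[[1]])

# as.data.frame() on a file name gives a 1 x 1 data frame holding the name itself
name_only <- as.data.frame(temp[1])
ncol(name_only)

# Assigning three column names to one column reproduces the kind of error in the question
err <- tryCatch(
  { colnames(name_only) <- c("Date", "Unit", "Temp"); NULL },
  error = function(e) conditionMessage(e)
)
err
```

In other words, the data read from the file never enters the colnames() call; only the file name does.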
I am using R to calculate the mean values of a column in a file like so:
R
file1 = read.table("x01")
mean(file1$V4)
However I have no experience building loops involving R, only with bash.
How would I convert this into a loop that did this for every file in a folder and saved the output into one file with the file name and mean value as the 2 columns for each row?
eg:
x01(or file1 if that is simpler) 23.4
x02 25.4
x03 10.4
etc
(Don't mind if the solution is bash and R or exclusively R)
Many thanks for your help!
Current error from one of the solutions using bash and R:
Error in `[.data.frame`(read.table("PercentWindowConservedRanked_Lowest_cleanfor1000genomes_1000regions_x013", :
undefined columns selected
Calls: mean -> [ -> [.data.frame
Execution halted
This is similar to what @jmsigner has done, but with minor changes. For instance, writing to a file is done at the end. The code has not been tested.
out <- lapply(list.files(), FUN = function(x) {
m <- mean(read.table(x, header = TRUE)$V4)
return(m)
})
result <- do.call("cbind", out) #merge a list column-wise
# before writing, you can make column names pretty with colnames()
# e.g. colnames(result) <- c("x01", "x02")
write.table(result, file = "means.txt")
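To get the two-column layout asked for in the question (file name, then mean), one option is to collect the means in a data frame and write it once. This is only a sketch: it creates two small header-less example files in a temporary directory, and demo_dir, means, result and out_file are names chosen for the illustration.

```r
# Two made-up input files without headers, so read.table() names the columns V1..V4
demo_dir <- tempfile("means")
dir.create(demo_dir)
write.table(data.frame(1:3, 4:6, 7:9, c(10, 20, 30)), file.path(demo_dir, "x01"),
            row.names = FALSE, col.names = FALSE)
write.table(data.frame(1:3, 4:6, 7:9, c(1, 2, 3)), file.path(demo_dir, "x02"),
            row.names = FALSE, col.names = FALSE)

files <- list.files(demo_dir, full.names = TRUE)

# vapply() guarantees exactly one numeric mean per file
means <- vapply(files, function(f) mean(read.table(f)$V4), numeric(1))

# One row per file: file name in the first column, mean in the second
result <- data.frame(file = basename(files), mean = unname(means))

out_file <- tempfile(fileext = ".txt")
write.table(result, out_file, quote = FALSE, row.names = FALSE, col.names = FALSE)
result
```

If the real files do have a header row, header = TRUE would be needed and the column would be referred to by its header name rather than V4.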
Assuming the columns are always named the same, you could do the following in R:
out.file <- 'means.txt'
for (i in list.files()) {
tmp.file <- read.table(i, header=TRUE) # Not sure if you have headers or not
tmp.mean <- mean(tmp.file$V4)
write(paste0(i, ",", tmp.mean), out.file, append=TRUE)
}
Or the same thing with more bash:
for i in *
do
  # cat() prints just the number, without R's "[1]" prefix
  mean=$(Rscript -e "cat(mean(read.table('$i', header=T)[, 'V4']))")
  echo "$i,$mean" >> means.txt
done
My solution is also similar to @jmsinger's, but you can specify the path to your files in the code itself and then calculate the mean like this:
filename <- system("ls /dir/",intern=TRUE)
for(i in 1:length(filename)){
  file <- read.table(file.path("/dir", filename[i]),header=TRUE) ## if you have headers in your files ##
  mean <- mean(file$V4)
  write.table(mean,file=paste("/dir",paste("mean",filename[i],sep="."),sep="/"))
  ## if you wish to write the means of all the files in separate files rather than one.
}
hope this helps