File details from folder - R

I'd like to loop through a list of files and record detailed info about them (size, no. of rows, means of columns).
I just started with storing the info in a data frame:
df<-data.frame()
all <-list.files(pattern=".csv")
for (i in all){
file<-read.csv(i)
filas<-nrow(file)
cols<-ncol(file)
info<-c(i,filas,cols)
df<-rbind(df,i,filas,cols)
}
but it triggers an error caused by the 'i' variable, which is just a file name. What am I doing wrong?
Thanks in advance, p.

Don't use for loops. Rather, use lapply in combination with do.call to obtain your desired result. Try:
do.call(rbind,lapply(all,function(x) {y<-read.csv(x); data.frame(file=x, filas=nrow(y), cols=ncol(y))}))
Your approach was failing because, for rbind to work, you need two data.frames with the same number of columns. You initially created an empty data.frame (with 0 columns), which could not be rbind-ed to a vector of length 3 (assuming that you want one row per file showing the file name, number of rows and number of columns). Note also that your loop builds info but then passes i, filas and cols to rbind as three separate arguments, each of which rbind treats as its own row. If you really want to use a for loop, you should do something like:
for (i in seq_along(all)) {
file<-read.csv(all[i])
info<- data.frame(file=all[i], filas=nrow(file), cols=ncol(file))
if (i==1) df<-info else df<-rbind(df,info)
}
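The question also asks for file size and column means, which neither snippet above records. A minimal sketch of one way to add them, assuming every file has the same numeric columns (otherwise rbind cannot line the rows up); the temporary folder, the two demo files and the helper name describe_file are made up for illustration:

```r
# Two small CSV files in a temporary folder stand in for the real data.
folder <- tempfile("csvdemo")
dir.create(folder)
write.csv(data.frame(a = 1:3, b = c(2, 4, 6)),
          file.path(folder, "one.csv"), row.names = FALSE)
write.csv(data.frame(a = 4:5, b = c(1, 3)),
          file.path(folder, "two.csv"), row.names = FALSE)

# "\\.csv$" matches only names ending in .csv; in a bare ".csv" the dot
# is a regular-expression wildcard that matches any character.
files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: one summary row per file.
describe_file <- function(path) {
  dat <- read.csv(path)
  num <- dat[vapply(dat, is.numeric, logical(1))]  # keep numeric columns only
  data.frame(
    file  = basename(path),
    bytes = file.size(path),
    filas = nrow(dat),
    cols  = ncol(dat),
    t(colMeans(num))  # one-row matrix, expanded into one column per mean
  )
}

info <- do.call(rbind, lapply(files, describe_file))
print(info)
```

Returning a one-row data.frame from each call keeps the counts numeric and the file name character, so the combined result has sensible column types.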

Related

Convert entire dataframe dynamically

I have some dataframe allchemicals that generates n columns. I want to convert each of the columns in the dataframe so that it updates in real time. I have tried
convertedframes<-reactiveValues(cc0=0, mgl=0, ngl=0, ugl=0)
outputallconcs<-reactiveValues(chemicals=0)
observe({
convertedframes$cc0<-allchemicalscc0()
convertedframes$mgl<-allchemicals()*1
convertedframes$ngl<-allchemicals()*1000
convertedframes$ugl<-allchemicals()*1000000
})
observeEvent(input$run_button, {
req(allchemicals())
if(input$OCUnits=="c/c0"){
outputallconcs$chemicals<-convertedframes$cc0
}
if(input$OCunits=="mg/L"){
outputallconcs$chemicals<-convertedframes$mgl
}
if(input$OCunits=="ng/L"){
outputallconcs$chemicals<-convertedframes$ngl
}
if(input$OCunits=="ug/L"){
outputallconcs$chemicals<-convertedframes$ugl
}
})
But this leaves me with the error Warning: Error in if: argument is of length zero
When I do output$sum<-renderTable(outputallconcs$chemicals) I see the output is the dataframe that I want. When I try a similar method with just a single column, it works fine because I can just reference the one column name; however, things seem more complicated with a varying number of columns. Is there any easy way to do this? I apologize that this is not a reproducible example; generating these dataframes takes hundreds of lines of code, which didn't seem necessary to share.
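One thing that may be worth checking, as an assumption since the UI code is not shown: the first test reads input$OCUnits while the others read input$OCunits. If only one of those IDs exists, the other is NULL, and comparing NULL inside if gives exactly this error. The sketch below reproduces that with a plain list standing in for Shiny's input, and shows a lookup vector as one possible replacement for the chain of if statements; the data frame values and the name factors are made up for illustration:

```r
# A plain list stands in for Shiny's `input` (assumption: the real ID is "OCunits").
input <- list(OCunits = "mg/L")

# A misspelled ID yields NULL, and `if (NULL == "c/c0")` fails.
missing_value <- input$OCUnits
msg <- tryCatch(
  if (input$OCUnits == "c/c0") "matched" else "not matched",
  error = function(e) conditionMessage(e)
)
print(msg)

# Multipliers copied from the question as posted; verify them for your units.
factors <- c("mg/L" = 1, "ng/L" = 1000, "ug/L" = 1000000)

# Made-up stand-in for allchemicals(); the column count does not matter,
# because multiplying a numeric data frame by a scalar converts every column.
allchemicals <- data.frame(chemA = c(0.1, 0.2), chemB = c(1, 2))
converted <- allchemicals * factors[[input$OCunits]]
print(converted)
```

Since the multiplication applies to the whole data frame, no column names need to be referenced, whatever n turns out to be.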

Create list from column and filter from the resulting list

The below code allows simple filtering of a list:
#Filter to applicable codes only
ICS_List <- "QMM|QJG|QH8|QM7|QUE|QHG"
EofEMSOAs <- EofEMSOAs[grep(ICS_List, EofEMSOAs$Code),]
What I am looking to do instead is take all data from the column from another dataframe within a project and use the grep function to filter for values contained within that column - there could be hundreds so typing a manual list is not practical.
I have tried the code below, but it results in the warning 'argument 'pattern' has length > 1 and only the first element will be used'. It seems that using dplyr in this way does not create the same output as manually typing in a list, so I only get one result.
#To filter from required dataframe 'EofEMSOAsIMD'
EofEMSOAsCodeListOnly <- dplyr::pull(EofEMSOAsIMD, "Area Code")
EofEMSOAsFinalList <- EofEMSOAs[grep(EofEMSOAsCodeListOnly, EofEMSOAs$msoa11cd),]
Could anyone please amend the above so it does work using similar logic to the code at top of this question, namely 1. List created from column 2. Dataframe filtered for matches to that list? Thank you.
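A minimal sketch of two possible approaches, assuming the codes in "Area Code" should match msoa11cd; the toy rows and code values are made up for illustration. grep expects a single pattern string, so the column can be collapsed with "|" to mimic the hand-typed list, or %in% can be used when whole codes should match exactly:

```r
# Toy stand-ins for the two data frames (values are hypothetical).
EofEMSOAs <- data.frame(
  msoa11cd = c("E02000001", "E02000002", "E02000003"),
  value = 1:3,
  stringsAsFactors = FALSE
)
EofEMSOAsIMD <- data.frame(
  `Area Code` = c("E02000001", "E02000003"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# 1. List created from the column.
codes <- unique(EofEMSOAsIMD[["Area Code"]])

# 2a. Same logic as the hand-typed list: one pattern joined with "|".
pattern <- paste(codes, collapse = "|")
byGrep <- EofEMSOAs[grepl(pattern, EofEMSOAs$msoa11cd), ]

# 2b. Exact matching, with no regular expression involved.
byIn <- EofEMSOAs[EofEMSOAs$msoa11cd %in% codes, ]

print(byGrep)
```

The %in% version avoids partial matches and very long patterns, which may matter when there are hundreds of codes.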

Dynamically assign elements to object

In R, I am trying to dynamically select columns of a data.frame called DF. If
cutOffYear=2014
and
forecast_years=3
Then this piece of code
paste0("DF$X",cutOffYear+1:forecast_years)
yields:
[1] "DF$X2015" "DF$X2016" "DF$X2017"
Assuming all three columns exist in DF how do I assign the column variables to the characters?
I have tried a lot of combinations of get, assign and paste0 but I am failing.
We can use [ to select the columns; building $ expressions from strings is error prone. If the output should be a data.frame with the columns given by the pasted combination of 'cutOffYear' and 'forecast_years', the code below should work:
DF[paste0("X", cutOffYear+1:forecast_years)]
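A small worked sketch of the same idea, with a made-up DF, showing selection of several columns, extraction of a single column with [[, and assignment back through the same character vector:

```r
# Made-up data frame with the column naming used in the question.
DF <- data.frame(X2014 = 1:2, X2015 = 3:4, X2016 = 5:6, X2017 = 7:8)
cutOffYear <- 2014
forecast_years <- 3

# ":" binds tighter than "+", so this is 2014 + (1:3).
cols <- paste0("X", cutOffYear + 1:forecast_years)

forecast <- DF[cols]      # data.frame with the three forecast columns
first <- DF[[cols[1]]]    # a single column as a plain vector
DF[cols] <- DF[cols] * 2  # assign to the same columns by name

print(DF)
```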

Writing a for loop in r

I don't know how to write for-loops in r. Here is what I want to do:
I have a df called "na" with 50 columns (ana1_1:ana50_1). I want to loop these commands over all columns. Here are the commands for the first two columns (ana1_1 and ana2_1):
t<-table(na$ana1_1)
ana1_1<-capture.output(sort(t))
cat(ana1_1,file="ana.txt",sep="\n",append=TRUE)
t<-table(na$ana2_1)
ana2_1<-capture.output(sort(t))
cat(ana2_1,file="ana.txt",sep="\n",append=TRUE)
After the loop, all tables (ana1_1:ana50_1) should be written to ana.txt. Does anyone have an idea how to solve the problem? Thank you very much!
One approach would be to loop through the columns with lapply and using the same code as in the OP's post
invisible(lapply(na, function(x) {
x1 <- capture.output(sort(table(x)))
cat(x1, file='ana.txt', sep="\n", append=TRUE)
}))
The call is wrapped in invisible so that it won't print 'NULL' in the R console.
We can also add a condition that checks whether the file already exists, so that accidentally running the code again won't add the same lines a second time.
if(!file.exists('ana.txt')){
invisible( lapply(na, function(x) {
x1 <- capture.output(sort(table(x)))
cat(x1, file='ana.txt', sep="\n", append=TRUE)
}))
}
Here is a solution with a for loop. Loops in R can be slow, particularly when they grow objects step by step, so people often prefer other solutions (e.g. the great answer provided by akrun). This answer is for your understanding of the loop syntax:
for(i in 1:50){
t1<-table(na[,i])
t2<-capture.output(sort(t1))
cat(t2,file="ana.txt",sep="\n",append=TRUE)
}
We are looping through i from 1 to 50 (first line). There are several ways to select a column; two of them are na$ana1_1 and na[,1], which both select the first column (second line). In the first case you refer to the column by name, in the second by index. Here the second is more convenient. The rest is your desired calculation.
Be aware that cat creates a new file if ana.txt does not exist yet and appends to it if it is already there.
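A variation on the loop, as a sketch only: it iterates over the column names so that each table can be labelled in the output, and it writes through a connection opened once in "w" mode, so rerunning the code overwrites the file instead of appending duplicates. The two-column data frame and the temporary output path are made up for illustration:

```r
# Made-up stand-in for the 50-column data frame from the question.
na <- data.frame(
  ana1_1 = c("a", "b", "a"),
  ana2_1 = c("x", "x", "y"),
  stringsAsFactors = FALSE
)

out <- tempfile(fileext = ".txt")  # use "ana.txt" for the real output
con <- file(out, open = "w")       # "w" starts a fresh file on every run

for (col in names(na)) {
  tab <- capture.output(sort(table(na[[col]])))
  # Column name first, then the printed table, then a blank separator line.
  writeLines(c(col, tab, ""), con)
}
close(con)

written <- readLines(out)
print(written)
```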

In R, package xts, how would one iterate period subsetting over a list without throwing errors?

Assume:
list of n xts objects in .GlobalEnv with the suffix ".raw" (e.g: ABC.raw)
have created a list of .raw names in a list (ie, rawfiles <- ls(pattern="*.raw",envir=.GlobalEnv))
Would like to:
loop or lapply through rawfiles and subset a particular timeperiod in each iteration
for example, to write this as a single line would be: new <- ABC.raw["T09:00/T10:00"] if I wanted to subset ABC.raw from 9am to 10am each day.
The problem is:
There doesn't seem to be an easy way of passing ["Thh:mm/Thh:mm"] to a loop, apply or assign without causing errors.
Any ideas how to pass this?
In pseudocode, I guess I'm looking for a working equivalent of:
for(i in 1:length(raw)){
raw[i]["T09:00/T10:00"]
}
Many thanks in advance for any assistance on this.
Try get.
get(x) retrieves the variable whose name is stored in x, so foo<-1; get('foo') would return 1.
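subsets <- list()
for ( rawname in rawfiles ) {
# store each subset under the object's name; a bare expression inside for is discarded
subsets[[rawname]] <- get(rawname)["T09:00/T10:00"]
}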
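The same pattern can be written with mget and lapply, which return all subsets as one named list. The sketch below uses plain named vectors in a scratch environment as stand-ins, so that it runs without xts; with real xts objects the environment would be .GlobalEnv and the subscript would be "T09:00/T10:00". The object contents are made up for illustration:

```r
# Scratch environment standing in for .GlobalEnv (assumption for the demo).
env <- new.env()
assign("ABC.raw", c(open = 1, close = 2), envir = env)
assign("XYZ.raw", c(open = 3, close = 4), envir = env)
assign("other", 99, envir = env)  # not matched by the pattern below

# ls() takes a regular expression; "\\.raw$" means "ends in .raw".
rawfiles <- ls(envir = env, pattern = "\\.raw$")

# The subscript is just a character string, so it can be stored and reused.
window <- "close"  # with xts this would be "T09:00/T10:00"

# mget fetches all named objects at once; lapply applies the same subset.
subsets <- lapply(mget(rawfiles, envir = env), function(x) x[window])
print(subsets)
```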
