How to force R to read in numerical order?

I have several files in one folder and I want to rename them. I noticed that R reads them in alphabetical order, so I used the command mixedsort and it worked, but when I checked the results I found that the files were read in a different order, not numerically. The first file is named Daily_NPP1.bin and the last Daily_NPP365.bin.
library(gtools) # provides mixedsort()
a <- list.files("C:\\New folder (6)", "\\.bin$", full.names = TRUE)
k <- mixedsort(a)
b <- sprintf("C:\\carbonflux\\Daily_Rh%d.bin", seq(k))
file.rename(a, b)
How do I force R to read in numerical order?

If renaming is all you want to do, you could just do the following, regardless of sorting:
b <- sub("^.*?([0-9]+)\\.bin$", "C:\\\\carbonflux\\\\Daily_Rh\\1.bin", a)
file.rename(a, b)
The first argument to sub extracts the numbers at the end of the file names, and the second pastes it into the new file name template (at the position of \\1). All the \\\\ are needed to escape the backslashes properly.
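As a quick check, here is that substitution applied to a few hypothetical paths. No files are touched. The example assumes every name ends in digits followed by `.bin`. It uses `perl = TRUE` so that PCRE handles the lazy `.*?`, and it uses forward slashes, which R accepts as path separators on Windows, to avoid the quadruple backslashes. Each new name reuses the number from the old one, so the order of `a` does not matter.

```r
# Hypothetical input: full paths, as list.files(full.names = TRUE) would return them
a <- c("C:/New folder (6)/Daily_NPP1.bin",
       "C:/New folder (6)/Daily_NPP10.bin",
       "C:/New folder (6)/Daily_NPP2.bin")

# Capture the digits right before ".bin" and paste them into the new name template
b <- sub("^.*?([0-9]+)\\.bin$", "C:/carbonflux/Daily_Rh\\1.bin", a, perl = TRUE)
print(b)
# file.rename(a, b) would then rename each file to the name carrying its own number
```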

Here is a way to order the vector without renaming the files:
# Replication of data:
a <- sort(paste0("Daily_NPP",1:365,".bin"))
# Extract numbers and order:
a <- a[order(as.numeric(gsub("[^0-9]","",a)))]
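If you do want to rename in numerical order, note that the code in the question sorts into `k` but then passes the unsorted `a` to file.rename(). The sketch below sorts on the number taken from the file name alone, via basename(), and renames `k` instead. It runs in temporary demo folders under tempdir() rather than the real C:\ paths. `trailing_number()` is a hypothetical helper name.

```r
# Hypothetical helper: the number just before ".bin" in the file name (not the folder)
trailing_number <- function(paths) {
  as.integer(sub("^.*?([0-9]+)\\.bin$", "\\1", basename(paths), perl = TRUE))
}

# Demo folders stand in for "C:\\New folder (6)" and "C:\\carbonflux"
src <- file.path(tempdir(), "npp_demo")
dst <- file.path(tempdir(), "carbonflux_demo")
dir.create(src, showWarnings = FALSE)
dir.create(dst, showWarnings = FALSE)
for (n in c(1, 2, 10, 100)) {
  # write the number into each file so we can check where it ends up
  writeLines(as.character(n), file.path(src, paste0("Daily_NPP", n, ".bin")))
}

a <- list.files(src, pattern = "\\.bin$", full.names = TRUE)  # alphabetical: 10 and 100 before 2
k <- a[order(trailing_number(a))]                              # numeric: 1, 2, 10, 100
b <- file.path(dst, sprintf("Daily_Rh%d.bin", seq_along(k)))
file.rename(k, b)  # rename the sorted vector k, not a
```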

Related

Loop over a large number of CSV files with the same statements in R?

I'm having a lot of trouble reading/writing CSV files. Say I have over 300 CSVs in a folder, each being a matrix of values.
If I wanted to find out a characteristic of each individual CSV file, such as which rows contain exactly a given number of 3s, and write the result to another CSV file for each test, how would I go about iterating this over 300 different CSV files?
For example, say I have this code I am running for each file:
values_4 <- read.csv(file = 'values_04.csv', header=FALSE) # read CSV in as its own data frame
values_4$howMany3s <- apply(values_4, 1, function(x) length(which(x==3))) # compute number of 3's in each row
values_4$exactly4 <- as.integer(values_4$howMany3s == 4) # show 1/0 for each row that has exactly four 3's
values_4 # print new matrix
I am then repeatedly copying and pasting this code, changing the "4" to a 5, 6, etc., and noting the values. This seems wildly inefficient to me, but I'm not experienced enough in R to know exactly what my options are. Should I look at adding all 300 CSV files to a single list and somehow looping through them?
Appreciate any help!
Here's one way you can read all the files and process them. This is untested code, as you haven't given us any data to work on.
# Get a list of CSV files. Use the path argument to point to a folder
# other than the current working directory
files <- list.files(pattern="\\.csv$")
# For each file, work your magic
# lapply runs the function defined in the second argument on each
# value of the first argument
everything <- lapply(
files,
function(f) {
values <- read.csv(f, header=FALSE)
apply(values, 1, function(x) length(which(x==3)))
}
)
# And returns the results in a list. Each element consists of
# the results from one function call.
# Make sure you can access the elements of the list by filename
names(everything) <- files
# The return value is a list. Access all of it with
everything
# Or a single element with
everything[["values_04.csv"]]
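The answer above only counts the 3s and keeps the results in memory. A minimal sketch that also writes one result CSV per input file might look like the following. The two small input files, the `values_demo`/`values_results` folders under tempdir(), and the `process_file()` helper are made up for the demo. `rowSums(values == 3)` is used as a vectorized equivalent of the apply() call, and `exactly4` flags rows with exactly four 3s.

```r
# Hypothetical demo data: two small header-less CSV files
in_dir  <- file.path(tempdir(), "values_demo")
out_dir <- file.path(tempdir(), "values_results")  # separate folder so outputs are not re-read
dir.create(in_dir, showWarnings = FALSE)
dir.create(out_dir, showWarnings = FALSE)
write.table(matrix(c(3, 3, 1, 3, 4, 3), nrow = 2), file.path(in_dir, "values_04.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(matrix(c(3, 3, 3, 3, 3, 3, 3, 0), nrow = 2), file.path(in_dir, "values_05.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

process_file <- function(f) {
  values <- read.csv(f, header = FALSE)
  values$howMany3s <- rowSums(values == 3)              # number of 3s in each row
  values$exactly4  <- as.integer(values$howMany3s == 4) # 1 if the row has exactly four 3s
  # one result file per input, e.g. values_04_result.csv
  write.csv(values, file.path(out_dir, sub("\\.csv$", "_result.csv", basename(f))),
            row.names = FALSE)
  values
}

files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
results <- lapply(files, process_file)
names(results) <- basename(files)
results[["values_05.csv"]]
```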

Extracting a single cell value from multiple csv files in R and

I have 500 csv. files with data that looks like:
[image: sample data]
I want to extract one cell (e.g. B4, i.e. 0.477) from each CSV file and combine those values into a single CSV. What are some recommendations on how to do this easily?
You can try something like this
library(readr) # package for read_lines and write_lines
all.fi <- list.files("/path/to/csvfiles", pattern="\\.csv$", full.names=TRUE) # store names of csv files in path as a string vector
ans <- sapply(all.fi, function(i) {
  eachline <- read_lines(i, skip=3, n_max=1) # read only the 4th line of the file
  unlist(strsplit(eachline, ","))[2] # split the string on commas, then extract the 2nd element (column B)
})
write_lines(ans, "/path/to/output.csv")
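For comparison, here is a base-R sketch that does the same without readr and keeps the file name next to each value. The demo files are invented. It assumes cell B4 is the second comma-separated field on the fourth line of each file, counting the header line. The plain strsplit() will misread quoted fields that contain commas.

```r
# Hypothetical demo files: line 4, field 2 holds the wanted value
in_dir <- file.path(tempdir(), "cell_demo")
dir.create(in_dir, showWarnings = FALSE)
writeLines(c("a,b", "1,2", "3,4", "5,0.477"), file.path(in_dir, "s1.csv"))
writeLines(c("a,b", "6,7", "8,9", "10,0.512"), file.path(in_dir, "s2.csv"))

get_b4 <- function(f) {
  line4 <- readLines(f, n = 4)[4]            # 4th line of the file = spreadsheet row 4
  strsplit(line4, ",", fixed = TRUE)[[1]][2] # 2nd field = column B
}

files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
out <- data.frame(file = basename(files),
                  B4 = as.numeric(vapply(files, get_b4, character(1))))
write.csv(out, file.path(tempdir(), "cell_B4.csv"), row.names = FALSE)  # outside in_dir
out
```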
I cannot add a comment, so I will write my comment here.
Since your data is very large and it is very difficult to load each file individually, try this: Importing multiple .csv files into R. It covers the first part of your problem. For the second part, try this:
You can store your data as a data.frame (as in the comment by @Bruno Zamengo) and then use the select and merge functions in R to pick out all the values you need and combine them into a single CSV file. I used this idea in my project. Do not forget to use lapply.

remove multiple sequenced files in R

I created multiple files named 1 to 100, each with a random letter appended to its name:
for (i in 1:100){
  file.create(paste0(i, ".txt"), showWarnings=TRUE)
  # assign random LETTER to files
  AZ <- sample(LETTERS, 1)
  cat(AZ, file = paste0(i, ".txt"), append=TRUE)
  # read the letter back and rename the file, e.g. 1.txt -> 1T.txt
  name <- scan(file=paste0(i, ".txt"), what="character")
  file.rename(paste0(i, ".txt"), paste0(i, name, ".txt"))
}
Now I have a lot of files named like "1T, 2C, 3Y, ..., 100A", and I want to remove all of them (without removing the other files in the directory) with the file.remove function. How can I remove them without naming them one by one? And how can I remove the whole directory named "exercicio03" with everything inside?
P.S.: I have already tried
file.remove(paste0(i,name,".txt"))
but it removes only the last file, "100A".
You can easily remove only the files with names like "1T.txt, 2C.txt, 3Y.txt, ..., 100A.txt" with the following two lines of code:
remove.files <- list.files(".", pattern="^[0-9]{1,3}[A-Z]{1}\\.txt$")
do.call(file.remove,list(remove.files))
The script lists all text files in the current directory (where you created them) whose names consist of 1-3 digits followed by a single capital letter, and removes them.
Since you used the sample function, I think you can only be 100% sure that you remove these files and no others if you save the values you got from that sample function.
So your first part should have been:
AZ<-NA
for (i in 1:100){
file.create( paste0(i , ".txt"), showWarnings=TRUE)
# assign random LETTER to files
AZ[i] <- sample(LETTERS,1)
cat(AZ[i],file = paste0(i,".txt"),append=TRUE)
#rename files, and create new file with append of LETTERS
name <- scan(file=paste0(i,".txt"), what="character")
file.rename(paste0(i,".txt"), paste0(i, name,".txt"))
}
That way you can afterwards remove them all with:
for (i in 1:100){
file.remove(paste0(i,AZ[i],".txt"))
}
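Neither answer covers the second part of the question, which is deleting the folder "exercicio03" itself. Here is a sketch that uses a throwaway copy of that folder under tempdir() with made-up file names. list.files() needs `full.names = TRUE` when the folder is not the working directory. unlink() with `recursive = TRUE` deletes a directory together with everything in it.

```r
base <- file.path(tempdir(), "exercicio03")   # stand-in for the real folder
dir.create(base, showWarnings = FALSE)
invisible(file.create(file.path(base, c("1T.txt", "25Q.txt", "100A.txt", "1000B.txt", "notes.txt"))))

# Remove only names made of 1-3 digits plus one capital letter, e.g. 1T.txt ... 100A.txt
to_remove <- list.files(base, pattern = "^[0-9]{1,3}[A-Z]\\.txt$", full.names = TRUE)
file.remove(to_remove)
left <- list.files(base)   # files that did not match are still there
print(left)

# Remove the folder itself together with everything inside it
unlink(base, recursive = TRUE)
```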

Replace multiple strings in multiple files with R

I have something like 700,000 files in a folder where I need to find and replace multiple strings with different other strings (all 4-character codes). It is not certain whether a given string is present in a file. I'm trying to use gsub, but I can't find how to do it with regular expressions. Can someone tell me a good and efficient way to handle this task?
This is the code I've used so far. It worked well with only one y <- gsub(...) instruction, but it doesn't work for my purpose, obviously because only the last gsub instruction is taken into account when defining the y variable...
chm_files <- list.files(getwd(), pattern=("^[[:digit:]]*.chm$"), full.names=F)
for(chm_file in chm_files) {
x <- readLines(chm_file)
y <- gsub("AG02|AG07|AG05|AG18|AG19|AG08|AG09|AG17", "AGRL", x)
y <- gsub("SB28|SB42|SB43|SB33|SB41|SB34|SB39|SB35", "SWHT", x)
y <- gsub("WB28|WB42|WB43|WB32|WB09|WB33|WB41|WB26", "BARL", x)
y <- gsub("WW02|WW25|WW08|WW31|WW05|WW28|WW19|WW42", "WWHT", x)
cat(y, file=chm_file, sep="\n")
}
I am sure there are already numerous pre-built functions for this task in various R packages, but I cooked this one up for myself and others to use/modify. Apart from the task requested above, it also prints a tracking log counting all changes made across files. Function: multi_replace.
Here is some example code showing how it should be run:
# local directory with files you want to work with
setwd("C:/Users/DW/Desktop/New folder")
# get a list of files based on a pattern of interest e.g. .html, .txt, .php
filer = list.files(pattern="\\.php$")
# f - list of original string values you want to change
f <- c("localhost","dbtest","root","oldpassword")
# r - list of values to replace the above values with
# make sure f and r are in the same order (f[i] is replaced by r[i])
r <- c("newhost", "newdb", "newroot", "newpassword")
# Run the function and watch all your changes take place ;)
tracking_sheet <- multi_replace(filer, f, r)
tracking_sheet
setwd("D:/R Training Material Kathmandu/File renaming procedures")
filer = list.files(pattern="2016")
f <- c("DATA,","$")
r <- c("","")
tracking_sheet <- multi_replace(filer, f, r)
tracking_sheet
I used the above script, but it failed to replace the $ sign in any of the files.
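Two separate problems come up in this thread. In the question, every gsub() starts again from `x`, so only the last replacement survives. Each call should work on the previous result instead. In the follow-up, "$" is a regular-expression anchor (end of string), so gsub("$", "", x) matches the empty string at the end and removes nothing. `fixed = TRUE` (or the pattern "\\$") matches a literal dollar sign. The code of multi_replace() is not included here, so the sketch below uses its own hypothetical helper, `replace_all()`, and is not a reproduction of that function.

```r
# Hypothetical helper, not the multi_replace() referenced above:
# applies each pattern/replacement pair to the result of the previous one.
replace_all <- function(x, patterns, replacements, fixed = FALSE) {
  stopifnot(length(patterns) == length(replacements))
  for (i in seq_along(patterns)) {
    x <- gsub(patterns[i], replacements[i], x, fixed = fixed)
  }
  x
}

# Pattern/replacement pairs from the question
codes <- c("AG02|AG07|AG05|AG18|AG19|AG08|AG09|AG17",
           "SB28|SB42|SB43|SB33|SB41|SB34|SB39|SB35",
           "WB28|WB42|WB43|WB32|WB09|WB33|WB41|WB26",
           "WW02|WW25|WW08|WW31|WW05|WW28|WW19|WW42")
new_codes <- c("AGRL", "SWHT", "BARL", "WWHT")

# Rewrite one file in place
rewrite_file <- function(path) {
  x <- readLines(path)
  writeLines(replace_all(x, codes, new_codes), path)
}
# Loop as in the question:
# for (chm_file in list.files(pattern = "\\.chm$")) rewrite_file(chm_file)

# A regex "$" only matches the end of the string; fixed = TRUE matches it literally
gsub("$", "", "DATA,$12")                                          # unchanged
replace_all("DATA,$12", c("DATA,", "$"), c("", ""), fixed = TRUE) # "12"
```

Because the pairs are applied in order, a later pattern could match text produced by an earlier replacement. That cannot happen with these codes, but it is worth checking with other lists.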

Importing .CSV using a list in R

So I have 29 data files that I want to load into R. The files are called "1.csv", "2.csv", etc., all the way to 29. Here is pseudocode depicting what I'm trying to do:
file.number <- c(1:29)
"the value in file.number".data <- read.csv("the value in file.number"".csv")
Basically, I am looking for a way to load files based on a list and label them accordingly. Is this possible?
Any help will be greatly appreciated!!!
This would probably work
dfList <- setNames(lapply(paste0(1:29, ".csv"), read.csv), paste0(1:29, ".data"))
Now you've got a named list of 29 data frames. Then you can access each individual data frame with the $ operator, e.g. dfList$"4.data". Note that you'll need quotes or backticks since you've chosen to begin the names with a digit. You can avoid that by using [[ to access the elements i.e. dfList[["4.data"]], or changing to different names such as paste0("data", 1:29), or any name that doesn't begin with a digit.
Another option would be Map
Map(read.csv, paste0(1:29, ".csv"))
This will automatically set the names to the names of the file being read i.e. 1.csv, 2.csv, etc. But again, backticks or quotes would be needed to access the elements with the $ operator because the names begin with digits.
listwithdfs <- lapply(1:29, function(x) read.csv(paste0(x, ".csv")) )
names(listwithdfs) <- 1:29
It is better to have only one single object in the workspace.
Now you can index with
listwithdfs[[13]]
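Here is a small runnable version of the lapply() approach. It assumes three demo files written to a temporary folder (the real case has 1:29). Names that start with a letter (`data1`, `data2`, ...) let `$` access work without backticks. If you really need separate objects in the workspace, list2env() can copy the list elements out, though the answers above recommend keeping the list.

```r
# Hypothetical demo files 1.csv, 2.csv, 3.csv in a temporary folder
in_dir <- file.path(tempdir(), "numbered_demo")
dir.create(in_dir, showWarnings = FALSE)
for (i in 1:3) {
  write.csv(data.frame(id = i, x = i * 10), file.path(in_dir, paste0(i, ".csv")), row.names = FALSE)
}

paths  <- file.path(in_dir, paste0(1:3, ".csv"))      # paste0(1:29, ".csv") in the question
dfList <- setNames(lapply(paths, read.csv), paste0("data", 1:3))
dfList$data2   # no backticks needed: the name starts with a letter

# Only if separate objects are really needed: copy the list elements into the workspace
invisible(list2env(dfList, envir = globalenv()))
```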
