How to use an index to read multiple files at a time? - r

I want R to read the five files with names like
"alpha_rarefaction_8000_0.txt" ... "alpha_rarefaction_12000_0.txt"
and write it as
"alpha8000" ... "alpha12000", respectively.
I used the following code, but it did not work. What is wrong with my code?
I tried searching for "how to use index in R function" and "how to write executable loop in R", but nothing helped. What kind of search strategy should I use to get useful results when searching for answers on Google?
for(i in seq(8000,12000,by=1000)) {
paste("rare",i,sep="")<-read.table(paste("alpha_rarefaction",i,"0.txt",sep="_"))
}
or
read.rare<-function(i){
paste("rare",$i,sep="")<-read.table(paste("alpha_rarefaction",$i,"0.txt",sep="_"))
}
i<-seq(8000,12000,by=1000)
read.rare(i)

I would recommend you read the files into a list, possibly doing it this way -
## create the sequence for the file names
s <- 8:12 * 1e3
# [1] 8000 9000 10000 11000 12000
## create the full file names from the sequence above
files <- sprintf("alpha_rarefaction_%d_0.txt", s)
# [1] "alpha_rarefaction_8000_0.txt" "alpha_rarefaction_9000_0.txt"
# [3] "alpha_rarefaction_10000_0.txt" "alpha_rarefaction_11000_0.txt"
# [5] "alpha_rarefaction_12000_0.txt"
## Now we can loop the file names, reading the data into a list
## and setting the names for each element
datalist <- setNames(lapply(files, read.table), paste0("alpha", s))
This will keep all the data frames in a list, which will make working with them later a lot easier. You can access them individually with the $ operator. They have names
names(datalist)
# [1] "alpha8000" "alpha9000" "alpha10000" "alpha11000" "alpha12000"
so datalist$alpha9000, for example, accesses the second data set (and alternatively with datalist[[2]]).
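As for why the original attempts fail: the left-hand side of <- has to be a name, not a call such as paste(...) that merely returns a string, and $i is not R syntax. Below is a minimal, self-contained sketch of the list approach. The temporary directory and the tiny data frames written into it are stand-ins invented for illustration, so that the example runs anywhere; with real files you would skip that step.

```r
# Stand-in files in a temporary directory (assumption: your real files
# already exist, so you would not need this part).
tmp <- tempfile("rare_")
dir.create(tmp)
s <- 8:12 * 1e3
files <- file.path(tmp, sprintf("alpha_rarefaction_%d_0.txt", s))
for (f in files) {
  write.table(data.frame(x = 1:3, y = c(2.5, 3.5, 4.5)), f)
}

# Read every file into one named list.
datalist <- setNames(lapply(files, read.table), paste0("alpha", s))

# Work on all data sets at once, e.g. count the rows of each.
row_counts <- sapply(datalist, nrow)
```

Keeping the data frames in one list means that any later step (summaries, plots, merging) is a single lapply() or sapply() call instead of five near-identical statements.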

Related

Loop over a large number of CSV files with the same statements in R?

I'm having a lot of trouble reading/writing to CSV files. Say I have over 300 CSV's in a folder, each being a matrix of values.
If I wanted to find out a characteristic of each individual CSV file, such as which rows had an exact number of 3's, and write the result to another CSV file for each test, how would I go about iterating this over 300 different CSV files?
For example, say I have this code I am running for each file:
values_4 <- read.csv(file = 'values_04.csv', header=FALSE) # read CSV in as its own DF
values_4$howMany3s <- apply(values_4, 1, function(x) length(which(x==3))) # compute number of 3's
values_4$exactly4 <- apply(values_4[50], 1, function(x) length(which(x==4))) # show 1/0 on each column that has exactly four 3's
values_4 # print new matrix
I am then repeatedly copying and pasting this code, changing the "4" to 5, 6, etc., and noting the values. This seems wildly inefficient to me, but I'm not experienced enough at R to know exactly what my options are. Should I look at adding all 300 CSV files to a single list and somehow looping through them?
Appreciate any help!
Here's one way you can read all the files and process them. The code is untested, as you haven't given us anything to work on.
# Get a list of CSV files. Use the path argument to point to a folder
# other than the current working directory
files <- list.files(pattern="\\.csv$")
# For each file, work your magic
# lapply runs the function defined in the second argument on each
# value of the first argument
everything <- lapply(
files,
function(f) {
values <- read.csv(f, header=FALSE)
apply(values, 1, function(x) length(which(x==3)))
}
)
# And returns the results in a list. Each element consists of
# the results from one function call.
# Make sure you can access the elements of the list by filename
names(everything) <- files
# The return value is a list. Access all of it with
everything
# Or a single element with
everything[["values_04.csv"]]
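The question also asks for one result file per input file. A hedged sketch of that step follows; the two small input files, the folder names and the "results" subfolder are all hypothetical stand-ins, and rowSums() is swapped in for the apply() call above because it counts matches per row more directly.

```r
# Stand-in input: two small CSV files in a temporary folder (made-up values).
in_dir <- tempfile("csv_in_")
dir.create(in_dir)
write.table(matrix(c(3, 3, 1, 3, 2, 2), nrow = 2),
            file.path(in_dir, "values_01.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(matrix(c(3, 0, 3, 0, 3, 0), nrow = 2),
            file.path(in_dir, "values_02.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

files <- list.files(in_dir, pattern = "\\.csv$", full.names = TRUE)
out_dir <- file.path(in_dir, "results")
dir.create(out_dir)

counts <- lapply(files, function(f) {
  values <- read.csv(f, header = FALSE)
  # Number of 3's in each row.
  values$howMany3s <- rowSums(values == 3)
  # Write the augmented table under the same file name in the output folder.
  write.csv(values, file.path(out_dir, basename(f)), row.names = FALSE)
  values$howMany3s
})
names(counts) <- basename(files)
```

Each output file keeps the name of its input, so the results can be matched back to the source without any bookkeeping.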

Use multiple dataframes from a package of data in R

I am working with a large dataset in an R package.
I need to get all of the separate data frames into my global environment, preferably into a list of data frames so that I can use lapply to do some repetitive operations later.
So far I've done the following:
l.my.package <- data(package="my.package")
lc.my.package <- l.my.package[[3]]
lc.df.my.package <- as.data.frame(lc.my.package)
This effectively creates a data frame of the location and name of each of the .RData files in my package, so I can load them all.
I have figured out how to load them all using a for loop.
I create a vector of path names and feed it into the loop:
f <- path('my/path/folder', lc.df.my.package$Item, ext="rdata")
f.v <- as.vector(f)
for (i in f.v) {load(i)}
This loads everything into separate data frames (as I want), but it obviously doesn't put the data frames into a list. I thought lapply would work here, but when I use lapply, the resulting list is a list of character strings (the title of each dataframe with no data included). That code looks like this:
f.l <- as.list(f)
func <- function(i) {load(i)}
df.list <- lapply(f.l, func)
I am looking for one of two possible solutions:
how can I efficiently collect the output of for loop into a list (a "while" loop would likely be too slow)?
how can I adjust lapply so the output includes each entire dataframe instead of just the title of each dataframe?
Edit: I have also tried introducing the "envir=.GlobalEnv" argument into load() within lapply. When I do that, the data frames load, but still not in a list. The list still contains only the names as character strings.
If you are willing to use a packaged solution, I wrote a package called libr that does exactly what you are asking for. Here is an example:
library(libr)
# Create temp directory
tmp <- tempdir()
# Save some data to temp directory
# for illustration purposes
saveRDS(trees, file.path(tmp, "trees.rds"))
saveRDS(rock, file.path(tmp, "rocks.rds"))
# Create library
libname(dat, tmp)
# library 'dat': 2 items
# - attributes: not loaded
# - path: C:\Users\User\AppData\Local\Temp\RtmpCSJ6Gc
# - items:
# Name Extension Rows Cols Size LastModified
# 1 rocks rds 48 4 3.1 Kb 2020-11-05 23:25:34
# 2 trees rds 31 3 2.4 Kb 2020-11-05 23:25:34
# Load library
lib_load(dat)
# Examine workspace
ls()
# [1] "dat" "dat.rocks" "dat.trees" "tmp"
# Unload the library from memory
lib_unload(dat)
# Examine workspace again
ls()
# [1] "dat" "tmp"
@rawr's response works perfectly:
df.list <- mget(l.my.package$results[, 'Item'], inherits = TRUE)
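On the lapply question itself: load() returns only the names of the objects it restores, which is why the list ended up holding character strings. One possible workaround, sketched below, is to load each file into a private environment and return that environment's contents. The helper name load_objects and the two stand-in .RData files (built from datasets shipped with R) are hypothetical and only there to make the example runnable.

```r
# Stand-in .RData files in a temporary folder (hypothetical).
tmp <- tempfile("rdata_")
dir.create(tmp)
trees_df <- datasets::trees
rock_df <- datasets::rock
save(trees_df, file = file.path(tmp, "trees_df.RData"))
save(rock_df, file = file.path(tmp, "rock_df.RData"))
f <- list.files(tmp, pattern = "\\.RData$", full.names = TRUE)

# load() returns only object names, so load into a private environment
# and hand back the objects themselves as a named list.
load_objects <- function(path) {
  env <- new.env()
  load(path, envir = env)
  as.list(env)
}

# Each file contributes a named list of its objects; combine into one list.
df.list <- do.call(c, lapply(f, load_objects))
```

This keeps the global environment clean and gives a list that can go straight into lapply().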

Retaining original file names when processing multiple raster files using R

I have the following problem: I need to process multiple raster files using the same function in the R package landscapemetrics. My raster files are parts of a country map, all of the same shape and size (i.e. quadrants). I figured out the code for one file, but I have to do the same with more than 600 rasters, so doing it manually is impractical. The steps in my code are the following:
# 1. I load "raster" and "landscapemetrics" packages:
library(raster)
library(landscapemetrics)
# 2. I read in my quadrant:
Quadrant <- raster("C:\\Users\\customer\\Documents\\ ... \\2434-44.tif")
# 3. I process the raster to get landscape metrics tibble:
LS_metrics <- calculate_lsm(landscape = Quadrant)
# 4. Finally, I write it into a csv:
write.csv(LS_metrics, file = "2434-44.csv")
I need to keep the same file name for my csv files as I had for tif (e.g. results from processing quadrant "2434-44.tif", need to be stored in "2434-44.csv", possibly in a folder in wd).
I am new to R. I tried to use list.files() and then apply a for loop, but my code did not work.
I need your advice.
Yours faithfully,
Denis
Your question is really about iteration and character (filename) manipulation; not about landscapemetrics etc. There are many similar questions on this site and resources elsewhere that you can consult. The basic approach can be like this:
# get input filenames
inf <- list.files("/my/path", pattern="\\.tif$", full=TRUE)
# create output filenames
outf <- gsub("\\.tif$", ".csv", basename(inf))
# perhaps put output files in particular folder
dir.create("out", FALSE, FALSE)
outf <- file.path("out", outf)
# iterate
for (i in seq_along(inf)) {
# read input
input <- raster(inf[i])
# do something
output <- data.frame(id=1)
# write output
write.csv(output, outf[i])
}
It's very hard to help without further information. What was the issue with your approach of looping through all files using list.files()? In general, this should work.
Furthermore, most likely you don't want to calculate all available landscape metrics, but rather specify a subselection during the calculate_lsm() function call.
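For the file-name part, a small sketch of one alternative to pattern replacement: tools::file_path_sans_ext() strips the extension whatever it is, and the new extension is then pasted on. The two input paths are hypothetical (the second file name is made up); in practice they would come from list.files().

```r
# Hypothetical input paths; in practice these come from list.files().
inf <- c("/my/path/2434-44.tif", "/my/path/2434-45.tif")

# Strip the directory and the extension, then add the new extension.
stem <- tools::file_path_sans_ext(basename(inf))
outf <- file.path("out", paste0(stem, ".csv"))
```

Because inf and outf have the same length and order, inf[i] and outf[i] always refer to the same quadrant inside a loop.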

cbind column in several csv files in r

I am new to R and don't know exactly how to write for loops.
Here is my problem: I have about 160 csv files in a folder, each with a specific name. Each file name follows the pattern "HL.X.Y.Z.", where X="Region", Y="cluster", and Z="point". What I need to do is read all these csv files, extract the strings from the names, create a column with the strings for each csv file, and bind all these csv files into a single data frame.
Here is some code of what i am trying to do:
setwd("C:/Users/worddirect")
files.names<-list.files(getwd(),pattern="*.csv")
files.names
head(files.names)
>[1] "HL.1.1.1.2F31CA.150722.csv" "HL.1.1.2.2F316A.150722.csv"
[3] "HL.1.1.3.2F3274.150722.csv" "HL.1.1.4.2F3438.csv"
[5] "HL.1.10.1.3062CD.150722.csv" "HL.1.10.2.2F343D.150722.csv"
Doing like this to read all files works just fine:
files.names
for (i in 1:length(files.names)) {
assign(files.names[i], read.csv(files.names[i],skip=18))
}
Adding an extra column for an individual csv files like this works fine:
test<-cbind("Region"=rep(substring(files.names[1],4,4),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
"Cluster"=rep(substring(files.names[1],6,6),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
"Point"=rep(substring(files.names[1],8,8),times=nrow(HL.1.1.1.2F31CA.150722.csv)),
HL.1.1.1.2F31CA.150722.csv)
head(test)
Region Cluster Point Date.Time Unit Value
1 1 1 1 6/2/14 11:00:01 PM C 24.111
2 1 1 1 6/3/14 1:30:01 AM C 21.610
3 1 1 1 6/3/14 4:00:01 AM C 20.609
However, a for loop of the above doesn't work.
files.names
for (i in 1:length(files.names)) {
assign(files.names[i], read.csv(files.names[i],skip=18))
cbind("Region"=rep(substring(files.names[i],4,4),times=nrow(i)),
"Cluster"=rep(substring(files.names[i],6,6),times=nrow(i)),
"Point"=rep(substring(files.names[i],8,8),times=nrow(i)),
i)
}
>Error in rep(substring(files.names[i], 4, 4), times = nrow(i)) :
invalid 'times' argument
The final step would be to bind all the csv files in a single data frame.
I appreciate any suggestions. If there is a simpler way to do what I did, I would appreciate that too!
There are many ways to solve a problem in R. A more R-like way to solve this one is with the apply() family of functions, which act like an implied for loop, applying a function to each item passed to them.
Another important feature of R is the anonymous function. Combining lapply() with an anonymous function, we can solve your multi-file read problem.
setwd("C:/Users/worddirect")
files.names<-list.files(getwd(),pattern="\\.csv$")
# read csv files and return them as items in a list()
theList <- lapply(files.names,function(x){
theData <- read.csv(x,skip=18)
# bind the region, cluster, and point data and return
cbind(
"Region"=rep(substring(x,4,4),times=nrow(theData)),
"Cluster"=rep(substring(x,6,6),times=nrow(theData)),
"Point"=rep(substring(x,8,8),times=nrow(theData)),
theData)
})
# rbind the data frames in theList into a single data frame
theResult <- do.call(rbind,theList)
regards,
Len
i is a number, not a data frame, so nrow(i) returns NULL. That is why rep() complains about an invalid 'times' argument.
You can use the following code, which keeps each file's data in a variable and appends it to the result:
result <- data.frame()
for (i in seq_along(files.names)) {
# read the current file into a data frame
dat <- read.csv(files.names[i], skip=18)
# add the columns parsed from the file name and append to the result
result <- rbind(
result,
cbind(
"Region"=rep(substring(files.names[i],4,4),times=nrow(dat)),
"Cluster"=rep(substring(files.names[i],6,6),times=nrow(dat)),
"Point"=rep(substring(files.names[i],8,8),times=nrow(dat)),
dat))
}
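One caveat with fixed positions: for a name such as "HL.1.10.1.3062CD.150722.csv", substring(x, 6, 6) yields "1" rather than "10", and substring(x, 8, 8) yields ".". A possible alternative, sketched here on two file names copied from the question, is to split each name on the dots and pick the fields by position instead.

```r
# File names copied from the question.
files.names <- c("HL.1.1.1.2F31CA.150722.csv", "HL.1.10.1.3062CD.150722.csv")

# Split each name on literal dots; fields 2 to 4 are region, cluster, point.
parts <- strsplit(files.names, ".", fixed = TRUE)
ids <- data.frame(
  Region  = sapply(parts, `[`, 2),
  Cluster = sapply(parts, `[`, 3),
  Point   = sapply(parts, `[`, 4),
  stringsAsFactors = FALSE
)
```

Inside the lapply() or for loop, the same strsplit() call on a single file name gives the three values to bind onto that file's data.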

How to programmatically get header information of dataset from UCI data repository in R

I am trying to collect publicly available datasets from UCI repository for R. I understand there are lots of datasets already usable with several R packages such as mlbench. But there are still several datasets I will need from UCI repository.
This is a trick I learned
url="http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data"
credit<-read.csv(url, header=F)
But this does not get header (variable name) information. That information is in *.names file in text format. Any idea how I can programmatically get header information as well?
I suspect you'll have to use regular expressions to accomplish this. Here's an ugly, but general solution that should work on a variety of *.names files, assuming their formats are similar to the one you posted.
names.file.url <-'http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.names'
names.file.lines <- readLines(names.file.url)
# get run lengths of consecutive lines containing a colon.
# then find the position of the subgrouping that has a run length
# equal to the number of columns in credit and sum the run lengths up
# to that value to get the index of the last line in the names block.
end.of.names <- with(rle(grepl(':', names.file.lines)),
sum(lengths[1:match(ncol(credit), lengths)]))
# extract those lines
names.lines <- names.file.lines[(end.of.names - ncol(credit) + 1):end.of.names]
# extract the names from those lines
names <- regmatches(names.lines, regexpr('(\\w)+(?=:)', names.lines, perl=TRUE))
# [1] "A1"  "A2"  "A3"  "A4"  "A5"  "A6"  "A7"  "A8"  "A9"  "A10" "A11"
# [12] "A12" "A13" "A14" "A15" "A16"
I'm guessing that the Attribute Information section holds the names in the specific file you pointed to. Here is a very, very dirty solution. It relies on a pattern: your names are followed by a colon, so we separate the strings of characters at : using scan, and then grab the names from the raw vector:
url="http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data"
credit<-read.csv(url, header=F)
url.names="http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.names"
mess <- scan(url.names, what="character", sep=":")
#your names are located from 31 to 61, every second place in the vector
mess.names <- mess[seq(31,61,2)]
names(credit) <- mess.names
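Both approaches depend on the exact layout of the .names file, so it may be worth checking that the number of extracted names matches the number of columns before assigning them. The sketch below shows the idea offline: the few lines in names.lines only imitate the layout of an attribute block and are not the real file contents, and the two-row credit data frame is likewise a made-up stand-in.

```r
# Made-up lines imitating an attribute block (NOT the real crx.names file).
names.lines <- c("7.  Attribute Information:",
                 "",
                 "    A1:\tb, a.",
                 "    A2:\tcontinuous.",
                 "    A3:\tcontinuous.")

# Keep lines that look like 'A<number>:' and pull out the part before the colon.
attr.lines <- grep("^[[:space:]]*A[0-9]+:", names.lines, value = TRUE)
col.names <- sub("^[[:space:]]*(A[0-9]+):.*$", "\\1", attr.lines)

# Stand-in data with the default V1, V2, ... column names.
credit <- data.frame(V1 = c("b", "a"), V2 = c(30.83, 58.67), V3 = c(0, 4.46))

# Only assign the names if the counts agree.
stopifnot(length(col.names) == ncol(credit))
names(credit) <- col.names
```

The length check turns a silent mislabelling into an immediate error if the file format differs from what the pattern expects.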
