Array of tables - r

I want to write some R tables into an excel file. So I have the follow?
data <- list.files(path=getwd())
n <- length(list)
for (i in 1:n)
{
data1 <- read.csv(data[i])
outline <- data1[,2]
outline <- as.table(outline)
print(outline) # this prints all n tables
write.csv(outline, 'Test.csv') # this only writes the last table
}
But I only get the last file written into the csv file. Not all of them. How would I fix this?

your writing to test.csv every time. So you keep over writing files. You need to change the filename for each step to keep the different files.
try:
data <- list.files(path=getwd())
n <- length(list)
for (i in 1:n)
{
data1 <- read.csv(data[i])
outline <- data1[,2]
outline <- as.table(outline)
print(outline) # this prints all n tables
name <- paste0(i,"X.csv")
write.csv(outline, name)
}
Looking at your code perhaps you want this instead:
data <- list.files(path=getwd())
n <- length(list)
for (i in 1:n)
{
data1 <- read.csv(data[i])
outline <- data1[,2]
outline <- as.data.frame(table(outline))
print(outline) # this prints all n tables
name <- paste0(i,"X.csv")
write.csv(outline, name)
}

Related

How can I tell R to apply functions to multiple data?

I have been stacking this work for quite long time, tried different approaches but couldn't succeed.
what I want is to apply following 4 functions to 30 different data (data1,2,3,...data30) within for loop or whatsoever in R. These datasets have same (10) column numbers and different rows.
This is the code I wrote for first data (data1). It works well.
for(i in 1:nrow(data1)){
data1$simp <-diversity(data1$sp, "simpson")
data1$shan <-diversity(data1$sp, "shannon")
data1$E <- E(data1$sp)
data1$D <- D(data1$sp)
}
I want to apply this code for other 29 data in order not to repeat the process 29 times.
Following code what I am trying to do now. But still not right.
data.list <- list(data1, data2,data3,data4,data5)
for(i in data.list){
data2 <- NULL
i$simp <-diversity(i$sp, "simpson")
i$shan <-diversity(i$sp, "shannon")
i$E <- E(i$sp)
i$D <- D(i$sp)
data2 <- rbind(data2, i)
print(data2)
}
So I wanna ask how I can tell R to apply functions to other 29 data?
Thanks in advance!
You can do this with Map.
fun <- function(DF){
for(i in 1:nrow(DF)){
DF$simp <-diversity(DF$sp, "simpson")
DF$shan <-diversity(DF$sp, "shannon")
DF$E <- E(DF$sp)
DF$D <- D(DF$sp)
}
DF
}
result.list <- Map(fun, data.list)
Or, if you don't want to have a function fun in the .GlobalEnv, with lapply.
result.list <- lapply(data.list, function(DF){
for(i in 1:nrow(DF)){
DF$simp <-diversity(DF$sp, "simpson")
DF$shan <-diversity(DF$sp, "shannon")
DF$E <- E(DF$sp)
DF$D <- D(DF$sp)
}
DF
})
If I understand the question, it you're ultimately asking about your 'data2' variable and how to merge these all together? I think the issue you're having is that you're setting data2 <- NULL with each loop iteration. The proposed solution below moves this definition outside the loop and the call to rbind() should now append all your data frames together to return the consolidated dataset.
data.list <- list(data1, data2,data3,data4,data5) #all 29 can go here
data2 <- NULL
for(i in data.list){
i$simp <-diversity(i$sp, "simpson")
i$shan <-diversity(i$sp, "shannon")
i$E <- E(i$sp)
i$D <- D(i$sp)
data2 <- rbind(data2, i)
}
print(data2)
I am assuming that your data1, ..., dataN are files stored in a directory and you're reading them one at a time. Also they have the same header.
What you can do is to import them one at a time and then perform the operations you want, as you mentioned:
files <- list.files(directoryPath) #maybe you can grep() some specific files
for (f in files){
data <- read.table(f) #choose header, sep and so on...
for(i in 1:nrow(data)){
data$simp <-diversity(data$sp, "simpson")
data$shan <-diversity(data$sp, "shannon")
data$E <- E(data$sp)
data$D <- D(data$sp)
}
}
be careful that you must be in the working directory or you must add a path to the filename while reading the tables (i.e. paste(path, f, sep=""))
There are plenty of options, here's one using only base functions:
data.list <- list(data1, data2, data3, data4, data5)
changed_data <- lapply(data.list, function(my_data) {
my_data$simp <-diversity(my_data$sp, "simpson")
my_data$shan <-diversity(my_data$sp, "shannon")
my_data$E <- E(my_data$sp)
my_data$D <- D(my_data$sp)
my_data})

R Function with for Loops creates several dataframes, how to have each one have a different name

I have a for loop that loops through a list of urls,
url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
'http://www.irs.gov/pub/irs-soi/05in21id.xls',
'http://www.irs.gov/pub/irs-soi/06in21id.xls',
'http://www.irs.gov/pub/irs-soi/07in21id.xls',
'http://www.irs.gov/pub/irs-soi/08in21id.xls',
'http://www.irs.gov/pub/irs-soi/09in21id.xls',
'http://www.irs.gov/pub/irs-soi/10in21id.xls',
'http://www.irs.gov/pub/irs-soi/11in21id.xls',
'http://www.irs.gov/pub/irs-soi/12in21id.xls',
'http://www.irs.gov/pub/irs-soi/13in21id.xls',
'http://www.irs.gov/pub/irs-soi/14in21id.xls',
'http://www.irs.gov/pub/irs-soi/15in21id.xls')
dowloads an excel file from each one assigns it to a dataframe and performs a set of data cleaning operations on it.
library(gdata)
for (url in url_list){
test <- read.xls(url)
cols <- c(1,4:5,97:98)
test <- test[-(1:8),cols]
test <- test[1:22,]
test <- test[-4,]
test$Income <-test$Table.2.1...Returns.with.Itemized.Deductions..Sources.of.Income..Adjustments..Itemized.Deductions.by.Type..Exemptions..and.Tax..Items..by.Size.of.Adjusted.Gross.Income..Tax.Year.2015..Filing.Year.2016.
test$Total_returns <- test$X.2
test$return_dollars <- test$X.3
test$charitable_deductions <- test$X.95
test$charitable_deduction_dollars <- test$X.96
test[1:5] <- NULL
}
My problem is that the loop simply writes over the same dataframe for each iteration through the loop. How can I have it assign each iteration through the loop to a data frame with a different name?
Use assign. This question is a duplicate of this post: Change variable name in for loop using R
For your particular case, you can do something like the following:
for (i in 1:length(url_list)){
url = url_list[i]
test <- read.xls(url)
cols <- c(1,4:5,97:98)
test <- test[-(1:8),cols]
test <- test[1:22,]
test <- test[-4,]
test$Income <-test$Table.2.1...Returns.with.Itemized.Deductions..Sources.of.Income..Adjustments..Itemized.Deductions.by.Type..Exemptions..and.Tax..Items..by.Size.of.Adjusted.Gross.Income..Tax.Year.2015..Filing.Year.2016.
test$Total_returns <- test$X.2
test$return_dollars <- test$X.3
test$charitable_deductions <- test$X.95
test$charitable_deduction_dollars <- test$X.96
test[1:5] <- NULL
assign(paste("test", i, sep=""), test)
}
You could write to a list:
result_list <- list()
for (i_url in 1:length(url_list)){
url <- url_list[i_url]
...
result_list[[i_url]] <- test
}
You can also name the list
names(result_list) <- c("df1","df2","df3",...)
Here's another approach with lapply instead of for loops which will write all resulting data.frames as separate list items which can then be re-named (if needed).
url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
...
'http://www.irs.gov/pub/irs-soi/15in21id.xls')
readURLFunc <- function(z){
test <- readxl::read_xls(z)
...
test[1:5] <- NULL
return(test)}
data_list <- lapply(url_list, readURLFunc)

R iterate to read csv files

pollutantmean <- function(id){
n <- length(id)
for (i in 1 : n){
pol <- read.csv('id[i].csv')
}
}
pollutantmean(150:160)
The filenames of csv are like 001.csv, 002.csv, 100.csv etc
001, 002 and 100, these are id, and each csv has a column of id whose content is 1 if the filename is 001.
When I run this code, the console remind me this is no such file id[i].csv
First of all, you don't need a loop. And second, you need to think about how to represent ids.
ids <- sprintf("%03i", 1:999) # 0's are padded at the beginning
filenames <- paste0(ids, ".csv")
results <- lapply(filenames, read.csv) # you get a list of data frames
Alternatively you can read in all csv files in a certain folder using, say:
results <- lapply(dir(pattern="\\.csv$"), read.csv)
The "\.csv$" stuff means that ".csv" has to be at the end of the filename. (see ?regexpr for technicalities)
... and a function that takes a number and gives you back a data frame would look like this:
read.this <- function(i) read.csv(sprintf("%003i.csv",i))
... And now you can lapply it to your desired range:
lapply(101:150, read.this)
The first problem is line 4 and it should be replaced by
pol <- read.csv(paste0(id[i], ".csv"))
If id[i] is within quotes (either simple or double), it's understood litterally by read.csv, eg the function is looking for something named id[i].csv and which explains your error message.
But with such function, pol will be overwritten anyway at every step anyway.
If you really want to wrapup these lines into a function you need to return a list:
pollutantmean <- function(id){
res <- vector("list", length(id))
for (i in 1:n){
res[[i]] <- read.csv(paste0(id[i], ".csv"))
}
}
But a loop here would not be very elegant here, so we can simply:
pollutantmean <- function(id){
lapply(id, function(i) read.csv(paste0(i, ".csv"))
}
Or even (no function option) this should work:
lapply(id, function(i) read.csv(paste0(i, ".csv"))

Referring to 'i'th element of list in R

I've got a problem in R.
I have loaded files from folder (as filelist) using this method:
ff <- list.files(path=" ", full.names=TRUE)
myfilelist <- lapply(ff, read.table)
names(myfilelist) <- list.files(path=" ", full.names=FALSE)
In myfilelist I have dataframe name as: A1.txt, A2.txt, A3.txt.. etc
Now I would like to use the 'i'th element of list to change my data, for example
with each data frame delete rows the sum of which = 0.
I tried:
A1 <- A1[which(rowSums(A1) > 0),]
and it works.
How can I do it for all A[i] at once?
Try this code:
lapply(myfilelist, function(x) {
x <- x[which(rowSums(x) > 0),]
return(x)
})

How to loop through a list of strings in r

I have some bloated code in R which I'm trying to streamline. I'm trying to read spreadsheets into a dataframe and then transpose each one.
I have a list as follows
var <- c("amp_genes.annotated.BLCA.txt","amp_genes.annotated.BRCA.txt")
for (i in var) {
var[i] <- readWorksheet(wk, sheet="var[i]", header=T)
var[i] <- as.data.frame(var[i])
var[i] <- t(var1[i][3:ncol(var1[i]),])
}
The sheet = line has to have double quotes around the string variable.
This just tells me I have an unexpected }
Maybe try this; not sure it will work as I don't have your spreadsheets, but give it a try and let me know... And maybe if it doesn't work right out, it can hopefully unblock you wherever you're stuck.
library(XLConnect)
wk <- loadWorkbook("workbookname.xls")
sheetnames <- getSheets(object = wk)
content.tr <- list()
# To access sheets by their names
for (sheetname in sheetnames) {
content <- readWorksheet(wk, sheet=sheetname, header=T)
content.tr[[sheetname]] <- t(content[3:ncol(content),])
}
# To access sheets by their position
for (pos in c(1,2) {
content <- readWorksheet(wk, sheet=i, header=T)
content.tr[[sheetname[i]]] <- t(content[3:ncol(content),])
}
To access the dataframes:
names(content.tr)
spreadsheet1 <- content.tr[[1]]
spreadsheet2 <- content.tr[[2]]

Resources