R: Creating new variables in a for loop and assign values - r

I would like to create a new variable, assign a list of values, and write into a hierarchical data frame. I tried the below but it is not writing in.
for(i in 1:sample){
for(j in 1:10){
x[,j]<-0
name <- paste("hierdata[[i]]$Test", j, sep = "_")
assign(name, rowSums(alpha+beta+x)))
}
}
Appreciate some help.

To address #Moody_Mudskipper 's comment:
The best way to declare hierarchical data is to use indexed lists:
hierdata <- lapply(1:sample,
function(iterator)
{
temp_list <- lapply(1:10,
function(j)
{
x[,j]<-0
value <- rowSums(alpha+beta+x)
return(value)
})
names(temp_list) <- lapply(1:10,function(j){paste0("temp_",j)})
return(temp_list)
})
not really a "one liner", but it contains all good "stuff". Lapply which returns list by default so it just nests list in each list.
Hope you enjoy some "good practice". :)
I just wanted to address your question precisely in my first try.

Try using
for(i in 1:sample){
hierdata[[i]] <- list()
for(j in 1:10){
code_j_init <- paste0("hierdata[[",i,"]]$Test_",j,"<- list()")
eval(parse(text = code_j_init))
hierdata[[i]][[j]] <- list(1,2,3)
}
}
or
for(i in 1:sample){
hierdata[[i]] <- list()
names <- c()
for(j in 1:10){
hierdata[[i]][[j]] <- list(1,2,3)
names <- c(names,paste0("Test_",j))
}
names(hierdata[[i]]) <- names
}

Related

Usage of iteration element within name pattern

I would like to iterate with a for loop trough a list applying the following function to all list elements:
new_x = do.call("rbind",mget(ls(pattern = "^x.*")))
where x is a certain name pattern of a dataframe.
How do I iterate through a list where the list element i is the name pattern for my function?
The goal would be to get something like this:
for (i in filenames){
i = do.call("rbind",mget(ls(pattern = "^i.*")))
}
So my question is basically how to use i within a name pattern, so I'm able to use the loop to rbind togerther seperate parts of a dataframe xpart1, xpart2, xpart3 to x; ypart1, ypart2, ypart3 to y and so on....
Thank you in advance!
If we are using a for loop, then an option
v1 <- ls(pattern = "^x.*")
lst1 <- vector('list', length(v1))
for(i in seq_along(v1)){
lst1[[i]] <- get(v1[i])
}
do.call(rbind, lst1)
Or if we need to use i to create the pattern, we can use paste
lst1 <- vector('list', length(filenames))
names(lst1) <- filenames
for(i in filenames){
lst1[[i]] <- get(ls(pattern = paste0(i, ".*")))
}
do.call(rbind, lst1)
NOTE: get returns the value of a single object, whereas mget returns more than one object in a list. If we use for loop, we assume that it is returning one object within the loop and get is only needed
Based on the OP's clarification, we can also use mget
xs <- paste0("xpart", 1:100)
ys <- paste0("ypart", 1:100)
xsdat <- do.call(rbind, mget(xs))
ysdat <- do.call(rbind, mget(ys))

How can I tell R to apply functions to multiple data?

I have been stacking this work for quite long time, tried different approaches but couldn't succeed.
what I want is to apply following 4 functions to 30 different data (data1,2,3,...data30) within for loop or whatsoever in R. These datasets have same (10) column numbers and different rows.
This is the code I wrote for first data (data1). It works well.
for(i in 1:nrow(data1)){
data1$simp <-diversity(data1$sp, "simpson")
data1$shan <-diversity(data1$sp, "shannon")
data1$E <- E(data1$sp)
data1$D <- D(data1$sp)
}
I want to apply this code for other 29 data in order not to repeat the process 29 times.
Following code what I am trying to do now. But still not right.
data.list <- list(data1, data2,data3,data4,data5)
for(i in data.list){
data2 <- NULL
i$simp <-diversity(i$sp, "simpson")
i$shan <-diversity(i$sp, "shannon")
i$E <- E(i$sp)
i$D <- D(i$sp)
data2 <- rbind(data2, i)
print(data2)
}
So I wanna ask how I can tell R to apply functions to other 29 data?
Thanks in advance!
You can do this with Map.
fun <- function(DF){
for(i in 1:nrow(DF)){
DF$simp <-diversity(DF$sp, "simpson")
DF$shan <-diversity(DF$sp, "shannon")
DF$E <- E(DF$sp)
DF$D <- D(DF$sp)
}
DF
}
result.list <- Map(fun, data.list)
Or, if you don't want to have a function fun in the .GlobalEnv, with lapply.
result.list <- lapply(data.list, function(DF){
for(i in 1:nrow(DF)){
DF$simp <-diversity(DF$sp, "simpson")
DF$shan <-diversity(DF$sp, "shannon")
DF$E <- E(DF$sp)
DF$D <- D(DF$sp)
}
DF
})
If I understand the question, it you're ultimately asking about your 'data2' variable and how to merge these all together? I think the issue you're having is that you're setting data2 <- NULL with each loop iteration. The proposed solution below moves this definition outside the loop and the call to rbind() should now append all your data frames together to return the consolidated dataset.
data.list <- list(data1, data2,data3,data4,data5) #all 29 can go here
data2 <- NULL
for(i in data.list){
i$simp <-diversity(i$sp, "simpson")
i$shan <-diversity(i$sp, "shannon")
i$E <- E(i$sp)
i$D <- D(i$sp)
data2 <- rbind(data2, i)
}
print(data2)
I am assuming that your data1, ..., dataN are files stored in a directory and you're reading them one at a time. Also they have the same header.
What you can do is to import them one at a time and then perform the operations you want, as you mentioned:
files <- list.files(directoryPath) #maybe you can grep() some specific files
for (f in files){
data <- read.table(f) #choose header, sep and so on...
for(i in 1:nrow(data)){
data$simp <-diversity(data$sp, "simpson")
data$shan <-diversity(data$sp, "shannon")
data$E <- E(data$sp)
data$D <- D(data$sp)
}
}
be careful that you must be in the working directory or you must add a path to the filename while reading the tables (i.e. paste(path, f, sep=""))
There are plenty of options, here's one using only base functions:
data.list <- list(data1, data2, data3, data4, data5)
changed_data <- lapply(data.list, function(my_data) {
my_data$simp <-diversity(my_data$sp, "simpson")
my_data$shan <-diversity(my_data$sp, "shannon")
my_data$E <- E(my_data$sp)
my_data$D <- D(my_data$sp)
my_data})

Web scraping data in R when there are a lot of links

I am trying to scrape baseball data from baseball-reference (e.g., https://www.baseball-reference.com/teams/NYY/2017.shtml). I have a huge vector of URLS that I created using a for loop, since the links follow a specific pattern. However, I am having trouble running my code, probably because I have to make too many connections within R. There are over 17000 elements in my vector, and my code stops working once it gets to around 16000. Is there an easier and perhaps a more efficient way to replicate my code?
require(Lahman)
teams <- unique(Teams$franchID)
years <- 1871:2017
urls <- matrix(0, length(teams), length(years))
for(i in 1:length(teams)) {
for(j in 1:length(years)) {
urls[i, j] <- paste0("https://www.baseball-reference.com/teams/",
teams[i], "/", years[j], ".shtml")
}
}
url_vector <- as.vector(urls)
list_of_batting <- list()
list_of_pitching <- list()
for(i in 1:length(url_vector)) {
url <- url_vector[i]
res <- try(readLines(url), silent = TRUE)
## check if website exists
if(inherits(res, "try-error")) {
list_of_batting[[i]] <- NA
list_of_pitching[[i]] <- NA
}
else {
urltxt <- readLines(url)
urltxt <- gsub("-->", "", gsub("<!--", "", urltxt))
doc <- htmlParse(urltxt)
tables_full <- readHTMLTable(doc)
tmp1 <- tables_full$players_value_batting
tmp2 <- tables_full$players_value_pitching
list_of_batting[[i]] <- tmp1
list_of_pitching[[i]] <- tmp2
}
print(i)
closeAllConnections()
}

R Function with for Loops creates several dataframes, how to have each one have a different name

I have a for loop that loops through a list of urls,
url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
'http://www.irs.gov/pub/irs-soi/05in21id.xls',
'http://www.irs.gov/pub/irs-soi/06in21id.xls',
'http://www.irs.gov/pub/irs-soi/07in21id.xls',
'http://www.irs.gov/pub/irs-soi/08in21id.xls',
'http://www.irs.gov/pub/irs-soi/09in21id.xls',
'http://www.irs.gov/pub/irs-soi/10in21id.xls',
'http://www.irs.gov/pub/irs-soi/11in21id.xls',
'http://www.irs.gov/pub/irs-soi/12in21id.xls',
'http://www.irs.gov/pub/irs-soi/13in21id.xls',
'http://www.irs.gov/pub/irs-soi/14in21id.xls',
'http://www.irs.gov/pub/irs-soi/15in21id.xls')
dowloads an excel file from each one assigns it to a dataframe and performs a set of data cleaning operations on it.
library(gdata)
for (url in url_list){
test <- read.xls(url)
cols <- c(1,4:5,97:98)
test <- test[-(1:8),cols]
test <- test[1:22,]
test <- test[-4,]
test$Income <-test$Table.2.1...Returns.with.Itemized.Deductions..Sources.of.Income..Adjustments..Itemized.Deductions.by.Type..Exemptions..and.Tax..Items..by.Size.of.Adjusted.Gross.Income..Tax.Year.2015..Filing.Year.2016.
test$Total_returns <- test$X.2
test$return_dollars <- test$X.3
test$charitable_deductions <- test$X.95
test$charitable_deduction_dollars <- test$X.96
test[1:5] <- NULL
}
My problem is that the loop simply writes over the same dataframe for each iteration through the loop. How can I have it assign each iteration through the loop to a data frame with a different name?
Use assign. This question is a duplicate of this post: Change variable name in for loop using R
For your particular case, you can do something like the following:
for (i in 1:length(url_list)){
url = url_list[i]
test <- read.xls(url)
cols <- c(1,4:5,97:98)
test <- test[-(1:8),cols]
test <- test[1:22,]
test <- test[-4,]
test$Income <-test$Table.2.1...Returns.with.Itemized.Deductions..Sources.of.Income..Adjustments..Itemized.Deductions.by.Type..Exemptions..and.Tax..Items..by.Size.of.Adjusted.Gross.Income..Tax.Year.2015..Filing.Year.2016.
test$Total_returns <- test$X.2
test$return_dollars <- test$X.3
test$charitable_deductions <- test$X.95
test$charitable_deduction_dollars <- test$X.96
test[1:5] <- NULL
assign(paste("test", i, sep=""), test)
}
You could write to a list:
result_list <- list()
for (i_url in 1:length(url_list)){
url <- url_list[i_url]
...
result_list[[i_url]] <- test
}
You can also name the list
names(result_list) <- c("df1","df2","df3",...)
Here's another approach with lapply instead of for loops which will write all resulting data.frames as separate list items which can then be re-named (if needed).
url_list <- c('http://www.irs.gov/pub/irs-soi/04in21id.xls',
...
'http://www.irs.gov/pub/irs-soi/15in21id.xls')
readURLFunc <- function(z){
test <- readxl::read_xls(z)
...
test[1:5] <- NULL
return(test)}
data_list <- lapply(url_list, readURLFunc)

How to save results from for loop on list into a new list under "i" vector name?

I have the following code:
final_results <- list()
myfunc <- function(v1) {
deparse(substitute(v1))
}
for (i in mylist) {
...calculations...
tmp_results <- as.data.frame(cbind(effcrs,weights))
colnames(tmp_results) <- c('efficiency',names(inputs),
names(outputs)) # header
rownames(tmp_results) <- namesDMU[,1]
#Save to list
name_in_list <- myfunc(i)
dea_results[[name_in_list]] <- tmp_results
}
The above code loops through a list of data frames. I would like each result yielded from the loop to be stored in a separate list under the same name as the original file obtained from mylist or i
I tried using the deparse substitute. when i apply it to an individual item in mylist it looks like this:
myfunc(standard_DEA$'2010-11-11')
[1] "standard_DEA$\"2010-11-11\""
I don't know what the issue is. At the moment it saves everything under the name "i" and replaces all vectors so the end result is a list of 1.
Thank you in advance
This looks like you want a do loop.
library(dplyr)
function_which_returns_dataframe = function(i) {
...calculations...
tmp_results <- as.data.frame(cbind(effcrs,weights))
colnames(tmp_results) <- c('efficiency',names(inputs),
names(outputs)) # header
rownames(tmp_results) <- namesDMU[,1]
tmp_results
}
data_frame(mylist = mylist,
name = names(mylist)) %>%
group_by(mylist, name) %>%
do(function_which_returns_dataframe(.$mylist[[1]]))

Resources