Looping through list and concatenating strings in R (Syntax) - r

I'm trying to create a loop that goes from 1 to 14. Each integer in this loop would be added to the end of the name of a newly created dataframe. Then, it should search a column in an existing dataframe based on the concatenation of a number and text. I've been searching for hours but cannot find a solution.
What I mean is:
while (i <= 14) {
"newDF" + i <- oldDf %>%
filter(str_detect(ColumnName, "TEXT" + i)
}
The new dataframes should look like this:
newDF1,newDF2... newDF14
They should be created based on a concatenated string (text + i):
text1,text2..text14
My first challenge is to create a new dataframe based on the concatenation of text and i. I've tried using the str_c command and the str_glue command but get the following error message.
Error in str_c("newDF", i)) <- oldDF:
target of assignment expands to non-language object
Error in str_glue("newDF{i}") <- oldDF:
target of assignment expands to non-language object

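The major problem with your code above is that the left-hand side of the assignment operator has to be a variable name, not an expression that builds one. A call such as str_glue("newDF{i}") evaluates to a string, and R will not assign to a string, which is what the "target of assignment expands to non-language object" error is telling you. To create a variable whose name is computed at run time, use assign():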
for (i in 1:14){
assign(str_glue("newDF{i}"), oldDF %>%
filter(str_detect(ColumnName, str_glue("TEXT{i}"))))
}
So technically this works, although I suspect there is a better way to do it, either with a list of data frames or by reshaping with spread and gather. I would say more, but I don't have enough context about the underlying problem.
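As a rough sketch of the list-based alternative mentioned above, the 14 filtered data frames can live in one named list instead of 14 separate variables. The sample oldDF below is invented for illustration, and base R's grepl() is swapped in for str_detect() so the snippet runs without extra packages.

```r
# Invented sample data; only the column name ColumnName comes from the question.
oldDF <- data.frame(
  ColumnName = c("TEXT1 a", "TEXT2 b", "TEXT14 c", "other"),
  value = 1:4,
  stringsAsFactors = FALSE
)

# paste0() is vectorised, so all 14 search strings are built in one call.
patterns <- paste0("TEXT", 1:14)

# One filtered data frame per pattern, collected in a list.
newDFs <- lapply(patterns, function(p) {
  oldDF[grepl(p, oldDF$ColumnName, fixed = TRUE), , drop = FALSE]
})
names(newDFs) <- paste0("newDF", 1:14)

# Note: "TEXT1" is also a substring of "TEXT14", so newDF1 picks up both rows.
newDFs[["newDF1"]]
```

Individual results are then available as newDFs[["newDF3"]] or newDFs$newDF3, and the whole set can be passed to lapply() in one go. The same substring caveat applies to the str_detect() version in the loop above.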

Related

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc, etc
Each of them holds the correct output from the function (I've checked), however my next step is to combine them into one big vector which holds all of these created variables as a separate element in the vector. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can set up the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
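One possible way to avoid maintaining the c(...) call by hand is to skip assign() altogether and collect the results in a named list as they are produced. This is only a sketch: get_result() is a hypothetical stand-in for the function elided as "FUNCTION GOES HERE", and only three of the identifiers are used.

```r
url_num <- c("85054655", "85023543", "85001177")

# Hypothetical placeholder for the real function, which the question does not show.
get_result <- function(id) as.numeric(id) %% 1000

# One result per identifier, kept together in a list.
results <- lapply(url_num, get_result)
names(results) <- paste0("test_", url_num)

# Flatten into a single vector, as c(test_85054655, test_85023543, ...) would.
combined <- unlist(results, use.names = FALSE)
```

Adding identifiers to url_num then updates combined automatically, and a single result is still reachable by name, for example results[["test_85023543"]].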

Iterate conditional count across list items R

I am attempting to count all the instances across a list of data frames where a certain variable is over a given value. I have tried to do it as so:
for (name in myList){
nrow(subset(myList[[name]], var >=6))
}
as I found here: http://www.statisticsblog.com/2010/03/r-tip-iterating-over-list/
However, I get the following error:
Error in myList[[name]] : invalid subscript type 'list'
I know that nrow works because I have used it on a specific list item outside of the loop and it succeeded. I can't seem to figure out why the error is arising. The list names are set up as so:
myList$`i.j.k`
with i, j, and k each taking on a different numerical value. I generated the list as so from a data frame read in from a .csv file:
myList <- split(data, f=list(data$i, data$j, data$k))
What is causing the error? Or, is there a better way to do a conditional count across all list elements (there are 2000+ of them, so any non-loop way would be ideal). Thanks!
I figured it out thanks to the comment from @PoGibas:
Rather than
for (name in myList){
nrow(subset(myList[[name]], var >=6))
}
it should be:
for (name in myList){
nrow(subset(name, var >=6))
}
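For the non-loop version the question asks about, a sketch along these lines may help. The data frame below is invented, with only the column names i, j, k and var taken from the question. Note also that a bare nrow() call inside a for loop is not printed, so the loop above needs print() or an accumulator to show anything.

```r
# Invented sample data with the column names used in the question.
data <- data.frame(
  i = c(1, 1, 2, 2),
  j = c(1, 1, 1, 1),
  k = c(1, 1, 1, 1),
  var = c(5, 7, 6, 9)
)
myList <- split(data, f = list(data$i, data$j, data$k))

# Count the rows with var >= 6 in each list element; NA values are not counted.
counts <- sapply(myList, function(df) sum(df$var >= 6, na.rm = TRUE))

# Total across all list elements.
total <- sum(counts)
```

Here counts is a named vector with one entry per list element (names such as "1.1.1"), so per-group counts and the overall total come from the same call.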

How to search for a specific column name in data

So this is a bit strange, I am pretty new to R and facing this weird problem.
I have a data frame, and there is a column called SNDATE which is a combined value of two different columns.
I want to check if the data frame has a column named SN, if it doesn't I will split SNDATE to fill the SN column.
Here is the code
if(!('SN' %in% colnames(data))){
#do some splitting here
}
Funny thing is, it keeps saying it's there, and the stuff in it never gets triggered.
And when I do this:
print(data$SN)
It will print the value of data$SNDATE. So does R have some sort of lazy name filling or something? This is very strange to me.
Thank you very much for the help
When you do
print(data$SN)
it works because $ is using partial name matching. For another example, try
mtcars$m
There is no column named m, so $ partially matches mpg. Unfortunately, %in% does not do partial matching, so you will need to use the complete, exact column name in
if(!('SNDATE' %in% colnames(data))){
#do some splitting here
}
You could instead use something along the lines of pmatch()
names(mtcars)[2] <- "SNDATE"
names(mtcars)[pmatch("SN", names(mtcars))]
# [1] "SNDATE"
So the if() statement might go something like this -
nm <- colnames(data)
if(!nm[pmatch("SN", nm)] %in% nm) {
...
}
Or even
if(is.na(pmatch("SN", names(data))))
might be better.
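To make the difference concrete, here is a small sketch comparing the three lookups on a base data.frame. The sample values and the "code-year" layout of SNDATE are invented for illustration, so the sub() call is only a placeholder for the real splitting logic.

```r
# Invented sample data; the real layout of SNDATE is not given in the question.
df <- data.frame(SNDATE = c("A1-2020", "B2-2021"), stringsAsFactors = FALSE)

df$SN                 # partial matching: returns the SNDATE column
df[["SN"]]            # exact matching by default: NULL, no such column
"SN" %in% names(df)   # exact comparison: FALSE

# An exact-name guard, as in the question.
if (!("SN" %in% names(df))) {
  # Placeholder split: keep everything before the first "-".
  df$SN <- sub("-.*$", "", df$SNDATE)
}
```

Assignment with $<- never partially matches, so df$SN <- ... creates a new SN column rather than overwriting SNDATE.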

R Loop error using character

I have the below function which inserts a row into a table (new_scores) based upon the attribute that I feed into it (where the attribute represents a table that I select things from):
buildNewScore <- function(x) {
something <- bind_rows(new_scores,x%>%select(ATT,ADJLOGSCORE))
return(something)
}
Which works fine when I define x.
But when I try to create a for loop that feeds the rest of my attributes into the function it falls over as I'm feeding in a character.
attlist <- c('Z','Y','X','W','V','U','T','RT','RO')
record_count <- length(attlist)
for (x in c(1:record_count)){
buildNewScore(attlist[x])
}
I've tried to convert the attribute into other classes but I can't get the loop to use anything I change it to (name, data.frame etc.).
Anyone have any ideas as to where I'm going wrong - is my attlist vector in the wrong format?
Thanks,
Spikelete.
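One possible reading of the failure is that attlist[x] is the string "Z" rather than the data frame named Z, so select() has nothing to select from. The return value of buildNewScore() is also discarded on each pass of the loop. A sketch that sidesteps both issues keeps the tables in a named list. The two small tables are invented, and base rbind() is swapped in for bind_rows() so the snippet runs without dplyr.

```r
# Invented stand-ins for two of the attribute tables.
Z <- data.frame(ATT = "Z", ADJLOGSCORE = 0.5, other = 1)
Y <- data.frame(ATT = "Y", ADJLOGSCORE = 1.5, other = 2)

# Hold the data frames themselves, not their names.
# (mget(c("Z", "Y")) would build the same list from a character vector.)
tables <- list(Z = Z, Y = Y)

# Keep the two columns of interest from each table, then stack the pieces.
pieces <- lapply(tables, function(df) df[, c("ATT", "ADJLOGSCORE")])
new_scores <- do.call(rbind, pieces)
```

With dplyr loaded, bind_rows(pieces) should give the same stacked result.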

How to convert a character string to a name that accepts data (data frame name) in R

I have stored a list of names as characters and want to convert them to something that can be accepted as a data frame name. Something like this:
for (i in 1:18) {
str[i] <- paste("alert_month_amount_",i,sep="")
}
name_str = as.character(str)
Then name_str holds the names; for example, name_str[1] would be "alert_month_amount_1".
Now I want to assign certain data to a data frame named by name_str[n] inside a loop, like:
for (n in 1:18){
name_str[n] <- subset(by_Month_Acct_Num,month==month_index[n] & year==year_index[n])
}
But this does not work, perhaps because the names are passed as character strings inside double quotation marks ("). I would appreciate your help.
You can use assign for this:
assign(name_str[n], subset(by_Month_Acct_Num,month==month_index[n] & year==year_index[n]))
This is R FAQ 7.21. The most important part of that answer is the end, where it says (like @MrFlick) that it is better to use a list. You really should learn how to take advantage of R's vectorized functions.
The paste and paste0 functions are both vectorized, so your first bit of code can be replaced with:
name_str <- paste0("alert_month_amount_", 1:18)
without need for the loop.
You could create your list and fill it with code like:
alert_month_amount <- list()
for(i in 1:18) {
alert_month_amount[[i]] <- subset(by_Month_Acct_Num,month==month_index[i] & year==year_index[i])
}
Or possibly even easier using the split function. You could also use lapply or mapply.
If you want the elements named then just do:
names(alert_month_amount) <- name_str
Now with everything in a single list you can copy, save, delete, etc. one object rather than needing another loop to do each individual piece. If you want to do the same thing (calculate a summary, fit a regression, etc.) on each piece created then with everything in a list you can just use lapply or sapply on the list rather than having to create another loop and figuring out how to grab each piece in the loop and save it to an output object.
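As a sketch of the mapply-style approach mentioned above, Map() can walk over the month and year indexes in parallel and return the list directly. The sample data and index vectors are invented (three periods rather than 18), and plain logical indexing stands in for subset().

```r
# Invented sample data using the column names from the question.
by_Month_Acct_Num <- data.frame(
  month  = c(1, 1, 2, 3),
  year   = c(2020, 2020, 2020, 2021),
  amount = c(10, 20, 30, 40)
)
month_index <- c(1, 2, 3)
year_index  <- c(2020, 2020, 2021)

# One data frame per (month, year) pair.
alert_month_amount <- Map(
  function(m, y) {
    keep <- by_Month_Acct_Num$month == m & by_Month_Acct_Num$year == y
    by_Month_Acct_Num[keep, , drop = FALSE]
  },
  month_index, year_index
)
names(alert_month_amount) <- paste0("alert_month_amount_", seq_along(alert_month_amount))

# The same operation applied to every piece, without another explicit loop.
rows_per_period <- sapply(alert_month_amount, nrow)
```

Each piece is then available by name, for example alert_month_amount[["alert_month_amount_2"]].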
