Adding columns to multiple dataframes via loop - r

I have several dataframes and would like to add columns with a loop. At the moment the code looks like this:
FR1$MONTH<-'2015-01'
FR2$MONTH<-'2015-02'
FR3$MONTH<-'2015-03'
FR4$MONTH<-'2015-04'
I have tried the following:
for (i in 1:12) {
assign(paste("FR",i,$,"MONTH",sep=""),paste("2015-",i,sep=""))
}
Unfortunately it doesn't work.
Can anybody tell me what is wrong with my attempt, or even better, how to do this right? I suspect a loop isn't the best solution.

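Well, one issue that would throw an error is that the '$' should be within quotes within the first paste() call. Even with that fixed, assign() would create a new variable literally named FR1$MONTH rather than adding a column to FR1.
I would try, however:
eval(parse(text = paste0("FR", i, "$MONTH <- '2015-", i, "'")))
within your loop. Note the single quotes around the value: without them, R would evaluate 2015-1 as arithmetic. And you may want to use an ifelse() to get the 0 in the month when you need it.
And I second Colonel's comment about keeping your data.frames within some other data structure.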
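As a minimal sketch of the list-based approach, assuming the data frames can be held in a named list (the sample data and the names frames and months are illustrative, not from the question): sprintf() zero-pads the month, and Map() pairs each data frame with its label.

```r
# Illustrative stand-ins for FR1 ... FR12; the real data frames would go here
frames <- lapply(1:12, function(i) data.frame(value = c(i, i + 1)))
names(frames) <- paste0("FR", 1:12)

# "2015-01", "2015-02", ..., "2015-12" (zero-padded, so no ifelse() is needed)
months <- sprintf("2015-%02d", 1:12)

# Add a MONTH column to each data frame, pairing frames and months by position
frames <- Map(function(df, m) {
  df$MONTH <- m
  df
}, frames, months)

frames$FR3
```

If the data frames already exist as separate variables, mget(paste0("FR", 1:12)) would collect them into such a list first.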

Related

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc.
Each of them holds the correct output from the function (I've checked). My next step is to combine them into one big vector which holds each of these created variables as a separate element. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can set up the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
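
One possible approach, sketched under the assumption that the function returns a single value per identifier: collect the results in a named list inside the loop instead of creating separate variables, then flatten the list once at the end. Here fetch_result is a hypothetical placeholder for the real function, and url_num is shortened for illustration.

```r
url_num <- c("85054655", "85023543", "85001177")

# Hypothetical placeholder for the real function
fetch_result <- function(id) as.numeric(id) %% 1000

# One list element per identifier, named test_<id>
results <- list()
for (id in url_num) {
  results[[paste0("test_", id)]] <- fetch_result(id)
}

# Flatten into a single named vector; it follows url_num automatically
combined <- unlist(results)
combined
```

If the test_ variables already exist, unlist(mget(paste0("test_", url_num))) would gather them without retyping the names.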

How to search for a specific column name in data

So this is a bit strange, I am pretty new to R and facing this weird problem.
I have a data frame, and there is a column called SNDATE which is a combined value of two different columns.
I want to check if the data frame has a column named SN, if it doesn't I will split SNDATE to fill the SN column.
Here is the code
if(!('SN' %in% colnames(data))){
# do some splitting here
}
Funny thing is, it keeps saying it's there, and the stuff in it never gets triggered.
And when I do this:
print(data$SN)
It will print the value of data$SNDATE. So does R have some sort of lazy name filling or something? This is very strange to me.
Thank you very much for the help
When you do
print(data$SN)
it works because $ is using partial name matching. For another example, try
mtcars$m
There is no column named m, so $ partially matches mpg. Unfortunately, partial matching is not used by %in%, so you will need to use the complete, exact column name, as in
if(!('SNDATE' %in% colnames(data))){
# do some splitting here
}
You could instead use something along the lines of pmatch()
names(mtcars)[2] <- "SNDATE"
names(mtcars)[pmatch("SN", names(mtcars))]
# [1] "SNDATE"
So the if() statement might go something like this -
nm <- colnames(data)
if(!nm[pmatch("SN", nm)] %in% nm) {
...
}
Or even
if(is.na(pmatch("SN", names(data))))
might be better.
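To make the difference concrete, here is a small sketch with a made-up data frame (df and its values are illustrative) showing that $ matches partially while %in% and [[ ]] match exactly:

```r
df <- data.frame(SNDATE = c("A1-2020", "B2-2021"), stringsAsFactors = FALSE)

df$SN                              # partial match: returns the SNDATE column
"SN" %in% names(df)                # FALSE: %in% compares whole names
is.null(df[["SN"]])                # TRUE: [[ ]] matches exactly by default
!is.na(pmatch("SN", names(df)))    # TRUE: pmatch finds the unique partial match
```

So an exact test such as "SN" %in% names(df) is the safer way to check whether a column really exists.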

Writing a for loop in r

I don't know how to write for-loops in R. Here is what I want to do:
I have a df called "na" with 50 columns (ana1_1:ana50_1). I want to loop these commands over all columns. Here are the commands for the first two columns (ana1_1 and ana2_1):
t<-table(na$ana1_1)
ana1_1<-capture.output(sort(t))
cat(ana1_1,file="ana.txt",sep="\n",append=TRUE)
t<-table(na$ana2_1)
ana2_1<-capture.output(sort(t))
cat(ana2_1,file="ana.txt",sep="\n",append=TRUE)
After the loop, all tables (ana1_1:ana50_1) should be written to ana.txt. Does anyone have an idea how to solve this problem? Thank you very much!
One approach would be to loop through the columns with lapply, reusing the code from the OP's post:
invisible(lapply(na, function(x) {
x1 <- capture.output(sort(table(x)))
cat(x1, file='ana.txt', sep="\n", append=TRUE)
}))
Wrapping with invisible so that it won't print 'NULL' in the R console.
We can wrap with a condition to check if the file already exists so that it won't add the same lines by accidentally running the code again.
if(!file.exists('ana.txt')){
invisible( lapply(na, function(x) {
x1 <- capture.output(sort(table(x)))
cat(x1, file='ana.txt', sep="\n", append=TRUE)
}))
}
Here is a solution with a for loop. Loops have a reputation for being slow in R, so people often prefer other solutions (e.g. the great answer provided by akrun). This answer is for your understanding of the loop syntax:
for(i in 1:50){
t1<-table(na[,i])
t2<-capture.output(sort(t1))
cat(t2,file="ana.txt",sep="\n",append=TRUE)
}
We loop through i from 1 to 50 (first line). There are two ways to select a column (actually more than two, but that's for another time): na$ana1_1 and na[,1] both select the first column (second line). In the first case you refer to the column by name, in the second by index. Here the second is more convenient. The rest is your desired calculation.
Be aware that cat creates ana.txt if it does not exist yet and appends to it if it is already there.
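A variation on the loop, as a sketch: iterating over the column names instead of 1:50 avoids hard-coding the number of columns and lets each table be labelled in the output. The data frame na_df and the temporary output file are made up for illustration.

```r
# Illustrative data; the real data frame has 50 columns (ana1_1 ... ana50_1)
na_df <- data.frame(ana1_1 = c("a", "b", "a"),
                    ana2_1 = c("x", "x", "y"))

# Temporary file so repeated runs do not keep appending to the same ana.txt
out <- tempfile(fileext = ".txt")

for (col in names(na_df)) {
  tab <- sort(table(na_df[[col]]))
  # Write the column name as a header line, followed by the printed table
  cat(col, capture.output(tab), file = out, sep = "\n", append = TRUE)
}

readLines(out)
```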

In R, I am trying to make a for loop that will cycle through variable names and perform functions on them

I have variables that are named team.1, team.2, team.3, and so forth.
First of all, I would like to know how to go through each of these and assign a data frame to each one. So team.1 would have data from one team, then team.2 would have data from a second team. I am trying to do this for about 30 teams, so instead of typing the code out 30 times, is there a way to cycle through each with a counter or something similar?
I have tried things like
vars = list(sprintf("team.x%s", 1:33))
to create my variables, but then I have no luck assigning anything to them.
Along those same lines, I would like to be able to run a function I made for cleaning and sorting the individual data sets on all of them at once.
For this, I have tried a for loop
for (j in 1:33) {
assign(paste("team.",j, sep = ""), cleaning1(paste("team.",j, sep =""), j))
}
where cleaning1 is my function, which takes two arguments and is called like this:
cleaning1(team.1, 1)
The loop produces the error message
Error in who[, -1] : incorrect number of dimensions
So obviously I am hoping the loop would count through my data sets, and also input my function calls and reassign my datasets with the newly cleaned data.
Is something like this possible? I am a complete newbie, so the more basic, the better.
Edit:
cleaning1:
cleaning1 = function (who, year) {
who[,-1]
who$SeasonEnd = rep(year, nrow(who))
who = (who[-nrow(who),])
who = tbl_df(who)
for (i in 1:nrow(who)) {
if ((str_sub(who$Team[i], -1)) == "*") {
who$Playoffs[i] = 1
} else {
who$Playoffs[i] = 0
}
}
who$Team = gsub("[[:punct:]]",'',who$Team)
who = who[c(27:28,2:26)]
return(who)
}
This works just fine when I run it on the data sets I have compiled myself.
To run it though, I have to go through and reassign each data set, like this:
team.1 = cleaning1(team.1, 1)
team.2 = cleaning1(team.2, 2)
So, I'm trying to find a way to automate that part of it.
I think your problem would be better solved by using a list of data frames instead of many variables containing one data frame each.
You do not say where you get your data from, so I am not sure how you would create the list. But assuming you have your data frames already stored in the variables team.1 etc., you could generate the list with
team.list <- list(team.1, team.2, ...,team.33)
where the dots stand for the variables that I did not write explicitly (you will have to do that). This is tedious, of course, and could be simplified as follows
team.list <- do.call(list,mget(paste0("team.",1:33)))
The paste0 command creates the variable names as strings, mget converts them to the actual objects, and do.call applies the list command to these objects.
Now that you have all your data in a list, it is much easier to apply a function to all of them. I am not quite sure how the year argument should be used, but from your example, I assume that it just runs from 1 to 33 (let me know if this is not true and I'll change the code). So the following should work:
team.list.cleaned <- mapply(cleaning1,team.list,1:33,SIMPLIFY=FALSE)
It will go through all elements of team.list and 1:33 and apply the function cleaning1 with the elements as its arguments. SIMPLIFY=FALSE stops mapply from trying to collapse the returned data frames into a matrix, so the result will again be a list containing the output of each call, i.e.,
list( cleaning1(team.list[[1]],1), cleaning1(team.list[[2]],2), ...)
Since you are new to R, I strongly recommend that you read the help on the apply commands (apply, lapply, tapply, mapply). They are very useful, and once you get used to them, you will use them all the time.
There is probably also a simple way to directly generate the list of data frames using lapply. As an example: if the data frames are read in from files and you have the file names stored in a character vector file.names, then something along the lines of
team.list <- lapply(file.names,read.table)
might work.
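Putting the pieces together, here is a small runnable sketch. The original loop failed because paste() hands cleaning1 a character string such as "team.1" rather than the data frame itself, whereas mget() looks the objects up by name. The two tiny data frames and clean_team (a much simplified, hypothetical stand-in for cleaning1) are made up for illustration.

```r
team.1 <- data.frame(Team = c("Hawks*", "Bulls"), Wins = c(50, 30),
                     stringsAsFactors = FALSE)
team.2 <- data.frame(Team = c("Nets", "Heat*"), Wins = c(20, 45),
                     stringsAsFactors = FALSE)

# Hypothetical, simplified stand-in for cleaning1
clean_team <- function(who, year) {
  who$SeasonEnd <- year
  # A trailing "*" marks a playoff team
  who$Playoffs <- as.integer(endsWith(who$Team, "*"))
  who$Team <- gsub("[[:punct:]]", "", who$Team)
  who
}

# Look the data frames up by name; the result is a named list
team.list <- mget(paste0("team.", 1:2))

# Map() always returns a list, one cleaned data frame per team
team.list.cleaned <- Map(clean_team, team.list, seq_along(team.list))
team.list.cleaned$team.1
```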

In R, package xts, how would one iterate period subsetting over a list without throwing errors?

Assume:
list of n xts objects in .GlobalEnv with the suffix ".raw" (e.g: ABC.raw)
the .raw names are stored in a character vector (i.e., rawfiles <- ls(pattern="*.raw",envir=.GlobalEnv))
Would like to:
loop or lapply through rawfiles and subset a particular timeperiod in each iteration
for example, to write this as a single line would be: new <- ABC.raw["T09:00/T10:00"] if I wanted to subset ABC.raw from 9am to 10am each day.
The problem is:
There doesn't seem to be an easy way of passing ["Thh:mm/Thh:mm"] to a loop, apply or assign without causing errors.
Any ideas how to pass this?
In pidgin code, I guess I'm looking for a working equivalent of:
for(i in 1:length(raw)){
raw[i]["T09:00/T10:00"]
}
Many thanks in advance for any assistance on this.
Try get.
get(x) retrieves the variable whose name is stored in x, so foo<-1; get('foo') would return 1.
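subsets <- list()
for ( rawname in rawfiles ) {
# store each subset under its original name; a bare expression inside a loop is discarded
subsets[[rawname]] <- get(rawname)["T09:00/T10:00"]
}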
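The same idea can be written without an explicit loop, as a sketch: mget() fetches all the named objects at once and lapply() applies the time-of-day subset to each. This assumes the xts package is installed; the two small series below are made up for illustration.

```r
library(xts)

# Two illustrative hourly series covering 08:00 to 11:00 UTC
idx <- seq(as.POSIXct("2020-01-01 08:00", tz = "UTC"), by = "hour", length.out = 4)
ABC.raw <- xts(1:4, order.by = idx)
XYZ.raw <- xts(5:8, order.by = idx)

# Names of all objects ending in ".raw" (ls() takes a regular expression)
rawfiles <- ls(pattern = "\\.raw$")

# Named list holding the 9am-10am subset of each series
subsets <- lapply(mget(rawfiles), function(x) x["T09:00/T10:00"])
subsets$ABC.raw
```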
