Basic Apply in R dataframe (possible for loops as well) - r

I'm trying to do some work on a basic dataframe, you can see it below:
> print(thisEmailList)
user_name user_email
1 Test, Joe joejoejt#gmeel.com
2 adminintor, Admin jimmyadminj#gmeel.com
I would like to send some emails to these folks, but I am unsure what is the best approach. I have a function, sendmail, that seems to work fine with strings, but how do I iterate or apply this function to my dataframe?
I've tried many different tacks, including for loops and functions in lapply, but I cannot seem to get the output to appear in the same way the database puts it out. I always seem to get something to this effect:
user_name
1 Test, Joe
2 adminintor, Admin
user_email
1 joejoejt#gmeel.com
2 jimmyadminj#gmeel.com
I am thinking of thisEmailList as rows and columns, and I would like to loop through the rows, not the columns. R has been quite the difference in how to think, and I am just not getting the syntax, or how I go about sending an email to each row in the above.
Update 1
I think I finally figured it out, for a for loop anyway. If anyone has a suggestion that doesn't involve a for loop, that would be fantastic.
for (i in 1:nrow(thisEmailList)){
#Note this is just for testing, the sendmailr part has never been an issue, just getting the row/columns to loop in the right order.
print(paste(thisEmailList[i,2], thisEmailList[i,1]))
}
[1] "joejoejt#gmeel.com Test, Joe"
[1] "jimmyadminj#gmeel.com adminintor, Admin"

You want to use the basic apply function in row-mode (second parameter is 1):
apply(data.frame(thisEmailList$user_name, thisEmailList$user_email),
1,
function(x) {
# send email to user x[1]
# whose email address is x[2]
})
apply passes each row to the function as a plain vector, so you can't use the normal $ column references inside it. I therefore create a temporary input data frame whose first column (x[1]) is the user_name and whose second column (x[2]) is the user_email.
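As a possible alternative to apply, here is a minimal sketch that walks the two columns in parallel with Map. The function send_one is a hypothetical stand-in (it only builds a string where the real sendmail call would go), and the data frame is rebuilt here from the values shown in the question.

```r
# Example data mirroring the question; stringsAsFactors = FALSE keeps plain strings
thisEmailList <- data.frame(
  user_name  = c("Test, Joe", "adminintor, Admin"),
  user_email = c("joejoejt#gmeel.com", "jimmyadminj#gmeel.com"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the real mail call
send_one <- function(name, email) {
  paste(email, name)
}

# Map() calls send_one once per row, pairing element i of each column
results <- Map(send_one, thisEmailList$user_name, thisEmailList$user_email)
results <- unlist(results, use.names = FALSE)
print(results)
```

Because the columns are passed separately, each value keeps its own type, whereas apply first coerces the whole data frame to a matrix.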

Related

Referencing recently used objects in R

My question refers to redundant code and a problem that I've been having with a lot of my R-Code.
Consider the following:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
combined_df_putnam$fu_time<-combined_df_putnam$age*365.25
combined_df_einstein$fu_time<-combined_df_einstein$age*365.25
combined_df_newton$fu_time<-combined_df_newton$age*365.25
...
combined_df_leibniz$fu_time<-combined_df_leibniz$age*365.25
I am trying to slim-down my code to do something like this:
list_names<-c("putnam","einstein","newton","kant","hume","locke","leibniz")
paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1)
paste0("combined_df_",list_names[0:7]) <- paste0("combined_df_",list_names[0:7])$age*365.25
When I try to do that, I get "target of assignment expands to non-language object".
Basically, I want to create a list that contains descriptors, use that list to create a list of dataframes/lists and use these shortcuts again to do calculations. Right now, I am copy-pasting these assignments and this has led to various mistakes because I failed to replace the "name" from the previous line in some cases.
Any ideas for a solution to my problem would be greatly appreciated!
The central problem is that you are trying to assign a value (or data.frame) to the result of a function.
In paste0("combined_df_",list_names[0:7]) <- data.frame("age"=1), the left-hand-side returns a character vector:
> paste0("combined_df_",list_names[0:7])
[1] "combined_df_putnam" "combined_df_einstein" "combined_df_newton"
[4] "combined_df_kant" "combined_df_hume" "combined_df_locke"
[7] "combined_df_leibniz"
R does not interpret these strings as the names of variables to be created or referenced. For that, you should look at the function assign.
Similarly, in the code paste0("combined_df_",list_names[0:7])$age*365.25, the paste0 function does not refer to variables, but simply returns a character vector, for which the $ operator is not accepted.
There are many ways to solve your problem, but I will recommend that you create a function that performs the necessary operations of each data frame. The function should then return the data frame. You can then re-use the function for all 7 philosophers/scientists.
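One way this could look, as a sketch only: the data frames are kept in a single named list rather than in seven separate variables, and a small function (add_fu_time, a name chosen here for illustration) carries the calculation. The placeholder data frames with age = 1 follow the question's own example.

```r
list_names <- c("putnam", "einstein", "newton", "kant", "hume", "locke", "leibniz")

# Keep the data frames in one named list instead of seven separate variables
combined <- lapply(list_names, function(nm) data.frame(age = 1))
names(combined) <- paste0("combined_df_", list_names)

# The calculation is written once and returns the modified data frame
add_fu_time <- function(df) {
  df$fu_time <- df$age * 365.25
  df
}

# Apply it to every data frame; lapply keeps the list names
combined <- lapply(combined, add_fu_time)
combined$combined_df_kant$fu_time
```

Adding another name to list_names is then the only change needed, which avoids the copy-paste mistakes described in the question.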

R - Why does frameex[ind, ] need a ", " to display rows

I am new to R and I have troubles understanding how displaying an index works.
# Find indices of NAs in Max.Gust.SpeedMPH
ind <- which(is.na(weather6$Max.Gust.SpeedMPH))
# Look at the full rows for records missing Max.Gust.SpeedMPH
weather6[ind, ]
My code here works, no problem but I don't understand why weather6[ind] won't display the same thing as weather6[ind, ] . I got very lucky and mistyped the first time.
I apologize in advance that the question might have been posted somewhere else, I searched and couldn't find a proper answer.
So [ is a function just like any other function in R, but we call it strangely. Another way to write it in this case would be:
'[.data.frame'(weather6,ind,)
or the other way:
'[.data.frame'(weather6,ind)
The first three arguments to the function are named x, i and j. If you look at the code, early on it branches with the line:
if (Narg < 3L)
Putting in the extra comma tells R that you've called the function with 3 arguments, but that the j argument is "missing". Without the comma, you have only 2 arguments, and the function code moves on to the next [ method for lists, which treats ind as column positions and selects columns instead of rows.
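The difference can be seen on a small example. The data frame below is a made-up stand-in for weather6 (its column names and values are assumptions); it has four columns so that the two-argument form does not fail with an "undefined columns selected" error.

```r
# Small stand-in for weather6 (hypothetical values)
weather <- data.frame(
  day  = 1:4,
  gust = c(20, NA, 35, NA),
  temp = c(50, 55, 60, 65),
  rain = c(0, 1, 0, 2)
)

ind <- which(is.na(weather$gust))   # positions 2 and 4

rows <- weather[ind, ]   # three arguments (j missing): rows 2 and 4, all columns
cols <- weather[ind]     # two arguments: list-style indexing, columns 2 and 4

dim(rows)
names(cols)
```

Here rows keeps every column for the two records with a missing gust value, while cols keeps every record but only the second and fourth columns.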

Combining many vectors into one larger vector (in an automated way)

I have a list of identifiers as follows:
url_num <- c('85054655', '85023543', '85001177', '84988480', '84978776', '84952756', '84940316', '84916976', '84901819', '84884081', '84862066', '84848942', '84820189', '84814935', '84808144')
And from each of these I'm creating a unique variable:
for (id in url_num){
assign(paste('test_', id, sep = ""), FUNCTION GOES HERE)
}
This leaves me with my variables which are:
test_85054655, test_85023543, etc, etc
Each of them holds the correct output from the function (I've checked), however my next step is to combine them into one big vector which holds each of these created variables as a separate element in the vector. This is easy enough via:
c(test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144)
However, as I update the original 'url_num' vector with new identifiers, I'd also have to come down to the above chunk and update this too!
Surely there's a more automated way I can setup the above chunk?
Maybe some sort of concat() function in the original for-loop which just adds each created variable straight into an empty vector right then and there?
So far I've just been trying to list all the variable names and somehow get the output to be in an acceptable format to get thrown straight into the c() function.
for (id in url_num){
cat(as.name(paste('test_', id, ",", sep = "")))
}
...which results in:
test_85054655,test_85023543,test_85001177,test_84988480,test_84978776,test_84952756,test_84940316,test_84916976,test_84901819,test_84884081,test_84862066,test_84848942,test_84820189,test_84814935,test_84808144,
This is close to the output I'm looking for but because it's using the cat() function it's essentially a print statement and its output can't really get put anywhere. Not to mention I feel like this method I've attempted is wrong to begin with and there must be something simpler I'm missing.
Thanks in advance for any help you guys can give me!
Troy
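One possible direction, sketched under assumptions: rather than creating one variable per identifier with assign, the results can be collected in a named list and flattened afterwards. The function process_id is a hypothetical stand-in for the real function (the "FUNCTION GOES HERE" part of the question), and only three of the identifiers are used to keep the example short.

```r
url_num <- c("85054655", "85023543", "85001177")

# Hypothetical stand-in for the real function applied to each identifier
process_id <- function(id) as.numeric(id) %% 1000

# Build a named list directly instead of calling assign() in a loop
results <- lapply(url_num, process_id)
names(results) <- paste0("test_", url_num)

# Flatten into one vector; new ids added to url_num are picked up automatically
combined <- unlist(results)
combined
```

If the test_ variables already exist in the workspace, mget(paste0("test_", url_num)) should retrieve them as a list that can be passed to unlist in the same way.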

Getting variables out of a function in R

So here is some background info:
I have created a question and answer function in R. After the user calls the function they are prompted a succession of questions that will eventually be used to populate a report using R markdown. The function is divided into sections that follow the intended report and each section ends with a data.frame that has the question category, the answer and the name of the variable. In total there are 17 sections which means that there are 17 data.frames that get strung together using rbind function before the function writes the final data.frame to a .csv, saves it to a directory and exits. This function works well and I have no problems with it at all.
My problem lies in the fact that once the function ends I am not able to call the variables back to the console. This is a problem because if I would like to populate a report with the questions in R markdown, I cannot, because they only exist in the realm of the function.
What I have tried already:
I have already tried creating a list (using c()) containing the variables from each section and had the function return the list. However, this did not work: it returns only a small portion of the list, and it only populates the readlines I passed to the variables. I need to be able to call the variable and receive what was answered.
I have read back the .csv that was saved by the function and attempted to use the assign function to assign each variable name to its answer. This worked only when I entered one line at a time and fails when I attempt to assign column 1 to column 2. Considering there are 163 questions, assigning them one at a time is a waste of time. I have even tried using the lapply and sapply functions to do this, but there is always a failure with the assign function.
I need to be able to bring out the 163 variables that were created during the execution of the function. Here is a sample of the function for whoever is interested in playing around with it.
sv<-function(){
Name<-readline("What is your Name?")
Date<-readline("What date is the site audit set for?(mm/dd/yyyy)")
Number<-readline("What is the project number")
Bname<-readline("What is the buildings name?")
ADD<-readline("What is the buildings address?(123 Fake Street)")
City<-readline("What city is the building located in?")
Pcode<-readline("What is the buildings postal code?")
HOO<-readline("What are the building's hours of operation?")
PHONE<-readline("What is the building's telephone number? (555-555-5555)")
FAX<-readline("What is the Fire Department's fax number? (555-555-5555)")
CONTACT<-readline("Who is the contact person for the Building? (First name, Last name)")
}
I thank you in advance for your help. Also please note I have searched through the site and saw similar questions but was not able to make the suggestions work, so I apologize if this is redundant. Remember, I need to be able to call Name and receive the name I entered once the function has done its thing.
Use the global assignment operator:
> sv <- function(){
+ Name <<- readline("What is your Name?")
+ }
> sv()
What is your Name?mkemp6
> print(Name)
[1] "mkemp6"
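An alternative that avoids writing into the global environment is to have the function return all answers in a named list. The sketch below is an assumption-laden variant of sv: the ask argument is added here (it is not in the original function) so the prompts can be answered by a canned function when readline cannot be used interactively, and only two of the questions are included.

```r
# `ask` defaults to readline(); passing another function allows non-interactive use
sv <- function(ask = readline) {
  answers <- list(
    Name   = ask("What is your Name?"),
    Number = ask("What is the project number")
  )
  answers   # the list is the return value of the function
}

# Non-interactive demonstration with canned replies (hypothetical values)
canned <- c("What is your Name?" = "mkemp6", "What is the project number" = "42")
res <- sv(ask = function(prompt) canned[[prompt]])
res$Name
```

In an interactive session, res <- sv() would prompt as before, and each answer is then available as res$Name, res$Number, and so on.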

Writing a for loop in r

I don't know how to write for-loops in r. Here is what I want to do:
I have a df called "na" with 50 columns (ana1_1:ana50_1). I want to loop these commands over all columns. Here are the commands for the first two columns (ana1_1 and ana2_1):
t<-table(na$ana1_1)
ana1_1<-capture.output(sort(t))
cat(ana1_1,file="ana.txt",sep="\n",append=TRUE)
t<-table(na$ana2_1)
ana2_1<-capture.output(sort(t))
cat(ana2_1,file="ana.txt",sep="\n",append=TRUE)
After the loop, all tables (ana1_1:ana50_1) should be written in ana.txt Has anyone an idea, how to solve the problem? Thank you very much!
One approach would be to loop through the columns with lapply, using the same code as in the OP's post:
invisible(lapply(na, function(x) {
x1 <- capture.output(sort(table(x)))
cat(x1, file='ana.txt', sep="\n", append=TRUE)
}))
Wrapping with invisible so that it won't print 'NULL' in the R console.
We can wrap with a condition to check if the file already exists so that it won't add the same lines by accidentally running the code again.
if(!file.exists('ana.txt')){
invisible( lapply(na, function(x) {
x1 <- capture.output(sort(table(x)))
cat(x1, file='ana.txt', sep="\n", append=TRUE)
}))
}
Here is a solution with a for loop. Loops can be slow in R, so people often prefer other solutions (e.g. the great answer provided by akrun). This answer is for your understanding of the loop syntax:
for(i in 1:50){
t1<-table(na[,i])
t2<-capture.output(sort(t1))
cat(t2,file="ana.txt",sep="\n",append=TRUE)
}
We are looping through i from 1 to 50 (first line). There are two ways to select a column (actually more than two, but that's for another time): na$ana1_1 or na[,1] both select the first column (second line). In the first case you refer to the column by name, in the second by index. Here the second case is more convenient. The rest is your desired calculations.
Be aware that cat creates a new file if ana.txt does not exist yet and appends to it if it is already there.
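A variant of the loop that iterates over the column names instead of the fixed range 1:50 might look like the sketch below. The two-column data frame na_df and its values are made up for illustration, and a temporary file is used instead of ana.txt so that rerunning the example does not append to earlier output.

```r
# Small stand-in for the 50-column data frame (hypothetical values)
na_df <- data.frame(ana1_1 = c("a", "b", "a"), ana2_1 = c("x", "x", "y"))

out_file <- tempfile(fileext = ".txt")  # temporary file so reruns start clean

for (col in names(na_df)) {
  tab <- sort(table(na_df[[col]]))
  # Write the column name first, then the printed table, one element per line
  cat(col, capture.output(tab), file = out_file, sep = "\n", append = TRUE)
}

written <- readLines(out_file)
```

Looping over names(na_df) keeps working if columns are added or removed, and writing the column name before each table makes the output file easier to read.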
