Getting a for loop to ignore missing values in R

I've got this for loop:
for (i in 1:length(class.data$ID)) {
  class.data$FinalExam_GroupMCScore[i] = mc.data$PSYC.260.Exam....2017.3.[which(mc.data$SIS.User.ID == class.data$FinalExam_MCGroupNumber[i])]
}
I'm using it to merge two class grade files. Students did part of their final exam in groups. The problem is that not everyone opted to do the group portion, so those students are missing a code for class.data$FinalExam_MCGroupNumber. The for loop gets hung up on these missing values and I can't get past them. I suspect I need an if statement embedded in there, but I'm not familiar enough with R yet to write one.
I've looked at some of the other posts on this, but they don't help because I'm having a tough time seeing how to embed an if or ifelse with a more complicated expression following it. Any help would be appreciated! I just want it to assign an NA on FinalExam_GroupMCScore to all students with NA on FinalExam_MCGroupNumber and carry on as normal!
Thank you!!
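A minimal sketch of the if-based loop the question asks for, using small made-up stand-ins for class.data and mc.data (the values are hypothetical; the column names are the ones from the question). The original loop stops because which() returns a zero-length result when the group number is NA, and a zero-length value cannot be assigned to a single element. Skipping those rows leaves the NA that the column was initialised with. A vectorised alternative with match() is shown at the end.

```r
# Hypothetical toy data standing in for the two grade files
class.data <- data.frame(
  ID = 1:4,
  FinalExam_MCGroupNumber = c(101, NA, 102, NA)
)
mc.data <- data.frame(
  SIS.User.ID = c(101, 102),
  PSYC.260.Exam....2017.3. = c(18, 15)
)

# Start with NA everywhere so skipped students keep NA
class.data$FinalExam_GroupMCScore <- NA_real_

for (i in seq_len(nrow(class.data))) {
  group <- class.data$FinalExam_MCGroupNumber[i]
  if (is.na(group)) next                      # no group code: leave the NA in place
  hit <- which(mc.data$SIS.User.ID == group)  # row(s) of mc.data with this code
  if (length(hit) == 1) {
    class.data$FinalExam_GroupMCScore[i] <- mc.data$PSYC.260.Exam....2017.3.[hit]
  }
}

# Loop-free alternative: match() returns NA for codes it cannot find,
# and indexing with NA yields NA
idx <- match(class.data$FinalExam_MCGroupNumber, mc.data$SIS.User.ID)
scores_vec <- mc.data$PSYC.260.Exam....2017.3.[idx]
```

Both approaches give the same scores here; the match() version also avoids growing the logic inside the loop if more columns need to be carried over.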

Related

Pulling bond security names from ISIN in R

I'm trying to convert individual ISINs into their respective bond names in R. I've been able to achieve it in Excel, but weirdly passing the 'bdp' function doesn't seem to work in the desired way in R.
To give an example, I currently have an ISIN for a government bond: GB00BK5CVX03, I would like to dynamically convert said ISIN into the name for this bond (UKT 0.625 06/07/2025 GOVT).
In Excel, I do:
=BDP("GB00BK5CVX03 ISIN", "ID_BB_SEC_NUM_DES")
And it delivers a useable result: UKT 0.625 06/07/25
In R, I try pretty much the same thing:
bdp("GB00BK5CVX03 ISIN", "ID_BB_SEC_NUM_DES")
And it delivers:
I was expecting a similar result to the excel output (namely a string that I could then attach to an object).
Does anyone know where I'm going wrong? Any help is much appreciated.
I managed to solve it: it turns out the API will not respond when "ISIN" is placed after the identifier, even though that works fine in Excel.
Changing the code to read:
bdp("GB00BK5CVX03 GOVT", "ID_BB_SEC_NUM_DES")
solved the issue.

Mean value for different groups

I am stuck with a 'for' loop and would greatly appreciate some help.
I have a dataframe called 'df' that includes the number of people per household (household_size), ranging from 0 (I replaced the missing values with a 0) to 8, as well as the number of cars (car).
My aim is to write a short piece of code that computes the average number of cars for each household size.
I tried the following:
avg <- function(df){
  i <- df$household_size
  for (i in 0 : 8){
    print(mean(df$car))
  }
}
I'm pretty sure I'm missing something really basic here, but I don't know what.
Thanks everyone for your input.
I wouldn't have used a function for this. However, this is an exercise as part of an introductory coding with R module that specifically requires a for-loop.
Here is a solution that prints the mean for each size group using a for loop. Let me know if it works.
for (i in unique(df$household_size)) {
  # select the car column by name for the rows in this size group
  print(paste(i, ' : ', mean(df[df$household_size %in% i, "car"])))
}
As mentioned in a comment, I took away the function part because I don't see the point of having it. But if it's mandatory, you can use lapply, which in my view behaves a bit like a for loop:
lapply(unique(df$household_size), function(i) {
  return(paste(i, ' : ', mean(df[df$household_size %in% i, "car"])))
})
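If the exercise requires both a function and a for loop, the two can be combined. The following is a sketch under assumptions: df here is a small made-up data frame with the two columns named in the question, and the function returns a named vector instead of printing. The tapply() line at the end is the loop-free equivalent, included for comparison.

```r
# Hypothetical example data with the two columns from the question
df <- data.frame(
  household_size = c(1, 1, 2, 2, 2, 0),
  car            = c(0, 1, 1, 2, 3, 1)
)

avg <- function(df) {
  sizes <- sort(unique(df$household_size))
  out <- numeric(length(sizes))
  names(out) <- sizes
  for (k in seq_along(sizes)) {
    # mean of car restricted to the rows of the current household size
    out[k] <- mean(df$car[df$household_size == sizes[k]], na.rm = TRUE)
  }
  out
}

loop_means <- avg(df)

# Same result without an explicit loop
tapply_means <- tapply(df$car, df$household_size, mean, na.rm = TRUE)
```

The key difference from the attempt in the question is the subsetting inside mean(): without it, every iteration averages the whole car column.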

Not aggregating correctly

The goal of this code is to loop over each company's word-frequency column, aggregate it by a principle vector I created, and add the result to a list. The problem is that after I run it, each list element contains only my 7 principles rather than the principles with the summed word frequencies alongside them. The word frequencies are the individual columns of the FREQBYPRINC.AG data frame. If I run the aggregate call on a single column outside the loop, it works with no problem, but inside the loop I don't get the correct data frames in the list. Any suggestions?
list.agg <- vector("list", ncol(FREQBYPRINC.AG) - 2)
for (i in 1:14) {
  attach(FREQBYPRINC.AG)
  list.agg[i] <- aggregate(FREQBYPRINC.AG[, i+1], by = list(Type = principle), FUN = sum, na.rm = TRUE)
}
The aggregate call itself is fine; the problem is how the result is stored. Since list.agg is a list, you need to assign into it with double square brackets. With single brackets, list.agg[i] is a one-element slot, so only the first column of the aggregated data frame (the principles) is kept and the sums are dropped. Try this:
list.agg <- vector("list", ncol(FREQBYPRINC.AG) - 2)
for (i in 1:14) {
  list.agg[[i]] <- aggregate(FREQBYPRINC.AG[, i+1], by = list(Type = principle), FUN = sum, na.rm = TRUE)
}
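The difference between the two kinds of brackets can be checked on a small example. This is a sketch with a made-up data frame (freq, co1, co2 are hypothetical names, and principle is assumed to be a column of the data frame); it loops over column names instead of a hard-coded 1:14 so the list always matches the number of frequency columns.

```r
# Hypothetical stand-in for FREQBYPRINC.AG: one grouping column, two frequency columns
freq <- data.frame(
  principle = c("a", "b", "a", "b"),
  co1 = c(1, 2, 3, 4),
  co2 = c(5, 6, 7, 8)
)

value_cols <- setdiff(names(freq), "principle")
list.agg <- vector("list", length(value_cols))
names(list.agg) <- value_cols

for (col in value_cols) {
  # [[ ]] stores the whole aggregated data frame as one list element
  list.agg[[col]] <- aggregate(freq[[col]],
                               by = list(Type = freq$principle),
                               FUN = sum, na.rm = TRUE)
}
```

Each element of list.agg is then a two-column data frame: Type with the group labels and x with the summed frequencies.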

How do I reference previous/next rows when iterating through a data frame in R?

I have a dataset that looks like this (I'm simplifying slightly here):
Column 1 has a user id
Column 2 has a url title
Column 3 has an actual url
The data is already ordered by user and time. So it's User 1 and all the URLs they visited in ascending order of time, then User 2 and the URLs they visited in ascending order of time, and so on.
What I'm trying to do is loop through the dataset and look for "triplets" where the first row's url doesn't contain my keyword (something like google or facebook or nytimes or whatever), the second row's url does contain my keyword, and the third row's url doesn't contain my keyword. Basically, I'm checking which websites users visited before and after any specific website.
I've figured out I can look for the keyword using:
if(length(grep("facebook",url)) > 0)
But I haven't been able to figure out how to loop through the code and achieve what I'm trying to do.
If you could break your response into two parts, I would really appreciate it:
Part 1: Is there any way to loop through a dataframe and have access to all the columns? I was able to work on a single column with this code:
new_data <- data.frame (url)
for (url in data$url)
  if(length(grep("keyword",url)) > 0) {
    new_data <- rbind(new_data,data.frame(url = url))
  }
This approach is limited, though, because I can only reference a single column in my dataframe. What's the better solution here? I tried:
for (row in data) and then referencing columns by row[column_number] and row['column_name'], to no avail.
I also tried for (i in 1:nrow(data)) and then referencing columns using data[i,column_number], and that didn't work either. (That should have worked, right?) I figured if this method worked I could use i-1 and i+1 to access other rows! I know this isn't the traditional way of doing things in R, but if you could still offer an explanation of how to do it this way I would really appreciate it.
Part 2: How do I accomplish my actual goal, as stated earlier? I'd like to learn to do it the "R way"; I imagine it's going to involve plyr or lapply, but I haven't managed to figure out how to use those functions even after extensive reading, let alone use them and include references to previous/next rows.
Thanks in advance for your help, any guidance is appreciated!
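Loop over row numbers and use [i - 1] and [i + 1] to reach the neighbouring rows:
penu <- nrow(df) - 1   # last row that still has a following row
# TRUE where the url contains the keyword
df$ContainsKeyword <- grepl("keyword", df$url)
df$TripletFound <- NA
for (i in 2:penu) {
  # a triplet: the middle row contains the keyword, the rows before and after do not
  df$TripletFound[i] <- !df$ContainsKeyword[i-1] & df$ContainsKeyword[i] & !df$ContainsKeyword[i+1]
}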
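For the "R way" asked about in Part 2, the same check can be written without a loop by shifting the keyword flags one row up and one row down. This is a sketch on made-up data; the column names user and url are assumptions, as is the extra condition that all three rows belong to the same user (the question does not say whether triplets may span two users).

```r
# Hypothetical browsing log, already ordered by user and time
df <- data.frame(
  user = c(1, 1, 1, 1, 2, 2, 2),
  url  = c("a.com", "keyword.com", "b.com", "c.com",
           "keyword.com", "d.com", "e.com"),
  stringsAsFactors = FALSE
)

n <- nrow(df)
has_kw <- grepl("keyword", df$url, fixed = TRUE)

# Shifted copies: value of the previous / next row, NA at the edges
prev_kw   <- c(NA, has_kw[-n])
next_kw   <- c(has_kw[-1], NA)
prev_user <- c(NA, df$user[-n])
next_user <- c(df$user[-1], NA)

# Middle rows of triplets: keyword here, none before or after, same user throughout
middle <- which(has_kw & !prev_kw & !next_kw &
                prev_user == df$user & next_user == df$user)

# Sites visited immediately before and after the keyword site
before_after <- data.frame(before = df$url[middle - 1],
                           after  = df$url[middle + 1])
```

Row 5 contains the keyword but is the first visit of user 2, so it is not counted; which() also drops the NA results produced at the first and last rows.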

Using ifelse statement to condense variables

New to R, taking a very accelerated class with very minimal instruction. So I apologize in advance if this is a rookie question.
The assignment I have is to take a specific column that has 21 levels from a dataframe, and condense them into 4 levels, using an if, or ifelse statement. I've tried what feels like hundreds of combinations, but this is the code that seemed most promising:
> b2$LANDFORM=ifelse(b2$LANDFORM=="af","af_type",
ifelse(b2$LANDFORM=="aflb","af_type",
ifelse(b2$LANDFORM=="afub","af_type",
ifelse(b2$LANDFORD=="afwb","af_type",
ifelse(b2$LANDFORM=="afws","af_type",
ifelse(b2$LANDFORM=="bfr","bf_type",
ifelse(b2$LANDFORM=="bfrlb","bf_type",
ifelse(b2$LANDFORM=="bfrwb","bf_type",
ifelse(b2$LANDFORM=="bfrwbws","bf_type",
ifelse(b2$LANDFORM=="bfrws","bf_type",
ifelse(b2$LANDFORM=="lb","lb_type",
ifelse(bs$LANDFORM=="lbaf","lb_type",
ifelse(b2$LANDFORM=="lbub","lb_type",
ifelse(b2$LANDFORM=="lbwb","lb_type","ws_type"))))))))))))))
LANDFORM is a factor, but I tried changing it to a character too, and the code still didn't work.
"ws_type" is the catch-all for the remaining levels.
The code runs without errors, but when I check it, all I get is:
> unique(b2$LANDFORM)
[1] NA "af_type"
Am I even on the right path? Any suggestions? Should I bite the bullet and make a new column with substr()? Thanks in advance.
If your new levels are just the first two letters of the old ones followed by _type you can easily achieve what you want through:
#prototype of your column
mycol<-factor(sample(c("aflb","afub","afwb","afws","bfrlb","bfrwb","bfrws","lb","lbwb","lbws","wslb","wsub"), replace=TRUE, size=100))
as.factor(paste(sep="",substr(mycol,1,2),"_type"))
After a great deal of experimenting, I consulted a co-worker, and he was able to simplify this a great deal. Basically, I should have made a new column holding the first two letters of each value in LANDFORM, and then tested that new column to replace the values in LANDFORM, which makes the ifelse() statement much shorter. The code is:
b2$index = as.factor(substring(b2$LANDFORM, 1, 2))
b2$LANDFORM = ifelse(b2$index == "af", "af_type",
              ifelse(b2$index == "bf", "bf_type",
              ifelse(b2$index == "lb", "lb_type",
              ifelse(b2$index == "wb", "wb_type",
              ifelse(b2$index == "ws", "ws_type", "ub_type")))))
b2$LANDFORM = as.factor(b2$LANDFORM)
Thanks to everyone who gave me some guidance!
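A likely reason the original attempt returned only NA and "af_type" is the misspelled column in its fourth test: b2$LANDFORD does not exist, so it evaluates to NULL, the comparison has length zero, and every value that fails the first three tests ends up as NA. When the grouping cannot be read off the first two letters, a shorter way to keep the original four-level scheme is to test membership with %in%. The following is a sketch on made-up data; the level lists are copied from the question and LANDFORM4 is a hypothetical new column name.

```r
# Hypothetical sample of the LANDFORM column
b2 <- data.frame(
  LANDFORM = factor(c("af", "aflb", "afwb", "bfr", "bfrws",
                      "lb", "lbaf", "ws", "ub"))
)

typo_col <- b2$LANDFORD   # NULL: there is no column with this name

# Old levels grouped by their target level, as listed in the question
af_levels <- c("af", "aflb", "afub", "afwb", "afws")
bf_levels <- c("bfr", "bfrlb", "bfrwb", "bfrwbws", "bfrws")
lb_levels <- c("lb", "lbaf", "lbub", "lbwb")

old <- as.character(b2$LANDFORM)
new <- ifelse(old %in% af_levels, "af_type",
       ifelse(old %in% bf_levels, "bf_type",
       ifelse(old %in% lb_levels, "lb_type", "ws_type")))  # catch-all

# Written to a new column so the original values stay available for checking
b2$LANDFORM4 <- factor(new)
```

Three tests replace fourteen, and a typo in a level name can no longer silently turn the rest of the column into NA.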
