Subset function in R won't work with vector selection

I have this weird problem where I have something like this in my code:
# states.vector holds c(2, 1, 6, 3)
states.vector <- unique(data$state)
I am iterating through the vector to subset data for each value in the "state" column. At some point during the iteration, the following line of code gives me an empty data frame:
#When state == 1
data.state <- subset(data,state==states.vector[state])
When state is 1, states.vector[state] is 2. But when I do the following, it works just fine:
subset(data,state==2)
What is weird is that I have used this process multiple times, and it worked fine for the exact same task, with the same format for "data" but different values inside.
What am I doing wrong?

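I think jlhoward has already explained the problem: subset() evaluates its condition inside data first, so state refers to the column data$state, not to your loop variable. states.vector[state] therefore indexes states.vector with every row's state value, and with your values the comparison matches no rows.
Why don't you loop over the state values directly, with a loop variable whose name doesn't clash with a column?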
states.vector <- unique(data$state)
for (selected_state in states.vector) {
  data.state <- subset(data, state == selected_state)
  # ...
}
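A minimal reproduction of the name clash, assuming a small made-up data frame whose state values are 2, 1, 6 and 3 as in the question (the value column and the row contents are invented for illustration):

```r
# Made-up data whose unique states are c(2, 1, 6, 3), as in the question
data <- data.frame(state = c(2, 1, 6, 3, 2), value = 1:5)
states.vector <- unique(data$state)

state <- 1  # the loop index from the question
# Inside subset(), `state` is looked up in `data` first, so it is the column,
# and states.vector[state] becomes c(1, 2, NA, 6, 1) instead of 2
broken <- subset(data, state == states.vector[state])
nrow(broken)  # 0

# A loop variable whose name matches no column avoids the clash
for (selected_state in states.vector) {
  data.state <- subset(data, state == selected_state)
  cat("state", selected_state, "->", nrow(data.state), "row(s)\n")
}
```

Another option is split(data, data$state), which returns a list with one data frame per state value.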

Related

Loop over several dataframes in R

I have several data frames that I would like to run through the same code, one after the other. In the code I have written, I use the variable "my_data" (which is basically a data frame). Thus, I thought the easiest solution would be to assign each of my other data frames to "my_data", one after the other, so that all the code that follows can be executed for each data frame in a loop without changing the code I already have.
The structure I have looks as follows:
#Datasets:
my_data
age_data
gender_data
income_data
## Code that uses "my_data" follows here ##
How can I create a loop that first assigns "age_data" to "my_data" and executes the code that uses "my_data", then, after reaching the end, restarts, assigns "gender_data" to "my_data" and does the same, until this has been done for all data frames?
Help is much appreciated!
I am attempting to answer based on the information provided:
datanames <- c("age_data", "gender_data", "income_data")
for (dname in datanames) {
  my_data <- get(dname)  # fetch the data frame whose name is stored in dname
  # here you can write the rest of the code
  rm(my_data)
}
Alternatively, you can use get() within a for loop:
for (i in c("age_data", "gender_data", "income_data")) {
  my_data <- get(i)
  # rest of the code that uses my_data
}
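If the code that uses my_data can be wrapped in a function, a loop-free alternative is mget() plus lapply(). This is a sketch assuming toy stand-in data frames and a hypothetical process_data() function that would hold your existing code; it returns one result per data frame:

```r
# Toy stand-ins for the question's data frames
age_data    <- data.frame(x = 1:3)
gender_data <- data.frame(x = 4:5)
income_data <- data.frame(x = 6:9)

# Hypothetical wrapper: the existing code that uses my_data goes in the body
process_data <- function(my_data) {
  # ... existing code that uses my_data ...
  nrow(my_data)  # placeholder result
}

# mget() looks up each name and returns a named list of the objects
datasets <- mget(c("age_data", "gender_data", "income_data"), envir = environment())

# One call of process_data() per data frame, results kept in a named list
results <- lapply(datasets, process_data)
str(results)
```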

R row selection providing partial results

I'm having an issue, which I have found a solution for, but I would like to understand what was going on in the original code.
So I started with a table pulled from an SQL database and wanted information for one client, who is covered by two client numbers.
Originally I was running this to select those account numbers.
match <- c("C524",'5568')
gtc <- gtc[gtc$AccountNumber == match,]
However, this was only returning about half of the desired results, and the rows returned varied between runs (this was running as a weekly report) and depending on the PC running it.
Now, I've set up a loop which works fine and extracts all the results, but would really like to know what was going on with the original query.
match <- c("C524", '5568')
result <- NULL  # must exist before the first rbind()
for (each in match) {
  gtcLoop <- gtc[gtc$AccountNumber == each, ]
  result <- rbind(result, gtcLoop)
}
Also, long-time lurker, first-time poster, so let me know if I've done anything wrong in this question.
You need to replace == with %in%:
gtc <- data.frame(AccountNumber = sample(c(match, "something"), 10, replace = TRUE))
gtc[gtc$AccountNumber %in% match,]
Just to tag onto Qaswed's answer (+1), you need to understand what is happening when you compute vector comparisons like ==. See:
?`==`
and
?`%in%`
then try something like 1 == c(1,2) and 1 %in% c(1,2).
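The reason you are getting about half the results is that == compares element-wise and recycles the shorter vector: the first row is compared with "C524", the second with "5568", the third with "C524" again, and so on. A row is kept only if its account number happens to line up with the matching element, which would also explain why the results varied when the rows came back in a different order. For example: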
df <- data.frame(id=c(1:5), acct_cd = letters[1:5])
df[df$acct_cd == c("a","c"),] # this is wrong, for demo only
df[df$acct_cd %in% c("a","c"),] # this is correct
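A minimal sketch of the order dependence, using made-up account vectors and a hypothetical match_ids variable (named so that it does not mask base::match): the same account numbers in a different row order give a different result with ==, but not with %in%.

```r
match_ids <- c("C524", "5568")

# The same four account numbers in two different (made-up) row orders
order1 <- c("C524", "5568", "C524", "5568")
order2 <- c("5568", "C524", "C524", "5568")

# == compares element-wise, recycling match_ids to c("C524", "5568", "C524", "5568")
order1 == match_ids  # TRUE TRUE TRUE TRUE
order2 == match_ids  # FALSE FALSE TRUE TRUE

# %in% asks, for each element, whether it appears anywhere in match_ids
order2 %in% match_ids  # TRUE TRUE TRUE TRUE
```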

Writing a loop in R

I have written a loop in R. The code is expected to go through a list of variables and perform a function on each of them.
Problem 1 - I cannot loop through the list of variables
Problem 2 - I need to insert each output into MongoDB
Here is an example of the list:
121715771201463_626656620831011
121715771201463_1149346125105084
Based on each value, I run some code, and I want its output to be inserted into MongoDB. Right now only the first value and its corresponding output are inserted.
test_list <- c("121715771201463_626656620831011", "121715771201463_1149346125105084", "121715771201463_1149346125105999")
for (i in test_list) {
  # myfunction(i) goes here
  mongo.insert(mongo, DBNS, i)
}
I am only able to process the first value, not all of the values in the list.
Any help is appreciated.
Try this example, which prints the final characters of each value:
myfunction <- function(x){ print( substr(x, 27, nchar(x)) ) }
test_list <- c("121715771201463_626656620831011",
"121715771201463_1149346125105084",
"121715771201463_1149346125105999")
for (i in test_list){ myfunction(i) }
for (j in seq_along(test_list)){ myfunction(test_list[j]) }
The final two lines should each produce
[1] "31011"
[1] "105084"
[1] "105999"
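For the second problem, the insert has to happen inside the loop body, once per value. A sketch, assuming a variant of myfunction() that returns the suffix instead of printing it, and using a plain R list named inserted as a hypothetical stand-in for the MongoDB collection (the real mongo.insert() call is not run here):

```r
# Hypothetical variant of myfunction that returns its result instead of printing it
myfunction <- function(x) substr(x, 27, nchar(x))

test_list <- c("121715771201463_626656620831011",
               "121715771201463_1149346125105084",
               "121715771201463_1149346125105999")

inserted <- list()  # stand-in for the database collection
for (i in test_list) {
  out <- myfunction(i)
  # In the real code, the mongo.insert(...) call would go here, once per value
  inserted[[length(inserted) + 1]] <- list(id = i, result = out)
}
length(inserted)  # 3
```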
It is not clear whether "variable" is the same as "value" here.
If what you mean by variable is actually an element in the list you construct, then I think Ilya's comment above may solve the issue.
If "variable" is instead an object in the workspace, and the elements in the list are the names of the objects you want to process, then you need to use get(), like this:
for(i in ls()){
cat(paste(mode(get(i)),"\n") )
}
ls() returns a character vector of object names. The loop above goes through them all and uses get() to retrieve the corresponding object. From there, you can do whatever processing you need (in the example above, I just print the mode of each object).
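A small sketch of that second case, assuming two made-up workspace objects whose names are stored as strings in a vector:

```r
# Made-up workspace objects, referred to by name
obj_a <- 1:3
obj_b <- c("x", "y")
object_names <- c("obj_a", "obj_b")

for (nm in object_names) {
  obj <- get(nm)  # look the object up by its name
  cat(nm, "has mode", mode(obj), "and length", length(obj), "\n")
}
```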
Hope this helps somehow.

Need an explanation for a particular R code snippet

I need an explanation of the following code:
for (i in id) {
data <- read.csv(files[i] )
c <- complete.cases(data)
naRm <- data[c, ]
completeCases <- rbind(completeCases, c(i, nrow(naRm)))
As I understand it, the variable c here stores a vector of logical values. The line after that seems foreign to me: how does data[c, ] work?
FYI, I am an R newbie.
complete.cases looks for all rows that are "complete", i.e. have no missing values (see ?complete.cases), and returns one TRUE/FALSE per row. data[c, ] puts that logical vector in the row position, so it keeps only the rows where c is TRUE, while the empty column position keeps all columns. Thus the completeCases object will tell you the number of "complete" rows in each file you have just read. You don't really need to store the value of i in the rbind call, though: it is just the row number, so it is redundant. A vector would do just fine for this application.
Also, it looks like you are missing a closing brace, or this isn't a complete chunk of code.
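A minimal illustration of what complete.cases() and data[c, ] do, on a made-up four-row data frame (the logical vector is called keep here to avoid masking c()):

```r
# Made-up data frame with missing values in rows 2 and 3
data <- data.frame(a = c(1, NA, 3, 4), b = c("x", "y", NA, "z"))

keep <- complete.cases(data)  # one TRUE/FALSE per row
keep                          # TRUE FALSE FALSE TRUE

# Logical vector in the row slot keeps the TRUE rows; empty column slot keeps all columns
naRm <- data[keep, ]
nrow(naRm)                    # 2
```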

Modifying Data Set within a function but data set is not changed

My code is the following in R:
replaceNA<- function(myData,limit){
numNA<- rowSums(is.na(myData))
targetRows<- which(numNA<=limit)
targetCols<- length(names(myData))
for(row in targetRows){
for(col in 1:targetCols){
myData[row,col][is.na(myData[row,col])]<-1
}
}
}
I am trying to iterate through each element in myData and replace all NAs of a row with 1 IF the row does not have more than the number of NAs. I have tested my code with print statements and found that the iteration works perfectly (although it is not the most efficient code), and if I examine the modified myData by putting a fix(myData) before the last bracket of the function, I see that my function worked perfectly (the NAs are replaced with 1s for the rows that meet the limit condition). However, when I examine myData after the function terminates, it does not show the changes replaceNA made.
I know there is a problem in storing the modified myData but I am not sure how to store it properly.
The condition is not clear (an English problem). In any case, you don't need a for loop here.
To compute the number of missing values for each row:
rowSums(is.na(myData))
Then you just test your condition and replace the NAs in all the matching rows:
mm <- myData[rowSums(is.na(myData)) <= limit ,]
mm[is.na(mm)] <- 1
myData[rowSums(is.na(myData)) <= limit ,] <- mm
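You should make your function explicitly return the modified data. Inside a function, R modifies a local copy of its arguments, so the myData in your workspace is left untouched: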
replaceNA<- function(myData,limit){
numNA<- rowSums(is.na(myData))
targetRows<- which(numNA<=limit)
targetCols<- length(names(myData))
for(row in targetRows){
for(col in 1:targetCols){
myData[row,col][is.na(myData[row,col])]<-1
}
}
return(myData)
}
Then assign the result. You could overwrite your old data:
myData <- replaceNA(myData, limit = 2)
or make a copy to compare:
myData_no_na <- replaceNA(myData, limit = 2)
You can also avoid the loop entirely, which is much more R-like. @agstudy's answer seems to cover that approach nicely.
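Combining both answers, a vectorised replaceNA() that returns its result could look like the sketch below (the example data is made up). It also shows that calling the function without reassigning leaves the caller's myData unchanged:

```r
replaceNA <- function(myData, limit) {
  rows <- rowSums(is.na(myData)) <= limit  # rows with at most `limit` NAs
  sub <- myData[rows, , drop = FALSE]
  sub[is.na(sub)] <- 1                     # fill NAs only in the selected rows
  myData[rows, ] <- sub
  myData                                   # return the modified local copy
}

# Rows 1 and 2 have one NA each; row 3 has three
myData <- data.frame(a = c(NA, 2, NA), b = c(5, NA, NA), c = c(1, 4, NA))
before <- myData
after  <- replaceNA(myData, limit = 1)

identical(before, myData)  # TRUE: the caller's object was not modified
after
```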
