I am stuck with a problem in R.
It is about removing NAs within vectors and dataframes.
I am given the library, the dataset and the vector as follows:
library(dslabs)
data(na_example)
ind <- is.na(na_example)
So, I need to compute the mean of na_example using only the entries that are not NA, as flagged by the logical vector "ind".
I have tried everything, including what I think is the answer, mean(!ind), because I HAVE to use the ! operator.
The result is 0.855. However, the grading system does not give me credit for it.
Could you please give me a hand?
You're looking for na.omit, not is.na:
library(dslabs)
data(na_example)
ind <- na.omit(na_example)
mean(ind)
Which gives you: 2.301754
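The same mean can be computed in a few equivalent ways. Below is a minimal sketch on a small made-up vector (not na_example, so it runs without dslabs). The last form uses the ! operator that the exercise asks for.

```r
# A small made-up vector with missing values
x <- c(2, NA, 4, NA, 6)
ind <- is.na(x)

m1 <- mean(na.omit(x))       # drop the NAs, then average
m2 <- mean(x, na.rm = TRUE)  # let mean() skip the NAs itself
m3 <- mean(x[!ind])          # keep only the entries where ind is FALSE

c(m1, m2, m3)                # all three equal 4
```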
So, I finally got it after many hours of struggle.
I was putting the ! in the wrong place:
ind <- is.na(na_example)
mean(!ind)
[1] 0.855
It should be:
ind <- !is.na(na_example)
mean(ind)
[1] 0.855
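Note that mean() of a logical vector is the proportion of TRUE values, so mean(!ind) and mean(!is.na(na_example)) compute the same number: the share of entries that are not NA, rather than the mean of those entries (that would be mean(na_example[!ind])). A quick check on a made-up vector:

```r
# Made-up vector: 3 of its 5 entries are not NA
x <- c(2, NA, 4, NA, 6)
ind <- is.na(x)

p1 <- mean(!ind)                 # ! applied to the stored logical vector
p2 <- mean(!is.na(x))            # ! moved inside: the same computation
p3 <- sum(!ind) / length(ind)    # the proportion written out explicitly

c(p1, p2, p3)                    # all three equal 0.6
```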
Related
I am a second-year M.Sc. student and I am running into a bit of a snag with my statistics.
I am trying to build a contingency table and run Fisher's exact test, and I keep getting an error:
Error in fisher.test(GAL4UAS) : if 'x' is not a matrix, 'y' must be given
If anyone can see what I have done wrong or may be missing, I would really appreciate it.
This is the code:
setwd("/Users/Pria/Desktop/Data Analysis/")
GAL4UAS <-- data.frame(Yes=c(20,21,19),No=c(10,9,11))
GAL4UAS <- lapply(GAL4UAS, abs)
fisher.test(GAL4UAS)
fisher.test(GAL4UAS[c(1,2)])
fisher.test(GAL4UAS[c(1,3)])
fisher.test() expects a two-dimensional contingency table in matrix form as its input. Try putting your data into a matrix; one option among several would be:
m <- matrix(c(20,21,19,10,9,11),nrow = 3,ncol=2,byrow=FALSE)
fisher.test(m)
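A minimal sketch checking that this matrix holds the same table as the question's data frame: with byrow = FALSE, matrix() fills column by column, so the first three values become the Yes column. The dimnames argument is an addition for readability only.

```r
# Counts from the question: rows are the three groups, columns are Yes/No
m <- matrix(c(20, 21, 19, 10, 9, 11), nrow = 3, ncol = 2, byrow = FALSE,
            dimnames = list(NULL, c("Yes", "No")))

# The same table built from the original data frame
df <- data.frame(Yes = c(20, 21, 19), No = c(10, 9, 11))
m_from_df <- as.matrix(df)
all(m == m_from_df)   # TRUE: same counts in the same cells

# fisher.test() accepts r x c tables, not only 2 x 2
res <- fisher.test(m)
res$p.value
```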
When you apply abs() with lapply(), the output is a list, not a data frame. apply() returns its output as a matrix, which is what fisher.test() expects. So maybe you can try this:
GAL4UAS <- data.frame(Yes=c(20,21,19),No=c(10,9,11))
GAL4UAS <- apply(GAL4UAS, MARGIN = c(1, 2), FUN = abs)
fisher.test(GAL4UAS)
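Two details in the question's code explain the error. First, `<--` is not a separate operator: R parses GAL4UAS <-- data.frame(...) as GAL4UAS <- -data.frame(...), which stores negated counts (presumably why abs() seemed necessary; with a plain <- it is not needed at all). Second, lapply() returns a plain list, which fisher.test() rejects. A sketch reproducing both on the question's counts:

```r
# `<--` is parsed as `<- -`, so this stores the NEGATED data frame
GAL4UAS <-- data.frame(Yes = c(20, 21, 19), No = c(10, 9, 11))
all(GAL4UAS < 0)               # TRUE: every count became negative

# lapply() always returns a plain list, even when given a data frame
as_list <- lapply(GAL4UAS, abs)
class(as_list)                 # "list"

# fisher.test() rejects the list with the error from the question
err <- tryCatch(fisher.test(as_list), error = conditionMessage)
err

# apply() over both margins returns a matrix, which fisher.test() accepts
as_mat <- apply(GAL4UAS, MARGIN = c(1, 2), FUN = abs)
is.matrix(as_mat)              # TRUE
```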
I know this has been asked before, but from what I've searched I can't find an answer to my problem. I should also add that I'm relatively new to R (and to coding in general), so when it comes to fixing problems in code I'm not sure what to look for.
My code is:
education_ge <- data.frame(matrix(ncol=2, nrow=1))
colnames(education_ge) <- c("Education","Genetic.Engineering")
for (i in 1:nrow(survey))
if (survey[i,12]=="Bachelors")
education_ge$Education <- survey[i,12]
To give more info, 'survey' is a data frame with 12 columns and 26 rows, and the 12th column, 'Education', is a factor which has levels such as 'Bachelors', 'Masters', 'Doctorate' etc.
This is the error as it appears in R:
for (i in 1:nrow(survey))
if (survey[i,12]=="Bachelors")
education_ge$Education <- survey[i,12]
Error in if (survey[i, 12] == "Bachelors") education_ge$Education <- survey[i, :
missing value where TRUE/FALSE needed
Any help would be greatly appreciated!
If you just want to ignore any records with missing values and get on with your analysis, try inserting this at the beginning:
survey <- survey[ complete.cases(survey), ]
complete.cases() returns TRUE for every row that has no NAs in any column, and survey[complete.cases(survey), ] keeps only those rows.
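The error itself comes from comparing an NA: survey[i, 12] == "Bachelors" is NA for a row with a missing Education value, and if() cannot branch on NA. A minimal sketch with a made-up stand-in for survey (two columns instead of twelve):

```r
# Made-up stand-in for the question's survey data, with one missing value
survey <- data.frame(
  Age = c(25, 31, 40, 28),
  Education = factor(c("Bachelors", NA, "Masters", "Bachelors"))
)

# Comparing NA to anything gives NA ...
survey$Education == "Bachelors"    # TRUE NA FALSE TRUE

# ... and if() needs a definite TRUE or FALSE
err <- tryCatch(if (NA == "Bachelors") "yes", error = conditionMessage)
err                                # "missing value where TRUE/FALSE needed"

# complete.cases() gives one TRUE/FALSE per row: TRUE if the row has no NA
complete.cases(survey)             # TRUE FALSE TRUE TRUE
survey_complete <- survey[complete.cases(survey), ]
nrow(survey_complete)              # 3
```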
For more information on subsetting, try reading this chapter: http://adv-r.had.co.nz/Subsetting.html
The command:
sapply(survey,function (x) sum(is.na(x)))
will show you how many NAs you have in each column. That might help your data cleaning.
You can try this:
sub<-subset(survey,survey$Education=="Bachelors")
education_ge$Education<-sub$Education
Let me know if this helps.
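One reason subset() avoids the error is that it treats an NA condition as FALSE and drops that row. A sketch on the same kind of made-up data, showing subset(), which() with [ ], and %in% (which returns FALSE instead of NA for a missing value):

```r
# Made-up stand-in for survey, with one missing Education value
survey <- data.frame(
  Age = c(25, 31, 40, 28),
  Education = factor(c("Bachelors", NA, "Masters", "Bachelors"))
)

bachelors <- subset(survey, Education == "Bachelors")
nrow(bachelors)                        # 2: the NA row is dropped

# which() also discards NA positions, so plain [ ] indexing works too
bachelors2 <- survey[which(survey$Education == "Bachelors"), ]

# %in% never returns NA, so it is safe inside if() in a loop
survey$Education %in% "Bachelors"      # TRUE FALSE FALSE TRUE
```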
I have the following block of code. I am a complete beginner in R (a few days in), so I am not sure how much of the code I need to share for my problem. So here is all of what I have written.
mdata <- read.csv("outcome-of-care-measures.csv",colClasses = "character")
allstate <- unique(mdata$State)
allstate <- allstate[order(allstate)]
spldata <- split(mdata,mdata$State)
if (num=="best") num <- 1
ranklist <- data.frame("hospital" = character(),"state" = character())
for (i in seq_len(length(allstate))) {
if (outcome=="heart attack"){
pdata <- spldata[[i]]
pdata[,11] <- as.numeric(pdata[,11])
bestof <- pdata[!is.na(as.numeric(pdata[,11])),][]
inorder <- order(bestof[,11],bestof[,2])
if (num=="worst") num <- nrow(bestof)
hospital <- bestof[inorder[num],2]
state <- allstate[i]
ranklist <- rbind(ranklist,c(hospital,state))
}
}
allstate is a character vector of states.
outcome can take values such as "heart attack".
num will be numeric, "best" or "worst".
I want to create a data frame, ranklist, containing the names of the hospitals and states that meet a certain criterion.
However I keep getting the error
invalid factor level, NA generated
I know it has something to do with rbind, but I cannot figure out what it is. I have tried googling this, and I also tried troubleshooting with similar questions on this site. I have checked that none of the vectors I am trying to bind are factors. I also tried forcing the coercion by wrapping hospital and state in as.character() during assignment, but it didn't work.
I would be grateful for any help.
Thanks in advance!
Since this is apparently from a Coursera assignment, I am not going to give you a solution, but I am going to hint at it: have a look at the help pages for read.csv and data.frame. Both have the argument stringsAsFactors. What is the default, TRUE or FALSE? Do you want to keep the default setting? Is colClasses = "character" in line 1 necessary? Use the str function to check the classes of the columns in mdata and ranklist. read.csv additionally has an na.strings argument. If you use it correctly, the "NAs introduced by coercion" warning will also disappear and line 16 won't be necessary.
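To see where the "invalid factor level, NA generated" message comes from: a factor only accepts values that are already among its levels, whereas a character vector accepts anything. A minimal sketch with made-up hospital names (for context, R 4.0.0 changed the stringsAsFactors default of data.frame() and read.csv() to FALSE, so current R is less likely to produce factor columns here unless asked to):

```r
# A factor only accepts values that are among its levels
hosp_factor <- factor(c("General", "Mercy"))   # made-up names
warn <- NULL
withCallingHandlers(
  hosp_factor[3] <- "St. Luke",                # not an existing level
  warning = function(w) {
    warn <<- conditionMessage(w)               # record the warning text
    invokeRestart("muffleWarning")
  }
)
warn                 # "invalid factor level, NA generated"
hosp_factor[3]       # NA: the new value was not stored

# A character vector has no such restriction
hosp_chr <- c("General", "Mercy")
hosp_chr[3] <- "St. Luke"
hosp_chr
```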
Finally, don't grow a matrix or data frame inside a loop if you know the final size beforehand. Initialize it with the correct dimensions (here 52 x 2) and assign e.g. the i-th hospital to the i-th row and first column of the data frame. That way rbind is not necessary.
By the way, you did not get an error but a warning. R didn't interrupt the loop; it just let you know that some values were coerced to NA. You can also simplify the seq_len(length(allstate)) statement by using seq_along(allstate) instead.
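The preallocation advice can be sketched as follows. The state codes and the best_hospital vector are hypothetical placeholders, not the assignment's data; the point is allocating the full data frame once and filling row i in place, looping with seq_along().

```r
# Hypothetical inputs standing in for the assignment's data
allstate <- c("AK", "AL", "AR")
best_hospital <- c("Hospital A", "Hospital B", "Hospital C")

# Allocate the full result once, with character (not factor) columns
ranklist <- data.frame(hospital = character(length(allstate)),
                       state = character(length(allstate)),
                       stringsAsFactors = FALSE)

# seq_along(allstate) gives the same indices as seq_len(length(allstate))
for (i in seq_along(allstate)) {
  ranklist$hospital[i] <- best_hospital[i]   # fill row i in place, no rbind
  ranklist$state[i] <- allstate[i]
}
ranklist
```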
I am trying to make an $n\times 4$ matrix in which each row holds four consecutive elements of a given vector. Since I am new to R, I don't know how to use loop functions properly.
My code is like
x<-runif(150,-2,2)
x1<-c(0,0,0,0,x)
for (i in 0:150)
{ai<-x1[1+i,4+i]
}
However, I got: Error in x1[1 + i, 4 + i] : incorrect number of dimensions.
I also want to combine these ai into a matrix, with each ai as the (i+1)-th row. I guess I should use the cbind function?
Any help will be appreciated. Thanks in advance.
You can do this directly with the matrix command:
x <- 1:36
xmat <- matrix(x, nrow = 9, byrow = TRUE)
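With byrow = TRUE, matrix() fills one row at a time, so row k holds the non-overlapping block x[(4k-3):(4k)]. This suits a vector whose length is a multiple of 4; for the question's 150 values, matrix() would have to recycle values to fill the last row, and R warns about that. A quick check on the example above:

```r
x <- 1:36
# byrow = TRUE fills the matrix row by row, four values per row
xmat <- matrix(x, nrow = 9, byrow = TRUE)
dim(xmat)      # 9 rows, 4 columns
xmat[2, ]      # 5 6 7 8
xmat[9, ]      # 33 34 35 36
```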
Maybe this helps:
n <- length(x1)-1
res <- sapply((4:n)-3, function(i) x1[(i+3):i])
dim(res)
#[1] 4 150
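The sapply() version returns each window as a column (4 x 150) with its elements in reverse order, and it covers one window fewer than the question's 0:150 loop. If overlapping windows laid out as rows are what you want, here is a sketch with vapply() plus t(), cross-checked against base R's embed(), which returns the same windows with columns reversed. set.seed() is added only to make the random data reproducible.

```r
set.seed(1)                      # reproducible random data
x <- runif(150, -2, 2)
x1 <- c(0, 0, 0, 0, x)

n_windows <- length(x1) - 3      # 151 windows, like the loop over 0:150
# Row i holds x1[i], x1[i + 1], x1[i + 2], x1[i + 3]
windows <- t(vapply(seq_len(n_windows), function(i) x1[i:(i + 3)], numeric(4)))
dim(windows)                     # 151 rows, 4 columns

# embed() builds rows x1[t], x1[t - 1], ..., so reverse its columns to compare
emb <- embed(x1, 4)
all.equal(windows, emb[, 4:1])   # TRUE
```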
I am having big trouble converting a set of 53 factor variables to numeric. Here are a couple of the approaches I tried, but none of them work:
sapply(dataset, function(x) transform(as.character(x)))
and then
sapply(dataset, function(x) transform(as.numeric(x)))
I also tried it with lapply, but same thing...
as.numeric(levels(factor))
doesn't work either, and finally I tried to do it one by one:
transform(dataset, s1 = as.numeric(s1), s2= as.numeric(s2)...etc)
Could somebody please help me? I also have some missing values, coded as NA and M, within the variables, so I don't know how to adjust for that.
Thanks !
Although you didn't provide a reproducible example, this might work:
df[,c(2:54)] <- as.numeric(as.character(unlist(df[,c(2:54)])))
where c(2:54) stands for the columns you want to change to numeric
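The as.character() step matters because as.numeric() on a factor returns its internal level codes, not the labels. Below is a column-by-column sketch with lapply() on a small made-up data frame that also recodes the "M" marker as NA before converting. The helper to_num and the column names are hypothetical. If the data come from a file, read.csv(..., na.strings = c("NA", "M")) can treat "M" as missing at import time instead.

```r
# Made-up factor columns: numbers stored as labels, "M" marks a missing value
df <- data.frame(id = 1:4,
                 s1 = factor(c("10", "2", "M", "7")),
                 s2 = factor(c("3.5", NA, "1", "M")))

# as.numeric() on a factor gives the level codes, not 10 2 NA 7
as.numeric(df$s1)

# Hypothetical helper: go through the labels and treat "M" as missing
to_num <- function(f) {
  chr <- as.character(f)
  chr[chr %in% "M"] <- NA        # recode the "M" marker as NA
  as.numeric(chr)
}

cols <- 2:3                      # indices of the columns to convert
df[cols] <- lapply(df[cols], to_num)
str(df)
```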