creating new variable with ifelse with NA's - r

I have a data frame that consists of the answers to the question:
"What language do you speak at home?"
1=English
2=Spanish
and so on...
first.language second.language third.language fourth.language
1 1 NA NA NA
2 1 2 NA NA
3 1 2 NA NA
4 2 NA NA NA
5 1 2 NA NA
6 1 5 NA NA
What I want to do is create the variable: "english.home"
"english.home" will be:
1=if English is spoken at home, never mind if it is the first, second... language
2(else)=if English is not spoken at home.
I tried using:
student1$english.home = ifelse(student1$first.language==1 |
student1$second.language==1 | student1$third.language==1 |
student1$fourth.language==1,1,2)
But I got:
> english.home
1 1
2 1
3 1
4 NA
5 1
6 1
Is there any way of accomplishing this without getting an NA on row number four? It really doesn't matter that it is an NA; what matters is that it is not English!
I know that the ifelse/NA topic has been much debated. I searched a lot for a solution before posting but could not find one.
I hope someone can help me out of this mess.

Something like this should do what you want.
# Read your data
tab <- read.table(text ="
first.language second.language third.language fourth.language
1 1 NA NA NA
2 1 2 NA NA
3 1 2 NA NA
4 2 NA NA NA
5 1 2 NA NA
6 1 5 NA NA")
tab$english.home <-
apply(tab, 1, function (x) 2 - any(x == 1, na.rm = TRUE))
print(tab)
# first.language second.language third.language fourth.language english.home
#1 1 NA NA NA 1
#2 1 2 NA NA 1
#3 1 2 NA NA 1
#4 2 NA NA NA 2
#5 1 2 NA NA 1
#6 1 5 NA NA 1
We use the fact that logical values get promoted to numeric 0 and 1 when they are added to (or subtracted from) a numeric.
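To see why the original ifelse() attempt returned NA on row 4 while any(..., na.rm = TRUE) does not, here is a minimal sketch using only that row. The names row4, chained and collapsed are made up for illustration.

```r
# Row 4 of the example: Spanish first, nothing else recorded
row4 <- c(2, NA, NA, NA)

# Chaining with | : FALSE | NA is NA, because the NA could still be TRUE
chained <- row4[1] == 1 | row4[2] == 1 | row4[3] == 1 | row4[4] == 1
print(chained)       # NA

# any() with na.rm = TRUE drops the NAs before deciding
collapsed <- any(row4 == 1, na.rm = TRUE)
print(collapsed)     # FALSE

# Logical values are coerced to 0/1 in arithmetic, so 2 - FALSE is 2
english.home <- 2 - collapsed
print(english.home)  # 2
```

Rows that do contain a 1 never hit this problem in the chained version, because TRUE | NA is TRUE.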

Maybe this helps:
(!rowSums(student1==1 & !is.na(student1))) +1
#1 2 3 4 5 6
#1 1 1 2 1 1
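The same idea can be spelled out in two steps, which may be easier to read. This is only a sketch: the data frame is rebuilt by hand from the question, and n.english is a made-up helper name.

```r
# Same six rows as in the question
student1 <- data.frame(
  first.language  = c(1, 1, 1, 2, 1, 1),
  second.language = c(NA, 2, 2, NA, 2, 5),
  third.language  = NA_real_,
  fourth.language = NA_real_
)

# Count, per row, how many of the four columns equal 1, skipping NAs
n.english <- unname(rowSums(student1 == 1, na.rm = TRUE))

# 1 if English appears anywhere in the row, 2 otherwise
student1$english.home <- ifelse(n.english > 0, 1, 2)
print(student1$english.home)  # values: 1 1 1 2 1 1
```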

Related

Assigning NAs into a dataframe in R

I have been trying to assign NAs using a for loop, but it is not working, and I know there are probably easier ways to do this.
I want to create an extra column (like the column named Desire_output in the example) in which I assign NA to any row whose Value is higher than 1. I also want to assign NA to the two rows that follow it. If there are NAs in the Value column, they should also be NA in the desired output column.
Here is the example:
Event<- c(1,2,2,2,2,2,2,3,3,4,4,4,4,4,5,6,6,6,7)
Value<- c(5,3,0,0,0,2,0,1,10,0,0,NA,NA,NA,1,0,8,0,0)
Desire_output<- c(NA,NA,NA,NA,0,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,0,NA,NA,NA)
A<- data.frame(Event,Value,Desire_output)
Event Value Desire_output
1 1 5 NA
2 2 3 NA
3 2 0 NA
4 2 0 NA
5 2 0 0
6 2 2 NA
7 2 0 NA
8 3 1 NA
9 3 10 NA
10 4 0 NA
11 4 0 NA
12 4 NA NA
13 4 NA NA
14 4 NA NA
15 5 1 1
16 6 0 0
17 6 8 NA
18 6 0 NA
19 7 0 NA
This is what I tried to do, but when I get to the NAs in the Value column I start to have trouble.
for (f in 1:(nrow(A)-1)){
if(A$Value2[f] > 1){
A$Value2[f]<- NA
A$Value2[f+1]<- NA
A$Value[f+2]<- NA
}else{
}
}
Please let me know if you have an easier way to do it with any other method.
We can first copy the Value column to a Desired_output column, then find the indices (inds) where Value is greater than 1 and assign NA to each of those rows and to the two rows after it.
A$Desired_output <- A$Value
inds <- which(A$Value > 1)
A$Desired_output[unique(c(inds, inds + 1, inds + 2))] <- NA
A
# Event Value Desired_output
#1 1 5 NA
#2 2 3 NA
#3 2 0 NA
#4 2 0 NA
#5 2 0 0
#6 2 2 NA
#7 2 0 NA
#8 3 1 NA
#9 3 10 NA
#10 4 0 NA
#11 4 0 NA
#12 4 NA NA
#13 4 NA NA
#14 4 NA NA
#15 5 1 1
#16 6 0 0
#17 6 8 NA
#18 6 0 NA
#19 7 0 NA
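In this example the last hit is row 17, so inds + 2 stops exactly at the last row. If a value above 1 sat in one of the last two rows, the offsets would point past the end of the column. A minimal sketch that guards against that, assuming such offsets should simply be ignored (the name blank is made up here):

```r
Value <- c(5, 3, 0, 0, 0, 2, 0, 1, 10, 0, 0, NA, NA, NA, 1, 0, 8, 0, 0)
A <- data.frame(Value)

# Rows whose Value exceeds 1 (which() drops the NA comparisons)
inds <- which(A$Value > 1)

# Each hit blanks itself and the next two rows
blank <- unique(c(inds, inds + 1, inds + 2))

# Assumption: offsets past the last row are ignored
blank <- blank[blank <= nrow(A)]

A$Desired_output <- A$Value
A$Desired_output[blank] <- NA
print(A$Desired_output)
```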
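You might use ifelse. In the code below I have used an OR condition inside ifelse. Note that this only sets NA on rows where Value is greater than 1 or missing; it does not blank the two rows that follow.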
A$Desire_output<- ifelse(A$Value>1 | is.na(A$Value), NA, A$Value)
I hope this will help.
I think this gives what you're after, but other solutions may be less laborious.
Event<- c(1,2,2,2,2,2,2,3,3,4,4,4,4,4,5,6,6,6,7)
Value<- c(5,3,0,0,0,2,0,1,10,0,0,NA,NA,NA,1,0,8,0,0)
A <- data.frame(Event, Value)
# Start from a copy of Value, so NAs already in Value carry over
A$Desired_Output <- A$Value
n <- nrow(A)
for (i in seq_len(n)) {
  # Test the original Value column, so rows that were already blanked still trigger
  if (!is.na(A$Value[i]) && A$Value[i] > 1) {
    # Blank this row and the next two, without running past the last row
    A$Desired_Output[i:min(i + 2, n)] <- NA
  }
}

Rearranging columns with NAs [duplicate]

Sorry guys,
this is probably a silly question, but I have not managed to find a quick solution to this issue.
I have a dataframe of this form indicating the number of components of households and gender of each member
Familyid Gender_1 Gender_2 Gender_3 Gender_4 Ncomponent
1 1 NA NA NA 1
2 NA 1 NA NA 1
3 1 2 NA NA 2
4 1 NA 2 NA 2
5 NA 1 2 NA 2
6 2 NA NA 1 2
I would like to collect this info just in two columns in the following way.
Familyid Gender_member1 Gender_member2 Ncomponent
1 1 NA 1
2 1 NA 1
3 1 2 2
4 1 2 2
5 1 2 2
6 2 1 2
In other words, I want to create a column indicating the gender of member 1, regardless of which column he/she is in in my original dataframe, and another one indicating the gender of the second family member, whenever there is one.
Can anyone help me out with this?
Marco
I just removed the NAs from the Gender_x columns.
xy <- read.table(text = "Familyid Gender_1 Gender_2 Gender_3 Gender_4 Ncomponent
1 1 NA NA NA 1
2 NA 1 NA NA 1
3 1 2 NA NA 2
4 1 NA 2 NA 2
5 NA 1 2 NA 2
6 2 NA NA 1 2",
header = TRUE)
xy
fetch.gender <- grepl("^Gender_\\d{1}$", names(xy))
# For each row, drop the NAs and keep two slots, so a single-member row is padded with NA
out <- t(apply(xy[, fetch.gender], MARGIN = 1, FUN = function(x) x[!is.na(x)][1:2]))
colnames(out) <- c("Gender_member1", "Gender_member2")
data.frame(Familyid = xy$Familyid, out, Ncomponent = xy$Ncomponent)
Familyid Gender_member1 Gender_member2 Ncomponent
1 1 1 NA 1
2 2 1 NA 1
3 3 1 2 2
4 4 1 2 2
5 5 1 2 2
6 6 2 1 2
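Another way to push the non-NA values to the left, sketched here as an alternative rather than as the method above, is to reorder each row by is.na(). The data frame is rebuilt by hand from the question, and gender.cols, shifted and result are made-up names.

```r
xy <- data.frame(
  Familyid   = 1:6,
  Gender_1   = c(1, NA, 1, 1, NA, 2),
  Gender_2   = c(NA, 1, 2, NA, 1, NA),
  Gender_3   = c(NA, NA, NA, 2, 2, NA),
  Gender_4   = c(NA, NA, NA, NA, NA, 1),
  Ncomponent = c(1, 1, 2, 2, 2, 2)
)

gender.cols <- grep("^Gender_\\d$", names(xy))

# order(is.na(x)) puts non-NA positions first and keeps their original order
shifted <- t(apply(xy[, gender.cols], 1, function(x) x[order(is.na(x))]))

# Keep only the first two positions, as in the desired layout
result <- data.frame(
  Familyid       = xy$Familyid,
  Gender_member1 = unname(shifted[, 1]),
  Gender_member2 = unname(shifted[, 2]),
  Ncomponent     = xy$Ncomponent
)
print(result)
```

Because every row keeps its full length after reordering, rows with a single member end up with NA in the second position instead of a recycled value.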

R - Reorder NULL columns

I have the following question.
I have the data frame db below and I want to rearrange the values in each row so that the NAs sit at both ends of the row (as in db2).
How can I do it dynamically?
Thank you
db<-data.frame(N=c(2,4,6,8),
a=c(1,1,1,1),
b=c(1,1,1,1),
c=c(NA,1,1,1),
d=c(NA,1,1,1),
e=c(NA,NA,1,1),
f=c(NA,NA,1,1),
g=c(NA,NA,NA,1),
h=c(NA,NA,NA,1))
db2<-data.frame(N=c(2,4,6,8),
a=c(NA,NA,NA,1),
b=c(NA,NA,1,1),
c=c(NA,1,1,1),
d=c(1,1,1,1),
e=c(1,1,1,1),
f=c(NA,1,1,1),
g=c(NA,NA,1,1),
h=c(NA,NA,NA,1))
N a b c d e f g h
1 2 NA NA NA 1 1 NA NA NA
2 4 NA NA 1 1 1 1 NA NA
3 6 NA 1 1 1 1 1 1 NA
4 8 1 1 1 1 1 1 1 1
If the number of NAs per row is always even, then loop through the rows and rearrange each one by putting half of the NAs at the start and half at the end:
db[-1] <- t(apply(db[-1], 1, function(x) {
i1 <- is.na(x)
if(sum(i1) > 0) setNames(c(rep(NA,sum(i1)/2), x[!i1],
rep(NA, sum(i1)/2)), names(x)) else x}))
db
# N a b c d e f g h
#1 2 NA NA NA 1 1 NA NA NA
#2 4 NA NA 1 1 1 1 NA NA
#3 6 NA 1 1 1 1 1 1 NA
#4 8 1 1 1 1 1 1 1 1
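If the number of NAs in a row could be odd, sum(i1)/2 is no longer a whole number. A minimal sketch of one way to handle that, under the assumption (not stated in the question) that the extra NA should go on the right; centre_row is a hypothetical helper name.

```r
# Hypothetical helper: centre the non-NA values of one row
centre_row <- function(x) {
  n.na  <- sum(is.na(x))
  left  <- n.na %/% 2      # assumption: an odd leftover NA goes to the right
  right <- n.na - left
  setNames(c(rep(NA, left), x[!is.na(x)], rep(NA, right)), names(x))
}

db <- data.frame(N = c(2, 4, 6, 8),
                 a = c(1, 1, 1, 1), b = c(1, 1, 1, 1),
                 c = c(NA, 1, 1, 1), d = c(NA, 1, 1, 1),
                 e = c(NA, NA, 1, 1), f = c(NA, NA, 1, 1),
                 g = c(NA, NA, NA, 1), h = c(NA, NA, NA, 1))

db[-1] <- t(apply(db[-1], 1, centre_row))
print(db)

# An odd case: three NAs among five values
print(centre_row(c(p = 1, q = NA, r = NA, s = 2, t = NA)))
```

For the even case this gives the same layout as db2.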

Retrieving subset of a data frame by finding entries with NA in specific columns

Suppose we had a data frame with NA values like so,
>data
A B C D
1 3 NA 4
2 1 3 4
NA 3 3 5
4 2 NA NA
2 NA 4 3
1 1 1 2
I wish to know a general method for retrieving the subset of data with NA values in C or A. So the output should be,
A B C D
1 3 NA 4
NA 3 3 5
4 2 NA NA
I tried using the subset command like so, subset(data, A==NA | C==NA), but it didn't work. Any ideas?
A very handy function for this sort of thing is complete.cases. It checks each row for NAs and returns FALSE if the row contains any, and TRUE otherwise.
So you need to take just those two columns of your data, apply complete.cases(.) to them, negate the result, and use it to subset the rows of your original data, as follows:
# assuming your data is in 'df'
df[!complete.cases(df[, c("A", "C")]), ]
# A B C D
# 1 1 3 NA 4
# 3 NA 3 3 5
# 4 4 2 NA NA
Here is one possibility:
# Read your data
data <- read.table(text="
A B C D
1 3 NA 4
2 1 3 4
NA 3 3 5
4 2 NA NA
2 NA 4 3
1 1 1 2",header=T,sep="")
# Now subset your data
subset(data, is.na(C) | is.na(A))
A B C D
1 1 3 NA 4
3 NA 3 3 5
4 4 2 NA NA
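As for why the original subset(data, A==NA | C==NA) attempt returned nothing, here is a short sketch. The data frame is rebuilt by hand, and rows_with_na is a hypothetical helper for an arbitrary set of columns, not a base R function.

```r
data <- data.frame(
  A = c(1, 2, NA, 4, 2, 1),
  B = c(3, 1, 3, 2, NA, 1),
  C = c(NA, 3, 3, NA, 4, 1),
  D = c(4, 4, 5, NA, 3, 2)
)

# Comparing with NA never yields TRUE or FALSE, only NA
print(data$A == NA)

# subset() treats an NA condition as FALSE, so no rows are selected
print(nrow(subset(data, A == NA | C == NA)))  # 0

# Hypothetical helper: rows with an NA in any of the named columns
rows_with_na <- function(df, cols) {
  df[rowSums(is.na(df[, cols, drop = FALSE])) > 0, ]
}
print(rows_with_na(data, c("A", "C")))
```

The helper returns rows 1, 3 and 4, the same rows as the two answers above.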

Selecting values in a dataframe based on a priority list

I am new to R so am still getting my head around the way it works. My problem is as follows, I have a data frame and a prioritised list of columns (pl), I need:
To find the maximum value from the columns in pl for each row and create a new column with this value (df$max)
Using the priority list, subtract this maximum value from the priority value, ignoring NAs and returning the absolute difference
Probably better with an example:
My priority list is
pl <- c("E","D","A","B")
and the data frame is:
A B C D E F G
1 15 5 20 9 NA 6 1
2 3 2 NA 5 1 3 2
3 NA NA 3 NA NA NA NA
4 0 1 0 7 8 NA 6
5 1 2 3 NA NA 1 6
So for the first line the maximum is from column A (15) and the priority value is from column D (9) since E is a NA. The answer I want should look like this.
A B C D E F G MAX MAX-PR
1 15 5 20 9 NA 6 1 15 6
2 3 2 NA 5 1 3 2 5 4
3 NA NA 3 NA NA NA NA NA NA
4 0 1 0 7 8 NA 6 8 0
5 1 2 3 NA NA 1 6 2 1
How about this?
df$MAX <- apply(df[,pl], 1, max, na.rm = T)
df$MAX_PR <- df$MAX - apply(df[,pl], 1, function(x) x[!is.na(x)][1])
df$MAX[is.infinite(df$MAX)] <- NA
> df
# A B C D E F G MAX MAX_PR
# 1 15 5 20 9 NA 6 1 15 6
# 2 3 2 NA 5 1 3 2 5 4
# 3 NA NA 3 NA NA NA NA NA NA
# 4 0 1 0 7 8 NA 6 8 0
# 5 1 2 3 NA NA 1 6 2 1
Example:
df <- data.frame(A=c(1,NA,2,5,3,1),B=c(3,5,NA,6,NA,10),C=c(NA,3,4,5,1,4))
pl <- c("B","A","C")
#now we find the maximum per row, ignoring NAs
max.per.row <- apply(df,1,max,na.rm=T)
#and the first element according to the priority list, ignoring NAs
#(there may be a more efficient way to do this)
first.per.row <- apply(df[,pl],1, function(x) as.vector(na.omit(x))[1])
#and finally compute the difference
max.less.first.per.row <- max.per.row - first.per.row
Note that this code does not handle a row that is all NA: max() with na.rm = TRUE returns -Inf (with a warning) for such a row, and there is no check against that.
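One way to add that check is sketched below, assuming an all-NA row should give NA in both result columns. max_and_first is a hypothetical helper, and a fourth, all-NA row has been added to a small made-up data frame to exercise it.

```r
df <- data.frame(A = c(1, NA, 2, NA), B = c(3, 5, NA, NA), C = c(NA, 3, 4, NA))
pl <- c("B", "A", "C")

# Hypothetical helper: max and (max - first non-NA value) of one row, in pl order
max_and_first <- function(x) {
  x <- x[!is.na(x)]
  # Assumption: an all-NA row yields NA for both results
  if (length(x) == 0) return(c(MAX = NA_real_, MAX_PR = NA_real_))
  c(MAX = max(x), MAX_PR = max(x) - x[[1]])
}

# apply() returns one column per row of df, so transpose back
res <- t(apply(df[, pl], 1, max_and_first))
print(cbind(df, res))
```

Selecting df[, pl] before apply() is what puts the columns in priority order, so x[[1]] after dropping NAs is the priority value.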
Here is a simple version. First, I take only the pl columns; then, for each row, I remove the NAs and compute the max.
df <- dat[,pl]
cbind(dat, t(apply(df, 1, function(x) {
  x <- na.omit(x)
  c(max(x), max(x) - x[1])
})))
A B C D E F G 1 2
1 15 5 20 9 NA 6 1 15 6
2 3 2 NA 5 1 3 2 5 4
3 NA NA 3 NA NA NA NA -Inf NA
4 0 1 0 7 8 NA 6 8 0
5 1 2 3 NA NA 1 6 2 1

Resources