Assigning NAs into a dataframe in R - r

I have been trying to assign NAs using a for loop, but it is not working, and I know there are probably easier ways to do this.
I want to create an extra column (like the Desire_output column in the example) in which I assign NA to any row whose Value is greater than 1. I also want to assign NA to the following two rows. If there are NAs in the Value column, the desired output column should be NA there as well.
Here is the example:
Event<- c(1,2,2,2,2,2,2,3,3,4,4,4,4,4,5,6,6,6,7)
Value<- c(5,3,0,0,0,2,0,1,10,0,0,NA,NA,NA,1,0,8,0,0)
Desire_output<- c(NA,NA,NA,NA,0,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,0,NA,NA,NA)
A<- data.frame(Event,Value,Desire_output)
Event Value Desire_output
1 1 5 NA
2 2 3 NA
3 2 0 NA
4 2 0 NA
5 2 0 0
6 2 2 NA
7 2 0 NA
8 3 1 NA
9 3 10 NA
10 4 0 NA
11 4 0 NA
12 4 NA NA
13 4 NA NA
14 4 NA NA
15 5 1 1
16 6 0 0
17 6 8 NA
18 6 0 NA
19 7 0 NA
This is what I tried, but when I got to the NAs in the Value column I started to have trouble.
for (f in 1:(nrow(A)-1)){
if(A$Value2[f] > 1){
A$Value2[f]<- NA
A$Value2[f+1]<- NA
A$Value[f+2]<- NA
}else{
}
}
Please let me know if there is an easier way to do this with any other method.

We can first copy the Value column to a Desired_output column, find the indices (inds) where Value is greater than 1, and then set that row and the next two rows to NA.
A$Desired_output <- A$Value
inds <- which(A$Value > 1)
A$Desired_output[unique(c(inds, inds + 1, inds + 2))] <- NA
A
# Event Value Desired_output
#1 1 5 NA
#2 2 3 NA
#3 2 0 NA
#4 2 0 NA
#5 2 0 0
#6 2 2 NA
#7 2 0 NA
#8 3 1 NA
#9 3 10 NA
#10 4 0 NA
#11 4 0 NA
#12 4 NA NA
#13 4 NA NA
#14 4 NA NA
#15 5 1 1
#16 6 0 0
#17 6 8 NA
#18 6 0 NA
#19 7 0 NA
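One caveat, offered as a sketch rather than as part of the answer above: if a value greater than 1 sits in one of the last two rows, inds + 1 or inds + 2 points past the end of the column. The helper below (mask_after is a hypothetical name, as are its threshold and n_after arguments) drops such indices before assigning.

```r
# Hypothetical helper (not from the original answer): mask every value above
# `threshold` plus the `n_after` values that follow it.
mask_after <- function(x, threshold = 1, n_after = 2) {
  hits <- which(x > threshold)                  # which() ignores NA comparisons
  idx <- unique(as.vector(outer(hits, 0:n_after, "+")))
  idx <- idx[idx <= length(x)]                  # discard indices past the end
  x[idx] <- NA
  x
}

Value <- c(5, 3, 0, 0, 0, 2, 0, 1, 10, 0, 0, NA, NA, NA, 1, 0, 8, 0, 0)
Desired_output <- mask_after(Value)
```

Because the helper works on a plain vector, it could be assigned straight into a data frame column, for example A$Desired_output <- mask_after(A$Value).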

You might use ifelse. The code below uses an OR condition inside ifelse. Note that this masks only the rows where Value is greater than 1 or NA, not the two rows that follow them.
A$Desire_output<- ifelse(A$Value>1 | is.na(A$Value), NA, A$Value)
I hope this helps.

I think this gives what you're after, but other solutions may be less laborious.
Event<- c(1,2,2,2,2,2,2,3,3,4,4,4,4,4,5,6,6,6,7)
Value<- c(5,3,0,0,0,2,0,1,10,0,0,NA,NA,NA,1,0,8,0,0)
A <- data.frame(Event, Value)
A["Desired_Output"] <- 0
n <- nrow(A)
for (i in seq_len(n)) {
  if (!is.na(A$Value[i]) && A$Value[i] > 1) {
    # Mask this row and the next two, without running past the last row
    A$Desired_Output[i:min(i + 2, n)] <- NA
  } else if (!is.na(A$Desired_Output[i])) {
    # Not already masked by an earlier row: copy Value (an NA in Value carries over)
    A$Desired_Output[i] <- A$Value[i]
  }
}

Related

R First Non-NA Value From Cols

df <- data.frame(ID=c(1,2,3,4,5,6),
CO=c(-6,4,2,3,0,2),
CATFOX=c(1,NA,NA,3,0,NA),
DOGFOX=c(NA,NA,5,1,2,NA),
RABFOX=c(NA,3,NA,5,3,NA),
D=c(0,4,5,6,1,2),
WANT=c(1,3,5,3,0,NA))
I have a data frame and I wish to make column WANT take the first value of 'CATFOX', 'DOGFOX', 'RABFOX' that is not NA. Is there a data.table solution? I tried this, but it did not produce the desired outcome:
df$WANT=do.call(coalesce, data[grepl('FOX',names(data))])
You have coalesce in your example which is dplyr's construct. Try fcoalesce:
library(data.table)
setDT(df)[, WANT2 := fcoalesce(CATFOX, DOGFOX, RABFOX)]
Output:
ID CO CATFOX DOGFOX RABFOX D WANT WANT2
1: 1 -6 1 NA NA 0 1 1
2: 2 4 NA NA 3 4 3 3
3: 3 2 NA 5 NA 5 5 5
4: 4 3 3 1 5 6 3 3
5: 5 0 0 2 3 1 0 0
6: 6 2 NA NA NA 2 NA NA
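If neither dplyr nor data.table is available, the same idea can be sketched in base R by folding the columns with Reduce. The function name coalesce_base is hypothetical, and this is only a minimal illustration of what coalescing does, not a replacement for fcoalesce.

```r
# Hypothetical base R stand-in for coalesce/fcoalesce: walk through the
# vectors left to right and fill positions that are still NA.
coalesce_base <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

CATFOX <- c(1, NA, NA, 3, 0, NA)
DOGFOX <- c(NA, NA, 5, 1, 2, NA)
RABFOX <- c(NA, 3, NA, 5, 3, NA)
WANT2 <- coalesce_base(CATFOX, DOGFOX, RABFOX)
```

A position stays NA only when every input is NA there, as in the last row of the example.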
We can use a vectorized option in base R
i1 <- endsWith(names(df), 'FOX')
df$WANT2 <- df[i1][cbind(seq_len(nrow(df)), max.col(!is.na(df[i1]), 'first'))]
df$WANT2
#[1] 1 3 5 3 0 NA
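To see why this works, it may help to split the one-liner into steps. The sketch below uses a three-row toy version of the FOX columns (the data and the intermediate names are made up for illustration). max.col with ties.method = "first" returns, for each row, the position of the first TRUE in !is.na(...), and a two-column matrix of (row, column) pairs then pulls out one value per row.

```r
fox <- data.frame(CATFOX = c(1, NA, NA),
                  DOGFOX = c(NA, NA, 5),
                  RABFOX = c(NA, 3, NA))
m <- as.matrix(fox)

# Column position of the first non-NA entry in each row
first_col <- max.col(!is.na(m), ties.method = "first")

# Index the matrix with (row, column) pairs to get one value per row
first_val <- m[cbind(seq_len(nrow(m)), first_col)]
```

For a row that is entirely NA, every entry of !is.na(...) is FALSE, so the "first" tie rule picks column 1 and the extracted value is NA.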
You could try this base R solution:
#Data
data=data.frame(ID=c(1,2,3,4,5),
CO=c(-6,4,2,3,0),
CATFOX=c(1,NA,NA,3,0),
DOGFOX=c(NA,NA,5,1,2),
RABFOX=c(NA,3,NA,5,3),
D=c(0,4,5,6,1),
WANT=c(1,3,5,3,0))
#Process
index <- which(names(data) %in% c('CATFOX','DOGFOX','RABFOX'))
data$WANT2 <- apply(data[,index],1,function(x) x[min(which(!is.na(x)))])
Output:
ID CO CATFOX DOGFOX RABFOX D WANT WANT2
1 1 -6 1 NA NA 0 1 1
2 2 4 NA NA 3 4 3 3
3 3 2 NA 5 NA 5 5 5
4 4 3 3 1 5 6 3 3
5 5 0 0 2 3 1 0 0

Store first non-missing value in a new column

Ciao, I have several columns that represent scores. For each STUDENT I want to take the first non-NA score and store it in a new column called TEST.
Here is my replicating example. This is the data I have now:
df <- data.frame(STUDENT=c(1,2,3,4,5),
CLASS=c(90,91,92,93,95),
SCORE1=c(10,NA,NA,NA,NA),
SCORE2=c(2,NA,8,NA,NA),
SCORE3=c(9,6,6,NA,NA),
SCORE4=c(NA,7,5,1,9),
ROOM=c(01,02, 03, 04, 05))
This is the column I am aiming to add:
df$FIRST <- c(10,6,8,1,9)
This is my attempt:
df$FIRSTGUESS <- max.col(!is.na(df[3:6]), "first")
This is exactly what coalesce from package dplyr does. As described in its documentation:
Given a set of vectors, coalesce() finds the first non-missing value
at each position.
Therefore, you can simply do:
library(dplyr)
df$FIRST <- do.call(coalesce, df[grepl('SCORE', names(df))])
This is the result:
> df
STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRST
1 1 90 10 2 9 NA 1 10
2 2 91 NA NA 6 7 2 6
3 3 92 NA 8 6 5 3 8
4 4 93 NA NA NA 1 4 1
5 5 95 NA NA NA 9 5 9
You can do this with apply and which.min(is.na(...))
df$FIRSTGUESS <- apply(df[, grep("^SCORE", names(df))], 1, function(x)
x[which.min(is.na(x))])
df
# STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRSTGUESS
#1 1 90 10 2 9 NA 1 10
#2 2 91 NA NA 6 7 2 6
#3 3 92 NA 8 6 5 3 8
#4 4 93 NA NA NA 1 4 1
#5 5 95 NA NA NA 9 5 9
Note that we need is.na instead of !is.na because FALSE corresponds to 0 and we want to return the first (which.min) FALSE value.
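A small illustration of that point, using a made-up score vector: is.na(x) is TRUE for missing entries, TRUE and FALSE are treated as 1 and 0, so which.min lands on the first FALSE, that is, the first non-missing score.

```r
x <- c(SCORE1 = NA, SCORE2 = NA, SCORE3 = 6, SCORE4 = 7)

na_flags <- is.na(x)          # TRUE for missing, FALSE otherwise
pos <- which.min(na_flags)    # position of the first FALSE (first minimum)
first_score <- x[[pos]]       # value at that position, name dropped

# If every entry is NA, which.min() returns position 1, so the result is NA
all_missing <- c(NA_real_, NA_real_)
fallback <- all_missing[which.min(is.na(all_missing))]
```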
Unfortunately, max.col gives the indices of the max values and not the values themselves. However, we can subset the values from the original data frame using an mapply call.
#Select only columns which has "SCORE" in it
sub_df <- df[grepl("SCORE", names(df))]
#Get the first non-NA value by row
inds <- max.col(!is.na(sub_df), ties.method = "first")
#Get the inds value by row
df$FIRSTGUESS <- mapply(function(x, y) sub_df[x,y], 1:nrow(sub_df), inds)
df
# STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM FIRST FIRSTGUESS
#1 1 90 10 2 9 NA 1 10 10
#2 2 91 NA NA 6 7 2 6 6
#3 3 92 NA 8 6 5 3 8 8
#4 4 93 NA NA NA 1 4 1 1
#5 5 95 NA NA NA 9 5 9 9
Using zoo::na.locf, borrowing the setup of sub_df from Ronak:
df['New']=zoo::na.locf(t(sub_df),fromLast=T)[1,]
df
STUDENT CLASS SCORE1 SCORE2 SCORE3 SCORE4 ROOM New
1 1 90 10 2 9 NA 1 10
2 2 91 NA NA 6 7 2 6
3 3 92 NA 8 6 5 3 8
4 4 93 NA NA NA 1 4 1
5 5 95 NA NA NA 9 5 9

R: Change values in a data frame according to values in another column of the same row

Let's say I have this kind of data frame:
df <- data.frame(
t=rep(seq(0,2),6),
no=rep(c(1,2,3,4,5,6),each=3),
value=rnorm(18),g=rep(c("nc","c1", NA),each=3)
)
t no value g
1 0 1 0.5022163 nc
2 1 1 0.5687227 nc
3 2 1 -0.2922622 nc
4 0 2 -0.3587089 c1
5 1 2 -0.9028012 c1
6 2 2 0.1926774 c1
7 0 3 0.6771236 NA
8 1 3 0.3752632 NA
9 2 3 0.2795892 NA
10 0 4 -0.4565521 nc
11 1 4 -0.1241807 nc
12 2 4 -1.2603695 nc
13 0 5 -0.6323118 c1
14 1 5 -0.6283850 c1
15 2 5 -0.2052317 c1
16 0 6 1.5996913 NA
17 1 6 -0.4802057 NA
18 2 6 -0.4255056 NA
I want to set the values in df$value to NA whenever there is NA in df$g (only in the same rows).
And similarly, set the values in df$value to NA, if df$no is, e.g., 1 or 5.
I was fooling around with for loops, but I could not get it right.
Any help will be much appreciated.
Thanks
With a for loop
for (i in 1:nrow(df)) {
if (df$no[i] == 1 | df$no[i] == 5 | is.na(df$g[i])) {
df$value[i] <- NA
}
}
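The same result can also be obtained without a loop by building one logical index. This is a sketch on the question's data (set.seed is added here only to make the random values reproducible, and to_blank is a made-up name):

```r
set.seed(1)  # assumption: added only for reproducibility
df <- data.frame(
  t = rep(seq(0, 2), 6),
  no = rep(c(1, 2, 3, 4, 5, 6), each = 3),
  value = rnorm(18),
  g = rep(c("nc", "c1", NA), each = 3)
)

# TRUE for rows where no is 1 or 5, or where g is missing
to_blank <- df$no %in% c(1, 5) | is.na(df$g)
df$value[to_blank] <- NA
```

Using %in% rather than == keeps the condition short when more values of no need to be added.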

creating new variable with ifelse with NA's

I have a data frame that consists of the answers to the question:
"What language do you speak at home?"
1=English
2=Spanish
and so on...
first.language second.language third.language fourth.language
1 1 NA NA NA
2 1 2 NA NA
3 1 2 NA NA
4 2 NA NA NA
5 1 2 NA NA
6 1 5 NA NA
What I want to do is create the variable: "english.home"
"english.home" will be:
1=if English is spoken at home, never mind if it is the first, second... language
2(else)=if English is not spoken at home.
I tried using:
student1$english.home = ifelse(student1$first.language==1 |
student1$second.language==1 | student1$third.language==1 |
student1$fourth.language==1,1,2)
But I got:
> english.home
1 1
2 1
3 1
4 NA
5 1
6 1
Is there any way of accomplishing this without getting an NA on row number four? It really doesn't matter that it is an NA; what matters is that it is not English!
I know that the ifelse/NA topic has been much debated. I searched a lot for a solution before posting but could not find one.
I hope someone can help me out of this mess.
Something like this should do what you want.
# Read your data
tab <- read.table(text ="
first.language second.language third.language fourth.language
1 1 NA NA NA
2 1 2 NA NA
3 1 2 NA NA
4 2 NA NA NA
5 1 2 NA NA
6 1 5 NA NA")
tab$english.home <-
apply(tab, 1, function (x) 2 - any(x == 1, na.rm = TRUE))
print(tab)
# first.language second.language third.language fourth.language english.home
#1 1 NA NA NA 1
#2 1 2 NA NA 1
#3 1 2 NA NA 1
#4 2 NA NA NA 2
#5 1 2 NA NA 1
#6 1 5 NA NA 1
We use the fact that logical values get promoted to numeric 0 and 1 when added to (or subtracted from) a numeric.
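To make the reasoning concrete, here is a small sketch on row 4 of the data (first language 2, the rest NA). It shows both why the chained comparisons in the question yield NA and why any(..., na.rm = TRUE) does not. The variable names are made up for illustration.

```r
row4 <- c(2, NA, NA, NA)

# FALSE | NA is NA, because the missing value could have been TRUE
chained <- row4[1] == 1 | row4[2] == 1 | row4[3] == 1 | row4[4] == 1

# any() with na.rm = TRUE drops the NA comparisons before combining them
speaks_english <- any(row4 == 1, na.rm = TRUE)

# The logical is coerced to 0/1 in arithmetic: 2 - FALSE is 2, 2 - TRUE is 1
english_home <- 2 - speaks_english
```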
Maybe this helps:
(!rowSums(student1==1 & !is.na(student1))) +1
#1 2 3 4 5 6
#1 1 1 2 1 1

Selecting values in a dataframe based on a priority list

I am new to R, so I am still getting my head around the way it works. My problem is as follows: I have a data frame and a prioritised list of columns (pl), and I need:
To find the maximum value from the columns in pl for each row and create a new column with this value (df$max)
Using the priority list, subtract this maximum value from the priority value, ignoring NAs and returning the absolute difference
Probably better with an example:
My priority list is
pl <- c("E","D","A","B")
and the data frame is:
A B C D E F G
1 15 5 20 9 NA 6 1
2 3 2 NA 5 1 3 2
3 NA NA 3 NA NA NA NA
4 0 1 0 7 8 NA 6
5 1 2 3 NA NA 1 6
So for the first line the maximum is from column A (15) and the priority value is from column D (9), since E is NA. The answer I want should look like this.
A B C D E F G MAX MAX-PR
1 15 5 20 9 NA 6 1 15 6
2 3 2 NA 5 1 3 2 5 4
3 NA NA 3 NA NA NA NA NA NA
4 0 1 0 7 8 NA 6 8 0
5 1 2 3 NA NA 1 6 2 1
How about this?
df$MAX <- apply(df[,pl], 1, max, na.rm = T)
df$MAX_PR <- df$MAX - apply(df[,pl], 1, function(x) x[!is.na(x)][1])
df$MAX[is.infinite(df$MAX)] <- NA
> df
# A B C D E F G MAX MAX_PR
# 1 15 5 20 9 NA 6 1 15 6
# 2 3 2 NA 5 1 3 2 5 4
# 3 NA NA 3 NA NA NA NA NA NA
# 4 0 1 0 7 8 NA 6 8 0
# 5 1 2 3 NA NA 1 6 2 1
Example:
df <- data.frame(A=c(1,NA,2,5,3,1),B=c(3,5,NA,6,NA,10),C=c(NA,3,4,5,1,4))
pl <- c("B","A","C")
#now we find the maximum per row, ignoring NAs
max.per.row <- apply(df,1,max,na.rm=T)
#and the first element according to the priority list, ignoring NAs
#(there may be a more efficient way to do this)
first.per.row <- apply(df[,pl],1, function(x) as.vector(na.omit(x))[1])
#and finally compute the difference
max.less.first.per.row <- max.per.row - first.per.row
Note that this code will break for any row that is all NA. There is no check against that.
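One possible way to add that check, sketched here with hypothetical helper names (row_max and first_non_na) and the pl columns of the question's data frame typed in by hand, is to return NA explicitly when a row has no non-missing values.

```r
# Hypothetical helpers that return NA for rows containing only NAs
row_max <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
first_non_na <- function(x) {
  v <- x[!is.na(x)]
  if (length(v) > 0) v[[1]] else NA_real_
}

# The priority columns from the question, entered by hand
dat <- data.frame(A = c(15, 3, NA, 0, 1),
                  B = c(5, 2, NA, 1, 2),
                  D = c(9, 5, NA, 7, NA),
                  E = c(NA, 1, NA, 8, NA))
pl <- c("E", "D", "A", "B")

max_per_row   <- apply(dat[, pl], 1, row_max)
first_per_row <- apply(dat[, pl], 1, first_non_na)
diff_per_row  <- abs(max_per_row - first_per_row)
```

With these helpers the all-NA third row gives NA for both the maximum and the difference, without the -Inf and the warning that max(..., na.rm = TRUE) produces on an empty input.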
Here is a simple version. First, I take only the pl columns; then, for each row, I remove the NAs and compute the max.
df <- dat[, pl]
cbind(dat, t(apply(df, 1, function(x) {
  x <- na.omit(x)
  c(max(x), max(x) - x[1])
})))
A B C D E F G 1 2
1 15 5 20 9 NA 6 1 15 6
2 3 2 NA 5 1 3 2 5 4
3 NA NA 3 NA NA NA NA -Inf NA
4 0 1 0 7 8 NA 6 8 0
5 1 2 3 NA NA 1 6 2 1
