Is there any way to replace a missing value based on another column's value to match the column name - r

I have a dataset:
a day day.1.time day.2.time day.3.time day.4.time day.5.time
1 NA 2 4 5 7 10 4
2 NA 5 4 1 1 6 NA
3 NA 3 7 9 6 7 4
4 NA 3 6 8 8 4 5
5 NA 3 5 2 4 5 6
6 NA 3 87 3 2 1 78
7 NA 1 NA 7 5 9 54
8 NA 5 6 6 3 2 3
9 NA 2 5 10 9 8 3
10 NA 3 9 4 10 3 3
I am trying to use the day column value to match with the day.x.time column to replace the missing value in column a. For instance, in the first row, the first value in the day column is 2, then we should use day.2.time value 5 to replace the first value in column a.
If the day.x.time value is missing, we should use the value from -1 day or +1 day to replace the missing value in column a. For instance, in the second row, the day column shows 5, so we should use the value in the day.5.time column, but it is also missing. In this case, we should use the value in the day.4.time column to replace the missing value in column a.
You can use dat = data.frame(a = rep(NA,10), day = c(2,5,3,3,3,3,1,5,2,3), day.1.time = c(4,4,7,6,5,87,NA,6,5,9), day.2.time = sample(10), day.3.time = sample(10), day.4.time = sample(10), day.5.time = c(4,NA,4,5,6,78,54,3,3,3)) to generate the sample data.
I have tried grep(paste0("^day.", dat$day, ".time$"), names(dat)) to match with the column, but my code isn't matching in every row, so any help would be appreciated!

Here is one way to do this.
The first part, matching the day column with the corresponding day.x.time column, is easy. We can do this using matrix subsetting, where each (row, column) pair selects one cell.
cols <- grep('day\\.\\d+\\.time', names(dat))
dat$a <- dat[cols][cbind(1:nrow(dat), dat$day)]
dat
# a day day.1.time day.2.time day.3.time day.4.time day.5.time
#1 3 2 4 3 3 3 4
#2 NA 5 4 4 10 2 NA
#3 1 3 7 8 1 8 4
#4 4 3 6 6 4 5 5
#5 6 3 5 10 6 7 6
#6 8 3 87 5 8 9 78
#7 NA 1 NA 1 7 10 54
#8 3 5 6 7 9 1 3
#9 2 2 5 2 5 6 3
#10 2 3 9 9 2 4 3
To fill values where day.x.time column is NA we can select the closest non-NA value in that row.
inds <- which(is.na(dat$a))
dat$a[inds] <- mapply(function(x, y)
na.omit(unlist(dat[x, cols[order(abs(y- seq_along(cols)))]])[1:4])[1],
inds, dat$day[inds])
dat
# a day day.1.time day.2.time day.3.time day.4.time day.5.time
#1 3 2 4 3 3 3 4
#2 2 5 4 4 10 2 NA
#3 1 3 7 8 1 8 4
#4 4 3 6 6 4 5 5
#5 6 3 5 10 6 7 6
#6 8 3 87 5 8 9 78
#7 1 1 NA 1 7 10 54
#8 3 5 6 7 9 1 3
#9 2 2 5 2 5 6 3
#10 2 3 9 9 2 4 3
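The two steps above can be sketched on fixed data, so the result is reproducible. This is a minimal sketch, assuming the values shown in the question's table (instead of sample()) and a hypothetical helper fill_nearest that is not part of the original answer. On a tie in distance, order() keeps the earlier day first.

```r
# Data copied from the table in the question (no sample(), so it is deterministic).
dat <- data.frame(
  a = NA,
  day = c(2, 5, 3, 3, 3, 3, 1, 5, 2, 3),
  day.1.time = c(4, 4, 7, 6, 5, 87, NA, 6, 5, 9),
  day.2.time = c(5, 1, 9, 8, 2, 3, 7, 6, 10, 4),
  day.3.time = c(7, 1, 6, 8, 4, 2, 5, 3, 9, 10),
  day.4.time = c(10, 6, 7, 4, 5, 1, 9, 2, 8, 3),
  day.5.time = c(4, NA, 4, 5, 6, 78, 54, 3, 3, 3)
)

# Step 1: matrix subsetting, one (row, column) pair per row.
cols <- grep("^day\\.\\d+\\.time$", names(dat))
m <- as.matrix(dat[cols])
picked <- m[cbind(seq_len(nrow(m)), dat$day)]

# Step 2 (hypothetical helper): nearest non-NA value in the row,
# measured by distance from the requested day.
fill_nearest <- function(row_values, day) {
  ord <- order(abs(seq_along(row_values) - day))
  candidates <- row_values[ord]
  candidates[!is.na(candidates)][1]
}

missing <- which(is.na(picked))
picked[missing] <- vapply(
  missing,
  function(i) fill_nearest(m[i, ], dat$day[i]),
  numeric(1)
)
dat$a <- picked
dat$a
```

Rows 2 and 7 are the only ones that need the fallback: row 2 takes the day.4.time value and row 7 takes the day.2.time value.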

Use sapply to loop over the rows and, for each row i, take the column at position day[i] + 2 (the day.x.time columns start at the third column).
res <- transform(dat, a=sapply(1:nrow(dat), function(i) dat[i, dat$day[i] + 2]))
res
# a day day.1.time day.2.time day.3.time day.4.time day.5.time
# 1 5 2 4 5 7 10 4
# 2 NA 5 4 1 1 6 NA
# 3 6 3 7 9 6 7 4
# 4 8 3 6 8 8 4 5
# 5 4 3 5 2 4 5 6
# 6 2 3 87 3 2 1 78
# 7 NA 1 NA 7 5 9 54
# 8 3 5 6 6 3 2 3
# 9 10 2 5 10 9 8 3
# 10 10 3 9 4 10 3 3
Edit
The +/-2 days would require a decision rule: what to choose if the value for day is NA, but neither day - 1 nor day + 1 is NA and both have the same values.
Here is a solution that goes backwards from day and takes the first non-NA value. If day is 1, as is the case in row 7, we get NA.
res <- transform(dat, a=sapply(1:nrow(dat), function(i) {
days <- dat[i, -(1:2)]
day.value <- days[dat$day[i]]
if (is.na(day.value)) {
day.value <- tail(na.omit(unlist(days[1:dat$day[i]])), 1)
if (length(day.value) == 0) day.value <- NA
}
return(day.value)
}))
res
# a day day.1.time day.2.time day.3.time day.4.time day.5.time
# 1 10 2 4 10 1 2 4
# 2 10 5 4 1 3 10 NA
# 3 2 3 7 7 2 7 4
# 4 6 3 6 2 6 6 5
# 5 10 3 5 9 10 5 6
# 6 8 3 87 6 8 4 78
# 7 NA 1 NA 3 7 1 54
# 8 3 5 6 4 4 9 3
# 9 8 2 5 8 5 8 3
# 10 9 3 9 5 9 3 3
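One possible decision rule can be written down explicitly. The following is only a sketch under an assumed tie-break (try day, then day - 1, then day + 1). The helper name pick_with_fallback and the three sample rows (rows 1, 2 and 7 of the question's table) are chosen for illustration.

```r
# Hypothetical helper: try `day`, then day - 1, then day + 1.
# The order of the candidates is the decision rule; change it to prefer day + 1.
pick_with_fallback <- function(values, day) {
  candidates <- c(day, day - 1, day + 1)
  # Drop candidates that fall outside the available columns.
  candidates <- candidates[candidates >= 1 & candidates <= length(values)]
  hits <- values[candidates]
  hits <- hits[!is.na(hits)]
  if (length(hits) == 0) NA_real_ else hits[[1]]
}

# day.1.time .. day.5.time for rows 1, 2 and 7 of the question's table.
times <- rbind(
  c(4, 5, 7, 10, 4),
  c(4, 1, 1, 6, NA),
  c(NA, 7, 5, 9, 54)
)
day <- c(2, 5, 1)

a <- vapply(
  seq_along(day),
  function(i) pick_with_fallback(times[i, ], day[i]),
  numeric(1)
)
a  # 5 6 7
```

Unlike the backward search, this rule also looks one day ahead, so a row whose day is 1 can still be filled.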

Related

Delete rows when all numbers within a cycle of another variable equal to NA

My data are as follow:
Row x y
1 1 2
2 2 3
3 3 4
4 4 3
5 5 NA
6 1 NA
7 2 NA
8 3 NA
9 4 NA
10 5 7
11 1 NA
12 2 NA
13 3 NA
14 4 NA
15 5 NA
I wish to delete Rows 11 to 15, since y is NA for the whole cycle of x (y equals NA whatever value x takes in Rows 11 to 15). I do not want to delete the other rows, since at least one value of y is not NA as x moves from 1 to 5 (for example, in Rows 6 to 10, y is 7 when x is 5, so I keep Rows 6 to 10). How should I write R code to accomplish this?
Using base R, assuming that x is sorted within each cycle and that every cycle starts from 1:
subset(df,!ave(is.na(y),cumsum(c(1,diff(x)<0)),FUN=all))
Row x y
1 1 1 2
2 2 2 3
3 3 3 4
4 4 4 3
5 5 5 NA
6 6 1 NA
7 7 2 NA
8 8 3 NA
9 9 4 NA
10 10 5 7
Using tidyverse (dplyr):
library(dplyr)
df %>%
  group_by(m = cumsum(c(1, diff(x) < 0))) %>%
  filter(!all(is.na(y)))
# A tibble: 10 x 4
# Groups: m [2]
Row x y m
<int> <int> <int> <dbl>
1 1 1 2 1
2 2 2 3 1
3 3 3 4 1
4 4 4 3 1
5 5 5 NA 1
6 6 1 NA 2
7 7 2 NA 2
8 8 3 NA 2
9 9 4 NA 2
10 10 5 7 2
Of course, you can then ungroup and remove the helper column m.
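Both answers rely on the same idea: a new cycle starts wherever x decreases, so a running count of those drops labels each cycle. A minimal base R sketch, assuming the data from the question and the illustrative variable names cycle and keep:

```r
# Data from the question: three cycles of x = 1..5.
df <- data.frame(
  Row = 1:15,
  x = rep(1:5, 3),
  y = c(2, 3, 4, 3, NA, NA, NA, NA, NA, 7, NA, NA, NA, NA, NA)
)

# A drop in x marks the start of a new cycle; cumsum() turns the
# start flags into cycle ids (1, 1, ..., 2, 2, ..., 3, 3, ...).
cycle <- cumsum(c(TRUE, diff(df$x) < 0))

# For each cycle, is every y missing? ave() repeats the answer per row.
all_na <- as.logical(ave(is.na(df$y), cycle, FUN = all))
keep <- !all_na

result <- df[keep, ]
result$Row
```

Only the third cycle has y missing throughout, so Rows 11 to 15 are dropped and Rows 1 to 10 remain.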

Removing rows from each dataframe in list with condition in R

I have such a list:
df1 <- data.frame(a=c(NA, NA, 1:10), b=c(NA, 1:11))
df2 <- data.frame(a=1:10, b=c(NA,1:9))
mylist <- list(df1, df2)
> mylist
[[1]]
a b
1 NA NA
2 NA 1
3 1 2
4 2 3
5 3 4
6 4 5
7 5 6
8 6 7
9 7 8
10 8 9
11 9 10
12 10 11
[[2]]
a b
1 1 NA
2 2 1
3 3 2
4 4 3
5 5 4
6 6 5
7 7 6
8 8 7
9 9 8
10 10 9
I'd like to remove all rows with more than 1 NA in a row in each data frame. How can I do that?
I found out how to delete rows
lapply(mylist, `[`, -1,)
and how to calculate the sum of NAs
NAsums <- function(x) {rowSums(is.na(x))}
lapply(mylist, NAsums)
But I can't figure out how to combine the two steps..
We loop through the list (lapply), use rowSums to get the number of NA elements in each row, convert to a logical vector (<2), and use that to subset the rows.
lapply(mylist, function(x) x[rowSums(is.na(x))<2,])
#[[1]]
# a b
#2 NA 1
#3 1 2
#4 2 3
#5 3 4
#6 4 5
#7 5 6
#8 6 7
#9 7 8
#10 8 9
#11 9 10
#12 10 11
#[[2]]
# a b
#1 1 NA
#2 2 1
#3 3 2
#4 4 3
#5 5 4
#6 6 5
#7 7 6
#8 8 7
#9 9 8
#10 10 9
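The same filter can be wrapped in a small function so the threshold is explicit. This is a sketch only. The helper name keep_rows and its max_na argument are hypothetical and not part of the original answer.

```r
# Hypothetical helper: keep rows with at most `max_na` missing values.
keep_rows <- function(df, max_na = 1) {
  # drop = FALSE keeps a data frame even if only one column is present.
  df[rowSums(is.na(df)) <= max_na, , drop = FALSE]
}

df1 <- data.frame(a = c(NA, NA, 1:10), b = c(NA, 1:11))
df2 <- data.frame(a = 1:10, b = c(NA, 1:9))
mylist <- list(df1, df2)

cleaned <- lapply(mylist, keep_rows, max_na = 1)
sapply(cleaned, nrow)  # 11 10
```

Extra arguments to lapply are passed on to the function, which is why max_na can be set once for the whole list.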

How to replace the NA values after merge two data.frame? [duplicate]

This question already has answers here:
Replacing NAs with latest non-NA value
I have two data.frame as the following:
> a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
> a
x y
1 1 1
2 2 3
3 3 5
4 4 7
5 5 9
6 6 11
7 7 13
8 8 15
> b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
> b
x z
1 1 2
2 5 4
3 7 6
Then I use "join" for two data.frames:
> c <- join(a, b, by="x", type="left")
> c
x y z
1 1 1 2
2 2 3 NA
3 3 5 NA
4 4 7 NA
5 5 9 4
6 6 11 NA
7 7 13 6
8 8 15 NA
My requirement is to replace each NA in the z column with the last non-NA value before it. I want the result to look like this:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
This time (if your data is not too large) a loop is an elegant option:
for(i in which(is.na(c$z))){
c$z[i] = c$z[i-1]
}
gives:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
data:
library(plyr)
a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
c <- join(a, b, by="x", type="left")
You might also want to check na.locf in the zoo package.
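The loop assumes the first value of z is not NA (c$z[0] is empty, so that assignment would fail). A vectorized base R alternative can be sketched as follows. It is not the zoo implementation, and the names z, idx and filled are illustrative.

```r
# The z column after the left join in the question.
z <- c(2, NA, NA, NA, 4, NA, 6, NA)

# Position of the most recent non-NA value at each index (0 if none yet).
idx <- cummax(ifelse(is.na(z), 0L, seq_along(z)))

# Indexing with NA yields NA, so leading NAs stay NA instead of failing.
filled <- z[replace(idx, idx == 0, NA)]
filled  # 2 2 2 2 4 4 6 6
```

For a vector that starts with NA, such as c(NA, 1, NA), this returns NA 1 1.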

Eliminate in an increasing order rows in a data frame

x<-c(4,5,6,23,5,6,7,8,0,3)
y<-c(2,4,5,6,23,5,6,7,8,0)
z<-c(1,2,4,5,6,23,5,6,7,8)
df<-data.frame(x,y,z)
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 23 6 5
5 5 23 6
6 6 5 23
7 7 6 5
8 8 7 6
9 0 8 7
10 3 0 8
I would like to eliminate the number 23 from every column of df by removing one element per column, with the removed row moving down by one for each successive column (that is, not by matching the value 23, but by position, starting from its row in x).
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 5 6 5
5 6 5 6
6 7 6 5
7 8 7 6
8 0 8 7
9 3 0 8
Thank you
You can iterate through the columns and remove the element from each, then reassemble as a data frame:
result <- as.data.frame(lapply(1:ncol(df), function(x) df[-(x+3),x]))
names(result) <- names(df)
result
## x y z
## 1 4 2 1
## 2 5 4 2
## 3 6 5 4
## 4 5 6 5
## 5 6 5 6
## 6 7 6 5
## 7 8 7 6
## 8 0 8 7
## 9 3 0 8
df[-(x+3),x] is the column with the value removed, by location. To start with row N in column x you would use df[-(x+N-1),x].
You could also try:
n <- 4
df1 <- df[-n,]
df1[] <- unlist(df,use.names=FALSE)[-seq(n, prod(dim(df)), by=nrow(df)+1)]
df1
# x y z
#1 4 2 1
#2 5 4 2
#3 6 5 4
#5 5 6 5
#6 6 5 6
#7 7 6 5
#8 8 7 6
#9 0 8 7
#10 3 0 8
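The first approach generalizes to any starting row. A minimal sketch, assuming a hypothetical helper named drop_staggered whose start argument is the row to remove from the first column:

```r
# Hypothetical helper: remove row `start` from column 1, row `start + 1`
# from column 2, and so on, then reassemble the shortened columns.
drop_staggered <- function(df, start) {
  cols <- lapply(seq_along(df), function(j) df[[j]][-(start + j - 1)])
  as.data.frame(setNames(cols, names(df)))
}

x <- c(4, 5, 6, 23, 5, 6, 7, 8, 0, 3)
y <- c(2, 4, 5, 6, 23, 5, 6, 7, 8, 0)
z <- c(1, 2, 4, 5, 6, 23, 5, 6, 7, 8)
df <- data.frame(x, y, z)

result <- drop_staggered(df, start = 4)
result
```

Each column loses exactly one element, so all columns keep the same length and can be recombined. Here start + ncol(df) - 1 must not exceed nrow(df).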

R, Using reshape to pull pre post data

I have a simple data frame as follows
x = data.frame(id = seq(1,10),val = seq(1,10))
x
id val
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
I want to add 4 more columns. The first 2 are the previous two rows and the next two are the next two rows. For the first two rows and last two rows it needs to write out as NA.
How do I accomplish this using cast in the reshape package?
The final output would look like
1 1 NA NA 2 3
2 2 NA 1 3 4
3 3 1 2 4 5
4 4 2 3 5 6
... and so on...
Thanks much in advance
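After you gave the example, I changed the solution:
# x is the data frame from the question; shifted columns are padded with NA
mat <- cbind(x,
  c(c(NA,NA),head(x$id,-2)),   # value two rows back
  c(c(NA),head(x$val,-1)),     # value one row back
  c(tail(x$id,-1),c(NA)),      # value one row ahead
  c(tail(x$val,-2),c(NA,NA)))  # value two rows ahead
colnames(mat) <- c('id','val','idp','valp','idn','valn')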
id val idp valp idn valn
1 1 1 NA NA 2 3
2 2 2 NA 1 3 4
3 3 3 1 2 4 5
4 4 4 2 3 5 6
5 5 5 3 4 6 7
6 6 6 4 5 7 8
7 7 7 5 6 8 9
8 8 8 6 7 9 10
9 9 9 7 8 10 NA
10 10 10 8 9 NA NA
Here is a solution with sapply. First, choose the relative change for the new columns:
lags <- c(-2, -1, 1, 2)
Create the new columns:
newcols <- sapply(lags,
function(l) {
tmp <- seq.int(nrow(x)) + l;
x[replace(tmp, tmp < 1 | tmp > nrow(x), NA), "val"]})
Bind together:
cbind(x, newcols)
The result:
id val 1 2 3 4
1 1 1 NA NA 2 3
2 2 2 NA 1 3 4
3 3 3 1 2 4 5
4 4 4 2 3 5 6
5 5 5 3 4 6 7
6 6 6 4 5 7 8
7 7 7 5 6 8 9
8 8 8 6 7 9 10
9 9 9 7 8 10 NA
10 10 10 8 9 NA NA
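Both answers shift a column up or down and pad the gap with NA. That step can be isolated in a small function. This is a sketch, assuming a hypothetical helper named shift and illustrative column names. A negative k looks back, a positive k looks ahead, and abs(k) is assumed to be smaller than the vector length.

```r
# Hypothetical helper: shift a vector by k positions, padding with NA.
shift <- function(v, k) {
  if (k > 0) {
    c(tail(v, -k), rep(NA, k))   # value k rows ahead
  } else if (k < 0) {
    c(rep(NA, -k), head(v, k))   # value |k| rows back
  } else {
    v
  }
}

x <- data.frame(id = seq(1, 10), val = seq(1, 10))
lags <- c(prev2 = -2, prev1 = -1, next1 = 1, next2 = 2)

# One new column per lag; the names of `lags` become the column names.
out <- cbind(x, lapply(lags, function(k) shift(x$val, k)))
head(out, 3)
```

Naming the lags gives readable column names, instead of the 1 to 4 that appear when the columns come from an unnamed sapply result.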
