How to replace the NA values after merge two data.frame? [duplicate] - r

This question already has answers here:
Replacing NAs with latest non-NA value
(21 answers)
Closed 7 years ago.
I have two data.frame as the following:
> a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
> a
x y
1 1 1
2 2 3
3 3 5
4 4 7
5 5 9
6 6 11
7 7 13
8 8 15
> b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
> b
x z
1 1 2
2 5 4
3 7 6
Then I use "join" for two data.frames:
> c <- join(a, b, by="x", type="left")
> c
x y z
1 1 1 2
2 2 3 NA
3 3 5 NA
4 4 7 NA
5 5 9 4
6 6 11 NA
7 7 13 6
8 8 15 NA
My requirement is to replace the NAs in the Z column by the last None-Na value before the current place. I want the result like this:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6

This time (if your data is not too large) a loop is an elegant option:
for(i in which(is.na(c$z))){
c$z[i] = c$z[i-1]
}
gives:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
data:
library(plyr)
a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
c <- join(a, b, by="x", type="left")
You might also want to check na.locf in the zoo package.

Related

Shift the value of the variable in R "A" instead of NA [duplicate]

This question already has answers here:
Replace a value NA with the value from another column in R
(5 answers)
Closed 3 months ago.
I need to put the value of variable "A" in place of the NA of variable "B".
Example of my dataframe:
> df <- data.frame(A = seq(1, 10), B = c(1, NA, 3, 4, NA, NA, 7, 8, NA, NA))
> df
A B
1 1 1
2 2 NA
3 3 3
4 4 4
5 5 NA
6 6 NA
7 7 7
8 8 8
9 9 NA
10 10 NA
I want the above dataframe converted into this:
> df
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
Using R base indexing
> df$B[is.na(df$B)] <- df$A[is.na(df$B)]
> df
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
Use coalesce
library(dplyr)
df <- df %>%
mutate(B = coalesce(B, A))
-output
df
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
I prefer coalesce. Here is one with an ifelse:
library(dplyr)
df %>%
mutate(B = ifelse(is.na(B), A, B))
A B
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10

How to find closest match from list in R

I have a list of numbers and would like to find which is the next highest compared to each number in a data.frame. I have:
list <- c(3,6,9,12)
X <- c(1:10)
df <- data.frame(X)
And I would like to add a variable to df being the next highest number in the list. i.e:
X Y
1 3
2 3
3 3
4 6
5 6
6 6
7 9
8 9
9 9
10 12
I've tried:
df$Y <- which.min(abs(list-df$X))
but that gives an error message and would just get the closest value from the list, not the next above.
Another approach is to use findInterval:
df$Y <- list[findInterval(X, list, left.open=TRUE) + 1]
> df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12
You could do this...
df$Y <- sapply(df$X, function(x) min(list[list>=x]))
df
X Y
1 1 3
2 2 3
3 3 3
4 4 6
5 5 6
6 6 6
7 7 9
8 8 9
9 9 9
10 10 12

subset function in R with more than one conditions [duplicate]

I have this data.frame:
a <- c(rep("1", 3), rep("2", 3), rep("3",3), rep("4",3), rep("5",3))
b <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
df <-data.frame(a,b)
a b
1 1 1
2 1 2
3 1 3
4 2 4
5 2 5
6 2 6
7 3 7
8 3 8
9 3 9
10 4 10
11 4 11
12 4 12
13 5 13
14 5 14
15 5 15
I want to have something like this:
a <- c(rep("2", 3), rep("3", 3))
b <- c(4,5,6,7,8,9)
dffinal<-data.frame(a,b)
a b
1 2 4
2 2 5
3 2 6
4 3 7
5 3 8
6 3 9
I could use the "subset" function, but its not working
sub <- subset(df,c(2,3) == a )
a b
5 2 5
8 3 8
This command only takes one row of "2" and "3" in column "a".
Any Help?
You're confusing == with %in%:
subset(df, a %in% c(2,3))
# a b
# 4 2 4
# 5 2 5
# 6 2 6
# 7 3 7
# 8 3 8
# 9 3 9
what about this?
library(dplyr)
df %>% filter(a == 2 | a==3)
a b
1 2 4
2 2 5
3 2 6
4 3 7
5 3 8
6 3 9
We can use data.table. We convert the 'data.frame' to 'data.table' (setDT(df)), and set the 'key' as column 'a', then we subset the rows.
library(data.table)
setDT(df, key= 'a')[c('2','3')]
# a b
#1: 2 4
#2: 2 5
#3: 2 6
#4: 3 7
#5: 3 8
#6: 3 9

Eliminate in an increasing order rows in a data frame

Eliminate in an increasing order rows in a data frame
x<-c(4,5,6,23,5,6,7,8,0,3)
y<-c(2,4,5,6,23,5,6,7,8,0)
z<-c(1,2,4,5,6,23,5,6,7,8)
df<-data.frame(x,y,z)
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 23 6 5
5 5 23 6
6 6 5 23
7 7 6 5
8 8 7 6
9 0 8 7
10 3 0 8
I would like to eliminate number 23 in the df from all columns by instructing to sequentially increasingly remove a row per column (not by matching the value 23, but by its initial x location).
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 5 6 5
5 6 5 6
6 7 6 5
7 8 7 6
8 0 8 7
9 3 0 8
Thank you
You can iterate through the columns and remove the element from each, then reassemble as a data frame:
result <- as.data.frame(lapply(1:ncol(df), function(x) df[-(x+3),x]))
names(result) <- names(df)
result
## x y z
## 1 4 2 1
## 2 5 4 2
## 3 6 5 4
## 4 5 6 5
## 5 6 5 6
## 6 7 6 5
## 7 8 7 6
## 8 0 8 7
## 9 3 0 8
df[-(x+3),x] is the column with the value removed, by location. To start with row N in column x you would use df[-(x+N-1),x].
You could also try:
n <- 4
df1 <- df[-n,]
df1[] <- unlist(df,use.names=FALSE)[-seq(n, prod(dim(df)), by=nrow(df)+1)]
df1
# x y z
#1 4 2 1
#2 5 4 2
#3 6 5 4
#5 5 6 5
#6 6 5 6
#7 7 6 5
#8 8 7 6
#9 0 8 7
#10 3 0 8

R, Using reshape to pull pre post data

I have a simple data frame as follows
x = data.frame(id = seq(1,10),val = seq(1,10))
x
id val
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
I want to add 4 more columns. The first 2 are the previous two rows and the next two are the next two rows. For the first two rows and last two rows it needs to write out as NA.
How do I accomplish this using cast in the reshape package?
The final output would look like
1 1 NA NA 2 3
2 2 NA 1 3 4
3 3 1 2 4 5
4 4 2 3 5 6
... and so on...
Thanks much in advance
After your give the example , I change the solution
mat <- cbind(dat,
c(c(NA,NA),head(dat$id,-2)),
c(c(NA),head(dat$val,-1)),
c(tail(dat$id,-1),c(NA)),
c(tail(dat$val,-2),c(NA,NA)))
colnames(mat) <- c('id','val','idp','valp','idn','valn')
id val idp valp idn valn
1 1 1 NA NA 2 3
2 2 2 NA 1 3 4
3 3 3 1 2 4 5
4 4 4 2 3 5 6
5 5 5 3 4 6 7
6 6 6 4 5 7 8
7 7 7 5 6 8 9
8 8 8 6 7 9 10
9 9 9 7 8 10 NA
10 10 10 8 9 NA NA
Here is a soluting with sapply. First, choose the relative change for the new columns:
lags <- c(-2, -1, 1, 2)
Create the new columns:
newcols <- sapply(lags,
function(l) {
tmp <- seq.int(nrow(x)) + l;
x[replace(tmp, tmp < 1 | tmp > nrow(x), NA), "val"]})
Bind together:
cbind(x, newcols)
The result:
id val 1 2 3 4
1 1 1 NA NA 2 3
2 2 2 NA 1 3 4
3 3 3 1 2 4 5
4 4 4 2 3 5 6
5 5 5 3 4 6 7
6 6 6 4 5 7 8
7 7 7 5 6 8 9
8 8 8 6 7 9 10
9 9 9 7 8 10 NA
10 10 10 8 9 NA NA

Resources