Replace NA in a variable with the value from other observations - r

I have a data frame like this one:
df <- data.frame(A = c(1, 2, 3, 4, 2, 2, 1, 5, 3),
                 B = c("a", "b", "c", "d", NA, "b", NA, NA, NA))
I want to replace the NA values in this data frame with the value recovered from the other observations.
For example, A = 1 corresponds to "a" in variable B, so the NA in that group should be replaced by "a".
But for A = 5 we can't conclude anything, so I keep NA.
How could I do this? I'm stuck.
Thank you.

You could try
df$B <- with(df, ave(as.character(B), A, FUN = function(x)
  ifelse(is.na(x), na.omit(x), x)))
Or using data.table
library(data.table)
setDT(df)[ ,B:=ifelse(is.na(B), na.omit(B), B) , A]
Or a variant would be
setDT(df)[,B:=if(any(is.na(B))) unique(na.omit(B)), A][]
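A base R sketch of the same idea, assuming each value of A maps to at most one distinct non-NA value of B. The helper name first_non_na is made up for this illustration and does not come from the answers above:

```r
df <- data.frame(A = c(1, 2, 3, 4, 2, 2, 1, 5, 3),
                 B = c("a", "b", "c", "d", NA, "b", NA, NA, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: return the first known value of the group,
# repeated to the group's length, or NA if the group has no known value.
first_non_na <- function(x) {
  known <- x[!is.na(x)]
  if (length(known) == 0L) {
    return(rep(NA_character_, length(x)))
  }
  rep(known[1L], length(x))
}

# ave() applies the helper within each level of A and keeps the row order.
df$B <- ave(df$B, df$A, FUN = first_non_na)
df
```

Groups such as A = 5, which have no known value at all, keep their NA.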

Related

How to apply a function to a data.table subset by multiple columns in R?

I have a data table with counts for changes for multiple groups. For example:
input <- data.table(from = c("A", "A", "A", "B", "B", "B", "A", "A", "A", "B", "B", "B"),
                    to = c(letters[1:6], letters[1:6]),
                    from_N = c(100, 100, 100, 50, 50, 50, 60, 60, 60, 80, 80, 80),
                    to_N = c(10, 20, 40, 5, 5, 15, 10, 5, 10, 20, 5, 10),
                    group = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))
How can I calculate the total for each change across groups? I can do this using a for loop, for example:
out <- list()
for (i in 1:length(unique(input$from))){
  sub <- input[from == unique(input$from)[i]]
  out2 <- list()
  for (j in 1:length(unique(sub$to))){
    sub2 <- sub[to == unique(sub$to)[j]]
    out2[[j]] <- data.table(from = sub2$from[1],
                            to = sub2$to[1],
                            from_N = sum(sub2$from_N),
                            to_N = sum(sub2$to_N))
    print(unique(sub$to)[j])
  }
  out[[i]] <- do.call("rbind", out2)
  print(unique(input$from)[i])
}
output <- do.call("rbind", out)
However, the data table I need to apply this to is very large, and I therefore need to maximise performance. Is there a data.table method? Any help will be greatly appreciated!
Perhaps I've overlooked something, but it seems you're just after:
library(data.table)
setDT(input)[, .(from_N = sum(from_N), to_N = sum(to_N)), by = .(from, to)]
Output:
   from to from_N to_N
1:    A  a    160   20
2:    A  b    160   25
3:    A  c    160   50
4:    B  d    130   25
5:    B  e    130   10
6:    B  f    130   25
An option with dplyr
library(dplyr)
input %>%
group_by(from, to) %>%
summarise_at(vars(ends_with('_N')), sum)
Or in data.table
library(data.table)
setDT(input)[, lapply(.SD, sum), by = .(from, to), .SDcols = patterns('_N$')]

Using matplot in R whenever certain column changes

Sorry in advance because I am new at asking questions here and don't know how to input this table properly.
Say I have a data frame in R constructed like:
team = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
value = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m = cbind(team, value)
I want to create a plot that will give me 3 lines graphing the values for teams A, B, and C. I believe I can do this inputting the matrix m into matplot somehow, but I'm not sure how.
EDIT: I've gotten a lot closer to solving my problem. However I've realized that for some reason, with the code I have, "Value" is a list of 745 which matches the number of rows in my dataframe m. However when I unlist(Value) it turns into a numeric of length 894. Any ideas on why this would happen?
You can try something like this:
team = c("A", "A", "A", "B", "B", "B", "C", "C", "C")
value = c(1, 2, 3, 4, 5, 6, 7, 8, 9)
m = cbind.data.frame(team, value)
library(ggplot2)
ggplot(m, aes(x=as.factor(1:nrow(m)), y=value, group=team, col=team)) +
geom_line(lwd=2) + xlab('index')
If you have the same number of ordered values for each team, you could use matplot to visualize them, but the data should be converted to a matrix first:
m = cbind.data.frame(team, value, index = rep(1:3, 3))
m <- reshape(m, v.names = 'value', idvar = 'team', direction = 'wide', timevar = 'index')
matplot(t(m[, 2:4]), type = 'l', lty = 1)
legend('top', legend = m[, 1], lty = 1, col = 1:3)
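An alternative sketch that builds the matrix without reshape, assuming every team has the same number of values and that they are already in plotting order. The variable name wide is chosen here for illustration. Note that cbind(team, value) in the question produces a character matrix, because a matrix can hold only one type, so the numeric values are kept in their own vector here:

```r
team  <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
value <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# split() groups the values by team; because all groups have equal length,
# sapply() simplifies the result to a matrix with one column per team.
wide <- sapply(split(value, team), identity)

# matplot() draws one line per column of the matrix.
matplot(wide, type = "l", lty = 1, col = 1:3, xlab = "index", ylab = "value")
legend("top", legend = colnames(wide), lty = 1, col = 1:3)
```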

count non-NA values and group by variable

I am trying to show how many complete observations there are per ID without using complete.cases() or any add-on package.
If I use na.omit to filter out the NA values, I will lose all of the IDs which might have ZERO complete cases.
In the end, I'd like a frequency table with two columns: ID and Number of Complete Observations
> length(unique(data$ID))
[1] 332
> head(data)
  ID value
1  1    NA
2  1    NA
3  1    NA
4  1    NA
5  1    NA
6  1    NA
> dim(data)
[1] 772087      2
When I create my own function z, which counts non-NA values, and apply it in the aggregate() function, the IDs with zero complete observations are left out. I should be left with 332 rows, not 323. How does one resolve this using base functions?
z <- function(x){
  sum(!is.na(x))
}
aggregate(value ~ ID, data = data , FUN = "z")
> nrow(aggregate(value ~ ID, data = data , FUN = "z"))
[1] 323
One way to do this is with table:
# select the "TRUE" column by name so this also works when no value is NA
df2 <- table(df$Id, !is.na(df$value))[, "TRUE"]
data.frame(ID = names(df2), value = df2)
Data
df <- structure(list(Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4), value = c(NA,
1, 1, 2, 2, NA, 3, NA, 3, 3, 4, 4)), .Names = c("Id", "value"
), row.names = c(NA, -12L), class = "data.frame")
In base R, you can use your utility function like this:
stack(by(data$value, data$ID, FUN=function(x) sum(!is.na(x))))
You can use table directly for this purpose. Below is the sample code:
df1 <- structure(list(Id = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 4), value = c(2,
1, 1, NA, NA, NA, 3, NA, 3, 3, 4, 4)), .Names = c("Id", "value"
), row.names = c(NA, -12L), class = "data.frame")
df2 <- as.data.frame.matrix(with(df1, table(Id, value)))
resultDf <- data.frame(Id=row.names(df2), count=apply(df2, 1, sum))
resultDf
The code makes a table of Id against value; table drops NA values by default, so summing each row gives the number of non-NA values per Id. An Id with no non-NA values still appears, with a count of 0.
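A likely reason the aggregate() call in the question drops IDs is that the formula method of aggregate() defaults to na.action = na.omit, which removes rows containing NA before grouping, so an ID whose values are all NA disappears. A minimal sketch, using a small made-up data frame rather than the question's data, that passes na.action = na.pass instead:

```r
# Toy data (invented for illustration): ID 1 has no complete observations.
data <- data.frame(ID = c(1, 1, 2, 2, 3),
                   value = c(NA, NA, 1, NA, 2))

z <- function(x) sum(!is.na(x))

# na.pass keeps the NA rows, so z() sees them and returns 0 for ID 1.
res <- aggregate(value ~ ID, data = data, FUN = z, na.action = na.pass)
res
```

With this change every ID gets a row, including those whose count is zero.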

Conditional displaying values in R

I'd like to see which values have a particular entry issue, but I can't get it to work.
For instance, I need to print the values from column "c", conditional on a given value of "b", say where b == 0.
Finally, I need to add a new string for the rows where the condition is true.
df<- structure(list(a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7,
11.43, 11.41, 10.48512, 11.19), b = c(2, 3, 2, 0, 0, 0, 1, 2,
4, 0), c = c("q", "c", "v", "f", "", "e", "e", "v", "a", "c")), .Names = c("a",
"b", "c"), row.names = c(NA, -10L), class = "data.frame")
I tried this without success:
if(df[b]==0){
print(df$c)
}
if((df[b]==0)&(df[c]=="v")){
df[c] <-paste("2")
}
Thanks for helping.
The correct syntax is like df[rows, columns], so you could try:
df[df$b==0, "c"]
You can accomplish changing values using ifelse:
df$c <- ifelse(df$b==0 & df$c=="v", paste(df$c, 2, sep=""), df$c)
Does this help?
rows <- which(df$b==0)
if (length(rows)>0) {
print(df$c[rows])
df$c[rows] <- paste(df$c[rows],'2')
## maybe you wanted to have:
# df$c[rows] <- '2'
}
There are several ways to subset data in R, e.g.:
df$c[df$b == 0]
df[df$b == 0, "c"]
subset(df, b == 0, c)
with(df, c[b == 0])
# ...
To conditionally add another column (here: TRUE/FALSE):
df$e <- FALSE; df$e[df$b == 0] <- TRUE
df <- transform(df, e = ifelse(b == 0, TRUE, FALSE))
df <- within(df, e <- ifelse(b == 0, TRUE, FALSE))
# ...

Merging columns and removing NA

I have a data frame:
A<- c(NA, 1, 2, NA, 3, NA)
R<- c(2, 1, 2, 1, NA, 1)
C<- c(rep ("B",3), rep ("D", 3))
data1<-data.frame (A,R,C)
data1
And I want to merge columns A and R, to get a data frame like data2:
AR<- c(2, 1, 2, 1, 3, 1)
C<- c(rep ("B",3), rep ("D", 3))
data2<-data.frame (AR,C)
data2
Do you know how I can do that?
You might want to consider what happens if "A" and "R" have different values, but this should work:
data2 <- with(data1, data.frame(AR=ifelse(is.na(A), R, A), C=C))
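One way to act on that caveat, sketched under the assumption that a disagreement between A and R should stop the script rather than be resolved silently. The name conflict is chosen here for illustration:

```r
A <- c(NA, 1, 2, NA, 3, NA)
R <- c(2, 1, 2, 1, NA, 1)
C <- c(rep("B", 3), rep("D", 3))
data1 <- data.frame(A, R, C)

# Rows where both columns are present but disagree.
conflict <- !is.na(data1$A) & !is.na(data1$R) & data1$A != data1$R
stopifnot(!any(conflict))

# Take A where it is known, otherwise fall back to R.
data2 <- with(data1, data.frame(AR = ifelse(is.na(A), R, A), C = C))
data2
```

In the example data the two columns agree wherever both are present, so the check passes and AR takes the value of whichever column is not NA.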
