I've searched longer than I'd like to admit for a way to shift leading NAs to the end of each row.
I got close with a few Stack questions ("Cut out outer NAs in R", "Rotate a Matrix in R", "na.locf remove leading NAs, keep others [closed]") and by looking over the na.trim function in the zoo package. Essentially I want to turn this:
D <- matrix(c(1:9), 3)
D[2,1]<- NA
D[3,1]<- NA
D[3,2]<- NA
D <- as.data.frame(D)
into this:
D1 <- data.frame(V1 = c(1,5,9),
V2 = c(4,8,NA),
V3 = c(7,NA,NA))
Any help is as always, much appreciated!
Thanks,
You can use sort(..., na.last = TRUE) within a row-wise apply:
as.data.frame(t(apply(D, 1, sort, na.last = TRUE)))
# V1 V2 V3
#1 1 4 7
#2 5 8 NA
#3 9 NA NA
Update
To avoid ordering non-NA entries, you can do:
# Revised sample data
D <- matrix(c(1:9), 3)
D[2,1]<- NA
D[3,1]<- NA
D[3,2]<- NA
D <- as.data.frame(D)
D[2,2:3] <- c(8, 5)
D
# V1 V2 V3
#1 1 4 7
#2 NA 8 5
#3 NA NA 9
as.data.frame(t(apply(D, 1, function(x) c(x[!is.na(x)], x[is.na(x)]))))
#V1 V2 V3
#1 1 4 7
#2 8 5 NA
#3 9 NA NA
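As a possible variant of the same idea, the reordering can be written with order(is.na(x)). This relies on order() leaving ties in their original order, so the non-NA values keep their relative positions. The helper name shift_na_last is made up for this sketch:

```r
# Revised sample data from above, built directly
D <- as.data.frame(matrix(c(1, NA, NA, 4, 8, NA, 7, 5, 9), 3))

# Hypothetical helper: move NAs to the end without sorting the other values.
# is.na(x) is FALSE for values and TRUE for NAs, so ordering on it puts the
# values first; ties stay in their original order.
shift_na_last <- function(x) x[order(is.na(x))]

res <- as.data.frame(t(apply(D, 1, shift_na_last)))
res
```

Like the apply() call above, this goes through a matrix, so it is only a reasonable fit when all columns share one type.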
I have a dataframe
df= data.frame(a=c(56,23,15,10),
b=c(43,NA,90.7,30.5),
c=c(12,7,10,2),
d=c(1,2,3,4),
e=c(NA,45,2,NA))
I want to select two random non-NA row values from each row and convert the rest to NA
Required Output- will differ because of randomness
df= data.frame(
a=c(56,NA,15,NA),
b=c(43,NA,NA,NA),
c=c(NA,7,NA,2),
d=c(NA,NA,3,4),
e=c(NA,45,NA,NA))
Code Used
I know to select random non-NA value from specific rows
set.seed(2)
sample(which(!is.na(df[1,])),2)
But I have no idea how to apply it to the whole data frame and get the required output
You may write a function to keep n random values in a row.
keep_n_value <- function(x, n) {
x1 <- which(!is.na(x))
x[-sample(x1, n)] <- NA
x
}
Apply the function by row using base R -
set.seed(123)
df[] <- t(apply(df, 1, keep_n_value, 2))
df
# a b c d e
#1 NA NA 12 1 NA
#2 NA NA 7 2 NA
#3 NA 90.7 10 NA NA
#4 NA 30.5 NA 4 NA
Or if you prefer tidyverse -
purrr::pmap_df(df, ~keep_n_value(c(...), 2))
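One caveat with sample(x1, n): when x1 has length one, sample() treats it as the range 1:x1, and when a row has fewer than n non-NA values it throws an error. A minimal sketch of a more defensive variant follows; the name keep_n_safe is an assumption for illustration, not part of the answer above:

```r
# Hypothetical variant of keep_n_value that tolerates sparse rows
keep_n_safe <- function(x, n) {
  idx <- which(!is.na(x))
  # n or fewer non-NA values: there is nothing to drop
  if (length(idx) <= n) return(x)
  # sample.int() draws positions within idx, which avoids sample()'s
  # special handling of a single number
  keep <- idx[sample.int(length(idx), n)]
  x[setdiff(seq_along(x), keep)] <- NA
  x
}

df <- data.frame(a = c(56, 23, 15, 10),
                 b = c(43, NA, 90.7, 30.5),
                 c = c(12, 7, 10, 2),
                 d = c(1, 2, 3, 4),
                 e = c(NA, 45, 2, NA))

set.seed(1)
res <- df
res[] <- t(apply(df, 1, keep_n_safe, n = 2))
res
```

Which cells survive depends on the seed; the only guarantee is that each row keeps at most two of its original values.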
Base R:
You could try a column-wise apply (sapply) that randomly replaces two non-NA values in each column with NA, like:
as.data.frame(sapply(df, function(x) replace(x, sample(which(!is.na(x)), 2), NA)))
Example Output:
a b c d e
1 56 NA 12 NA NA
2 23 NA NA 2 NA
3 NA NA 10 3 NA
4 NA 30.5 NA NA NA
One option using dplyr and purrr could be:
library(dplyr)
library(purrr)
df %>%
mutate(pmap_dfr(across(everything()), ~ `[<-`(c(...), !seq_along(c(...)) %in% sample(which(!is.na(c(...))), 2), NA)))
a b c d e
1 56 43.0 NA NA NA
2 23 NA 7 NA NA
3 15 NA NA NA 2
4 NA 30.5 2 NA NA
I need to combine some named numeric vectors in R into a data frame. I tried cbind.na as suggested in another question, but it does not take names into account. Example:
v1 <- c(1,5,6,7)
names(v1) <- c("milk", "flour", "eggs", "sugar")
v2 <- c(2,3)
names(v2) <- c("fish", "chips")
v3 <- c(5,7,4)
names(v3) <- c("chips", "milk", "sugar")
The data frame should look like this
v1 v2 v3
milk 1 NA 7
flour 5 NA NA
eggs 6 NA NA
sugar 7 NA 4
fish NA 2 NA
chips NA 3 5
I can't figure out how to solve this in R.
This is a join, best done with data.table or other add-ins, but (especially for smallish arrays) can readily be performed in base R by creating an array of all the names and using it to index into the input arrays:
s <- unique(names(c(v1,v2,v3)))
x <- cbind(v1=v1[s], v2=v2[s], v3=v3[s])
rownames(x) <- s
print(x)
v1 v2 v3
milk 1 NA 7
flour 5 NA NA
eggs 6 NA NA
sugar 7 NA 4
fish NA 2 NA
chips NA 3 5
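The same name-indexing idea extends to any number of vectors if they are kept in a named list. This is only a sketch; the list name vs is an assumption:

```r
v1 <- c(milk = 1, flour = 5, eggs = 6, sugar = 7)
v2 <- c(fish = 2, chips = 3)
v3 <- c(chips = 5, milk = 7, sugar = 4)

# Hypothetical container for the inputs
vs <- list(v1 = v1, v2 = v2, v3 = v3)

# All names that occur in any vector, in order of first appearance
s <- unique(unlist(lapply(vs, names)))

# Index each vector by the full name set; missing names come back as NA
x <- sapply(vs, function(v) unname(v[s]))
rownames(x) <- s

as.data.frame(x)
```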
# get vectors into one list
v <- mget(paste0('v', 1:3))
# convert vectors to data frames
l <- lapply(v, stack)
# merge them all sequentially
out <- Reduce(function(x, y) merge(x, y, by = 'ind', all = T), l)
# name the columns according to the original vector names
setNames(out, c('ind', names(v)))
# ind v1 v2 v3
# 1 milk 1 NA 7
# 2 flour 5 NA NA
# 3 eggs 6 NA NA
# 4 sugar 7 NA 4
# 5 fish NA 2 NA
# 6 chips NA 3 5
Edit:
I think this is worse than whuber's solution because it requires creating a bunch of intermediate tables, both in the lapply step and in the Reduce step. Haven't done any benchmarks though.
I would like to convert the following list into a dataframe.
test <- list(list("a",c("b","c","d"),character(0)),list("j",c("r","s"),character(0)),list("h",character(0),"i"))
I tried the following:
df.test <- do.call(rbind,Map(data.frame, V1=sapply(test, "[[", 1),V2=sapply(test, "[[", 2),V3=sapply(test, "[[", 3)))
But this doesn't work with nested lists containing character(0). A satisfactory output looks something like this:
V1 V2 V3
1 a b NA
2 a c NA
3 a d NA
4 j r NA
5 j s NA
6 h NA i
Many thanks in advance.
library(tidyverse)
test %>%
map_df(~.x %>%
map(~if(length(.)) . else NA) %>%
do.call(what = cbind) %>%
as_tibble)
Gives
V1 V2 V3
<chr> <chr> <chr>
1 a b NA
2 a c NA
3 a d NA
4 j r NA
5 j s NA
6 h NA i
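A base R sketch of the same approach, in case the tidyverse is not available: replace each empty element with NA, then let data.frame() recycle the length-one entries. The helper name to_rows is made up here:

```r
test <- list(list("a", c("b", "c", "d"), character(0)),
             list("j", c("r", "s"), character(0)),
             list("h", character(0), "i"))

# Hypothetical helper: turn one inner list into a small data frame
to_rows <- function(el) {
  # character(0) has length 0, so swap it for a single NA
  el <- lapply(el, function(v) if (length(v)) v else NA_character_)
  names(el) <- paste0("V", seq_along(el))
  # length-one columns are recycled to match the longest one
  as.data.frame(el, stringsAsFactors = FALSE)
}

res <- do.call(rbind, lapply(test, to_rows))
res
```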
I have the following question:
I have a list (L1) with two parts, each containing the same 4 variables.
Variable 4 is also the name of the part of the list, e.g. $a = a
a <- data.frame(V1=c("a","b","c"), V2=c(4,7,9), V3=1:3, V4=c("a","a","a"))
b <- data.frame(V1=c("d","e","f"), V2=c(10,14,16), V3=1:3, V4=c("b","b","b"))
L1 <- list(a=a, b=b)
L1
$a
V1 V2 V3 V4
a 4 1 a
b 7 2 a
c 9 3 a
$b
V1 V2 V3 V4
d 10 1 b
e 14 2 b
f 16 3 b
I would like to extract the rows of each part of the list where V3==2. If a part has no row with this value, V1 to V3 should be returned as NA and V4 should contain the name of that part of the list.
In the example the outcome should look like this:
V1 V2 V3 V4
b 7 2 a
e 14 2 b
If I select a value e.g. V3==4 then my result should look like this:
V1 V2 V3 V4
<NA> <NA> <NA> a
<NA> <NA> <NA> b
I can extract a column with
unlist(lapply(L1, "[",3)) but I can't figure out how to extract rows which have a certain value in a variable.
I also tried to combine lapply with the subset function, but this didn't work for me.
Thanks for your help!
This should work. The first command returns a list, the second one converts it to a data frame. If the value is not in the data, it returns NA (for the list) or a row of NAs (for the df).
l <- lapply(L1, function(x) {i <- which(x$V3 == 2)
if (length(i) > 0) x[i, ]
else NA })
df <- rbind(l[[1]], l[[2]])
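If the NA rows should still carry the name of the list element in V4, as the question asks, one possible base R sketch is the following. The function name pick_rows is an assumption, and it presumes every element has the columns V3 and V4:

```r
a <- data.frame(V1 = c("a", "b", "c"), V2 = c(4, 7, 9), V3 = 1:3, V4 = "a")
b <- data.frame(V1 = c("d", "e", "f"), V2 = c(10, 14, 16), V3 = 1:3, V4 = "b")
L1 <- list(a = a, b = b)

# Hypothetical helper: matching rows per element, or one NA row tagged in V4
pick_rows <- function(lst, val) {
  rows <- Map(function(d, nm) {
    hit <- d[which(d$V3 == val), , drop = FALSE]
    if (nrow(hit) == 0) {
      # indexing with NA yields a single all-NA row with the same columns
      hit <- d[NA_integer_, , drop = FALSE]
      hit$V4 <- nm
    }
    hit
  }, lst, names(lst))
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

pick_rows(L1, 2)
pick_rows(L1, 4)
```

Because it uses do.call(rbind, ...), this also works for lists with more than two elements.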
We could create a function using data.table. We rbind the list elements with rbindlist and group by 'V4'. If no 'V3' in the group equals the given value, we return a row of NA elements (.SD[.N+1]); otherwise we return the matching Subset of Data.table (.SD[tmp]).
library(data.table)
f1 <- function(lst, val){
rbindlist(lst)[, {tmp <- V3==val
if(!any(tmp)) .SD[.N+1]
else .SD[tmp]},
by = V4][, names(lst[[1]]), with=FALSE]
}
f1(L1, 4)
# V1 V2 V3 V4
#1: NA NA NA a
#2: NA NA NA b
f1(L1, 3)
# V1 V2 V3 V4
#1: c 9 3 a
#2: f 16 3 b
f1(L1, 2)
# V1 V2 V3 V4
#1: b 7 2 a
#2: e 14 2 b
You can also use bind_rows with dplyr:
library(dplyr)
list(a = a, b = b) %>%
bind_rows(.id = "source") %>%
filter(V3 == 2)
I would like to merge multiple data frames in R using row.names, doing a full outer join. For this I was hoping to do the following:
x = as.data.frame(t(data.frame(a=10, b=13, c=14)))
y = as.data.frame(t(data.frame(a=1, b=2)))
z = as.data.frame(t(data.frame(a=3, b=4, c=3, d=11)))
res = Reduce(function(a,b) merge(a,b,by="row.names",all=T), list(x,y,z))
Warning message:
In merge.data.frame(a, b, by = "row.names", all = T) :
column name ‘Row.names’ is duplicated in the result
> res
Row.names Row.names V1.x V1.y V1
1 1 a 10 1 NA
2 2 b 13 2 NA
3 3 c 14 NA NA
4 a <NA> NA NA 3
5 b <NA> NA NA 4
6 c <NA> NA NA 3
7 d <NA> NA NA 11
What I was hoping to get would be:
V1 V2 V3
a 10 1 3
b 13 2 4
c 14 NA 3
d NA NA 11
The following works (up to some final column renaming):
res <- Reduce(function(a,b){
ans <- merge(a,b,by="row.names",all=T)
row.names(ans) <- ans[,"Row.names"]
ans[,!names(ans) %in% "Row.names"]
}, list(x,y,z))
Indeed:
> res
V1.x V1.y V1
a 10 1 3
b 13 2 4
c 14 NA 3
d NA NA 11
What happens with a row join is that a column with the original rownames is added in the answer, which in turn does not contain row names:
> merge(x,y,by="row.names",all=T)
Row.names V1.x V1.y
1 a 10 1
2 b 13 2
3 c 14 NA
This behavior is documented in ?merge (under Value):
If the matching involved row names, an extra character column called
Row.names is added at the left, and in all cases the result has
‘automatic’ row names.
When Reduce tries to merge again, it doesn't find any match unless the names are cleaned up manually.
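To make the clean-up step concrete, here is a small sketch that restores the row names after every merge and also gives each input a unique column name up front, so no .x/.y suffixes appear. It assumes each data frame has a single column; the names dfs and merge_rn are made up for illustration:

```r
x <- data.frame(V1 = c(10, 13, 14), row.names = c("a", "b", "c"))
y <- data.frame(V1 = c(1, 2), row.names = c("a", "b"))
z <- data.frame(V1 = c(3, 4, 3, 11), row.names = c("a", "b", "c", "d"))

dfs <- list(x, y, z)
# Rename the single column of each input to V1, V2, V3, ...
for (i in seq_along(dfs)) names(dfs[[i]]) <- paste0("V", i)

# Hypothetical helper: full outer join on row names that keeps them as row names
merge_rn <- function(a, b) {
  ans <- merge(a, b, by = "row.names", all = TRUE)
  # merge() moves the row names into a Row.names column; move them back
  rownames(ans) <- as.character(ans$Row.names)
  ans[, names(ans) != "Row.names", drop = FALSE]
}

res <- Reduce(merge_rn, dfs)
res
```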
For continuity: this is not a clean solution but a workaround. I transform the list argument of 'Reduce' using sapply.
Reduce(function(a,b) merge(a,b,by=0,all=T),
sapply(list(x,y,z),rbind))[,-c(1,2)]
x y.x y.y
1 10 1 3
2 13 2 4
3 14 NA 3
4 NA NA 11
Warning message:
In merge.data.frame(a, b, by = 0, all = T) :
column name ‘Row.names’ is duplicated in the result
For some reason I did not have much success with Reduce. Given a list of data.frames (df.lst) and a list of suffixes (suff.lst) to change the names of identical columns, this is my solution (it's a loop, which I know is ugly by R standards, but it works). Note that by.x and by.y are left as empty strings here and need to be set to the name of the key column:
df.merg <- as.data.frame(df.lst[1])
colnames(df.merg)[-1] <- paste(colnames(df.merg)[-1],suff.lst[[1]],sep="")
for (i in 2:length(df.lst)) {
df.i <- as.data.frame(df.lst[i])
colnames(df.i)[-1] <- paste(colnames(df.i)[-1],suff.lst[[i]],sep="")
df.merg <- merge(df.merg, df.i, by.x="",by.y="", all=T)
}
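For comparison, the same suffix-then-merge idea can be sketched without an explicit loop. Everything in the example data is an assumption: the key column is called id, it is the first column of every data frame, and the suffixes are arbitrary:

```r
# Hypothetical inputs; "id" is an assumed key column name
df.lst <- list(data.frame(id = c("a", "b"), val = 1:2),
               data.frame(id = c("b", "c"), val = 3:4))
suff.lst <- list("_x", "_y")

# Append a suffix to every column except the first (the key)
add_suffix <- function(d, s) {
  names(d)[-1] <- paste0(names(d)[-1], s)
  d
}

renamed <- Map(add_suffix, df.lst, suff.lst)

# Full outer join of all data frames on the key column
merged <- Reduce(function(a, b) merge(a, b, by = "id", all = TRUE), renamed)
merged
```

Renaming before merging means merge() never has to invent its own .x/.y suffixes, which is what tends to cause duplicate-name warnings when more than two data frames are combined.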