Create NA's based on another dataframe without long data

I have a tibble listing the explicit "id" values and column names of the cells I need to convert to NAs. Is there any way I can create the NAs without turning my df into a long dataset? I considered the new rows_update() function, but I'm not sure it fits, because I only want certain columns to be NA.
library(dplyr)
to_na <- tribble(~x, ~col,
1, "z",
3, "y"
)
df <- tibble(x = c(1,2,3),
y = c(1,1,1),
z = c(2,2,2))
# desired output:
#> # A tibble: 3 x 3
#> x y z
#> <dbl> <dbl> <dbl>
#> 1 1 1 NA
#> 2 2 1 2
#> 3 3 NA 2
Created on 2020-07-03 by the reprex package (v0.3.0)

This definitely isn't the most elegant solution, but it gets the output you want.
library(dplyr)
library(purrr)
to_na <- tribble(~x, ~col,
1, "z",
3, "y"
)
df <- tibble(x = c(1,2,3),
y = c(1,1,1),
z = c(2,2,2))
map2(to_na$x, to_na$col,  # pass through these two vectors in parallel
     function(xval_to_missing, col) df %>%  # elements of the two vectors matched by position
       mutate_at(col,  # modify only the specified column
                 ~ if_else(x == xval_to_missing, NA_real_, .)) %>%  # NA where x matches, else keep as is
       select(x, all_of(col))  # keep x and the modified column
) %>%  # end of map2
  reduce(left_join, by = "x") %>%  # merge the list elements by x
  relocate(x, y, z)  # restore the original column order
Output:
# A tibble: 3 x 3
x y z
<dbl> <dbl> <dbl>
1 1 1 NA
2 2 1 2
3 3 NA 2

We can use row/column matrix indexing to assign NA in base R. This relies on the values in to_na$x also being the row positions in df:
df <- as.data.frame(df)
df[cbind(to_na$x, match(to_na$col, names(df)))] <- NA
df
# x y z
#1 1 1 NA
#2 2 1 2
#3 3 NA 2
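The matrix indexing above uses to_na$x directly as row positions. A minimal base R sketch that looks the rows up by their x value instead, so it also works when the ids are not 1, 2, 3 in row order. The helper set_na_by_key() and its arguments are hypothetical names, the ids are assumed to be unique, and the shuffled x values are made up for this sketch:

```r
# Hypothetical helper: set cells to NA given (id, column name) pairs.
# Assumes the values in df[[key]] identify rows uniquely.
set_na_by_key <- function(df, pairs, key = "x", id_col = "x", name_col = "col") {
  df <- as.data.frame(df)
  rows <- match(pairs[[id_col]], df[[key]])    # row position of each id
  cols <- match(pairs[[name_col]], names(df))  # column position of each name
  if (anyNA(rows) || anyNA(cols)) stop("unknown id or column name in `pairs`")
  df[cbind(rows, cols)] <- NA                  # two-column matrix index: one cell per pair
  df
}

# Rows deliberately out of order, so positions and ids differ
df <- data.frame(x = c(3, 1, 2), y = c(1, 1, 1), z = c(2, 2, 2))
to_na <- data.frame(x = c(1, 3), col = c("z", "y"), stringsAsFactors = FALSE)
res <- set_na_by_key(df, to_na)
res
```

Because match() translates ids to positions, the same call works on the question's df, where the two coincide.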
If we want to use rows_update
library(dplyr)
library(tidyr)
library(purrr)
lst1 <- to_na %>%
mutate(new = NA_real_) %>%
split(seq_len(nrow(.))) %>%
map(~ .x %>%
pivot_wider(names_from = col, values_from = new))
for (i in seq_along(lst1)) df <- rows_update(df, lst1[[i]], by = "x")
df
# A tibble: 3 x 3
# x y z
# <dbl> <dbl> <dbl>
#1 1 1 NA
#2 2 1 2
#3 3 NA 2
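The loop above applies one (id, column) pair per iteration. For comparison, here is a base R sketch of the same per-pair update without reshaping anything, assuming x identifies each row:

```r
df <- data.frame(x = c(1, 2, 3), y = c(1, 1, 1), z = c(2, 2, 2))
to_na <- data.frame(x = c(1, 3), col = c("z", "y"), stringsAsFactors = FALSE)

# One (id, column) pair per iteration: blank the cell in the row whose x matches
for (i in seq_len(nrow(to_na))) {
  df[df$x == to_na$x[i], to_na$col[i]] <- NA
}
df
```

This scales with the number of pairs rather than the size of df, and it never converts df to long format.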

Related

Simultaneous Count and Sort in R

I am trying to obtain counts of a certain categorical variable in 2 separate columns, with each column reflecting the presence or absence of an indicator variable. This is for a very large data frame. Here is an example data frame to further illustrate what I'm trying to do.
X <- (1:10)
Y <- c('a','b','a','c','b','b','a','a','c','c')
Z <- c(0,1,1,1,0,1,0,1,1,1)
test_df <- data.frame(X,Y,Z)
I would like to make a new data frame grouped by 'a', 'b', and 'c' with two columns to the right: one with the count of that letter where Z == 1, and one with the count where Z == 0.

The dplyr way:
library(dplyr)
library(tidyr)
#Code
res <- test_df %>% group_by(Y,Z) %>% summarise(N=n()) %>%
pivot_wider(names_from = Z,values_from=N,
values_fill = 0)
Output:
# A tibble: 3 x 3
# Groups: Y [3]
Y `0` `1`
<chr> <int> <int>
1 a 2 2
2 b 1 2
3 c 0 3

We can use values_fn in pivot_wider to do this in a single step
library(dplyr)
library(tidyr)
test_df %>%
pivot_wider(names_from = Z, values_from = X,
values_fn = length, values_fill = 0)
# A tibble: 3 x 3
# Y `0` `1`
# <chr> <int> <int>
#1 a 2 2
#2 b 1 2
#3 c 0 3

A base R option using aggregate + reshape
replace(
u <- reshape(
aggregate(X ~ ., test_df, length),
idvar = "Y",
timevar = "Z",
direction = "wide"
),
is.na(u),
0
)
giving
Y X.0 X.1
1 a 2 2
2 b 1 2
5 c 0 3

One way with data.table:
library(data.table)
setDT(test_df)
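test_df[ , z1 := sum(Z==1), by=Y]
test_df[ , z0 := sum(Z==0), by=Y]
# := adds the group counts to every row; keep one row per Y for a summary table
unique(test_df[ , .(Y, z0, z1)])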

In base R you can use table():
table(test_df$Y, test_df$Z)
# 0 1
# a 2 2
# b 1 2
# c 0 3
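table() returns a contingency table rather than a data frame. If you need a data frame with Y as an ordinary column, one option is as.data.frame.matrix(), which keeps the wide layout (plain as.data.frame() on a table returns the long Var1/Var2/Freq format instead):

```r
X <- 1:10
Y <- c('a','b','a','c','b','b','a','a','c','c')
Z <- c(0,1,1,1,0,1,0,1,1,1)
test_df <- data.frame(X, Y, Z)

tab <- table(test_df$Y, test_df$Z)
wide <- as.data.frame.matrix(tab)   # one column per value of Z, rows named by Y
wide <- cbind(Y = rownames(wide), wide)
rownames(wide) <- NULL              # Y is now a regular column
wide
```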

R: How to insert a row in Dataframe starting at a certain column?

I have the following data frame:
df <- tibble(x = 1:3, y = 3:1, z = 4:6, a = 6:4, b = 7:9)
I now need to extract the values from the second row, third to fifth column with this command:
newrow <- df[2,3:5]
I now want to insert a new row after the second row. The problem is that I need the new row to start at column 2. If I use the following code, the row will be added at the same column positions as I extracted it from:
df%>% add_row(newrow, .before = 3)
I hope someone can help with this; any help is much appreciated.

Your newrow data frame has the column names of columns 3:5 (z, a, b), so add_row() matches newrow to these columns.
To make the new row start at column 2, rename the columns of newrow with the names of columns 2 to 4 (y, z, a):
df %>% add_row(setNames(newrow, names(df)[seq_len(ncol(newrow)) + 1]),
               .before = 3)
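For comparison, a base R sketch of the same idea without tibble or dplyr: copy the values into an all-NA row, shifted one column to the left, and rbind the pieces. insert_shifted_row() and its arguments are hypothetical names, and the sketch assumes before is between 1 and nrow(df):

```r
# Hypothetical helper: copy row `from_row`, columns `cols`, into a new row,
# moved `shift` columns to the left, and insert it before row `before`.
insert_shifted_row <- function(df, from_row, cols, shift, before) {
  new_row <- df[from_row, , drop = FALSE]
  new_row[1, ] <- NA                            # blank every cell, column types kept
  new_row[cols - shift] <- df[from_row, cols]   # shifted copy of the values
  top <- df[seq_len(before - 1), , drop = FALSE]
  bottom <- df[seq(before, nrow(df)), , drop = FALSE]
  out <- rbind(top, new_row, bottom)            # rbind matches columns by name
  rownames(out) <- NULL
  out
}

df <- data.frame(x = 1:3, y = 3:1, z = 4:6, a = 6:4, b = 7:9)
res <- insert_shifted_row(df, from_row = 2, cols = 3:5, shift = 1, before = 3)
res
```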

I'm not sure exactly what your desired outcome is, but does this achieve what you want?
library(tibble)
library(dplyr)
df <- tibble::tibble(x = 1:3, y = 3:1, z = 4:6, a = 6:4, b = 7:9)
whatrow <- 2
whatcolumns <- 3:5
beforerow <- 3
newdf <-
slice(df, whatrow) %>%
select(all_of(whatcolumns)) %>%
setNames(., names(df)[whatcolumns - 1]) %>%
add_row(df, ., .before = beforerow)
newdf
#> # A tibble: 4 x 5
#> x y z a b
#> <int> <int> <int> <int> <int>
#> 1 1 3 4 6 7
#> 2 2 2 5 5 8
#> 3 NA 5 5 8 NA
#> 4 3 1 6 4 9

Replacing value depending on paired column

I have a dataframe with two columns per sample (n > 1000 samples):
df <- data.frame(
"sample1.a" = 1:5, "sample1.b" = 2,
"sample2.a" = 2:6, "sample2.b" = c(1, 3, 3, 3, 3),
"sample3.a" = 3:7, "sample3.b" = 2)
If there is a zero in column .b, the corresponding value in column .a should be set to NA.
I thought about writing a function over the column names (without suffix) to filter each pair of columns and conditionally exchange values. Is there a simpler approach based on the tidyverse?

We can split the data.frame into a list of data.frames (one per sample prefix) and do the replacement in base R:
df1 <- do.call(cbind, lapply(split.default(df, sub("\\..*", "", names(df))),
                             function(x) {
                               x[, 1][x[, 2] == 0] <- NA  # column 1 is .a, column 2 is .b
                               x
                             }))
Another option is Map():
acols <- endsWith(names(df), ".a")
bcols <- endsWith(names(df), ".b")
df[acols] <- Map(function(x, y) replace(x, y == 0, NA), df[acols], df[bcols])
Or, if the 'a' and 'b' columns alternate, use a recycled logical index to select them: build a logical matrix from the 'b' columns and set the corresponding values in the 'a' columns to NA:
df[c(TRUE, FALSE)][df[c(FALSE, TRUE)] == 0] <- NA
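The recycled logical index relies on the columns strictly alternating .a, .b. A minimal sketch that pairs the columns by name instead, assuming every .a column has a .b partner with the same prefix. The zeros in sample2.b and sample3.b are made up for this sketch so that the replacement is visible (the question's data contains none):

```r
df <- data.frame(
  "sample1.a" = 1:5, "sample1.b" = 2,
  "sample2.a" = 2:6, "sample2.b" = c(1, 0, 3, 3, 3),
  "sample3.a" = 3:7, "sample3.b" = c(0, 2, 2, 2, 2))

a_cols <- grep("\\.a$", names(df), value = TRUE)
b_cols <- sub("\\.a$", ".b", a_cols)        # partner column for each .a column
stopifnot(all(b_cols %in% names(df)))       # every .a needs a matching .b

# Logical matrix with the same shape as df[a_cols]: TRUE where the partner is 0
df[a_cols][as.matrix(df[b_cols]) == 0] <- NA
df
```

Pairing by name means the column order no longer matters, which helps when there are more than 1000 samples and the columns may have been reordered.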
Or an option with the tidyverse: reshape into 'long' format (pivot_longer), change 'a' to NA wherever the corresponding 'b' is 0, and reshape back to 'wide' format with pivot_wider:
library(dplyr)
library(tidyr)
df %>%
  mutate(rn = row_number()) %>%
  pivot_longer(cols = -rn, names_sep = "\\.",
               names_to = c('group', '.value')) %>%
  mutate(a = replace(a, b == 0, NA)) %>%  # NA in 'a' where the paired 'b' is 0
  pivot_wider(names_from = group, values_from = c(a, b)) %>%
  select(-rn)
# A tibble: 5 x 6
#   a_sample1 a_sample2 a_sample3 b_sample1 b_sample2 b_sample3
#       <int>     <int>     <int>     <dbl>     <dbl>     <dbl>
#1          1         2         3         2         1         2
#2          2         3         4         2         3         2
#3          3         4         5         2         3         2
#4          4         5         6         2         3         2
#5          5         6         7         2         3         2
With the example data no 'b' value is 0, so the 'a' columns come back unchanged.

Group data by factor level, then transform to data frame with colname being levels?

Here is a problem I can't solve:
Data:
df <- data.frame(f1=c("a", "a", "b", "b", "c", "c", "c"),
v1=c(10, 11, 4, 5, 0, 1, 2))
Here f1 is a factor (since R 4.0.0, data.frame() only creates factors from strings when stringsAsFactors = TRUE):
f1 v1
a 10
a 11
b 4
b 5
c 0
c 1
c 2
What I want is (for example, keep the levels that have exactly 2 elements, then turn them into a data frame):
a b
10 4
11 5
Thanks in advance!

I might be missing something simple here, but the approach below, using dplyr and tidyr, works.
library(dplyr)
library(tidyr)  # spread() comes from tidyr
nlevels = 2
df1 <- df %>%
add_count(f1) %>%
filter(n == nlevels) %>%
select(-n) %>%
mutate(rn = row_number()) %>%
spread(f1, v1) %>%
select(-rn)
This gives
# a b
# <int> <int>
#1 10 NA
#2 11 NA
#3 NA 4
#4 NA 5
Now, if you want to remove the NAs, we can do:
do.call("cbind.data.frame", lapply(df1, function(x) x[!is.na(x)]))
# a b
#1 10 4
#2 11 5
Because we kept only the levels that have exactly nlevels observations, every column has the same number of non-NA values, so the columns line up in the final data frame.

split might be useful here to split df$v1 into parts corresponding to df$f1. Since you are always extracting equal-length chunks, they can then simply be combined back into a data.frame:
spl <- split(df$v1, df$f1)
data.frame(spl[lengths(spl)==2])
# a b
#1 10 4
#2 11 5
Or do it all in one call by combining this with Filter:
data.frame(Filter(function(x) length(x)==2, split(df$v1, df$f1)))
# a b
#1 10 4
#2 11 5
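The split() idea can be wrapped so that the group size is a parameter. A minimal sketch; keep_groups_of_size() is a hypothetical name, and because all kept groups have the same length n, data.frame() can bind them column-wise:

```r
# Hypothetical helper: keep the levels of `groups` with exactly n values
# and return them as the columns of a data frame.
keep_groups_of_size <- function(values, groups, n) {
  parts <- split(values, groups)           # one vector per level
  data.frame(parts[lengths(parts) == n])   # equal lengths -> rectangular result
}

df <- data.frame(f1 = c("a", "a", "b", "b", "c", "c", "c"),
                 v1 = c(10, 11, 4, 5, 0, 1, 2))
two <- keep_groups_of_size(df$v1, df$f1, 2)
three <- keep_groups_of_size(df$v1, df$f1, 3)
two
three
```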

Here is a solution using unstack():
unstack(
droplevels(df[ave(df$v1, df$f1, FUN = function(x) length(x) == 2)==1,]),
v1 ~ f1)
# a b
# 1 10 4
# 2 11 5
A variant, similar to @thelatemail's solution:
data.frame(Filter(function(x) length(x) == 2, unstack(df,v1 ~ f1)))
My tidyverse solution would be:
library(tidyverse)
df %>%
group_by(f1) %>%
filter(n() == 2) %>%
mutate(i = row_number()) %>%
spread(f1, v1) %>%
select(-i)
# # A tibble: 2 x 2
# a b
# * <dbl> <dbl>
# 1 10 4
# 2 11 5
Or, mixing approaches:
as_tibble(keep(unstack(df,v1 ~ f1), ~length(.x) == 2))

Using only base functions (but you should use the tidyverse):
# Work on a copy of the question's data frame
x <- df
# Add count of instances
x$len <- ave(x$v1, x$f1, FUN = length)
# Filter, drop the count
x <- x[x$len==2, c('f1','v1')]
# Hacky pivot
result <- data.frame(
lapply(unique(x$f1), FUN = function(y) x$v1[x$f1==y])
)
colnames(result) <- unique(x$f1)
> result
a b
1 10 4
2 11 5

I would code it like this; maybe it helps:
library(reshape2)
library(dplyr)
aa = data.frame(v1=c('a','a','b','b','c','c','c'),f1=c(10,11,4,5,0,1,2))
cc = aa %>% group_by(v1) %>% summarise(id = length(v1))  # number of elements per level
dd = merge(aa, cc)  # attach each level's count
ee = dd[dd$id == 2, ]  # keep levels with exactly 2 elements
ee$id = rep(c(1, 2), nrow(ee)/2)  # reset index like (1,2,1,2)
dcast(ee, id~v1,value.var = 'f1')
all done!

R: Extract columns from list of data.frames in a tibble

I am wondering how to manipulate a list containing data.frames stored in a tibble.
Specifically, I would like to extract two columns from a data.frame that are stored in a tibble list column.
I would like to go from this tibble c
random_data<-list(a=letters[1:10],b=LETTERS[1:10])
x<-as.data.frame(random_data, stringsAsFactors=FALSE)
y<-list()
y[[1]]<-x[1,,drop=FALSE]
y[[3]]<-x[2,,drop=FALSE]
c<-tibble(z=c(1,2,3),my_data=y)
to this tibble d
d<-tibble(z=c(1,2,3),a=c('a',NA,'b'),b=c('A',NA,'B'))
thanks
Iain

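c2 is the final output.
library(tidyverse)
c2 <- c %>%
  filter(!map_lgl(my_data, is.null)) %>%  # drop rows whose my_data is NULL
  unnest(cols = my_data) %>%              # spread each data frame into columns a and b
  right_join(c, by = "z") %>%             # add back the z values that were dropped
  select(-my_data) %>%
  arrange(z)                              # sort by z so the row order matches d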

You could create a helper function f that replaces the NULL values with a one-row data frame of NAs, apply it to the my_data column, and finish with unnest:
library(dplyr); library(tidyr)
f <- function(x) {
  if (is.null(x)) data.frame(a = NA_character_, b = NA_character_) else x
}
unnest(mutate(c, my_data = lapply(my_data, f)), cols = my_data)
# # A tibble: 3 x 3
# z a b
# <dbl> <chr> <chr>
# 1 1 a A
# 2 2 <NA> <NA>
# 3 3 b B

I think this does the trick, with d being the requested tibble:
library(dplyr)
new.y <- lapply(y, function(x) if (is.null(x)) data.frame(a = NA_character_, b = NA_character_) else x)
d <- cbind(z = c(1, 2, 3), bind_rows(new.y)) %>% as_tibble()
# # A tibble: 3 x 3
# z a b
# <dbl> <chr> <chr>
# 1 1 a A
# 2 2 <NA> <NA>
# 3 3 b B

Do you know your column names ahead of time?
extract_column <- function( d, column_name ) {
if( is.null(d) ) {
NA_character_
} else {
as.character(d[[column_name]])
}
}
cc %>%
dplyr::mutate(
a = purrr::map_chr(.$my_data, extract_column, column_name="a"),
b = purrr::map_chr(.$my_data, extract_column, column_name="b")
) %>%
dplyr::select(-my_data)
(I renamed your c tibble to cc so it can't collide with c().)
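If the column names are not known ahead of time, one option is to take them from the first non-NULL element and pad the NULL entries with a matching all-NA row. A minimal base R sketch that works on the plain list y from the question rather than on the tibble; fill_null_rows() is a hypothetical helper that assumes at least one element is non-NULL and that all non-NULL elements share the same columns:

```r
random_data <- list(a = letters[1:10], b = LETTERS[1:10])
x <- as.data.frame(random_data, stringsAsFactors = FALSE)
y <- list()
y[[1]] <- x[1, , drop = FALSE]
y[[3]] <- x[2, , drop = FALSE]   # y[[2]] is left as NULL

# Hypothetical helper: replace NULL entries by an all-NA row, then stack
fill_null_rows <- function(lst) {
  template <- Filter(Negate(is.null), lst)[[1]]    # first non-NULL data frame
  na_row <- template[NA_integer_, , drop = FALSE]  # same columns and types, one NA row
  padded <- lapply(lst, function(el) if (is.null(el)) na_row else el)
  out <- do.call(rbind, padded)
  rownames(out) <- NULL
  out
}

d <- cbind(z = c(1, 2, 3), fill_null_rows(y))
d
```

Because the NA row is cut from a real element, it inherits the column names and types, so nothing has to be hard-coded.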
