I have a dataframe with a column consisting of lists. I want to use mutate to create a new column containing, for each row, the length of that row's list, but I'm having trouble doing that.
I've tried:
df <- data.frame(a=1:3,b=I(list(1,1:2,1:3)))
df %>% mutate(len = length(b))
but this just sets len to the number of rows in the dataframe (the value of len is 3 for every row). Any help would be greatly appreciated!
Figured it out: use lengths(), which returns the length of each list element, instead of length():
df %>% mutate(len = lengths(b))
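To make the difference concrete, here is a minimal base R sketch on the same example data. The variable names `len_total` and `len_each` are only illustrative and do not appear in the original post.

```r
# Same example data as in the question
df <- data.frame(a = 1:3, b = I(list(1, 1:2, 1:3)))

# length() counts the elements of the list column itself,
# which is the number of rows
len_total <- length(df$b)

# lengths() returns the length of each element of the list
len_each <- lengths(df$b)

print(len_total)
print(len_each)
```

This is why `length(b)` inside `mutate()` produced the same value on every row: it is a single number that gets recycled.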
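We can unnest the list column and count the rows per group to create the new column. Note that this returns one row per list element rather than one row per original row.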
library(tidyverse)
df %>%
unnest(cols = b) %>%
group_by(a) %>%
mutate(c = n())
Simple question. Considering the data frame below, I want to count distinct IDs: once for all records and once after filtering on status. However, the %>% doesn't seem to work here. I just want a single value as output (so for total this should be 10, for closed it should be 5), not a dataframe. Neither of the commented-out (#) lines works.
dat <- data.frame (ID = as.factor(c(1:10)),
status = as.factor(rep(c("open","closed"))))
total <- n_distinct(dat$ID)
#closed <- dat %>% filter(status == "closed") %>% n_distinct(dat$ID)
#closed <- dat %>% filter(status == "closed") %>% n_distinct(ID)
n_distinct expects a vector as input, but you are passing a dataframe. You can do:
library(dplyr)
dat %>%
filter(status == "closed") %>%
summarise(n = n_distinct(ID))
# n
#1 5
Or without using filter:
dat %>% summarise(n = n_distinct(ID[status == "closed"]))
You can add %>% pull(n) to the above if you want a vector back and not a dataframe.
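Since the question asks for a single value, here is a sketch of the full pipeline ending in `pull()`, alongside a version that passes a subsetted vector straight to `n_distinct()`. The result names `closed` and `closed_direct` are illustrative, and `times = 5` is written out explicitly here rather than relying on recycling as the question's code does.

```r
library(dplyr)

dat <- data.frame(
  ID = as.factor(1:10),
  status = as.factor(rep(c("open", "closed"), times = 5))
)

# summarise() returns a one-row data frame; pull() extracts the column as a vector
closed <- dat %>%
  filter(status == "closed") %>%
  summarise(n = n_distinct(ID)) %>%
  pull(n)

# Without a pipeline: hand n_distinct() the vector it expects
closed_direct <- n_distinct(dat$ID[dat$status == "closed"])

print(closed)
print(closed_direct)
```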
An option with data.table
library(data.table)
setDT(dat)[status == "closed"][, .(n = uniqueN(ID))]
I have a data.frame() in R which contains 3 columns:
id<-c(12312, 12312, 12312, 48373, 345632, 223452)
id2<-c(1928277, 17665363, 8282922, 82827722, 1231233,12312333)
description<-c("Positive", "Negative", "Indetermined", "Positive", "Negative", "Positive")
df <- data.frame(id, id2, description)
Among the rows that share a duplicated id, I want to delete those whose description is Indetermined.
This seems like a problem for filter() so:
library(dplyr)
df %>%
mutate(count = 1) %>% # count all ids
group_by(id) %>%
mutate(count = sum(count),Duplicate = count>1) %>% # count how often each id occurs and mark duplicates
ungroup() %>%
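filter(!(Duplicate & description == "Indetermined")) # drop rows that are both duplicated and "Indetermined"
Not the best approach, but this should do the trick. Note that it drops every row whose id has an Indetermined entry, not just the Indetermined rows themselves.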
(d <- tibble(id,id2,description))
d[!d$id %in% (d$id[d$description == "Indetermined"]),]
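As a more compact variant of the grouped filter, the sketch below uses `n()` inside `filter()` directly. It assumes the goal is to drop only the Indetermined rows of ids that occur more than once, which is one reading of the question. The name `result` is illustrative.

```r
library(dplyr)

id <- c(12312, 12312, 12312, 48373, 345632, 223452)
id2 <- c(1928277, 17665363, 8282922, 82827722, 1231233, 12312333)
description <- c("Positive", "Negative", "Indetermined",
                 "Positive", "Negative", "Positive")
df <- data.frame(id, id2, description)

result <- df %>%
  group_by(id) %>%
  # n() is the group size, so n() > 1 marks ids that occur more than once
  filter(!(n() > 1 & description == "Indetermined")) %>%
  ungroup()

print(result)
```

On this data only the third row (id 12312, Indetermined) is removed; the other two rows for that id are kept.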
My tibble looks like this:
df = tibble(x = 1:3, col1 = matrix(rnorm(6), ncol = 2),
col2 = matrix(rnorm(6), ncol = 2))
It has three columns, two of which contain a matrix with 2 columns each (in my case there are many more columns; this example is just to illustrate the problem). I transform this data to long format using gather:
gather(df, key, val, -x)
but this does not give me the desired result. It stacks only the first column of col1 and col2 and discards the rest. What I want is for val to contain the row vectors of col1 and col2, i.e. val should be a matrix-valued column (containing 1x2 matrices). The tidyverse, however, does not seem to be able to deal with matrix-valued columns appropriately. Is there a way to achieve my desired result? (Ideally using routines from the tidyverse.)
Some of the columns are matrices. They need to be converted to ordinary data.frame columns first, and then reshaping works:
library(dplyr)
library(tidyr)
do.call(data.frame, df) %>%
pivot_longer(cols = -x)
Or use gather:
do.call(data.frame, df) %>%
gather(key, val, -x)
Another option is to convert each matrix to a vector with c and then use unnest:
df %>%
mutate_at(-1, ~ list(c(.))) %>%
unnest(c(col1, col2))
If the values of 'col1' and 'col2' should end up in a single column:
df %>%
mutate_at(-1, ~ list(c(.))) %>%
pivot_longer(cols = -x) %>%
unnest(c(value))
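To see what the `do.call(data.frame, df)` step does, here is a small base R sketch. It uses a plain data.frame with fixed integer matrices instead of the tibble with `rnorm()` values from the question, purely so the result is reproducible; the name `flat` is illustrative.

```r
# A data frame with two matrix-valued columns (fixed values, an assumption
# made for reproducibility)
df <- data.frame(x = 1:3)
df$col1 <- matrix(1:6, ncol = 2)
df$col2 <- matrix(7:12, ncol = 2)

# Still three columns: each matrix counts as a single column
n_before <- ncol(df)

# data.frame() splits each matrix argument into one column per matrix column
flat <- do.call(data.frame, df)
n_after <- ncol(flat)

print(n_before)
print(n_after)
print(names(flat))
```

Once every matrix column has been split into ordinary vector columns, `gather()` or `pivot_longer()` can stack all of them.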
I have a list of records that I need to dedup. The rows are combinations of the same set of values in a different order, so the regular deduplication functions do not work because the two columns themselves are not duplicates. Below is a reproducible example.
df <- data.frame( A = c("2","2","2","43","43","43","331","391","481","490","501","501","501","502","502","502"),
B = c("43","501","502","2","501","502","491","496","490","481","2","43","502","2","43","501"))
Below is the desired output that I'm looking for.
df_Final <- data.frame( A = c("2","2","2","331","391","481"),
B = c("43","501","502","491","496","490"))
I guess the idea is that you want to find when the elements in column A first appear in column B
idx = match(df$A, df$B)
and keep the row if the element in A isn't in B (is.na(idx)) or the element in A occurs before its first occurrence in B (seq_along(idx) < idx)
df[is.na(idx) | seq_along(idx) < idx,]
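Walking through the two steps on the example data shows what `idx` looks like and that the result matches the desired output. This is a sketch; `stringsAsFactors = FALSE` and the names `keep` and `out` are additions that are not in the original post.

```r
df <- data.frame(
  A = c("2","2","2","43","43","43","331","391","481","490",
        "501","501","501","502","502","502"),
  B = c("43","501","502","2","501","502","491","496","490","481",
        "2","43","502","2","43","501"),
  stringsAsFactors = FALSE
)

# Position in B where each element of A first appears (NA if never)
idx <- match(df$A, df$B)
print(idx)
# 4 4 4 1 1 1 NA NA 10 9 2 2 2 3 3 3

# Keep rows whose A value never appears in B, or appears in A
# before its first appearance in B
keep <- is.na(idx) | seq_along(idx) < idx
out <- df[keep, ]
print(out)
```

The kept rows are 1, 2, 3, 7, 8 and 9, which is the `df_Final` from the question.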
Maybe a more-or-less literal tidyverse approach to this would be to create and then drop a temporary column
library(tidyverse)
df %>% mutate(idx = match(A, B)) %>%
filter(is.na(idx) | seq_along(idx) < idx) %>%
select(-idx)
You can remove all rows that would be duplicates under some reordering of the two columns with
library(dplyr)
df %>%
apply(1, sort) %>% t %>%
data.frame %>%
group_by_all %>%
slice(1)
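The same order-insensitive deduplication can be sketched in base R with `pmin()` and `pmax()`, which put each pair into a canonical order before `duplicated()` is applied. The names `key` and `out` are illustrative.

```r
df <- data.frame(
  A = c("2","2","2","43","43","43","331","391","481","490",
        "501","501","501","502","502","502"),
  B = c("43","501","502","2","501","502","491","496","490","481",
        "2","43","502","2","43","501"),
  stringsAsFactors = FALSE
)

# Canonical form of each pair: smaller string first, larger second
key <- cbind(pmin(df$A, df$B), pmax(df$A, df$B))

# duplicated() on a matrix compares whole rows; keep first occurrences only
out <- df[!duplicated(key), ]
print(out)
```

On this data the approach keeps 9 rows, because pairs such as ("43", "501") have no earlier mirror image. That is more than the 6 rows in the desired output shown in the question, which the `match()` approach reproduces.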
I have a data.frame where I assign each column.name a vector of variables:
dat1 <- data.frame(a=1:5,b=1:5,c=1:5)
I want to create a new data.frame but instead of assigning each column individually, I want to assign them all at once. For example, if I wanted to rename them all:
dat.new <- data.frame(paste(names(dat1),'1',sep='') = dat1)
This obviously doesn't work. Is there a way to make it work?
I understand I can just rename using names(), but the scenario where this actually seems useful is if combining multiple data sets that share the same col.names (and in which I don't want to simply rbind):
dat1 <- data.frame(a=1:5,b=1:5,c=1:5)
dat2 <- data.frame(a=6:10,b=6:10,c=6:10)
dat.new <- data.frame(paste(names(dat1),'1',sep='') = dat1, paste(names(dat1),'2',sep='') = dat2)
library(dplyr)
library(tidyr)
library(magrittr)
Ok, here's the first part (assigned to dat.new so that the dat2 from the question isn't overwritten):
dat.new =
dat1 %>%
setNames(names(.) %>%
paste0("1") )
Here's the second part. The reshaping is a bit complex but more flexible, especially if you already have row ids and the data sets have different numbers of rows:
list(dat1, dat2) %>%
bind_rows(.id = "number") %>%
group_by(number) %>%
mutate(id = 1:n()) %>%
gather(variable, value, -number, -id) %>%
unite(new_variable, variable, number) %>%
spread(new_variable, value)
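When both data sets have the same number of rows, a simpler base R sketch is to rename each one with `setNames()` and bind them side by side. This is an alternative to the reshaping above, not the answer's own method.

```r
dat1 <- data.frame(a = 1:5, b = 1:5, c = 1:5)
dat2 <- data.frame(a = 6:10, b = 6:10, c = 6:10)

# setNames() returns a renamed copy, so it can be used inline
dat.new <- cbind(
  setNames(dat1, paste0(names(dat1), "1")),
  setNames(dat2, paste0(names(dat2), "2"))
)

print(names(dat.new))
# "a1" "b1" "c1" "a2" "b2" "c2"
```

Unlike the reshaping pipeline, `cbind()` requires the row counts to match and pairs rows purely by position.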