I would like to create a new column based on the conditions below:
if the `str` column only contains `A` then insert `A`
if the `str` column only contains `B` then insert `B`
if the `str` column only contains `A` and `B` then insert `AB`
df<-read.table(text="
ID str
1 A
1 A
1 AA
1 ABB
2 BA
2 BB", header=T)
ID str simplify_str
1 A A
1 A A
1 AA A
1 ABB AB
2 BA AB
2 BB B
As far as tidyverse options are concerned, you could use dplyr::case_when with stringr::str_detect
library(dplyr)
library(stringr)
df %>%
mutate(simplify_str = case_when(
str_detect(str, "^A+$") ~ "A",
str_detect(str, "^B+$") ~ "B",
TRUE ~ "AB"))
# ID str simplify_str
#1 1 A A
#2 1 A A
#3 1 AA A
#4 1 ABB AB
#5 2 BA AB
#6 2 BB B
Using your data.frame:
As <- grep("A",df$str)
Bs <- grep("B",df$str)
df$simplify_str <- ""
df$simplify_str[As] <- paste0(df$simplify_str[As],"A")
df$simplify_str[Bs] <- paste0(df$simplify_str[Bs],"B")
df
ID str simplify_str
1 1 A A
2 1 A A
3 1 AA A
4 1 ABB AB
5 2 BA AB
6 2 BB B
A general solution in base R: split each string into single characters, then paste the sorted unique characters back together.
df$simplify_str <- sapply(strsplit(as.character(df$str), ""),
function(x) paste(unique(sort(x)), collapse = ""))
df
# ID str simplify_str
#1 1 A A
#2 1 A A
#3 1 AA A
#4 1 ABB AB
#5 2 BA AB
#6 2 BB B
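To make the split / sort / unique / paste steps above easier to follow, here is a minimal sketch that wraps them in a small function. The name `simplify_chars` is hypothetical (it is not part of the answer), and the sketch assumes the input has no NA values.

```r
# Hypothetical helper: reduce each string to its sorted set of distinct characters.
simplify_chars <- function(s) {
  chars <- strsplit(as.character(s), "")   # one character vector per input string
  vapply(chars,
         function(x) paste(unique(sort(x)), collapse = ""),
         character(1), USE.NAMES = FALSE)
}

simplify_chars(c("A", "AA", "ABB", "BA", "BB"))
# "A" "A" "AB" "AB" "B"
```

Because it does not hard-code "A" and "B", the same function should also work if other letters appear in the column.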
I have a problem adding records to a table under a particular condition.
For example, I have this kind of table:
id word count
1 1 aa 2
2 2 bb 3
Then I want to expand each row into several rows, appending a running number to the id column while repeating the data in the other columns, like this:
id word count
1 100 aa 2
2 101 aa 2
3 102 aa 2
4 103 aa 2
5 200 bb 3
6 201 bb 3
7 202 bb 3
8 203 bb 3
The id column needs two digits appended, filled with a running number (00, 01, 02, ...), without changing the data in the other columns. Suppose I have thousands of records; how can I make this happen?
It is not entirely clear from the description. Based on the expected output, one option is to create a list column by looping over 'id': multiply each id by 100, generate a sequence of length 4 starting at that value, and then unnest the list column.
library(dplyr)
library(purrr)
library(tidyr)
df1 %>%
mutate(id = map(id*100, seq, length.out = 4)) %>%
unnest(c(id))
# A tibble: 8 x 3
# id word count
# <dbl> <chr> <int>
#1 100 aa 2
#2 101 aa 2
#3 102 aa 2
#4 103 aa 2
#5 200 bb 3
#6 201 bb 3
#7 202 bb 3
#8 203 bb 3
Or another option is to replicate the rows (uncount), grouped by 'word', modify the 'id'
df1 %>%
uncount(4) %>%
group_by(word) %>%
mutate(id = seq(100 * first(id), length.out = n()))
data
df1 <- structure(list(id = 1:2, word = c("aa", "bb"), count = 2:3),
class = "data.frame", row.names = c("1",
"2"))
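For comparison, the replicate-then-renumber idea can also be sketched in base R without any packages. This is only an illustration under two assumptions: every row is expanded into the same number of copies (`n`, a name introduced here), and `n` is at most 100 so the two appended digits do not overflow.

```r
df1 <- data.frame(id = 1:2, word = c("aa", "bb"), count = 2:3,
                  stringsAsFactors = FALSE)

n <- 4  # assumed number of copies per original row

# repeat every row n times, keeping the original order
out <- df1[rep(seq_len(nrow(df1)), each = n), ]

# id * 100 leaves room for two digits; add the running number 0..(n - 1)
out$id <- out$id * 100 + rep(seq_len(n) - 1, times = nrow(df1))
rownames(out) <- NULL
out
```

Since everything is vectorised, this should scale to thousands of rows without an explicit loop.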
Try the following base R function.
It loops (lapply) over the entries of column 'id', builds a vector of new ids like the one in the question, attaches the other columns in order to form a data.frame, and then combines (rbind) all these data.frames into the return value.
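fun <- function(x, n = 3){
  # positions of every column except 'id'
  cols <- grep('id', names(x), invert = TRUE)
  # loop over row numbers rather than id values, so ids need not equal row positions
  out <- lapply(seq_len(nrow(x)), function(i){
    # append a zero-padded two-digit counter 00..n to the id
    y <- sprintf(paste0(x[['id']][i], "%02d"), c(0L, seq.int(n)))
    y <- data.frame(id = y)
    # copy the remaining columns, keeping their original names
    for(j in cols) y[[names(x)[j]]] <- x[i, j]
    y
  })
  out <- do.call(rbind, out)
  row.names(out) <- NULL
  out
}
fun(df1)
# id word count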
#1 100 aa 2
#2 101 aa 2
#3 102 aa 2
#4 103 aa 2
#5 200 bb 3
#6 201 bb 3
#7 202 bb 3
#8 203 bb 3
Data
df1 <- read.table(text = "
id word count
1 1 aa 2
2 2 bb 3
", header = TRUE)
I want to count, for each row, the number of columns that satisfy a condition on character and missing values.
For example, I have this dataset, test.
I want to create a num column that counts the columns that are not missing (NA) or empty.
a<-c("aa","bb","cc","dd","",NA)
b<-c("",NA,"aa","","","dd")
c<-c("aa","",NA,NA,"cc","dd")
d<-c("aa","bb","",NA,"cc","dd")
test<-data.frame(cbind(a,b,c,d))
a b c d
1 aa aa aa
2 bb <NA> bb
3 cc aa <NA>
4 dd <NA> <NA>
5 cc cc
6 <NA> dd dd dd
I want to count the number of columns that contain neither NA nor an empty value, like this:
a b c d num
1 aa aa aa 3
2 bb <NA> bb 2
3 cc aa <NA> 2
4 dd <NA> <NA> 1
5 cc cc 2
6 <NA> dd dd dd 3
I tried an approach from another post ("Count number of columns by a condition (>) for each row") that uses rowSums:
> test$num<-rowSums(test!=c("",NA),na.rm=T)
> test
a b c d num
1 aa aa aa 3
2 bb <NA> bb 0
3 cc aa <NA> 2
4 dd <NA> <NA> 0
5 cc cc 2
6 <NA> dd dd dd 0
However, it returns wrong numbers, and I couldn't find the reasons.
Would you let me know how to solve this problem?
You can use nchar + rowSums
# > 0 (rather than > 1) so that single-character values are counted too
test$num <- rowSums(nchar(as.matrix(test))>0,na.rm = TRUE)
or %in% + rowSums
test$num <- rowSums(`dim<-`(!as.matrix(test) %in% c("",NA),dim(test)))
such that
> test
a b c d num
1 aa aa aa 3
2 bb <NA> bb 2
3 cc aa <NA> 2
4 dd <NA> <NA> 1
5 cc cc 2
6 <NA> dd dd dd 3
You could use rowSums to count number of NAs or empty values in each row and then subtract it from number of columns in the dataframe.
test$num <- ncol(test) - rowSums(is.na(test) | test == "")
test
# a b c d num
#1 aa aa aa 3
#2 bb <NA> bb 2
#3 cc aa <NA> 2
#4 dd <NA> <NA> 1
#5 cc cc 2
#6 <NA> dd dd dd 3
Another idea using rowSums is to replace empty with NA, i.e.
rowSums(!is.na(replace(test, test == '', NA)))
#[1] 3 2 2 1 2 3
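As for why the attempt in the question returns wrong numbers: as far as I can tell, `test != c("", NA)` does not test each cell against both values. The length-2 vector is recycled down the rows, so odd rows are compared with "" and even rows with NA, and any comparison with NA yields NA, which `na.rm = TRUE` then drops. A small sketch to check this (it rebuilds `test` with `stringsAsFactors = FALSE`, which is an assumption made here so that it behaves the same across R versions):

```r
a <- c("aa", "bb", "cc", "dd", "", NA)
b <- c("", NA, "aa", "", "", "dd")
c <- c("aa", "", NA, NA, "cc", "dd")
d <- c("aa", "bb", "", NA, "cc", "dd")
test <- data.frame(a, b, c, d, stringsAsFactors = FALSE)

cmp <- test != c("", NA)        # vector is recycled: "", NA, "", NA, ...
all(is.na(cmp[c(2, 4, 6), ]))   # even rows were compared with NA, so all NA
rowSums(cmp, na.rm = TRUE)      # reproduces the 3 0 2 0 2 0 from the question

# testing both conditions explicitly gives the intended count
num <- rowSums(!is.na(test) & test != "")
num
```

The last line relies on `FALSE & NA` being `FALSE`, so NA cells are simply not counted.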
How about this approach from the tidyverse which also tells you how many columns contain NAs or empty strings?
a<-c("aa","bb","cc","dd","",NA)
b<-c("",NA,"aa","","","dd")
c<-c("aa","",NA,NA,"cc","dd")
d<-c("aa","bb","",NA,"cc","dd")
test<-data.frame(cbind(a,b,c,d))
library(magrittr) #import the pipe operator
num_cols <- test %>%
tibble::rowid_to_column("row_id") %>% #1st add a rowid column
dplyr::group_by(row_id) %>% #split the data into single row groups (i.e.
#row vectors)
tidyr::nest() %>% #turn it into a list column called data
dplyr::mutate(num_NAs = purrr::map_dbl(data, #loop over the data column of row
#vectors using map_dbl
~sum(is.na(.))), #count the number of NAs
num_empty = purrr::map_dbl(data,
#count the empty strings
~sum(. == "", na.rm = T)),
num_values = purrr::map_dbl(data,
#count columns without NAs or
#missing values (what you asked for)
~length(.)-sum(num_NAs, num_empty))
) %>%
dplyr::ungroup() %>% #remove the grouping structure
dplyr::select(num_NAs, num_empty, num_values) #extract only the variables you need
test_v2 <- cbind(test, num_cols)
test_v2
a b c d num_NAs num_empty num_values
1 aa aa aa 0 1 3
2 bb <NA> bb 1 1 2
3 cc aa <NA> 1 1 2
4 dd <NA> <NA> 2 1 1
5 cc cc 0 2 2
6 <NA> dd dd dd 1 0 3
I am transforming a list of character vectors into a dataframe using R. How can I get the list indices also into the dataframe?
list1 = list(c('kip','kroket'),'ei','koe')
print(list1)
##[[1]]
##[1] "kip" "kroket"
##[[2]]
##[1] "ei"
##[[3]]
##[1] "koe"
df = data.frame(col1 = unlist(list1))
print(df)
## col1
##1 kip
##2 kroket
##3 ei
##4 koe
The preferred output would look like this:
## col1 col2
##1 kip 1
##2 kroket 1
##3 ei 2
##4 koe 3
An idea via base R,
data.frame(v1 = unlist(list1), v2 = rep(seq(length(list1)), lengths(list1)))
# v1 v2
#1 kip 1
#2 kroket 1
#3 ei 2
#4 koe 3
tidyverse method
library(purrr)
library(dplyr)
library(tibble)
list1 %>% map(~as_tibble(.)) %>% bind_rows(.id="index")
# A tibble: 4 x 2
index value
<chr> <chr>
1 1 kip
2 1 kroket
3 2 ei
4 3 koe
We can name the list elements by their positions and then use stack
names(list1) <- seq_along(list1)
stack(list1)
# values ind
#1 kip 1
#2 kroket 1
#3 ei 2
#4 koe 3
Or another option could be using enframe and unnest
library(magrittr) # for the pipe operator
list1 %>% tibble::enframe() %>% tidyr::unnest(value)
I have a df as below and I would like to add a new column that concatenates col1 within each ID. I wrote code with dplyr, but I don't know how to complete it.
df %>%
group_by(ID) %>%
mutate(col2= paste0(col1,?????) <- what to write here?
df<-read.table(text="
ID col1
1 A
1 B
1 A
2 C
2 C
2 D", header=T)
result
ID col1 col2
1 A ABA
1 B ABA
1 A ABA
2 C CCD
2 C CCD
2 D CCD
Use the collapse argument.
df %>%
group_by(ID) %>%
mutate(col2= paste(col1, collapse = "")) %>%
ungroup
giving:
# A tibble: 6 x 3
ID col1 col2
<int> <fct> <chr>
1 1 A ABA
2 1 B ABA
3 1 A ABA
4 2 C CCD
5 2 C CCD
6 2 D CCD
Alternatively, using only base R, we could use this one-liner:
transform(df, col2 = ave(as.character(col1), ID, FUN = function(x) paste(x, collapse = "")))
giving:
ID col1 col2
1 1 A ABA
2 1 B ABA
3 1 A ABA
4 2 C CCD
5 2 C CCD
6 2 D CCD
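To illustrate why `collapse` is the argument that matters here, the sketch below first contrasts it with `sep` on a plain vector and then builds the same col2 with a `tapply` lookup. The `tapply` route is a swapped-in alternative for illustration, not the method from the answer, and `pieces` is a name introduced here.

```r
x <- c("A", "B", "A")
paste(x, sep = "")        # sep joins across arguments: still "A" "B" "A"
paste(x, collapse = "")   # collapse joins within the vector: "ABA"

df <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                 col1 = c("A", "B", "A", "C", "C", "D"),
                 stringsAsFactors = FALSE)

# one collapsed string per ID, then looked up for every row by ID
pieces <- tapply(df$col1, df$ID, paste, collapse = "")
df$col2 <- as.vector(pieces[as.character(df$ID)])
df
```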
I would like to count the occurrences of each distinct value in a column. This is the data frame:
asa
----
aa
bb
aa
aa
bb
cc
dd
Want to get :
asa | n
--------
aa | 3
bb | 2
cc | 1
dd | 1
I've tried using ddply, following "Counting unique / distinct values by group in a data frame", with this (reproducible) code:
library(plyr)
asa<-c("aa","bb","aa","aa","bb","cc","dd")
asad<-data.frame(asa)
ddply(asad,~asa,summarise,n=length(unique(asa)))
But I got :
asa n
1 aa 1
2 bb 1
3 cc 1
4 dd 1
It didn't count the occurrences. Note that new values can be added to the column at any time, so it is not always "aa", "bb", "cc", and "dd". Values can also be separated by a space or a comma ("aa bb", "aa,bb" or "aa, bb"). There must be a way to do this. Thank you in advance.
We can use table
setNames(as.data.frame(table(df1$asa)), c("asa", "n"))
# asa n
#1 aa 3
#2 bb 2
#3 cc 1
#4 dd 1
Or with tally from dplyr
library(dplyr)
df1 %>%
group_by(asa) %>%
tally()
# asa n
# (chr) (int)
#1 aa 3
#2 bb 2
#3 cc 1
#4 dd 1
Even simpler: just use the as.data.frame and table functions with no other arguments (the resulting columns are then named Var1 and Freq).
as.data.frame(table(df$asa))
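The question also mentions cells that hold several values separated by a space or comma ("aa bb", "aa,bb", "aa, bb"), which the `table` calls above would count as distinct strings. One possible way to handle that, sketched here with made-up sample data and hypothetical variable names (`raw`, `tokens`, `counts`), is to split each cell on runs of spaces or commas before tabulating:

```r
# made-up sample data mixing single values and separated values
raw <- c("aa", "aa bb", "aa,bb", "aa, bb", "cc")

# split on one or more spaces/commas, then flatten to a single vector
tokens <- unlist(strsplit(trimws(raw), "[ ,]+"))

# count each distinct token; columns come out as 'asa' and 'n'
counts <- as.data.frame(table(asa = tokens), responseName = "n",
                        stringsAsFactors = FALSE)
counts
```

The `trimws` call is there so that a leading space does not produce an empty token.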