I have a vector like:
c("A", "B", "C", "D", "E", "F")
and I'd like to create a data frame like this:
"from" "to"
A B
B C
C D
D E
E F
How can I accomplish that?
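Drop the last element for "from" and the first element for "to":
vec <- c("A", "B", "C", "D", "E", "F")
data.frame(from = vec[-length(vec)], to = vec[-1])
Another way, using dplyr::lead and then dropping the final row, which contains NA:
na.omit(data.frame(from = vec, to = dplyr::lead(vec)))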
from to
1 A B
2 B C
3 C D
4 D E
5 E F
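dplyr::lead(vec) is vec shifted up by one position and padded with NA at the end, which is why the NA row has to be dropped afterwards. A base-R sketch of the same idea without dplyr (the names lead_vec and edges are only for illustration). One caveat: na.omit() removes every row containing an NA, so if vec itself contains missing values those pairs disappear too, whereas the vec[-1] slicing above keeps them.

```r
vec <- c("A", "B", "C", "D", "E", "F")

# Base-R stand-in for dplyr::lead(vec): drop the first element, pad with NA
lead_vec <- c(vec[-1], NA)

# The last row pairs "F" with NA; na.omit() removes it
edges <- na.omit(data.frame(from = vec, to = lead_vec, stringsAsFactors = FALSE))
edges
```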
Another way is to use the zoo package:
library(zoo)
rollapply(vec, 2, by = 1, paste)
Here is one method using embed and rearranging columns:
# data
temp <- c("A", "B", "C", "D", "E", "F")
embed(temp, 2)[, c(2,1)]
[,1] [,2]
[1,] "A" "B"
[2,] "B" "C"
[3,] "C" "D"
[4,] "D" "E"
[5,] "E" "F"
To put this into a data.frame, wrap it in data.frame() and name the columns with setNames():
setNames(data.frame(embed(temp, 2)[, c(2,1)]), c("from", "to"))
from to
1 A B
2 B C
3 C D
4 D E
5 E F
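The same embed() trick extends to windows longer than two. A minimal sketch, assuming you want every run of k consecutive elements as one row; windows_df is a hypothetical helper name. embed() lists each window newest-first, so the columns are reversed with k:1.

```r
vec <- c("A", "B", "C", "D", "E", "F")

# Hypothetical helper: all windows of width k as a data frame.
# embed() returns each window in reverse order, so k:1 restores it;
# drop = FALSE keeps a matrix even when k = 1.
windows_df <- function(x, k = 2) {
  w <- embed(x, k)[, k:1, drop = FALSE]
  as.data.frame(w, stringsAsFactors = FALSE)
}

windows_df(vec, 2)  # same pairs as above, in columns V1 and V2
windows_df(vec, 3)  # rows A B C, B C D, C D E, D E F
```

Rename the columns afterwards (for example with setNames()) if you want "from"/"to" style headers.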
We could also do:
vec <- c("A", "B", "C", "D", "E", "F")
x <- rep(seq(length(vec)), each=2)[-length(vec)*2][-1]
# [1] 1 2 2 3 3 4 4 5 5 6
data.frame(matrix(vec[x], ncol = 2, byrow = TRUE))
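The index vector x can also be built more directly by interleaving 1..(n-1) with 2..n. This is a sketch of an alternative construction, not the answer's own code: rbind() stacks the two sequences as rows, and as.vector() reads the matrix column by column.

```r
vec <- c("A", "B", "C", "D", "E", "F")
n <- length(vec)

# rbind() stacks 1..(n-1) on top of 2..n; as.vector() then reads the
# 2 x (n-1) matrix column by column, interleaving the two sequences
x <- as.vector(rbind(seq_len(n - 1), seq_len(n)[-1]))
x
# [1] 1 2 2 3 3 4 4 5 5 6

edges <- data.frame(matrix(vec[x], ncol = 2, byrow = TRUE),
                    stringsAsFactors = FALSE)
edges
```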
Or alternatively:
data.frame(t(sapply(seq(length(vec)-1), function(i) c(vec[i], vec[i+1]))))
# X1 X2
# 1 A B
# 2 B C
# 3 C D
# 4 D E
# 5 E F
My string is "NGNG", where each N can be any of c("A", "T", "C", "G"), so my output needs all 16 combinations, such as AGAG, TGAG, CGAG, GGAG, TGTG, TGCG, TGGG and so on.
If there is only a single N at the start, for example "NGG", I can easily do it with expand_grid from tidyr:
library(tidyverse)
expand_grid(one = c("A", "T", "C", "G"), two = "NG") %>%
mutate(three = paste0(one, two)) %>%
pull(three)
[1] "ANG" "TNG" "CNG" "GNG"
But I'm struggling to find a way to do this when N appears in the middle, or appears more than once.
How about expand.grid followed by do.call?
cart_prod <- expand.grid(c("A", "T", "C", "G"),
"G",
c("A", "T", "C", "G"),
"G")
do.call(paste0, cart_prod)
[1] "AGAG" "TGAG" "CGAG" "GGAG" "AGTG" "TGTG" "CGTG" "GGTG"
[9] "AGCG" "TGCG" "CGCG" "GGCG" "AGGG" "TGGG" "CGGG" "GGGG"
Explanation
Since the OP requested that positions 2 and 4 remain "G", we simply let the 1st and 3rd arguments vary over the possible choices, c("A", "T", "C", "G"). Calling expand.grid with these four arguments:
c("A", "T", "C", "G")
"G"
c("A", "T", "C", "G")
"G"
will produce a data.frame that is isomorphic to our desired result, since expand.grid returns the Cartesian product.
expand.grid(c("A", "T", "C", "G"),
"G",
c("A", "T", "C", "G"),
"G")
Var1 Var2 Var3 Var4
1 A G A G
2 T G A G
3 C G A G
4 G G A G
5 A G T G
6 T G T G
7 C G T G
8 G G T G
9 A G C G
10 T G C G
11 C G C G
12 G G C G
13 A G G G
14 T G G G
15 C G G G
16 G G G G
Now, all that is left is smashing the columns together. We make use of do.call and paste0 to achieve this.
Why does do.call(paste0, some_data.frame) work?
I found this great explanation on do.call here: The {do.call} function. Here is the first line:
"R has an interesting function called do.call. This function allows you to call any R function, but instead of writing out the arguments one by one, you can use a list to hold the arguments of the function."
Since a data.frame is essentially a list under the hood, we can utilize do.call in the usual way.
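Each column of cart_prod is a vector (a factor here, since expand.grid converts character input to factors by default), and paste0 pastes its arguments together element-wise, using the factor labels. For example, the first and second columns are: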
cart_prod$Var1
[1] A T C G A T C G A T C G A T C G
Levels: A T C G
cart_prod$Var2
[1] G G G G G G G G G G G G G G G G
Levels: G
Applying paste0 to these two, gives:
paste0(cart_prod$Var1, cart_prod$Var2)
[1] "AG" "TG" "CG" "GG" "AG" "TG" "CG" "GG"
[9] "AG" "TG" "CG" "GG" "AG" "TG" "CG" "GG"
As you can see, we are starting to see our desired result come together. If we were to combine this result with the third column, we would obtain:
paste0(paste0(cart_prod$Var1, cart_prod$Var2), cart_prod$Var3)
[1] "AGA" "TGA" "CGA" "GGA" "AGT" "TGT" "CGT" "GGT"
[9] "AGC" "TGC" "CGC" "GGC" "AGG" "TGG" "CGG" "GGG"
And now, we combine this result with the last column:
paste0(paste0(paste0(cart_prod$Var1, cart_prod$Var2), cart_prod$Var3), cart_prod$Var4)
[1] "AGAG" "TGAG" "CGAG" "GGAG" "AGTG" "TGTG" "CGTG" "GGTG"
[9] "AGCG" "TGCG" "CGCG" "GGCG" "AGGG" "TGGG" "CGGG" "GGGG"
Voila! We have our desired result.
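The nested paste0 calls above are exactly a left fold over the columns, so base R's Reduce() produces the same result as do.call(). A short sketch to confirm the equivalence:

```r
cart_prod <- expand.grid(c("A", "T", "C", "G"), "G", c("A", "T", "C", "G"), "G")

# Reduce() folds paste0 over the columns from left to right:
# paste0(paste0(paste0(Var1, Var2), Var3), Var4)
folded <- Reduce(paste0, cart_prod)

identical(folded, do.call(paste0, cart_prod))
# [1] TRUE
```

do.call() passes all columns to a single paste0 call, while Reduce() makes one call per column; the output is the same.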
Here is an unconventional approach to achieving your desired output:
Here are a few notes on this solution:
I wrapped the map2 call in curly braces so I can choose .x and .y myself; without them, %>% would pass the LHS (here a data frame) as the first argument.
exec applies a function to a list of arguments, much like do.call in base R; !!! splices the elements of that list so that each one becomes a separate argument, which rbind then binds by rows.
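library(purrr)
library(magrittr)  # provides %>%
N <- c("A", "T", "C", "G")
expand.grid(N, N) %>%
{map2(.$Var1, .$Var2, ~ paste0(.x, "G", .y, "G"))} %>%
{exec(rbind, !!!.)}  # braces here too, so %>% does not add . as exec()'s first argument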
[,1]
[1,] "AGAG"
[2,] "TGAG"
[3,] "CGAG"
[4,] "GGAG"
[5,] "AGTG"
[6,] "TGTG"
[7,] "CGTG"
[8,] "GGTG"
[9,] "AGCG"
[10,] "TGCG"
[11,] "CGCG"
[12,] "GGCG"
[13,] "AGGG"
[14,] "TGGG"
[15,] "CGGG"
[16,] "GGGG"
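Both answers hard-code where the N positions are. A minimal sketch of a general version, assuming N is the only wildcard and every other character is kept literally; expand_pattern is a hypothetical helper name. It splits the pattern into characters, replaces each N by the full alphabet, and hands the resulting list to expand.grid before pasting, as in the expand.grid answer above.

```r
# Hypothetical helper: expand every N in a pattern to each base in `alphabet`.
# Assumes N is the only wildcard; all other characters are kept as-is.
expand_pattern <- function(pattern, alphabet = c("A", "T", "C", "G")) {
  chars <- strsplit(pattern, "")[[1]]
  # one list element per position: the full alphabet for N, the literal otherwise
  choices <- lapply(chars, function(ch) if (ch == "N") alphabet else ch)
  # expand.grid() accepts a single list; its components become the columns
  grid <- expand.grid(choices, stringsAsFactors = FALSE)
  do.call(paste0, grid)
}

expand_pattern("NGNG")  # the 16 strings shown above, in the same order
expand_pattern("NNG")   # 16 strings: AAG, TAG, CAG, ...
```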
I have a data.table as follows:
library(data.table)
dt <- fread(
"A B D E iso year
1 A 1 NA ECU 2009
2 B 2 0 ECU 2009
3 D 3 0 BRA 2011
4 E 4 0 BRA 2011
5 D 7 NA ECU 2008
6 E 1 0 ECU 2008
7 A 3 2 BRA 2012
8 A 4 NA BRA 2012",
header = TRUE
)
dt <- dt[, D := as.factor(D)]
I would like to assign attributes to column D. I tried the following:
alist <- list("A", "B", "C", "D", "E", "F", "G", "H")
attributes(dt$D) <- alist
But I get the error:
Error in attributes(dt$D) <- alist : attributes must be named
How should I do this?
Try this.
alist <- list(c("A", "B", "C", "D", "E", "F", "G", "H"))
attributes(dt$D) <- setNames(alist, c("D"))
This gives:
> attributes(dt$D)
$D
[1] "A" "B" "C" "D" "E" "F" "G" "H"
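Note that attributes<- replaces the entire attribute list, so the assignment above also removes the "levels" and "class" attributes and D stops being a factor. If the goal is to add one attribute while keeping the factor intact, attr<- sets a single named attribute. A sketch on a plain factor standing in for dt$D (the stand-in and the name labels_vec are assumptions for illustration); data.table also offers setattr() for setting one attribute by reference.

```r
# Stand-in for dt$D: a factor with the same values (illustrative assumption)
D <- factor(c(1, 2, 3, 4, 7, 1, 3, 4))
labels_vec <- c("A", "B", "C", "D", "E", "F", "G", "H")

# attr<- sets one named attribute and leaves "levels" and "class" alone
attr(D, "D") <- labels_vec
is.factor(D)
# [1] TRUE

# For contrast: attributes<- replaces the whole attribute list
D_replaced <- D
attributes(D_replaced) <- list(D = labels_vec)
is.factor(D_replaced)
# [1] FALSE
```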
I have a matrix whose last column contains characters:
A
B
B
A
...
I would like to replace A with 1 and B with 2 in R. The expected result should be:
1
2
2
1
...
If you are 100% confident that only "A" and "B" appear:
sample_data = c("A", "B", "B", "A")
sample_data
# [1] "A" "B" "B" "A"
as.numeric(gsub("A", 1, gsub("B", 2, sample_data)))
# [1] 1 2 2 1
Using factor or a simple lookup table would be much more flexible:
sample_data = c("A", "B", "B", "A")
Recommended:
as.numeric(factor(sample_data))
# [1] 1 2 2 1
Possible alternative:
as.numeric(c("A" = "1", "B" = "2")[sample_data])
# [1] 1 2 2 1
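Two details matter when applying this to the matrix from the question. First, as.numeric(factor(x)) numbers the levels in sorted order, which gives A = 1 and B = 2 here only because A sorts before B; supplying the levels explicitly, or using a lookup table, pins the mapping. Second, a matrix holds a single type, so writing numbers back into a character matrix stores them as the strings "1" and "2". A sketch on a toy matrix (the data and the names codes and last are illustrative):

```r
# Toy matrix whose last column holds the letters (illustrative data)
m <- matrix(c("x1", "x2", "x3", "x4", "A", "B", "B", "A"), ncol = 2)
last <- m[, ncol(m)]

# Explicit lookup table: the mapping does not depend on level order
codes <- c(A = 1, B = 2)
unname(codes[last])
# [1] 1 2 2 1

# factor() with explicit levels pins A -> 1 and B -> 2
as.integer(factor(last, levels = c("A", "B")))
# [1] 1 2 2 1

# Writing back into the character matrix stores the codes as strings
m[, ncol(m)] <- codes[last]
m[, ncol(m)]
# [1] "1" "2" "2" "1"
```

If you need a numeric column alongside the other columns, convert the matrix to a data frame first.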
I have a data frame including different levels of choices:
df = read.table(text="Index V1 V2 V3 V4 V5
1 A A A B A
2 B B B B B
3 B C C B B
4 B B C D E
5 B B C C D
6 A B B B B
7 C C B D D
8 A B C D E", header=TRUE, stringsAsFactors=FALSE)
I would like to create another column holding the most common choice in each row: take the value with the highest number of occurrences, and if several values tie for the maximum, take the one that appears first in the row. So the expected result is:
Index V1 V2 V3 V4 V5 final
1 A A A B A A
2 B B B B B B
3 B C C B B B
4 B B C D E B
5 B B C C D B
6 A B B B B B
7 C C B D D C
8 A B C D E A
Thanks for your help.
apply(df[,-1], 1, function(x)
x[which.max(ave(rep(1, length(x)), x, FUN = sum))] )
#[1] "A" "B" "B" "B" "B" "B" "C" "A"
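# Tie check: row 7 becomes D C B C D, where D and C both appear twice;
# which.max picks the first position with the top count, so "D" is returned
df[7, 2:6] <- c("D", "C", "B", "C", "D")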
apply(df[,-1], 1, function(x)
x[which.max(ave(rep(1, length(x)), x, FUN = sum))] )
#[1] "A" "B" "B" "B" "B" "B" "D" "A"
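We can do this by finding the frequency of values in each row with table. Loop through the rows of the dataset excluding the first column (apply with MARGIN = 1), get the frequencies with table, find the index of the maximum frequency (which.max), and return the name corresponding to it. Building the factor with levels = unique(x) keeps the values in order of first appearance, so on a tie which.max returns the value that appears first in the row.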
df$final <- apply(df[-1], 1, FUN = function(x) {
tbl <- table(factor(x, levels = unique(x)))
names(tbl)[which.max(tbl)]})
df$final
#[1] "A" "B" "B" "B" "B" "B" "C" "A"
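The same first-appearance tie-breaking can be had without building a table, by counting with match() and tabulate(). A sketch; first_mode is a hypothetical helper name, and the data frame is the one from the question.

```r
df <- read.table(text = "Index V1 V2 V3 V4 V5
1 A A A B A
2 B B B B B
3 B C C B B
4 B B C D E
5 B B C C D
6 A B B B B
7 C C B D D
8 A B C D E", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: most frequent value, ties broken by first appearance.
# unique() keeps values in order of first appearance, tabulate(match()) counts
# them, and which.max() returns the first position holding the maximum count.
first_mode <- function(x) {
  u <- unique(x)
  u[which.max(tabulate(match(x, u)))]
}

df$final <- apply(df[-1], 1, first_mode)
df$final
# [1] "A" "B" "B" "B" "B" "B" "C" "A"
```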
I have a list that looks something like this, where the elements are of various lengths and in random order:
> my.list <- lapply(list(c(2,1,3,4),c(2,3),c(4,2,3),c(1,3,4),c(1,4),c(2,4,1)),
function(x)letters[x])
> my.list
[[1]]
[1] "b" "a" "c" "d"
[[2]]
[1] "b" "c"
[[3]]
[1] "d" "b" "c"
[[4]]
[1] "a" "c" "d"
[[5]]
[1] "a" "d"
[[6]]
[1] "b" "d" "a"
I want to put this into a data frame, with NA where there are blanks. Each element is in random order, but I want each row of the data frame sorted alphabetically (or numerically), so that every value always lands in the same column. Ideally the end result would look like this:
> df
V1 V2 V3 V4
1 a b c d
2 NA b c NA
3 NA b c d
4 a NA c d
5 a NA NA d
6 a b NA d
You could use a lookup vector and match against it.
m <- sort(Reduce(union, my.list))
as.data.frame(do.call(rbind, lapply(my.list, function(a) a[match(m, a)])))
# V1 V2 V3 V4
# 1 a b c d
# 2 <NA> b c <NA>
# 3 <NA> b c d
# 4 a <NA> c d
# 5 a <NA> <NA> d
# 6 a b <NA> d
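To see why a[match(m, a)] lines everything up, here it is on a single element of the list. match(m, a) gives, for each value of the sorted lookup vector, its position in a (or NA if absent), and indexing with NA yields NA.

```r
m <- c("a", "b", "c", "d")  # sorted lookup vector
a <- c("d", "b", "c")       # one element of my.list

match(m, a)     # positions of a, b, c, d inside a: NA 2 3 1
a[match(m, a)]  # NA "b" "c" "d", i.e. one row of the result
```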
One option is mtabulate from qdapTools:
library(qdapTools)
d1 <- mtabulate(my.list)
d1
# a b c d
#1 1 1 1 1
#2 0 1 1 0
#3 0 1 1 1
#4 1 0 1 1
#5 1 0 0 1
#6 1 1 0 1
d2 <- d1
d2[] <- colnames(d1)[col(d1)]
is.na(d2) <- d1==0
colnames(d2) <- paste0("V", 1:4)
d2
# V1 V2 V3 V4
#1 a b c d
#2 <NA> b c <NA>
#3 <NA> b c d
#4 a <NA> c d
#5 a <NA> <NA> d
#6 a b <NA> d
Or
d2[] <- names(d1)[(NA^!d1) * col(d1)]
colnames(d2) <- paste0('V', 1:4)
data
my.list <- list(c("b", "a", "c", "d"), c("b", "c"), c("d", "b", "c"),
c("a", "c", "d"), c("a", "d"), c("b", "d", "a"))
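If qdapTools is not installed, the count matrix can be built in base R with table() on a factor whose levels are fixed. This is a minimal sketch that only reproduces what mtabulate returns for this data (counts per element), not its full interface; counts and d2 are illustrative names.

```r
my.list <- list(c("b", "a", "c", "d"), c("b", "c"), c("d", "b", "c"),
                c("a", "c", "d"), c("a", "d"), c("b", "d", "a"))
m <- sort(unique(unlist(my.list)))

# Base-R stand-in for mtabulate(): one row per element, one column per value
counts <- t(sapply(my.list, function(a) table(factor(a, levels = m))))

# Keep the value's name where it occurs, NA otherwise
d2 <- ifelse(counts > 0, colnames(counts)[col(counts)], NA)
colnames(d2) <- paste0("V", seq_along(m))
as.data.frame(d2, stringsAsFactors = FALSE)
```

ifelse() keeps the dimensions of its logical test, so d2 is already a 6 x 4 matrix before the conversion to a data frame.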