Adding values if characters match in list - r

I'm trying to sum the occurrence of every possible letter in a character string in a list but if I do:
table(simplify2array(as.vector(x)))
Error in base::table(...) : attempt to make a table with >= 2^31 elements
So I did the following and made a table for each character string.
x <- lapply(x, table)
head(x)
[[1]]
E F G H L N P Q R S Y
1 2 1 2 3 1 1 3 3 2 1
[[2]]
A C D G I K L N P R V
1 1 2 1 1 3 2 4 3 1 1
How can I now add up all of these values if the letters exist in each list? Each list can have different letters.

Maybe you could use the following on the original list of character vectors (before applying table to each element):
x_v <- unlist(x)
table(x_v)
If this doesn't work, the aggregate() function could help you.
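As a minimal sketch of the suggestion above, assuming each list element is a vector of single letters. The sample data is made up for illustration, and the second half shows one possible way to add up the per-element tables if the original vectors are no longer available:

```r
# Hypothetical sample data: each list element is a vector of single letters
x <- list(c("E", "F", "F", "G"), c("A", "F", "G", "G"))

# Flatten the list into one character vector and count every letter once
total <- table(unlist(x))
print(total)

# If only the per-element tables exist, stack them as (letter, count) rows
# and sum the counts by letter
tabs <- lapply(x, table)
stacked <- do.call(rbind, lapply(tabs, as.data.frame, stringsAsFactors = FALSE))
summed <- tapply(stacked$Freq, stacked$Var1, sum)
print(summed)
```

Both results list every letter that occurs in any element, so letters missing from some elements are handled without special cases.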

Related

count characters based on the order they appear

How does one count the characters in a single string based on the order in which they appear? Below is a minimal example:
x <- "abbccdddaab"
First thought was this but it only counts them irrespective of order:
table(unlist(strsplit(x, "\\b")))
a b c d
3 3 2 3
But the desired output is:
a b c d a b
1 2 2 3 2 1
I would imagine the solution would require a for loop?
We can use rle instead of table. rle compares adjacent elements and returns a list with the value of each run (values) and the number of times it repeats consecutively (lengths):
out <- rle(strsplit(x, "\\b")[[1]])
setNames(out$lengths, out$values)
# a b c d a b
# 1 2 2 3 2 1
Using data.table::rleid (the result is labelled by run id rather than by letter):
x <- "abbccdddaab"
tmp <- strsplit(x, "\\b")[[1]]
table(data.table::rleid(tmp))
#1 2 3 4 5 6
#1 2 2 3 2 1
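To make the structure of the rle result easier to see, here is a small base R sketch. It splits on the empty string, which is the documented way to break a string into single characters, and it uses inverse.rle to confirm that the runs describe the original string. The variable names are only illustrative.

```r
x <- "abbccdddaab"

# One vector element per character
chars <- strsplit(x, "")[[1]]

# Collapse consecutive repeats into runs
runs <- rle(chars)

# Label each run length with the character it belongs to
run_counts <- setNames(runs$lengths, runs$values)
print(run_counts)

# inverse.rle() expands the runs back into the original character vector
rebuilt <- paste(inverse.rle(runs), collapse = "")
```

Because the names repeat (a and b each appear twice), run_counts should be read by position rather than indexed by name.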

R How to permute all rows of a data frame such that all possible combinations of rows are returned in a list?

I'm trying to produce all possible row permutations of a data frame (or matrix, if that's easier) and have them returned as a list or array of data frames/matrices. I've constructed a mock data frame that has the same dimensions as the one I'm working with.
test.df <- as.data.frame(matrix(1:80,nrow=16,ncol=5))
Edit: changed combinations to permutations
v.df <- data.frame(symbol = c("a", "b", "c"), number = c(1,2,3))
v.df
## symbol number
## 1 a 1
## 2 b 2
## 3 c 3
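library(gtools)  # provides permutations()
permutate.rows <- function(df) {
  k <- nrow(df)  # number of rows
  # each column of index.df is one permutation of the row indices 1..k
  index.df <- as.data.frame(t(permutations(n = k, r = k, v = 1:k)))
  # return the list visibly (ending on an assignment would return it invisibly)
  lapply(index.df, function(idx) df[idx, , drop = FALSE])
}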
permutate.rows(v.df)
This gives the list of all permuted data frames:
$V1
symbol number
1 a 1
2 b 2
3 c 3
$V2
symbol number
1 a 1
3 c 3
2 b 2
$V3
symbol number
2 b 2
1 a 1
3 c 3
$V4
symbol number
2 b 2
3 c 3
1 a 1
$V5
symbol number
3 c 3
1 a 1
2 b 2
$V6
symbol number
3 c 3
2 b 2
1 a 1
To apply it to your example, call the function on your own data frame, which has 16 rows instead of 3.
I shortened the data frame to 5 rows because 16! = 20922789888000 permutations are too many to enumerate:
library(purrr)
library(combinat)
test.df <- as.data.frame(matrix(1:25,nrow=5,ncol=5))
map(permn(1:nrow(test.df)), function(x) test.df[x,])
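Since the number of row permutations grows as n!, full enumeration is only practical for small data frames. If a subset of permutations is enough, one possible workaround (not part of the answers above) is to draw random permutations with sample. The helper name sample_row_permutations and the number of draws are hypothetical.

```r
# Hypothetical helper: draw n_draws random row permutations of a data frame
sample_row_permutations <- function(df, n_draws) {
  lapply(seq_len(n_draws), function(i) df[sample(nrow(df)), , drop = FALSE])
}

test.df <- as.data.frame(matrix(1:80, nrow = 16, ncol = 5))

# Number of full permutations for 16 rows
n_perms <- factorial(nrow(test.df))

set.seed(1)  # only to make the draws reproducible
perms <- sample_row_permutations(test.df, 10)
length(perms)
```

Random draws can repeat a permutation, although with 16 rows that is very unlikely for a small number of draws.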

how to convert a list with different length of lists to a dataframe in r

I have a list containing three vectors of different lengths, with unique elements within each vector.
data <- list(ARG=letters[1:8],BRZ=c("a","b","c","f","h","g","l","m","n"),US=c("u","b","c","e","h","f","q","a","n","t"))
I would like to convert this list to a data frame by merging the vectors together. The expected result is shown below (similar output is fine). Thank you for your help.
ID  ARG  BRZ  US
a   1    1    1
b   1    1    1
c   1    1    1
d   1
e   1         1
f   1    1    1
g   1    1
h   1    1    1
l        1
m        1
n        1    1
q             1
t             1
u             1
We use mtabulate and transpose the output
library(qdapTools)
t(mtabulate(data))
Or if we are using base R, then stack into a data.frame with 2 columns and apply the table
table(stack(data))
This assumes that there are no duplicates within each vector. If there are duplicates, we can compare the counts with 0 and coerce the resulting logical matrix to binary:
+(table(stack(data)) >0)
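For reference, a minimal base R sketch that runs the stack/table approach on the data from the question and turns the result into a data frame with an ID column. Note that absent combinations show up as 0 rather than as blank cells; the names membership and membership_df are only illustrative.

```r
data <- list(ARG = letters[1:8],
             BRZ = c("a", "b", "c", "f", "h", "g", "l", "m", "n"),
             US  = c("u", "b", "c", "e", "h", "f", "q", "a", "n", "t"))

# stack() gives two columns (values, ind); table() cross-tabulates them
membership <- table(stack(data))

# Convert the contingency table to a regular data frame, one row per letter
membership_df <- as.data.frame.matrix(membership)
membership_df <- cbind(ID = rownames(membership_df), membership_df)
rownames(membership_df) <- NULL
print(membership_df)
```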

Sort a data.frame using row names of another data.frame

I need to sort the following data.frame (table 1):
X Y
A 1
B 5
C 0
D 3
based on the results of another data.frame (table 2):
X Y
C 10
B 9
A 8
D 7
so that table 1 ends up like this:
X Y
C 0
B 5
A 1
D 3
How do I do this? I've tried to use:
table1[order(row names(table1),]
But I get the following error:
Subscript out of bound.
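This should give the desired result, assuming X is a column in both tables. match looks up the position of each key of table2 in table1:
table1[match(table2$X, table1$X), ]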
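To see why looking up keys matters, the sketch below compares match with order(table2$X). order returns the permutation that sorts table2$X alphabetically, which coincides with the row order of table2 in the question's data but not in general. The ordering of table2 used here is hypothetical and was chosen to expose the difference.

```r
table1 <- data.frame(X = c("A", "B", "C", "D"), Y = c(1, 5, 0, 3))
# Hypothetical ordering of table2, different from the one in the question
table2 <- data.frame(X = c("C", "A", "B", "D"), Y = c(10, 9, 8, 7))

# Position of each table2 key within table1
idx <- match(table2$X, table1$X)
sorted1 <- table1[idx, ]
print(sorted1)

# order() sorts table2$X instead of following it, so the rows come out differently
by_order <- table1[order(table2$X), ]
print(by_order)
```

If the keys are stored as row names instead of a column, the same idea applies with rownames(table1) and rownames(table2) in place of the X columns.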

Generating random number by length of blocks of data in R data frame

I am trying to simulate the measuring order n times to see how it affects my study subject. To do this, I want to generate random integers in a new column of a data frame. I have a big data frame, and I would like to add a column containing random numbers based on the number of observations in each block.
Example of data (each row is an observation):
df <- data.frame(A=c(1,1,1,2,2,3,3,3,3),
B=c("x","b","c","g","h","g","g","u","l"),
C=c(1,2,4,1,5,7,1,2,5))
A B C
1 1 x 1
2 1 b 2
3 1 c 4
4 2 g 1
5 2 h 5
6 3 g 7
7 3 g 1
8 3 u 2
9 3 l 5
What I'd like to do is add a D column and generate random integer numbers according to the length of each block. Blocks are defined in column A.
Result should look something like this:
df <- data.frame(A=c(1,1,1,2,2,3,3,3,3),
B=c("x","b","c","g","h","g","g","u","l"),
C=c(1,2,4,1,5,7,1,2,5),
D=c(2,1,3,2,1,4,3,1,2))
> df
A B C D
1 1 x 1 2
2 1 b 2 1
3 1 c 4 3
4 2 g 1 2
5 2 h 5 1
6 3 g 7 4
7 3 g 1 3
8 3 u 2 1
9 3 l 5 2
I have tried to use R's sample() function to generate random numbers, but my problem is splitting the data according to block length and adding the new column. Any help is greatly appreciated.
It can be done easily with ave
df$D <- ave( df$A, df$A, FUN = function(x) sample(length(x)) )
(You could replace length() with max() or something similar, but length() works even when the values in A do not match the sizes of their blocks.)
This is really easy with ddply from plyr.
library(plyr)
ddply(df, .(A), transform, D = sample(length(A)))
The longer manual version is:
Use split to split the data frame by the first column.
split_df <- split(df, df$A)
Then call sample on each member of the list.
split_df <- lapply(split_df, function(df) {
  df$D <- sample(nrow(df))
  df
})
Then recombine with
df <- do.call(rbind, split_df)
One simple way:
df$D <- 0
counts <- table(df$A)
for (i in seq_along(counts)) {
  df$D[df$A == names(counts)[i]] <- sample(counts[i])
}
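As a quick check of the block-wise sampling, and as a sketch of how the "n times" simulation from the question might be set up, the code below uses the ave approach, verifies that every block receives the integers 1 to its length exactly once, and then repeats the draw with replicate. The seed and the value of n_sim are arbitrary assumptions.

```r
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5))

set.seed(42)  # only to make the example reproducible

# Random measuring order within each block of A
df$D <- ave(df$A, df$A, FUN = function(x) sample(length(x)))

# Each block should contain 1..n exactly once, where n is the block size
ok <- tapply(df$D, df$A, function(d) identical(sort(as.integer(d)), seq_along(d)))

# Repeat the draw n_sim times: one column per simulated measuring order
n_sim <- 3
orders <- replicate(n_sim, ave(df$A, df$A, FUN = function(x) sample(length(x))))
dim(orders)
```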
