Create a time series using diagonal matrices - r

Let's say that my data has the following structure:
data <- structure(list(Year = c(2000, 2000, 2000, 2000, 2000, 2000, 2000,
2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000,
2000, 2000, 2001, 2001, 2001, 2001), Month = c(1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), Day = c(1,
1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 1, 1,
1, 1), FivMin = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3,
4, 1, 2, 3, 4, 1, 2, 3, 4), A = c(1, 2, 3, 0, 1, 5, 3, 4, 1,
0, 3, 1, 0, 2, 3, 0, 1, 2, 0, 9, 1, 2, 3, 0), B = c(2, 3, 4,
1, 2, 3, 0, 1, 2, 1, 4, -2, 2, 1, 0, 2, 2, 3, -1, 1, 2, 3, 4,
1), C = c(3, 0, 1, 2, 3, 4, 1, 9, 3, 7, 1, 2, 3, 4, 1, 2, 3,
4, 1, 2, 3, 0, 1, 2), D = c(4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,
3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3)), row.names = c(NA, -24L
), class = c("tbl_df", "tbl", "data.frame"))
My idea is to apply the cross product to each day's data. To do that, I wrote the following code:
res <- lapply(split(data, data[c("Year","Month","Day")]),
function(x) tcrossprod(t(x[c("A","B","C","D")])))
Final<-do.call(rbind, lapply(res, diag))
The output of Final is:
A B C D
2000.1.1 14 30 14 30
2001.1.1 14 30 14 30
2000.1.2 51 14 107 30
2001.1.2 0 0 0 0
2000.1.3 11 25 63 30
2001.1.3 0 0 0 0
2000.1.4 13 9 30 30
2001.1.4 0 0 0 0
2000.1.5 86 15 30 30
2001.1.5 0 0 0 0
What I need is a time series (a matrix or data frame object) formed by the diagonals of each daily cross product. That is, my desired time series would be:
A B C D
2000.1.1 14 30 14 30
2000.1.2 51 14 107 30
2000.1.3 11 25 63 30
2000.1.4 13 9 30 30
2000.1.5 86 15 30 30
2001.1.1 14 30 14 30
What changes does my original code need? I thought I could replace split with group_by, but it did not work.

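split() turns the data frame into a list with one element for every combination of Year, Month and Day, including combinations that never occur in the data, so some elements are data frames with zero rows. Remove those empty elements first, then proceed as before: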
ls<- split(data, data[c("Year","Month","Day")])
ls<- ls[sapply(ls, nrow)>0]
res <- lapply(ls, function(x) tcrossprod(t(x[c("A","B","C","D")])))
Final<-do.call(rbind, lapply(res, diag))
Final <- Final[ order(row.names(Final)), ]
Final
Output:
A B C D
2000.1.1 14 30 14 30
2000.1.2 51 14 107 30
2000.1.3 11 25 63 30
2000.1.4 13 9 30 30
2000.1.5 86 15 30 30
2001.1.1 14 30 14 30
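A shorter variant, sketched here on a small made-up data frame (the `toy` data and the column subset are assumptions for illustration, not the asker's data): passing `drop = TRUE` to `split()` avoids creating the empty groups in the first place, and because the diagonal of the cross product is simply the column-wise sum of squares, `colSums(g^2)` gives the same numbers as `diag(tcrossprod(t(g)))`.

```r
# Hypothetical toy data with the same layout as the question's data
toy <- data.frame(
  Year  = c(2000, 2000, 2000, 2000, 2001, 2001),
  Month = 1,
  Day   = c(1, 1, 2, 2, 1, 1),
  A     = c(1, 2, 3, 0, 1, 2),
  B     = c(2, 3, 4, 1, 2, 3)
)

# drop = TRUE discards Year/Month/Day combinations that have no rows
groups <- split(toy[c("A", "B")], toy[c("Year", "Month", "Day")], drop = TRUE)

# The diagonal of the cross product equals the column sums of squares
sums <- t(sapply(groups, function(g) colSums(g^2)))

# Sort the rows by their "Year.Month.Day" label
sums <- sums[order(rownames(sums)), ]
print(sums)
```

Note that ordering by row name is a string sort, so it only gives chronological order while the month and day labels have the same number of digits.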


as.factor not working with INT values on R

I would appreciate some help. I have this dataset:
q1 q2 q3 m1 m2 b1 b2
A 78 150 2887 4 4 0 1
B 74 142 2904 4 4 1 1
C 79 137 1564 4 4 1 0
D 80 164 4522 2 2 0 0
E 74 173 5025 2 3 0 1
F 73 140 1971 3 3 0 1
I want to transform m1:b2 into factors. If I do
data[,4:7] <- as.factor(data[,4:7])
it doesn't work: the values turn into character strings, and the result looks like this:
q1 q2 q3 m1 m2 b1
A 78 150 2887 c(4, 4, 4, 2, 2, 3) c(0, 1, 1, 0, 0, 0) c(4, 4, 4, 2, 2, 3)
B 74 142 2904 c(4, 4, 4, 2, 3, 3) c(1, 1, 0, 0, 1, 1) c(4, 4, 4, 2, 3, 3)
C 79 137 1564 c(0, 1, 1, 0, 0, 0) c(4, 4, 4, 2, 2, 3) c(0, 1, 1, 0, 0, 0)
D 80 164 4522 c(1, 1, 0, 0, 1, 1) c(4, 4, 4, 2, 3, 3) c(1, 1, 0, 0, 1, 1)
E 74 173 5025 c(4, 4, 4, 2, 2, 3) c(0, 1, 1, 0, 0, 0) c(4, 4, 4, 2, 2, 3)
F 73 140 1971 c(4, 4, 4, 2, 3, 3) c(1, 1, 0, 0, 1, 1) c(4, 4, 4, 2, 3, 3)
b2
A c(0, 1, 1, 0, 0, 0)
B c(1, 1, 0, 0, 1, 1)
C c(4, 4, 4, 2, 2, 3)
D c(4, 4, 4, 2, 3, 3)
E c(0, 1, 1, 0, 0, 0)
F c(1, 1, 0, 0, 1, 1)
But if I use lapply it works fine. Can you explain why? I have used as.factor(d[]) on other occasions and it worked just fine with other data.frame objects. Thank you.
Checking the documentation for as.factor (by typing ?as.factor), you'll see that the first argument x is "a vector of data, usually taking a small number of distinct values". A selection of several columns is a data frame (a list of columns), not a single vector, so as.factor does not convert the columns one by one. As your output shows, each whole column ends up as one string such as "c(4, 4, 4, 2, 2, 3)", and those four strings are then recycled across the cells of columns 4 through 7 when the result is assigned back.
You should use:
data[4:7] <- lapply(data[4:7], as.factor)
or (requiring tidyverse packages)
data <- data %>% mutate_at(4:7, as.factor)
Both of these solutions will correctly treat each column supplied, here columns 4, 5, 6, and 7, as their own vectors, individually. Each one is converted to a factor separately, and re-assigned appropriately.
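As a minimal sketch of the lapply approach on a small made-up data frame (the `df` object and its values are assumptions for illustration):

```r
# Hypothetical three-row data frame: one numeric column, two to convert
df <- data.frame(
  q1 = c(78, 74, 79),
  m1 = c(4, 4, 2),
  b1 = c(0, 1, 1)
)

# lapply visits each selected column separately and returns a list of factors;
# assigning into df[2:3] puts each factor back in its own column
df[2:3] <- lapply(df[2:3], as.factor)

# Inspect the resulting column classes and the levels of one column
print(sapply(df, class))
print(levels(df$m1))
```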

Check row by row and highlight mismatches in row/column when it occurred

I have a data frame with 3 months of data containing information about individuals. Each individual's information should stay fixed over the whole period, but in my real data set that is not the case. I would like to check row by row and highlight the dates on which something went wrong during data entry.
Here is a sample of my dataset (the real dataset has more variables):
input <- data.frame(stringsAsFactors=FALSE,
date = c(20190218, 20190219, 20190220, 20190221, 20190222,
20190223, 20190101, 20190103, 20190105, 20190110,
20190112, 20190218, 20190219, 20190220, 20190221, 20190222,
20190223),
id = c("18105265-ab", "18105265-ab", "18105265-ab",
"18105265-ab", "18105265-ab", "18105265-ab",
"18161665-aa", "18161665-aa", "18161665-aa", "18161665-aa",
"18161665-aa", "18502020-aa", "18502020-aa", "18502020-aa",
"18502020-aa", "18502020-aa", "18502020-aa"),
size = c(3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1),
type = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 4, 4, 4, 4, 2, 2),
county = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 5, 5),
member_p10 = c(3, 3, 3, 3, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1),
youngest_age = c(5, 5, 5, 5, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7),
sex = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1),
position = c(5, 5, 5, 5, 5, 5, 4, 4, 4, 0, 0, 3, 3, 3, 3, 0, 0))
Is there a way to do this kind of check? I would like to end up with this output:
date id size type county member_p10 youngest_age sex position
1 20190221 18105265-ab 3 4 1 3 5 1 5
2 20190222 18105265-ab 2 4 1 2 7 1 5
3 20190105 18161665-aa 2 4 1 2 7 2 4
4 20190110 18161665-aa 1 2 1 1 7 2 0
5 20190221 18502020-aa 2 4 5 2 7 1 3
6 20190222 18502020-aa 1 2 5 1 7 1 0
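One possible approach, sketched here in base R on a small made-up data frame: compare each row with the previous row of the same id and keep the pair of rows on either side of every change. The `toy` data, the `fixed_cols` vector and the assumption that rows are already sorted by id and then by date are all illustrative and not part of the question.

```r
# Hypothetical data, assumed to be sorted by id and then by date
toy <- data.frame(
  date = c(20190218, 20190219, 20190220, 20190101, 20190103),
  id   = c("a", "a", "a", "b", "b"),
  size = c(3, 3, 2, 2, 2),
  type = c(4, 4, 4, 4, 4),
  stringsAsFactors = FALSE
)

# Columns that should stay constant within an id (assumed names)
fixed_cols <- c("size", "type")

# Collapse the fixed columns into one comparison key per row
key <- do.call(paste, c(toy[fixed_cols], sep = "|"))

n <- nrow(toy)
# TRUE where a row has the same id as the previous row but a different key
changed <- c(FALSE, toy$id[-1] == toy$id[-n] & key[-1] != key[-n])

# Keep the first row after each change and the last row before it
flagged <- toy[changed | c(changed[-1], FALSE), ]
print(flagged)
```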

When mutate_all and lapply disagree ... How to replace lapply with mutate_all

I'm here again to ask for your help!
I'm trying to figure out what's happening with mutate_all (or with me...).
Let's say I have this dataset:
ds <- structure(list(Q1 = structure(c(5, 4, 5, 5, 5, 5, 5, 5, 5, 5,
5, 4, 3, 5, 5, 5, 5, 5, 1, 4, 5, 5, 3, 4, 5, 5, 5, 5, 5, 2, 5,
5, 4, 5, 5, 3, 5, 5, 4, 3, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4,
5, 4), label = "1 Para mim é igual se os meus amigos são heterossexuais ou homossexuais.", format.spss = "F1.0", display_width = 3L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
)), Q2 = structure(c(1, 1, 1, 1, 1, 1, 3, 1, 2, 3, 1, 4, 4, 4,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 1, 3, 2,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 2), label = "A homossexualidade é uma perturbação psicológica/biológica.", format.spss = "F1.0", display_width = 5L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
)), Q3 = structure(c(5, 2, 5, 4, 5, 4, 5, 5, 5, 4, 5, 5, 2, 3,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 4, 5, 4), label = "Acredito que os pais e as mães homossexuais são tão competentes como os pais e mães heterossexuais.", format.spss = "F1.0", display_width = 5L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
)), Q4 = structure(c(1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 2,
1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 5, 1, 1, 2, 1, 3), label = "4 Todas as Lésbicas, Gays, Bissexuais, Transexuais, Transgêneros e Intersexuais (LGBTI) me deixam irritado.", format.spss = "F1.0", display_width = 4L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
)), Q5 = structure(c(1, 4, 1, 1, 1, 1, 3, 1, 2, 1, 1, 1, 3, 3,
1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 3, 2,
1, 1, 1, 2, 2, 5, 1, 4, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 3), label = "A legalização do casamento entre pessoas do mesmo sexo é muito errada.", format.spss = "F1.0", display_width = 5L, class = "labelled", labels = c(`discordo totalmente` = 1,
discordo = 2, indiferente = 3, concordo = 4, `concordo totalmente` = 5
))), row.names = c(NA, -54L), class = c("tbl_df", "tbl", "data.frame"
))
Then I need to transform all variables into factors to plot them. I really like the dplyr approach:
ds_mutate <- ds %>% mutate_all(., factor, levels=1:5)
likert(ds_mutate)
But this error is coming up:
Error in likert(ds_mutate) :
All items (columns) must have the same number of levels
When I use lapply (nobody will convince me the 'apply' functions are intuitive...), it works pretty well:
> ds_apply <- lapply(ds, factor, levels=1:5) %>% as.data.frame()
> likert(ds_apply)
Item 1 2 3 4 5
1 Q1 1.851852 1.851852 9.259259 14.814815 72.222222
2 Q2 77.777778 9.259259 5.555556 7.407407 0.000000
3 Q3 0.000000 3.703704 1.851852 14.814815 79.629630
4 Q4 79.629630 14.814815 3.703704 0.000000 1.851852
5 Q5 72.222222 7.407407 14.814815 3.703704 1.851852
But as far as I can tell, the str() output of the two objects is the same...
I'm looking forward to hearing from you!
Thank you!
There is one difference:
class(ds_mutate)
# [1] "tbl_df" "tbl" "data.frame"
class(ds_apply)
# [1] "data.frame"
The issue then arises from the fact that, in the call of likert, we have
nlevels = length(levels(items[, 1]))
where, in the former case,
length(levels(ds_mutate[, 1]))
# [1] 0
since
ds_mutate[, 1]
# A tibble: 54 x 1
# Q1
# <fct>
# 1 5
# 2 4
# 3 5
# 4 5
# 5 5
# 6 5
# 7 5
# 8 5
# 9 5
# 10 5
# … with 44 more rows
i.e., the result is a tibble. Also,
methods("levels")
# [1] levels.default
so that there is no levels method for tibbles. Notice also that
class(ds_mutate) <- c("data.frame", "tbl_df", "tbl")
ds_mutate[, 1]
# [1] 5 4 5 5 5 5 5 5 5 5 5 4 3 5 5 5 5 5 1 4 5 5 3 4 5 5 5 5 5 2 5 5 4 5 5 3 5 5 4 3 3 5 5 5
# [45] 5 5 5 5 5 5 5 4 5 4
# Levels: 1 2 3 4 5
in which case
likert(ds_mutate)
starts to work too. Without modifying classes you may also use
likert(data.frame(ds_mutate))
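The effect can be reproduced without the tibble package. In the sketch below, `drop = FALSE` stands in for the tibble behaviour, since single-column `[, 1]` indexing on a tibble keeps the result as a one-column table instead of dropping to a vector; the small `df` object is an assumption for illustration.

```r
# Hypothetical one-column data frame holding a factor with five levels
df <- data.frame(Q1 = factor(c(5, 4, 5), levels = 1:5))

# Plain data frame indexing drops to the underlying factor
as_vector <- df[, 1]

# drop = FALSE keeps a one-column data frame, as tibble indexing would
as_frame <- df[, 1, drop = FALSE]

# levels() finds the factor levels only in the first case
print(length(levels(as_vector)))
print(length(levels(as_frame)))
```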
Extra: lapply in
lapply(ds, factor, levels = 1:5)
actually is really intuitive once we understand one thing: a data frame is a special case of a list in which every element has the same length. Now, sapply and lapply work by going over each element of their first argument. Once we see ds as a list whose elements are the columns, it becomes clear how the call operates. For the same reason, since the results of factor all have the same length here, the list returned by lapply can be converted straight back to a data frame.
I have never used the likert package, but it looks like it does not accept an object of class tibble. This works for me:
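A short sketch of that idea, using a small made-up data frame (`ds_small` is an assumption for illustration):

```r
# Hypothetical two-column data frame of 1-5 responses
ds_small <- data.frame(Q1 = c(5, 4, 1), Q2 = c(1, 3, 2))

# A data frame is a list whose elements are its columns
print(is.list(ds_small))
print(names(ds_small))

# lapply calls factor() once per column, passing levels = 1:5 each time
out <- lapply(ds_small, factor, levels = 1:5)

# The result is a list of equal-length factors, so it converts back cleanly
ds_factors <- as.data.frame(out)
print(sapply(ds_factors, nlevels))
```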
likert(as.data.frame(ds_mutate))

Advanced if/then/loop function to create new columns

I am learning R (focused on the tidyverse packages) and am hoping that someone could help with the following problem that has me stumped.
I have a data-set that looks similar to the following:
library("tibble")
myData <- frame_data(
~id, ~r1, ~r2, ~r3, ~r4, ~r5, ~r6, ~r7, ~r8, ~r9, ~r10, ~r11, ~r12, ~r13, ~r14, ~r15, ~r16,
"A", 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
"B", 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
"C", 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2,
"D", 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2,
"E", 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
)
Basically, I have multiple rows of respondent data, and each respondent gave 16 responses of either "1" or "2".
For each respondent (i.e., each row) I would like to create an additional three columns:
The first new column - called "switchCount" - identifies the number of times the respondent switched from a "2" response to a "1" response.
The second new column - called "switch1" - identifies the index of the first time the respondent switched from a "2" response to a "1" response.
The third new column - called "switch2" - identifies the index of the final time the respondent switched from a "2" response to a "1" response.
If there is no switch and all values are "2", then return the index of 0.
If there is no switch and all values are "1", then return the index of 16.
The final datatable should therefore look like this:
myData <- frame_data(
~id, ~r1, ~r2, ~r3, ~r4, ~r5, ~r6, ~r7, ~r8, ~r9, ~r10, ~r11, ~r12, ~r13, ~r14, ~r15, ~r16, ~switchCount, ~switch1, ~switch2,
"A", 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 1, 1,
"B", 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4,
"C", 2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2, 3, 9,
"D", 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 3, 6, 15,
"E", 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 16, 16
)
One approach is to concatenate all response columns row-wise and then find the occurrences of "2;1" using gregexpr:
library(dplyr)
myData %>%
rowwise() %>%
mutate(concat_cols = paste(r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12,r13,r14,r15,r16,sep=";"),
switchCount = ifelse(gregexpr("2;1", concat_cols)[[1]][1] == -1,
0,
length(gregexpr("2;1", concat_cols)[[1]])),
switch1 = ifelse(switchCount == 0,
ifelse(grepl("2",concat_cols), 1, 16),
min(floor(gregexpr("2;1", concat_cols)[[1]]/2)+1)),
switch2 = ifelse(switchCount == 0,
ifelse(grepl("2",concat_cols), 1, 16),
max(floor(gregexpr("2;1", concat_cols)[[1]]/2)+1))) %>%
select(-concat_cols)
Output is:
id r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 switchCount switch1 switch2
1 A 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 1 1
2 B 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 4 4
3 C 2 2 2 1 1 1 2 2 2 1 1 1 1 2 2 2 2 3 9
4 D 1 1 2 2 2 2 1 1 2 2 1 1 1 2 2 1 3 6 15
5 E 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 16 16
Sample data:
myData <- structure(list(id = c("A", "B", "C", "D", "E"), r1 = c(2, 2,
2, 1, 1), r2 = c(2, 2, 2, 1, 1), r3 = c(2, 2, 2, 2, 1), r4 = c(2,
2, 1, 2, 1), r5 = c(2, 1, 1, 2, 1), r6 = c(2, 1, 1, 2, 1), r7 = c(2,
1, 2, 1, 1), r8 = c(2, 1, 2, 1, 1), r9 = c(2, 1, 2, 2, 1), r10 = c(2,
1, 1, 2, 1), r11 = c(2, 1, 1, 1, 1), r12 = c(2, 1, 1, 1, 1),
r13 = c(2, 1, 1, 1, 1), r14 = c(2, 1, 2, 2, 1), r15 = c(2,
1, 2, 2, 1), r16 = c(2, 1, 2, 1, 1), switchCount = c(0, 1,
2, 3, 0), switch1 = c(1, 4, 3, 6, 16), switch2 = c(1, 4,
9, 15, 16)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
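The same result can also be reached without string matching. The sketch below compares each response with the next one directly; `count_switches` and `resp` are hypothetical names introduced here, and the no-switch fallback (1 if any "2" is present, otherwise the number of responses) copies the behaviour of the answer above, not a rule confirmed by the asker.

```r
# Hypothetical helper: takes one respondent's responses as a numeric vector
count_switches <- function(responses) {
  n <- length(responses)
  # Positions i where response i is 2 and response i + 1 is 1
  idx <- which(responses[-n] == 2 & responses[-1] == 1)
  if (length(idx) == 0) {
    # No switch: mirror the fallback used in the answer above
    fallback <- if (any(responses == 2)) 1 else n
    return(c(switchCount = 0, switch1 = fallback, switch2 = fallback))
  }
  c(switchCount = length(idx), switch1 = min(idx), switch2 = max(idx))
}

# Rows A, B, C and E from the question, as a plain matrix
resp <- rbind(
  A = rep(2, 16),
  B = c(rep(2, 4), rep(1, 12)),
  C = c(2, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2),
  E = rep(1, 16)
)

# Apply the helper to every row; t() puts respondents back in rows
result <- t(apply(resp, 1, count_switches))
print(result)
```

Comparing neighbouring values directly avoids the dependence on single-character responses that the pasted-string version has.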

How to sum remaining values after using gsub?

I can't work this problem out on my own, so I'm asking for a little help.
This is part of my data:
rfam[1:20,]
id name
1 RF00001 LL_skoljka_r41782307_x1
2 RF00001 LL_skoljka_r9950955_x1
3 RF00001 LL_skoljka_r49323482_x1
4 RF00001 LL_skoljka_r14141437_x1
5 RF00001 LL_skoljka_r16457227_x3
6 RF00002 LL_skoljka_r40347558_x1
7 RF00002 LL_skoljka_r44415149_x1
8 RF00002 LL_skoljka_r13145032_x1
9 RF00002 LL_skoljka_r29248915_x42
10 RF00003 LL_skoljka_r15936986_x1
11 RF00003 LL_skoljka_r28953530_x1
12 RF00003 LL_skoljka_r32665758_x1
13 RF00003 LL_skoljka_r32835489_x1
14 RF00003 LL_skoljka_r32835498_x1
15 RF04051 LL_skoljka_r33254611_x1
16 RF04051 LL_skoljka_r29761867_x12
17 RF04051 LL_skoljka_r45123665_x2
18 RF04051 LL_skoljka_r34837827_x15
19 RF08595 LL_skoljka_r38900754_x1
20 RF08595 LL_skoljka_r22016530_x1
In the first step I want to remove everything before the x in the variable name, so I use:
rfam$name<- as.data.frame(sapply(rfam$name, gsub, pattern='^.*?x', replacement=""))
Result:
rfam[1:20,]
id name
1 RF00001 1
2 RF00001 1
3 RF00001 1
4 RF00001 1
5 RF00001 3
6 RF00002 1
7 RF00002 1
8 RF00002 1
9 RF00002 42
10 RF00003 1
11 RF00003 1
12 RF00003 1
13 RF00003 1
14 RF00003 1
15 RF04051 1
16 RF04051 12
17 RF04051 2
18 RF04051 15
19 RF08595 1
20 RF08595 1
In the second step I would like to sum the values that remain in the variable name for each id.
The result should look like this:
view(rfam)
id name
1 RF00001 7
2 RF00002 45
3 RF00003 5
4 RF04051 30
5 RF08595 2
To sum the values, the variable has to be numeric, but both of my variables are factors. So I converted id to character using rfam[,1]=as.character(rfam[,1]) and tried to convert name to numeric with rfam[,2]=as.numeric(levels(rfam[,2])[rfam[,2]]). The conversion of id was successful, while name came back as NAs.
I also tried rfam[,2]=as.numeric(as.character(rfam[,2])), but the result was the same.
I tried exporting the data to a txt file to do the rest of the analysis in Excel, but the exported data looks like this:
"id" "name"
"1" "RF00001" c(1, 1, 1, 1, 9, 1, 1, 1, 11, 1, 1, 1, 1, 1, 1, 3, 7, 5, 1, 1, 1, 9, 1, 14, 10, 7, 1, 5, 1, 1, 1, 1, 1, 7, 1, 2, 1, 1, 1, 9, 1, 7, 1, 1, 1, 1, 1, 1, 10, 7, 1, 10, 7, 1, 1, 1, 1, 1, 7, 1, 10, 1, 1, 1, 1, 1, 1, 1, 7, 1,...)
"2" "RF00001" c(1, 1, 1, 1, 9, 1, 1, 1, 11, 1, 1, 1, 1, 1, 1, 3, 7, 5, 1, 1, 1, 9, 1, 14, 10, 7, 1, 5, 1, 1, 1, 1, 1, 7, 1, 2, 1, 1, 1, 9, 1, 7, 1, 1, 1, 1, 1, 1, 10, 7, 1, 10, 7, 1, 1, 1, 1, 1, 7, 1, 10, 1, 1, 1, 1, 1, 1, 1, 7, 1,...)
"3" "RF00001" c(1, 1, 1, 1, 9, 1, 1, 1, 11, 1, 1, 1, 1, 1, 1, 3, 7, 5, 1, 1, 1, 9, 1, 14, 10, 7, 1, 5, 1, 1, 1, 1, 1, 7, 1, 2, 1, 1, 1, 9, 1, 7, 1, 1, 1, 1, 1, 1, 10, 7, 1, 10, 7, 1, 1, 1, 1, 1, 7, 1, 10, 1, 1, 1, 1, 1, 1, 1, 7, 1,...)
This is where I am stuck. I don't understand what is happening and I would appreciate it if you could help me out.
Update
Your question turns out not to be about the grouping step. The problem is that wrapping sapply() in as.data.frame() stores a data frame inside rfam$name instead of a plain vector.
You can use the following data.table solution to convert the rfam$name column to a numeric vector, which can then be grouped and summed:
library(data.table)
setDT(rfam)[,name:= as.numeric(gsub('^.*?x', replacement="",name))]
Now we can use dplyr to get the desired output:
library(dplyr)
as.data.frame(rfam) %>% group_by(id) %>% summarise(name=sum(name))
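For comparison, a base R sketch of the same two steps, with no extra packages. The four-row `rfam` below is a cut-down stand-in built from names in the question, and the greedy pattern `^.*x` is an assumption that relies on each name containing a single "x".

```r
# Hypothetical cut-down version of the question's data
rfam <- data.frame(
  id   = c("RF00001", "RF00001", "RF00002", "RF00002"),
  name = c("LL_skoljka_r41782307_x1", "LL_skoljka_r16457227_x3",
           "LL_skoljka_r40347558_x1", "LL_skoljka_r29248915_x42"),
  stringsAsFactors = FALSE
)

# sub() is vectorised, so no sapply() or as.data.frame() is needed:
# strip everything up to and including the "x", then convert to numeric
rfam$name <- as.numeric(sub("^.*x", "", rfam$name))

# Sum the remaining counts within each id
totals <- aggregate(name ~ id, data = rfam, FUN = sum)
print(totals)
```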
