Need to count items from a table - R

I have this DF (partially shown) with 15 categories in the first column, and each cell has a number between 1 and 15. Actually, this is just a small example: the 15 categories are repeated further down, each time with different numbers in the other columns.
What I need is a 16x15 matrix with the count of appearances of each value, as shown below.
I could program this the old-fashioned way with IFs etc., but I am kind of lost in R.
I hope this is clear.
Any advice is welcome.
EDITED AS REQUESTED (apologies for not being clear)
RESULTADOS DF
PREOCUPACIÓN 13 15 4 4 1 8 3 1
TRISTEZA 15 13 2 5 4 14 6 6
PERDIDA 4 11 3 2 14 12 7 10
ANGUSTIA 14 10 11 3 2 13 1 2
IMPOTENCIA 1 8 9 6 5 5 5 4
MUERTE 2 1 14 14 15 6 13 15
ENOJO 12 7 10 8 6 7 12 5
INJUSTICIA 3 9 12 7 12 2 14 13
AUSENCIA 11 14 6 1 8 11 11 11
DOLOR 5 12 5 9 7 15 8 8
CORRUPCIÓN 8 6 15 13 11 3 15 12
MIEDO 9 3 13 10 3 10 9 3
SECUESTRO 10 2 1 11 9 4 4 14
INSEGURIDAD 7 4 7 15 10 1 10 9
DESESPERACIÓN 6 5 8 12 13 9 2 7
PREOCUPACIÓN 14 2 5 4 3 8 8 7
TRISTEZA 5 7 1 8 7 9 13 9
PERDIDA 2 6 6 12 2 10 6 10
ANGUSTIA 13 3 15 9 8 11 7 4
IMPOTENCIA 12 11 7 5 10 12 12 1
MUERTE 3 10 14 2 13 13 9 2
ENOJO 11 5 10 10 11 7 11 5
INJUSTICIA 7 13 2 6 15 14 10 6
AUSENCIA 8 1 9 11 1 6 4 12
DOLOR 6 8 8 13 9 3 3 3
CORRUPCIÓN 10 15 3 14 14 15 5 11
MIEDO 9 4 13 15 4 4 14 8
SECUESTRO 4 9 11 1 12 5 15 13
INSEGURIDAD 1 12 4 7 6 1 1 14
DESESPERACIÓN 15 14 12 3 5 2 2 15
PREOCUPACIÓN 13 10 4 1 7 4 11 2
TRISTEZA 15 11 11 2 9 3 12 8
PERDIDA 2 15 7 4 15 7 3 13
ANGUSTIA 8 13 5 3 6 1 7 1
IMPOTENCIA 10 4 8 5 12 10 13 3
MUERTE 7 8 15 15 3 6 6 9
ENOJO 14 12 12 10 10 8 15 10
INJUSTICIA 4 1 13 6 1 9 2 6
AUSENCIA 12 9 1 7 8 11 1 14
DOLOR 9 14 2 12 5 2 14 12
CORRUPCIÓN 3 6 14 14 14 14 5 15
MIEDO 6 2 3 9 2 5 10 7
SECUESTRO 1 3 6 8 13 15 4 5
INSEGURIDAD 5 5 9 11 4 13 8 4
DESESPERACIÓN 11 7 10 13 11 12 9 11
...
The result I need is like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
PREOCUPACION 3 2 2 5 1 0 2 3 0 1 1 0 2 0 1
TRISTEZA 1 2 1 1 2 2 2 2 3 0 2 1 1 1 2

Use apply() on every row: convert the row to a factor with levels 1:15, then count the values with table():
res <-
  cbind.data.frame(name = df1[, 1],
                   t(apply(df1[, -1], 1, function(i){
                     table(factor(i, levels = 1:15))
                   })))
res
# name 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
# 1 PREOCUPACIÓN 2 1 2 2 0 2 0 1 0 0 0 0 1 0 1
# 2 TRISTEZA 0 2 0 1 2 3 0 0 1 0 0 0 1 1 1
# 3 PERDIDA 0 1 1 1 0 0 1 0 0 1 2 2 1 2 0
# 4 ANGUSTIA 2 2 1 1 0 0 0 0 1 1 1 0 1 1 1
# ...
Edit: if names are repeated on multiple rows, try the code below instead. Split the data frame on the first column, then loop through each piece and count the values per factor level.
res <- t(data.frame(
  lapply(split(df1, df1$V1), function(i){
    # drop the name column (i[, -1]), not the first row (i[-1, ])
    as.numeric(table(factor(unlist(i[, -1]), levels = 1:15)))
  })))
res
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
# ANGUSTIA 4 0 2 1 1 1 2 2 1 0 1 0 2 0 1
# AUSENCIA 4 2 0 1 0 1 1 2 2 0 2 2 0 1 0
# CORRUPCIÓN 0 0 4 0 2 1 0 0 0 1 1 0 0 6 3
# DESESPERACIÓN 0 2 1 2 1 0 1 0 1 1 3 2 1 1 2
# ...
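As an alternative to split() plus lapply(), a single two-way table() can count every (name, value) pair in one call. This is a swapped-in technique, not the answer's code. The sketch assumes df1 is read with read.table(header = FALSE), so the names land in V1, and it uses only the first three PREOCUPACIÓN and TRISTEZA rows from the question, with the accent dropped from the name.

```r
# Hypothetical six-row subset of the question's data.
df1 <- read.table(text = "
PREOCUPACION 13 15 4 4 1 8 3 1
TRISTEZA 15 13 2 5 4 14 6 6
PREOCUPACION 14 2 5 4 3 8 8 7
TRISTEZA 5 7 1 8 7 9 13 9
PREOCUPACION 13 10 4 1 7 4 11 2
TRISTEZA 15 11 11 2 9 3 12 8
", header = FALSE)

# unlist() stacks the value columns one after another, so the name column is
# repeated once per value column to stay aligned with the stacked values.
# Fixing the factor levels to 1:15 keeps values that never occur as 0 columns.
counts <- table(
  name  = rep(df1$V1, times = ncol(df1) - 1),
  value = factor(unlist(df1[, -1]), levels = 1:15)
)
counts
```

Repeated names are merged automatically because table() groups by the name itself, not by row position.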


How to order numeric values in a custom order in R?

My question is: given the target table below, how can I reorder the rows of the original table to get exactly the target table in R? Thank you in advance.
Original table:
A B
1 1
1 2
5 12
2 6
5 14
3 6
3 7
5 13
6 2
3 10
5 11
2 5
6 14
2 7
5 15
6 1
3 8
6 3
2 4
1 3
2 10
4 11
2 8
1 4
1 5
2 9
4 12
4 13
3 9
6 15
Target table:
A B
1 1
1 2
1 3
1 4
1 5
3 6
3 7
3 8
3 9
3 10
5 11
5 12
5 13
5 14
5 15
6 1
6 2
6 3
2 4
2 5
2 6
2 7
2 8
2 9
2 10
4 11
4 12
4 13
6 14
6 15
This can be done by ordering first by an odd/even flag on dat$A, then by dat$B:
dat[order(-(dat$A %% 2), dat$B),]
## A B
##1 1 1
##2 1 2
##20 1 3
##24 1 4
##25 1 5
##6 3 6
##7 3 7
##17 3 8
##29 3 9
##10 3 10
##11 5 11
##3 5 12
##8 5 13
##5 5 14
##15 5 15
##16 6 1
##9 6 2
##18 6 3
##19 2 4
##12 2 5
##4 2 6
##14 2 7
##23 2 8
##26 2 9
##21 2 10
##22 4 11
##27 4 12
##28 4 13
##13 6 14
##30 6 15
If the grouping isn't an odd/even split, you can set the 1/3/5 and 2/4/6 groups manually:
dat[order(`levels<-`(factor(dat$A), list('1'=c(1,3,5), '2'=c(6,2,4))), dat$B),]
This collapsed version of the code with levels<- called directly as a function is a bit hard to read, but it is equivalent to:
grpord <- factor(dat$A)
levels(grpord) <- list('1'=c(1,3,5), '2'=c(6,2,4))
dat[order(grpord, dat$B),]
...where "1" is assigned to the groups 1, 3 and 5, and "2" to the groups 6, 2 and 4.
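A simpler way to express a manual two-group split is a logical membership flag built with %in%. This is an alternative technique, not the answer's code; first_group is a hypothetical name for the A values that should come first.

```r
# The original table from the question.
dat <- data.frame(
  A = c(1, 1, 5, 2, 5, 3, 3, 5, 6, 3, 5, 2, 6, 2, 5,
        6, 3, 6, 2, 1, 2, 4, 2, 1, 1, 2, 4, 4, 3, 6),
  B = c(1, 2, 12, 6, 14, 6, 7, 13, 2, 10, 11, 5, 14, 7, 15,
        1, 8, 3, 4, 3, 10, 11, 8, 4, 5, 9, 12, 13, 9, 15)
)

# Hypothetical: A values that belong to the first group.
first_group <- c(1, 3, 5)

# order() sorts FALSE before TRUE, so rows whose A is in first_group
# (flag FALSE) come first; within each group, rows are sorted by B.
res <- dat[order(!(dat$A %in% first_group), dat$B), ]
res
```

Unlike the factor-levels version, this only supports two groups; for three or more, the `levels<-` approach above generalizes better.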

Creating new column names using dplyr across and .names

I have the following data frame:
df <- data.frame(A_TR1=sample(10:20, 8, replace = TRUE),A_TR2=seq(2, 16, by=2), A_TR3=seq(1, 16, by=2),
B_TR1=seq(1, 16, by=2),B_TR2=seq(2, 16, by=2), B_TR3=seq(1, 16, by=2))
> df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3
1 11 2 1 1 2 1
2 12 4 3 3 4 3
3 18 6 5 5 6 5
4 11 8 7 7 8 7
5 17 10 9 9 10 9
6 17 12 11 11 12 11
7 14 14 13 13 14 13
8 11 16 15 15 16 15
What I would like to do is subtract B_TR1 from A_TR1, B_TR2 from A_TR2, and so on, and create new columns from these, similar to below:
df$x_TR1 <- (df$A_TR1 - df$B_TR1)
df$x_TR2 <- (df$A_TR2 - df$B_TR2)
df$x_TR3 <- (df$A_TR3 - df$B_TR3)
> df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3 x_TR1 x_TR2 x_TR3
1 12 2 1 1 2 1 11 0 0
2 11 4 3 3 4 3 8 0 0
3 19 6 5 5 6 5 14 0 0
4 13 8 7 7 8 7 6 0 0
5 12 10 9 9 10 9 3 0 0
6 16 12 11 11 12 11 5 0 0
7 16 14 13 13 14 13 3 0 0
8 18 16 15 15 16 15 3 0 0
I would like to name these columns "x TR1", "x TR2", etc. I tried to do the following:
xdf <- df%>%mutate(across(starts_with("A_TR"), -across(starts_with("B_TR")), .names="x TR{.col}"))
However, I get an error in mutate():
attempt to select less than one element in integerOneIndex
I also don't know how to create the proper column names, in terms of getting the numbers right -- I am not even sure the glue() syntax allows for it. Any help appreciated here.
We can use .names in the first across() to replace the substring 'A' with 'x' in the column names (.col), and subtract the second set of columns from the first:
library(dplyr)
library(stringr)
df <- df %>%
  mutate(across(starts_with("A_TR"),
                .names = "{str_replace(.col, 'A', 'x')}") -
           across(starts_with("B_TR")))
Output:
df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3 x_TR1 x_TR2 x_TR3
1 10 2 1 1 2 1 9 0 0
2 10 4 3 3 4 3 7 0 0
3 16 6 5 5 6 5 11 0 0
4 12 8 7 7 8 7 5 0 0
5 20 10 9 9 10 9 11 0 0
6 19 12 11 11 12 11 8 0 0
7 17 14 13 13 14 13 4 0 0
8 14 16 15 15 16 15 -1 0 0
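For comparison, the same pairing can be done in base R without dplyr: derive the matching B_ and x_ names from the A_ names, then subtract one block of columns from the other. This sketch assumes every A_ column has a B_ partner with the same suffix; set.seed() is added only so that sample() gives repeatable numbers.

```r
# Example data as in the question.
set.seed(1)
df <- data.frame(A_TR1 = sample(10:20, 8, replace = TRUE), A_TR2 = seq(2, 16, by = 2),
                 A_TR3 = seq(1, 16, by = 2), B_TR1 = seq(1, 16, by = 2),
                 B_TR2 = seq(2, 16, by = 2), B_TR3 = seq(1, 16, by = 2))

# Pair every A_ column with the B_ column that has the same suffix.
a_cols <- grep("^A_", names(df), value = TRUE)  # "A_TR1" "A_TR2" "A_TR3"
b_cols <- sub("^A_", "B_", a_cols)              # "B_TR1" "B_TR2" "B_TR3"
x_cols <- sub("^A_", "x_", a_cols)              # "x_TR1" "x_TR2" "x_TR3"

# Subtracting two data frames of the same shape works column by column,
# matched by position, so a_cols and b_cols must be in the same order.
df[x_cols] <- df[a_cols] - df[b_cols]
df
```

For the space-separated names asked about in the question ("x TR1", ...), use sub("^A_", "x ", a_cols) instead; such non-syntactic names then need backticks, e.g. df$`x TR1`.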

Creating a vector with specified order

I have to create a vector in R with a specified order: 30 0s, then 30 1s, then 30 2s, and so on up to 9. I was thinking I could create an empty vector and then write a loop that adds my numbers, but I'm not sure how to go about it. The code below is something along the lines of what I was thinking could work.
labels <- c()
i = 0
for(i in 0:30){
append(labels,0)
i += 1
}
rep(0:9, each = 30)
#> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1
#> [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#> [75] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
#> [112] 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
#> [149] 4 4 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6
#> [186] 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 7 7 7 7 7 7 7 7 7 7 7
#> [223] 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
#> [260] 8 8 8 8 8 8 8 8 8 8 8 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
#> [297] 9 9 9 9
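For reference, the loop idea from the question can be made to work, though rep() is the idiomatic choice. Two things stop the original loop: append() returns a new vector instead of modifying labels in place, and R has no += operator. A sketch that preallocates the result and fills it position by position:

```r
# Preallocate: 10 digits x 30 repeats = 300 elements.
labels <- integer(10 * 30)
pos <- 1
for (digit in 0:9) {
  for (k in 1:30) {
    labels[pos] <- digit  # write the current digit into the next free slot
    pos <- pos + 1
  }
}

# Same result as the one-liner above.
identical(labels, rep(0:9, each = 30))
```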

Data Frame Filter Values

Suppose I have the following data frame.
table<-data.frame(group=c(0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40,0,5,10,15,20,25,30,35,40),plan=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3),price=c(1,4,5,6,8,9,12,12,12,3,5,6,7,10,12,20,20,20,5,6,8,12,15,20,22,28,28))
group plan price
1 0 1 1
2 5 1 4
3 10 1 5
4 15 1 6
5 20 1 8
6 25 1 9
7 30 1 12
8 35 1 12
9 40 1 12
10 0 2 3
11 5 2 5
12 10 2 6
13 15 2 7
14 20 2 10
15 25 2 12
16 30 2 20
17 35 2 20
18 40 2 20
How can I keep, for each plan, the rows up to the first occurrence of the maximum price, dropping the repeated maximum rows?
So the result would be:
group plan price
1 0 1 1
2 5 1 4
3 10 1 5
4 15 1 6
5 20 1 8
6 25 1 9
7 30 1 12
10 0 2 3
11 5 2 5
12 10 2 6
13 15 2 7
14 20 2 10
15 25 2 12
16 30 2 20
You can use slice in dplyr:
library(dplyr)
table %>%
group_by(plan) %>%
slice(1:which.max(price == max(price)))
which.max gives the index of the first TRUE in price == max(price), i.e. the first occurrence of the maximum price. Using that, slice keeps each plan's rows up to and including that first maximum.
Result:
# A tibble: 22 x 3
# Groups: plan [3]
group plan price
<dbl> <dbl> <dbl>
1 0 1 1
2 5 1 4
3 10 1 5
4 15 1 6
5 20 1 8
6 25 1 9
7 30 1 12
8 0 2 3
9 5 2 5
10 10 2 6
# ... with 12 more rows
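A base R sketch of the same rule uses ave() to evaluate it per plan while keeping the original row order and row names. It rebuilds the question's data with rep(), which gives the same values as the c(...) call above (plan becomes integer rather than double). Keeping rows up to which.max(p) is equivalent to which.max(price == max(price)), because which.max() returns the position of the first maximum.

```r
table <- data.frame(group = rep(seq(0, 40, by = 5), times = 3),
                    plan  = rep(1:3, each = 9),
                    price = c(1, 4, 5, 6, 8, 9, 12, 12, 12,
                              3, 5, 6, 7, 10, 12, 20, 20, 20,
                              5, 6, 8, 12, 15, 20, 22, 28, 28))

# For each plan, flag rows at or before the first maximum price.
# ave() stores the flags in a copy of the numeric price column, so they come
# back as 1/0 and are converted to logical before subsetting.
keep <- ave(table$price, table$plan,
            FUN = function(p) seq_along(p) <= which.max(p))
result <- table[as.logical(keep), ]
result
```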

Extracting numbers from very long string into vector

I have the fairly long string (~50k characters) linked below:
https://gist.github.com/anonymous/9de31de2e6fc9888f3debeda4698b739
I want to extract the numbers (always 1 or 2 digits) that always appear between "'>" and "<", and add them to a vector in the order they appear.
For example:
><td class='td-val ball-8'>13</td><td class='td-val ball-8'>9</td>
should output the vector [13, 9].
I couldn't even get R to accept my string when I tried to enter it in the form
mystring <- "text here"
When I pressed Enter, R just showed a + at the start of the next line, so I think some of the symbols in the text were messing it up.
Since it's HTML that you're trying to parse, it's best to use an HTML parsing package like rvest:
library(rvest)
url <- 'https://gist.githubusercontent.com/anonymous/9de31de2e6fc9888f3debeda4698b739/raw/c07c2d6c6f00060806b15ec57ed06d4a4e0d9d74/gistfile1.txt'
url %>% read_html() %>% html_nodes('td.td-val') %>% html_text() %>% as.integer()
which returns
[1] 13 9 8 8 1 2 0 8 11 2 13 5 13 4 4 5 4 7 3 8 10 13 1 7 14 13 10 2 0 8
[31] 13 0 10 5 11 9 3 1 4 3 5 12 4 14 1 9 13 5 9 7 12 10 2 10 14 4 11 11 13 8
[61] 8 10 10 12 12 6 8 13 7 2 2 9 10 9 13 3 14 14 0 14 4 11 14 6 10 2 0 0 10 14
[91] 2 8 3 6 14 6 1 9 11 12 1 12 4 0 7 9 2 10 1 12 0 8 0 9 3 11 11 0 8 5
[121] 0 6 1 9 8 10 7 4 7 0 3 12 10 11 11 8 4 11 1 5 12 2 14 9 12 8 1 9 14 13
[151] 8 2 1 5 7 9 14 14 12 3 6 3 9 0 6 9 3 3 10 3 8 6 9 2 4 12 2 2 14 7
[181] 12 8 0 8 12 2 12 9 6 8 9 9 3 7 9 0 6 13 0 12 3 14 12 4 8 9 14 4 5 9
[211] 6 3 2 5 1 2 0 5 0 5 9 0 12 14 11 11 7 4 12 1 14 2 13 3 13 2 0 12 13 6
[241] 5 3 13 9 12 2 11 6 8 12 9 6 13 9 0 0 4 2 1 0 0 3 0 3 7 9 11 1 8 10
[271] 11 13 12 9 10 8 10 3 7 12 4 9 0 4 14 1 7 0 7 1 2 6 0 6 6 1 0 9 4 8
[301] 0 7 13 8 11 4 1 12 1 14 11 13 9 12 8 2 8 7 12 13 12 5 8 5 10 2 7 5 9 12
[331] 12 13 8 7 6 4 12 13 4 9 12 2 0 11 8 9 1 10 5 10 9 11 10 1 8 1 12 10 9 5
[361] 7 10 5 2 7 12 4 10 6 9 0 6 0 4 13 7 0 8 3 3 11 8 4 12 10 5 7 1 11 3
[391] 1 11 7 14 13 13 14 4 2 11 2 12 3 6 14 10 6 13 9 12 4 13 10 3 9 11 8 4 8 10
[421] 9 6 3 6 7 5 11 0 2 7 6 11 11 13 13 12 7 9 6 9 5 12 14 3 13 10 1 2 7 1
[451] 14 1 0 7 8 13 6 3 9 12 2 2 2 7 11 1 2 14 6 13 11 3 6 11 5 9 0 9 13 10
[481] 11 13 3 12 12 3 7 6 5 14 3 9 10 6 13 5 7 4 5 12 8 14 5 6 8 7 0 0 2 1
[511] 1 9 13 13 5 6 10 8 0 2 3 4 4 5 14 13 5 2 2 4 6 5 9 6 14 8 4 12 4 6
[541] 9 1 4 2 4 9 1 7 1 10 0 1 1 8 6 5 8 4 9 11 14 2 3 8 2 11 3 7 11 2
[571] 4 9 5 3 4 1 4 8 13 4 8 8 1 7 2 7 3 11 13 1 13 7 9 3 7 7 4 12 9 14
[601] 11 9 2 12 12 14 10 4 12 11 12 10 14 3 11 6 12 3 6 3 11 8 10 2 6 3 1 11 2 6
[631] 0 8 12 5 5 3 6 2 14 11 7 14 14 8 11 2 7 0 10 2 0 4 8 9 8 3 2 13 4 10
[661] 2 5 13 2 2 12 12 0 10 4 1 5 13 3 10 3 11 2 5 3 9 6 11 0 8 12 0 11 2 11
[691] 7 8 1 3 4 14 4 4 9 5 12 7 6 9 12 13 2 11 1 11 12 0 4 6 10 8 5 14 7 6
[721] 4 7 2 5 2 14 3 8 10 6 14 7 14 3 2 6 5 0 3 0 12 0 12 3 5 5 8 5 14 6
[751] 10 14 5 2 3 11 3 4 3 11 4 2 0 11 11 13 4 0 6 14 2 6 9 10 4 9 5 7 1 13
[781] 8 3 13 3 10 4 8 1 3 11 2 8 5 10 7 6 10 14 14 2 2 12 8 4 13 7 11 13 4 5
[811] 7 2 3 8 14 3 9 12 6 2 6 0 3 5 8 8 0 14 13 13 7 10 9 6 1 0 4 8 6 8
[841] 14 1 9 0 9 2 7 10 8 5 10 7 1 8 2 13 3 1 8 12 12 2 5 6 3 9 4 5 4 13
[871] 6 3 10 7 9 2 1 12 1 11 0 10 0 11 8 8 0 7 0 11 10 3 14 6 9 11 11 0 12 1
[901] 10 13 1 7 7 2 0 3 13 9 2 4 12 3 0 11 1 8 8 13 12 6 8 13 8 1 13 11 2 9
[931] 11 8 10 8 3 14 6 14 7 6 7 10 3 11 3 13 11 3 9 13 8 10 8 7 12 4 11 12 12 9
[961] 6 10 2 8 13 7 11 5 7 12 10 14 1 6 7 6 7 2 3 5 13 6 10 9 5 2 0 1 11 8
[991] 9 5 1 3 3 1 12 1 13 2 14 5 7 1 10 9 0 9 11 10 6 2 7 12 10 6 2 10 13 4
[1021] 9 9 14 4 4 5 7 13 13 13 6 7 12 1 6 11 12 14 4 11 6 4 10 0 9 12 10 10 13 8
[1051] 3 3 0 8 5 14 10 3 7 5 0 14 5 6 10 14 7 4 8 9 1 6 14 1 14 5 5 14 4 11
[1081] 12 14 9 13 14 13 2 13 11 9 14 2 1 9 8 11 13 11 14 13 3 4 9 6 9 6 10 13 1 12
[1111] 10 14 11 5 8 9 3 5 6 14 1 11 10 12 7 7 2 13 13 12 12 4 3 14 6 4 2 5 9 4
[1141] 14 11 6 4 11 6 4 4 8 2 2 5 14 1 7 11 8 9 11 11 10 6 14 3 0 3 8 8 14 13
[1171] 10 6 10 4 9 12 0 9 2 9 13 12 1 12 3 5 5 3 12 2 1 5 1 0 10 7 3 10 14 13
[1201] 11 8 0 10 12 9 4 5 4 8 5 6 2 11 7 5 5 8 4 9 9 10 14 3 7 9 1 9 9 8
[1231] 1 8 11 5 2 4 9 14 14 6 10 7 4 14 6 5 1 4 3 8 13 10 5 1 8 8 6 8 7 1
[1261] 14 4 4 7 2 12 10 8 10 5 6 7 2 3 5 13 1 2 9 8 5 14 1 11 9 5 8 12 13 0
[1291] 4 2 0 8 8 2 5 3 13 11 5 11 14 14 9 12 4 5 9 3 13 14 1 5 10 4 9 6 5 8
[1321] 7 5 7 3 14 8 4 8 4 6 5 8 11 0 14 13 2 13 12 13 3 4 7 8 11 4 14 12 3 6
[1351] 11 8 8 9 6 7 4 3 10 9 2 9 12 12 0 1 10 9 8 0 12 9 3 14 13 7 8 12 10 9
[1381] 10 10 2 11
You can use readLines to import the string from the URL, which you can get by clicking the Raw button:
mystring <- readLines("https://gist.githubusercontent.com/anonymous/9de31de2e6fc9888f3debeda4698b739/raw/c07c2d6c6f00060806b15ec57ed06d4a4e0d9d74/gistfile1.txt")
A regular expression like the following should then give you all the numbers you want:
library(stringr)
num <- gsub(">|<", "", str_extract_all(mystring, ">\\d+<", simplify = TRUE))
head(as.vector(num))
[1] "13" "9" "8" "8" "1" "2"
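A base R alternative that needs neither rvest nor stringr: gregexpr() with Perl-style lookarounds matches only the 1-2 digit runs between ">" and "<", so no gsub() cleanup is needed afterwards. extract_numbers() is a hypothetical helper name, and the sketch uses only the example snippet from the question. If a pasted string itself contains double quotes, the raw string syntax added in R 4.0.0, r"(...)", avoids having to escape them.

```r
# The example snippet from the question; it contains only single quotes,
# so it fits inside an ordinary double-quoted R string.
s <- "><td class='td-val ball-8'>13</td><td class='td-val ball-8'>9</td>"

# (?<=>) and (?=<) are lookarounds: they require ">" before and "<" after the
# digits without including them in the match; perl = TRUE enables them.
# unlist() keeps matches in document order even when x has several elements,
# e.g. the lines returned by readLines().
extract_numbers <- function(x) {
  as.integer(unlist(regmatches(x, gregexpr("(?<=>)\\d{1,2}(?=<)", x, perl = TRUE))))
}

extract_numbers(s)
#> [1] 13  9
```

Applied to the downloaded text, extract_numbers(mystring) returns an integer vector directly rather than a character matrix.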
