Count values in a table (in R)

I have a table. I would like to count how many of the values start with 11_ and at the same time equal to 1.
11_AAACCCAAGAGCTGCA 11_AAACCCACAAAGACGC 11_AAACCCAGTCACTTAG 11_AAACGAACAAAGGCTG
                  6                   3                   1                   1
11_AAACGAATCCACACAA 13_AAACGCTCACATGAAA 13_AAACGCTCAGCGGTCT 11_AAACGCTCATGGAAGC
                  7                   1                   3                   1

Do you have a named vector?
You can combine the two conditions to filter on name and value.
x <- c('11_AAACCCAAGAGCTGCA' = 6,
       '11_AAACCCACAAAGACGC' = 3,
       '11_AAACCCAGTCACTTAG' = 1,
       '11_AAACGAACAAAGGCTG' = 1,
       '11_AAACGAATCCACACAA' = 7,
       '13_AAACGCTCACATGAAA' = 1,
       '13_AAACGCTCAGCGGTCT' = 3,
       '11_AAACGCTCATGGAAGC' = 1)

x[startsWith(names(x), '11_') & x == 1]
#11_AAACCCAGTCACTTAG 11_AAACGAACAAAGGCTG 11_AAACGCTCATGGAAGC
#                  1                   1                   1

# To count
sum(startsWith(names(x), '11_') & x == 1)
#[1] 3

We can also use grepl:
sum(grepl("^11_", names(x)) & x == 1)
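If the counts come from a contingency table (the result of table()) rather than a hand-built vector, the same expressions work unchanged, because a one-dimensional table is itself a named vector. A minimal sketch, assuming the raw barcodes sit in a character vector called barcodes (a hypothetical name):

# Hypothetical raw data; in practice `barcodes` would be your barcode column
barcodes <- c(rep('11_AAACCCAAGAGCTGCA', 6), '11_AAACCCAGTCACTTAG', '13_AAACGCTCACATGAAA')
tab <- table(barcodes)
sum(startsWith(names(tab), '11_') & tab == 1)
#[1] 1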

Create new column from conditions on multiple columns in R

What I'm trying to write would be written with the apply function in Python:
def categorise(row):
    if row['colC'] > 0 and row['colC'] <= 99:
        return 'A'
    elif row['colC'] > 100 and row['colC'] <= 199:
        return 'B'
    elif row['colC'] > 200 and row['colC'] <= 299:
        return 'C'
    return 'D'

df['colF'] = df.apply(lambda row: categorise(row), axis=1)
This is the R code I have at the moment
myf <- function(x) {
  count <- 0
  if (x[, "BMICat"] == 4) {
    count <- count + 1
  }
  if (x[, "SleepTimeCat"] == 1 | x[, "SleepTimeCat"] == 4) {
    count <- count + 1
  }
  if (x[, "MentalHealthCat"] == 3) {
    count <- count + 1
  }
  if (x[, "Smoking"] == TRUE) {
    count <- count + 1
  }
  if (x[, "PhysicalActivity"] == FALSE) {
    count <- count + 1
  }
  return(count)
}

dfAugment %>%
  mutate(BadHabits = myf(.))
I often get stuck trying to apply this pattern in R. Is my approach not common in R?
If I understand your question correctly, a possible solution is creating dummy variables and then adding them together.
library(dplyr)

dfAugment <- data.frame(BMICat = c(1, 2, 4, 4),
                        SleepTimeCat = c(1, 2, 3, 4))

dfAugment |>
  mutate(risk_sum = if_else(BMICat == 4, 1, 0) +
                    if_else(SleepTimeCat == 1 | SleepTimeCat == 4, 1, 0))
Output
#>   BMICat SleepTimeCat risk_sum
#> 1      1            1        1
#> 2      2            2        0
#> 3      4            3        1
#> 4      4            4        2
Created on 2022-06-22 by the reprex package (v2.0.1)
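The same dummy-sum idea extends directly to the asker's full five-condition count, since logical comparisons coerce to 0/1 when added. A sketch, assuming dfAugment actually has all five columns named as in the question:

dfAugment %>%
  mutate(BadHabits = (BMICat == 4) +
                     (SleepTimeCat == 1 | SleepTimeCat == 4) +
                     (MentalHealthCat == 3) +
                     (Smoking == TRUE) +
                     (PhysicalActivity == FALSE))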

Creating a Basic R Dice Rolling Function to Sum Dice Values

I'm trying to write a function that combines up to 4 (fair 6 sided) dice rolls to create a specific value (named 'target.mountain') as many times as possible given the numbers shown on the dice.
Then return these values along with any that aren't used in said combination. If the other numbers that aren't used to form the 'target.mountain' can sum to be within the range (5-10) then do so.
So as an example say I roll 4,3,2,5 and my target.mountain value is 9
I would do
4 + 5 -> 9 and as 2 + 3 = 5 my function would return 9, 5
Another example could be
Roll = (2,3,6,4) --> (6 + 3), (4 + 2) --> 9, 6
Once these values have been found then list so it appears like
[1] 9, 5 (example 1)
[1] 9, 6 (example 2)
How do I go about doing this?
If you have ever played the board game 'Mountain Goats' then that may shed some light on how I need the dice to work as I just cannot figure it out!
Let's make the problem a bit harder, say 5 dice.
library(tidyverse)
rolls <- sample(1:6,replace = TRUE, size = 5)
target.mountain <- 7
# Make all possible combinations of the dice:
map_dfr(seq_along(rolls),
        ~ combn(seq_along(rolls), .x, simplify = FALSE) %>%
          map(~ tibble(dice = list(.),
                       sum = sum(rolls[.]),
                       rolls = list(rolls[.]),
                       length = length(.)))) %>%
  # Filter to only those combinations which equal the target
  filter(sum == target.mountain) %>%
  # Now make all possible combinations of the sets that equal the target
  {map2(.x = list(.),
        .y = nrow(.) %>%
          map(.x = seq(.), .f = combn, x = ., simplify = FALSE) %>%
          unlist(recursive = FALSE),
        ~ .x[unlist(.y), ])} %>%
  # Subset to non-overlapping sets
  subset(map_lgl(., ~ length(reduce(.x$dice, union)) == length(unlist(.x$dice)))) -> part1
map(part1, as.data.frame)
#[[1]]
# dice sum rolls length
#1 1, 3 7 3, 4 2
#
#[[2]]
# dice sum rolls length
#1 4, 5 7 6, 1 2
#
#[[3]]
# dice sum rolls length
#1 2, 3, 5 7 2, 4, 1 3
#
#[[4]]
# dice sum rolls length
#1 1, 3 7 3, 4 2
#2 4, 5 7 6, 1 2
From here you can apply whatever rules you want:
part1 %>%
  # Subset to the largest number of sets
  subset(map_dbl(., nrow) == max(map_dbl(., nrow))) %>%
  # Subset to the fewest number of total dice
  subset(map_dbl(., ~ sum(.x$length)) == min(map_dbl(., ~ sum(.x$length)))) %>%
  # If there are still ties, pick the first
  `[[`(1) -> part2
as.data.frame(part2)
# dice sum rolls length
#1 1, 3 7 3, 4 2
#2 4, 5 7 6, 1 2
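To get output in the asker's "9, 5" style from part2, one possible final step (a sketch added here, not part of the original answer) is to sum the dice left unused by the winning sets and apply the question's 5-10 range rule:

used <- unlist(part2$dice)        # indices of dice used to build the target
leftovers <- rolls[-used]         # dice not used in any target combination
leftover_sum <- sum(leftovers)
# Per the question's rule, report the leftovers as one sum only if that sum
# falls in the 5-10 range; otherwise keep them as individual dice.
if (leftover_sum >= 5 && leftover_sum <= 10) {
  c(rep(target.mountain, nrow(part2)), leftover_sum)
} else {
  c(rep(target.mountain, nrow(part2)), leftovers)
}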
Another possible solution to the problem:
target.mountain <- 9
dice <- c(4, 3, 2, 5)

library(tidyverse)

fn <- function(target.mountain, dice) {
  fltr <- map(seq_along(dice), ~ combn(dice, .x, sum) == target.mountain)
  out <- map(seq_along(dice), ~ combn(dice, .x))
  sum_target <- map2(out, fltr, ~ .x[, .y]) %>%
    purrr::discard(.x = ., function(x) length(x) == 0) %>%
    keep(.x = ., .p = function(x) length(x) == min(lengths(.))) %>%
    flatten_dbl()
  no_sum_target <- dice[!(dice %in% sum_target)]
  result <- toString(c(sum(sum_target), no_sum_target))
  return(result)
}
fn(target.mountain = target.mountain, dice = dice)
#> [1] "9, 3, 2"
Created on 2021-03-29 by the reprex package (v1.0.0)

Recoding a nominal variable with multiple categories into a dummy variable

I am trying to condense a party ID seven point scale variable (pid_x) from the ANES 2012 data to a dummy variable (democrat = 1 and republican = 0). This entails removing all missing values and excluding independents (4). I can remove NAs, but how would I filter out independents and properly mutate the new variable? Yes, I am very new to R. Much appreciated!
The code below provides the following error:
"Error: Problem with mutate() input party_id_recode.
x Can't recycle ..1 (size 2054) to match ..2 (size 3).
i Input party_id_recode is `ifelse(pid_x == 1:3, 1, ifelse(pid_x == 5:7, 0))"
library(tidyverse)

anesnew <- anes %>%
  na.omit(anes$pid_x) %>%
  mutate(party_id_recode = ifelse(pid_x == 1:3, 1,
                                  ifelse(pid_x == 5:7, 0)))
Reproducible data and expected output would be very useful, but it looks like your ifelse() statement hasn't been constructed properly, and could be simplified:
anesnew <- anes %>%
  filter(!is.na(pid_x), pid_x != 4) %>%
  mutate(party_id_recode = case_when(pid_x < 4 ~ 1,
                                     pid_x > 4 ~ 0))
With the following sample data:
anes <- tibble(pid_x = c(1, 2, 3, 4, 5, 6, 7, NA))
The results are:
# A tibble: 6 x 2
  pid_x party_id_recode
  <dbl>           <dbl>
1     1               1
2     2               1
3     3               1
4     5               0
5     6               0
6     7               0
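For what it's worth, the original error comes from comparing the whole column against a length-3 vector: pid_x == 1:3 recycles 1:3 across all 2054 values, and ifelse() was also called without its third (no) argument. A membership test with %in% avoids both problems; a minimal sketch staying close to the original code:

anesnew <- anes %>%
  filter(!is.na(pid_x), pid_x != 4) %>%
  mutate(party_id_recode = ifelse(pid_x %in% 1:3, 1, 0))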

how to fill in values in a vector?

I have vectors in R containing a lot of 0's and a few non-zero numbers. Each vector starts with a non-zero number.
For example <1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0>
I would like to set all of the zeros equal to the most recent non-zero number.
I.e. this vector would become <1,1,1,1,1,1,2,2,2,2,2,2,4,4,4,4>
I need to do this for about 100 vectors containing around 6 million entries each. Currently I am using a for loop:
for (k in 1:length(vector)) {
  if (vector[k] == 0) {
    vector[k] <- vector[k - 1]
  }
}
Is there a more efficient way to do this?
Thanks!
One option would be to replace those 0s with NA, then use zoo::na.locf:
x <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
x[x == 0] <- NA
zoo::na.locf(x) ## you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
Thanks to Richard for showing me how to use replace:
zoo::na.locf(replace(x, x == 0, NA))
You could try this:
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]
Or, for another case where cummax would not be appropriate:
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]
Logic:
I am keeping "track" of the indices of the vector elements that are non-zero with which(k != 0); let's denote this new vector as x, x = c(1, 7, 13).
Next I am going to "sample" this new vector. How? From k I am creating a new vector that increments every time there is a non-zero element, cumsum(k != 0); let's denote this new vector as y, y = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3).
I am "sampling" from vector x: x[y], i.e. taking the first element of x 6 times, then the second element 6 times, then the third element 4 times. Let's denote this new vector as z, z = c(1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7, 13, 13, 13, 13).
Finally I am "sampling" from vector k, k[z], i.e. I am taking the first element 6 times, then the 7th element 6 times, then the 13th element 4 times.
To add to 李哲源's answer:
If it is required to replace the leading NAs with the nearest non-NA value, and to replace the other NAs with the last non-NA value, the code can be:
x <- c(0, 0, 1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 4, 0, 0, 0)
zoo::na.locf(zoo::na.locf(replace(x, x == 0, NA), na.rm = FALSE), fromLast = TRUE)
# you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4

Function to extract frequencies of numbers in rows in R

I need a function which checks the frequency of values per row in a data frame, then checks whether one of the values appears 6 or more times and, if so, writes this value in a new column. If not, it writes "nope" in the same new column instead.
In the example below: The values in the rows are either 1, 2, or 3. So if one of the values 1,2,or3 appears 6 or more times per row, whichever value that is (1,2,or3) has to appear in a new column. If none of the values appear 6 or more times per row, the value in that same new column should be "nope".
Try applying the table function to each row:
make_count_col <- function(x) {
  cnt <- apply(x, 1, table)
  x$newcolumn <- apply(cnt, 2, function(y) {
    if (max(y, na.rm = T) < 6)
      out <- 'nope'
    else
      out <- names(y)[which.max(y)]
    out
  })
  x
}
Your example replicated
x <- as.data.frame(matrix(c(1, 2, 1, 2, 2, 2, 2, 2, 3,
                            2, 3, 1, 1, 3, 2, 1, 1, 3),
                          nrow = 2, byrow = T))
colnames(x) <- paste0('svo', 1:9)
make_count_col(x)
  svo1 svo2 svo3 svo4 svo5 svo6 svo7 svo8 svo9 newcolumn
1    1    2    1    2    2    2    2    2    3         2
2    2    3    1    1    3    2    1    1    3      nope
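One caveat (an addition here, not part of the original answer): apply(x, 1, table) only returns a matrix when every row has the same number of distinct values; otherwise it returns a list and the second apply() fails. A hedged sketch that builds the table row by row instead:

make_count_col2 <- function(x) {
  x$newcolumn <- apply(x, 1, function(row) {
    tab <- table(row)                                   # counts per value in this row
    if (max(tab) >= 6) names(tab)[which.max(tab)] else 'nope'
  })
  x
}

make_count_col2(x)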
