count_if (EXPSS) with multiple conditions in R


I am using expss::count_if.
While something like this works fine (i.e., counting values only where value is equal to "1"):
(number_unemployed = count_if("1",unemployed_field,na.rm = TRUE)),
This does not (i.e., counting values only where value is equal to "1" or "2" or "3"):
(number_unemployed = count_if("1", "2", "3", unemployed_field,na.rm = TRUE)),
What is the correct syntax for using multiple conditions for count_if? I cannot find anything in the expss package documentation.

You need to put them into a vector. This works:
(number_unemployed = count_if(c("1", "2", "3"), unemployed_field, na.rm = TRUE)),
Example (sample data is provided below):
library(expss)
count_if(c("1","2","3"),dt$Encounter)
#> 9
Data:
dt <- structure(list(Location = c("A", "B", "A", "A", "C", "B", "A", "B", "A", "A", "A"),
Encounter = c("1", "2", "3", "1", "2", "3", "4", "1", "2", "3", "4")),
row.names = c(NA, -11L), class = "data.frame")
# Location Encounter
# 1 A 1
# 2 B 2
# 3 A 3
# 4 A 1
# 5 C 2
# 6 B 3
# 7 A 4
# 8 B 1
# 9 A 2
# 10 A 3
# 11 A 4
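
As a cross-check that does not depend on expss, the same count can be reproduced in base R with %in%. This is only a sketch for comparison, not part of the expss API; the vector name encounter is made up here and simply repeats the Encounter column from the sample data.

```r
# Same values as dt$Encounter in the sample data above
encounter <- c("1", "2", "3", "1", "2", "3", "4", "1", "2", "3", "4")

# %in% tests each element against the whole set of accepted values;
# sum() then counts the TRUEs
n_match <- sum(encounter %in% c("1", "2", "3"))
n_match
#> [1] 9
```

The result agrees with the count_if call above: 9 of the 11 values are "1", "2" or "3".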

Related

How to count occurrences of a variable each time it occurs and remove outliers in R

I have a vector. On the one hand, I want to remove factor values that seem to be misclassified, for instance the "D" at position 7. As its neighbours are all "A", it should be "A" too. I know there must be a rule, for example: if the 3 values before and after an outlier differ from it, it is converted (in this case "D" to "A"); otherwise it is removed, like the "C" at position 22.
Var = c("A", "A", "A", "A","A", "A", "D", "A", "A","A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "C", "B", "B", "C","C","C","C","C","C","C","C","C","C","D", "D","D","D","D","D","D","D", "A", "A", "A", "A","A", "A", "A", "A", "A", "A","A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "C","C","C","C","C","C","C","C", "C","C","C","C","C","C","C","C", "D","D","D","D","D")
Var= as.factor(Var)
Var2=c("1", "2", "1", "2","3", "2", "1", "2", "2","2", "2", "1", "1",
"1", "2", "1", "2","3", "2", "1", "2", "2","2", "2",
"1", "2", "1", "2","3", "2", "1", "2", "2","2", "2", "1", "1", "2","2", "2", "1", "1","2","2", "2", "1", "1", "2","2", "2", "1", "1", "2","2", "2", "1", "1","2","2", "2", "1", "1", "2","2", "2", "1", "2","2", "2", "1", "1", "2","2", "2", "1", "1", "2","2", "2", "1", "1","1",
"1","1","1","1","1")
df<- data.frame (Var, Var2)
Additionally, I want to count the occurrences of each value each time it occurs. So I do not want the counts over the whole vector, but a list like this, ideally with the corrected values.
# Var Occurence
#1 A 6
#2 D 1
#3 A 4
#4 B 10
#5 C 1
#6 B 2 ...
So far I can only count the values over the whole vector with
table (Var)
With the following code I get a column that restarts counting each time "Var" changes.
df$Var <- with(df, ave(Var, FUN = function(x) sequence(rle(as.character(x))$lengths)))

This may be easier with data.table. Group by the rleid (run-length ID) of 'Var' and get the count (.N), then remove the outlier observations with a logical expression in i (based on the boxplot outliers).
library(data.table)
setDT(df)[, .N, .(Var, grp = rleid(Var))][, grp := NULL][
!N %in% boxplot(N, plot = FALSE)$out]
-output
Var N
1: A 6
2: D 1
3: A 4
4: B 10
5: C 1
6: B 2
7: C 10
8: D 8
9: A 12
10: B 12
11: C 16
12: D 5
rleid can take multiple input columns as the first argument is variadic (...) - from ?rleid
rleid(..., prefix=NULL)
... A sequence of numeric, integer64, character or logical vectors, all of same length. For interactive use.
Therefore, if we have multiple columns, either specify the columns or use rleidv with a subset of the data.frame/data.table as input
setDT(df)[, .N, .(Var, Var2, grp = rleid(Var, Var2))][,
grp := NULL][ !N %in% boxplot(N, plot = FALSE)$out]
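
For comparison, the per-run counts can also be sketched in base R with rle(), which returns the value and length of each consecutive run. This is an alternative to the data.table approach above, not a replacement for it; the short vector and the names runs and run_counts are made up for illustration, and no outlier filtering is applied.

```r
# A short illustrative vector (not the full Var from the question)
v <- c("A", "A", "A", "D", "A", "A", "B", "B")

# rle() needs an atomic vector; wrap a factor in as.character() first
runs <- rle(v)

# One row per consecutive run, in order of appearance
run_counts <- data.frame(Var = runs$values, N = runs$lengths)
run_counts
#>   Var N
#> 1   A 3
#> 2   D 1
#> 3   A 2
#> 4   B 2
```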

How to format data from excel containing two rows of column headers to be able to use in R?

I am importing the following table 1 into R but am struggling with the formatting, as each column has two headers. My desired output is the second table 2. I plan to use tidyr to gather the data.
Another obstacle I have is the merged cells. I have been using fillMergedCells=TRUE to duplicate this.
read.xlsx(xlsxFile ="C:/Users/X/X/Desktop/X.xlsx",fillMergedCells = TRUE)

One option would be to:
1. Read your Excel file with the option colNames = FALSE.
2. Paste the first two rows together and use the result as the column names. Here I use an underscore as the separator, which makes it easy to split the names later on.
3. Get rid of the first two rows.
4. Use tidyr::pivot_longer to convert to long format.
# df <- openxlsx::read.xlsx(xlsxFile ="data/test2.xlsx", fillMergedCells = TRUE, colNames = FALSE)
# Use first two rows as names
names(df) <- paste(df[1, ], df[2, ], sep = "_")
names(df)[1] <- "category"
# Get rid of first two rows and columns containing year average
df <- df[-c(1:2), ]
df <- df[, !grepl("^Year", names(df))]
library(tidyr)
library(dplyr)
df %>%
  pivot_longer(-category, names_to = c("Time", ".value"), names_pattern = "^(.*?)_(.*)$") %>%
  arrange(Time)
#> # A tibble: 16 × 4
#> category Time Y Z
#> <chr> <chr> <chr> <chr>
#> 1 Total Feb-21 1 1
#> 2 A Feb-21 2 2
#> 3 B Feb-21 3 3
#> 4 C Feb-21 4 4
#> 5 D Feb-21 5 5
#> 6 E Feb-21 6 6
#> 7 F Feb-21 7 7
#> 8 G Feb-21 8 8
#> 9 Total Jan-21 1 1
#> 10 A Jan-21 2 2
#> 11 B Jan-21 3 3
#> 12 C Jan-21 4 4
#> 13 D Jan-21 5 5
#> 14 E Jan-21 6 6
#> 15 F Jan-21 7 7
#> 16 G Jan-21 8 8
DATA
df <- structure(list(X1 = c(
NA, NA, "Total", "A", "B", "C", "D", "E",
"F", "G"
), X2 = c(
"Year Rolling Avg.", "Share", NA, "1", "1",
"1", "1", "1", "1", "1"
), X3 = c(
"Year Rolling Avg.", "Y", "1",
"2", "3", "4", "5", "6", "7", "8"
), X4 = c(
"Year Rolling Avg.",
"Z", "1", "2", "3", "4", "5", "6", "7", "8"
), X5 = c(
"Jan-21",
"Y", "1", "2", "3", "4", "5", "6", "7", "8"
), X6 = c(
"Jan-21",
"Z", "1", "2", "3", "4", "5", "6", "7", "8"
), X7 = c(
"Feb-21",
"Y", "1", "2", "3", "4", "5", "6", "7", "8"
), X8 = c(
"Feb-21",
"Z", "1", "2", "3", "4", "5", "6", "7", "8"
)), row.names = c(
NA,
10L
), class = "data.frame")
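
To see the header-pasting step in isolation, here is a minimal base R sketch on a tiny made-up frame (raw, new_names, time_part and value_part are hypothetical names, not from the answer). It shows how the two header rows become one name per column and how the underscore lets the two parts be recovered afterwards.

```r
# Tiny stand-in for a sheet read with colNames = FALSE:
# row 1 holds the (filled) merged header, row 2 the sub-header
raw <- data.frame(
  X1 = c(NA, NA, "A", "B"),
  X2 = c("Jan-21", "Y", "1", "2"),
  X3 = c("Jan-21", "Z", "3", "4"),
  stringsAsFactors = FALSE
)

# Combine the two header rows column by column
new_names <- paste(unlist(raw[1, ]), unlist(raw[2, ]), sep = "_")
new_names
#> [1] "NA_NA"    "Jan-21_Y" "Jan-21_Z"

# Split a combined name back into its two parts at the first underscore
time_part  <- sub("_.*$", "", new_names[2])
value_part <- sub("^[^_]*_", "", new_names[2])
```

The first column ends up as "NA_NA", which is why the answer renames it to "category" before pivoting.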

Why do I get the error message "Error: unexpected '=' in:" when using the fct_collapse function?

Data frame = qog_std3
factor = btid
I am trying to collapse this ordinal level factor using following code:
I get the following error message:
Error: unexpected '=' in:
"btid4 <- fct_collapse(qog_std3$btid,
1="
Can anyone explain to me why the use of "=" provides this error and what I can do about it?
Any alternative solution would also be deeply appreciated.

If the column is a factor or character, the new level name must be quoted when it is numeric, because an unquoted number is not a valid argument name in R. Quoting is not needed when the name is non-numeric.
fct_collapse(df1$btid, "1" = c("1", "2"))
#[1] 1 1 3 3 4 5 1 1
#Levels: 1 3 4 5
Backquotes work as well
fct_collapse(df1$btid, `1` = c("1", "2"))
#[1] 1 1 3 3 4 5 1 1
#Levels: 1 3 4 5
whereas specifying the unquoted numeric value fails at parse time
fct_collapse(df1$btid, 1 = c("1", "2"))
Error: unexpected '=' in " fct_collapse(df1$btid, 1 ="
However, this is not an issue when it is character
fct_collapse(df1$id2, AB = c("A", "B"))
#[1] AB AB C D AB AB C AB
#Levels: AB C D
data
df1 <- structure(list(btid = c("1", "1", "3", "3", "4", "5", "1", "2"
), id2 = c("A", "B", "C", "D", "A", "B", "C", "A")), row.names = c(NA,
-8L), class = "data.frame")
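
The error is not specific to fct_collapse: it comes from R's parser, which rejects a bare number as an argument name in any call. A small base R sketch of this, using list() as a stand-in (the names ok and parse_result are made up for illustration):

```r
# A backquoted (or quoted) numeric name is a valid argument name
ok <- list(`1` = c("1", "2"))
names(ok)
#> [1] "1"

# The unquoted form cannot even be parsed, so no function ever sees it;
# try() captures the syntax error instead of stopping the script
parse_result <- try(parse(text = 'list(1 = c("1", "2"))'), silent = TRUE)
inherits(parse_result, "try-error")
#> [1] TRUE
```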

Expanding data frame with "mirror" observations

I have a data frame arranged as follows:
df <- structure(list(name1= c("A","A","B"),
name2 = c("B", "C","C"),
size = c(10,20,30)),.Names=c("name1","name2","size"),
row.names = c("1", "2", "3"), class =("data.frame"))
I would like to add "mirror" observations as follows:
df <- structure(list(name1 = c("A","B","A", "C", "B", "C"),
name2 = c("B", "A","C", "A", "C", "B"),
size = c(10,10,20,20,30,30)),.Names=c("name1","name2","size"),
row.names = c("1", "2", "3", "4", "5", "6"), class =("data.frame"))
Inputs would be much appreciated.

We can do this in two steps: duplicate each row, then swap the first two columns in every second row.
df1 <- df[rep(rownames(df), each = 2),]
df1[c(FALSE, TRUE), 1:2] <- df1[c(FALSE, TRUE), 2:1]
df1
# name1 name2 size
#1 A B 10
#1.1 B A 10
#2 A C 20
#2.1 C A 20
#3 B C 30
#3.1 C B 30

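We can also use data.table, binding the original to a copy with the first two columns swapped. Note that the mirrored rows are appended at the end rather than interleaved.
library(data.table)
# use.names = FALSE binds by position, so the swapped columns line up under name1/name2
rbindlist(list(df, df[c(2:1, 3)]), use.names = FALSE)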
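
If the interleaved row order from the question matters, a base R sketch along the same lines is to swap the columns, restore the original names, bind, and reorder by the original row index. The names mirror, both and idx are made up for illustration; the data repeats the question's df.

```r
df <- data.frame(
  name1 = c("A", "A", "B"),
  name2 = c("B", "C", "C"),
  size  = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Swap the first two columns, then put the original names back
# so rbind() matches the columns correctly
mirror <- setNames(df[c(2, 1, 3)], names(df))

# Stack original and mirrored rows, then interleave them:
# idx is 1,2,3,1,2,3 and order() keeps each original ahead of its mirror
both <- rbind(df, mirror)
idx  <- rep(seq_len(nrow(df)), times = 2)
both <- both[order(idx), ]
rownames(both) <- NULL
both
```

Ordering by the row index rather than by size keeps the pairs together even when several rows share the same size.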

Error when subsetting based on adjusted values of different data frame in R

I am asking a side-question about the method I learned here from #redmode :
Subsetting based on values of a different data frame in R
When I try to dynamically adjust the level I want to subset by:
N <- nrow(A)
cond <- sapply(3:N, function(i) sum(A[i,] > 0.95*B[i,])==2)
rbind(A[1:2,], subset(A[3:N,], cond))
I get an error
Error in FUN(left, right) : non-numeric argument to binary operator.
Can you think of a way I can get rows pertaining to values in A that are greater than 95% of the value in B? Thank you.
Here is code for A and B to play with.
A <- structure(list(name1 = c("trt", "0", "1", "10", "1", "1", "10"
), name2 = c("ctrl", "3", "1", "1", "1", "1", "10")), .Names = c("name1",
"name2"), row.names = c("cond", "hour", "A", "B", "C", "D", "E"
), class = "data.frame")
B <- structure(list(name1 = c("trt", "0", "1", "1", "1", "1", "9.4"),
name2 = c("ctrl", "3", "1", "10", "1", "1", "9.4")), .Names = c("name1",
"name2"), row.names = c("cond", "hour", "A", "B", "C", "D", "E"
), class = "data.frame")

You have some serious formatting issues with your data.
First, each column should hold a single data type, and rows should be observations (not always true, but a very good way to start). Here you have a row called cond, then a row called hour, then a series of classifications, I'm guessing. The way your data is laid out doesn't make much sense and doesn't lend itself to easy manipulation. But all is not lost. This is what I would do:
Reorganize my data:
# as.numeric() turns "trt"/"ctrl" into NA (with a warning); those first two rows are then dropped
C <- data.frame(matrix(as.numeric(unlist(A)), ncol=2)[-(1:2), ])
colnames(C) <- c('A.trt', 'A.cntr')
rownames(C) <- LETTERS[1:nrow(C)]
D <- data.frame(matrix(as.numeric(unlist(B)), ncol=2)[-(1:2), ])
colnames(D) <- c('B.trt', 'B.cntr')
(df <- cbind(C, D))
Which gives:
# A.trt A.cntr B.trt B.cntr
# A 1 1 1.0 1.0
# B 10 1 1.0 10.0
# C 1 1 1.0 1.0
# D 1 1 1.0 1.0
# E 10 10 9.4 9.4
Then your problem is easily solved by:
df[which(df[, 1] > 0.95*df[, 3] & df[, 2] > 0.95*df[, 4]), ]
# A.trt A.cntr B.trt B.cntr
# A 1 1 1.0 1.0
# C 1 1 1.0 1.0
# D 1 1 1.0 1.0
# E 10 10 9.4 9.4
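
The original error arises because every column of A and B is character, so 0.95*B[i,] has nothing numeric to multiply. A sketch that keeps the question's layout (header rows cond and hour, then data rows) is to convert only the data rows to numbers before comparing. The names rows, A_num, B_num, keep and result are made up for illustration; A and B repeat the question's data.

```r
A <- data.frame(
  name1 = c("trt", "0", "1", "10", "1", "1", "10"),
  name2 = c("ctrl", "3", "1", "1", "1", "1", "10"),
  row.names = c("cond", "hour", "A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)
B <- data.frame(
  name1 = c("trt", "0", "1", "1", "1", "1", "9.4"),
  name2 = c("ctrl", "3", "1", "10", "1", "1", "9.4"),
  row.names = c("cond", "hour", "A", "B", "C", "D", "E"),
  stringsAsFactors = FALSE
)

# Only the rows after the two header rows hold numbers
rows <- setdiff(rownames(A), c("cond", "hour"))

# Convert those rows column by column; the results are numeric matrices
A_num <- sapply(A[rows, ], as.numeric)
B_num <- sapply(B[rows, ], as.numeric)

# Keep a row only if every value in A exceeds 95% of the value in B
keep <- rowSums(A_num > 0.95 * B_num) == ncol(A_num)

# Header rows plus the data rows that pass
result <- rbind(A[c("cond", "hour"), ], A[rows[keep], ])
rownames(result)
#> [1] "cond" "hour" "A"    "C"    "D"    "E"
```

Because the header rows are excluded before conversion, no NA coercion warning is raised, and the output keeps the same shape as A.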
