R Programming: How to drop variable labels as first column name in the table output from the fre function of the expss package?

I'm doing exploratory analysis of survey data. The data frame is a haven labelled dataset, that is, each variable already has value labels and a variable label.
I want to store frequency tables in a list and then name each list element after the variable label. I'm using the expss package. The problem is that the first column name of each output table contains this description: values2labels(df$var). How can this description be dropped from the table?
Reproducible example:
# Dataframe
df <- data.frame(sex = c(1, 1, 2, 2, 1, 2, 2, 2, 1, 2),
agegroup= c(1, 3, 1, 2, 3, 3, 2, 2, 2, 1),
weight = c(100, 20, 400, 300, 50, 50, 80, 250, 100, 100))
library(expss)
# Variable labels
var_lab(df$sex) <-"Sex"
var_lab(df$agegroup) <-"Age group"
# Value labels
val_lab(df$sex) <- make_labels("1 Male
2 Female")
val_lab(df$agegroup) <- make_labels("1 1-29
2 30-49
3 50 and more")
# Save variable labels
var_labels1 <- var_lab(df$sex)
var_labels2 <- var_lab(df$agegroup)
# Drop variable labels
var_lab(df$sex) <- NULL
var_lab(df$agegroup) <- NULL
# Save frequencies
f1 <- fre(values2labels(df$sex))
f2 <- fre(values2labels(df$agegroup))
# Note: I use the function 'values2labels' from the 'expss' package to display the value
# labels instead of the values of the variable. In this example, since I created the value
# labels manually, I don't need that function, but when I import haven labelled data, I need it to
# display value labels by default.
# Add frequencies on list
my_list <- list(f1, f2)
# Name lists elements as variable labels
names(my_list) <- c(var_labels1,
var_labels2)
In the following output, how can I get rid of the first column name on both tables: values2labels(df$sex) and values2labels(df$agegroup) ?
$Sex
| values2labels(df$sex) | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
| --------------------- | ----- | ------------- | ------- | ------------ | ----------------------- |
| Female | 6 | 60 | 60 | 60 | 60 |
| Male | 4 | 40 | 40 | 40 | 100 |
| #Total | 10 | 100 | 100 | 100 | |
| <NA> | 0 | | 0 | | |
$`Age group`
| values2labels(df$agegroup) | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
| -------------------------- | ----- | ------------- | ------- | ------------ | ----------------------- |
| 1-29 | 3 | 30 | 30 | 30 | 30 |
| 30-49 | 4 | 40 | 40 | 40 | 70 |
| 50 and more | 3 | 30 | 30 | 30 | 100 |
| #Total | 10 | 100 | 100 | 100 | |
| <NA> | 0 | | 0 | | |

You need to set var_lab to empty string instead of NULL:
library(expss)
a = 1:2
val_lab(a) = c("One" = 1, "Two" = 2)
var_lab(a) = ""
fre(values2labels(a))
# | | Count | Valid percent | Percent | Responses, % | Cumulative responses, % |
# | ------ | ----- | ------------- | ------- | ------------ | ----------------------- |
# | One | 1 | 50 | 50 | 50 | 50 |
# | Two | 1 | 50 | 50 | 50 | 100 |
# | #Total | 2 | 100 | 100 | 100 | |
# | <NA> | 0 | | 0 | | |
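Applied to the data frame from the question, the same idea can be wrapped in a loop. This is a minimal sketch, not the answerer's code: the names vars, list_names, and freq_list are illustrative, and the value labels are written as named vectors instead of make_labels. It saves each variable label first, blanks it on a copy of the column, and then names the list elements.

```r
library(expss)

df <- data.frame(sex = c(1, 1, 2, 2, 1, 2, 2, 2, 1, 2),
                 agegroup = c(1, 3, 1, 2, 3, 3, 2, 2, 2, 1))
var_lab(df$sex) <- "Sex"
var_lab(df$agegroup) <- "Age group"
val_lab(df$sex) <- c("Male" = 1, "Female" = 2)
val_lab(df$agegroup) <- c("1-29" = 1, "30-49" = 2, "50 and more" = 3)

vars <- c("sex", "agegroup")

# Save the variable labels before blanking them (used as list names below)
list_names <- vapply(vars, function(v) var_lab(df[[v]]), character(1),
                     USE.NAMES = FALSE)

# One frequency table per variable, with the variable label set to ""
freq_list <- lapply(vars, function(v) {
  x <- df[[v]]
  var_lab(x) <- ""          # empty string, not NULL
  fre(values2labels(x))
})
names(freq_list) <- list_names
freq_list
```

Because the label is blanked on a copy (x), the labels stored in df itself stay intact for later use.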

Related

frequency table for banner list

I am trying to write a function that generates a frequency table (count, valid percent, percent) for a list of banner variables.
I want to export the tables to xlsx files.
For example, for the variable "gear", I want to calculate the table for the banner below:
library(expss)
df <- mtcars
df$all<- 1
df$small<-ifelse(df$vs==1,1,NA)
df$large<-ifelse(df$am ==1,1,NA)
val_lab(df$all)<-c("Total"=1)
val_lab(df$small)<-c("Small"=1)
val_lab(df$large)<-c("Large"=1)
banner <- list(df$all,df$small,df$large)
data <- df
var <- "gear"
# Index by the column name string; a parsed symbol is not a valid [[ subscript
expss::var_lab(data[[var]])
#tab1 <- expss::fre(data[[var]])
table1 <- expss::fre(data[[var]],
stat_lab = getOption("expss.fre_stat_lab", c("Count N", "Valid percent", "Percent",
"Responses, %", "Cumulative responses, %")))
table1
the output table should be like below
You need to write a custom function around fre:
library(expss)
df <- mtcars
df$all<- 1
df$small<-ifelse(df$vs==1,1,NA)
df$large<-ifelse(df$am ==1,1,NA)
val_lab(df$all)<-c("Total"=1)
val_lab(df$small)<-c("Small"=1)
val_lab(df$large)<-c("Large"=1)
my_fre <- function(curr_var) setNames(expss::fre(curr_var)[, 1:3],
c("row_labels", "Count N", "Valid percent"))
cross_fun_df(df, gear, list(all, small, large), fun = my_fre)
# | | Total | | Small | | Large | |
# | | Count N | Valid percent | Count N | Valid percent | Count N | Valid percent |
# | ------ | ------- | ------------- | ------- | ------------- | ------- | ------------- |
# | 3 | 15 | 46.88 | 3 | 21.43 | | |
# | 4 | 12 | 37.50 | 10 | 71.43 | 8 | 61.54 |
# | 5 | 5 | 15.62 | 1 | 7.14 | 5 | 38.46 |
# | #Total | 32 | 100.00 | 14 | 100.00 | 13 | 100.00 |
# | <NA> | 0 | | 0 | | 0 | |
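The question also asks about exporting to xlsx, which the answer above does not cover. One possible route, sketched here under the assumption that the openxlsx package is installed, is the xl_write function that expss provides for writing its tables into an openxlsx workbook. The file name, sheet name, and the names tbl, wb, sh, and out_file are illustrative.

```r
library(expss)
library(openxlsx)

df <- mtcars
df$all <- 1
df$small <- ifelse(df$vs == 1, 1, NA)
df$large <- ifelse(df$am == 1, 1, NA)
val_lab(df$all) <- c("Total" = 1)
val_lab(df$small) <- c("Small" = 1)
val_lab(df$large) <- c("Large" = 1)

# Same helper as in the answer: keep labels, count, and valid percent
my_fre <- function(curr_var) setNames(expss::fre(curr_var)[, 1:3],
                                      c("row_labels", "Count N", "Valid percent"))
tbl <- cross_fun_df(df, gear, list(all, small, large), fun = my_fre)

# Write the table to a worksheet and save the workbook
out_file <- file.path(tempdir(), "gear_banner.xlsx")  # illustrative path
wb <- createWorkbook()
sh <- addWorksheet(wb, "gear")
xl_write(tbl, wb, sh)
saveWorkbook(wb, out_file, overwrite = TRUE)
```

Several tables can go into the same workbook by adding one worksheet per table before calling saveWorkbook.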

How to drop unused value labels in crosstabulations table outputs using cro function from expss package?

I'm using haven labelled data frames (variables already have value labels when the datasets are imported). I need to run many crosstabulations of two variables. I'm using the cro function from the expss package because it displays value labels by default and computes weighted crosstabs.
However, the output tables display unused value labels. How can I drop unused labels without manually dropping them for each variable? (By the way, the fre function from the expss package has this argument by default: drop_unused_labels = TRUE, but the cro function doesn't.)
Here is a reproducible example:
# Dataframe
df <- data.frame(sex = c(1, 2, 99, 2, 1, 2, 2, 2, 1, 2),
agegroup= c(1, 2, 99, 2, 3, 3, 2, 2, 2, 1),
weight = c(100, 20, 400, 300, 50, 50, 80, 250, 100, 100))
library(expss)
# Variable labels
var_lab(df$sex) <-"Sex"
var_lab(df$agegroup) <-"Age group"
# Value labels
val_lab(df$sex) <- make_labels("1 Male
2 Female
97 Didn't know
98 Didn't respond
99 Abandoned survey")
val_lab(df$agegroup) <- make_labels("1 1-29
2 30-49
3 50 and more
97 Didn't know
98 Didn't respond
99 Abandoned survey")
cro(df$sex, df$agegroup, weight = df$weight)
| | | Age group | | | | | |
| | | 1-29 | 30-49 | 50 and more | Didn't know | Didn't respond | Abandoned survey |
| --- | ---------------- | --------- | ----- | ----------- | ----------- | -------------- | ---------------- |
| Sex | Male | 100 | 100 | 50 | | | |
| | Female | 100 | 650 | 50 | | | |
| | Didn't know | | | | | | |
| | Didn't respond | | | | | | |
| | Abandoned survey | | | | | | 400 |
| | #Total cases | 2 | 5 | 2 | | | 1 |
I want to get rid of the columns and rows called ‘Didn't know’ and ‘Didn't respond’.
You can use the drop_unused_labels function to remove the labels that are not used.
library(expss)
df1 <- drop_unused_labels(df)
cro(df1$sex, df1$agegroup, weight = df1$weight)
| | | Age group | | | |
| | | 1-29 | 30-49 | 50 and more | Abandoned survey |
| --- | ---------------- | --------- | ----- | ----------- | ---------------- |
| Sex | Male | 100 | 100 | 50 | |
| | Female | 100 | 650 | 50 | |
| | Abandoned survey | | | | 400 |
| | #Total cases | 2 | 5 | 2 | 1 |

How do I analyze Market Basket Output?

I have sales data as below:
+------------+------+-------+
| Receipt ID | Item | Value |
+------------+------+-------+
| 1 | a | 2 |
| 1 | b | 3 |
| 1 | c | 2 |
| 1 | k | 4 |
| 2 | a | 2 |
| 2 | b | 5 |
| 2 | d | 6 |
| 2 | k | 7 |
| 3 | a | 8 |
| 3 | k | 1 |
| 3 | c | 2 |
| 3 | q | 3 |
| 4 | k | 4 |
| 4 | a | 5 |
| 5 | b | 6 |
| 5 | a | 7 |
| 6 | a | 8 |
| 6 | b | 3 |
| 6 | c | 4 |
+------------+------+-------+
Using the Apriori algorithm, I split the rules into separate columns.
For example, I got the output below. I dropped the support, confidence, and lift values, and I only consider rules mapped into the columns Target Item, Item1, and Item2 ({Item1, Item2} -> {Target Item}).
The output is as below:
+-------------+-------+-------+
| Target Item | Item1 | Item2 |
+-------------+-------+-------+
| a | b | |
| a | b | c |
| a | k | |
+-------------+-------+-------+
I want to find all the receipts that contain each rule combination, then calculate the sale value of the target item in those receipts only, and also the combined sale value of Item1 and Item2 in those receipts.
The output should be something like below (I don't need the receipt IDs):
+-------------+-------+-------+--------------+----------------------+------------------------------+
| Target Item | Item1 | Item2 | Receipt ID's | Value of Target Item | Remaining value(Item1+item2) |
+-------------+-------+-------+--------------+----------------------+------------------------------+
| a | b | | 1,2,5,6 | 2+2+7+8 | 3+5+6+3 |
| a | b | c | 1,6 | 2 | (3+3) + (2+4) |
| a | k | | 1,2,3,4 | 2+2+8+5 | 4+7+1+4 |
+-------------+-------+-------+--------------+----------------------+------------------------------+
To replicate the Apriori:
library(arules)
Data <- data.frame(
Receipt_ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,5,5,6,6,6),
item = c('a','b','c','k','a','b','d','k','a','k','c','q','k', 'a','b','a','a', 'b', 'c'
)
,
value = c(2,3,2,4,2,5,6,7,8,1,2,3,4,5,6,7,8,3,4
)
)
write.table(Data,"item.csv",sep=',',row.names = F)
data_frame = read.transactions(
file = "item.csv",
format = "single",
sep = ",",
cols = c("Receipt_ID","item"),
rm.duplicates = T
)
rules_apriori <- apriori(data_frame)
rules_apriori
rules_tab <- as(rules_apriori, "data.frame")
rules_tab
out <- strsplit(as.character(rules_tab$rules),'=>')
rules_tab$rhs <- do.call(rbind, out)[,2]
rules_tab$lhs <- do.call(rbind, out)[,1]
rules_tab$rhs <- gsub("\\{", "", rules_tab$rhs)
rules_tab$rhs <- gsub("}", "", rules_tab$rhs)
rules_tab$lhs = gsub("}", "", rules_tab$lhs)
rules_tab$lhs = gsub("\\{", "", rules_tab$lhs)
rules_final <- data.frame (target_item = character(),item_combination = character() )
rules_final <- cbind(target_item = rules_tab$rhs,item_Combination = rules_tab$lhs)
rules_final
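The calculation described in the question can be sketched in base R, independently of arules. This is only one reading of the requirement: a receipt matches a rule when it contains the target item and every left-hand-side item, and values are summed over all matching receipts. The names sales, rules, items_by_receipt, and summarise_rule are illustrative, and the rules table is typed in by hand from the three example rules.

```r
sales <- data.frame(
  Receipt_ID = c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,5,5,6,6,6),
  item  = c('a','b','c','k','a','b','d','k','a','k','c','q','k','a','b','a','a','b','c'),
  value = c(2,3,2,4,2,5,6,7,8,1,2,3,4,5,6,7,8,3,4),
  stringsAsFactors = FALSE
)

# The three example rules: {item1, item2} -> {target}
rules <- data.frame(target = c("a", "a", "a"),
                    item1  = c("b", "b", "k"),
                    item2  = c(NA, "c", NA),
                    stringsAsFactors = FALSE)

# Items bought on each receipt, keyed by receipt ID
items_by_receipt <- split(sales$item, sales$Receipt_ID)

summarise_rule <- function(target, lhs) {
  lhs <- lhs[!is.na(lhs)]
  needed <- c(target, lhs)
  # Receipts that contain the target and every left-hand-side item
  hit <- vapply(items_by_receipt, function(items) all(needed %in% items), logical(1))
  ids <- names(items_by_receipt)[hit]
  in_hit <- as.character(sales$Receipt_ID) %in% ids
  data.frame(
    receipts        = paste(ids, collapse = ","),
    target_value    = sum(sales$value[in_hit & sales$item == target]),
    remaining_value = sum(sales$value[in_hit & sales$item %in% lhs])
  )
}

result <- cbind(rules, do.call(rbind, lapply(seq_len(nrow(rules)), function(i) {
  summarise_rule(rules$target[i], c(rules$item1[i], rules$item2[i]))
})))
result
```

Under this reading, the rule {b, c} -> {a} matches receipts 1 and 6, so its target value is 2 + 8 = 10 and its remaining value is (3 + 3) + (2 + 4) = 12.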

How do I present data from a text file to CrossTable in R?

I am having trouble importing data into R so that the CrossTable function will run a simple chi-squared test. Thank you for any tips on how to import the data correctly: the test is fine when I enter data manually but not when I import it into a table - see below. /OT
> library(gmodels)
> library(MASS)
> #When I enter the data manually there's no problem running a simple chi-squared:
> CA<-c(42,100,10,5)
> noCA<-c(20,0,140,40)
> regionalca<-cbind(CA,noCA)
> regionalca
CA noCA
[1,] 42 20
[2,] 100 0
[3,] 10 140
[4,] 5 40
> CrossTable(regionalca, fisher=FALSE, chisq=TRUE, expected=TRUE, , sresid=TRUE, format="SPSS")
Cell Contents
|-------------------------|
| Count |
| Expected Values |
| Chi-square contribution |
| Row Percent |
| Column Percent |
| Total Percent |
| Std Residual |
|-------------------------|
Total Observations in Table: 357
|
| CA | noCA | Row Total |
-------------|-----------|-----------|-----------|
[1,] | 42 | 20 | 62 |
| 27.266 | 34.734 | |
| 7.962 | 6.250 | |
| 67.742% | 32.258% | 17.367% |
| 26.752% | 10.000% | |
| 11.765% | 5.602% | |
| 2.822 | -2.500 | |
-------------|-----------|-----------|-----------|
[2,] | 100 | 0 | 100 |
| 43.978 | 56.022 | |
| 71.366 | 56.022 | |
| 100.000% | 0.000% | 28.011% |
| 63.694% | 0.000% | |
| 28.011% | 0.000% | |
| 8.448 | -7.485 | |
-------------|-----------|-----------|-----------|
[3,] | 10 | 140 | 150 |
| 65.966 | 84.034 | |
| 47.482 | 37.274 | |
| 6.667% | 93.333% | 42.017% |
| 6.369% | 70.000% | |
| 2.801% | 39.216% | |
| -6.891 | 6.105 | |
-------------|-----------|-----------|-----------|
[4,] | 5 | 40 | 45 |
| 19.790 | 25.210 | |
| 11.053 | 8.677 | |
| 11.111% | 88.889% | 12.605% |
| 3.185% | 20.000% | |
| 1.401% | 11.204% | |
| -3.325 | 2.946 | |
-------------|-----------|-----------|-----------|
Column Total | 157 | 200 | 357 |
| 43.978% | 56.022% | |
-------------|-----------|-----------|-----------|
Statistics for All Table Factors
Pearson's Chi-squared test
------------------------------------------------------------
Chi^2 = 246.0862 d.f. = 3 p = 4.595069e-53
Minimum expected frequency: 19.78992
> #But when I try to import the data from a .txt file, it becomes unacceptable:
> regionalca<-read.table(file="låtsas ca.txt", header=TRUE)
> regionalca
CA noCA
1 43 20
2 100 1
3 10 140
4 5 40
> CrossTable(regionalca, fisher=FALSE, chisq=TRUE, expected=TRUE, , sresid=TRUE, format="SPSS")
Error in margin.table(x, margin) : 'x' is not an array
> #I would really like to run the test on this table:
> regionalca<-read.table(file="låtsas ca.txt", header=TRUE)
> regionalca
region CA noCA
1 south 43 20
2 southwest 100 0
3 mid 10 140
4 north 5 40
> #Which ob
> CrossTable(regionalca, fisher=FALSE, chisq=TRUE, expected=TRUE, , sresid=TRUE, format="SPSS")
Error in if (any(x < 0) || any(is.na(x))) stop("all entries of x must be nonnegative and finite") :
missing value where TRUE/FALSE needed
In addition: Warning message:
In Ops.factor(left, right) : ‘<’ not meaningful for factors
>
The error is very explicit:
if (any(x < 0) || any(is.na(x)))
stop("all entries of x must be nonnegative and finite")
The inputs are not valid for CrossTable (gmodels package). I can reproduce the error using your data by introducing a negative value:
CA <- c(-1,100,10,5) ## -1 the first value
So you need to remove all such values first, or replace them with another value. For example:
regionalca <- regionalca[rowSums(!regionalca < 0) == ncol(regionalca) &
rowSums(!is.na(regionalca))==ncol(regionalca),]
The problem is that read.table creates a data.frame, but what you need is a matrix. Note that cbind() returns a matrix by default. This is also stated in the error you printed:
Error in margin.table(x, margin) : 'x' is not an array (a matrix is a two-dimensional array).
To fix it, change your code as follows:
regionalca<-as.matrix(read.table(file="låtsas ca.txt", header=TRUE))
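For the third table in the question, which has a region column, as.matrix alone would produce a character matrix. A possible workaround, sketched here with the data typed inline through the text argument instead of the original file, is to read the first column as row names so that only the counts end up in the matrix.

```r
library(gmodels)

# Inline copy of the table from the question (stands in for the .txt file)
txt <- "region CA noCA
south 43 20
southwest 100 0
mid 10 140
north 5 40"

# row.names = 1 turns the region column into row names, leaving only counts
regionalca <- as.matrix(read.table(text = txt, header = TRUE, row.names = 1))

CrossTable(regionalca, fisher = FALSE, chisq = TRUE, expected = TRUE,
           sresid = TRUE, format = "SPSS")
```

With a real file, the same call would use file = "..." in place of text = txt.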

Hmisc Table Creation

Just starting out with R and trying to figure out what works for my needs when it comes to creating "summary tables." I am used to Custom Tables in SPSS, and the CrossTable function in the package gmodels gets me almost where I need to be; not to mention it is easy to navigate for someone just starting out in R.
That said, it seems like the Hmisc package is very good at creating various summaries and exporting to LaTeX (ultimately what I need to do).
My questions are: 1) Can you create the table below easily in the Hmisc package? 2) If so, can I interact variables (2 in the column)? And finally, 3) can I access p-values of significance tests (chi-square)?
Thanks in advance,
Brock
Cell Contents
|-------------------------|
| Count |
| Row Percent |
| Column Percent |
|-------------------------|
Total Observations in Table: 524
| asq[, 23]
asq[, 4] | 1 | 2 | 3 | 4 | 5 | Row Total |
-------------|-----------|-----------|-----------|-----------|-----------|-----------|
0 | 76 | 54 | 93 | 46 | 54 | 323 |
| 23.529% | 16.718% | 28.793% | 14.241% | 16.718% | 61.641% |
| 54.286% | 56.250% | 63.265% | 63.889% | 78.261% | |
-------------|-----------|-----------|-----------|-----------|-----------|-----------|
1 | 64 | 42 | 54 | 26 | 15 | 201 |
| 31.841% | 20.896% | 26.866% | 12.935% | 7.463% | 38.359% |
| 45.714% | 43.750% | 36.735% | 36.111% | 21.739% | |
-------------|-----------|-----------|-----------|-----------|-----------|-----------|
Column Total | 140 | 96 | 147 | 72 | 69 | 524 |
| 26.718% | 18.321% | 28.053% | 13.740% | 13.168% | |
-------------|-----------|-----------|-----------|-----------|-----------|-----------|
The gmodels package has a function called CrossTable, which is very nice for those used to SPSS and SAS output. Try this example:
library(gmodels) # run install.packages("gmodels") if you haven't installed the package yet
x <- sample(c("up", "down"), 100, replace = TRUE)
y <- sample(c("left", "right"), 100, replace = TRUE)
CrossTable(x, y, format = "SPSS")
This should provide you with an output just like the one you displayed in your question, very SPSS-y. :)
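On the third question (accessing p-values), one option that does not depend on gmodels is chisq.test from base R's stats package, which returns the p-value as a list element. A minimal sketch, with set.seed added only to make the random example reproducible:

```r
set.seed(1)
x <- sample(c("up", "down"), 100, replace = TRUE)
y <- sample(c("left", "right"), 100, replace = TRUE)

# Chi-squared test on the 2 x 2 contingency table
test <- chisq.test(table(x, y))
p_value <- test$p.value
p_value
```

For 2 x 2 tables chisq.test applies Yates' continuity correction by default; pass correct = FALSE to get the uncorrected Pearson statistic.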
If you are coming from SPSS, you may be interested in the package Deducer ( http://www.deducer.org ). It has a contingency table function:
> library(Deducer)
> data(tips)
> tables<-contingency.tables(
+ row.vars=d(smoker),
+ col.vars=d(day),data=tips)
> tables<-add.chi.squared(tables)
> print(tables,prop.r=T,prop.c=T,prop.t=F)
================================================================================================================
==================================================================================
========== Table: smoker by day ==========
| day
smoker | Fri | Sat | Sun | Thur | Row Total |
-----------------------|-----------|-----------|-----------|-----------|-----------|
No Count | 4 | 45 | 57 | 45 | 151 |
Row % | 2.649% | 29.801% | 37.748% | 29.801% | 61.885% |
Column % | 21.053% | 51.724% | 75.000% | 72.581% | |
-----------------------|-----------|-----------|-----------|-----------|-----------|
Yes Count | 15 | 42 | 19 | 17 | 93 |
Row % | 16.129% | 45.161% | 20.430% | 18.280% | 38.115% |
Column % | 78.947% | 48.276% | 25.000% | 27.419% | |
-----------------------|-----------|-----------|-----------|-----------|-----------|
Column Total | 19 | 87 | 76 | 62 | 244 |
Column % | 7.787% | 35.656% | 31.148% | 25.410% | |
Large Sample
Test Statistic DF p-value | Effect Size est. Lower (%) Upper (%)
Chi Squared 25.787 3 <0.001 | Cramer's V 0.325 0.183 (2.5) 0.44 (97.5)
-----------
================================================================================================================
You can get the counts and test to latex or html using the xtable package:
> library(xtable)
> xtable(drop(extract.counts(tables)[[1]]))
> test <- contin.tests.to.table((tables[[1]]$tests))
> xtable(test)
