Troubleshooting R table function to create a more complete-looking table

I'm attempting to create a table of relative percentage values for answers to a questionnaire (answers are graded 1-5) for a total of 3 questions.
I'm using the formattable library to convert the values in the tables to percents, but thus far I am unable to combine the results for Questions 1, 2, and 3 into 1 table.
The code I have written is:
tableq1<-percent(table(Q1val)/length(na.omit(Q1val)))
tableq1
The current output is:
What do I need to do in order to achieve this?
Ultimately, I want to have this table as a pdf or png, with gridlines that make it look clean and professional.
Per request:
dput(Q1val)
c(4, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5, 4, 4, 3, 4, 5,
4, 5, 1, 5, 5, 5, 4, 5, 4, 3, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5,
5, 5, 3, 5, 5, 4, 4, 4, 4, 5, 2, 5, 4, NA, 5, 5, 5, 3, 5, 5,
4, 5, 5, 4, 5, 5, 4, 4, 5, 4, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 4,
4, 3, 5, 4, 4, 5, 5, 5, 4, 5, 4, 5, 4, 4, 3, 5, 5, 5, 5, 4, 3,
4, 5, 4, 4, 5, 5, 4, 4, 5, 5, 4, 5, 4, 4, 4, 5, 4, 4, 5, 5, 5,
5, 5, 4, 4, 5, 4, 5, 5, 5, 2, 4, 2, 5, 5, 3, 4, 3, 4, 5, 5, 4,
4, 3, 5, 5, 4, 5, 4, 3, 5, 3, 4, 4, 5, 4, 5, 5, 5, 5, 5, 5, 5,
5, 5, 4, 5, 5, 3, 3, 4, 5, 4, 5, 5 # , ...
)

In an improvement over my comments, I've chosen to render the result with the kableExtra package.
Solution
First load the appropriate packages:
library(dplyr)
library(formattable)
library(kableExtra)
Once you've generated the Q*val vectors in your workspace...
# ...
# Code to generate 'Q1val', 'Q2val', ...
# ...
...then run this workflow to consolidate them into a single contingency table:
# Identify all the 'Q*val' vectors.
q_names <- ls(pattern = "Q\\d+val")
# Assemble a single table.
q_all <- q_names %>%
  # Capture those vectors in a (named) list.
  mget(inherits = TRUE) %>%
  # Turn each vector into a contingency table of percentages.
  sapply(
    FUN = function(x) {
      percent(table(x) / length(na.omit(x)))
    },
    simplify = FALSE
  ) %>%
  # Stack those contingency tables into a single table.
  do.call(what = bind_rows) %>%
  # Convert to 'data.frame' to preserve row names.
  as.data.frame() %>%
  # Add row names of "Q1", "Q2", ...; as extracted from the names "Q1val", "Q2val", ...
  `rownames<-`(value = gsub(
    x = q_names,
    pattern = "^(Q\\d+)(val)$",
    replacement = "\\1"
  ))
You can then prettify the table and export it as an image.
# Prettify the table.
q_pretty <- q_all %>%
  # Convert into 'kable' object.
  kbl() %>%
  # Border the rows and stripe the cells.
  kable_styling(
    bootstrap_options = c("bordered", "striped")
    # ...
    # Further styling as desired.
    # ...
  ) # %>%
  # ...
  # Further 'kable' adjustments as desired.
  # ...
# Save pretty table as PNG image.
save_kable(
  x = q_pretty,
  file = "pretty.png"
)
Note
You can easily save as a PDF by replacing "pretty.png" with "pretty.pdf".
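If you want to sanity-check the consolidated percentages without the packages above, the same idea can be sketched in base R. This is only an illustration: the three short vectors below are invented (they are not the asker's data), and the names answers, q_share and q_pct are hypothetical. Fixing the factor levels to 1:5 keeps a column for every grade even when nobody chose it.

```r
# Hypothetical answer vectors, invented for illustration only.
Q1val <- c(4, 5, 5, 3, NA, 1, 5, 4)
Q2val <- c(2, 3, 3, 4, 5, 5, NA, 1)
Q3val <- c(5, 5, 4, 4, 3, 3, 2, 2)

answers <- list(Q1 = Q1val, Q2 = Q2val, Q3 = Q3val)

# One row per question, one column per grade 1-5.
# factor(levels = 1:5) keeps a column for grades nobody chose;
# table() drops NA by default, so prop.table() divides by the
# number of non-missing answers.
q_share <- t(sapply(answers, function(x) {
  prop.table(table(factor(x, levels = 1:5)))
}))

# Format the shares as percentage strings, keeping the row and column names.
q_pct <- q_share
q_pct[] <- sprintf("%.1f%%", 100 * q_share)
print(q_pct, quote = FALSE)
```

Each row of q_share sums to 1, which is a quick check that the missing values were excluded from the denominator.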
Result
I had to improvise my own Q2val and Q3val, but the resulting pretty.png should look like this:

Related

R: item name missing in the plot legend

With this code I get the plot I want
d <- density(mydata$item1)
plot(d)
This code is the same, but omits NAs. However, there is a flaw in the plot's title: as you can see, it doesn't say which item is plotted (it shows x = .).
Can you tell me what the problem is and how to fix it? Thank you for your help.
My data
structure(list(item1 = c(5, 5, 5, 5, 4, 4, 2, 1, 3, 4, 4, 3,
2, 5, 2, 4, 4, 3, 6, 5, 3, 2, 5, 3, 3, 1, 3, 5, 1, 3, 2, 6, 3,
5, 4, 4, 3, 5, 6, 3, 2, 6, 6, 5, 2, 2, 2, 3, 3, 3), item2 = c(5,
4, 5, 1, 2, 2, 3, 2, 2, 2, 2, 3, 2, 5, 1, 4, 4, 3, 3, 5, 3, 2,
4, 4, 3, 4, 4, 3, 7, NA, 2, 4, 2, 4, 2, 3, 5, 3, 5, 3, 2, 6,
6, 7, 2, 3, 2, 3, 1, 4), item3 = c(5, 5, 6, 7, 3, 4, 5, 2, 2,
6, 4, 2, 5, 7, 1, 2, 4, 5, 6, 6, 5, 2, 6, 5, 6, 4, 6, 4, 6, 4,
6, 5, 5, 6, 6, 6, 5, 6, 7, 5, 5, 7, 7, 6, 2, 6, 6, 6, 5, 3)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))
Use the main = argument inside plot to make the title say whatever you want it to.
Data$item2 %>%
na.omit() %>%
density() %>%
plot(main = 'Density of Data$item2')
You had a small typo in your code: the density() call was piped into a plot() call that referred to the variable the result was being assigned to, which might have produced the strange plot.
In general, density() does not accept NA values. According to the documentation, the argument na.rm defaults to FALSE, so you have to set na.rm = TRUE for the plot to work correctly. Also, as @AllanCameron pointed out in an earlier answer, you can set the plot title manually.
d <- density(mydata$item2, na.rm = TRUE)
plot(d)
You could possibly substitute, interpolate or impute the NA values so that you do not have to remove them for the density() call, though this obviously depends on your data, context and goals.
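As far as I understand it, the (x = .) in the title comes from the call that density() stores in its result: plot() uses that stored call as the default title, and inside a magrittr pipe the argument is the placeholder `.`. A minimal sketch with an invented vector x (not the asker's data) shows where the text comes from and how main overrides it:

```r
# Hypothetical vector with one missing value.
x <- c(5, 4, NA, 2, 3, 3, 6)

d <- density(x, na.rm = TRUE)

# The stored call; plot() shows it as the title when 'main' is not given.
d$call

# Supplying 'main' replaces the default title.
plot(d, main = "Density of item2")
```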

R: trying to omit missing values before plotting

Any idea why omitting N/A does not work with this code?
d <- density(Data$item2) %>%
na.omit()
I get the error Error in density.default(Data$item2) : 'x' contains missing values
This didn't work either
d <- Data %>% na.omit() %>%
density(Data$item2)
My data
structure(list(item1 = c(5, 5, 5, 5, 4, 4, 2, 1, 3,
4, 4, 3, 2, 5, 2, 4, 4, 3, 6, 5, 3, 2, 5, 3, 3, 1, 3, 5, 1, 3,
2, 6, 3, 5, 4, 4, 3, 5, 6, 3, 2, 6, 6, 5, 2, 2, 2, 3, 3, 3),
item2 = c(5, 4, 5, 1, 2, 2, 3, 2, 2, 2, 2, 3, 2,
5, 1, 4, 4, 3, 3, 5, 3, 2, 4, 4, 3, 4, 4, 3, 7, NA, 2, 4,
2, 4, 2, 3, 5, 3, 5, 3, 2, 6, 6, 7, 2, 3, 2, 3, 1, 4)), row.names = c(NA,
-50L), class = c("tbl_df", "tbl", "data.frame"))
I also tried to omit all the N/A in the beginning with this code, but it did not solve the problem
Data <- read_excel("C:/location/Data.xlsx") %>%
na.omit()
So, how to do this? Thanks for your help!
You need to remove the NA values from your data, not from the density object.
Data$item2 %>%
na.omit() %>%
density() %>%
plot()
Alternatively, use the na.rm = TRUE argument in density:
Data$item2 %>%
density(na.rm = TRUE) %>%
plot()
You can use:
d <- Data %>% na.omit()
density(d$item2)
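One thing to keep in mind when choosing between these options: na.omit() on the whole data frame drops every row that has an NA in any column, while na.rm = TRUE only ignores the missing values of the vector you pass in. A small sketch with an invented data frame (hypothetical values, same column names as in the question):

```r
# Small hypothetical data frame; only item2 has a missing value.
Data <- data.frame(item1 = c(5, 4, 2, 3), item2 = c(5, NA, 3, 2))

# Dropping NA inside density() keeps every observed value of item2.
d2 <- density(Data$item2, na.rm = TRUE)

# na.omit() on the whole data frame removes the entire second row,
# so item1 also loses an observation it actually had.
complete <- na.omit(Data)

nrow(Data)      # 4 rows before
nrow(complete)  # 3 rows after
d2$n            # 3 non-missing item2 values used by density()
```

If you only ever plot one column at a time, the per-vector approach avoids throwing away valid values from the other columns.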

Creating a histogram with appropriate counts and labels in R

I have a dataset (dat), which I am hard-coding in here:
dat = c(5, 9, 5, 6, 5, 6, 8, 4, 6, 4, 6, 6, 4, 6, 4, 6, 5, 5, 6, 5, 6, 7, 4, 5, 4, 4, 6, 4, 4, 5, 7, 6, 3, 5, 5, 5, 5, 4, 6, 3, 6, 5, 4, 6, 5, 8, 4, 8, 5, 5, 4, 4, 6, 6, 4, 6, 4, 7, 4, 1, 4, 6, 3, 6, 3, 4, 6, 6, 3, 6, 6, 2, 5, 5, 4, 7, 6)
table(dat)
Running table() on the data shows a count of 1 for the value 1 and a count of 1 for the value 2. However, when I plot the data using hist, the plot shows a count of 2.
hist(dat, col="lightgreen", labels = TRUE, xlim=c(0,10), ylim=c(0,27))
This is the first problem. The other problem is that I am trying to plot the x label value for each bin (there should be 11 bins, labeled 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10). Even though I have no 0 or 10 values, I would like to show that they have a count of 0 and have their bins labeled like the rest. How can I accomplish that?
Thanks.
am = hist(dat, col="lightgreen", labels = TRUE,
          breaks=seq(min(dat)-2,max(dat)),
          axes=FALSE)
axis(2)
axis(1,at=am$mids,seq(min(dat)-1,max(dat)))
Did you mean like this:
hist(dat, col="lightgreen", labels = TRUE,
xlim=c(0,10), ylim=c(0,27), breaks = 0:10, at=0:10)
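Another way to get one bar per integer, offered only as a sketch, is to put the breaks at half-integers so that every value from 0 to 10 sits in the middle of its own bin. The short dat vector below is invented for illustration; substitute the real data.

```r
# Hypothetical short sample of integer scores; substitute your own 'dat'.
dat <- c(1, 2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 9)

# Half-integer breaks give each integer 0..10 a bin of its own.
h <- hist(dat,
          breaks = seq(-0.5, 10.5, by = 1),
          col = "lightgreen", labels = TRUE,
          xaxt = "n")  # suppress the default x axis

# Label every bin, including the empty ones at 0 and 10.
axis(1, at = 0:10)

# The bin counts now agree with table(), with zeros for unused values.
h$counts
table(factor(dat, levels = 0:10))
```

With integer breaks such as 0:10, a value that falls exactly on a break has to be assigned to one side, which is how two neighbouring values can end up in the same bar.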

Apply ifelse test to previous values?

What I need to do is test the prior n number of rows (say 5 for a nice round number) of a data.frame. This is really easy for just one, where it's
for (i in 1:nrow(data)){
ifelse(((data$V1[i]>(mean(data$V1)+2*sd(data$V1))) &
(data$V1[i-1]>(mean(data$V1)+2*sd(data$V1)))),Control[i,1]<-1,Control[i,1]<-0)
}
This works and Control is filled with 1s if the test is true and 0 if it is false.
However, I want to extend it to several more in the past, which I attempt to do with a nested for like so:
for (i in 1:nrow(data$V1)){ifelse((data$V1[i]>(mean(data$V1)+sd(data$V1))) &
(for (j in 1:4){(data$V1[(i-j)]>(mean(data$V1)+sd(data$V1)))},
Control[i,1]<-1,Control[i,1]<-0)}
This gives the following error (for simplicity I test with a single vector of values, called test ):
Error: unexpected ',' in "for (i in 1:length(test)){ifelse((test[i]>(mean(test)+sd(test))) & (for (j in 1:4){(test[(i-j)]>(mean(test)+sd(test)))},"
Trying to pad it with some parens gives the following slightly different error:
Error: unexpected ')' in "for (i in 1:length(test)){ifelse((test[i]>(mean(test)+sd(test))) & (for (j in 1:4){(test[(i-j)]>(mean(test)+sd(test))))"
test is defined like so:
test <- c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 8, 8, 7, 7)
Any help with my method, or a more efficient R method (I'm still new to the language!) is extremely welcome.
I think you need rollapply from the zoo package. It is not clear how your loop is defined, though, since the code uses data and Control while the error comes from a test object. Here I check whether the current value and the 4 previous values are all greater than a threshold V (the outlier cutoff). Because rollapply centers its window by default, align = "right" is used so that each window ends at the current value. There is no need for ifelse here, since the condition is a single logical value rather than a vector.
test <- c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 8, 8, 7, 7)
library(zoo)
V <- mean(test)+2*sd(test)
rollapply(test, 5, function(x) if (all(x > V)) 1 else 0, align = "right")
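If you would rather not depend on zoo, the same trailing-window test can be sketched in base R with stats::filter(). This is a swapped-in technique, not the code from the answer, and flag_run is a hypothetical helper name. It counts how many of the last n values exceed the threshold and flags positions where all n do; the first n - 1 positions, which have no full window, are set to 0.

```r
# Hypothetical helper: 1 where the current value and the n - 1 values
# before it all exceed 'threshold', 0 otherwise.
flag_run <- function(x, threshold, n = 5) {
  above <- as.numeric(x > threshold)
  # sides = 1 sums the current value and the n - 1 previous ones;
  # the first n - 1 results are NA because the window is incomplete.
  run <- stats::filter(above, rep(1, n), sides = 1)
  as.integer(!is.na(run) & run == n)
}

# Invented example vector with a run of six high values.
x <- c(1, 1, 9, 9, 9, 9, 9, 9, 1)
flag_run(x, threshold = 5, n = 5)
```

Unlike the rollapply call, this returns a vector of the same length as the input, which makes it easy to store in a column next to the data.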

Remove empty list from a list of lists

I have a list of lists in which some elements are NA, i.e. empty. I want to extract all the lists that are filled with data and remove all the elements that are empty (NA).
The code i'm trying is:
lapply(outputfile,function(x){
if(outputfile != NA){
test<-lapply(outputfile,unlist)
}})
But this does not work.
The list of lists is like this: (small example of random data)
list(NA, NA, NA, NA, NA, NA, list(c(5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5)))
I only want to extract the list with the 5s in it. The first 6 elements should be ignored, i.e. removed.
Any help is appreciated
So, to remove NA at the first level, you could use is.na directly:
l[!is.na(l)]
Alternatively, you can also use Filter which tries to coerce the results of the evaluated function to logical and returns those elements that evaluated to TRUE. You could do, for example:
Filter(function(x) !is.na(x), l)
or, equivalently (as @flodel notes in a comment):
Filter(Negate(is.na), l)
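For completeness, here is a small runnable sketch on an invented list l shaped like the one in the question. The helper is_single_na is hypothetical; it checks the length before calling is.na(), which may be safer with Filter if an element could be a plain vector with more than one value, because is.na() would then return several values for that one element.

```r
# Hypothetical list shaped like the one in the question.
l <- list(NA, NA, list(c(5, 5, 5)), NA, list(c(1, 2)))

# TRUE only for an element that is a single atomic NA.
is_single_na <- function(x) is.atomic(x) && length(x) == 1 && is.na(x)

# Keep everything that is not a single NA.
kept <- Filter(Negate(is_single_na), l)

length(kept)  # 2 elements remain
```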

Resources