How can I plot a crosstable of summary data - r

I'm summarizing a set of helpdesk tickets in R using tapply with summary. How can I plot a cross table of this data that shows a five-number summary for each category?
tsSummary = tapply(tickets$timeSpent, tickets$category, summary)
$ERROR
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 1.16 16.26 81.51 61.68 578.40
$SUPPORT
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 3.28 24.19 93.02 93.38 2328.00
$DEFECT
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 3.71 28.16 134.20 148.90 2572.00
$SYSTEM
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 5.33 22.45 95.31 64.61 1178.00
$OTHERS
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 1.99 22.17 102.60 115.60 3461.00
I would like to plot (as an image) something like this:
         Min. 1st Qu. Median   Mean 3rd Qu.    Max.
$ERROR   0.00    1.16  16.26  81.51   61.68  578.40
$SUPPORT 0.00    3.28  24.19  93.02   93.38 2328.00
$DEFECT  0.00    3.71  28.16 134.20  148.90 2572.00
$SYSTEM  0.00    5.33  22.45  95.31   64.61 1178.00
$OTHERS  0.00    1.99  22.17 102.60  115.60 3461.00
Any help?

If all you want to do is collapse your list of tables into one, you can use do.call. Here's an example similar to yours, using the iris dataset:
tsSummary <- tapply(iris$Sepal.Length, iris$Species, summary)
do.call(rbind, tsSummary)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## setosa 4.3 4.800 5.0 5.006 5.2 5.8
## versicolor 4.9 5.600 5.9 5.936 6.3 7.0
## virginica 4.9 6.225 6.5 6.588 6.9 7.9
If you want to visualize this, you probably want a boxplot:
boxplot(Sepal.Length ~ Species, data = iris)
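The same two steps can be sketched for ticket data. The `tickets` data frame below is a simulated stand-in (the asker's data is not available), so the category names and values are assumptions. The call with `plot = FALSE` only returns the numbers a boxplot would draw; note that these are whisker ends and hinges, which may differ slightly from the minimum, maximum and quartiles reported by summary.

```r
# Hypothetical stand-in for the asker's `tickets` data (simulated values)
set.seed(1)
tickets <- data.frame(
  category  = sample(c("ERROR", "SUPPORT", "DEFECT"), 300, replace = TRUE),
  timeSpent = rexp(300, rate = 1 / 90)
)

# One row per category, one column per summary statistic
tsSummary <- tapply(tickets$timeSpent, tickets$category, summary)
tsTable   <- do.call(rbind, tsSummary)
print(round(tsTable, 2))

# The statistics a boxplot would draw, without opening a graphics device.
# Rows of $stats: lower whisker, lower hinge, median, upper hinge, upper whisker
bp <- boxplot(timeSpent ~ category, data = tickets, plot = FALSE)
bp$stats
```

Dropping `plot = FALSE` draws the boxplot itself, which is usually the more readable picture of a five-number summary.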

Related

write.csv() using format() adds white spaces to NA

Incidentally, I have found this problem with write.csv() and NA values if using format():
d <- data.frame(id=1:10, f=0.1*(1:10),f2=0.01*(1:10))
d$f2[3] <- NA
summary(d)
id f f2
Min. : 1.00 Min. :0.100 Min. :0.01000
1st Qu.: 3.25 1st Qu.:0.325 1st Qu.:0.04000
Median : 5.50 Median :0.550 Median :0.06000
Mean : 5.50 Mean :0.550 Mean :0.05778
3rd Qu.: 7.75 3rd Qu.:0.775 3rd Qu.:0.08000
Max. :10.00 Max. :1.000 Max. :0.10000
NA's :1
format(d, nsmall=3)
id f f2
1 1 0.100 0.010
2 2 0.200 0.020
3 3 0.300 NA
4 4 0.400 0.040
5 5 0.500 0.050
6 6 0.600 0.060
7 7 0.700 0.070
8 8 0.800 0.080
9 9 0.900 0.090
10 10 1.000 0.100
format(d$f2, nsmall = 3)
[1] "0.010" "0.020" " NA" "0.040" "0.050" "0.060" "0.070" "0.080" "0.090" "0.100"
format(d$f2[3])
[1] "NA"
write.csv(format(d,nsmall=3),file="test.csv",row.names = FALSE)
d2 <- read.csv("test.csv")
summary(d2)
id f f2
Min. : 1.00 Min. :0.100 Length:10
1st Qu.: 3.25 1st Qu.:0.325 Class :character
Median : 5.50 Median :0.550 Mode :character
Mean : 5.50 Mean :0.550
3rd Qu.: 7.75 3rd Qu.:0.775
Max. :10.00 Max. :1.000
I checked test.csv and found that the cell corresponding to d$f2[3] is not "NA" but " NA"
d2 <- read.csv("test.csv", na.strings=" NA")
summary(d2)
id f f2
Min. : 1.00 Min. :0.100 Min. :0.01000
1st Qu.: 3.25 1st Qu.:0.325 1st Qu.:0.04000
Median : 5.50 Median :0.550 Median :0.06000
Mean : 5.50 Mean :0.550 Mean :0.05778
3rd Qu.: 7.75 3rd Qu.:0.775 3rd Qu.:0.08000
Max. :10.00 Max. :1.000 Max. :0.10000
NA's :1
Should this behavior of format(), adding white spaces to NAs, not be considered a bug?
This is not a critical issue, since using format() within write.csv() is not really necessary (I found this problem in a very particular case). But in principle, NAs should not be affected by any formatting. Printing more nicely to the console is one thing; saving those white spaces to a file that could be read back into R is another.
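One possible workaround, sketched here rather than taken from the original post: format() pads every element of a column to a common width, so the padding can be stripped with trimws() before writing. The names `d_fmt` and `csv_path` are illustrative only.

```r
d <- data.frame(id = 1:10, f = 0.1 * (1:10), f2 = 0.01 * (1:10))
d$f2[3] <- NA

# format() returns character columns padded to a common width, NA included
d_fmt <- format(d, nsmall = 3)

# Strip the padding from every column before writing
d_fmt[] <- lapply(d_fmt, trimws)

csv_path <- tempfile(fileext = ".csv")  # hypothetical output location
write.csv(d_fmt, file = csv_path, row.names = FALSE)

# With the padding gone, the default na.strings = "NA" matches again
d2 <- read.csv(csv_path)
str(d2)
```

This keeps the fixed number of decimals in the file while letting read.csv recognise the missing value without a custom na.strings.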

Summarize the same variables from multiple dataframes in one table

I have voter and party-data from several datasets that I further separated into different dataframes and lists to make it comparable. I could just use the summary command on each of them individually then compare manually, but I was wondering whether there was a way to get them all together and into one table?
Here's a sample of what I have:
> summary(eco$rilenew)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3 4 4 4 4 5
> summary(ecovoters)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 3.000 4.000 3.744 5.000 10.000 26
> summary(lef$rilenew)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.000 3.000 3.000 3.692 4.000 7.000
> summary(lefvoters)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 2.000 3.000 3.612 5.000 10.000 332
> summary(soc$rilenew)
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.000 4.000 4.000 4.143 5.000 6.000
> summary(socvoters)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 3.000 4.000 3.674 5.000 10.000 346
Is there a way I can summarize these lists (ecovoters, lefvoters, socvoters etc) and the dataframe variables (eco$rilenew, lef$rilenew, soc$rilenew etc) together and have them in one table?
You could put everything into a list and summarize with a small custom function.
L <- list(eco$rilenew, ecovoters, lef$rilenew,
          lefvoters, soc$rilenew, socvoters)
t(sapply(L, function(x) {
  s <- summary(x)
  length(s) <- 7
  names(s)[7] <- "NA's"
  s[7] <- ifelse(!any(is.na(x)), 0, s[7])
  return(s)
}))
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
[1,] 0.9820673 3.3320662 3.958665 3.949512 4.625109 7.229069 0
[2,] -4.8259384 0.5028293 3.220546 3.301452 6.229384 9.585749 26
[3,] -0.3717391 2.3280366 3.009360 3.013908 3.702156 6.584659 0
[4,] -2.6569493 1.6674330 3.069440 3.015325 4.281100 8.808432 332
[5,] -2.3625651 2.4964361 3.886673 3.912009 5.327401 10.349040 0
[6,] -2.4719404 1.3635785 2.790523 2.854812 4.154936 8.491347 346
Data
set.seed(42)
eco <- data.frame(rilenew=rnorm(800, 4, 1))
ecovoters <- rnorm(75, 4, 4)
ecovoters[sample(length(ecovoters), 26)] <- NA
lef <- data.frame(rilenew=rnorm(900, 3, 1))
lefvoters <- rnorm(700, 3, 2)
lefvoters[sample(length(lefvoters), 332)] <- NA
soc <- data.frame(rilenew=rnorm(900, 4, 2))
socvoters <- rnorm(700, 3, 2)
socvoters[sample(length(socvoters), 346)] <- NA
You can use map from purrr (part of the tidyverse) to get a list of summaries. If you want the result as a data frame, plyr::ldply can convert the list:
library(purrr)  # provides map()
ll = map(L, summary)
ll
plyr::ldply(ll, rbind)
> ll = map(L, summary)
> ll
[[1]]
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.9821 3.3321 3.9587 3.9495 4.6251 7.2291
[[2]]
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-4.331 1.347 3.726 3.793 6.653 16.845 26
[[3]]
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.3717 2.3360 3.0125 3.0174 3.7022 6.5847
[[4]]
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-2.657 1.795 3.039 3.013 4.395 9.942 332
[[5]]
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.363 2.503 3.909 3.920 5.327 10.349
[[6]]
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
-3.278 1.449 2.732 2.761 4.062 8.171 346
> plyr::ldply(ll, rbind)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1 0.9820673 3.332066 3.958665 3.949512 4.625109 7.229069 NA
2 -4.3312551 1.346532 3.725708 3.793431 6.652917 16.844796 26
3 -0.3717391 2.335959 3.012507 3.017438 3.702156 6.584659 NA
4 -2.6569493 1.795307 3.038905 3.012928 4.395338 9.941819 332
5 -2.3625651 2.503324 3.908727 3.920050 5.327401 10.349040 NA
6 -3.2779863 1.448814 2.732515 2.760569 4.061854 8.170793 346
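Both results above have unlabelled rows, which makes the table hard to read once there are more than a few inputs. A minimal base R sketch, assuming it is acceptable to name the list elements: the names then become row names. `summary_row` is a hypothetical helper, and only the first two inputs from the Data block are reused to keep it short.

```r
set.seed(42)
eco       <- data.frame(rilenew = rnorm(800, 4, 1))
ecovoters <- rnorm(75, 4, 4)
ecovoters[sample(length(ecovoters), 26)] <- NA

# Naming the list elements gives the result labelled rows
L <- list(eco_rilenew = eco$rilenew, ecovoters = ecovoters)

# Hypothetical helper: always returns seven values
summary_row <- function(x) {
  # The first six entries are the usual statistics (computed without NAs);
  # the NA count is added explicitly so it is 0 rather than missing
  c(unclass(summary(x))[1:6], "NA's" = sum(is.na(x)))
}

res <- t(sapply(L, summary_row))
res
```

Because every call returns a vector of the same length, sapply can simplify the result to a matrix without any padding step.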

Convert tapply summary result to data frame [duplicate]

My code is:
Normality <- tapply(input$TotalAuthBdNet.USD., input$Country, summary)
The output displayed is:
$Albania
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000e+00 1.066e+04 2.730e+04 3.403e+07 5.015e+04 2.720e+09
$Angola
Min. 1st Qu. Median Mean 3rd Qu. Max.
5405 15323 52522 486451 170000 4513196
$`Antigua and Barbuda`
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
22622 22622 22622 22622 22622 22622 2
$Argentina
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0 15814 45000 212800 193626 4080293 15
Each country appears as a separate list element with its own statistics. I want the output as a single table:
Country Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
Albania 0.000e+00 1.066e+04 2.730e+04 3.403e+07 5.015e+04 2.720e+09
Angola 5405 15323 52522 486451 170000 4513196
Argentina 0 15814 45000 212800 193626 4080293 15
The country name is a list identified from the file.
A simple rbind would do, e.g.:
library(ggplot2)  # provides the mpg example dataset
do.call(rbind, tapply(mpg$year, mpg$model, summary))
You can also directly call aggregate so you don't need the extra step:
aggregate(Sepal.Length ~ Species, iris, summary)
# Species Sepal.Length.Min. Sepal.Length.1st Qu. Sepal.Length.Median Sepal.Length.Mean Sepal.Length.3rd Qu. Sepal.Length.Max.
# 1 setosa 4.300 4.800 5.000 5.006 5.200 5.800
# 2 versicolor 4.900 5.600 5.900 5.936 6.300 7.000
# 3 virginica 4.900 6.225 6.500 6.588 6.900 7.900
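In the question, some countries have an NA's entry and others do not, so the summaries differ in length and a plain rbind would recycle values. A hedged sketch of one way around that: align each summary to a fixed set of column names before binding, then add the country as a column. The `input` data frame and its `amount` column below are made-up stand-ins for the asker's data.

```r
# Made-up stand-in for the asker's `input` data
input <- data.frame(
  Country = c("Albania", "Albania", "Albania",
              "Angola", "Angola", "Angola", "Angola"),
  amount  = c(10, 20, NA, 5, 15, 25, 35)
)

stat_names <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.", "NA's")

rows <- lapply(split(input$amount, input$Country), function(x) {
  s <- unclass(summary(x))
  # Start from zeros so groups without missing values get NA's = 0
  out <- setNames(numeric(length(stat_names)), stat_names)
  out[names(s)] <- s  # fill in by name, whatever the length of s
  out
})

res <- do.call(rbind, rows)

# Move the row names into a proper Country column
result <- data.frame(Country = rownames(res), res,
                     row.names = NULL, check.names = FALSE)
result
```

Setting `check.names = FALSE` keeps column names such as "1st Qu." and "NA's" as they are instead of converting them to syntactic names.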

Summary of each column of data.table or data.frame for xtable

I want to use the summary of each column of a data.table or data.frame with the xtable package in Sweave. Here is an MWE:
summary(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 setosa :50
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50
Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
library(xtable)
lapply(iris, summary)
$Sepal.Length
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.300 5.100 5.800 5.843 6.400 7.900
$Sepal.Width
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.000 2.800 3.000 3.057 3.300 4.400
$Petal.Length
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.600 4.350 3.758 5.100 6.900
$Petal.Width
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.100 0.300 1.300 1.199 1.800 2.500
$Species
setosa versicolor virginica
50 50 50
xtableList(lapply(iris, summary))
Error in xtable.table(x[[i]], caption = caption, label = label, align = align, :
xtable.table is not implemented for tables of > 2 dimensions
I wonder how to get the summary of each column as a separate table for use with Sweave or knitr. Thanks in advance.
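One possible approach, offered as a sketch and not as a tested Sweave recipe: convert each column's summary into a small two-column data frame first, since xtable handles data frames directly. The name `summary_tables` is illustrative, and the xtable call is guarded so the rest of the code runs even where the package is not installed.

```r
# One small data frame per column: statistic name and value
summary_tables <- lapply(iris, function(col) {
  s <- summary(col)
  data.frame(Statistic = names(s), Value = as.vector(s))
})

summary_tables$Sepal.Length

# Each element can then be passed to xtable on its own
if (requireNamespace("xtable", quietly = TRUE)) {
  print(xtable::xtable(summary_tables$Sepal.Length), include.rownames = FALSE)
}
```

Looping over `summary_tables` inside a Sweave or knitr chunk would then emit one LaTeX table per column.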

Restructure output of R summary function

Is there an easy way to change the output format for R's summary function so that the results print in a column instead of row? R does this automatically when you pass summary a data frame. I'd like to print summary statistics in a column when I pass it a single vector. So instead of this:
>summary(vector)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.000 2.000 6.699 6.000 559.000
It would look something like this:
>summary(vector)
Min. 1.000
1st Qu. 1.000
Median 2.000
Mean 6.699
3rd Qu. 6.000
Max. 559.000
Sure. Treat it as a data.frame:
set.seed(1)
x <- sample(30, 100, TRUE)
summary(x)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1.00 10.00 15.00 16.03 23.25 30.00
summary(data.frame(x))
# x
# Min. : 1.00
# 1st Qu.:10.00
# Median :15.00
# Mean :16.03
# 3rd Qu.:23.25
# Max. :30.00
For slightly more usable output, you can use data.frame(unclass(.)):
data.frame(val = unclass(summary(x)))
# val
# Min. 1.00
# 1st Qu. 10.00
# Median 15.00
# Mean 16.03
# 3rd Qu. 23.25
# Max. 30.00
Or you can use stack:
stack(summary(x))
# values ind
# 1 1.00 Min.
# 2 10.00 1st Qu.
# 3 15.00 Median
# 4 16.03 Mean
# 5 23.25 3rd Qu.
# 6 30.00 Max.
