I am running some calculations in R:
df1 <- data.frame( data=mydata6$Date.created, mydata6[,-1] + mydataADDtasks[,-1])
The code runs and no error is given. When I run
View(df1)
I see a table of length 16: 5 obs. of 16 variables.
But when I run
summarise(df1)
I get
data frame with 0 columns and 1 row
Obviously I cannot do any calculations with this dataset. What should I do? What is wrong?
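This is normal behavior of dplyr::summarise(). It returns one row per group and one column per summary expression you supply. You supplied neither groups nor expressions, so you get a data frame with 1 row and 0 columns. You could use group_by() together with a summary expression to get a non-empty data frame: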
mtcars |>
dplyr::group_by(mpg) |>
dplyr::summarise(sum_cyl = sum(cyl))
There's probably nothing wrong with your dataframe!
You can also get output without grouping variables, but you still need to supply a summary expression:
dplyr::summarise(mtcars, sum_cyl = sum(cyl))
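For comparison, a minimal base-R sketch of the same two calls, swapping in aggregate() for group_by() + summarise(). This is base R, not dplyr's API, and the object names by_mpg and total are illustrative:

```r
# Grouped: one row per distinct mpg value, summing cyl within each group
by_mpg <- aggregate(cyl ~ mpg, data = mtcars, FUN = sum)
head(by_mpg)

# Ungrouped: a single row holding the overall sum,
# like summarise() without group_by()
total <- data.frame(sum_cyl = sum(mtcars$cyl))
total
```

Both results are ordinary data frames, so the output has as many columns as expressions you asked for.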
Maybe what you are after is base R's summary()?
> summary(mtcars)
mpg cyl disp hp drat wt
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :2.760 Min. :1.513
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080 1st Qu.:2.581
Median :19.20 Median :6.000 Median :196.3 Median :123.0 Median :3.695 Median :3.325
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 Mean :3.597 Mean :3.217
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.610
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0 Max. :4.930 Max. :5.424
qsec vs am gear carb
Min. :14.50 Min. :0.0000 Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :17.71 Median :0.0000 Median :0.0000 Median :4.000 Median :2.000
Mean :17.85 Mean :0.4375 Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :22.90 Max. :1.0000 Max. :1.0000 Max. :5.000 Max. :8.000
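To check the construction from the question, here is a sketch with small hypothetical stand-ins for mydata6 and mydataADDtasks. Their real contents are not shown, so the dates and the columns a and b are invented purely for illustration. summary() and ordinary column calculations work on the result:

```r
# Hypothetical stand-ins for the OP's mydata6 and mydataADDtasks:
# a date column followed by numeric columns with matching names
mydata6 <- data.frame(
  Date.created = as.Date("2023-01-01") + 0:4,
  a = c(1, 2, 3, 4, 5),
  b = c(10, 20, 30, 40, 50)
)
mydataADDtasks <- data.frame(
  Date.created = as.Date("2023-01-01") + 0:4,
  a = c(5, 4, 3, 2, 1),
  b = c(1, 1, 1, 1, 1)
)

# Same construction as in the question: add the numeric columns element-wise
df1 <- data.frame(data = mydata6$Date.created,
                  mydata6[, -1] + mydataADDtasks[, -1])

# summary() describes every column; colMeans() computes on the numeric ones
summary(df1)
colMeans(df1[, -1])
```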
Related
I want to see the summary output for a few columns of iris (a built-in dataset) inside a loop, using the construct below. I saw here that mget might be a solution, but I guess it is not. Can someone help with the current, most effective way to do this?
l <- c("Sepal.Length","Sepal.Width")
for(i in l){
print(summary( mget(paste0("iris$",l))))
}
Running the above gives this error:
Error: value for ‘iris$Sepal.Length’ not found
Q2: How would this work for different data frames?
l <- c("iris","mtcars")
for(i in l){
print(summary( mget(l)))
}
Since you have the column names in a character vector, you don't need to get() them. Just use each name directly as an index with [[ to extract the column.
Base R:
sapply(l, function(x) summary(iris[[x]]))
Sepal.Length Sepal.Width
Min. 4.300000 2.000000
1st Qu. 5.100000 2.800000
Median 5.800000 3.000000
Mean 5.843333 3.057333
3rd Qu. 6.400000 3.300000
Max. 7.900000 4.400000
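If you want to keep the for loop from the question, a minimal sketch: the fix is to index with [[ and to use the loop variable i (the original loop used the whole vector l inside the body):

```r
l <- c("Sepal.Length", "Sepal.Width")

# Loop over the column names; [[ extracts one column by name
for (i in l) {
  cat(i, "\n")
  print(summary(iris[[i]]))
}

# Or keep the results as a named list instead of printing them
sums <- lapply(setNames(l, l), function(x) summary(iris[[x]]))
```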
Q2:
In this case, because you need the value of an object whose name is stored as a string, you really do need the get() function.
sapply(c("iris","mtcars"), function(x) summary(get(x)))
$iris
Sepal.Length Sepal.Width Petal.Length Petal.Width
Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100
1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300
Median :5.800 Median :3.000 Median :4.350 Median :1.300
Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
Species
setosa :50
versicolor:50
virginica :50
$mtcars
mpg cyl disp hp
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
Median :19.20 Median :6.000 Median :196.3 Median :123.0
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
drat wt qsec vs
Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
Median :3.695 Median :3.325 Median :17.71 Median :0.0000
Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
am gear carb
Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
Median :0.0000 Median :4.000 Median :2.000
Mean :0.4062 Mean :3.688 Mean :2.812
3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :5.000 Max. :8.000
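A loop version of the same idea, as a sketch. One likely reason a bare mget() fails here: get() searches enclosing environments by default (inherits = TRUE), while mget() defaults to inherits = FALSE, so at top level it does not find datasets such as iris that live in package:datasets:

```r
dfs <- c("iris", "mtcars")

# get() looks the name up through enclosing environments by default,
# so it finds the datasets shipped with R
for (nm in dfs) {
  cat("==", nm, "==\n")
  print(summary(get(nm)))
}

# mget() needs inherits = TRUE to search beyond the current environment
sums <- lapply(mget(dfs, inherits = TRUE), summary)
```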
I am creating an R Markdown presentation and would like to set a custom width for the printed results. For example, when I use the code
---
title: 'Example presentation'
output: html_document
---
# Summarize data
```{r}
data(mtcars)
summary(mtcars)
```
The results display ~67 characters per line:
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
Is there a way to set a custom line width so the results fit the width of my presentation better? For example, are there chunk options that would allow me to show this instead?
## mpg cyl disp hp drat wt
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :2.760 Min. :1.513
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080 1st Qu.:2.581
## Median :19.20 Median :6.000 Median :196.3 Median :123.0 Median :3.695 Median :3.325
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 Mean :3.597 Mean :3.217
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.610
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0 Max. :4.930 Max. :5.424
## qsec vs am gear carb
## Min. :14.50 Min. :0.0000 Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :17.71 Median :0.0000 Median :0.0000 Median :4.000 Median :2.000
## Mean :17.85 Mean :0.4375 Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :22.90 Max. :1.0000 Max. :1.0000 Max. :5.000 Max. :8.000
---
title: 'Example presentation'
output: html_document
---
# Summarize data
```{r}
options(width = 100)
data(mtcars)
summary(mtcars)
```
Setting options(width = 100) in the chunk before printing does exactly what you want.
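A sketch of the same idea outside R Markdown, checking that the width option changes how summary() output wraps. options() returns the previous values, so the original setting can be restored afterwards. knitr also documents an R.options chunk option that sets options for a single chunk, e.g. {r, R.options = list(width = 100)}; check the chunk-options reference for your knitr version.

```r
# options() returns the old value(s), so we can restore them later
old <- options(width = 100)
wide <- capture.output(summary(mtcars))

options(width = 60)
narrow <- capture.output(summary(mtcars))

# Restore the original width
options(old)

# Wider output wraps into fewer blocks, so fewer printed lines
length(wide)
length(narrow)
```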
I can use the tapply function for basic operations, e.g. with the mtcars data, calculating mean weight by number of cylinders:
library(data.table)
mtcars <- data.table(mtcars)
tapply(X = mtcars[,wt],
INDEX = mtcars[,cyl],
mean)
However, I do not know how to perform more complex operations, e.g. the correlation between the wt and qsec variables by number of cylinders.
I tried something like the following, but it does not work:
tapply(X = mtcars[,.(wt, qsec)],
INDEX = mtcars[,cyl],
cor.test(mtcars[,wt], mtcars[,qsec]))
Error in match.fun(FUN) : 'cor.test(mtcars[, wt], mtcars[, qsec])' is not a function, character or symbol
I also tried:
tapply(X = rownames(mtcars[,.(wt,qsec,cyl)]),
INDEX = mtcars[,cyl],
function(r) cor.test(mtcars[r, 1],
mtcars[r, 2]))
Any idea how to do this efficiently with a t/apply-style function?
In my mind, a data.table variant of tapply should apply FUN to indexed subsets of the data.table. I have defined dt_tapply to behave as I imagine it should; it seems practical enough:
library(data.table)
data(mtcars)
mtcars = data.table(mtcars)
#iterate over table with index, like tapply just for table rows
dt_tapply = function(dx,INDEX,FUN=NULL,...) {
lapply(sort(unique(INDEX)),function(i){
do.call(FUN,c(list(dx[INDEX==i,]),list(...)))
})
}
dt_tapply(mtcars,mtcars$cyl,summary)
#some custom made function computing stuff from multiple columns giving some blob output
compute_cor_wtqsec = function(dx) {
cor(dx$wt,dx$qsec)
}
#dt_tapply that function
dt_tapply(mtcars,mtcars$cyl,compute_cor_wtqsec)
[[1]]
mpg cyl disp hp drat wt qsec
Min. :21.40 Min. :4 Min. : 71.10 Min. : 52.00 Min. :3.690 Min. :1.513 Min. :16.70
1st Qu.:22.80 1st Qu.:4 1st Qu.: 78.85 1st Qu.: 65.50 1st Qu.:3.810 1st Qu.:1.885 1st Qu.:18.56
Median :26.00 Median :4 Median :108.00 Median : 91.00 Median :4.080 Median :2.200 Median :18.90
Mean :26.66 Mean :4 Mean :105.14 Mean : 82.64 Mean :4.071 Mean :2.286 Mean :19.14
3rd Qu.:30.40 3rd Qu.:4 3rd Qu.:120.65 3rd Qu.: 96.00 3rd Qu.:4.165 3rd Qu.:2.623 3rd Qu.:19.95
Max. :33.90 Max. :4 Max. :146.70 Max. :113.00 Max. :4.930 Max. :3.190 Max. :22.90
vs am gear carb
Min. :0.0000 Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:1.0000 1st Qu.:0.5000 1st Qu.:4.000 1st Qu.:1.000
Median :1.0000 Median :1.0000 Median :4.000 Median :2.000
Mean :0.9091 Mean :0.7273 Mean :4.091 Mean :1.545
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:2.000
Max. :1.0000 Max. :1.0000 Max. :5.000 Max. :2.000
[[2]]
mpg cyl disp hp drat wt qsec
Min. :17.80 Min. :6 Min. :145.0 Min. :105.0 Min. :2.760 Min. :2.620 Min. :15.50
1st Qu.:18.65 1st Qu.:6 1st Qu.:160.0 1st Qu.:110.0 1st Qu.:3.350 1st Qu.:2.822 1st Qu.:16.74
Median :19.70 Median :6 Median :167.6 Median :110.0 Median :3.900 Median :3.215 Median :18.30
Mean :19.74 Mean :6 Mean :183.3 Mean :122.3 Mean :3.586 Mean :3.117 Mean :17.98
3rd Qu.:21.00 3rd Qu.:6 3rd Qu.:196.3 3rd Qu.:123.0 3rd Qu.:3.910 3rd Qu.:3.440 3rd Qu.:19.17
Max. :21.40 Max. :6 Max. :258.0 Max. :175.0 Max. :3.920 Max. :3.460 Max. :20.22
vs am gear carb
Min. :0.0000 Min. :0.0000 Min. :3.000 Min. :1.000
1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.500 1st Qu.:2.500
Median :1.0000 Median :0.0000 Median :4.000 Median :4.000
Mean :0.5714 Mean :0.4286 Mean :3.857 Mean :3.429
3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
Max. :1.0000 Max. :1.0000 Max. :5.000 Max. :6.000
[[3]]
mpg cyl disp hp drat wt qsec
Min. :10.40 Min. :8 Min. :275.8 Min. :150.0 Min. :2.760 Min. :3.170 Min. :14.50
1st Qu.:14.40 1st Qu.:8 1st Qu.:301.8 1st Qu.:176.2 1st Qu.:3.070 1st Qu.:3.533 1st Qu.:16.10
Median :15.20 Median :8 Median :350.5 Median :192.5 Median :3.115 Median :3.755 Median :17.18
Mean :15.10 Mean :8 Mean :353.1 Mean :209.2 Mean :3.229 Mean :3.999 Mean :16.77
3rd Qu.:16.25 3rd Qu.:8 3rd Qu.:390.0 3rd Qu.:241.2 3rd Qu.:3.225 3rd Qu.:4.014 3rd Qu.:17.55
Max. :19.20 Max. :8 Max. :472.0 Max. :335.0 Max. :4.220 Max. :5.424 Max. :18.00
vs am gear carb
Min. :0 Min. :0.0000 Min. :3.000 Min. :2.00
1st Qu.:0 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.25
Median :0 Median :0.0000 Median :3.000 Median :3.50
Mean :0 Mean :0.1429 Mean :3.286 Mean :3.50
3rd Qu.:0 3rd Qu.:0.0000 3rd Qu.:3.000 3rd Qu.:4.00
Max. :0 Max. :1.0000 Max. :5.000 Max. :8.00
[[1]]
[1] 0.6380214
[[2]]
[1] 0.8659614
[[3]]
[1] 0.5365487
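A base-R alternative, sketched under the assumption that the full per-group cor.test() results (not just the correlation) are wanted: split() the data by cyl and lapply() over the pieces. This swaps in split() + lapply() for the dt_tapply helper above. With data.table itself, mtcars[, .(r = cor(wt, qsec)), by = cyl] returns the correlations directly.

```r
# One data frame per cylinder count, then cor.test() on wt vs qsec in each
groups <- split(mtcars, mtcars$cyl)
tests <- lapply(groups, function(d) cor.test(d$wt, d$qsec))

# Pull the correlation estimate and p-value out of each htest object;
# t() gives one row per cyl group ("4", "6", "8")
res <- t(sapply(tests, function(ct) c(r = unname(ct$estimate), p = ct$p.value)))
res
```

The r column should match the correlations printed by dt_tapply above.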
Instead of writing summary(...) for each list I tried the following code:
test <- c('list1', 'list2')
summary(test)
Since R functions operate on objects, and I assumed all objects are vectors, I thought this would work, but it does not. Does anyone know why it doesn't work and how I can get the summaries of all the lists in one command?
You can use lapply to loop over every element of a list:
#Sample data
test <- list(mtcars, iris)
lapply(test, summary)
#[[1]]
# mpg cyl disp hp drat
#Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :2.760
#1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
#Median :19.20 Median :6.000 Median :196.3 Median :123.0 Median :3.695
#Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 Mean :3.597
#3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
#Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0 Max. :4.930
# wt qsec vs am gear
# Min. :1.513 Min. :14.50 Min. :0.0000 Min. :0.0000 Min. :3.000
#1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.000
#Median :3.325 Median :17.71 Median :0.0000 Median :0.0000 Median :4.000
#Mean :3.217 Mean :17.85 Mean :0.4375 Mean :0.4062 Mean :3.688
#3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000
#Max. :5.424 Max. :22.90 Max. :1.0000 Max. :1.0000 Max. :5.000
# carb
#Min. :1.000
#1st Qu.:2.000
#Median :2.000
#Mean :2.812
#3rd Qu.:4.000
#Max. :8.000
#[[2]]
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species
# Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 setosa :50
# 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 versicolor:50
# Median :5.800 Median :3.000 Median :4.350 Median :1.300 virginica :50
# Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
# 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
# Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
As per the comment by @docendo discimus:
if the OP has the lists stored by name as a character vector, as in the question,
test <- c('list1', 'list2')
then mget should be used:
lapply(mget(test), summary)
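Why summary(test) alone does not work: test is just a character vector holding two names, so summary() describes that vector (its length, class and mode) rather than the objects the names refer to. A small sketch with hypothetical list1 and list2 objects:

```r
# Hypothetical objects standing in for the OP's list1 and list2
list1 <- list(a = 1:5, b = letters[1:3])
list2 <- data.frame(x = c(2.5, 3.5, 4.5))
test <- c("list1", "list2")

# Only summarizes the character vector of names itself
summary(test)

# mget() looks up each name and returns a named list of the objects
sums <- lapply(mget(test), summary)
sums$list2
```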
I am self-learning R at the moment and have gotten a little stuck. I have a dataset that I want to summarize (find the mean, max, etc.), but only for cases that have a particular value on a certain variable.
Alternatively, I guess the same outcome could be achieved by summarizing only certain rows of the dataset (i.e. only rows 1 through 20).
Could someone lend a helping hand? Thanks so much!
mydata<-mtcars
a. Find summary for rows 1 to 20
summary(mydata[1:20,])
mpg cyl disp hp drat wt qsec vs am
Min. :10.40 Min. :4.0 Min. : 71.1 Min. : 52.0 Min. :2.760 Min. :1.615 Min. :15.84 Min. :0.0 Min. :0.0
1st Qu.:16.10 1st Qu.:4.0 1st Qu.:145.2 1st Qu.: 94.5 1st Qu.:3.070 1st Qu.:2.811 1st Qu.:17.41 1st Qu.:0.0 1st Qu.:0.0
Median :18.95 Median :6.0 Median :196.3 Median :116.5 Median :3.460 Median :3.440 Median :18.15 Median :0.5 Median :0.0
Mean :20.13 Mean :6.2 Mean :233.9 Mean :136.2 Mean :3.545 Mean :3.398 Mean :18.44 Mean :0.5 Mean :0.3
3rd Qu.:22.80 3rd Qu.:8.0 3rd Qu.:296.9 3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.743 3rd Qu.:19.45 3rd Qu.:1.0 3rd Qu.:1.0
Max. :33.90 Max. :8.0 Max. :472.0 Max. :245.0 Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0 Max. :1.0
gear carb
Min. :3.0 Min. :1.00
1st Qu.:3.0 1st Qu.:1.75
Median :3.5 Median :3.00
Mean :3.5 Mean :2.70
3rd Qu.:4.0 3rd Qu.:4.00
Max. :4.0 Max. :4.00
b. Find summary when the value of cyl is 4
summary(mydata[mydata$cyl==4,])
mpg cyl disp hp drat wt qsec vs am
Min. :21.40 Min. :4 Min. : 71.10 Min. : 52.00 Min. :3.690 Min. :1.513 Min. :16.70 Min. :0.0000 Min. :0.0000
1st Qu.:22.80 1st Qu.:4 1st Qu.: 78.85 1st Qu.: 65.50 1st Qu.:3.810 1st Qu.:1.885 1st Qu.:18.56 1st Qu.:1.0000 1st Qu.:0.5000
Median :26.00 Median :4 Median :108.00 Median : 91.00 Median :4.080 Median :2.200 Median :18.90 Median :1.0000 Median :1.0000
Mean :26.66 Mean :4 Mean :105.14 Mean : 82.64 Mean :4.071 Mean :2.286 Mean :19.14 Mean :0.9091 Mean :0.7273
3rd Qu.:30.40 3rd Qu.:4 3rd Qu.:120.65 3rd Qu.: 96.00 3rd Qu.:4.165 3rd Qu.:2.623 3rd Qu.:19.95 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :33.90 Max. :4 Max. :146.70 Max. :113.00 Max. :4.930 Max. :3.190 Max. :22.90 Max. :1.0000 Max. :1.0000
gear carb
Min. :3.000 Min. :1.000
1st Qu.:4.000 1st Qu.:1.000
Median :4.000 Median :2.000
Mean :4.091 Mean :1.545
3rd Qu.:4.000 3rd Qu.:2.000
Max. :5.000 Max. :2.000
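If only a few statistics are needed (the question mentions mean and max), a sketch with subset() and sapply(). The choice of mpg and wt is just for illustration:

```r
mydata <- mtcars

# Rows where cyl == 4, keeping only the columns of interest
cyl4 <- subset(mydata, cyl == 4, select = c(mpg, wt))

# One column of results per variable: its mean and max
stats <- sapply(cyl4, function(x) c(mean = mean(x), max = max(x)))
stats

# The row-range version works the same way, e.g. on head(mydata, 20)
```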