Why don't we have statistics for each group? [closed] - r

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 1 year ago.
Input:
library("UsingR")
library("dplyr")
data("kid.weights")
attach(kid.weights)
df <- data.frame(gender = gender[1:6],
                 weight = weight[1:6],
                 height = height[1:6],
                 Kg = weight[1:6] * 0.453592,
                 M = height[1:6] * 0.0254)
df
df %>%
  group_by(df$gender) %>%
  summarise(mean(df$weight))
Output:
> df %>%
+ group_by(df$gender) %>%
+ summarise(mean(df$weight))
# A tibble: 2 x 2
  `df$gender` `mean(df$weight)`
  <fct>                   <dbl>
1 F                        58.3
2 M                        58.3
I want to make a data frame of the mean or median weight (in kg) for each gender, but as the output above shows, it is not working: both groups get the same value. How do I solve this?

Once you use %>% you don't need to reference df anymore:
df %>%
  group_by(gender) %>%
  summarise(mean(weight))
%>% is a pipe that gives you access to the columns directly within each group; df$gender and df$weight refer to the whole column, bypassing the grouping (which is why both groups show the same overall mean of 58.3 above).
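For completeness, here is a minimal sketch of the full pipeline on the same six rows, using bare column names so that summarise() respects the grouping (the kg and metre conversion factors are taken from the question; the per-group means will generally differ from the single ungrouped value shown above):
library("UsingR")
library("dplyr")
data("kid.weights")
# take the first six rows, add converted columns, then summarise per gender
df <- kid.weights[1:6, ] %>%
  mutate(Kg = weight * 0.453592,
         M = height * 0.0254)
df %>%
  group_by(gender) %>%
  summarise(mean_kg = mean(Kg),
            median_kg = median(Kg))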

Related

Why does R sometimes think ASCII characters are non-ASCII? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 19 days ago.
I am trying to identify elements in a dataframe that contain non-ASCII characters. For example, in the dataframe below I would want all rows in the unicode_only column and the last two rows in the mixed column.
library(tidyverse)

example_dataset <- tribble(
  ~ascii_only, ~unicode_only, ~mixed,
  "a",         "表",          "c",
  "b",         "外",          "表",
  "c",         "字",          "外"
)
When I try filtering elements using the regular expression "[^[:ascii]]", however, some ASCII-only elements are included.
example_dataset %>%
  mutate(row_number = row_number()) %>%
  pivot_longer(c(everything(), -row_number),
               names_to = "variable") %>%
  select(variable, row_number, value) %>%
  arrange(variable, row_number) %>%
  filter(str_detect(value, "[^[:ascii]]"))
variable     row_number value
ascii_only   2          b
mixed        2          表
mixed        3          外
unicode_only 1          表
unicode_only 2          外
unicode_only 3          字
Why would "[^[:ascii]]" match b?
The POSIX class is written [:ascii:], not [:ascii], so the negated pattern should be "[^[:ascii:]]".
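Applied to the pipeline above, only the closing delimiter of the class changes; a minimal sketch echoing that fix (str_detect() comes from stringr, loaded with the tidyverse):
example_dataset %>%
  mutate(row_number = row_number()) %>%
  pivot_longer(c(everything(), -row_number), names_to = "variable") %>%
  arrange(variable, row_number) %>%
  filter(str_detect(value, "[^[:ascii:]]"))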

A weird problem where group_by() doesn't work? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
I have a dataset with 3 factor columns and 4 numeric columns. I want to use group_by() to summarize it, but no matter what I try it doesn't work: there are no groups in the result.
freetick <- read.csv("FreeTickAll.csv", stringsAsFactors = FALSE)
library(dplyr)
group1 <- freetick %>%
  group_by(Habitat, Month) %>%
  summarize(meanAd = mean(Adult),
            meanNy = mean(Nymph),
            meanLa = mean(Larva))
group1
The result:
> group1
meanAd meanNy meanLa
1 0.6129032 4.258065 20.1129
And my group1 data frame also shows:
mean Ad mean Ny mean La
1 0.6129032 4.258065 20.1129
If a function with the same name exists in multiple loaded packages, the version from the most recently loaded package masks the others (with summarise, plyr loaded after dplyr is a common culprit). In such cases, either restart the R session with only the package of interest loaded (dplyr in this case) or call the function explicitly from the package of interest (dplyr::summarise):
freetick %>%
  dplyr::group_by(Habitat, Month) %>%
  dplyr::summarise(meanAd = mean(Adult),
                   meanNy = mean(Nymph),
                   meanLa = mean(Larva))
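To confirm which package a bare call is resolving to, base R can report the masking directly; a small sketch, no extra packages needed:
# which namespace does the bare symbol currently resolve to?
environment(summarise)
# list every masked object, grouped by the package doing the masking
conflicts(detail = TRUE)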

Use str_detect to find the ID of the novel Pride and Prejudice [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 3 years ago.
Use str_detect to find the ID of the novel Pride and Prejudice.
How many different ID numbers are returned?
library(tidyverse)
library(gutenbergr)
library(tidytext)
options(digits = 3)
An option is to filter the rows based on the substring "Pride and Prejudice" in the 'title' column and get the number of distinct 'gutenberg_id's with n_distinct. If the ids are already unique, just do summarise(n = n()).
library(gutenbergr)
library(stringr)

gutenberg_metadata %>%
  filter(str_detect(title, "Pride and Prejudice")) %>%
  summarise(n = n_distinct(gutenberg_id))
# A tibble: 1 x 1
#      n
#  <int>
#1     6
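And the simpler count mentioned above, for the case where each matching id appears only once (a sketch; it returns the same count if no id is repeated):
gutenberg_metadata %>%
  filter(str_detect(title, "Pride and Prejudice")) %>%
  summarise(n = n())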

Get 30 occurrences in dataframe [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Good morning everyone,
I have a small issue regarding a data frame: I have 165 different countries, sometimes with more than 30 occurrences each. What I would like to do is take only 30 occurrences for each country, and then apply the mean function to the related variables.
Do you have any idea how I can achieve this?
Here is the dataframe:
Thanks for your answer,
Rémi
Assuming you want to take at most 30 rows from each group, we can do the following. Unfortunately, dplyr's sample_n cannot handle an input group that has fewer rows than you want to sample (unless you sample with replacement).
Where df is your data.frame:
Solution 1:
library(dplyr)
df %>%
  group_by(Nationality) %>%
  sample_n(30, replace = TRUE) %>%
  distinct() %>%  # remove repeated rows where a nationality has fewer than 30 rows
  summarise_at(vars(Age, Overall, Passing), mean)
Solution 2:
df %>%
  split(.$Nationality) %>%
  lapply(function(x) {
    if (nrow(x) < 30)
      return(x)
    x %>% sample_n(30, replace = FALSE)
  }) %>%
  do.call(what = bind_rows) %>%
  group_by(Nationality) %>%
  summarise_at(vars(Age, Overall, Passing), mean)
Naturally without guarantee as you did not supply a working example.
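For what it's worth, newer dplyr (1.0+) has slice_sample(), where an n larger than the group is silently truncated to the group size, which sidesteps the sample_n limitation described above; a sketch using the same assumed columns from the question:
library(dplyr)
df %>%
  group_by(Nationality) %>%
  slice_sample(n = 30) %>%  # groups with fewer than 30 rows are kept in full
  summarise(across(c(Age, Overall, Passing), mean))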

Get distinct values from columns in R [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 5 years ago.
I have the following kind of data in my csv file:
DriveNo  Date and Time        Longitude
156      2014-01-31 23:00:00  41.88367183
187      2014-01-31 23:00:01  41.92854
These data have a lot of noise. Sometimes a driver (the DriveNo is unique) is present in two different locations at the same time, which is not possible and is therefore noise. I tried to do it using distinct(select(five,DriveNo,Date and Time))
but I get the following error:
Error: unexpected symbol in "distinct(select(five,DriveNo,Date and"
However, when I try
distinct(select(five,DriveNo,Longitude))
it works. But I need it with DriveNo and Date and Time.
You can escape column names that contain spaces with backticks, like:
df %>%
  select(DriveNo, `Date and Time`, Longitude) %>%
  distinct()
or using group_by, like:
df %>%
  group_by(DriveNo, `Date and Time`) %>%
  select(Longitude) %>%
  unique()
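Applied to the original call from the question, only the backticks are needed (a sketch; it assumes the data frame five really has a column named Date and Time with spaces, e.g. because the csv was read with check.names = FALSE, since read.csv's default would rename it to Date.and.Time):
library(dplyr)
# hypothetical re-read of the file so the spaces in the header are preserved
five <- read.csv("drives.csv", check.names = FALSE)
distinct(select(five, DriveNo, `Date and Time`))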
