I am currently trying to count the frequency of countries that appear in a dataframe object.
I tried using count commands as well as rle(sort(x)), which is apparently used to search for strings, but neither seems to yield any results.
rle(sort(x))
I tried using this, but it does not seem to work. I also tried
count(x, "COUNTRY")
but all it does is count how many entries there are in total.
How can I get a result such as:
    Country        Frequency
[1] United States  3
[2] Mexico         5
[3] Germany        12
Here is a small example using dplyr and the built-in dataset mtcars:
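library(dplyr)
# One row per cyl value, with its count in column n
# (group_by() is optional here, since count(cyl) already groups by cyl)
mtcars %>%
  group_by(cyl) %>%
  count(cyl)
or
# Keep every row of mtcars and add the per-cyl count as a new column n
mtcars %>%
  group_by(cyl) %>%
  add_count(cyl)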
Another solution is: table(yourdataframe$x)
count(x, Country, Frequency)
Include both columns to get a deeper breakdown: this counts the rows for each combination of Country and Frequency.
or
X %>% group_by(Country) %>% summarise(sum = sum(Frequency), n = n())
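To tie the answers above back to the country question, here is a minimal sketch. It assumes the data frame is called x and has a column named COUNTRY; the rows are made up for illustration. In dplyr, a quoted name such as count(x, "COUNTRY") is likely treated as a constant string rather than a column, which would explain getting only the total number of entries; passing the bare column name counts per country.

```r
library(dplyr)

# Hypothetical data standing in for the asker's data frame
x <- data.frame(
  COUNTRY = c(rep("United States", 3), rep("Mexico", 5), rep("Germany", 12))
)

# One row per country; `name` sets the name of the count column
freq <- x %>% count(COUNTRY, name = "Frequency")
print(freq)

# Base R alternative: table() returns the counts as a named table
tab <- table(x$COUNTRY)
print(tab)
```

count() returns a data frame, which is convenient for further dplyr steps, while table() is handy for a quick look at the console.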
I'm trying to find the total amount for each class in the Genus column of the dataframe shown below:
[Image: Genus/Species dataframe]
I want a dataframe that looks like this:
Genus - Arachnida
Total Sum - (total amount of Arachnida)
Thank you to whoever replies! (This is my first post and English is my second language, so hopefully someone understands!)
I've tried using dplyr's count function like this:
BIO205 %>% count(Genus)
But it returns how many times each Genus is repeated, not its total amount of species observed.
For example, BIO205 %>% count(Genus) returns:
[Image: returned dataframe]
This indicates that the word Arachnida is repeated 21 times.
It seems like you are trying to sum the total values for each genus. If so, group_by() followed by summarise() is what you want. Here is a reprex:
library(dplyr)
data("iris")
iris %>%
group_by(Species) %>%
summarise(sum_col = sum(Sepal.Length))
The above code groups the data by species, then sums the sepal lengths within each species. In your case, I would try the following code:
library(dplyr)
BIO205 %>%
group_by(Genus) %>%
summarise(sum_col = sum(Total))
Hope this helps.
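The difference between counting rows and summing a column can be seen in a small sketch. The BIO205 data below is a hypothetical stand-in, and the column name Total is an assumption carried over from the answer above; the real column names may differ.

```r
library(dplyr)

# Hypothetical stand-in for BIO205; real column names may differ
BIO205 <- data.frame(
  Genus = c("Arachnida", "Arachnida", "Insecta", "Insecta", "Insecta"),
  Total = c(4, 6, 1, 2, 3)
)

# Number of rows per genus (what the asker was getting)
row_counts <- BIO205 %>% count(Genus)

# Sum of Total per genus, using count()'s weight argument
totals <- BIO205 %>% count(Genus, wt = Total)

# The same sums via group_by() + summarise()
totals_summarise <- BIO205 %>%
  group_by(Genus) %>%
  summarise(sum_col = sum(Total))

print(row_counts)
print(totals_summarise)
```

Either form of the sum should work; group_by() with summarise() is easier to extend when more than one summary column is needed.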
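I am new to R programming. I have to work with the Titanic data in R, and I want to find out how many children and adults there are in the dataset. Can someone give me a hint on how to find this?
I tried using the length() function, but it did not give the result.
Here's a solution in tidyverse syntax. It converts the Titanic dataset into a tibble (a type of dataframe), groups the data by the Age column, then uses n() to count the number of rows at each level of Age. Note that Titanic is a contingency table, so each row of the tibble is one combination of Class, Sex, Age and Survived rather than one passenger. The result is therefore the number of table cells per age level (16 each); to get the number of passengers, sum the count column instead of using n().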
library(tidyverse)
Titanic %>%
as_tibble() %>%
group_by(Age) %>%
summarise(N = n())
This gives the output:
# A tibble: 2 x 2
  Age       N
  <chr> <int>
1 Adult    16
2 Child    16
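Because the built-in Titanic object is a table of counts, passenger totals come from summing the cell counts rather than counting rows. The following is a hedged sketch of that variation; it uses as.data.frame(), which stores each cell count in a column named Freq.

```r
library(dplyr)

# One row per Class/Sex/Age/Survived cell, with the cell count in `Freq`
titanic_df <- as.data.frame(Titanic)

# Sum the cell counts within each age level to get passengers per level
passengers_by_age <- titanic_df %>%
  group_by(Age) %>%
  summarise(N = sum(Freq))

print(passengers_by_age)
```

The two values of N should add up to sum(Titanic), the total number of people recorded in the table.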
I'm having a hard time working through something simple. I have a data frame where the first column is "Cat" and contains 3 different values, which I would like to group by and summarize. Columns 2-6 are months, so 1 is the first month, 2 is the second month, etc. What I'm trying to do is group by Cat and sum up each of the individual month columns. I've tried working with colSums and aggregate. Any help would be greatly appreciated! Thanks
dff<-data.frame(Cat=c('A','B','C','A','A','A','B','C'),
'1'=c(10,20,30,80,10,15,20,15),
'2'=c(15,10,20,30,60,45,50,65),
'3'=c(10,20,30,80,20,25,27,85),
'4'=c(90,70,50,30,10,15,20,15),
'5'=c(1,120,3,8,7,10,25,30))
Using aggregate in base R
aggregate(. ~ Cat, dff, sum)
Or with dplyr
library(dplyr)
dff %>%
group_by(Cat) %>%
summarise(across(everything(), sum))
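As a quick check, the two approaches can be run side by side on the question's data. One detail worth noting in this sketch: data.frame() repairs the numeric column names by default, so '1' to '5' become X1 to X5 in the result.

```r
library(dplyr)

dff <- data.frame(Cat = c('A','B','C','A','A','A','B','C'),
                  '1' = c(10,20,30,80,10,15,20,15),
                  '2' = c(15,10,20,30,60,45,50,65),
                  '3' = c(10,20,30,80,20,25,27,85),
                  '4' = c(90,70,50,30,10,15,20,15),
                  '5' = c(1,120,3,8,7,10,25,30))

# Base R: sum every other column within each level of Cat
base_res <- aggregate(. ~ Cat, dff, sum)

# dplyr: everything() covers all non-grouping columns
dplyr_res <- dff %>%
  group_by(Cat) %>%
  summarise(across(everything(), sum))

print(names(dplyr_res))
print(dplyr_res)
```

Both results contain one row per Cat value with the column totals for that group.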
I'm confused by an exercise that I'm working on in R. I'm just a beginner in R.
The instructions are: Use dplyr to manipulate the data so that you have the proportion immunised (i.e. Immunised divided by Eligible) for each DHB, for each Age, and each Date. Save the result so you can use it for the remaining questions. You should end up with a data frame containing variables for DHB, Date, Age and Proportion with 4834 observations.
I don't understand how to do this, but this is what I've tried:
```{r}
vacc %>% mutate(Proportion = Immunised/DHB(vacc))
vacc %>% select(DHB, Date, Age, Proportion)
```
but it gave me this error:
Error: Problem with `mutate()` input `Proportion`.
x could not find function "DHB"
i Input `Proportion` is `Immunised/DHB(vacc)`.
Can someone please help me?
DHB is a column in the dataframe, but you are calling it as if it were a function.
You can group by DHB, Age and Date and calculate the ratio between Immunised and Eligible.
library(dplyr)
vacc %>% group_by(DHB, Age, Date) %>% mutate(Proportion = Immunised/Eligible)
Because the division is element-wise, the grouping should not change the result, so I think this would work too:
vacc %>% mutate(Proportion = Immunised/Eligible)
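The exercise also asks for the result to be saved with only the DHB, Date, Age and Proportion columns. Here is a minimal sketch of the full pipeline; the three rows of vacc below are invented for illustration, and only the column names come from the question. Note that the mutate() result has to be assigned (or piped onward), otherwise a later select() will not find Proportion.

```r
library(dplyr)

# Hypothetical rows; the real vacc data has 4834 observations
vacc <- data.frame(
  DHB       = c("Auckland", "Auckland", "Waikato"),
  Date      = c("2019-03", "2019-06", "2019-03"),
  Age       = c("6 months", "6 months", "8 months"),
  Eligible  = c(200, 250, 100),
  Immunised = c(150, 200, 90)
)

# Compute the proportion row by row, keep the requested columns, save the result
vacc_prop <- vacc %>%
  mutate(Proportion = Immunised / Eligible) %>%
  select(DHB, Date, Age, Proportion)

print(vacc_prop)
```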
With data like below, I have hourly data for each day for each area, loc pair. I need to find, for each day and each area, loc pair, the rows for which the value of a is maximum.
day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20181231,ar01,loc01,22,96,12.3,15.2
20190101,ar01,loc01,00,98,10.9,22.5
20190101,ar01,loc01,23,97,10.9,22.1
20181231,ar02,loc01,00,93,11.3,18.2
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,00,97,10.9,22.5
20190101,ar02,loc01,23,97.2,10.9,22.1
expected output
day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20190101,ar01,loc01,00,98,10.9,22.5
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,23,97.2,10.9,22.1
I could do an aggregation using dplyr, like df %>% group_by(day, area, loc), but how do I get the result rows from there?
You can try:
library(dplyr)
# Within each day/area/loc group, keep the row(s) where a equals the group maximum
df %>%
  group_by(day, area, loc) %>%
  filter(a == max(a))
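To check the approach against the expected output, the question's data can be read from text and filtered as below. This sketch assumes the header order day,area,loc,hour,a,b,c, which is what the data rows appear to follow; hour is read as a number here, so "00" becomes 0.

```r
library(dplyr)

csv_text <- "day,area,loc,hour,a,b,c
20181231,ar01,loc01,00,99,11.3,18.2
20181231,ar01,loc01,22,96,12.3,15.2
20190101,ar01,loc01,00,98,10.9,22.5
20190101,ar01,loc01,23,97,10.9,22.1
20181231,ar02,loc01,00,93,11.3,18.2
20181231,ar02,loc01,22,96,12.3,15.2
20190101,ar02,loc01,00,97,10.9,22.5
20190101,ar02,loc01,23,97.2,10.9,22.1"

df <- read.csv(text = csv_text)

# Keep the rows holding the maximum of a within each day/area/loc group
result <- df %>%
  group_by(day, area, loc) %>%
  filter(a == max(a)) %>%
  ungroup()

print(result)
```

If several rows in a group tie for the maximum, filter() keeps all of them, so the result can contain more than one row per group.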