This question already has answers here:
Counting the number of elements with the values of x in a vector
(20 answers)
Closed 2 years ago.
For example:
A data frame "housing" has a column "street" with different street names as levels.
I want to return a data frame with the number of houses on each street (level), that is, the number of times each level is repeated.
Which functions do I use in R?
This should help:
library(dplyr)
housing %>% group_by(street) %>% summarise(Count=n())
This can be done in multiple ways, for instance with base R using table():
table(housing$street)
It can also be done through dplyr, as illustrated by Duck.
Another option (my preference) is using data.table.
library(data.table)
setDT(housing)
housing[, .N, by = street]
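Since the question asks for a data frame rather than a table object, here is a minimal base R sketch that converts the table() result. The toy housing data and the column name Count are assumptions made for illustration; the real data is not shown in the question.

```r
# Hypothetical toy data standing in for the real `housing` data frame.
housing <- data.frame(street = c("Elm", "Oak", "Elm", "Pine", "Oak", "Elm"))

# table() counts each street; as.data.frame() turns the counts into columns.
# Naming the table() argument sets the name of the first column.
counts <- as.data.frame(table(street = housing$street), responseName = "Count")
counts
```

The result has one row per street, with the street name in the first column and its count in the second.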
summary() on a factor shows at most 100 levels by default (controlled by its maxsum argument) and lumps the remaining ones into "(Other)". If there are more levels, try:
table(housing$street)
For example, let's generate one hundred one-letter street names and summarise them with table.
set.seed(1234)
housing <- data.frame(street = sample(letters, size = 100, replace = TRUE))
x <- table(housing$street)
x
# a b c d e f g h i j k l m n o p q r s t u v w x y z
# 1 3 5 6 4 6 2 6 5 3 1 3 1 2 5 5 4 1 5 5 3 7 4 5 3 5
As per the OP's comment: to use the result in further analyses, it needs to be assigned to a variable, here x. The class of x is table, and most base R functions treat it like a named vector. For example, to find the most frequent street name, use which.max.
which.max(x)
# v
# 22
The result says that the 22nd position in x has the maximum value and it is called v.
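One caveat: which.max returns only the first maximum, so tied streets are not reported. A small sketch of how to list all tied names, using a made-up vector rather than the seeded sample above:

```r
# Made-up street names in which "b" and "c" are tied for most frequent.
x <- table(c("a", "b", "b", "c", "c"))

# which.max() stops at the first maximum.
first_max <- which.max(x)

# Comparing every count against the maximum keeps all tied names.
all_max <- names(x)[as.vector(x) == max(x)]
all_max
```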
This question already has answers here:
How to convert a table to a data frame
(5 answers)
Closed last month.
I'm building a frequency table of frequencies, and I want to convert the table to two lists of numbers.
numbers <- c(1,2,3,4,1,2,3,1,2,3,1,2,3,4,2,3,5,1,2,3,4)
freq_of_freq <- table(table(numbers))
> freq_of_freq
1 3 5 6
1 1 1 2
From the table freq_of_freq, I'd like to create two lists, x and y, one containing the numbers 1, 3, 5, 6 and the other containing the frequency values 1, 1, 1, 2.
I tried x <- freq_of_freq[ 1 , ] and y <- freq_of_freq[ 2 , ], but this doesn't work.
Any help is greatly appreciated. Thanks.
One approach is to use stack() to create a data frame.
numbers <- c(1,2,3,4,1,2,3,1,2,3,1,2,3,4,2,3,5,1,2,3,4)
freq_of_freq <- table(table(numbers))
stack(freq_of_freq)
#> values ind
#> 1 1 1
#> 2 1 3
#> 3 1 5
#> 4 2 6
To exactly match your expected output, you could do:
x = as.integer(names(freq_of_freq))
y = unname(freq_of_freq)
Note that the OP's attempt, freq_of_freq[1, ], does not work because table() returns a one-dimensional table for this example dataset, which behaves like a named integer vector. With only one dimension, we can't subset using matrix or data.frame notation.
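To make the one-dimension point concrete, here is a short sketch that inspects the table and extracts both vectors. Using as.vector() for y is a variant I am assuming is acceptable; it drops the table class entirely, whereas unname() only drops the names.

```r
numbers <- c(1,2,3,4,1,2,3,1,2,3,1,2,3,4,2,3,5,1,2,3,4)
freq_of_freq <- table(table(numbers))

# A single dimension, so [row, column] indexing is not available.
n_dims <- length(dim(freq_of_freq))

# The values being counted are stored as names; the counts are the contents.
x <- as.integer(names(freq_of_freq))
y <- as.vector(freq_of_freq)
```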
I have a large dataset (14295 x 58). Each column is a different element from the periodic table (e.g. Fe, Ca, Zr) and the rows are arranged according to depth (in mm); the last column is the depth value. I am trying to write code that can be customized to a given group of elements over a given depth interval, but I don't want to have to go through and change a bunch of lines of code every time I look at a different subset. So far I have created a data frame called Section:
Section <- df[50:100,]
and a vector called Elements:
Elements <- c("Fe", "Ca", "Zr")
I can subsample the Section data frame by:
Section %>%
select(., Elements, depth)
but now I want to plot this with ggplot and I can't figure out how to map the Elements vector to the x variable. I tried:
Section %>%
select(., Elements, depth) %>%
ggplot() +
geom_path (aes(Elements, depth))
but the arguments don't have the same length. How can I plot the selected elements from the Elements vector?
I think your problem is actually that your data is not in the most useful format (wide vs. long), so you aren't giving ggplot what you think you are. If you give it a vector as an aesthetic (Elements here), it will try its best to plot it. In this case it works only when the lengths match, because it simply pairs the values in depth with the entries of Elements. So this works:
# Toy data
library(ggplot2)
df <- data.frame(O = 1:3,
                 Fe = 2:4,
                 Ca = 3:5,
                 Zr = 4:6,
                 depth = 5:7)
Elements <- c('Fe', 'Ca', 'Zr')
ggplot(df) +
  geom_point(aes(x=Elements, y=depth))
But it just matches the first depth to 'Fe', the second depth to 'Ca', etc. I don't think that's what you are hoping to have happen.
Long vs Wide Data
You have separate columns for all these elements, but do they actually represent different things? You are probably better off re-formatting your data so that all these "element" columns get collapsed into key-value pairs using tidyr:
# Wide:
df
O Fe Ca Zr depth
1 1 2 3 4 5
2 2 3 4 5 6
3 3 4 5 6 7
# Long
library(tidyr)
longDf <- tidyr::gather(df, element, amount, -depth)
longDf
depth element amount
1 5 O 1
2 6 O 2
3 7 O 3
4 5 Fe 2
5 6 Fe 3
6 7 Fe 4
7 5 Ca 3
8 6 Ca 4
9 7 Ca 5
10 5 Zr 4
11 6 Zr 5
12 7 Zr 6
Now you can get the elements you want using dplyr's filter (which is also probably a better option for subsetting by depth) and use the new element column as the x coordinate for plotting:
longDf %>%
filter(element %in% Elements) %>%
ggplot() +
geom_path(aes(x=element, y=depth))
I'm not sure what you're expecting the graph to look like, but that should get you started.
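As a side note, gather() is marked as superseded in tidyr 1.0.0 and later, in favour of pivot_longer(). A minimal sketch of the same reshape with the newer function, assuming that version of tidyr is installed and reusing the toy data from above:

```r
library(tidyr)

# Same toy data as above.
df <- data.frame(O = 1:3, Fe = 2:4, Ca = 3:5, Zr = 4:6, depth = 5:7)
Elements <- c("Fe", "Ca", "Zr")

# Keep only the wanted element columns plus depth, then reshape to long form.
# Every column except depth is stacked into element/amount pairs.
longDf <- pivot_longer(df[c(Elements, "depth")],
                       cols = -depth,
                       names_to = "element",
                       values_to = "amount")
longDf
```

Because Elements is only used for the column subset, changing the vector is enough to reshape a different group of elements. If the aim is a depth profile, mapping amount to x, depth to y and element to colour may be closer to what is wanted, but that is a guess about the intended plot.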
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 5 years ago.
I am new to RStudio and have a fairly simple problem that I am unable to solve.
I want to aggregate the rows of my data.frame by the words contained in the first column.
The data.frame has five columns:
The first one contains words;
the second, third, fourth and fifth contain numeric values.
For example, if the data were:
SecondWord X Y Z Q
NO 1 2 2 1
NO 0 0 1 0
YES 1 1 1 1
I expect to see a result like:
SecondWord X Y Z Q
NO 1 2 3 1
YES 1 1 1 1
How can I do this?
I have tried the following:
test <- read.csv2("test.csv")
df<-aggregate(.~Secondword,data=test, FUN = sum, na.rm=TRUE)
But the values were not the ones I expected to see.
Thank you in advance for your help, and sorry for the "simple" question.
You can also use the tidyverse (across() requires dplyr 1.0.0 or later):
library(tidyverse)
df <- test %>%
  group_by(SecondWord) %>%
  summarise(across(everything(), sum))
df
# SecondWord X Y Z Q
# NO 1 2 3 1
# YES 1 1 1 1
ddply should work as well.
For example, something like:
library(plyr)
grouped <- ddply(test, "SecondWord", numcolwise(sum))
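For comparison, here is a base R sketch using aggregate() on the example rows from the question. The data frame is typed in by hand, which is an assumption; the real test.csv is not available. R names are case-sensitive, so the grouping name in the formula has to match the column header exactly.

```r
# Example rows from the question, entered by hand.
test <- data.frame(SecondWord = c("NO", "NO", "YES"),
                   X = c(1, 0, 1),
                   Y = c(2, 0, 1),
                   Z = c(2, 1, 1),
                   Q = c(1, 0, 1))

# Sum every other column within each SecondWord group.
result <- aggregate(. ~ SecondWord, data = test, FUN = sum)
result
```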
This question already has answers here:
Sorting rows alphabetically
(4 answers)
How to sum a variable by group
(18 answers)
Closed 4 years ago.
I have a large dataset I want to simplify, but I'm currently having trouble with one thing.
The following table shows origin-destination combinations. The count column represents the number of occurrences of, for example, A to B.
From To count
A B 2
A C 1
C A 3
B C 1
The problem I have is that, for example, A to C (1) is actually the same as C to A (3). Direction doesn't really matter to me, only that there's a connection between A and C, so I wonder how I can simply get A to C (4).
The problem is that I have a factor with 400 levels, so I can't do it manually. Is there something with dplyr or similar that can solve this for me?
df[1:2] <- t(apply(df[1:2], 1, sort))
aggregate(count ~ From + To, df, sum)
results in:
From To count
1 A B 2
2 A C 4
3 B C 1
Here is a base R method using aggregate, sort, paste, and mapply.
with(df, aggregate(count,
list(route=mapply(function(x, y) paste(sort(c(x, y)), collapse=" - "),
From, To)), sum))
route x
1 A - B 2
2 A - C 4
3 B - C 1
Here, mapply takes pairs of elements from the From and To variables, sorts them and pastes them into a single string with collapse=" - ". The resulting string vector is used in aggregate to group the observations and sum the count values. with reduces typing.
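A vectorised alternative to the row-wise sort, offered as a sketch rather than as either answerer's method: pmin() and pmax() put the alphabetically smaller endpoint first. This assumes From and To are character columns; factor columns would need as.character() first.

```r
# Example data from the question, with character columns.
df <- data.frame(From = c("A", "A", "C", "B"),
                 To = c("B", "C", "A", "C"),
                 count = c(2, 1, 3, 1),
                 stringsAsFactors = FALSE)

# Order each pair so that A-C and C-A become the same key.
undirected <- data.frame(From = pmin(df$From, df$To),
                         To = pmax(df$From, df$To),
                         count = df$count)

# Sum the counts per unordered pair.
result <- aggregate(count ~ From + To, data = undirected, FUN = sum)
result
```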
This question already has answers here:
The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
(11 answers)
Closed 7 years ago.
I have the following data frame:
df <- data.frame(a=rep(1:3),b=rep(1:3),c=rep(4:6),d=rep(4:6))
df
a b c d
1 1 1 4 4
2 2 2 5 5
3 3 3 6 6
I would like to have a vector N that determines my window size, so for this example I will set
N <- 1
I would like to split this dataframe into equal portions of N rows and store the 3 resulting dataframes into a list.
I have the following code:
groupMaker <- function(x, y) 0:(x-1) %/% y
testlist2 <- split(df, groupMaker(nrow(df), N))
The problem is that this code renames my column names by adding an X0. in front
result <- as.data.frame(testlist2[1])
result
X0.a X0.b X0.c X0.d
1 1 1 4 4
I would like code that does exactly the same thing but keeps the column names as they are. Please keep in mind that my original data has many more than 3 rows, so I need something that works on a much larger data frame.
To extract a list element, we can use [[. Also, as each list element is already a data.frame, we don't need to call as.data.frame explicitly.
testlist2[[1]]
We can also use gl to create the grouping variable.
split(df, as.numeric(gl(nrow(df), N, nrow(df))))
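A small sketch showing the difference on a slightly larger, made-up data frame: single brackets return a one-element list, which is why as.data.frame() prefixes the column names, while double brackets return the data frame itself. The five-row df and N = 2 are assumptions chosen for illustration.

```r
# Made-up data frame with five rows, split into chunks of N rows.
df <- data.frame(a = 1:5, b = 6:10)
N <- 2

# Integer division assigns rows 1-2 to group 0, rows 3-4 to group 1, and so on.
groups <- (seq_len(nrow(df)) - 1) %/% N
chunks <- split(df, groups)

# [ keeps the list wrapper; [[ returns the data frame with its original names.
wrapped <- chunks[1]
first_chunk <- chunks[[1]]
names(first_chunk)
```

The last chunk simply holds the leftover rows when nrow(df) is not a multiple of N.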