This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 5 years ago.
I was wondering how I might simply split a numerical column by a second grouping variable in a dataset, then cbind the numerical column. This would most likely be a simple extension of the separate function for dplyr. For example, changing X below:
Y <- rbind(2,5,3,6,3,2)
Z <- rbind("A", "A", "A", "B", "B", "B")
X <- data.frame(Y,Z)
Into
A B
2 6
5 3
3 2
Then ideally extract the rowMeans into a new vector. (Issue also arises here when there is only one character in Z, given rowmeans requires 2).
This would need to be infinitely expandable based on the number of unique variables in Z. e.g., if Z had A, B, and C, then the final data.frame would require 3 columns. This would allow me to capture the row means from infinite number of groups in Z.
Thanks in advance,
Conal
Looks like a job for tidyr::spread.
library(dplyr)
library(tidyr)
X2 <- X %>%
group_by(Z) %>%
mutate(ID = 1:n()) %>%
spread(Z, Y) %>%
select(-ID)
X2
# A tibble: 3 x 2
A B
* <dbl> <dbl>
1 2 6
2 5 3
3 3 2
Related
This question already has answers here:
Counting the number of elements with the values of x in a vector
(20 answers)
Counting distinct values in column of a data frame in R
(2 answers)
Closed 4 months ago.
Having a specific column like this number_of_columns_with_text:
df <- data.frame(id = c(1,2,3,4,5,6), number_of_columns_with_text = c(3,2,1,3,1,1))
Is there any command which could give the sum of the numbers exists in this column (how many times a number exists).
Example output
data.frame(number = c(1,2,3), volume = c(3,1,2))
What you might be looking for is table(...)
> table(df$number_of_columns_with_text)
1 2 3
3 1 2
In dplyr, you can first group_by the variable you want to tabulate and then use n() to count the frequencies of the distinct values:
library(dplyr)
df %>%
group_by(number_of_columns_with_text)%>%
summarise(volume = n())
# A tibble: 3 × 2
number_of_columns_with_text volume
<dbl> <int>
1 1 3
2 2 1
3 3 2
Using dplyr
library(tidyverse)
df %>%
group_by(number_of_columns_with_text) %>%
count()
This question already has answers here:
data.frame Group By column [duplicate]
(4 answers)
Closed 6 years ago.
I have a panel data file (long format) and I need to convert it to cross-sectional data. That is I don't just need a transformation to the wide format but I need exactly one observation per individual that contains the mean for each variable.
Here's what I want to to: I have panel data (a number of observations for each individual) in a data frame and I'm looking for an easy way in R to generate a new data frame that contains cumulated data for each individual, i. e. either the sum of all observations in each variable or their mean. It might also be interesting to get a measure of volatility.
For example I have a given data frame panel_data that contains panel data:
> individual <- c(1,1,2,2,3,3)
> var1 <- c(2,3,3,3,4,3)
> panel_data <- data.frame(individual,var1)
> panel_data
individual var1
1 1 2
2 1 3
3 2 3
4 2 3
5 3 4
6 3 3
The result should look like this:
> cross_data
individual var1
1 1 5
2 2 6
3 3 7
Now this is only an example. I need this feature in a number of varieties, the most important one probably being the intra-individual mean for each variable.
There are ways to do this using base R or using the popular packages data.table or dplyr. Everyone has their own preference and mine is dplyr.
You can very easily perform a variety of operation to summarise your data per individual. With dplyr syntax, you first group_by individual to specify that operations should be performed on groups defined by the variable "individual". You can then summarise your groups using a function you specify.
Try the following:
library("dplyr")
panel_data %>%
group_by(individual) %>%
summarise(sum_var1 = sum(var1), mean_var1=mean(var1))
Do not be put off by the %>% notation, it is just a convenient shortcut to chain operations:
x %>% f is equivalent to f(x)
x %>% f(a) is equivalent to f(x, a)
x %>% f(a) %>% g(b) is equivalent to g(f(x, a), b)
This question already has answers here:
Add column with order counts
(2 answers)
Count number of rows within each group
(17 answers)
Closed 7 years ago.
What is the easiest way to count the occurrences of a an element on a vector or data.frame at every grouop?
I don't mean just counting the total (as other stackoverflow questions ask) but giving a different number to every succesive occurence.
for example for this simple dataframe: (but I will work with dataframes with more columns)
mydata <- data.frame(A=c("A","A","A","B","B","A", "A"))
I've found this solution:
cbind(mydata,myorder=ave(rep(1,nrow(mydata)),mydata$A, FUN=cumsum))
and here the result:
A myorder
A 1
A 2
A 3
B 1
B 2
A 4
A 5
Isn't there any single command to do it?. Or using an specialized package?
I want it to later use tidyr's spread() function.
My question is not the same than
Is there an aggregate FUN option to count occurrences?
because I don't want to know the total number of occurrencies at the end but the cumulative occurencies till every element.
OK, my problem is a little bit more complex
mydata <- data.frame(group=c("x","x","x","x","y","y", "y"), letter=c("A","A","A","B","B","A", "A"))
I only know to solve the first example I wrote above.
But what happens when I want it also by a second grouping variable?
something like occurrencies(letter) by group.
group letter "occurencies within group"
x A 1
x A 2
x A 3
x B 1
y B 1
y A 1
y A 2
I've found the way with
ave(rep(1,nrow(mydata)),list(mydata$group, mydata$letter), FUN=cumsum)
though it shoould be something easier.
Using data.table
library(data.table)
setDT(mydata)
mydata[, myorder := 1:.N, by = .(group, letter)]
The by argument makes the table be dealt with within the groups of the column called A. .N is the number of rows within that group (if the by argument was empty it would be the number of rows in the table), so for each sub-table, each row is indexed from 1 to the number of rows in that sub-table.
mydata
group letter myorder
1: x A 1
2: x A 2
3: x A 3
4: x B 1
5: y B 1
6: y A 1
7: y A 2
or a dplyr solution which is pretty much the same
mydata %>%
group_by(group, letter) %>%
mutate(myorder = 1:n())
This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 7 years ago.
I have a data.frame that looks somewhat like this.
k <- data.frame(id = c(1,2,2,1,2,1,2,2,1,2), act = c('a','b','d','c','d','c','a','b','a','b'), var1 = 25:34, var2= 74:83)
I have to group the data into separate levels by first 2 columns and write the mean of the the next 2 columns(var1 and var2). It should look like this
id act varmean1 varmean2
1 1 a
2 1 c
3 2 a
4 2 b
5 2 b
6 2 d
The values of respective means are filled in varmean1 and varmean2.
My actual dataframe has 88 columns where I have to group the data into separate levels by the first 2 columns and find the respective means of the remaining. Please help me figure this out as soon as possible. Please try to use 'dplyr' package for the solution if possible. Thanks.
You have several options:
base R:
aggregate(. ~ id + act, k, mean)
or
aggregate(cbind(var1, var2) ~ id + act, k, mean)
The first option aggregates all the column by id and act, the second option only the column you specify. In this case both give the same result, but it is good to know for when you have more columns and only want to aggregate some of them.
dplyr:
library(dplyr)
k %>%
group_by(id, act) %>%
summarise_each(funs(mean))
If you want to specify the columns for which to calculate the mean, you can use summarise instead of summarise_each:
k %>%
group_by(id, act) %>%
summarise(var1mean = mean(var1), var2mean = mean(var2))
data.table:
library(data.table)
setDT(k)[, lapply(.SD, mean), by = .(id, act)]
If you want to specify the columns for which to calculate the mean, you can add .SDcols like:
setDT(k)[, lapply(.SD, mean), by = .(id, act), .SDcols=c("var1", "var2")]
I apologize if this question is abhorrently simple, but I'm looking for a way to just add a column of consecutive integers to a data frame (if my data frame has 200 observations, for example, starting with 1 for the first observation, and ending with 200 on the last one).
How can I do this?
For a dataframe (df) you could use
df$observation <- 1:nrow(df)
but if you have a matrix you would rather want to use
ma <- cbind(ma, "observation"=1:nrow(ma))
as using the first option will transform your data into a list.
Source: http://r.789695.n4.nabble.com/adding-column-of-ordered-numbers-to-matrix-td2250454.html
Or use dplyr.
library(dplyr)
df %>% mutate(observation = 1:n())
You might want it to be the first column of df.
df %>% mutate(observation = 1:n()) %>% select(observation, everything())
Probably, function tibble::rowid_to_column is what you need if you are using tidyverse ecosystem.
library(tidyverse)
dat <- tibble(x=c(10, 20, 30),
y=c('alpha', 'beta', 'gamma'))
dat %>% rowid_to_column(var='observation')
# A tibble: 3 x 3
observation x y
<int> <dbl> <chr>
1 1 10 alpha
2 2 20 beta
3 3 30 gamma