how to change my dataframe based on value of a column [duplicate] - r

This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 3 years ago.
I have a data frame with two columns, as shown below, and I want to reshape it into a data frame with one column per distinct value (three here):
df <- data.frame(key=c('a','a','a','b','b'),value=c(1,2,2,1,3))
I managed to do this in Python, but I have no idea how to do it in R.
The expected output should look like this:
  1 2 3
a 1 2 0
b 1 0 1

library(data.table)
# data.table's dcast expects a data.table, so convert the data frame first
dcast(as.data.table(df), key ~ value, fun.aggregate = length)
# key 1 2 3
# 1 a 1 2 0
# 2 b 1 0 1
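For comparison, the same counts can be sketched in base R without any package. This is only an illustration, not the answer's method; the names `counts` and `wide` are assumptions introduced here.

```r
# Example data from the question
df <- data.frame(key = c('a', 'a', 'a', 'b', 'b'), value = c(1, 2, 2, 1, 3))

# Cross-tabulate key against value; each cell counts the matching rows
counts <- table(df$key, df$value)

# Convert the contingency table into a plain data frame, one column per value
wide <- as.data.frame.matrix(counts)
wide$key <- rownames(wide)
print(wide)
```

Combinations that never occur (such as key a with value 3) come out as 0, which matches the expected output in the question.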

Related

Receive the total sum score of every number [duplicate]

This question already has answers here:
Count number of occurences for each unique value
(14 answers)
Closed 2 years ago.
Using this data frame as input:
df1 <- data.frame(num = c(1,1,1,2,2,2,3))
How can I count how many times each number occurs in the num column?
Example output:
num frequency
1 3
2 3
3 1
Use table and coerce the result to a data frame:
as.data.frame(table(df1$num))
# Var1 Freq
# 1 1 3
# 2 2 3
# 3 3 1
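or, since the values here are the consecutive positive integers 1, 2, 3 in order of first appearance (tabulate counts every integer from 1 to the maximum, so gaps or a different order would misalign the result):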
with(df1, data.frame(num=unique(num), freq=tabulate(num)))
# num freq
# 1 1 3
# 2 2 3
# 3 3 1
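The difference between the two approaches shows up once the values have a gap. The sketch below uses a hypothetical vector `x` (not from the question) in which the value 2 never occurs; `tab_counts` and `tbl` are illustrative names.

```r
# Hypothetical data with a gap: the value 2 never occurs
x <- c(1, 1, 3)

# tabulate() counts every integer from 1 to max(x), including absent ones
tab_counts <- tabulate(x)

# table() only reports the values that actually occur
tbl <- as.data.frame(table(num = x), responseName = "frequency")

print(tab_counts)
print(tbl)
```

Here tabulate returns three counts (one of them zero) while unique(x) has only two elements, so pairing them in a data frame would fail or misalign. The table-based version avoids that.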

Build rowSums in dplyr based on columns containing pattern in their names [duplicate]

This question already has answers here:
Sum across multiple columns with dplyr
(8 answers)
R, create a new column in a data frame that applies a function of all the columns with similar names
(3 answers)
Closed 4 years ago.
My data frame looks something like this
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3
A 1 0 1 1
A 2 1 1 2
A 3 3 0 0
With dplyr I want to build a column that sums the values of the count variables for each row, selecting the count variables based on their names.
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
A 1 0 1 1 2
A 2 1 1 2 4
A 3 3 0 0 3
How do I do that?
As you asked for a dplyr solution, you can do:
library(dplyr)
df %>%
mutate(SUM = rowSums(select(., starts_with("COUNT"))))
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
1 A 1 0 1 1 2
2 A 2 1 1 2 4
3 A 3 3 0 0 3
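If dplyr is not a requirement, the same row sums can be sketched in base R. The data frame below is rebuilt from the table in the question, and `count_cols` is an illustrative name introduced here.

```r
# Data rebuilt from the question's example table
df <- data.frame(USER = "A", OBSERVATION = 1:3,
                 COUNT.1 = c(0, 1, 3),
                 COUNT.2 = c(1, 1, 0),
                 COUNT.3 = c(1, 2, 0))

# Logical vector marking the columns whose names start with "COUNT"
count_cols <- startsWith(names(df), "COUNT")

# Sum the selected columns row by row
df$SUM <- rowSums(df[count_cols])
print(df)
```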

Transform multiple rows of a data frame into one row with multiple columns with R [duplicate]

This question already has answers here:
Reshape three column data frame to matrix ("long" to "wide" format) [duplicate]
(6 answers)
Closed 5 years ago.
I have a data frame with three columns:
df=data.frame( UserId=c(1,2,2,2,3,3), CatoId=c('C','A','B','C','D','E'), No=c(1,9,2,2,5,3))
UserId CatoId No
1 C 1
2 A 9
2 B 2
2 C 2
3 D 5
3 E 3
I would like to transform the structure into the following one :
UserId A B C D E
1 0 0 1 0 0
2 9 2 2 0 0
3 0 0 0 5 3
where the columns represent all possible values in CatoId.
The first data frame has 2 million rows and CatoId has 21 different values, so I don't want to use any loops. Is there a way to do this in R? If not, what is the best way to proceed?
My goal would be to apply a clustering algorithm on the last dataframe.
You can do this using dcast from the reshape2 package:
library(reshape2)
df1 <- dcast(df, UserId ~ CatoId, value.var = "No", fill = 0)
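A loop-free base R variant can be sketched with xtabs, which sums No within each UserId/CatoId combination and leaves absent combinations at 0. This is an alternative under the assumption that summing is the desired behavior if a combination ever repeats; the name `wide` is illustrative, and no claim is made here about speed on 2 million rows.

```r
# Example data from the question
df <- data.frame(UserId = c(1, 2, 2, 2, 3, 3),
                 CatoId = c('C', 'A', 'B', 'C', 'D', 'E'),
                 No = c(1, 9, 2, 2, 5, 3))

# Sum No for every UserId/CatoId cell; missing cells are filled with 0
wide <- as.data.frame.matrix(xtabs(No ~ UserId + CatoId, data = df))

# Move the row names into a regular UserId column
wide <- cbind(UserId = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
print(wide)
```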

Duplicating data frame rows by freq value in same data frame [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 7 years ago.
I have a data frame with names by type and their frequencies. I'd like to expand this data frame so that the names are repeated according to their name-type frequency.
For example, this:
> df = data.frame(name=c('a','b','c'),type=c(0,1,2),freq=c(2,3,2))
name type freq
1 a 0 2
2 b 1 3
3 c 2 2
would become this:
> df_exp
name type
1 a 0
2 a 0
3 b 1
4 b 1
5 b 1
6 c 2
7 c 2
I would appreciate any suggestions on an easy way to do this.
You can just use rep to "expand" your data.frame rows:
df[rep(seq_len(nrow(df)), df$freq), c("name", "type")]
# name type
# 1 a 0
# 1.1 a 0
# 2 b 1
# 2.1 b 1
# 2.2 b 1
# 3 c 2
# 3.1 c 2
There is also a function expandRows in the splitstackshape package that does exactly this, taking the name of the column that holds the counts. It can alternatively accept a vector specifying how many times to replicate each row. For example:
library(splitstackshape)
expandRows(df, "freq")
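The rep-based approach leaves row names such as 1.1 and 2.1 behind. A small sketch of the same idea that also resets the row names and checks the row count follows; `idx` is an illustrative name, while `df_exp` is the name used in the question.

```r
# Example data from the question
df <- data.frame(name = c('a', 'b', 'c'), type = c(0, 1, 2), freq = c(2, 3, 2))

# Repeat each row index as many times as its freq value
idx <- rep(seq_len(nrow(df)), times = df$freq)

# Select the repeated rows and drop the freq column
df_exp <- df[idx, c("name", "type")]

# Replace row names like "1.1" with a plain 1..n sequence
rownames(df_exp) <- NULL

# Sanity check: one output row per unit of frequency
stopifnot(nrow(df_exp) == sum(df$freq))
print(df_exp)
```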

R is it possible to get the output of table() using dcast? [duplicate]

This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 4 years ago.
I have the following data frame:
id<-c(1,2,3,4,1,1,2,3,4,4,2,2)
period<-c("first","calib","valid","valid","calib","first","valid","valid","calib","first","calib","valid")
df<-data.frame(id,period)
Typing
table(df)
results in
   period
id  calib first valid
  1     1     2     0
  2     2     0     2
  3     0     0     2
  4     1     1     1
Is there any way to get the same result using 'dcast' and save it as a new data frame?
Yes, there is a way:
library(reshape2)
dcast(df, id ~ period, length)
Using period as value column: use value.var to override.
id calib first valid
1 1 1 2 0
2 2 2 0 2
3 3 0 0 2
4 4 1 1 1
You can also type just dcast(df, id ~ period), and length will be chosen by default too. As far as I can see, you tried to find this out in another question of yours. An extended solution without dcast would look like this:
df <- data.frame(unclass(table(df)))
df$ID <- rownames(df)
df
calib first valid ID
1 1 2 0 1
2 2 0 2 2
3 0 0 2 3
4 1 1 1 4
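If the id should come first and stay numeric, the conversion can be sketched slightly differently. This is a variation on the idea above rather than the answer's own code; `counts` and `res` are illustrative names.

```r
# Example data from the question
id <- c(1, 2, 3, 4, 1, 1, 2, 3, 4, 4, 2, 2)
period <- c("first", "calib", "valid", "valid", "calib", "first",
            "valid", "valid", "calib", "first", "calib", "valid")
df <- data.frame(id, period)

# Contingency table of id against period
counts <- table(df)

# One column per period, then prepend id as a numeric column
res <- as.data.frame.matrix(counts)
res <- cbind(id = as.numeric(rownames(res)), res)
rownames(res) <- NULL
print(res)
```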
