This question already has answers here:
Collapse / concatenate / aggregate a column to a single comma separated string within each group
(6 answers)
Grouping functions (tapply, by, aggregate) and the *apply family
(10 answers)
Closed 3 years ago.
I'd like to manipulate data in R as follows. Can anybody help, please?
Before:
Record Person Value
1 1 100
1 2 0
1 3 200
2 1 150
2 2 220
After:
Record Value
1 {100, 0, 200}
2 {150, 220}
Ideally I'd like the final dataset values to be in a list (as opposed to a string), so that I can apply formulas to each value.
Many thanks in advance.
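You can try the following code, which collapses each group into a comma-separated string:
dfout <- aggregate(Value ~ Record, df, toString)
or, to keep the values themselves rather than a string (here Value becomes a list column, because the groups differ in length):
dfout <- aggregate(Value ~ Record, df, FUN = function(x) x)
Both print the same way: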
> dfout
Record Value
1 1 100, 0, 200
2 2 150, 220
DATA
df <- structure(list(Record = c(1L, 1L, 1L, 2L, 2L), Person = c(1L,
2L, 3L, 1L, 2L), Value = c(100L, 0L, 200L, 150L, 220L)), class = "data.frame", row.names = c(NA,
-5L))
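Since the question asks for a list rather than a string, here is a minimal base R sketch using split(), which returns one vector per Record. The names values_by_record and doubled are only for illustration, and the doubling stands in for whatever formula you want to apply to each value.

```r
# Same data as in the DATA block above
df <- data.frame(
  Record = c(1L, 1L, 1L, 2L, 2L),
  Person = c(1L, 2L, 3L, 1L, 2L),
  Value  = c(100L, 0L, 200L, 150L, 220L)
)

# split() returns a named list with one integer vector per Record
values_by_record <- split(df$Value, df$Record)

# Any function can now be applied to each group's values
# (doubling is just a placeholder for your own formula)
doubled <- lapply(values_by_record, function(v) v * 2)

# Optionally store the vectors as a list column in a data frame
dfout <- data.frame(Record = as.integer(names(values_by_record)))
dfout$Value <- unname(values_by_record)
```

With this, dfout$Value[[1]] is an ordinary integer vector, so it can be indexed or passed to other functions directly.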
This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 1 year ago.
The data frame above is an example of the original one. I am trying to create the following new data frame based on it:
Thank you!
We can use xtabs from base R
xtabs(abundance ~ StationCode + SpeciesCode, df1)
-output
           SpeciesCode
StationCode AME BCF BKB CAP
       O-01   2   1   5   0
       O-02   1   0   1   1
       O-03   0   4   2   0
       O-04   0   0   8   1
data
df1 <- structure(list(SpeciesCode = c("AME", "AME", "BCF", "BCF", "CAP",
"CAP", "BKB", "BKB", "BKB", "BKB"), StationCode = c("O-01", "O-02",
"O-03", "O-01", "O-04", "O-02", "O-04", "O-01", "O-02", "O-03"
), abundance = c(2L, 1L, 4L, 1L, 1L, 1L, 8L, 5L, 1L, 2L)),
class = "data.frame", row.names = c(NA,
-10L))
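If you need a regular data frame rather than a table object, one option (a sketch, not the only way) is to convert the xtabs result with as.data.frame.matrix(). The names tab and wide are made up for illustration.

```r
# Same data as above
df1 <- data.frame(
  SpeciesCode = c("AME", "AME", "BCF", "BCF", "CAP",
                  "CAP", "BKB", "BKB", "BKB", "BKB"),
  StationCode = c("O-01", "O-02", "O-03", "O-01", "O-04",
                  "O-02", "O-04", "O-01", "O-02", "O-03"),
  abundance   = c(2L, 1L, 4L, 1L, 1L, 1L, 8L, 5L, 1L, 2L)
)

# Cross-tabulate abundance by station (rows) and species (columns)
tab <- xtabs(abundance ~ StationCode + SpeciesCode, df1)

# Convert the contingency table to a wide data frame;
# stations end up as row names, species as columns
wide <- as.data.frame.matrix(tab)

# Keep the station code as a regular column as well
wide$StationCode <- rownames(wide)
```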
This question already has answers here:
R group by multiple columns and mean value per each group based on different column
(2 answers)
Closed 2 years ago.
My data set "data1" looks roughly like this:
Price class
243 1
32 2
45 3
245 1
67 2
343 3
567 1
.
.
and so on. In the class column, 1, 2, 3 repeats continuously until the end of the data (298 observations).
I want to aggregate it so that I get the mean price of each class. The result should be stored in a new dataset "classdata" and look like this:
class column_name
1 mean of all class 1 prices
2 mean of all class 2 prices
3 mean of all class 3 prices
I tried this code
classdata = aggregate(x=data1$Price, by=list(data1$class), FUN="mean")
But I am not getting the desired result. Please help.
You probably want proper column names. To get them, also put x= into a list, and name the list elements in both arguments.
aggregate(x=list(column_name=data1$Price), by=list(class=data1$class), FUN="mean")
# class column_name
# 1 1 351.6667
# 2 2 49.5000
# 3 3 194.0000
Data:
data1 <- structure(list(Price = c(243L, 32L, 45L, 245L, 67L, 343L, 567L
), class = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)), class = "data.frame", row.names = c(NA,
-7L))
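For comparison, here is a sketch of the same aggregation with the formula interface of aggregate(), renaming the result column afterwards. The target name column_name is taken from the question; everything else is plain base R.

```r
# Same sample data as above
data1 <- data.frame(
  Price = c(243L, 32L, 45L, 245L, 67L, 343L, 567L),
  class = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)
)

# Mean of Price within each class; the grouping column keeps its name
classdata <- aggregate(Price ~ class, data = data1, FUN = mean)

# Rename the aggregated column to the name used in the question
names(classdata)[names(classdata) == "Price"] <- "column_name"
```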
Welcome to Stack Overflow. Another option is to use the tidyverse data processing model:
# use the data jay.sf made
data1 <- structure(list(Price = c(243L, 32L, 45L, 245L, 67L, 343L, 567L),
class = c(1L, 2L, 3L, 1L, 2L, 3L, 1L)),
class = "data.frame", row.names = c(NA, -7L))
library(tidyverse)
data1 %>% # start with sample data and pipe it to the next line
group_by(class) %>% # group the data by class and pipe it to the next line
summarise(`The Mean Price` = mean(Price)) # Make a variable called "The
# Mean Price" holding the mean of
# the price variable.
This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I am trying to remove some rows of my data by adding them to a different row, in the form of another column. Is there a way I can group rows together by a certain variable?
I have tried using group_by statement in the dplyr package, but it does not seem to solve my issue.
library(dplyr)
late <- read.csv(file.choose())
late <- group_by(late, state, add = FALSE)
The data set I have (named "late") now is in this form:
ontime state count
0 AL 1
1 AL 44
null AL 3
0 AR 5
1 AR 50
...
But I would like it to be:
state count0 count1 countnull
AL 1 44 3
AR 5 50 null
...
Ultimately, I want to calculate count0/count1 for each state. So if there is a better way of going about this, I would be open to any suggestions.
You could do this with dcast() from the reshape2 package:
library(reshape2)
df <- data.frame(
  ontime = c(0, 1, NA, 0, 1),
  state = c("AL", "AL", "AL", "AR", "AR"),
  count = c(1, 44, 3, 5, 50)
)
# value.var names the column whose values fill the cells
dcast(df, state ~ ontime, value.var = "count")
With spread:
library(dplyr)
library(tidyr)
df %>%
mutate(ontime = paste0('count', ontime)) %>%
spread(ontime, count)
Output:
state count0 count1 countnull
1 AL 1 44 3
2 AR 5 50 NA
Data:
df <- structure(list(ontime = structure(c(1L, 2L, 3L, 1L, 2L), .Label = c("0",
"1", "null"), class = "factor"), state = structure(c(1L, 1L,
1L, 2L, 2L), .Label = c("AL", "AR"), class = "factor"), count = c(1L,
44L, 3L, 5L, 50L)), class = "data.frame", row.names = c(NA, -5L
))
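Since the end goal is count0/count1 per state, here is a minimal base R sketch that reshapes to wide format with reshape() and then computes the ratio. Note that reshape() names the new columns count.0, count.1 and count.null (with a dot), and the names wide and ratio are only for illustration.

```r
# Sample data in the same shape as "late" in the question
late <- data.frame(
  ontime = c("0", "1", "null", "0", "1"),
  state  = c("AL", "AL", "AL", "AR", "AR"),
  count  = c(1L, 44L, 3L, 5L, 50L)
)

# One row per state, one count column per ontime value;
# combinations that do not occur are filled with NA
wide <- reshape(late, idvar = "state", timevar = "ontime",
                direction = "wide")

# Ratio of not-on-time to on-time counts for each state
wide$ratio <- wide$count.0 / wide$count.1
```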
This question already has answers here:
Finding maximum value of one column (by group) and inserting value into another data frame in R
(3 answers)
Closed 7 years ago.
I have this data frame, which consists of two vectors and runs to millions of rows. I used a loop, but it takes a day to compare the values.
Can someone suggest any apply functions?
Names Sales
A 1
A 2
A 3
B 1
B 5
B 6
.
.
What I want is a unique list of names along with the maximum value in Sales for each name. For example, A has 3 rows and its highest sales value is 3.
The output should be a data frame.
Names Sales
A 3
B 6
You can try with aggregate()
aggregate(V2 ~ ., df1 , max)
# V1 V2
#1 A 3
#2 B 6
data
df1 <- structure(list(V1 = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
.Label = c("A", "B"), class = "factor"), V2 = c(1L, 2L, 3L, 1L, 5L, 6L)),
.Names = c("V1","V2"), class = "data.frame", row.names = c(NA, -6L))
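The same call written with the column names from the question might look like the sketch below. The data frame name sales and the result name max_sales are assumptions, since the question does not name the data frame.

```r
# Sample data using the column names from the question
sales <- data.frame(
  Names = c("A", "A", "A", "B", "B", "B"),
  Sales = c(1L, 2L, 3L, 1L, 5L, 6L)
)

# Maximum of Sales within each value of Names, returned as a data frame
max_sales <- aggregate(Sales ~ Names, data = sales, FUN = max)
```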
This question already has answers here:
How to group data.table by multiple columns?
(2 answers)
Closed 7 years ago.
I have a table with multiple columns that I am loading from a CSV file in R:
data <- read.table(file="test.csv",header=TRUE,sep="\t",check.names=FALSE)
The data has the following format:
id timestamp quantity zone
1 123 1 A
2 123 1 A
3 124 1 A
4 124 1 B
5 125 1 B
5 125 1 B
I am trying to get the total quantity of each entity based on timestamp and zone. In other words: how many items were there at given time and given place, so the result should look like this:
timestamp zone quantity
123 A 2
124 A 1
124 B 1
125 B 2
There are plenty of similar questions here on SO, but I always get the error cannot coerce type 'closure' to vector of type 'list'.
At the moment, I am trying to group by only one column using the data.table package, but I can't get it to work.
Could you take a look at my script and tell me what I am doing wrong, please?
library(data.table)
frame <- read.table(file="test.csv",header=TRUE,sep="\t")
DT <- data.table(frame)
DT[,sum(quantity), by = timestamp]
Thanks for any tips!
You can use the dplyr package as follows:
library(dplyr)
df %>% group_by(timestamp, zone) %>% summarise(quantity = sum(quantity))
We can use aggregate from base R
aggregate(quantity~timestamp+zone, df, sum)
# timestamp zone quantity
#1 123 A 2
#2 124 A 1
#3 124 B 1
#4 125 B 2
data
df <- structure(list(id = c(1L, 2L, 3L, 4L, 5L, 5L),
timestamp = c(123L,
123L, 124L, 124L, 125L, 125L), quantity = c(1L, 1L, 1L, 1L, 1L,
1L), zone = c("A", "A", "A", "B", "B", "B")), .Names = c("id",
"timestamp", "quantity", "zone"), class = "data.frame",
row.names = c(NA, -6L))
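As the question was about data.table, here is a sketch of the same grouping in data.table syntax, assuming the package is installed. Grouping by several columns is done by passing them to by inside .(); the name totals is only for illustration.

```r
library(data.table)

# Same data as above, built directly as a data.table
DT <- data.table(
  id        = c(1L, 2L, 3L, 4L, 5L, 5L),
  timestamp = c(123L, 123L, 124L, 124L, 125L, 125L),
  quantity  = c(1L, 1L, 1L, 1L, 1L, 1L),
  zone      = c("A", "A", "A", "B", "B", "B")
)

# Sum quantity within each timestamp/zone combination;
# .(quantity = ...) names the result column
totals <- DT[, .(quantity = sum(quantity)), by = .(timestamp, zone)]
```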