This question already has answers here:
How to merge and sum two data frames
(5 answers)
Closed 3 years ago.
I have a question about adding up rows from different tables that have the same column names. I have two time-series tables, each with 8760 rows of values (a whole year).
Table1
Name Year Month Day Hour Value
Plant_1 2020 1 1 1 10
Plant_2 2020 1 1 1 20
Plant_3 2020 1 1 1 30
Plant_1 2020 1 1 2 40
Plant_2 2020 1 1 2 50
Plant_3 2020 1 1 2 60
Table2
Name Year Month Day Hour Value
Plant_x 2020 1 1 1 1
Plant_y 2020 1 1 1 2
Plant_z 2020 1 1 1 3
Plant_x 2020 1 1 2 4
Plant_y 2020 1 1 2 5
Plant_z 2020 1 1 2 6
What I want is the sum of the values of all plants for the same time period, like this:
Year Month Day Hour Value
2020 1 1 1 66
2020 1 1 2 165
I don't care about the plant names, but I need the total value for each hour of the year. I tried something like this, but it doesn't work for more than two tables, and I have 9 to 10 such tables. Could anyone help me improve this code, or suggest another function I can use?
SumOfValue <- Table1 %>%
  full_join(Table2) %>%
  group_by(Year, Month, Day, Hour) %>%
  summarise(Value = sum(Value))
Any help would be appreciated. Thank you.
It looks like your two dataframes have exactly the same format, so you can just rbind them and then get the summary per Year, Month, Day and Hour.
library(dplyr)
df = rbind(a, b) %>% group_by(Year, Month, Day, Hour) %>% summarise(Value = sum(Value))
# Alternative as suggested by Sotos
bind_rows(a, b) %>% group_by(Year, Month, Day, Hour) %>% summarise(Value = sum(Value))
# A tibble: 2 x 5
# Groups: Year, Month, Day [?]
Year Month Day Hour Value
<int> <int> <int> <int> <int>
1 2020 1 1 1 66
2 2020 1 1 2 165
Data
a = structure(list(Name = structure(c(1L, 2L, 3L, 1L, 2L, 3L), .Label = c("Plant_1",
"Plant_2", "Plant_3"), class = "factor"), Year = c(2020L, 2020L,
2020L, 2020L, 2020L, 2020L), Month = c(1L, 1L, 1L, 1L, 1L, 1L
), Day = c(1L, 1L, 1L, 1L, 1L, 1L), Hour = c(1L, 1L, 1L, 2L,
2L, 2L), Value = c(10L, 20L, 30L, 40L, 50L, 60L)), class = "data.frame", row.names = c(NA,
-6L))
b = structure(list(Name = structure(c(1L, 2L, 3L, 1L, 2L, 3L), .Label = c("Plant_x",
"Plant_y", "Plant_z"), class = "factor"), Year = c(2020L, 2020L,
2020L, 2020L, 2020L, 2020L), Month = c(1L, 1L, 1L, 1L, 1L, 1L
), Day = c(1L, 1L, 1L, 1L, 1L, 1L), Hour = c(1L, 1L, 1L, 2L,
2L, 2L), Value = 1:6), class = "data.frame", row.names = c(NA,
-6L))
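Since the question mentions 9 to 10 tables, the same idea can be extended by putting all tables in a list and passing the list to bind_rows. The following is a minimal sketch, not the original poster's code: the names table1, table2, tables and hourly_total are made up for illustration, and only two small tables are built here as stand-ins for the full set.

```r
library(dplyr)

# Two small stand-ins for the hourly tables (hypothetical names).
table1 <- data.frame(
  Name  = rep(c("Plant_1", "Plant_2", "Plant_3"), times = 2),
  Year  = 2020L, Month = 1L, Day = 1L,
  Hour  = rep(1:2, each = 3),
  Value = c(10L, 20L, 30L, 40L, 50L, 60L)
)
table2 <- data.frame(
  Name  = rep(c("Plant_x", "Plant_y", "Plant_z"), times = 2),
  Year  = 2020L, Month = 1L, Day = 1L,
  Hour  = rep(1:2, each = 3),
  Value = 1:6
)

# Collect every table in one list; the remaining tables would be added here.
tables <- list(table1, table2)

# Stack all rows, then sum Value for each hour regardless of plant name.
hourly_total <- bind_rows(tables) %>%
  group_by(Year, Month, Day, Hour) %>%
  summarise(Value = sum(Value), .groups = "drop")

print(hourly_total)
```

Because bind_rows accepts a list, the grouping and summarising steps stay the same no matter how many tables the list holds.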
I am a novice trying to analyze trap catch data in R and am looking for an efficient way to loop through by trap line. The first column is trap ID. The second column is the trap line that each trap is associated with. The remaining columns are values related to target catch and bycatch for each visit to the traps. I want to write code that will evaluate the data during each visit for each trap line. Here is an example of data I am working with:
Sample Data:
Data <- structure(list(Trap_ID = c(1L, 2L, 1L, 1L, 2L, 3L), Trapline = c("Cemetery",
"Cemetery", "Golf", "Church", "Church", "Church"), Target_Visit_1 = c(0L,
1L, 5L, 0L, 1L, 1L), Bycatch_Visit_1 = c(3L, 2L, 0L, 2L, 1L,
4L), Target_Visit_2 = c(1L, 1L, 2L, 0L, 1L, 0L), Bycatch_Visit_2 = c(4L,
2L, 1L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-6L))
The number of traps per trapline varies. I have written separate code for each trapline (there are 14 different traplines), but I was hoping there would be a way to consolidate it into one block of code that calculates values while the trapline stays constant and starts a new calculation when it changes to the next trapline. Here is an example of how I was finding the sum of bycatch at the Cemetery trapline for visit 1:
CemeteryBycatch1 <- Data %>% filter(Trapline == "Cemetery") %>% select(Bycatch_Visit_1)
sum(CemeteryBycatch1)
As of right now I have code like this written out for each trapline for each visit, but with 14 traplines and 8 total visits, I would like to avoid having to write out so many lines of code and was hoping there was a way to loop through it with one block of code that would calculate value (sum, mean, etc.) for each trap line.
Thanks
Does something like this help you?
You can add a filter for Trapline in between group_by and summarise_all.
Code:
library(dplyr)
Data <- structure(list(Trap_ID = c(1L, 2L, 1L, 1L, 2L, 3L), Trapline = c("Cemetery",
"Cemetery", "Golf", "Church", "Church", "Church"), Target_Visit_1 = c(0L,
1L, 5L, 0L, 1L, 1L), Bycatch_Visit_1 = c(3L, 2L, 0L, 2L, 1L,
4L), Target_Visit_2 = c(1L, 1L, 2L, 0L, 1L, 0L), Bycatch_Visit_2 = c(4L,
2L, 1L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA,
-6L))
Data %>%
  group_by(Trap_ID, Trapline) %>%
  summarise_all(list(sum))
Output:
#> # A tibble: 6 x 6
#> # Groups: Trap_ID [3]
#> Trap_ID Trapline Target_Visit_1 Bycatch_Visit_1 Target_Visit_2 Bycatch_Visit_2
#> <int> <chr> <int> <int> <int> <int>
#> 1 1 Cemetery 0 3 1 4
#> 2 1 Church 0 2 0 0
#> 3 1 Golf 5 0 2 1
#> 4 2 Cemetery 1 2 1 2
#> 5 2 Church 1 1 1 1
#> 6 3 Church 1 4 0 0
Created on 2020-10-16 by the reprex package (v0.3.0)
Adding another row to Data:
Trap_ID Trapline Target_Visit_1 Bycatch_Visit_1 Target_Visit_2 Bycatch_Visit_2
1 Cemetery 100 200 1 4
Will give you:
#> # A tibble: 6 x 6
#> # Groups: Trap_ID [3]
#> Trap_ID Trapline Target_Visit_1 Bycatch_Visit_1 Target_Visit_2 Bycatch_Visit_2
#> <int> <chr> <int> <int> <int> <int>
#> 1 1 Cemetery 100 203 2 8
#> 2 1 Church 0 2 0 0
#> 3 1 Golf 5 0 2 1
#> 4 2 Cemetery 1 2 1 2
#> 5 2 Church 1 1 1 1
#> 6 3 Church 1 4 0 0
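The answer above groups by both Trap_ID and Trapline. If one total per trapline is wanted, as the question describes, a possible variant is to group by Trapline only and summarise every visit column with across. This is a sketch under the assumption that dplyr 1.0.0 or later is installed and that every visit column has "Visit" in its name; by_trapline is a made-up name.

```r
library(dplyr)

# Same sample data as in the question, rebuilt with data.frame().
Data <- data.frame(
  Trap_ID         = c(1L, 2L, 1L, 1L, 2L, 3L),
  Trapline        = c("Cemetery", "Cemetery", "Golf", "Church", "Church", "Church"),
  Target_Visit_1  = c(0L, 1L, 5L, 0L, 1L, 1L),
  Bycatch_Visit_1 = c(3L, 2L, 0L, 2L, 1L, 4L),
  Target_Visit_2  = c(1L, 1L, 2L, 0L, 1L, 0L),
  Bycatch_Visit_2 = c(4L, 2L, 1L, 0L, 1L, 0L)
)

# One row per trapline; every column whose name contains "Visit" is summed.
by_trapline <- Data %>%
  group_by(Trapline) %>%
  summarise(across(contains("Visit"), sum), .groups = "drop")

print(by_trapline)
```

Swapping sum for mean (or another summary function) in across gives the other statistics mentioned in the question without writing one line per trapline and visit.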
This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 3 years ago.
I have a data frame with the columns sex (male and female), age (child and adult), survive (yes and no) and freq (frequency). How can I create a cross tab of sex and age?
sex age survive freq
male child yes 4
male adult yes 0
female child yes 6
female adult yes 3
male child no 1
male adult no 0
female child no 2
female adult no 1
I think you are looking for reshaping your data using pivot_wider from tidyr:
library(tidyr)
df %>% pivot_wider(., names_from = age, values_from = freq)
# A tibble: 4 x 4
sex survive child adult
<fct> <fct> <int> <int>
1 male yes 4 0
2 female yes 6 3
3 male no 1 0
4 female no 2 1
or
library(tidyr)
df %>% pivot_wider(., names_from = c(age, survive), values_from = freq)
# A tibble: 2 x 5
sex child_yes adult_yes child_no adult_no
<fct> <int> <int> <int> <int>
1 male 4 0 1 0
2 female 6 3 2 1
Is this what you are looking for? If not, can you provide the expected outcome?
Data
df = structure(list(sex = structure(c(2L, 2L, 1L, 1L, 2L, 2L, 1L,
1L), .Label = c("female", "male"), class = "factor"), age = structure(c(2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L), .Label = c("adult", "child"), class = "factor"),
survive = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("no",
"yes"), class = "factor"), freq = c(4L, 0L, 6L, 3L, 1L, 0L,
2L, 1L)), class = "data.frame", row.names = c(NA, -8L))
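If a plain cross tab of sex by age is wanted, with the frequencies summed over survive, base R's xtabs may be enough. This is a sketch that assumes the counts should be added across the yes and no rows; sex_by_age is a made-up name, and the data frame is rebuilt here with character columns instead of factors.

```r
# Same values as the question's table.
df <- data.frame(
  sex     = rep(c("male", "male", "female", "female"), times = 2),
  age     = rep(c("child", "adult"), times = 4),
  survive = rep(c("yes", "no"), each = 4),
  freq    = c(4L, 0L, 6L, 3L, 1L, 0L, 2L, 1L)
)

# The left-hand side of the formula is the count column; the right-hand side
# lists the two dimensions of the table. Rows sharing sex and age are summed.
sex_by_age <- xtabs(freq ~ sex + age, data = df)

print(sex_by_age)
```

Adding survive to the right-hand side (freq ~ sex + age + survive) would give one sex-by-age table per survival outcome instead.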
I have a data set, ProductTable, and I want to return the dates on which each ProductsFamily was ordered for the first time and for the very last time. Example:
ProductTable
OrderPostingYear OrderPostingMonth OrderPostingDate ProductsFamily Sales QTY
2008 1 20 R1 5234 1
2008 1 12 R2 223 2
2009 1 30 R3 34 1
2008 2 1 R1 1634 3
2010 4 23 R3 224 1
2009 3 20 R1 5234 1
2010 7 12 R2 223 2
The result should look as follows:
OrderTime
ProductsFamily OrderStart OrderEnd SumSales
R1 2008/1/20 2009/3/20 12102
R2 2008/1/12 2010/7/12 446
R3 2009/1/30 2010/4/23 258
I have no idea how to do it. Any suggestions?
ProductTable <- structure(list(OrderPostingYear = c(2008L, 2008L, 2009L, 2008L,
2010L, 2009L, 2010L), OrderPostingMonth = c(1L, 1L, 1L, 2L, 4L,
3L, 7L), OrderPostingDate = c(20L, 12L, 30L, 1L, 23L, 20L, 12L
), ProductsFamily = structure(c(1L, 2L, 3L, 1L, 3L, 1L, 2L), .Label = c("R1",
"R2", "R3"), class = "factor"), Sales = c(5234L, 223L, 34L, 1634L,
224L, 5234L, 223L), QTY = c(1L, 2L, 1L, 3L, 1L, 1L, 2L)), .Names = c("OrderPostingYear",
"OrderPostingMonth", "OrderPostingDate", "ProductsFamily", "Sales",
"QTY"), class = "data.frame", row.names = c(NA, -7L))
We can also use dplyr/tidyr to do this. We arrange the rows by family and date, concatenate the year, month and day columns with unite, group by 'ProductsFamily', and get the first and last of the 'Date' column and the sum of 'Sales' within summarise.
library(dplyr)
library(tidyr)
ProductTable %>%
arrange(ProductsFamily, OrderPostingYear, OrderPostingMonth, OrderPostingDate) %>%
unite(Date,OrderPostingYear:OrderPostingDate, sep='/') %>%
group_by(ProductsFamily) %>%
summarise(OrderStart=first(Date), OrderEnd=last(Date), SumSales=sum(Sales))
# Source: local data frame [3 x 4]
# ProductsFamily OrderStart OrderEnd SumSales
# (fctr) (chr) (chr) (int)
# 1 R1 2008/1/20 2009/3/20 12102
# 2 R2 2008/1/12 2010/7/12 446
# 3 R3 2009/1/30 2010/4/23 258
You can first set up the date in a new column, and then aggregate your data using the data.table package (you take the first and last date by ProductsFamily, as well as the sum of sales):
library(data.table)
# First build up the date
ProductTable$date = with(ProductTable,
as.Date(paste(OrderPostingYear,
OrderPostingMonth,
OrderPostingDate, sep = "." ),
format = "%Y.%m.%d"))
# In a second step, aggregate your data
setDT(ProductTable)[,list(OrderStart = sort(date)[1],
OrderEnd = sort(date)[.N],
SumSales = sum(Sales))
,ProductsFamily]
# ProductsFamily OrderStart OrderEnd SumSales
#1: R1 2008-01-20 2009-03-20 12102
#2: R2 2008-01-12 2010-07-12 446
#3: R3 2009-01-30 2010-04-23 258
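The two answers can also be combined: build a real Date column, as in the data.table answer, and then summarise with dplyr using min and max so that no prior sorting is needed. The following is a sketch under those assumptions; OrderDate and order_time are made-up names, and the QTY column is left out because it is not used.

```r
library(dplyr)

# Same values as the question's ProductTable.
ProductTable <- data.frame(
  OrderPostingYear  = c(2008L, 2008L, 2009L, 2008L, 2010L, 2009L, 2010L),
  OrderPostingMonth = c(1L, 1L, 1L, 2L, 4L, 3L, 7L),
  OrderPostingDate  = c(20L, 12L, 30L, 1L, 23L, 20L, 12L),
  ProductsFamily    = c("R1", "R2", "R3", "R1", "R3", "R1", "R2"),
  Sales             = c(5234L, 223L, 34L, 1634L, 224L, 5234L, 223L)
)

order_time <- ProductTable %>%
  # ISOdate() builds a date-time from year, month and day; as.Date() drops the time.
  mutate(OrderDate = as.Date(ISOdate(OrderPostingYear,
                                     OrderPostingMonth,
                                     OrderPostingDate))) %>%
  group_by(ProductsFamily) %>%
  summarise(OrderStart = min(OrderDate),
            OrderEnd   = max(OrderDate),
            SumSales   = sum(Sales),
            .groups    = "drop")

print(order_time)
```

Working with Date values rather than "2008/1/20"-style strings means the comparison is chronological, so min and max do not depend on the row order.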
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 6 years ago.
I have the following data frame:
Event Scenario Year Cost
1 1 1 10
2 1 1 5
3 1 2 6
4 1 2 6
5 2 1 15
6 2 1 12
7 2 2 10
8 2 2 5
9 3 1 4
10 3 1 5
11 3 2 6
12 3 2 5
I need to produce a pivot table or data frame that sums the total cost per year for each scenario. So the result will be:
Scenario Year Cost
1 1 15
1 2 12
2 1 27
2 2 15
3 1 9
3 2 11
I need to produce a ggplot line graph that plot the cost of each scenario per year. I know how to do that, I just can't get the right data frame.
Try
library(dplyr)
df %>% group_by(Scenario, Year) %>% summarise(Cost=sum(Cost))
Or
library(data.table)
setDT(df)[, list(Cost=sum(Cost)), by=list(Scenario, Year)]
Or
aggregate(Cost~Scenario+Year, df,sum)
data
df <- structure(list(Event = 1:12, Scenario = c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L), Year = c(1L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 2L), Cost = c(10L, 5L, 6L, 6L, 15L, 12L,
10L, 5L, 4L, 5L, 6L, 5L)), .Names = c("Event", "Scenario", "Year",
"Cost"), class = "data.frame", row.names = c(NA, -12L))
The following does it:
library(plyr)
ddply(df, .(Scenario, Year), summarize, Cost = sum(Cost))
#Scenario Year Cost
#1 1 1 15
#2 1 2 12
#3 2 1 27
#4 2 2 15
#5 3 1 9
#6 3 2 11
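If a pivot-table layout is preferred over the long result, with one row per scenario and one column per year, base R's tapply can produce it directly. This is a sketch; cost_wide is a made-up name and the data frame is rebuilt here from the table in the question.

```r
# Same values as the question's data frame.
df <- data.frame(
  Event    = 1:12,
  Scenario = rep(1:3, each = 4),
  Year     = rep(rep(1:2, each = 2), times = 3),
  Cost     = c(10L, 5L, 6L, 6L, 15L, 12L, 10L, 5L, 4L, 5L, 6L, 5L)
)

# Sum Cost for every Scenario/Year combination; the result is a matrix
# with scenarios as rows and years as columns.
cost_wide <- tapply(df$Cost, list(Scenario = df$Scenario, Year = df$Year), sum)

print(cost_wide)
```

For the ggplot line graph mentioned in the question the long format from the answers above is the more convenient input; the wide matrix is mainly useful for reading the totals side by side, or for a quick base plot such as matplot(t(cost_wide), type = "l"), which draws one line per scenario.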
I have a dataframe in long form for which I need to aggregate several observations taken on a particular day.
Example data:
long <- structure(list(Day = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("1", "2"), class = "factor"),
Genotype = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), View = structure(c(1L,
2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L), .Label = c("1",
"2", "3"), class = "factor"), variable = c(1496L, 1704L,
1738L, 1553L, 1834L, 1421L, 1208L, 1845L, 1325L, 1264L, 1920L,
1735L)), .Names = c("Day", "Genotype", "View", "variable"), row.names = c(NA, -12L),
class = "data.frame")
> long
Day Genotype View variable
1 1 A 1 1496
2 1 A 2 1704
3 1 A 3 1738
4 1 B 1 1553
5 1 B 2 1834
6 1 B 3 1421
7 2 A 1 1208
8 2 A 2 1845
9 2 A 3 1325
10 2 B 1 1264
11 2 B 2 1920
12 2 B 3 1735
I need to aggregate each genotype for each day by taking the cube root of the product of each view. So for genotype A on day 1, (1496 * 1704 * 1738)^(1/3). Final dataframe would look like:
Day Genotype summary
1 1 A 1642.418
2 1 B 1593.633
3 2 A 1434.695
4 2 B 1614.790
Have been going round and round with reshape2 for the last couple of days, but not getting anywhere. Help appreciated!
I'd probably use plyr and ddply for this task:
library(plyr)
ddply(long, .(Day, Genotype), summarize,
summary = prod(variable) ^ (1/3))
#-----
Day Genotype summary
1 1 A 1642.418
2 1 B 1593.633
3 2 A 1434.695
4 2 B 1614.790
Or this with dcast from reshape2 (load it with library(reshape2) first):
dcast(data = long, Day + Genotype ~ .,
value.var = "variable", function(x) prod(x) ^ (1/3))
#-----
Day Genotype NA
1 1 A 1642.418
2 1 B 1593.633
3 2 A 1434.695
4 2 B 1614.790
Another solution, without additional packages:
aggregate(list(Summary = long$variable), by = list(Day = long$Day, Genotype = long$Genotype), function(x) prod(x)^(1/length(x)))
Day Genotype Summary
1 1 A 1642.418
2 2 A 1434.695
3 1 B 1593.633
4 2 B 1614.790
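The cube root of a product of three values is their geometric mean, which can also be written as exp(mean(log(x))). That form avoids building one very large product, which could matter if there were many more views per group. The following is a sketch using the formula interface of aggregate; geo_mean and result are made-up names, and it assumes all values are positive.

```r
# Same values as the question's data frame.
long <- data.frame(
  Day      = factor(rep(1:2, each = 6)),
  Genotype = factor(rep(rep(c("A", "B"), each = 3), times = 2)),
  View     = factor(rep(1:3, times = 4)),
  variable = c(1496L, 1704L, 1738L, 1553L, 1834L, 1421L,
               1208L, 1845L, 1325L, 1264L, 1920L, 1735L)
)

# Geometric mean computed on the log scale (valid for positive values only).
geo_mean <- function(x) exp(mean(log(x)))

# One geometric mean per Day/Genotype combination.
result <- aggregate(variable ~ Day + Genotype, data = long, FUN = geo_mean)

print(result)
```

For groups of exactly three views this should agree with prod(x)^(1/3) up to floating-point rounding.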