I need help accessing a value in a tibble by name - r

I need to use the value from count() by name and not by position due to the dynamic nature of the source data.
I am trying to estimate the labor cost to re-ip devices based on existing ip assignment state.
Example:
a device with a state of Active will be $40.00
a device with a state of ActiveReservation will be $100.00
For the first example below:
6,323 * $10
For the second example below:
9 * $10
I can get them by
temp_dhcp_count$quantity[1] * 10
However, I can't guarantee that position [1] is always "Active"; I need to be able to refer to it by the name "Active".
My assumption was, if I could extract them to values I could:
> Active = 6323
> Active * 10
[1] 63230
vs
temp_dhcp_count$quantity[1] * 10
For example:
> temp_dhcp_count
# A tibble: 5 x 2
# Groups: AddressState [5]
AddressState quantity
<chr> <int>
1 Active 6323
2 ActiveReservation 1222
3 Declined 10
4 Expired 12
5 InactiveReservation 287
> temp_dhcp_count$quantity[1]
[1] 6323
and
> temp_dhcp_count
# A tibble: 3 x 2
# Groups: AddressState [3]
AddressState quantity
<chr> <int>
1 Active 9
2 ActiveReservation 46
3 InactiveReservation 642
> temp_dhcp_count$quantity[1]
[1] 9
I tried asking how to extract rows from a tibble as key value pairs and now I am trying to ask this way based on feedback.
How do you change the output of count from a tibble to Name Value pairs?
The source data is a tsv that I import and select based on subnet and count by state.
library(tidyverse)
library(ipaddress)
dhcp <- read_delim("dhcpmerge.tsv.txt",
delim = "\t", escape_double = FALSE,
trim_ws = TRUE)
dhcp <- distinct(dhcp)
network_in_review = "10.75.0.0/16"
temp_dhcp <- dhcp %>%
select(IPAddress, AddressState, HostName) %>%
filter(is_within(ip_address(IPAddress), ip_network(network_in_review)))
temp_dhcp %>%
group_by(AddressState) %>%
count(name = "quantity") -> temp_dhcp_count
temp_dhcp_count
After more digging, piping the count through
deframe() %>% as.list()
works as well.
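A minimal sketch of the deframe() approach, assuming an ungrouped two-column tibble built from the sample values in the question (the real object comes from group_by() and count(), so it may need ungroup() first):

```r
library(tibble)

# Sample counts (values from the question); ungrouped for simplicity.
temp_dhcp_count <- tibble(
  AddressState = c("Active", "ActiveReservation", "Declined"),
  quantity     = c(6323L, 1222L, 10L)
)

# deframe() turns a two-column tibble into a named vector (names from
# column 1, values from column 2); as.list() makes it a named list.
vals <- as.list(deframe(temp_dhcp_count))

active_cost <- vals$Active * 10
active_cost
# [1] 63230
```

The lookup is by name, so it does not matter which row "Active" ends up in.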

You can create a named list. With the sample data
temp_dhcp_count <- read.table(text="
AddressState quantity
Active 6323
ActiveReservation 1222
Declined 10
Expired 12
InactiveReservation 287", header=TRUE)
You can create a named list of values to extract them by name
vals <- with(temp_dhcp_count, setNames(as.list(quantity), AddressState))
vals$Active
# [1] 6323
vals$Declined
# [1] 10
And if the vals$ part bothers you, you can use with() again
with(vals, {
Active * 10 - Declined * 2
})
# [1] 63210
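One caveat with the named list: a state that is absent from the data (as "Declined" is in the question's second example) has no entry, so an expression such as Active * 10 - Declined * 2 inside with() would fail. A minimal sketch of a guard, where qty_or_zero() is a hypothetical helper name and not part of any package:

```r
# Counts from the second example in the question (no "Declined" row).
temp_dhcp_count <- data.frame(
  AddressState = c("Active", "ActiveReservation", "InactiveReservation"),
  quantity     = c(9L, 46L, 642L)
)
vals <- with(temp_dhcp_count, setNames(as.list(quantity), AddressState))

# Hypothetical helper: return 0 when a state is missing from the list.
# [[ ]] on a list returns NULL for a name that does not exist.
qty_or_zero <- function(vals, state) {
  if (is.null(vals[[state]])) 0 else vals[[state]]
}

active_qty   <- qty_or_zero(vals, "Active")    # 9
declined_qty <- qty_or_zero(vals, "Declined")  # 0
active_qty * 10 - declined_qty * 2
# [1] 90
```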

If I'm understanding the goal, you can make a table of prices, then merge it in to temp_dhcp_count as needed:
library(tidyverse)
prices <- tribble(
~ AddressState, ~ price,
"Active", 40,
"ActiveReservation", 100,
"Declined", 50,
"Expired", 50,
"InactiveReservation", 120
)
temp_dhcp_count %>%
left_join(prices, by = "AddressState") %>%
mutate(total = quantity * price)
# # A tibble: 5 x 4
# AddressState quantity price total
# <chr> <dbl> <dbl> <dbl>
# 1 Active 6323 40 252920
# 2 ActiveReservation 1222 100 122200
# 3 Declined 10 50 500
# 4 Expired 12 50 600
# 5 InactiveReservation 287 120 34440
This will work regardless of the order of AddressState in temp_dhcp_count.
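If a single overall estimate is wanted, the joined table can be collapsed with summarise(). A minimal sketch, assuming the price table from this answer and the counts from the question's second example (where two states are absent):

```r
library(dplyr)
library(tibble)

prices <- tribble(
  ~AddressState,         ~price,
  "Active",              40,
  "ActiveReservation",   100,
  "Declined",            50,
  "Expired",             50,
  "InactiveReservation", 120
)

# Counts from the second example in the question.
temp_dhcp_count <- tibble(
  AddressState = c("Active", "ActiveReservation", "InactiveReservation"),
  quantity     = c(9L, 46L, 642L)
)

# States missing from the counts simply contribute nothing to the total.
total_cost <- temp_dhcp_count %>%
  left_join(prices, by = "AddressState") %>%
  summarise(total_cost = sum(quantity * price)) %>%
  pull(total_cost)

total_cost
# [1] 82000
```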

Related

Read table from PDF with partially filled column using Pdftools

I've written a function in R using pdftools to read a table from a pdf. The function gets the job done, but unfortunately the table contains a column for notes, which is only partially filled. As a result the data in the resulting table is shifted by one column in the row containing a note.
Here's the table.
And here's the code:
# load library
library(pdftools)
library(stringr) # provides str_split_fixed()
# link to report
url <- "https://www.rymanhealthcare.co.nz/hubfs/Investor%20Centre/Financial/Half%20year%20results%202022/Ryman%20Healthcare%20Limited%20-%20Announcement%20Numbers%20and%20financial%20statements%20-%2030%20September%202022.pdf"
# read data through pdftool
data <- pdf_text(url)
# create a function to read the pdfs
scrape_pdf <- function(list_of_tables,
table_number,
number_columns,
column_names,
first_row,
last_row) {
data <- list_of_tables[table_number]
data <- trimws(data)
data <- strsplit(data, "\n")
data <- data[[1]]
data <- data[min(grep(first_row, data)):
max(grep(last_row, data))]
data <- str_split_fixed(data, " {2,}", number_columns)
data <- data.frame(data)
names(data) <- column_names
return(data)
}
names <- c("","6m 30-9-2022","6m 30-9-2021","12m 30-3-2022")
output <- scrape_pdf(data,3,5,names,"Care fees","Basic and diluted")
And the output.
6m 30-9-2022 6m 30-9-2021 12m 30-3-2022 NA
1 Care fees 210,187 194,603 398,206
2 Management fees 59,746 50,959 105,552
3 Interest received 364 42 41
4 Other income 3,942 2,260 4,998
5 Total revenue 274,239 247,864 508,797
6
7 Fair-value movement of
8 investment properties 3 261,346 285,143 745,885
9 Total income 535,585 533,007 1,254,682
10
11 Operating expenses (265,148) (225,380) (466,238)
12 Depreciation and
13 amortisation expenses (22,996) (17,854) (35,698)
14 Finance costs (19,355) (15,250) (30,664)
15 Impairment loss 2 (10,784) - -
16 Total expenses (318,283) (258,484) (532,600)
17
18 Profit before income tax 217,302 274,523 722,082
19 Income tax (expense) / credit (23,316) 6,944 (29,209)
20 Profit for the period 193,986 281,467 692,873
21
22 Earnings per share
23 Basic and diluted (cents per share) 38.8 56.3 138.6
How can I best circumvent this issue?
Many thanks in advance!
Although readr::read_fwf() is meant for fixed-width files, it also works quite well on text from pdftools once header and footer rows are removed. It can guess the column widths, though those can be specified explicitly too.
library(pdftools)
library(dplyr, warn.conflicts = FALSE)
url <- "https://www.rymanhealthcare.co.nz/hubfs/Investor%20Centre/Financial/Half%20year%20results%202022/Ryman%20Healthcare%20Limited%20-%20Announcement%20Numbers%20and%20financial%20statements%20-%2030%20September%202022.pdf"
data <- pdf_text(url)
scrape_pdf <- function(pdf_text_item, first_row_str, last_row_str){
lines <- unlist(strsplit(pdf_text_item, "\n"))
# remove 0-length lines
lines <- lines[nchar(lines) > 0]
lines <- lines[min(grep(first_row_str, lines)):
max(grep(last_row_str , lines))]
# paste lines back into single string for read_fwf()
paste(lines, collapse = "\n") %>%
readr::read_fwf() %>%
# re-connect strings in first column if values were split between rows
mutate(X1 = if_else(!is.na(lag(X1)) & is.na(lag(X3)), paste(lag(X1), X1), X1)) %>%
filter(!is.na(X3))
}
output <- scrape_pdf(data[3], "Care fees","Basic and diluted" )
Result:
output %>%
mutate(X1 = stringr::str_trunc(X1, 35))
#> # A tibble: 16 × 5
#> X1 X2 X3 X4 X5
#> <chr> <dbl> <chr> <chr> <chr>
#> 1 Care fees NA 210,187 194,603 398,206
#> 2 Management fees NA 59,746 50,959 105,552
#> 3 Interest received NA 364 42 41
#> 4 Other income NA 3,942 2,260 4,998
#> 5 Total revenue NA 274,239 247,864 508,797
#> 6 Fair-value movement of investmen... 3 261,346 285,143 745,885
#> 7 Total income NA 535,585 533,007 1,254,682
#> 8 Operating expenses NA (265,148) (225,380) (466,238)
#> 9 Depreciation and amortisation ex... NA (22,996) (17,854) (35,698)
#> 10 Finance costs NA (19,355) (15,250) (30,664)
#> 11 Impairment loss 2 (10,784) - -
#> 12 Total expenses NA (318,283) (258,484) (532,600)
#> 13 Profit before income tax NA 217,302 274,523 722,082
#> 14 Income tax (expense) / credit NA (23,316) 6,944 (29,209)
#> 15 Profit for the period NA 193,986 281,467 692,873
#> 16 Earnings per share Basic and dil... NA 38.8 56.3 138.6
Created on 2022-11-19 with reprex v2.0.2
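The amount columns in this result are still character strings in accounting notation. A minimal base R sketch of one way to convert them, assuming parentheses mark negative numbers and a lone "-" means no value; parse_accounting() is a hypothetical helper name, not a readr or pdftools function:

```r
# Hypothetical helper: convert accounting-style strings to numeric.
parse_accounting <- function(x) {
  x <- trimws(x)
  x[x %in% "-"] <- NA                    # a lone dash means "no value"
  negative <- grepl("^\\(.*\\)$", x)     # parentheses mark negatives
  number <- as.numeric(gsub("[(),]", "", x))  # drop parentheses and commas
  ifelse(negative, -number, number)
}

amounts <- parse_accounting(c("210,187", "(265,148)", "-", "38.8"))
amounts
# values: 210187, -265148, NA, 38.8
```

Applied to the tibble above, this could be used as mutate(across(X3:X5, parse_accounting)), assuming the column names are the defaults shown in the output.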

Efficient way to repeat operations with columns with similar name in R

I am a beginner with R and have found myself repeatedly running into a problem of this kind. Say I have a dataframe with columns:
company, shares_2010, shares_2011, ... , shares_2020, share_price_2010, ... , share_price_2020
TeslaInc 1000 1200 2000 8 40
.
.
.
I then want to go ahead and calculate the market value in each year. Ordinarily I would do it this way:
dataframe <- dataframe %>%
mutate(value_2010 = shares_2010*share_price_2010,
value_2011 = shares_2011*share_price_2011,
.
:
value_2020 = shares_2020*share_price_2020)
Clearly, all of this is rather cumbersome to type out each time and it cannot be made dynamic with respect to the number of time periods included. Is there any clever way to do these operations in one line instead? I am suspecting something may be possible to do with a combination of starts_with() and some lambda function, but I just haven't been able to figure out how to make the correct things multiply yet. Surely the tidyverse must have a better way to do this?
Any help is much appreciated!
You're right, this is a very common situation in data management.
Let's make a minimal, reproducible example:
dat <- data.frame(
company = c("TeslaInc", "Merta"),
shares_2010 = c(1000L, 1500L),
shares_2011 = c(1200L, 1100L),
shareprice_2010 = 8:7,
shareprice_2011 = c(40L, 12L)
)
dat
#> company shares_2010 shares_2011 shareprice_2010 shareprice_2011
#> 1 TeslaInc 1000 1200 8 40
#> 2 Merta 1500 1100 7 12
This dataset has two issues:
It's in a wide format. This is relatively easy to visualise for humans, but it's not ideal for data analysis. We can fix this with pivot_longer() from tidyr.
Each column actually contains two variables: measure (share or share price) and year. We can fix this with separate() from the same package.
library(tidyr)
dat_reshaped <- dat |>
pivot_longer(shares_2010:shareprice_2011) |>
separate(name, into = c("name", "year")) |>
pivot_wider(names_from = name, values_from = value)
dat_reshaped
#> # A tibble: 4 × 4
#> company year shares shareprice
#> <chr> <chr> <int> <int>
#> 1 TeslaInc 2010 1000 8
#> 2 TeslaInc 2011 1200 40
#> 3 Merta 2010 1500 7
#> 4 Merta 2011 1100 12
The last pivot_wider() is needed to have shares and shareprice as two separate columns, for ease of further calculations.
We can finally use mutate() to calculate in one go all the new values.
dat_reshaped |>
dplyr::mutate(value = shares * shareprice)
#> # A tibble: 4 × 5
#> company year shares shareprice value
#> <chr> <chr> <int> <int> <int>
#> 1 TeslaInc 2010 1000 8 8000
#> 2 TeslaInc 2011 1200 40 48000
#> 3 Merta 2010 1500 7 10500
#> 4 Merta 2011 1100 12 13200
I recommend you read this chapter of R4DS to better understand these concepts - it's worth the effort!
I think further analysis will be simpler if you reshape your data long.
Here, we can extract shares, share_price, and year from the header names using pivot_longer(). I specify that the headers should be split into two pieces at the last _, with the name (aka .value) taken from the beginning of the header (that is, shares or share_price) and placed next to the year taken from the end of the header.
Then the calculation is a simple one-liner.
library(tidyr); library(dplyr)
data.frame(company = "Tesla",
shares_2010 = 5, shares_2011 = 6,
share_price_2010 = 100, share_price_2011 = 110) %>%
pivot_longer(-company,
names_to = c(".value", "year"),
names_pattern = "(.*)_(.*)") %>%
mutate(value = shares * share_price)
# A tibble: 2 × 5
company year shares share_price value
<chr> <chr> <dbl> <dbl> <dbl>
1 Tesla 2010 5 100 500
2 Tesla 2011 6 110 660
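If the result is needed back in the original wide layout (value_2010, value_2011, and so on), the long table can be widened again. A minimal sketch, assuming the same toy data as above; the names long and wide are only for illustration:

```r
library(tidyr)
library(dplyr)

long <- data.frame(company = "Tesla",
                   shares_2010 = 5, shares_2011 = 6,
                   share_price_2010 = 100, share_price_2011 = 110) %>%
  pivot_longer(-company,
               names_to = c(".value", "year"),
               names_pattern = "(.*)_(.*)") %>%
  mutate(value = shares * share_price)

# With several values_from columns, the new names are built as
# <column>_<year>, e.g. value_2010.
wide <- long %>%
  pivot_wider(names_from = year,
              values_from = c(shares, share_price, value))

names(wide)
```

This gives one row per company again, with the value_* columns alongside the original shares_* and share_price_* columns.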
I agree with the other posts about pivoting this data into a longer format. Just to add a different approach that works well with this type of example: you can create a list of expressions and then use the splice operator !!! to evaluate these expressions within your context:
library(purrr)
library(dplyr)
library(rlang)
library(glue)
lexprs <- set_names(2010:2011, paste0("value_", 2010:2011)) %>%
map_chr(~ glue("shares_{.x} * share_price_{.x}")) %>%
parse_exprs()
df %>%
mutate(!!! lexprs)
Output
company shares_2010 shares_2011 share_price_2010 share_price_2011 value_2010
1 TeslaInc 1000 1200 8 40 8000
2 Merta 1500 1100 7 12 10500
value_2011
1 48000
2 13200
Data
Thanks to Andrea M
structure(list(company = c("TeslaInc", "Merta"), shares_2010 = c(1000L,
1500L), shares_2011 = c(1200L, 1100L), share_price_2010 = 8:7,
share_price_2011 = c(40L, 12L)), class = "data.frame", row.names = c(NA,
-2L))
How it works
With this usage, the splice operator takes a named list of expressions. The names of the list become the variable names and the expressions are evaluated in the context of your mutate statement.
> lexprs
$value_2010
shares_2010 * share_price_2010
$value_2011
shares_2011 * share_price_2011
To see how this injection will resolve, we can use rlang::qq_show:
> rlang::qq_show(df %>% mutate(!!! lexprs))
df %>% mutate(value_2010 = shares_2010 * share_price_2010, value_2011 = shares_2011 *
share_price_2011)
It is indeed likely you may need to have your data in a long format. But in case you don't, you can do this:
# thanks Andrea M!
df <- data.frame(
company=c("TeslaInc", "Merta"),
shares_2010=c(1000L, 1500L),
shares_2011=c(1200L, 1100L),
share_price_2010=8:7,
share_price_2011=c(40L, 12L)
)
years <- sub('shares_', '', grep('^shares_', names(df), value=TRUE))
for (year in years) {
df[[paste0('value_', year)]] <-
df[[paste0('shares_', year)]] * df[[paste0('share_price_', year)]]
}
If you wanted to avoid the loop (for (...) {...}) you can use this instead:
sp <- df[, paste0('shares_', years)] * df[, paste0('share_price_', years)]
names(sp) <- paste0('value_', years)
df <- cbind(df, sp)
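Both variants assume that every shares_ column has a matching share_price_ column. A minimal sketch of a guard for the loop version, using a made-up data frame in which the 2011 price is deliberately missing, so that only years present in both sets are used:

```r
# Toy data: share_price_2011 is deliberately absent.
df <- data.frame(
  company = c("TeslaInc", "Merta"),
  shares_2010 = c(1000L, 1500L),
  shares_2011 = c(1200L, 1100L),
  share_price_2010 = 8:7
)

share_years <- sub("^shares_", "", grep("^shares_", names(df), value = TRUE))
price_years <- sub("^share_price_", "", grep("^share_price_", names(df), value = TRUE))

# Only compute values for years that have both a share count and a price.
years <- intersect(share_years, price_years)

for (year in years) {
  df[[paste0("value_", year)]] <-
    df[[paste0("shares_", year)]] * df[[paste0("share_price_", year)]]
}

names(df)
```

Here only value_2010 is created; without the intersect() step the loop would stop with an error on 2011.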

Sum unique occurrences per night and create a new data frame in R

I have studied prey deliveries in a breeding owl and want to score the number of prey items delivered to the nestlings during the night. I define night as from 21 to 5. How can I make a new data frame with the number of prey items per night per location ID, based on this 24/7 observation dataset? In the new data frame, I wish to have the following columns: ID (A & B), No_prey_during_night (the sum of prey items), and Time (date, e.g. 4/6 to 5/6), with a unique row per night per ID.
The data: https://drive.google.com/file/d/1y5VCoNWZCmYbyWCktKfMSBqjOIaLeumQ/view?usp=sharing. I have done it in Excel so far, but that is very time-consuming. I would be happy to get help with a simple script I could use in R.
To take into account the fact that a night begins and ends on different dates, you could first assign all the morning hours to the prior day. The final label (the Time column in your question) then includes the next day. If the year of the data collection has a Feb 29, make sure the year is correct (I used 2022).
library(dplyr)
library(lubridate)
read.csv("Tot_prey_example.csv") %>%
mutate(time = make_datetime(year = 2022, month = Month, day = Day, hour = Hour),
night_time = if_else(between(Hour, 0, 5), time - days(1), time),
night_date = floor_date(night_time, unit = "day"),
night = Hour <= 5 | Hour >= 21) %>%
filter(night) %>%
group_by(ID, night_date) %>%
summarise(No_prey_during_night = sum(n), .groups = "drop") %>%
mutate(next_day = night_date + days(1),
Time = glue::glue("{day(night_date)}/{month(night_date)} to {day(next_day)}/{month(next_day)}")) %>%
select(ID, No_prey_during_night, Time)
#> # A tibble: 88 × 3
#> ID No_prey_during_night Time
#> <chr> <int> <glue>
#> 1 A 12 4/6 to 5/6
#> 2 A 22 5/6 to 6/6
#> 3 A 20 6/6 to 7/6
#> 4 A 14 7/6 to 8/6
#> 5 A 14 8/6 to 9/6
#> 6 A 27 9/6 to 10/6
#> 7 A 22 10/6 to 11/6
#> 8 A 18 11/6 to 12/6
#> 9 A 22 12/6 to 13/6
#> 10 A 25 13/6 to 14/6
#> # … with 78 more rows
Created on 2022-05-18 by the reprex package (v2.0.1)
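The key step, assigning the early-morning hours to the previous date, can be checked on a few made-up rows. A minimal base R sketch, assuming the same column names as the CSV (ID, Month, Day, Hour, n) and the year 2022; the observations themselves are invented:

```r
# Invented observations for one nest over two evenings.
obs <- data.frame(
  ID    = "A",
  Month = 6,
  Day   = c(4, 4, 5, 5, 5),
  Hour  = c(20, 22, 2, 12, 23),
  n     = c(1, 2, 3, 4, 6)
)

obs$date <- as.Date(sprintf("2022-%02d-%02d", obs$Month, obs$Day))

# Hours 0-5 belong to the night that started on the previous date.
obs$night_date <- obs$date - ifelse(obs$Hour <= 5, 1, 0)

# Keep night-time rows only, then sum prey per ID and night.
night <- obs[obs$Hour >= 21 | obs$Hour <= 5, ]
res <- aggregate(n ~ ID + night_date, data = night, FUN = sum)
res
```

The 22:00 row on 4 June and the 02:00 row on 5 June end up in the same night (2 + 3 = 5 prey items), while the 20:00 and 12:00 rows are dropped.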
You can do something like this:
library(dplyr)
library(lubridate)
read.csv("Tot_prey_example.csv") %>%
# create initial datetime variable, `night`
mutate(night = lubridate::make_datetime(2021, Month,Day,Hour)) %>%
# filter to nighttime hours
filter(Hour>=21 | Hour<=5) %>%
# flip datetime variable to the next day if hour is >=21
mutate(night = if_else(Hour>=21,night + 60*60*24, night)) %>%
# now group by the date part of `night`
group_by(ID,Night_No = as.Date(night)) %>%
# summarize the sum of prey
summarize(
No_prey_during_night = sum(n),
No_deliveries_during_night = sum(PreyDelivery)
) %>%
# replace the Night_No with a character variable showing both dates
mutate(Night_No = paste0(Night_No-1, "-", Night_No))
Output:
# A tibble: 88 × 4
# Groups: ID [2]
ID Night_No No_prey_during_night No_deliveries_during_night
<chr> <chr> <int> <int>
1 A 2021-06-04-2021-06-05 12 5
2 A 2021-06-05-2021-06-06 22 6
3 A 2021-06-06-2021-06-07 20 5
4 A 2021-06-07-2021-06-08 14 6
5 A 2021-06-08-2021-06-09 14 5
6 A 2021-06-09-2021-06-10 27 5
7 A 2021-06-10-2021-06-11 22 4
8 A 2021-06-11-2021-06-12 18 6
9 A 2021-06-12-2021-06-13 22 6
10 A 2021-06-13-2021-06-14 25 5
# … with 78 more rows

How do I find the clickthrough rate using dbplyr in R?

Here is the given code:
library(RSQLite)
library(DBI)
library(dplyr) # copy_to(), group_by(), summarise()
library(readr) # read_csv()
sqcon<-dbConnect(dbDriver("SQLite"), "data/sqlite.db")
events <- read_csv("events_log.csv")
sqevents <- copy_to(sqcon, events)
sqevents
The sqevents dataframe is like this:
## # Source: table<events> [?? x 9]
## # Database: sqlite 3.35.5 [C:\Users\James\Documents\Work\2021 Sem2\Stats
## # 369\lab4\Data\sqlite.db]
## uuid timestamp session_id group action checkin page_id n_results
## <chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl>
## 1 00000736167~ 2.02e13 78245c2c3f~ b searchR~ NA cbeb66d1~ 5
## 2 00000c69fe3~ 2.02e13 c559c3be98~ a searchR~ NA eb658e87~ 10
## 3 00003bfdab7~ 2.02e13 760bf89817~ a checkin 30 f99a9fc1~ NA
## 4 0000465cd7c~ 2.02e13 fb905603d3~ a checkin 60 e5626962~ NA
## 5 000050cbb4e~ 2.02e13 c2bf5e5172~ a checkin 30 787dd6a4~ NA
## 6 0000a6af2ba~ 2.02e13 f6840a9614~ a checkin 180 6fb7b9ea~ NA
## 7 0000cd61e11~ 2.02e13 51f4d3b6a8~ a checkin 240 8ad97e7c~ NA
## 8 000104fe220~ 2.02e13 485eabe537~ b searchR~ NA 4da9a642~ 15
## 9 00012e37b74~ 2.02e13 91174a537d~ a checkin 180 dfdff179~ NA
## 10 000145fbe69~ 2.02e13 a795756dba~ b checkin 150 ec0bad00~ NA
## # ... with more rows, and 1 more variable: result_position <dbl>
I want to find the clickthrough rate which is the proportion of session_id that have action=="visitPage"
My current code is this:
sqevents %>% group_by(session_id) %>%
summarise(clickthrough = sum(action=="visitPage")) %>% filter(clickthrough=="0") %>% collect()
However this doesn't return anything:
## # A tibble: 0 x 2
## # ... with 2 variables: session_id <chr>, clickthrough <lgl>
What did I do wrong? And how do I fix this?
Perhaps the "0" needs to be unquoted, since sum() in the previous step returns a numeric summary. Also, if there are NA elements, specify na.rm = TRUE in sum(); otherwise any missing value in the column makes the sum NA, because na.rm = FALSE by default.
library(dplyr)
sqevents %>%
group_by(session_id) %>%
summarise(clickthrough = sum(action=="visitPage", na.rm = TRUE)) %>%
filter(clickthrough == 0) %>%
collect()
Another possibility is that every 'session_id' has at least one 'visitPage', in which case the filter step correctly returns 0 rows.
From your description "[...] which is the proportion of session_id that have action=="visitPage" [...]", you may be making a mistake further down the pipe by using sum(). One way to calculate the proportion you describe is:
library(dplyr)
sqevents %>%
dplyr::group_by(session_id) %>%
# check if a session has at least one "visitPage" (true or false = 1 or 0)
dplyr::summarise(yn = any(action == "visitPage")) %>%
# build a mean from that to get the proportion
dplyr::summarise(prop = mean(yn))
# and collect if you like
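The logic can be checked on a small local data frame before running it against the database. A minimal sketch with invented sessions; it runs in memory, and whether any() is translated to SQL for a given backend is an assumption that should be verified separately:

```r
library(dplyr)

# Invented events: only session s1 contains a "visitPage" action.
events <- tibble(
  session_id = c("s1", "s1", "s2", "s3", "s3"),
  action     = c("searchResultPage", "visitPage",
                 "searchResultPage", "searchResultPage", "checkin")
)

rate <- events %>%
  group_by(session_id) %>%
  # TRUE if the session has at least one click-through
  summarise(clicked = any(action == "visitPage")) %>%
  # the mean of a logical column is the proportion of TRUE values
  summarise(clickthrough_rate = mean(clicked))

rate$clickthrough_rate
```

One of the three sessions has a "visitPage" event, so the rate here is one third.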

Creating a new Data.Frame from variable values

I am currently working on a task that requires me to query a list of stocks from an sql db.
The problem is that it is a list with 1:n stocks traded per date. I want to calculate the share of each stock in the portfolio on a given day (see example) and pass it to a new data frame. In other words, date x occurs two times (once for stock A and once for stock B), and I want to combine these rows so that date x occurs only once, with the new values.
'data.frame': 1010 obs. of 5 variables:
$ ID : int 1 2 3 4 5 6 7 8 9 10 ...
$ Date : Date, format: "2019-11-22" "2019-11-21" "2019-11-20" "2019-11-19" ...
$ Close: num 52 51 50.1 50.2 50.2 ...
$ Volume : num 5415 6196 3800 4784 6189 ...
$ Stock_ID : Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
RawInput<-data.frame(Date=c("2017-22-11","2017-22-12","2017-22-13","2017-22-11","2017-22-12","2017-22-13","2017-22-11"), Close=c(50,55,56,10,11,12,200),Volume=c(100,110,150,60,70,80,30),Stock_ID=c(1,1,1,2,2,2,3))
RawInput$Stock_ID<-as.factor(RawInput$Stock_ID)
*cannot transfer the date to a date variable in this example
I would like to have a new dataframe that generates the Value traded per day, the weight of each stock, and the daily returns per day, while keeping the number of stocks variable.
I hope I translated the issue properly so that I can receive help.
Thank you!
I think the easiest way to do this would be to use the dplyr package. You may need to read some documentation, but the mutate() and group_by() functions may be able to do what you want. mutate() lets you modify the current data frame by either adding a new column or changing the existing data.
Let's start with a reproducible dataset:
RawInput<-data.frame(Date=c("2017-22-11","2017-22-12","2017-22-13","2017-22-11","2017-22-12","2017-22-13","2017-22-11"),
Close=c(50,55,56,10,11,12,200),
Volume=c(100,110,150,60,70,80,30),
Stock_ID=c(1,1,1,2,2,2,3))
RawInput$Stock_ID<-as.factor(RawInput$Stock_ID)
library(magrittr)
library(dplyr)
dat2 <- RawInput %>%
group_by(Date, Stock_ID) %>% #this example only has one stock type but i imagine you want to group by stock
mutate(CloseMean=mean(Close),
CloseSum=sum(Close),
VolumeMean=mean(Volume),
VolumeSum=sum(Volume)) #what ever computation you need to do with
#multiple stock values for a given date goes here
dat2 %>% select(Stock_ID, Date, CloseMean, CloseSum, VolumeMean,VolumeSum) %>% distinct() #dat2 will still be the same size as dat, thus use the distinct() function to reduce it to unique values
# A tibble: 7 x 6
# Groups: Date, Stock_ID [7]
Stock_ID Date CloseMean CloseSum VolumeMean VolumeSum
<fct> <fct> <dbl> <dbl> <dbl> <dbl>
1 1 2017-22-11 50 50 100 100
2 1 2017-22-12 55 55 110 110
3 1 2017-22-13 56 56 150 150
4 2 2017-22-11 10 10 60 60
5 2 2017-22-12 11 11 70 70
6 2 2017-22-13 12 12 80 80
7 3 2017-22-11 200 200 30 30
The data set you provided has only one row per Stock_ID and Date combination, so nothing was actually aggregated. However, if you drop Stock_ID from the grouping, you can see how this would work:
dat2 <- RawInput %>%
group_by(Date) %>%
mutate(CloseMean=mean(Close),
CloseSum=sum(Close),
VolumeMean=mean(Volume),
VolumeSum=sum(Volume))
dat2 %>% select(Date, CloseMean, CloseSum, VolumeMean,VolumeSum) %>% distinct()
# A tibble: 3 x 5
# Groups: Date [3]
Date CloseMean CloseSum VolumeMean VolumeSum
<fct> <dbl> <dbl> <dbl> <dbl>
1 2017-22-11 86.7 260 63.3 190
2 2017-22-12 33 66 90 180
3 2017-22-13 34 68 115 230
After reading your first reply: you will have to be specific about how you are trying to calculate the weight, and define your end result.
I'm going to assume the weight is just each stock's percentage of the total closing price, and that the end result should show the weight per stock for each date. In other words, a matrix of dates and stock IDs.
library(tidyr)
RawInput %>%
group_by(Date) %>%
mutate(weight=Close/sum(Close)) %>%
select(Date, weight, Stock_ID) %>%
spread(key = "Stock_ID", value = "weight", fill = 0)
# A tibble: 3 x 4
# Groups: Date [3]
Date `1` `2` `3`
<fct> <dbl> <dbl> <dbl>
1 2017-22-11 0.192 0.0385 0.769
2 2017-22-12 0.833 0.167 0
3 2017-22-13 0.824 0.176 0
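The question also mentions the value traded per day. A minimal sketch of a variant that weights by traded value instead, assuming traded value means Close * Volume (an assumption, since the question does not define it) and using pivot_wider(), the newer tidyr replacement for spread():

```r
library(dplyr)
library(tidyr)

RawInput <- data.frame(
  Date = c("2017-22-11", "2017-22-12", "2017-22-13",
           "2017-22-11", "2017-22-12", "2017-22-13", "2017-22-11"),
  Close = c(50, 55, 56, 10, 11, 12, 200),
  Volume = c(100, 110, 150, 60, 70, 80, 30),
  Stock_ID = c(1, 1, 1, 2, 2, 2, 3)
)

weights <- RawInput %>%
  mutate(traded_value = Close * Volume) %>%          # assumed definition
  group_by(Date) %>%
  mutate(weight = traded_value / sum(traded_value)) %>%
  ungroup() %>%
  select(Date, Stock_ID, weight) %>%
  # one column per stock; stocks not traded on a date get weight 0
  pivot_wider(names_from = Stock_ID, values_from = weight, values_fill = 0)

weights
```

Each row then holds one date, and the weights across the stock columns sum to 1.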
