This question already has answers here:
Filling missing dates by group
(3 answers)
Fastest way to add rows for missing time steps?
(4 answers)
Closed 5 years ago.
I have a data frame like the following and would like to pad the dates.
Notice that four days are missing for id 3.
df = data.frame(
id = c(1,1,1,2,2,3,3,3),
date = lubridate::ymd("2017-01-01","2017-01-02","2017-01-03",
"2017-05-10","2017-05-11","2017-01-03",
"2017-01-08","2017-01-09"),
type = c("A","A","A","B","B","C","C","C"),
val1 = rnorm(8),
val2 = rnorm(8))
df
I tried the padr package as I wanted a quick solution, but this doesn't seem to work.
?pad
padr::pad(df)
library(dplyr)
df %>% padr::pad(group = c('id'))
df %>% padr::pad(group = c('id','date'))
Any ideas on tools or other packages to pad a dataset across multiple columns, based on groupings?
EDIT:
So for id 3 the dates present in my df are
"2017-01-03","2017-01-08","2017-01-09"
Thus, I want the final data to include four extra rows for id 3, for the dates
"2017-01-04","2017-01-05","2017-01-06","2017-01-07"
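As a dependency-free alternative to padr, here is a minimal base-R sketch (not taken from the linked answers) that builds the complete daily sequence for each id and left-joins the original rows onto it. It assumes that type is constant within each id, so it can be copied onto the new rows; val1 and val2 stay NA on the padded days.

```r
set.seed(42)  # val1/val2 are random; only the structure matters here
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3),
  date = as.Date(c("2017-01-01", "2017-01-02", "2017-01-03",
                   "2017-05-10", "2017-05-11", "2017-01-03",
                   "2017-01-08", "2017-01-09")),
  type = c("A", "A", "A", "B", "B", "C", "C", "C"),
  val1 = rnorm(8),
  val2 = rnorm(8)
)

# For each id, build every day between its first and last date.
# Assumption: type does not change within an id, so its first value is copied.
grid <- do.call(rbind, lapply(split(df, df$id), function(g) {
  data.frame(id   = g$id[1],
             date = seq(min(g$date), max(g$date), by = "day"),
             type = g$type[1])
}))

# Left join: days with no original row get NA in val1 and val2
padded <- merge(grid, df, by = c("id", "date", "type"), all.x = TRUE)
padded <- padded[order(padded$id, padded$date), ]
rownames(padded) <- NULL
padded
```

For id 3 this gives seven rows (2017-01-03 through 2017-01-09), four of which are new.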
This question already has answers here:
Reshape multiple value columns to wide format
(5 answers)
Closed 4 months ago.
I have a data frame in which data for different countries are stacked vertically in this pattern:
country | time | value
I want to transform it into a data frame in which each row is a specific time period and each column holds the values for one country. The data are monthly.
time | countryA-value | countryB-value | countryC-value
Moreover, not all periods are present: when data are missing, the row is simply absent rather than filled with NA or similar. I thought of two possible solutions, but they seem too complicated and inefficient. I have not written the code, so I only describe the ideas.
If a value in the "time" column is more than one month after the value above it, while the cells to its left are the same (i.e., the rows belong to the same country), then there is a gap. I would have to fill that gap and repeat this until all missing dates are included.
At this point every country has the same number of observations, so I can simply copy blocks of that many cells into one column per country.
Drawback: it does not seem very efficient.
Alternatively, I could create a vector of all time periods (month-end dates) using the command
allDates <- seq.Date(from = as.Date('2020-02-01'), to = as.Date('2021-01-01'), by = 'month') - 1
Then, for each country's subset of the table, I look up every period in allDates: if a value exists, I copy it; if not, I fill in NA.
Drawback: I have no idea which function I could use for this purpose.
Below is the code to create a small example table, data2, with two missing rows:
data <- data.frame(matrix(NA, 24, 3))
colnames(data) <- c("date", "country", "value")
data["date"] <- rep((seq.Date(from = as.Date('2020-02-01'), to = as.Date('2021-01-01'), by = 'month')-1), 2)
data["country"] <- rep(c("US", "CA"), each = 12)
data["value"] <- round(runif(24, 0, 1), 2)
data2 <- data[c(-4,-5),]
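For the second approach, base R's match() is one function that can do the lookup: it returns the position of each calendar date within a country's rows, or NA when that date is absent. A minimal sketch, assuming data2 has the date, country and value columns built above (the snippet rebuilds it so it runs on its own):

```r
set.seed(1)
allDates <- seq.Date(from = as.Date("2020-02-01"), to = as.Date("2021-01-01"),
                     by = "month") - 1   # month-end dates, Jan to Dec 2020

data <- data.frame(date    = rep(allDates, 2),
                   country = rep(c("US", "CA"), each = 12),
                   value   = round(runif(24, 0, 1), 2))
data2 <- data[c(-4, -5), ]   # drop two US months

# One row per calendar date, one column per country
wide <- data.frame(date = allDates)
for (ctry in unique(data2$country)) {
  sub <- data2[data2$country == ctry, ]
  # match() gives NA for dates with no row, so the looked-up value is NA too
  wide[[ctry]] <- sub$value[match(allDates, sub$date)]
}
wide
```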
I solved the problem by following the suggestion of r2evans: I checked the function dcast and obtained exactly what I wanted.
I used the code
reshape2::dcast(dataFrame, yearMonth ~ country, fill = NA)
Here dataFrame is the name of the data frame, yearMonth is the column holding the date, and country is the column holding the country.
The option fill = NA fills all gaps in the data with NA.
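If reshape2 is not available, base R's reshape() with direction = "wide" gives a similar result: combinations missing from the long data come out as NA. This sketch assumes the data2 example from the question (rebuilt here), so the columns are date, country and value rather than yearMonth; reshape() prefixes the new columns with the value column's name, e.g. value.US.

```r
set.seed(1)
allDates <- seq.Date(as.Date("2020-02-01"), as.Date("2021-01-01"), by = "month") - 1
data <- data.frame(date    = rep(allDates, 2),
                   country = rep(c("US", "CA"), each = 12),
                   value   = round(runif(24, 0, 1), 2))
data2 <- data[c(-4, -5), ]

# Long to wide: one row per date, one value.<country> column per country
wide <- reshape(data2, idvar = "date", timevar = "country", direction = "wide")
# Rows follow the first appearance of each date, so sort them chronologically
wide <- wide[order(wide$date), ]
rownames(wide) <- NULL
wide
```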
This question already has answers here:
Convert integer as "20160119" to different columns of "day" "year" "month"
(5 answers)
Closed 2 years ago.
I have a dataframe:
df <- data.frame(year = c(200501:200512))
and I want to split the column into two after the 4th digit, so that my data frame looks like this:
df_split <- data.frame(year = 2005, month = 1:12)
This is not so much a question about data frames, but about vectors in R in general. If your actual problem contains more nuances, then update your question or post a new question.
If your vector year is numeric (as asked), you can do simple arithmetic:
year0 <- 200501:200512
year <- as.integer(year0 / 100) # examine the result of `year0 / 100` to see why I use `as.integer` at all and not `round`.
month <- year0 - year * 100  # scale the year back up before subtracting, e.g. 200501 - 200500 = 1
Edit: As nicola pointed out, we can get exactly the same result with integer division (%/%) and the modulo operator (%%):
year <- year0 %/% 100
month <- year0 %% 100
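Applied to the data frame from the question, the integer-division version produces df_split directly. A short sketch:

```r
df <- data.frame(year = 200501:200512)

df_split <- data.frame(
  year  = df$year %/% 100,  # integer division drops the last two digits: 2005
  month = df$year %% 100    # remainder keeps the last two digits: 1 to 12
)
df_split
```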
Alternatively, a tidyr version may be more compact:
library(tidyr)
df %>% separate(year, into = c("yr", "mth"), sep = 4, convert = TRUE)
This question already has answers here:
Adding a column of means by group to original data [duplicate]
(4 answers)
Closed 2 years ago.
I have a table with 2 columns.
Type: 1, 2, 3 or 4
Data: the corresponding values (there are multiple values for each type)
Now I want to create a third column that contains the mean of Data for each type, i.e., all rows with type 1 have the same mean value. I think I should do it with the mutate function, but I am not sure how to proceed.
data %>% mutate(meanData = ifelse(...))
Can somebody help?
Thank you in advance.
We can do a grouped operation with group_by():
library(dplyr)
data <- data %>%
group_by(Type) %>%
mutate(meanData = mean(Data))
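Without dplyr, base R's ave() does the same job: it applies a function within each group and returns a vector as long as the input, so it can be assigned straight to a new column. The values below are made up for illustration; Type and Data are the column names from the question.

```r
# Hypothetical example data with the question's column names
data <- data.frame(Type = c(1, 1, 2, 2, 2, 3),
                   Data = c(2, 4, 1, 2, 3, 10))

# ave() recycles each group's mean back onto every row of that group
data$meanData <- ave(data$Data, data$Type, FUN = mean)
data
```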
This question already has answers here:
Calculate the mean by group
(9 answers)
Closed 3 years ago.
I have data that includes a treatment group, which is indicated by a 1, and a control group, which is indicated by a 0. This is all contained under the variable treat_invite. How can I separate these and take the mean of pct_missing for the 1's and 0's? I've attached an image for clarification.
Assuming your data frame is called df (and dplyr is loaded):
df <- df %>% group_by(treat_invite) %>% mutate(MeanPCTMissing = mean(PCT_missing))
Or, if you want to just have the summary table (rather than the original table with an additional column):
df <- df %>% group_by(treat_invite) %>% summarise(MeanPCTMissing =
mean(PCT_missing))
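For the summary-table case, base R's tapply() returns one mean per value of treat_invite. The data frame here is hypothetical and uses the column names from the answer above; check whether your column is PCT_missing or pct_missing, since R names are case-sensitive.

```r
# Hypothetical data: 1 = treatment, 0 = control
df <- data.frame(treat_invite = c(1, 0, 1, 0, 1),
                 PCT_missing  = c(0.2, 0.1, 0.4, 0.3, 0.3))

# Mean of PCT_missing for each treat_invite group; result is named "0" and "1"
group_means <- tapply(df$PCT_missing, df$treat_invite, mean)
group_means
```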
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 4 years ago.
I have a CSV file with 4 columns. I would like to sum the values in the 4th column for rows that share the same value in the 3rd column.
The data that I have look like this:
Now I want to aggregate the above data like this:
Can anyone help me with some ideas?
Using aggregate would do the trick here. Below I'm summing the value variable using id as the group (note that ids 6 and 10 repeat).
df <- data.frame(id = c(1,2,3,4,5,6,6,7,8,9,10,10),
value = c(9,5,6,8,4,3,2,5,3,5,1,2))
df_sum <- aggregate(value ~ id, data=df, FUN=sum)
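Since the question refers to the 3rd and 4th columns of a CSV file rather than to column names, the same idea can also be written by position. In this sketch the inline CSV text and its column names (a, b, key, amount) are hypothetical stand-ins for the real file; with a file on disk you would pass its path to read.csv() instead of using text =.

```r
# Hypothetical CSV content; only column 3 (group) and column 4 (value to sum) matter
csv_text <- "a,b,key,amount
x,1,6,3
y,2,6,2
z,3,10,1
w,4,10,2
v,5,7,5"
dat <- read.csv(text = csv_text)

# Sum column 4 within each distinct value of column 3
sums <- aggregate(dat[[4]], by = list(key = dat[[3]]), FUN = sum)
sums
```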