I am trying to find out which two values of a date vector a specific date falls between. I am not sure how else to explain it, so an example should help.
## Vector of dates
temp <- seq(as.Date("2000/01/01"),as.Date("2003/01/01"),"years")
temp
[1] "2000-01-01" "2001-01-01" "2002-01-01" "2003-01-01"
date<- sample(seq(as.Date("2000/01/01"),as.Date("2003/01/01"),"days"),1)
date
This should be a random date, but for the example let's say that it is 2002/09/14. How can I have date checked against temp to find the two values that it falls between? For this example, the answer would be c("2002/01/01","2003/01/01").
I am basically looking for something that is the flip of the between function in dplyr.
You can use findInterval, which returns the index of the interval in temp that contains date:
set.seed(123)
date<-sample(seq(as.Date("2000/01/01"),as.Date("2003/01/01"),"days"),1)
date
#[1] "2001-02-18"
ind <- findInterval(date, temp)
c(temp[ind], temp[ind + 1])
#[1] "2001-01-01" "2002-01-01"
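One thing to watch with this approach: if the date is before the first element, or on or after the last element, `ind` is 0 or `length(temp)` and the pair is incomplete. A minimal sketch of a wrapper that returns NA in those cases (the name `bracketing_dates` is made up for illustration):

```r
# Hypothetical helper: return the two consecutive breakpoints around `x`,
# or a pair of NA dates if `x` falls outside the range of `breaks`.
bracketing_dates <- function(x, breaks) {
  ind <- findInterval(x, breaks)
  # findInterval() gives 0 below the first break and
  # length(breaks) at or beyond the last one
  if (ind < 1 || ind >= length(breaks)) {
    return(as.Date(c(NA, NA)))
  }
  breaks[c(ind, ind + 1)]
}

temp <- seq(as.Date("2000-01-01"), as.Date("2003-01-01"), by = "year")
bracketing_dates(as.Date("2002-09-14"), temp)
#[1] "2002-01-01" "2003-01-01"
```
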
Please, could anyone help me implement the calculation outlined below.
I'm using R in RStudio.
df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20),
                 total_fatal_injuries = c(1,0,5,4,0,27,10,15,6,2,10,4,0,0,1,0,3,0,1,0),
                 total_serious_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,2),
                 total_minor_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,3),
                 total_uninjuried = c(1,0,1,0,0,10,2,5,0,4,0,0,31,0,2,3,0,1,0,0),
                 injured_index = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
In the data set above, each line represents an observation of the occurrence of accidents with vehicles.
Column 'x' is just an ID.
The same occurrence may involve individuals with different levels of injury: fatal, serious, minor and uninjured. The sum of these four columns in a row equals the number of individuals involved in that occurrence.
The goal is to populate the 'injured_index' column with a numerical index that represents the severity of the occurrence, based on the values recorded in the other columns, so that the data set can be ordered by it.
What would be the best formula for calculating the 'injured_index' column?
I would like a suggestion for how to calculate an index that represents how bad an occurrence is, based on the total number of victims at each level.
The importance is simple to understand.
1) Fatal is bad
2) Serious is a bit less bad
3) Minor is not good
4) Uninjured is ideal.
How can I put everything together mathematically to get an index that shows which occurrence is more or less serious than another?
I know how to create the column and assign a value.
I just want a hint on how to calculate the value that will be stored.
I know this has more to do with math, but the mathematicians on Mathematics Stack Exchange refuse to answer because they consider it a programming question rather than a mathematics one. :/
Thank you all for trying!
Here's an approach.
# This counts how many people are in each row, for columns 2 through 5
df$count <- rowSums(df[,2:5])
# This assigns a weighting to each severity of injury and divides by how
# many people are in that row. Adjust the weights based on your judgment.
# Rows with no people recorded (count of 0) will come out as NaN.
df$injured_index <- (1000 * df$total_fatal_injuries +
                     200 * df$total_serious_injuries +
                     20 * df$total_minor_injuries) / df$count
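As a sketch of how this could be used for ordering, here is the same weighting applied to four rows taken from the question's data (IDs 1, 2, 3 and 13). The weights are the same judgment calls as above, and treating an occurrence with nobody recorded as index 0 is an assumption, not something the question specifies.

```r
# Four rows from the question's data (IDs 1, 2, 3 and 13)
df <- data.frame(
  x = c(1, 2, 3, 13),
  total_fatal_injuries   = c(1, 0, 5, 0),
  total_serious_injuries = c(10, 0, 9, 0),
  total_minor_injuries   = c(10, 0, 9, 0),
  total_uninjuried       = c(1, 0, 1, 31)
)

# One weight per column 2:5; uninjured people contribute 0
weights <- c(1000, 200, 20, 0)

df$count <- rowSums(df[, 2:5])
# Matrix product gives the weighted sum of each row
weighted <- as.vector(as.matrix(df[, 2:5]) %*% weights)
# Assumption: an occurrence with no people recorded gets 0 instead of NaN
df$injured_index <- ifelse(df$count > 0, weighted / df$count, 0)

# Most severe occurrences first
df[order(df$injured_index, decreasing = TRUE), c("x", "injured_index")]
```

Dividing by the count makes this a per-person severity; dropping the division would instead rank occurrences by total weighted harm. Which one is appropriate depends on what "more serious" should mean.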
I want to replace each missing value in the first column of my dataframe with the previous value multiplied by a scalar (e.g. 3).
nRowsDf <- nrow(df)
for(i in 1:nRowsDf){
  df[i,1] = ifelse(is.na(df[i,1]), lag(df[i,1])+3*lag(df[i,1]), df[i,1])
}
The above code does not give me an error but does not do the job either.
In addition, is there a better way to do this instead of writing a loop?
Update and Data:
Here is some example data. I want to replace each missing value in the first column of my dataframe with the previous value multiplied by a scalar (e.g. 3). The NA values are in consecutive rows.
df <- mtcars
df[c(2,3,4,5),1] <-NA
IND <- is.na(df[,1])
df[IND,1] <- df[dplyr::lead(IND,1L, F),1] * 3
The last line of the above code does the job one row at a time (I have to run it 4 times to fill the 4 missing rows). How can I do it in a single pass for all rows?
Reproducible data, which you should provide:
df <- mtcars
df[c(1,5,8),1] <-NA
Code:
IND <- is.na(df[,1])
df[IND,1] <- df[dplyr::lag(IND,1L, F),1] * 3
Since you use lag, I use lag. But you say "previous", so maybe you want to use lead instead.
What should happen if the first value (in the lead case) or the last value (in the lag case) is missing is not specified in the question.
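For runs of consecutive NAs, where each filled value should in turn feed the next one, a single shifted index is not enough. A minimal sketch using base R's `Reduce` with `accumulate = TRUE` (the name `fill_scaled` is made up; it assumes the first value is not missing, otherwise the NA simply propagates):

```r
# Hypothetical helper: replace each NA with the previous (possibly already
# filled) value multiplied by k, carrying the result forward through runs.
fill_scaled <- function(v, k = 3) {
  Reduce(function(prev, cur) if (is.na(cur)) prev * k else cur,
         v, accumulate = TRUE)
}

df <- mtcars
df[c(2, 3, 4, 5), 1] <- NA
df[, 1] <- fill_scaled(df[, 1], k = 3)
head(df[, 1])
#[1]   21.0   63.0  189.0  567.0 1701.0   18.1
```
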
I'm trying to subset a data set to remove all values before the 7th month of the year 2011. I have Years and Months in different columns.
I know that what I am doing is logically wrong (and it gives the wrong output), but I can't figure out the right way to do this:
state_in2_check <- subset(state_in2, Month > 6 & Year > 2011)
@thelatemail has given you a workable solution in the comments. Your problem is that you're asking R to apply two logical checks independently, when each of those checks depends on the other. You won't, for example, get any "January" dates (because you're only accepting months greater than 6), even though "Jan-2013" would be fine. @thelatemail's solution combines the checks so that months of 6 or lower are accepted, as long as they're in years greater than 2011.
Another way would be to convert to a date while subsetting, which makes the process a little more logical:
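One way to write such a combined condition is sketched below. This is not necessarily the commenter's exact code, and the data frame contents are made up to match the columns described in the question.

```r
# Made-up data with numeric Year and Month columns
state_in2 <- data.frame(
  Year  = c(2010, 2011, 2011, 2012, 2013),
  Month = c(12, 6, 7, 1, 3)
)

# Keep a row if it is in a later year, or in 2011 from July onwards
state_in2_check <- subset(state_in2,
                          Year > 2011 | (Year == 2011 & Month > 6))
state_in2_check
```
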
Month <- 7
Year <- 2011
as.Date( paste( Year, Month, 15, sep = "-" ) )
[1] "2011-07-15"
You can use that simple conversion to subset in a more (in my opinion) logical way:
state_in2_check <- subset(state_in2,
                          as.Date( paste( Year, Month, 15, sep = "-" ) ) >
                            as.Date( "2011-06-15" )
)
Note I've made the day of the month the same in both date conversions, which will mean they're compared only according to month/year.
My sample data is as follows.
It has three columns: event, datetime and ten_minute.
The class of datetime is "POSIXlt" "POSIXt". ten_minute is just a substring of datetime holding the first digit of the minute.
I'd like to generate multiple rolling data sets using R. For example, Data_1 has the rows with ten_minute values of 0, 1, 2. Data_2 has the rows with ten_minute values of 1, 2, 3. (And finally, Data_n would have the values 3, 4, 5.) I also want to be able to change the width of the window. In this example the width is 3; I want to change it to 5, 10, etc.
I've been trying to code this in R myself for over a week, but I can't figure out how to do it.
First, a function to generate the windows you need:
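generate.windows <- function(vec.start, vec.numberofsets, vec.width) {
  # One window per starting value from vec.start up to vec.numberofsets
  vec.sets <- vec.start:vec.numberofsets
  lapply(vec.sets, function(n) {
    # Each window is vec.width consecutive values starting at n
    seq(from = n, length.out = vec.width)})}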
Next, extract the data.frames that correspond to each window:
# Assume your original data set is called df.data
list.windows <- generate.windows(1, 10, 3)
list.data.frames <- lapply(list.windows, function(n) {df.data[df.data[,"Ten_minute"] %in% n,]})
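Since the question's data is not shown, here is a self-contained sketch of the same idea on made-up data. The column name `ten_minute`, the helper name `rolling_windows` and the 0 to 5 value range are assumptions taken from the question's description; the helper only generates windows that fit entirely inside the range, so it assumes the width is no larger than the number of distinct values.

```r
# Made-up data: one row per event, with the tens digit of the minute (0 to 5)
df.data <- data.frame(
  event      = 1:12,
  ten_minute = rep(0:5, times = 2)
)

# Hypothetical helper: all windows of `width` consecutive values
# that fit inside min_val..max_val
rolling_windows <- function(min_val, max_val, width) {
  starts <- seq(min_val, max_val - width + 1)
  lapply(starts, function(s) seq(from = s, length.out = width))
}

windows <- rolling_windows(0, 5, 3)

# One data frame per window, named Data_1, Data_2, ...
data_list <- lapply(windows, function(w) df.data[df.data$ten_minute %in% w, ])
names(data_list) <- paste0("Data_", seq_along(data_list))

# Values covered by the first and last window
sort(unique(data_list$Data_1$ten_minute))
sort(unique(data_list[[length(data_list)]]$ten_minute))
```

Changing the third argument of `rolling_windows` changes the window width; the number of data sets follows from the range and the width.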