Please, could anyone help me implement the calculation outlined below.
I'm using R in RStudio.
df <- data.frame(x = c(1,2,3,4,5,6,7,8,9,0,11,12,13,14,15,16,17,18,19,20),
                 total_fatal_injuries = c(1,0,5,4,0,27,10,15,6,2,10,4,0,0,1,0,3,0,1,0),
                 total_serious_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,2),
                 total_minor_injuries = c(10,0,9,3,2,4,9,9,0,8,3,1,0,8,2,7,5,4,0,3),
                 total_uninjuried = c(1,0,1,0,0,10,2,5,0,4,0,0,31,0,2,3,0,1,0,0),
                 injured_index = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0))
In the data set above, each row represents one occurrence of a vehicle accident.
Column 'x' is just an ID.
The same occurrence may involve individuals with various levels of injury: fatal injuries, serious injuries, minor injuries and uninjured. For each row, the sum of the values across these columns equals the number of individuals involved in the occurrence.
The goal is to populate the 'injured_index' column with a numerical value that represents the severity of the occurrence, according to the values recorded in the other columns, so that the data set can be ordered by it.
What would be the best formula for calculating the 'injured_index' column?
I would like a suggestion on how to calculate an index that represents how bad the occurrence is, based on the total number of victims at each level per occurrence.
The relative importance is simple to understand:
1) Fatal is bad
2) Serious is a bit less bad
3) Minor is not good
4) Uninjured is ideal.
How do I put everything together mathematically to get an index that shows which occurrences are more or less serious than others?
I know how to create the column and assign a value.
I just want the hint of how to calculate the value that will be stored.
I know this has more to do with math, but the mathematicians on Mathematics Stack Exchange declined to answer because they think it is a programming question rather than a mathematics one. :/
Thank you all for trying!
Here's an approach.
# Count how many people are in each row (columns 2 through 5)
df$count <- rowSums(df[, 2:5])

# Assign a weight to each severity of injury and divide by the number of
# people in that row. Adjust the weights based on your judgment.
df$injured_index <- (1000 * df$total_fatal_injuries +
                      200 * df$total_serious_injuries +
                       20 * df$total_minor_injuries) / df$count
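The question also asks for an index by which the data set can be ordered; here is a quick usage sketch with the weights above (note that a row with nobody involved gives 0/0 = NaN, which order() places last by default):

# Order the occurrences from most to least severe according to the index
df_ordered <- df[order(df$injured_index, decreasing = TRUE), ]
head(df_ordered)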
I want to replace each missing value in the first column of my data frame with the previous value multiplied by a scalar (e.g. 3).
nRowsDf <- nrow(df)
for (i in 1:nRowsDf) {
  df[i, 1] <- ifelse(is.na(df[i, 1]), lag(df[i, 1]) + 3 * lag(df[i, 1]), df[i, 1])
}
The above code does not give me an error but does not do the job either.
In addition, is there a better way to do this instead of writing a loop?
Update and Data:
Here is an example of the data. I want to replace each missing value in the first column of my data frame with the previous value multiplied by a scalar (e.g. 3). The NA values are in consecutive rows.
df <- mtcars
df[c(2, 3, 4, 5), 1] <- NA
IND <- is.na(df[, 1])
df[IND, 1] <- df[dplyr::lead(IND, 1L, FALSE), 1] * 3
The last line of the above code does the job row by row (I would have to run it 4 times to fill the 4 missing rows). How can I do it once for all rows?
Reproducible data, which YOU should provide:
df <- mtcars
df[c(1, 5, 8), 1] <- NA
code:
IND <- is.na(df[, 1])
df[IND, 1] <- df[dplyr::lag(IND, 1L, FALSE), 1] * 3
Since you used lag in your loop, I used lag here. But you say "previous", so maybe you want to use lead (as in your update).
What happens if the first value (in the lead case) or the last value (in the lag case) is missing remains a mystery.
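If the goal is to fill everything in one pass even when the NAs are consecutive, one base R option is to accumulate the filled values as you go. A minimal sketch, assuming each NA should become three times the previously filled value (an NA in the very first row would simply stay NA):

df <- mtcars
df[c(2, 3, 4, 5), 1] <- NA

# Reduce() with accumulate = TRUE carries the last filled value forward,
# so consecutive NAs compound: prev * 3, then prev * 3 * 3, and so on
df[, 1] <- Reduce(function(prev, cur) if (is.na(cur)) prev * 3 else cur,
                  df[, 1], accumulate = TRUE)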
I want to count the number of times each value appears in a vector and create a new vector, of EQUAL length, to bind to the initial one. So my solution cannot be the table function, since that returns only the unique values and the number of times they appear; I need each row of the initial vector to carry its number of appearances. I have found a solution, but my database has ~800k rows and it runs for about 10 minutes. Does anyone know how to perform this task more efficiently? I include an example below. Thanks.
library(pbapply)  # for pbapply(), an apply() with a progress bar

df <- as.data.frame(sample(1:100, 800000, replace = T))
df[2] <- rep(1, nrow(df))
names(df) <- c("Numbers", "Count")
df$Count <- pbapply(df, 1, function(x) length(which(df$Numbers == df$Numbers[x])))
P.S. I have used the pbapply function to keep track of the progress.
This will do the trick:
library(plyr)

df <- data.frame(Numbers = sample(1:100, 800000, replace = T))
Count <- ddply(df, .(Numbers), summarize, Count = length(Numbers))  # unique values and how many times they appear
Indices <- match(df$Numbers, Count$Numbers)  # use match to map the counts back onto the data frame
df$Count <- Count$Count[Indices]
If you want a count of the number of occurrences of each unique item in Numbers, this is straightforward in dplyr:
library(dplyr)

set.seed(123)
df <- data.frame(Numbers = sample(1:100, 800000, replace = T))

df2 <- df %>%
  group_by(Numbers) %>%
  mutate(Count = n())

head(df2)
#   Numbers Count
#        51  8146
#        49  7961
#         3  8090
#        63  8072
#        80  8017
#        80  8017
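For 800k rows, a base R alternative along the same lines may also be worth trying; this is just a sketch of the same idea using ave(), assuming df has the Numbers column as above:

# ave() applies length() within each group of Numbers and returns a vector
# aligned with the original rows, so no separate match/join step is needed
df$Count <- ave(df$Numbers, df$Numbers, FUN = length)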
I need to calculate the % change in values for the Argentina column, down the entire column, and store it in a data frame.
% change = (2nd value - 1st value) / 1st value * 100
Say a column has
30
40
The % change is (40 - 30) / 30 * 100 = 33.33%.
Please read about how to make a reproducible example.
Supposing df is your data.frame, using dplyr:
library(dplyr)
df %>%
  # change from this row to the next: (2nd value - 1st value) / 1st value * 100
  mutate(change = (lead(Argentina) - Argentina) / Argentina * 100)
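A quick check with the numbers from the question (30 then 40 should give roughly 33.33); the single-column data frame here is made up purely for illustration:

df <- data.frame(Argentina = c(30, 40))
df %>%
  mutate(change = (lead(Argentina) - Argentina) / Argentina * 100)
#   Argentina   change
# 1        30 33.33333
# 2        40       NA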
I've just started learning R. I wanted to know how I can find the lowest value in one column for each unique value in another column. For example, in this case I want to know the lowest average price per year.
I have a data frame with about 7 columns, two of them being average price and year. The years recur, ranging from 2000 to 2009. The data also has various NAs in different columns.
I have very little idea about how to do this with a loop or otherwise.
Thank you :)
My data set looks something like this:
avgprice  year
     332  2002
      NA  2009
    5353  2004
    1234    NA
and so on.
To break down my problem: I want to find the five lowest values for each year from 2000 to 2004.
s<-subset(tx.house.sales,na.rm=TRUE,select=c(avgprice,year)
s2<-subset(s,year==2000)
s3<-arrange(s2)
tail(s2,5)
I know the code fails miserably. I wanted to first subset my data frame to the avgprice and year columns, then subset it by year for each year through 2000-2004, arrange it, and print the lowest five using tail(). However, I also wanted to ignore the NAs.
You could try
aggregate(averageprice~year, df1, FUN=min)
Update
If you need to get the 5 lowest "averageprice" values per "year":
library(dplyr)
df1 %>%
  group_by(year) %>%
  arrange(averageprice) %>%
  slice(1:5)
Or you could use rank in place of arrange:
df1 %>%
  group_by(year) %>%
  filter(rank(averageprice, ties.method = 'min') %in% 1:5)
This could also be done with aggregate, but the second column will be a list:
aggregate(averageprice ~ year, df1, FUN = function(x)
  head(sort(x), 5), na.action = na.pass)
data
set.seed(24)
df1 <- data.frame(year = sample(2002:2008, 50, replace = TRUE),
                  averageprice = sample(c(NA, 80:160), 50, replace = TRUE))
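If you also want to drop the NAs explicitly and restrict the result to a range of years (the question mentions 2000-2004, but the simulated data above only covers 2002-2008, so the range below is just for illustration), a small dplyr sketch along the same lines:

library(dplyr)
df1 %>%
  filter(!is.na(averageprice), year %in% 2002:2004) %>%  # drop NAs, keep chosen years
  group_by(year) %>%
  arrange(averageprice) %>%
  slice(1:5)  # 5 lowest prices within each remaining year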