How to use tapply to match specific condition

I have an accident dataset that records reported accidents. I am trying to use the tapply function to display the total number of accidents reported on "Thursday". However, instead of returning the number of accidents reported for that particular day, it displays the total number of rows in my dataset. I am using the tapply call below:
tapply(myfinal$VEHICLE_COUNT,myfinal$DAY_OF_WEEK=='THURSDAY',length)
My sample dataset is as follows:
> dput(tail(myfinal,5))
structure(list(CASE_NUMBER = c("1251045636", "1251045630", "1251045591",
"1251045574", "1250010434"), BARRACK = c("Frederick", "Frederick",
"Frederick", "Frederick", "Jessup"), ACC_DATE = c("2012-12-31T00:00:00",
"2012-12-31T00:00:00", "2012-12-31T00:00:00", "2012-12-31T00:00:00",
"2012-12-31T00:00:00"), ACC_TIME = c("18:12", "18:12", "12:12",
"9:12", "11:12"), ACC_TIME_CODE = c("5", "5", "4", "3", "3"),
DAY_OF_WEEK = c("MONDAY ", "MONDAY ", "MONDAY ", "MONDAY ",
"MONDAY "), ROAD = c("IS 00070 EISENHOWER MEMOR HWY", "MD 00077 ROCKY RIDGE RD",
"MD 00085 BUCKEYSTOWN PIKE", "MD 00017 MYERSVILLE RD", "IS 00070 No Name"
), INTERSECT_ROAD = c("CO 00248 MONUMENT RD", "MD 00076 MOTTERS STATION RD",
"CO 00308 MANOR WOODS RD", "CO 00941 DAWN CT", "US 00029 Columbia Pike"
), DIST_FROM_INTERSECT = c("300", "0", "400", "500", "0.25"
), DIST_DIRECTION = c("E", "U", "S", "S", "E"), CITY_NAME = c("Not Applicable",
"Not Applicable", "Not Applicable", "Not Applicable", NA),
COUNTY_CODE = c("10", "10", "10", "10", "13"), COUNTY_NAME = c("Frederick",
"Frederick", "Frederick", "Frederick", "Howard"), VEHICLE_COUNT = c(1,
2, 2, 1, 2), PROP_DEST = c("NO", "YES", "YES", "NO", "NO"
), INJURY = c("YES", "NO", "NO", "YES", "YES"), COLLISION_WITH_1 = c("FIXED OBJ",
"VEH", "VEH", "NON-COLLISION", "VEH"), COLLISION_WITH_2 = c("OTHER-COLLISION",
"OTHER-COLLISION", "OTHER-COLLISION", "OTHER-COLLISION",
"OTHER-COLLISION")), .Names = c("CASE_NUMBER", "BARRACK",
"ACC_DATE", "ACC_TIME", "ACC_TIME_CODE", "DAY_OF_WEEK", "ROAD",
"INTERSECT_ROAD", "DIST_FROM_INTERSECT", "DIST_DIRECTION", "CITY_NAME",
"COUNTY_CODE", "COUNTY_NAME", "VEHICLE_COUNT", "PROP_DEST", "INJURY",
"COLLISION_WITH_1", "COLLISION_WITH_2"), row.names = 18634:18638, class = "data.frame")
Any suggestions on how to fix it? Thanks in advance!

If you can only use tapply for whatever reason, then, building on Maurits' answer, you should be able to do this:
tapply(myfinal$VEHICLE_COUNT,trimws(myfinal$DAY_OF_WEEK)=='THURSDAY',length)
Or similar. It seems that the strings in your DAY_OF_WEEK variable have trailing whitespace. You either need to remove it (via trimws) or modify your comparison string to include those spaces (e.g., myfinal$DAY_OF_WEEK=="THURSDAY "). The == operator matches two strings only if they are identical character by character, so any additional whitespace in either string makes the comparison fail. In your call every comparison was FALSE, so all rows fell into a single group, which is why length returned the total number of rows.
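To make the effect of the trailing whitespace concrete, here is a minimal sketch on made-up data (the vectors days and vehicles are hypothetical stand-ins for myfinal$DAY_OF_WEEK and myfinal$VEHICLE_COUNT). It only illustrates the grouping behaviour described above.

```r
# Toy data (hypothetical values); the day names carry trailing spaces
days <- c("MONDAY   ", "THURSDAY ", "THURSDAY ", "FRIDAY   ")
vehicles <- c(1, 2, 2, 1)

# Exact comparison never matches, so every row falls into the FALSE group
raw_counts <- tapply(vehicles, days == "THURSDAY", length)

# After trimming, the TRUE group holds only the Thursday rows
trimmed_counts <- tapply(vehicles, trimws(days) == "THURSDAY", length)
thursday_n <- trimmed_counts[["TRUE"]]
print(raw_counts)
print(thursday_n)
```

If only the single number is needed, sum(trimws(days) == "THURSDAY") gives the same count without tapply.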

A base R solution is to subset the rows where DAY_OF_WEEK is "THURSDAY" and then return the number of rows (trimws is needed because the values carry trailing whitespace):
nrow(df[trimws(df$DAY_OF_WEEK) == "THURSDAY", ])

There is really no point in using tapply here!
Method 1
Use dplyr:
library(tidyverse)
df %>% filter(trimws(DAY_OF_WEEK) == "MONDAY") %>% summarise(count = n())
# count
#1 5
Method 2
In base R, use subset and table
table(subset(df, trimws(DAY_OF_WEEK) == "MONDAY")$DAY_OF_WEEK)
#MONDAY
# 5
I've used "MONDAY" here because you've got no entries with DAY_OF_WEEK = "THURSDAY".
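If counts for every day are useful, a small variation is to trim first and tabulate the whole column. The sketch below uses a hypothetical toy vector rather than the real dataset, so the values are illustrative only.

```r
# Hypothetical day column with trailing spaces, as in the question
days <- c("MONDAY   ", "MONDAY   ", "THURSDAY ", "SUNDAY   ")

# Trim once, then count every day in a single pass
day_counts <- table(trimws(days))

# Look up one day by its clean name
monday_n <- day_counts[["MONDAY"]]
print(day_counts)
```

Trimming before tabulating also gives the table clean names, so a single day can be looked up without worrying about padding.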
Sample data
df <- structure(list(CASE_NUMBER = c("1251045636", "1251045630", "1251045591",
"1251045574", "1250010434"), BARRACK = c("Frederick", "Frederick",
"Frederick", "Frederick", "Jessup"), ACC_DATE = c("2012-12-31T00:00:00",
"2012-12-31T00:00:00", "2012-12-31T00:00:00", "2012-12-31T00:00:00",
"2012-12-31T00:00:00"), ACC_TIME = c("18:12", "18:12", "12:12",
"9:12", "11:12"), ACC_TIME_CODE = c("5", "5", "4", "3", "3"),
DAY_OF_WEEK = c("MONDAY ", "MONDAY ", "MONDAY ", "MONDAY ",
"MONDAY "), ROAD = c("IS 00070 EISENHOWER MEMOR HWY", "MD 00077 ROCKY RIDGE RD",
"MD 00085 BUCKEYSTOWN PIKE", "MD 00017 MYERSVILLE RD", "IS 00070 No Name"
), INTERSECT_ROAD = c("CO 00248 MONUMENT RD", "MD 00076 MOTTERS STATION RD",
"CO 00308 MANOR WOODS RD", "CO 00941 DAWN CT", "US 00029 Columbia Pike"
), DIST_FROM_INTERSECT = c("300", "0", "400", "500", "0.25"
), DIST_DIRECTION = c("E", "U", "S", "S", "E"), CITY_NAME = c("Not Applicable",
"Not Applicable", "Not Applicable", "Not Applicable", NA),
COUNTY_CODE = c("10", "10", "10", "10", "13"), COUNTY_NAME = c("Frederick",
"Frederick", "Frederick", "Frederick", "Howard"), VEHICLE_COUNT = c(1,
2, 2, 1, 2), PROP_DEST = c("NO", "YES", "YES", "NO", "NO"
), INJURY = c("YES", "NO", "NO", "YES", "YES"), COLLISION_WITH_1 = c("FIXED OBJ",
"VEH", "VEH", "NON-COLLISION", "VEH"), COLLISION_WITH_2 = c("OTHER-COLLISION",
"OTHER-COLLISION", "OTHER-COLLISION", "OTHER-COLLISION",
"OTHER-COLLISION")), .Names = c("CASE_NUMBER", "BARRACK",
"ACC_DATE", "ACC_TIME", "ACC_TIME_CODE", "DAY_OF_WEEK", "ROAD",
"INTERSECT_ROAD", "DIST_FROM_INTERSECT", "DIST_DIRECTION", "CITY_NAME",
"COUNTY_CODE", "COUNTY_NAME", "VEHICLE_COUNT", "PROP_DEST", "INJURY",
"COLLISION_WITH_1", "COLLISION_WITH_2"), row.names = 18634:18638, class = "data.frame")

Related

Is there a way to fuzzy match or provide a score as an assumption of what ID or Group the row value should be associated with?

I have a dataset that looks like this
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Date = c("2020-01-04",
"2020-04-03", "2020-12-10", "2020-09-12", "2020-11-19", "2020-04-03",
"2020-06-03", "2020-05-03", "2020-08-09", "2020-10-10"), Name = c("Jon",
"Mike", "", "Rodney", "Jon", "Mike", "", "Ryan", "Ryan", "Ryan"
), Phone = c("555-555-5555", "123-456-7890", "123-456-7890",
"333-333-3333", "", "123-456-7890", "098-765-4321", "", "", "444-444-4444"
), Email = c("Jon#gmail.com", "Mike#gmail.com", "Mike#gmail.com",
"Rodney#gmail.com", "", "", "", "Ryan#gmail.com", "", "Ryan2#gmail.com"
), Address = c("123 Main Street", "456 Washingto Avenue", "",
"16 Henderson St", "", "456 Washingto Avenue", "123 Lincoln Avenue",
"123 Lincoln Avenue", "", "156 Jefferson Street"), Group = c("1",
"2", "2", "3", "1", "2", "4", "4", "4", "5")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
I want to get a dataset that looks like this. (Note that the numbers in the Score column are not exactly the ones I want; I added them as placeholders and will let the method determine the proper scores. However, 1 should denote a perfect score.)
structure(list(ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), Date = c("2020-01-04",
"2020-04-03", "2020-12-10", "2020-09-12", "2020-11-19", "2020-04-03",
"2020-06-03", "2020-05-03", "2020-08-09", "2020-10-10"), Name = c("Jon",
"Mike", "", "Rodney", "Jon", "Mike", "", "Ryan", "Ryan", "Ryan"
), Phone = c("555-555-5555", "123-456-7890", "123-456-7890",
"333-333-3333", "", "123-456-7890", "098-765-4321", "", "", "444-444-4444"
), Email = c("Jon#gmail.com", "Mike#gmail.com", "Mike#gmail.com",
"Rodney#gmail.com", "", "", "", "Ryan#gmail.com", "Ryan#gmail.com",
"Ryan2#gmail.com"), Address = c("123 Main Street", "456 Washingto Avenue",
"", "16 Henderson St", "", "456 Washingto Avenue", "123 Lincoln Avenue",
"123 Lincoln Avenue", "", "156 Jefferson Street"), Group = c("1",
"2", "2", "3", "1", "2", "4", "4", "4", "5"), Score = c("1",
"1", ".88", "1", ".96", ".96", "1", "1", ".25", "1")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
The numbers in the Score column are arbitrary; I'm fine with getting other numbers according to the rules of the fuzzy-match process. The idea is that, based on the long dataset, the script picks up that there are four groups. Those groups correspond to 1, 2, 3, and 4, which refer to Jon, Mike, Rodney, and Ryan. Note that the score for Ryan is .25 because that row only includes his name and no other information such as phone or email. The score is relative within the group rather than relative to the whole dataset.
A complete set of
Col <- c("Name", "Phone", "Email", "Address")
should score 1, i.e. a perfect match without ambiguity. A set with 3 out of 4 should score higher than one with 2 out of 4, and so on. How should I go about this? Is this even possible?
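One simple reading of the "3 out of 4 beats 2 out of 4" rule is a completeness score: the share of the four fields that are filled in on each row. The sketch below shows only that part, on a hypothetical toy data frame called people; it is not a fuzzy matcher and does not assign groups.

```r
# Toy rows mirroring the question's columns; empty strings mean "missing"
people <- data.frame(
  Name    = c("Jon", "Mike", "", "Ryan"),
  Phone   = c("555-555-5555", "123-456-7890", "123-456-7890", ""),
  Email   = c("Jon#gmail.com", "Mike#gmail.com", "Mike#gmail.com", ""),
  Address = c("123 Main Street", "456 Washingto Avenue", "", ""),
  stringsAsFactors = FALSE
)
cols <- c("Name", "Phone", "Email", "Address")

# TRUE where a field is present (neither NA nor an empty string)
filled <- !is.na(people[cols]) & people[cols] != ""

# Share of the four fields that are filled in: 4/4 = 1, 3/4 = 0.75, ...
people$Score <- rowSums(filled) / length(cols)
print(people$Score)
```

Under this rule a row with only a name scores 0.25, which agrees with the .25 given for Ryan in the question. Deciding which group a row belongs to would need a separate matching step.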

converting multiple columns from wide to long using pivot_longer

I get an error message when I try to convert multiple columns from wide to long with pivot_longer.
I have code that converts from wide to long with gather, but I have to do this column by column. I want to use pivot_longer to gather multiple columns at once rather than column by column.
This is some input data:
structure(list(id = c("81", "83", "85", "88", "1", "2"), look_work = c("yes",
"yes", "yes", "yes", "yes", "yes"), current_work = c("no", "yes",
"no", "no", "no", "no"), before_work = c("no", "NULL", "yes",
"yes", "yes", "yes"), keen_move = c("yes", "yes", "no", "no",
"no", "no"), city_size = c("village", "more than 500k inhabitants",
"more than 500k inhabitants", "village", "city up to 20k inhabitants",
"100k - 199k inhabitants"), gender = c("male", "female", "female",
"male", "female", "male"), age = c("18 - 24 years", "18 - 24 years",
"more than 50 years", "18 - 24 years", "31 - 40 years", "more than 50 years"
), education = c("secondary", "vocational", "secondary", "secondary",
"secondary", "secondary"), hf1 = c("", "", "", "1", "1", "1"),
hf2 = c("", "1", "1", "", "", ""), hf3 = c("", "", "", "",
"", ""), hf4 = c("", "", "", "", "", ""), hf5 = c("", "",
"", "", "", ""), hf6 = c("", "", "", "", "", ""), ac1 = c("",
"", "", "", "", "1"), ac2 = c("", "1", "1", "", "1", ""),
ac3 = c("", "", "", "", "1", ""), ac4 = c("", "", "", "",
"", ""), ac5 = c("", "", "", "", "", ""), ac6 = c("", "",
"", "", "", ""), cs1 = c("", "", "", "", "", ""), cs2 = c("",
"1", "1", "", "1", ""), cs3 = c("", "", "", "", "", "1"),
cs4 = c("", "", "", "1", "", ""), cs5 = c("", "", "", "",
"", ""), cs6 = c("", "", "", "", "", ""), cs7 = c("", "",
"", "", "", ""), cs8 = c("", "", "", "", "", ""), se1 = c("",
"", "1", "1", "", ""), se2 = c("", "", "", "", "1", ""),
se3 = c("", "1", "", "", "1", "1"), se4 = c("", "", "", "",
"", ""), se5 = c("", "", "", "", "", ""), se6 = c("", "",
"", "", "", ""), se7 = c("", "", "", "", "", ""), se8 = c("",
"", "", "1", "", "")), row.names = c(NA, 6L), class = "data.frame")
The code using gather is:
df1 <- df %>%
  gather(key = "hf_com", value = "hf_com_freq", hf1:hf6) %>%
  gather(key = "ac_com", value = "ac_com_freq", ac1:ac6) %>%
  filter(substring(hf_com, 3) == substring(ac_com, 3))
df1 <- df1 %>%
  gather(key = "curr_sal", value = "curr_sal_freq", cs1:cs8) %>%
  gather(key = "exp_sal", value = "exp_sal_freq", se1:se8) %>%
  filter(substring(curr_sal, 3) == substring(exp_sal, 3))
The code using pivot_longer is:
df_longer <- df %>%
pivot_longer(
cols = starts_with("hf"),
names_to = "hf_com",
values_to = "hf_freq",
names_prefix = "hf",
na.rm = TRUE)
The expected results which I get with gather are:
structure(list(id = c("81", "83", "85", "88", "1", "2"), look_work = c("yes",
"yes", "yes", "yes", "yes", "yes"), current_work = c("no", "yes",
"no", "no", "no", "no"), before_work = c("no", "NULL", "yes",
"yes", "yes", "yes"), keen_move = c("yes", "yes", "no", "no",
"no", "no"), city_size = c("village", "more than 500k inhabitants",
"more than 500k inhabitants", "village", "city up to 20k inhabitants",
"100k - 199k inhabitants"), gender = c("male", "female", "female",
"male", "female", "male"), age = c("18 - 24 years", "18 - 24 years",
"more than 50 years", "18 - 24 years", "31 - 40 years", "more than 50 years"
), education = c("secondary", "vocational", "secondary", "secondary",
"secondary", "secondary"), hf_com = c("hf1", "hf1", "hf1", "hf1",
"hf1", "hf1"), hf_com_freq = c("", "", "", "1", "1", "1"), ac_com = c("ac1",
"ac1", "ac1", "ac1", "ac1", "ac1"), ac_com_freq = c("", "", "",
"", "", "1"), curr_sal = c("cs1", "cs1", "cs1", "cs1", "cs1",
"cs1"), curr_sal_freq = c("", "", "", "", "", ""), exp_sal = c("se1",
"se1", "se1", "se1", "se1", "se1"), exp_sal_freq = c("", "",
"1", "1", "", "")), row.names = c(NA, 6L), class = "data.frame")
With pivot_longer, I get the following error message:
Error in pivot_longer(., cols = starts_with("hf"), names_to = "hf_com", :
unused argument (na.rm = TRUE)
Also, if there is no solution with pivot_longer, then a solution with data.table would be appreciated.

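I have solved the problem. pivot_longer has no na.rm argument; the argument that drops rows with missing values is values_drop_na.
The call needs to be changed from: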
df_longer <- df %>%
pivot_longer(
cols = starts_with("hf"),
names_to = "hf_com",
values_to = "hf_freq",
names_prefix = "hf",
na.rm = TRUE)
to:
df_longer <- df %>%
pivot_longer(
cols = starts_with("hf"),
names_to = "hf_com",
values_to = "hf_freq",
names_prefix = "hf",
values_drop_na = TRUE)
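For gathering several column families in one call, pivot_longer also accepts the special ".value" entry in names_to together with names_pattern. The sketch below uses a hypothetical two-row data frame, toy, with only the hf and ac columns, and the name level for the numeric suffix is an assumption; it is an illustration, not a drop-in replacement for the gather pipeline in the question.

```r
library(tidyr)

# Hypothetical miniature of the question's data: two families, two levels each
toy <- data.frame(
  id  = c("1", "2"),
  hf1 = c("1", ""), hf2 = c("", "1"),
  ac1 = c("", "1"), ac2 = c("1", ""),
  stringsAsFactors = FALSE
)

# ".value" turns the prefix (hf / ac) into output columns,
# while the numeric suffix goes into the new column "level"
long <- pivot_longer(
  toy,
  cols = matches("^(hf|ac)[0-9]+$"),
  names_to = c(".value", "level"),
  names_pattern = "^(hf|ac)([0-9]+)$"
)
print(long)
```

Each id then has one row per level, with hf and ac side by side, which resembles the effect of the filter on matching suffixes in the gather version.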

How to search for specific character that has space in the end using sqldf in R

I have a dataset with 18 columns describing reported accidents. I am trying to find the number of accidents reported on a specific day of the week, for example the total number of accidents reported on Tuesday. For this I am using two variables in my dataset: Day_of_Week and number_of_vehicle. I am running the sqldf query below. However, it shows me the total number of accidents reported during the entire week. I would like to use sqldf to report a particular day, e.g. Monday. I should add that the values in the Day_OF_Week column are padded with a fair amount of trailing whitespace. See the example below:
Day_OF_Week Number_of_vehicle
MONDAY(Space here) 50
Some sample dataset
> dput(head(myf,5))
structure(list(CASE_NUMBER = c("1363000002", "1296000023", "1283000016",
"1282000006", "1267000007"), BARRACK = c("Rockville", "Berlin",
"Prince Frederick", "Leonardtown", "Essex"), ACC_DATE = c("2012-01-01T00:00:00",
"2012-01-01T00:00:00", "2012-01-01T00:00:00", "2012-01-01T00:00:00",
"2012-01-01T00:00:00"), ACC_TIME = c("2:01", "18:01", "7:01",
"0:01", "1:01"), ACC_TIME_CODE = c("1", "5", "2", "1", "1"),
DAY_OF_WEEK = c("SUNDAY ", "SUNDAY ", "SUNDAY ", "SUNDAY ",
"SUNDAY "), ROAD = c("IS 00495 CAPITAL BELTWAY", "MD 00090 OCEAN CITY EXPWY",
"MD 00765 MAIN ST", "MD 00944 MERVELL DEAN RD", "IS 00695 BALTO BELTWAY"
), INTERSECT_ROAD = c("IS 00270 EISENHOWER MEMORIAL", "CO 00220 ST MARTINS NECK RD",
"CO 00208 DUKE ST", "MD 00235 THREE NOTCH RD", "IS 00083 HARRISBURG EXPWY"
), DIST_FROM_INTERSECT = c("0", "0.25", "100", "10", "100"
), DIST_DIRECTION = c("U", "W", "S", "E", "S"), CITY_NAME = c("Not Applicable",
"Not Applicable", "Not Applicable", "Not Applicable", "Not Applicable"
), COUNTY_CODE = c("15", "23", "4", "18", "3"), COUNTY_NAME = c("Montgomery",
"Worcester", "Calvert", "St. Marys", "Baltimore"), VEHICLE_COUNT = c("2",
"1", "1", "1", "2"), PROP_DEST = c("YES", "YES", "YES", "YES",
"YES"), INJURY = c("NO", "NO", "NO", "NO", "NO"), COLLISION_WITH_1 = c("VEH",
"FIXED OBJ", "FIXED OBJ", "FIXED OBJ", "VEH"), COLLISION_WITH_2 = c("OTHER-COLLISION",
"OTHER-COLLISION", "FIXED OBJ", "OTHER-COLLISION", "OTHER-COLLISION"
)), .Names = c("CASE_NUMBER", "BARRACK", "ACC_DATE", "ACC_TIME",
"ACC_TIME_CODE", "DAY_OF_WEEK", "ROAD", "INTERSECT_ROAD", "DIST_FROM_INTERSECT",
"DIST_DIRECTION", "CITY_NAME", "COUNTY_CODE", "COUNTY_NAME",
"VEHICLE_COUNT", "PROP_DEST", "INJURY", "COLLISION_WITH_1", "COLLISION_WITH_2"
), row.names = c(NA, 5L), class = "data.frame")
sqldf Code:
sqldf("select sum(Number_of_vehicle),DAY_OF_WEEK from accident group by DAY_OF_WEEK")
Any help is appreciated!
Thanks in advance!
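One way to handle the padding inside the query itself is SQLite's rtrim function, since sqldf uses SQLite by default. The sketch below runs on a hypothetical three-row data frame named accident with a numeric VEHICLE_COUNT column; the table and column names are assumptions taken from the question's query and sample data.

```r
library(sqldf)

# Hypothetical miniature of the accident data; day names are space-padded
accident <- data.frame(
  DAY_OF_WEEK   = c("MONDAY   ", "MONDAY   ", "TUESDAY  "),
  VEHICLE_COUNT = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# rtrim() strips the trailing spaces before the comparison and the grouping
monday <- sqldf("
  select rtrim(DAY_OF_WEEK) as day,
         sum(VEHICLE_COUNT) as vehicles,
         count(*)           as accidents
  from accident
  where rtrim(DAY_OF_WEEK) = 'MONDAY'
  group by rtrim(DAY_OF_WEEK)
")
print(monday)
```

Dropping the where clause would return one row per day, still with trimmed names.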

Error in seq.default(from = min(x, na.rm = TRUE), to = max(x, na.rm = TRUE), : 'from' cannot be NA, NaN or infinite

I'm using R Learner in Knime. I want to discretize a matrix, which is the following:
> my_matrix= as(knime.in,"matrix");
> dput(head(my_matrix, 5))
structure(c("KS", "OH", "NJ", "OH", "OK", "128", "107", "137",
" 84", " 75", "415", "415", "415", "408", "415", "No", "No",
"No", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "25", "26",
" 0", " 0", " 0", "265.1", "161.6", "243.4", "299.4", "166.7",
"110", "123", "114", " 71", "113", "45.07", "27.47", "41.38",
"50.90", "28.34", "197.4", "195.5", "121.2", " 61.9", "148.3",
" 99", "103", "110", " 88", "122", "16.78", "16.62", "10.30",
" 5.26", "12.61", "244.7", "254.4", "162.6", "196.9", "186.9",
" 91", "103", "104", " 89", "121", "11.01", "11.45", " 7.32",
" 8.86", " 8.41", "10.0", "13.7", "12.2", " 6.6", "10.1", " 3",
" 3", " 5", " 7", " 3", "2.70", "3.70", "3.29", "1.78", "2.73",
"1", "1", "0", "2", "3", "False", "False", "False", "False",
"False"), .Dim = c(5L, 20L), .Dimnames = list(c("Row0", "Row1",
"Row2", "Row3", "Row4"), c("State", "Account length", "Area code",
"International plan", "Voice mail plan", "Number vmail messages",
"Total day minutes", "Total day calls", "Total day charge", "Total eve minutes",
"Total eve calls", "Total eve charge", "Total night minutes",
"Total night calls", "Total night charge", "Total intl minutes",
"Total intl calls", "Total intl charge", "Customer service calls",
"Churn")))
I'm using the following code to discretize the matrix:
require(arules)
#require(arulesViz)
my_matrix= as(knime.in,"matrix");
my_rows= nrow(my_matrix);
my_cols= ncol(my_matrix);
#discretize(x, method="interval", categories = 3, labels = NULL,
# ordered=FALSE, onlycuts=FALSE, ...)
typeof(my_matrix)
vector = my_matrix[,2]
my_matrix[,2] = discretize(vector, method="interval", categories = 3, labels=c("length0","length1","length2"))
my_matrix[,3] = ...
etc...
At this line of code:
my_matrix[,2] = discretize(vector, method="interval", categories = 3, labels=c("length0","length1","length2"))
I get the following error:
Error in seq.default(from = min(x, na.rm = TRUE), to = max(x, na.rm = TRUE), : 'from' cannot be NA, NaN or infinite
If I add sum(is.na(vector)) here:
vector = my_matrix[,2]
sum(is.na(vector))
my_matrix[,2] = discretize(vector, method="interval", categories = 3, labels=c("length0","length1","length2"))
I get:
> sum(is.na(vector))
[1] 0
so there are no NA elements in the vector. However, typeof(my_matrix) is "character". If I print the vector, I get the following:
> vector = my_matrix[,2]
> sum(is.na(vector))
[1] 0
> head(vector, 20)
Row0 Row1 Row2 Row3 Row4 Row5 Row6 Row7 Row8 Row9 Row10 Row11 Row12
"128" "107" "137" " 84" " 75" "118" "121" "147" "117" "141" " 65" " 74" "168"
Row13 Row14 Row15 Row16 Row17 Row18 Row19
" 95" " 62" "161" " 85" " 93" " 76" " 73"

The problem is that your vector consists of strings, so discretize cannot compute numeric interval boundaries from it. Ideally, you would solve this in KNIME itself; nodes for this kind of conversion exist.
However, you can also replace
vector = my_matrix[,2]
by
vector = as.numeric(my_matrix[,2])
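The conversion step can be checked in isolation. The sketch below uses the first five "Account length" strings from the question and, as a stand-in for arules::discretize, base R's cut with three equal-width intervals; that substitution is an assumption made only to keep the example free of extra packages.

```r
# Strings as they appear in the character matrix, including leading spaces
vector <- c("128", "107", "137", " 84", " 75")

# as.numeric tolerates the leading whitespace
numeric_vector <- as.numeric(vector)

# Three equal-width bins (base R stand-in for method = "interval")
binned <- cut(numeric_vector, breaks = 3,
              labels = c("length0", "length1", "length2"))

# Convert the factor to plain strings before writing into a character matrix
binned_labels <- as.character(binned)
print(binned_labels)
```

Converting the factor with as.character before assigning it back into my_matrix keeps the labels themselves in the character matrix.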

Filling data down to subsequent row

I have data that look similar to:
Alabama  Age>50  Value1  Value2  Value3
         Age<50  Value1  Value2  Value3
Alaska   Age>50  Value1  Value2  Value3
         Age<50  Value1  Value2  Value3
I only need to keep the data for Age<50. How can I repeat the state name to the row below it? I have created a string of the state names, but am unsure how to insert it into every other row in the first column.
The head of my data.frame is:
d <- structure(c("ALABAMA", "", "ALASKA", "", "ARIZONA", "", "Under 18",
"Total all ages", "Under 18", "Total all ages", "Under 18", "Total all ages",
"0", "1", "10", "87", "46", "303", "0", "0", "0", "36", "6", "855", "84,843",
"", "469,145", "", "6,303,555", ""), .Dim = c(6L, 5L), .Dimnames = list(NULL,
c("State", "", "Rape3", "Prostitution and\ncommercialized\nvice",
"2014\nestimated \npopulation")))

How's this:
df <- as.data.frame(structure(c("ALABAMA", "", "ALASKA", "", "ARIZONA", "", "Under 18", "Total all ages", "Under 18", "Total all ages", "Under 18", "Total all ages", "0", "1", "10", "87", "46", "303", "0", "0", "0", "36", "6", "855", "84,843", "", "469,145", "", "6,303,555", ""), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("State", "", "Rape3", "Prostitution and\ncommercialized\nvice", "2014\nestimated \npopulation"))), stringsAsFactors = FALSE)
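names(df)[5] <- "est_pop"
# Treat empty strings as missing so they can be filled down
df$est_pop[df$est_pop == ""] <- NA
df$State[df$State == ""] <- NA
library(zoo)
# na.rm = FALSE keeps the vector length unchanged even if it starts with NA
df$State <- na.locf(df$State, na.rm = FALSE)
df$est_pop <- na.locf(df$est_pop, na.rm = FALSE)
# The unnamed second column is auto-named V2 by as.data.frame
df <- df[df$V2 == "Total all ages", ]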

Supposing you have a column named Age and your data frame is called MyDataFrame (with the empty cells already set to NA), you could use, for example:
# Install zoo if it is missing, then load it
if (!requireNamespace("zoo", quietly = TRUE)) {
  install.packages("zoo")
}
library(zoo)
# Carry the last non-NA value forward; na.rm = FALSE keeps leading NAs
MyDataFrame$Age <- na.locf(MyDataFrame$Age, na.rm = FALSE)
Hope it helps.
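The same fill-down can be sketched in base R without zoo, using cumsum to index the non-empty names. The vector state below is a hypothetical copy of the question's first column, and the approach assumes the first element is not blank.

```r
# Hypothetical column: each state name is followed by a blank cell
state <- c("ALABAMA", "", "ALASKA", "", "ARIZONA", "")

# TRUE where a name is present
has_name <- state != ""

# cumsum(has_name) counts names seen so far, which indexes the name to repeat
filled <- state[has_name][cumsum(has_name)]
print(filled)
```

If the column could start with a blank, cumsum would yield 0 for those rows and they would be dropped, so that case needs separate handling.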
