Filling data down to subsequent row

I have data that look similar to:
Alabama Age>50 Value1 Value2 Value3
Age<50 Value1 Value2 Value3
Alaska Age>50 Value1 Value2 Value3
Age<50 Value1 Value2 Value3
I only need to keep the data for Age<50. How can I repeat the state name to the row below it? I have created a string of the state names, but am unsure how to insert it into every other row in the first column.
The head of my data.frame is:
d <- structure(c("ALABAMA", "", "ALASKA", "", "ARIZONA", "", "Under 18",
"Total all ages", "Under 18", "Total all ages", "Under 18", "Total all ages",
"0", "1", "10", "87", "46", "303", "0", "0", "0", "36", "6", "855", "84,843",
"", "469,145", "", "6,303,555", ""), .Dim = c(6L, 5L), .Dimnames = list(NULL,
c("State", "", "Rape3", "Prostitution and\ncommercialized\nvice",
"2014\nestimated \npopulation")))

How's this:
df <- as.data.frame(structure(c("ALABAMA", "", "ALASKA", "", "ARIZONA", "", "Under 18", "Total all ages", "Under 18", "Total all ages", "Under 18", "Total all ages", "0", "1", "10", "87", "46", "303", "0", "0", "0", "36", "6", "855", "84,843", "", "469,145", "", "6,303,555", ""), .Dim = c(6L, 5L), .Dimnames = list(NULL, c("State", "", "Rape3", "Prostitution and\ncommercialized\nvice", "2014\nestimated \npopulation"))), stringsAsFactors = FALSE)
names(df)[5] <- "est_pop"
df$est_pop[df$est_pop == ""] <- NA
df$State[df$State == ""] <- NA
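library(zoo)
# carry the last non-missing value forward; na.rm = FALSE keeps the vector length unchanged
df$State <- na.locf(df$State, na.rm = FALSE)
df$est_pop <- na.locf(df$est_pop, na.rm = FALSE)
# the unnamed second column is auto-named V2 by as.data.frame()
df <- df[df$V2 == "Total all ages", ]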
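If you would rather not depend on zoo, the same "carry the last value forward" idea can be sketched in base R. The helper name fill_down below is hypothetical (it is not part of any package or of the answer above), and the sketch assumes that empty strings mark the rows to be filled.

```r
# Hypothetical helper (an assumption, not from the original answer):
# carry the last non-empty value forward using only base R.
fill_down <- function(x) {
  x[!is.na(x) & x == ""] <- NA     # treat empty strings as missing
  idx <- cumsum(!is.na(x))         # position of the most recent non-missing value
  idx[idx == 0] <- NA              # leading gaps have nothing to copy, keep them NA
  x[!is.na(x)][idx]
}

state <- c("ALABAMA", "", "ALASKA", "", "ARIZONA", "")
filled <- fill_down(state)
print(filled)
```

Because the result always has the same length as the input, it can be assigned straight back to a data frame column, e.g. df$State <- fill_down(df$State).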

Suppose the column you want to fill is called Age and your data frame is called MyDataFrame.
You could use, for example:
# Install zoo if it is missing, then load it
if (!requireNamespace("zoo", quietly = TRUE)) {
  install.packages("zoo")
}
library(zoo)
# na.locf() only fills NA values, so convert empty strings to NA first
MyDataFrame$Age[MyDataFrame$Age == ""] <- NA
MyDataFrame$Age <- na.locf(MyDataFrame$Age, na.rm = FALSE)
Hope it helps.

Related

How to wrangle the dataset in R: reshaping and creating new columns with given information

I have a dataset that looks like below,
structure(list(nonyeasted_19 = c("Force (N)", "0", "-0.0077",
"0.0023", "-0.0707", "-0.2155", "-0.2026", "-0.0628", "-0.0481",
"-0.0601", "0.0302", "0.0475", "-0.0176", "0.008", "0.0569",
"0.0242", "0.0003", "0.0295", "0.028", "-0.0221", "-0.0333",
"0.0034", "0.004", "-0.0219", "-0.0216", "-0.0261"), nonyeasted_19.1 = c("Distance (m)",
"0", "0", "0", "0", "0", "0", "0.000002", "0.000004", "0.000006",
"0.000008", "0.00001", "0.000012", "0.000014", "0.000016", "0.000018",
"0.00002", "0.000022", "0.000024", "0.000026", "0.000028", "0.00003",
"0.000032", "0.000034", "0.000036", "0.000038"), nonyeasted_19.2 = c("Time (sec)",
"0", "0.002", "0.004", "0.006", "0.008", "0.01", "0.012", "0.014",
"0.016", "0.018", "0.02", "0.022", "0.024", "0.026", "0.028",
"0.03", "0.032", "0.034", "0.036", "0.038", "0.04", "0.042",
"0.044", "0.046", "0.048"), nonyeasted_19.3 = c("Status", "101",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"), yeasted_01 = c("Force (N)",
"0", "0.0024", "0.0307", "-0.0487", "-0.2063", "-0.1928", "-0.0421",
"-0.0278", "-0.0586", "0.0251", "0.0373", "-0.0084", "0.0597",
"0.091", "0.0246", "0.0318", "", "", "", "", "", "", "", "",
""), yeasted_01.1 = c("Distance (m)", "0", "0", "0", "0", "0",
"0", "0", "0.000001", "0.000003", "0.000005", "0.000007", "0.000009",
"0.000011", "0.000013", "0.000015", "0.000017", "", "", "", "",
"", "", "", "", ""), yeasted_01.2 = c("Time (sec)", "0", "0.002",
"0.004", "0.006", "0.008", "0.01", "0.012", "0.014", "0.016",
"0.018", "0.02", "0.022", "0.024", "0.026", "0.028", "0.03",
"", "", "", "", "", "", "", "", ""), yeasted_01.3 = c("Status",
"101", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "", "", "", "", "", "", "", "", "")), class = "data.frame", row.names = c(NA,
-26L))
Every four columns are in one group, and the group names are in the first row, while the column names are in the second row. I wonder whether there are any ways to concatenate the groups vertically and create two new columns with the group name row, where column 1 contains the text before the underscore and column 2 contains the text after the underscore.
I tried to use tidyverse, but after read.csv(), the variable names could not be preserved.

one approach:
sample data (example_data.csv):
group_A,group_A,group_B,group_B
var_1,var_2,var_3,var_4
143,897,234,382
Code:
library(readr) ## for the read_lines function
library(tidyr) ## wrangling (pivoting etc.)
## read csv but skip first line (containing group names):
df <- read.csv('path/to/example_data.csv',skip = 1)
## read first line of csv and convert it to vector of group names:
group_names <- read_lines('path/to/example_data.csv', n_max = 1) %>%
strsplit(',') %>% unlist
## change names of dataframe df to: group_name;variable_name
names(df) <- paste(group_names, names(df), sep = ';')
## wrangle data (for documentation see https://tidyr.tidyverse.org/ )
df %>%
pivot_longer(everything(), names_to = 'group_var', values_to = 'value') %>%
separate(group_var, into = c('group', 'var'), sep = ';') %>%
separate(group, into = c('yeasted_status', 'index'), sep='_') %>%
pivot_wider(names_from = var, values_from = value)
Result:
## A tibble: 2 x 6
#   yeasted_status index var_1 var_2 var_3 var_4
#   <chr>          <chr> <int> <int> <int> <int>
# 1 group          A       143   897    NA    NA
# 2 group          B        NA    NA   234   382
edit
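or, if df is the dataframe derived from your dput output (group names in the column names, variable names in the first row):
library(dplyr) ## for mutate, select, row_number
## build names of the form group_name;variable_name
## (drops the .1, .2, ... suffixes that read.csv added to duplicated names)
names(df) <- paste(sub('\\.[0-9]+$', '', names(df)), unlist(df[1, ]), sep = ';')
df[-1,] %>%
mutate(ID = row_number()) %>% ## one ID per measurement row
pivot_longer(-ID, names_to = 'group_var', values_to = 'value') %>%
separate(group_var, into = c('group', 'var'), sep = ';') %>%
separate(group, into = c('yeasted_status', 'index'), sep='_') %>%
mutate(value = as.double(value)) %>%
pivot_wider(id_cols = c(ID, yeasted_status,index), names_from = var, values_from = value) %>%
select(-ID)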

converting multiple columns from wide to long using pivot_longer

I get an error message when I try to convert multiple columns from wide to long with pivot_longer.
I have code that converts from wide to long with gather, but I have to do this column group by column group. I want to use pivot_longer to gather multiple column groups in one step.
This is some input data:
structure(list(id = c("81", "83", "85", "88", "1", "2"), look_work = c("yes",
"yes", "yes", "yes", "yes", "yes"), current_work = c("no", "yes",
"no", "no", "no", "no"), before_work = c("no", "NULL", "yes",
"yes", "yes", "yes"), keen_move = c("yes", "yes", "no", "no",
"no", "no"), city_size = c("village", "more than 500k inhabitants",
"more than 500k inhabitants", "village", "city up to 20k inhabitants",
"100k - 199k inhabitants"), gender = c("male", "female", "female",
"male", "female", "male"), age = c("18 - 24 years", "18 - 24 years",
"more than 50 years", "18 - 24 years", "31 - 40 years", "more than 50 years"
), education = c("secondary", "vocational", "secondary", "secondary",
"secondary", "secondary"), hf1 = c("", "", "", "1", "1", "1"),
hf2 = c("", "1", "1", "", "", ""), hf3 = c("", "", "", "",
"", ""), hf4 = c("", "", "", "", "", ""), hf5 = c("", "",
"", "", "", ""), hf6 = c("", "", "", "", "", ""), ac1 = c("",
"", "", "", "", "1"), ac2 = c("", "1", "1", "", "1", ""),
ac3 = c("", "", "", "", "1", ""), ac4 = c("", "", "", "",
"", ""), ac5 = c("", "", "", "", "", ""), ac6 = c("", "",
"", "", "", ""), cs1 = c("", "", "", "", "", ""), cs2 = c("",
"1", "1", "", "1", ""), cs3 = c("", "", "", "", "", "1"),
cs4 = c("", "", "", "1", "", ""), cs5 = c("", "", "", "",
"", ""), cs6 = c("", "", "", "", "", ""), cs7 = c("", "",
"", "", "", ""), cs8 = c("", "", "", "", "", ""), se1 = c("",
"", "1", "1", "", ""), se2 = c("", "", "", "", "1", ""),
se3 = c("", "1", "", "", "1", "1"), se4 = c("", "", "", "",
"", ""), se5 = c("", "", "", "", "", ""), se6 = c("", "",
"", "", "", ""), se7 = c("", "", "", "", "", ""), se8 = c("",
"", "", "1", "", "")), row.names = c(NA, 6L), class = "data.frame")
The code using gather is:
df1 <- df %>%
gather(key = "hf_com", value = "hf_com_freq", hf1:hf6) %>%
gather(key = "ac_com", value = "ac_com_freq", ac1:ac6) %>%
filter(substring(hf_com, 3) == substring(ac_com, 3))
df1 <- df1 %>%
gather(key = "curr_sal", value = "curr_sal_freq", cs1:cs8) %>%
gather(key = "exp_sal", value = "exp_sal_freq", se1:se8) %>%
filter(substring(curr_sal, 3) == substring(exp_sal, 3))
The code using pivot_longer is:
df_longer <- df %>%
pivot_longer(
cols = starts_with("hf"),
names_to = "hf_com",
values_to = "hf_freq",
names_prefix = "hf",
na.rm = TRUE)
The expected results which I get with gather are:
structure(list(id = c("81", "83", "85", "88", "1", "2"), look_work = c("yes",
"yes", "yes", "yes", "yes", "yes"), current_work = c("no", "yes",
"no", "no", "no", "no"), before_work = c("no", "NULL", "yes",
"yes", "yes", "yes"), keen_move = c("yes", "yes", "no", "no",
"no", "no"), city_size = c("village", "more than 500k inhabitants",
"more than 500k inhabitants", "village", "city up to 20k inhabitants",
"100k - 199k inhabitants"), gender = c("male", "female", "female",
"male", "female", "male"), age = c("18 - 24 years", "18 - 24 years",
"more than 50 years", "18 - 24 years", "31 - 40 years", "more than 50 years"
), education = c("secondary", "vocational", "secondary", "secondary",
"secondary", "secondary"), hf_com = c("hf1", "hf1", "hf1", "hf1",
"hf1", "hf1"), hf_com_freq = c("", "", "", "1", "1", "1"), ac_com = c("ac1",
"ac1", "ac1", "ac1", "ac1", "ac1"), ac_com_freq = c("", "", "",
"", "", "1"), curr_sal = c("cs1", "cs1", "cs1", "cs1", "cs1",
"cs1"), curr_sal_freq = c("", "", "", "", "", ""), exp_sal = c("se1",
"se1", "se1", "se1", "se1", "se1"), exp_sal_freq = c("", "",
"1", "1", "", "")), row.names = c(NA, 6L), class = "data.frame")
With pivot_longer, I get the following error message:
Error in pivot_longer(., cols = starts_with("hf"), names_to = "hf_com", :
unused argument (na.rm = TRUE)
Also, if there is no solution with pivot_longer, then a solution with data.table would be appreciated.

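I have solved the problem: pivot_longer() has no na.rm argument (that belonged to gather()); the equivalent argument is values_drop_na.
The call needs to be changed from: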
df_longer <- df %>%
pivot_longer(
cols = starts_with("hf"),
names_to = "hf_com",
values_to = "hf_freq",
names_prefix = "hf",
na.rm = TRUE)
to:
df_longer <- df %>%
pivot_longer(
cols = starts_with("hf"),
names_to = "hf_com",
values_to = "hf_freq",
names_prefix = "hf",
values_drop_na = TRUE)
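As a rough sketch of how several column groups could be pivoted in a single call, the example below uses the special ".value" entry in names_to together with names_pattern. The toy data frame is hypothetical (only shaped like the hf*/ac* columns in the question), and the column name level is an assumption chosen for illustration. Note that values_drop_na only drops NA, so the empty strings are converted to NA first.

```r
library(tidyr)

# Toy data (hypothetical), shaped like the question's hf*/ac* columns
df <- data.frame(
  id  = c("81", "88", "1"),
  hf1 = c("", "1", ""), hf2 = c("", "", "1"),
  ac1 = c("", "", "1"), ac2 = c("", "1", ""),
  stringsAsFactors = FALSE
)

# Empty strings are not NA, so convert them before pivoting
df[df == ""] <- NA

# ".value" keeps "hf" and "ac" as separate value columns;
# the digits after the prefix go into the new column "level"
df_longer <- pivot_longer(
  df,
  cols = matches("^(hf|ac)[0-9]+$"),
  names_to = c(".value", "level"),
  names_pattern = "^(hf|ac)([0-9]+)$",
  values_drop_na = TRUE
)
print(df_longer)
```

With more than one value column, values_drop_na removes only the rows in which every value column is NA, so id "81" (which has no entries at all) disappears while partially filled rows are kept.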

How to use tapply to match specific condition

I have an accident dataset that records reported accidents. I am trying to use tapply to get the total number of accidents reported on "Thursday". However, instead of returning the number of accidents for that particular day, it returns the total number of rows in my dataset. I am using the tapply call below:
tapply(myfinal$VEHICLE_COUNT,myfinal$DAY_OF_WEEK=='THURSDAY',length)
My sample dataset is as follows:
> dput(tail(myfinal,5))
structure(list(CASE_NUMBER = c("1251045636", "1251045630", "1251045591",
"1251045574", "1250010434"), BARRACK = c("Frederick", "Frederick",
"Frederick", "Frederick", "Jessup"), ACC_DATE = c("2012-12-31T00:00:00",
"2012-12-31T00:00:00", "2012-12-31T00:00:00", "2012-12-31T00:00:00",
"2012-12-31T00:00:00"), ACC_TIME = c("18:12", "18:12", "12:12",
"9:12", "11:12"), ACC_TIME_CODE = c("5", "5", "4", "3", "3"),
DAY_OF_WEEK = c("MONDAY ", "MONDAY ", "MONDAY ", "MONDAY ",
"MONDAY "), ROAD = c("IS 00070 EISENHOWER MEMOR HWY", "MD 00077 ROCKY RIDGE RD",
"MD 00085 BUCKEYSTOWN PIKE", "MD 00017 MYERSVILLE RD", "IS 00070 No Name"
), INTERSECT_ROAD = c("CO 00248 MONUMENT RD", "MD 00076 MOTTERS STATION RD",
"CO 00308 MANOR WOODS RD", "CO 00941 DAWN CT", "US 00029 Columbia Pike"
), DIST_FROM_INTERSECT = c("300", "0", "400", "500", "0.25"
), DIST_DIRECTION = c("E", "U", "S", "S", "E"), CITY_NAME = c("Not Applicable",
"Not Applicable", "Not Applicable", "Not Applicable", NA),
COUNTY_CODE = c("10", "10", "10", "10", "13"), COUNTY_NAME = c("Frederick",
"Frederick", "Frederick", "Frederick", "Howard"), VEHICLE_COUNT = c(1,
2, 2, 1, 2), PROP_DEST = c("NO", "YES", "YES", "NO", "NO"
), INJURY = c("YES", "NO", "NO", "YES", "YES"), COLLISION_WITH_1 = c("FIXED OBJ",
"VEH", "VEH", "NON-COLLISION", "VEH"), COLLISION_WITH_2 = c("OTHER-COLLISION",
"OTHER-COLLISION", "OTHER-COLLISION", "OTHER-COLLISION",
"OTHER-COLLISION")), .Names = c("CASE_NUMBER", "BARRACK",
"ACC_DATE", "ACC_TIME", "ACC_TIME_CODE", "DAY_OF_WEEK", "ROAD",
"INTERSECT_ROAD", "DIST_FROM_INTERSECT", "DIST_DIRECTION", "CITY_NAME",
"COUNTY_CODE", "COUNTY_NAME", "VEHICLE_COUNT", "PROP_DEST", "INJURY",
"COLLISION_WITH_1", "COLLISION_WITH_2"), row.names = 18634:18638, class = "data.frame")
Any suggestions on how to fix it? Thanks in advance!

If you can only use tapply for whatever reason, then, building off of Maurits' answer, you should be able to do this:
tapply(myfinal$VEHICLE_COUNT,trimws(myfinal$DAY_OF_WEEK)=='THURSDAY',length)
Or similar. It seems that the strings in your DAY_OF_WEEK variable have trailing whitespace. You either need to remove it (via trimws) or modify your comparison string to include those spaces (e.g., myfinal$DAY_OF_WEEK=="THURSDAY "). The == operator matches two strings only if they are identical character by character, so any additional whitespace in either string makes the comparison FALSE.
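To make the effect of the padding concrete, here is a minimal base R sketch on a made-up vector. The values and the names day and vehicle_count are assumptions for illustration only, not taken from the real dataset.

```r
# Toy vectors (hypothetical) with trailing spaces, as in DAY_OF_WEEK
day <- c("MONDAY   ", "THURSDAY ", "THURSDAY   ", "MONDAY ")
vehicle_count <- c(1, 2, 2, 1)

# Exact comparison fails because of the padding
n_raw <- sum(day == "THURSDAY")

# After trimming, the comparison behaves as expected
n_trimmed <- sum(trimws(day) == "THURSDAY")

# tapply grouped by the trimmed day gives one count per day
per_day <- tapply(vehicle_count, trimws(day), length)

print(n_raw)
print(n_trimmed)
print(per_day)
```

Grouping by trimws(day) rather than by a TRUE/FALSE comparison returns the count for every day at once, so the Thursday figure can be read off as per_day[["THURSDAY"]].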

A base R solution is to subset the rows where DAY_OF_WEEK is "THURSDAY" (after trimming the trailing whitespace) and then return the number of rows:
nrow(df[trimws(df$DAY_OF_WEEK) == "THURSDAY", ])

There is really no point in using tapply here!
Method 1
Use dplyr:
library(tidyverse)
df %>% filter(trimws(DAY_OF_WEEK) == "MONDAY") %>% summarise(count = n())
# count
#1 5
Method 2
In base R, use subset and table
table(subset(df, trimws(DAY_OF_WEEK) == "MONDAY")$DAY_OF_WEEK)
#MONDAY
# 5
I've used "MONDAY" here because you've got no entries with DAY_OF_WEEK = "THURSDAY".
Sample data
df <- structure(list(CASE_NUMBER = c("1251045636", "1251045630", "1251045591",
"1251045574", "1250010434"), BARRACK = c("Frederick", "Frederick",
"Frederick", "Frederick", "Jessup"), ACC_DATE = c("2012-12-31T00:00:00",
"2012-12-31T00:00:00", "2012-12-31T00:00:00", "2012-12-31T00:00:00",
"2012-12-31T00:00:00"), ACC_TIME = c("18:12", "18:12", "12:12",
"9:12", "11:12"), ACC_TIME_CODE = c("5", "5", "4", "3", "3"),
DAY_OF_WEEK = c("MONDAY ", "MONDAY ", "MONDAY ", "MONDAY ",
"MONDAY "), ROAD = c("IS 00070 EISENHOWER MEMOR HWY", "MD 00077 ROCKY RIDGE RD",
"MD 00085 BUCKEYSTOWN PIKE", "MD 00017 MYERSVILLE RD", "IS 00070 No Name"
), INTERSECT_ROAD = c("CO 00248 MONUMENT RD", "MD 00076 MOTTERS STATION RD",
"CO 00308 MANOR WOODS RD", "CO 00941 DAWN CT", "US 00029 Columbia Pike"
), DIST_FROM_INTERSECT = c("300", "0", "400", "500", "0.25"
), DIST_DIRECTION = c("E", "U", "S", "S", "E"), CITY_NAME = c("Not Applicable",
"Not Applicable", "Not Applicable", "Not Applicable", NA),
COUNTY_CODE = c("10", "10", "10", "10", "13"), COUNTY_NAME = c("Frederick",
"Frederick", "Frederick", "Frederick", "Howard"), VEHICLE_COUNT = c(1,
2, 2, 1, 2), PROP_DEST = c("NO", "YES", "YES", "NO", "NO"
), INJURY = c("YES", "NO", "NO", "YES", "YES"), COLLISION_WITH_1 = c("FIXED OBJ",
"VEH", "VEH", "NON-COLLISION", "VEH"), COLLISION_WITH_2 = c("OTHER-COLLISION",
"OTHER-COLLISION", "OTHER-COLLISION", "OTHER-COLLISION",
"OTHER-COLLISION")), .Names = c("CASE_NUMBER", "BARRACK",
"ACC_DATE", "ACC_TIME", "ACC_TIME_CODE", "DAY_OF_WEEK", "ROAD",
"INTERSECT_ROAD", "DIST_FROM_INTERSECT", "DIST_DIRECTION", "CITY_NAME",
"COUNTY_CODE", "COUNTY_NAME", "VEHICLE_COUNT", "PROP_DEST", "INJURY",
"COLLISION_WITH_1", "COLLISION_WITH_2"), row.names = 18634:18638, class = "data.frame")

How to search for specific character that has space in the end using sqldf in R

I have a dataset with 18 columns describing reported accidents. I am trying to find the number of accidents reported on a specific day of the week, for example the total number of accidents reported on Tuesday. For this I am using two variables in my dataset: Day_of_Week and number_of_vehicle. I am running the sqldf query below. However, it shows me the total number of accidents reported during the entire week. I would like to use sqldf to report a particular day, e.g. Monday. I should add that the values in the Day_OF_WEEK column are padded with a fair amount of trailing space. See the example below:
Day_OF_Week Number_of_vehicle
MONDAY(Space here) 50
Some sample dataset
> dput(head(myf,5))
structure(list(CASE_NUMBER = c("1363000002", "1296000023", "1283000016",
"1282000006", "1267000007"), BARRACK = c("Rockville", "Berlin",
"Prince Frederick", "Leonardtown", "Essex"), ACC_DATE = c("2012-01-01T00:00:00",
"2012-01-01T00:00:00", "2012-01-01T00:00:00", "2012-01-01T00:00:00",
"2012-01-01T00:00:00"), ACC_TIME = c("2:01", "18:01", "7:01",
"0:01", "1:01"), ACC_TIME_CODE = c("1", "5", "2", "1", "1"),
DAY_OF_WEEK = c("SUNDAY ", "SUNDAY ", "SUNDAY ", "SUNDAY ",
"SUNDAY "), ROAD = c("IS 00495 CAPITAL BELTWAY", "MD 00090 OCEAN CITY EXPWY",
"MD 00765 MAIN ST", "MD 00944 MERVELL DEAN RD", "IS 00695 BALTO BELTWAY"
), INTERSECT_ROAD = c("IS 00270 EISENHOWER MEMORIAL", "CO 00220 ST MARTINS NECK RD",
"CO 00208 DUKE ST", "MD 00235 THREE NOTCH RD", "IS 00083 HARRISBURG EXPWY"
), DIST_FROM_INTERSECT = c("0", "0.25", "100", "10", "100"
), DIST_DIRECTION = c("U", "W", "S", "E", "S"), CITY_NAME = c("Not Applicable",
"Not Applicable", "Not Applicable", "Not Applicable", "Not Applicable"
), COUNTY_CODE = c("15", "23", "4", "18", "3"), COUNTY_NAME = c("Montgomery",
"Worcester", "Calvert", "St. Marys", "Baltimore"), VEHICLE_COUNT = c("2",
"1", "1", "1", "2"), PROP_DEST = c("YES", "YES", "YES", "YES",
"YES"), INJURY = c("NO", "NO", "NO", "NO", "NO"), COLLISION_WITH_1 = c("VEH",
"FIXED OBJ", "FIXED OBJ", "FIXED OBJ", "VEH"), COLLISION_WITH_2 = c("OTHER-COLLISION",
"OTHER-COLLISION", "FIXED OBJ", "OTHER-COLLISION", "OTHER-COLLISION"
)), .Names = c("CASE_NUMBER", "BARRACK", "ACC_DATE", "ACC_TIME",
"ACC_TIME_CODE", "DAY_OF_WEEK", "ROAD", "INTERSECT_ROAD", "DIST_FROM_INTERSECT",
"DIST_DIRECTION", "CITY_NAME", "COUNTY_CODE", "COUNTY_NAME",
"VEHICLE_COUNT", "PROP_DEST", "INJURY", "COLLISION_WITH_1", "COLLISION_WITH_2"
), row.names = c(NA, 5L), class = "data.frame")
sqldf Code:
sqldf("select sum(Number_of_vehicle),DAY_OF_WEEK from accident group by DAY_OF_WEEK")
Any help is appreciated!
Thanks in advance!

nested data.frame [closed]

I have a nested data.frame
dput(res)
structure(list(date = structure(list(pretty = "12:00 PM CDT on August 14, 2015",
year = "2015", mon = "08", mday = "14", hour = "12", min = "00",
tzname = "America/Chicago"), .Names = c("pretty", "year",
"mon", "mday", "hour", "min", "tzname"), class = "data.frame", row.names = 1L),
fog = "0", rain = "1", snow = "0", snowfallm = "0.00", snowfalli = "0.00",
monthtodatesnowfallm = "", monthtodatesnowfalli = "", since1julsnowfallm = "",
since1julsnowfalli = "", snowdepthm = "", snowdepthi = "",
hail = "0", thunder = "0", tornado = "0", meantempm = "26",
meantempi = "79", meandewptm = "17", meandewpti = "63", meanpressurem = "1019",
meanpressurei = "30.09", meanwindspdm = "11", meanwindspdi = "7",
meanwdire = "", meanwdird = "139", meanvism = "16", meanvisi = "10",
humidity = "", maxtempm = "32", maxtempi = "90", mintempm = "21",
mintempi = "69", maxhumidity = "86", minhumidity = "36",
maxdewptm = "18", maxdewpti = "65", mindewptm = "15", mindewpti = "59",
maxpressurem = "1021", maxpressurei = "30.15", minpressurem = "1017",
minpressurei = "30.04", maxwspdm = "19", maxwspdi = "12",
minwspdm = "0", minwspdi = "0", maxvism = "16", maxvisi = "10",
minvism = "16", minvisi = "10", gdegreedays = "29", heatingdegreedays = "0",
coolingdegreedays = "14", precipm = "0.00", precipi = "0.00",
precipsource = "", heatingdegreedaysnormal = "", monthtodateheatingdegreedays = "",
monthtodateheatingdegreedaysnormal = "", since1sepheatingdegreedays = "",
since1sepheatingdegreedaysnormal = "", since1julheatingdegreedays = "",
since1julheatingdegreedaysnormal = "", coolingdegreedaysnormal = "",
monthtodatecoolingdegreedays = "", monthtodatecoolingdegreedaysnormal = "",
since1sepcoolingdegreedays = "", since1sepcoolingdegreedaysnormal = "",
since1jancoolingdegreedays = "", since1jancoolingdegreedaysnormal = ""), .Names = c("date",
"fog", "rain", "snow", "snowfallm", "snowfalli", "monthtodatesnowfallm",
"monthtodatesnowfalli", "since1julsnowfallm", "since1julsnowfalli",
"snowdepthm", "snowdepthi", "hail", "thunder", "tornado", "meantempm",
"meantempi", "meandewptm", "meandewpti", "meanpressurem", "meanpressurei",
"meanwindspdm", "meanwindspdi", "meanwdire", "meanwdird", "meanvism",
"meanvisi", "humidity", "maxtempm", "maxtempi", "mintempm", "mintempi",
"maxhumidity", "minhumidity", "maxdewptm", "maxdewpti", "mindewptm",
"mindewpti", "maxpressurem", "maxpressurei", "minpressurem",
"minpressurei", "maxwspdm", "maxwspdi", "minwspdm", "minwspdi",
"maxvism", "maxvisi", "minvism", "minvisi", "gdegreedays", "heatingdegreedays",
"coolingdegreedays", "precipm", "precipi", "precipsource", "heatingdegreedaysnormal",
"monthtodateheatingdegreedays", "monthtodateheatingdegreedaysnormal",
"since1sepheatingdegreedays", "since1sepheatingdegreedaysnormal",
"since1julheatingdegreedays", "since1julheatingdegreedaysnormal",
"coolingdegreedaysnormal", "monthtodatecoolingdegreedays", "monthtodatecoolingdegreedaysnormal",
"since1sepcoolingdegreedays", "since1sepcoolingdegreedaysnormal",
"since1jancoolingdegreedays", "since1jancoolingdegreedaysnormal"
), class = "data.frame", row.names = 1L)
and I am using the following command to retrieve data from it
df <- data.frame()
df <- rbind(df, ldply(res, function(x) x[[1]]))
To use this data frame, I convert it into data table, using dt <- data.table(df) and now I know how to work with the data, for instance dt[.id=="fog"].
Is there a more elegant/efficient solution?
The problem was solved by @antoine-sac. It was not necessary to use an apply function to get the data; it was only a matter of "un-nesting" the data.

Your problem is that your data is a data.frame and one of its columns is date, but date is itself a data.frame. As you say, it is nested. So let's "un-nest" it.
You can simply do (assuming your data is in data):
df.date <- data$date
# removing the nested date column from data
data$date <- NULL
At this point, data is a normal data.frame and df.date is also a basic data.frame.
> df.date
pretty year mon mday hour min tzname
1 12:00 PM CDT on August 14, 2015 2015 08 14 12 00 America/Chicago
If you want to merge that with your existing data.frame:
# binding df.date with your data
data <- cbind(data, df.date)
No need for any kind of apply.
If you don't know how to access variables in a data.frame, that's a separate matter.
If you want, say, meantempm, you can simply do data$meantempm.
For more, I refer you to a beginner tutorial on R; there are plenty to choose from with a quick web search.
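For anyone who wants to try the un-nesting steps without the full weather record, here is a small self-contained sketch. The two-column toy data frame and the date_ prefix are assumptions for illustration; the prefix is optional and only guards against name clashes between the nested and outer columns.

```r
# Toy nested data.frame (hypothetical values, same shape as `res`)
res <- data.frame(fog = "0", rain = "1", stringsAsFactors = FALSE)
res$date <- data.frame(year = "2015", mon = "08", stringsAsFactors = FALSE)

# Split off the nested column ...
nested <- res$date
res$date <- NULL

# ... optionally prefix its names, then bind the columns back in
names(nested) <- paste0("date_", names(nested))
flat <- cbind(res, nested)

print(names(flat))
print(flat$date_year)
```

After this, every column of flat is an ordinary vector, so the usual flat$column access works for the date parts as well.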
