subtracting multiple columns from each other - r

I have a large dataset and I want to subtract specific columns from each other based on their position. I want to subtract column 2 from column 8, column 3 from column 9 and column 4 from column 10.
Thanks a lot
Magnus
structure(list(Stamp_summertime = structure(c(1546684744, 1546685858,
1546687004, 1547030061, 1547030835, 1547031816), tzone = "UTC", class = c("POSIXct",
"POSIXt")), X26.013 = c(0.138461, 0.138461, 0.138461, 0.144421,
0.144421, 0.144421), X27.024 = c(0.0752111, 0.0752111, 0.0752111,
0.0426819, 0.0426819, 0.0426819), X33.031 = c(3.75788, 3.75788,
3.75788, 3.12581, 3.12581, 3.12581), jar_camp = c("1_pf1.1",
"2_pf1.1", "3_pf1.1", "1_pf2.1", "2_pf2.1", "3_pf2.1"), jar = structure(c(1L,
12L, 23L, 1L, 12L, 23L), .Label = c("1", "10_blank", "11", "12",
"13", "14", "15", "16_blank", "17", "18", "19", "2", "20_blank",
"21", "22", "23", "24", "25", "26", "27", "28", "29", "3", "30_blank",
"31", "32", "33", "34", "35", "36", "37", "38_blank", "39", "4",
"40", "41", "42", "43", "44_blank", "45", "46", "47", "48", "49",
"5_blank", "blank_50", "51", "52", "53", "54", "55", "56", "57",
"6", "7", "8", "9", "X_blank"), class = "factor"), campaign = c("pf1.1",
"pf1.1", "pf1.1", "pf2.1", "pf2.1", "pf2.1"), i.X26.013 = c(0.144658,
0.21502, 0.458296, 0.191571, 0.0789067, 0.711814), i.X27.024 = c(0.0595547,
0.0651149, 0.146772, 0.0997815, 0.0539976, 0.185398), i.X33.031 = c(5.4066,
3.30406, 18.0479, 6.13854, 1.3028, 22.2226)), sorted = "Stamp_summertime", class = c("data.table",
"data.frame"), row.names = c(NA, -6L))

We can create two vectors of column positions and subtract the columns directly. Since your data is a data.table, we use the .. prefix to select columns by the positions stored in a variable.
library(data.table)
col1group <- 2:4
col2group <- 8:10
df[, ..col1group] - df[, ..col2group]
If you want to add them as new columns to the original data, you can rename them and cbind:
cbind(df, setNames(df[, ..col1group] - df[, ..col2group],
paste0(names(df)[col1group], '_diff')))

Something like the following computes the subtractions in the question.
library(data.table)
nms <- names(df1)
iCols <- grep("^i\\.", nms, value = TRUE)
Cols <- sub("^i\\.", "", iCols)
df1[, lapply(seq_along(Cols), function(i) get(Cols[i]) - get(iCols[i]))]
# V1 V2 V3
#1: -0.0061970 0.0156564 -1.64872
#2: -0.0765590 0.0100962 0.45382
#3: -0.3198350 -0.0715609 -14.29002
#4: -0.0471500 -0.0570996 -3.01273
#5: 0.0655143 -0.0113157 1.82301
#6: -0.5673930 -0.1427161 -19.09679
Following Ronak Shah's answer I realized that the code below also works.
df1[, ..Cols] - df1[, ..iCols]
The numeric results are the same but the column names are the vector Cols.
To create new columns, try
newCols <- paste(Cols, "diff", sep = "_")
df1[, (newCols) := lapply(seq_along(Cols), function(i) get(Cols[i]) - get(iCols[i]))]

Base R solution:
idx <- c(2, 3, 4)
jdx <- c(8, 9, 10)
Using lapply() and column binding the list:
setNames(do.call("cbind", lapply(seq_along(idx), function(i){
df[, jdx[i], drop = FALSE] - df[, idx[i], drop = FALSE]
}
)
), c(paste("x", jdx, idx, sep = "_")))
Using sapply() and coercing vectors to a data.frame:
setNames(data.frame(sapply(seq_along(idx), function(i){
df[, jdx[i], drop = FALSE] - df[, idx[i], drop = FALSE]
}
)
), c(paste("x", jdx, idx, sep = "_")))
Using Map() and Reduce() and column binding to original data.frame:
cbind(df, setNames(Reduce(cbind, Map(function(i){
df[, jdx[i], drop = FALSE] - df[, idx[i], drop = FALSE]
}, seq_along(idx))), c(paste("x", jdx, idx, sep = "_"))))
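For a plain data.frame, the same subtraction can also be written without an explicit loop, because arithmetic on two data frames of equal shape works element-wise. A minimal sketch on a made-up data frame (toy and its column names and values are illustrative, not the asker's data):

```r
# Toy data: columns 2:4 hold the values to subtract, columns 5:7 the values to subtract from.
toy <- data.frame(
  id  = 1:3,
  a   = c(1, 2, 3),
  b   = c(10, 20, 30),
  c   = c(100, 200, 300),
  i.a = c(2, 4, 6),
  i.b = c(15, 25, 35),
  i.c = c(101, 202, 303)
)

idx <- 2:4  # positions of the columns to subtract
jdx <- 5:7  # positions of the columns to subtract from

# Element-wise difference of two equally shaped data frames
diffs <- toy[jdx] - toy[idx]
names(diffs) <- paste0(names(toy)[jdx], "_diff")

# Append the differences to the original data
result <- cbind(toy, diffs)
```

With the asker's layout, idx would be 2:4 and jdx would be 8:10. Note that this indexing style is for data.frame; a data.table needs the .. prefix shown in the first answer.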

Related

Why are those 3 loops faster than 1 short lapply

While trying to optimize and benchmark a function, I was able to shrink 3 for loops into 1 short lapply call, but the function got slower.
I am trying to understand why that happens, as with the 3 loops I preallocate 3 lists of the same length and fill them in 3 different loops, which seems unnecessary and inefficient.
## Data #################
Grid = structure(list(ID = 1:81, X = c(99.99922283, 299.99922281, 499.9992228,
699.99922279, 899.99922277, 1099.99922275, 1299.99922274, 1499.99922273,
1699.99922271, 99.99922293, 299.99922291, 499.99922291, 699.99922289,
899.99922287, 1099.99922286, 1299.99922284, 1499.99922283, 1699.99922282,
99.99922303, 299.99922302, 499.99922301, 699.999223, 899.99922298,
1099.99922296, 1299.99922295, 1499.99922294, 1699.99922292, 99.99922314,
299.99922312, 499.99922311, 699.9992231, 899.99922308, 1099.99922307,
1299.99922306, 1499.99922304, 1699.99922303, 99.99922324, 299.99922323,
499.99922322, 699.9992232, 899.99922319, 1099.99922317, 1299.99922316,
1499.99922315, 1699.99922313, 99.99922335, 299.99922333, 499.99922332,
699.99922331, 899.9992233, 1099.99922328, 1299.99922327, 1499.99922325,
1699.99922324, 99.99922345, 299.99922344, 499.99922342, 699.99922341,
899.9992234, 1099.99922338, 1299.99922337, 1499.99922335, 1699.99922334,
99.99922356, 299.99922354, 499.99922353, 699.99922352, 899.9992235,
1099.99922348, 1299.99922347, 1499.99922345, 1699.99922344, 99.99922367,
299.99922365, 499.99922364, 699.99922362, 899.99922361, 1099.99922359,
1299.99922358, 1499.99922356, 1699.99922355), Y = c(1699.9975638,
1699.99756369, 1699.99756357, 1699.99756347, 1699.99756336, 1699.99756325,
1699.99756314, 1699.99756303, 1699.99756292, 1499.99756399, 1499.99756388,
1499.99756377, 1499.99756366, 1499.99756355, 1499.99756344, 1499.99756333,
1499.99756322, 1499.99756311, 1299.99756418, 1299.99756408, 1299.99756396,
1299.99756386, 1299.99756375, 1299.99756363, 1299.99756353, 1299.99756342,
1299.99756331, 1099.99756438, 1099.99756427, 1099.99756416, 1099.99756405,
1099.99756394, 1099.99756384, 1099.99756372, 1099.99756361, 1099.99756351,
899.99756457, 899.99756446, 899.99756434, 899.99756424, 899.99756414,
899.99756403, 899.99756392, 899.99756381, 899.9975637, 699.99756477,
699.99756466, 699.99756454, 699.99756443, 699.99756433, 699.99756422,
699.99756411, 699.99756401, 699.99756389, 499.99756496, 499.99756485,
499.99756474, 499.99756463, 499.99756452, 499.99756441, 499.9975643,
499.9975642, 499.99756409, 299.99756516, 299.99756505, 299.99756494,
299.99756483, 299.99756472, 299.99756461, 299.9975645, 299.99756439,
299.99756428, 99.99756535, 99.99756524, 99.99756513, 99.99756502,
99.99756491, 99.9975648, 99.99756469, 99.99756458, 99.99756448
)), row.names = c("11", "12", "13", "14", "15", "16", "17", "18",
"19", "21", "22", "23", "24", "25", "26", "27", "28", "29", "31",
"32", "33", "34", "35", "36", "37", "38", "39", "41", "42", "43",
"44", "45", "46", "47", "48", "49", "51", "52", "53", "54", "55",
"56", "57", "58", "59", "61", "62", "63", "64", "65", "66", "67",
"68", "69", "71", "72", "73", "74", "75", "76", "77", "78", "79",
"81", "82", "83", "84", "85", "86", "87", "88", "89", "91", "92",
"93", "94", "95", "96", "97", "98", "99"), class = "data.frame")
mut2 = sapply(1:100, function(i) sample(c(0,1), size = nrow(Grid), replace = T))
## Functions #################
## Triple For loop
getRects <- function(trimtonOut, Grid){
len1 <- dim(trimtonOut)[2]
childli = childnew = rectidli = vector("list", len1);
for (i in 1:len1) {
childli[[i]] <- trimtonOut[,i]
}
for (u in 1:len1){
rectidli[[u]] <- which(childli[[u]]==1, arr.ind = T)
}
for (z in 1:len1) {
childnew[[z]] <- Grid[rectidli[[z]],];
}
return(childnew)
}
## Shortest Lapply
getRects1 <- function(trimtonOut, Grid){
lapply(1:dim(trimtonOut)[2], function(i) {
Grid[which(trimtonOut[,i]==1, arr.ind = T),]
})
}
## Shorter Lapply
getRects2 <- function(trimtonOut, Grid){
lapply(1:dim(trimtonOut)[2], function(i) {
tmp = which(trimtonOut[,i]==1, arr.ind = T)
Grid[tmp,]
})
}
## Longest Lapply
getRects3 <- function(trimtonOut, Grid){
lapply(1:dim(trimtonOut)[2], function(i) {
tmp = trimtonOut[,i]
tmp1 = which(tmp==1, arr.ind = T)
Grid[tmp1,]
})
}
## Execute and Compare #################
getRectV <- getRects(mut2, Grid)
getRectV1 <- getRects1(mut2, Grid)
getRectV2 <- getRects2(mut2, Grid)
getRectV3 <- getRects3(mut2, Grid)
identical(getRectV,getRectV1)
identical(getRectV,getRectV2)
identical(getRectV,getRectV3)
## Benchmark #################
library(microbenchmark)
# mut2 = sapply(1:400, function(i) sample(c(0,1), size = nrow(Grid), replace = T))
mc = microbenchmark(
loop = getRects(mut2, Grid),
lap1 = getRects1(mut2, Grid),
lap2 = getRects2(mut2, Grid),
lap3 = getRects3(mut2, Grid)
)
mc
Are you sure that those time differences are that significant?
library(microbenchmark)
# mut2 = sapply(1:400, function(i) sample(c(0,1), size = nrow(Grid), replace = T))
mc = microbenchmark(
loop = getRects(mut2, Grid),
lap1 = getRects1(mut2, Grid),
lap2 = getRects2(mut2, Grid),
lap3 = getRects3(mut2, Grid)
)
mc
#> Unit: milliseconds
#> expr min lq mean median uq max neval
#> loop 2.651485 2.699166 3.195301 2.756171 3.136741 8.010173 100
#> lap1 2.755571 2.828128 3.098850 2.877806 3.012487 7.427598 100
#> lap2 2.737105 2.808924 3.118260 2.863221 2.939996 13.706736 100
#> lap3 2.719101 2.787040 3.191893 2.852963 3.004811 8.490867 100
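Apart from timing, the body of the lapply versions can be shortened a little. For a plain vector, arr.ind has no effect in which(), and a logical vector can index the rows directly as long as it contains no NA. A minimal sketch on made-up data (grid_small and mut_small are illustrative stand-ins for Grid and mut2):

```r
set.seed(1)

# Small stand-ins for Grid and mut2
grid_small <- data.frame(ID = 1:6, X = runif(6), Y = runif(6))
mut_small <- sapply(1:4, function(i) {
  sample(c(0, 1), size = nrow(grid_small), replace = TRUE)
})

# Version from the question: integer positions via which()
with_which <- lapply(seq_len(ncol(mut_small)), function(i) {
  grid_small[which(mut_small[, i] == 1, arr.ind = TRUE), ]
})

# Shorter version: index rows with the logical vector itself
with_logical <- lapply(seq_len(ncol(mut_small)), function(i) {
  grid_small[mut_small[, i] == 1, ]
})

# Both approaches select the same rows
same <- identical(with_which, with_logical)
```

This is a readability change only; no claim is made here about which form is faster.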

Plotting lines of multiple groups in ggplot2 gives a weird result

I have done species accumulation curves and would like to plot the SAC results of different substrate size classes in the same ggplot, with expected species richness on the y-axis and number of sites sampled on the x-axis. The data features a cumulative number of samples in each size class (column "sites"), the expected species richness (column "richness"), and substrate size classes 10, 20 and 30 (column "sc").
sites richness sc
1 1 0.6696915 10
2 2 1.2008513 10
3 3 1.6387310 10
4 4 2.0128472 10
5 5 2.3424933 10
6 6 2.6403239 10
sites richness sc
2836 1 1.000000 20
2837 2 1.703442 20
2838 3 2.249188 20
2839 4 2.706618 20
2840 5 3.110651 20
2841 6 3.479173 20
I want each sizeclass to have unique linetype. I used the following code for ggplot:
sac_kaikki<-ggplot(sac_data, aes(x=sites, y=richness,group=sc)) +
geom_line(aes(linetype=sc))+
coord_cartesian(xlim=c(0,100))+
theme(axis.title.y = element_blank())+
theme(axis.title.x = element_blank())
However, instead of getting three neat lines in different linetypes, I got this jumbly muddly messy thing with more stripes than a herd of zebras: https://i.stack.imgur.com/iD75K.jpg. I am sure the solution is rather simple, but for the life of me I am not able to figure it out.
// as Brookes kindly pointed out I should add some reproducible data, here is a subset of my data with dput, featuring 10 first observations of size classes 10 and 20:
dput(head(subset(sac_data,sac_data$sc=="10"),10))
structure(list(sites = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), richness = c(0.669691470054462,
1.20085134466255, 1.63873100707468, 2.01284716414471, 2.34249332096243,
2.64032389106845, 2.91468283244696, 3.17111526890278, 3.41334794519086,
3.64392468817362), sc = c("10", "10", "10", "10", "10", "10",
"10", "10", "10", "10")), .Names = c("sites", "richness", "sc"
), row.names = c(NA, 10L), class = "data.frame")
dput(head(subset(sac_data,sac_data$sc=="20"),10))
structure(list(sites = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), richness = c(0.999999999999987,
1.70344155844158, 2.24918831168832, 2.70661814764865, 3.11065087175364,
3.47917264517669, 3.82165739030286, 4.14341144680334, 4.44765475554031,
4.73653870494466), sc = c("20", "20", "20", "20", "20", "20",
"20", "20", "20", "20")), .Names = c("sites", "richness", "sc"
), row.names = 2836:2845, class = "data.frame")
// okay so for whatever reason, the plot works just fine if I plot only two sizeclasses, but including the third one produces the absurd plot I posted a picture of.
structure(list(sites = 1:10, richness = c(0.42857142857143, 0.838095238095238,
1.22932330827066, 1.60300751879699, 1.95989974937343, 2.30075187969924,
2.62631578947368, 2.93734335839598, 3.23458646616541, 3.5187969924812
), sc = c("30", "30", "30", "30", "30", "30", "30", "30", "30",
"30")), .Names = c("sites", "richness", "sc"), row.names = c(NA,
10L), class = "data.frame")
Works fine for me with your sample data:
a <- structure(list(sites = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), richness = c(0.669691470054462,
1.20085134466255, 1.63873100707468, 2.01284716414471, 2.34249332096243,
2.64032389106845, 2.91468283244696, 3.17111526890278, 3.41334794519086,
3.64392468817362), sc = c("10", "10", "10", "10", "10", "10",
"10", "10", "10", "10")), .Names = c("sites", "richness", "sc"
), row.names = c(NA, 10L), class = "data.frame")
b <- structure(list(sites = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), richness = c(0.999999999999987,
1.70344155844158, 2.24918831168832, 2.70661814764865, 3.11065087175364,
3.47917264517669, 3.82165739030286, 4.14341144680334, 4.44765475554031,
4.73653870494466), sc = c("20", "20", "20", "20", "20", "20",
"20", "20", "20", "20")), .Names = c("sites", "richness", "sc"
), row.names = 2836:2845, class = "data.frame")
c <- structure(list(sites = 1:10, richness = c(0.42857142857143, 0.838095238095238,
1.22932330827066, 1.60300751879699, 1.95989974937343, 2.30075187969924,
2.62631578947368, 2.93734335839598, 3.23458646616541, 3.5187969924812
), sc = c("30", "30", "30", "30", "30", "30", "30", "30", "30",
"30")), .Names = c("sites", "richness", "sc"), row.names = c(NA,
10L), class = "data.frame")
library(dplyr) # for bind_rows()
sac_data <- bind_rows(a, b, c)
Plotting:
library(ggplot2)
ggplot(sac_data, aes(sites, richness, group = sc)) +
geom_line(aes(linetype = sc))
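Since the sample data plots fine, the cause probably lies in rows that the sample does not show. One guess, not confirmed by the question, is that some size class has more than one row per value of sites, which would make geom_line double back on itself. A minimal base R check on made-up data (sac_check and its values are hypothetical):

```r
# Hypothetical data with one accidental repeat of (sites = 2, sc = "30")
sac_check <- data.frame(
  sites    = c(1, 2, 3, 1, 2, 2, 3),
  richness = c(0.67, 1.20, 1.64, 0.43, 0.84, 0.90, 1.23),
  sc       = c("10", "10", "10", "30", "30", "30", "30"),
  stringsAsFactors = FALSE
)

# Count the rows for each (sc, sites) combination
counts <- aggregate(richness ~ sc + sites, data = sac_check, FUN = length)
names(counts)[3] <- "n"

# Combinations that occur more than once are candidates for the zigzag
repeated <- counts[counts$n > 1, ]
```

If repeated comes back empty on the real sac_data, this guess can be ruled out.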

Convert column types to their read_csv() column type in R

One of my favorite things about library(readr) and the read_csv() function in R is that it almost always sets the column types of my data to the correct class. However, I am currently working with an API in R that returns data to me as a dataframe of all character classes, even if the data is clearly numbers. Take this dataframe for example, which has some sports data:
dput(mydf)
structure(list(isUnplayed = c("false", "false", "false"), isInProgress =
c("false", "false", "false"), isCompleted = c("true", "true", "true"), awayScore = c("106",
"95", "95"), homeScore = c("94", "97", "111"), game.ID = c("31176",
"31177", "31178"), game.date = c("2015-10-27", "2015-10-27",
"2015-10-27"), game.time = c("8:00PM", "8:00PM", "10:30PM"),
game.location = c("Philips Arena", "United Center", "Oracle Arena"
), game.awayTeam.ID = c("88", "86", "110"), game.awayTeam.City = c("Detroit",
"Cleveland", "New Orleans"), game.awayTeam.Name = c("Pistons",
"Cavaliers", "Pelicans"), game.awayTeam.Abbreviation = c("DET",
"CLE", "NOP"), game.homeTeam.ID = c("91", "89", "101"), game.homeTeam.City = c("Atlanta",
"Chicago", "Golden State"), game.homeTeam.Name = c("Hawks",
"Bulls", "Warriors"), game.homeTeam.Abbreviation = c("ATL",
"CHI", "GSW"), quarterSummary.quarter = list(structure(list(
`#number` = c("1", "2", "3", "4"), awayScore = c("25",
"23", "34", "24"), homeScore = c("25", "18", "23", "28"
)), .Names = c("#number", "awayScore", "homeScore"), class = "data.frame", row.names = c(NA,
4L)), structure(list(`#number` = c("1", "2", "3", "4"), awayScore = c("17",
"23", "28", "27"), homeScore = c("26", "20", "25", "26")), .Names = c("#number",
"awayScore", "homeScore"), class = "data.frame", row.names = c(NA,
4L)), structure(list(`#number` = c("1", "2", "3", "4"), awayScore = c("35",
"14", "26", "20"), homeScore = c("39", "20", "35", "17")), .Names = c("#number",
"awayScore", "homeScore"), class = "data.frame", row.names = c(NA,
4L)))), .Names = c("isUnplayed", "isInProgress", "isCompleted",
"awayScore", "homeScore", "game.ID", "game.date", "game.time",
"game.location", "game.awayTeam.ID", "game.awayTeam.City", "game.awayTeam.Name",
"game.awayTeam.Abbreviation", "game.homeTeam.ID", "game.homeTeam.City",
"game.homeTeam.Name", "game.homeTeam.Abbreviation", "quarterSummary.quarter"
), class = "data.frame", row.names = c(NA, 3L))
It is quite a hassle to deal with this dataframe once it is returned by the API, given the class types. I've come up with a sort of a hack to update the column classes, which is as follows:
write_csv(mydf, 'mydf.csv')
mydf <- read_csv('mydf.csv')
By writing to CSV and then re-reading the CSV using read_csv(), the dataframe columns update. Unfortunately I am left with a CSV file in my directory that I don't want. Is there a way to update the columns of an R dataframe to their 'read_csv()' column classes, without actually having to write the CSV?
Any help is appreciated!
You don't need to write and read the data if you just want readr to guess your column types. You could use readr::type_convert for that:
library(dplyr) # provides %>% and mutate()
iris %>%
dplyr::mutate(Sepal.Width = as.character(Sepal.Width)) %>%
readr::type_convert() %>%
str()
For comparison:
iris %>%
dplyr::mutate(Sepal.Width = as.character(Sepal.Width)) %>%
str()
Try this code. type.convert() converts a character vector to logical, integer, numeric, complex or factor, as appropriate.
indx <- which(sapply(df, is.character))
df[, indx] <- lapply(df[, indx], type.convert)
indx <- which(sapply(df, is.factor))
df[, indx] <- lapply(df[, indx], as.character)
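A variant of the base R approach that skips the factor round-trip by passing as.is = TRUE, so strings that are not numbers or logicals simply stay character. A minimal sketch on a made-up data frame (api_df is a hypothetical stand-in for the API result, using three column names from the question):

```r
# Stand-in for an API result where every column arrives as character
api_df <- data.frame(
  isCompleted = c("true", "true", "false"),
  awayScore   = c("106", "95", "95"),
  game.date   = c("2015-10-27", "2015-10-27", "2015-10-27"),
  stringsAsFactors = FALSE
)

# Convert each column in place; as.is = TRUE keeps plain strings as character
api_df[] <- lapply(api_df, type.convert, as.is = TRUE)

# Resulting class of each column
col_classes <- sapply(api_df, class)
```

Unlike read_csv(), type.convert() does not parse dates, so game.date stays character here. List columns such as quarterSummary.quarter would need to be left out of the lapply() call.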

How to select range of rows in R

I have a dataframe called mydf. I also have a vector called myvec <- c("chr5:11", "chr3:112", "chr22:334"). What I want to do is select a range of rows (including the 3 values above and the 3 values below) whenever any of the vector elements matches the key in mydf, and make a subset of mydf (result).
Since chr5:11 in myvec matches a key in mydf, we select the rows from chr5:8 (three values below) to chr5:14 (three values above) for the result.
mydf<- structure(list(key = structure(c(5L, 2L, 7L, 8L, 4L, 1L, 6L,
3L, 11L, 10L, 9L), .Names = c("34", "35", "36", "37", "38", "39",
"40", "41", "42", "43", "44"), .Label = c("chr5:10", "chr5:11",
"chr5:1123", "chr5:118", "chr5:12", "chr5:123", "chr5:13", "chr5:14",
"chr5:19", "chr5:8", "chr5:9"), class = "factor"), variantId = structure(1:11, .Names = c("34",
"35", "36", "37", "38", "39", "40", "41", "42", "43", "44"), .Label = c("9920068",
"9920069", "9920070", "9920071", "9920072", "9920073", "9920074",
"9920075", "9920076", "9920077", "9920078"), class = "factor")), .Names = c("key",
"variantId"), row.names = c("34", "35", "36", "37", "38", "39",
"40", "41", "42", "43", "44"), class = "data.frame")
result
key variant
43 "chr5:8" "9920077"
42 "chr5:9" "9920076"
39 "chr5:10" "9920073"
35 "chr5:11" "9920069"
34 "chr5:12" "9920068"
36 "chr5:13" "9920070"
37 "chr5:14" "9920071"
How about the following (I use data.table but the base version is almost the same)
library(data.table)
mydf <- as.data.table(mydf) #(if mydf really is stored as a matrix currently)
myvec2 <- lapply(strsplit(gsub("chr", "", myvec), split=":"), as.integer)
mydf[unique(Reduce(c, sapply(myvec2, function(x){
which(key %in% paste0("chr", x[1], ":", seq((x2 <- x[2]) - 3L, x2 + 3L)))}
))), ]
(In base R, replace as.data.table with as.data.frame and key with mydf$key, and replace the closing square bracket ] with ,])
Extra option for sorting
Actually, I think this option is better in general, since it stores your information in a more pliable way in the first place. This version's a bit heavier in the data.table parlance.
mydf <- as.data.table(mydf)
#Split your `key` variable into its pre- and post-colon components
# (of course using better names if those numbers mean something
# more specific to you)
mydf[ , c("chr", "sub") :=
.(as.integer(gsub("chr|:.*", "", key)),
as.integer(gsub(".*:", "", key)))]
Now, proceeding much as before with a slight tweak:
myvec2<-lapply(strsplit(gsub("chr","",myvec),split=":"),as.integer)
mydf[unique(Reduce(c, sapply(myvec2, function(x){
which(chr == x[1] & sub %in% seq((x2 <- x[2]) - 3L, x2 + 3L))}
)))][order(chr, sub)]
Outputs:
key variantId chr sub
1: chr5:8 9920077 5 8
2: chr5:9 9920076 5 9
3: chr5:10 9920073 5 10
4: chr5:11 9920069 5 11
5: chr5:12 9920068 5 12
6: chr5:13 9920070 5 13
7: chr5:14 9920071 5 14
You can use the GenomicRanges package.
library(GenomicRanges)
myvec <- c("chr5:11", "chr3:112", "chr22:334")
myvec.gr <- GRanges(gsub(":.+", "", myvec),
IRanges(as.numeric(gsub(".+:", "", myvec)) - 3,
as.numeric(gsub(".+:", "", myvec)) + 3))
mydf.gr <- GRanges(gsub(":.+", "", mydf[,"key"]),
IRanges(as.numeric(gsub(".+:", "", mydf[,"key"])),
as.numeric(gsub(".+:", "", mydf[,"key"]))))
d.v.op <- findOverlaps(mydf.gr, myvec.gr)
mydf[queryHits(d.v.op), ]
# key variantId
# 34 "chr5:12" "9920068"
# 35 "chr5:11" "9920069"
# 36 "chr5:13" "9920070"
# 37 "chr5:14" "9920071"
# 39 "chr5:10" "9920073"
# 42 "chr5:9" "9920076"
# 43 "chr5:8" "9920077"
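For reference, a base R sketch of the same idea, using only string splitting and logical indexing. It assumes every key has the form chrN:position with an integer position. toy_df is a stand-in with the same keys and variant IDs as mydf but default row names, and split_key() is a helper name made up for this example:

```r
# Stand-in for mydf: same keys and variant IDs, stored as character
toy_df <- data.frame(
  key = c("chr5:12", "chr5:11", "chr5:13", "chr5:14", "chr5:118", "chr5:10",
          "chr5:123", "chr5:1123", "chr5:9", "chr5:8", "chr5:19"),
  variantId = as.character(9920068:9920078),
  stringsAsFactors = FALSE
)
myvec <- c("chr5:11", "chr3:112", "chr22:334")

# Split "chr5:11" into a chromosome label and an integer position
split_key <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  data.frame(
    chr = sapply(parts, `[`, 1),
    pos = as.integer(sapply(parts, `[`, 2)),
    stringsAsFactors = FALSE
  )
}

rows    <- split_key(toy_df$key)
targets <- split_key(myvec)

# Keep a row if it is within 3 positions of any target on the same chromosome
keep <- Reduce(`|`, lapply(seq_len(nrow(targets)), function(k) {
  rows$chr == targets$chr[k] & abs(rows$pos - targets$pos[k]) <= 3L
}))

# Subset and sort by chromosome, then position
result <- toy_df[keep, ][order(rows$chr[keep], rows$pos[keep]), ]
```

Comparing positions numerically avoids building every candidate key string with paste0(), at the cost of parsing the keys once up front.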

nested data.frame [closed]

I have a nested data.frame
dput(res)
structure(list(date = structure(list(pretty = "12:00 PM CDT on August 14, 2015",
year = "2015", mon = "08", mday = "14", hour = "12", min = "00",
tzname = "America/Chicago"), .Names = c("pretty", "year",
"mon", "mday", "hour", "min", "tzname"), class = "data.frame", row.names = 1L),
fog = "0", rain = "1", snow = "0", snowfallm = "0.00", snowfalli = "0.00",
monthtodatesnowfallm = "", monthtodatesnowfalli = "", since1julsnowfallm = "",
since1julsnowfalli = "", snowdepthm = "", snowdepthi = "",
hail = "0", thunder = "0", tornado = "0", meantempm = "26",
meantempi = "79", meandewptm = "17", meandewpti = "63", meanpressurem = "1019",
meanpressurei = "30.09", meanwindspdm = "11", meanwindspdi = "7",
meanwdire = "", meanwdird = "139", meanvism = "16", meanvisi = "10",
humidity = "", maxtempm = "32", maxtempi = "90", mintempm = "21",
mintempi = "69", maxhumidity = "86", minhumidity = "36",
maxdewptm = "18", maxdewpti = "65", mindewptm = "15", mindewpti = "59",
maxpressurem = "1021", maxpressurei = "30.15", minpressurem = "1017",
minpressurei = "30.04", maxwspdm = "19", maxwspdi = "12",
minwspdm = "0", minwspdi = "0", maxvism = "16", maxvisi = "10",
minvism = "16", minvisi = "10", gdegreedays = "29", heatingdegreedays = "0",
coolingdegreedays = "14", precipm = "0.00", precipi = "0.00",
precipsource = "", heatingdegreedaysnormal = "", monthtodateheatingdegreedays = "",
monthtodateheatingdegreedaysnormal = "", since1sepheatingdegreedays = "",
since1sepheatingdegreedaysnormal = "", since1julheatingdegreedays = "",
since1julheatingdegreedaysnormal = "", coolingdegreedaysnormal = "",
monthtodatecoolingdegreedays = "", monthtodatecoolingdegreedaysnormal = "",
since1sepcoolingdegreedays = "", since1sepcoolingdegreedaysnormal = "",
since1jancoolingdegreedays = "", since1jancoolingdegreedaysnormal = ""), .Names = c("date",
"fog", "rain", "snow", "snowfallm", "snowfalli", "monthtodatesnowfallm",
"monthtodatesnowfalli", "since1julsnowfallm", "since1julsnowfalli",
"snowdepthm", "snowdepthi", "hail", "thunder", "tornado", "meantempm",
"meantempi", "meandewptm", "meandewpti", "meanpressurem", "meanpressurei",
"meanwindspdm", "meanwindspdi", "meanwdire", "meanwdird", "meanvism",
"meanvisi", "humidity", "maxtempm", "maxtempi", "mintempm", "mintempi",
"maxhumidity", "minhumidity", "maxdewptm", "maxdewpti", "mindewptm",
"mindewpti", "maxpressurem", "maxpressurei", "minpressurem",
"minpressurei", "maxwspdm", "maxwspdi", "minwspdm", "minwspdi",
"maxvism", "maxvisi", "minvism", "minvisi", "gdegreedays", "heatingdegreedays",
"coolingdegreedays", "precipm", "precipi", "precipsource", "heatingdegreedaysnormal",
"monthtodateheatingdegreedays", "monthtodateheatingdegreedaysnormal",
"since1sepheatingdegreedays", "since1sepheatingdegreedaysnormal",
"since1julheatingdegreedays", "since1julheatingdegreedaysnormal",
"coolingdegreedaysnormal", "monthtodatecoolingdegreedays", "monthtodatecoolingdegreedaysnormal",
"since1sepcoolingdegreedays", "since1sepcoolingdegreedaysnormal",
"since1jancoolingdegreedays", "since1jancoolingdegreedaysnormal"
), class = "data.frame", row.names = 1L)
and I am using the following command to retrieve data from it
library(plyr) # for ldply()
df <- data.frame()
df <- rbind(df, ldply(res, function(x) x[[1]]))
To use this data frame, I convert it into data table, using dt <- data.table(df) and now I know how to work with the data, for instance dt[.id=="fog"].
Is there a more elegant/efficient solution?
The problem was solved by @antoine-sac. It was not necessary to use apply to get the data; it was only a question of "un-nesting" the data.
Your problem is that your data is a data.frame and one of its columns is date. But date is itself a data.frame. As you say, it is a nested list. So let's "un-nest" it.
You can simply do (assuming your data is in data):
df.date <- data$date
# removing incorrectly formatted date from data
data$date <- NULL
At this point, data is a normal data.frame and df.date is also a basic data.frame.
> df.date
pretty year mon mday hour min tzname
1 12:00 PM CDT on August 14, 2015 2015 08 14 12 00 America/Chicago
If you want to merge that with your existing data.frame:
# binding df.date with your data
data <- cbind(data, df.date)
No need for any kind of apply.
Now if you don't know how to access variables in a data.frame, that's another thing.
If you want, say, meantempm, you can simply do data$meantempm.
I refer you to a beginner tutorial on R; there are plenty to choose from with a Google search.
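The un-nesting steps above, put together as a self-contained sketch. The data frame nested is a cut-down, made-up stand-in for res, with only two ordinary columns and a two-column date:

```r
# Build a small nested data frame: the date column is itself a data frame
nested <- data.frame(fog = "0", rain = "1", stringsAsFactors = FALSE)
nested$date <- data.frame(year = "2015", mon = "08", stringsAsFactors = FALSE)

# Pull the nested part out, drop it from the parent, then bind it back as flat columns
date_part   <- nested$date
nested$date <- NULL
flat        <- cbind(nested, date_part)

# Access a single value by column name
rain_value <- flat$rain
```

After this, flat has one ordinary column per field, so the usual $ and [ accessors work without any apply-style helper.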
