I have a list like this (toy data):
ltd <- list(structure(list(Abund = c("BROS", "KIS", "TTHS",
"MKS"), `Value: F111: cold, Sample1` = c("1.274e7", "",
"", "2.301e7"), `Value: F111: warm, Sample1` = c("", "",
"", "")), .Names = c("Abund", "Value: F111: cold, Sample1",
"Value: F111: warm, Sample1"), row.names = c(NA, 4L), class = "data.frame"),
structure(list(Abund = c("BROS", "TMS", "KIS",
"HERS"), `Value: F216: cold, Sample2` = c("1.670e6",
"4.115e7", "", "1.302e7"), `Value: F216: warm, Sample2` = c("",
"2.766e7", "", "1.396e7")), .Names = c("Abund", "Value: F216: cold, Sample2",
"Value: F216: warm, Sample2"), row.names = c(NA, 4L), class = "data.frame"),
structure(list(Abund = c("BROS", "TMS", "KIS",
"HERS"), `Value: F655: cold, Sample3` = c("7.074e4",
"1.038e7", "", "7.380e5"), `Value: F655: warm, Sample3` = c("",
"6.874e6", "", "7.029e5")), .Names = c("Abund", "Value: F655: cold, Sample3",
"Value: F655: warm, Sample3"), row.names = c(NA, 4L), class = "data.frame"))
List of 5000
$ :'data.frame': 397 obs. of 3 variables:
..$ Abund : chr [1:363] "TTT" "MMM" "GTR" "NLM" ...
..$ Value: F111: Warm, Sample1: chr [1:363] "1.274e7" "" "" "2.301e7" ...
..$ Value: F111: Cold, Sample1: chr [1:363] "" "" "" "" ...
$ :'data.frame': 673 obs. of 3 variables:
..$ Abund : chr [1:673] "MGL" "KKK" "LFT" "NKL" ...
..$ Value: F216: Warm, Sample2: chr [1:673] "1.670e6" "4.115e7" "" "1.302e7" ...
..$ Value: F216: Cold, Sample2: chr [1:673] "" "2.766e7" "" "1.396e7" ...
$ :'data.frame': 779 obs. of 3 variables:
..$ Abund : chr [1:779] "TTLS" "KIS" "KISA" "LISU" ...
..$ Value: F655: Warm, Sample3: chr [1:779] "7.074e4" "1.038e7" "" "7.380e5" ...
..$ Value: F655: Cold, Sample3: chr [1:779] "" "6.874e6" "" "7.029e5" ...
$ :'data.frame': 387 obs. of 3 variables:
..$ Abund : chr [1:387] "BRO" "BIA" "KIA" "TTHS" ...
..$ Value: F57: Warm, Sample4: chr [1:387] "6.910e6" "" "2.435e7" "3.924e6" ...
..$ Value: F57: Cold, Sample4: chr [1:387] "5.009e6" "" "" "3.624e6" ...
$ :'data.frame': 543 obs. of 3 variables:
I want to give each Abund column a unique name, numbered from 1 up to however many data frames the list has. The desired output looks like this:
List of 5000
$ :'data.frame': 397 obs. of 3 variables:
..$ Abund1 : chr [1:363] "TTT" "MMM" "GTR" "NLM" ...
..$ Value: F111: Warm, Sample1: chr [1:363] "1.274e7" "" "" "2.301e7" ...
..$ Value: F111: Cold, Sample1: chr [1:363] "" "" "" "" ...
$ :'data.frame': 673 obs. of 3 variables:
..$ Abund2 : chr [1:673] "MGL" "KKK" "LFT" "NKL" ...
..$ Value: F216: Warm, Sample2: chr [1:673] "1.670e6" "4.115e7" "" "1.302e7" ...
..$ Value: F216: Cold, Sample2: chr [1:673] "" "2.766e7" "" "1.396e7" ...
$ :'data.frame': 779 obs. of 3 variables:
..$ Abund3 : chr [1:779] "TTLS" "KIS" "KISA" "LISU" ...
..$ Value: F655: Warm, Sample3: chr [1:779] "7.074e4" "1.038e7" "" "7.380e5" ...
..$ Value: F655: Cold, Sample3: chr [1:779] "" "6.874e6" "" "7.029e5" ...
$ :'data.frame': 387 obs. of 3 variables:
..$ Abund4 : chr [1:387] "BRO" "BIA" "KIA" "TTHS" ...
..$ Value: F57: Warm, Sample4: chr [1:387] "6.910e6" "" "2.435e7" "3.924e6" ...
..$ Value: F57: Cold, Sample4: chr [1:387] "5.009e6" "" "" "3.624e6" ...
To solve a problem like this, instead of attacking the big problem up front, it's best to solve one piece of it at a time. If we look at just one frame from your list, I'll call it x:
x <- structure(list(Abund = c("BROS", "KIS", "TTHS",
"MKS"), `Value: F111: cold, Sample1` = c("1.274e7", "",
"", "2.301e7"), `Value: F111: warm, Sample1` = c("", "",
"", "")), .Names = c("Abund", "Value: F111: cold, Sample1",
"Value: F111: warm, Sample1"), row.names = c(NA, 4L), class = "data.frame")
str(x)
# 'data.frame': 4 obs. of 3 variables:
# $ Abund : chr "BROS" "KIS" "TTHS" "MKS"
# $ Value: F111: cold, Sample1: chr "1.274e7" "" "" "2.301e7"
# $ Value: F111: warm, Sample1: chr "" "" "" ""
You had originally wanted to append the number after the "F" in the other column names. I'll attack that first, and then if you really want it, I'll also do the "append an incrementing number" thing.
F-number
Write a function that finds the "F" number within the second column name and appends it to the first column name. (I'm wondering if there are more diverse patterns of headers in your full dataset; I'm confident that the regex we use here can easily be manipulated to handle them, given enough varying samples.)
somefunc <- function(x) {
cn2 <- colnames(x)[2]
Fnum <- gsub(".*F([0-9]+).*", "\\1", cn2)
colnames(x)[1] <- paste0(colnames(x)[1], Fnum)
x
}
A brief explanation:
colnames(x)[2] just retrieves the second one; I'm assuming that we can base everything on the presence and makeup of this second column
gsub(".*F([0-9]+).*", "\\1", cn2) extracts just the numbers after "F"; for the record, if it weren't for the digit in "Sample1", we might be able to simply discard every non-digit, but I chose to be safe here.
.* matches zero or more "anything" characters; sandwiching the rest with this on both sides of our group is essentially discarding all but the number we want
F the literal "F"
(...) this is a group, saved for later (referenced with the \\1 in the replacement string, the second argument to gsub)
[0-9]+ accepts anything within the brackets, which can be literals ([acf] matches the three letters) or a range ([0-9A-F] matches any digit and any letters between A and F); the + makes it "one or more" (contrasting with the * before which is zero or more)
colnames(x)[1] <- ... reassign the first column name
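To see the pattern in isolation, here is a small sketch that runs the same gsub call over a few header strings. The `headers` vector is made up for illustration (only its first entry comes from the toy data), and the `grepl` guard is an addition, not part of the answer above.

```r
# Hypothetical header strings; only the first comes from the toy data.
headers <- c("Value: F111: cold, Sample1",
             "Value: F57: warm, Sample4",
             "no match here")

# Same pattern as in somefunc(): keep only the digits that follow "F".
fnums <- gsub(".*F([0-9]+).*", "\\1", headers)

# gsub() returns a string unchanged when the pattern does not match,
# so test with grepl() first if such headers are possible.
has_fnum <- grepl("F[0-9]+", headers)

fnums
has_fnum
```

If a header without an "F" number can occur, the unchanged string would be pasted onto "Abund" as-is, so it may be worth checking `has_fnum` before renaming.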
The work on the "single frame":
str( somefunc(x) )
# 'data.frame': 4 obs. of 3 variables:
# $ Abund111 : chr "BROS" "KIS" "TTHS" "MKS"
# $ Value: F111: cold, Sample1: chr "1.274e7" "" "" "2.301e7"
# $ Value: F111: warm, Sample1: chr "" "" "" ""
So now the question is how to apply this function that operates on one frame across a list of frames. lapply to the rescue:
str(lapply(ltd, somefunc))
# List of 3
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund111 : chr [1:4] "BROS" "KIS" "TTHS" "MKS"
# ..$ Value: F111: cold, Sample1: chr [1:4] "1.274e7" "" "" "2.301e7"
# ..$ Value: F111: warm, Sample1: chr [1:4] "" "" "" ""
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund216 : chr [1:4] "BROS" "TMS" "KIS" "HERS"
# ..$ Value: F216: cold, Sample2: chr [1:4] "1.670e6" "4.115e7" "" "1.302e7"
# ..$ Value: F216: warm, Sample2: chr [1:4] "" "2.766e7" "" "1.396e7"
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund655 : chr [1:4] "BROS" "TMS" "KIS" "HERS"
# ..$ Value: F655: cold, Sample3: chr [1:4] "7.074e4" "1.038e7" "" "7.380e5"
# ..$ Value: F655: warm, Sample3: chr [1:4] "" "6.874e6" "" "7.029e5"
Incrementing number
This is both easier and harder. First, we attack the small problem:
otherfunc <- function(x, num) {
colnames(x)[1] <- paste0(colnames(x)[1], num)
x
}
Pretty straightforward. But we cannot use lapply: it iterates over a single list, so it will not know what to do for the number. One might be tempted to brute-force things with a tracking variable somewhere (global? please no), but it might be interesting to know that there is a variant of the "apply" functions that operates differently: mapply takes one or more lists, and "zips" them together. For example:
myfunc <- c
mapply(myfunc, 1:3, 4:6, 7:9, SIMPLIFY=FALSE)
# [[1]]
# [1] 1 4 7
# [[2]]
# [1] 2 5 8
# [[3]]
# [1] 3 6 9
We started with three (could have been more) independent vectors (could have been lists, typically are), and took the first value from each and passed them to the function. So this is effectively like:
list(myfunc(1, 4, 7), myfunc(2, 5, 8), myfunc(3, 6, 9))
Ok, so realizing that we want to "zip" together each frame within ltd with a number along a sequence, those numbers are easily generated with:
seq_along(ltd)
# [1] 1 2 3
(This is considered better than 1:length(ltd), since the latter will not behave correctly if the length is 0 ... try 1:length(list()) versus seq_along(list()).)
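A minimal sketch of that difference, using an empty list (the variable names are just for illustration):

```r
empty <- list()

# 1:length(empty) is 1:0, which counts DOWN from 1 to 0 (two values),
# so a loop over it would run twice on a list that has no elements.
bad <- 1:length(empty)

# seq_along(empty) is a zero-length integer vector, so a loop never runs.
good <- seq_along(empty)

length(bad)
length(good)
```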
Okay, so let's use this new trick:
str(mapply(otherfunc, ltd, seq_along(ltd), SIMPLIFY=FALSE))
# List of 3
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund1 : chr [1:4] "BROS" "KIS" "TTHS" "MKS"
# ..$ Value: F111: cold, Sample1: chr [1:4] "1.274e7" "" "" "2.301e7"
# ..$ Value: F111: warm, Sample1: chr [1:4] "" "" "" ""
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund2 : chr [1:4] "BROS" "TMS" "KIS" "HERS"
# ..$ Value: F216: cold, Sample2: chr [1:4] "1.670e6" "4.115e7" "" "1.302e7"
# ..$ Value: F216: warm, Sample2: chr [1:4] "" "2.766e7" "" "1.396e7"
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund3 : chr [1:4] "BROS" "TMS" "KIS" "HERS"
# ..$ Value: F655: cold, Sample3: chr [1:4] "7.074e4" "1.038e7" "" "7.380e5"
# ..$ Value: F655: warm, Sample3: chr [1:4] "" "6.874e6" "" "7.029e5"
It should be noted that mapply, just like sapply, will by default try to simplify things; I find it hard to trust that it always does what I want, so I typically turn off this simplification. There are times for it, yes, but this is not one of them. The apply functions (including Reduce) are typically very hard to learn to use when thinking in a linear/iterative methodology, but they can be very useful in times like these.
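As a small illustration of what the simplification does, here is a sketch with a toy `add` function (hypothetical, not part of the problem above). It also shows Map, which the R documentation describes as a simple wrapper to mapply that does not attempt to simplify the result.

```r
add <- function(a, b) a + b

# Default SIMPLIFY = TRUE: equal-length results are collapsed into a vector.
simplified <- mapply(add, 1:3, 4:6)

# SIMPLIFY = FALSE: always a list, one element per call.
as_list <- mapply(add, 1:3, 4:6, SIMPLIFY = FALSE)

# Map() gives the list form without having to remember the flag.
via_map <- Map(add, 1:3, 4:6)

str(simplified)
str(as_list)
```

With data frames as the return value, the list form is the one you want, since there is nothing sensible to simplify them into.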
In base R you can do it this way:
ltd2 <- Map(function(x, y) {
  names(x)[1] <- paste0(names(x)[1], y)
  x
}, ltd, seq_along(ltd))
str(ltd2)
# List of 3
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund1 : chr [1:4] "BROS" "KIS" "TTHS" "MKS"
# ..$ Value: F111: cold, Sample1: chr [1:4] "1.274e7" "" "" "2.301e7"
# ..$ Value: F111: warm, Sample1: chr [1:4] "" "" "" ""
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund2 : chr [1:4] "BROS" "TMS" "KIS" "HERS"
# ..$ Value: F216: cold, Sample2: chr [1:4] "1.670e6" "4.115e7" "" "1.302e7"
# ..$ Value: F216: warm, Sample2: chr [1:4] "" "2.766e7" "" "1.396e7"
# $ :'data.frame': 4 obs. of 3 variables:
# ..$ Abund3 : chr [1:4] "BROS" "TMS" "KIS" "HERS"
# ..$ Value: F655: cold, Sample3: chr [1:4] "7.074e4" "1.038e7" "" "7.380e5"
# ..$ Value: F655: warm, Sample3: chr [1:4] "" "6.874e6" "" "7.029e5"
But I would use purrr::imap and dplyr::rename_at for the same result:
library(purrr)
library(dplyr)
ltd3 <- imap(ltd,~rename_at(.,1,paste0,.y))
I have a nested list where each nested list has the same elements but not in the same order and the elements are not named explicitly but do have a name value within the list.
As you can see in the structure the list containing the 'Date' field appears at the second position in the first list and the third position in the second list, so I can't extract at a position.
I would like to extract the lists where name : is Date and keep the value associated with it, using the purrr package.
STRUCTURE
dplyr::glimpse(my_list)
List of 2
$ :List of 10
..$ :List of 2
.. ..$ name : chr "MIME-Version"
.. ..$ value: chr "1.0"
..$ :List of 2
.. ..$ name : chr "Date"
.. ..$ value: chr "Wed, 13 Feb 2019 15:20:40 -0800"
..$ :List of 2
.. ..$ name : chr "References"
.. ..$ value: chr "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "In-Reply-To"
.. ..$ value: chr "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Message-ID"
.. ..$ value: chr "<CAPApPh+WZszKg_bjQBPrS8TOvLA23hQkaa9Hocb_cgrQYs2R1w#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Subject"
.. ..$ value: chr "Re:"
..$ :List of 2
.. ..$ name : chr "From"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "To"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "Cc"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "Content-Type"
.. ..$ value: chr "multipart/alternative; boundary=\"000000000000f3d8810581cec99d\""
$ :List of 7
..$ :List of 2
.. ..$ name : chr "MIME-Version"
.. ..$ value: chr "1.0"
..$ :List of 2
.. ..$ name : chr "Message-ID"
.. ..$ value: chr "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"
..$ :List of 2
.. ..$ name : chr "Date"
.. ..$ value: chr "Wed, 13 Feb 2019 12:18:32 -0800"
..$ :List of 2
.. ..$ name : chr "Subject"
.. ..$ value: chr ""
..$ :List of 2
.. ..$ name : chr "From"
.. ..$ value: chr "Daniel Seneca <senecad#gene.com>"
..$ :List of 2
.. ..$ name : chr "To"
.. ..$ value: chr "Daniel Seneca <seneca.daniel#gene.com>"
..$ :List of 2
.. ..$ name : chr "Content-Type"
.. ..$ value: chr "multipart/mixed; boundary=\"000000000000f11ad10581cc3e85\""
DATA
my_list <-list(list(list(name = "MIME-Version", value = "1.0"), list(name = "Date", value = "Wed, 13 Feb 2019 15:20:40 -0800"), list(name = "References", value = "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"), list(name = "In-Reply-To", value = "<CAE1g-C7AmC3zoJRG_UgdwwkkSiJMuEuDYLU1j4ni0MZJXNrGNQ#mail.gmail.com>"),list(name = "Message-ID", value = "<CAPApPh+WZszKg_bjQBPrS8TOvLA23hQkaa9Hocb_cgrQYs2R1w#mail.gmail.com>"), list(name = "Subject", value = "Re:"), list(name = "From", value = ""),list(name = "To", value = ""),list(name = "Cc", value = ""),list(name = "Content-Type", value = "multipart/alternative; boundary=\"000000000000f3d8810581cec99d\"")),
list(list(name = "MIME-Version", value = "1.0"), list(name = "Message-ID",value = "<CAPApPh+THbnCg2e5WmKEwQwHEEjKDHq3V6LkYV9oL88DbHE9Pg#mail.gmail.com>"),list(name = "Date",value = "Wed, 13 Feb 2019 12:18:32 -0800"), list(name = "Subject", value = ""), list(name = "From",value = "Daniel Seneca <senecad#gene.com>"), list(name = "To", value = "Daniel Seneca <seneca.daniel#gene.com>"),list(name = "Content-Type", value = "multipart/mixed; boundary=\"000000000000f11ad10581cc3e85\"")))
The question does not define what the output is supposed to look like so we will assume that it should be a list of lists. Remove the first layer using flatten as shown and then filter its elements using keep.
library(purrr)
my_list %>%
flatten %>%
keep(~ .x$name == "Date")
In base R this could be written:
Filter(function(x) x$name == "Date", do.call("c", my_list))
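If only the date strings themselves are wanted, one per message, a base R sketch along these lines may help. The `msgs` list is a shortened, hypothetical stand-in for my_list, and `get_header` is a helper name invented here.

```r
# Toy data mirroring the shape of my_list (trimmed to two headers each).
msgs <- list(
  list(list(name = "MIME-Version", value = "1.0"),
       list(name = "Date", value = "Wed, 13 Feb 2019 15:20:40 -0800")),
  list(list(name = "Subject", value = ""),
       list(name = "Date", value = "Wed, 13 Feb 2019 12:18:32 -0800"))
)

# Hypothetical helper: return the value of the first header whose name
# matches `wanted`, or NA if the message has no such header.
get_header <- function(headers, wanted) {
  hit <- Filter(function(h) identical(h$name, wanted), headers)
  if (length(hit) == 0) NA_character_ else hit[[1]]$value
}

# One date string per message, regardless of where "Date" sits in the list.
dates <- vapply(msgs, get_header, character(1), wanted = "Date")
dates
```

Unlike the flattened version, this keeps one result per top-level message, so a message without a Date header shows up as NA instead of silently disappearing.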
Link to data.
For my purposes, I downloaded the data from the above link and saved it as a JSON file.
json_convert <- do.call(rbind, lapply(paste(readLines("Myfile.json", warn=TRUE),
collapse=""),
jsonlite::fromJSON))
So far, I have managed to code the above. However, I am confused as to how I can convert this into a data frame. All help is appreciated.
Let's start by examining the data structure:
library(purrr)
library(tibble)
library(jsonlite)
my_json <- fromJSON("Myfile.json")
str(my_json)
List of 3
$ resource : chr "shotchartdetail"
$ parameters:List of 30
..$ LeagueID : chr "00"
..$ Season : chr "2017-18"
..$ SeasonType : chr "Regular Season"
..$ TeamID : int 1610612750
..$ PlayerID : int 0
..$ GameID : NULL
..$ Outcome : NULL
..$ Location : NULL
..$ Month : int 0
..$ SeasonSegment : NULL
..$ DateFrom : NULL
..$ DateTo : NULL
..$ OpponentTeamID: int 0
..$ VsConference : NULL
..$ VsDivision : NULL
..$ Position : NULL
..$ RookieYear : NULL
..$ GameSegment : NULL
..$ Period : int 0
..$ LastNGames : int 0
..$ ClutchTime : NULL
..$ AheadBehind : NULL
..$ PointDiff : NULL
..$ RangeType : int 0
..$ StartPeriod : int 1
..$ EndPeriod : int 10
..$ StartRange : int 0
..$ EndRange : int 28800
..$ ContextFilter : chr "SEASON_YEAR='2017-18'"
..$ ContextMeasure: chr "FGA"
$ resultSets:'data.frame': 2 obs. of 3 variables:
..$ name : chr [1:2] "Shot_Chart_Detail" "LeagueAverages"
..$ headers:List of 2
.. ..$ : chr [1:24] "GRID_TYPE" "GAME_ID" "GAME_EVENT_ID" "PLAYER_ID" ...
.. ..$ : chr [1:7] "GRID_TYPE" "SHOT_ZONE_BASIC" "SHOT_ZONE_AREA" "SHOT_ZONE_RANGE" ...
..$ rowSet :List of 2
.. ..$ : chr [1:7063, 1:24] "Shot Chart Detail" "Shot Chart Detail" "Shot Chart Detail" "Shot Chart Detail" ...
.. ..$ : chr [1:20, 1:7] "League Averages" "League Averages" "League Averages" "League Averages" ...
Now you have to decide what it is that you want in your data frame.
I would assume that player statistics are in the first element of $rowSet (1:7063 = rows, 1:24 = columns) and the headers for those columns are in the first element of $resultSets$headers (1:24).
I'm sure there's a very elegant way to use the map functions in purrr. This isn't it, but it works:
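my_list <- my_json %>%
  purrr::flatten()   # jsonlite was loaded last, so its flatten() masks purrr's
my_df <- my_list$rowSet[[1]] %>%
  as_tibble() %>%    # as.tibble() is deprecated in favour of as_tibble()
  setNames(my_list$headers[[1]])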
str(my_df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 7063 obs. of 24 variables:
$ GRID_TYPE : chr "Shot Chart Detail" "Shot Chart Detail" "Shot Chart Detail" "Shot Chart Detail" ...
$ GAME_ID : chr "0021700011" "0021700011" "0021700011" "0021700011" ...
$ GAME_EVENT_ID : chr "10" "12" "16" "21" ...
$ PLAYER_ID : chr "1626157" "202710" "202710" "201959" ...
$ PLAYER_NAME : chr "Karl-Anthony Towns" "Jimmy Butler" "Jimmy Butler" "Taj Gibson" ...
$ TEAM_ID : chr "1610612750" "1610612750" "1610612750" "1610612750" ...
$ TEAM_NAME : chr "Minnesota Timberwolves" "Minnesota Timberwolves" "Minnesota Timberwolves" "Minnesota Timberwolves" ...
$ PERIOD : chr "1" "1" "1" "1" ...
$ MINUTES_REMAINING : chr "11" "11" "10" "10" ...
$ SECONDS_REMAINING : chr "14" "9" "32" "21" ...
$ EVENT_TYPE : chr "Missed Shot" "Made Shot" "Missed Shot" "Missed Shot" ...
$ ACTION_TYPE : chr "Jump Shot" "Jump Shot" "Driving Reverse Layup Shot" "Jump Shot" ...
$ SHOT_TYPE : chr "2PT Field Goal" "3PT Field Goal" "2PT Field Goal" "3PT Field Goal" ...
$ SHOT_ZONE_BASIC : chr "Mid-Range" "Above the Break 3" "Restricted Area" "Left Corner 3" ...
$ SHOT_ZONE_AREA : chr "Left Side Center(LC)" "Right Side Center(RC)" "Center(C)" "Left Side(L)" ...
$ SHOT_ZONE_RANGE : chr "16-24 ft." "24+ ft." "Less Than 8 ft." "24+ ft." ...
$ SHOT_DISTANCE : chr "20" "25" "1" "22" ...
$ LOC_X : chr "-113" "199" "-11" "-225" ...
$ LOC_Y : chr "169" "152" "6" "16" ...
$ SHOT_ATTEMPTED_FLAG: chr "1" "1" "1" "1" ...
$ SHOT_MADE_FLAG : chr "0" "1" "0" "0" ...
$ GAME_DATE : chr "20171018" "20171018" "20171018" "20171018" ...
$ HTM : chr "SAS" "SAS" "SAS" "SAS" ...
$ VTM : chr "MIN" "MIN" "MIN" "MIN" ...
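Every column comes out as character because the rows arrived as a character matrix. A possible follow-up step, sketched here on a made-up three-row stand-in (`shots` and `num_cols` are names chosen for illustration), is to convert only the columns known to hold numbers:

```r
# Hypothetical stand-in for my_df; every column starts as character.
shots <- data.frame(
  GAME_ID        = c("0021700011", "0021700011", "0021700011"),
  PLAYER_NAME    = c("Karl-Anthony Towns", "Jimmy Butler", "Jimmy Butler"),
  SHOT_DISTANCE  = c("20", "25", "1"),
  SHOT_MADE_FLAG = c("0", "1", "0"),
  stringsAsFactors = FALSE
)

# Convert only the columns that hold numbers. GAME_ID is left alone
# because converting it would drop the leading zeros.
num_cols <- c("SHOT_DISTANCE", "SHOT_MADE_FLAG")
shots[num_cols] <- lapply(shots[num_cols], as.integer)

str(shots)
```

Listing the columns explicitly is a deliberate choice in this sketch: automatic conversion would also turn identifier-like columns into numbers.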
I have this nested list:
dico <- list(list(list(c("dim.", "dimension", "dimensions", "mesures"
), c("45 cm", "45", "45 CM", "0.45m")), list(c("tamano", "volumen",
"dimension", "talla"), c("45 cm", "45", "0.45 M", "45 centimiento"
)), list(c("measures", "dimension", "measurement"), c("45 cm",
"0.45 m", "100 inches", "100 pouces"))), list(list(c("poids",
"poid", "poids net"), c("100 grammes", "100 gr", "100")), list(
c("peso", "carga", "peso especifico"), c("100 gramos", "100g",
"100", "100 g")), list(c("weight", "net wieght", "weight (grammes)"
), c("100 grams", "100", "100 g"))), list(list(c("Batterie oui/non",
"batterie", "présence batterie"), c("Oui", "batterie", "OUI"
)), list(c("bateria", "bateria si or no", "bateria disponible"
), c("si", "bateria furnindo", "1")), list(c("Battery available",
"battery", "battery yes or no"), c("yes", "Y", "Battery given"
))))
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] "dim." "dimension" "dimensions" "mesures"
[[1]][[1]][[2]]
[1] "45 cm" "45" "45 CM" "0.45m"
What I want is to create a list with the same structure but instead of having the original values, I want to have a sort of "index" name like :
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
[[1]][[1]][[2]]
[1] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
and so forth ...
Of course, the number of elements is not constant across the different nested indexes. Does anyone know how to do that? I heard about rapply but I could not make it work.
Try this recursive function with a 2-line body. It does not assume a fixed depth and allows unbalanced lists. No packages are used.
It accepts an object L and a level. If the object is not a list then we have a leaf and it returns its levels. If the object is a list then we have a node and it iterates over its components invoking indexer on each passing the concatenation of lev, i and | for the ith component's level.
indexer <- function(L, lev = character(0)) {
if (!is.list(L)) paste0(lev, seq_along(L))
else Map(indexer, L, paste0(lev, seq_along(L), "|"))
}
Example 1 Using dico from the question
> str( indexer(dico) )
List of 3
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:4] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
.. ..$ : chr [1:4] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
..$ :List of 2
.. ..$ : chr [1:4] "1|2|1|1" "1|2|1|2" "1|2|1|3" "1|2|1|4"
.. ..$ : chr [1:4] "1|2|2|1" "1|2|2|2" "1|2|2|3" "1|2|2|4"
..$ :List of 2
.. ..$ : chr [1:3] "1|3|1|1" "1|3|1|2" "1|3|1|3"
.. ..$ : chr [1:4] "1|3|2|1" "1|3|2|2" "1|3|2|3" "1|3|2|4"
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:3] "2|1|1|1" "2|1|1|2" "2|1|1|3"
.. ..$ : chr [1:3] "2|1|2|1" "2|1|2|2" "2|1|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "2|2|1|1" "2|2|1|2" "2|2|1|3"
.. ..$ : chr [1:4] "2|2|2|1" "2|2|2|2" "2|2|2|3" "2|2|2|4"
..$ :List of 2
.. ..$ : chr [1:3] "2|3|1|1" "2|3|1|2" "2|3|1|3"
.. ..$ : chr [1:3] "2|3|2|1" "2|3|2|2" "2|3|2|3"
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:3] "3|1|1|1" "3|1|1|2" "3|1|1|3"
.. ..$ : chr [1:3] "3|1|2|1" "3|1|2|2" "3|1|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "3|2|1|1" "3|2|1|2" "3|2|1|3"
.. ..$ : chr [1:3] "3|2|2|1" "3|2|2|2" "3|2|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "3|3|1|1" "3|3|1|2" "3|3|1|3"
.. ..$ : chr [1:3] "3|3|2|1" "3|3|2|2" "3|3|2|3"
Example 2 Here is an example of a list with a different depth and lack of balance:
L <- list(list(1:3, 5:7), 9:10)
giving:
> str( indexer(L) )
List of 2
$ :List of 2
..$ : chr [1:3] "1|1|1" "1|1|2" "1|1|3"
..$ : chr [1:3] "1|2|1" "1|2|2" "1|2|3"
$ : chr [1:2] "2|1" "2|2"
We can use melt (from reshape2) to convert the nested list to a data.frame with index columns ('L1', 'L2', 'L3') and a 'value' column. Convert it to a data.table (setDT(...)) and, grouped by 'L1', 'L2', 'L3', get the sequence of rows (1:.N). Then paste the columns of each row into a single vector with do.call, and relist it into a list with the same structure as 'dico' by specifying the skeleton.
library(data.table)
library(reshape2)
dico2 <- relist(do.call(paste, c(setDT(melt(dico))[, 1:.N ,
by = .(L1, L2, L3)], sep="|")), skeleton = dico)
dico2
#[[1]]
#[[1]][[1]]
#[[1]][[1]][[1]]
#[1] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
#[[1]][[1]][[2]]
#[1] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
#...
#[[3]][[3]]
#[[3]][[3]][[1]]
#[1] "3|3|1|1" "3|3|1|2" "3|3|1|3"
#[[3]][[3]][[2]]
#[1] "3|3|2|1" "3|3|2|2" "3|3|2|3"
After importing data from a JSON stream, I have a list of 621 lists, each holding the same 22 variables.
List of 621
$ :List of 22
..$ _id : chr "55c79e711cbee48856a30886"
..$ number : num 1
..$ country : chr "Yemen"
..$ date : chr "2002-11-03T00:00:00.000Z"
..$ narrative : chr ""
..$ town : chr ""
..$ location : chr ""
..$ deaths : chr "6"
..$ deaths_min : chr "6"
..$ deaths_max : chr "6"
..$ civilians : chr "0"
..$ injuries : chr ""
..$ children : chr ""
..$ tweet_id : chr "278544689483890688"
..$ bureau_id : chr "YEM001"
..$ bij_summary_short: chr ""
..$ bij_link : chr ""
..$ target : chr ""
..$ lat : chr "15.47467"
..$ lon : chr "45.322755"
..$ articles : list()
..$ names : chr ""| __truncated__
$ :List of 22
..$ _id : chr "55c79e711cbee48856a30887"
..$ number : num 2
..$ country : chr "Pakistan"
..$ date : chr "2004-06-17T00:00:00.000Z"
..$ narrative : chr ""
..$ town : chr ""
..$ location : chr ""
..$ deaths : chr "6-8"
..$ deaths_min : chr "6"
..$ deaths_max : chr "8"
..$ civilians : chr "2"
..$ injuries : chr "1"
..$ children : chr "2"
..$ tweet_id : chr "278544750867533824"
..$ bureau_id : chr "B1"
..$ bij_summary_short: chr ""| __truncated__
..$ bij_link : chr ""
..$ target : chr ""
..$ lat : chr "32.30512565"
..$ lon : chr "69.57624435"
..$ articles : list()
..$ names : chr ""
...
How can I combine these lists into one data frame of 621 observations of 22 variables? Notice that all 621 lists are unnamed.
edit: Per request, here is how I got this data set:
library(rjson)
url <- 'http://api.dronestre.am/data'
document <- fromJSON(file=url, method='C')
str(document$strike)
Can you provide an example of how you generated the data? I did not test this answer, but the following should help. If you update the question to show how you obtained the data, I can try it out.
update
library(rjson)
library(data.table)
library(dplyr)
url <- 'http://api.dronestre.am/data'
document <- fromJSON(file=url, method='C')
is(document)
listdata<- document$strike
df<-do.call(rbind,listdata) %>% as.data.table
dim(df)
purrr has a useful transpose function which 'inverts' a list. The $articles element causes trouble as it appears always to be empty, and scuppers you when you try to convert to a data.frame, so I've dropped it.
library(purrr)
df <- transpose(document$strike) %>%
t %>%
apply(FUN = unlist, MARGIN = 2)
df <- df[-21] %>% data.frame %>% tbl_df
df
Source: local data frame [621 x 21]
X_id number country date
(fctr) (dbl) (fctr) (fctr)
1 55c79e711cbee48856a30886 1 Yemen 2002-11-03T00:00:00.000Z
2 55c79e711cbee48856a30887 2 Pakistan 2004-06-17T00:00:00.000Z
3 55c79e711cbee48856a30888 3 Pakistan 2005-05-08T00:00:00.000Z
4 55c79e721cbee48856a30889 4 Pakistan 2005-11-05T00:00:00.000Z
5 55c79e721cbee48856a3088a 5 Pakistan 2005-12-01T00:00:00.000Z
6 55c79e721cbee48856a3088b 6 Pakistan 2006-01-06T00:00:00.000Z
7 55c79e721cbee48856a3088c 7 Pakistan 2006-01-13T00:00:00.000Z
8 55c79e721cbee48856a3088d 8 Pakistan 2006-10-30T00:00:00.000Z
9 55c79e721cbee48856a3088e 9 Pakistan 2007-01-16T00:00:00.000Z
10 55c79e721cbee48856a3088f 10 Pakistan 2007-04-27T00:00:00.000Z
.. ... ... ... ...
Variables not shown: narrative (fctr), town (fctr), location (fctr), deaths
(fctr), deaths_min (fctr), deaths_max (fctr), civilians (fctr), injuries
(fctr), children (fctr), tweet_id (fctr), bureau_id (fctr), bij_summary_short
(fctr), bij_link (fctr), target (fctr), lat (fctr), lon (fctr), names (fctr)
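For comparison, a base R sketch of the same idea: drop the always-empty `articles` element from each record, turn each record into a one-row data frame, and stack the rows. The two `strikes` records below are hypothetical and trimmed to a few fields; with the real data you would start from document$strike.

```r
# Two hypothetical records shaped like document$strike (fields trimmed).
strikes <- list(
  list(number = 1, country = "Yemen",    deaths = "6",   articles = list()),
  list(number = 2, country = "Pakistan", deaths = "6-8", articles = list())
)

# Remove the empty list element, then build a one-row data frame per record.
rows <- lapply(strikes, function(rec) {
  rec$articles <- NULL
  as.data.frame(rec, stringsAsFactors = FALSE)
})

# Stack the one-row frames into a single data frame.
strikes_df <- do.call(rbind, rows)
str(strikes_df)
```

This keeps `number` numeric and the other fields as character instead of factor, which is usually easier to clean up afterwards.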
I'm working on an R project. While trying to analyse sentiment, I had to create a data frame (in my example, "sentiment.df").
sentiment.df <- data.frame(text, emotion=emotion, polarity=polarity, stringsAsFactors=FALSE)
Here, text is a list containing processed (cleaned) tweets split into keywords; emotion contains a bag of emotions as character strings; polarity contains positive/negative labels. When I ran the line of code above, RStudio threw the following error:
Error in data.frame(c("httpstcoux1aacnxbk", "endalz"), c("i", "have", :
arguments imply differing number of rows: 2, 5, 19, 7, 1, 11, 4, 6, 9, 3, 13, 17, 8, 10, 24, 21, 15, 12, 25, 16, 20, 23, 18, 28, 14, 22, 26, 27, 30, 31, 29, 35
The three variables text, emotion and polarity all have the same length: 2621.
This is what my data looks like:
> str(text)
List of 2621
$ : chr [1:2] "httpstcoux1aacnxbk" "endalz"
$ : chr [1:5] "i" "have" "the" "best" ...
$ : chr [1:19] "kenny" "easley" "seahawks" "captain" ...
$ : chr [1:2] "good" "defense"
$ : chr [1:7] "superbowlxlix" "party" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__ "" ...
$ : chr "ihatetombrady"
$ : chr [1:11] "coachbourbonusa" "understood" "still" "dont" ...
$ : chr [1:19] "tiwaworks" "whitney" "houston" "sings" ...
$ : chr [1:4] "thats" "still" "bae" "<U+2764><U+FE0F>""| __truncated__
$ : chr [1:6] "were" "a" "thousand" "miles" ...
$ : chr [1:7] "dredoo24" "what" "i" "like" ...
$ : chr [1:2] "bww" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:9] "i" "seriously" "cant" "wait" ...
$ : chr [1:3] "flyysociety" "photoshoot<U+2716><U+FE0F>""| __truncated__ "httptcoxkywsj5i2x"
$ : chr [1:5] "lienne11" "wait" "whos" "performing" ...
$ : chr [1:13] "game" "on" "go" "wildcats<U+FFFD><U+FFFD>\u2b07<U+FE0F>""| __truncated__ ...
$ : chr [1:2] "good" "defense"
$ : chr [1:11] "seattle" "seahawks" "fan" "" ...
$ : chr [1:9] "realprestonj" "congratulations" "preston" "the" ...
$ : chr [1:5] "tsu19" "so" "funny" "bruh" ...
$ : chr [1:4] "drunk" "tweets" "coming" "soon"
$ : chr "tb12"
$ : chr [1:13] "hicksville" "schools" "will" "be" ...
$ : chr [1:5] "but" "momma" "said" "superbowl" ...
$ : chr [1:4] "raggedy" "ass" "bitch" ""
$ : chr [1:5] "arbyscares" "arbys" "prairie" "village" ...
$ : chr [1:17] "lovetruth79" "ltltltloves" "to" "send" ...
$ : chr [1:8] "“boynamedhxlz""| __truncated__ "quote" "this" "tweet" ...
$ : chr [1:13] "stretching" "for" "ballet" "now" ...
$ : chr [1:7] "jerrodflusche" "janabewley" "narnia" "for" ...
$ : chr [1:8] "here" "goes" "my" "whole" ...
$ : chr [1:10] "who" "you" "going" "for" ...
$ : chr [1:3] "good" "stop" "hawks"
$ : chr [1:5] "brady" "be" "smokin" "blounts" ...
$ : chr [1:8] "me" "decepcioné" "perdoné" "hice" ...
$ : chr [1:7] "happy21stbirthdayharry" "" "its" "also" ...
$ : chr [1:24] "teammic3rd" "sounds" "amazing" "" ...
$ : chr [1:21] "millions" "of" "people" "packed" ...
$ : chr [1:8] "missed" "idina" "singing" "by" ...
$ : chr [1:2] "your" "stupid"
$ : chr [1:5] "seahawks" "all" "the" "way" ...
$ : chr [1:4] "takeathillpill" "you" "are" "vile"
$ : chr [1:3] "lets" "goo" "superbowlixlix"
$ : chr [1:4] "snow" "day" "nigga" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:6] "ill" "just" "watch" "total" ...
$ : chr [1:9] "liveextra" "site" "down" "its" ...
$ : chr [1:3] "time" "to" "punt"
$ : chr [1:5] "zachdettloff516" "groans" "at" "terrible" ...
$ : chr [1:3] "go" "seahawks" "<U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:7] "pizza" "friends" "super" "bowl" ...
$ : chr [1:9] "hold" "onto" "me" "cause" ...
$ : chr [1:6] "tom" "gonna" "get" "his" ...
$ : chr [1:6] "lets" "goooooo" "nice" "3rd" ...
$ : chr [1:15] "2" "fatal" "crashes" "reported" ...
$ : chr [1:12] "supra" "dope" "atx" "sundayfunday" ...
$ : chr [1:19] "all" "these" "students" "from" ...
$ : chr [1:3] "danstricko" "not" "happening"
$ : chr [1:17] "tom" "brady" "may" "wear" ...
$ : chr "httptconqabzdezwf"
$ : chr [1:4] "i" "miss" "you" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:25] "john" "legend" "and" "idina" ...
$ : chr [1:13] "snowed" "in" "with" "kadybuchler" ...
$ : chr [1:6] "that" "bright" "green" "and" ...
$ : chr [1:9] "ive" "got" "the" "seahawks" ...
$ : chr [1:9] "sds" "by" "mac" "miller" ...
$ : chr [1:5] "jakeski52" "rotowire" "or" "roger" ...
$ : chr "damnit"
$ : chr "hawks"
$ : chr [1:7] "my" "nephews" "and" "niece" ...
$ : chr [1:16] "liking" "your" "own" "posts" ...
$ : chr [1:2] "bailaconbruce" "fb"
$ : chr [1:4] "djones7" "hell" "no" "<U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:7] "best" "part" "of" "the" ...
$ : chr [1:13] "holls016" "f" "u" "i" ...
$ : chr [1:6] "mikebarnicle" "nice" "to" "meet" ...
$ : chr [1:5] "u" "played" "me" "dirty" ...
$ : chr [1:13] "my" "bac" "is" "looking" ...
$ : chr [1:2] "est" "2008"
$ : chr [1:12] "vacation" "time" "" "thats" ...
$ : chr [1:3] "<U+FFFD><U+FFFD>""| __truncated__ "ok" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD"| __truncated__
$ : chr [1:2] "common" "seattle"
$ : chr [1:3] "no" "cacc" "talc"
$ : chr "lob"
$ : chr [1:3] "cut" "the" "crap"
$ : chr [1:11] "im" "at" "las" "alitas" ...
$ : chr [1:3] "backstreets" "back" "alrighttttt"
$ : chr [1:6] "the" "seahawks" "are" "going" ...
$ : chr [1:13] "baby" "its" "cold" "outside" ...
$ : chr [1:15] "i" "have" "sooo" "much" ...
$ : chr [1:10] "so" "whos" "gonna" "pull" ...
$ : chr [1:5] "my" "driveway" "tonight" "nwiweather" ...
$ : chr "fuck"
$ : chr [1:21] "now" "that" "its" "actually" ...
$ : chr [1:7] "green" "goats" "<U+FFFD><U+FFFD>""| __truncated__ "" ...
$ : chr [1:15] "i" "guess" "its" "time" ...
$ : chr [1:3] "lets" "go" "seattle"
$ : chr [1:20] "jozybrambila7" "do" "you" "ever" ...
$ : chr [1:4] "reggiewo" "nice" "choice" "cheers"
$ : chr [1:20] "i" "enjoy" "super" "bowl" ...
[list output truncated]
> str(emotion)
chr [1:2621] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "joy" ...
> str(polarity)
chr [1:2621] "positive" "positive" "positive" "positive" "positive" "positive" "positive" ...
When I posted this error online, programmers said that the number of rows and columns is not the same, i.e., it is not a square matrix, and that a data frame will not work for a rectangular matrix.
I would be grateful if someone could help me resolve this error.
Thanks in advance!
'text' has 2621 elements, but they do not all hold the same number of entries.
Each element may contain a different number of words.
So even unlist() won't help you, because the total number of words is greater than the number of entries in the 'emotion' and 'polarity' vectors.
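Two possible ways around this, sketched on made-up miniature inputs (the three-element `text`, `emotion` and `polarity` below are hypothetical): either collapse each tweet's words back into one string, or keep the word vectors as a list column.

```r
# Hypothetical miniature versions of the three inputs.
text     <- list(c("good", "defense"), "tb12", c("lets", "go", "seattle"))
emotion  <- c("unknown", "unknown", "joy")
polarity <- c("positive", "positive", "positive")

# Option 1: one string per tweet, so text becomes a plain character vector
# with the same length as emotion and polarity.
flat.df <- data.frame(text = vapply(text, paste, character(1), collapse = " "),
                      emotion = emotion, polarity = polarity,
                      stringsAsFactors = FALSE)

# Option 2: keep the word vectors as a list column. I() stops data.frame()
# from treating each list element as a separate column.
list.df <- data.frame(text = I(text), emotion = emotion, polarity = polarity,
                      stringsAsFactors = FALSE)

str(flat.df)
nrow(list.df)
```

Either way the data frame ends up with one row per tweet, which is what the matching lengths of emotion and polarity suggest was intended.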