R list to wide (sparse) data frame - r

My first time here so I hope I don't break anything...
I have a list of lists:
Browse[2]> head(str(mylist))
List of 33
$ : chr [1:33] "0001" "space" "28" "night_club" ...
$ : chr [1:33] "0002" "concert" "28" "night_club" ...
$ : chr [1:31] "0003" "night_club" "24" "martial_arts" ...
$ : chr [1:31] "0004" "stage" "24" "basketball" ...
$ : chr [1:43] "0005" "night_club" "16" "concert" ...
$ : chr [1:43] "0006" "night_club" "16" "concert" ...
$ : chr [1:39] "0007" "night_club" "22" "concert" ...
$ : chr [1:39] "0008" "night_club" "22" "concert" ...
$ : chr [1:31] "0009" "night_club" "46" "martial_arts" ...
$ : chr [1:31] "0010" "night_club" "46" "martial_arts" ...
$ : chr [1:41] "0011" "night_club" "17" "martial_arts" ...
$ : chr [1:41] "0012" "night_club" "17" "martial_arts" ...
$ : chr [1:29] "0013" "concert" "23" "night_club" ...
$ : chr [1:29] "0014" "concert" "23" "night_club" ...
$ : chr [1:25] "0015" "night_club" "26" "concert" ...
$ : chr [1:31] "0016" "night_club" "42" "concert" ...
$ : chr [1:31] "0017" "night_club" "42" "concert" ...
$ : chr [1:31] "0018" "night_club" "25" "wrestling" ...
$ : chr [1:31] "0019" "night_club" "25" "wrestling" ...
$ : chr [1:33] "0020" "night_club" "46" "wrestling" ...
$ : chr [1:33] "0021" "night_club" "46" "wrestling" ...
$ : chr [1:41] "0022" "concert" "21" "stage" ...
$ : chr [1:41] "0023" "concert" "21" "stage" ...
$ : chr [1:55] "0024" "basketball" "8" "concert" ...
$ : chr [1:55] "0025" "basketball" "8" "concert" ...
$ : chr [1:37] "0026" "bald_person" "26" "martial_arts" ...
$ : chr [1:37] "0027" "bald_person" "26" "martial_arts" ...
$ : chr [1:37] "0028" "night_club" "32" "business_meeting" ...
$ : chr [1:37] "0029" "night_club" "32" "business_meeting" ...
$ : chr [1:15] "0030" "night_club" "59" "stage" ...
$ : chr [1:37] "0031" "stage" "12" "night_club" ...
$ : chr [1:37] "0032" "stage" "12" "night_club" ...
$ : chr [1:33] "0033" "night_club" "23" "portrait" ...
I want to turn this list into a wide-format data frame where the first column holds the first element of each inner list (i.e. "0001", "0002", etc.) and the remaining columns are all the categories that exist in the file:
"space", "night_club", "concert", "martial_arts", "wrestling" etc.
That is, I would get a very wide data frame in which each row begins with an id (0001, 0002, 0003 ...) and, for each row, wherever a category exists for that id, the cell holds the value that follows the category in the list ("space" -> 28 from the first line, for example).
I was trying to construct a normalized data frame with loops and then convert it to a wide format, but that would be a bad idea as the data scales:
for (file in files){# iterate over files in folder
mylist <- strsplit(readLines(file), ":")
#close(mylist)
for (elem in mylist){
dataframe <- data.frame(frameid = numeric(), category = character(), nrow = length(unlist(elem)))
frameid <- rep.int(elem[[1]], length(elem)-1)
categories <- elem[-1:-1]
dataframe$frameid <- frameid
dataframe$category <- categories
}
}
Reproducible input output example:
dput of input:
list(c("0001", "space", "28", "night_club", "25"), c("0002",
"concert", "28", "night_club", "26"), c("0003", "night_club",
"24", "martial_arts", "27"), c("0004", "stage", "24", "basketball",
"30"))
output:
Dataframe
frameid, cat_space, cat_night_club, cat_concert, cat_martial_arts, cat_stage, cat_basketball
0001, 28, 25, 0, 0, 0, 0
0002, 0, 26, 28, 0, 0, 0
0003, 0, 24, 0, 27, 0, 0
0004, 0, 0, 0, 0, 24, 30

Here's a possibility. I've created the answer as a function, and commented what is happening at each stage. The basic idea is to:
Create a column of just the first items from each list element.
Create a two-column matrix of the rest of the items. This assumes that the data are nicely paired.
Create a data.frame of these two elements put together.
Use xtabs to convert the output to a wide format. Note that if there are duplicated combinations of "ID" and "var", the values would be added together because of the use of xtabs.
Here's the function:
myFun <- function(inList) {
## Extract the first value in each list element
ID <- vapply(inList, `[`, character(1L), 1)
## Convert the remaining elements into a two column matrix, first
## column as variable, second column as value. Bind all list
## elements together to a single 2-column matrix.
varval <- do.call(rbind, lapply(inList, function(x) {
matrix(x[-1], ncol = 2, byrow = TRUE, dimnames = list(NULL, c("var", "val")))
}))
## Create a data.frame where ID is repeated to the same number of rows
## as the matrices found in varval.
temp <- data.frame(ID = rep(ID, (lengths(inList)-1)/2), varval)
## Convert the val columns to numeric
temp$val <- as.numeric(as.character(temp$val))
## Use xtabs to go from a "long" form to a "wide" form
xtabs(val ~ ID + var, temp)
}
Here it is applied to your sample data (assuming your data is called "L"):
myFun(L)
# var
# ID basketball concert martial_arts night_club space stage
# 0001 0 0 0 25 28 0
# 0002 0 28 0 26 0 0
# 0003 0 0 27 24 0 0
# 0004 30 0 0 0 0 24
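Note that xtabs returns a contingency table rather than a data.frame. Here is a minimal sketch of one way to get the layout from the question (a frameid column plus cat_ prefixed columns). It rebuilds the long form inline so that it runs on its own; the names long, tab and wide are only illustrative, and it assumes the values are numeric and arrive in clean category/value pairs.

```r
L <- list(c("0001", "space", "28", "night_club", "25"),
          c("0002", "concert", "28", "night_club", "26"),
          c("0003", "night_club", "24", "martial_arts", "27"),
          c("0004", "stage", "24", "basketball", "30"))

## Long form: one row per (ID, var, val) triple
long <- do.call(rbind, lapply(L, function(x) {
  pairs <- matrix(x[-1], ncol = 2, byrow = TRUE)
  data.frame(ID = x[1], var = pairs[, 1], val = as.numeric(pairs[, 2]),
             stringsAsFactors = FALSE)
}))

## Wide form as a table, then as a plain data.frame
tab <- xtabs(val ~ ID + var, long)
wide <- as.data.frame(unclass(tab))

## Move the row names into a frameid column and prefix the category names
wide <- data.frame(frameid = rownames(wide), wide,
                   stringsAsFactors = FALSE, check.names = FALSE)
rownames(wide) <- NULL
names(wide)[-1] <- paste0("cat_", names(wide)[-1])
wide
```

The columns come out in alphabetical order of category, as in the xtabs output above, and ids that lack a category get 0.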

Related

How do I convert a JSON file to a data frame in R?

Link to data.
For my purposes, I downloaded the data from the above link and saved it as a JSON file.
json_convert <- do.call(rbind, lapply(paste(readLines("Myfile.json", warn=TRUE),
collapse=""),
jsonlite::fromJSON))
So far, I have managed to code the above. However, I am confused as to how I can convert this into a data frame. All help is appreciated.
Let's start by examining the data structure:
library(purrr)
library(tibble)
library(jsonlite)
my_json <- fromJSON("Myfile.json")
str(my_json)
List of 3
$ resource : chr "shotchartdetail"
$ parameters:List of 30
..$ LeagueID : chr "00"
..$ Season : chr "2017-18"
..$ SeasonType : chr "Regular Season"
..$ TeamID : int 1610612750
..$ PlayerID : int 0
..$ GameID : NULL
..$ Outcome : NULL
..$ Location : NULL
..$ Month : int 0
..$ SeasonSegment : NULL
..$ DateFrom : NULL
..$ DateTo : NULL
..$ OpponentTeamID: int 0
..$ VsConference : NULL
..$ VsDivision : NULL
..$ Position : NULL
..$ RookieYear : NULL
..$ GameSegment : NULL
..$ Period : int 0
..$ LastNGames : int 0
..$ ClutchTime : NULL
..$ AheadBehind : NULL
..$ PointDiff : NULL
..$ RangeType : int 0
..$ StartPeriod : int 1
..$ EndPeriod : int 10
..$ StartRange : int 0
..$ EndRange : int 28800
..$ ContextFilter : chr "SEASON_YEAR='2017-18'"
..$ ContextMeasure: chr "FGA"
$ resultSets:'data.frame': 2 obs. of 3 variables:
..$ name : chr [1:2] "Shot_Chart_Detail" "LeagueAverages"
..$ headers:List of 2
.. ..$ : chr [1:24] "GRID_TYPE" "GAME_ID" "GAME_EVENT_ID" "PLAYER_ID" ...
.. ..$ : chr [1:7] "GRID_TYPE" "SHOT_ZONE_BASIC" "SHOT_ZONE_AREA" "SHOT_ZONE_RANGE" ...
..$ rowSet :List of 2
.. ..$ : chr [1:7063, 1:24] "Shot Chart Detail" "Shot Chart Detail" "Shot Chart Detail" "Shot Chart Detail" ...
.. ..$ : chr [1:20, 1:7] "League Averages" "League Averages" "League Averages" "League Averages" ...
Now you have to decide what it is that you want in your data frame.
I would assume that player statistics are in the first element of $rowSet (1:7063 = rows, 1:24 = columns) and the headers for those columns are in the first element of $resultSets$headers (1:24).
I'm sure there's a very elegant way to use the map functions in purrr. This isn't it, but it works:
my_list <- my_json %>%
purrr::flatten() # jsonlite also exports flatten(), which expects a data frame
my_df <- my_list$rowSet[[1]] %>%
as_tibble() %>%
setNames(my_list$headers[[1]])
str(my_df)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 7063 obs. of 24 variables:
$ GRID_TYPE : chr "Shot Chart Detail" "Shot Chart Detail" "Shot Chart Detail" "Shot Chart Detail" ...
$ GAME_ID : chr "0021700011" "0021700011" "0021700011" "0021700011" ...
$ GAME_EVENT_ID : chr "10" "12" "16" "21" ...
$ PLAYER_ID : chr "1626157" "202710" "202710" "201959" ...
$ PLAYER_NAME : chr "Karl-Anthony Towns" "Jimmy Butler" "Jimmy Butler" "Taj Gibson" ...
$ TEAM_ID : chr "1610612750" "1610612750" "1610612750" "1610612750" ...
$ TEAM_NAME : chr "Minnesota Timberwolves" "Minnesota Timberwolves" "Minnesota Timberwolves" "Minnesota Timberwolves" ...
$ PERIOD : chr "1" "1" "1" "1" ...
$ MINUTES_REMAINING : chr "11" "11" "10" "10" ...
$ SECONDS_REMAINING : chr "14" "9" "32" "21" ...
$ EVENT_TYPE : chr "Missed Shot" "Made Shot" "Missed Shot" "Missed Shot" ...
$ ACTION_TYPE : chr "Jump Shot" "Jump Shot" "Driving Reverse Layup Shot" "Jump Shot" ...
$ SHOT_TYPE : chr "2PT Field Goal" "3PT Field Goal" "2PT Field Goal" "3PT Field Goal" ...
$ SHOT_ZONE_BASIC : chr "Mid-Range" "Above the Break 3" "Restricted Area" "Left Corner 3" ...
$ SHOT_ZONE_AREA : chr "Left Side Center(LC)" "Right Side Center(RC)" "Center(C)" "Left Side(L)" ...
$ SHOT_ZONE_RANGE : chr "16-24 ft." "24+ ft." "Less Than 8 ft." "24+ ft." ...
$ SHOT_DISTANCE : chr "20" "25" "1" "22" ...
$ LOC_X : chr "-113" "199" "-11" "-225" ...
$ LOC_Y : chr "169" "152" "6" "16" ...
$ SHOT_ATTEMPTED_FLAG: chr "1" "1" "1" "1" ...
$ SHOT_MADE_FLAG : chr "0" "1" "0" "0" ...
$ GAME_DATE : chr "20171018" "20171018" "20171018" "20171018" ...
$ HTM : chr "SAS" "SAS" "SAS" "SAS" ...
$ VTM : chr "MIN" "MIN" "MIN" "MIN" ...
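Every column comes back as character because the rows are stored in a character matrix. As a rough sketch of a follow-up step, you could convert only the columns you know to be numeric. The tiny row_set and headers objects below are a made-up miniature of the structure shown above, not the real data, and num_cols is just my guess at which columns you would want as numbers.

```r
## Hypothetical miniature of rowSet[[1]] and headers[[1]]
row_set <- matrix(c("Shot Chart Detail", "0021700011", "20", "-113",
                    "Shot Chart Detail", "0021700011", "25", "199"),
                  ncol = 4, byrow = TRUE)
headers <- c("GRID_TYPE", "GAME_ID", "SHOT_DISTANCE", "LOC_X")

shots <- as.data.frame(row_set, stringsAsFactors = FALSE)
names(shots) <- headers

## Convert selected columns only; ids such as GAME_ID stay character
## so that their leading zeros are preserved
num_cols <- c("SHOT_DISTANCE", "LOC_X")
shots[num_cols] <- lapply(shots[num_cols], as.numeric)
str(shots)
```

Converting everything blindly (for example with type.convert) would turn "0021700011" into a number and drop the leading zeros, which is why the columns are picked by name here.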

Naming elements of a nested list from their index in R

I have this nested list:
dico <- list(list(list(c("dim.", "dimension", "dimensions", "mesures"
), c("45 cm", "45", "45 CM", "0.45m")), list(c("tamano", "volumen",
"dimension", "talla"), c("45 cm", "45", "0.45 M", "45 centimiento"
)), list(c("measures", "dimension", "measurement"), c("45 cm",
"0.45 m", "100 inches", "100 pouces"))), list(list(c("poids",
"poid", "poids net"), c("100 grammes", "100 gr", "100")), list(
c("peso", "carga", "peso especifico"), c("100 gramos", "100g",
"100", "100 g")), list(c("weight", "net wieght", "weight (grammes)"
), c("100 grams", "100", "100 g"))), list(list(c("Batterie oui/non",
"batterie", "présence batterie"), c("Oui", "batterie", "OUI"
)), list(c("bateria", "bateria si or no", "bateria disponible"
), c("si", "bateria furnindo", "1")), list(c("Battery available",
"battery", "battery yes or no"), c("yes", "Y", "Battery given"
))))
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] "dim." "dimension" "dimensions" "mesures"
[[1]][[1]][[2]]
[1] "45 cm" "45" "45 CM" "0.45m"
What I want is to create a list with the same structure, but instead of the original values I want a sort of "index" name, like:
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
[1] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
[[1]][[1]][[2]]
[1] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
and so forth ...
Of course, the number of elements is not constant across the nested levels. Does anyone know how to do that? I heard about rapply but could not make it work.
Try this recursive function with a 2-line body. It does not assume a fixed depth and allows unbalanced lists. No packages are used.
It accepts an object L and a level prefix lev. If the object is not a list then we have a leaf, and it returns one label per element: lev followed by the element's position. If the object is a list then we have a node, and it iterates over its components, invoking indexer on each and passing the concatenation of lev, i and | as the ith component's prefix.
indexer <- function(L, lev = character(0)) {
if (!is.list(L)) paste0(lev, seq_along(L))
else Map(indexer, L, paste0(lev, seq_along(L), "|"))
}
Example 1 Using dico from the question
> str( indexer(dico) )
List of 3
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:4] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
.. ..$ : chr [1:4] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
..$ :List of 2
.. ..$ : chr [1:4] "1|2|1|1" "1|2|1|2" "1|2|1|3" "1|2|1|4"
.. ..$ : chr [1:4] "1|2|2|1" "1|2|2|2" "1|2|2|3" "1|2|2|4"
..$ :List of 2
.. ..$ : chr [1:3] "1|3|1|1" "1|3|1|2" "1|3|1|3"
.. ..$ : chr [1:4] "1|3|2|1" "1|3|2|2" "1|3|2|3" "1|3|2|4"
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:3] "2|1|1|1" "2|1|1|2" "2|1|1|3"
.. ..$ : chr [1:3] "2|1|2|1" "2|1|2|2" "2|1|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "2|2|1|1" "2|2|1|2" "2|2|1|3"
.. ..$ : chr [1:4] "2|2|2|1" "2|2|2|2" "2|2|2|3" "2|2|2|4"
..$ :List of 2
.. ..$ : chr [1:3] "2|3|1|1" "2|3|1|2" "2|3|1|3"
.. ..$ : chr [1:3] "2|3|2|1" "2|3|2|2" "2|3|2|3"
$ :List of 3
..$ :List of 2
.. ..$ : chr [1:3] "3|1|1|1" "3|1|1|2" "3|1|1|3"
.. ..$ : chr [1:3] "3|1|2|1" "3|1|2|2" "3|1|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "3|2|1|1" "3|2|1|2" "3|2|1|3"
.. ..$ : chr [1:3] "3|2|2|1" "3|2|2|2" "3|2|2|3"
..$ :List of 2
.. ..$ : chr [1:3] "3|3|1|1" "3|3|1|2" "3|3|1|3"
.. ..$ : chr [1:3] "3|3|2|1" "3|3|2|2" "3|3|2|3"
Example 2 Here is an example of a list with a different depth and lack of balance:
L <- list(list(1:3, 5:7), 9:10)
giving:
> str( indexer(L) )
List of 2
$ :List of 2
..$ : chr [1:3] "1|1|1" "1|1|2" "1|1|3"
..$ : chr [1:3] "1|2|1" "1|2|2" "1|2|3"
$ : chr [1:2] "2|1" "2|2"
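As a quick sanity check, the labels can be used to fetch the original values back. The lookup helper below is not part of the answer above, just a hypothetical sketch: it relies on the fact that `[[` with an integer vector indexes a list recursively, so all but the last index walk down the list and the last one picks the element inside the leaf vector.

```r
indexer <- function(L, lev = character(0)) {
  if (!is.list(L)) paste0(lev, seq_along(L))
  else Map(indexer, L, paste0(lev, seq_along(L), "|"))
}

## Hypothetical helper: map a label such as "1|2|3" back to its value
lookup <- function(L, key) {
  idx <- as.integer(strsplit(key, "|", fixed = TRUE)[[1]])
  leaf <- L[[idx[-length(idx)]]]   # recursive indexing down to the leaf vector
  leaf[idx[length(idx)]]           # position within that vector
}

L <- list(list(1:3, 5:7), 9:10)
lookup(L, "1|2|3")
lookup(L, "2|1")
```

This assumes every label has at least two parts, i.e. the top-level object is a list, which holds for both examples above.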
Another option is melt (from reshape2), which converts the nested list to a data.frame with a 'value' column and the index columns ('L1', 'L2', 'L3'). Convert that to a data.table (setDT(...)) and, grouped by 'L1', 'L2', 'L3', take the sequence of rows (1:.N). Then paste the columns of each row into a single vector with do.call, and relist it into a list with the same structure as 'dico' by passing 'dico' as the skeleton.
library(data.table)
library(reshape2)
dico2 <- relist(do.call(paste, c(setDT(melt(dico))[, 1:.N ,
by = .(L1, L2, L3)], sep="|")), skeleton = dico)
dico2
#[[1]]
#[[1]][[1]]
#[[1]][[1]][[1]]
#[1] "1|1|1|1" "1|1|1|2" "1|1|1|3" "1|1|1|4"
#[[1]][[1]][[2]]
#[1] "1|1|2|1" "1|1|2|2" "1|1|2|3" "1|1|2|4"
#...
#[[3]][[3]]
#[[3]][[3]][[1]]
#[1] "3|3|1|1" "3|3|1|2" "3|3|1|3"
#[[3]][[3]][[2]]
#[1] "3|3|2|1" "3|3|2|2" "3|3|2|3"

r: merge list of unnamed data sets

After importing data from a JSON stream, I have a list of 621 lists, each holding the same 22 variables.
List of 621
$ :List of 22
..$ _id : chr "55c79e711cbee48856a30886"
..$ number : num 1
..$ country : chr "Yemen"
..$ date : chr "2002-11-03T00:00:00.000Z"
..$ narrative : chr ""
..$ town : chr ""
..$ location : chr ""
..$ deaths : chr "6"
..$ deaths_min : chr "6"
..$ deaths_max : chr "6"
..$ civilians : chr "0"
..$ injuries : chr ""
..$ children : chr ""
..$ tweet_id : chr "278544689483890688"
..$ bureau_id : chr "YEM001"
..$ bij_summary_short: chr ""
..$ bij_link : chr ""
..$ target : chr ""
..$ lat : chr "15.47467"
..$ lon : chr "45.322755"
..$ articles : list()
..$ names : chr ""| __truncated__
$ :List of 22
..$ _id : chr "55c79e711cbee48856a30887"
..$ number : num 2
..$ country : chr "Pakistan"
..$ date : chr "2004-06-17T00:00:00.000Z"
..$ narrative : chr ""
..$ town : chr ""
..$ location : chr ""
..$ deaths : chr "6-8"
..$ deaths_min : chr "6"
..$ deaths_max : chr "8"
..$ civilians : chr "2"
..$ injuries : chr "1"
..$ children : chr "2"
..$ tweet_id : chr "278544750867533824"
..$ bureau_id : chr "B1"
..$ bij_summary_short: chr ""| __truncated__
..$ bij_link : chr ""
..$ target : chr ""
..$ lat : chr "32.30512565"
..$ lon : chr "69.57624435"
..$ articles : list()
..$ names : chr ""
...
How can I combine these lists into one data frame of 621 observations of 22 variables? Notice that all 621 lists are unnamed.
edit: Per request, here is how I got this data set:
library(rjson)
url <- 'http://api.dronestre.am/data'
document <- fromJSON(file=url, method='C')
str(document$strike)
Can you provide an example of how you generated the data? I have not tested this answer, but the following should help. If you update the question to show how you produced the data, I can try it out.
update
library(rjson)
library(data.table)
library(dplyr)
url <- 'http://api.dronestre.am/data'
document <- fromJSON(file=url, method='C')
is(document)
listdata<- document$strike
df<-do.call(rbind,listdata) %>% as.data.table
dim(df)
purrr has a useful transpose function which 'inverts' a list. The $articles element causes trouble because it appears to always be empty, which scuppers the conversion to a data.frame, so I've dropped it.
library(purrr)
df <- transpose(document$strike) %>%
t %>%
apply(FUN = unlist, MARGIN = 2)
df <- df[-21] %>% data.frame %>% tbl_df
df
Source: local data frame [621 x 21]
X_id number country date
(fctr) (dbl) (fctr) (fctr)
1 55c79e711cbee48856a30886 1 Yemen 2002-11-03T00:00:00.000Z
2 55c79e711cbee48856a30887 2 Pakistan 2004-06-17T00:00:00.000Z
3 55c79e711cbee48856a30888 3 Pakistan 2005-05-08T00:00:00.000Z
4 55c79e721cbee48856a30889 4 Pakistan 2005-11-05T00:00:00.000Z
5 55c79e721cbee48856a3088a 5 Pakistan 2005-12-01T00:00:00.000Z
6 55c79e721cbee48856a3088b 6 Pakistan 2006-01-06T00:00:00.000Z
7 55c79e721cbee48856a3088c 7 Pakistan 2006-01-13T00:00:00.000Z
8 55c79e721cbee48856a3088d 8 Pakistan 2006-10-30T00:00:00.000Z
9 55c79e721cbee48856a3088e 9 Pakistan 2007-01-16T00:00:00.000Z
10 55c79e721cbee48856a3088f 10 Pakistan 2007-04-27T00:00:00.000Z
.. ... ... ... ...
Variables not shown: narrative (fctr), town (fctr), location (fctr), deaths
(fctr), deaths_min (fctr), deaths_max (fctr), civilians (fctr), injuries
(fctr), children (fctr), tweet_id (fctr), bureau_id (fctr), bij_summary_short
(fctr), bij_link (fctr), target (fctr), lat (fctr), lon (fctr), names (fctr)
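If you would rather stay in base R, here is a minimal sketch of the same idea: drop the empty list element from each record, turn each record into a one-row data.frame, and rbind them. The strikes object is a made-up stand-in for document$strike with only a few of the 22 fields, and the approach assumes every remaining field has length one.

```r
## Hypothetical stand-in for document$strike
strikes <- list(
  list(`_id` = "a1", number = 1, country = "Yemen", articles = list()),
  list(`_id` = "a2", number = 2, country = "Pakistan", articles = list())
)

rows <- lapply(strikes, function(s) {
  s$articles <- NULL                          # drop the always-empty list element
  as.data.frame(s, stringsAsFactors = FALSE)  # one-row data.frame per record
})
df <- do.call(rbind, rows)
str(df)
```

As in the output above, the `_id` field is renamed to X_id because it is not a syntactically valid column name. Binding many one-row data frames is slow for large inputs, but it should be fine for a few hundred records.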

Error in data.frame: arguments imply differing number of rows: 2, 5, 19, 7, 1, 11, 4, 6, 9, 3, 13 14, 22, 26, 27, 30, 31, 29, 35

I'm working on an R project. While trying to analyse sentiments, I had to create a data frame (here, "sentiment.df").
sentiment.df <- data.frame(text, emotion=emotion, polarity=polarity, stringsAsFactors=FALSE)
Here, text is a list containing processed (cleaned) tweets split into keywords; emotion contains a bag of emotions as character strings; polarity contains positive/negative ratings. When I ran the above line of code, RStudio threw the following error:
Error in data.frame(c("httpstcoux1aacnxbk", "endalz"), c("i", "have", :
arguments imply differing number of rows: 2, 5, 19, 7, 1, 11, 4, 6, 9, 3, 13, 17, 8, 10, 24, 21, 15, 12, 25, 16, 20, 23, 18, 28, 14, 22, 26, 27, 30, 31, 29, 35
The lengths of those 3 variables (text, emotion and polarity) are all the same: 2621.
This is what my data looks like:
> str(text)
List of 2621
$ : chr [1:2] "httpstcoux1aacnxbk" "endalz"
$ : chr [1:5] "i" "have" "the" "best" ...
$ : chr [1:19] "kenny" "easley" "seahawks" "captain" ...
$ : chr [1:2] "good" "defense"
$ : chr [1:7] "superbowlxlix" "party" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__ "" ...
$ : chr "ihatetombrady"
$ : chr [1:11] "coachbourbonusa" "understood" "still" "dont" ...
$ : chr [1:19] "tiwaworks" "whitney" "houston" "sings" ...
$ : chr [1:4] "thats" "still" "bae" "<U+2764><U+FE0F>""| __truncated__
$ : chr [1:6] "were" "a" "thousand" "miles" ...
$ : chr [1:7] "dredoo24" "what" "i" "like" ...
$ : chr [1:2] "bww" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:9] "i" "seriously" "cant" "wait" ...
$ : chr [1:3] "flyysociety" "photoshoot<U+2716><U+FE0F>""| __truncated__ "httptcoxkywsj5i2x"
$ : chr [1:5] "lienne11" "wait" "whos" "performing" ...
$ : chr [1:13] "game" "on" "go" "wildcats<U+FFFD><U+FFFD>\u2b07<U+FE0F>""| __truncated__ ...
$ : chr [1:2] "good" "defense"
$ : chr [1:11] "seattle" "seahawks" "fan" "" ...
$ : chr [1:9] "realprestonj" "congratulations" "preston" "the" ...
$ : chr [1:5] "tsu19" "so" "funny" "bruh" ...
$ : chr [1:4] "drunk" "tweets" "coming" "soon"
$ : chr "tb12"
$ : chr [1:13] "hicksville" "schools" "will" "be" ...
$ : chr [1:5] "but" "momma" "said" "superbowl" ...
$ : chr [1:4] "raggedy" "ass" "bitch" ""
$ : chr [1:5] "arbyscares" "arbys" "prairie" "village" ...
$ : chr [1:17] "lovetruth79" "ltltltloves" "to" "send" ...
$ : chr [1:8] "“boynamedhxlz""| __truncated__ "quote" "this" "tweet" ...
$ : chr [1:13] "stretching" "for" "ballet" "now" ...
$ : chr [1:7] "jerrodflusche" "janabewley" "narnia" "for" ...
$ : chr [1:8] "here" "goes" "my" "whole" ...
$ : chr [1:10] "who" "you" "going" "for" ...
$ : chr [1:3] "good" "stop" "hawks"
$ : chr [1:5] "brady" "be" "smokin" "blounts" ...
$ : chr [1:8] "me" "decepcioné" "perdoné" "hice" ...
$ : chr [1:7] "happy21stbirthdayharry" "" "its" "also" ...
$ : chr [1:24] "teammic3rd" "sounds" "amazing" "" ...
$ : chr [1:21] "millions" "of" "people" "packed" ...
$ : chr [1:8] "missed" "idina" "singing" "by" ...
$ : chr [1:2] "your" "stupid"
$ : chr [1:5] "seahawks" "all" "the" "way" ...
$ : chr [1:4] "takeathillpill" "you" "are" "vile"
$ : chr [1:3] "lets" "goo" "superbowlixlix"
$ : chr [1:4] "snow" "day" "nigga" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:6] "ill" "just" "watch" "total" ...
$ : chr [1:9] "liveextra" "site" "down" "its" ...
$ : chr [1:3] "time" "to" "punt"
$ : chr [1:5] "zachdettloff516" "groans" "at" "terrible" ...
$ : chr [1:3] "go" "seahawks" "<U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:7] "pizza" "friends" "super" "bowl" ...
$ : chr [1:9] "hold" "onto" "me" "cause" ...
$ : chr [1:6] "tom" "gonna" "get" "his" ...
$ : chr [1:6] "lets" "goooooo" "nice" "3rd" ...
$ : chr [1:15] "2" "fatal" "crashes" "reported" ...
$ : chr [1:12] "supra" "dope" "atx" "sundayfunday" ...
$ : chr [1:19] "all" "these" "students" "from" ...
$ : chr [1:3] "danstricko" "not" "happening"
$ : chr [1:17] "tom" "brady" "may" "wear" ...
$ : chr "httptconqabzdezwf"
$ : chr [1:4] "i" "miss" "you" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:25] "john" "legend" "and" "idina" ...
$ : chr [1:13] "snowed" "in" "with" "kadybuchler" ...
$ : chr [1:6] "that" "bright" "green" "and" ...
$ : chr [1:9] "ive" "got" "the" "seahawks" ...
$ : chr [1:9] "sds" "by" "mac" "miller" ...
$ : chr [1:5] "jakeski52" "rotowire" "or" "roger" ...
$ : chr "damnit"
$ : chr "hawks"
$ : chr [1:7] "my" "nephews" "and" "niece" ...
$ : chr [1:16] "liking" "your" "own" "posts" ...
$ : chr [1:2] "bailaconbruce" "fb"
$ : chr [1:4] "djones7" "hell" "no" "<U+FFFD><U+FFFD>""| __truncated__
$ : chr [1:7] "best" "part" "of" "the" ...
$ : chr [1:13] "holls016" "f" "u" "i" ...
$ : chr [1:6] "mikebarnicle" "nice" "to" "meet" ...
$ : chr [1:5] "u" "played" "me" "dirty" ...
$ : chr [1:13] "my" "bac" "is" "looking" ...
$ : chr [1:2] "est" "2008"
$ : chr [1:12] "vacation" "time" "" "thats" ...
$ : chr [1:3] "<U+FFFD><U+FFFD>""| __truncated__ "ok" "<U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD><U+FFFD"| __truncated__
$ : chr [1:2] "common" "seattle"
$ : chr [1:3] "no" "cacc" "talc"
$ : chr "lob"
$ : chr [1:3] "cut" "the" "crap"
$ : chr [1:11] "im" "at" "las" "alitas" ...
$ : chr [1:3] "backstreets" "back" "alrighttttt"
$ : chr [1:6] "the" "seahawks" "are" "going" ...
$ : chr [1:13] "baby" "its" "cold" "outside" ...
$ : chr [1:15] "i" "have" "sooo" "much" ...
$ : chr [1:10] "so" "whos" "gonna" "pull" ...
$ : chr [1:5] "my" "driveway" "tonight" "nwiweather" ...
$ : chr "fuck"
$ : chr [1:21] "now" "that" "its" "actually" ...
$ : chr [1:7] "green" "goats" "<U+FFFD><U+FFFD>""| __truncated__ "" ...
$ : chr [1:15] "i" "guess" "its" "time" ...
$ : chr [1:3] "lets" "go" "seattle"
$ : chr [1:20] "jozybrambila7" "do" "you" "ever" ...
$ : chr [1:4] "reggiewo" "nice" "choice" "cheers"
$ : chr [1:20] "i" "enjoy" "super" "bowl" ...
[list output truncated]
> str(emotion)
chr [1:2621] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "joy" ...
> str(polarity)
chr [1:2621] "positive" "positive" "positive" "positive" "positive" "positive" "positive" ...
When I posted this error online, programmers said the numbers of rows and columns are not the same, i.e. it's not a square matrix and a data frame will not work for a rectangular matrix.
I would be grateful if someone could help me with this error.
Thanks in advance!
'text' has 2621 list elements, but they do not all hold the same number of entries.
Each element may contain a different number of words, and data.frame() treats every element of a list as a separate column, hence the differing row counts in the error.
So even unlist() won't help you, because the total number of words is greater than the number of entries in the 'emotion' and 'polarity' vectors.
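Two possible ways around this, sketched on a made-up three-tweet sample (the values are invented; only the variable names come from the question): either collapse each tweet's words back into a single string so that 'text' becomes an ordinary character vector, or keep the words as a list column by wrapping the list in I().

```r
## Tiny made-up sample with the same shape as the data in the question
text <- list(c("good", "defense"), c("i", "have", "the", "best"), "tb12")
emotion <- c("unknown", "joy", "unknown")
polarity <- c("positive", "positive", "negative")

## Option 1: one string per tweet
sentiment.df <- data.frame(
  text = vapply(text, paste, character(1), collapse = " "),
  emotion = emotion, polarity = polarity,
  stringsAsFactors = FALSE
)

## Option 2: keep the words, stored as a list column
sentiment.list <- data.frame(
  text = I(text),
  emotion = emotion, polarity = polarity,
  stringsAsFactors = FALSE
)
nrow(sentiment.df)
```

Which one fits depends on what the later steps expect: a single string per row is easier to print and export, while the list column keeps the tokenisation intact.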

Extracting component from all objects within an indexed list

I have an indexed list containing several objects, each of which contains 3 matrices ($tab, $nobs and $other). There are a hundred such objects in the list. The objective is to extract only the $tab matrix from each of the objects and transpose it.
genfreqT <- lapply(genfreq[[1:100]]$tab, function(x) t(x))
This does not seem to work.
Here is how the genfreq object is structured. It was created with the R package adegenet.
> str(genfreq[[1]])
List of 3
$ tab : num [1:30, 1:1974] 0.6 0.5 0.325 0.675 0.6 0.5 0.5 0.375 0.55 0.475 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : Named chr [1:30] "1" "2" "3" "4" ...
.. .. ..- attr(*, "names")= chr [1:30] "01" "02" "03" "04" ...
.. ..$ : chr [1:1974] "L0001.1" "L0001.2" "L0002.1" "L0002.2" ...
$ nobs: num [1:30, 1:1000] 40 40 40 40 40 40 40 40 40 40 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : Named chr [1:30] "1" "2" "3" "4" ...
.. .. ..- attr(*, "names")= chr [1:30] "01" "02" "03" "04" ...
.. ..$ : Named chr [1:1000] "L0001" "L0002" "L0003" "L0004" ...
.. .. ..- attr(*, "names")= chr [1:1000] "L0001" "L0002" "L0003" "L0004" ...
$ call: language makefreq(x = x, truenames = TRUE)
genfreqT <-lapply(lapply(genfreq, "[[", "tab"),function(x) t(x))
The package developer for 'Adegenet' provided this solution:
> genfreqT <- lapply(genfreq, function(e) t(e$tab))
> summary(genfreqT)
Length Class Mode
data1.str 59220 -none- numeric
data2.str 59220 -none- numeric
data3.str 59220 -none- numeric
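For anyone wondering why the original attempt fails: `[[` with a vector such as 1:100 does not select 100 elements, it indexes recursively (genfreq[[1]][[2]][[3]]...), so genfreq[[1:100]]$tab never reaches the $tab components. Looping over the list itself avoids that. A small self-contained sketch with made-up matrices standing in for the adegenet output:

```r
## Hypothetical stand-in for the genfreq list: each element has a $tab matrix
genfreq <- list(
  data1 = list(tab = matrix(1:6, nrow = 2), nobs = matrix(40, 2, 3)),
  data2 = list(tab = matrix(7:12, nrow = 2), nobs = matrix(40, 2, 3))
)

## Pull out $tab from every element and transpose it
genfreqT <- lapply(genfreq, function(e) t(e$tab))

## Each 2 x 3 matrix is now 3 x 2
sapply(genfreqT, dim)
```

The names of the list (data1, data2 here) carry over to the result, which matches the summary output shown above.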
