I am looking at vote data stored as a nested list. I am trying to extract several variables from each element of the list (example below).
For each "vote" element, I am trying to get the uid and the list of individuals who voted for or against ("pours" and "contres") the law.
I tried to simplify the original data (it can be found here).
This is the simplified list I came up with:
scrutin1_detail<-list(uid="VTANR5L14V1",organref="P0644420")
scrutin1_vote1_for<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin1_vote2_for<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin1_vote3_for<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin1_vote_for<-list(scrutin1_vote1_for,scrutin1_vote2_for,scrutin1_vote3_for)
scrutin1_vote1_against<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin1_vote2_against<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin1_vote3_against<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin1_vote_against<-list(scrutin1_vote1_against,scrutin1_vote2_against,scrutin1_vote3_against)
votant1<-list(pours=scrutin1_vote_for,contres=scrutin1_vote_against)
vote1<-list(decompte_nominatif=votant1)
ventilationVotes1<-list(vote=vote1)
scrutin1<-list(scrutin1_detail,list(ventilationVotes=ventilationVotes1))
# Scrutin 2
scrutin2_detail<-list(uid="VTANR5L14V5",organref="P0644423")
scrutin2_vote1_for<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin2_vote2_for<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin2_vote3_for<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin2_vote_for<-list(scrutin2_vote1_for,scrutin2_vote2_for,scrutin2_vote3_for)
scrutin2_vote1_against<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin2_vote2_against<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin2_vote3_against<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin2_vote_against<-list(scrutin2_vote1_against,scrutin2_vote2_against,scrutin2_vote3_against)
scrutin2_votant1<-list(pours=scrutin2_vote_for,contres=scrutin2_vote_against)
scrutin2_vote1<-list(decompte_nominatif=scrutin2_votant1)
scrutin2_ventilationVotes1<-list(vote=scrutin2_vote1)
scrutin2<-list(scrutin2_detail,list(ventilationVotes=scrutin2_ventilationVotes1))
scrutins<-list(scrutins=list(scrutin=list(scrutin1,scrutin2)))
In the end (though I am really interested in understanding how to do it, as I run into this problem quite often), I want to build a data frame with these columns:
- uid
- for/against (whether the voter was in the "pours" (for) or "contres" (against) list)
- acteurref
- mandatRef
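For the simplified list above, a minimal base R sketch of one possible approach is shown here. It rebuilds a smaller list of the same shape so that it runs on its own; the helper names voter(), make_scrutin() and flatten_scrutin() and the column name position are made up for illustration. The real JSON file may be simplified differently by the parser, so the access path would likely need adjusting.

```r
# Compact rebuild of the simplified structure (same nesting, fewer voters).
voter <- function(a, m) list(acteurref = a, mandatRef = m)
make_scrutin <- function(uid, pours, contres) {
  list(list(uid = uid, organref = "P0644420"),
       list(ventilationVotes = list(vote = list(
         decompte_nominatif = list(pours = pours, contres = contres)))))
}
scrutins <- list(scrutins = list(scrutin = list(
  make_scrutin("VTANR5L14V1",
               pours   = list(voter("PA1816", "PM645051"), voter("PA1817", "PM645052")),
               contres = list(voter("PA1818", "PM645053"))),
  make_scrutin("VTANR5L14V5",
               pours   = list(voter("PA1816", "PM645051")),
               contres = list(voter("PA1817", "PM645052"), voter("PA1818", "PM645053")))
)))

# Turn one scrutin into a data frame with one row per voter.
flatten_scrutin <- function(s) {
  uid <- s[[1]]$uid
  groups <- s[[2]]$ventilationVotes$vote$decompte_nominatif
  rows <- lapply(names(groups), function(side) {
    voters <- groups[[side]]
    data.frame(
      uid       = rep(uid, length(voters)),
      position  = rep(side, length(voters)),   # "pours" or "contres"
      acteurref = vapply(voters, `[[`, character(1), "acteurref"),
      mandatRef = vapply(voters, `[[`, character(1), "mandatRef"),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Apply to every scrutin and stack the results.
votes <- do.call(rbind, lapply(scrutins$scrutins$scrutin, flatten_scrutin))
print(votes)
```

The general pattern is to write a function that flattens one element, lapply() it over the list, and rbind() the pieces; the same idea carries over to other nested lists.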
Sadly, I don't speak (or read) French, so I am not able to make many correct guesses as to the meaning of the item names in the object constructed using alistaire's suggestion:
library(jsonlite)
scrutin1_detail <- fromJSON("~/Downloads/Scrutins_XIV.json")
> length(scrutin1_detail[[1]])
[1] 1
> length(scrutin1_detail[[1]][[1]])
[1] 18
> names(scrutin1_detail[[1]][[1]])
[1] "#xmlns:xsi" "uid"
[3] "numero" "organeRef"
[5] "legislature" "sessionRef"
[7] "seanceRef" "dateScrutin"
[9] "quantiemeJourSeance" "typeVote"
[11] "sort" "titre"
[13] "demandeur" "objet"
[15] "modePublicationDesVotes" "syntheseVote"
[17] "ventilationVotes" "miseAuPoint"
> str(scrutin1_detail[[1]][[1]]$uid)
chr [1:1219] "VTANR5L14V1" "VTANR5L14V2" "VTANR5L14V3" ...
> table( scrutin1_detail[[1]][[1]]$organeRef)
PO644420
1219
> table( scrutin1_detail[[1]][[1]]$sessionRef)
SCR5A2012E1 SCR5A2012E2 SCR5A2013E1 SCR5A2013E3 SCR5A2013O1 SCR5A2014E1
         15           5          42           4         529          50
SCR5A2014E2 SCR5A2014O1 SCR5A2015E1 SCR5A2015E2 SCR5A2015O1 SCR5A2016O1
          7         253          18           5         236          55
Maybe you should help us Anglophones to make sense of this. It is very beneficial to provide context rather than just code.
Asked to review questions for grammar and spelling:
library(XML)
tbls_all <- readHTMLTable(url_v9)
length(tbls_all)
[1] 34
names(tbls_all)
[1] "NULL" "DisplayedQuestions" "AllQuestions1"
[4] "HiddenAnswerTable1" "HiddenAnswerTable1" "AllQuestions2"
[7] "HiddenAnswerTable2" "HiddenAnswerTable2" "AllQuestions3"
[10] "HiddenAnswerTable3" "HiddenAnswerTable3" "AllQuestions4"
[13] "HiddenAnswerTable4" "HiddenAnswerTable4" "AllQuestions5"
[16] "HiddenAnswerTable5" "HiddenAnswerTable5" "AllQuestions6"
[19] "HiddenAnswerTable6" "HiddenAnswerTable6" "AllQuestions7"
[22] "HiddenAnswerTable7" "HiddenAnswerTable7" "AllQuestions8"
[25] "HiddenAnswerTable8" "HiddenAnswerTable8" "AllQuestions9"
[28] "HiddenAnswerTable9" "HiddenAnswerTable9" "AllQuestions10"
[31] "HiddenAnswerTable10" "HiddenAnswerTable10" "TotalsTable"
[34] "HiddenTable"
I'm just interested in the AllQuestions tables, so:
tbls_q <- tbls_all[grep('AllQuestions\\d', names(tbls_all))]
length(tbls_q)
[1] 10
names(tbls_q[[1]])
[1] "V1" "V2" "V3" "V4"
The questions are in V1
tbls_q[[1]]$V1[2]
[1] "<strong>Now I am going to evaluate how well you can remember the names of some common items. First, I will show you pictures of 16 items that I want you to remember. Each item belongs to a different category. For example, 'type of reading materials' is a category. I will show you the items four at a time and ask you to tell me which item belongs with each category and then to immediately recall the items when I tell you their categories. Later, I will ask you to recall all of the items I have shown you. For any items you miss, I will tell you the categories to help you recall more items. You will have 3 tries to recall the items.</strong>(726368 - WAS_Card1_Intro)"
> tbls_q[[1]]$V1[4]
[1] "<br><br><font color=\"blue\"><i>Bear - Correctly Named?</i></font>(726370 - WAS_Card1_Word1_Name)"
> tbls_q[[1]]$V1[3]
[1] "<font color=\"blue\"><i>Place Worksheet 1 in front of the subject.</i></font><br><br><strong>There are 4 pictures on this worksheet. When I tell you a category, point to the item that is in that category and tell me its name. <br><br><br>Point to the 4–Legged Animal and tell me its name.</strong><br><br><font color=\"blue\"><i>Bear - Correctly Identified?</i></font>(726369 - WAS_Card1_Word1_Identify)"
At this point, I'm stuck on how to extract the text without the embedded HTML tags, and then report what it says, what it should say, and which variable (726369, for example) the question belongs to. I can imagine some regex approaches, but they seem fragile.
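For reference, the regex route could look roughly like the sketch below. It assumes every entry ends with a "(number - variable_name)" marker, as in the three examples above, and that simply deleting anything that looks like a tag is acceptable; both assumptions are fragile, and a real HTML parser would be more robust. The names q, var_id, var_name and text are made up for illustration.

```r
# One of the example strings from above.
q <- "<br><br><font color=\"blue\"><i>Bear - Correctly Named?</i></font>(726370 - WAS_Card1_Word1_Name)"

# Split the trailing "(id - variable_name)" marker into its two parts.
m <- regmatches(q, regexec("\\(([0-9]+) - ([^)]+)\\)$", q))[[1]]
var_id   <- m[2]
var_name <- m[3]

# Remove the marker, replace anything that looks like a tag with a space,
# then collapse runs of whitespace and trim the ends.
text <- sub("\\([0-9]+ - [^)]+\\)$", "", q)
text <- trimws(gsub("\\s+", " ", gsub("<[^>]+>", " ", text)))

print(var_id)
print(var_name)
print(text)
```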
We conducted an experiment at university, which we tried out ourselves before giving it to real test persons. The problem now is that our own test data is included in the CSV data file, so I need to delete the first 23 "test persons".
Each of them got a unique code, and I could count how many unique codes exist (as you can see, there are 38). Now I only need the last 15 of them. I tried subset(), but I don't really know how to filter for those specific last 15 subject IDs (VPcount).
unique(d$VPcount)
[1] 7.941675e-312 7.941683e-312 7.941686e-312 7.941687e-312 7.941695e-312 7.941697e-312 7.941734e-312
[8] 7.942134e-312 7.942142e-312 7.942146e-312 7.942176e-312 7.942191e-312 7.942194e-312 7.942199e-312
[15] 7.942268e-312 7.942301e-312 7.942580e-312 7.943045e-312 7.944383e-312 7.944386e-312 7.944388e-312
[22] 7.944388e-312 7.944429e-312 7.944471e-312 7.944477e-312 7.944478e-312 7.944494e-312 7.944500e-312
[29] 7.944501e-312 7.944501e-312 7.944503e-312 7.944503e-312 7.944506e-312 7.944506e-312 7.944506e-312
[36] 7.944506e-312 7.944508e-312 7.944511e-312
uniqueN(d$VPcount)
[1] 38
You can try the following, which keeps only the rows belonging to the last 15 unique codes:
data <- subset(d, VPcount %in% tail(unique(VPcount), 15))
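A small self-contained sketch of the same idea on made-up data is shown here. The data frame d below is a hypothetical stand-in with 38 integer codes and two rows per code. It relies on the rows being in chronological order, so that the test persons' codes appear first; unique() keeps values in order of first appearance.

```r
# Hypothetical stand-in for the real data: 38 subject codes, 2 rows each.
d <- data.frame(VPcount = rep(1001:1038, each = 2), score = seq_len(76))

# unique() preserves first-appearance order, so tail() picks the
# 15 codes that show up last in the file.
keep <- tail(unique(d$VPcount), 15)
data <- subset(d, VPcount %in% keep)

print(length(unique(data$VPcount)))  # 15
print(range(data$VPcount))           # 1024 1038
```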
I have a data frame in R with 341 rows. I want to rename the row names using a list with 349 names. All 341 names will be in this list for sure. But not all of them will be perfect hits.
The data looks like this
rownames(df_RPM1)
[1] "LQNS02059392.1_11686_5p"
[2] "LQNS02277998.1_30984_3p"
[3] "LQNS02277998.1_30984_5p"
[4] "LQNS02277998.1_30988_3p"
[5] "LQNS02277998.1_30988_5p"
[6] "LQNS02277997.1_30943_3p"
[7] "miR-9|LQNS02278070.1_31740_3p"
[8] "miR-9|LQNS02278094.1_36129_3p"
head(inlist)
[1] "dpu-miR-2-03_LQNS02059392.1_11686_5p" "dpu-miR-10-P2_LQNS02277998.1_30984_3p"
[3] "dpu-miR-10-P2_LQNS02277998.1_30984_5p" "dpu-miR-10-P3_LQNS02277998.1_30988_3p"
[5] "dpu-miR-10-P3_LQNS02277998.1_30988_5p" "miR-9|LQNS02278070.1_31740_3p"
[6] "miR-9|LQNS02278094.1_36129_3p"
The order won't necessarily be the same in the two.
Can anyone suggest how to do this in R?
Thanks a lot
It depends a lot on what a "non-perfect hit" looks like. Assuming the row name is a substring of the real name, str_which() (which returns the positions where str_detect() is TRUE) does the job quite well:
library(tidyverse)
real_names <- c("dpu-miR-2-03_LQNS02059392.1_11686_5p",
"dpu-miR-10-P2_LQNS02277998.1_30984_3p",
"dpu-miR-10-P2_LQNS02277998.1_30984_5p",
"dpu-miR-10-P3_LQNS02277998.1_30988_3p",
"dpu-miR-10-P3_LQNS02277998.1_30988_5p",
"miR-9|LQNS02278070.1_31740_3p",
"miR-9|LQNS02278094.1_36129_3p")
str_which(real_names, fixed("LQNS02059392.1_11686_5p"))  # fixed(): match literally, not as a regex
#> [1] 1
So we can vectorize (I removed element 6, which is not found in the example list):
pos <- map_int(rownames(df_RPM1), ~ str_which(real_names, fixed(.)))
pos
#> [1] 1 2 3 4 5 6 7
And all that's left is to change the row names:
rownames(df_RPM1) <- real_names[pos]
Of course, if a non-perfect hit means something more complicated, you may need to create a regex from the row names or something like that.
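Note that map_int() stops with an error when a row name has no match or more than one match. A base R sketch that tolerates those cases is shown here, under the same substring assumption; match_one(), old_names and new_names are names made up for illustration, and the data is a shortened copy of the example.

```r
real_names <- c("dpu-miR-2-03_LQNS02059392.1_11686_5p",
                "dpu-miR-10-P2_LQNS02277998.1_30984_3p",
                "miR-9|LQNS02278070.1_31740_3p")
old_names <- c("LQNS02059392.1_11686_5p",
               "LQNS02277997.1_30943_3p",        # has no counterpart in real_names
               "miR-9|LQNS02278070.1_31740_3p")

# For one old name, return the index of the single real name that
# contains it literally, or NA when there is no match or several.
match_one <- function(x, candidates) {
  hits <- which(grepl(x, candidates, fixed = TRUE))
  if (length(hits) == 1L) hits else NA_integer_
}
pos <- vapply(old_names, match_one, integer(1),
              candidates = real_names, USE.NAMES = FALSE)

# Keep the old name wherever no unique match was found.
new_names <- ifelse(is.na(pos), old_names, real_names[pos])
print(new_names)
```

With a real data frame, the last step would be rownames(df_RPM1) <- new_names, after checking which(is.na(pos)) by hand.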
I got a list with a weird format:
[[1]]
[1] "Freq.2432.40862794099" "Freq.2792.87280096993" "Freq.2955.16577598796"
[4] "Freq.3161.12982491516" "Freq.3194.19720315405" "Freq.3218.83311568825"
[7] "Freq.3265.37951283662" "Freq.3317.86908506493" "Freq.3900.50408838719"
[10] "Freq.4073.33935633108" "Freq.4302.8830598659" "Freq.4404.80065271461"
[13] "Freq.4469.12305573234" "Freq.4567.90688886175" "Freq.4965.4984006347"
[16] "Freq.5854.45161215455" "Freq.5905.64933878776" "Freq.6175.68130655941"
[19] "Freq.6433.22411185796" "Freq.6631.46775487994" "Freq.6958.20015968149"
[22] "Freq.7469.83422424355" "Freq.8602.43342069553" "Freq.8766.14436081853"
[25] "Freq.8811.22677706485" "Freq.8915.90029255773" "Freq.9131.39810096"
[28] "Freq.9378.82122607608"
I have never seen that [[1]] in a list before, and the problem is that I can't append things to this list.
How can I solve this?
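This is a vector stored inside a list: the [[1]] marks the list's first element, which here is a character vector. When the elements are themselves lists, this is usually referred to as a nested list.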
a <- c(1,2,3)
b <- c(4,5,6)
list <- list(a,b)
In this code snippet we create two vectors and put them into a list. You can then access the nested vectors/lists using double brackets, like so:
list[[1]]
> [1] 1 2 3
Now, if you want to change the value (or append to it, see comment), you can use the normal assignment syntax, but assign to the nested object only.
list[[1]] <- c(7,8,9)
list[[1]]
> [1] 7 8 9
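Appending works the same way. A short sketch with a made-up list x shaped like the one in the question (one character vector inside a list):

```r
x <- list(c("Freq.1", "Freq.2"))

# Append a value to the vector stored in the first element.
x[[1]] <- c(x[[1]], "Freq.3")

# Append a new element to the list itself.
x[[length(x) + 1]] <- "another element"

print(length(x[[1]]))  # 3
print(length(x))       # 2
```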
So I've been trying to get a subset of a character vector for the last hour or so. In my (floundering) attempt to get this working I ran into an interesting characteristic of R. I have data (after JSON parsing) in the form of
[[1]]
[[1]]$business_id
[1] "rncjoVoEFUJGCUoC1JgnUA"
[[1]]$full_address
[1] "8466 W Peoria Ave\nSte 6\nPeoria, AZ 85345"
[[1]]$open
[1] TRUE
[[1]]$categories
[1] "Accountants" "Professional Services" "Tax Services"
[4] "Financial Services"
[[1]]$city
[1] "Peoria"
[[1]]$review_count
[1] 3
[[1]]$name
[1] "Peoria Income Tax Service"
[[1]]$neighborhoods
list()
[[1]]$longitude
[1] -112.2416
[[1]]$state
[1] "AZ"
[[1]]$stars
[1] 5
[[1]]$latitude
[1] 33.58187
[[1]]$type
[1] "business"
Here's the code I'm using
#!/usr/bin/Rscript
require(graphics)
require(RJSONIO)
parsed_data <- lapply(readLines("yelp_phoenix_academic_dataset/yelp_academic_dataset_business.json"), fromJSON)
#parsed_data[,c("categories")]
print(parsed_data[1])
As I was trying to drop everything but the categories column I ran into this interesting behaviour
print(parsed_data[1])
print(parsed_data[1][1])
print(parsed_data[1][1][1][1][1][1])
All produce the same output (the one posted above). Why is that?
This is the difference between [ and [[. It is hard to search for these online, but ?'[' will bring up the help.
When indexing a list with [, a list is returned:
list(a=1:10, b=11:20)[1]
## $a
## [1] 1 2 3 4 5 6 7 8 9 10
This is a list of one element, so repeating the operation again results in the same value:
list(a=1:10, b=11:20)[1][1]
## $a
## [1] 1 2 3 4 5 6 7 8 9 10
[[ returns the element, not a list containing the element. It also only accepts a single index (whereas [ accepts a vector):
list(a=1:10, b=11:20)[[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
And this operation is not idempotent on lists:
list(a=1:10, b=11:20)[[1]][[1]]
## [1] 1
Your JSON data is currently stored in a list, rather than a vector, so the indexing is different.
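A small sketch of the same distinction, using a made-up list x, makes the types explicit:

```r
x <- list(a = 1:10, b = 11:20)

print(class(x[1]))    # "list": [ returns a sub-list
print(class(x[[1]]))  # "integer": [[ returns the element itself

# Repeating [1] on a one-element list changes nothing.
print(identical(x[1], x[1][1][1]))  # TRUE

# [ accepts several indices at once.
print(names(x[c("a", "b")]))
```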
As Matthew has pointed out, there is a difference between using [] and using [[]] to access an element. For a discussion of this, I refer you to this Stack Overflow thread:
In R, what is the difference between the [] and [[]] notations for accessing the elements of a list?
Looking at the printout, your data is stored as a nested list:
parsed_data[[1]]
This will give you a list containing each of the columns. To access the categories column, you can use any of the following:
parsed_data[[1]][["categories"]]
parsed_data[[1]][[4]]
parsed_data[[1]]$categories
This will give you a vector of names, as you'd expect:
## [1] "Accountants" "Professional Services" "Tax Services"
## [4] "Financial Services"
Note that when accessing by index (either named or numeric) you still have to use the double bracket notation: [[]]. If you use [] instead, it will give you a list instead of a vector:
parsed_data[[1]]["categories"]
## $categories
## [1] "Accountants" "Professional Services" "Tax Services"
## [4] "Financial Services"
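To keep only the categories of every record rather than just the first, one option is to apply [[ over the outer list. A sketch on two made-up records shaped like the parsed data (the real parsed_data has more fields per record):

```r
# Two hypothetical records with the same shape as the parsed JSON lines.
parsed_data <- list(
  list(business_id = "a1", categories = c("Accountants", "Tax Services")),
  list(business_id = "b2", categories = c("Restaurants"))
)

# Extract the "categories" element from each record.
categories <- lapply(parsed_data, `[[`, "categories")

print(categories[[1]])
print(lengths(categories))  # 2 1
```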