I'm working on a Shiny app and I need to build a vector of choices from database values.
In all the examples people do something like this:
selectInput("selectSectorAgrupado","Sector agrupado",
c("Cylinders" = "cyl",
"Transmission" = "am",
"Gears" = "gear"),selected=NULL,multiple = FALSE)
In this case the user sees the string "Cylinders" but the value is "cyl". I need to do the same thing, except the values aren't hard-coded strings; they come from data frame fields read from the database. I tried something similar, but when I put a text field into the vector it returns numbers.
Using this:
sectorAgrupado$L0NOMBRE
The console returns this:
 [1] Actividades recreativas  Agencias de viaje
 [3] Alimentación             Bazares
 [5] Comercio electrónico     Gremios, vivienda
 [7] Electrodomésticos, SAT   Energia
 [9] Enseñanza                Grandes Superficies
[11] Hosteleria               Informática
[13] Joyeria, Relojeria       Muebles
[15] Otro comercio por menor  Otros
[17] Publicidad               Seguros
[19] Servicios bancarios      Telefonía
[21] Textil, Calzado          Tintorerias
[23] Transportes              Venta domiciliaria
[25] Automóviles              Promoción inmobiliaria
But when I put it inside c():
c(sectorAgrupado$L0NOMBRE)
it returns this:
[1] 1 2 3 5 6 11 7 8 9 10 12 13 14 15 16 17 19 20 21 22
[21] 23 24 25 26 4 18
I'm new to R programming and maybe I don't understand factors well, but I need some help.
Thank you
EDIT (more problems)
OK, the first problem is solved. The solution is to store the values as character, like this:
c(as.character(sectorAgrupado$L0NOMBRE))
But the problem continues. When I try to associate each string with its value, R returns an error.
The code:
c(as.character(sectorAgrupado$L0NOMBRE)=as.character(sectorAgrupado$L0CODIGO))
returns the error:
Error: unexpected '=' in "c(as.character(sectorAgrupado$L0NOMBRE)="
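The usual fix, not shown in the original post, is to build a named vector with setNames() rather than trying to assign the names with = inside c() (argument names in a call must be literal, which is why R stops at the unexpected '='). A minimal sketch, assuming sectorAgrupado really has the columns L0NOMBRE (labels) and L0CODIGO (values) as shown above:

# hypothetical sketch: labels come from L0NOMBRE, returned values from L0CODIGO
choices <- setNames(as.character(sectorAgrupado$L0CODIGO),
                    as.character(sectorAgrupado$L0NOMBRE))

selectInput("selectSectorAgrupado", "Sector agrupado",
            choices = choices, selected = NULL, multiple = FALSE)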
vector<-c("0.78953744969927742", "0.46557689748480685", "0.19740881059705201",
"9.7073839462985714E-2", "4.9051709747422199E-2", "0.1167420589551126",
"0.12679434401288708", "0.51370748568563795", "0.1925345466801483",
"0.48287163643195624", "4.211984449707315E-2", "blablablab",
"0.10553766233766231", "7.8187250996015922E-2", "0.20718689788053954",
"1.6450511945392491E-2", "0.51752961082910309", "0.10978571428571428",
"0.42610062893081763", "0.52208333333333334", "0.27569868995633184",
"7.7189939288811793E-2", "0.53982300884955747", "38.25% (blablabla) blablablablablablablablablablablabla","0.22324159021406728")
I have to transform all observations into numeric values. Observations consisting only of words should become NA. If there are words after an observation that starts with a number, keep only the number. If there is a percent sign after the number, drop it and keep only the number.
With readr's parse_number():
library(readr)
vec_num <- parse_number(vector)
Warning: 1 parsing failure.
row col expected actual
12 -- a number blablablab
vec_num
[1] 0.78953745 0.46557690 0.19740881 0.09707384 0.04905171 0.11674206
[7] 0.12679434 0.51370749 0.19253455 0.48287164 0.04211984 NA
[13] 0.10553766 0.07818725 0.20718690 0.01645051 0.51752961 0.10978571
[19] 0.42610063 0.52208333 0.27569869 0.07718994 0.53982301 38.25000000
[25] 0.22324159
attr(,"problems")
# A tibble: 1 × 4
row col expected actual
<int> <int> <chr> <chr>
1 12 NA a number blablablab
vec_num[24]
[1] 38.25
Removing all the trash
> as.numeric(gsub("[^0-9\\.\\E\\-]","",vector))
[1] 0.78953745 0.46557690 0.19740881 0.09707384 0.04905171 0.11674206
[7] 0.12679434 0.51370749 0.19253455 0.48287164 0.04211984 NA
[13] 0.10553766 0.07818725 0.20718690 0.01645051 0.51752961 0.10978571
[19] 0.42610063 0.52208333 0.27569869 0.07718994 0.53982301 38.25000000
[25] 0.22324159
You can use
as.numeric(stringr::str_extract(vector, '[\\d+.\\-\\E]+'))
Depends a bit on what kind of regexp exactly, but this will do:
\d+(.\d+(E[+-]\d+)?)
Check and refine at regex101.com
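Putting the refined pattern to work, a minimal sketch of the full extraction step (assuming, as stated above, that the percentage observation should keep only 38.25 and that pure-text observations become NA):

library(stringr)

# extract the first number-like token: digits, optional decimal part,
# optional scientific-notation exponent; observations without one become NA
vec_num <- as.numeric(str_extract(vector, "\\d+(\\.\\d+)?([Ee][+-]?\\d+)?"))
vec_num[12]   # NA: "blablablab" contains no number
vec_num[24]   # 38.25: the "%" and the trailing words are dropped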
So I am trying to write to a FASTA file. It does write, but for some reason when I open the file it starts with an empty > line, then >SOMESEQID and so on. Could someone help?
When opening the file it looks like so:
>
>NP_001997.5 fibroblast growth factor 2 isoform 34 kDa [Homo sapiens] MVGVGGGDVEDVTPRPGGCQISGRGARGCNGIPGAAAWEAALPRRRPRRHPSVNPRSRAAGSPRTRGRRT EERPSGSRLGDRGRGRALPGGRLGGRGRGRAPERVGGRGRGRGTAAPRAAPAARGSRPGPAGTMAAGSIT TLPALPEDGGSGAFPPGHFKDPKRLYCKNGGFFLRIHPDGRVDGVREKSDPHIKLQLQAEERGVVSIKGV CANRYLAMKEDGRLLASKCVTDECFFFERLESNNYNTYRSRKYTSWYVALKRTGQYKLGSKTGPGQKAIL FLPMSAKS
FGF2 is a vector of IDs, something like:
FGF2 = c("ID1","ID2", ...)
Here is my code:
files = entrez_fetch(id = FGF2, rettype = "fasta", db = "protein")
files
fastFile = write.fasta(sequences = files, names = names(files), file.out = "mySeqs.fasta")
You don't need write.fasta(); that function most likely expects a different kind of input. entrez_fetch() already returns FASTA-formatted text, so just use writeLines():
library(rentrez)
a = entrez_fetch(id=c("NP_001997.5","NP_001348594.1"),
rettype = "fasta", db = "protein")
writeLines(a,"test.fa")
readLines("test.fa")
[1] ">NP_001997.5 fibroblast growth factor 2 isoform 34 kDa [Homo sapiens]"
[2] "MVGVGGGDVEDVTPRPGGCQISGRGARGCNGIPGAAAWEAALPRRRPRRHPSVNPRSRAAGSPRTRGRRT"
[3] "EERPSGSRLGDRGRGRALPGGRLGGRGRGRAPERVGGRGRGRGTAAPRAAPAARGSRPGPAGTMAAGSIT"
[4] "TLPALPEDGGSGAFPPGHFKDPKRLYCKNGGFFLRIHPDGRVDGVREKSDPHIKLQLQAEERGVVSIKGV"
[5] "CANRYLAMKEDGRLLASKCVTDECFFFERLESNNYNTYRSRKYTSWYVALKRTGQYKLGSKTGPGQKAIL"
[6] "FLPMSAKS"
[7] ""
[8] ">NP_001348594.1 fibroblast growth factor 2 isoform 18 kDa [Homo sapiens]"
[9] "MAAGSITTLPALPEDGGSGAFPPGHFKDPKRLYCKNGGFFLRIHPDGRVDGVREKSDPHIKLQLQAEERG"
[10] "VVSIKGVCANRYLAMKEDGRLLASKCVTDECFFFERLESNNYNTYRSRKYTSWYVALKRTGQYKLGSKTG"
[11] "PGQKAILFLPMSAKS"
[12] ""
Or read in using:
library(Biostrings)
readAAStringSet("test.fa")
A AAStringSet instance of length 2
width seq names
[1] 288 MVGVGGGDVEDVTPRPGGCQISG...YKLGSKTGPGQKAILFLPMSAKS NP_001997.5 fibro...
[2] 155 MAAGSITTLPALPEDGGSGAFPP...YKLGSKTGPGQKAILFLPMSAKS NP_001348594.1 fi...
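Applied to the original FGF2 vector of IDs, the same idea would be (a sketch, not run against the real IDs):

library(rentrez)

# entrez_fetch() already returns one FASTA-formatted character string,
# so it can be written out verbatim, record headers included
files <- entrez_fetch(id = FGF2, rettype = "fasta", db = "protein")
writeLines(files, "mySeqs.fasta")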
I am currently running an STM (structural topic model) on a series of articles from the French newspaper Le Monde. The model works just great, but I have a problem with the pre-processing of the text.
I'm currently using the quanteda package and the tm package for things like removing words, removing numbers, etc.
There's only one thing, though, that doesn't seem to work.
As some of you might know, in French the masculine definite article -le- contracts to -l'- before vowels. I've tried to remove -l'- (and similar forms like -d'-) as words with removeWords:
lmt67 <- removeWords(lmt67, c( "l'","d'","qu'il", "n'", "a", "dans"))
but it only works with words that stand apart from the rest of the text, not with articles attached to a word, as in -l'arbre- (the tree).
Frustrated, I tried a simple gsub:
lmt67 <- gsub("l'","",lmt67)
but that doesn't seem to be working either.
Now, what's a better way to do this, ideally through a c(...) vector so that I can pass a whole series of expressions at once?
Just as context, lmt67 is a "large character" with 30,000 elements/articles, obtained by applying the texts() function to data imported from txt files.
Thanks to anyone willing to help.
I'll outline two ways to do this using quanteda and quanteda-related tools. First, let's define a slightly longer text, with more prefix cases for French. Notice the inclusion of the ’ apostrophe as well as the ASCII 39 simple apostrophe.
txt <- c(doc1 = "M. Trump, lors d’une réunion convoquée d’urgence à la Maison Blanche,
n’en a pas dit mot devant la presse. En réalité, il s’agit d’une
mesure essentiellement commerciale de ce pays qui l'importe.",
doc2 = "Réfugié à Bruxelles, l’indépendantiste catalan a désigné comme
successeur Jordi Sanchez, partisan de l’indépendance catalane,
actuellement en prison pour sédition.")
The first method will use pattern matches for the simple ASCII 39 apostrophe plus a bunch of Unicode variants, matched through the category "Pf" ("Punctuation: Final Quote"). However, quanteda does its best to normalize the quotes at the tokenization stage - see the "l'indépendance" in the second document, for instance.
The second way below uses a French part-of-speech tagger integrated with quanteda that allows similar
selection after recognizing and separating the prefixes, and then removing determinants (among other POS).
1. quanteda tokens
toks <- tokens(txt, remove_punct = TRUE)
# remove stopwords
toks <- tokens_remove(toks, stopwords("french"))
toks
# tokens from 2 documents.
# doc1 :
# [1] "M" "Trump" "lors" "d'une" "réunion"
# [6] "convoquée" "d'urgence" "à" "la" "Maison"
# [11] "Blanche" "n'en" "a" "pas" "dit"
# [16] "mot" "devant" "la" "presse" "En"
# [21] "réalité" "il" "s'agit" "d'une" "mesure"
# [26] "essentiellement" "commerciale" "de" "ce" "pays"
# [31] "qui" "l'importe"
#
# doc2 :
# [1] "Réfugié" "à" "Bruxelles" "l'indépendantiste"
# [5] "catalan" "a" "désigné" "comme"
# [9] "successeur" "Jordi" "Sanchez" "partisan"
# [13] "de" "l'indépendance" "catalane" "actuellement"
# [17] "en" "prison" "pour" "sédition"
Then, we apply a pattern that matches l', d', or s', using a regular expression replacement on the types (the unique tokens):
toks <- tokens_replace(
toks,
types(toks),
stringi::stri_replace_all_regex(types(toks), "[lsd]['\\p{Pf}]", "")
)
# tokens from 2 documents.
# doc1 :
# [1] "M" "Trump" "lors" "une" "réunion"
# [6] "convoquée" "urgence" "à" "la" "Maison"
# [11] "Blanche" "n'en" "a" "pas" "dit"
# [16] "mot" "devant" "la" "presse" "En"
# [21] "réalité" "il" "agit" "une" "mesure"
# [26] "essentiellement" "commerciale" "de" "ce" "pays"
# [31] "qui" "importe"
#
# doc2 :
# [1] "Réfugié" "à" "Bruxelles" "indépendantiste" "catalan"
# [6] "a" "désigné" "comme" "successeur" "Jordi"
# [11] "Sanchez" "partisan" "de" "indépendance" "catalane"
# [16] "actuellement" "En" "prison" "pour" "sédition"
From the resulting toks object you can form a dfm and then proceed to fit the STM.
2. using spacyr
This will involve more sophisticated part-of-speech tagging and then converting the tagged object into quanteda tokens. This requires first that you install Python, spacy, and the French language model. (See https://spacy.io/usage/models.)
library(spacyr)
spacy_initialize(model = "fr", python_executable = "/anaconda/bin/python")
# successfully initialized (spaCy Version: 2.0.1, language model: fr)
toks <- spacy_parse(txt, lemma = FALSE) %>%
as.tokens(include_pos = "pos")
toks
# tokens from 2 documents.
# doc1 :
# [1] "M./NOUN" "Trump/PROPN" ",/PUNCT"
# [4] "lors/ADV" "d’/PUNCT" "une/DET"
# [7] "réunion/NOUN" "convoquée/VERB" "d’/ADP"
# [10] "urgence/NOUN" "à/ADP" "la/DET"
# [13] "Maison/PROPN" "Blanche/PROPN" ",/PUNCT"
# [16] "\n /SPACE" "n’/VERB" "en/PRON"
# [19] "a/AUX" "pas/ADV" "dit/VERB"
# [22] "mot/ADV" "devant/ADP" "la/DET"
# [25] "presse/NOUN" "./PUNCT" "En/ADP"
# [28] "réalité/NOUN" ",/PUNCT" "il/PRON"
# [31] "s’/AUX" "agit/VERB" "d’/ADP"
# [34] "une/DET" "\n /SPACE" "mesure/NOUN"
# [37] "essentiellement/ADV" "commerciale/ADJ" "de/ADP"
# [40] "ce/DET" "pays/NOUN" "qui/PRON"
# [43] "l'/DET" "importe/NOUN" "./PUNCT"
#
# doc2 :
# [1] "Réfugié/VERB" "à/ADP" "Bruxelles/PROPN"
# [4] ",/PUNCT" "l’/PRON" "indépendantiste/ADJ"
# [7] "catalan/VERB" "a/AUX" "désigné/VERB"
# [10] "comme/ADP" "\n /SPACE" "successeur/NOUN"
# [13] "Jordi/PROPN" "Sanchez/PROPN" ",/PUNCT"
# [16] "partisan/VERB" "de/ADP" "l’/DET"
# [19] "indépendance/ADJ" "catalane/ADJ" ",/PUNCT"
# [22] "\n /SPACE" "actuellement/ADV" "en/ADP"
# [25] "prison/NOUN" "pour/ADP" "sédition/NOUN"
# [28] "./PUNCT"
Then we can use the default glob-matching to remove the parts of speech in which we are probably not interested, including the newline:
toks <- tokens_remove(toks, c("*/DET", "*/PUNCT", "\n*", "*/ADP", "*/AUX", "*/PRON"))
toks
# doc1 :
# [1] "M./NOUN" "Trump/PROPN" "lors/ADV" "réunion/NOUN" "convoquée/VERB"
# [6] "urgence/NOUN" "Maison/PROPN" "Blanche/PROPN" "n’/VERB" "pas/ADV"
# [11] "dit/VERB" "mot/ADV" "presse/NOUN" "réalité/NOUN" "agit/VERB"
# [16] "mesure/NOUN" "essentiellement/ADV" "commerciale/ADJ" "pays/NOUN" "importe/NOUN"
#
# doc2 :
# [1] "Réfugié/VERB" "Bruxelles/PROPN" "indépendantiste/ADJ" "catalan/VERB" "désigné/VERB"
# [6] "successeur/NOUN" "Jordi/PROPN" "Sanchez/PROPN" "partisan/VERB" "indépendance/ADJ"
# [11] "catalane/ADJ" "actuellement/ADV" "prison/NOUN" "sédition/NOUN"
Then we can remove the tags, which you probably don't want in your STM - but you could leave them if you prefer.
## remove the tags
toks <- tokens_replace(toks, types(toks),
stringi::stri_replace_all_regex(types(toks), "/[A-Z]+$", ""))
toks
# tokens from 2 documents.
# doc1 :
# [1] "M." "Trump" "lors" "réunion" "convoquée"
# [6] "urgence" "Maison" "Blanche" "n’" "pas"
# [11] "dit" "mot" "presse" "réalité" "agit"
# [16] "mesure" "essentiellement" "commerciale" "pays" "importe"
#
# doc2 :
# [1] "Réfugié" "Bruxelles" "indépendantiste" "catalan" "désigné"
# [6] "successeur" "Jordi" "Sanchez" "partisan" "indépendance"
# [11] "catalane" "actuellement" "prison" "sédition"
From there, you can use the toks object to form your dfm and fit the model.
Here's a scrape from the current page at Le Monde's website. Notice that the apostrophe they use is not the same character as the single-quote here "'":
text <- "Réfugié à Bruxelles, l’indépendantiste catalan a désigné comme successeur Jordi Sanchez, partisan de l’indépendance catalane, actuellement en prison pour sédition."
It has a little angle and is not actually "straight down" when I view it. You need to copy that character into your gsub command:
sub("l’", "", text)
[#1] "Réfugié à Bruxelles, indépendantiste catalan a désigné comme successeur Jordi Sanchez, partisan de l’indépendance catalane, actuellement en prison pour sédition."
From this link, I'm trying to download multiple pdf files, but I can't get the exact URL for each file.
To access one of the pdf files, you can click on "Región de Arica y Parinacota" and then on "Arica". You can then check that the URL is http://cdn.servel.cl/padronesauditados/padron/A1501001.pdf; if you click on the next link, "Camarones", you'll notice that the URL is http://cdn.servel.cl/padronesauditados/padron/A1501002.pdf
I checked more URLs, and they all have a similar pattern:
"A" + "two digit number from 1 to 15" + "two digit number of unknown range" + "three digit number of unknown range"
Even though the URL examples I showed seem to suggest that the files are named sequentially, this is not always the case.
To download all the files despite not knowing the exact URLs, I did the following:
1) I made a for loop to generate all possible file names based on the pattern described above, i.e. A0101001.pdf, A0101002.pdf, ..., A1599999.pdf (a more compact alternative is sketched after the loop below):
library(downloader)
library(stringr)
reg.ind <- 1:15
pro.ind <- 1:99
com.ind <- 1:999
reg <- str_pad(reg.ind, width=2, side="left", pad="0")
prov <- str_pad(pro.ind, width=2, side="left", pad="0")
com <- str_pad(com.ind, width=3, side="left", pad="0")
file <- c()
for(i in 1:length(reg)){
reg.i <- reg[i]
for(j in 1:length(prov)){
prov.j <- prov[j]
for(k in 1:length(com)){
com.k <- com[k]
file <- c(file, (paste0("A", reg.i, prov.j, com.k)))
}
}
}
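As an aside (not part of the original question), the same vector of candidate file names can be generated without the nested loops, e.g.:

grid <- expand.grid(reg = 1:15, prov = 1:99, com = 1:999)
file <- sprintf("A%02d%02d%03d", grid$reg, grid$prov, grid$com)
length(file)   # 1483515, the same candidates as the loops above (order differs)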
2) Then I used another for loop to download a file every time I hit a correct URL. I use tryCatch to ignore the cases where the URL is incorrect (most of the time):
for(i in 1:length(file)){
tryCatch({
url <- paste0("http://cdn.servel.cl/padronesauditados/padron/", file[i],
".pdf")
# change destfile accordingly if you decide to run the code
download.file(url, destfile = paste0("./datos/comunas/", file[i], ".pdf"),
mode = "wb")
}, error = function(e){})
}
PROBLEM: In total I know there are no more than 400 PDF files, as each one corresponds to a commune in Chile, but I wrote a vector with 1,483,515 possible file names, so my code, even though it works, takes much longer than it would if I could obtain the file names beforehand.
Does anyone know how to work around this problem?
You can re-create the "browser developer tools" experience in R with splashr:
library(splashr) # devtools::install_github("hrbrmstr/splashr")
library(tidyverse)
sp <- start_splash()
Sys.sleep(3) # give the docker container time to work
res <- render_har(url = "http://cdn.servel.cl/padronesauditados/padron.html",
response_body=TRUE)
map_chr(har_entries(res), c("request", "url"))
## [1] "http://cdn.servel.cl/padronesauditados/padron.html"
## [2] "http://cdn.servel.cl/padronesauditados/stylesheets/navbar-cleaned.min.css"
## [3] "http://cdn.servel.cl/padronesauditados/stylesheets/virtue.min.css"
## [4] "http://cdn.servel.cl/padronesauditados/stylesheets/virtue2.min.css"
## [5] "http://cdn.servel.cl/padronesauditados/stylesheets/custom.min.css"
## [6] "https://fonts.googleapis.com/css?family=Lato%3A400%2C700%7CRoboto%3A100%2C300%2C400%2C500%2C700%2C900%2C100italic%2C300italic%2C400italic%2C500italic%2C700italic%2C900italic&ver=1458748651"
## [7] "http://cdn.servel.cl/padronesauditados/jquery-ui-1.12.1.custom/jquery-ui.css"
## [8] "http://cdn.servel.cl/padronesauditados/jquery-ui-1.12.1.custom/external/jquery/jquery.js"
## [9] "http://cdn.servel.cl/padronesauditados/jquery-ui-1.12.1.custom/jquery-ui.js"
## [10] "http://cdn.servel.cl/padronesauditados/images/logo-txt-retina.png"
## [11] "http://cdn.servel.cl/assets/img/nav_arrows.png"
## [12] "http://cdn.servel.cl/padronesauditados/images/loader.gif"
## [13] "http://cdn.servel.cl/padronesauditados/archivos.xml"
## [14] "http://cdn.servel.cl/padronesauditados/jquery-ui-1.12.1.custom/images/ui-icons_444444_256x240.png"
## [15] "https://fonts.gstatic.com/s/roboto/v16/zN7GBFwfMP4uA6AR0HCoLQ.ttf"
## [16] "https://fonts.gstatic.com/s/roboto/v16/RxZJdnzeo3R5zSexge8UUaCWcynf_cDxXwCLxiixG1c.ttf"
## [17] "https://fonts.gstatic.com/s/roboto/v16/Hgo13k-tfSpn0qi1SFdUfaCWcynf_cDxXwCLxiixG1c.ttf"
## [18] "https://fonts.gstatic.com/s/roboto/v16/Jzo62I39jc0gQRrbndN6nfesZW2xOQ-xsNqO47m55DA.ttf"
## [19] "https://fonts.gstatic.com/s/roboto/v16/d-6IYplOFocCacKzxwXSOKCWcynf_cDxXwCLxiixG1c.ttf"
## [20] "https://fonts.gstatic.com/s/roboto/v16/mnpfi9pxYH-Go5UiibESIqCWcynf_cDxXwCLxiixG1c.ttf"
## [21] "http://cdn.servel.cl/padronesauditados/stylesheets/fonts/virtue_icons.woff"
## [22] "https://fonts.gstatic.com/s/lato/v13/v0SdcGFAl2aezM9Vq_aFTQ.ttf"
## [23] "https://fonts.gstatic.com/s/lato/v13/DvlFBScY1r-FMtZSYIYoYw.ttf"
Spotting the XML entry is easy in ^^, so we can focus on it:
har_entries(res)[[13]]$response$content$text %>%
openssl::base64_decode() %>%
xml2::read_xml() %>%
xml2::xml_find_all(".//Region") %>%
map_df(~{
data_frame(
id = xml2::xml_find_all(.x, ".//id") %>% xml2::xml_text(),
nombre = xml2::xml_find_all(.x, ".//nombre") %>% xml2::xml_text(),
nomcomuna = xml2::xml_find_all(.x, ".//comunas/comuna/nomcomuna") %>% xml2::xml_text(),
id_archivo = xml2::xml_find_all(.x, ".//comunas/comuna/idArchivo") %>% xml2::xml_text(),
archcomuna = xml2::xml_find_all(.x, ".//comunas/comuna/archcomuna") %>% xml2::xml_text()
)
})
## # A tibble: 346 x 5
## id nombre nomcomuna id_archivo archcomuna
## <chr> <chr> <chr> <chr> <chr>
## 1 1 Región de Arica y Parinacota Arica 1 A1501001.pdf
## 2 1 Región de Arica y Parinacota Camarones 2 A1501002.pdf
## 3 1 Región de Arica y Parinacota General Lagos 3 A1502002.pdf
## 4 1 Región de Arica y Parinacota Putre 4 A1502001.pdf
## 5 2 Región de Tarapacá Alto Hospicio 5 A0103002.pdf
## 6 2 Región de Tarapacá Camiña 6 A0152002.pdf
## 7 2 Región de Tarapacá Colchane 7 A0152003.pdf
## 8 2 Región de Tarapacá Huara 8 A0152001.pdf
## 9 2 Región de Tarapacá Iquique 9 A0103001.pdf
## 10 2 Región de Tarapacá Pica 10 A0152004.pdf
## # ... with 336 more rows
stop_splash(sp) # don't forget to clean up!
You can then programmatically download all the PDFs by prepending the URL prefix http://cdn.servel.cl/padronesauditados/padron/ to the archcomuna values.
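For example, a sketch of the final step, assuming the tibble above is saved as df (it is not assigned a name in the answer) and reusing the asker's destination folder:

urls <- paste0("http://cdn.servel.cl/padronesauditados/padron/", df$archcomuna)

# download each of the ~346 known files instead of probing 1.5M candidate names
purrr::walk2(urls, file.path("./datos/comunas", df$archcomuna),
             ~ download.file(.x, destfile = .y, mode = "wb"))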
I am looking at vote data stored as a nested list. I am trying to extract multiple variables from each element of the list (example below).
So for each "vote" element I am trying to get the uid and the list of individuals that voted for or against ("pours" and "contres") the law.
I tried to simplify the original data (which can be found here).
This is the simplified list I came up with:
scrutin1_detail<-list(uid="VTANR5L14V1",organref="P0644420")
scrutin1_vote1_for<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin1_vote2_for<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin1_vote3_for<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin1_vote_for<-list(scrutin1_vote1_for,scrutin1_vote2_for,scrutin1_vote3_for)
scrutin1_vote1_against<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin1_vote2_against<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin1_vote3_against<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin1_vote_against<-list(scrutin1_vote1_against,scrutin1_vote2_against,scrutin1_vote3_against)
votant1<-list(pours=scrutin1_vote_for,contres=scrutin1_vote_against)
vote1<-list(decompte_nominatif=votant1)
ventilationVotes1<-list(vote=vote1)
scrutin1<-list(scrutin1_detail,list(ventilationVotes=ventilationVotes1))
# Scrutin 2
scrutin2_detail<-list(uid="VTANR5L14V5",organref="P0644423")
scrutin2_vote1_for<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin2_vote2_for<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin2_vote3_for<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin2_vote_for<-list(scrutin2_vote1_for,scrutin2_vote2_for,scrutin2_vote3_for)
scrutin2_vote1_against<-list(acteurref="PA1816",mandatRef="PM645051")
scrutin2_vote2_against<-list(acteurref="PA1817",mandatRef="PM645052")
scrutin2_vote3_against<-list(acteurref="PA1818",mandatRef="PM645053")
scrutin2_vote_against<-list(scrutin2_vote1_against,scrutin2_vote2_against,scrutin2_vote3_against)
scrutin2_votant1<-list(pours=scrutin2_vote_for,contres=scrutin2_vote_against)
scrutin2_vote1<-list(decompte_nominatif=scrutin2_votant1)
scrutin2_ventilationVotes1<-list(vote=scrutin2_vote1)
scrutin2<-list(scrutin2_detail,list(ventilationVotes=scrutin2_ventilationVotes1))
scrutins<-list(scrutins=list(scrutin=list(scrutin1,scrutin2)))
In the end (though I am really interested in understanding how to do it, since I run into this problem quite often), I am looking to build a data frame with these columns:
uid
for/against (whether the entry was in the "pours" (for) or "contres" (against) list)
acteurref
mandatref
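For the simplified example above, a minimal purrr-based sketch that builds such a data frame (assuming the structure constructed in the code above, with the uid in the first element of each scrutin and the "pours"/"contres" lists under decompte_nominatif):

library(purrr)
library(tibble)

votes_df <- map_dfr(scrutins$scrutins$scrutin, function(s) {
  uid      <- s[[1]]$uid
  decompte <- s[[2]]$ventilationVotes$vote$decompte_nominatif
  map_dfr(c("pours", "contres"), function(side) {
    map_dfr(decompte[[side]], function(v) {
      tibble(uid       = uid,
             position  = side,            # "pours" = for, "contres" = against
             acteurref = v$acteurref,
             mandatref = v$mandatRef)
    })
  })
})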
Sadly, I don't speak (or read) French, so I am not able to make many correct guesses as to the meaning of the item names in the object constructed using alistaire's suggestion:
library(jsonlite)
scrutin1_detail <- fromJSON("~/Downloads/Scrutins_XIV.json")
> length(scrutin1_detail[[1]])
[1] 1
> length(scrutin1_detail[[1]][[1]])
[1] 18
> names(scrutin1_detail[[1]][[1]])
[1] "#xmlns:xsi" "uid"
[3] "numero" "organeRef"
[5] "legislature" "sessionRef"
[7] "seanceRef" "dateScrutin"
[9] "quantiemeJourSeance" "typeVote"
[11] "sort" "titre"
[13] "demandeur" "objet"
[15] "modePublicationDesVotes" "syntheseVote"
[17] "ventilationVotes" "miseAuPoint"
> str(scrutin1_detail[[1]][[1]]$uid)
chr [1:1219] "VTANR5L14V1" "VTANR5L14V2" "VTANR5L14V3" ...
> table( scrutin1_detail[[1]][[1]]$organeRef)
PO644420
1219
> table( scrutin1_detail[[1]][[1]]$sessionRef)
SCR5A2012E1 SCR5A2012E2 SCR5A2013E1 SCR5A2013E3 SCR5A2013O1 SCR5A2014E1
15 5 42 4 529 50
SCR5A2014E2 SCR5A2014O1 SCR5A2015E1 SCR5A2015E2 SCR5A2015O1 SCR5A2016O1
7 253 18 5 236 55
Maybe you should help us Anglophones make sense of this. It is very beneficial to provide context rather than just code.