Dynamic variable names in rvest - r

I am trying to set values with rvest for dynamic field names, so I retrieve the field name with the code below. f.account is a character vector of length one containing a dynamic field name like OCmaCMFgHij.
library(rvest)
url <- "https://sv01.net-housting.de/user/index.php"
sp.session <- html_session(url)
f0 <- html_form(sp.session)
f.account <- f0[[2]][['fields']][[1]]$name
Note: This is a somewhat arbitrary example, since f.account is not really dynamic in this case and will always have the value username.
Now I want to set a fixed value for that field in the form, with something like this:
f1 <- set_values(f0, UQ(rlang::sym("f.account")) = "MyAccountName" ))
I tried to use the .dot statement or setNames as described here, but I have failed so far.
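One way to pass an argument whose name is only known at run time is to build a named list and hand it to do.call. The sketch below is only an illustration of that pattern: set_fields is a hypothetical stand-in for set_values(form, ...), and the form is a made-up list, so it runs without rvest or a network connection.

```r
# Hypothetical stand-in for set_values(form, ...): it overwrites the named
# entries of a plain list, nothing more.
set_fields <- function(form, ...) {
  new_values <- list(...)
  for (nm in names(new_values)) {
    form[[nm]] <- new_values[[nm]]
  }
  form
}

form <- list(username = "", password = "")  # made-up form fields
f.account <- "username"                      # field name known only at run time

# Build the named argument first, then splice it into the call.
args <- setNames(list("MyAccountName"), f.account)
f1 <- do.call(set_fields, c(list(form), args))
```

With rvest the same pattern would presumably read do.call(set_values, c(list(f0[[2]]), args)), since f0 is a list of forms; I have not run that against the site in the question.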

Related

How to mutate a new column with the extracted numeric values from another column in the same table in a remote PostgreSQL Database using R

I am new to R and I need to find a way to do the following: I have access to a huge remote PostgreSQL database. In this database,
I have a table called occurrence and this table has a column called uri.
It contains a list of URIs (links to web pages). Each entry in this column has this format: abc://abc- def-ghi-abc/12345.
The only thing that changes in the column is the number 12345; the text part of the URI stays the same throughout the column.
My question is: how can I create (mutate) a new column in the same table, named uri_id, that contains ONLY the numeric part extracted from the uri column?
Example:
|id|sub_id|uri|
|---|---|---|
|3654|5741|abc://abc- def-ghi-abc/12345|
|9784|5742|abc://abc- def-ghi-abc/45789|
|9751|5743|abc://abc- def-ghi-abc/97856|
|9794|5746|abc://abc- def-ghi-abc/69785|
Results should look like this:
|id|sub_id|uri|uri_id|
|---|---|---|---|
|3654|5741|abc://abc-de-fgh.abc/12345|12345|
|9784|5742|abc://abc-de-fgh.abc/45789|45789|
|9751|5743|abc://abc-de-fgh.abc/97856|97856|
|9794|5746|abc://abc-de-fgh.abc/69785|69785|
First I defined the table that contains this column:
library(tidyverse)
library(dbplyr)
occurrence <- tbl(db_name, in_schema("metadata", "occurrence"))
print(occurrence)
returns the table normally. Then I tried this
str_replace(occurrence$uri, "abc://abc- def-ghi-abc/", "")
It returned character(0). Printing it or exporting it gives NULL and an empty table. I also tried this:
uri_id <- mutate(uri_id = as.numeric(str_extract(occurrence$uri, "[0-9]+")))
it returned this error:
Error in UseMethod("mutate") : no applicable method for 'mutate' applied to an object of class "c('double', 'numeric')"
I tried to just substitute text elements like this:
uri_id <- mutate(uri_id = as.numeric(gsub(".*?([0-9]+).*", "\\1", occurrence$uri)))
print(uri_id)
it returned the same error!
I tried with extract and extract_:
occurrence$uri %>% extract_(occurrence$uri, "abc://abc- def-ghi-abc/")
This returned an error:
Error in UseMethod("extract_") : no applicable method for 'extract_' applied to an object of class "NULL"
I would really appreciate your help in choosing the right way to achieve this task.
@AR4891, welcome to Stack Overflow! I think you are explaining the steps well, so I'm not sure about all the downvotes. But it will help to show us a few more things, especially a minimal, reproducible example.
I think I'm right in thinking you are relying on tidyverse and dbplyr, so I added those in. I think you have to re-read the function descriptions and examples once more. Let me give you some examples:
Let's first create a sample dataset. I'm ignoring the dbplyr part and assuming just a tibble structure.
library(tidyverse)
occurrence <- tibble(
  id = c(3654, 9784, 9751, 9794),
  sub_id = c(5741, 5742, 5743, 5746),
  uri = c(
    "abc://abc- def-ghi-abc/12345",
    "abc://abc- def-ghi-abc/45789",
    "abc://abc- def-ghi-abc/97856",
    "abc://abc- def-ghi-abc/69785"
  )
)
str_replace(occurrence$uri, "abc://abc- def-ghi-abc/", "") ---> you don't seem to be storing the result anywhere.
uri_id <- mutate(uri_id = as.numeric(str_extract(occurrence$uri, "[0-9]+"))) ---> mutate needs the data frame as its first argument, and once inside mutate you don't need to reference the columns with occurrence$ either.
The extract_ call above is also not the correct way to use tidyr::extract.
If you want to take the gsub or str_replace approach,
occurrence %>%
  mutate(uri_id = gsub("abc://abc- def-ghi-abc/", "", uri))
If you want to take the str_extract with regex approach,
occurrence %>%
  mutate(uri_id = as.numeric(str_extract(uri, "[0-9]+")))
Let's see if either of these works for you.
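For comparison, here is a minimal base R sketch of the same extraction on the sample data, without tidyverse. It assumes the data is already local. On a remote dbplyr table, occurrence$uri is most likely NULL, because the columns are not pulled into R until the query is collected, which would explain the character(0) result in the question.

```r
# Sample data copied from the answer above, as a plain data frame.
occurrence <- data.frame(
  id = c(3654, 9784, 9751, 9794),
  sub_id = c(5741, 5742, 5743, 5746),
  uri = c(
    "abc://abc- def-ghi-abc/12345",
    "abc://abc- def-ghi-abc/45789",
    "abc://abc- def-ghi-abc/97856",
    "abc://abc- def-ghi-abc/69785"
  ),
  stringsAsFactors = FALSE
)

# Approach 1: drop everything up to and including the last slash.
occurrence$uri_id <- as.numeric(sub(".*/", "", occurrence$uri))

# Approach 2: pull out the run of digits at the end of the string.
trailing_digits <- regmatches(occurrence$uri, regexpr("[0-9]+$", occurrence$uri))
uri_id_alt <- as.numeric(trailing_digits)
```

Both approaches should agree as long as every uri ends in digits after the last slash, which is an assumption taken from the example rows.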

Dynamic String Manipulation in Knime

I want to dynamically change my URL at runtime. I did the string manipulation in R, and it worked with this script:
x <- 'https://news.google.com/search?q='
var <- 'NREGA'
z <- '&hl=en-IN&gl=IN&ceid=IN%3Aen'
url <- paste0(x, var,z, collapse = '')
url
Now I want to change this variable var dynamically and the value for var needs to be retrieved from a knime node which can then be used in url. In my case it's the table creator node in knime to get this value but I'm not able to assign it to var.
Kindly suggest any knime node by which the value for 'var' can be obtained and then be used in 'R snippet' node in knime, and can be used in manipulating the url.
The first line of the code in your image is
knime.in <- knime.flow.in[["NREGA"]]
The expression knime.flow.in[["NREGA"]] in an R script node gets you the value of the flow variable named NREGA.
You probably don't want to assign that value to knime.in as that's the R dataframe containing the table from the input port of the KNIME node. Either you want to use that table - in which case don't overwrite it - or you don't, in which case just use the R Source (Table) node which has no input port.
Later on in your code you have the line
var <- knime.flow.in[["NREGA"]]
which should assign the flow variable value to the R variable var. If that isn't working the way you want, please edit your question to better describe what the problem is.
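Put together, the URL construction could look roughly like the sketch below. The knime.flow.in list is faked here so the code runs outside KNIME; inside an R node KNIME provides it. build_url is a hypothetical helper name, and the URLencode call is an addition that the question's script does not have, to keep search terms with spaces or special characters valid.

```r
# Stand-in for the object KNIME provides inside an R node (assumption:
# a flow variable named NREGA holding the search term).
knime.flow.in <- list(NREGA = "NREGA")

# Hypothetical helper: wraps the paste0 call from the question.
build_url <- function(term) {
  prefix <- "https://news.google.com/search?q="
  suffix <- "&hl=en-IN&gl=IN&ceid=IN%3Aen"
  # Percent-encode the term so spaces and reserved characters are safe.
  paste0(prefix, utils::URLencode(term, reserved = TRUE), suffix)
}

url <- build_url(knime.flow.in[["NREGA"]])
```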

R Loop error using character

I have the below function which inserts a row into a table (new_scores) based upon the attribute that I feed into it (where the attribute represents a table that I select things from):
buildNewScore <- function(x) {
  something <- bind_rows(new_scores, x %>% select(ATT, ADJLOGSCORE))
  return(something)
}
Which works fine when I define x.
But when I try to create a for loop that feeds the rest of my attributes into the function it falls over as I'm feeding in a character.
attlist <- c('Z','Y','X','W','V','U','T','RT','RO')
record_count <- length(attlist)
for (x in c(1:record_count)) {
  buildNewScore(attlist[x])
}
I've tried to convert the attribute into other classes but I can't get the loop to use anything I change it to (name, data.frame etc.).
Anyone have any ideas as to where I'm going wrong - is my attlist vector in the wrong format?
Thanks,
Spikelete.
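A possible direction, sketched under assumptions: attlist holds the names of the tables as strings, so each name has to be turned back into the object with get() before it is passed on, and the loop also has to store what the function returns. The two small tables below are made up, and base R rbind is swapped in for bind_rows/select so the example has no package dependencies.

```r
# Made-up stand-ins for two of the attribute tables named in attlist.
Z <- data.frame(ATT = "Z", ADJLOGSCORE = 0.5, OTHER = 1, stringsAsFactors = FALSE)
Y <- data.frame(ATT = "Y", ADJLOGSCORE = 1.5, OTHER = 2, stringsAsFactors = FALSE)

new_scores <- data.frame(
  ATT = character(0),
  ADJLOGSCORE = numeric(0),
  stringsAsFactors = FALSE
)

# Takes the running result explicitly instead of reading a global.
buildNewScore <- function(scores, x) {
  rbind(scores, x[, c("ATT", "ADJLOGSCORE")])
}

attlist <- c("Z", "Y")
for (nm in attlist) {
  # get() looks up the data frame whose name is stored in nm.
  new_scores <- buildNewScore(new_scores, get(nm))
}
```

Keeping the tables in a named list and indexing it with tables[[nm]] would avoid get() altogether, if the tables can be created that way.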

Referencing a function parameter in R

I'm working on a function and need to know how to reference the incoming parameters.
For example, in python or lots of other languages, you can reference the input parameters something like this:
sys.argv[1:].
How can I reference the name of a parameter in R?
The specific problem I'm trying to solve is that I want to capture the string value of the incoming parameter, so I can paste it as a concatenation with a list of column_names I want to iterate through.
Here's the head of the function call, just so you can see the incoming parameter:
function(df_in)
So here's an example of the code I am writing and I want the string value of the dataframe_in, not the object that it references.
col_name <-paste(df_in,varnames[i],sep="$")
if df_in contained "my_df" and the current column_name is my_col, I'm trying to have col_name in the example above set to my_df$my_col.
I was thinking of using the get() function but I'm not quite sure how to apply it in this situation.
Thanks
Try something along these lines:
fn1 <- function(df_in) {
  in_nam <- deparse(substitute(df_in))
  col_names <- paste(in_nam, names(df_in), sep = "$")
  cat(col_names)
}
dfrm <- data.frame(a = 1:10, b = letters[1:10])
fn1(dfrm)
# dfrm$a dfrm$b
You didn't say what varnames was supposed to be so I'm guessing you want the column names from the object. BTW, don't expect to be able to reference the column values with those character values. They are no longer language objects.
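To make that last point concrete, here is a small sketch of one way to keep the "df$col" strings as labels while still reaching the column values, by indexing with [[ inside the loop. describe_columns is a hypothetical name and the class lookup is just a placeholder for whatever per-column work is needed.

```r
describe_columns <- function(df_in) {
  # Capture the name of the object the caller passed in.
  in_nam <- deparse(substitute(df_in))
  labels <- paste(in_nam, names(df_in), sep = "$")

  # Reach the actual column values with [[ rather than the label string.
  classes <- vapply(
    names(df_in),
    function(col) class(df_in[[col]])[1],
    character(1)
  )
  setNames(classes, labels)
}

dfrm <- data.frame(a = 1:10, b = letters[1:10], stringsAsFactors = FALSE)
result <- describe_columns(dfrm)
```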

xpath node determination

I'm all new to scraping and I'm trying to understand xpath using R. My objective is to create a vector of people from this website. I'm able to do it using:
r<-htmlTreeParse(e) ## e is after getURL
g.k<-(r[[3]][[1]][[2]][[3]][[2]][[2]][[2]][[1]][[4]])
l<-g.k[names(g.k)=="text"]
u<-ldply(l,function(x) {
  w<-xmlValue(x)
  return(w)
})
However, this is cumbersome and I'd prefer to use xpath. How do I go about referencing the path detailed above? Is there a function for this, or can I submit my path somehow referenced as above?
I've come to
xpathApply( htmlTreeParse(e, useInt=T), "//body//text//div//div//p//text()", function(k) xmlValue(k))->kk
But this leaves me a lot of cleaning up to do and I assume it can be done better.
Regards,
//M
EDIT: Sorry for the lack of clarity, but I'm all new to this and rather confused. The XML document is unfortunately too large to be pasted. I guess my question is whether there is some easy way to find the names of these nodes and the structure of the document, besides using view source? I've come a little closer to what I'd like:
getNodeSet(htmlTreeParse(e, useInt=T), "//p")[[5]]->e2
gives me the list of what I want. However still in xml with br tags. I thought running
xpathApply(e2, "//text()", function(k) xmlValue(k))->kk
would provide a list that later could be unlisted. however it provides a list with more garbage than e2 displays.
Is there a way to do this directly:
xpathApply(htmlTreeParse(e, useInt=T), "//p[5]//text()", function(k) xmlValue(k))->kk
Link to the web page: I'm trying to get the names, and only the names, from the page.
getURL("http://legeforeningen.no/id/1712")
I ended up with
xml = htmlTreeParse("http://legeforeningen.no/id/1712", useInternalNodes=TRUE)
(no need for RCurl) and then
sub(",.*$", "", unlist(xpathApply(xml, "//p[4]/text()", xmlValue)))
(subset in xpath) which leaves a final line that is not a name. One could do the text processing in XML, too, but then one would iterate at the R level.
n <- xpathApply(xml, "count(//p[4]/text())") - 1L
sapply(seq_len(n), function(i) {
  xpathApply(xml, sprintf('substring-before(//p[4]/text()[%d], ",")', i))
})
Unfortunately, this does not pick up names that do not contain a comma.
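A side note on the positional predicates used in this thread, since the index is easy to get wrong: in XPath, //p[3] matches any <p> that is the third <p> child of its own parent, while (//p)[3] is the third <p> in the whole document. The sketch below shows the difference on a made-up HTML string and assumes the XML package is installed; it does not touch the page from the question.

```r
library(XML)

# Made-up document: two <p> in the first <div>, one in the second.
html <- "<html><body><div><p>a</p><p>b</p></div><div><p>c</p></div></body></html>"
doc <- htmlParse(html, asText = TRUE)

# Third <p> counted across the whole document.
third_in_doc <- xpathSApply(doc, "(//p)[3]", xmlValue)

# <p> elements that are the third <p> child of their parent; no parent
# here has three <p> children, so nothing should match.
third_child <- xpathSApply(doc, "//p[3]", xmlValue)
```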
Use a mixture of xpath and string manipulation.
#Retrieve and parse the page.
library(XML)
library(RCurl)
page <- getURL("http://legeforeningen.no/id/1712")
parsed <- htmlTreeParse(page, useInternalNodes = TRUE)
Inspecting the parsed variable which contains the page's source tells us that instead of sensibly using a list tag (like <ul>), the author just put a paragraph (<p>) of text split with line breaks (<br />). We use xpath to retrieve the <p> elements.
#Inspection tells us we want the fifth paragraph.
name_nodes <- xpathApply(parsed, "//p")[[5]]
Now we convert to character, split on the <br> tags and remove empty lines.
all_names <- as(name_nodes, "character")
all_names <- gsub("</?p>", "", all_names)
all_names <- strsplit(all_names, "<br />")[[1]]
all_names <- all_names[nzchar(all_names)]
all_names
Optionally, separate the names of people and their locations.
strsplit(all_names, ", ")
Or more prettily with stringr.
library(stringr)
str_split_fixed(all_names, ", ", 2)
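The string-handling half of this can be tried without fetching the page. The sketch below runs the same split-and-clean steps on a made-up paragraph (the names and places are invented, not taken from the site) using base R only. Unlike the substring-before approach, sub() leaves entries without a comma untouched, so those names are kept.

```r
# Made-up paragraph in the same shape: entries separated by <br /> tags.
p_html <- "<p>Anna Hansen, Oslo<br />Per Olsen, Bergen<br /><br />Kari Nilsen</p>"

# Strip the paragraph tags, split on line breaks, drop empty pieces.
all_names <- gsub("</?p>", "", p_html)
all_names <- strsplit(all_names, "<br ?/?>")[[1]]
all_names <- all_names[nzchar(all_names)]

# Keep only the part before the first comma, if there is one.
people <- sub(",.*$", "", all_names)
```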
