Umlaute not displayed correctly with osmdata - r

I am having troubles with the German Umlaute (ä, ü, ö) and other signs when using osmdata in R.
I can successfully get the data via query (notice the Ü in the bounding box in the first line, it is working fine):
#install.packages("osmdata")
#library(osmdata)
bw <- osmdata::getbb("Baden-Württemberg") %>%
osmdata::opq(timeout = 25*100) %>%
osmdata::add_osm_feature(
key = "admin_level",
value = "4"
) %>%
osmdata::osmdata_sf()
Having a look at the data, one can see that the umlaute aren't displayed correctly.
View(bw$osm_multipolygons)
Consequently, searching by "name" doesn't work anymore:
dplyr::filter(bw$osm_multipolygons, name == "Tirol")
dplyr::filter(bw$osm_multipolygons, name == "Baden-Württemberg")
Tirol is working (no umlaut), Baden-Württemberg (with the ü) isn't.
I'm running R on a German Windows 10, R is running in English.
Best regards

The current working solution for me is described here:
Importing Open Street Map data gives wrong encoding
It is really annoying because
a) I am using a function which I don't understand
b) it seems to work for other people (#mrgrund)
on the other side, it seems I am not the only person with the problem.
Best regards

Related

i used FROM_geojson() to get data but letters are broken

I used FROM_geojson() to get geojson data from url.
The context contains korean language and i set default encoding in Rstudio global option as "UTF-18".
But context are broken like "湲몄뙂怨듭썝" which should be presented like "길쌈공원".
Do you have any idea to solve this problem?
(I'm not sure if it's encoding problem.. )
url="https://icloudgis.incheon.go.kr/server/rest/services/ParkScore_Dynamic/MapServer/1/query?outFields=*&where=1%3D1&f=geojson"
file_json=FROM_GeoJson(url_file_string = url)

Is there some way to change the characters encoding to its English equivalent IN R?

In R
I am extracting data from Pdf tables using Tabulizer library and the Name are on Nepali language
and after extracting i Get this Table
[1]: https://i.stack.imgur.com/Ltpqv.png
But now i want that column 2's name To change, in its English Equivalent
Is there any way to do this in R
The R code i wrote was
library(tabulizer)
location <- "https://citizenlifenepal.com/wp-content/uploads/2019/10/2nd-AGM.pdf"
out <- extract_tables(location,pages = 113)
##write.table(out,file = "try.txt")
final <- do.call(rbind,out)
final <- as.data.frame(final) ### creating df
col_name <- c("S.No.","Types of Insurance","Inforce Policy Count", "","Sum Assured of Inforce Policies","","Sum at Risk","","Sum at Risk Transferred to Re-Insurer","","Sum At Risk Retained By Insurer","")
names(final) <- col_name
final <- final[-1,]
write.csv(final,file = "/cloud/project/Extracted_data/Citizen_life.csv",row.names = FALSE)
View(final)```
It appears that document is using a non-Unicode encoding. This web site https://www.ashesh.com.np/preeti-unicode/ can convert some Nepali encodings to Unicode, which would display properly in R, assuming you have the right fonts loaded. When I tried it on the output of your code, it did something that looked okay to me, but I don't know Nepali:
> out[[1]][1,2]
[1] ";fjlws hLjg aLdf"
When I convert the contents of that string, I get
सावधिक जीवन बीमा
which looks to me something like the text on that page in the document. If it's actually written correctly, then converting it to English will need some Nepali speaker to do the translation: hopefully that's you, but if I use Google Translate, it gives
Term life insurance
So here's my suggestion: contact the owner of that www.ashesh.com.np website, and find out if they can give you the translation rules. Write an R function to implement them if you can't find one by someone else. Then do the English translations manually.

How do I scrape this text from a 2004 Wayback machine site/why is the code I'm running wrong?

note: I haven't asked a question here before, and am still not sure how to make this legible, so let me know of any confusion or tips on making this more readable
I'm trying to download user information from the 2004/06 to 2004/09 Internet Archive captures of makeoutclub.com (a wacky, now-defunct social network targeted toward alternative music fans, which was created in ~2000, making it one of the oldest profile-based social networks on the Internet) using r,* specifically the rcrawler package.
So far, I've been able to use the package to get the usernames and profile links in a dataframe, using xpath to identify the elements I want, but somehow it doesn't work for either the location or interests sections of the profiles, both of which are just text instead of other elements in the html. For an idea of the site/data I'm talking about, here's the page I've been texting my xpath on: https://web.archive.org/web/20040805155243/http://www.makeoutclub.com/03/profile/html/boys/2.html
I have been testing out my xpath expressions using rcrawler's ContentScraper function, which extracts the set of elements matching the specified xpath from one specific page of the site you need to crawl. Here is my functioning expression that identifies the usernames and links on the site, with the specific page I'm using specified, and returns a vector:
testwaybacktable <- ContentScraper(Url = "https://web.archive.org/web/20040805155243/http://www.makeoutclub.com/03/profile/html/boys/2.html", XpathPatterns = c("//tr[1]/td/font/a[1]/#href", "//tr[1]/td/font/a[1]"), ManyPerPattern = TRUE)
And here is the bad one, where I'm testing the "location," which ends up returning an empty vector
testwaybacklocations <- ContentScraper(Url = "https://web.archive.org/web/20040805155243/http://www.makeoutclub.com/03/profile/html/boys/2.html", XpathPatterns = "//td/table/tbody/tr[1]/td/font/text()[2]", ManyPerPattern = TRUE)
And the other bad one, this one looking for the text under "interests":
testwaybackint <- ContentScraper(Url = "https://web.archive.org/web/20040805155243/http://www.makeoutclub.com/03/profile/html/boys/2.html", XpathPatterns = "//td/table/tbody/tr[2]/td/font/text()", ManyPerPattern = TRUE)
The xpath expressions I'm using here seem to select the right elements when I try searching them in the Chrome Inspect thing, but the program doesn't seem to read them. I also have tried selecting only one element for each field, and it still produced an empty vector. I know that this tool can read text in this webpage–I tested another random piece of text–but somehow I'm getting nothing when I run this test.
Is there something wrong with my xpath expression? Should I be using different tools to do this?
Thanks for your patience!
*This is for a digital humanities project will hopefully use some nlp to analyze especially language around gender and sexuality, in dialogue with some nlp analysis of the lyrics of the most popular bands on the site.
A late answer, but maybe it will help nontheless. Also I am not sure about the whole TOS question, but I think that's yours to figure out. Long story short ... I will just try to to adress the technical aspects of your problem ;)
I am not familiar with the rcrawler-package. Usually I use rvest for webscraping and I think it is a good choice. To achive the desired output you would have to use something like
# parameters
url <- your_url
xpath_pattern <- your_pattern
# get the data
wp <- xml2::read_html(url)
# extract whatever you need
res <- rvest::html_nodes(wp,xpath=xpath_pattern)
I think it is not possible to use a vector with multiple elements as pattern argument, but you can run html_nodes for each pattern you want to extract seperately.
I think the first two urls/patterns should work this way. The pattern in your last url seems to be wrong somehow. If you want to extract the text inside the tables, it should probably be something like "//tr[2]/td/font/text()[2]"

Can R transform emoji characters to their text equivalents?

In my question yesterday, "Can R read html-encoded emoji characters?", user rensa noted that:
As far as I'm aware, there's no solution to printing emoji in the R console: they always come out as "U0001f600" (or what have you). However, the packages I described above can help you plot emoji in some circumstances (I'm hoping to expand ggflags to display arbitrary full-colour emoji at some point). They can also help you search for emoji to get their codes, but they can't get names given the codes AFAIK. But maybe you could try importing the emoji list from emojilib into R and doing a join with your data frame, if you've extracted the emoji codes into a column, to get the English names.
How would this look in R?
(Note: I'm posting this question with the intention of answering it immediately, rather than posting this in the question linked above, since it's tangential to that question, but still possibly of use to others.)
The approach below works for transforming an emoji character or unicode representation into a name.
I am happy to release the code snippet below under a CC0 dedication (i.e., putting this implementation into the public domain for free reuse).
# Get (MIT-licensed) emojilib data:
emoji_json_file <- "https://raw.githubusercontent.com/muan/emojilib/master/emojis.json"
json_data <- rjson::fromJSON(paste(readLines(emoji_json_file), collapse = ""))
get_name_from_emoji <- function(emoji_unicode, emoji_data = json_data){
emoji_evaluated <- stringi::stri_unescape_unicode(emoji_unicode)
# names(json_data)
vector_of_emoji_names_and_characters <- unlist(
lapply(json_data, function(x){
x$char
})
)
name_of_emoji <- attr(
which(vector_of_emoji_names_and_characters == emoji_evaluated)[1],
"names"
)
name_of_emoji
}
get_name_from_emoji("\\U0001f917")
# [1] "hugs"
get_name_from_emoji("🤗") # An attempt actually pasting the hugs emoji in also works.
# [1] "hugs"

Skimr - cant seem to produce the histograms

came across this seemingly new package - skimr, which looks pretty nifty, and was trying it out and looks like I'm missing some package installation. Skim works fine except that it doesn't print the histogram, it is supposed to print for numeric variables. I am merely trying the examples given in the documentation.
Link to skimr documentation here - https://github.com/ropenscilabs/skimr#skimr
this is the code I'm using
devtools::install_github("hadley/colformat")
devtools::install_github("ropenscilabs/skimr")
library(skimr)
a<-skim(mtcars)
dim(a)
View(a)
instead of histograms being printed, I see some ASCII/unicode characters .
A solution that can be used to workaround the above problem is to set the locale of the R system to Chinese and to set the font of the R console to NSimSun.
temp <- tempfile()
cat("font = NSimSun\n", file = temp, append = TRUE)
loadRconsole(file = temp)
Sys.setlocale( locale='Chinese' )
library(skimr)
(a <- skim(mtcars))
View(a)
In RStudio this solution works only partially. Histograms generated by skim can be visualized only using View after setting the locale of R to Chinese
Sys.setlocale( locale='Chinese' )
library(skimr)
a <- skim(mtcars)
View(a)
Hope this can help you.

Resources