Export Cyrillic characters from R? - r

I have a dataset where one of the columns includes Russian words:
raw_data2 = structure(list(word = c("абрикос",
"автомобиль",
"аист",
"ананас",
"апрель",
"атака",
"баклажан"),
subject_nr = c(3L, 21L, 12L, 17L, 8L, 1L, 17L),
acc = c(98.976109215, 91.8803418803, 94.8979591837, 94.5273631841, 94.4444444444, 94.5355191257, 94.3661971831)),
row.names = c(1L, 100L, 200L, 300L, 400L, 500L, 600L),
class = "data.frame")
When I look at the file in RStudio there's no problem:
However, when I export the data to a table to work with it further in Excel, I get escaped code points like these, which Excel cannot convert back into Russian words (even when UTF-8 is chosen during import):
"word";"subject_nr";"acc"
"<U+0430><U+0431><U+0440><U+0438><U+043A><U+043E><U+0441>";3;98,976109215
"<U+0430><U+0432><U+0442><U+043E><U+043C><U+043E><U+0431><U+0438><U+043B><U+044C>";21;91,8803418803
"<U+0430><U+0438><U+0441><U+0442>";12;94,8979591837
"<U+0430><U+043D><U+0430><U+043D><U+0430><U+0441>";17;94,5273631841
"<U+0430><U+043F><U+0440><U+0435><U+043B><U+044C>";8;94,4444444444
"<U+0430><U+0442><U+0430><U+043A><U+0430>";1;94,5355191257
"<U+0431><U+0430><U+043A><U+043B><U+0430><U+0436><U+0430><U+043D>";17;94,3661971831
Is there any way to force R to replace those strings with corresponding Cyrillic letters when saving the table? It certainly "knows" what these letters are, since it shows them in preview. I use the following code (which does not work):
write.table(raw_data2,
file = "raw_data2.csv",
append = FALSE,
quote = TRUE,
sep = ";",
eol = "\n",
na = "NA",
dec = ",",
row.names = FALSE,
col.names = TRUE,
qmethod = c("escape", "double"),
fileEncoding = "UTF-8")

It works fine for me if you write the data to an .xlsx file instead:
openxlsx::write.xlsx(raw_data2, 'temp.xlsx')

For me, Sys.setlocale("LC_CTYPE", "russian") works well
(code source: https://www.r-bloggers.com/2013/01/r-and-foreign-characters/)
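If the file has to stay a CSV, another option is to write it with a UTF-8 byte order mark, which Excel generally uses to detect the encoding. The following is only a sketch, assuming the readr package is installed; the output path is a placeholder, and the data frame is a shortened copy of the one in the question.

```r
library(readr)

# Shortened copy of the data from the question
raw_data2 <- data.frame(
  word = c("абрикос", "автомобиль", "аист"),
  subject_nr = c(3L, 21L, 12L),
  acc = c(98.976109215, 91.8803418803, 94.8979591837)
)

# Placeholder output path (assumption); use your own file name
out_file <- file.path(tempdir(), "raw_data2.csv")

# write_excel_csv2() writes UTF-8 with a byte order mark,
# using ";" as the separator and "," as the decimal mark
write_excel_csv2(raw_data2, out_file)

# Inspect the first three bytes of the file: they should be the UTF-8 BOM
bom <- readBin(out_file, what = "raw", n = 3)
```

The `;` separator and `,` decimal mark match the `sep = ";"` and `dec = ","` settings used in the question's write.table call.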

Related

How to get specific data from a JSON column in a data frame (R)?

I am working with a dataset that contains a column whose values look like JSON structures.
I am trying to get only the occupation, how can I access that?
I tried to replace some symbols to convert into a chr vector
data$speakers <- str_replace(data$speakers, "[{", "(")
data:
structure(list(X_id = c(21L, 1L, 7L, 47L, 55L), duration = c(992L,
957L, 1266L, 1126L, 1524L), event = c("TED2006", "TED2006", "TED2006",
"TEDGlobal 2005", "TED2006"), likes = c("17000", "110000", "60000",
"80000", "14000"), published_date = structure(c(1156464660, 1151367060,
1151367060, 1158019860, 1153786260), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), related_videos = c("[\"144\",\"1282\",\"1379\",\"87\",\"2302\",\"2638\"]",
"[\"243\",\"547\",\"2093\",\"74405\",\"64693\",\"83767\"]", "[\"1725\",\"2274\",\"172\",\"2664\",\"2464\",\"1268\"]",
"[\"2237\",\"701\",\"1095\",\"1386\",\"76211\",\"242\"]", "[\"2228\",\"1476\",\"800\",\"2890\",\"45233\",\"2694\"]"
), speakers = c("[{\"name\":\"Mena Trott\",\"occupation\":\"Blogger; cofounder, Six Apart\"}]",
"[{\"name\":\"Al Gore\",\"occupation\":\"Climate advocate\"}]",
"[{\"name\":\"David Pogue\",\"occupation\":\"Technology columnist\"}]",
"[{\"name\":\"David Deutsch\",\"occupation\":\"Physicist, author\"}]",
"[{\"name\":\"Jehane Noujaim\",\"occupation\":\"Filmmaker\"}]"
)), row.names = c(NA, 5L), class = "data.frame")
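I got some help and it worked! Note the single quotes around the JSON string, so that the inner double quotes do not need escaping:
library(dplyr) # provides as_tibble() and select()
'[{"name":"Mena Trott","occupation":"Blogger; cofounder, Six Apart"}]' |>
jsonlite::fromJSON() |> as_tibble() |> select(occupation) |> as.vector()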
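The snippet above handles a single string. To extract the occupation for every row of the column, one possible approach is to parse each element in turn. This is a sketch, assuming the jsonlite package is installed; the `speakers` vector is a shortened stand-in for `data$speakers`, and joining multiple speakers with " | " is an arbitrary choice.

```r
library(jsonlite)

# Shortened stand-in for data$speakers from the question
speakers <- c(
  '[{"name":"Mena Trott","occupation":"Blogger; cofounder, Six Apart"}]',
  '[{"name":"Al Gore","occupation":"Climate advocate"}]'
)

# Parse each JSON string; fromJSON() returns a data frame with one row per
# speaker, so several occupations in one cell are joined with " | "
occupation <- vapply(
  speakers,
  function(s) paste(fromJSON(s)$occupation, collapse = " | "),
  character(1),
  USE.NAMES = FALSE
)
```

With the real data, the result could be stored as a new column, for example `data$occupation <- occupation` after running the same `vapply()` call on `data$speakers`.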

R: replace column values with string

I have a set of BAM files within the chr16_bam directory and a sgseq_sam.txt file.
I want to replace the file_bam column values with the full path where the BAM files are stored.
My code hasn't been able to achieve that.
bamPath = "C:/Users/User/Downloads/chr16_bam/"
samFile <- read.delim("C:/Users/User/Downloads/sgseq_sam.txt", header=T)
for (i in samFile[,2]) {
p <- gsub(i, bamPath, samFile)
}
> dput(samFile)
structure(list(sample_name = c("N60", "N11", "T132", "T114"),
file_bam = c("60.bam", "11.bam", "132.bam", "114.bam"), paired_end = c(TRUE,
TRUE, TRUE, TRUE), read_length = c(75L, 75L, 75L, 75L), frag_length = c(1075L,
1466L, 946L, 1154L), lib_size = c(2589976L, 5153522L, 4429912L,
3131400L)), class = "data.frame", row.names = c(NA, -4L))
You can prepend the path with paste0():
library(tidyverse)
samFile <- samFile %>%
mutate(file_bam = paste0(bamPath, file_bam))
Or, alternatively, in base R:
samFile$file_bam <- paste0(bamPath, samFile$file_bam)
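A variation on the same idea uses file.path(), which inserts the separator itself, and then checks which of the resulting paths actually exist. This is a sketch under two assumptions: the directory is written without a trailing slash, and the data frame is a shortened copy of the one in the question.

```r
# Directory from the question, without the trailing slash (assumption),
# because file.path() adds the "/" separator itself
bamPath <- "C:/Users/User/Downloads/chr16_bam"

# Shortened copy of samFile from the question
samFile <- data.frame(
  sample_name = c("N60", "N11"),
  file_bam = c("60.bam", "11.bam")
)

# Build the full path for every row at once; no loop is needed
samFile$file_bam <- file.path(bamPath, samFile$file_bam)

# List any paths that do not exist on this machine
missing_files <- samFile$file_bam[!file.exists(samFile$file_bam)]
```

Checking `missing_files` before passing the table on makes a mistyped directory visible early.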

Grouped barplots in R using csv

I have a 3 column csv file like this
x,y1,y2
100,50,10
200,10,20
300,15,5
I want to make a barplot in R, with the first column's values on the x axis and the second and third columns' values as grouped bars for the corresponding x. I hope that is clear. Can someone please help me with this? My data is huge, so I have to import the csv file and can't enter all the data by hand. I found relevant posts, but none addressed exactly this.
Thank you
Use the following code
library(tidyverse)
df %>% pivot_longer(names_to = "y", values_to = "value", -x) %>%
ggplot(aes(x,value, fill=y))+geom_col(position = "dodge")
Data
df = structure(list(x = c(100L, 200L, 300L), y1 = c(50L, 10L, 15L),
y2 = c(10L, 20L, 5L)), class = "data.frame", row.names = c(NA,
-3L))
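Since the question asks about importing the file, here is a base R sketch that reads the CSV and draws the grouped bars without extra packages. The CSV content is inlined with `text =` so the example is self-contained; with a real file you would pass its path to read.csv() instead.

```r
# Inline copy of the CSV from the question; replace with read.csv("yourfile.csv")
df <- read.csv(text = "x,y1,y2
100,50,10
200,10,20
300,15,5")

# barplot() expects one row per series and one column per group
heights <- t(as.matrix(df[, c("y1", "y2")]))
colnames(heights) <- df$x

# beside = TRUE draws y1 and y2 side by side for each x value
barplot(heights, beside = TRUE, legend.text = rownames(heights),
        xlab = "x", ylab = "value")
```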

import csv-table into R and got multiple errors

I would like to read a csv table into R. The table has multiple columns, but when I simply try the following code:
reviews <- read.table("Sz-Iraki2.csv", fileEncoding = "UTF-8")
I get the error: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 22 elements
When I add header=TRUE, I get the error: more columns than column names. It seems like a basic problem, but I can't find the answer :(
but should look like this
Data looks like this
You have to define a separator; otherwise R fails to read the data properly. Suppose your data structure is the following:
structure(list(month = 2:5, titles_tmp = structure(c(1L, 1L,
1L, 1L), .Label = "some text", class = "factor"), info_tmp = structure(c(1L,
1L, 1L, 1L), .Label = "More text", class = "factor"), unlist.text = structure(c(1L,
1L, 1L, 1L), .Label = "http://somelink.com", class = "factor")), .Names = c("month",
"titles_tmp", "info_tmp", "unlist.text"), class = "data.frame", row.names = c(NA,
-4L))
That means the columns are separated by a single tab, so you need to use sep = "\t" as the data separator. Provided your data file is named "Sz-Iraki2.csv", the following should import your data nicely:
df = read.csv("Sz-Iraki2.csv", sep = "\t", fileEncoding = "UTF-8")
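If you are not sure which separator a file uses, it can help to look at the raw lines and count the fields per line before importing. The sketch below writes a small tab-separated file to a temporary path so that it is self-contained; the file contents are made up to resemble the structure above.

```r
# Create a small tab-separated example file (made-up content)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "month\ttitles_tmp\tinfo_tmp",
  "2\tsome text\tMore text",
  "3\tsome text\tMore text"
), path)

# Look at the first raw lines to see which separator is used
readLines(path, n = 2)

# With the right separator, every line has the same number of fields
fields <- count.fields(path, sep = "\t", quote = "")

# read.delim() defaults to sep = "\t" and header = TRUE
reviews <- read.delim(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)
```

If `fields` contains differing values, the separator or the quoting is probably wrong, which is also what the "line 1 did not have 22 elements" error indicates.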
I like to use:
require(readr)
read_csv("myData.csv")
This seems more appropriate if your file really is comma-separated.
It also comes with some useful options, such as specifying column types on import with the col_types argument.

How can I add a title to a sunburstR graph and export it as .png or .jpeg

I've been looking to create a multilevel pie-chart (or doughnut chart) in R and the best I found was the package sunburstR, which I must say is a very promising tool.
The interactive functionality is great, but I don't really need it. I'd like to add a title and counts in the legend object and export the graph to an image format. Does this require advanced HTML coding? There is not much help material about this package on the web yet. Should I look into a different package? The pie() function is for single-level data, and the coord_polar example for ggplot2 that I found on this forum does not seem to be appropriate for factors.
Here is an example of my dataset and sunburstR object - however my question is more general in nature and not specific to this example.
require(sunburstR)
data = gg=structure(list(V1 = structure(c(2L, 1L, 3L, 4L, 8L, 5L, 6L, 7L
), .Label = c("Pine Tree-Soft", "Pine Tree-Hard",
"Pine Tree-Long", "Pine Tree-Undecided", "Maple Tree-Red",
"Maple Tree-Green", "Maple Tree-Yellow",
"Maple Tree-Delicious"), class = "factor"), V2 = c(3L,
5L, 2L, 1L, 10L, 5L, 3L, 2L)), .Names = c("V1", "V2"), row.names = c(NA,
-8L), class = "data.frame")
sunburst(data)
Any help or suggestion would be appreciated. Thank you.
I will add an example just in case you might want to pursue this option, but it seems you would like to avoid extra coding.
require(sunburstR)
data = gg=structure(list(V1 = structure(c(2L, 1L, 3L, 4L, 8L, 5L, 6L, 7L
), .Label = c("Pine Tree-Soft", "Pine Tree-Hard",
"Pine Tree-Long", "Pine Tree-Undecided", "Maple Tree-Red",
"Maple Tree-Green", "Maple Tree-Yellow",
"Maple Tree-Delicious"), class = "factor"), V2 = c(3L,
5L, 2L, 1L, 10L, 5L, 3L, 2L)), .Names = c("V1", "V2"), row.names = c(NA,
-8L), class = "data.frame")
sb <- sunburst(
data,
count = TRUE, # add count just for demonstration
legend = list(w=250), # make extra room for our legend
legendOrder = unique(unlist(strsplit(as.character(data$V1),"-")))
)
# for the coding part
# to add some additional information in the legend,
# force show the legend,
# and disable toggling of the legend
htmlwidgets::onRender(
sb,
"
function(el,x) {
// force show the legend
// check legend
d3.select(el).select('.sunburst-togglelegend').property('checked',true);
// simulate click
d3.select(el).select('.sunburst-togglelegend').on('click')();
// change the text in the legend to add count
d3.select(el).selectAll('.sunburst-legend text')
.text(function(d) {return d.name + ' ' + d.value})
// remove the legend toggle
d3.select(el).select('.sunburst-togglelegend').remove()
}
"
)
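The code above covers the legend but not the title or the image export. One possible route, sketched here under the assumption that the htmlwidgets, htmltools and webshot2 packages (plus a Chrome-based browser, which webshot2 needs) are available, is to prepend a heading to the widget, save it as HTML, and take a screenshot. `export_widget_png` is a hypothetical helper name, not part of sunburstR.

```r
# Hypothetical helper: add a title to an htmlwidget and save it as a PNG
export_widget_png <- function(widget, title, png_file) {
  stopifnot(
    requireNamespace("htmlwidgets", quietly = TRUE),
    requireNamespace("htmltools", quietly = TRUE),
    requireNamespace("webshot2", quietly = TRUE)
  )
  # Put an <h3> heading above the widget
  titled <- htmlwidgets::prependContent(widget, htmltools::tags$h3(title))
  # Save to a temporary HTML file (dependencies go in a side directory)
  html_file <- tempfile(fileext = ".html")
  htmlwidgets::saveWidget(titled, html_file, selfcontained = FALSE)
  # Screenshot the page; the delay gives the chart time to render
  webshot2::webshot(html_file, file = png_file, delay = 1)
  invisible(png_file)
}
```

It could then be called as `export_widget_png(sb, "Tree types", "sunburst.png")`, where the title text is just an example.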
You can generate a sunburst using the ggsunburst package.
It is based on ggplot2, so you can use ggsave to export as image.
Here is an example using your data. All the information can be included in the plot, so I removed the legend.
# install ggsunburst package
if (!require("ggplot2")) install.packages("ggplot2")
if (!require("rPython")) install.packages("rPython")
install.packages("http://genome.crg.es/~didac/ggsunburst/ggsunburst_0.0.9.tar.gz", repos=NULL, type="source")
library(ggsunburst)
df <- read.table(header = T, text = "
parent node size
Pine Hard 3
Pine Soft 5
Pine Long 2
Pine Undecided 1
Maple Delicious 10
Maple Red 5
Maple Green 3
Maple Yellow 2
")
write.table(df, 'df.csv', sep = ",", row.names = F)
sb <- sunburst_data('df.csv', type = "node_parent", sep = ",", node_attributes = "size")
p <- sunburst(sb, node_labels = T, leaf_labels = F, rects.fill.aes = "name") +
geom_text(data = sb$leaf_labels,
aes(x=x, y=y, label=paste(label, size, sep="\n"), angle=angle), size = 2) +
scale_fill_discrete(guide = "none") # guide = F is deprecated in recent ggplot2
ggsave('sunburst.png', plot = p, width = 4, height = 4)
