igraph.to.gexf encoding issue - r

I'm trying to export a network I've built in igraph to GEXF format using rgexf so I can use it in Gephi. My basic code is below; I also do some formatting (removing labels and such), which I haven't included.
df <- read.csv ("<file>", header = TRUE, sep = ",")
df.network <- graph.data.frame(df, directed=F)
V(df.network)$type <-bipartite.mapping(df.network)$type
plot(df.network, vertex.label.cex = 0.8, vertex.label.color = "black", layout=layout_with_kk, asp=0)
I then use the following
g1.gexf <- igraph.to.gexf(df.network)
but I get the following error message:
Input is not proper UTF-8, indicate encoding !
Bytes: 0xED 0x6E 0x20 0x46
Error: 1: Input is not proper UTF-8, indicate encoding !
Bytes: 0xED 0x6E 0x20 0x46
and g1.gexf isn't written. Can anyone help with what might have gone wrong?
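One possible reading of the error: 0xED followed by 0x6E is not a valid UTF-8 sequence, but 0xED is "í" in Latin-1, so the reported bytes would spell "ín F" if the CSV was saved in that encoding. The sketch below assumes Latin-1 input (the question does not state the file's encoding) and re-encodes character columns to UTF-8 before the graph is built. The data frame is a made-up stand-in for the one read from the CSV.

```r
# The four bytes reported in the error message
bad <- rawToChar(as.raw(c(0xED, 0x6E, 0x20, 0x46)))
validUTF8(bad)    # FALSE: not a valid UTF-8 sequence

# Assumption: the source file is Latin-1 (ISO-8859-1)
fixed <- iconv(bad, from = "latin1", to = "UTF-8")
validUTF8(fixed)  # TRUE

# Hypothetical stand-in for the data frame read from the CSV
df <- data.frame(from = c(bad, "a"), to = c("b", "c"),
                 stringsAsFactors = FALSE)

# Re-encode every character column; leave other columns untouched
df[] <- lapply(df, function(col) {
  if (is.character(col)) iconv(col, from = "latin1", to = "UTF-8") else col
})
all_valid <- all(validUTF8(unlist(df)))
```

If the assumption holds, passing fileEncoding = "latin1" to read.csv may achieve the same thing at read time.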

Related

R How to convert a byte in a raw vector into an ASCII space

I am reading some very old files created by C code that consist of a header (ASCII) and then data. I use readBin() to get the header data. When I try to convert the header to a string it fails because there are 3 'bad' bytes. Two of them are binary 0 and the other binary 17 (IIRC).
How do I convert the bad bytes to ASCII SPACE?
I've tried some versions of the below code but it fails.
hd[hd == as.raw(0) | hd == as.raw(0x17)] <- as.raw(32)
I'd like to replace each bad value with a space so I don't have to recompute all the fixed data locations in parsing the string derived from hd.
I normally just go through a conversion to integer.
Suppose we have this raw vector:
raw_with_null <- as.raw(c(0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x00,
0x57, 0x6f, 0x72, 0x6c, 0x64, 0x21))
We get an error if we try to convert it to character because of the null byte:
rawToChar(raw_with_null)
#> Error in rawToChar(raw_with_null): embedded nul in string: 'Hello\0World!'
It's easy to convert to integer and replace any 0s or 23s with 32s (ASCII space):
nums <- as.integer(raw_with_null)
nums[nums == 0 | nums == 23] <- 32
We can then convert nums back to raw and then to character:
rawToChar(as.raw(nums))
#> [1] "Hello World!"
Created on 2022-03-05 by the reprex package (v2.0.1)
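The same integer round trip can be wrapped in a small helper so the set of bad byte values is easy to change. This is only a sketch: replace_bytes and its arguments are hypothetical names, and the sample header bytes are made up. Because every byte is replaced one-for-one, the vector length (and so every fixed offset) stays the same.

```r
# Hypothetical helper: replace selected byte values with a replacement byte
replace_bytes <- function(x, bad = c(0L, 23L), replacement = 32L) {
  nums <- as.integer(x)            # raw -> integer
  nums[nums %in% bad] <- replacement
  as.raw(nums)                     # integer -> raw
}

# Made-up header containing a nul byte and a 0x17 byte
hd <- as.raw(c(0x48, 0x69, 0x00, 0x17, 0x21))
cleaned <- replace_bytes(hd)
header_text <- rawToChar(cleaned)
```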

How to use write.table in R with up/down-arrows?

I've got a dataframe f in R with one column called utterance which contains lines with character strings like:
~↑I don't think I can↑~ and
↓carrying↓
Whenever I'm using
write.table(f, "C:/Users/...txt", sep="\t", quote=F, row.names=F, fileEncoding = "UTF-8")
to create a table in a .txt, Up and Down arrows are given like so in the created .txt file:
<U+2191> instead of the actual ↑
<U+2193> instead of the actual ↓
~<U+2191>I don't think I can<U+2191>~
<U+2193>carrying<U+2193>
How can I fix this problem to get the actual ↑ and ↓ in the txt files by using the correct settings for write.table in R? I'm using the standard text editor of Windows10 and Notepad++.
There is some advice in "Escaping from character encoding hell in R on Windows" (and in other known articles on this topic); however, it does not seem to be useful in this particular case, as the ↑ and ↓ characters do not belong to any natural language.
Good news
Write file as UTF-8 encoding in R for Windows
… when the R writes a UTF-8 text into a file on Windows, characters of
unsupported language are modified. In contrast, all characters are
written correctly in Mac OS.
Using binary
There is a solution for this problem. Writing a binary file instead of
a text file solves this. All applications handling a UTF-8 file in
Windows are using the same trick.
BOM
The BOM should not be used in UTF-8 files. This is what the Linux and
the Mac OS are doing. But the Windows Notepad and some applications
use the BOM. So, handling the BOM is needed, in spite of grammatically
wrong.
…
Solution
arrows.html (a sample UTF-8 file, used later in 70166451.r)
<!DOCTYPE html>
<html>
<head> <meta charset="utf-8"> </head>
<body>up=↑ ↑↓ down=↓</body>
</html>
70166451.r (partially commented script):
### my circumstances
setwd("D:\\BAT\\R")
filepath = '70166451.txt'
### ↓↓↓ adapted from https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
BOM <- charToRaw('\xEF\xBB\xBF')
writeUtf8 <- function(xstr, filepath, forappend=F, bom=F) {
  openmode <- ifelse(forappend, 'ab', 'wb')
  con <- file( filepath, open=openmode)
  if( !forappend && bom ) writeBin(BOM, con, endian="little")
  # If the connection is open it is written from its current position:
  writeBin(charToRaw(xstr), con, endian="little")
  close(con)
}
### ↑↑↑ adapted from https://tomizonor.wordpress.com/2013/04/17/file-utf8-windows/
### hard-coded characters ↑ and ↓
aa <- "up ↑ (↑↓) ↓ down" # unworkable? (not solved)
aa <- "up \u2191 (↑↓) \u2193 down" # unworkable! (unsolvable?)
aa <- "up \u2191 (\u2191\u2193) \u2193 down" # workable! (solved here)
# print( c( 'aa ', Encoding(aa), aa ))
# "aa " "UTF-8" "up <U+2191> (<U+2191><U+2193>) <U+2193> down"
xx <- data.frame( myword = c(aa,toupper(aa)), word = c(toupper(aa),aa))
yy <- readr::format_tsv( xx, append = F, quote_escape = "none", eol = "\r\n")
writeUtf8( yy, filepath)
### characters read from a file
library(xml2)
rawHTML <- paste(readLines("arrows.html", encoding='utf-8'), collapse=" ")
aaa <- xml_text(read_html(charToRaw(rawHTML)))
# print( c( 'aaa', Encoding(aaa), aaa ))
# "aaa" "UTF-8" "up=<U+2191> <U+2191><U+2193> down=<U+2193>"
xxx <- data.frame( myword = c(aaa,toupper(aaa)), word = c(toupper(aaa),aaa))
yyy <- readr::format_tsv( xxx, append = T, quote_escape = "none", eol = "\r\n")
writeUtf8( yyy, filepath, forappend=T)
Result (one can Copy&Paste above code snippet to an open R Console window, or save and run using Rscript.exe as shown below):
pushd D:\bat\R & del 70166451*.txt & rscript 70166451.r & type 70166451*.txt & popd
70166451.txt
myword word
up ↑ (↑↓) ↓ down UP ↑ (↑↓) ↓ DOWN
UP ↑ (↑↓) ↓ DOWN up ↑ (↑↓) ↓ down
up=↑ ↑↓ down=↓ UP=↑ ↑↓ DOWN=↓
UP=↑ ↑↓ DOWN=↓ up=↑ ↑↓ down=↓
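The binary-writing trick quoted earlier can be reduced to a few lines of base R. This is a minimal sketch that assumes the string is, or can be converted to, UTF-8. The temporary file path is for illustration only, and the bytes are read back to check that the arrows survive.

```r
x <- "up \u2191 down \u2193"

# Write the UTF-8 bytes through a binary connection, bypassing text-mode
# re-encoding
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(charToRaw(enc2utf8(x)), con)
close(con)

# Read the raw bytes back and mark them as UTF-8
bytes <- readBin(path, what = "raw", n = file.size(path))
back <- rawToChar(bytes)
Encoding(back) <- "UTF-8"
```

Each arrow takes three bytes in UTF-8, so the file should hold 15 bytes for this string.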

Chess PGN: Error in paste... result would exceed 2^31-1 bytes

I'm using the library bigchess in R to read in a 3GB PGN file, see source here: https://github.com/rosawojciech/bigchess
Reading it in like so:
df <- read.pgn(paste0(path, file), add.tags = c("UTCDate", "UTCTime", "WhiteElo", "BlackElo", "WhiteRatingDiff", "BlackRatingDiff", "WhiteTitle", "BlackTitle","TimeControl", "Termination"), n.moves = T, extract.moves = -1, stat.moves = T, big.mode = F, quiet = F, ignore.other.games = F)
I get the following error -
"Error in paste(subset(r2, tmp1 == "Movetext", select = c(tmp2))[, 1],
: result would exceed 2^31-1 bytes"
From what I found online, this happens because a single character string in R cannot exceed 2^31-1 bytes (about 2 GB). Is there a way to override this limit, or otherwise deal with the issue?
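One way to sidestep the limit, rather than override it, might be to split the PGN file into smaller files at game boundaries and read each piece separately. The sketch below is an assumption-laden illustration: split_pgn and its arguments are hypothetical names, it assumes every game starts with an [Event tag line, and it has not been tried with bigchess or on a 3GB file.

```r
# Hypothetical helper: split a PGN file into chunks of complete games
split_pgn <- function(path, games_per_file = 1000L,
                      out_dir = tempdir(), batch = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  out_paths <- character(0)
  pending <- character(0)            # lines read but not yet written
  repeat {
    lines <- readLines(con, n = batch, warn = FALSE)
    done <- length(lines) == 0       # nothing left to read
    pending <- c(pending, lines)
    starts <- which(startsWith(pending, "[Event "))
    # Write a chunk once the next game's start is visible, or at end of file
    while (length(starts) > games_per_file ||
           (done && length(pending) > 0)) {
      cut <- if (length(starts) > games_per_file) {
        starts[games_per_file + 1L] - 1L
      } else {
        length(pending)
      }
      out <- file.path(out_dir,
                       sprintf("chunk_%04d.pgn", length(out_paths) + 1L))
      writeLines(pending[seq_len(cut)], out)
      out_paths <- c(out_paths, out)
      pending <- pending[-seq_len(cut)]
      starts <- which(startsWith(pending, "[Event "))
    }
    if (done) break
  }
  out_paths
}

# Tiny made-up PGN with five games, split two games per file
game <- function(i) c(sprintf("[Event \"g%d\"]", i), "", "1. e4 e5 1-0", "")
demo_path <- tempfile(fileext = ".pgn")
writeLines(unlist(lapply(1:5, game)), demo_path)
demo_dir <- tempfile("pgn_chunks")
dir.create(demo_dir)
chunks <- split_pgn(demo_path, games_per_file = 2L,
                    out_dir = demo_dir, batch = 4L)
```

Each chunk could then be passed to read.pgn in turn and the resulting data frames combined, provided every chunk stays well under the 2^31-1 byte limit.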

Reading RAW data from an R socket

To exchange data quickly between Python and R, I programmed a rather dirty solution that works on Linux and OSX. Unfortunately, I now have to get it working on Windows.
The code below runs a Python script that builds a raw vector, which RApiSerialize can deserialise into an R object.
COMMAND = "python"
PATH_TO_SCRIPT='/GetCassandraData.py'
QueryCassandra <- function(query){
  allArgs = c(PATH_TO_SCRIPT, query)
  output.connection <- rawConnection(raw(length = 0), "r+")
  exec_wait(COMMAND, args = allArgs, std_out = output.connection)
  output <- rawConnectionValue(output.connection)
  close(output.connection)
  final <- unserializeFromRaw(output)
  return(final)
}
This works as intended on OSX and Linux; however, Windows has a tendency to put a 0x0d (carriage return) byte before each 0x0a (line feed) byte, which makes RApiSerialize unable to deserialise it.
I am now attempting to solve the problem by communicating through sockets but I do not seem to be able to find a way to read data from a make.socket() object to a raw vector.
I have tried:
data <- read.socket(datasocket)
Which resulted in:
Error in read.socket(datasocket) :
embedded nul in string: 'X\n\0\0\0\002\0\003\004\002\0\002\003\0'
The function read.socket() tries to read a string and doesn't accept null bytes.
Is there a way to read socket data to a raw vector in R?
R server-side:
library(sys)
COMMAND = "python"
PATH_TO_SCRIPT='/lengthCheck.py'
allArgs = c(PATH_TO_SCRIPT)
sys::exec_background(COMMAND, args = allArgs, std_out = TRUE, std_err = TRUE)
datasocket <- socketConnection(port = 1205, server = TRUE, open = "w+b", blocking = TRUE)
on.exit(close(datasocket))
datasize <- readBin(datasocket, what = "double")
data <- readBin(datasocket, what = "raw", n = datasize)
Python client-side:
import struct
import socket
import time
your_raw_array_to_send = bytearray([0x58, 0x0a, 0x00, 0x00, 0x00, 0x02, 0x00, 0x03, 0x04, 0x02, 0x00, 0x02, 0x03, 0x00])
arrayLength = len(your_raw_array_to_send)
datasize = struct.pack('d', arrayLength)
# Wait 100ms for R to set up a listening socket
time.sleep(.100)
client_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client_socket.connect(('localhost', 1205))
client_socket.send(datasize)
client_socket.send(your_raw_array_to_send)
client_socket.close()
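The exchange above uses a length prefix: an 8-byte double carrying the payload size, followed by the payload itself. The sketch below reproduces that framing in R without any network, using an in-memory rawConnection in place of the socket (an assumption made only so the example is self-contained), to show that the two readBin calls recover the bytes unchanged, nul bytes included.

```r
# Same sample payload as in the Python client
payload <- as.raw(c(0x58, 0x0a, 0x00, 0x00, 0x00, 0x02, 0x00,
                    0x03, 0x04, 0x02, 0x00, 0x02, 0x03, 0x00))

# Build the frame: length as a native double, then the payload bytes
frame <- c(writeBin(as.double(length(payload)), raw()), payload)

# Stand-in for the socket: an in-memory read connection
con <- rawConnection(frame, open = "r")
datasize <- readBin(con, what = "double")
data <- readBin(con, what = "raw", n = datasize)
close(con)
```

Both sides here use the machine's native byte order, as struct.pack('d', ...) does on the Python side. Across machines with different byte orders, the endian argument of readBin would need to be set explicitly.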

What causes invalid format '%d' in R?

The code below converts binary files from float32 to 16-bit integers with a scale factor of 10. It fails with an error saying that the format '%d' is invalid.
setwd("C:\\2001")
for (b in paste("data", 1:365, ".flt", sep="")) {
  conne <- file(b, "rb")
  file1<- readBin(conne, double(), size=4, n=360*720, signed=TRUE)
  file1[file1 != -9999] <- file1[file1 != -9999]*10
  close(conne)
  fileName <- sprintf("C:\\New folder (11)\\NewFile%d.bin", b)
  writeBin(as.integer(file1), fileName, size = 2)
}
Result:
Error in sprintf("C:\\New folder (11)\\NewFile%d.bin", :
invalid format '%d'; use format %s for character objects
I used %s as suggested by R, but the output files for 1:365 were completely empty.
The %d is a placeholder for an integer value inside a format string. Therefore, when you use sprintf("%d", var), var must be an integer.
In your case, the variable b is a string (or a character object). So, you use the placeholder for string variables, which is %s.
Now, if your files are empty, there must be something wrong elsewhere in your code. You should ask another question more specific to it.
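The difference between the two placeholders can be seen in a short sketch. The file names follow the question's pattern, but nothing is read or written, and looping over the integer index rather than the file name is just one option.

```r
# %d with a character argument raises the error from the question
errored <- tryCatch({
  sprintf("NewFile%d.bin", "data1.flt")
  FALSE
}, error = function(e) TRUE)

# Option 1: keep the integer index and use %d for both names
idx <- 1:3
in_names  <- sprintf("data%d.flt", idx)
out_names <- sprintf("NewFile%d.bin", idx)

# Option 2: keep the string and use %s
out_from_string <- sprintf("NewFile_%s.bin", in_names[1])
```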
