How can I decode escaped bytes like <e9> (é) or <b0> (°) in a table that was saved with write.table without specifying UTF-8 as the encoding?
Apologies if the answer is obvious, but the R documentation pages mention nothing other than enc2utf8() (which does not work here).
Note: if there is no solution, I know I can either gsub() the whole thing, but that would be long and messy, or I can generate the data again, but that would take some real time (crawler data).
I'm doing a swirl lesson.
This is the problem:
Edit the string inside writeLines() so that it correctly displays
(with the line breaks in these positions)
This is a really
really really
long string
I tried typing
writeLines("This is a really\n\nreally really\n\nlong string")
but the swirl lesson keeps telling me that it is incorrect. Is there a different way to write the same thing?
Swirl is generally very strict about the answer, as it would be time consuming and difficult to put in ways to check for all the potentially correct answers.
As a matter of fact the answer is writeLines("This is a really \nreally really \nlong string") (see here). You have the newline \n doubled, which prints a blank line between each pair of lines, so Swirl won't accept that as an answer.
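To see the difference for yourself, here is a small sketch that captures what each call prints. The variable names single and doubled are just illustrative.

```r
# Capture the printed lines of each variant as a character vector.
single  <- capture.output(writeLines("This is a really \nreally really \nlong string"))
doubled <- capture.output(writeLines("This is a really\n\nreally really\n\nlong string"))

length(single)   # three lines of text
length(doubled)  # five lines: the doubled \n adds two empty lines
```

The doubled version contains two empty strings, which is exactly what Swirl objects to.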
Hi guys, I encrypted a school project, but the text file holding my AES key has been deleted. I had taken a picture of it beforehand, and I typed its contents into a new file. But the new AES key file does not match what is shown in the picture, and I couldn't find which character is wrong. Could you please help me?
Pic : https://i.stack.imgur.com/pAXzl.jpg
Text file : http://textuploader.com/dfop6
If you directly convert arbitrary byte values to Unicode, you may lose information: some bytes will not correspond to a Unicode character at all, and others will map to whitespace or other characters that cannot easily be distinguished in printed form.
Of course there may be ways to brute force your way out of this, but this could easily result in very complex code and possibly near infinite running time. Better start over, and if you want to use screen shots or similar printed text: base 64 or hex encode your results; those can be easily converted back.
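As a rough illustration of the hex-encoding suggestion, here is a minimal sketch in base R. The helper names to_hex and from_hex and the sample key bytes are made up for the example; they are not from the original question.

```r
# Hypothetical helpers: encode raw bytes as a hex string and back.
to_hex <- function(bytes) paste(as.character(bytes), collapse = "")

from_hex <- function(hex) {
  starts <- seq(1, nchar(hex), by = 2)
  as.raw(strtoi(substring(hex, starts, starts + 1), base = 16L))
}

# Made-up key material, including bytes that would not print cleanly as text.
key <- as.raw(c(0x00, 0x1f, 0x80, 0xff, 0x20))

hex      <- to_hex(key)     # only the characters 0-9 and a-f, safe to print or photograph
restored <- from_hex(hex)   # the original bytes, recovered exactly
```

Because the hex string uses only sixteen unambiguous characters, retyping it from a screenshot is far less error-prone than retyping raw key bytes shown as text.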
Is there a native method in R to test if a file on disk is an ASCII text file, or a binary file? Similar to the file command in Linux, but a method that will work cross platform?
The file.info() function can distinguish a file from a dir, but it doesn't seem to go beyond that.
If all you care about is whether the file is ASCII or binary...
Well, first up definitions. All files are binary at some level:
is.binary <- function(file){
  if(system.type() != "quantum computer"){
    return(TRUE)
  }else{
    return(cat=alive&dead)
  }
}
ASCII is just an encoding system for characters. It is therefore impossible to tell if a file is ASCII or binary, because ASCII-ness is a matter of interpretation. If I save a file and decide that binary number 01001101 is Q and 01001110 is Z then you might decode this as ASCII but you'll get the wrong message. Luckily the Americans muscled in and said "Hey, everyone use ASCII to code their text! You get 128 characters and a parity bit! Woo! Go USA!". IBM tried to tell people to use EBCDIC but nobody listened. Which was A Good Thing.
So everyone was packing ASCII-coded text into their 8-bit bytes, and using the eighth bit for parity checking. But then people stopped doing parity checking because TCP/IP handled all that, which was also A Good Thing, and the eighth bit was expected to be zero. If not, there was trouble.
Because people (read "Microsoft") started abusing the eighth bit, and making up their own encoding schemes, and so unless you knew what encoding scheme the file was using, you were stuffed. And the file very rarely told you what encoding scheme it was. And now we have Unicode and even more encoding schemes. And that is a third Good Thing. But I digress.
Nowadays when people ask if a file is binary, what they are normally asking is "Does any byte in this file have its highest bit set?". Which you can do in R by reading a raw file connection as unsigned integers and testing the highest value. Something like:
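is.binary <- function(filepath, max = 1000){
  f <- file(filepath, "rb", raw = TRUE)
  on.exit(close(f))  # always release the connection
  # read up to `max` bytes as unsigned 8-bit integers (0 to 255)
  b <- readBin(f, "int", max, size = 1, signed = FALSE)
  # any value above 127 means the highest bit is set
  return(any(b > 127))
}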
By default this tests at most the first 1000 bytes. I think the file command does something similar.
You may want to change the test to check for printable character codes, and whitespace, and line feed, carriage return, and other codes you might want to consider plausible in your non-binary files...
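A minimal sketch of that stricter test, assuming "plausible text" means printable ASCII (codes 32 to 126) plus tab, line feed and carriage return. The function name looks_like_text and the allowed set are my own choices, so adjust them to taste.

```r
# Hypothetical helper: TRUE if the first n bytes are all printable ASCII
# or common whitespace (tab, line feed, carriage return).
looks_like_text <- function(filepath, n = 1000) {
  con <- file(filepath, "rb")
  on.exit(close(con))
  codes <- as.integer(readBin(con, "raw", n))
  allowed <- c(9L, 10L, 13L, 32:126)
  all(codes %in% allowed)
}

# Usage on two throwaway files.
txt_file <- tempfile()
writeBin(charToRaw("hello\nworld\t!\r\n"), txt_file)
bin_file <- tempfile()
writeBin(as.raw(c(0x68, 0x00, 0xe9)), bin_file)

looks_like_text(txt_file)
looks_like_text(bin_file)
```

Note that this flags NUL bytes and other control codes as binary, which the highest-bit test alone would let through.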
Well, how would you do that? I guess you can't without reading (parts or all of) the file, which is why file extensions are used to signal content type.
I looked into that years ago, and as I recall, the file(1) utility actually reads the first few header bytes of a file and compares them to what is stored in a lookup table. Sounds like a good candidate for an add-on package to me.
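The header-lookup idea can be sketched in a few lines of R. This is only a toy version with four well-known signatures; the table magic and the function guess_type are hypothetical names, and the real file(1) database is far larger and more subtle.

```r
# Toy lookup table of leading "magic" bytes for a few common formats.
magic <- list(
  png = as.raw(c(0x89, 0x50, 0x4e, 0x47)),
  pdf = charToRaw("%PDF"),
  zip = as.raw(c(0x50, 0x4b, 0x03, 0x04)),
  gif = charToRaw("GIF8")
)

# Hypothetical helper: compare the first four bytes against the table.
guess_type <- function(filepath) {
  con <- file(filepath, "rb")
  on.exit(close(con))
  header <- readBin(con, "raw", 4)
  if (length(header) < 4) return("unknown")
  for (type in names(magic)) {
    if (all(header == magic[[type]])) return(type)
  }
  "unknown"
}

# Usage on two throwaway files.
pdf_like <- tempfile()
writeBin(charToRaw("%PDF-1.4\n"), pdf_like)
plain <- tempfile()
writeBin(charToRaw("hello world\n"), plain)

guess_type(pdf_like)
guess_type(plain)
```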
The example section of the manual for ?raw uses this:
isASCII <- function(txt) all(charToRaw(txt) <= as.raw(127))
I am using the RODBC library to bring data into R. I have a long query that I want to pass a variable to, much like this SO user.
Problem is that R interprets the whitespace/carriage returns in my query as a newline '\n'.
The accepted solution for this question suggests to simply break up the text into chunks and then paste() together - which works, but ideally I'd like to keep the whitespace intact - makes it easier to test/verify the behavior of the query over in the database before pasting into R.
In other languages I'm familiar with there's a simple line continuation character - indeed, several of the comments on the accepted answer are looking for an approach similar to python's \.
I found an aside to a workaround using strwrap deep in the bowels of an R discussion list, so in the interest of making the internet better I will post it here. However, if someone can point the direction toward a more elegant/straightforward solution, I will happily accept your answer.
I don't know if you will find this helpful or not, but I have eventually gravitated towards keeping my SQL separate from my R scripts. Keeping the query in my R script, except for very very short ones, I find gets unreadable very quickly.
These days, I tend to keep queries that are more than a single line in their own separate .sql file. Then I can keep them nice and formatted and readable in a nice text editor, and read them into R as needed via something like this:
read_sql <- function(path){
  stopifnot(file.exists(path))
  # read the whole file into a single string
  sql <- readChar(path, nchars = file.info(path)$size)
  sql
}
For binding parameters into the queries, I just keep a %s where the parameter will go in the .sql file, and then add in the parameters in R using sprintf.
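A minimal sketch of that workflow, using a temporary file in place of a real .sql file. The table and column names are invented for the example, and readLines plus paste is used here instead of read_sql only to keep the snippet self-contained.

```r
# Stand-in for a nicely formatted .sql file kept outside the R script.
sql_path <- tempfile(fileext = ".sql")
writeLines(c("SELECT *",
             "FROM map",
             "WHERE termname = '%s'"), sql_path)

# Read the template and substitute the parameter.
template <- paste(readLines(sql_path), collapse = "\n")
map_term <- "Spring 2012"
query    <- sprintf(template, map_term)

cat(query, "\n")
```

Two caveats: a literal % in the SQL (for example in a LIKE pattern) has to be written as %% in the file, and sprintf does no escaping, so this is only suitable for values you control.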
I've been much happier this way, as I was finding that cluttering up my R scripts with really long paste statements and multi-line character objects was making my code really hard to read.
R's strwrap will destroy whitespace, including newline characters, per the documentation.
Essentially, you can get the desired behavior by initially letting R introduce line breaks/newline \ns, and then immediately stripping them out.
#make query using PASTE
query_1 <- paste("SELECT map.ps_studentid
,students.first_name || ' ' || students.last_name AS full_name
,map.testritscore
,map.termname
,map.measurementscale
FROM map$comprehensive_with_growth map
JOIN students
ON map.ps_studentid = students.id
WHERE map.termname = '",map_term,"'", sep='')
#remove newline characters introduced above.
#width is an arbitrary big number-
#it just needs to be longer than your string.
query_1 <- strwrap(query_1, width=10000, simplify=TRUE)
#execute the query
map_njask <- sqlQuery(XE, query_1)
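To check what strwrap does to the embedded line breaks without touching a database, here is a small stand-alone sketch. The query text and the name flat are made up for illustration.

```r
map_term <- "Spring 2012"

# A multi-line query string, as R stores it (with embedded \n characters).
query <- paste("SELECT id
       ,name
FROM students
WHERE term = '", map_term, "'", sep = "")

# With a width longer than the string, strwrap returns a single line.
flat <- strwrap(query, width = 10000, simplify = TRUE)

grepl("\n", query, fixed = TRUE)  # the original contains newlines
grepl("\n", flat, fixed = TRUE)   # the wrapped version does not
```

Keep in mind that strwrap treats blank lines as paragraph breaks, so a query containing an empty line would come back as more than one element.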
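Try using sprintf to get variable substitution, and then collapsing all newlines and runs of whitespace into single spaces (removing whitespace entirely would glue the SQL keywords together):
query <- gsub(pattern='\\s+', replacement=" ", x=query)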
See my answer to a similar question for details.
Setting:
I have (simple) .csv and .dat files created by laboratory devices and other programs, storing information on measurements or calculations. I have found this for other languages but not for R.
Problem:
Using R, I am trying to extract values to quickly display results w/o opening the created files. Hereby I have two typical settings:
a) I need to read a priori unknown values after known key words
b) I need to read lines after known key words or lines
I can't make functions such as scan() and grep() work.
c) Finally I would like to loop over dozens of files in a folder and give me a summary (to make the picture complete: I will manage this part)
I would appreciate any form of help.
ok, it works for the key value (although perhaps not very nice)
variable<-scan("file.csv", what=character(),sep="")
returns a character vector of everything
variable[grep("keyword", variable)+2] # + 2 as the actual value is stored two places ahead
returns the sought values as characters.
as.numeric(lapply(variable, gsub, patt=",", replace="."))
For completeness: the data had to be converted to numbers, and the "," versus "." decimal separator problem had to be solved.
in a line:
data=as.numeric(lapply(ks[grep("Ks_Boden", ks)+2], gsub, patt=",", replace="."))
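For anyone who wants to try this without the original device files, here is a self-contained sketch covering both (a) and (b). The file contents are invented, and the + 2 offset assumes the layout "keyword = value" as in my files.

```r
# Invented sample file with decimal commas.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Sample A",
             "Ks_Boden = 3,75",
             "Depth = 12,5"), tmp)

# (a) value after a known keyword: split into whitespace-separated tokens,
#     find the keyword, take the token two places ahead, fix the decimal mark.
tokens <- scan(tmp, what = character(), sep = "", quiet = TRUE)
idx    <- grep("Ks_Boden", tokens, fixed = TRUE)
value  <- as.numeric(sub(",", ".", tokens[idx + 2], fixed = TRUE))

# (b) whole line following the line that contains the keyword.
lines     <- readLines(tmp)
next_line <- lines[grep("Ks_Boden", lines, fixed = TRUE) + 1]
```

Since sub and gsub are vectorised, the lapply in the one-liner above is not strictly needed.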
Perseverance is not too bad of an asset ;-)
The rest isn't finished, yet, I will post once finished.