Formatting an XLSX file in R into a custom text blob - r

I want to read an xlsx file and convert the data in the file into a long text string. I want to format this string in an intelligent manner, so that each row is wrapped in parentheses "()" and the values within a row stay comma separated. For example, if the xlsx file looked like this:
one,two,three
x,x,x
y,y,y
z,z,z
after formatting, the string would look like:
header(one,two,three)row(x,x,x)row(y,y,y)row(z,z,z)
How would you accomplish this task with R?
My first instinct was something like this, but I can't figure it out:
library(xlsx)
sheet1 <- read.xlsx("run_info.xlsx",1)
paste("(",sheet1[1,],")")

This works for me:
DF <- read.xlsx("run_info.xlsx",1)
paste0("header(", paste(names(DF), collapse = ","), ")",
paste(paste0("row(", apply(DF, 1, paste, collapse = ","), ")"),
collapse = ""))
# [1] "header(one,two,three)row(x,x,x)row(y,y,y)row(z,z,z)"

Related

How to read a "|" (vertical line) delimited external csv file in R

In R, how does one read, and also convert, a delimiter that is the "|" vertical line (the ASCII vertical bar)? I need to split on whole numbers inside the file, so strsplit() does not help me.
I have R code that reads the csv file, but it still retains the vertical line "|" character. The file uses "|" as the separator between fields. When I try to read it with read.table() I get a comma "," separating every individual character. I also tried tab_spanner_delim(delim = "|") to convert the vertical line after read.delim("file.csv", sep="|") had read the file, but even that does not work. I am new to handling special characters in R programming.
read.table(text = gsub("|", ",", readLines("file.csv")))
dat_csv <- read.delim("file.csv", sep="|")
x <- cat_csv %>% tab_spanner_delim(delim = "|")
dput() from read.table(text = gsub("|", ",", readLines("file.csv")))
",\",R,D,|,I,|,7,8,|,0,1,0,|,0,0,1,2,|,8,8,1,0,1,|,1,|,7,|,1,0,5,|,1,1,6,|,1,9,9,9,1,2,2,0,|,0,0,:,0,0,|,|,A,M,|,6,|,|,|,|,|,|,|,|,|,|,|,|,|,\",",
",\",R,D,|,I,|,7,8,|,0,1,0,|,0,0,1,2,|,8,8,1,0,1,|,1,|,7,|,1,0,5,|,1,1,6,|,1,9,9,9,1,2,2,6,|,0,0,:,0,0,|,4,.,9,|,|,6,|,|,|,|,|,|,|,|,|,|,|,|,|,\","
dput() from dat_csv <- read.delim("file.csv", sep="|")
"RD|I|78|010|0012|88101|1|7|105|116|19991220|00:00||AM|6|||||||||||||",
"RD|I|78|010|0012|88101|1|7|105|116|19991226|00:00|4.9||6|||||||||||||"
We can read the data line by line using readLines, remove the unwanted characters at the ends of each line using trimws, paste the lines into a single string with the newline (\n) character as the collapse argument, and pass that string to read.table to read the data as a data frame.
data <- read.table(text = paste0(trimws(readLines('file.csv'), whitespace = '[", ]'),
                                 collapse = '\n'),
                   sep = '|')
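A side note on the original symptom, as a small sketch: in gsub("|", ",", ...) the pipe is a regex metacharacter (alternation between two empty patterns), so it matches the empty string at every position, which is exactly the comma-between-every-character output shown in the first dput above. Passing fixed = TRUE makes gsub treat the pipe as a literal character:
# default: "|" is interpreted as a regular expression and matches the empty string everywhere
gsub("|", ",", "RD|I|78")
# [1] ",R,D,|,I,|,7,8,"
# fixed = TRUE: "|" is taken literally, so only the real pipes are replaced
gsub("|", ",", "RD|I|78", fixed = TRUE)
# [1] "RD,I,78"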

Interchangeably simulating and writing data to a file

I'm experimenting with R and am trying to alternate between simulating data and writing it to a file. I have tried many variants, for example:
connection <- file("file.txt", open = "w")
for (i in 1:2){
  X <- runif(3, 0, 1)
  writeLines(as.character(X), con = connection, sep = "\n")
}
close(connection)
But what I get is
0.442033957922831
0.0713443560525775
0.950616024667397
0.0807233764789999
0.186026858631521
0.658676357707009
instead of something like
0.442033957922831 0.0713443560525775 0.950616024667397
0.0807233764789999 0.186026858631521 0.658676357707009
Could you explain to me what I'm doing wrong?
We can paste the elements of 'X' into a single string and then use sep='\n'; otherwise the output jumps to a new line after each element.
connection <- file("file.txt", open = "w")
for (i in 1:2){
  X <- runif(3, 0, 1)
  writeLines(paste(X, collapse = " "), con = connection, sep = "\n")
}
close(connection)
-output: file.txt now contains one line of three space-separated numbers per iteration, as in the desired output above.
Instead of writing line by line in a for loop, we can create the string once and write it to the text file in one go.
We can use replicate to repeat the runif call n times, paste the numbers row-wise, and paste them again, collapsing with a newline character.
temp <- paste0(apply(t(replicate(2, runif(3, 0, 1))), 1, paste, collapse = ' '),
               collapse = '\n')
connection <- file("file.txt")
writeLines(temp, connection)
close(connection)
where temp gives us a string of length one which looks like this:
temp
#[1] "0.406911700032651 0.416268902365118 0.698520892066881\n0.96398281189613 0.834513065638021 0.655840792460367"
and which looks like this in the text file:
cat(temp)
#0.406911700032651 0.416268902365118 0.698520892066881
#0.96398281189613 0.834513065638021 0.655840792460367
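For completeness, a minimal sketch of the same idea using base write(), which wraps cat() for files and controls how many values go on each line via ncolumns (assuming, as above, two draws of three numbers):
m <- replicate(2, runif(3, 0, 1))           # 3 x 2 matrix, one column per simulated draw
write(m, file = "file.txt", ncolumns = 3)   # each draw becomes one line of three numbers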

Remove comma which is a thousands separator in R

I need to import a bunch of .csv files into R. I do this using the following code:
Dataset <- read.csv(paste0("./CSV/State_level/", file, ".csv"), header = F,
                    sep = ";", dec = ",", stringsAsFactors = FALSE)
The input is a .csv file that uses "," as the decimal separator. Unfortunately there are quite a few entries like this: 20,012,054.
This should really be 20012,054; it leads either to NAs or, more commonly, to the whole df being imported as character rather than the numeric type I'd like to have.
How do I get rid of the first "," (reading from left to right), and only when the number has more than 3 figures in front of the decimal comma?
Here is a sample of how the data looks in the .csv-file:
A data.frame might look like this:
df<-data.frame(a=c(0.5,0.84,12.25,"20,125,25"), b=c("1,111,054",0.57,105.25,0.15))
I used "." as decimal separator in this case to make it a number, which in the .csv is a ",", but this is not the issue for numbers in the format: 123,45.
Thank you for your ideas & help!
We can use sub to get rid of the first ,
df[] <- lapply(df, function(x) sub(",(?=.*,)", "", x, perl = TRUE))
Just to show that it leaves the , alone if there is only a single , in the value:
sub(",(?=.*,)", "", c("0,5", "20,125,25"), perl = TRUE)
#[1] "0,5" "20125,25"

Copy from clipboard and transform to comma-separated string

I am trying to copy a single-column data frame from an Excel spreadsheet and transform it in R into a comma-separated list.
read.table(file = "clipboard") %>% as.list()
the desired output would be a,b,c,d,e,f
file = "clipboard" looks for a file named clipboard. You need to change file to text and then use readClipboard() (without quotes). The next step is to unlist the values and then convert into a single string using paste
read.table(text = readClipboard()) %>% unlist %>% paste(collapse = ",")
#[1] "a,b,c,d,e,f"
Something like paste(readClipboard(), collapse = ",") could work too.
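One caveat, sketched here: readClipboard() is Windows-only; on other platforms the clipr package (an assumption, it would need to be installed) provides read_clip() for the same job:
# read the clipboard in a cross-platform way, then collapse into one string
vals <- if (.Platform$OS.type == "windows") readClipboard() else clipr::read_clip()
paste(vals, collapse = ",")
# [1] "a,b,c,d,e,f"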

Import a csv file with both tab and quotes as separators into R

I have a dataset in csv with separators as displayed below.
NO_CAND";"DS_CARGO";"CD_CARGO";"NR_CAND";"SG_UE";"NR_CNPJ";"NR_CNPJ_1";
CLODOALDO JOSÉ DE RAMOS";"Deputado Estadual";"7";"22111";"PB";"08126218000107";"Encargos financeiros e taxas bancárias";
I am using the function read.csv2 with these options:
mydataframe <- read.csv2("filename.csv", header = T, sep = ";", quote = "\\'", dec = ",",
                         stringsAsFactors = F, check.names = F, fileEncoding = "latin1")
The code reads in the data, but with all the quotes.
I have tried to delete the quotes using
mydataframe[,] <- apply(mydataframe[,], c(1, 2), function(x) {
  gsub("\\'", "", x)
})
but it doesn't work.
Any ideas on how I could import the data getting rid of these quotes?
Many thanks.
To delete the quotes, use lapply and gsub as follows.
mydataframe[] <- lapply(mydataframe, function(x) gsub("\"", "", x))
lapply iterates over all columns of the data frame and returns a list; by having mydataframe[] on the LHS of the assignment, you assign the results back into the data frame without losing its attributes (dimensions, names, etc). Also, you don't have any single quotes ' in your data, so searching for them won't achieve anything.
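To see what the replacement does, a quick check on raw fields from the sample data (sketch):
gsub("\"", "", c('CLODOALDO JOSÉ DE RAMOS"', '"Deputado Estadual"'))
# [1] "CLODOALDO JOSÉ DE RAMOS" "Deputado Estadual"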
