Save matrix to .csv file in R without losing format

I'm trying to write a matrix to a .csv file using write.matrix from MASS, but I'm having some problems.
When I print the matrix it looks something like this:
p q s S2 R2 R2adj Cp AIC PRESS
1 0 1 167.27779 27981.8583 NA NA 3679.294476 NA NA
2 1 2 160.32254 25703.3165 0.08866209 0.08142925 3343.909110 1666.993 3338167.3
3 1 2 86.73559 7523.0630 0.73326195 0.73114498 891.016823 1509.726 1045980.3
4 1 2 67.50458 4556.8690 0.83843145 0.83714916 490.815893 1445.555 693993.5
but when I do
write.matrix(moDat2, file = paste(targetPath, "dat2.csv", sep="/"), sep=",")
it saves it to the file like this:
p,q,s,S2,R2,R2adj,Cp,AIC,PRESS
0.000000e+00,1.000000e+00,1.672778e+02,2.798186e+04, NA, NA,3.679294e+03, NA, NA
1.000000e+00,2.000000e+00,1.603225e+02,2.570332e+04,8.866209e-02,8.142925e-02,3.343909e+03,1.666993e+03,3.338167e+06
1.000000e+00,2.000000e+00,8.673559e+01,7.523063e+03,7.332620e-01,7.311450e-01,8.910168e+02,1.509726e+03,1.045980e+06
Is there any way I can save it to the file without the data getting transformed to scientific notation?

You can use format inside your write.matrix call.
write.matrix(format(moDat2, scientific=FALSE),
file = paste(targetPath, "dat2.csv", sep="/"), sep=",")

The help page for MASS::write.matrix does not suggest that there are controls available. This is what write.table's help page says about formatting numbers:
"In almost all cases the conversion of numeric quantities is governed by the option "scipen" (see options), but with the internal equivalent of digits=15. For finer control, use format to make a character matrix/data frame, and call write.table on that."

Related

R Printing specific columns

I have this file test.csv. I have used
test <- read.csv("test.csv", check.names=FALSE)
to get it into R. I have used check.names because the column headers contain brackets; if I don't use it, they turn into periods, which causes issues when coding.
I have then done this:
sink(file='interest.txt')
print((test["test$log(I)">=1 & test$number >= 6 , "Name"]),)
My aim is to create a sink file so that the print output goes there. I want to print the value in the Name column when the values in two columns (log(I) and Number) meet certain conditions.
log(I) Number Name
1.00 6 LAMP1
3.50 6 MND1
1.20 2 GGD3
0.98 7 KLP1
So in this example, the code would output just LAMP1 and MND1 to the sink file I created.
My issue is that I don't think R is recognising that log(I) is the header title as it seems to give me the same result with or without this part included.
If I don't use
check.names=FALSE
then the column is turned into log.I. instead. How can I get around this issue?
Thanks
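A minimal sketch of one way the non-syntactic column name can be referenced once check.names=FALSE has kept it intact (assuming the headers are exactly log(I), Number and Name as in the sample):
test <- read.csv("test.csv", check.names = FALSE)
sink(file = "interest.txt")
# backticks (or test[["log(I)"]]) are needed because log(I) is not a
# syntactic R name, so a bare test$log(I) does not refer to the column
print(test[test$`log(I)` >= 1 & test$Number >= 6, "Name"])
sink()
With the sample data this prints just LAMP1 and MND1.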

Maintaining long number string in R, avoiding scientific notation and not deleting leading zeroes

I am having an issue with R enforcing scientific notation and deleting my leading 0s. I have capture histories over 24 intervals, where an animal that is caught is marked 1, and 0 if not. I have this in Excel .csv and .txt files.
I have tried loading into R via the .csv file then running:
with_options(c(scipen=999),(str_pad(data$capture.history,24, pad="0")))
This almost works, but it adds an extra 0 in front of capture histories that don't need it and adds a bunch of odd values in spots where they aren't supposed to be, such as "010000009999999999934424" for some but not all of the histories.
I upload the .txt file using:
cjs <- read.table(file.choose(),header=TRUE, sep="\t", strip.white=TRUE)
And pretty much the same thing happens.
Perhaps have a look at the colClasses= argument, which allows you to tell read.csv() and friends how to treat each column, rather than letting them / forcing them to guess:
cat("char1, char2, num1\n01000, 00000004000, 0004\n", file="eg.csv")
read.csv("eg.csv")
# char1 char2 num1
# 1 1000 4000 4
read.csv("eg.csv", colClasses=c("character", "character", "numeric"))
# char1 char2 num1
#1 01000 00000004000 4
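Applied to the capture-history file from the question, that might look like the sketch below (the column name capture.history is taken from the question; the rest of the call mirrors the read.table line above):
cjs <- read.table(file.choose(), header = TRUE, sep = "\t",
                  strip.white = TRUE,
                  colClasses = c(capture.history = "character"))
Reading that column as character keeps the leading zeroes and never applies scientific notation, so no str_pad() repair is needed afterwards.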

r functions will not recognise apostrophe in character string

I have a large data frame of survey data read from a .csv that looks like this when simplified.
x <- data.frame("q1" = c("yes","no","don’t_know"),
                "q2" = c("no","no","don’t_know"),
                "q3" = c("yes","don’t_know","don’t_know"))
I want to create a column using rowSums as below
x$dntknw<-rowSums(x=="don’t_know")
I can do it for all the yes and no answers easily, but in my data frame it just generates zeros for the don’t_know's.
I previously had an issue with the apostrophe looking like this: don’t_know. I added encoding = "UTF-8" to my read.table call to fix this. However, now I can't seem to get any R functions to recognise it; I tried gsub("’","",df), but this didn't work either, just like rowSums.
Is this a problem with the encoding? Is there a regex solution to removing them? What solutions are there for dealing with this?
It is an encoding issue and not a regex one. I am unable to reproduce the issue, and my encoding is set to UTF-8 in R. Try setting the encoding to UTF-8 in R's defaults rather than only at read time.
Here is my sample output with your code.
> x
q1 q2 q3 dntknw
1 yes no yes 0
2 no no don’t_know 1
3 don’t_know don’t_know don’t_know 3
> Sys.setlocale()
[1] "en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8"
Here is some more detail that may be helpful.
https://support.rstudio.com/hc/en-us/articles/200532197-Character-Encoding
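A sketch of what setting the encoding up front can look like in practice (the file name survey.csv and the use of read.csv are assumptions; the question used read.table with encoding =):
Sys.setlocale("LC_ALL", "en_US.UTF-8")      # run the session in a UTF-8 locale
x <- read.csv("survey.csv", fileEncoding = "UTF-8",
              stringsAsFactors = FALSE)      # declare the file's encoding at read time
x$dntknw <- rowSums(x == "don’t_know")       # the comparison now matches the curly apostrophe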
As @Drj stated, it is probably an encoding error. When I paste your code into my console, I get
> x$q1
[1] yes no don<U+0092>t_know
Even if the encoding is off, you can still match it using regex:
grepl("don.+t_know", x$q1)
# [1] FALSE FALSE TRUE
Hence, you can calculate the row sums as follows:
x$dntknw <- rowSums(apply(x, 2, function(y) grepl("don.+t_know", y)))
Which results in
> x
q1 q2 q3 dntknw
1 yes no yes 0
2 no no don<U+0092>t_know 1
3 don<U+0092>t_know don<U+0092>t_know don<U+0092>t_know 3

R readr package - written and read in file doesn't match source

I apologize in advance for the somewhat limited reproducibility here. I am doing an analysis on a very large (for me) dataset from the CMS Open Payments database.
There are four files I downloaded from that website, read into R using readr, then manipulated a bit to make them smaller (column removal), and then stuck together using rbind. I would like to write my pared-down file out to an external hard drive so I don't have to read in all the data and redo the paring each time I want to work on it. (Obviously, it's all scripted, but it takes about 45 minutes, so I'd like to avoid it if possible.)
So I wrote out the data and read it back in, but now I am getting different results. Below is about as close as I can get to a good example. The data is named sa_all. There is a column in the table for the source, which can only take on two values: gen or res. It is a column that is actually added as part of the analysis, not one that comes in with the data.
table(sa_all$src)
gen res
14837291 822559
So I save the sa_all dataframe into a CSV file.
write.csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv',
row.names = FALSE)
Then I open it:
sa_all2 <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
table(sa_all2$src)
g gen res
1 14837289 822559
I did receive the following parsing warnings.
Warning: 4 parsing failures.
row col expected actual
5454739 pmt_nature embedded null
7849361 src delimiter or quote 2
7849361 src embedded null
7849361 NA 28 columns 54 columns
Since I manually add the src column and it can only take on two values, I don't see how this could cause any parsing errors.
Has anyone had any similar problems using readr? Thank you.
Just to follow up on the comment:
write_csv(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.csv')
sa_all2a <- read_csv('D:\\Open_Payments\\data\\written_files\\sa_all.csv')
Warning: 83 parsing failures.
row col expected actual
1535657 drug2 embedded null
1535657 NA 28 columns 25 columns
1535748 drug1 embedded null
1535748 year an integer No
1535748 NA 28 columns 27 columns
Even more parsing errors and it looks like some columns are getting shuffled entirely:
table(sa_all2a$src)
100000000278 Allergan Inc. gen GlaxoSmithKline, LLC.
1 1 14837267 1
No res
1 822559
There are columns for manufacturer names and it looks like those are leaking into the src column when I use the write_csv function.
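One workaround, if the goal is simply to avoid rebuilding the object rather than to fix the CSV parsing itself, is to round-trip through R's native serialization (a sketch using the same path as above):
saveRDS(sa_all, 'D:\\Open_Payments\\data\\written_files\\sa_all.rds')
sa_all2 <- readRDS('D:\\Open_Payments\\data\\written_files\\sa_all.rds')
table(sa_all2$src)   # no parsing step, so embedded nulls and stray quotes cannot shuffle columns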

Creating a vector from a file in R

I am new to R and my question should be trivial. I need to create a word cloud from a txt file containing the words and their occurrence numbers. For that purpose I am using the snippets package.
As can be seen at the bottom of the link, I first have to create a vector (is it correct that words is a vector?) like below.
> words <- c(apple=10, pie=14, orange=5, fruit=4)
My problem is to do the same thing but create the vector from a file which would contain words and their occurrence number. I would be very happy if you could give me some hints.
Moreover, to understand the format of the file to be read in, I wrote the vector words to a file.
> write(words, file="words.txt")
However, the file words.txt contains only the values but not the names(apple, pie etc.).
$ cat words.txt
10 14 5 4
Thanks.
words is a named vector; the distinction is important in the context of the cloud() function, if I read the help correctly.
Write the data out correctly to a file:
write.table(words, file = "words.txt")
Create your word-occurrence file in the same format as the txt file just created. When you read it back into R, you need to do a little manipulation:
> newWords <- read.table("words.txt", header = TRUE)
> newWords
x
apple 10
pie 14
orange 5
fruit 4
> words <- newWords[,1]
> names(words) <- rownames(newWords)
> words
apple pie orange fruit
10 14 5 4
What we are doing here is reading the file into newWords, then subsetting it to take the one and only column (variable), which we store in words. The last step is to take the row names from the file we read in and apply them as the names on the words vector. We do that last step using the names() function.
Yes, 'vector' is the proper term.
EDIT:
A better method than write.table would be to use save() and load():
save(words, file="svwrd.rda")
load(file="svwrd.rda")
The save/load combo preserves all the structure rather than doing coercion. The write.table followed by names()<- approach is kind of a hassle, as you can see in both Gavin's answer here and my answer on R-help.
Initial answer:
I suggest you use as.data.frame to coerce to a data frame and then write.table() to write to a file.
write.table(as.data.frame(words), file="savew.txt")
saved <- read.table(file="savew.txt")
saved
words
apple 10
pie 14
orange 5
fruit 4
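If the word-occurrence file is already a plain two-column table (word, then count), a small sketch of reading it straight into the named vector that cloud() expects (the column names word and count are assumptions about that file's header):
freqs <- read.table("words.txt", header = TRUE, stringsAsFactors = FALSE)
# values are the counts, names are the words themselves
words <- setNames(freqs$count, freqs$word)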
