How to output string data on separate lines in an R Markdown document

I am using a dataframe (called "Survey") from survey data in which the final question is an open comment box. Hence, my final vector in the dataframe consists of string data. I am attempting to create a report in R Markdown in which each comment (each row of that string vector) appears on a separate line in the output. My first attempt was to simply insert the variable name into a line of inline R code within my markdown document, as such:
`r Survey$Comments`
This resulted in the comments all appearing in one big chunk, with a comma separating each comment. I then attempted to use the "cat" function as follows:
`r cat(Survey$Comments, sep="\n")`
When I run this code in my regular R console window (not in R Markdown), it gives me the output I want (each comment on its own line), but does not work the same way when I run it in markdown. I'm at a loss as to how to get the output I need, and thought I'd turn to the broader community to see if anyone has any advice.

We can use writeLines.
writeLines(c("Line1", "Line2", "Line3"))
Line1
Line2
Line3
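In an R Markdown document the same call would go inside a code chunk rather than inline, for example (a minimal sketch; Survey is the questioner's data frame and the chunk label is made up):

```{r comments, echo=FALSE, comment=""}
writeLines(Survey$Comments)
```

Setting the comment chunk option to "" drops the ## prefix that knitr normally adds to printed output, so each comment lands on its own plain line.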

Related

Inline code in r markdown is printed 'as is'

While knitting an R markdown document the inline code is printed 'as is', for example:
- The number of patients in the dataframe is `n_distinct(med1$patients)`.
is knitted exactly as written:
The number of patients in the dataframe is n_distinct(med1$patients).
The code is not evaluated, rather the text is formatted as code. In a previous question someone suggested adding brackets but it doesn't work for me.
Any suggestions will be much appreciated.
I stumbled across the solution. I had to write it this way:
The number of patients in the data frame is `r n_distinct(med1$patients)`.
Adding the `r` prefix inside the backticks made the code evaluate as desired.
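Note that `n_distinct()` comes from dplyr, so for the inline call to evaluate, the package has to be loaded somewhere earlier in the document, typically in a setup chunk (a sketch; the chunk label is arbitrary):

```{r setup, include=FALSE}
library(dplyr)  # provides n_distinct()
```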

Importing data from Excel to vector in R

I am a novice in R and I have been having some trouble trying to get R and Excel to cooperate.
I have written code that compares two vectors with each other and determines the differences between them:
data.x <- read.csv(file.choose(), header = TRUE)
data.y <- read.csv(file.choose(), header = TRUE)
# Drop the entries matching either pattern
newdata.x <- grep("DAG36|G379", data.x, value = TRUE, invert = TRUE)
newdata.x
newdata.y <- grep("DAG36|G379", data.y, value = TRUE, invert = TRUE)
newdata.y
setdiff(newdata.x, newdata.y)
setdiff(newdata.y, newdata.x)
The data I want to transfer from Excel to R is a long row of numbers placed as so:
“312334-2056”, “457689-0932”, “857384-9857”,….,
There are about 350 of these numbers placed in their own separate cell along a single row.
I used the Excel formula = """" & A1 & """" to put double quotes around every number in order for R to read it properly.
At first I tried to simply copy/paste the data directly into a vector in R, but it's as if R won’t read it as a single row of data and therefore splits it up.
I also tried to save the excel file as a CSV file but that didn’t work either.
Lastly I tried to open it directly in to R using the command:
data.x<- read.csv(file.choose(), header=T)
But as I type in: data.x and press enter it simply says:
<0 rows> (or 0-length row.names)
I simply can’t figure out what I’m doing wrong. Any help would be greatly appreciated.
It's hard to assess without a reproducible example, but you should be able to transpose the Excel file into a single column. Then import using read_csv from the readr package. Take a look at the tidyverse package, which contains some great tools to import and work with this type of data.
I use https://github.com/tidyverse/readxl/. It makes it easy to maintain formatting from Excel into type-safe tibbles.
If you can share some sample data a working solution can be generated.
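A sketch of the readxl approach, assuming the roughly 350 IDs sit in a single row of the first sheet of a file named ids.xlsx (the file name is made up):

library(readxl)

ids <- read_excel("ids.xlsx", col_names = FALSE)  # one row, ~350 columns
ids <- as.character(unlist(ids[1, ]))             # flatten the row into a character vector

Note that the values do not need literal double quotes around them in Excel; hyphenated entries like 312334-2056 will typically come through as text either way.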

R: Capture output of regression in string [duplicate]

I have multiple regressions in an R script and want to append the regression summaries to a single text file output. I know I can use the following code to do this for one regression summary, but how would I do this for multiple?
rpt1 <- summary(fit)
capture.output(rpt1, file = "results.txt")
I would prefer not to have to use this multiple times in the same script (for rpt1, rpt2, etc.), and thus have separate text files for each result. I'm sure this is easy, but I'm still learning the R ropes. Any ideas?
You can store the summaries in a list and then capture the whole list with capture.output:
fit1 <- lm(mpg ~ cyl, data = mtcars)
fit2 <- lm(mpg ~ cyl + disp, data = mtcars)
myresult <- list(summary(fit1), summary(fit2))  # summaries, as the question asks
capture.output(myresult, file = "results.txt")
If you want multiple outputs sent to a file then look at the sink function; it will redirect all output to a file until you call sink again. The capture.output function actually uses sink internally.
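A minimal sketch of the sink approach, reusing the fits from above:

sink("results.txt")   # start redirecting console output to the file
print(summary(fit1))
print(summary(fit2))
sink()                # stop redirecting; output returns to the console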
You might also be interested in the txtStart function (and friends) in the TeachingDemos package which will also include the commands interspersed with the output and gives a few more options for output formatting.
Eventually you will probably want to investigate the knitr package for ways of running a set of commands in a batch and capturing all the output together, nicely formatted (and documented).

Save column descriptions when exporting SAS dataset to CSV

I've been given SAS data which I'd like to export to CSV, so that I can analyze it using R. I've never used SAS but I found Efficiently convert a SAS dataset into a CSV, which explains how to convert to CSV using code like this:
proc export data=sashelp.class
outfile='c:\temp\sashelp class.csv'
dbms=csv
replace;
run;
This works, but I've noticed that I end up with what I'll call short column names in the CSV, whereas I see long, verbose column descriptions when I look at the data in SAS (i.e. using the SAS software).
I'd like to programmatically save those column descriptions to a txt file, so that I can read them into an R vector. In other words, I'm happy having the short column names in my CSV header (i.e. the first line of my CSV), but I'd like to also have a second file, with only one line, containing the longer column descriptions. How do I do that? I googled and didn't notice anything helpful.
To give an example, the long column descriptions I see in SAS might be something like "Number of elephants in Tanzania", with a corresponding short column name of "ElephTanz".
You can use the SAS "dictionary" tables to access this kind of info. The following code creates a table work.column_labels that has two columns: the "short name" you're seeing and the longer label that appears when you view the data in SAS. (Note that the sashelp.class data doesn't happen to have labeled columns, so in this particular example the second column will be empty.)
proc sql;
    create table work.column_labels as
    select name, label
    from dictionary.columns
    where libname = 'SASHELP'
      and memname = 'CLASS';
quit;
Then you can export this table to a CSV using code similar to what you already have.
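Back in R, the exported labels can then be read into a vector (a sketch, assuming the label table was exported to c:\temp\column_labels.csv):

labs <- read.csv("c:/temp/column_labels.csv", stringsAsFactors = FALSE)
descriptions <- labs$Label   # one long description per short column name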

The woes of endless columns in .csv data in R

So I have a bunch of .csv files that were output by a simulation. I'm writing an R script to run through them and make a histogram of a column in each .csv file. However, the .csv is written in such a way that R does not like it. When I was testing it, I had been originally opening the files in Excel and apparently this changed the format to one R liked. Then when I went back to run the script on the entire folder I discovered that R doesn't like the format.
I was reading the data in as:
x <- read.csv("synch-imit-characteristics-2-tags-2-size-200-cost-0.1run-2-.csv", strip.white=TRUE)
Error in read.table(test, strip.white = TRUE, header = TRUE) :
more columns than column names
Investigating, I found that the original .csv file, which R does not like, looks different from the test one I had opened in Excel. I copied and pasted the first bit below after opening it in Notepad:
cost,0.1
mean-loyalty, mean-hospitality
0.9885449527316088, 0.33240076252915735
weight,1 of p1, 2 of p1,
However, in Notepad, there is no apparent formatting. In fact, between rows there is no space at all, i.e. it is cost,0.1mean-loyalty,mean-hospitality0.988544, etc. So it is weird to me as well that when I copy and paste it from Notepad it gets the desired formatting as above. Anyway, moving on, after I had opened it in Excel it got transformed to this:
cost,0.1,,,,,,,,
mean-loyalty, mean-hospitality,,,,,,,,
0.989771257,0.335847092,,,,,,,,
weight,1 of p1, etc...
So it seems like the data originally has no separation between rows (though I don't know how Excel, or copying and pasting, figures it out), but R doesn't pick up on this. Instead, it views it all as one row (and since I have 40,000+ rows, the header doesn't have nearly that many column names). I don't want to have to open and save every file in Excel. Is there a way to get R to read the data as desired?
Since when I copy and paste it from notepad it had new lines for the rows, it seems like I just need R to read it knowing that commas separate columns on the same row and a return separates rows. I tried messing around with all the sep="" commands I could find. But I can't figure it out.
To first solve the Notepad issue:
You must have CR (carriage return, \r) characters between the lines (and no LF, \n characters, causing Notepad to see it as one line).
Some programs accept this as well as a new line character, some don't.
You can, for example, use Notepad++ to replace all '\r' with '\n' or '\r\n', using Replace with the "Extended" option. First select View > Show Symbol > Show all characters, so you can see what you are doing.
Finally, to get back to R:
(As it was pointed out, R can actually handle CR as a newline)
read.csv assumes that you have non-empty header names in the first row, but instead you have:
cost,0.1
while later in the data you have a row with more than just two columns:
weight,1 of p1, 2 of p1,
This means that not all columns have a header name (and I wonder if 0.1 was supposed to be a header name anyway).
The two solutions can be:
- add a header including all columns, or
- as pointed out in a comment, use header=F (see the sketch after this list).
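A sketch of the second option, using the file name from the question (R treats the CR-only line endings as newlines, as noted above):

x <- read.csv("synch-imit-characteristics-2-tags-2-size-200-cost-0.1run-2-.csv",
              header = FALSE, strip.white = TRUE)

With header=FALSE every row is read as data, so the short first rows no longer have to supply column names; R names the columns V1, V2, and so on, and pads short rows with NA.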
