R: Add paste() elements to file - r

I'm using base::paste in a for loop:
for (k in 1:length(summary$pro))
{
if (k == 1)
mp <- summary$pro[k]
else
mp <- paste(mp, summary$pro[k], sep = ",")
}
mp comes out as one big string, where the elements are separated by commas.
For example mp is "1,2,3,4,5,6"
Then, I want to put mp in a file, where each of its elements is added to a separate column in the same row. My code for this is:
write.table(mp, file = recompdatafile, sep = ",")
However, mp just appears in the CSV as one big string as opposed to being divided up. How can I achieve my desired format?
FYI
I've also tried converting mp to a list, and strsplit()-ing it, neither of which have worked.
Once I've added summary$pro to the file, how can I also add summary$me (which has the same format), in one row with multiple columns?
Thanks,
n.i.

If you want to write something to a file, write.table() isn't the only way. If you want to avoid headers and quotes and such, you can use the more direct cat. For example
cat(summary$pro, sep=",", file="filename.txt")
will write out the vector of values from summary$pro separated by commas more directly. You don't need to build a string first. (And building a string one element at a time as you did above is a bad practice anyway. Most functions in R can operate on an entire vector at a time, including paste).
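To cover the follow-up about adding summary$me as a second row, here is a minimal sketch using the same cat approach with append = TRUE. The `summary` list and the temporary output file are hypothetical stand-ins for the asker's data and for recompdatafile.

```r
# Hypothetical stand-in for the asker's `summary` object
summary <- list(pro = 1:6, me = c(2.5, 3.5, 4.5, 5.5, 6.5, 7.5))
outfile <- tempfile(fileext = ".csv")  # stand-in for recompdatafile

# First row: one value per column, then a newline
cat(summary$pro, sep = ",", file = outfile)
cat("\n", file = outfile, append = TRUE)

# Second row: append = TRUE adds to the file instead of overwriting it
cat(summary$me, sep = ",", file = outfile, append = TRUE)
cat("\n", file = outfile, append = TRUE)

rows <- readLines(outfile)

# If a single string is still needed, paste can build it without a loop
mp <- paste(summary$pro, collapse = ",")
```

Each cat call writes one vector as one comma-separated row, so the file ends up with one row per vector.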

Related

How to use loop function to rename multiple files' name after read them in R?

I have multiple files in the folder "rawdata". After reading them and assigning each one to a separate dataset, I also want to rename them from "dataset 1 a.csv" to "dataset1".
I wrote code that achieves the first goal: the first loop reads all files into a list, and the second loop unpacks the list ldf. But I don't know where I should add the code to make R change all the names at once. I tried adding str_replace_all(names(ldf), " ", "-") in different places, but every attempt returned the wrong output, and it does not solve the problem of getting rid of ".csv". Thanks a lot.
Here is my code:
datanames<-list.files(here("rawdata"))
ldf<-list()
for (i in (datanames)){
ldf[[i]]<-import(here("rawdata",i))
for (j in names(ldf)){
assign(j,ldf[[j]], .GlobalEnv)
}
}
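I'm not sure of the pattern in the names you want to replace, but if it is blank-number-blank-letter.csv, use gsub to remove it. You then appear to want to add the index to the name, so use paste0 with the index i.
I'm not sure how you will import the files, but you can use read.csv.
assign will bind the data to that name. Note that it needs envir = .GlobalEnv here, otherwise the objects are created inside the loop function and disappear when it returns.
files <- list.files(pattern = "\\.csv$")
invisible(lapply(seq_along(files), function(i)
  assign(paste0(gsub(" [0-9] [a-z]\\.csv$", "", files[i]), i),
         read.csv(files[i]), envir = .GlobalEnv)))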
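As an alternative sketch, the names can be derived from the file names themselves and kept on a named list instead of being assigned into the global environment. The two file names and the temporary folder below are made up for illustration. The cleanup rule (drop the extension, drop a trailing single letter, remove spaces) is an assumption about the naming pattern.

```r
# Create two hypothetical input files in a temporary folder
rawdir <- file.path(tempdir(), "rawdata_demo")
dir.create(rawdir, showWarnings = FALSE)
files <- c("dataset 1 a.csv", "dataset 2 b.csv")
for (f in files) {
  write.csv(data.frame(x = 1:3), file.path(rawdir, f), row.names = FALSE)
}

# "dataset 1 a.csv" -> "dataset 1 a" -> "dataset 1" -> "dataset1"
clean_name <- function(f) {
  base <- tools::file_path_sans_ext(f)  # drop ".csv"
  base <- sub(" [a-z]$", "", base)      # drop the trailing " a"
  gsub(" ", "", base)                   # remove remaining spaces
}

datanames <- list.files(rawdir, pattern = "\\.csv$")
ldf <- lapply(file.path(rawdir, datanames), read.csv)
names(ldf) <- clean_name(datanames)
```

The datasets are then available as ldf$dataset1, ldf$dataset2, and so on; list2env(ldf, envir = .GlobalEnv) would turn them into separate objects if that is really needed.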

Adding new column and its value based on file name

I have 10 data files in my current directory such as data-01, data-02, data-03, data-04 till data-10.
Each of these data files has a few hundred rows with 4 fields. I would like to add a new column name "ID" and keep its ID like 01 (for data file data-01) for all the rows in that file.
A base R solution using a loop would go like this:
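df <- NULL
for (x in list.files(pattern = "\\.csv$")) {  # pattern is a regex, not a glob
u <- read.table(x)
u$Label <- factor(x)  # label every row with its source file name
df <- rbind(df, u)
cat(x, "\n")  # print each file name to trace read problems
}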
This depends on your data files having the same number of columns (though you can get around that inside the loop by selecting which columns you need before rbind), and you can set the pattern to whichever file type you are looking at. The cat is useful because you can better trace read problems (because there are always problems). I bet there is a better way to do this with apply as well.
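A possible lapply version, sketched under the assumption that the files are named data-01, data-02, ... with no extension and share the same columns. The demo files, their contents, and the helper read_one are invented for illustration. Here the ID is the part of the file name after "data-", kept as text so the leading zero survives.

```r
# Build two small hypothetical data files in a temporary folder
datadir <- file.path(tempdir(), "id_demo")
dir.create(datadir, showWarnings = FALSE)
for (id in c("01", "02")) {
  write.table(data.frame(a = 1:2, b = 3:4, c = 5:6, d = 7:8),
              file.path(datadir, paste0("data-", id)), row.names = FALSE)
}

# Read one file and tag every row with the ID taken from its name
read_one <- function(path) {
  u <- read.table(path, header = TRUE)
  u$ID <- sub("^data-", "", basename(path))  # "data-01" -> "01"
  u
}

files <- list.files(datadir, pattern = "^data-[0-9]+$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read_one))
```

Reading everything into a list and calling rbind once also avoids growing the data frame on every iteration.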

Read in data from .txt file (no header, no separator)

I have a large dataset (~ 200MB) stored in a .txt-file which I need to read into R. Unfortunately there are no separators (like " " or ",") between the values of the variables and there is no header file.
But there is a codebook, which gives the variable names and also specifies which column belongs to which variable. Some of the variables take one column of space, some take more (so read.fwf won't work); but their width is the same for all cases.
I possibly only have to read in a few of these variables, so I expect that I will just have to select the necessary columns and name the variables. What would be an elegant solution to do this (and maybe even preselect meaningful variable types)?
You can load the data as is and then parse each line, for example with strsplit and an appropriate regular expression.
con <- file("yourfile.txt", open = "r")
lines <- readLines(con)
close(con)
Iterate over the lines, apply strsplit to each one, and add the result to your data table with rbind.
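Since the codebook gives fixed character positions, one way to parse the lines is by position with substring rather than by a separator. This is only a sketch: the two-line sample file and the `codebook` table (names, start and stop positions) are invented, and only the variables listed in the codebook are extracted.

```r
# Hypothetical fixed-width file: chars 1-4 = id, 5-6 = unused, 7 = flag
path <- tempfile(fileext = ".txt")
writeLines(c("1234AB5", "6789CD0"), path)

# Hypothetical codebook listing only the variables we want to keep
codebook <- data.frame(name  = c("id", "flag"),
                       start = c(1, 7),
                       stop  = c(4, 7),
                       stringsAsFactors = FALSE)

lines <- readLines(path)

# substring is vectorised over lines, so each variable is cut in one call
cols <- Map(function(s, e) substring(lines, s, e),
            codebook$start, codebook$stop)
dat <- as.data.frame(setNames(cols, codebook$name), stringsAsFactors = FALSE)

# Let R pick sensible column types (here both become integer)
dat <- type.convert(dat, as.is = TRUE)
```

Because every variable is extracted for all lines at once, there is no per-line rbind.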

readline is considering every record in the spreadsheet as a new line [R]

I am trying to create a function that will calculate the frequency count of keywords using the tm package. The function works fine if the text pasted into readline is free-form text without a newline. The problem is that when I paste a bunch of text copied from a spreadsheet, readline treats each record as a new line.
keyword <- function() {
x <- readline(as.character('Input text here: '))
x <- Corpus(VectorSource(x))
...
tdm <- TermDocumentMatrix(x)
...
tdm
}
Here's the full code: https://github.com/CSCDataAnalytics/PM-Analysis/blob/master/Keyword.R
How can I prevent this from happening or at least consider a bunch of text of every row from the spreadsheet as one vector only?
If I'm understanding you correctly, the problem is when the user pastes the text from another application: the newline is causing R to stop accepting the subsequent lines.
One technique (fragile as it may be) is to look for a specific line, such as an empty line "" or a period ".". It's a little fragile because now you need (1) assurance that the data will "never" include that as a whole line, and (2) it is easily appended by the user.
Try:
endofinput <- ""
totalstr <- ""
while(! endofinput == (x <- readline('prompt (empty string when done): ')))
totalstr <- paste(totalstr, x)
In this case, the empty string is the catch, and when the while loop is done, totalstr contains all input separated by a space (this can be changed in the paste function).
NB: one problem with this technique is that it is "growing" the vector totalstr, which will eventually cause performance penalties (depending on the size of the input data): every loop iteration, more memory is allocated and the entire string is copied plus the new line of text. There are more verbose ways to side-step this problem (e.g., pre-allocate a vector larger than your anticipated input data), but if you aren't anticipating 1000s of lines then you may be able to accept this naive programming for simplicity.
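One way to avoid re-pasting the whole string on every iteration is to collect the lines as separate elements and join them once at the end. The sketch below is an assumption-laden variant of the loop above: the function name read_until_blank and its read_line argument are made up, and the reader is passed in only so the function can be tried without a console.

```r
# Collect input lines until an empty one, then join them once.
# `read_line` defaults to an interactive readline prompt.
read_until_blank <- function(
    read_line = function() readline("prompt (empty line when done): ")) {
  chunks <- character(0)
  repeat {
    x <- read_line()
    if (identical(x, "")) break           # empty line is the end marker
    chunks[length(chunks) + 1] <- x       # store the line as its own element
  }
  paste(chunks, collapse = " ")           # single paste at the end
}

# Non-interactive demo: a fake reader that replays three "pasted" lines
fake_reader <- local({
  inputs <- c("first row", "second row", "")
  i <- 0
  function() {
    i <<- i + 1
    inputs[i]
  }
})
result <- read_until_blank(fake_reader)
```

In an interactive session, calling read_until_blank() with no arguments prompts the user as before.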
Another option would be to have the user save the data to a text file and use file.choose() and readLines() to get your data.
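A minimal sketch of the file-based route, assuming the spreadsheet rows were saved one per line in a plain text file. The helper name read_as_one_string and the demo file are hypothetical.

```r
# Read every line of a text file and collapse them into one string
read_as_one_string <- function(path) {
  paste(readLines(path), collapse = " ")
}

# Demo with a temporary file standing in for the user's saved data
demo_file <- tempfile(fileext = ".txt")
writeLines(c("row one", "row two", "row three"), demo_file)
text <- read_as_one_string(demo_file)

# Interactive use would let the user pick the file:
# text <- read_as_one_string(file.choose())
```

The resulting single string can then be passed to VectorSource as one document.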
Try collapsing the data into a single string after using readline
x <- paste(readline(as.character('Input text here: ')), collapse=' ')

span across columns with hwrite

Is it possible to span a heading across multiple columns with hwrite (or any other HTML-creating package)? I can sort of fake it with dataframe pieces nested within a larger table, but it's not quite a real span (and it looks ugly).
I did not see a version of this in the examples, but maybe one exists elsewhere.
Thanks,
Tom
Edit: I should add that the print.xtable method does html, also (I shouldn't assume that is known). Use the type = "html" option.
No experience with html, but I do the following with LaTeX.
In the xtable package, the print.xtable method has an option add.to.row that allows you to do just that. For add.to.row you add a list-of-lists, where the first list is a list of row numbers and the second list is a list of commands to insert at that spot. From the ?print.xtable:
add.to.row -- a list of two components. The first component (which should be called 'pos') is a list contains the position of rows on which extra commands should be added at the end, The second component (which should be called 'command') is a character vector of the same length of the first component which contains the command that should be added at the end of the specified rows. Default value is NULL, i.e. do not add commands.
For LaTeX I use the following homemade command that adds a "(1)" above the coefficient and t-stat columns.
my.add.to.row <- function(x) {
first <- "\\hline \\multicolumn{1}{c}{} & "
middle <- paste(paste("\\multicolumn{2}{c}{(", seq(x), ")}", sep = ""), collapse = " & ")
last <- paste("\\\\ \\cline {", 2, "-", 1 + 2 * x, "}", collapse = "")
string <- paste(first, middle, last, collapse = "")
list(pos = list(-1), command = string)
}
HTH.
I can't see an obvious way of generating a table with headers that cross multiple columns. Here's a really awful hack that might solve your problem though.
Generate your table as normal.
In the source code for that page, the first row of the table will look something like
<td someattribute="somevalue">First column name</td><td someattribute="somevalue">Second column name</td>
You can read the file into R, either with htmlTreeParse from the XML package, or plain old readLines.
Now replace the offending bit of html with the correct value. The stringr package may well help here.
<td someattribute="somevalue" colspan="2">Column name spanning two columns</td>
And write back out to file.
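The read, replace, write steps can be sketched in base R as follows. The small HTML table, the class="h" attribute, and the temporary file are invented for the demo; a real hwrite page will have different attributes, so the search string must be copied from the actual page source.

```r
# Hypothetical generated page with a two-cell header row
html_file <- tempfile(fileext = ".html")
writeLines(c(
  "<table>",
  "<tr><td class=\"h\">First column name</td><td class=\"h\">Second column name</td></tr>",
  "<tr><td>1</td><td>2</td></tr>",
  "</table>"
), html_file)

html <- readLines(html_file)

# The two header cells to replace, and the single spanning cell
old <- "<td class=\"h\">First column name</td><td class=\"h\">Second column name</td>"
new <- "<td class=\"h\" colspan=\"2\">Column name spanning two columns</td>"

# fixed = TRUE treats `old` as literal text rather than a regular expression
html <- sub(old, new, html, fixed = TRUE)

writeLines(html, html_file)
patched <- readLines(html_file)
```

Using fixed = TRUE avoids having to escape characters in the HTML that would otherwise be read as regex syntax.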
