Turning text into a paginated two-column format and piping it into less - unix

I want to read a long text file in two-column format on my terminal. This means that the columns must be page-aware, so that text at the bottom of the first column continues at the top of the second column, but text at the bottom of the second column continues at the beginning of the first column after a page-down.
I tried column and less to get this result, but with no luck. If I pipe the text into column, it produces two columns but truncates the text before it reaches the end of the file. And if I pipe the output of column into less, the output reverts to a single column.
a2ps does what I want in the way of reformatting, but I would rather have the output in pure plain text, readable from the terminal, rather than a PostScript file that I would need to read in a PDF reader.

You can use pr for this, e.g.:
ls /usr/src/linux/drivers/char/ | pr -2 | less
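If the default page size doesn't match your terminal, pr can be told the geometry explicitly; as a sketch using the standard -w (page width) and -l (page length) options together with tput, which reports the current terminal size:
ls /usr/src/linux/drivers/char/ | pr -2 -w "$(tput cols)" -l "$(tput lines)" | less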

Related

Cleaning a column with break spaces in "last, first" names so I can filter it in my data frame

I'm stumped. I want to grab specific names from a given column. However, when I try to filter for them, I get most of the names but miss a few, even though I can clearly see those names in the original Excel file. I think it has to do with some sort of special character or spacing in the name column, and I am not sure how to fix it.
I have tried applying Excel's CLEAN() function to the column and building an Alteryx flow to clean the data. None of these steps has helped. I am starting to wonder if this is an R issue.
surveyData %>% filter(`Completed By` == "Spencer,(redbox with whitedot in middle)Amy")
surveyData %>% filter(`Completed By` == "Spencer, Amy")
In R, the first line has this red box with a white dot between the comma and the first name. I got that character by copying the name from the data frame into Notepad and then pasting it into R. This version actually works and returns what I want. The second case uses a standard space, and it does not return what I want. So how can I fix this without having to copy each name from the data frame to Notepad and then from Notepad into R just to capture the red box with a white dot between the comma (,) and the first name?
The expected result is that I get the rows attached to whatever name I filter by.
I was able to find the answer: the space is actually a no-break space, Unicode U+00A0, rather than the normal space, U+0020. The no-break space is not part of the American Standard Code for Information Interchange (ASCII), so R's filter() couldn't match some names because they contained it. I fixed this by substituting the normal space for the no-break space across the given column. Example below:
space_fix = gsub("\u00A0", " ", surveyData$`Completed By`, fixed = TRUE) # replace the no-break space (U+00A0) with a regular space in the column of interest
surveyData$`Completed By Clean` = space_fix
Once I applied this, I could easily filter any name!
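If you want to see which rows are affected before fixing them, a quick check along the same lines (column name as in the question):
affected <- grepl("\u00A0", surveyData$`Completed By`, fixed = TRUE) # TRUE for rows containing a no-break space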
Thanks everyone!

R: Reading a badly formatted CSV with uneven quotes and separators in fields

I have a badly formatted csv file (I did not make it) that includes both separators and broken quotes in some fields. I would like to read this into R.
Three lines of the table look something like this:
| ids  | info            | text                                   |
| id 1 | extra_info;1998 | text text text                         |
| id 2 | extra_info2     | text with broken dialogues quotes "hi! |
# the same table as an R string could be
string <- "ids;info;text\n\"id 1\";\"extra_info;1998\";\"text text text\"\n\"id 2\";extra_info2;\"text with broken dialogues quotes \"hi!\" \n"
With " quotes surrounding any field with more than one word as is common in csv-s, and semicolon ; used as a separator. Unfortunately the way it was built, the last column (and it is always last), can contain a random number of semicolons or quotes within a text bulk, and these quotes are not always escaped.
I'm looking for a way to read this file. So far I have come up with a really complicated workflow to replace the first N separators with another less used separator when they are in the beginning of line with regex (from here) - because text is always last, however this still fails currently when there is an uneven number of quotes in the line.
I'm thinking there must be an easier way to do this, as badly formed csv-s should be a reoccurring problem here. Thanks.
data.table::fread works wonders:
library(data.table)
test <- fread("test.csv")
# Remove extraneous columns
test$V1 <- NULL
test$V5 <- NULL
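For reference, a minimal sketch of the same call applied to the in-memory sample string from the question, using fread's text argument instead of a file path; with unbalanced embedded quotes, fread may fall back to a quote-free parse and emit a warning:
library(data.table)
# parse the sample string directly; the separator is the semicolon
test <- fread(text = string, sep = ";")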

R - Extract multiple tables from text file

I have a .txt file containing text (which I don't want) and 65 tables, as shown below (just the top of the .txt file)
Does anyone know how I can extract only the tables from this text file, such that I can open the resulting .txt file as a data.frame with my 65 tables in R? Above each table is a fixed number of lines (starting with "The result of abcpred on seq..." and ending with "Predicted B cell epitopes"), and below each of them is a variable number of lines, depending on how many rows each table has. Then comes the next table, and so on until the 65th table.
Given that the tables are the only elements that start with numbers, grepping for digits at the beginning of the line is indeed the best solution. Using the shell (and not R), the command:
grep '^[0-9]' input > output
did exactly what I wanted.
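If you would rather stay in R, an equivalent sketch (assuming the same input and output file names as above):
lines <- readLines("input")
# keep only the lines that start with a digit, i.e. the table rows
writeLines(grep("^[0-9]", lines, value = TRUE), "output")
The output file can then be read into a data.frame with read.table.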

Split text file into paragraph files in R

I'm trying to split a huge .txt file into multiple .txt files containing just one paragraph each.
Let me provide an example. I would need a text like this:
This is the first paragraph. It makes no sense because it is just an example.
This is a second paragraph, as meaningless as the previous one.
Saved as two independent .txt files containing the first paragraph (the first file) and the second paragraph (the second file).
The first file would have only: "This is the first paragraph. It makes no sense because it is just an example."
And the second one: "This is a second paragraph, as meaningless as the previous one."
And the same for the whole text. In the huge .txt file, paragraphs are divided by one or several empty lines. Ideas?
Thank you very much!
I created a three-paragraph example and am using your comment here to recreate what I think you're describing.
text <- "This is the first paragraph. It makes no sense because is just an example. Nothing makes sense and I'm trying to understand what I'm doing with life. This paragraph does not seem to end.
What are we doing here.
This a second paragraph, as meaningless as the previous one.
There's too much to do - this is meaningless though.
Wow, that's funny."
paras <- unlist(strsplit(text, "\n\n"))
for (i in 1:length(paras)) {
write.table(paras[i], file = paste0("paragraph", i, ".txt"), row.names = F)
}
This code first assigns the value to the variable text, then uses the strsplit function with the pattern "\n{2,}" to split the text at each run of blank lines.
Then, a for loop is used to go through each element and save it into a separate .txt file.
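To run this on the actual file, the whole text can be read into a single string first; a sketch, assuming the file is called "big.txt" (a hypothetical name):
text <- paste(readLines("big.txt"), collapse = "\n")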

How do I parse a text file for the line after a phrase in R?

I have a large text file with 40,000+ sections of output. Each output section is ~150 lines. I want one number from each section to put in a vector. The section I want to parse is shown below.
Min ChiSq_Th: ith_cs ith_rk
-1 1
chisq_th chisq_th_min chisq_th_max ftmp_imv fstp_imv
0.149282D+05 0.200268D+05 0.200268D+05 0.100000D+01 0.100000D+00
I need the number below chisq_th in each sweep. I tried taking every 152nd line, but not every sweep is exactly the same length. I know R is not the ideal platform for this problem, but it is the language I know best.
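One way to approach it in R, as a hedged sketch: this assumes the output is in a file called "output.txt" (a hypothetical name) and that the value you want is always the first field on the line immediately after each header line beginning with chisq_th.
lines <- readLines("output.txt")
# locate the chisq_th header line inside each section
hdr <- grep("^\\s*chisq_th\\s+chisq_th_min", lines)
# take the first whitespace-separated field of the following line
vals <- sapply(strsplit(trimws(lines[hdr + 1]), "\\s+"), `[`, 1)
# Fortran-style D exponents ("0.149282D+05") must become E before as.numeric
chisq <- as.numeric(sub("D", "E", vals, fixed = TRUE))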
