Controlling the width of R console - r

The documentation for the options function says that the width argument
controls the maximum number of columns on a line used in printing
vectors, matrices and arrays, and when filling by cat.
Columns are normally the same as characters except in East Asian
languages.
What does it mean by columns? Is it the number of characters?
With options(width=50), what does the 50 mean?
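A minimal sketch of the effect, assuming a session with plain ASCII output (where one column is one character): with width set to 50, print wraps a long vector so that no output line is wider than 50 columns. The variable names are illustrative only.

```r
# Save the current setting so it can be restored afterwards.
old <- options(width = 50)

# Capture what print() would write to the console for a long vector.
out <- capture.output(print(1:100))

# Width of each printed line, in display columns; none should exceed 50.
line_widths <- nchar(out, type = "width")
print(line_widths)

# Restore the previous width.
options(old)
```

Using type = "width" rather than the default character count matters for East Asian text, where a single character can occupy two columns.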

Related

R Text Mining: find correlations among words within a certain distance of a keyword

I have been following this example and was wondering whether it is possible to draw Figure 4.4 for combinations of words that are within 10 words of the keyword, instead of words that are right next to each other. For example, let's say I wanted to know which words commonly appear within 10 words of "sir"?
Sorry, my company has disabled copying/pasting text on your website so I can't post the code.
I don't know about the 10-word distance, but one option may be to calculate co-occurrences at the sentence level, for example with the udpipe::cooccurrence function.
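For the fixed-distance variant, a rough base R sketch is possible without udpipe. This is not the sentence-level approach mentioned above: it simply counts tokens that fall within a window around each occurrence of the keyword. The helper words_near, its arguments, and the sample sentence are all hypothetical.

```r
# Hypothetical helper: count words appearing within `window` tokens of `keyword`.
words_near <- function(text, keyword, window = 10) {
  # Lower-case, then split on anything that is not a letter or apostrophe.
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  hits <- which(tokens == keyword)
  # Collect every position inside the window around each hit, once.
  idx <- unique(unlist(lapply(hits, function(i) {
    seq(max(1, i - window), min(length(tokens), i + window))
  })))
  # Drop the keyword positions themselves.
  idx <- setdiff(idx, hits)
  sort(table(tokens[idx]), decreasing = TRUE)
}

# Made-up sample text, with a small window to keep the output short.
txt <- "Yes sir, the carriage is ready. The horses are waiting outside, sir."
counts <- words_near(txt, "sir", window = 2)
print(counts)
```

The resulting counts could then be filtered and plotted in the same way as the adjacent-word pairs.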

Create new row after a combination of numbers and letters

I want to clean a list of Titles that have numbers after them.
An excerpt of the list:
A Adaptive Behavior 1059-7123 1741-2633 Adaptive Human Behavior and Physiology 2198-7335 Addiction 0965-2140 1360-0443
So how can I tell R to start a new row not after a specific string of numbers, but after this general form: 'number number number number - number number number number'?
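One possible sketch, assuming the numbers are ISSN-like codes (four digits, a hyphen, then three digits and a final digit or X) and that a title may be followed by one or two of them: insert a line break after a code only when the next token is not another code. The variable names are illustrative.

```r
x <- "A Adaptive Behavior 1059-7123 1741-2633 Adaptive Human Behavior and Physiology 2198-7335 Addiction 0965-2140 1360-0443"

# Match a code followed by whitespace, unless another code comes next
# (negative lookahead), and replace that whitespace with a newline.
marked <- gsub("(\\d{4}-\\d{3}[\\dXx])\\s+(?!\\d{4}-)", "\\1\n", x, perl = TRUE)

# Split on the inserted newlines to get one entry per title.
rows <- strsplit(marked, "\n", fixed = TRUE)[[1]]
print(rows)
```

On the excerpt this yields three rows, each holding a title with its one or two codes.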

Printing out R Dataframe - Single Character Between Columns While Maintaining Alignment (Variable Spacing)

In a previous question, I received output for an R dataframe that had two aligned columns. The answer gave me the following output:
While the post answered my initial question, it seems as if the program I intend to use requires a text file in which the two columns are both aligned and separated by a single character (e.g. a tab). The previous solution instead results in a large and variable number of spaces between the first and second columns (depending on the length of the string in the first column for that particular row.) Inserting a single character, however, results in a misalignment of the columns.
Is there any way in which I can replace a large number of spaces with a single character that has variable spacing to 'reach' to the second column?
If it helps, this webpage contains a .txt file that you may download to see the intended output. Although it does not suffer from the problem of the first column having variable name lengths, it has a single 'space character' that separates the first and second columns. If I copy and paste this specific space character between columns 1 and 2, the program can successfully interpret the .txt file: the result is a single character separating the columns, with appropriate alignment.
As a further example, the first of the following pictures (note that the highlight is a single character) parses properly while the second does not:

How can I filter a wordlist based on multiple characters in shared positions?

I have a very large wordlist. How can I use Unix to find instances of multiple words fitting specific character-sharing criteria? For example, I want Words 1 and 2 to have the same fourth and seventh characters, Words 2 and 3 to have the same fourth and ninth characters, and Words 3 and 4 to have the same second, fourth, and ninth characters.
Example:
aaadiigjlf
abcdefghij
aswdofflle
bbbbbbbbbb
bisofmlwpa
fsbdfopkld
gikfkwpspa
hogkellgis
might return
abcdefghij
aaadiigjlf
fsbdfopkld
aswdofflle
Something like
grep '...d..g' somefile
would only work for specific characters, but I need it to work for any shared characters in certain positions; I don't have specific characters (like "d" and "g" as given in the example) in mind. Also, I'd like it to be able to return words that don't fit ALL of the criteria; e.g. in the example given, Words 1 and 4 share a fourth character, but not necessarily the second, seventh, and ninth. With the program I'm running in its finished form, I'm expecting it to return a very small list of words (probably only ten) based on nine strict character-sharing criteria.
EDIT: Due to some confusion in other forums, I've added this clarification. Here's the problem exactly how I was given it.
I am given a wordlist and told that there are ten ten-letter words in the list that can fit into a grid like so:
-112--3---
---2--3-4-
-5-2----4-
-5-2--6-4-
75-2--6---
75---8----
7----8----
79---8----
-9--0-----
-9--0---xx
Every word reads across. Every space with the same digit (and x) occupying it (all the 1s, all the 2s, etc.) is the same letter (different digits could potentially be the same letter, though not necessarily).

How to prevent R from dropping leading zeros in an integer vector

Is there any way of stopping R from dropping leading zeros in an integer? e.g.,
a<-c(00217,00007,00017)
I understand this is not the correct way of writing integers. Sadly I've been given a text file (person and non-R code are not around anymore) containing thousands of vectors in a single list:
list(drugA=c(...), drugB=c(....),........)
I need to keep the leading zeros, because otherwise 00002 becomes 2. I could load these thousands of values and then write a function to parse the list and convert the values into characters, correcting any number that isn't five characters long, but I was hoping for a speedier alternative.
UPDATE1
An example of the text file I've been provided:
list(CETUXIMAB=c(05142,05316),
DORNASEALFA=c(94074),
ETANERCEPT=c(05342,99075),
BIVALIRUDIN=c(04400,09177),
LEUPROLIDE=c(02074,03219,91035,91086),
PEGINTERFERONALFA2A=c(03162),
ALTEPLASE=c(00486,01032,03371,05314),
DARBEPOETINALFA=c(02217,03421),
GOSERELIN=c(99221),
RETEPLASE=c(00157),
ERYTHROPOIETIN=c(92078,92122))
I have truncated the list as there are thousands of vectors. This was a text file generated using a program written in C++ (code not available). Some of the values, e.g. RETEPLASE=c(00157), get truncated, so 00157 becomes 157.
# Pad each value of a (the vector defined above) back to five characters with leading zeros
library(stringr)
str_pad(a, 5, pad = "0")
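Padding only works when every code is known to be five digits long. An alternative sketch, assuming the file has exactly the list(NAME=c(...)) shape shown in the question, is to read it as text and quote the codes before R parses them, so the zeros are never dropped. The sample string reuses a few entries from the question; the variable names are illustrative.

```r
# A few lines in the same shape as the file shown in the question.
txt <- "list(CETUXIMAB=c(05142,05316),
PEGINTERFERONALFA2A=c(03162),
RETEPLASE=c(00157))"

# Wrap every digit run that directly follows "(" or "," in double quotes,
# so R parses the codes as strings. Digits inside names are left alone.
quoted <- gsub("(?<=[(,])\\s*(\\d+)", "\"\\1\"", txt, perl = TRUE)

# Evaluate the rewritten text; only do this with input you trust.
codes <- eval(parse(text = quoted))
print(codes$RETEPLASE)
```

For the real file, txt could be built with something like paste(readLines("codes.txt"), collapse = "\n"), where codes.txt is a placeholder file name.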
