How to read a plain text file of digits into an array in R

For example I have a file containing digits of pi, like this:
> 1415926535 8979323846 2643383279 5028841971 6939937510
> 5820974944 5923078164 0628620899 8628034825 3421170679
> 8214808651 3282306647 0938446095 5058223172 5359408128
> 4811174502 8410270193 8521105559 6446229489 5493038196
> 4428810975 6659334461 2847564823 3786783165 2712019091
I want to do some statistics on the numbers, but I couldn't figure out how to read all the digits into an array.
Thanks in advance!

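You can read the groups into a vector using scan. Read them as character strings rather than numbers, otherwise leading zeros (as in 0628620899) are dropped:
digits <- scan('digits.txt', what = character())
Or you can read them into a data.frame using read.table. Note that sep has to be named, because the second positional argument of read.table is header:
digits <- read.table('digits.txt', sep = ' ', colClasses = 'character')
To get the digits separated, you can first paste the groups from the scan vector together and then split the resulting sequence:
digits <- paste(digits, collapse='')
digits <- as.numeric(strsplit(digits, '')[[1]])
(This assumes the file is named digits.txt and is placed in your working directory.)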
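For completeness, here is a minimal self-contained sketch of the same idea that also runs a simple statistic on the result. It assumes a small made-up sample written to a temporary file (the file contents and the names path and txt are only for illustration), reads the file as text, strips everything that is not a digit, and splits the rest into single digits.

```r
# Write a small sample to a temporary file so the example runs on its own.
path <- tempfile(fileext = ".txt")
writeLines(c("1415926535 8979323846", "5820974944 0628620899"), path)

# Read the file as text, so leading zeros in a group are preserved.
txt <- paste(readLines(path), collapse = "")

# Drop spaces and any other non-digit characters, then split into single digits.
digits <- as.integer(strsplit(gsub("[^0-9]", "", txt), "")[[1]])

# Frequency of each digit, as a starting point for statistics.
counts <- table(digits)
```

After this, digits is an ordinary integer vector, so functions such as mean(digits) or table(digits) work on it directly.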

Related

Regular expression to split contents of a bracket: e.g. "Emulsifiers (322, 476)" into "Emulsifier 322" and "Emulsifier 476"?

I'm using R (stringr specifically), and I'm trying to work with food ingredient data. Is there a regular expression that I could use to take a string that consists of a word followed by a bracket and split it into several strings, each of which contains the word plus one of the bracket's contents?
For example, I may have the following string:
"Emulsifiers (322, 476)"
I want to split it into:
"Emulsifier 322"
"Emulsifier 476"
If they are all in the same format (e.g., "text [space] (number, number)"), then this should do it:
myfunc <- function(s) {
  # Split on the parentheses: piece 1 is the word, piece 2 the bracket contents
  parts <- strsplit(s, "\\(|\\)")[[1]]
  word <- trimws(parts[1])
  # Split the bracket contents on commas and strip the surrounding whitespace
  numbers <- trimws(strsplit(parts[2], ",")[[1]])
  paste(word, numbers)
}
s <- "Emulsifiers (322, 476)"
myfunc(s)
Assuming your input is s <- "Emulsifiers (322, 476)", then
r <- paste0(gsub("(.*?)\\(.*\\)","\\1",s), unlist(regmatches(s,gregexpr("\\d+",s))))
will give you
> r
[1] "Emulsifiers 322" "Emulsifiers 476"
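If you have a whole vector of ingredients, some of them without a bracket, a vectorised variant may be handy. This is only a sketch in base R: split_codes is a hypothetical helper name, and it assumes that the codes are runs of digits inside the first pair of parentheses. Like the answers above, it keeps the word exactly as written (so "Emulsifiers", not "Emulsifier").

```r
split_codes <- function(x) {
  # Word part: everything before the first opening parenthesis.
  words <- trimws(sub("\\(.*$", "", x))
  # Contents of the first (...) group, or "" when there is none.
  has_group <- grepl("\\(.*\\)", x)
  inside <- ifelse(has_group, sub("^[^(]*\\(([^)]*)\\).*$", "\\1", x), "")
  # All runs of digits inside the parentheses, one list element per input.
  codes <- regmatches(inside, gregexpr("[0-9]+", inside))
  # Pair each word with its codes; leave entries without codes unchanged.
  pieces <- Map(function(w, k) if (length(k) > 0) paste(w, k) else w,
                words, codes)
  unlist(pieces, use.names = FALSE)
}

result <- split_codes(c("Emulsifiers (322, 476)", "Salt"))
```

Because digits are only searched for inside the parentheses, a number in the word itself is not mistaken for a code.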

How to transform long names into shorter (two-part) names

I have a character vector of long names, each consisting of several words joined by dots.
x <- c("Duschekia.fruticosa..Rupr...Pouzar",
"Betula.nana.L.",
"Salix.glauca.L.",
"Salix.jenisseensis..F..Schmidt..Flod.",
"Vaccinium.minus..Lodd...Worosch")
The names differ in length, but only the first two words of each name are important.
My goal is to get names of up to 7 characters: the first 3 characters of each of the first two words, with a dot as the separator between them.
These examples are very close to my request, but I do not know how to apply their code to my case:
R How to remove characters from long column names in a data frame and
how to append names to " column names" of the output data frame in R?
What should I do to get output names that look like this?
x <- c("Dus.fru",
"Bet.nan",
"Sal.gla",
"Sal.jen",
"Vac.min")
Any help would be appreciated.
You can do the following:
gsub("(\\w{1,3})[^\\.]*\\.(\\w{1,3}).*", "\\1.\\2", x)
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
First we match up to 3 characters (\\w{1,3}), then skip anything that is not a dot ([^\\.]*), match a dot (\\.), and then again match up to 3 characters (\\w{1,3}). Finally we match anything that comes after that (.*). We then keep only the two groups captured in parentheses and separate them with a dot (\\1.\\2).
Split on dot, substring 3 characters, then paste back together:
sapply(strsplit(x, ".", fixed = TRUE), function(i){
paste(substr(i[ 1 ], 1, 3), substr(i[ 2], 1, 3), sep = ".")
})
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
Here is a less elegant solution than kath's, but one that is a bit easier to read if you are not an expert in regex.
# Your data
x <- c("Duschekia.fruticosa..Rupr...Pouzar",
"Betula.nana.L.",
"Salix.glauca.L.",
"Salix.jenisseensis..F..Schmidt..Flod.",
"Vaccinium.minus..Lodd...Worosch")
# A function that takes three characters from first two words and merges them
cleaner_fun <- function(ugly_string) {
words <- strsplit(ugly_string, "\\.")[[1]]
short_words <- substr(words, 1, 3)
new_name <- paste(short_words[1:2], collapse = ".")
return(new_name)
}
# Testing function
sapply(x, cleaner_fun, USE.NAMES = FALSE)
# [1] "Dus.fru" "Bet.nan" "Sal.gla" "Sal.jen" "Vac.min"
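One thing to watch with any of these approaches: two different species can collapse to the same short name. The sketch below uses a made-up vector (Salix.glabra is added only to force a clash) and shows one possible way to detect and resolve duplicates with anyDuplicated and make.unique. The "_" separator is an arbitrary choice.

```r
# Illustrative input: the first two names shorten to the same string.
x <- c("Salix.glauca.L.", "Salix.glabra.L.", "Betula.nana.L.")

# Same idea as the regex answer: first 3 characters of the first two words.
short <- sub("^(\\w{1,3})[^.]*\\.(\\w{1,3}).*$", "\\1.\\2", x)

# Index of the first duplicated entry, or 0 if all short names are distinct.
first_dup <- anyDuplicated(short)

# Append a numeric suffix to repeated names so they can serve as column names.
short_unique <- make.unique(short, sep = "_")
```

Whether a suffix is acceptable depends on what the names are used for; it breaks the 7-character limit, so a manual rename may be preferable.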

Reading tables with strings containing the separator in R

I got a text file with data I want to read, but one of the columns is a messy "code" which contains the same character used as the separator. Take the following set as an example:
number:string
1:abc?][
2:def:{+
There will be a line with 3 columns and only 2 column names.
Is there any strategy to read this dataset?
Read the file a line at a time, split each line into two parts on the first ":", and bind the parts into a data frame. The column names get lost, but you can put them back easily enough. You need the stringr and readr packages:
> do.call(rbind.data.frame,stringr::str_split(readr::read_lines("seps.csv",skip=1),":",2))
c..1....2.. c..abc.......def.....
1 1 abc?][
2 2 def:{+
Here with stringr and readr attached for readability, with the names fixed:
> library(stringr)
> library(readr)
> d = do.call(rbind.data.frame,str_split(read_lines("seps.csv",skip=1),":",2))
> names(d) = str_split(read_lines("seps.csv",n_max=1),":",2)[[1]]
> d
number string
1 1 abc?][
2 2 def:{+
Good old regular expressions should help you with this.
Read the text file:
df <- read.table("pathToFile/fileName.txt", header = TRUE)
The data.frame will have one column, so we need to split it based on some pattern.
Create the columns:
df$number <- sub("([0-9]+):.*", "\\1", df[, 1])
df$string <- sub("[0-9]+:(.*)", "\\1", df[, 1])
df <- df[, c("number", "string")]
View(df)
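If you would rather avoid extra packages and also avoid read.table interpreting quote or comment characters inside the messy code, a base R sketch along the same lines is to read raw lines and cut each one at its first ":". The temporary file and the variable names are only for illustration, and the sketch assumes every line contains at least one ":".

```r
# Recreate the example file so the snippet runs on its own.
path <- tempfile(fileext = ".txt")
writeLines(c("number:string", "1:abc?][", "2:def:{+"), path)

lines <- readLines(path)

# Position of the first ":" in each line (fixed = TRUE: no regex involved).
pos <- regexpr(":", lines, fixed = TRUE)

# Everything before the first ":" and everything after it.
left  <- substr(lines, 1, pos - 1)
right <- substr(lines, pos + 1, nchar(lines))

# First line holds the column names, the rest is data.
d <- data.frame(left[-1], right[-1], stringsAsFactors = FALSE)
names(d) <- c(left[1], right[1])
d$number <- as.integer(d$number)
```

Any further ":" characters stay inside the string column untouched.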

Add commas to output

How can I obtain my R output (let us say the elements of a vector) separated by commas (i.e., not by spaces)?
Currently I can only get output separated by spaces.
Try this example:
#dummy vector
v <- c("a","1","c")
#separated with commas
paste(v,collapse=",")
#output
#[1] "a,1,c"
EDIT 1:
Thanks to @DavidArenburg:
cat(noquote(paste(v,collapse=",")))
a,1,c
EDIT 2:
Another option, by @RichardScriven:
cat(v, sep = ",")
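To put the options side by side, here is a short sketch. It only uses base R functions; the difference between them is whether you get a string back or text printed to the console, and whether a space follows each comma.

```r
v <- c("a", "1", "c")

# One string, comma-separated, no spaces.
joined <- paste(v, collapse = ",")

# toString() is a shortcut that uses ", " (comma plus space) as the separator.
spaced <- toString(v)

# Print without quotes or the [1] prefix; writeLines adds a trailing newline.
writeLines(joined)

# cat() prints directly but adds no newline, so append one yourself.
cat(v, sep = ",")
cat("\n")
```

Use paste or toString when you need the result as a value, and cat or writeLines when you only want to display it.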

How to display numeric columns in an R dataframe without scientific notation ('e+07')

I have an R dataframe with one column containing a string of numbers, but I would like to treat them as a factor (mainly to stop R shortening the numbers using e+04 etc.). One way I have found to fix this problem is to edit the csv file the data is taken from, add a dummy entry that has a word in the desired column, and then reimport it. How do I get this effect using R functions without messing around with the csv?
To clarify, my dataframe looks like this:
pNum,Condition,numberEntered
1,2,5.0970304e+07
I want to change the data type of numberEntered from numeric to factor and get rid of the pesky e+07.
As Joshua said, it is a printing issue, not a storage issue. You can change the way all numbers are printed by adjusting the scipen option (see getOption("scipen")).
x <- c(1, 2, 509703045845, 0.0001)
print(x)
options(scipen = 50)
print(x)
Alternatively, you may wish to change the way just those numbers are formatted. (This converts them to character.) It is worth getting to know format and formatC. To get you started, compare
format(x)
format(x, digits = 10)
format(x, digits = 3)
format(x, digits = 3, scientific = 5)
format(x, trim = TRUE, digits = 3, scientific = 5)
formatC(x)
formatC(x, format = "fg")
formatC(x, format = "fg", flag = "+")
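For the single value from the question, a few concrete calls may help. This is just a sketch with base R functions; each of them returns a character string, so use them for display only and keep the numeric column for calculations.

```r
n <- 5.0970304e+07

# Force fixed notation for this one value.
plain <- format(n, scientific = FALSE)

# Fixed notation with a thousands separator.
grouped <- format(n, big.mark = ",")

# C-style formatting with zero decimal places.
c_style <- sprintf("%.0f", n)
```

Here plain and c_style are both "50970304", and grouped is "50,970,304".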
Sorry to say, but you've been spending time trying to fix a problem that doesn't exist. Use str to check the types of data in your data.frame and you'll see that numberEntered is num and it isn't being "shortened". The only issue is the number of significant digits being printed.
options(digits=7)
(x <- data.frame(pNum=1,Condition=2,numberEntered=509703045845))
options(digits=10)
x
You can use options(digits=22) to set it to print the maximum number of significant digits. See ?options for more information.
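Since options() changes printing for the whole session, it can be worth restoring the old settings afterwards. The sketch below assumes an illustrative value larger than the one in the question (so that the default print really does switch to scientific notation) and captures the printed output to compare both views. The data frame and variable names are made up.

```r
# Illustrative data: a value large enough to print as e+15 by default.
d <- data.frame(pNum = 1, Condition = 2, numberEntered = 5.0970304e+15)

# The column is stored as a number regardless of how it is printed.
stored_as_number <- is.numeric(d$numberEntered)

# options() returns the previous values, which makes restoring easy.
old <- options(scipen = 0, digits = 7)
default_view <- capture.output(print(d))   # uses scientific notation

options(scipen = 50)
fixed_view <- capture.output(print(d))     # uses fixed notation

# Put the session back the way it was.
options(old)
```

The stored value is identical in both cases; only the text produced by print differs.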
I would advise against storing floating-point numbers as factors, but you can still do it. I have also included several other options.
> txt <- "pNum,Condition,numberEntered
+ 1,2,5.0970304e+07"
> dat <- read.csv(textConnection(txt),colClasses=c("integer","integer","factor"))
> dat
pNum Condition numberEntered
1 1 2 5.0970304e+07
> dat[,3]
[1] 5.0970304e+07
Levels: 5.0970304e+07
> dat <- read.csv(textConnection(txt),colClasses=c("integer","integer","character"))
> dat[,3]
[1] "5.0970304e+07"
> dat <- read.csv(textConnection(txt),colClasses=c("integer","integer","numeric"))
> dat[,3]
[1] 50970304
> print.numeric <- function(...) formatC(...,format="f")
> print(dat[,3])
[1] "50970304.0000"
