Formatting with sprintf in R

I am very new to this site and I am having an issue using sprintf in R. Briefly, what I am trying to do is the following:
I need to create a text file with a header (which is space-delimited and must keep that exact spacing), below which I need to write some numbers arranged in X rows. (Depending on the data, I will read a big table of thousands of lines, but each ID will have a variable number of rows; this is not a major problem, as I can loop through them.) My problem is that I cannot align the numbers below the header.
setwd("C:\\Example\\formatting")
My data are in CSV format, so I read them with:
s100 = read.csv("example.csv", header = TRUE)
Then I take the columns I am interested in and transform them like this:
SID1 = as.vector(as.matrix(s100$Row1))
SID2 = as.vector(as.matrix(s100$Row2))
SID3 = as.vector(as.matrix(s100$Row3))
SIDN = as.vector(as.matrix(s100$RowN))
Then I have the following (do not worry about the letters; up to a certain point that part is really easy. I got stuck at the end, when I need to read the SID values):
sink("Example.xxx", append = T)
cat("*Some description goes here\n")
This goes on and on until I need to write the numbers. So, when I arrive at this piece:
cat("# SAB SSD TAR.....", sep = "\n")
I now need the numbers aligned under SAB, SSD, TAR, and so on.
So, I do the following (I first tried with only one column and one header):
cat("SAB ", sep = "\n")
cat(sprintf("%s\n", SID1, sep="\n" ))
But what I get in the end is the following:
SAB
0.30
0.40
0.50
Instead of
SAB SSD TAR
0.30 0.40 10
0.40 0.80 40
0.50 0.90 00
.... .... ...
So my two questions are:
How to solve the above problem?
Since the header begins with a "#" followed by spaces before "SAB", how do I align all my numbers accordingly?
I hope I have been clear and not messy. It seems like a simple problem, but my knowledge of R and programming only goes so far.
Thank you in advance for any help!

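I think the problem is your call to sprintf. It is vectorised, so it already returns one string per value of SID1, but its format string has only one slot: the extra sep argument is never used by the format, and cat() then separates the resulting strings with spaces. How about replacing the last line with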
cat(paste(SID1, collapse="\n"))
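For comparison, here is a minimal sketch that keeps sprintf() but puts the line separator where it belongs. It assumes two decimals is the wanted format (the question shows 0.30, 0.40, ...) and uses made-up stand-in values for SID1.

```r
# Made-up stand-in values for the SID1 vector from the question
SID1 <- c(0.30, 0.40, 0.50)

# sprintf() is vectorised: it returns one formatted string per value,
# so no sep argument is needed here
formatted <- sprintf("%.2f", SID1)

# writeLines() puts the header and each value on its own line
writeLines(c("SAB", formatted))
```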
EDIT/UPDATE:
Here's an example that might work (xx represents your SID data combined into a matrix or data frame):
library(MASS)
xx <- matrix(rnorm(100),10)
write.matrix(round(xx, 2))
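If you would rather skip MASS and control the spacing exactly, one base-R sketch gives every column the same fixed field width in both the header and the rows, and reserves a field for the "#" at the start. The widths (2 characters for "#", 6 for each column), the number formats, and the data values are assumptions for illustration. Adjust them to your real header.

```r
# Made-up stand-in values for SID1, SID2, SID3 from the question
SID1 <- c(0.30, 0.40, 0.50)
SID2 <- c(0.40, 0.80, 0.90)
SID3 <- c(10, 40, 0)

# Header: "#" left-aligned in a 2-character field, then each column name
# right-aligned in a 6-character field
header <- sprintf("%-2s%6s%6s%6s", "#", "SAB", "SSD", "TAR")

# Rows: an empty 2-character field sits under "#", and each number is
# right-aligned in the same 6-character field as its column name.
# sprintf() recycles "" and returns one string per row.
rows <- sprintf("%2s%6.2f%6.2f%6d", "", SID1, SID2, as.integer(SID3))

# With sink() active, as in the question, this output goes to the sink file
writeLines(c(header, rows))
```

Because every field has a fixed width, each number ends in the same character position as its column name, whatever its number of digits.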

Related

R read_delim() changes values when reading data

I am trying to read in a tab-separated file using read_delim(). For some reason, the function seems to change some integer values in the fields:
# Here some example data
# This should have 3 columns and 1 row
file_string = c("Tage\tID\tVISITS\n19.02.01\t2163994407707046646\t40")
# reading that data using read_delim()
data = read_delim(file_string, delim = "\t")
view(data)
data$ID
2163994407707046656 # This should be 2163994407707046646
I totally do not understand what is happening here. If I change the column type to character, the entry stays the same. Does anyone have an explanation for this?
Happy about any help!
Your number has so many digits that it cannot be stored exactly as a double, the numeric type the column is read into. According to the IEEE 754 specification, a double has 53 bits of precision, which corresponds to roughly 15 to 16 significant decimal digits. You hit the same limit with as.double("2163994407707046646").
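A small sketch of both points, using the string from the question. The comparison shows that two different 19-digit literals end up as the same double. The second part swaps in base R's read.delim() so the example runs without readr (in readr, the corresponding option is col_types = cols(ID = col_character())).

```r
file_string <- "Tage\tID\tVISITS\n19.02.01\t2163994407707046646\t40"

# A double has a 53-bit significand (about 15-16 significant decimal digits),
# so these two different 19-digit numbers are stored as the same value
same <- as.double("2163994407707046646") == as.double("2163994407707046656")
print(same)

# Reading the ID column as character keeps every digit
data <- read.delim(text = file_string, colClasses = c(ID = "character"))
print(data$ID)
```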

Remove Everything Except Specific Words From Text

I'm working with Twitter data in R. I have a large data frame where I need to remove everything from the text except specific information, namely statistical information. So basically, I want to keep numbers as well as words such as "half", "quarter", "third". Also, is there a way to keep symbols such as "£", "%", "$"?
I have been using "gsub" to try and do this:
df$text <- as.numeric(gsub(".*?([0-9]+).*", "\\1", df$text))
This code removes everything except numbers; however, any information carried by words is lost. I'm struggling to figure out how to keep specific words within the text as well as the numbers.
Here's a mock data frame:
text <- c("here is some text with stuff inside that i dont need but also some that i do, here is a word half and quarter also 99 is too old for lego", "heres another one with numbers 132 1244 5950 303 2022 and one and a half", "plz help me with code i am struggling")
df <- data.frame(text)
I would like to end up with a data frame outputting:
Also, I've included an N/A entry in the picture because some of my observations will have neither a number nor the specific words. The goal of this code is really just to be able to say that these observations contain some form of statistical language and these other observations do not.
Any help would be massively appreciated, and I'll do my best to answer any questions!
I am sure there is a more elegant solution, but I believe this will accomplish what you want!
df$newstrings <- unlist(lapply(regmatches(df$text, gregexpr("half|quarter|third|[[:digit:]]+", df$text)), function(x) paste(x, collapse = "")))
df$newstrings[df$newstrings == ""] <- NA
> df$newstrings
# [1] "halfquarter99" "132124459503032022half" NA
You can capture what you need to keep, match and consume any other character, and replace every match with a backreference to the group value:
text <- c("here is some text with stuff inside that i dont need but also some that i do, here is a word half and quarter also 99 is too old for lego", "heres another one with numbers 132 1244 5950 303 2022 and one and a half", "plz help me with code i am struggling")
gsub("(half|quarter|third|\\d+)|.", "\\1", text)
See the regex demo. Details:
(half|quarter|third|\d+) - a half, quarter or third word, or one or more digits
| - or
. - any single char.
The \1 in the replacement pattern puts the captured value back into the resulting string.
Output:
[1] "halfquarter99" "132124459503032022half" ""
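Neither answer keeps the £, % and $ symbols that the question also asks about. Here is a sketch that extends the answers rather than reproducing either one. It adds those symbols, keeps whole words only, joins the kept tokens with spaces, and adds a TRUE/FALSE column for "contains statistical language". The column names stats and has_stats are illustrative choices.

```r
text <- c("here is some text with stuff inside that i dont need but also some that i do, here is a word half and quarter also 99 is too old for lego",
          "heres another one with numbers 132 1244 5950 303 2022 and one and a half",
          "plz help me with code i am struggling")
df <- data.frame(text)

# Whole words half/quarter/third (\\b stops matches inside e.g. "halfway"),
# numbers with an optional decimal part, or one of the symbols
# pound sign (written as \u00a3), % and $
pattern <- "\\b(?:half|quarter|third)\\b|[0-9]+(?:\\.[0-9]+)?|[\u00a3%$]"

matches <- regmatches(df$text, gregexpr(pattern, df$text, perl = TRUE))

# Space-separated tokens per row; NA when nothing matched
df$stats <- vapply(matches,
                   function(m) if (length(m) > 0) paste(m, collapse = " ") else NA_character_,
                   character(1))

# TRUE if the row contains any statistical language
df$has_stats <- !is.na(df$stats)
df[, c("stats", "has_stats")]
```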

Splitting a column in a dataframe in R into two based on content

I have a column in an R data frame that holds a product weight, e.g. 20 kg, but it mixes measuring systems, e.g. 1 lbs and 2 kg. I want to separate the value from the unit, put them in separate columns, and then convert them to a standard weight in a new column. Any thoughts on how I might achieve that? Thanks in advance.
Assume you have the column given as
x <- c("20 kg","50 lbs","1.5 kg","0.02 lbs")
and you know that there is always a space between the number and the measurement. Then you can split this up at the space-character, e.g. via
splitted <- strsplit(x," ")
This results in a list of vectors of length two, where the first is the number and the second is the measurement.
Now grab the numbers and convert them via
numbers <- as.numeric(sapply(splitted,"[[",1))
and grab the units via
units <- sapply(splitted,"[[",2)
Now you can put everything together in a data.frame.
Note: as.numeric() expects a dot as the decimal separator. If you have commas instead, replace them with a dot first, for example via gsub(",", ".", ..., fixed = TRUE).
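The question also asks for a standardised weight. Continuing the steps above, one sketch uses a named lookup vector of conversion factors. It assumes kilograms as the target unit and that only the units kg and lbs occur. By definition, 1 lb is exactly 0.45359237 kg.

```r
x <- c("20 kg", "50 lbs", "1.5 kg", "0.02 lbs")
splitted <- strsplit(x, " ")

weights <- data.frame(
  value = as.numeric(sapply(splitted, "[[", 1)),
  unit  = sapply(splitted, "[[", 2)
)

# Factors to kilograms, looked up by unit name; an unknown unit gives NA
to_kg <- c(kg = 1, lbs = 0.45359237)
weights$weight_kg <- weights$value * unname(to_kg[weights$unit])

weights
```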
library(tidyr)
separate(DataFrame, VariableName, into = c("Value", "Metric"), sep = " ")
My case was simple enough that a single space worked as the separator, but sep also accepts a regular expression for more complex separators.

Trying to convert .txt into Excel using R, issues with irregular spacing as "delimiter"

I'm a fairly basic user, and I'm having trouble importing a .txt file neatly into R to get an Excel-like table.
My main issue is that the "columns" in the .txt file are created with a varying number of spaces. For example (periods represent spaces; imagine that the fields line up):
Mister B Smith....Age 35.....Brooklyn
Mrs Smith.........Age 33.....Brooklyn
Child Smith.......Age 8......Brooklyn
Other Child Smith.Age 1......Brooklyn
Grandma Smith.....Age 829....Brooklyn
And there are hundreds of thousands of these rows, all with different spaces that line up to make "columns." Any idea on how I should go about inputting the data?
It appears that your file is not delimited at all but is in a fixed-width format. You focused on the number of spaces, when really the data seem to have a varying number of characters in fields of the same fixed width. You'll need to verify this, but in the snippet as posted the first "column" is exactly 18 characters long. Then comes the string Age (with a space at the end), then a 7-character column with the age, and then a final column whose length is not clear at all.
Of course, I could be overfitting to this small snippet, so check whether I have guessed correctly. If I have, you can use the base function read.fwf for files like this. Say the file name is foo.txt and you want to call the result my_foo. The Age label is redundant, so let's skip it. And say the final column actually has 8 characters (the number of characters in Brooklyn, but you'll need to check this):
my_foo <- read.fwf("foo.txt", c(18, -4, 7, 8))
might get you what you want. See ?read.fwf for details.
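Here is a self-contained sketch of the read.fwf call. It rebuilds the question's sample with spaces in place of the periods, using the widths counted from the lines as posted: 18 characters for the name, 4 for "Age ", 7 for the age, and 8 for the city. A temporary file stands in for foo.txt, and the column names are illustrative. strip.white and stringsAsFactors are passed on to read.table().

```r
# Rebuild the sample: an 18-character name field, the label "Age ",
# a 7-character age field, then the city
people <- c("Mister B Smith", "Mrs Smith", "Child Smith", "Other Child Smith", "Grandma Smith")
ages   <- c(35, 33, 8, 1, 829)
lines  <- sprintf("%-18sAge %-7s%s", people, as.character(ages), "Brooklyn")

foo_file <- tempfile(fileext = ".txt")  # stands in for foo.txt
writeLines(lines, foo_file)

# The negative width skips the "Age " label; strip.white trims the padding
my_foo <- read.fwf(foo_file, widths = c(18, -4, 7, 8),
                   col.names = c("name", "age", "city"),
                   strip.white = TRUE, stringsAsFactors = FALSE)
my_foo
```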
If the delimiter is always two or more spaces, you can read in your .txt file and split each line into a vector using a regex that looks for more than one space:
x <- c("Mister B Smith    Age 35     Brooklyn",
"Mrs Smith         Age 33     Brooklyn")
stringr::str_split(x, " {2,}")
[[1]]
[1] "Mister B Smith" "Age 35" "Brooklyn"
[[2]]
[1] "Mrs Smith" "Age 33" "Brooklyn"
The only problem you might run into with this approach is when a long field leaves only one space between fields (for example: "Mister B Smithees Age 35 Brooklyn"). In that case, @ngm's fixed-width approach is the only option.
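If you take this route, you can cheaply detect the single-space case: count the fields on each line and flag lines that don't have the expected number. The sketch below uses base R's strsplit() in place of stringr::str_split(). The sample lines and the three-field expectation are assumptions based on the question.

```r
# Hypothetical lines; the third has only one space between its first two fields
x <- c("Mister B Smith    Age 35     Brooklyn",
       "Mrs Smith         Age 33     Brooklyn",
       "Mister B Smithees Age 35     Brooklyn")

# Base R equivalent of stringr::str_split(x, " {2,}")
parts <- strsplit(x, " {2,}")

# Every well-formed line should split into exactly 3 fields
n_fields <- lengths(parts)
bad_rows <- which(n_fields != 3)
bad_rows

# Stack the well-formed lines into a character matrix
good <- do.call(rbind, parts[n_fields == 3])
good
```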

Divide column values within a vector

I'm not sure my title properly expresses what I'm asking, but it should make sense once I'm done writing. First, I just started learning R, so I am a newbie. I've been reading through tutorial series and PDFs I've found online.
I'm working on a data set and I created a data frame of just the year 2001 and the DAM value Bon. Here's a picture.
What I want to do now is create a matrix with 3 columns: Coho Adults, Coho Jacks, and, as the third column, the ratio of Coho Jacks to Adults. The ratio is what I'm having trouble with.
If I do a line of code like this I get a normal output.
(cohoPassage <- matrix(fishPassage1995BON[c(5,6, 7)], ncol = 3))
The values are 259756, 6780, and 114934.
I figure that to get the ratio I should divide column 5 by column 6. So basically 259756/6780 = 38.31
I've tried many things like:
(cohoPassage <- matrix(fishPassage1995BON[c(5,6, 5/6)], ncol = 3))
This just outputs the value of the fifth column instead of dividing, for some reason.
I've tried this:
matrix(fishPassage1995BON[c(5,6)],fishPassage1995BON[,5]/fishPassage1995BON[,6], ncol = 3)
This gives me incorrect output.
I decided to break the problem down and divide the fifth column by the sixth on its own, and that gave the correct ratio.
If I create a matrix like this
matrix(fishPassage1995BON[,5]/fishPassage1995BON[,6])
It outputs the correct ratio of 38.31209. But when I try to combine everything, I just keep getting errors.
What can I do? Any help would be appreciated. Thank you.
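No answer to this question is included here. One sketch computes the ratio as its own vector and binds the three vectors with cbind(). It assumes the adult and jack counts sit in columns 5 and 6, as in the question, and uses a made-up one-row data frame in place of fishPassage1995BON. It also shows why the attempts above fail. In c(5, 6, 5/6), the index 5/6 is 0.833..., which is truncated to 0 and so selects nothing. In matrix(a, b, ncol = 3), the second positional argument is nrow, not more data. The ratio follows the question's arithmetic (adults divided by jacks); swap the operands for jacks per adult.

```r
# Made-up stand-in for fishPassage1995BON: only columns 5 and 6 matter here
fishPassage1995BON <- data.frame(Year = 2001, Dam = "BON", V3 = NA, V4 = NA,
                                 CohoAdult = 259756, CohoJack = 6780)

adults <- fishPassage1995BON[, 5]
jacks  <- fishPassage1995BON[, 6]

# cbind() places the vectors side by side as the columns of a matrix
cohoPassage <- cbind(CohoAdults = adults, CohoJacks = jacks,
                     Ratio = adults / jacks)
cohoPassage
```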
