For context, I am writing code in R that finds the most common character across a list of strings: the most common character in the first position of each string, then in the second position, and so on. To start, I am running a loop within a loop to save each character to a list for later use.
I am trying to use the head function to select each character along the string, which of course gives me the first character, the first two characters, and so on, when what I want is the first, second, third, etc. character saved to the list.
Here is my code so far:
Store <- list()
for (j in (1:SequenceNumber)){
  SequenceLength <- length(Sequences[[j]])
  for (i in (1:SequenceLength)){
    Store[[length(Store)+1]] <- head(Sequences[[j]], n=i)
  }
}
So in summary, I am wondering what (probably extremely simple) solution there might be to select the nth element only within a loop using R.
I have tried looking around for a solution, but can only find results selecting out a specified range (for example, the first five results), instead of the nth result.
To get the Nth letter in a string use substring. For example, the 5th letter in Chicago:
> substring("Chicago", 5, 5)
[1] "a"
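As a rough sketch of how this could feed into the original goal (the most common character at each position), the snippet below splits every string into single characters and tabulates them position by position. The `sequences` vector is made-up example data, since the question does not show the real `Sequences` object, and ties are resolved by taking the alphabetically first character.

```r
# Hypothetical example data; the question does not show `Sequences`.
sequences <- c("ACGT", "ACGA", "TCGA")

# substring() is vectorised, so every character of one string can be
# extracted in a single call instead of an inner loop.
first_chars <- substring(sequences[1], 1:4, 1:4)

# Split each string into a vector of single characters.
chars <- strsplit(sequences, "")

# Only compare positions that exist in every string.
n <- min(nchar(sequences))

# For each position, collect that character from every string and
# keep the most frequent one.
consensus <- vapply(seq_len(n), function(i) {
  column <- vapply(chars, `[`, character(1), i)
  names(which.max(table(column)))
}, character(1))

print(consensus)
```

This avoids growing a list inside nested loops, although the loop version would also work once head is replaced by substring(x, i, i).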
I want to learn how to access data from a nested list in R. I am relatively new to the R programming language, so I am unsure how to proceed.
The data is a 'Large list' (947 elements, 654.9 Mb) and takes the form:
The numbers within the datalist refer to station numbers and when I click on one (in Rstudio) it looks like this:
I want to know how I can access the data within 'doy', for example. I have tried:
data[[1]]
which returns all the data for the first element of the list (site, location, doy,ltm etc). So clearly the number used within the square brackets is interpreted as an index for the list, as opposed to an identifier for the elements/station in the list.
Then I tried:
data$1
but it returned the error:
Error: unexpected numeric constant in "data$1"
Then I tried:
data[data$1==doy]
But was returned this:
Error: unexpected numeric constant in "data[data$1"
So at this point, I realise that it is not construing the number of the station as a category/factor within the list. It's just reading it as a number. So I thought I'd put some quotes around it to see if that changed what happened:
data[data$"1"=="doy"]
This returned
named list()
But when I looked at it in the environment, it was a list of 0.
I looked at some of the similar questions here on Stack Overflow (like: accessing nested lists in R) and tried:
data[data$"1"=="doy",][[1]]
But just got:
Error in data[data$"1" == "doy", ] : incorrect number of dimensions
How can I access this data? It reminds me of a structure in Matlab, but it doesn't seem to be indexed in a similar fashion in R.
Let's look at some ways to do what you want:
data[[1]]
This returns the first element of the list, which is itself a list. You can use the $ subsetting shorthand, but the name of the first element is nonstandard. R prefers names that start with letters and include only alphanumeric characters, periods and underscores. You can escape this behavior with backticks:
data$`1`
If you want to access one of the elements of list 1 in your list of lists, you need to subset further. To get to doy, which is the third element of 1, there are four equivalent ways:
data[[1]][[3]]
data$`1`[[3]]
data[[1]]$doy
data$`1`$doy
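To check that these forms really are equivalent, here is a small self-contained sketch. The contents of `data` are invented for illustration (the real list has 947 stations); only the element names site, location and doy come from the question. The last line, which collects doy from every station, is an extra that may or may not be what you need.

```r
# Hypothetical stand-in for the real data: two stations named "1" and "2".
data <- list(
  `1` = list(site = "A", location = "north", doy = c(1, 2, 3)),
  `2` = list(site = "B", location = "south", doy = c(4, 5, 6))
)

by_position <- data[[1]][[3]]        # first station, third element
by_name     <- data[["1"]][["doy"]]  # station named "1", element named "doy"
by_dollar   <- data$`1`$doy          # same lookup using $ and backticks

# Pull doy out of every station at once; the result is a list
# whose names are the station names.
all_doy <- lapply(data, `[[`, "doy")

print(by_dollar)
print(names(all_doy))
```

Indexing by name with [["1"]] is usually safer than by position, because it does not depend on the order of the stations in the list.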
One way (in addition to what Ben Norris has shown):
our_list[[c("1", "doy")]]
Reproducible example data (please provide next time):
our_list <- list(`1` = list(site = "x", doy = 3))
I have different/non-repeating years at the beginning of a column (Details) in my data frame (Data) that I would like to split off and create a "Years" Column with. Based on other questions that have been answered, I am assuming I would use tstrsplit. However, I don't understand how to actually use the function to get it to do what I want.
When you want characters from a fixed position within the string (like the first 4 characters) substr is quick and easy.
Data$Years = substr(Data$Details, 1, 4)
strsplit needs something to split at, so if you had "2020-04-02" you could split at "-" characters. But if you know you want the first four, substr is best.
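A minimal sketch of both cases, using an invented `Data` frame since the real one is not shown in the question:

```r
# Hypothetical example data; the real Details column is not shown.
Data <- data.frame(
  Details = c("2019 annual report", "2020 budget review"),
  stringsAsFactors = FALSE
)

# Fixed position: take characters 1 through 4 of every entry.
Data$Years <- substr(Data$Details, 1, 4)

# Delimiter-based: split a date string at each "-" character.
parts <- strsplit("2020-04-02", "-", fixed = TRUE)[[1]]

print(Data)
print(parts)
```

Note that substr returns character values, so wrap the result in as.integer() if the years are needed as numbers.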
I want to do two things:
1) I want to create a character string with a double quote inside. An example in R would look like follows:
x <- 'vjghvbh"kljnj"kjbn"jk'
[1] "vjghvbh\"kljnj\"kjbn\"jk"
Question 1: How could I create such a character string without the backslash inside?
I tried to use gsub(), but unfortunately that didn't work. I also found some sources, which suggested cat(), but that just prints my character, but does not store it in x.
2) Let's assume that I solved Question 1. Then my character would look like follows:
[1] "vjghvbh"kljnj"kjbn"jk"
Now I need to find the positions of the double quotes. Based on this thread I tried gregexpr(). However, this also did not work, since I was not able to specify the pattern.
Question 2: How could I find the position of the double quotes within my character string?
The result in R should look like this:
[1] 8 14 19
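One possible way to approach both questions, offered as a sketch rather than a definitive answer: the backslashes only appear when R prints the string, they are not stored in it, so nothing needs to be removed. Passing fixed = TRUE to gregexpr treats the double quote as a literal character instead of a regular expression.

```r
x <- 'vjghvbh"kljnj"kjbn"jk'

# print() shows an escaped representation; the string itself holds no
# backslashes, which the character count confirms.
n_chars <- nchar(x)

# writeLines() displays the raw contents, without escapes.
writeLines(x)

# Find every double quote; as.integer() drops the match attributes
# that gregexpr() attaches to its result.
positions <- as.integer(gregexpr('"', x, fixed = TRUE)[[1]])

print(positions)
```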
I am trying to understand how R deals with string manipulation and comparisons.
To this end I have set up two data frames, one which is my raw data and the other which is my reference data to which I would like to compare. I'm trying to understand the different ways of comparing strings and how to compare data frames in general (it seems far easier in SQL where you can just use the key word contains).
For the example below, the first item is the reference data and the second is the raw data.
grepl ("1845","UN1845")
Will return TRUE
any ("1845"=="UN1845")
Will return FALSE (I assume here because the word has to match fully)
is.element ("1845","UN1845")
Will return FALSE (same reason as the any)
If I wanted to check the entire data reference table against each and every item in the raw table, how would I go about this?
From playing around I could do something like
grepl(Raw$Contents, Ref$desc)
Where the Raw data is basically strings and the ref data is strings. However when I run something like this, I get the message:
In grepl(Raw$Contents, MyCode$desc)
argument 'pattern' has length > 1 and only the first element will be used
I assume this is related to the fact that the table size for the reference table is different to the table I'm running comparisons against.
Sample data:
rawdata = data.frame(A=c("UN1845","FROZEN FOOD DRY ICE","LTD QTY8000"))
refdata = data.frame(A=c("1845","8000"))
The message means that your pattern argument has more than one element, but grepl and its family only accept one pattern at a time. You will have to loop (or *apply) over each pattern in your refdata collection.
EDIT: To clarify, grepl only accepts one pattern, but if that pattern contains the complete search set, e.g. via the OR operator, grepl will function as desired. Thanks to David Arenburg for his comments.
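Both options can be sketched with the sample data from the question. The variable names `pattern`, `any_match` and `match_matrix` are just illustrative choices.

```r
rawdata <- data.frame(
  A = c("UN1845", "FROZEN FOOD DRY ICE", "LTD QTY8000"),
  stringsAsFactors = FALSE
)
refdata <- data.frame(A = c("1845", "8000"), stringsAsFactors = FALSE)

# Option 1: join all reference values into a single pattern with the
# regex OR operator, then test every raw item against it.
pattern <- paste(refdata$A, collapse = "|")
any_match <- grepl(pattern, rawdata$A)

# Option 2: apply grepl once per reference value. The result is a
# logical matrix with one row per raw item and one column per pattern.
match_matrix <- sapply(refdata$A, grepl, x = rawdata$A, fixed = TRUE)

print(any_match)
print(match_matrix)
```

Option 1 only tells you whether any reference value matched; Option 2 also shows which one. If the reference values may contain regex metacharacters, the OR-joined pattern would need escaping, whereas fixed = TRUE in Option 2 sidesteps that.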
Trying to learn some R after doing mostly Haskell for rather a long time, I got stuck on a problem I would usually solve using unzip [1] and map.
I have a sequence of strings, each containing two substrings separated by an underscore. I want to "unzip" this sequence into something like a data frame or a matrix, where the first column is the sequence of all the first substrings and the second column the sequence of all the second substrings.
Is there any analogue to unzip in R, and would it be considered idiomatic to use it here, or am I approaching this from altogether the wrong direction?
[1] Given a list (or more generally any kind of sequence) of pairs unzip produces a pair of lists, in the obvious way.
You're on the right track. You want strsplit:
vec <- paste(letters,letters[26:1],sep='_')
out <- strsplit(vec,'_')
That's a list, and sapply will get the vectors out:
data.frame(one = sapply(out,'[',1), two = sapply(out,'[',2))
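An alternative sketch that is arguably closer to unzip: since every string splits into exactly two pieces, the pieces can be row-bound into a matrix in one step. This assumes each element contains exactly one underscore; strings with a different number of pieces would be recycled by rbind with a warning rather than handled cleanly.

```r
vec <- paste(letters, letters[26:1], sep = "_")

# fixed = TRUE treats "_" as a literal separator, not a regex.
out <- strsplit(vec, "_", fixed = TRUE)

# Stack the length-2 vectors as rows of a 26 x 2 character matrix.
mat <- do.call(rbind, out)

# Name the columns by converting to a data frame.
df <- data.frame(one = mat[, 1], two = mat[, 2], stringsAsFactors = FALSE)

print(head(df, 3))
```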