How to get access to "str_match_all" results in R?

I just used "str_match_all" as follows:
a <- str_match_all(dd, '\\d+(\\w+)')
and obtained the following:
#[[1]]
# [,1] [,2]
#[1,] "12hours" "hours"
#[2,] "23days" "days"
How can I access each string?
I have tried a[1][,1] to access the first column for example but I get an error saying the number of dimensions is not correct.

If I understand your problem correctly, you are having trouble accessing each individual element.
Remember that your output is a list and that the element in that list is a matrix. To access an individual element, you therefore first select the list element you are interested in, then the row, and then the column.
a[[1]][1,2]
So in your case, this will access the first element in your list (looks like you only have 1), and then the 1st row and then the 2nd column so it will give you, "hours".
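As a minimal sketch of why a[1][,1] fails while a[[1]][,1] works, the snippet below rebuilds the result from the question by hand (the matrix is a stand-in typed in manually, not real str_match_all() output, so it runs without stringr). The variable names full_matches, groups and first_group are only illustrative.

```r
# Hand-built stand-in for the str_match_all() result shown in the question
a <- list(matrix(c("12hours", "23days", "hours", "days"), ncol = 2))

is.list(a[1])      # TRUE: single brackets return a sub-list, which has no columns
is.matrix(a[[1]])  # TRUE: double brackets return the matrix itself

full_matches <- a[[1]][, 1]   # first column: the full matches
groups       <- a[[1]][, 2]   # second column: the captured groups
first_group  <- a[[1]][1, 2]  # row 1, column 2
```

Because a[1] is still a list, indexing it with [,1] asks for a dimension it does not have, which is where the "incorrect number of dimensions" error comes from.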
If, however, you are more used to working with data frames, as I assume that is your end goal, I would approach this programmatically as follows.
Taking an example from the str_match_all() documentation:
# Creating a reproducible example
library(stringr)
strings <- c("Home: 219 733 8965. Work: 229-293-8753 ",
             "banana pear apple", "595 794 7569 / 387 287 6718")
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
# One matrix of matches per input string
a <- str_match_all(strings, phone)
Your goal is to convert the matrix into a data frame, which you do as follows:
as.data.frame(a[[1]])
For future reference, let's say your output has more than one element, as is the case in this example. You should then approach the solution like so:
# bind_rows() comes from dplyr and map_dfr() from purrr
library(dplyr)
library(purrr)
# Make a function that accepts one element of your list.
# Copy and paste the step before and then add an extra step using dplyr::bind_rows()
output_to_df <- function(x){
  a <- as.data.frame(x)
  bind_rows(a)
}
# Using this function we will then use map_dfr()
# so that we can apply our premade function on all elements
# of our list no matter how many elements it contains
str_output <- map_dfr(a, output_to_df)
You can now reuse your output_to_df() function as many times as you need.

Related

Dynamically change part of variable name in R

I am trying to automatise some post-hoc analysis, but I will try to explain myself with a metaphor that I believe will illustrate what I am trying to do.
Suppose I have two lists of strings: the first holds names and the second holds adjectives:
list1 <- c("apt", "farm", "basement", "lodge")
list2 <- c("tiny", "noisy")
Let's suppose also I have a data frame with a bunch of data that I have named something like this as they are the results of some previous linear analysis.
> head(df)
qt[apt_tiny,Intercept] qt[apt_noisy,Intercept] qt[farm_tiny,Intercept]
1 4.196321 -0.4477012 -1.0822793
2 3.231220 -0.4237787 -1.1433449
3 2.304687 -0.3149331 -0.9245896
4 2.768691 -0.1537728 -0.9925387
5 3.771648 -0.1109647 -0.9298861
6 3.370368 -0.2579591 -1.0849262
and so on...
Now, what I am trying to do is make some automatic operations where the strings in the previous lists dynamically change as they go in a for loop. I have made a list with all the distinct combinations and called it distinct. Now I am trying to do something like this:
for (i in 1:nrow(distinct)){
var1[[i]] <- list1[[i]]
var2[[i]] <- list2[[i]]
#this being the insertable name part for the rest of the variables and parts of variable,
#i'll put it inside %var[[i]]% for the sake of the explanation.
%var1[[i]]%_%var2[[i]]%_INT <- df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]`+ df$`qt[%var1[[i]]%,Intercept]`
}
The difficult thing for me here is %var1[[i]]% is at the same time inside a variable and as the name of a column inside a data frame.
Any help would be much appreciated.
You cannot use $ to extract column values with a character variable. So df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]` will not work.
Create the name of the column using sprintf and use [[ to extract it. For example, to construct "qt[apt_tiny,Intercept]" as the column name you can do:
i <- 1
sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])
#[1] "qt[apt_tiny,Intercept]"
Now use [[ to subset that column from df
df[[sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])]]
You can do the same for other columns.
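Putting the pieces together, the loop from the question could look roughly like the sketch below. The data frame is a made-up miniature of df (only three columns, invented values), and storing the sums in a named list called results, rather than in separately named variables, is my own assumption about what is convenient.

```r
list1 <- c("apt", "farm", "basement", "lodge")
list2 <- c("tiny", "noisy")

# Hypothetical miniature of df; check.names = FALSE keeps the brackets in the names
df <- data.frame(
  "qt[apt_tiny,Intercept]"  = c(4.2, 3.2),
  "qt[apt_noisy,Intercept]" = c(-0.4, -0.4),
  "qt[apt,Intercept]"       = c(1.0, 2.0),
  check.names = FALSE
)

results <- list()
for (v1 in list1) {
  for (v2 in list2) {
    pair_col <- sprintf("qt[%s_%s,Intercept]", v1, v2)
    base_col <- sprintf("qt[%s,Intercept]", v1)
    # Skip combinations whose columns are not present in this toy data frame
    if (!all(c(pair_col, base_col) %in% names(df))) next
    results[[paste(v1, v2, "INT", sep = "_")]] <- df[[pair_col]] + df[[base_col]]
  }
}
names(results)
```

Each sum is then available as, for example, results[["apt_tiny_INT"]], so the dynamic part of the name becomes a list key instead of a variable name.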

Using Loop variable to access and write specific data.frames

I wrote a script that reads CSV data with the help of user input. For example, when the user enters "20 40 160", the CSV files 1, 2 and 3 are read and saved as the data.frames d20, d40 and d160 in my global environment/workspace. The variable vel holds the values from the user input.
Now for the actual question:
I'm trying to manipulate the data I read in a loop over the vel variable. For example:
for (i in vel)
{
newVariable"i" <- d"i"[6]
}
I know that's not the correct syntax, but what I'm trying to do is to write a newVariable with a specific row from a specific data frame d.
The result should be:
newVariable20 = d20[20]
newVariable40 = d40[20]
newVariable160 = d160[20]
So I think the actual question is, how do I use the Loop Variable for calling out the names of the created data frames and for writing new variables.
There are a couple of ways to do this. One is to store all of your data frames in a list from the start: begin with an empty list and then put each data frame into the next position. Note that you have to use list(df) because a data frame is itself a list, so assigning it directly with single brackets would not store it as one element.
list_of_df <- list();
list_of_df[1] <- list(df1);
list_of_df["df20"] <- list(df2)
This makes it easy to loop through the dataframes. If you want column 4 of dataframe 2 you just put in
list_of_df[[2]][,4]
# Same thing different code
list_of_df[["df20"]][,4]
The double brackets [[2]] give you the value that is stored in the list at position 2 (instead of [2], which gives you a list of length one that still contains that value). The next [,4] says that, from the data frame we just retrieved, we now want every row of the 4th column. Note that this will output a vector and not a data frame.
Or in a loop:
for(df in list_of_df) {
print(df)
}
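Applied to the question, a named list keyed by the values in vel might look like the sketch below. The data frames are invented stand-ins for the CSV contents, and the names frames and new_variables are hypothetical.

```r
vel <- c(20, 40, 160)

# Hypothetical stand-ins for the data frames read from the CSV files
frames <- lapply(vel, function(v) data.frame(id = 1:3, speed = v, value = v * (1:3)))
names(frames) <- paste0("d", vel)

# Pull the same column out of every data frame; the result is again a list
new_variables <- lapply(frames, function(d) d[[3]])
names(new_variables) <- paste0("newVariable", vel)

new_variables[["newVariable40"]]

# If the data frames already exist as d20, d40, ... in the workspace,
# mget(paste0("d", vel)) collects them into such a list.
```

The loop variable then selects entries by name, for example frames[[paste0("d", 40)]], instead of being spliced into a variable name.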

Using paste function within colnames

I want to use iteration to turn the entries in a list into a 2x2 matrix, and then assign the same column and row names to these tables, as well as integer values for the matrix cells.
For examples sake let's pretend this is the list with the entries whose names I want to turn into matrices:
cnames <- c("Honda", "Toyota", "Nissan")
Creating the tables themselves seem to work fine with the assign function:
for (i in 1:length(cnames)){
assign(paste(cnames[i],"table",sep="_"), matrix(,nrow=2,ncol=2))
}
Which when I type, for instance:
> Honda_table
...returns:
[,1] [,2]
[1,] NA NA
[2,] NA NA
But if in the original iterative function I try to assign column names, like such:
for (i in 1:length(cnames)){
assign(paste(cnames[i],"table",sep="_"), matrix(,nrow=2,ncol=2))
colnames(paste(cnames[i],"table",sep="_")) <- c("A","B")
}
...I get this error instead:
Error : attempt to set 'colnames' on an object with less than two dimensions
I don't understand why this is coming up, since after using the original assign function, if I look up the dimensions any of the tables, such as:
> dim(Honda_table)
...I get:
[1] 2 2
Which indicates it is a 2x2 dimensional object.
Moreover, I cannot assign pre-set values to the matrix cells, like so:
for (i in 1:length(cnames)){
assign(paste(cnames[i],"table",sep="_"), matrix(,nrow=2,ncol=2))
paste(cnames[i],"table",sep="_")[1,1] = 1
}
...without getting this error:
Error : incorrect number of subscripts on matrix
What is going on here?
Thanks.
I am not sure this is the best or most elegant way, but it seems to work:
for (i in 1:length(cnames)){
tab<- matrix(,nrow=2,ncol=2)
colnames(tab)<- c("A","B")
assign(paste(cnames[i],"table",sep="_"), tab)
}
rm(tab)
After much suggestion I ended up scrapping the assign function and simply created a vector of tables instead.
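For reference, the underlying issue is that paste() returns a character string, not the matrix that happens to have that name, so colnames() and [1,1] were being applied to a string. A rough sketch of the list-based alternative (the name tables is my own choice) could be:

```r
cnames <- c("Honda", "Toyota", "Nissan")

# One 2x2 matrix per name, with column names set at creation time
tables <- lapply(cnames, function(nm) {
  matrix(NA_real_, nrow = 2, ncol = 2, dimnames = list(NULL, c("A", "B")))
})
names(tables) <- paste(cnames, "table", sep = "_")

# Cells are assigned through the list, not through a pasted name
tables[["Honda_table"]][1, 1] <- 1
tables[["Honda_table"]]
```

The pasted name is now only used as a list key, which is an ordinary string operation, so no assign() or get() is needed.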

Creating multiple matrices with a "for" loop

I am currently in a statistics class working on multivariate clustering and classification. For our homework we are trying to use a 10 fold cross validation to test how accurate different classification methods are on a 6 variable data set with three classifications. I was hoping I could get some help on creating a for loop (or something else which would be better that I don't know about) to create and run 10 classifications and validations so I don't have to repeat myself 10 times on everything.... Here is what I have. It will run but the first two matrices only show the first variable. Because of this, I have not been able to troubleshoot the other parts.
index<-sample(1:10,90,rep=TRUE)
table(index)
training=NULL
leave=NULL
Trfootball=NULL
football.pred=NULL
for(i in 1:10){
training[i]<-football[index!=i,]
leave[i]<-football[index==i,]
Trfootball[i]<-rpart(V1~., data=training[i], method="class")
football.pred[i]<- predict(Trfootball[i], leave[i], type="class")
table(Actual=leave[i]$"V1", classfied=football.pred[i])}
Removing the "[i]" and replacing them with 1:10 individually works right now....
Your problem lies in the assignment of a data.frame or matrix to a vector that you initially set as NULL (training and leave). A way to think about it is that you are trying to squeeze a whole matrix into an element that can only take a single number. That's why R has a problem with your code. You need to initialise training and leave to something that can handle your iterative agglomeration of values (the R object list, as @akrun points out).
The following example should give you a feel for what is happening and what you can do to fix your problem:
a<-NULL # your set up at the moment
print(a) # NULL as expected
# your football data is either data.frame or matrix
# try assigning those objects to the first element of a:
a[1]<-data.frame(1:10,11:20) # no good
a[1]<-matrix(1:10,nrow=2) # no good either
print(a)
## create "a" upfront, instead of an empty object
# what you need:
a<-vector(mode="list",length=10)
print(a) # empty list with 10 locations
## to assign and extract elements out of a list, use the "[[" double brackets
a[[1]]<-data.frame(1:10,11:20)
#access data.frame in "a"
a[1] ## no good
a[[1]] ## what you need to extract the first element of the list
## how does it look when you add an extra element?
a[[2]]<-matrix(1:10,nrow=2)
print(a)
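Carried over to the loop in the question, the same idea might look like the sketch below. The football data frame here is randomly generated as a stand-in (I do not have the real data), and the model fit is left as a comment so the snippet runs with base R only.

```r
set.seed(1)
# Hypothetical stand-in for the football data: 90 rows, class label in V1
football <- data.frame(V1 = factor(sample(1:3, 90, replace = TRUE)),
                       V2 = rnorm(90), V3 = rnorm(90))
index <- sample(1:10, 90, replace = TRUE)

# Pre-allocate lists so each slot can hold a whole data frame
training <- vector(mode = "list", length = 10)
leave    <- vector(mode = "list", length = 10)

for (i in 1:10) {
  training[[i]] <- football[index != i, ]
  leave[[i]]    <- football[index == i, ]
  # A model fit would go here, e.g.
  # rpart(V1 ~ ., data = training[[i]], method = "class")
}

# Every fold keeps all columns, not just the first one
sapply(training, ncol)
```

The fitted models and predictions can be collected the same way, in lists indexed with [[i]].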

Creating a new nested list element that is a combination of two existing nested list elements (in R)

I am looking for a hint about how to create a new nested list element from two existing nested list elements. In the current form of the script I am working on, I create a list called tardis that is n elements long, based on the number of elements in an input list. In the example below, that input list, dataLayers, is 2 elements long.
After creating tardis, the script populates it by reading in data from 1200 netCDF files. Each of the 12 elements in 'mean' and 'sd' in tardis are matrices of geographic data, tardis[['data']][[decade]][['mean']][[month]], for example, for the 12 calendar months. When the list is fully populated I would like to create some derived variables. For example, in the snippet below, I would like to create a variable TOTALPRECIP by adding SNOW and RAIN. In doing this, I would like to create TOTALPRECIP from SNOW + RAIN as a third list element in tardis with the exact nested structure as the other two elements (adding them together and preserving the structure).
Is this possible with apply or its related functions?
begin <- 1901
end <- 1991
dataLayers <- c("SNOW","RAIN")
tardis<-list()
for (i in 1:length(dataLayers)){
tardis[[dataLayers[i]]]<-list('longName'='timeLord','units'='theDr','data'=list())
for (j in seq(begin,end,10)){
tardis[[dataLayers[[i]]]][['data']][[as.character(j)]]<-list('mean'=vector(mode='list',length=12),'sd'=vector(mode='list',length=12))
}
}
#add SNOW AND RAIN
print(names(tardis))
>[1] "SNOW" "RAIN" "TOTALPRECIP"
Here are your for loops using replicate. Note that the expression passed to each replicate call is the same expression you have in the assignment portion of your for loop.
## This is your inner for-loop, using replicate
inds <- seq(begin, end, 10)
datas <- replicate(length(inds), list('mean'=vector(mode='list',length=12),'sd'=vector(mode='list',length=12))
, simplify=FALSE)
names(datas) <- inds
# This is your outer loop
tardis2 <- replicate(length(dataLayers), list('longName'='timeLord','units'='theDr','data'=datas)
, simplify=FALSE)
names(tardis2) <- dataLayers
# Compare Results
identical(tardis2, tardis)
# [1] TRUE
However, I'm not sure if lists are really the best structure for this. Have you considered data.frames?
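If you do stay with nested lists, one possible way to build TOTALPRECIP is a small recursive helper around Map(), sketched below on a toy tardis with one decade, two months and invented 2x2 matrices. The helpers make_layer and add_nested are hypothetical names, not part of the original script.

```r
# Toy version of tardis: one decade, two months, 2x2 matrices (all values invented)
make_layer <- function(value) {
  list(longName = "timeLord", units = "theDr",
       data = list("1901" = list(
         mean = list(matrix(value, 2, 2), matrix(value + 1, 2, 2)),
         sd   = list(matrix(0.1, 2, 2), matrix(0.2, 2, 2)))))
}
tardis <- list(SNOW = make_layer(1), RAIN = make_layer(10))

# Add two nested lists element by element, recursing until matrices are reached
add_nested <- function(x, y) {
  if (is.list(x)) Map(add_nested, x, y) else x + y
}

tardis[["TOTALPRECIP"]] <- list(
  longName = "timeLord", units = "theDr",
  data = add_nested(tardis$SNOW$data, tardis$RAIN$data)
)
names(tardis)
```

This assumes both layers have exactly the same nesting. It also sums the sd matrices purely to preserve the structure; the standard deviation of a sum is generally not the sum of the standard deviations, so that part would need its own calculation.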

Resources