Concatenating string names of variables from a MATLAB file in R

I have MATLAB files in which the name of every variable except the first one contains an integer. I want to loop over those integers and build each variable name by concatenation.
Here is my code:
library('R.matlab')
mat <- readMat('SeriesContPJM.mat')
#str(mat)
#typeof(mat)
#mat[[1]]
write.csv(mat$vol.PJM$data[[4]][[1]], "PJM.csv")
i = 2
while (i < 7)
{
write.csv(get(paste("mat$vol.PJM", as.character(i), "$data[[4]][[1]]", sep = "")), paste(paste("PJM", as.character(i), sep="_"), "csv", sep ="."))
i = i + 1
}
The line write.csv(mat$vol.PJM$data[[4]][[1]], "PJM.csv") gives me the correct output. I would like the same for the other variables in the loop, but I get the following output:
Error in get(paste("mat$vol.PJM", as.character(i), "$data[[4]][[1]]", (from importpjm.R#10) :
object 'mat$vol.PJM2$data[[4]][[1]]' not found
(My R session reports this in French as "objet ... introuvable"; "introuvable" means "not found".)

Here you're mixing up the cases where you need get and the cases where you need eval(parse()).
You can use get with a string holding a variable name, e.g., get("mtcars"), but [ (like $ and [[) is a function that needs to be evaluated.
get("mtcars[2, 2]") won't work because you don't have a variable named "mtcars[2, 2]", you have a variable named "mtcars" and a function named "[" that can take arguments 2, 2.
eval(parse(text = "mtcars[2, 2]")) will work because it doesn't just look for a variable, it actually evaluates the text string as if you typed it into the command line.
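To make the difference concrete, here is a minimal sketch using a small stand-in data frame (cars_df is a made-up name; mtcars would behave the same way). get() finds an object by its exact name, fails on a string that is really an expression, and eval(parse()) runs that expression:

```r
# Small stand-in data frame (hypothetical; plays the role of mtcars)
cars_df <- data.frame(mpg = c(21, 22.8, 21.4), cyl = c(6, 4, 6))

# get() looks up an object whose name is exactly the given string
whole <- get("cars_df")

# There is no object literally named "cars_df[2, 2]", so get() signals an error
lookup_msg <- tryCatch(get("cars_df[2, 2]"),
                       error = function(e) conditionMessage(e))
print(lookup_msg)

# parse() turns the text into an R expression; eval() then runs it
cell <- eval(parse(text = "cars_df[2, 2]"))
print(cell)  # row 2 of the cyl column, i.e. 4
```

In an English-language session, lookup_msg reads like the error in the question, with "not found" in place of "introuvable".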
So, you could rewrite your loop, replacing the get(...) with eval(parse(text = ...)) and it would probably work, assuming the strings you've pasted together have the right syntax. But this can be difficult to read and debug. In 6 months, if you look back at this code and need to understand it or modify it, it will be confusing.
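For reference, here is a sketch of that eval(parse(text = ...)) rewrite. Because the .mat file isn't available here, mat is a hypothetical nested list that only mimics the access path mat$vol.PJM2$data[[4]][[1]]; the real readMat() result may be structured differently. Files are written to tempdir() rather than the working directory:

```r
# Hypothetical stand-in for readMat('SeriesContPJM.mat'); only the access path
# mat$vol.PJMi$data[[4]][[1]] matches the question, the contents are made up
mat <- list()
for (k in 2:6) {
  mat[[paste0("vol.PJM", k)]] <- list(data = list(NULL, NULL, NULL,
                                                  list(data.frame(load = k * 1:3))))
}

out_dir <- tempdir()
i <- 2
while (i < 7) {
  # Build the *code* as a string, then parse and evaluate it (get() cannot do this)
  expr_text <- paste0("mat$vol.PJM", i, "$data[[4]][[1]]")
  write.csv(eval(parse(text = expr_text)),
            file.path(out_dir, paste0("PJM_", i, ".csv")))
  i <- i + 1
}
```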
Another way to do it would be to use [[ with strings to extract sublists. Rather than messing with eval(parse()), I would do this:
vols = paste0("vol.PJM", 2:7)
for (vol in vols) {
write.csv(mat[[vol]][["data"]][[4]][[1]],
paste0(vol, ".csv"))
}
I think it's more readable, and it's easy to check the vols vector beforehand to make sure all your names are correct. Similarly, if you want to loop over all elements, you could initialize vols as names(mat), or use a regular expression to select the appropriate sublists from names(mat).
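A sketch of the regular-expression variant, again with a hypothetical mat list standing in for the readMat() result (the real object may differ). The pattern ^vol\\.PJM[0-9]+$ is an assumption that keeps only the numbered series and skips the unnumbered first one:

```r
# Hypothetical stand-in for the readMat() result: an unnumbered first entry plus
# numbered series reachable as mat[[name]][["data"]][[4]][[1]]
make_entry <- function(k) {
  list(data = list(NULL, NULL, NULL, list(data.frame(load = k * 1:3))))
}
mat <- list(vol.PJM = make_entry(1))
for (k in 2:6) mat[[paste0("vol.PJM", k)]] <- make_entry(k)

# Pick the numbered series from names(mat) with a regular expression
vols <- grep("^vol\\.PJM[0-9]+$", names(mat), value = TRUE)
print(vols)  # check the names before writing any files

for (vol in vols) {
  write.csv(mat[[vol]][["data"]][[4]][[1]],
            file.path(tempdir(), paste0(vol, ".csv")))
}
```

Printing vols first is the debugging step mentioned above: if a name is wrong, you see it before any file is written.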

Related

Convert R list to Pythonic list and output as a txt file

I'm trying to convert these lists into something like a Python list. I've used this code:
library(GenomicRanges)
library(data.table)
library(Repitools)
pcs_by_tile<-lapply(as.list(1:length(tiled_chr)) , function(x){
obj<-tileSplit[[as.character(x)]]
if(is.null(obj)){
return(0)
} else {
runs<-filtered_identical_seqs.gr[obj]
df <- annoGR2DF(runs)
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
#print(score)
return(score)
}
})
dt_text <- unlist(lapply(tiled_chr$score, paste, collapse=","))
writeLines(tiled_chr, paste0("x.txt"))
The following line iterates over each row of the data frame (which has only two columns) and splits the rows into a list. However, its output is not what I want:
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
Instead, I want output like this:
[20350, 20355], [20357, 20359], [20361, 20362], ........
If I understand your question correctly, using as.tuple from the 'sets' package might help. Here's what the code might look like:
library(sets)
score = split(df[,c("start","end")], 1:nrow(df[,c("start","end")]))
....
df_text = unlist(lapply(score, as.tuple), recursive = FALSE)
This will return a list of tuples (and zeroes) that looks more like what you want. You can filter out the zeroes by checking the type of each element in the resulting list and dropping the elements of that type. For example:
df_text_trimmed <- df_text[!vapply(df_text, is.double, logical(1))]
This gets rid of all the zeroes.
Edit: Now that I think about it, you probably don't even need to convert your data frames to tuples. You just need to include the recursive = FALSE option when you unlist things, which gives you a list of 0s and data frames containing the numbers you want.
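A base-R sketch of the edit's suggestion, assuming pcs_by_tile has the shape the question's lapply() produces (0 for empty tiles, otherwise a list of one-row data frames). The sample values are made up from the desired output in the question, and the final sprintf()/paste() step that writes Python-style pairs is an addition rather than part of the answer:

```r
# Hypothetical result of the question's lapply(): 0 for empty tiles, otherwise a
# list of one-row data frames as produced by split(df, 1:nrow(df))
pcs_by_tile <- list(
  0,
  list(`1` = data.frame(start = 20350, end = 20355),
       `2` = data.frame(start = 20357, end = 20359)),
  0,
  list(`1` = data.frame(start = 20361, end = 20362))
)

# Flatten one level only: the result is a list of zeroes and one-row data frames
flat <- unlist(pcs_by_tile, recursive = FALSE)

# Drop the zeroes (plain doubles); data frames are not double, so they stay
kept <- flat[!vapply(flat, is.double, logical(1))]

# Format each row as "[start, end]" and join them like the body of a Python list
pairs <- vapply(kept,
                function(d) sprintf("[%d, %d]", as.integer(d$start), as.integer(d$end)),
                character(1))
py_text <- paste(pairs, collapse = ", ")
writeLines(py_text, file.path(tempdir(), "x.txt"))
```

Here py_text is "[20350, 20355], [20357, 20359], [20361, 20362]", matching the format asked for.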

How to find common variables in a list of datasets & reshape them in R?

setwd("C:\\Users\\DATA")
temp = list.files(pattern="*.dta")
for (i in 1:length(temp)) assign(temp[i], read.dta13(temp[i], nonint.factors = TRUE))
grep(pattern="_m", temp, value=TRUE)
Here I create a list of my datasets and read them into R. I then try to use grep to find all variable names containing _m, but this obviously doesn't work, because it simply returns all file names containing _m. What I want is for my code to loop through the list of datasets, find the variables ending with _m, and return a list of the datasets that contain these variables.
I'm not sure how to do this; I'm quite new to coding and R.
Besides knowing which datasets contain these variables, I also need to be able to modify (reshape) these variables.
First, note how assign works: it expects a single string (a character value, in R terms) as the variable name, and if you pass it a vector it uses only the first element as the name (see the help page, ?assign, for details).
What you can do depends on the structure of your data. read.dta13 will load each file as a data.frame.
If you are looking for column names, you can do something like this:
library(readstata13) # provides read.dta13()
myList <- character()
for (i in seq_along(temp)) {
# read the content of the file into a data frame
df <- read.dta13(temp[i], nonint.factors = TRUE)
# identify the names of the columns ending in "_m"
varMatch <- grep(pattern = "_m$", colnames(df), value = TRUE)
# check whether at least one column matches the pattern
if (length(varMatch) > 0) {
myList <- c(myList, temp[i]) # save the file name if it matches
}
}
If you are looking at the content of a column instead, have a look at the dplyr package, which is very useful for data frame manipulation.
A good introduction to dplyr is available in the package vignette here.
Note that in R, growing a vector by appending to it inside a loop can become very slow (see this SO question for more details).
Here is one way to figure out which files have variables with names ending in "_m":
# setup
library(readstata13) # provides read.dta13()
setwd("C:\\Users\\DATA")
temp = list.files(pattern = "\\.dta$")
# logical vector to be filled in
inFileVec <- logical(length(temp))
# loop through each file
for (i in 1:length(temp)) {
# read file
fileTemp <- read.dta13(temp[i], nonint.factors = TRUE)
# fill in vector with TRUE if any variable ends in "_m"
inFileVec[i] <- any(grepl("_m$", names(fileTemp)))
}
In the last line of the loop, names returns the variable names, grepl returns a logical vector saying whether each variable name matches the pattern, and any returns a single logical value indicating whether at least one TRUE was returned by grepl.
# print out these file names
temp[inFileVec]
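Here is a minimal sketch of the same names()/grepl()/any() logic that runs without any .dta files. The named list datasets and its columns are hypothetical stand-ins for what read.dta13() would return. Keeping the data frames in one list, rather than creating separate objects with assign, also makes them easy to reshape afterwards:

```r
# Hypothetical stand-ins for data frames read with read.dta13()
datasets <- list(
  a.dta = data.frame(id = 1:2, income_m = c(10, 20)),
  b.dta = data.frame(id = 1:2, income = c(5, 6)),
  c.dta = data.frame(id = 1:2, age_m = c(30, 40), sex_m = c(1, 2))
)

# names() gives the variable names, grepl() tests each one, any() collapses to one value
has_m <- vapply(datasets, function(d) any(grepl("_m$", names(d))), logical(1))
matching <- names(datasets)[has_m]
print(matching)

# The "_m" columns of each matching dataset, ready for further reshaping
m_vars <- lapply(datasets[has_m], function(d) grep("_m$", names(d), value = TRUE))
print(m_vars)
```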

Pass sequentially named variables to functions within a loop in R

I have created some sequentially named variables, i.e. holder1, holder2, holder3, each containing a single string that I would like to pass to grep.
However, I am having trouble getting the holder variables to return their values rather than just their names when I pass them to grep. My current attempt looks like this:
vec<-c("str1","str2","str3","str4")
for(i in 1:length(vec)){
assign(paste("holder",i,sep=""),vec[i])
positions[i]<-grep( eval (paste ("holder",i,sep="")) ,colnames(df),ignore.case=TRUE)
}
This searches for the string holder1 within the column names of df, which is not what I want. I would like to search for the contents of holderi, i.e. str1, str2, etc.
Any help greatly appreciated!
Solved it with eval and parse:
positions <- integer(0) # start with an empty result vector
for(i in 1:length(vec)){
assign(paste("holder",i,sep=""),vec[i])
print ( paste ("holder",i,sep="") )
positions<-c(positions,grep( eval( parse(text= paste ("holder",i,sep="")) ),colnames(data),ignore.case=TRUE))
}
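Since holderi is a plain variable name rather than an expression, get() is enough here; and if the holder variables aren't needed elsewhere, you can loop over vec directly. A sketch with a hypothetical data frame df whose column names are made up:

```r
# Hypothetical data frame whose column names we search
df <- data.frame(ID = 1, Str1_total = 2, other = 3, STR3 = 4)
vec <- c("str1", "str2", "str3", "str4")

# Version 1: keep the holder variables, but look them up by name with get()
positions <- integer(0)
for (i in seq_along(vec)) {
  assign(paste0("holder", i), vec[i])
  pattern <- get(paste0("holder", i))  # the value, e.g. "str1", not the name
  positions <- c(positions, grep(pattern, colnames(df), ignore.case = TRUE))
}

# Version 2: skip the holder variables and pass each element of vec as the pattern
positions2 <- unlist(lapply(vec, grep, x = colnames(df), ignore.case = TRUE))
```

Both versions give the column positions 2 and 4 (Str1_total and STR3); str2 and str4 match nothing and contribute no positions.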

In R, I am trying to loop through variables of lists and pull out a specific index from each one

In R, I am using readHTMLTable to read tables from the web. The tables I want are at indices 16 and 17, i.e. [[16]] and [[17]].
Here is a small sample of the data for you to work with:
These are some of the URLs that contain the HTML tables:
url1 = "http://www.basketball-reference.com/leagues/NBA_1980.html"
url2 = "http://www.basketball-reference.com/leagues/NBA_1981.html"
url3 = "http://www.basketball-reference.com/leagues/NBA_1982.html"
And here, I read in the tables to variables named x1, x2, and x3.
x1 = readHTMLTable(url1)
x2 = readHTMLTable(url2)
x3 = readHTMLTable(url3)
If you look at the summary of each of these (summary(x1), summary(x2), summary(x3)) and count down through the indices, the tables I want are the ones named "team" and "opponent", at positions 16 and 17.
I have been trying to write a loop that cycles through these and assigns the "team" table from each to variables named team.1980, team.1981, and team.1982, respectively. The "opponent" tables would follow the same pattern: opp.1980, and so forth.
This is the code for the loop I have been trying:
for(i in 1:3) {
for (j in 1980:1982) {
nam1 = paste0("team.", j)
nam2 = paste0("opp.", j)
assign(nam1, paste0("x.", i)[[16]])
assign(nam2, paste0("x.", i)[[17]])
}
}
I think the theory behind this loop works, however the problem occurs with the two assign functions:
assign(nam1, paste0("x.", i)[[16]])
assign(nam2, paste0("x.", i)[[17]])
When I run the loop, I get the error message
Error in paste0("x.", i)[[16]] : subscript out of bounds
which is the same error I get if I just run:
> paste0("x", 1)[[16]]
Error in paste0("x", 1)[[16]] : subscript out of bounds
So I am pretty sure this is where my problem is. Does anyone know how I could cycle through variables and pull out indexes from each?
Please keep in mind that I am rather new to R, so simplicity would be much appreciated! Thanks in advance!
The output from readHTMLTable() is a list and the elements can be referenced by name; index isn't necessary. (Though you can use it.)
Suppose x1, x2, and x3 are defined as in your post. Then you can just do this:
for (i in 1:3) {
year <- 1980 + i - 1
eval(parse(text=paste0("team.", year, " <- x", i, '[["team"]]')))
eval(parse(text=paste0("opp.", year, " <- x", i, '[["opponent"]]')))
}
This evaluates text that's constructed dynamically in the loop. It creates 6 data frames: team.YYYY and opp.YYYY for each year from 1980 to 1982.
Let's take a closer look at what it's doing...
First a string is constructed using paste0() to concatenate the values into a string with no separator. The first call to paste0() in the first iteration yields this string:
'team.1980 <- x1[["team"]]'
Calling parse() on this tells R to turn that string into an object called an expression. Expressions can be evaluated using eval(). So this string gets turned into an R statement and executed, thereby assigning team.1980.
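The parse-then-eval step can be inspected on its own. In this sketch x1 is a hypothetical stand-in for readHTMLTable(url1): a named list with made-up "team" and "opponent" tables:

```r
# Hypothetical stand-in for readHTMLTable(url1): a named list of data frames
x1 <- list(team = data.frame(Team = "A", PTS = 110),
           opponent = data.frame(Team = "A", PTS = 105))

# Build the statement as text, exactly as in the first loop iteration
code <- paste0("team.", 1980, " <- x", 1, '[["team"]]')
print(code)

# parse() returns an expression object; nothing has been assigned yet
expr <- parse(text = code)
print(class(expr))

# eval() runs the assignment, creating team.1980
eval(expr)
```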
This process continues for each of the 3 iterations.
This may not be the best approach, but it should work in your situation. I assume you have more than just these 6, otherwise you might as well just write them as individual assignments.
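Another option, in the spirit of the answer's first sentence, is to fetch each xi with get() and index it by name, storing the results in named lists rather than creating separate team.1980-style variables. This also explains the original error: paste0("x.", i) returns a one-element character string, not the list stored in x1, so [[16]] is out of bounds. In the sketch below, x1, x2, and x3 are hypothetical stand-ins for the readHTMLTable() results, and fake_tables() is a made-up helper:

```r
# Hypothetical stand-ins for readHTMLTable(url1), (url2), (url3)
fake_tables <- function(pts) {
  list(team = data.frame(PTS = pts), opponent = data.frame(PTS = pts - 5))
}
x1 <- fake_tables(100); x2 <- fake_tables(101); x3 <- fake_tables(102)

years <- 1980:1982
team <- list()
opp <- list()
for (i in seq_along(years)) {
  # paste0() only builds the string "x1"; get() fetches the object with that name
  tables <- get(paste0("x", i))
  team[[as.character(years[i])]] <- tables[["team"]]
  opp[[as.character(years[i])]] <- tables[["opponent"]]
}
```

With this layout, team[["1981"]] plays the role of team.1981, and looping over all years later needs no string building at all.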

R : rename columns time series data

I am trying to rename the columns of a time series using the assign function as follows:
assign(colnames(paste0(<logic_to_get_dataset>)),
c(<logic_to_get_column_names>))
I am getting a warning: In assign(colnames(get(paste0("xvars_", TopVars[j, 1], "_lag", :
only the first element is used as variable name
Also, the column name assignment does not happen. I think this is happening because of the colnames() function. Is there a workaround?
The issue is that assign only looks at the first element of the vector.
You can try this, for example:
df = data.frame(x = 1:2, y = 4:3)
within(df, assign(colnames(df), c('a', 'b')))
You'll notice that R uses only the first column name, x: it warns that only the first element is used as the variable name and then overwrites column x with the values c('a', 'b') instead of renaming anything. This behavior is obviously not what you're looking for.
Unfortunately, it's kind of hacky, but you can always use something like this:
data.frame.name = get_df() # some function that returns the data frame's name as text
data.frame.columns = get_cols() # some function that returns the new column names as text
eval(parse(text = paste0('colnames(', data.frame.name, ') = c(',
paste0('"', data.frame.columns, '"', collapse = ','), ')')))
I prefer to avoid building code like this, but it should work as intended.
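Here is a sketch of both routes, assuming the data frame's name and the new column names are already available as strings (sales_lag1, sales_lag2 and the new names are made up). The eval(parse()) route has to put quotes around each new name inside the generated code. The alternative copies the object with get(), renames the copy, and writes it back with assign(), so only a single name is ever passed to assign:

```r
# Hypothetical data sets whose names are only known as strings
sales_lag1 <- data.frame(x = 1:3, y = 4:6)
sales_lag2 <- data.frame(x = 1:3, y = 4:6)
new_cols <- c("price_xt1", "volume_xt1")

# Route 1: build and evaluate the code; each new name is wrapped in double quotes
code <- paste0("colnames(", "sales_lag1", ") <- c(",
               paste(sprintf('"%s"', new_cols), collapse = ", "), ")")
eval(parse(text = code))

# Route 2: fetch by name, rename the copy, then assign it back under the same name
df_name <- "sales_lag2"
tmp <- get(df_name)
colnames(tmp) <- new_cols
assign(df_name, tmp)
```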
Here is what I ended up with:
temp_var <- paste0('colnames(var_',TopLines[j,1],'_lag',get(paste0('uniqLg_',TopLines[j,1]))[k,],'_',get(paste0('uniqLg_',TopLines[j,1]))[k,]+12 ,
') <- c(gsub( "xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],'" , "xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],'__',get(paste0('uniqLg_',TopLines[j,1]))[k,]+12,
'", colnames(var_',TopLines[j,1],'_xt',get(paste0('uniqLg_',TopLines[j,1]))[k,],')))')
print(temp_var )
eval(parse( text=temp_var ))
where TopLines is a data frame with one column that contains a list of lines. The only problem with this method is that I can't check the result of eval without actually opening the data set to see whether the changes have been applied.
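One way around that last problem is to check the result in code rather than by opening the data set: after the eval(), compare colnames(get(name)) with the names you expect. This sketch uses a hypothetical object name and a simplified gsub() rename in the spirit of the code above (it renames a single object in place, unlike the original, which reads names from one object and writes them to another):

```r
# Hypothetical data set standing in for one var_<line>_lag<k>_<k+12> object
var_A_lag1_13 <- data.frame(price_xt1 = 1:2, volume_xt1 = 3:4)
target <- "var_A_lag1_13"

# Build the renaming code as text, as in the approach above
temp_var <- paste0("colnames(", target, ') <- gsub("xt1", "xt1__13", colnames(',
                   target, "))")
print(temp_var)
eval(parse(text = temp_var))

# Verify programmatically instead of opening the data set
expected <- c("price_xt1__13", "volume_xt1__13")
stopifnot(identical(colnames(get(target)), expected))
```

If the check fails, stopifnot() stops with an error, so a loop over many data sets halts at the first rename that did not take effect.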
