In R, I am trying to dynamically select columns of a data.frame called DF. If
cutOffYear=2014
and
forecast_years=3
Then this piece of code
paste0("DF$X",cutOffYear+1:forecast_years)
yields:
[1] "DF$X2015" "DF$X2016" "DF$X2017"
Assuming all three columns exist in DF how do I assign the column variables to the characters?
I have tried a lot of combinations of get, assign and paste0 but I am failing.
We can try with [ to select the columns. It is often error prone when using $. If we need to get the output as a data.frame with columns specified in the pasted combination of 'cutOffYear', 'forecast_years', then the below code should work fine
DF[paste0("X", cutOffYear+1:forecast_years)]
Related
The below code allows simple filtering of a list:
#Filter to applicable codes only
ICS_List <- "QMM|QJG|QH8|QM7|QUE|QHG"
EofEMSOAs <- EofEMSOAs[grep(ICS_List, EofEMSOAs$Code),]
What I am looking to do instead is take all data from the column from another dataframe within a project and use the grep function to filter for values contained within that column - there could be hundreds so typing a manual list is not practical.
I have tried the below but it results in error 'argument 'pattern' has length > 1 and only the first element will be used. Seems using dplyr in this way does not create the same output as manually typing in a list which is throwing the error so I only get one result.
#To filter from required dataframe 'EofEMSOAsIMD'
EofEMSOAsCodeListOnly <- dplyr::pull(EofEMSOAsIMD, "Area Code")
EofEMSOAsFinalList <- EofEMSOAs[grep(EofEMSOAsCodeListOnly, EofEMSOAs$msoa11cd),]
Could anyone please amend the above so it does work using similar logic to the code at top of this question, namely 1. List created from column 2. Dataframe filtered for matches to that list? Thank you.
This is the code I am trying to run:
data_table<-data_table%>%
merge(new_table, by = 'Sample ID')%>%
mutate(Normalized_value = ((1.8^(data_table$Ubb - data_table$Ct_adj))*10000))
I want to first add the new column ("Ubb") from "new_table" and then add a calculated column using that new column. However, I get an error saying that Ubb column does not exist. So it's not performing merge before running mutate? When I separate the functions everything works fine:
data_table<-data_table%>%
merge(new_table, by = 'Sample ID')
data_table<-data_table%>%
mutate(Normalized_value = ((1.8^(data_table$Ubb - data_table$Ct_adj))*10000))
I would like to keep everything together just for style, but I'm also just curious, shouldn't R perform merge first and then mutate? How does order of operation during piping work?
Thank you!
you dont need to refer to column name with $ sign. i.e. use Normalized_value = ((1.8^(Ubb - Ct_adj))*10000)
because it is merged now. with $ sign I believe R, even though does the merge, has original data_table still in memory. because the assignment operator did not work yet. the assignment will take place after all operations.
Try running the code like this:
data_table<-data_table%>%
merge(new_table, by = 'Sample ID')%>%
mutate(Normalized_value = ((1.8^(Ubb - Ct_adj))*10000))
Notice that I'm not using the table name with the $ within the pipe. Your way is telling the mutate column to look at a vector. Maybe it's having some trouble understanding the length of that vector when used with the merge. Just call the variable name within the pipe. It's easiest to think of the %>% as 'and then': data_table() and then merge() and then mutate(). You might also want to think about a left_join instead of a merge.
I have already checked the forum for the solution but could not find the solution, hence, posting the problem here.
I am trying to change a name of a column using Rename, although the command is executed without any error but the names of the columns does not change.
I use this coderename(red,parts = category) Thanks
You need to pass a named vector as the second argument, and the names need to be quoted. Also, don't forget that you need to overwrite red with the result.
red <- reshape::rename(red, c(parts = "category"))
If you want to change multiple columns, add the items to the named vector you supply to the second argument.
I am doing a loop and I need to subset a fixed range of the columns of a data frame. But the loop is exactly to generate the name of the data.frame that I need to extract the columns. I would like to know how can I call a data.frame from a string name. It must be something similar to assign() function, but I am not assigning any value to nothing, I just need to generate the name of a data.frame from a string using paste0() function.
a=5
b="a"
get(b)
Output:
5
How I found this? I did help(assign) and looked under "See also" section.
I'd like to loop through a list of files and record detailed info about them (size, no. of rows, means of columns).
I just started with storing the info in a data frame:
df<-data.frame()
all <-list.files(pattern=".csv")
for (i in all){
file<-read.csv(i)
filas<-nrow(file)
cols<-ncol(file)
info<-c(i,filas,cols)
df<-rbind(df,i,filas,cols)
}
but it triggers an error caused by the 'i' variable, which is just a file name. What am I doing wrong?
Thanks in advance, p.
Don't use for loops. Rather, use lapply in combination with do.call to obtain your desired result. Try:
do.call(rbind,lapply(all,function(x) {y<-read.csv(x); c(file=x, filas=nrow(y), cols=ncol(y))}))
Your approach was failing because in order of rbind to work, you need two data.frames with the same number of columns. You initially have created an empty data.frame (with 0 column) and this couldn't be rbinded to a vector of length 3 (assuming that you want a row for each file showing file name, number of rows and number of columns). If you really want to use a for loop, you should do something like:
for (i in 1:length(all)) {
file<-read.csv(all[i])
info<- data.frame(file=all[i], filas=nrow(file), cols=ncol(file))
if (i==1) df<-info else df<-rbind(df,info)
}