I am trying to run a loop that renames two columns in my data, but the names of these two columns start with a number. In earlier work I renamed columns whose names did not start with a number using the syntax shown below, but here it does not work.
Here is the code (the loop continues beyond this excerpt):
#Filtering
for(i in 1:length(names$ID)){
f<-names$ID[[i]]
corrpoints<-sprintf("corrpoints%i",as.numeric(levels(f))[f])
pts=readOGR(dsn="C:/Users/Charlie/Desktop/Stage_permafrost/SIG/Quantification_des_mouvements/Corr_points_disp/Corr_points_ubaye", layer=corrpoints)
pts$Gvalue2004<-pts$2004_red_gr
pts$Gvalue2012<-pts$2012_red_gr
pts$Aspect_mnt<-pts$Aspect_25m_
Any idea on how I could fix this?
Thanks
Your example is not reproducible for us.
Column names CAN start with numbers. Wrap the name in backticks to access it, as below. Whether such names are a good idea is an entirely different story.
x <- mtcars
colnames(x)[1] <- '1mpg'
x$`1mpg`
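Applied to the column names from the question, the same idea might look like the sketch below. It assumes a plain data frame with made-up values standing in for the attribute table of pts (the real object comes from readOGR and is not reproduced here).

```r
# Hypothetical stand-in for the attribute table of `pts` (values are made up)
pts <- data.frame(a = c(10, 20), b = c(30, 40), c = c(1, 2))
# Assigning through names() keeps names that start with a digit as they are
names(pts) <- c("2004_red_gr", "2012_red_gr", "Aspect_25m_")

# Backticks or [[ ]] both reach a column whose name starts with a digit
pts$Gvalue2004 <- pts$`2004_red_gr`
pts$Gvalue2012 <- pts[["2012_red_gr"]]

# Alternatively, rename a column in place instead of copying it
names(pts)[names(pts) == "Aspect_25m_"] <- "Aspect_mnt"
```

Copying leaves the original column in place, while the names() assignment replaces the old name, so which one fits depends on whether the old columns are still needed.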
I am writing this post to ask for some advice for looping code to rename columns by index.
I have a data set that has scale item columns positioned next to each other. Unfortunately, they are oddly named.
I want to rename each column in this format: SimRac1, SimRac2, SimRac3, and so on. I know the location of the columns (columns 30 to 37), and I know these scale items are ordered so that they can be numbered in increasing order from left to right.
The code I currently have works, but it is not efficient. There are other scales, in different locations, that also need to be renamed in a similar fashion. This would result in dozens of lines of code.
See the code below.
names(Total)[30] <- "SimRac1"
names(Total)[31] <- "SimRac2"
names(Total)[32] <- "SimRac3"
names(Total)[33] <- "SimRac4"
names(Total)[34] <- "SimRac5"
names(Total)[35] <- "SimRac6"
names(Total)[36] <- "SimRac7"
names(Total)[37] <- "SimRac8"
I want to loop this code so that I only have a chunk of code that does the work.
I was thinking perhaps a "for loop" would help.
Hence, the below code
for (i in Total[,30:37]){
names(Total)[i] <- "SimRac(1:8)"
}
Unfortunately, this does not work. The chunk runs without error, but it doesn't do anything.
Please advise.
In the OP's code, "SimRac(1:8)" is a constant string. To generate dynamic names, use paste0.
We do not need a loop here. We can use a vectorized function to create the names, then assign them to a subset of names(Total):
names(Total)[30:37]<-paste0('SimRac', 1:8)
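Since the question mentions several other scales in different positions, the one-liner could be wrapped in a small helper. This is only a sketch: rename_scale, the 10-column stand-in for Total, and the "OtherScale" prefix are all hypothetical.

```r
# Hypothetical 10-column data frame standing in for `Total`
Total <- as.data.frame(matrix(0, nrow = 2, ncol = 10))

# Hypothetical helper: rename the columns at positions `cols`
# to prefix1, prefix2, ... in left-to-right order
rename_scale <- function(df, cols, prefix) {
  names(df)[cols] <- paste0(prefix, seq_along(cols))
  df
}

Total <- rename_scale(Total, 3:5, "SimRac")
Total <- rename_scale(Total, 8:10, "OtherScale")
names(Total)
```

Each scale then costs one line, and the numbering always follows the number of columns passed in.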
I'm creating some routines in R to ease model creation and to distinguish several groups based on several parameters (e.g. original watches vs. fake ones, using common watch attributes).
During the process, I keep track of the lines to exclude in a vector (empty at first), and I get rid of them at the end using:
model$var <- raw_data[-line_excluded,]
The problem is that if line_excluded is c() (i.e. no line is excluded), model$var is an empty data frame, whereas in that case I want all the lines of the data frame.
The only solution I have thought of is to use
if (!is.null(line_excluded)){
model$var <- raw_data[-line_excluded,]}
But that's not really pretty, and I have several tracking variables as line_excluded which need that.
Thanks for the help
You can do it another way using setdiff(), which handles an empty line_excluded, i.e.,
model$var <- raw_data[setdiff(seq(nrow(raw_data)),line_excluded),]
You can also try:
model$var <- raw_data[!(1:nrow(raw_data) %in% line_excluded),]
This is similar to what @THomasIsCoding suggested: you keep the row numbers that are not in your line_excluded.
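To see the difference on a small case, here is a sketch with a made-up raw_data. It uses seq_len() rather than seq() as a defensive choice, so that a zero-row data frame would also be handled.

```r
# Hypothetical data, for illustration only
raw_data <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))

# Nothing excluded: negative indexing with an empty vector drops every row
line_excluded <- c()
dropped_all <- raw_data[-line_excluded, ]

# setdiff() keeps every row number that is not listed in line_excluded
all_rows <- raw_data[setdiff(seq_len(nrow(raw_data)), line_excluded), ]

# The same expression also works once some rows are excluded
line_excluded <- c(2, 4)
some_rows <- raw_data[setdiff(seq_len(nrow(raw_data)), line_excluded), ]
```

Here dropped_all has no rows, all_rows keeps all five, and some_rows keeps rows 1, 3 and 5, so no if() guard is needed around each tracking variable.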
I have a list of dataframes (subspec2) which I want to loop through to get the columns with the maximum value from each dataframe, and write these to a new dataframe. I wrote the following loop:
good.data<-data.frame(matrix(nrow=401, ncol=78)) #create empty dataframe
for (i in length(subspec2)) ##subspec2 is the list of dataframes
{
max.name<-names(which.max(apply(subspec2[[i]],MARGIN=2,max))) #find column name with max value
good.data[,i]<-subspec2[[i]][max.name] #write the contents of this column into dataframe
}
This seems to work but only returns values in the last column, nothing else appears to have been saved. Many threads point out the df must be outside the loop, but that is not the problem here.
What am I doing wrong?
Thank you!
I believe you need to change for (i in length(subspec2)) to for (i in 1:length(subspec2)). The former does only one iteration, with i = length(subspec2), whereas the latter iterates over every index from 1 to length(subspec2).
(I am pretty sure that is your issue, but it is a great idea to create a reproducible example so that I can run your code to double check. For example, I am not exactly sure what subspec2 looks like, and I am not able to run your code as it is. A great resource for this is the reprex package.)
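A minimal sketch of the corrected loop, assuming a made-up two-element subspec2 since the real one is not shown. It uses seq_along(), which behaves like 1:length() for a non-empty list and also copes with an empty one.

```r
# Hypothetical list of small data frames standing in for subspec2
subspec2 <- list(
  data.frame(a = c(1, 2, 3), b = c(9, 1, 1)),
  data.frame(c = c(5, 5, 5), d = c(0, 7, 2))
)

# One column per list element, filled in by the loop
good.data <- data.frame(matrix(nrow = 3, ncol = length(subspec2)))

for (i in seq_along(subspec2)) {
  # Name of the column holding the largest value in this data frame
  max.name <- names(which.max(sapply(subspec2[[i]], max)))
  good.data[, i] <- subspec2[[i]][[max.name]]
  names(good.data)[i] <- max.name
}
```

With this data the loop fills both columns, picking b from the first data frame and d from the second.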
I have a data frame loaded using the CSV Library in R, like
mySheet <- read.csv("Table.csv", sep=";")
I now can print a summary on that mySheet object
summary(mySheet)
and it shows me a summary for each column. For example, one column named Diagnose has the unique values RCM, UCM and HCM, and the summary shows the number of occurrences of each of these values.
I now filter by a diagnose, like
subSheet <- mySheet[mySheet$Diagnose=='UCM',]
which seems to be working, when I just type subSheet in the console it will print only the rows where the value has been matched with 'UCM'
However, if I do a summary on that subSheet, like
summary(subSheet)
it still 'knows' about the other two possibilities RCM and HCM and prints those having a value of 0. However, I expected that the new created object will NOT know about the possible values of the original mySheet I initially loaded.
Is there any way to get rid of those other possible values after filtering? I also tried subset but this one just seems to be some kind of shortcut to '[' for the interactive mode... I also tried DROP=TRUE as option, but this one didn't change the game.
Totally mind squeezing :D Any help is highly appreciated!
What you are dealing with here are factors created when reading the csv file. You can get subSheet to forget the unused factor levels with
subSheet$Diagnose <- droplevels(subSheet$Diagnose)
or
subSheet$Diagnose <- subSheet$Diagnose[ , drop=TRUE]
just before you do summary(subSheet).
Personally I dislike factors, as they cause me too many problems, and I only convert strings to factors when I really need to. So I would have started with something like
mySheet <- read.csv("Table.csv", sep=";", stringsAsFactors=FALSE)
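The effect can be reproduced without the CSV file. The sketch below builds a small made-up mySheet (the Age column is hypothetical) and calls droplevels() on the whole data frame, which drops unused levels from every factor column at once.

```r
# Hypothetical data standing in for Table.csv
mySheet <- data.frame(
  Diagnose = factor(c("RCM", "UCM", "HCM", "UCM")),
  Age = c(40, 52, 61, 35)
)

subSheet <- mySheet[mySheet$Diagnose == "UCM", ]
levels(subSheet$Diagnose)   # still lists HCM, RCM and UCM

# Drop unused levels in all factor columns of the subset
subSheet <- droplevels(subSheet)
levels(subSheet$Diagnose)   # only UCM remains
```

After this, summary(subSheet) no longer reports zero counts for RCM and HCM.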
I am pretty new to R and have a couple of questions about a loop I am attempting to execute. I will try to explain as clearly as possible what I want the loop to do.
for(i in (1988:1999,2000:2006)){
yearerrors=NULL
binding=do.call("rbind.fill",x[grep(names(x), pattern ="1988.* 4._ data=")])
cmeans=lapply(binding[,2:ncol(binding)],mean)
datcmeans=as.data.frame(cmeans)
finvec=datcmeans[1,]
kk=0
result=RMSE2(yields[(kk+1):(kk+ncol(binding))],finvec)
kk=kk+ncol(binding)
yearerrors=c(result)
}
yearerrors
First I wish for the loop to iterate over file names of data.
Specifically over the years 1988-2006 in the place where 1988 is
placed right now in the binding statement. x is a list of data files
inputted into R and the 1988 is part of the file name. So, I have
file names starting with 1988,1989,...,2006.
yields is a numeric vector and I would like to input the indices of
the vector into the function RMSE2 as indicated in the loop. For
example, over the first iteration I wish for the indices 1 to the
number of columns in binding to be used. Then for the next iteration
I want the first index to be 1 more than what the previous iteration
ended with and continue to a number equal to the number of columns in the next binding
statement. I just don't know if what I have written will accomplish
this.
Finally, I wish to store each of these results in the vector
yearerrors and then access this vector afterwards.
Thanks so much in advance!
OK, there's a heck of a lot of guesswork here because the structure of your data is extremely unclear, and I have no idea what the RMSE2 function is (you've given no detail). Based on your question the other day, I'm going to assume that your data is in .csv files, and I'm going to have a stab at your problem.
I would start by building the combined dataframe while reading the files in, not doing one then the other. Like so:
#mdply comes from the plyr package
library(plyr)
#Set your working directory to the folder containing the .csv files
#I'm assuming they're all in the form "YEAR.something.csv" based on your pattern matching
filenames <- list.files(".", pattern="\\.csv$") #pattern is a regular expression; to match a specific year, add it to the pattern
years <- gsub("([0-9]+).*", "\\1", filenames)
df <- mdply(filenames, read.csv)
df$year <- as.numeric(years[df$X1]) #Adds the year
#Your column mean dataframe didn't work for me
cmeans <- as.data.frame(t(colMeans(df[,2:ncol(df)])))
It then gets difficult to know what you're trying to achieve. Since your datcmeans is a one-row data.frame, datcmeans[1,] doesn't change anything. So if one row of a dataframe (or a numeric vector) is the argument your RMSE2 function requires, you can just pass it datcmeans (cmeans in my example).
Your code from then on is pretty much indecipherable to me. Without knowing what yields looks like, or how RMSE2 works, it's pretty much impossible to help more.
If you're going to do a loop here, I'll say that setting kk=kk+ncol(binding) at the end of an iteration is not going to help you: because kk=0 is set inside the loop, kk is reset to 0 at the start of every iteration and never accumulates, which is, I'm guessing, not what you want. Here's my guess at what you need here (assuming looping is required).
yearerrors=vector("numeric", ncol(df)) #Create empty vector ahead of loop
for(i in 1:ncol(df)) {
yearerrors[i] <- RMSE2(yields[i:ncol(df)], finvec)
}
yearerrors
I honestly can't imagine a function that would work like this, but it seems the most logical adaptation of your code.
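Another reading of the question is that each year should consume the next consecutive block of yields, with the block width equal to that year's number of columns. A rough sketch of that indexing follows, under loud assumptions: rmse is a hypothetical stand-in for the unknown RMSE2, and finvecs and yields are made-up values.

```r
# Hypothetical root-mean-square error, standing in for the unknown RMSE2
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

# Hypothetical per-year column means; each year may have a different length
finvecs <- list("1988" = c(1, 2, 3), "1989" = c(4, 5))
# Hypothetical yields vector, long enough to cover all years
yields <- c(1, 2, 5, 4, 7)

# Block boundaries: each year starts one past where the previous one ended
widths <- lengths(finvecs)
ends <- cumsum(widths)
starts <- ends - widths + 1

# Preallocate the result vector outside the loop so nothing is overwritten
yearerrors <- numeric(length(finvecs))
names(yearerrors) <- names(finvecs)
for (i in seq_along(finvecs)) {
  yearerrors[i] <- rmse(yields[starts[i]:ends[i]], finvecs[[i]])
}
yearerrors
```

Computing the boundaries with cumsum() before the loop replaces the running kk counter, and creating yearerrors before the loop keeps one result per year instead of only the last one.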