Remove Column in R after an "if" clause - r

I am learning R and have an R data table from which I want to remove unnecessary features (unnecessary table columns). For this I am using the ReliefFexpRank estimator from the CORElearn package, with table and originaltable being the R tables.
library(CORElearn)
estRelifF <-attrEval(FLAG_READMITIDO_MEAN ~.,table,estimator="ReliefFexpRank",ReliefIterations=30)
for( i in estRelifF ){
if(estReliefF[i]==0) {originaltable[i]<-NULL}
}
output <-data.frame (estReliefF)
I know that estReliefF holds the correct results; for each feature it gives values like the sample below:
LOCAL
-4.428817e-01
HORA
0.000000e+00
I want to remove the HORA column, whose score is 0.
I don't know what the problem is, though I suspect it lies in the if statement. This is my first time using R and I can't find the mistake, so I would appreciate some help.

The issue comes from modifying the columns while looping over them. Let's say your vector and table are:
x<-c(1,1,0,1,0)
df<-data.frame(1:5,2:6,3:7,4:8,5:9)
If you run for(i in 1:5){if(x[i]==0){df[i]<-NULL}}, you'll see that the third column has been removed, but not the fifth. That's because once the third column is removed, the original fifth column becomes the fourth, and x[4] is not 0.
You need to find all the unwanted columns before deleting any of them. One possible solution is:
df[-which(x==0)]
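As a hedged variation on the line above: the negative-index form returns a data frame with no columns when nothing in x equals 0, because -which(...) is then an empty index. A logical index avoids that edge case. The sketch below reuses the x and df from the answer (with column names a to e added for readability) and also shows a name-based variant; scores, tbl and their values are made up for illustration and do not come from CORElearn. Note too that a loop of the form for (i in v) iterates over the values of v, not over its positions.

```r
# Example data from the answer above, with hypothetical column names a..e.
x <- c(1, 1, 0, 1, 0)
df <- data.frame(a = 1:5, b = 2:6, c = 3:7, d = 4:8, e = 5:9)

# Keep the columns whose score is non-zero, using a logical index.
kept <- df[x != 0]

# Edge case: no zero scores. The negative-index form drops every column,
# while the logical form keeps them all.
no_zero <- c(1, 1, 1, 1, 1)
neg_form <- df[-which(no_zero == 0)]
lgl_form <- df[no_zero != 0]

# Name-based variant for a named score vector (hypothetical values).
# Matching by name still works when the table has extra columns,
# such as the target, or a different column order.
scores <- c(LOCAL = -0.4428817, HORA = 0)
tbl <- data.frame(LOCAL = 1:3, HORA = 4:6, TARGET = c(0, 1, 0))
tbl_kept <- tbl[setdiff(names(tbl), names(scores)[scores == 0])]
```

Matching by name is probably the safer choice when the score vector and the table to be trimmed were not built from exactly the same columns.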

Related

Ignore first row in R

In R, how do I ignore the first row of a data set and use the second row as the column names?
I am currently reading a file that sometimes has garbage on the first or second line, and I am looking for a way to resolve this.
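One possible approach, sketched under the assumption that the file is comma-separated and has exactly one garbage line before the header: read.csv accepts a skip argument that discards the given number of leading lines before the header is parsed. The inline text below is made-up sample data standing in for the real file.

```r
# Hypothetical file contents: one garbage line, then the real header.
raw <- "garbage line\nid,score\n1,10\n2,20\n"

# skip = 1 drops the first line; header = TRUE then treats the next
# line as the column names.
dat <- read.csv(text = raw, skip = 1, header = TRUE)
```

For a file on disk the same call would take the file path in place of text = raw. If the number of garbage lines varies, the skip value has to be worked out first, for example by inspecting the first few lines with readLines.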

Dealing with points vs. rows

I fear I've missed some crucial point in my education thus far.
I have a table HR and I've performed functions on it.
For example HR$FTE <- HR$'Std Hrs' / 38 gives me a new column for each employee; working as intended.
However, whenever I try to apply a function while creating a new column, it fails. The question I posted yesterday was similar in nature: the error there came from the whole row being returned.
An example that doesn't work is HR$FYEnd <- as.Date(paste(HR$FY + 1,"06","30", sep = "-")). In this case the error "non-numeric argument to binary operator" is returned; as I understand it, HR$FY is not a number but rather a whole column of numeric data. The expected output is a set of dates falling on 30/06.
In Excel (which I'm trying to train myself to leave) the equivalent when dealing with tables would be [#[FY Start]] or something to that effect which demonstrates that you're working with the figure on that row rather than the whole row.
Worked it out - couple of days later.
The step that I was missing was using the mapply/sapply commands. Using these has sorted everything out.
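For comparison, a hedged sketch of a vectorised alternative. R arithmetic already works element by element on a numeric column, and "non-numeric argument to binary operator" is the error raised when one operand is of a non-numeric type such as character. Assuming (this is not confirmed by the post) that HR$FY is stored as character, converting it first lets the original expression work without mapply or sapply. The HR data frame below is made up for illustration.

```r
# Hypothetical stand-in for the HR table, with FY stored as text.
HR <- data.frame(FY = c("2019", "2020"), stringsAsFactors = FALSE)

# Convert to numeric, add one year, and build the 30 June date.
# paste() and as.Date() both operate on the whole column at once.
HR$FYEnd <- as.Date(paste(as.numeric(HR$FY) + 1, "06", "30", sep = "-"))
```

If FY were a factor instead, as.numeric(as.character(HR$FY)) would be needed, since as.numeric on a factor returns the internal level codes.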

Only last iteration of loop is saved

I have a list of dataframes (subspec2) which I want to loop through to get the columns with the maximum value from each dataframe, and write these to a new dataframe. I wrote the following loop:
good.data<-data.frame(matrix(nrow=401, ncol=78)) #create empty dataframe
for (i in length(subspec2)) ##subspec2 is the list of dataframes
{
max.name<-names(which.max(apply(subspec2[[i]],MARGIN=2,max))) #find column name with max value
good.data[,i]<-subspec2[[i]][max.name] #write the contents of this column into dataframe
}
This seems to work but only returns values in the last column, nothing else appears to have been saved. Many threads point out the df must be outside the loop, but that is not the problem here.
What am I doing wrong?
Thank you!
I believe you need to change for (i in length(subspec2)) to for (i in 1:length(subspec2)). The former does only one iteration, with i = length(subspec2), whereas the latter iterates over every index from 1 to length(subspec2).
(I am fairly sure that is the issue, but a reproducible example would let me run your code and double-check. For example, I am not sure exactly what subspec2 looks like, so I cannot run your code as it stands. The reprex package is a great resource for this.)
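A minimal sketch of the corrected loop, using a small made-up subspec2 since the real one is not shown. It uses seq_along(subspec2) rather than 1:length(subspec2); the two are equivalent for a non-empty list, but seq_along also gives zero iterations for an empty one, whereas 1:0 would count down from 1 to 0.

```r
# Hypothetical list of data frames standing in for subspec2.
subspec2 <- list(
  data.frame(a = c(1, 2, 3), b = c(9, 1, 1)),
  data.frame(c = c(4, 5, 6), d = c(0, 0, 1))
)

# One output column per data frame in the list.
good.data <- data.frame(matrix(nrow = 3, ncol = length(subspec2)))

for (i in seq_along(subspec2)) {
  # Name of the column holding the largest value in this data frame.
  max.name <- names(which.max(apply(subspec2[[i]], MARGIN = 2, max)))
  # Copy that column's values into column i of the result.
  good.data[, i] <- subspec2[[i]][[max.name]]
}
```

With this sample data the first result column is taken from b and the second from c.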

Not aggregating correctly

The goal of this code is a loop that aggregates each company's word frequency by a principle vector I created and adds the result to a list. The problem is that, after I run it, the list contains only the 7 principles rather than the word frequencies alongside them. The word frequencies are the individual columns of the FREQBYPRINC.AG data frame. Running the same code outside the loop on a single column works without any problem, but inside the loop I don't get the correct data frames in the list. Any suggestions?
list.agg<-vector("list",ncol(FREQBYPRINC.AG)-2)
for (i in 1:14){
attach(FREQBYPRINC.AG)
list.agg[i]<-aggregate(FREQBYPRINC.AG[,i+1],by=list(Type=principle),FUN=sum,na.rm=TRUE)
}
The problem is how the result is assigned into the list. Since list.agg was created as a list, assigning with single square brackets, list.agg[i] <- ..., stores only the first element of the data frame returned by aggregate (the Type column holding the principles) and discards the sums. To store the whole data frame, subset with double square brackets instead. Try this:
list.agg<-vector("list",ncol(FREQBYPRINC.AG)-2)
for (i in 1:14){
list.agg[[i]]<-aggregate(FREQBYPRINC.AG[,i+1],by=list(Type=principle),FUN=sum,na.rm=TRUE)
}
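The double-bracket assignment can be tried on a small made-up example. The freq data frame, its column names, and value_cols below are assumptions for illustration only; the grouping column is referenced explicitly as freq$principle, so the sketch does not rely on attach.

```r
# Hypothetical word-frequency table: one grouping column, two companies.
freq <- data.frame(
  principle = c("A", "B", "A", "B"),
  co1 = c(1, 2, 3, 4),
  co2 = c(10, 20, 30, 40)
)

value_cols <- c("co1", "co2")
list.agg <- vector("list", length(value_cols))

for (i in seq_along(value_cols)) {
  # [[i]] stores the whole data frame returned by aggregate();
  # [i] would keep only its first column.
  list.agg[[i]] <- aggregate(freq[[value_cols[i]]],
                             by = list(Type = freq$principle),
                             FUN = sum, na.rm = TRUE)
}
```

Each list element is then a two-column data frame: Type with the group labels and x with the summed frequencies.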

Can I delete values using edit()?

Running:
R 3.2.2;
R Studio 0.99.484;
Windows 10
Simple code like below:
z<-data.frame(c(1,2),c(3,4))
edit(z)
When I open up the R editor, I see I can edit or add to cells. However, I cannot delete rows or columns. I know I can delete individual cells by clicking edit->delete, but this just gives a NULL. Is there a way to actually delete values in the editor?
Note: I realize that something like
z<-z[1]
would probably be easier, but I'm relatively new to R and trying to understand when and how to use the edit() function.
To remove the second row:
z[-2, , drop=FALSE]
or the first column:
z[-1]
Removing the value from a particular cell means replacing it with a missing value, i.e. NA:
z[2,1] <- NA
sets the cell in the 2nd row, 1st column to NA.
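A short sketch of the same operations with the results assigned, since indexing returns a modified copy and leaves z itself unchanged until the result is stored. The column names a and b are added here for readability and are not in the original example.

```r
# Same shape as the question's z, with hypothetical column names.
z <- data.frame(a = c(1, 2), b = c(3, 4))

# Drop the second row; drop = FALSE keeps the result a data frame.
z_no_row2 <- z[-2, , drop = FALSE]

# Drop the first column.
z_no_col1 <- z[-1]

# Blank out a single cell by setting it to NA (this modifies z).
z[2, 1] <- NA
```

The same applies to edit(): it returns the edited copy, so the result has to be assigned, as in z <- edit(z), for the changes to persist.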
