Running:
R 3.2.2;
R Studio 0.99.484;
Windows 10
Simple code like below:
z<-data.frame(c(1,2),c(3,4))
edit(z)
When I open up the R editor, I see I can edit or add to cells. However, I cannot delete rows or columns. I know I can delete individual cells by clicking edit->delete, but this just gives a NULL. Is there a way to actually delete values in the editor?
Note: I realize that something like
z<-z[1]
would probably be easier, but I'm relatively new to R and trying to understand when and how to use the edit() function.
If we need to remove the second row,
z[-2, , drop=FALSE]
or first column
z[-1]
Also, removing the value of a particular cell means replacing it with a missing value, i.e. NA
z[2,1] <- NA
sets the cell in the 2nd row, 1st column to NA.
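Putting the pieces above together, here is a minimal sketch, assuming the two-column data frame from the question. The column names a and b and the names z_no_row2 and z_no_col1 are made up for illustration. The point it tries to show is that subsetting returns a new object, so the result has to be assigned somewhere. The same holds for edit(), which returns the edited copy rather than changing z in place.

```r
# Column names a and b are assumptions added for readability.
z <- data.frame(a = c(1, 2), b = c(3, 4))

# Subsetting returns a new data frame; assign it to keep the result.
z_no_row2 <- z[-2, , drop = FALSE]  # drop the second row, stay a data frame
z_no_col1 <- z[-1]                  # drop the first column

# "Deleting" one cell really means marking it as missing.
z[2, 1] <- NA

# edit() is interactive and returns the edited copy, so its result
# also needs to be assigned (left commented out here):
# z <- edit(z)
```

If the goal is to remove whole rows or columns, indexing as above is likely simpler than the spreadsheet editor.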
I have a data frame that looks kind of like this...but much larger
I want to look at the record_id column and shift the right side columns down when the row says admin_time. Then make that previous row NA. Then when I write it to a csv, I'll just use the na = "" to make those cells blank
For example, in the first few rows, it would look like this...
No need to try to recreate the data frame. I was thinking maybe a for loop would work with an embedded if statement to review the patient_id, record_id, and pk_day. I was just looking for alternate suggestions or how to use a statement within the loop to pick out the admin line and do what I mentioned above
I have a line of code that uses the data.table package to look through all the rows and check whether any cell contains the word "Margin".
Census_Bureau_Data<-Filter(function(Census_Bureau_Data) !any(Census_Bureau_Data %like% "Margin"), Census_Bureau_Data)
The code works perfectly and removes every column that contains the word Margin in any row. Although I got the result I wanted, I want the script to check only the first row. This is in case the word Margin later appears somewhere outside the first row; I wouldn't want a whole column deleted because of that. I only care about the first row.
Census_Bureau_Data<-Filter(function(Census_Bureau_Data) !any(Census_Bureau_Data[1,] %like% "Margin"), Census_Bureau_Data)
So I tried this instead. Note the [1,] bracket I added. I thought this would be enough, and it seems like it should be simple. How can I keep the same statement but have it check only the first row?
Two comments:
I think it's a little confusing (though not an error) to have the anonymous function's argument named the same as the external object itself, so for brevity I'll use function(xyz) ... here.
Realize that in that function, xyz is a vector of data, not a frame of data, so [,1] or [1,] are meaningless.
Since you're only looking at the first row's worth of values, you don't need any, just [1].
I think this is what you need:
Filter(
function(xyz) !(xyz[1] %like% "Margin"),
Census_Bureau_Data
)
However, while the use of Filter is not wrong, I think this can be simplified a little:
# data.table
Census_Bureau_Data[, !Census_Bureau_Data[1,,drop=TRUE] %like% "Margin", with = FALSE ]
# data.frame or tbl_df
Census_Bureau_Data[, !Census_Bureau_Data[1,,drop=TRUE] %like% "Margin" ]
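To make the first-row check concrete, here is a small sketch on made-up data. The data frame census and its contents are hypothetical, and base R's grepl() stands in for data.table's %like% so that the example runs without extra packages. It is only meant to illustrate the idea, not to replace the code above.

```r
# Hypothetical data: the first row holds the labels that should be checked.
census <- data.frame(
  a = c("Estimate", "10", "20"),
  b = c("Margin of Error", "1", "2"),
  c = c("Estimate", "Margin", "30"),
  stringsAsFactors = FALSE
)

# One logical per column: does the first-row value mention "Margin"?
has_margin <- grepl("Margin", unlist(census[1, ]), fixed = TRUE)

# Keep only the columns whose first row does not match.
kept <- census[, !has_margin, drop = FALSE]

# The Filter() form does the same thing; each column arrives as a vector,
# so col[1] is that column's first-row value.
kept_filter <- Filter(function(col) !grepl("Margin", col[1], fixed = TRUE), census)
```

Column c has "Margin" in its second row only, so it survives in both versions, which is the behaviour the question asks for.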
It seems that I found something that works.
Census_Bureau_Data<-Filter(function(Census_Bureau_Data) !(Census_Bureau_Data[[1]] %like% "Margin"), Census_Bureau_Data)
I removed "any" as the comments suggested and added a double bracket [[1]]. I also ran a test: I added the word "margin" to the cell in row 5 of column 5.
When I ran my original code, the column containing that cell was deleted. When I ran the code shown here, the check applied only to row 1 and the column was kept.
I’m looking for a simple expression that puts a ‘1’ in column E if ‘SomeContent’ is contained in column D. I’m doing this in Azure ML Workbench through their Add Column (script) function. Here are some examples they give:
row.ColumnA + row.ColumnB is the same as row["ColumnA"] + row["ColumnB"]
1 if row.ColumnA < 4 else 2
datetime.datetime.now()
float(row.ColumnA) / float(row.ColumnB - 1)
'Bad' if pd.isnull(row.ColumnA) else 'Good'
Any ideas on a 1 line script I could use for this? Thanks
Without really knowing what you want to look for in column 'D', I still think you can find all the information you need in the examples they give.
The script is being wrapped by a function that collects the value you calculate/provide and puts it in the new column. This assignment happens for each row individually. The value could be a static value, an arbitrary calculation, or it could be dependent on the values in the other columns for the specific row.
In the "Hint" section, you can see two different ways of obtaining the values from the other columns of the current row:
The current row is referenced using 'row' and then a column qualifier, for example row.colname or row['colname'].
In your case, you obtain the value for column 'D' either by row.D or row['D']
After that, all you need to do is come up with the specific logic for checking whether 'SomeContent' is contained in column 'D' for that specific row. In your case, the '1 line script' would look something like this:
1 if [logic ensuring 'SomeContent' is contained in row.D] else 0
If you need help with the logic, you need to provide more specific examples.
You can read more in the Azure Machine Learning Documentation:
Sample of custom column transforms (Python)
Data Preparations Python extensions
Hope this helps
I am using PHPExcel_v1_8 and have applied formulas to some cells as follows.
$objPHPExcel->getActiveSheet()->SetCellValue('G4','=SUBTOTAL(2,B6:B'.$row.')');
$objPHPExcel->getActiveSheet()->SetCellValue('H4','=ROUND(SUBTOTAL(9,Q6:Q'.$row.'),2)');
I also tried it like this:
$objPHPExcel->getActiveSheet()->setCellValueByColumnAndRow(6,4,'=SUBTOTAL(2,B6:B'.$row.')');
Here $row is the total number of rows.
But when I filter any column, the value in the cell that holds the formula is appended to or overridden. Please see the following filtered total row.
I want only the latest value to appear in the filtered total row, i.e. the existing value should be replaced. Right now I am getting the correct value, but why is it overridden? Any suggestions for a solution?
You've actually discovered a genuine bug here.
I wasn't aware when I implemented the SUBTOTAL logic in PHPExcel that it only worked with visible rows, and ignored hidden rows. Can you please raise an issue on the GitHub repo?
However, reading through the MS Excel docs for SUBTOTAL, a function number of 2 or 9 will return the result for all rows in the range (hidden or otherwise), while 102 or 109 will return the result only for visible rows.
I am learning R and I have an R data table from which I want to remove unnecessary features (unnecessary table columns). For this I am using the ReliefFexpRank estimator from the CORElearn package, with table and originaltable being the R tables.
library(CORElearn)
estReliefF <-attrEval(FLAG_READMITIDO_MEAN ~.,table,estimator="ReliefFexpRank",ReliefIterations=30)
for( i in estReliefF ){
if(estReliefF[i]==0) {originaltable[i]<-NULL}
}
output <-data.frame (estReliefF)
I know that the estReliefF has the correct results, getting me results like this sample below for each feature
LOCAL
-4.428817e-01
HORA
0.000000e+00
And I want to remove the HORA one, which is 0.
I don't know what the problem is, though I suspect it's around the if statement. It's my first time using R and I can't seem to find the mistake, so I would appreciate some help.
The issue comes from you modifying your columns while running a loop on them. Let's say your vector and table are:
x<-c(1,1,0,1,0)
df<-data.frame(1:5,2:6,3:7,4:8,5:9)
If you run for(i in 1:5){if(x[i]==0){df[i]<-NULL}}, you'll see that the third column has been removed, but not the fifth. That's because once the third column is gone, the original fifth column has become the fourth, and x[4] is not 0.
You need to find all the unwanted columns before deleting them. One possible solution is:
df[-which(x==0)]
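As a small sketch of that "decide first, subset once" idea, the snippet below uses the same x and df with column names added for readability (the names a to e, kept, and y are assumptions for this example). It also shows a logical-index variant, which may be a safer choice when it is possible that no score is zero, since which() then returns an empty vector and negative indexing with it selects no columns at all.

```r
x  <- c(1, 1, 0, 1, 0)
df <- data.frame(a = 1:5, b = 2:6, c = 3:7, d = 4:8, e = 5:9)

# Logical indexing: keep the columns whose score is not zero.
kept <- df[x != 0]

# Same selection via which(), as in the line above.
kept_which <- df[-which(x == 0)]

# Edge case: no zero scores at all (y is a made-up vector for this case).
y <- c(1, 1, 1, 1, 1)
n_which   <- ncol(df[-which(y == 0)])  # which() is empty, so no columns are kept
n_logical <- ncol(df[y != 0])          # all columns are kept
```

For the table in the question, the equivalent would presumably be something like originaltable[estReliefF != 0], provided the scores are in the same order as the columns of originaltable, which is an assumption worth checking first.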