I used
milsa <- edit(data.frame())
to open the R Data Editor, and now I can type in the data for my table.
My problem is: my table has 36 rows, but for some reason I have 39 rows appearing in the program (the 3 additional rows are all filled with NA).
When I try to use:
length(civil)
I get 39 instead of 36. How can I solve this? I tried fix(milsa), but it can't delete the additional rows.
PS: Civil is a variable of milsa.
Subset with the index:
You can reassign the data.frame to itself with only the rows you want to keep.
milsa <- milsa[1:36,]
To delete specific rows:
milsa <- milsa[-c(row_num1, row_num2, row_num3), ]
To delete rows containing one or more NAs:
milsa <- na.omit(milsa)
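Note that na.omit() drops every row with at least one NA, so it may also remove real records that are only partly filled in. If only the completely empty rows left behind by the editor should go, something like the following sketch may help. The column names and values are made up for illustration and are not taken from the question.

```r
# Hypothetical stand-in for the data typed into the editor:
# two real rows (one partly filled in) plus two rows that are entirely NA.
milsa <- data.frame(
  civil  = c("single", "married", NA, NA),
  salary = c(4.00, NA, NA, NA)
)

# TRUE for rows in which every column is NA
all_na <- rowSums(is.na(milsa)) == ncol(milsa)

# Keep only the rows that contain at least one value
milsa <- milsa[!all_na, ]

nrow(milsa)          # number of rows left
length(milsa$civil)  # length of one column, which equals nrow(milsa)
```

The partly filled row is kept here, whereas na.omit(milsa) would have dropped it.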
I'm looking to identify duplicate records in my data set based on multiple columns, review the records, and keep the ones with the most complete data in R. I would like to keep the row(s) associated with each name that have the maximum number of data points populated. In the case of date columns, I would also like to treat invalid dates as missing. My data looks like this:
df<-data.frame(Record=c(1,2,3,4,5),
First=c("Ed","Sue","Ed","Sue","Ed"),
Last=c("Bee","Cord","Bee","Cord","Bee"),
Address=c(123,NA,NA,456,789),
DOB=c("12/6/1995","0056/12/5",NA,"12/5/1956","10/4/1980"))
Record First Last Address DOB
1      Ed    Bee  123     12/6/1995
2      Sue   Cord         0056/12/5
3      Ed    Bee
4      Sue   Cord 456     12/5/1956
5      Ed    Bee  789     10/4/1980
So in this case I would keep records 1, 4, and 5. There are approximately 85000 records and 130 variables, so if there is a way to do this systematically, I'd appreciate the help. Also, I'm a total R newbie (as if you couldn't tell), so any explanation is also appreciated. Thanks!
# Add a new column to the data frame containing the number of NA values in each row.
df$nMissing <- rowSums(is.na(df))
# Using ave(), find the minimum nMissing for each name,
# then keep the rows whose nMissing equals that minimum.
deduped_df <- df[df$nMissing == ave(df$nMissing, paste(df$First, df$Last), FUN = min), ]
# If you like, remove the nMissing column.
df$nMissing <- deduped_df$nMissing <- NULL
deduped_df
  Record First Last Address       DOB
1      1    Ed  Bee     123 12/6/1995
4      4   Sue Cord     456 12/5/1956
5      5    Ed  Bee     789 10/4/1980
Edit: Per your comment, if you also want to filter on invalid DOBs, start by converting the column to Date class before counting the missing values. Any value that cannot be parsed with the given format becomes NA (missing data).
df$DOB <- as.Date(df$DOB, format = "%m/%d/%Y")
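Putting the steps together, the whole procedure might look like the sketch below. It uses the sample data from the question and assumes that "most complete" means the fewest NA values across all columns. Rows that tie for the fewest are all kept, which is why both record 1 and record 5 survive for Ed Bee.

```r
df <- data.frame(
  Record  = c(1, 2, 3, 4, 5),
  First   = c("Ed", "Sue", "Ed", "Sue", "Ed"),
  Last    = c("Bee", "Cord", "Bee", "Cord", "Bee"),
  Address = c(123, NA, NA, 456, 789),
  DOB     = c("12/6/1995", "0056/12/5", NA, "12/5/1956", "10/4/1980")
)

# Convert first, so that unparseable dates become NA and count as missing
df$DOB <- as.Date(df$DOB, format = "%m/%d/%Y")

# Number of missing values in each row
df$nMissing <- rowSums(is.na(df))

# Smallest nMissing within each First/Last group, repeated for every row
group_min <- ave(df$nMissing, paste(df$First, df$Last), FUN = min)

# Keep the rows that reach their group's minimum and drop the helper column
deduped_df <- df[df$nMissing == group_min, names(df) != "nMissing"]
deduped_df
```

With 130 variables the same code applies unchanged, since rowSums(is.na(df)) counts across every column.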
I have a table A in Teradata with a column plancode.
Possible values of plancode are:
GNSC11Q
BNSC12Q
HNSC13Q
12345
A1234
I want to remove the first 4 characters from strings that start with GNSC, BNSC, or HNSC, so that the final values will be 11Q, 12Q, and 13Q.
I need an update statement that will remove those first 4 characters from all the data in the plancode column. Any help would be appreciated.
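To pin down the intended result, here is a small sketch of the transformation in R. It is not the Teradata UPDATE statement being asked for; it only shows which values should change and which should stay as they are, assuming the three prefixes listed are the only ones to strip.

```r
plancode <- c("GNSC11Q", "BNSC12Q", "HNSC13Q", "12345", "A1234")

# Strip the prefix only when the value starts with GNSC, BNSC or HNSC;
# every other value is returned unchanged.
cleaned <- sub("^(GNSC|BNSC|HNSC)", "", plancode)
cleaned
```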
I created a report with 3 columns, Department, Ticket Count, Ticket Number. It groups Department names and the second column shows 1 instance of the Ticket Count. The last column shows all of the ticket numbers.
I added a row that shows the grand total of all of the departments displayed.
These are the data set results:
Department TicketCount TicketNumber
D1 3 12345
D1 3 22345
D1 3 32345
I group the Department and the TicketCount so that the display is like this:
Department TicketCount TicketNumber
D1         3           12345
                       22345
                       32345
I want to add a ticket total at the end but the result is always adding all of the ticket counts and not just one.
So the Total displayed is 9 not 3.
I need to create an expression that picks the distinct TicketCounts of the departments and sums them.
The function DistinctCount returns the correct number of counts when I have multiple departments but not the values.
I tried the RunningValue function but it adds all of the values in the column.
=RunningValue(Fields!ReopenedTicketCount.Value, sum, Nothing)
I need to create a function that sums the distinct values of the ticket counts of each department.
Can anyone point me in the direction as to the functions that I need to use?
I figured this out. I just used the COUNT function on the third column to get the grand total.
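The arithmetic behind this can be checked outside the report. The sketch below uses R rather than a report expression, with the sample rows from the question. It shows why summing TicketCount over the detail rows overcounts, and why counting ticket numbers gives the expected total.

```r
tickets <- data.frame(
  Department   = c("D1", "D1", "D1"),
  TicketCount  = c(3, 3, 3),
  TicketNumber = c(12345, 22345, 32345)
)

# Summing the column adds the department's count once per detail row
sum_of_counts <- sum(tickets$TicketCount)

# Counting the ticket numbers counts each ticket exactly once
count_of_tickets <- length(tickets$TicketNumber)

# Alternative: take one TicketCount per department, then sum those
sum_per_department <- sum(tickets$TicketCount[!duplicated(tickets$Department)])
```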
I am working on a web page that has a data grid, and I need to do the following:
Have a column that is a drop-down when the grid loads (no need to click edit).
This column is bound to a column from the data query (everything up to this point works fine).
I now need to add more values to this drop-down from another dataset (so the user can change the value if required).
These values come from a query to another table in the database. The values will be the same for all the rows in the table; they are based on a master key for the complete web page.
As an example:
table 1 has:
Mangoes $12
Apricots $13
Peaches $14
This is on the grid.
The other table has:
Prices
12
13
14
15
16
I want these values from the prices table to appear in the drop down
for table 1 in the data grid, with the current values as the selected
item.
Any ideas will help. Thanks for the help.
As far as I know, you would be better off changing your table design.
If your fruit table references a price ID, you can easily select the item in the drop-down that matches the current value in the fruit table.
Fruit     PriceId                              PriceId  Price
------------------------                       --------------------
Mangoes   1         <--------------------->    1        12
Apricots  2         <--------------------->    2        13
Peaches   3         <--------------------->    3        14
Then it is easier to list all prices in the drop-down list, and also easier to choose the selected price based on the PriceId from the Fruit table.
If this sounds a little confusing, let me know and I can explain further.
Hope it works!