command to remove row from a data frame [duplicate] - r

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to delete a row in R
I can't figure out how to simply remove row (n) from a dataframe in R.
R's documentation and intro manual are so horribly written, they are virtually zero help on this very simple problem.
Also, every explanation I've found here or on Google is for removing rows that contain strings, or duplicates, etc., which has been excessively advanced for my problem and led me to introduce more bugs and get nowhere. I just want to remove a row.
Thanks in advance for your help.
FYI, the data is in the variable eld, which has 5 columns and 33 rows. I would like to remove row 14. I initialized eld with the following command
eld <- read.table("election2012.txt")
so my desired result is
eldNew <- eld(minus row 14)

eldNew <- eld[-14,]
See ?"[" for a start ...
For ‘[’-indexing only: ‘i’, ‘j’, ‘...’ can be logical
vectors, indicating elements/slices to select. Such vectors
are recycled if necessary to match the corresponding extent.
‘i’, ‘j’, ‘...’ can also be negative integers, indicating
elements/slices to leave out of the selection.
(emphasis added)
edit: looking around I notice
How to delete the first row of a dataframe in R? , which has the answer ... seems like the title should have popped to your attention if you were looking for answers on SO?
edit 2: I also found How do I delete rows in a data frame? , searching SO for delete row data frame ...
Also http://rwiki.sciviews.org/doku.php?id=tips:data-frames:remove_rows_data_frame
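A minimal sketch of the negative-index approach. The file election2012.txt is not available here, so the data frame below is a made-up stand-in for eld with assumed column names:

```r
# Hypothetical stand-in for eld: 33 rows, 5 columns
eld <- data.frame(id = 1:33, a = rnorm(33), b = rnorm(33),
                  c = rnorm(33), d = rnorm(33))

# Drop row 14; the empty slot after the comma keeps every column
eldNew <- eld[-14, ]

nrow(eldNew)        # 32
14 %in% eldNew$id   # FALSE

# Several rows can be dropped at once with a vector of negative indices
eldFewer <- eld[-c(1, 14, 33), ]
nrow(eldFewer)      # 30
```

Note that eld itself is unchanged; the result has to be assigned to a name (eldNew here, or back to eld) to keep it.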


Find the index of the last occurrence of fulfilled criteria in a matrix in R

I have an array (x) in R of size 30x11x10.
x=array(-2:20, c(30,11,10))
Each 'grid' or matrix represents a day of data for a month (30 days represented here). I want to find the index (i,j,k) of when the last occurrence of a number less than 2 occurs. Ideally, I would also like the value returned too. If this was in Matlab, I could just use [i,j,k]=find(x(x<2)) but I don't see an exact equivalent for this in R.
I have looked at 'match' as suggested in other posts here, but it seems to find elements when they are specified, not when a criterion (x<2) is given.
I tried this:
xxx<-match(x,x<2,0) but it returns a long vector of integers that don't appear to show what I am looking for.
Then I tried: xxx<-match(x,x[x<2],0) which looks a bit more promising, but still isn't what I want (to be honest I'm not sure what the output is indexing).
I think I'm probably asking a foolish question here because if I want 3 indices and the value returned, then I should be assigning them to something preemptively right (which I'm not doing)? Can anyone offer any advice?
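One possible approach, offered as a sketch rather than a definitive answer: which() with arr.ind = TRUE returns the subscripts of every element that satisfies a condition, one row per match. The sketch assumes "last" means last in R's storage order (first index varies fastest, third index slowest); the names hits, last and value are made up for illustration.

```r
x <- array(-2:20, c(30, 11, 10))

# One row per matching element; the columns are the (i, j, k) subscripts
hits <- which(x < 2, arr.ind = TRUE)

# Assumption: "last" means the final match in R's column-major order
last <- hits[nrow(hits), ]

# Look the value up again from the three subscripts
value <- x[last[1], last[2], last[3]]

last
value
```

If "last" should instead mean the latest day, sort hits by the column that holds the day dimension before taking the final row.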

How to print the number of data entries inside a variable in R? [duplicate]

This question already has answers here:
How to know a dimension of matrix or vector in R?
(6 answers)
Closed 3 years ago.
I know this is probably a very simple question but I can't seem to find the answer anywhere online. I am trying to print just the number of data points inside of a variable that I created but I can't figure out how.
I tried using summary() or num() or n() but I am really just making stuff up here and cannot seem to figure it out at all.
For my specific example I have a data set on people's heights, age, weight, gender, stuff like that. I used
one_sd_weight <- cdc$weight[abs(cdc$weight - mean(cdc$weight)) <= sd(cdc$weight)]
to determine how many of the weights fall within one standard deviation of the mean. After I do this, I can see that on the right side it created a new variable called one_sd_weight that contains 14152 out of the original 20000 entries. How do I print the number 14152 as a variable? For the work I am doing I need to create a new variable that just contains one number, 14152 or whatever number is produced when I run the code above. For example, I need to create
n_one_sd <- 14152
without typing in 14152, instead typing some function that grabs the number of entries in one_sd_weight.
I have tried things like summary() and n() but only receive error messages in return. Any help is greatly appreciated!!
n_one_sd <- length(one_sd_weight)
You're looking for length (in case of a vector) or nrow in case of a matrix/data.frame.
Or you can use NROW() for both, that should work too.
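A short sketch of how the three functions differ, using a made-up vector and data frame (the cdc data set is not reproduced here):

```r
v  <- c(10, 20, 30)
df <- data.frame(a = 1:4, b = letters[1:4])

length(v)    # 3
nrow(df)     # 4
nrow(v)      # NULL, because a plain vector has no dim attribute
NROW(v)      # 3, NROW treats a vector as a one-column matrix
NROW(df)     # 4
length(df)   # 2, the number of columns, not rows

# Storing the count in its own variable, as the question asks
n_v <- length(v)
```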

How do you return the column number(s) based on the class of each column? [duplicate]

This question already has answers here:
How to find all numeric columns in data
(2 answers)
Closed 4 years ago.
I have a list of 185 data-frames. I'm trying to edit them so each data frame only shows its numeric columns and also 2 specific, non-numeric ones.
I've had many issues with solving this, so I plan to use a for loop and find the column numbers of all numeric columns, use match to do the same for the two specific ones and then use c() to overwrite the data-frames.
I can pull the column number for the specific ones with
match("Device_Name",colnames(DFList$Dataframe))
successfully.
However, I cannot figure out how to return the numbers for all integer columns in a data-frame.
I have tried
match(is.numeric(colnames(DFList$Dataframe)),colnames(DFList$Dataframe))
and
match(class == "numeric",colnames(DFList$Dataframe),colnames(DFList$Dataframe))
to name a few, but now I am just taking wild stabs in the dark. Any advice would be welcome.
which(sapply(DFList$Dataframe,is.numeric))
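A sketch of how that one-liner could be applied across the whole list. The data below is invented, and "Site" is a hypothetical name for the second non-numeric column to keep, since the question only names "Device_Name":

```r
# Made-up example data; the real list has 185 data frames
df <- data.frame(Device_Name = c("a", "b"),
                 Site        = c("x", "y"),
                 temp        = c(1.5, 2.5),
                 count       = 1:2,
                 note        = c("ok", "bad"),
                 stringsAsFactors = FALSE)
DFList <- list(first = df, second = df)

keep_names <- c("Device_Name", "Site")   # "Site" is an assumed name

trimmed <- lapply(DFList, function(d) {
  num_idx  <- which(sapply(d, is.numeric))     # positions of numeric columns
  name_idx <- match(keep_names, colnames(d))   # positions of the named columns
  d[, c(name_idx, num_idx), drop = FALSE]
})

names(trimmed$first)   # "Device_Name" "Site" "temp" "count"
```

is.numeric() is TRUE for both integer and double columns, so this keeps count as well as temp.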

how to view head of as.data.frame in R? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have a huge data set with 20 columns and 20,000 rows. According to the manual of a program I use, we have to put the data into a data frame, though I'm not sure I understand what that does, and I can't seem to view the head of the data frame I created.
I wrote in Bold the part that I don't understand, I'm very new with R, can a kind mind explain to me how the following works?
First I read the CSV file
vData = read.csv("my_matrix.csv");
1) Here we create the data frame as per the manual, what does -c(1:8) do exactly??
dataExpr0 = as.data.frame(t(vData[, -c(1:8)]))
2) Here, to understand what the above part does, I tried to view only the header of the data frame with the following line, but it displays the first 2 columns for the 20,000 rows of data. Is there a way to view only the first 2 rows?
head(dataExpr0, n = 2)
Let's dissect what your call is doing, from the inside out.
Basic Indexing
When indexing a data.frame or matrix (assuming 2 dimensions), you access a single element of it with the square bracket notation, as you're seeing. For instance, to see the value in the fourth row, fifth column, you'd use vData[4,5]. This can work with ranges of rows and/or columns as well, such as vData[1:4,5] returning the first 4 rows and the 5th column as a vector.
Note: the range 1:4 can also be an arbitrary vector of numbers, such as vData[c(1,2,5),c(4,8)], which returns a 3 by 2 subset (a data.frame here, or a matrix if vData were a matrix).
BTW: by default, when the resulting slice/submatrix has one of its dimensions reduced to 1 (as in vData[1:4,5] above), R will drop it to the lower structure (e.g., matrix -> vector). In this case, it will drop vData[1:4,5] to a vector. You can prevent this from happening by adding what appears to be a third dimension to the square brackets: vData[1:4,5,drop=FALSE], meaning "do not drop the simplified dimension". Now you should get 4 rows and 1 column in return, as a data.frame (or as a matrix, if vData were a matrix).
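A small sketch of the drop behaviour. The 5 by 8 data frame is made up for illustration, and v1, v2 and m are hypothetical names:

```r
# Made-up stand-in for vData: 5 rows, 8 columns named X1..X8
vData <- data.frame(matrix(1:40, nrow = 5))

v1 <- vData[1:4, 5]                 # single column is simplified
v2 <- vData[1:4, 5, drop = FALSE]   # dimensions are kept

is.vector(v1)       # TRUE
dim(v1)             # NULL
dim(v2)             # 4 1
is.data.frame(v2)   # TRUE, because vData is a data.frame

# The same indexing on a matrix keeps a 4 x 1 matrix
m <- as.matrix(vData)
dim(m[1:4, 5, drop = FALSE])   # 4 1
```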
You can read a much more thorough explanation of how to subset data.frames by reading (for example) some of the "Hadleyverse". If you do that, I highly encourage you to make it an interactive session: play in R as you read, to help cement the methods.
Negative Indexing
Negative indices mean "everything except what is listed". In your example, you are subsetting the data to extract everything except columns 1:8. So your vData[,-c(1:8)] is returning all rows and columns 9 through 20, a 20K by 12 data.frame. Not small.
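To see this on something small, here is a sketch with a made-up 3 by 20 data frame standing in for vData:

```r
# Made-up stand-in: 3 rows, 20 columns named X1..X20
vData <- data.frame(matrix(0, nrow = 3, ncol = 20))

rest <- vData[, -c(1:8)]   # everything except columns 1 through 8

dim(rest)        # 3 12
names(rest)[1]   # "X9"

# Same result as asking for columns 9 through 20 directly
all.equal(rest, vData[, 9:20])   # TRUE
```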
Transposition
You probably already know what t() does: transpose the matrix so that it is now 12 by 20K.
A word of warning: if all of your data.frame columns are of the same class (e.g., 'character', 'logical'), then all is fine. However, data.frames allow different types of data in different columns, and matrices do not. If one data.frame column has a different class than the others, every column will be converted to the highest common type in the order logical < integer < numeric < character.
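The coercion is easy to check on a toy example. The two data frames below are invented for illustration:

```r
# All-numeric columns: the transposed matrix stays numeric
nums <- data.frame(a = 1:2, b = c(1.5, 2.5))
typeof(t(nums))    # "double"

# One character column forces everything to character
mixed <- data.frame(n = c(1.5, 2.5), s = c("a", "b"))
m <- t(mixed)
is.matrix(m)       # TRUE
typeof(m)          # "character"
dim(m)             # 2 2
```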
Back to a data.frame
After you transpose it (which converts to a matrix), you convert back to a data.frame, which may or may not be necessary depending on how you intend to deal with the data later. For instance, if the row names are not meaningful, then it may not be that useful to convert into a data.frame. That's relatively immaterial, but I'm a fan of not over-converting things. I'm also a fan of using the simpler data structure, and matrices are typically faster than data.frames.
Head
... merely gives you the top n rows of a data.frame or matrix. In your case, since you transposed it, it is now 20K columns wide, which may be a bit unwieldy on the command line.
Alternatives
Based on what I provided earlier, perhaps you just want to look at the top few rows and first few columns? dataExpr0[1:5,1:5] will work, as will (identically) head(dataExpr0[,1:5], n=5).
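A sketch of both forms on a made-up wide data frame (12 rows by 200 columns, standing in for the 12 by 20K dataExpr0):

```r
# Made-up wide data frame, similar in shape to the transposed data
wide <- as.data.frame(matrix(rnorm(12 * 200), nrow = 12))

corner1 <- wide[1:5, 1:5]              # direct indexing
corner2 <- head(wide[, 1:5], n = 5)    # subset columns, then take the top rows

dim(corner1)   # 5 5
dim(corner2)   # 5 5
all.equal(corner1, corner2)
```

Subsetting the columns before calling head() avoids printing thousands of columns to the console.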
More Questions?
I strongly encourage you to read more of the Hadleyverse and become a little more familiar with subsetting and basic data management. It is fundamental to using R, and StackOverflow is not always patient enough to answer baseline questions like this. This forum is best suited for those who have already done some research, read documentation and help pages, and tried some code, and only after that cannot figure out why it is not working. You provided some basic code, which is good, but SO is not ideally suited to teach how to start with R.

R referring to dataframe columns by label to delete them [duplicate]

This question already has answers here:
How to drop columns by name in a data frame
(12 answers)
Closed 9 years ago.
An easy one I suppose though my searches have been pretty fruitless --
given
z=data.frame(X.39=rnorm(20),X.40=rnorm(20),X.51=rnorm(20))
the subsetting operation
z[,c('X.39','X.51')]
works. but
z[,-c('X.39','X.51')]
gives me
Error in -c("X.39", "X.51") : invalid argument to unary operator
why is that and how do I remove a set of columns using a list of column names?
EDIT
I know that I can always use
z[,!names(z) %in% c('X.39','X.51')]
but I'm looking for a lazier solution
EDIT2
Most of the discussion has been in the comment section but to close this off for good order, the gist of this is that a lazier solution (direct reference by name) is not possible. This appears to be designed in.
You could use the setdiff function, but I can't say if it's the most elegant solution:
z[, setdiff(names(z), c('X.39','X.51'))]
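A sketch comparing the options on the question's own z. The names drop_cols, by_setdiff, by_in and by_subset are made up for illustration. One caveat: only one column is left after dropping two, so without drop = FALSE the result collapses to a plain vector. Base R's subset() also accepts negated, unquoted column names in its select argument, which may be the closest thing to the "lazy" form the question asks for.

```r
z <- data.frame(X.39 = rnorm(20), X.40 = rnorm(20), X.51 = rnorm(20))
drop_cols <- c("X.39", "X.51")

# setdiff on the names; drop = FALSE keeps a one-column data frame
by_setdiff <- z[, setdiff(names(z), drop_cols), drop = FALSE]

# Logical mask, as in the question's first edit
by_in <- z[, !names(z) %in% drop_cols, drop = FALSE]

# subset() with negated unquoted names
by_subset <- subset(z, select = -c(X.39, X.51))

names(by_setdiff)   # "X.40"
names(by_subset)    # "X.40"
```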
