Extract a row after a specific Name [duplicate] - r

This question already has an answer here:
r select values below and after a certain value in a dataframe
(1 answer)
Closed 5 years ago.
I have a problem with unstructured text. I have a data frame consisting of a single column split across multiple rows, which I won't show here for simplicity. Here is a simple example to better describe what I am trying to do:
DATA
grey
blue
yellow
green
white
black
I need to extract the SINGLE row after the one containing the word that I select.
For example, if the word "blue" is my "topic", I want to extract only the SINGLE row following it, obtaining "yellow".
How can I do this?
Thank you for your future suggestions.

Some information is missing here, so I'll explain what you can do in both cases.
Case 1
The column you have is itself the row names of your data frame.
You can check by running
row.names(dataframe)
In this case, add a column of row numbers to your data frame; you can then search for your value and take the next entry by its row number.
Case 2
The data is in an ordinary column of the data frame.
Then just do
a = which(df$col1 == "blue")
b = df[a + 1, 1]
b will then be "yellow".
Let me know if this solves the problem.
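A minimal sketch of the Case 2 idea on the example data from the question. The column name col1 follows the answer; the variable names topic, idx and following are illustrative assumptions, and the guard against a match in the last row is an addition not present in the answer.

```r
# Example data from the question, stored in a single column (col1 is assumed)
df <- data.frame(
  col1 = c("grey", "blue", "yellow", "green", "white", "black"),
  stringsAsFactors = FALSE
)

topic <- "blue"

# Row positions where the column equals the topic
idx <- which(df$col1 == topic)

# Drop a match in the last row, since nothing follows it
idx <- idx[idx < nrow(df)]

# The single row after each match
following <- df$col1[idx + 1]
following
```

If the topic occurs more than once, following holds one value per match.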


How do I remove .. and all characters that follow in column names? [duplicate]

This question already has answers here:
Using explicitly numbered repetition instead of question mark, star and plus
(4 answers)
Closed 2 years ago.
I am working with a dataset(df) that is gene x cell line identifier. The gene names are annotated with an additional character string that I want to remove. For example SP1 is annotated SP1..6667. I want to remove the ..6667 to have the column names only SP1.
The following code worked to do this:
colnames(df) <- gsub("\\..*","",colnames(df)) # remove character string after gene name
The problem is that a few genes have a single . in their names and that I do not want to remove. For example HLA.A is labeled HLA.A..3105. I want to remove the ..3105 to give HLA.A but my current code removes .A..3105 to give HLA.
How can I modify my gsub function to specify .. instead of any . ?
All you need to do is alter the regex call like below:
colnames(df) <- gsub("\\.{2}.*","",colnames(df))
This tells it to start the substitution at the first run of two consecutive periods, so a single period such as the one in HLA.A is left alone.
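As a quick check, here is a small sketch comparing the two patterns on the gene names mentioned in the question. The vector nms is a stand-in for colnames(df).

```r
# Stand-in for colnames(df), using the names from the question
nms <- c("SP1..6667", "HLA.A..3105")

# Original pattern: cuts at the first single period
too_greedy <- gsub("\\..*", "", nms)

# Revised pattern: cuts only where two periods appear in a row
cleaned <- gsub("\\.{2}.*", "", nms)

too_greedy  # "SP1" "HLA"
cleaned     # "SP1" "HLA.A"
```

Because the match runs to the end of each string, sub() would give the same result as gsub() here.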

How do you return the column number(s) based on the class of the column? [duplicate]

This question already has answers here:
How to find all numeric columns in data
(2 answers)
Closed 4 years ago.
I have a list of 185 data-frames. I'm trying to edit them so each data frame only shows its numeric columns and also 2 specific, non-numeric ones.
I've had many issues with solving this, so I plan to use a for loop and find the column numbers of all numeric columns, use match to do the same for the two specific ones and then use c() to overwrite the data-frames.
I can pull the column number for the specific ones with
match("Device_Name",colnames(DFList$Dataframe))
successfully.
However, I cannot figure out how to return the numbers for all integer columns in a data-frame.
I have tried
match(is.numeric(colnames(DFList$Dataframe)),colnames(DFList$Dataframe))
and
match(class == "numeric",colnames(DFList$Dataframe),colnames(DFList$Dataframe))
to name a few, but now I am just taking wild stabs in the dark. Any advice would be welcome.
which(sapply(DFList$Dataframe,is.numeric))
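A hedged sketch of how that one-liner could be combined with match() across the whole list, as the question describes. The toy list, the second kept column Site, and the names keep_always and trimmed are assumptions for illustration; only Device_Name comes from the question.

```r
# Toy stand-in for the list of 185 data frames
DFList <- list(
  Dataframe = data.frame(
    Device_Name = c("x", "y"),
    Site        = c("s1", "s2"),   # hypothetical second non-numeric column
    temp        = c(1.5, 2.5),
    count       = c(3L, 4L),
    note        = c("a", "b"),
    stringsAsFactors = FALSE
  )
)

# Positions of the numeric columns (integer columns count as numeric)
num_idx <- which(sapply(DFList$Dataframe, is.numeric))

# Columns to keep regardless of class (assumed names)
keep_always <- c("Device_Name", "Site")

# Apply the same selection to every data frame in the list
trimmed <- lapply(DFList, function(d) {
  num_cols  <- which(sapply(d, is.numeric))
  keep_cols <- match(keep_always, colnames(d))
  d[, sort(unique(c(keep_cols, num_cols))), drop = FALSE]
})

colnames(trimmed$Dataframe)
```

This assumes every data frame contains both named columns; match() returns NA for a missing name, which would need handling first.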

how to view head of as.data.frame in R? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
I have a huge data set with 20 columns and 20,000 rows. According to the manual of a program I use, we have to put the data into a data frame, though I'm not sure I understand what that does, and I can't seem to view the head of the data frame I created.
I wrote in bold the part that I don't understand. I'm very new to R; can a kind mind explain to me how the following works?
First I read the CSV file
vData = read.csv("my_matrix.csv");
1) Here we create the data frame as per the manual. What does -c(1:8) do exactly?
dataExpr0 = as.data.frame(t(vData[, -c(1:8)]))
2) Here, to understand what the above part does, I tried to view only the header of the data frame with the following line, but it displays the first 2 columns for the 20,000 rows of data. Is there a way to view only the first 2 rows?
head(dataExpr0, n = 2)
Let's dissect what your call is doing, from the inside out.
Basic Indexing
When indexing a data.frame or matrix (assuming 2 dimensions), you access a single element of it with the square bracket notation, as you're seeing. For instance, to see the value in the fourth row, fifth column, you'd use vData[4,5]. This can work with ranges of rows and/or columns as well, such as vData[1:4,5] returning the first 4 rows and the 5th column as a vector.
Note: the range 1:4 can also be an arbitrary vector of numbers, such as vData[c(1,2,5),c(4,8)] which returns a 3 by 2 matrix.
BTW: by default, when the resulting slice/submatrix has one of its dimensions reduced to 1 (as in the vData[1:4,5] example above), R will drop it to the lower structure (e.g., matrix -> vector -> scalar). In this case, it will drop vData[1:4,5] to a vector. You can prevent this by adding what appears to be a third dimension to the square brackets: vData[1:4,5,drop=FALSE], meaning "do not drop the simplified dimension". You should then get a 4-row, 1-column object in return (a matrix if vData is a matrix, a data.frame if it is a data.frame).
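The indexing forms above can be tried on a small stand-in for vData. The 5 by 8 toy data frame and the result names are assumptions for illustration only.

```r
# Toy stand-in for vData: 5 rows, 8 columns, filled column by column with 1..40
vData <- as.data.frame(matrix(1:40, nrow = 5))

one_cell <- vData[4, 5]                 # fourth row, fifth column
as_vec   <- vData[1:4, 5]               # dropped to a plain vector
block    <- vData[c(1, 2, 5), c(4, 8)]  # rows 1, 2, 5 and columns 4, 8
kept_dim <- vData[1:4, 5, drop = FALSE] # stays two-dimensional

one_cell       # 24
length(as_vec) # 4
dim(block)     # 3 2
dim(kept_dim)  # 4 1
```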
You can read a much more thorough explanation of how to subset data.frames by reading (for example) some of the "Hadleyverse". If you do that, I highly encourage you to make it an interactive session: play in R as you read, to help cement the methods.
Negative Indexing
Negative indices mean "everything except what is listed". In your example, you are subsetting the data to extract everything except columns 1:8. So your vData[,-c(1:8)] is returning all rows and columns 9 through 20, a 20K by 12 matrix. Not small.
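A small sketch of the same negative index on a toy data frame with 20 columns. The 3-row data and the name kept are assumptions; only the -c(1:8) index comes from the question.

```r
# Toy stand-in: 3 rows, 20 columns named V1..V20
vData <- as.data.frame(matrix(seq_len(60), nrow = 3))

# Everything except columns 1 through 8
kept <- vData[, -c(1:8)]

dim(kept)          # 3 12
names(kept)[1]     # "V9"
```

The positive form vData[, 9:20] selects the same columns.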
Transposition
You probably already know what t() does: transpose the matrix so that it is now 12 by 20K.
A word of warning: if all of your data.frame columns are of the same class (e.g., 'character', 'logical'), then all is fine. However, data.frames allow different types of data in different columns, and matrices do not. If the columns differ in type, they will all be converted to the highest common type in the ordering logical < integer < numeric < character.
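The coercion can be seen in a short sketch. Both toy data frames, mixed and nums, are assumptions for illustration.

```r
# One numeric and one character column
mixed <- data.frame(n = c(1.5, 2.5), s = c("a", "b"), stringsAsFactors = FALSE)
tm <- t(mixed)

is.character(tm)  # TRUE: the numbers were converted to strings
dim(tm)           # 2 2

# All-numeric columns stay numeric after transposing
nums <- data.frame(a = 1:2, b = c(1.5, 2.5))
tn <- t(nums)
is.numeric(tn)    # TRUE
```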
Back to a data.frame
After you transpose it (which converts to a matrix), you convert back to a data.frame, which may or may not be necessary depending on how you intend to deal with the data later. For instance, if the row names are not meaningful, then it may not be that useful to convert into a data.frame. That's relatively immaterial, but I'm a fan of not over-converting things. I'm also a fan of using the simpler data structure, and matrices are typically faster than data.frames.
Head
... merely gives you the top n rows of a data.frame or matrix. In your case, since you transposed it, it is now 20K columns wide, which may be a bit unwieldy on the command line.
Alternatives
Based on what I provided earlier, perhaps you just want to look at the top few rows and first few columns? dataExpr0[1:5,1:5] will work, as will (identically) head(dataExpr0[,1:5], n=5).
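A sketch of the whole pipeline on a scaled-down stand-in, assuming 50 rows instead of 20,000 so the output stays readable. The random data and the names corner and same are illustrative.

```r
set.seed(1)

# Scaled-down stand-in for the CSV: 50 rows, 20 numeric columns
vData <- as.data.frame(matrix(rnorm(50 * 20), nrow = 50))

# Same call as in the question: drop columns 1:8, transpose, convert back
dataExpr0 <- as.data.frame(t(vData[, -c(1:8)]))
dim(dataExpr0)   # 12 50

# Two ways to look at only the top-left corner
corner <- dataExpr0[1:5, 1:5]
same   <- head(dataExpr0[, 1:5], n = 5)
dim(corner)      # 5 5
```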
More Questions?
I strongly encourage you to read more of the Hadleyverse and become a little more familiar with subsetting and basic data management. It is fundamental to using R, and StackOverflow is not always patient enough to answer baseline questions like this. This forum is best suited for those who have already done some research, read documentation and help pages, and tried some code, and only after that cannot figure out why it is not working. You provided some basic code, which is good, but SO is not ideally suited to teach how to start with R.

Is there a way to omit the first column when reading a csv [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Only read limited number of columns in R
I have a csv file that is quite large, and so I only want to read the data in R that is relevant. The csv file is 4 columns wide and several million rows long. But the first column is unnecessary (as it is a repeated string for every row).
Is there a way to only get the 2nd to 4th columns when reading in the csv file? (It's easy enough to remove the original first column after reading it in, but I was wondering if there was a more efficient way of doing this.)
To expand on Joshua's comment:
data <- read.csv("data.csv",colClasses=c("NULL",NA,NA,NA))
"NULL" (note the quotes!) means skip the column, NA means that R chooses the appropriate data type for that column.

command to remove row from a data frame [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to delete a row in R
I can't figure out how to simply remove row (n) from a dataframe in R.
R's documentation and intro manual are so horribly written, they are virtually zero help on this very simple problem.
Also, every explanation I've found here or on Google is for removing rows that contain strings, or duplicates, etc., which have been excessively advanced for my problem and led me to introduce more bugs and get nowhere. I just want to remove a row.
Thanks in advance for your help.
FYI, the list is in the variable eld, which has 5 columns and 33 rows. I would like to remove row 14. I initialized eld with the following command
eld <- read.table("election2012.txt")
so my desired result is
eldNew <- eld(minus row 14)
eldNew <- eld[-14,]
See ?"[" for a start ...
For ‘[’-indexing only: ‘i’, ‘j’, ‘...’ can be logical
vectors, indicating elements/slices to select. Such vectors
are recycled if necessary to match the corresponding extent.
‘i’, ‘j’, ‘...’ can also be negative integers, indicating
elements/slices to leave out of the selection.
(emphasis added)
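A runnable sketch of both indexing styles from the quoted help text, using a made-up 33-row data frame in place of eld. The columns id and votes are assumptions; the real eld has 5 columns.

```r
# Made-up stand-in for eld with 33 rows
eld <- data.frame(id = 1:33, votes = seq(100, by = 10, length.out = 33))

# Negative integer: leave row 14 out
eldNew <- eld[-14, ]

# Logical vector: keep every row whose position is not 14
eldNew2 <- eld[seq_len(nrow(eld)) != 14, ]

nrow(eldNew)        # 32
14 %in% eldNew$id   # FALSE
```

One caveat with the negative form: if the index is computed and turns out empty, for example eld[-integer(0), ], the result has zero rows rather than all rows, so the logical form is safer in that situation.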
edit: looking around I notice
How to delete the first row of a dataframe in R? , which has the answer ... seems like the title should have popped to your attention if you were looking for answers on SO?
edit 2: I also found How do I delete rows in a data frame? , searching SO for delete row data frame ...
Also http://rwiki.sciviews.org/doku.php?id=tips:data-frames:remove_rows_data_frame
