I have a question about R. Suppose I have data like this:
# name : Matrix
col1. col2
row1
row2
How can I read the name of the data subset from the header and create a data frame with that name in R, so that I can refer to it as Matrix?
And what should I do if I need to read the number of rows specified in the header? For example, given the header below, I would like to read 2 columns and 2 rows and name the resulting data frame Matrix:
# name : Matrix
# row: 2
# col: 2
col1. col2
row1
row2
Thank you all.
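One possible approach, as a minimal sketch: parse the `# key : value` lines yourself, then pass the remaining lines to read.table() and use assign() to create a variable with the name found in the header. The data values below are made up for illustration (the question does not show any), and the sketch assumes the column names are separated by whitespace and that every metadata line starts with `#`.

```r
# Hypothetical file contents; replace with readLines("your_file.txt").
lines <- c("# name : Matrix",
           "# row: 2",
           "# col: 2",
           "col1 col2",
           "1 2",
           "3 4")

# Separate the "# key : value" metadata lines from the table itself.
is_meta <- grepl("^#", lines)
meta <- sub("^#[[:space:]]*", "", lines[is_meta])
keys <- trimws(sub(":.*$", "", meta))
vals <- trimws(sub("^[^:]*:", "", meta))
header <- setNames(as.list(vals), keys)

n_row <- as.integer(header[["row"]])
n_col <- as.integer(header[["col"]])

# Read only the requested number of rows, then keep the requested columns.
body <- read.table(text = lines[!is_meta], header = TRUE, nrows = n_row)
body <- body[, seq_len(n_col), drop = FALSE]

# Create a variable whose name is taken from the header ("Matrix" here).
assign(header[["name"]], body)
print(Matrix)
```

If you read many such blocks, storing them in a named list (for example `frames[[header[["name"]]]] <- body`) is usually easier to manage than creating variables with assign().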
I have dozens of very heavy Excel files that I need to import into R (then rebind). Each file has 2 sheets, where the second sheet (name: "Results") consists of 100K rows at least and has about 350 columns.
I would like to read a subset of the sheet "Results" from each file by columns, but most importantly, by specific rows. Each "ID" in the data, has a main row and then multiple rows below which contain data in specific columns. I would like to read the main row only (this leaves each file with 50-400 rows (depending on the file) and 150 variables). The first column that numbers main rows does not have a header.
This is what the data looks like (simplified):
I would like to import only the rows whose first column isn't empty but numbered (i.e., 1., 13., 34., 211.) and particular columns, in this example columns 2,3,5 (i.e., name, ID, status). The desired output would be:
Is there a simple way to do this?
Let's say a is our Excel file, read in as a data frame.
library(readxl)
a <- as.data.frame(read_excel("Pattern/File.xlsx",sheet = "Results"))
For instance, to select columns 1 to 3, use
subset(a[, 1:3], !is.na(a[[1]]))
This keeps only the rows whose value in the first column is not NA.
Output:
...1 name ID
1 1 Dan us1d
4 13 Nev sa2e
6 34 Sam il5a
Note the first column name ("...1"). It is autogenerated by the read_excel() function because that column has no header, but it should not be a problem.
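To get exactly the columns asked for in the question (2, 3 and 5), the same idea can be written with plain bracket indexing. This is only a sketch: the data frame below is a hand-built stand-in for what read_excel() would return, the `other` column is hypothetical, and the status values are invented for illustration.

```r
# Stand-in for the data frame returned by read_excel(); only the main
# rows (1, 13, 34) have a number in the unnamed first column.
a <- data.frame(
  `...1` = c(1, NA, NA, 13, NA, 34),
  name   = c("Dan", NA, NA, "Nev", NA, "Sam"),
  ID     = c("us1d", NA, NA, "sa2e", NA, "il5a"),
  other  = c("x", "y", "z", "x", "y", "x"),      # hypothetical column
  status = c("ok", NA, NA, "ok", NA, "fail"),    # invented values
  check.names = FALSE
)

# Keep rows whose first column is not NA, and columns 2, 3 and 5.
main_rows <- a[!is.na(a[[1]]), c(2, 3, 5)]
print(main_rows)
```

Applied to each file in turn, the resulting data frames can then be combined with rbind().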
Relatively new to R...
I am trying to create a column in a data frame (df1, which currently has a single column), in which each new value is determined by the value in the existing column, by referring to another (reference) data frame (df2) that has two columns and is effectively like a hash. I was trying to avoid making an actual hash because that doesn't seem to be the done thing in R.
So, the reference dataframe df2 looks like this:
col1 col2
A 71
R 156
N 114
D 115
...
The values in col1 only occur once each in the column
The data frame df1 that I'm working on might look like this (for example):
D
D
R
A
N
A
D
...
So, for each row in df1, I'd like to create a new column where the script takes the col1 value from df1, looks up df2, finds the matching value in col1, and then takes the corresponding value from col2 and places it in the new column in df1. So, if it worked, I'd end up with df1 looking like this:
D 115
D 115
R 156
A 71
N 114
A 71
D 115
...
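A common way to do this kind of lookup is match(), which returns, for each value in df1, the position of the first matching value in df2$col1. The sketch below rebuilds the example data from the question; the name col1 for the single column of df1 is an assumption, since the question does not give it.

```r
# Reference table: each key in col1 occurs exactly once.
df2 <- data.frame(col1 = c("A", "R", "N", "D"),
                  col2 = c(71, 156, 114, 115))

# Data frame to extend; the column name col1 is assumed.
df1 <- data.frame(col1 = c("D", "D", "R", "A", "N", "A", "D"))

# For every row of df1, find the matching row of df2 and take its col2.
df1$col2 <- df2$col2[match(df1$col1, df2$col1)]
print(df1)
```

Values of df1$col1 that do not appear in df2$col1 get NA in the new column. merge(df1, df2, by = "col1") gives a similar result but does not preserve the original row order.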
I have xy data for gene expression in multiple samples. I wish to subset the first column so I can order the genes alphabetically and perform some other filtering.
> setwd("C:/Users/Will/Desktop/BIOL3063/R code assignment");
> df = read.csv('R-assignments-dataset.csv', stringsAsFactors = FALSE);
Here is a simplified example of the dataset I'm working with, it has 270 columns (tissue samples) and 7065 rows (gene names).
The first column is a list of gene names (A2M, AAAS, AACS etc.) and each column is a different tissue sample, thus showing the gene expression in each tissue sample.
The question being asked is "Sort the gene names alphabetically (A-Z) and print out the first 20 gene names"
My thought process would be to subset the first column (gene names) and then perform order() to sort alphabetically, after which I can use head() to print the first 20.
However when I try
> genes <- df[1]
It simply subsets the first column that has data in it (TCGA-A6-2672_TissueA) rather than the one to its left.
Also
> genes <- df[,df$col1];
> genes;
data frame with 0 columns and 7065 rows
> order(genes);
integer(0)
Appears to create a list of gene names in R studio's viewer but I cannot perform any manipulation on it.
I am unable to correctly locate the first column in the data.frame, since it does not have a column header, and I also have the same problem when doing the same thing with row 1 (sample names) as well.
I'm a complete novice at R and this is part of an assignment I'm working on. It seems I'm missing something fundamental, but I cannot figure out what.
Cheers guys
Please include a sample of your text file as text instead of an image.
I have created a dataset similar to yours:
X Y
1 a b
2 c d
3 d g
Note that your tissue columns have a header but your gene names do not. Therefore these will be interpreted as rownames, see ?read.table:
If row.names is not specified and the header line has one less entry
than the number of columns, the first column is taken to be the row
names.
Reading it in R:
df <- read.table(text = ' X Y
1 a b
2 c d
3 d g')
So your gene names are not in df[1] but in rownames(df). To get them, use genes <- rownames(df); to add them to the existing df as a column, use df$gene <- rownames(df).
There are numerous ways to convert your row names to a column see for example this question.
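Putting that together for the assignment, here is a small sketch. The three gene names come from the question; the column names and expression values are made up, and the real data would come from read.csv() rather than inline text.

```r
# Toy data: the header has one fewer entry than the data rows, so the
# first column (gene names) becomes the row names.
df <- read.table(text = "TissueA TissueB
AACS 1.2 3.4
A2M 5.6 7.8
AAAS 9.1 2.3")

# The gene names live in the row names, not in df[1].
genes <- sort(rownames(df))

# First 20 gene names in alphabetical order (only 3 exist here).
head(genes, 20)

# Reorder the whole data frame by gene name if further filtering is needed.
df_sorted <- df[order(rownames(df)), ]
```

With read.csv(), passing row.names = 1 makes the same choice explicit.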
If you are asking what I think you are asking, you just need to subset inside the as.data.frame function, which will auto-generate a "header", as you call it. The auto-generated name is derived from the expression you pass in, so rename it to V1, the first variable of your new data frame, to make it easy to refer to.
genes <- as.data.frame(df[,1])
names(genes) <- "V1"
genes$V1
1 A
2 C
3 A
4 B
5 C
6 D
7 A
8 B
As per the comment below, the issue could be avoided if you remove the comma from your subsetting syntax. When you select columns from a data.frame, you only need to index the column, not the rows.
genes <- df[1]
I am trying to subset/filter in a data frame.
Col1 Col2
A 23454,34543
B 23456
C 34543,34532
I want to subset the data frame and get row B alone (for example: B 23456), based on the number of values in Col2, using either grep or some other function.
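One way to sketch this, assuming Col2 is stored as text with values separated by commas: either keep the rows that contain no comma, or count the values per row with strsplit() and lengths().

```r
df <- data.frame(Col1 = c("A", "B", "C"),
                 Col2 = c("23454,34543", "23456", "34543,34532"),
                 stringsAsFactors = FALSE)

# Option 1: keep rows whose Col2 contains no comma.
single_a <- df[!grepl(",", df$Col2, fixed = TRUE), ]

# Option 2: count the comma-separated values and keep rows with exactly one.
n_values <- lengths(strsplit(df$Col2, ",", fixed = TRUE))
single_b <- df[n_values == 1, ]

print(single_a)
```

The second option generalises more easily, for example to `n_values == 2`.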
I'm a beginner in the R programming language. Please help me add a list of values to a data frame column.
My expected data frame is:
U_ID Value
1 list(`First`="ty",'Second'="89")
2 list(`First`= c("20","10","40"),`Second`="user")
3 list(`First`="vendor",`Second`="yu",`Four`=list(list(`ty`="78",'pt'="kkkpp")))
4 NULL
5 list(`First`="client")
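A data frame column can itself be a list, with one element per row. A minimal sketch, using the values from the expected output above: build the data frame with U_ID first, then assign a list of the same length as the number of rows.

```r
df <- data.frame(U_ID = 1:5)

# One list element per row; the fourth row holds NULL.
values <- list(
  list(First = "ty", Second = "89"),
  list(First = c("20", "10", "40"), Second = "user"),
  list(First = "vendor", Second = "yu",
       Four = list(list(ty = "78", pt = "kkkpp"))),
  NULL,
  list(First = "client")
)

# Assigning a list whose length equals nrow(df) creates a list column.
df$Value <- values

# Inspect the structure rather than printing the nested lists in full.
str(df, max.level = 2)
```

Individual entries are then reached with double brackets, for example `df$Value[[2]]$First`.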