I used to do this:
mydata3 <- data.frame(sapply(mydata2, as.integer))
But now I see that the row names, which are gene names, have been converted to numbers (1 to 200). I should point out that this same command worked well some time ago. So I thought there was a problem with my file, but when I used the old file on which this command used to work, I see the same problem: the gene names are converted into numbers. Here is the full script:
countsTable<-read.table("JW.txt",header=TRUE,stringsAsFactors=TRUE,row.names=1)
mydata2 <- countsTable/1000
mydata3 <- data.frame(sapply(mydata2, as.integer))
str(mydata3)
Please let me know.
sapply works over the columns of your data.frame mydata2 and returns the respective output per column. As such, it does not return the row names of your data.frame, so you either have to re-assign those, or assign the new column data back into your original data.frame, like:
mydata2[] <- sapply(mydata2, as.integer)
This way you keep all of the original attributes, including the row names.
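A minimal sketch of both options on a small made-up count table (the gene and sample names below are hypothetical stand-ins for the contents of JW.txt). It shows the default row names appearing after data.frame(sapply(...)), and two ways of keeping the gene names.

```r
# Hypothetical stand-in for the counts table read from JW.txt:
# gene names are row names, samples are columns
countsTable <- data.frame(s1 = c(1500, 2300, 800),
                          s2 = c(4200, 999, 12000),
                          row.names = c("geneA", "geneB", "geneC"))
mydata2 <- countsTable / 1000

# sapply() returns a plain matrix without row names, so data.frame()
# falls back to the default row names "1", "2", "3"
lost <- data.frame(sapply(mydata2, as.integer))
rownames(lost)

# Option 1: re-attach the original row names when building the new data.frame
mydata3 <- data.frame(sapply(mydata2, as.integer), row.names = rownames(mydata2))

# Option 2: assign into mydata2[] so the data.frame, its row names and
# other attributes are kept and only the column values change
mydata2[] <- sapply(mydata2, as.integer)
rownames(mydata2)
```

Note that as.integer() truncates toward zero, so values such as 0.999 become 0.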
I have a csv file with headers in the form:
a,b,c,d
1,6,5,6,8
library(readr)
df <- read_csv("test.csv")
For some reason the value 1 in the example is incorrect. To correct the file, I'd like to shift all the other values to the left, thus dropping the 1 but preserving the columns, ending with:
a,b,c,d
6,5,6,8
How can I achieve that?
What about this:
headers <- names(df)
new_df <- df[, 2:length(df)]
# new_df has one column fewer than headers, so drop the last (spurious) name
names(new_df) <- headers[-length(headers)]
In one line of code, the structure command creates an object and assigns attributes:
structure(df[,2:length(df)], names = names(df)[1:(length(df)-1)])
Recognizing that a data.frame is a list of equal-length vectors, where each vector represents a column, the following will also work:
structure(df[2:length(df)], names = names(df)[1:(length(df)-1)])
Note there is no comma in df[2:length(df)].
Also, I like the trick of removing items from a vector or list using a negative index. So I think an even cleaner bit of code is:
structure(df[-1], names = names(df)[-length(df)])
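A quick check of the one-liner, assuming df ends up with five columns where the first value is the spurious 1 and the last column name (here the hypothetical placeholder X5) is the extra one:

```r
# Hypothetical reproduction: five values under five columns, where the
# first value (1) is spurious and "X5" is a placeholder name for the extra column
df <- data.frame(a = 1, b = 6, c = 5, d = 6, X5 = 8)

# Drop the first column, then reuse every name except the last one
fixed <- structure(df[-1], names = names(df)[-length(df)])
fixed
#>   a b c d
#> 1 6 5 6 8
```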
I would like to assign names to rows in R, but so far I have only found ways to assign names to columns. My data is in two columns: the first column (geo) holds the name of the specific location I'm investigating, and the second column (skada) is the observed value at that location. To clarify, I want to be able to assign a name to every location instead of just having them all in one .txt file, so that the data is easier to work with. Does anyone with more experience know how to handle this in R?
First you need to import the data into your global environment. Try the function read.table().
To name rows, try (assuming your data.frame is named df):
rownames(df) <- df[, "geo"]
df <- df[, -1, drop = FALSE]  # drop = FALSE keeps a data.frame when only one column remains
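A small sketch with made-up locations (loc_A and so on are hypothetical). With only two columns, df[, -1] alone would collapse to a plain vector and lose the row names, which is why drop = FALSE is used; the row names must also be unique.

```r
# Made-up example: location names in geo, observed values in skada
df <- data.frame(geo = c("loc_A", "loc_B", "loc_C"),
                 skada = c(12, 7, 3),
                 stringsAsFactors = FALSE)  # keep geo as character in older R

# Use the geo column as row names (values must be unique)
rownames(df) <- df[, "geo"]

# Remove the geo column but keep a one-column data.frame
df <- df[, -1, drop = FALSE]

# Values can now be looked up by location name
df["loc_B", "skada"]
```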
Well, your question is not that clear...
I assume you are trying to create a data.frame with named rows. If you look at the data.frame help page, you can see the description of the row.names parameter:
NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
This means that, when you create the data.frame, you can specify either the row names themselves or the column containing them. The former can be achieved as follows:
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
row.names=letters[1:10] # take the first 10 letters and use them as row header
)
while the latter is
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
r=letters[1:10], # take the first 10 letters
row.names=3 # the column with the row headers is the 3rd
)
If you are reading the data from a file, I will assume you are using read.table. Many of its parameters are the same as those of data.frame; in particular, you will find that the row.names parameter works the same way:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
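A sketch of the read.table route, with the file contents inlined through the text= argument so it runs without a file; the location names and values are made up.

```r
# Hypothetical file contents, inlined via text= instead of a file path
txt <- "geo skada
loc_A 12
loc_B 7
loc_C 3"

# row.names = 1 uses the first column (geo) as row names by position;
# row.names = "geo" would select the same column by name
d <- read.table(text = txt, header = TRUE, row.names = 1)
d
```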
Finally, if you have already read the data.frame and you want to change the row names, Pierre's answer is your solution.
I am trying to remove the endings from the sample names in my data frame. There are about 200 samples, so I was hoping there was a way to cut each name off before the first - (which every sample name has).
Examples of names are:
Glyc.1.20C.1wk-ATGGTTCACCCG-CATCAGTACGCC-R1.fastq
Glyc.1.20C.2m-CACTACGCTAGA-GTTCCTCCATTA-R1.fastq
Glyc.1.20C.2wk-GCTCGAAGATTC-CGAGGGAAAGTC-R1.fastq
Glyc.1.20C.3m-GTAGGTGCTTAC-GCATAAACGACT-R1.fastq
Changing them by hand with colnames(x) <- c("Glyc.1.20C.1wk", etc.) would take me forever.
Any ideas?
If df is your dataframe, take the names, remove everything after the first -, and reset the names to the new short values...
names(df) <- gsub("\\-.+","",names(df))
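The same pattern applied to the example names from the question, as a plain character vector so the effect is easy to see:

```r
# Sample names from the question
x <- c("Glyc.1.20C.1wk-ATGGTTCACCCG-CATCAGTACGCC-R1.fastq",
       "Glyc.1.20C.2m-CACTACGCTAGA-GTTCCTCCATTA-R1.fastq",
       "Glyc.1.20C.2wk-GCTCGAAGATTC-CGAGGGAAAGTC-R1.fastq",
       "Glyc.1.20C.3m-GTAGGTGCTTAC-GCATAAACGACT-R1.fastq")

# The match starts at the leftmost "-", and ".+" then runs to the end of
# the string, so everything from the first "-" onward is removed
short <- gsub("\\-.+", "", x)
short
#> [1] "Glyc.1.20C.1wk" "Glyc.1.20C.2m"  "Glyc.1.20C.2wk" "Glyc.1.20C.3m"
```

Because there is at most one match per name, sub() with the same pattern gives the same result.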
I am attempting to automate a simple process of importing some data and using the spread function from the tidyr package to convert it to wide format.
Below is a simplified example
Ticker <- c(rep("GOOG",5), rep("AAPL",5))
Prices <- rnorm(10, 95, 5)
Date <- rep(as.Date(c("2015-01-01", "2015-01-02", "2015-01-03", "2015-01-04", "2015-01-05")), 2)
exStockData <- data.frame(Ticker, Date, Prices)
After reading in a data frame like exStockData, I'd like to be able to create a data frame like the one below:
library(tidyr)
#this is the data frame I'd like to be able to create
desiredDataFrame <- spread(exStockData, Ticker, Prices)
However, the column used for the key argument of the spread function will not always be called Ticker and the column used for the value argument of the function will not always be called Prices. The column names are read in from a different portion of the file that gets imported.
#these vectors are removed because, given the way my text file is read in,
#I don't actually have these vectors
rm(Ticker, Prices, Date)
#the name of the first column (which serves as the key in
#the spread function) of the exStockData data frame will
#vary, and is read in from the file and stored as a one
#element character vector
secID <- "Ticker"
#the name of the last column in the data frame
#(which serves as the value in the spread function)
#is also stored as a one element character vector
fields <- "Prices"
#I'd like to be able to dynamically specify the column
#names using these other character vectors
givesAnError <- spread(exStockData, get(secID), get(fields))
The "See also" section of the documentation for the spread function mentions the spread_ function, which is intended for this situation.
In this case the solution is to use:
solved <- spread_(exStockData, secID, fields)
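The underscore variants such as spread_() have since been deprecated in tidyr, and from tidyr 1.0.0 onward spread() itself is superseded by pivot_wider(). A minimal sketch of the same reshape with pivot_wider(), assuming a recent tidyr where all_of() is available; secID and fields are the character vectors from the question, and the data is rebuilt here so the block runs on its own.

```r
library(tidyr)

# Rebuild the example data; as.Date() is applied directly so Date keeps its class
exStockData <- data.frame(
  Ticker = c(rep("GOOG", 5), rep("AAPL", 5)),
  Date   = rep(as.Date(c("2015-01-01", "2015-01-02", "2015-01-03",
                         "2015-01-04", "2015-01-05")), 2),
  Prices = rnorm(10, 95, 5)
)

# Column names held in character vectors, as read from the file
secID  <- "Ticker"
fields <- "Prices"

# all_of() makes tidyselect look the columns up from the strings
wide <- pivot_wider(exStockData,
                    names_from  = all_of(secID),
                    values_from = all_of(fields))
wide
```

all_of() raises an error if a named column does not exist, which is usually the desired behaviour when the names come from a file.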
I've been trying to create a new dataframe from my original one, where each row of the new dataframe is the mean of 20 consecutive rows of the old dataframe. I discovered a function called colMeans, which does the job pretty well; the only remaining problem is how to turn that vector of results back into a dataframe that can be analysed further.
My code for colMeans (matrix1 is my original dataframe converted to a matrix; this was the only way I managed to get it to work):
a <- colMeans(matrix(matrix1, nrow = 20))
But this gives me a numeric vector with all the results concatenated into a single column (if I try, for example, as.data.frame(a)). How am I supposed to get this result back into a dataframe where each column contains only the averages for that specific column, rather than all of the averages?
I hope my question is clear, thanks for help.
Judging from methods('as.data.frame'), as.data.frame.list is one option for converting each element of a vector into a column of a data.frame:
as.data.frame.list(a)
Example data:
m1 <- matrix(1:20, ncol=4, dimnames=list(NULL, paste0('V', 1:4)))
a <- colMeans(m1)
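To get the layout the question asks for (one column per original column, one row per 20-row block), the flat vector from colMeans(matrix(matrix1, nrow = 20)) can be reshaped back into a matrix. A sketch, assuming matrix1 is numeric with column names and its row count is an exact multiple of 20; the 60-row matrix1 below is made-up data.

```r
# Hypothetical stand-in for matrix1: 60 rows, 3 named columns
set.seed(1)
matrix1 <- matrix(rnorm(180), ncol = 3, dimnames = list(NULL, c("A", "B", "C")))

# The question's approach: reshape into 20-row columns and average them.
# R fills matrices column by column, so the means come out grouped by
# original column: 3 block means for A, then 3 for B, then 3 for C.
a <- colMeans(matrix(matrix1, nrow = 20))

# Reshape the flat vector back into one column per original column
blockMeans <- as.data.frame(matrix(a, ncol = ncol(matrix1),
                                   dimnames = list(NULL, colnames(matrix1))))
blockMeans
```

If the row count is not a multiple of 20, matrix() recycles values (with a warning), so a grouping approach such as aggregate() with a block index would be safer in that case.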