I have a CSV file with headers in the form:
a,b,c,d
1,6,5,6,8
df <- read_csv("test.csv")
The value 1 in the example is incorrect. To correct the file, I'd like to drop the 1 and shift all the other values to the left while preserving the column names, ending up with:
a,b,c,d
6,5,6,8
How can I achieve that?
What about this:
headers <- names(df)
new_df <- df[, 2:length(df)]
names(new_df) <- headers
In one line of code, the structure command creates an object and assigns attributes:
structure(df[,2:length(df)], names = names(df)[1:(length(df)-1)])
Recognizing that a data.frame is a list of equal-length vectors, where each vector represents a column, the following will also work:
structure(df[2:length(df)], names = names(df)[1:(length(df)-1)])
Note that there is no comma in df[2:length(df)].
Also, I like the trick of removing items from a vector or list using a negative index. So I think an even cleaner bit of code is:
structure(df[-1], names = names(df)[-length(df)])
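As a small check of the negative-index approach, here is a minimal sketch on a hand-built data frame. It assumes the file was read into five columns, with the last one given a placeholder name (extra here is hypothetical; what read_csv actually produces for a row with more values than headers may differ).

```r
# Hypothetical stand-in for the imported file: five values, the first one wrong.
# The name "extra" is an assumption made for this sketch.
df <- data.frame(a = 1, b = 6, c = 5, d = 6, extra = 8)

# Drop the first column, then reuse all names except the last one.
shifted <- structure(df[-1], names = names(df)[-length(df)])

print(shifted)
```

The values move one column to the left while the names a, b, c, d stay in place.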
I used to do this:
mydata3 <- data.frame(sapply(mydata2, as.integer))
But now I see that the row names, which are gene names, have been converted to numbers (like 1-200). I should point out that the same command worked well when I used it some time ago. I thought there might be a problem with my file, so I tried an old file on which this command used to work, but I see the same problem: the gene names are converted to numbers. Here is the full script:
countsTable<-read.table("JW.txt",header=TRUE,stringsAsFactors=TRUE,row.names=1)
mydata2 <- countsTable/1000
mydata3 <- data.frame(sapply(mydata2, as.integer))
str(mydata3)
Please let me know.
sapply works over the columns of your data.frame mydata2 and returns the respective output per column. As such, it does not return the row names of your data.frame, so you either have to re-assign those or assign the new column data back into your original data.frame, like:
mydata2[] <- sapply(mydata2, as.integer)
Thus you can keep all of the original attributes.
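To see the difference, here is a minimal sketch with a made-up two-gene table (the gene and sample names are hypothetical). It uses lapply rather than sapply on the right-hand side, which is a small substitution: lapply always returns a list, so the column-wise assignment does not depend on how sapply simplifies its result.

```r
# Hypothetical counts table with gene names as row names.
counts <- data.frame(s1 = c(1500, 2500),
                     s2 = c(3200, 4800),
                     row.names = c("geneA", "geneB"))
scaled <- counts / 1000

# Rebuilding the data.frame from sapply output discards the row names.
lost <- data.frame(sapply(scaled, as.integer))

# Assigning into the existing columns keeps row names and other attributes.
kept <- scaled
kept[] <- lapply(kept, as.integer)

rownames(lost)  # default row numbers
rownames(kept)  # original gene names
```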
I would like to assign names to rows in R but so far I have only found ways to assign names to columns. My data is in two columns where the first column (geo) is assigned with the name of the specific location I'm investigating and the second column (skada) is the observed value at that specific location. To clarify, I want to be able to assign names for every location instead of just having them all in one .txt file so that the data is easier to work with. Anyone with more experience than me that knows how to handle this in R?
First you need to import the data to your global environment. Try the function read.table()
To name rows, try
(assuming your data.frame is named df):
rownames(df) <- df[, "geo"]
df <- df[, -1, drop = FALSE]  # drop = FALSE keeps a data.frame even when only one column remains
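A minimal sketch of these two steps on made-up data follows. The location names and values are hypothetical; drop = FALSE is used because the data has only two columns, so removing geo would otherwise reduce the result to a plain vector.

```r
# Hypothetical data in the shape described in the question.
df <- data.frame(geo = c("north", "south", "east"),
                 skada = c(3, 7, 5),
                 stringsAsFactors = FALSE)

rownames(df) <- df[, "geo"]    # use the location names as row names
df <- df[, -1, drop = FALSE]   # remove the geo column, keep a data.frame

# Rows can now be addressed by location name.
df["south", "skada"]
```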
Well, your question is not that clear...
I assume you are trying to create a data.frame with named rows. If you look at the data.frame help you can see the parameter row.names description
NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
which means you can manually specify either the row names themselves when you create the data.frame or the column containing the names. The former can be achieved as follows
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
row.names=letters[1:10] # take the first 10 letters and use them as row header
)
while the latter is
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
r=letters[1:10], # take the first 10 letters
row.names=3 # the column with the row headers is the 3rd
)
If you are reading the data from a file, I will assume you are using the command read.table. Many of its parameters are the same as those of data.frame; in particular, you will find that the row.names parameter works the same way:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
Finally, if you have already read the data.frame and you want to change the row names, Pierre's answer is your solution
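For the file-reading case, a minimal sketch is shown below. It passes the data inline through the text argument of read.table instead of a file path, purely to keep the example self-contained; the contents are hypothetical.

```r
# Hypothetical file contents, supplied inline instead of via a file path.
txt <- "geo skada
north 3
south 7"

# row.names = 1 uses the first column as row names; row.names = "geo" also works.
d <- read.table(text = txt, header = TRUE, row.names = 1)

rownames(d)
d["north", "skada"]
```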
So I have to import some data into R and find it reasonably difficult.
I have multiple similar tables in a directory and would like to write a script that looks for a specific row (based on a string, not a row number) in each of them and adds those rows to a new table.
Example data:
Table one:
name Johnny
registeration data 01012001
userid>= 47
table two:
name Jimmy
registeration data 02052005
userid>= 1972
What I want is a table that contains:
userid>= 47
userid>= 1972
Note: the columns are separated by tabs.
What I tried to do is the following:
A: Create a list of files in the working directory:
list = (Sys.glob("*.table"))
B: Created a list of tables using lapply:
table <- lapply(list, function(x) read.table(x, header = FALSE, sep = "\t", fill = TRUE))
C. Tried to grep the word "userid" (failed):
table[grep("userid", rownames("userid")), ]
Error: incorrect number of dimensions
Is there a simpler way to fetch the row of interest (userid>= in the example) based on a string, without relying on external packages? I could also use "grep userid *.table > newtable" in bash, but I want to use only R.
How about this (given that the rownames are in the first variable as in your example):
# list all tables in current directory, optionally recursively
tbls <- list.files(getwd(), pattern = '\\.table$') # escape the dot so it matches literally; if more dirs, maybe add recursive = TRUE
# create a list of tables
tbls_r <- lapply(tbls, function(x) read.table(x, header = FALSE, sep = '\t', stringsAsFactors = FALSE))
# using lapply to extract the row of interest
tbls_r <- lapply(tbls_r, function(x) x[x[,1] == 'userid>=',])
Using lapply, we apply a function to each element of the list (in this case, each table). x references a single table, so with x[,1] == 'userid>=' I'm creating a logical vector (TRUE and FALSE) that marks which values of the first column are equal to the desired string. The first column is indexed by x[,1]; note that I'm leaving the first position empty because I want all rows but only the first column.
I then use this logical vector right away to index the table itself, returning only the rows that have a corresponding TRUE value in the vector.
# Bind the resulting rows to a single table
result_table <- do.call(rbind,tbls_r)
Hope that clears it up.
Edit:
If you just want to extract the values you can use this:
tbls_r <- sapply(tbls_r, function(x) x[x[,1] == 'userid>=',2])
In this case, I'm specifying during indexing, that I only want the column 2, leaving me with only the values.
Also I'm using sapply instead of lapply, which already returns a handy vector instead of a list. So no need for calling do.call.
If you then want a data.frame, just go with something along the lines of
res <- data.frame(UID = tbls_r,stringsAsFactors=FALSE)
Of course you can add more variables to this data.frame given they have the same length.
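Putting the steps together, here is a minimal end-to-end sketch. It writes the two example tables from the question into a temporary directory (the directory and file names are assumptions made for the sketch) so that it can run anywhere, then reads them back and extracts the userid>= rows.

```r
# Write the two example tables to a temporary directory (hypothetical file names).
dir <- tempfile("tables")
dir.create(dir)
writeLines(c("name\tJohnny", "registeration data\t01012001", "userid>=\t47"),
           file.path(dir, "one.table"))
writeLines(c("name\tJimmy", "registeration data\t02052005", "userid>=\t1972"),
           file.path(dir, "two.table"))

# List the files, read each one, and keep only the rows of interest.
files  <- list.files(dir, pattern = "\\.table$", full.names = TRUE)
tables <- lapply(files, read.table, header = FALSE, sep = "\t",
                 stringsAsFactors = FALSE)
rows   <- lapply(tables, function(x) x[x[, 1] == "userid>=", ])

# Bind the extracted rows into a single table.
result <- do.call(rbind, rows)
print(result)
```

Because the second column also holds names and dates, it is read as character; as.integer(result$V2) converts the extracted ids to numbers if needed.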
I am trying to name the columns of a matrix using the values in a vector.
Suppose I have the following matrix:
A <- matrix(1:110, ncol=11)
and also a vector with 11 values from read.table:
code <- data1$code
I would like to do something like:
colnames(A)=data.frame(code)
to put the names of the columns using the values from the vector code
It will be far simpler just to pass code (or perhaps as.character(code), if it is a factor variable):
colnames(A) <- as.character(code)
Passing a data.frame with one column will not work, as it has length 1 (the one column).
If you pass a list with two elements of the correct lengths to dimnames, you can set both rownames and colnames at the same time.
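Both variants can be sketched as follows. Since data1 is not available here, code is replaced by a made-up factor of 11 labels, and the row labels are likewise hypothetical.

```r
A <- matrix(1:110, ncol = 11)

# Hypothetical stand-in for data1$code: a factor with 11 levels.
code <- factor(paste0("c", 1:11))

# Column names only.
colnames(A) <- as.character(code)

# Row and column names in one step: a list of two character vectors,
# row names first (length 10), column names second (length 11).
dimnames(A) <- list(paste0("r", 1:10), as.character(code))

A[1:2, 1:3]
```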
colnames gives me the column names for a whole data frame. Is there any way to get the name of one specified column? I need this for naming labels when plotting data in ggplot.
So say my data is like this:
df1 <- data.frame(a=sample(1:50,10), b=sample(1:50,10), c=sample(1:50,10))
I would need something like paste(colnames(df1[,1])) which obviously won't work.
Any ideas?
You call the name like this:
colnames(df1)[1]
# i.e. call the first element of colnames not colnames of the first vector
However, by removing your comma, e.g.:
colnames(df1[1])
you can also get the name, because [x], unlike [,x], [[x]] or $x, keeps the data.frame structure rather than reducing the column to a vector.
names(df1)[1]
will give you the name of the first column. So too will
names(df1[1])
Neither uses a comma.
Would colnames(df1)[1] solve the problem?
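The variants above can be compared side by side in a short sketch. It reuses df1 from the question; the variable x_label is a hypothetical name for the string that would be passed on as an axis label.

```r
df1 <- data.frame(a = sample(1:50, 10),
                  b = sample(1:50, 10),
                  c = sample(1:50, 10))

# Index the vector of names: works with names() or colnames().
x_label <- names(df1)[1]
colnames(df1)[1]

# Single-bracket indexing without a comma keeps a one-column data.frame,
# so its names are still available.
colnames(df1[1])

# With a comma the column is reduced to a plain vector, which has no column names.
is.null(colnames(df1[, 1]))
```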