I want to delete the header from a dataframe that I have. I read in the data from a csv file then I transposed it, but it created a new header that is the name of the file and the row that the data is from in the file.
Here's an example for a dataframe df:
a.csv.1 a.csv.2 a.csv.3 ...
x 5 6 1 ...
y 2 3 2 ...
I want to delete the a.csv.n row, but when I try df <- df[-1,] it deletes row x and not the top.
If you really, really, really don't like column names, you may convert your data frame to a matrix (keeping possible coercion of variables of different class in mind), and then remove the dimnames.
dd <- data.frame(x1 = 1:5, x2 = 11:15)
mm1 <- as.matrix(dd)
mm2 <- matrix(mm1, ncol = ncol(dd), dimnames = NULL)
I add my previous comment here as well:
?data.frame: "The column names should be non-empty, and attempts to use empty names will have unsupported results.".
Set names to NULL
names(df) <- NULL
You can also use the header option in read.csv
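A minimal sketch of that idea, assuming the file has no header row of its own. The inline `csv_text` is hypothetical data supplied through `text =` so the example is self-contained; with a real file you would pass the file name instead.

```r
# Hypothetical CSV contents: three rows, two columns, no header line.
csv_text <- "5,2\n6,3\n1,2"

# header = FALSE stops R from treating the first line as column names;
# R assigns the default names V1, V2 instead.
raw <- read.csv(text = csv_text, header = FALSE)

# Transposing a data frame returns a matrix, and a matrix (unlike a
# data frame) can have its row and column names removed entirely.
m <- t(raw)
dimnames(m) <- NULL
print(m)
```

The result is a matrix rather than a data frame, so this only fits when all columns share one type.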
You can use names(df) to change the header (column names). If newnames is a character vector such as newnames <- c("col1","col2","col3"), then names(df) <- newnames gives you a data frame with column names col1, col2 and col3.
As @Henrik said, the column names should be non-empty. Setting names(df) <- NULL will give NA as the column names.
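As a rough illustration of the renaming described above, here is a small sketch. The data frame and the new names are made up for the example; the replacement vector needs one name per column.

```r
# Hypothetical data frame with three columns.
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# One new name per column, in column order.
newnames <- c("col1", "col2", "col3")
names(df) <- newnames

print(names(df))
```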
If your data is in a csv file and you read it with header=TRUE, the data frame will have the same column names as the csv file. If you set header=FALSE, R will assign the column names V1, V2, ... and the column names from the original csv file will appear as the first row.
anydata.csv
a b c d
1 1 2 3 13
2 2 3 1 21
read.csv("anydata.csv",header=TRUE)
a b c d
1 1 2 3 13
2 2 3 1 21
read.csv("anydata.csv",header=FALSE)
V1 V2 V3 V4
1 a b c d
2 1 2 3 13
3 2 3 1 21
You could use
setNames(dat, rep(" ", length(dat)))
where dat is the name of the data frame. Then all columns will have the name " " and hence will be 'invisible'.
This comes some years late, but you can simply rename the columns by assigning a vector:
## if you want to delete all column names:
colnames(df)[] <- ""
## if you want to delete let's say column 1:
colnames(df)[1] <- ""
## if you want to delete 1 to 3 and 7:
colnames(df)[c(1:3,7)] <- ""
As already mentioned, a data frame without column names just isn't going to happen. But I'm guessing you don't care much whether the names are there; you just don't want to see them when you print your data frame? If so, you can write a new print function to get around that, like so:
> dat <- data.frame(var1=c("A","B","C"),var2=rnorm(3),var3=rnorm(3))
> print(dat)
var1 var2 var3
1 A 1.2771777 -0.5726623
2 B -1.5000047 1.3249348
3 C 0.1989117 -1.4016253
> ncol.print <- function(dat) print(matrix(as.matrix(dat),ncol=ncol(dat),dimnames=NULL),quote=F)
> ncol.print(dat)
[,1] [,2] [,3]
[1,] A 1.2771777 -0.5726623
[2,] B -1.5000047 1.3249348
[3,] C 0.1989117 -1.4016253
Your other option is to set your variable names to unique amounts of whitespace, for example:
> names(dat) <- c(" ", "  ", "   ")
> dat
1 A 1.2771777 -0.5726623
2 B -1.5000047 1.3249348
3 C 0.1989117 -1.4016253
You can also write a function to do this:
> blank.names <- function(dat){
+ for(i in 1:ncol(dat)){
+ names(dat)[i] <- paste(rep(" ",i),collapse="")
+ }
+ return(dat)
+ }
> dat <- data.frame(var1=c("A","B","C"),var2=rnorm(3),var3=rnorm(3))
> dat
var1 var2 var3
1 A -1.01230289 1.2740237
2 B -0.13855777 0.4689117
3 C -0.09703034 -0.4321877
> blank.names(dat)
1 A -1.01230289 1.2740237
2 B -0.13855777 0.4689117
3 C -0.09703034 -0.4321877
But generally I don't think any of this should be done.
A function that I use in one of my R scripts:
read_matrix <- function (csvfile) {
a <- read.csv(csvfile, header=FALSE)
matrix(as.matrix(a), ncol=ncol(a), dimnames=NULL)
}
How to call this:
iops_even <- read_matrix('even_iops_Jan15.csv')
iops_odd <- read_matrix('odd_iops_Jan15.csv')
If you are working in Python with pandas rather than R, you can simply do:
print(df.to_string(header=False))
If you want to remove the line indexes as well, you can do:
print(df.to_string(index=False,header=False))
I want to replace column name by referring to a table.
Below is my question.
data <- read.table(textConnection("
a b c d e
row1 1 2 3 4 5
"), header = TRUE)
Newtitle <- read.table(textConnection("
id id2
a kitty
d oren
g dyron
"), header = TRUE)
If a value in Newtitle$id matches a column name in data,
then I want to replace that column name with the corresponding Newtitle$id2; otherwise just keep the original column name.
kitty b c oren e
row1 1 2 3 4 5
Any hints please?
Need to be careful with the difference between factors and characters.
Newtitle$id <- as.character(Newtitle$id)
Newtitle$id2 <- as.character(Newtitle$id2)
rownames(Newtitle) <- Newtitle$id
replaced <- names(data) %in% Newtitle$id
names(data)[replaced] <- Newtitle[names(data)[replaced], "id2"]
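An alternative sketch of the same lookup, using match() instead of row names. This is one possible approach, not the only one; the data below reproduces the question's example, with stringsAsFactors = FALSE set explicitly so the lookup columns are plain character vectors on any R version.

```r
# The question's example data, rebuilt inline.
data <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5, row.names = "row1")
Newtitle <- data.frame(id  = c("a", "d", "g"),
                       id2 = c("kitty", "oren", "dyron"),
                       stringsAsFactors = FALSE)

# For each column name, find its position in Newtitle$id (NA if absent).
idx <- match(names(data), Newtitle$id)

# Keep the old name where there is no match, otherwise take id2.
names(data) <- ifelse(is.na(idx), names(data), Newtitle$id2[idx])

print(data)
```

Ids that have no matching column, such as "g" here, are simply ignored.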
Three text files are in the same directory ("data001.txt", "data002.txt", "data003.txt"). I write a loop to read each data file and generate three data tables;
for(i in files) {
x <- read.delim(i, header = F, sep = "\t", na = "*")
setnames(x, 2, i)
assign(i,x)
}
So let's say each individual table looks something like this:
var1 var2 var3
row1 2 1 3
I've used rbind to combine all of the tables...
combined <- do.call(rbind, mget(ls(pattern="^data")))
and get something like this:
var1 var2 var3
row1 2 1 3
var1 var2 var3
row1 3 2 4
var1 var2 var3
row1 1 3 5
leaving me with superfluous column names. At the moment I can get around this by just deleting that specific row containing the column names, but it's a bit clunky.
colnames(combined) = combined[1, ] # make the first row the column names
combined <- combined[-1, ] # delete the now-unnecessary first row
toKeep <- seq(1, nrow(combined), 2) # rows to keep: every odd row holds data, every even row is a repeated header
combined <- combined[ toKeep ,] # keep only the data rows, dropping the repeated headers
This does give me what I want...
var1 var2 var3
row1 2 1 3
row1 3 2 4
row1 1 3 5
But I feel like a better way would simply be to extract the values of "row1" as a vector or as a list or whatever, and combine them all together into one data table. I feel like there is a quick and easy way to do this but I haven't been able to find anything yet. I've had a look here and here and here.
One possibility is to take the second row (that I want), and convert it into a matrix (then transpose it to make it a row instead of column!?) and rbind:
data001.txt <- as.matrix(data001.txt[2,])
data001.txt <- t(data001.txt)
combined <- rbind(data001.txt, data002.txt)
This gives me more or less what I want except without the column name headers (e.g. var1, var2, var3).
v1 v2 v3
2 1 3
3 2 4
Any ideas? Would this second method work well if there is some way to add the column names? I feel like it's less clunky than the first method. Thanks for any input :)
edit - solved in answer below.
Figured it out. It requires converting to a matrix and using setnames from the data.table package. Say you have a range of text data files like the one that follows, and you want to extract just the seventh column (the one with the numbers, not letters), and combine them together in their own data table including the row names:
chemical1 a b c d e 1 g h i j k l m
chemical2 a b c d e 2 g h i j k l m
chemical3 a b c d e 3 g h i j k l m
chemical4 a b c d e 4 g h i j k l m
chemical5 a b c d e 5 g h i j k l m
Read each file, setting row.names = 1 and header = F:
setwd("directory")
files <- list.files(pattern = "data") # take all files with 'data' in their name
for(i in files) {
x <- read.delim(i, row.names = 1, header = F, sep = "\t", na = "*")
setnames(x, 6, i) # if the data you want is in column six. Sets data file name as the column name.
x <- as.matrix(x[6]) # just take the sixth column with the numeric data (delete everything else)
x <- t(x) # transpose (if you want..)
assign(i,x)
}
combined <- do.call(rbind, mget(ls(pattern="^data"))) # combine the data matrices into one table
write.table(combined, file="filename.csv", sep=",", row.names=T, col.names = NA)
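The same result can probably be reached without assign() and mget() by collecting the rows in a list. The sketch below uses only base R and assumes small made-up files written to a temporary directory; the file contents, the column position and the name `demo_dir` are all hypothetical.

```r
# Write three small tab-separated files (hypothetical data) to a temp directory.
demo_dir <- tempfile("demo")
dir.create(demo_dir)
for (k in 1:3) {
  writeLines(sprintf("chemical%d\ta\t%d", 1:2, k * 1:2),
             file.path(demo_dir, sprintf("data00%d.txt", k)))
}

files <- list.files(demo_dir, pattern = "^data", full.names = TRUE)

# Read each file and return the numeric column as a vector named by chemical.
rows <- lapply(files, function(f) {
  x <- read.delim(f, row.names = 1, header = FALSE, sep = "\t")
  setNames(x[[2]], rownames(x))  # second remaining column holds the numbers here
})

# Stack the vectors into one matrix, one row per file.
combined <- do.call(rbind, rows)
rownames(combined) <- basename(files)
print(combined)
```

Keeping the pieces in a list avoids creating one global variable per file, and the column names come from the row names of each file.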
I am trying to read just one column of data in R. I know that the shortcut is to do something like d1[[3]] (assuming d1 is a data frame) to get the third column. However, I'm curious what this would look like if you used a read function instead. How would you make the result a vector rather than a truncated data frame?
Here's an example of reading just one column from a .csv file
dat <- data.frame(a = letters[1:3], b = LETTERS[1:3], c = 1:3, d = 3:1)
dat
a b c d
1 a A 1 3
2 b B 2 2
3 c C 3 1
# write dat to a csv file
write.csv(dat,file="mydata.csv")
# scan the first row only from the file
firstrow <- scan("mydata.csv", sep=",", what=character(0), nlines=1)
# which position has the desired column (header = b in this case)
col.pos <- match("b", firstrow)
# number of columns in data
nc <- length(firstrow)
# default of NA for desired column b; NULL for others
colClasses <- replace(rep("NULL", nc), col.pos, NA)
# read just column b
cols.b <- read.csv("mydata.csv", colClasses = colClasses)
cols.b
b
1 A
2 B
3 C
The above reads in a data frame. If you want to read a vector,
cols.b <- read.csv("mydata.csv", colClasses = colClasses)[, 1]
cols.b
[1] A B C
Levels: A B C
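If a plain character vector is wanted instead of a factor, one option is to request the "character" class for the wanted column. This is a sketch under the assumption that the file was written without row names; the temporary file and its contents are made up for the example.

```r
# Write a small hypothetical CSV without row names.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(a = letters[1:3], b = LETTERS[1:3], c = 1:3),
          tmp, row.names = FALSE)

# Read one row just to learn the column names.
header <- names(read.csv(tmp, nrows = 1))

# "NULL" skips a column; "character" reads it as plain strings.
classes <- ifelse(header == "b", "character", "NULL")

# [["b"]] extracts the single remaining column as a vector.
b <- read.csv(tmp, colClasses = classes)[["b"]]
print(b)
```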
I'm new to R and I don't know exactly how to add a row to a data frame.
I create two vectors:
b=c("one","lala",1)
d=c("two","lele",2)
I want to add these to a data.frame called a.
a<-rbind(a,b)
now I have one correct row
A B C
1 one lala 1
next I add
a<-rbind(a,d)
and result is:
A B C
1 one lala 1
2 NA NA NA
and the console gives me the warning message: invalid factor level, NA generated.
What am I doing wrong, or what is a better, simpler way to add a new row?
I don't want to create the full data.frame at the start. I want to add rows one at a time.
When you do
c("one","lala",1)
this creates a vector of strings. The 1 is converted to character type,
so that all elements in the vector are the same type.
Then rbind(a,b) will try to combine a which is a data frame and b
which is a character vector and this is not what you want.
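The coercion described above is easy to check directly. A short sketch:

```r
# Mixing strings and a number in c() yields a character vector.
b <- c("one", "lala", 1)

print(class(b))  # the whole vector is character
print(b[3])      # the number has become the string "1"
```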
The way to do this is using rbind with data frame objects.
a <- NULL
b <- data.frame(A="one", B="lala", C=1)
d <- data.frame(A="two", B="lele", C=2)
a <- rbind(a, b)
a <- rbind(a, d)
Now we can see that the columns in data frame a are the proper type.
> lapply(a, class)
$A
[1] "factor"
$B
[1] "factor"
$C
[1] "numeric"
>
Notice that you must name the columns when you create the separate data
frames, otherwise rbind will fail. If you do
b <- data.frame("one", "lala", 1)
d <- data.frame("two", "lele", 2)
then
> rbind(b, d)
Error in match.names(clabs, names(xi)) :
names do not match previous names
You need to add stringsAsFactors = FALSE to BOTH the data.frame() function and the rbind() function.
In some versions of R, but not others, rbind() will automatically convert strings to factors. For example, in R version 3.6.2, rbind will do the factor conversion automatically even if the global setting is options(stringsAsFactors = FALSE). This is not the case in R version 4.0.4, and so stringsAsFactors = FALSE does not need to be added to the rbind() statement in version 4.0.4.
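A sketch of the explicit form, which should not depend on the session's defaults. The column names and values are taken from the question; passing stringsAsFactors = FALSE to rbind() follows the advice above and is harmless where it is not needed.

```r
# Build the first row with strings kept as character, not factor.
a <- data.frame(A = "one", B = "lala", C = 1, stringsAsFactors = FALSE)

# Append a second row, again asking for strings to stay character.
a <- rbind(a,
           data.frame(A = "two", B = "lele", C = 2, stringsAsFactors = FALSE),
           stringsAsFactors = FALSE)

print(sapply(a, class))
```

With both rows built this way, the string columns stay character and the second row is appended without the invalid factor level warning.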
One more point from my own testing. Suppose we have a data frame df with 6 columns, as below, and we use rbind to append
a second row after the first:
df <- data.frame()
df <- rbind(df, row1)
df <- rbind(df, row2)
the result looks like this, with an NA in the second row:
col1 col2 col3 col4 col5 col6
1 1 1 Pilot Greg Andy Dwyer 95.00
2 1 1 Pilot Greg NA 92.00
In my tests, stringsAsFactors = FALSE must be set not only when initializing the df data frame but also in each rbind call. After changing the code to:
df <- data.frame(stringsAsFactors = FALSE)
df <- rbind(df, row1, stringsAsFactors = FALSE)
df <- rbind(df, row2, stringsAsFactors = FALSE)
it works fine:
col1 col2 col3 col4 col5 col6
1 1 1 Pilot Greg Andy Dwyer 95.00
2 1 1 Pilot Greg Audi Sier 92.00
I have a dataset where one of the columns contains only the "#" sign. I used the following code to remove this column.
ia <- as.data.frame(sapply(ia,gsub,pattern="#",replacement=""))
However, after this operation, one of my integer columns had changed to a factor.
I wonder what happened and how I can avoid that. I'd appreciate any help.
A more correct version of your code might be something like this:
> d <- data.frame(x = as.character(1:5),y = c("a","b","#","c","d"))
> d[] <- lapply(d,gsub,pattern = "#",replacement = "")
> d
x y
1 1 a
2 2 b
3 3
4 4 c
5 5 d
But as you'll note, this approach will never actually remove the offending column. It's just replacing the # values with empty character strings. To remove a column of all # you might do something like this:
d <- data.frame(x = as.character(1:5),
y = c("a","b","#","c","d"),
z = rep("#",5))
> d[,!sapply(d,function(x) all(x == "#"))]
x y
1 1 a
2 2 b
3 3 #
4 4 c
5 5 d
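On the original worry about an integer column changing type: gsub() always returns character, so applying it to every column converts the numeric ones too. One way around that, sketched here with made-up data, is to touch only the character columns.

```r
# Hypothetical data: one integer column and one character column.
d <- data.frame(n = 1:3, y = c("a", "#", "c"), stringsAsFactors = FALSE)

# Identify the character columns and apply gsub() to those only.
is_chr <- sapply(d, is.character)
d[is_chr] <- lapply(d[is_chr], gsub,
                    pattern = "#", replacement = "", fixed = TRUE)

print(class(d$n))  # the integer column is left untouched
print(d$y)
```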
Surely if you want to remove an offending column from a data frame, and you know which column it is, you can just subset. So, if it's the first column:
df <- df[,-1]
If it's a later column, use that column's index instead (for example, df[,-3] for the third column).
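A small sketch of both ways to drop a known column, by position and by name. The data frame and the column name `junk` are hypothetical; drop = FALSE keeps the result a data frame even if only one column remains.

```r
# Hypothetical data frame whose second column is the unwanted one.
df <- data.frame(x = 1:3, junk = rep("#", 3), y = c("a", "b", "c"))

# Drop by position: a negative index removes that column.
df_by_pos <- df[, -2, drop = FALSE]

# Drop by name: keep every column except "junk".
df_by_name <- df[, setdiff(names(df), "junk"), drop = FALSE]

print(names(df_by_pos))
print(names(df_by_name))
```

Dropping by name tends to be safer when the column order might change.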