I am new to R. I am trying to use the "write.csv" command to write a csv file in R. Unfortunately, when I do this, the resulting data frame produces colnames with a prefix X in it eventhough the file already has a column name.
It produces, X_name1 ,X_name2
Please kindly tell me your suggestions
I have added an example code similar to my data.
a<- c("1","2")
b <- c("3","4")
df <- rbind(a,b)
df <- as.data.frame(df)
names(df) <- c("11_a","12_b")
write.csv(df,"mydf.csv")
a <- read.csv("mydf.csv")
a
#Result
X X11_a X12_b
1 a 1 2
2 b 3 4
All I need is to have only "11_a" and "12_b" as column names. But it incudes prefix X also.
Use check.names=FALSE when reading your data back in - names starting with numbers are not generally acceptable in R:
read.csv(text="11_a,12_b
a,1,2
b,3,4", check.names=FALSE)
# 11_a 12_b
#a 1 2
#b 3 4
read.csv(text="11_a,12_b
a,1,2
b,3,4", check.names=TRUE)
# X11_a X12_b
#a 1 2
#b 3 4
All you have to do is add header=TRUE to your code when you read in the .csv file. It would look like:
a <- read.csv("mydf.csv", header=TRUE)
Related
I have a dataframe that looks like this, and would like to convert the left-most column into an actual index, with the label "id".
I tried to rename it with colnames<- but it didn't work, as ncol() only returns 2.
I tried to export it as a csv file, but index was similarly not captured in the file.
There are a couple ways to do this. Firstly, the method you used to write to csv you can also include the row.names = TRUE parameter in the write.csv function. Otherwise, the following commands also work:
library(tibble)
df <- tibble::rownames_to_column(df, "VALUE")
(taken from Convert row names into first column)
Example data
d <- data.frame(layer=2:4, gridpop=c(1000,2000,3000))
rownames(d) <- c("A", "B", "C")
d
# layer gridpop
#A 2 1000
#B 3 2000
#C 4 3000
Solution
d$id <- rownames(d)
d
# layer gridpop id
#A 2 1000 A
#B 3 2000 B
#C 4 3000 C
Or if the row names represent numbers
d$id <- as.numeric(rownames(d))
I have this problem but in r:
How can I read a double-semicolon-separated .csv with quoted values using pandas?
The solution there is to drop the additional columns generated. I'd like to know if there's a way to read the file separated by ;; without generating those addiotional columns.
Thanks!
Read it in normally using read.csv2 (or whichever variant you prefer, including read.table, read.delim, readr::read_csv2, data.table::fread, etc), and then remove the even-numbered columns.
dat <- read.csv2(text = "a;;b;;c;;d\n1;;2;;3;;4")
dat
# a X b X.1 c X.2 d
# 1 1 NA 2 NA 3 NA 4
dat[,-seq(2, ncol(dat), by = 2)]
# a b c d
# 1 1 2 3 4
It is usually recommended to properly clean your data before attempting to parse it, instead of cleaning it WHILE parsing, or worse, AFTER. Either use Notepad++ to Replace all ;; occurences or R itself, but do not delete the original files (also a rule of thumb - never delete sources of data).
my.text <- readLines('d:/tmp/readdelim-r.csv')
cleaned <- gsub(';;', ';', my.text)
writeLines(cleaned, 'd:/tmp/cleaned.csv')
my.cleaned <- read.delim('d:/tmp/cleaned.csv', header=FALSE, sep=';')
I am trying to train a data that's converted from a document term matrix to a dataframe. There are separate fields for the positive and negative comments, so I wanted to add a string to the column names to serve as a "tag", to differentiate the same word coming from the different fields - for example, the word hello can appear both in the positive and negative comment fields (and thus, represented as a column in my dataframe), so in my model, I want to differentiate these by making the column names positive_hello and negative_hello.
I am looking for a way to rename columns in such a way that a specific string will be appended to all columns in the dataframe. Say, for mtcars, I want to rename all of the columns to have "_sample" at the end, so that the column names would become mpg_sample, cyl_sample, disp_sample and so on, which were originally mpg, cyl, and disp.
I'm considering using sapplyor lapply, but I haven't had any progress on it. Any help would be greatly appreciated.
Use colnames and paste0 functions:
df = data.frame(x = 1:2, y = 2:1)
colnames(df)
[1] "x" "y"
colnames(df) <- paste0('tag_', colnames(df))
colnames(df)
[1] "tag_x" "tag_y"
If you want to prefix each item in a column with a string, you can use paste():
# Generate sample data
df <- data.frame(good=letters, bad=LETTERS)
# Use the paste() function to append the same word to each item in a column
df$good2 <- paste('positive', df$good, sep='_')
df$bad2 <- paste('negative', df$bad, sep='_')
# Look at the results
head(df)
good bad good2 bad2
1 a A positive_a negative_A
2 b B positive_b negative_B
3 c C positive_c negative_C
4 d D positive_d negative_D
5 e E positive_e negative_E
6 f F positive_f negative_F
Edit:
Looks like I misunderstood the question. But you can rename columns in a similar way:
colnames(df) <- paste(colnames(df), 'sample', sep='_')
colnames(df)
[1] "good_sample" "bad_sample" "good2_sample" "bad2_sample"
Or to rename one specific column (column one, in this case):
colnames(df)[1] <- paste('prefix', colnames(df)[1], sep='_')
colnames(df)
[1] "prefix_good_sample" "bad_sample" "good2_sample" "bad2_sample"
You can use setnames from the data.table package, it doesn't create any copy of your data.
library(data.table)
df <- data.frame(a=c(1,2),b=c(3,4))
# a b
# 1 1 3
# 2 2 4
setnames(df,paste0(names(df),"_tag"))
print(df)
# a_tag b_tag
# 1 1 3
# 2 2 4
My goal is to be able to allocate column names to a data frame that I create based on a passed variable. For instance:
i='column1'
data.frame(i=1)
i
1 1
Above the column name is 'i' when I want it to be 'column1'. I know the following works but isn't as efficient as I'd like:
i='column1'
df<-data.frame(x=1)
setnames(df,i)
column1
1 1
It's good to learn how base R works this way:
i <- 'cloumn1'
df <- `names<-`(data.frame(1), i)
df
# cloumn1
#1 1
Aside from the answers posted by other users, I think you may be stuck with the solution you've already presented. If you already have a data frame with the intended number of rows, you can add a new column using brackets:
df <- data.frame('column1'=1)
i <- 'column2'
df[[i]] <- 2
df
column1 column2
1 2
If the idea is to get rid of the setNames, you would probably never do this but
i <- 'column1'
data.frame(`attr<-`(list(1), "names", i))
# column1
# 1 1
You can see in data.frame, it has the code
x <- list(...)
vnames <- names(x)
so, you can mess with the name attribute.
Not exactly sure how you want it more efficient but you could add all the column names at once after your data frame has been assembled with colnames. Here's an example based on yours.
data.frame(Td)
a b
1 1 4
2 1 5
nam<-c("Test1","Test2")
colnames(Td)<-nam
data.frame(Td)
Test1 Test2
1 1 4
2 1 5
You could simply pass the name of your column variable and its values as arguments to a dataframe, without adding more lines:
df <- data.frame(column1=1)
df
# column1
#1 1
I have, for example, this three datasets (in my case, they are many more and with a lot of variables):
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame2 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
on each data frame I want to add a variable resulting from a transformation of an existing variable on that data frame. I would to do this by a loop. For example:
datasets <- c("data_frame1","data_frame2","data_frame3")
vars <- c("a","b","c")
for (i in datasets){
for (j in vars){
# here I need a code that create a new variable with transformed values
# I thought this would work, but it didn't...
get(i)$new_var <- log(get(i)[,j])
}
}
Do you have some valid suggestions about that?
Moreover, it would be great for me if it were possible also to assign the new column names (in this case new_var) by a character string, so I could create the new variables by another for loop nested in the other two.
Hope I've not been too tangled in explain my problem.
Thanks in advance.
You can put your dataframes in a list and use lapply to process them one by one. So no need to use a loop in this case.
For example you can do this :
data_frame1 <- data.frame(a=c(1,5,3,3,2), b=c(3,6,1,5,5), c=c(4,4,1,9,2))
data_frame2 <- data.frame(a=c(6,0,9,1,2), b=c(2,7,2,2,1), c=c(8,4,1,9,2))
data_frame3 <- data.frame(a=c(0,0,1,5,1), b=c(4,1,9,2,3), c=c(2,9,7,1,1))
ll <- list(data_frame1,data_frame2,data_frame3)
lapply(ll,function(df){
df$log_a <- log(df$a) ## new column with the log a
df$tans_col <- df$a+df$b+df$c ## new column with sums of some columns or any other
## transformation
### .....
df
})
the dataframe1 becomes :
[[1]]
a b c log_a tans_col
1 1 3 4 0.0000000 8
2 5 6 4 1.6094379 15
3 3 1 1 1.0986123 5
4 3 5 9 1.0986123 17
5 2 5 2 0.6931472 9
I had the same need and wanted to change also the columns in my actual list of dataframes.
I found a great method here (the purrr::map2 method in the question works for dataframes with different columns), followed by
list2env(list_of_dataframes ,.GlobalEnv)