I have a dataframe that looks like this, and would like to convert the left-most column into an actual index, with the label "id".
I tried to rename it with colnames<- but it didn't work, as ncol() only returns 2.
I tried to export it as a csv file, but the index was similarly not captured in the file.
There are a couple of ways to do this. First, when you write to csv, you can pass row.names = TRUE to write.csv so that the row names are written out as the first column. Otherwise, the following commands also work:
library(tibble)
df <- tibble::rownames_to_column(df, "VALUE")
(taken from Convert row names into first column)
Example data
d <- data.frame(layer=2:4, gridpop=c(1000,2000,3000))
rownames(d) <- c("A", "B", "C")
d
# layer gridpop
#A 2 1000
#B 3 2000
#C 4 3000
Solution
d$id <- rownames(d)
d
# layer gridpop id
#A 2 1000 A
#B 3 2000 B
#C 4 3000 C
Or if the row names represent numbers
d$id <- as.numeric(rownames(d))
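If the id should be the first column rather than the last, and the row names should no longer be carried around, a base R sketch along these lines may help. It reuses the example data d from above; d_id and path are illustrative names, not part of the original answers.

```r
# Example data from the answer above
d <- data.frame(layer = 2:4, gridpop = c(1000, 2000, 3000),
                row.names = c("A", "B", "C"))

# Put the row names in front as a real column called "id"
d_id <- cbind(id = rownames(d), d)

# Drop the old row names so they are not duplicated
rownames(d_id) <- NULL

# Write to a temporary file; the id is now an ordinary column,
# so row.names = FALSE avoids writing a second, unnamed index column
path <- tempfile(fileext = ".csv")
write.csv(d_id, path, row.names = FALSE)

names(d_id)
```

After this, ncol(d_id) counts the id column as well, which is what colnames<- needs in order to rename it.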
Related
Just a quick question: how can I replace some values with others if these values are present in all the dataframe's column? Functions like mapvalues and recode work only if the column is specified, but in my case the dataframe has 89 columns so that would be time-consuming.
For the sake of clarity, take in consideration the following example. I want to replace [NULL] with another value.
Example:
a <- c("NULL",2,"NULL")
b <- c(3, "NULL", 1)
df <- data.frame(a, b)
df
a b
0 NULL 3
1 2 NULL
2 NULL 1
The difference between the example and my case is that the dataset is [35383 x 89], and the values I want to replace are more than one.
Thank you in advance for your time.
An extension to the comment by Ronak Shah. You can replace the NULLs with 0 if you want, or with any other value you prefer.
For example, replace the NULLs with mean of the respective columns:
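#Run a loop to convert every column to numeric, because in your case they are all characters.
#as.character() guards against factor columns, for which as.numeric() would return the integer codes.
#Non-numeric entries such as "NULL" become NA (R warns about the coercion).
for (i in colnames(df)){
df[,i] <- as.numeric(as.character(df[,i]))
}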
#Now replace the NAs with the mean of the column
for (i in colnames(df)){
df[,i][is.na(df[,i])] <- mean(df[,i], na.rm=TRUE)
}
You can do the same with the median. Let me know in the comments if you have any doubts.
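Since the question mentions more than one value to replace across 89 columns, the two loops above can also be written with lapply over all columns at once. This is only a sketch: missing_codes is a hypothetical vector standing in for whatever placeholder strings the real data uses.

```r
# Small stand-in for the real data; all columns are character
df <- data.frame(a = c("NULL", "2", "NULL"),
                 b = c("3", "NULL", "1"),
                 stringsAsFactors = FALSE)

# Hypothetical list of placeholder strings to treat as missing
missing_codes <- c("NULL")

# Step 1: turn the placeholders into NA explicitly, then convert to numeric.
# Doing it this way avoids the coercion warning from as.numeric("NULL").
df[] <- lapply(df, function(col) {
  col[col %in% missing_codes] <- NA
  as.numeric(col)
})

# Step 2: fill each column's NAs with that column's mean
df[] <- lapply(df, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

df
```

Assigning to df[] rather than df keeps the result a data frame with the original column names.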
For starters, I have added a few more rows to your example to better show how the code works
df
# a b
#1 NULL 3
#2 2 NULL
#3 NULL 1
#4 a 14
#5 1 a
#6 14 5
First, create two vectors: one with the values you want to replace (pattern) and one with the replacements in the same order. To make sure you have done it right, put them together in a data frame and take a look at the rows (this will also help in the next step).
In this case, I want NULL to be 0, "a" to be "alpha", and so on, as shown below
pattern <- c("NULL", "a", 14, 1)
replacement <- c(0, "alpha", "fourteen", "one")
subs <- data.frame(pattern, replacement)
subs
# pattern replacement
#1 NULL 0
#2 a alpha
#3 14 fourteen
#4 1 one
To finish, we write a for loop that, on each iteration, picks a pattern and its replacement from the subs data frame we created and runs map_df() with these values. This function iterates over the columns of our original data frame (df) and applies gsub() with the given pattern and replacement.
library(purrr) # provides map_df()
for (i in 1:nrow(subs)) {
df <- map_df(df, gsub, pattern = subs$pattern[i], replacement = subs$replacement[i])
}
df
# a b
#1 0 3
#2 2 0
#3 0 one
#4 alpha fourteen
#5 one alpha
#6 fourteen 5
I hope this was clear. Let me know if you have any doubts
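One caveat worth keeping in mind: gsub() matches a pattern anywhere inside a string, so a pattern such as "1" would also rewrite part of a value like "21". If only whole-cell matches should be replaced, a named lookup vector is one possible alternative. The sketch below uses base R only; lookup is an illustrative name and this is a different technique from the gsub() loop above.

```r
# The extended example data, stored as character columns
df <- data.frame(a = c("NULL", "2", "NULL", "a", "1", "14"),
                 b = c("3", "NULL", "1", "14", "a", "5"),
                 stringsAsFactors = FALSE)

# Names are the values to find, elements are their replacements
lookup <- c("NULL" = "0", "a" = "alpha", "14" = "fourteen", "1" = "one")

# Replace only cells whose entire value appears in names(lookup)
df[] <- lapply(df, function(col) {
  hit <- col %in% names(lookup)
  col[hit] <- unname(lookup[col[hit]])
  col
})

df
```

Because every cell is looked up once, the order of the entries in lookup does not matter, unlike with successive gsub() calls.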
I am trying to train a data that's converted from a document term matrix to a dataframe. There are separate fields for the positive and negative comments, so I wanted to add a string to the column names to serve as a "tag", to differentiate the same word coming from the different fields - for example, the word hello can appear both in the positive and negative comment fields (and thus, represented as a column in my dataframe), so in my model, I want to differentiate these by making the column names positive_hello and negative_hello.
I am looking for a way to rename columns in such a way that a specific string will be appended to all columns in the dataframe. Say, for mtcars, I want to rename all of the columns to have "_sample" at the end, so that the column names would become mpg_sample, cyl_sample, disp_sample and so on, which were originally mpg, cyl, and disp.
I'm considering using sapply or lapply, but I haven't made any progress with it. Any help would be greatly appreciated.
Use colnames and paste0 functions:
df = data.frame(x = 1:2, y = 2:1)
colnames(df)
[1] "x" "y"
colnames(df) <- paste0('tag_', colnames(df))
colnames(df)
[1] "tag_x" "tag_y"
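Applied to the mtcars example from the question, the same idea appends a suffix instead of a prefix. paste0() is vectorised over the column names, so no sapply or lapply is needed. The copy named tagged is only there to leave the built-in dataset untouched.

```r
# Work on a copy so the built-in mtcars stays as it is
tagged <- mtcars

# Append "_sample" to every column name in one step
names(tagged) <- paste0(names(tagged), "_sample")

head(names(tagged), 3)
# [1] "mpg_sample"  "cyl_sample"  "disp_sample"
```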
If you want to prefix each item in a column with a string, you can use paste():
# Generate sample data
df <- data.frame(good=letters, bad=LETTERS)
# Use the paste() function to append the same word to each item in a column
df$good2 <- paste('positive', df$good, sep='_')
df$bad2 <- paste('negative', df$bad, sep='_')
# Look at the results
head(df)
good bad good2 bad2
1 a A positive_a negative_A
2 b B positive_b negative_B
3 c C positive_c negative_C
4 d D positive_d negative_D
5 e E positive_e negative_E
6 f F positive_f negative_F
Edit:
Looks like I misunderstood the question. But you can rename columns in a similar way:
colnames(df) <- paste(colnames(df), 'sample', sep='_')
colnames(df)
[1] "good_sample" "bad_sample" "good2_sample" "bad2_sample"
Or to rename one specific column (column one, in this case):
colnames(df)[1] <- paste('prefix', colnames(df)[1], sep='_')
colnames(df)
[1] "prefix_good_sample" "bad_sample" "good2_sample" "bad2_sample"
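For the original use case, where the same word can occur in both the positive and the negative comment field, one way to proceed is to tag each term table separately before combining them. The two small data frames below are made-up stand-ins for the converted document term matrices.

```r
# Made-up stand-ins for the two term tables (one row per document)
pos <- data.frame(hello = c(1, 0), good = c(0, 2))
neg <- data.frame(hello = c(0, 1), bad = c(3, 0))

# Tag the columns of each table with its field of origin
names(pos) <- paste0("positive_", names(pos))
names(neg) <- paste0("negative_", names(neg))

# Combine side by side; the two "hello" columns no longer clash
combined <- cbind(pos, neg)

names(combined)
# [1] "positive_hello" "positive_good"  "negative_hello" "negative_bad"
```

This assumes both tables have the same documents in the same row order, since cbind() aligns rows purely by position.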
You can use setnames from the data.table package. It renames the columns by reference, so it doesn't create a copy of your data.
library(data.table)
df <- data.frame(a=c(1,2),b=c(3,4))
# a b
# 1 1 3
# 2 2 4
setnames(df,paste0(names(df),"_tag"))
print(df)
# a_tag b_tag
# 1 1 3
# 2 2 4
I am new to R. I am trying to use the "write.csv" command to write a csv file in R. Unfortunately, when I read the file back in, the resulting data frame has column names with a prefix X, even though the file already has column names.
It produces X_name1, X_name2.
Please share your suggestions.
I have added an example code similar to my data.
a<- c("1","2")
b <- c("3","4")
df <- rbind(a,b)
df <- as.data.frame(df)
names(df) <- c("11_a","12_b")
write.csv(df,"mydf.csv")
a <- read.csv("mydf.csv")
a
#Result
X X11_a X12_b
1 a 1 2
2 b 3 4
All I need is to have only "11_a" and "12_b" as column names, but the result includes the prefix X as well.
Use check.names=FALSE when reading your data back in - names starting with numbers are not generally acceptable in R:
read.csv(text="11_a,12_b
a,1,2
b,3,4", check.names=FALSE)
# 11_a 12_b
#a 1 2
#b 3 4
read.csv(text="11_a,12_b
a,1,2
b,3,4", check.names=TRUE)
# X11_a X12_b
#a 1 2
#b 3 4
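The output in the question also shows an extra first column named X. That column comes from the row names that write.csv stores by default. A full round trip that avoids both effects might look like the sketch below; the temporary file path is just for illustration.

```r
# Data frame with names that start with a digit
df <- data.frame("11_a" = c("1", "3"), "12_b" = c("2", "4"),
                 check.names = FALSE)

path <- tempfile(fileext = ".csv")

# row.names = FALSE: do not write the row names as an unnamed first column
write.csv(df, path, row.names = FALSE)

# check.names = FALSE: keep the header exactly as it is in the file
back <- read.csv(path, check.names = FALSE)

names(back)
# [1] "11_a" "12_b"
```

Columns with such names then have to be accessed with backticks or strings, for example back$`11_a` or back[["11_a"]].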
All you have to do is add header=TRUE to your code when you read in the .csv file. It would look like:
a <- read.csv("mydf.csv", header=TRUE)
I think I'm missing something super simple, but I seem to be unable to find a solution directly relating to what I need: I've got a data frame that has a letter as the row name and two columns of numerical values. As part of a loop I'm running, I create a new vector (from an index) that has both a letter and a number (e.g. "f2"), which I then need to be the name of a new row, and then add two numbers next to it (based on some other section of code, but I'm fine with that). What I get instead is the name of the vector/index as the row name, and I'm not sure if I'm missing an argument of rbind or something else to make it easy.
Example code:
#Data frame and vector creation
row.names <- letters[1:5]
vector.1 <- c(1:5)
vector.2 <- c(2:6)
vector.3 <- letters[6:10]
data.frame <- data.frame(vector.1,vector.2)
rownames(data.frame) <- row.names
data.frame
index.vector <- "f2"
#what I want the data frame to look like with the new row
data.frame <- rbind(data.frame, "f2" = c(6,11))
data.frame
#what the data frame looks like when I attempt to use a vector as a row name
data.frame <- rbind(data.frame, index.vector = c(6,11))
data.frame
#"why" I can't just type "f" every time
index.vector2 = paste(index.vector, "2", sep="")
data.frame <- rbind(data.frame, index.vector2 = c(6,11))
data.frame
In my loop the "index.vector" is a random sample, hence where I can't just write the letter/number in as a row name, so need to be able to create the row name from a vector or from the index of the sample.
The loop runs and a random number of new rows will be created, so I can't specify what number the row is that needs a new name - unless there's a way to just do it for the newest or bottom row every time.
Any help would be appreciated!
Not elegant, but works:
new_row <- data.frame(setNames(list(6, 11), colnames(data.frame)), row.names = paste(index.vector, "2", sep=""))
data.frame <- rbind(data.frame, new_row)
data.frame
# vector.1 vector.2
# a 1 2
# b 2 3
# c 3 4
# d 4 5
# e 5 6
# f22 6 11
I understood the problem but was not able to resolve the issue directly, so I am suggesting an alternative way to achieve the same result.
Alternative solution: collect your row labels as you bind the data inside your loop, and then assign the row names to your dataframe at the end.
#Data frame and vector creation
row.names <- letters[1:5]
vector.1 <- c(1:5)
vector.2 <- c(2:6)
vector.3 <- letters[6:10]
data.frame <- data.frame(vector.1,vector.2)
#loop starts
index.vector <- "f2"
data.frame <- rbind(data.frame,c(6,11))
row.names<-append(row.names,index.vector)
#loop ends
rownames(data.frame) <- row.names
data.frame
output:
vector.1 vector.2
a 1 2
b 2 3
c 3 4
d 4 5
e 5 6
f2 6 11
Hope this would be helpful.
If you manipulate the data frame with rbind, then the newest elements will always be at the "bottom" of your data frame. Hence you could also set a single row name by
rownames(data.frame)[nrow(data.frame)] = "new_name"
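The reason rbind(data.frame, index.vector = c(6,11)) does not work is that the argument name is taken literally and the variable is never evaluated. One alternative that may be simpler inside a loop is to assign to a row name that does not exist yet, which appends a new row under that name. The sketch below uses scores instead of data.frame as the object name to avoid masking the base function, and assumes the sampled letter is held in index.vector.

```r
# Same starting data as in the question
scores <- data.frame(vector.1 = 1:5, vector.2 = 2:6,
                     row.names = letters[1:5])

# In the real loop this would come from the random sample
index.vector <- "f"

# Build the row name from the variable's value
new_name <- paste0(index.vector, "2")

# Assigning to a row name that is not present yet adds a new row
scores[new_name, ] <- c(6, 11)

rownames(scores)
# [1] "a"  "b"  "c"  "d"  "e"  "f2"
```

If new_name already exists, the same statement overwrites that row instead of adding one, so duplicate labels need to be handled separately.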
I have a data frame df like this
1 2 3 4
A B C A
where the colnames are {1,2,3,4}. I would like to select one of the columns of the data frame according to an index that I set externally
colf <- as.numeric(mo)
fmo <- df[[colf]]
Many thanks,
First things first: I don't recommend having numbers as column names. That said, this should help you out.
> df <- data.frame("1"="A","2"="B","3"="C")
> df
X1 X2 X3
1 A B C
> df$X1 #Get column by name
[1] A
Levels: A
> df[,1] #Get first column
[1] A
Levels: A
>
Treat the data frame as a matrix and index it using [row,column] notation, i.e.
fmo = df[,colf]
This will always get column number colf.
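When the column names are themselves numbers, it matters whether the index is numeric (selects by position) or character (selects by name). The two only agree while the columns are in their original order, as this small sketch shows. Here mo is assumed to arrive as a string, as the as.numeric(mo) call in the question suggests.

```r
# check.names = FALSE keeps the names as "1", "2", ... instead of X1, X2, ...
df <- data.frame("1" = "A", "2" = "B", "3" = "C", "4" = "A",
                 check.names = FALSE, stringsAsFactors = FALSE)

mo <- "3"

by_position <- df[[as.numeric(mo)]]  # third column, whatever it is called
by_name     <- df[[mo]]              # the column named "3"

# After reordering the columns the two no longer agree
df2 <- df[, c("4", "3", "2", "1")]
df2[[as.numeric(mo)]]  # "B": the third column is now the one named "2"
df2[[mo]]              # "C": still the column named "3"
```

Selecting by name is usually the safer choice if the column order might change.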