Automatically creating and filling data frames in R

Here is the code that I am working with.
rnumbers <- data.frame(replicate(5,runif(20000, 0, 1)))
dt <- .001
A <- dt*1   # threshold for switching to state 1
B <- dt*.5  # threshold for switching to state 0
rstate <- rnumbers # copy the structure
rstate[] <- NA # preserve structure with NA's
# Init:
rstate[1, ] <- rnumbers[1, ] < .02 & rnumbers[1, ] > 0.01
step_generator <- function(col, rnum){
  for (i in 2:length(col)) {
    if (rnum[i] < B) {
      col[i] <- 0
    } else if (rnum[i] < A) {
      col[i] <- 1
    } else {
      col[i] <- col[i - 1]
    }
  }
  return(col)
}
# Run for each column index:
for (cl in 1:5) {
  rstate[, cl] <- step_generator(rstate[, cl], rnumbers[, cl])
}
rstate1 <- transform(rstate, time = dt) # dt is recycled to every row
rstate2 <- transform(rstate1, cumtime = cumsum(time))
This gives me a data frame with 5 columns that contain state switches over time. Time interval is in the 6th column (seconds) and cumulative time is in the 7th column (seconds). Now I want to see how long each state lasts in seconds. This is what I am doing -
1) lengths <- rle(rstate2[,1])
> lengths
Run Length Encoding
  lengths: int [1:15] 366 3278 1817 451 3033 1655 1901 748 742 1780 ...
  values : num [1:15] 0 1 0 1 0 1 0 1 0 1 ...
2) lengths1 <- data.frame(state = lengths$values, duration = lengths$lengths)
> lengths1
state duration
1 0 366
2 1 3278
3 0 1817
4 1 451
5 0 3033
6 1 1655
7 0 1901
8 1 748
9 0 742
10 1 1780
11 0 26
12 1 458
13 0 305
14 1 1039
15 0 2401
3) library("plyr")
lengths2 <- transform(lengths1, time = duration*dt)
lengths3 <- arrange(lengths2, desc(state))
> lengths3
state duration time
1 1 3278 3.278
2 1 451 0.451
3 1 1655 1.655
4 1 748 0.748
5 1 1780 1.780
6 1 458 0.458
7 1 1039 1.039
8 0 366 0.366
9 0 1817 1.817
10 0 3033 3.033
11 0 1901 1.901
12 0 742 0.742
13 0 26 0.026
14 0 305 0.305
15 0 2401 2.401
4) col1 <- ddply(lengths3, .(state), function(df) 1/mean(df$time))
> col1
state V1
1 0 0.7553583
2 1 0.7439685
So, col1 is showing me "1/mean(time in each state)" for column 1 of rstate2. What I would like to do is iterate steps 1-4 for every column in rstate2 and generate a data frame that looks like this:
> rates
state col1 col2 col3 col4 col5
1 0 0.1 0.2 0.3 0.4 0.5
2 1 0.3 0.4 0.5 0.6 0.7
where the numbers in each column are equal to 1/mean(df$time) for the corresponding column of rstate2.
Thank you for any and all help.

I'd do this using the development version of data.table (v 1.8.11) in this manner:
require(data.table) # 1.8.11
require(reshape2)
DT <- data.table(rstate2)
DT.m <- melt(DT, id=6, measure=1:5)
ans <- DT.m[, {
    dl <- data.table:::duplist(list(value))
    list(state = value[dl],
         time = c(diff(dl), .N - dl[length(dl)] + 1) * dt)
}, by = list(variable)]
ans <- ans[, 1/mean(time), by=list(variable, state)]
dcast.data.table(ans, state ~ variable)
state X1 X2 X3 X4 X5
1: 0 0.9875568 1.0777521 0.3227194 2.2371365 0.7237054
2: 1 1.0127608 0.4442799 0.2802691 0.2887169 1.0576415
Unfortunately, it's still building on R-Forge. So you can probably install 1.8.10 from CRAN instead, use reshape2's melt and dcast (which output a data.frame), convert the result back to a data.table, and do the grouping as follows:
require(data.table) # 1.8.10
require(reshape2)
DT.m <- data.table(melt(rstate2, id=6, measure=1:5))
ans <- DT.m[, {
    dl <- data.table:::duplist(list(value))
    list(state = value[dl],
         time = c(diff(dl), .N - dl[length(dl)] + 1) * dt)
}, by = list(variable)]
ans <- ans[, 1/mean(time), by=list(variable, state)]
dcast(ans, state ~ variable)
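For completeness, here is a minimal base-R sketch of the same computation, with no data.table required. It simply wraps the asker's steps 1-4 in a function and applies it to each state column; it assumes rstate2 and dt exist as defined in the question, and that every column visits both states:
state_rates <- function(col, dt) {
  r <- rle(col)           # step 1: run lengths per state
  dur <- r$lengths * dt   # steps 2-3: duration of each run in seconds
  # step 4: 1/mean(duration) per state, as a vector named "0" and "1"
  tapply(dur, r$values, function(t) 1 / mean(t))
}
rates <- data.frame(state = c(0, 1), sapply(rstate2[1:5], state_rates, dt = dt))
names(rates)[-1] <- paste0("col", 1:5)
rates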

Related

For loop with a function for a moving/rolling average?

Essentially (in R), I want to apply a moving-average function over a period of time (e.g. date and time variables) to see how a particular metric changes over time. However, the metric itself is a function. The scores can either be 1 (pro), 0 (neutral), or -1 (neg). The function for the metric is:
function(pro, neg, total) {
  x <- (pro / total) * 100
  y <- (neg / total) * 100
  x - y
}
So the percentage of 1's minus the percentage of -1's is the metric value.
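For concreteness, a quick worked instance of the metric, with the anonymous function above bound to a hypothetical name:
metric <- function(pro, neg, total) {
  x <- (pro / total) * 100
  y <- (neg / total) * 100
  x - y
}
metric(pro = 12, neg = 5, total = 20)  # (12/20)*100 - (5/20)*100 = 35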
Given timestamps for each recorded score, I want to evaluate the metric as a moving average across all rows. I assumed that a for loop would be the best way to apply this, but I am stuck on how to do it.
Does anyone have any thoughts / advice?
As mentioned in the comments, rollapply() from zoo is a good option. I took the liberty of generating some example data; apologies if it doesn't resemble yours.
library(zoo)
f <- function(x, l) {
  p <- sum(x == 1) / l
  n <- sum(x == -1) / l
  (p - n) * 100
}
# Or more efficiently: since the scores are -1, 0 or 1,
# sum(x) is already the count of 1's minus the count of -1's
f <- function(x, l = length(x)) {
  (sum(x) / l) * 100
}
set.seed(1)
N <- 25
dtf <- data.frame(time=as.Date(15000+(1:N)), score=sample(-1:1, N, rep=TRUE))
score <- read.zoo(dtf)
l <- 8
zts <- cbind(score, rolling=rollapply(score, l, f, l, fill=NA))
zts
# score rolling
# 2011-01-27 -1 NA
# 2011-01-28 0 NA
# 2011-01-29 0 NA
# 2011-01-30 1 12.5
# 2011-01-31 -1 25.0
# 2011-02-01 1 12.5
# 2011-02-02 1 0.0
# 2011-02-03 0 -25.0
# 2011-02-04 0 0.0
# 2011-02-05 -1 -12.5
# 2011-02-06 -1 -12.5
# 2011-02-07 -1 -12.5
# 2011-02-08 1 0.0
# 2011-02-09 0 25.0
# 2011-02-10 1 37.5
# 2011-02-11 0 62.5
# 2011-02-12 1 62.5
# 2011-02-13 1 50.0
# 2011-02-14 0 37.5
# 2011-02-15 1 25.0
# 2011-02-16 1 0.0
# 2011-02-17 -1 NA
# 2011-02-18 0 NA
# 2011-02-19 -1 NA
# 2011-02-20 -1 NA
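A note on the design choice: rollapply() centres the window by default, which is why the first and last few entries are NA. If a trailing (right-aligned) average is wanted instead, zoo's rollapplyr() is a drop-in variant; a sketch, reusing f, l and score from above:
zts_trailing <- cbind(score, rolling = rollapplyr(score, l, f, l, fill = NA))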

Binary representation of the breast cancer Wisconsin database

I want to produce a binary representation of the well-known breast cancer Wisconsin database.
The initial data set has 31 numerical variables, and one categorical variable.
id_number diagnosis radius_mean texture_mean perimeter_mean area_mean smoothness_mean compactness_mean concavity_mean concave_points_mean symmetry_mean
1 842302 M 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419
2 842517 M 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812
3 84300903 M 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069
4 84348301 M 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597
5 84358402 M 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809
I want to produce a binary representation of this data frame by:
1) transforming the diagnosis column (levels M, B) into two columns, diagnosis_M and diagnosis_B, and putting 1 or 0 in the relevant row depending on the value in the initial column (M or B);
2) taking the mean of each numerical column and splitting it into two columns depending on whether the values are greater or lower than that mean. E.g., the column radius_mean is split into radius_mean_great, in which we put 1 if the value > mean and 0 otherwise, and a column radius_mean_low, inversely.
library(mlbench)
library("RCurl")
library("curl")
UCI_data_URL <- getURL('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data')
names <- c('id_number', 'diagnosis', 'radius_mean', 'texture_mean', 'perimeter_mean', 'area_mean', 'smoothness_mean', 'compactness_mean', 'concavity_mean','concave_points_mean', 'symmetry_mean', 'fractal_dimension_mean', 'radius_se', 'texture_se', 'perimeter_se', 'area_se', 'smoothness_se', 'compactness_se', 'concavity_se', 'concave_points_se', 'symmetry_se', 'fractal_dimension_se', 'radius_worst', 'texture_worst', 'perimeter_worst', 'area_worst', 'smoothness_worst', 'compactness_worst', 'concavity_worst', 'concave_points_worst', 'symmetry_worst', 'fractal_dimension_worst')
breast.cancer.fr <- read.table(textConnection(UCI_data_URL), sep = ',', col.names = names)
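(As an aside, the RCurl/curl dependency isn't strictly needed for the read-in; a base-R sketch of the same step, noting that the file has no header row:)
breast.cancer.fr <- read.csv(url('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data'),
                             header = FALSE, col.names = names)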
Well, there are several ways to binarize the data; I found the following, which I hope serves:
df <- breast.cancer.fr[, 3:32]
df2 <- matrix(NA, ncol = 2 * ncol(df), nrow = nrow(df))
for (i in 1:ncol(df)) {
  df2[, 2*i - 1] <- as.numeric(df[, i] > mean(df[, i]))
  df2[, 2*i]     <- as.numeric(df[, i] <= mean(df[, i]))
}
colnames(df2) <- c(rbind(paste0(names(df), "_great"), paste0(names(df), "_low")))
library(dplyr)
df3 <- select(breast.cancer.fr, id_number, diagnosis) %>%
  mutate(diagnosis_M = as.numeric(diagnosis == "M")) %>%
  mutate(diagnosis_B = as.numeric(diagnosis == "B"))
df <- cbind(df3[, -2], df2)
df[1:10, 1:7]
id_number diagnosis_M diagnosis_B radius_mean_great radius_mean_low texture_mean_great texture_mean_low
1 842302 1 0 1 0 0 1
2 842517 1 0 1 0 0 1
3 84300903 1 0 1 0 1 0
4 84348301 1 0 0 1 1 0
5 84358402 1 0 1 0 0 1
6 843786 1 0 0 1 0 1
7 844359 1 0 1 0 1 0
8 84458202 1 0 0 1 1 0
9 844981 1 0 0 1 1 0
10 84501001 1 0 0 1 1 0
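A more compact sketch of the same idea, assuming breast.cancer.fr is loaded as above: one matrix comparison handles all the numeric columns at once, and model.matrix() one-hot encodes the factor.
num <- as.matrix(breast.cancer.fr[, 3:32])
great <- (num > rep(colMeans(num), each = nrow(num))) * 1  # 1 if value > column mean
low <- 1 - great                                           # the complement
colnames(great) <- paste0(colnames(num), "_great")
colnames(low) <- paste0(colnames(num), "_low")
d <- factor(breast.cancer.fr$diagnosis)
diag01 <- model.matrix(~ d - 1)                            # one 0/1 column per level
colnames(diag01) <- paste0("diagnosis_", levels(d))
res <- cbind(breast.cancer.fr["id_number"], diag01, great, low)
Unlike the loop above, this groups all the _great columns before the _low columns; reorder the columns afterwards if the interleaved layout matters.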

Calculate mean of a proportion of the data.frame

I'm working with data that looks similar to this:
cat value n
1 100 18
2 0 19
3 -100 15
4 100 13
5 0 17
6 -100 18
In the real data, there are many cats and value can be any number between -100 and 100 (no NA).
What I want to do is calculate the mean of value based on terciles defined by n.
So, for example, since sum(n) = 100, I want to get cumulative n's as close as possible to 33 and take the mean of value within each tercile. For the first tercile, 18 isn't quite 33, so I need to take 15 values from cat = 2, and the mean for the first tercile should be (100*18 + 0*15)/(18+15). The second tercile takes the remaining n's from cat = 2, then as many as are needed to get to 33: (0*4 + -100*15 + 100*13 + 0*1)/(4+15+13+1). Similarly for the last tercile.
I got started writing this, but ended up with lots of nasty for loops and if statements. I'm hoping that you see an easier way to deal with this than I do. Thanks in advance!
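(For reference, the worked arithmetic above evaluates to the values the answers below arrive at; a quick check in R:)
(100*18 + 0*15) / (18 + 15)                         # first tercile: 54.54545
(0*4 + -100*15 + 100*13 + 0*1) / (4 + 15 + 13 + 1)  # second tercile: -6.060606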
A solution with data.table:
setDT(df)[rep(1:.N,n)
][,indx:=c(rep("a",33),rep("b",33),rep("c",34))
][,.(mean_val_indx=mean(value)),by=indx]
this gives:
indx mean_val_indx
1: a 54.545455
2: b -6.060606
3: c -52.941176
Which are the means of value for the three parts of the data.
Broken down into the intermediate steps:
1: replicate the rows according to n
setDT(df)[rep(1:.N,n)]
this gives (shortened):
cat value n
1: 1 100 18
2: 1 100 18
....
17: 1 100 18
18: 1 100 18
19: 2 0 19
20: 2 0 19
....
36: 2 0 19
37: 2 0 19
38: 3 -100 15
....
99: 6 -100 18
100: 6 -100 18
2: create an index with [,indx:=c(rep("a",33),rep("b",33),rep("c",34))]
setDT(df)[rep(1:.N,n)
][,indx:=c(rep("a",33),rep("b",33),rep("c",34))]
this gives:
> dt
cat value n indx
1: 1 100 18 a
2: 1 100 18 a
....
17: 1 100 18 a
18: 1 100 18 a
19: 2 0 19 a
20: 2 0 19 a
....
32: 2 0 19 a
33: 2 0 19 a
34: 2 0 19 b
35: 2 0 19 b
....
99: 6 -100 18 c
100: 6 -100 18 c
3: summarise value by indx with [,.(mean_val_indx=mean(value)),by=indx]
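If sum(n) is not exactly 100, the hard-coded 33/33/34 split above breaks. A sketch of the same chain with the index generalized via cut() (right = FALSE matches the grouping used in the aggregate answer below):
setDT(df)[rep(1:.N, n)
          ][, indx := cut(seq_len(.N), 3, labels = c("a", "b", "c"), right = FALSE)
          ][, .(mean_val_indx = mean(value)), by = indx]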
You could try something like this, data being your example dataframe:
longData <- unlist(apply(data[, c("value", "n")], 1, function(x) {
  rep(x["value"], x["n"])
}))
aggregate(longData, list(cut(seq_along(longData), breaks = 3, right = FALSE)), mean)
longData will be a vector of length 100 with, using your example, 18 repetitions of 100, 19 repetitions of 0, etc.
The cut in the aggregate will divide longData into three groups, and the mean of each group will be calculated.
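The expansion itself can also be written without apply(), since rep() is vectorised over its times argument; a sketch:
longData <- rep(data$value, data$n)  # each value repeated n times, in row order
aggregate(longData, list(cut(seq_along(longData), breaks = 3, right = FALSE)), mean)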
If the data is already very long, expanding it by repeating each row n times may be unwanted. The following solution does not do this. Moreover, 1/3 of the sum of the n values is not rounded to the nearest integer.
i is the vector of row numbers where terciles end. Since it is possible that several terciles end at the same row, those row numbers are replicated; the result is the vector k.
For each index j, the cumulative sum of data$value * data$n up to k[j] covers ms[k[j]] terciles, so ms[k[j]] - j terciles have to be subtracted to get the cumulative sum up to the jth tercile.
m <- 3                          # number of groups (terciles)
sn <- sum(data$n)
ms <- m * cumsum(data$n) / sn   # terciles covered up to and including each row
d <- diff(c(0, floor(ms)))      # how many terciles end at each row
i <- which(d > 0)               # rows where at least one tercile ends
k <- rep(i, d[i])               # one entry per tercile
vn <- data$value * data$n
sums <- cumsum(vn)[k] - (ms[k] - (1:m)) * data$value[k] * sn / m
means <- m * diff(c(0, sums)) / sn
The means of the terciles are:
> means
[1] 54 -6 -54
In this example "i" is equal to "k". But if terciles are replaced by deciles,
i.e. "m" is not 3 but 10, they are distinct:
> m
[1] 10
> i
[1] 1 2 3 4 5 6
> k
[1] 1 2 2 3 3 4 5 5 6 6
> means
[1] 100 80 0 -30 -100 60 50 0 -80 -100
I compared the speed of the 4 answers, using the small example above:
> ##### "longData"-Answer #####
>
> system.time( for ( i in 1:1000 ) { A1 <- f1(data) } )
User System verstrichen
3.48 0.00 3.49
> ##### "sapply"-Answer #####
>
> system.time( for ( i in 1:1000 ) { A2 <- f2(data) } )
User System verstrichen
1.00 0.00 0.99
> ##### "data.table"Answer #####
>
> system.time( for ( i in 1:1000 ) { A3 <- f3(data) } )
User System verstrichen
4.73 0.00 4.79
> ##### this Answer #####
>
> system.time( for ( i in 1:1000 ) { A4 <- f4(data) } )
User System verstrichen
0.43 0.00 0.44
The "sapply"-Answer is even false:
> A1
Group.1 x
1 [0.901,34) 54.545455
2 [34,67) -6.060606
3 [67,100) -52.941176
> A2
(0,33] (33,67] (67,100]
-100.00000 0.00000 93.93939
> A3
indx mean_val_indx
1: a 54.545455
2: b -6.060606
3: c -52.941176
> A4
[1] 54 -6 -54
>
This is basically the same as NicE's, although perhaps useful as a different way of assembling the rep and cutting operations (note that the sort() call reorders the expanded values before grouping, which is why this result differs from the first two answers above):
sapply(split(sort(unlist(mapply(rep, res$value, res$n))),
             cut(seq(sum(res$n)), breaks = c(0, 33, 67, 100))),
       mean)
(0,33] (33,67] (67,100]
-100.00000 0.00000 93.93939

Selecting data chunks depending on a condition

I have a question about selecting data chunks depending on a condition I provide. It is a multi-step process, which I think should be done in a function so it can be applied to other data sets with lapply.
I have a data.frame which has 19 columns (the example data here has only two). I first want to check that the rows of the first column (time) are in the range 90 to 54000, and skip any chunk that is not in this range. Then count those chunks, and count how many of the mag columns show all-positive versus negative/positive values. If a chunk contains a negative number, count it as a switched state, and report the switching rate as (total number of chunks that show a switched state)/(total number of chunks whose time ranges within 90:54000).
For the data chunks which satisfy the range 90:54000, also check mag for the first observation where the number < 0, together with the corresponding time.
numbers <- c(seq(1, -1, length.out = 601), seq(1, 0.98, length.out = 601))
time <- c(seq(90, 54144, length.out = 601), seq(90, 49850, length.out = 601))
data <- data.frame(time = rep(time, times = 12), mag = rep(numbers, times = 6))
n <- 90:54000
dfchunk <- split(data, factor(sort(rank(row.names(data)) %% n)))
ext_fsw <- lapply(dfchunk, function(x) x[which(x$mag < 0)[1], ])
x.n <- data.frame(matrix(unlist(ext_fsw), nrow = n, byrow = TRUE))
Here is what the real dataset looks like:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1 90 0 0 0 0.0023 -0.0064 0.9987 0.0810 0.0375 0.9814 0.0829 0.0379 0.9803 0.0715 0.0270 0.9823
2 180 0 0 0 0.0023 -0.0064 0.9987 0.0887 -0.0281 0.9818 0.0956 -0.0288 0.9778 0.0796 -0.0469 0.9772
3 270 0 0 0 0.0023 -0.0064 0.9987 -0.0132 -0.0265 0.9776 0.0087 -0.0369 0.9797 0.0311 -0.0004 0.9827
4 360 0 0 0 0.0023 -0.0064 0.9987 0.0843 0.0369 0.9752 0.0765 0.0362 0.9749 0.0632 0.0486 0.9735
5 450 0 0 0 0.0023 -0.0064 0.9987 0.1075 -0.0660 0.9737 0.0914 -0.0748 0.9698 0.0586 -0.0361 0.9794
6 540 0 0 0 0.0023 -0.0064 0.9987 0.0006 0.0072 0.9808 -0.0162 -0.0152 0.9797 0.0369 0.0118 0.9763
Here are the expected outputs (just an example).
For part 1:
ss (switched state)  total countable chunks  switching probability
 5                   10                      5/10
For part 2:
time mag
27207 -0.03
26520 -0.98
32034 -0.67
.
.
.
.
etc
Okay, I think I have this figured out. I put the steps into two functions; for each function, you give a data frame and a column name, and it'll return the requested data.
library(dplyr)
thabescity <- function(data, col){
  filter_vec <- data[col] < 0
  new_df <- data %>%
    filter(filter_vec) %>%
    filter(90 <= time & time <= 54000) %>%
    group_by(time) %>%
    summarise()
  ss <- nrow(new_df)
  total <- length(unique(data$time))
  switching_probability <- ss / total
  output <- as.data.frame(cbind(ss, total, switching_probability))
  return(output)
}
print(thabescity(data, "mag"))
ss total switching_probability
1 298 1201 0.2481266
You can run it in a loop over all the columns and collect the results in a list:
data_names <- names(data)[2:length(names(data))]
first_problem <- list()
for (name in data_names) {
  first_problem[[name]] <- thabescity(data, name)
}
first_problem[["mag"]]
ss total switching_probability
1 298 1201 0.2481266
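The same loop can be written more compactly with lapply(); a sketch:
first_problem <- setNames(lapply(data_names, function(nm) thabescity(data, nm)),
                          data_names)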
The second problem is a bit easier:
thabescity2 <- function(data, col){
  data <- data[, c("time", col)]
  filter_vec <- data[col] < 0
  new_df <- data %>%
    filter(filter_vec) %>%
    filter(90 <= time & time <= 54000) %>%
    group_by(time) %>%
    filter(row_number() == 1)
  return(new_df)
}
print(thabescity2(data, "mag"))
Source: local data frame [298 x 2]
Groups: time
time mag
1 27207.09 -0.003333333
2 27297.18 -0.006666667
3 27387.27 -0.010000000
4 27477.36 -0.013333333
5 27567.45 -0.016666667
6 27657.54 -0.020000000
7 27747.63 -0.023333333
8 27837.72 -0.026666667
9 27927.81 -0.030000000
10 28017.90 -0.033333333
.. ... ...
You can do the same thing as above to go through the whole data frame:
data_names <- names(data)[2:length(names(data))]
second_problem <- list()
for (name in data_names) {
  second_problem[[name]] <- thabescity2(data, name)
}
second_problem[["mag"]]
Source: local data frame [298 x 2]
Groups: time
time mag
1 27207.09 -0.003333333
2 27297.18 -0.006666667
3 27387.27 -0.010000000
4 27477.36 -0.013333333
5 27567.45 -0.016666667
6 27657.54 -0.020000000
7 27747.63 -0.023333333
8 27837.72 -0.026666667
9 27927.81 -0.030000000
10 28017.90 -0.033333333
.. ... ...
Double check my results, but I think this does what you want.

Attaching a suffix to the name column in a table so that it may be read into R

I have a table that looks like this:
Gene U2803 U2823 U2840 U2841 U2862 U2872 U2897 U2982 U2991 U2994 U2998 U2999 U3001 U3007 U3012 U2980
A1BG-AS 7.3159 9.3802 10.77 8.701 13.6066 8.3253 9.0556 9.8801 9.0776 11.2029 7.61 10.8403 9.2378 12.1697 9.7482 5.5327
A1BG 7.4715 5.2955 10.2275 6.3606 10.1463 5.9968 6.2673 8.6119 6.153 6.7903 4.0843 13.0875 6.8167 8.3186 6.7643 5.14
A1CF 0 0 0 0 0.0026 0 0 0 0 0 0 0 0 0 0.0037 0
A2LD1 1.776 1.125 1.3508 1.2489 2.1252 2.1057 1.0177 1.6063 1.0053 0.9571 1.4972 1.3998 1.0935 2.4737 1.2063 1.7788
A2ML1 0.1024 0.092 0.0473 0.071 0.1227 0.2047 0.2481 0.1089 0.0499 0.1381 0.057 0.0953 0.0433 0.0651 0.0598 0.0434
A2M 5.4296 0.1688 2.4767 0.2507 0.5087 4.2835 2.2989 8.6027 3.1126 0.4565 0.167 2.9066 3.195 0.942 5.8904 6.7635
A4GALT 0.2918 11.5673 4.9554 0 1.6693 1.6301 0.4985 2.4444 0.6217 1.4638 3.2648 0.5773 3.1071 7.651 0.4068 5.133
A4GALT 0 0 0 0 0.0575 0.1018 0 0.0422 0 0 0 0.0257 0.0276 0 0 0.0288
AAA1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
AAA1 18.789 24.8681 29.8037 33.3986 37.8269 24.4719 21.1101 26.9985 21.9897 25
If you notice, two genes in column 1 have the same name (AAA1 and A4GALT). How can I add a suffix to these genes so that they are not rejected as duplicate names while reading this table into R?
A small example in R or awk will be of great help.
Thank you.
This awk adds a new number to each occurrence of the gene.
awk 'a[$1]{$1=$1"_"a[$1]}{a[$1]++}1' file
Hope it helps :)
That previous example was bugged; this one actually works as described:
awk 'a[$1]{a[$1]++}NF&&a[$1]{$1=$1"_"a[$1]}!a[$1]{a[$1]++}1' file
The reason the first one didn't increment was an unforeseen side effect of renaming $1: $1 was already changed when it reached the increment, so the new value was being incremented, not the original.
Anyway, it works now :)
P.S. If someone knows how to shorten this, let me know :)
This is pretty easy to do as a post-read-in step in R.
Imagine we have a file like "x" below.
x <- tempfile()
cat("A 1 2\nB 3 4\nC 5 6 13\nA 7 8\nB 9 10\nA 11 12\n", file=x)
You've tried to read it in like this, but ran into problems because of duplicated row.names:
read.table(file = x, row.names = 1, header = FALSE,
fill = TRUE, stringsAsFactors = FALSE) # Error
# Error in read.table(file = x, row.names = 1, header = FALSE) :
# duplicate 'row.names' are not allowed
Read it in with the row.names as a column first, and then work from there.
temp <- read.table(file=x, header = FALSE, fill = TRUE,
stringsAsFactors = FALSE)
temp
# V1 V2 V3 V4
# 1 A 1 2 NA
# 2 B 3 4 NA
# 3 C 5 6 13
# 4 A 7 8 NA
# 5 B 9 10 NA
# 6 A 11 12 NA
FYI, a matrix can have duplicated rownames (but I don't really suggest this):
temp1 <- as.matrix(temp[-1])
rownames(temp1) <- temp[, 1]
temp1
# V2 V3 V4
# A 1 2 NA
# B 3 4 NA
# C 5 6 13
# A 7 8 NA
# B 9 10 NA
# A 11 12 NA
Instead, look at one of the functions that can be used to create unique names, such as make.names or make.unique. The latter seems more appropriate for this scenario.
make.names(temp$V1, unique=TRUE)
# [1] "A" "B" "C" "A.1" "B.1" "A.2"
make.unique(temp$V1, sep="_")
# [1] "A" "B" "C" "A_1" "B_1" "A_2"
You could incorporate it as follows:
rownames(temp) <- make.unique(temp$V1, sep="_")
temp$V1 <- NULL
temp
# V2 V3 V4
# A 1 2 NA
# B 3 4 NA
# C 5 6 13
# A_1 7 8 NA
# B_1 9 10 NA
# A_2 11 12 NA
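Tying this back to the gene table in the question, a sketch (the file name genes.txt is hypothetical; adjust sep and fill to your actual file):
genes <- read.table("genes.txt", header = TRUE, fill = TRUE,
                    stringsAsFactors = FALSE)    # keep Gene as an ordinary column
genes$Gene <- make.unique(genes$Gene, sep = "_") # AAA1, AAA1_1, ...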
