After some manipulation of a data frame, I got a result data frame, but its row index is not sequential, as shown below.
MsgType/Cxr NoOfMsgs AvgElpsdTime(ms)
161 AM 86 30.13
171 CM 1 104
18 CO 27 1244.81
19 US 23 1369.61
20 VK 2 245
21 VS 11 1273.82
112 fqa 78 1752.22
24 SN 78 1752.22
I would like to get the result shown below.
MsgType/Cxr NoOfMsgs AvgElpsdTime(ms)
1 AM 86 30.13
2 CM 1 104
3 CO 27 1244.81
4 US 23 1369.61
5 VK 2 245
6 VS 11 1273.82
7 fqa 78 1752.22
8 SN 78 1752.22
How can I get this?
These are the rownames of your dataframe, which by default are 1:nrow(dfr). When you reordered the dataframe, the original rownames are also reordered. To have the rows of the new order listed sequentially, just use:
rownames(dfr) <- 1:nrow(dfr)
Or, simply
rownames(df) <- NULL
gives what you want.
> d <- data.frame(x = LETTERS[1:5], y = letters[1:5])[sample(5, 5), ]
> d
x y
5 E e
4 D d
3 C c
2 B b
1 A a
> rownames(d) <- NULL
> d
x y
1 E e
2 D d
3 C c
4 B b
5 A a
The index is actually the data frame row names. To change them, you can do something like:
rownames(dd) = 1:dim(dd)[1]
or
rownames(dd) = 1:nrow(dd)
Personally, I never use rownames.
In your example, I suspect that you don't need to worry about them either, since you are just renaming them 1 to n. In particular, when you subset your data frame the rownames will again be incorrect. For example,
##Simple data frame
R> dd = data.frame(a = rnorm(6))
R> dd$type = c("A", "B")
R> rownames(dd) = 1:nrow(dd)
R> dd
a type
1 2.1434 A
2 -1.1067 B
3 0.7451 A
4 -0.1711 B
5 1.4348 A
6 -1.3777 B
##Basic subsetting
R> dd_sub = dd[dd$type=="A",]
##Rownames are "wrong"
R> dd_sub
a type
1 2.1434 A
3 0.7451 A
5 1.4348 A
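If sequential row names do matter after subsetting, they can be reset the same way as before. A small sketch, with made-up values standing in for the rnorm() draws (before is just an illustrative name):

```r
# Made-up values in place of rnorm(6), so the example is reproducible
dd <- data.frame(a = c(2.1, -1.1, 0.7, -0.2, 1.4, -1.4),
                 type = rep(c("A", "B"), 3))

# Subsetting keeps the original row names of the selected rows
dd_sub <- dd[dd$type == "A", ]
before <- rownames(dd_sub)

# Resetting to NULL restores the default sequential row names
rownames(dd_sub) <- NULL
rownames(dd_sub)
```

This has to be repeated after every subset, which is one reason to ignore row names entirely.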
I have a list:
l1<-list(A=1:10, B=100:120, C=300:310, D=400:430)
How do I convert it to dataframe with 2 columns:
C1 C2
R1 1 A
R2 2 A
...
R10 10 A
R11 100 B
R12 101 B
....
R72 429 D
R73 430 D
I tried:
df1 <- data.frame(matrix(unlist(l1), nrow=length(l1), byrow=T))
But I'm getting an error because the vectors in my list have different lengths. Also, my actual list consists of Dates and not just integers.
Just use stack:
stack(l1)
> head(stack(l1))
values ind
1 1 A
2 2 A
3 3 A
4 4 A
5 5 A
6 6 A
> tail(stack(l1))
values ind
68 425 D
69 426 D
70 427 D
71 428 D
72 429 D
73 430 D
Update
stack won't work with dates. If you have actual date objects, you can do:
data.frame(ind = rep(names(l1), lengths(l1)),
val = as.Date(unlist(l1), origin = "1970-01-01"))
or
data.frame(ind = rep(names(l1), lengths(l1)), val = do.call(c, l1))
Sample data:
l1<-list(A=Sys.Date()+(1:10),
B=Sys.Date()+(100:120),
C=Sys.Date()+(300:310),
D=Sys.Date()+(400:430))
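To see why the Date case needs special handling, here is a small sketch with fixed dates instead of Sys.Date() (the names flat, dates and out are only illustrative). unlist() drops the Date class, while do.call(c, ...) keeps it:

```r
# Fixed dates so the result does not depend on the day the code is run
l1 <- list(A = as.Date("2020-01-01") + 0:1,
           B = as.Date("2020-02-01") + 0:2)

flat  <- unlist(l1)       # plain numbers: the Date class is lost
dates <- do.call(c, l1)   # c() on Date objects keeps the Date class

out <- data.frame(ind = rep(names(l1), lengths(l1)),
                  val = unname(dates))
class(flat)
class(out$val)
```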
Here's one method, similar to @Duck's answer, using Map and do.call:
tmp <- Map(data.frame,N = l1,L = names(l1))
out <- do.call(rbind,tmp)
rownames(out) <- NULL
> tail(out)
N L
68 425 D
69 426 D
70 427 D
71 428 D
72 429 D
73 430 D
This may be a long solution, but using mapply() and do.call() you can reach the expected result. First, extract the names of the list as well as the number of elements in each component. Then, using mapply(), create a list for the first column of the desired result. After that, combine mapply(), do.call(), rbind() and cbind() to end up with df. Here is the code:
#Code
#names
v1 <- names(l1)
#length
v2 <- unlist(lapply(l1, length))
#Create values
l2 <- mapply(function(x,y) rep(x,y),v1,v2)
#Bind
df <- as.data.frame(do.call(rbind,mapply(cbind,l2,l1)))
#Convert via character: as.numeric() on a factor returns level codes, not values
df$V2 <- as.numeric(as.character(df$V2))
Output (some rows):
head(df,15)
V1 V2
1 A 1
2 A 2
3 A 3
4 A 4
5 A 5
6 A 6
7 A 7
8 A 8
9 A 9
10 A 10
11 B 100
12 B 101
13 B 102
14 B 103
15 B 104
In R, I have a data frame which includes an ID column. I need to find all the rows that have the same ID but differ in the X1 variable.
For example,
d
ID X1 X2
a 19 F
b 19 F
c 16 T
a 16 T
a 19 T
d 17 T
b 15 F
b 19 F
c 17 T
c 17 T
d 17 T
e 15 T
f 14 T
g 16 T
The result will be:
df1
ID X1 X2
a 19 F
b 19 F
c 16 T
a 16 T
b 15 F
c 17 T
t <- table(d$X1, d$ID)
t[t>1] <- 1
t <- apply(t,2,sum)
t <- t[t>1]
d1 <- data.frame(ID = names(t))
d1 <- merge(d1, d, by = "ID", all.x=T,all.y=F)
d1 <- unique(d1[,1:2])
d1
ID X1
1 a 19
2 a 16
4 b 15
5 b 19
7 c 16
8 c 17
We can include the 3rd column as well, but you'd need to give some logic to pick which value of it to retain. For instance, there were 2 values of a where X1 was 19, one with X2 T and one where it was F. To choose between the 2 you could keep the first matching row for X2, the last, or choose T above F, etc.
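As a minimal sketch of the "keep the first matching row" option, assuming the question's data is in d with X2 stored as logical (n_x1, keep and res are made-up names for illustration):

```r
d <- data.frame(
  ID = c("a", "b", "c", "a", "a", "d", "b", "b", "c", "c", "d", "e", "f", "g"),
  X1 = c(19, 19, 16, 16, 19, 17, 15, 19, 17, 17, 17, 15, 14, 16),
  X2 = c(FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE,
         FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE)
)

# Number of distinct X1 values within each ID, aligned with the rows of d
n_x1 <- ave(d$X1, d$ID, FUN = function(x) length(unique(x)))

# Keep IDs with more than one distinct X1, and only the first row
# of each (ID, X1) pair, so X2 comes from that first row
keep <- n_x1 > 1 & !duplicated(d[, c("ID", "X1")])
res <- d[keep, ]
res
```

Passing fromLast = TRUE to duplicated() would keep the last row of each pair instead.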
First drop the last occurrence of each ID, which removes every ID that appears only once. Then count the IDs that are left and drop any that now appear only once (here df1 holds the original data):
newdf <- df1[duplicated(df1$ID, fromLast=TRUE),]
tbl <- table(newdf$ID)
newdf[!newdf$ID %in% names(tbl[tbl < 2]),]
# ID X1 X2
# 1 a 19 FALSE
# 2 b 19 FALSE
# 3 c 16 TRUE
# 4 a 16 TRUE
# 7 b 15 FALSE
# 9 c 17 TRUE
Does this work?
df1[rownames(unique(df1[,c("ID","X1")])),]
I have a table called data:
A 22
B 333
C Not Av.
D Not Av.
How can I get a subset, from which all rows containing "Not Av." are excluded? It is important to mention that I have the index of a column to be checked (in this case colnum = 2), but I don't have its name.
I tried this, but it does not work:
data<-subset(data,colnum!="Not Available")
df <- read.csv(text="A,22
B,333
C,Not Av.
D,Not Av.", header=F)
df[df[,2] != "Not Av.",]
You don't really need the subset function. Just use [:
> set.seed(42)
> DF <- data.frame(x = LETTERS[1:10],
y = sample(c(1, 2, 3, "Not Av."), 10, replace = TRUE))
> DF
x y
1 A Not Av.
2 B Not Av.
3 C 2
4 D Not Av.
5 E 3
6 F 3
7 G 3
8 H 1
9 I 3
10 J 3
> DF[DF[2] != "Not Av.",]
x y
3 C 2
5 E 3
6 F 3
7 G 3
8 H 1
9 I 3
10 J 3
In case you still want to use the subset function:
df<-subset(df,!grepl("Not Av",df[,2]))
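It may help to see why the attempt in the question keeps every row. Inside subset(), colnum is just the number 2, so the condition compares 2 with a string and yields a single TRUE. A sketch under the assumption that the table has default column names V1 and V2 (all_rows and kept are illustrative names):

```r
data <- data.frame(V1 = c("A", "B", "C", "D"),
                   V2 = c("22", "333", "Not Av.", "Not Av."),
                   stringsAsFactors = FALSE)
colnum <- 2

# colnum is the number 2 here, not the column, so the test is TRUE for all rows
all_rows <- subset(data, colnum != "Not Available")

# [[colnum]] extracts the column by position; the string must match the data exactly
kept <- data[data[[colnum]] != "Not Av.", , drop = FALSE]
kept
```

The same condition also works inside subset(): subset(data, data[[colnum]] != "Not Av.").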
I've been trying to use prop.table() to get the proportions of data I have but keep getting errors. My data is..
Letter Total
a 10
b 34
c 8
d 21
. .
. .
. .
z 2
I want a third column that gives the proportion of each letter.
My original data is in a data frame so I've tried converting to a data table and then using prop.table ..
testtable = table(lettersdf)
prop.table(testtable)
When I try this I keep getting the error,
Error in margin.table(x, margin) : 'x' is not an array
Any help or advice is appreciated.
:)
If the Letter column in your data does not have duplicate values, like this
Df <- data.frame(
Letter=letters,
Total=sample(1:50,26),
stringsAsFactors=F)
you can just do this instead of using prop.table:
Df$Prop <- Df$Total/sum(Df$Total)
> head(Df)
Letter Total Prop
1 a 45 0.074875208
2 b 1 0.001663894
3 c 13 0.021630616
4 d 15 0.024958403
5 e 24 0.039933444
6 f 39 0.064891847
> sum(Df[,3])
[1] 1
If there are duplicated values, like in this object
Df2 <- data.frame(
Letter=sample(letters,50,replace=T),
Total=sample(1:50,50),
stringsAsFactors=F)
you can make a table to sum the frequency of unique Letters,
Table <- table(rep(Df2$Letter,Df2$Total))
> Table
a b c d e f h j k l m n o p q t v w x y z
48 16 99 2 40 75 45 42 66 6 62 27 88 99 32 96 85 64 53 161 69
and then use prop.table on this table object:
> prop.table(Table)
a b c d e f h j k l m
0.037647059 0.012549020 0.077647059 0.001568627 0.031372549 0.058823529 0.035294118 0.032941176 0.051764706 0.004705882 0.048627451
n o p q t v w x y z
0.021176471 0.069019608 0.077647059 0.025098039 0.075294118 0.066666667 0.050196078 0.041568627 0.126274510 0.054117647
You could also make this into a data.frame:
Df2.table <- cbind(
data.frame(Table,stringsAsFactors=F),
Prop=as.numeric(prop.table(Table)))
> head(Df2.table)
Var1 Freq Prop
1 a 48 0.037647059
2 b 16 0.012549020
3 c 99 0.077647059
4 d 2 0.001568627
5 e 40 0.031372549
6 f 75 0.058823529
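A possibly shorter route for the duplicated case is to let xtabs() sum Total per Letter and pass that table to prop.table(). This is only a sketch on a tiny made-up data frame (totals, props and out are illustrative names):

```r
# Small made-up data with a repeated letter ("a" appears twice)
lettersdf <- data.frame(Letter = c("a", "b", "c", "d", "a"),
                        Total = c(10, 34, 8, 21, 7))

# xtabs() with a left-hand side sums Total within each Letter
totals <- xtabs(Total ~ Letter, data = lettersdf)

# prop.table() divides each entry by the grand total
props <- prop.table(totals)

out <- data.frame(Letter = names(totals),
                  Total = as.vector(totals),
                  Prop = as.vector(props))
out
```

This avoids building the long rep() vector, which can get large when the totals are big.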
I have two dataframes with different dimensions,
df1 <- data.frame(names= sample(LETTERS[1:10]), duration=sample(0:100, 10))
>df1
names duration
1 J 97
2 G 57
3 H 53
4 A 23
5 E 100
6 D 90
7 C 73
8 F 60
9 B 37
10 I 67
df2 <- data.frame(names= LETTERS[1:5], names_new=letters[1:5])
> df2
names names_new
1 A a
2 B b
3 C c
4 D d
5 E e
I want to replace in df1 the values that match df1$names and df2$names but using the df2$names_new. My desired output would be:
> df1
names duration
1 J 97
2 G 57
3 H 53
4 a 23
5 e 100
6 d 90
7 c 73
8 F 60
9 b 37
10 I 67
This is the code I'm using, but I wonder if there is a cleaner way to do it without so many steps:
df2[,1] <- as.character(df2[,1])
df2[,2] <- as.character(df2[,2])
df1[,1] <- as.character(df1[,1])
match(df1[,1], df2[,1]) -> id
which(!is.na(id)==TRUE) -> idx
id[!is.na(id)] -> id
df1[idx,1] <- df2[id,2]
Many thanks
Here's an approach from qdapTools:
library(qdapTools)
df1$names <- df1$names %lc+% df2
The %lc+% operator is a binary-operator version of lookup. The left side holds the terms and the right side is the lookup table. The + means that any terms without a match revert to their original value. This is a wrapper for the data.table package and is pretty speedy.
Here is the output including set.seed(1) for reproducibility:
set.seed(1)
df1 <- data.frame(names= sample(LETTERS[1:10]), duration=sample(0:100, 10),stringsAsFactors=F)
df2 <- data.frame(names= LETTERS[1:5], names_new=letters[1:5],stringsAsFactors=F)
library(qdapTools)
df1$names <- df1$names %lc+% df2
df1
## names duration
## 1 c 20
## 2 d 17
## 3 e 68
## 4 G 37
## 5 b 74
## 6 H 47
## 7 I 98
## 8 F 93
## 9 J 35
## 10 a 71
Are all names in df2 also in df1? And do you intend to keep them as factors? If so, you might find this solution helpful.
idx <- match(levels(df2$names), levels(df1$names))
levels(df1$names)[idx] <- levels(df2$names_new)
This works but requires that names and names_new are character and not factor.
set.seed(1)
df1 <- data.frame(names= sample(LETTERS[1:10]), duration=sample(0:100, 10),stringsAsFactors=F)
df2 <- data.frame(names= LETTERS[1:5], names_new=letters[1:5],stringsAsFactors=F)
rownames(df1) <- df1$names
df1[df2$names,]$names <- df2$names_new
Another option using merge:
transform(merge(df1,df2,all.x=TRUE),
names=ifelse(is.na(names_new),as.character(names),
as.character(names_new)))
Another way using match would be (if df1$names and df2$names are characters, of course):
df1[match(df2$names, df1$names), "names"] <- df2$names_new
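The one-liner above assumes every name in df2 also occurs in df1. A sketch of a variant that looks things up in the other direction, so it does not depend on that assumption (it uses the df1 values printed in the question and character columns; idx is an illustrative name):

```r
df1 <- data.frame(names = c("J", "G", "H", "A", "E", "D", "C", "F", "B", "I"),
                  duration = c(97, 57, 53, 23, 100, 90, 73, 60, 37, 67),
                  stringsAsFactors = FALSE)
df2 <- data.frame(names = LETTERS[1:5], names_new = letters[1:5],
                  stringsAsFactors = FALSE)

# Position of each df1 name in the lookup table, NA where there is no match
idx <- match(df1$names, df2$names)

# Use the new name where a match exists, otherwise keep the original
df1$names <- ifelse(is.na(idx), df1$names, df2$names_new[idx])
df1
```

Row order and the duration column are left untouched, unlike the merge-based approach, which reorders the rows.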