I have a dataframe, for example like this:
df <- data.frame(ID=c(8, 2, 5, 1, 4), value=c("a", "b", "c", "d", "e"))
ID value
1 8 a
2 2 b
3 5 c
4 1 d
5 4 e
I know how to select rows with a given value in the "ID" column. But how do I get rows by their "ID" values in a specified order?
Example: How do I extract "value" for the rows with ID 4, 2 and 5, in that order? The result I want is "e", "b", "c".
Using %in% gives me the results in the wrong order, because the rows come back in the order they appear in the data frame:
df[df$ID %in% c(4, 2, 5), "value"]
[1] b c e
Levels: a b c d e
I found a workaround using rownames, but I feel like there must be a better solution to this.
# workaround
rownames(df) <- df$ID
df[as.character(c(4, 2, 5)), "value"]
[1] e b c
Levels: a b c d e
Any suggestions?
Thank you!
You can use merge and then order by a newly introduced rank column:
dat = merge(df,data.frame(ID=c(4,2,5),v=1:3))
dat[order(dat$v),"value"]
[1] e b c
Or as a one-liner:
with(merge(df,data.frame(ID=c(4,2,5),v=1:3)),value[order(v)])
sapply(c(4,2,5), function(x) df[df$ID==x,"value"])
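Another option, not mentioned in the answers above, is match(), which returns the position of each requested ID in df$ID and so preserves the requested order. This is a minimal sketch; the name wanted is only illustrative, and it assumes each requested ID occurs once in df (match() returns only the first hit, and NA for an ID that is absent).

```r
df <- data.frame(ID = c(8, 2, 5, 1, 4), value = c("a", "b", "c", "d", "e"))

# IDs in the order we want the result (illustrative name)
wanted <- c(4, 2, 5)

# match() gives the row position of each wanted ID, in the order of 'wanted'
pos <- match(wanted, df$ID)

# as.character() so the result is the same whether 'value' is a factor or not
res <- as.character(df$value[pos])
print(res)
```

Unlike the rownames workaround, this leaves the row names of df untouched.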
firstVector <- c("A", "B", "C", "D", "E")
secondVector <- c(1, 2, 3, 4, 5)
thirdVector <- c("a", "b", "c", "d", "e")
myDataFrame <- data.frame(firstVector, secondVector, thirdVector)
How do I extract rows 3 and 4 from my data frame? I want to print rows 3 and 4 so that the output looks like this:
firstVector secondVector thirdVector
3 C 3 c
4 D 4 d
You can subset your dataframe like this [rows,columns]:
myDataFrame[c(3,4),]
In your case you want rows 3 and 4, hence the vector c(3,4). You can add more row numbers to the vector to select more rows, for example c(1,2,3,12).
If you don't provide an argument for a dimension, the whole dimension is returned. In your example you subset the rows and return all the columns.
It's the same for columns:
myDataFrame[c(3,4),c(1,2)]
Here you subset rows 3 and 4 and columns 1 and 2.
Another way to do this is with the : operator:
c(1:4) means the numbers from 1 to 4, so 3:4 selects rows 3 and 4.
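As a small sketch of the two forms side by side, using the data frame from the question (the names rows_a and rows_b are only illustrative):

```r
firstVector <- c("A", "B", "C", "D", "E")
secondVector <- c(1, 2, 3, 4, 5)
thirdVector <- c("a", "b", "c", "d", "e")
myDataFrame <- data.frame(firstVector, secondVector, thirdVector)

# Explicit vector of row numbers, all columns
rows_a <- myDataFrame[c(3, 4), ]

# Same rows using the : operator (3:4 expands to 3, 4)
rows_b <- myDataFrame[3:4, ]

print(rows_a)
print(all.equal(rows_a, rows_b))
```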
Hope this helps
I would like to improve my piece of code. Let's say you want to remove duplicate rows that have the same 'label' and 'id'. The way I do it is:
library(data.table)
dt <- data.table(label = c("A", "A", "B", "B", "C", "A", "A", "A"),
id = c(1, 1, 2, 2, 3, 4, 5, 5))
tmp = dt[label == 'A',]
tmp = unique(tmp, by = 'id')
dt = dt[label != 'A',]
dt = rbind(dt, tmp)
Is there a smarter/shorter way to accomplish that? If possible by reference?
This code looks very ugly and implies a lot of copies.
(Moreover I have to do this operation for a few labels, but not all of them. So this implies 4 lines for every label...)
Thanks !
Example:
label id
A 1
A 1
B 2
B 2
C 3
A 4
A 5
A 5
Would give :
label id
A 1
B 2
B 2
C 3
A 4
A 5
Note that lines 3 and 4 stay duplicated, since their label is 'B' and not 'A'.
There is no need to create tmp and then rbind it again. You can simply use the duplicated function as follows:
dt[label != "A" | !duplicated(dt, by=c("label", "id"))]
# label id
# 1: A 1
# 2: B 2
# 3: B 2
# 4: C 3
# 5: A 4
# 6: A 5
If you want to do this over several labels:
dt[!label %in% c("A", "C") | !duplicated(dt, by=c("label", "id"))]
See ?duplicated to learn more about de-duplication functions in data.table.
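To see why the filter works, here is the same logic sketched in base R on a plain data frame. This is only an illustration of the condition, not a replacement for the data.table call; the names df2, dup and keep are assumptions made for the example.

```r
df2 <- data.frame(label = c("A", "A", "B", "B", "C", "A", "A", "A"),
                  id    = c(1, 1, 2, 2, 3, 4, 5, 5))

# TRUE for every row that repeats an earlier (label, id) pair
dup <- duplicated(df2[, c("label", "id")])

# Keep a row if it is not an 'A' row, or if it is the first of its pair
keep <- df2$label != "A" | !dup

res <- df2[keep, ]
print(res)
```

A row is dropped only when both conditions fail, that is, when it has label 'A' and is a repeat, which is why the duplicated 'B' rows survive.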
This could also be done with an if/else condition, grouping by id (this relies on each id belonging to a single label, as in the example):
dt[, if(all(label=='A')) .SD[1L] else .SD, by = id]
# id label
#1: 1 A
#2: 2 B
#3: 2 B
#4: 3 C
#5: 4 A
#6: 5 A
I have a dataframe with rows of repeating values for example:
id
A
A
A
B
B
C
C
D
D
What I would like to achieve is a line of code that retains only one row for each value listed in another vector, for example:
keeps <- c("A", "C")
The result should be this:
id
A
C
Try this:
df[df$id %in% c("A", "C") & !duplicated(df$id),,drop = FALSE]
# id
# 1 A
# 6 C
or this:
unique(df[df$id %in% c("A", "C"),,drop = FALSE])
# id
# 1 A
# 6 C
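Both answers hard-code c("A", "C"). A possible variant that uses the keeps vector from the question directly is sketched below with match(), which returns the first row position of each kept value. It assumes every value in keeps is present in df$id; a missing value would produce an NA row.

```r
df <- data.frame(id = c("A", "A", "A", "B", "B", "C", "C", "D", "D"))
keeps <- c("A", "C")

# First occurrence of each value in 'keeps', in the order of 'keeps'
first_pos <- match(keeps, df$id)

# drop = FALSE keeps the result a data frame even with a single column
res <- df[first_pos, , drop = FALSE]
print(res)
```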
I have three tables that I'm attempting to merge into one.
The main table is similar to:
Table1 <- data.frame("Data" = c(1, 2, 3, 4, 5), "Desc" = c("A", "A", "A", "B", "B"))
TableA <- data.frame("Values" = c(6, 2, 3))
TableB <- data.frame("Values" = c(2, 7))
I want to add another column to Table1 with the values from TableA and TableB. Values coming from TableA must be placed in rows containing "A" in the "Desc" column, and values from TableB in rows containing "B". The number of rows in TableA equals the number of rows in Table1 with "A", and the same holds for TableB.
The resulting Table should look like:
Table1 <- data.frame("Data" = c(1, 2, 3, 4, 5), "Desc" = c("A", "A", "A", "B", "B"), "Values" = c(6, 2, 3, 2, 7))
> Table1
Data Desc Values
1 1 A 6
2 2 A 2
3 3 A 3
4 4 B 2
5 5 B 7
First note that these are "data.frames", not "tables". A "table" is a different class in R, and the two aren't the same thing. This strategy should work:
Table1$Values <- NA
Table1$Values[Table1$Desc=="A"] <- TableA$Values
Table1$Values[Table1$Desc=="B"] <- TableB$Values
Table1
# Data Desc Values
# 1 1 A 6
# 2 2 A 2
# 3 3 A 3
# 4 4 B 2
# 5 5 B 7
If you have multiple tables (TableA, TableB, TableC, etc.) and need to match the suffix of each table name to the Desc column of Table1:
ls1 <- ls(pattern="Table")
ls1
#[1] "Table1" "TableA" "TableB"
library(stringr)
indx <- str_extract(ls1[-1], '(?<=Table)[A-Z]')  # stringr >= 1.0 supports lookbehind without perl()
lst1 <- mget(ls1[-1])
do.call(rbind,
lapply(seq_along(lst1),function(i) {
x1 <- lst1[[i]]
x2 <- Table1[!is.na(match(Table1$Desc, indx[i])),]
x2$Values <- x1$Values
x2}
))
# Data Desc Values
#1 1 A 6
#2 2 A 2
#3 3 A 3
#4 4 B 2
#5 5 B 7
In the first step, after creating the objects (Table*), I looked up the object names with ls(pattern="Table").
I then extracted the suffix letters (A, B) that need to be matched against Desc. The regex lookbehind (?<=Table)[A-Z] matches an uppercase letter preceded by the string Table and extracts only that letter.
mget returns the values of the named objects as a list.
Finally, I looped over the list with lapply, selected the rows of Table1 whose Desc matches the extracted suffix, and added the new Values column.
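The same idea can be sketched without stringr or a lookup in the workspace by putting the source tables in a named list, where each name is the Desc value it belongs to. This combines the two answers above rather than reproducing either one; the list name sources is an assumption made for the example.

```r
Table1 <- data.frame(Data = c(1, 2, 3, 4, 5), Desc = c("A", "A", "A", "B", "B"))
TableA <- data.frame(Values = c(6, 2, 3))
TableB <- data.frame(Values = c(2, 7))

# Names of the list are the Desc values each table should fill (illustrative)
sources <- list(A = TableA, B = TableB)

Table1$Values <- NA_real_
for (key in names(sources)) {
  rows <- Table1$Desc == key
  # Guard against a length mismatch, which would otherwise recycle or error
  stopifnot(sum(rows) == nrow(sources[[key]]))
  Table1$Values[rows] <- sources[[key]]$Values
}
print(Table1)
```

Because the values are assigned in place, the rows of Table1 keep their original order even when the Desc groups are interleaved, whereas the rbind approach returns the rows grouped by table.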
test1 <- as.matrix(c(1, 2, 3, 4, 5))
row.names(test1) <- c("a", "d", "c", "b", "e")
test2 <- as.matrix(c(6, 7, 8, 9, 10))
row.names(test2) <- c("e", "d", "c", "b", "a")
test1
[,1]
a 1
d 2
c 3
b 4
e 5
test2
[,1]
e 6
d 7
c 8
b 9
a 10
How can I reorder test2 so that the rows are in the same order as test1? e.g.
test2
[,1]
a 10
d 7
c 8
b 9
e 6
I tried to use the reorder function with: reorder (test1, test2) but I could not figure out the correct syntax. I see that reorder takes a vector, and I'm here using a matrix. My real data has one character vector and another as a data.frame. I figured that the data structure would not matter too much for this example above, I just need help with the syntax and can adapt it to my real problem.
test2 <- test2[rownames(test1),,drop=FALSE]
After fixing your code snippet to actually generate what your example shows (hint: test1 had names a,b,c,d,e; you meant a,d,c,b,e as it shows now), this was easier thanks to match(). Note the argument order: the target order (test1) comes first, the names to look them up in (test2) second:
R> test2[match(row.names(test1), row.names(test2)),1,drop=FALSE]
[,1]
a 10
d 7
c 8
b 9
e 6
R>
The key here is that match() does what you want; it returns, for each row name of test1, its position in test2:
R> match(row.names(test1), row.names(test2))
[1] 5 2 3 4 1
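In this particular example the two orderings are mirror images of each other, so the direction of match() does not change the result. In general it does. The sketch below uses a hypothetical third matrix, test3, whose row order is not a simple swap, to show that match(target, current) gives the index to use.

```r
test1 <- as.matrix(c(1, 2, 3, 4, 5))
row.names(test1) <- c("a", "d", "c", "b", "e")

# Hypothetical matrix with a different row order (not from the question)
test3 <- as.matrix(c(6, 7, 8, 9, 10))
row.names(test3) <- c("b", "a", "e", "d", "c")

# For each row name of the target (test1), find its position in test3
idx <- match(row.names(test1), row.names(test3))
reordered <- test3[idx, , drop = FALSE]
print(reordered)

# Swapping the arguments gives a different index and the wrong order here
wrong_idx <- match(row.names(test3), row.names(test1))
print(row.names(test3)[wrong_idx])
```

Indexing by name, as in test2[rownames(test1),,drop=FALSE], avoids the question of direction altogether as long as the row names are unique.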