I have a data frame with three variables and 250K records. As an example consider
df <- data.frame(V1=c(1,2,4), V2=c("a","a","b"), V3=c(2,3,1))
V1 V2 V3
1 a 2
2 a 3
4 b 1
and want to swap values between V1 and V3 based on the value of V2 as follows:
if V2 == 'b' then V1 <- V3 and V3 <- V1
resulting in
V1 V2 V3
1 a 2
2 a 3
1 b 4
I tried a do loop but it takes forever. If I use Perl, it takes seconds. I believe this task can be done efficiently in R as well. Any suggestions are appreciated.
Try this
df <- data.frame(V1=c(1,2,4), V2=c("a","a","b"), V3=c(2,3,1))
df[df$V2 == "b", c("V1", "V3")] <- df[df$V2 == "b", c("V3", "V1")]
which yields:
> df
V1 V2 V3
1 1 a 2
2 2 a 3
3 1 b 4
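A small variation on the above, as a sketch only: compute the row selection once and reuse it on both sides. The name `swap_rows` is made up for this example. Wrapping the comparison in `which()` also drops any `NA` in V2, which a logical index would otherwise reject in a data frame assignment.

```r
# Sketch: compute the row selection once, then swap both columns in one assignment.
df <- data.frame(V1 = c(1, 2, 4), V2 = c("a", "a", "b"), V3 = c(2, 3, 1))

# swap_rows is a hypothetical name; which() returns integer positions and drops NA
swap_rows <- which(df$V2 == "b")

# The right-hand side lists the columns in reversed order; values are assigned by position
df[swap_rows, c("V1", "V3")] <- df[swap_rows, c("V3", "V1")]
df
```

Because the whole swap is a single vectorised assignment, there is no per-row loop.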
You can use transform to do this.
df <- transform(df, V3 = ifelse(V2 == 'b', V1, V3), V1 = ifelse(V2 == 'b', V3, V1))
Edited: I got tripped up by the column names, sorry. This works.
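As far as I understand it, the reason this works is that transform evaluates all of its arguments against the original data frame, so the second ifelse still sees the old V1. A rough sketch of the contrast with step-by-step assignment (the names `swapped` and `seq_df` are just for illustration):

```r
df <- data.frame(V1 = c(1, 2, 4), V2 = c("a", "a", "b"), V3 = c(2, 3, 1))

# transform(): both expressions read the original columns, so the swap is clean
swapped <- transform(df,
                     V1 = ifelse(V2 == "b", V3, V1),
                     V3 = ifelse(V2 == "b", V1, V3))

# Sequential assignment: V1 is overwritten before V3 reads it, so the old V1 is lost
seq_df <- df
seq_df$V1 <- ifelse(seq_df$V2 == "b", seq_df$V3, seq_df$V1)
seq_df$V3 <- ifelse(seq_df$V2 == "b", seq_df$V1, seq_df$V3)
```

In `seq_df` the 'b' row ends up with the old V3 value in both columns, which is why the sequential version would need a temporary copy.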
If you don't mind the rows ending up in different orders, this is kind of a 'cute' way to do this:
dat <- read.table(textConnection("V1 V2 V3
1 a 2
2 a 3
4 b 1"),sep = "",header = TRUE)
tmp <- dat[dat$V2 == 'b',3:1]
colnames(tmp) <- colnames(dat)
rbind(dat[dat$V2 != 'b',],tmp)
Basically, that just grabs the rows where V2 == 'b', reverses the columns and slaps them back together with everything else. This can be extended if you have more columns that don't need switching; you'd just use an integer index with the positions of the two columns swapped, rather than just 3:1.
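To make the extension concrete, here is a sketch with one extra column that should stay put. The column V4 and the name `col_order` are invented for this example.

```r
# Hypothetical data: same as above plus a V4 column that must not move
dat <- data.frame(V1 = c(1, 2, 4), V2 = c("a", "a", "b"),
                  V3 = c(2, 3, 1), V4 = c(10, 20, 30))

# Positions 1 and 3 trade places; positions 2 and 4 are left alone
col_order <- c(3, 2, 1, 4)

tmp <- dat[dat$V2 == "b", col_order]
colnames(tmp) <- colnames(dat)
out <- rbind(dat[dat$V2 != "b", ], tmp)
out
```

As before, the swapped rows are appended at the end, so the original row order is only preserved when the 'b' rows happened to be last.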
Related
I would like to count multiple patterns, but including * (any character) in them.
Here is an example search for: Y*Y, YY* and X*X simultaneously
df <- data.frame(
V1 = c("A", "B", "C", "D"),
V2 = c("XXYYYYY", "XXYYXX" , "XYXXYX", "XYYXYX")
)
And here is my try:
library(stringr)
df$V3 <- str_count(df$V2, "Y+Y+")
df$V4 <- str_count(df$V2, "YY+")
df$V5 <- str_count(df$V2, "X+X+")
I am not sure how to specify a random character in a string and how to count two or more patterns at once.
Expected output:
V1 V2 V3 V4 V5
A XXYYYYY 1 1 1
B XXYYXX 1 1 2
C XYXXYX 2 0 3
D XYYXYX 2 1 3
I have been breaking my head over translating this question to a data.table solution. (to keep it simple I'll use the same data set)
When V2 == "b", I want to swap the values of V1 and V3.
dt <- data.table(V1=c(1,2,4), V2=c("a","a","b"), V3=c(2,3,1))
#V1 V2 V3
#1: 1 a 2
#2: 2 a 3
#3: 4 b 1
The code below is a working solution for a data.frame. However, I lost a lot of time on this because I was using a data.table without realising it, so I'm now determined to find a solution for data.table.
dt <- data.table(V1=c(1,2,4), V2=c("a","a","b"), V3=c(2,3,1))
df <- as.data.frame(dt)
df[df$V2 == "b", c("V1", "V3")] <- df[df$V2 == "b", c("V3", "V1")]
# V1 V2 V3
#1 1 a 2
#2 2 a 3
#3 1 b 4
I have tried writing a lapply function looping through my target swapping list, tried to narrow down the problem to only replace one value, attempted to call the column names in different ways but all without success.
This was the closest attempt I've managed to get:
> dt[dt$V2 == "b", c("V1", "V3")] <- dt[dt$V2 == "b", c(V3, V1)]
#Warning messages:
#1: In `[<-.data.table`(`*tmp*`, dt$V2 == "b", c("V1", "V3"), value = c(1, :
# Supplied 2 items to be assigned to 1 items of column 'V1' (1 unused)
#2: In `[<-.data.table`(`*tmp*`, dt$V2 == "b", c("V1", "V3"), value = c(1, :
# Supplied 2 items to be assigned to 1 items of column 'V3' (1 unused)
How can we get the data.table solution?
We can try
dt[V2=="b", c("V3", "V1") := .(V1, V3)]
For amusement only. @akrun's solution is clearly superior. I reasoned that I could create a temporary copy, make the conditional swap, and then delete the copy, all using [.data.table operations in sequence:
dt[, tv1 := V1][V2=="b", V1 := V3][V2=="b", V3 := tv1][ , tv1 := NULL]
> dt
V1 V2 V3
1: 1 a 2
2: 2 a 3
3: 1 b 4
I have the following question:
I have a list (L1) with two parts, each with the same 4 variables.
Variable 4 is also the name of the part of the list, e.g. $a = a
a <- data.frame(V1=c("a","b","c"), V2=c(4,7,9), V3=1:3, V4=c("a","a","a"))
b <- data.frame(V1=c("d","e","f"), V2=c(10,14,16), V3=1:3, V4=c("b","b","b"))
L1 <- list(a=a, b=b)
L1
$a
V1 V2 V3 V4
a 4 1 a
b 7 2 a
c 9 3 a
$b
V1 V2 V3 V4
d 10 1 b
e 14 2 b
f 16 3 b
I would like to extract the rows of each part of the list where V3==2. If a part has no row with this value, V1 to V3 should be returned as NA and V4 should contain the name of that part of the list.
In the example the outcome should look like this:
V1 V2 V3 V4
b 7 2 a
e 14 2 b
If I select a value e.g. V3==4 then my result should look like this:
V1 V2 V3 V4
<NA> <NA> <NA> a
<NA> <NA> <NA> b
I can extract a column with
unlist(lapply(L1, "[",3)) but I can't figure out how to extract rows which have a certain value in a variable.
I also tried to combine lapply with the subset function, but this didn't work for me.
Thanks for your help!
This should work. The first command returns a list, the second one converts it to a data frame. If the value is not in the data, it returns NA (for the list) or a row of NAs (for the df).
l <- lapply(L1, function(x) {
  i <- which(x$V3 == 2)
  if (length(i) > 0) x[i, ] else NA
})
df <- rbind(l[[1]], l[[2]])
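If the NA rows also need to carry the list name in V4, as the question asks, something along these lines might do it in base R. This is only a sketch: `extract_rows` is a made-up helper, and it assumes every list element has columns V3 and V4.

```r
a <- data.frame(V1 = c("a", "b", "c"), V2 = c(4, 7, 9), V3 = 1:3,
                V4 = c("a", "a", "a"), stringsAsFactors = FALSE)
b <- data.frame(V1 = c("d", "e", "f"), V2 = c(10, 14, 16), V3 = 1:3,
                V4 = c("b", "b", "b"), stringsAsFactors = FALSE)
L1 <- list(a = a, b = b)

# extract_rows() is a hypothetical helper, not part of any package
extract_rows <- function(lst, val) {
  parts <- lapply(names(lst), function(nm) {
    x <- lst[[nm]]
    hit <- x[which(x$V3 == val), , drop = FALSE]
    if (nrow(hit) == 0) {
      # Indexing with NA gives one all-NA row with the same columns
      hit <- x[NA_integer_, , drop = FALSE]
      hit$V4 <- nm  # keep the name of the list element
    }
    hit
  })
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

extract_rows(L1, 2)
extract_rows(L1, 4)
```

Using `do.call(rbind, parts)` rather than `rbind(l[[1]], l[[2]])` means the same code should carry over to lists with more than two elements.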
We could create a function using data.table. We rbind the list elements with rbindlist and group by 'V4'. If no 'V3' in the group equals the given value, we return a row of NAs (.SD[.N+1], an out-of-range index); otherwise we return the matching rows of the Subset of Data.table (.SD[tmp]).
library(data.table)
f1 <- function(lst, val){
rbindlist(lst)[, {tmp <- V3==val
if(!any(tmp)) .SD[.N+1]
else .SD[tmp]},
by = V4][, names(lst[[1]]), with=FALSE]
}
f1(L1, 4)
# V1 V2 V3 V4
#1: NA NA NA a
#2: NA NA NA b
f1(L1, 3)
# V1 V2 V3 V4
#1: c 9 3 a
#2: f 16 3 b
f1(L1, 2)
# V1 V2 V3 V4
#1: b 7 2 a
#2: e 14 2 b
You can also use bind_rows from dplyr (note this keeps only the matching rows and does not add NA rows):
library(dplyr)
list(a = a, b = b) %>%
bind_rows(.id = "source") %>%
filter(V3 == 2)
I want to transpose the output given by the last command and write it to a data.frame. I want that data frame to have 2 columns: the first column will hold the column names and the second will hold the data type of each column, one row per column. How could I achieve this? I tried a variety of things but didn't get what I am looking for.
smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
smoke <- as.data.frame(smoke)
table1=sapply (smoke, class)
table1
You could also skip the table1 part and go straight from smoke to the desired result.
> data.frame(nm = names(smoke), cl = sapply(unname(smoke), class))
# nm cl
# 1 V1 numeric
# 2 V2 numeric
# 3 V3 numeric
You could try this:
data.frame(var.name = names(table1), var.class = table1, row.names=NULL)
# var.name var.class
#1 V1 numeric
#2 V2 numeric
#3 V3 numeric
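One caveat worth sketching: if any column has more than one class (an ordered factor, say), sapply(smoke, class) returns a list instead of a character vector. Taking only the first class with vapply keeps one value per column. The `grade` column and the name `col_types` below are invented for illustration.

```r
smoke <- as.data.frame(matrix(c(51, 43, 22, 92, 28, 21, 68, 22, 9),
                              ncol = 3, byrow = TRUE))

# Hypothetical extra column whose class is c("ordered", "factor")
smoke$grade <- factor(c("lo", "mid", "hi"),
                      levels = c("lo", "mid", "hi"), ordered = TRUE)

# vapply() guarantees exactly one character value per column
col_types <- data.frame(
  var.name  = names(smoke),
  var.class = vapply(smoke, function(x) class(x)[1], character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
col_types
```

Whether the first class is the one you want depends on the column types in your data, so treat this as a starting point.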
You might be looking for the melt command.
library(reshape2)
smoke <- matrix(c(51,43,22,92,28,21,68,22,9),ncol=3,byrow=TRUE)
smoke <- as.data.frame(smoke)
table1 <- sapply (smoke, class)
smoke.melt <- melt(smoke)
levels(smoke.melt$variable) <- table1
> smoke.melt
variable value
1 numeric 51
2 numeric 92
3 numeric 68
4 numeric 43
5 numeric 28
6 numeric 22
7 numeric 22
8 numeric 21
9 numeric 9
Just convert table1 to data.frame and adjust:
dd = data.frame(table1)
dd
table1
V1 numeric
V2 numeric
V3 numeric
dd$VarName = rownames(dd)
dd
table1 VarName
V1 numeric V1
V2 numeric V2
V3 numeric V3
dd = dd[,c(2,1)]
dd
VarName table1
V1 V1 numeric
V2 V2 numeric
V3 V3 numeric
names(dd)[2] = "type"
dd
VarName type
V1 V1 numeric
V2 V2 numeric
V3 V3 numeric
I have variables v1,v2,etc and I want to create a dataframe.
I want to avoid doing:
df <-data.frame(v1,v2,...)
I would like to refer to the index in each of the variables and do something like:
for (i in 1:n){
df <-data.frame(v[i])
}
or do a max and min:
df <-data.frame(v1 to vn)
I just can't figure out what the proper syntax is.
You can do:
as.data.frame(mget(paste0("v", 1:n)))
v1 <- 1:3
v2 <- 2:4
v3 <- 3:5
as.data.frame(mget(paste0("v", 1:3)))
# v1 v2 v3
# 1 1 2 3
# 2 2 3 4
# 3 3 4 5
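If you need this in more than one place, the same idea can be wrapped in a small function. This is just a sketch; `collect_vars` and its arguments are made-up names, and it assumes the variables all have the same length and live in the calling environment.

```r
# collect_vars() is a hypothetical helper built on mget()
collect_vars <- function(prefix, n, envir = parent.frame()) {
  nms <- paste0(prefix, seq_len(n))        # "v1", "v2", ..., "vn"
  as.data.frame(mget(nms, envir = envir))  # mget() errors if a name is missing
}

v1 <- 1:3
v2 <- 2:4
v3 <- 3:5
df <- collect_vars("v", 3)
df
```

Building the names with paste0 keeps the columns in numeric order (v1, v2, ..., v10), which a lexical sort of the names would not.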