I have data in a form like this (reproducible code below):
#> y x char
#> 1 1 1 a
#> 2 1 2 b
#> 3 1 3 c
#> 4 2 1 d
#> 5 2 2 e
#> 6 2 3 f
#> 7 3 1 g
#> 8 3 2 h
#> 9 3 3 i
df <- data.frame(stringsAsFactors=FALSE,
y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
char = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)
df
Is there an easy way to print a grid to the screen that uses the y values as rows (the y axis), the x values as columns (the x axis), and the third column (char) as the cell values? A solution with map() would be great.
So my desired output would look like:
abc
def
ghi
I started trying to loop through y and x, with a view to using purrr::map(), but I haven't gotten very far.
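# Print the single char at y == 1, x == 1. An if() on whole columns
# (df$y == 1 & df$x == 1) errors in R >= 4.2 because the condition has length > 1
print(df$char[df$y == 1 & df$x == 1])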
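That's what tidyr::spread() is for (in current tidyr it is superseded by pivot_wider(), e.g. pivot_wider(df, names_from = x, values_from = char)):
library(tidyr)
spread(df, x, char)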
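Without tidyr, base R's reshape() performs the same long-to-wide step. The sketch below assumes df is the question's data frame and then pastes each wide row together to get the requested lines (the names wide and lines are just for illustration):

```r
# The question's data
df <- data.frame(stringsAsFactors = FALSE,
  y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  char = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)

# Long to wide: one row per y, one char.<x> column per x value
wide <- reshape(df, idvar = "y", timevar = "x", direction = "wide")

# Drop the y column and paste the char.* columns of each row together
lines <- do.call(paste0, unname(as.list(wide[-1])))
writeLines(lines)
```

writeLines() prints each string on its own line, without the [1] prefix that print() adds.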
You can also convert your data.frame into a matrix (this assumes df is sorted by y, then by x):
a <- matrix(
  data = df$char,
  nrow = length(unique(df$y)),  # one row per y value
  ncol = length(unique(df$x)),  # one column per x value
  dimnames = list(unique(df$y), unique(df$x)),
  byrow = TRUE
)
The result is:
1 2 3
1 "a" "b" "c"
2 "d" "e" "f"
3 "g" "h" "i"
To concatenate each row's strings and print one line per row:
for (r in seq_len(nrow(a))) {
  print(paste(a[r, ], collapse = ''))
}
[1] "abc"
[1] "def"
[1] "ghi"
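Since the question asks for map(), here is the same result as a split-then-map sketch. It uses base R's vapply() so it runs without extra packages; with purrr loaded, purrr::map_chr(rows, ~ paste(.x$char[order(.x$x)], collapse = "")) should be an equivalent drop-in. Unlike the matrix version, it does not rely on df being sorted:

```r
df <- data.frame(stringsAsFactors = FALSE,
  y = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  x = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  char = c("a", "b", "c", "d", "e", "f", "g", "h", "i")
)

# One data frame per y value, in increasing order of y
rows <- split(df, df$y)

# For each group, order the chars by x and collapse them into one string
lines <- vapply(rows, function(r) paste(r$char[order(r$x)], collapse = ""),
                character(1))
writeLines(lines)
```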
I have 2 vectors.
x=c("a", "b", "c", "d", "a", "b", "c")
y=structure(c(1, 2, 3, 4, 5, 6, 7, 8), .Names = c("a", "e", "b",
"c", "d", "a", "b", "c"))
I would like to match a to a, b to b, and so on, in sequence: each occurrence in x should match the next unused occurrence of the same name in y, so that x[2] matches y[3] rather than y[7], and x[5] matches y[6] rather than y[1].
lapply(x, function(z) grep(z, names(y), fixed = TRUE))
gives:
[[1]]
[1] 1 6
[[2]]
[1] 3 7
[[3]]
[1] 4 8
[[4]]
[1] 5
[[5]]
[1] 1 6
[[6]]
[1] 3 7
[[7]]
[1] 4 8
which matches all instances. How do I get this sequence:
1 3 4 5 6 7 8
So that elements in x can be mapped to the corresponding values in y accordingly?
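You are actually looking for pmatch(). With its default duplicates.ok = FALSE, an element of names(y) that has already been matched is excluded from later matches, so a repeated value in x moves on to the next occurrence: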
pmatch(x,names(y))
[1] 1 3 4 5 6 7 8
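The sketch below checks this on the question's data and shows the caveat worth knowing: once no exact match is left, pmatch() falls back to a unique partial (prefix) match. It is therefore only a safe stand-in for exact matching when no name can be a prefix of another one that is still unused:

```r
x <- c("a", "b", "c", "d", "a", "b", "c")
y <- structure(c(1, 2, 3, 4, 5, 6, 7, 8),
               .Names = c("a", "e", "b", "c", "d", "a", "b", "c"))

# duplicates.ok = FALSE (the default): each name in y can be used once,
# so the second "a" in x skips y[1] and lands on y[6]
idx <- pmatch(x, names(y))
idx
y[idx]

# Caveat: partial (prefix) matching
pmatch("ab", c("abc", "xyz"))  # 1, although there is no exact "ab"
pmatch("a", c("ab", "ac"))     # NA, because the prefix is ambiguous
```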
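You can append an occurrence count to each name (a1, b1, ..., a2, ...) in both x and names(y), and then subset y by the new names. Note that this overwrites names(y), and that it returns the values of y, which equal the positions here only because y is 1 to 8: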
x2 <- paste0(x, ave(x, x, FUN=seq_along))
#[1] "a1" "b1" "c1" "d1" "a2" "b2" "c2"
names(y) <- paste0(names(y), ave(names(y), names(y), FUN=seq_along))
y[x2]
#a1 b1 c1 d1 a2 b2 c2
# 1 3 4 5 6 7 8
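To get positions rather than values, and to leave names(y) untouched, the same occurrence-count idea can be combined with match(). This is a sketch with a hypothetical helper, occurrence_key(); the "_" separator is an assumption that keeps keys unambiguous (e.g. "a" seen 11 times vs "a1" seen once), because the count after the last "_" never contains one:

```r
x <- c("a", "b", "c", "d", "a", "b", "c")
y <- structure(c(1, 2, 3, 4, 5, 6, 7, 8),
               .Names = c("a", "e", "b", "c", "d", "a", "b", "c"))

# Hypothetical helper: tag each value with its running occurrence count,
# e.g. c("a", "b", "a") -> c("a_1", "b_1", "a_2")
occurrence_key <- function(v) paste0(v, "_", ave(v, v, FUN = seq_along))

# Exact match of the k-th occurrence in x to the k-th occurrence in names(y)
idx <- match(occurrence_key(x), occurrence_key(names(y)))
idx
```

Because match() only does exact matching, this avoids pmatch()'s partial-matching fallback; a name with fewer occurrences in y than in x simply gives NA.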
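Another option uses Reduce(). Each step searches only the part of y after the previous match, so it assumes the matches occur in increasing order in y; it also uses the matched value v as a position, which works here only because the values of y equal their positions:
Reduce(function(v, k) y[-seq_len(v)][k],
       x = x[-1L],
       init = y[x[1L]],
       accumulate = TRUE)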
Well, I did it with a for loop:
# Initialise the result vector with the same length as x
answer <- numeric(length(x))
for (i in seq_along(x)) {
  # Match the ith element of x against the names in y
  answer[i] <- match(x[i], names(y))
  # Blank out the name that was just matched (not names(y)[i]), so the next
  # occurrence of the same value finds the following index
  names(y)[answer[i]] <- ""
}
answer
#[1] 1 3 4 5 6 7 8
Another possibility:
l <- lapply(x, grep, x = names(y), fixed = TRUE)
i <- as.integer(ave(x, x, FUN = seq_along))
mapply(`[`, l, i)
which gives:
[1] 1 3 4 5 6 7 8
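A solution similar to Ronak's, but it works on a copy of the names, so y itself is not modified:
yFoo <- names(y)
sapply(x, function(u) {
  res <- match(u, yFoo)
  # <<- overwrites the matched name in the outer yFoo (assumes no real name
  # is "foo"), so the next occurrence of u matches further along
  yFoo[res] <<- "foo"
  return(res)
})
Result: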
#a b c d a b c
#1 3 4 5 6 7 8
How do I pull out the unique values from each column of a data frame (both numeric and string columns) and combine them into one column?
a = c("a", "b", "c", "d", "a")
b = c(1, 2, 3, 4, 3)
df <- data.frame(a, b)  # cbind(a, b) would create a character matrix, not a data frame
The preferred output would be:
variable Level
a a
a b
a c
a d
b 1
b 2
b 3
b 4
The sample data above is simple, but the intent is to be able to use the answer for multiple data frames with different column names and data in them. Thank you.
Quick + scalable
tidyr's gather() and dplyr's distinct() give you a quick way to get that structure (in current tidyr, gather() is superseded by pivot_longer()). I left the package prefixes on the function calls so you can remember which function comes from which package, which I always forget.
library(tidyverse)
a = c("a", "b", "c", "d", "a")
b = c(1, 2, 3, 4, 3)
data.frame(a,b) %>% tidyr::gather() %>% dplyr::distinct()
key value
1 a a
2 a b
3 a c
4 a d
5 b 1
6 b 2
7 b 3
8 b 4
We place the vectors in a list, take the unique elements of each, name them with letters and then stack() the result into a data.frame:
d1 <- stack(setNames(lapply(list(a, b), unique), letters[1:2]))[2:1]
colnames(d1) <- c('variable', 'Level')
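To handle any data frame without hard-coding letters[1:2], the same stack() idea can iterate over the columns directly. A minimal sketch, assuming the input is a data.frame (not the character matrix that cbind() produces); mixing character and numeric columns makes Level a character column:

```r
# stringsAsFactors = FALSE keeps column a as character on R < 4.0 as well
df <- data.frame(a = c("a", "b", "c", "d", "a"),
                 b = c(1, 2, 3, 4, 3),
                 stringsAsFactors = FALSE)

# lapply() over a data frame visits each column and keeps the column names,
# so the variable labels come from the data instead of letters[1:2]
d1 <- stack(lapply(df, unique))[2:1]
colnames(d1) <- c("variable", "Level")
d1
```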
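Create df (note that cbind() returns a character matrix; the loop below indexes it with df[, i], which works the same way on a data frame):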
a = c("a", "b", "c", "d", "a")
b = c(1, 2, 3, 4, 3)
df <- cbind(a, b)
Column name extraction:
cols <- colnames(df)
Data extraction:
variable <- NULL
Level <- NULL
for (i in seq_along(cols)) {
  # One entry per unique value in column i
  variable <- c(variable, rep(cols[i], length(unique(df[, i]))))
  Level <- c(Level, unique(df[, i]))
}
Your generic output:
db <- cbind(variable, Level)
db
variable Level
[1,] "a" "a"
[2,] "a" "b"
[3,] "a" "c"
[4,] "a" "d"
[5,] "b" "1"
[6,] "b" "2"
[7,] "b" "3"
[8,] "b" "4"
I have a matrix whose last column contains characters:
A
B
B
A
...
I would like to replace A with 1 and B with 2 in R. The expected result should be:
1
2
2
1
...
If you are 100% confident that only "A" and "B" appear:
sample_data = c("A", "B", "B", "A")
sample_data
# [1] "A" "B" "B" "A"
as.numeric(gsub("A", 1, gsub("B", 2, sample_data)))
# [1] 1 2 2 1
Using factor or a simple lookup table would be much more flexible:
sample_data = c("A", "B", "B", "A")
Recommended:
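# Levels are sorted alphabetically by default; setting them explicitly fixes A = 1, B = 2
as.numeric(factor(sample_data, levels = c("A", "B")))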
# [1] 1 2 2 1
Possible alternative:
as.numeric(c("A" = "1", "B" = "2")[sample_data])
# [1] 1 2 2 1
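Applied to the matrix from the question, the lookup-table version can target the last column by position. A sketch, assuming m is a character matrix (the example values are made up); because a matrix holds a single type, the replaced column stays character unless the codes are kept in a separate numeric vector:

```r
# Hypothetical example matrix; the first column is character too,
# since a matrix can only hold one type
m <- matrix(c("10", "20", "30", "40", "A", "B", "B", "A"), ncol = 2)

lookup <- c(A = 1, B = 2)                # code -> number
codes <- unname(lookup[m[, ncol(m)]])    # unknown codes would become NA
codes

# Assigning back into m coerces the numbers to "1" and "2"
m[, ncol(m)] <- codes
m
```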
I have a vector like:
c("A", "B", "C", "D", "E", "F")
and I'd like to create a dataframe like
"from" "to"
A B
B C
C D
D E
E F
how can I accomplish that?
Another way (the first line is base R, the second uses dplyr::lead()):
vec <- c("A", "B", "C", "D", "E", "F")
data.frame(from = vec[-length(vec)], to = vec[-1])
na.omit(data.frame(from = vec, to = dplyr::lead(vec)))
from to
1 A B
2 B C
3 C D
4 D E
5 E F
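The base R line can be wrapped in a small function. A sketch with a hypothetical name, consecutive_pairs(), using head()/tail() (equivalent to vec[-length(vec)] and vec[-1]); unlike the na.omit(lead()) version, it does not drop pairs that contain legitimate NA values, and a vector with fewer than two elements gives a zero-row data frame:

```r
# Hypothetical helper: pair each element with its successor
consecutive_pairs <- function(v) {
  data.frame(from = head(v, -1), to = tail(v, -1), stringsAsFactors = FALSE)
}

vec <- c("A", "B", "C", "D", "E", "F")
consecutive_pairs(vec)

# Fewer than two elements: no pairs
consecutive_pairs("A")
```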
Another way is to use the zoo package:
library(zoo)
rollapply(vec, 2, by = 1, paste)
Here is one method using embed and rearranging columns:
# data
temp <- c("A", "B", "C", "D", "E", "F")
embed(temp, 2)[, c(2,1)]
[,1] [,2]
[1,] "A" "B"
[2,] "B" "C"
[3,] "C" "D"
[4,] "D" "E"
[5,] "E" "F"
To put this into a data.frame with from/to column names, wrap it in data.frame() and set the names:
setNames(data.frame(embed(temp, 2)[, c(2,1)]), c("from", "to"))
from to
1 A B
2 B C
3 C D
4 D E
5 E F
We could also do:
vec <- c("A", "B", "C", "D", "E", "F")
x <- rep(seq(length(vec)), each=2)[-length(vec)*2][-1]
# [1] 1 2 2 3 3 4 4 5 5 6
data.frame(matrix(vec[x], ncol = 2, byrow = TRUE))
Or alternatively:
data.frame(t(sapply(seq(length(vec)-1), function(i) c(vec[i], vec[i+1]))))
# X1 X2
# 1 A B
# 2 B C
# 3 C D
# 4 D E
# 5 E F
I have a vector of objects (object) along with a corresponding vector of time frames (tframe) in which the objects were observed. For each unique pair of objects, I want to calculate the number of time frames in which both objects were observed.
I can write the code using for() loops, but it takes a long time to run as the number of unique objects increases. How might I change the code to speed up the run time?
Below is an example with 4 unique objects (in reality I have about 300). For example, objects a and c were both observed in time frames 1 and 2, so they get a count of 2. Objects b and d were never observed in the same time frame, so they get a count of 0.
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)
uo <- unique(object)
n <- length(uo)
mpairs <- matrix(NA, nrow = n*(n-1)/2, ncol = 3,
                 dimnames = list(NULL, c("obj1", "obj2", "sametf")))
row <- 0
for (i in 1:(n-1)) {
  for (j in (i+1):n) {
    row <- row + 1
    mpairs[row, "obj1"] <- uo[i]
    mpairs[row, "obj2"] <- uo[j]
    # no. of time frames in which both objects in a pair were observed
    intwin <- intersect(tframe[object == uo[i]], tframe[object == uo[j]])
    mpairs[row, "sametf"] <- length(intwin)
  }
}
data.frame(object, tframe)
object tframe
1 a 1
2 a 1
3 a 2
4 b 2
5 b 3
6 c 1
7 c 2
8 c 2
9 c 3
10 d 1
mpairs
obj1 obj2 sametf
[1,] "a" "b" "1"
[2,] "a" "c" "2"
[3,] "a" "d" "1"
[4,] "b" "c" "2"
[5,] "b" "d" "0"
[6,] "c" "d" "1"
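You can use the cross-product (tcrossprod()) of the object-by-time-frame incidence table to get the number of shared time frames for every pair of objects at once. You can then reshape the result if required.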
Example
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)
# This will give you the counts
# Use code from Jean's comment
tab <- tcrossprod(table(object, tframe)>0)
# Reshape the data
tab[lower.tri(tab, TRUE)] <- NA
reshape2::melt(tab, na.rm=TRUE)
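A base R sketch of the same cross-product idea that produces the question's obj1/obj2/sametf layout without reshape2. It works on an explicit 0/1 incidence matrix and takes the strict upper triangle with which(..., arr.ind = TRUE); for about 300 objects, tab is only a 300 x 300 matrix, so the nested loops disappear entirely:

```r
object <- c("a", "a", "a", "b", "b", "c", "c", "c", "c", "d")
tframe <- c(1, 1, 2, 2, 3, 1, 2, 2, 3, 1)

# 0/1 incidence matrix: rows = objects, columns = time frames
inc <- (unclass(table(object, tframe)) > 0) + 0

# tab[i, j] = number of time frames in which objects i and j both appear
tab <- tcrossprod(inc)

# Each unordered pair once: positions in the strict upper triangle
idx <- which(upper.tri(tab), arr.ind = TRUE)
res <- data.frame(obj1 = rownames(inc)[idx[, 1]],
                  obj2 = rownames(inc)[idx[, 2]],
                  sametf = tab[idx],
                  stringsAsFactors = FALSE)
res <- res[order(res$obj1, res$obj2), ]
rownames(res) <- NULL
res
```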