I am a self-taught useR, so please bear with me.
I have something similar to the following dataset:
individual value
a 0.917741317
a 0.689673689
a 0.846208486
b 0.439198006
b 0.366260159
b 0.689985484
c 0.703381117
c 0.29467743
c 0.252435687
d 0.298108973
d 0.42951805
d 0.011187204
e 0.078516181
e 0.498118235
e 0.003877632
I would like to create a matrix with the values for a in column 1, the values for b in column 2, etc. (I also add a 1 at the bottom of every column for later algebra operations.)
I have tried so far:
for (i in unique(df$individual)) {
values <- subset(df$value, df$individual == i)
m <- cbind(c(values[1:3],1))
}
I get a (4,1) matrix with the values of the last individual only. What is missing so that each iteration adds a column, giving as many columns as there are individuals?
This operation is called "reshaping". There is a base function, but I find it easier with the reshape2 package:
DF <- read.table(text="individual value
a 0.917741317
a 0.689673689
a 0.846208486
b 0.439198006
b 0.366260159
b 0.689985484
c 0.703381117
c 0.29467743
c 0.252435687
d 0.298108973
d 0.42951805
d 0.011187204
e 0.078516181
e 0.498118235
e 0.003877632", header=TRUE)
DF$id <- 1:3  # position within each individual (recycled over the groups of three)
library(reshape2)
DF2 <- dcast(DF, id ~ individual, value.var = "value")
DF2[,-1]
# a b c d e
#1 0.9177413 0.4391980 0.7033811 0.2981090 0.078516181
#2 0.6896737 0.3662602 0.2946774 0.4295180 0.498118235
#3 0.8462085 0.6899855 0.2524357 0.0111872 0.003877632
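The question also asked for a 1 at the bottom of every column, which the reshaped data frame does not include yet. A minimal base R sketch of one way to get there, assuming every individual has the same number of values (the toy values are shortened, and `wide` and `m` are just illustrative names):

```r
# Toy data in the same long format as above (values rounded for brevity)
DF <- data.frame(
  individual = rep(c("a", "b", "c"), each = 3),
  value = c(0.92, 0.69, 0.85, 0.44, 0.37, 0.69, 0.70, 0.29, 0.25)
)

# unstack() splits value by individual into one column per individual;
# it only returns a rectangular data frame when all groups have equal size
wide <- unstack(DF, value ~ individual)

# Convert to a matrix and append a row of 1s (the scalar 1 is recycled)
m <- rbind(as.matrix(wide), 1)
dim(m)
```

The loop in the question overwrites `m` on every iteration; building the whole matrix in one step avoids having to grow it column by column.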
I'm trying to combine these data, but I get "N/A" in the final output when I add new names that didn't exist in the first vector. How can I handle this so that all the data are shown without any "N/A"?
I think you just want to merge/append two factors. An easy approach would be to convert them to character, append them, and convert the result back to a factor.
Just a simple example with letters
p <- as.factor(LETTERS[3:8])
q <- as.factor(LETTERS[1:5])
as.factor(c(as.character(p), as.character(q)))
# [1] C D E F G H A B C D E
# Levels: A B C D E F G H
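The missing values in the question most likely come from assigning a value that is not among the factor's existing levels. A small sketch of that behaviour and one possible workaround, extending the levels first (the vectors here are made up for illustration):

```r
p <- factor(c("C", "D"))

# Assigning a value outside the existing levels yields NA (with a warning)
bad <- p
suppressWarnings(bad[3] <- "A")
is.na(bad[3])

# Extending the levels first keeps the new value
good <- p
levels(good) <- c(levels(good), "A")
good[3] <- "A"
as.character(good)
```

Converting to character and back, as in the answer above, sidesteps the issue entirely because the levels are rebuilt from the combined values.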
Couldn't find any solution to this question online, but apologies if I missed it.
I have a list of several vectors (all character in this example), of different lengths:
ll <- list(f1 = c("a","b","c"),f2 = c("d","e"),f3 = "f")
I want to convert it into a data.frame that covers all combinations of the list's elements. So the resulting data.frame will be:
data.frame(f1 = rep(ll$f1, 2), f2 = rep(ll$f2, each = 3), f3 = rep(ll$f3, 6))
Is there any function that achieves that?
expand.grid should work in this case -
expand.grid(ll)
# f1 f2 f3
#1 a d f
#2 b d f
#3 c d f
#4 a e f
#5 b e f
#6 c e f
Another similar alternative would be purrr::cross_df.
purrr::cross_df(ll)
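One detail worth checking with expand.grid is the column type: by default it converts character vectors to factors. A short sketch that keeps them as character and verifies the number of combinations (`grid` is just an illustrative name). As far as I know, recent purrr releases (1.0.0 onward) deprecate cross_df in favour of tidyr::expand_grid, so the base function may be the safer choice.

```r
ll <- list(f1 = c("a", "b", "c"), f2 = c("d", "e"), f3 = "f")

# expand.grid() turns character vectors into factors by default;
# stringsAsFactors = FALSE keeps them as character columns
grid <- expand.grid(ll, stringsAsFactors = FALSE)

# One row per combination: 3 * 2 * 1
nrow(grid) == prod(lengths(ll))
sapply(grid, class)
```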
What is the fastest way to convert the data.table:
1: A B C
2: D E F
3: G H I
into the vector: G H I D E F A B C
I use:
X <- X[order(nrow(X):1),]
X <- melt(t(X))$value
But my feeling is that this can be optimized :-)
Thank you
One option is to reverse the index, transpose to a matrix and concatenate
c(t(X[.N:1]))
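For comparison, here is the same idea without data.table's .N, as a base R sketch on a plain data.frame (the column names V1 to V3 are assumed, since the question does not show them):

```r
X <- data.frame(V1 = c("A", "D", "G"),
                V2 = c("B", "E", "H"),
                V3 = c("C", "F", "I"))

# Reverse the row order, transpose to a matrix, then flatten.
# c() flattens column by column, and after transposing each column
# holds one of the original rows.
v <- c(t(X[nrow(X):1, ]))
v
```

The transpose is what makes the flattening run along the original rows; without it, c() would walk down the columns instead.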
I have the following vector: h=c("a","b","c","d","e")
I would like to create a dataset that looks like this, using the lag() function:
pr <- data.frame(your_github = h,
review_this1 = lag(h),
review_this2 = lag(h,2))
However, when I use lag, the following happens:
col2=c(NA,"a","b","c","d") and col3=c(NA,NA,"a","b","c")
but I need an outcome similar to data.frame(col1=c("a","b","c","d","e"), col2=c("b","c","d","e","a"), col3=c("c","d","e","a","b")), where the values in col2 and col3 wrap around (i.e. the 2nd column is just the 1st one lagged by 1, but the 1st item in the 2nd column is the last item in the 1st column).
Something like this?
library(dplyr)
h = c("a","b","c","d","e")
pr <- data.frame(your_github = h,
review_this1 = ifelse(is.na(lead(h)), h[1], lead(h)),
review_this2 = ifelse(is.na(lead(h, 2)), h[2:1], lead(h, 2)))
pr
# your_github review_this1 review_this2
#1 a b c
#2 b c d
#3 c d e
#4 d e a
#5 e a b
With base R you can achieve this with head and tail:
h<-letters[1:5]
pr <- data.frame(your_github = h,
review_this1 = c(tail(h, -1), head(h, 1)),
review_this2 = c(tail(h, -2), head(h, 2)))
print(pr)
Output:
your_github review_this1 review_this2
1 a b c
2 b c d
3 c d e
4 d e a
5 e a b
The idea is to take the end of the vector h with tail (everything except the first n elements) and concatenate it with the start of the vector taken by head (the first n elements), so that every column (vector) of the data frame ends up with the same length.
If you want to cycle the vector the other way, with the last value becoming the first, just reverse the signs in tail and head.
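The head/tail idea can be wrapped in a small helper. This is only a sketch; `rotate` is a hypothetical name, not a base R function, and the modulo step is an addition so that negative or oversized shifts also work:

```r
# Rotate x to the left by k positions; a negative k rotates to the right
rotate <- function(x, k = 1) {
  n <- length(x)
  if (n == 0) return(x)
  k <- k %% n                   # map any shift into 0..(n - 1)
  if (k == 0) return(x)
  c(tail(x, -k), head(x, k))    # drop the first k, then append them at the end
}

h <- letters[1:5]
pr <- data.frame(your_github = h,
                 review_this1 = rotate(h, 1),
                 review_this2 = rotate(h, 2))
pr
```

Calling rotate(h, -1) gives the other direction, with the last value moved to the front.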
I'm trying to build a factor column that relates to two other factor columns with completely different factor levels. Here's example data.
set.seed(1234)
a<-sample(LETTERS[1:10],50,replace=TRUE)
b<-sample(letters[11:20],50,replace=TRUE)
df<-data.frame(a,b)
df$a<-as.factor(df$a)
df$b<-as.factor(df$b)
The rule I want to apply creates a new column, c, whose factor level depends on the value of column a.
If a row in column a equals "F", that row in column c should equal whatever the entry is for column b; otherwise it should equal the entry in column a. The code I'm trying:
dfn<-dim(df)[1]
for (i in 1:dfn){
df$c[i]<-ifelse(df$a[i]=="F",df$b[i],df$a[i])
}
df
only spits out the numbered index of the factor level for column b and not the actual entry. What have I done wrong?
I think you'll need to do a little finagling of character values. This seems to do it.
w <- df$a == "F"
df$c <- factor(replace(as.character(df$a), w, as.character(df$b)[w]))
Here is a quick look at the new column,
factor(replace(as.character(df$a), w, as.character(df$b)[w]))
# [1] B G G G I G A C G s G k C J C I C C B C D D B A C I n J I A
# [31] E C D p B H C C J I l G D G D p G E C H
# Levels: A B C D E G H I J k l n p s
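The reason the loop in the question shows numbers is that ifelse works on the underlying integer codes of a factor rather than on its labels. A small sketch with made-up factors that shows the effect and the character-based fix (`codes` and `fixed` are illustrative names):

```r
a <- factor(c("A", "F", "B"))
b <- factor(c("k", "l", "m"))

# ifelse() drops the factor class, so only the integer codes survive
codes <- ifelse(a == "F", b, a)
is.numeric(codes)

# Converting to character first keeps the labels; factor() rebuilds the levels
fixed <- factor(ifelse(a == "F", as.character(b), as.character(a)))
as.character(fixed)
```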
As in my previous comment, a solution with dplyr:
df %>% mutate(c = ifelse(a == "F", as.character(b), as.character(a)))
If you plan on doing anything involving combinations of the columns as factors, for example, comparisons, you should refactor to the same set of levels.
u<-union(levels(df$a),levels(df$b))
df$a<-factor(df$a,u)
df$b<-factor(df$b,u)
df$c<-df$a
ind<-df$a=="F"
df$c[ind]<-df$b[ind]
By taking this precaution, you can sensibly do
> sum(df$c==df$b)
[1] 6
> sum(df$a=="F")
[1] 6
otherwise the first line will fail.
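To see that failure in isolation, here is a small sketch with two made-up factors; it assumes nothing beyond base R, and `failed`, `x2` and `y2` are illustrative names:

```r
x <- factor(c("A", "F", "B"))
y <- factor(c("k", "F", "m"))

# Comparing factors whose level sets differ raises an error
failed <- inherits(try(x == y, silent = TRUE), "try-error")
failed

# After refactoring both to the union of the levels, == works element-wise
u <- union(levels(x), levels(y))
x2 <- factor(x, u)
y2 <- factor(y, u)
x2 == y2
```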