I have the following vector: h = c("a","b","c","d","e")
I would like to create a dataset that looks like this, using the lag() function:
pr <- data.frame(your_github = h,
review_this1 = lag(h),
review_this2 = lag(h,2))
However, when I use lag the following happens:
col2=c(NA,"a","b","c","d") and col3=c(NA,NA,"a","b","c")
but I need an outcome like data.frame(col1=c("a","b","c","d","e"), col2=c("b","c","d","e","a"), col3=c("c","d","e","a","b")), where the values in col2 and col3 wrap around (i.e. the 2nd column is the 1st one shifted by 1, with the first item of the 1st column becoming the last item of the 2nd).
Something like this?
library(dplyr)
h = c("a","b","c","d","e")
pr <- data.frame(your_github = h,
review_this1 = ifelse(is.na(lead(h)), h[1], lead(h)),
review_this2 = ifelse(is.na(lead(h, 2)), h[2:1], lead(h, 2)))
pr
# your_github review_this1 review_this2
#1 a b c
#2 b c d
#3 c d e
#4 d e a
#5 e a b
With base R you can achieve this with head and tail:
h<-letters[1:5]
pr <- data.frame(your_github = h,
review_this1 = c(tail(h, -1), head(h, 1)),
review_this2 = c(tail(h, -2), head(h, 2)))
print(pr)
Output:
your_github review_this1 review_this2
1 a b c
2 b c d
3 c d e
4 d e a
5 e a b
The idea is to drop the first n elements of h with tail(h, -n) and then append those same n elements, taken with head(h, n), at the end. Each column therefore keeps the same length as h, which data.frame requires.
If you want to cycle the vector the other way, with the last value becoming the first, swap the signs: c(tail(h, 1), head(h, -1)).
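The head/tail idea extends to any shift size. Here is a minimal sketch, assuming a hypothetical helper called cycle_shift (it is not part of base R or dplyr). A positive n moves values toward the front, as in the output above, and a negative n moves them toward the back:

```r
# cycle_shift() is a hypothetical helper, not a base R or dplyr function.
cycle_shift <- function(x, n = 1L) {
  len <- length(x)
  if (len == 0L) return(x)
  n <- n %% len              # wrap large shifts; a negative n becomes len + n
  if (n == 0L) return(x)
  c(tail(x, -n), head(x, n)) # drop the first n values, re-append them at the end
}

h <- letters[1:5]
pr <- data.frame(your_github  = h,
                 review_this1 = cycle_shift(h, 1),
                 review_this2 = cycle_shift(h, 2))
```

Taking n modulo the length means a shift of 6 on a five-element vector behaves like a shift of 1, so the helper does not fail on long shifts.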
Related
I have the following vector:
samples <- c("bl", "ra", "ye", "gp", "dk")
which I would like to add to the dataframe
df <- data.frame(Country = "FR", Name = "Jean", A="",B="",C="",D="",E="",F="",G="",H="",I="",J="",K="",L="",M="",N="",O="",P="",Q="",R="",S="",T="",U="",V="",W="ok",X="ok",Y="ok",Z="ok",A1="ok",B1="ok")
and give the output
  Country Name  A B C D  E F G H  I J K L  M N O P  Q R S T ....
1      FR Jean bl       ra       ye       gp       dk
The aim:
Place elements within the vector into the dataframe that already contains some values.
The first element has to be in column 3.
Subsequent elements have to be in every 4th column after the first element, i.e. columns 7, 11, 15, 19, ... (column 4i-1 for the i-th element).
I was thinking of a for loop that automatically places the elements in every 4th column starting from the first one. Depending on the situation, I may have a much longer vector than the one shown here, so it would be tedious to assign each element to a column by name.
There is no need of a loop. You can directly assign samples to the corresponding columns.
df[, seq_along(samples) * 4 - 1] <- samples
df
#   Country Name  A B C D  E F G H  I J K L  M N O P  Q R S T U V  W  X  Y  Z A1 B1
# 1      FR Jean bl       ra       ye       gp       dk           ok ok ok ok ok ok
First define a target_pos vector containing the positions of your target columns, then iterate over it to assign each element of samples to the corresponding column of df.
Note there are some differences between your input df and your desired output. For example, you don't have column K in your df, but you have it in your desired output.
target_pos <- seq(3, by = 4, length.out = length(samples))
for (i in seq_along(target_pos)) df[, target_pos[i]] <- samples[i]
df
#>   Country Name  A B C D  E F G H  I J L M  N O P Q  R S T U V  W  X  Y  Z A1 B1
#> 1      FR Jean bl       ra       ye       gp       dk         ok ok ok ok ok ok
Data
df<-data.frame(Country = "FR", Name = "Jean", A="",B="",C="",D="",E="",F="",G="",H="",I="",J="",L="",M="",N="",O="",P="",Q="",R="",S="",T="",U="",V="",W="ok",X="ok",Y="ok",Z="ok",A1="ok",B1="ok")
samples=c("bl","ra","ye","gp","dk")
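With a longer vector, the computed positions may run past the last column of the data frame. The sketch below shows one way to guard against that before assigning. It uses a shortened, illustrative df and samples rather than the full data above:

```r
# Shortened, illustrative data: two ID columns followed by 12 empty columns.
df <- data.frame(Country = "FR", Name = "Jean",
                 A = "", B = "", C = "", D = "", E = "", F = "", G = "", H = "",
                 I = "", J = "", K = "", L = "", stringsAsFactors = FALSE)
samples <- c("bl", "ra", "ye")

# Columns 3, 7, 11, ...: one position per sample, four columns apart.
target_pos <- seq(3, by = 4, length.out = length(samples))

# Stop early if the vector needs more columns than the data frame has.
stopifnot(max(target_pos) <= ncol(df))

# Each list element becomes the new content of one target column.
df[target_pos] <- as.list(samples)
```

Using length.out ties the number of positions to the number of samples, so the two cannot drift apart when the vector grows.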
Couldn't find any solution to this question online, but apologies if I missed it.
I have a list of several vectors (all character in this example), of different lengths:
ll <- list(f1 = c("a","b","c"),f2 = c("d","e"),f3 = "f")
I want to convert it into a data.frame that covers all combinations of the list's elements. So the resulting data.frame will be:
data.frame(f1 = rep(ll$f1,2), f2 = rep(ll$f2,3), f3 = rep(ll$f3,6))
Is there any function that achieves that?
expand.grid should work in this case -
expand.grid(ll)
# f1 f2 f3
#1 a d f
#2 b d f
#3 c d f
#4 a e f
#5 b e f
#6 c e f
Another similar alternative is purrr::cross_df, although it is deprecated as of purrr 1.0.0 in favour of tidyr::expand_grid.
purrr::cross_df(ll)
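One detail worth checking with expand.grid is that it converts character vectors to factors by default (stringsAsFactors = TRUE). A small sketch, using the list from the question, that keeps the columns as character and confirms there is one row per combination:

```r
ll <- list(f1 = c("a", "b", "c"), f2 = c("d", "e"), f3 = "f")

# Keep character columns instead of the default factors.
combos <- expand.grid(ll, stringsAsFactors = FALSE)

# The number of rows should equal the product of the vector lengths (3 * 2 * 1).
nrow(combos) == prod(lengths(ll))
```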
I want to return the name of the column, among columns A through E, whose value matches the value in column F. I then want to store that name in a new column G.
For example:
df <- structure(list(A = c(-0.113802816901408, -0.613802816901408,
0.136197183098592, 0.126197183098592, 0.286197183098592), B = c(-0.294595070422536,
-0.504595070422535, 0.125404929577464, 0.135404929577464, 0.275404929577465
), C = c(-0.277065727699531, -0.507065727699531, 0.282934272300469,
0.0729342723004693, 0.122934272300469), D = c(-0.222699530516432,
-0.132699530516432, -0.162699530516432, 0.127300469483568, -0.0126995305164321
), E = c(-0.246845657276995, -0.426845657276995, -0.186845657276995,
0.133154342723005, 0.113154342723004), F = c(-0.222699530516432,
-0.426845657276995, 0.136197183098592, 0.133154342723005, 0.275404929577465
)), row.names = c(NA, 5L), class = "data.frame")
So the vector for column G should end up being: D, E, A, E, B
Ideally, if there are multiple matches (which I don't think my example has), it would be good to send such information to a new column or perhaps to throw an error. This second issue is not as important though.
Compare the first 5 columns with F column and use max.col to get the column number with the same value.
df$G <- names(df)[max.col(df[1:5] == df$F)]
df
# A B C D E F G
#1 -0.1138028 -0.2945951 -0.27706573 -0.22269953 -0.2468457 -0.2226995 D
#2 -0.6138028 -0.5045951 -0.50706573 -0.13269953 -0.4268457 -0.4268457 E
#3 0.1361972 0.1254049 0.28293427 -0.16269953 -0.1868457 0.1361972 A
#4 0.1261972 0.1354049 0.07293427 0.12730047 0.1331543 0.1331543 E
#5 0.2861972 0.2754049 0.12293427 -0.01269953 0.1131543 0.2754049 B
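In case of multiple matches max.col returns a random column number by default. The same happens when no column matches, because every entry in that row is then tied at FALSE. You can handle it by specifying ties.method. See ?max.col for more details.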
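To make ties explicit rather than random, one option is to count the matches per row first. The sketch below uses small made-up data (the column named target plays the role of F) and is only one possible way to flag such rows:

```r
# Illustrative data: row 2 matches target in both column A and column B.
df <- data.frame(A = c(1, 2, 3), B = c(4, 2, 6), target = c(4, 2, 3))

hits   <- df[c("A", "B")] == df$target   # logical matrix of matches
n_hits <- unname(rowSums(hits))          # number of matching columns per row

# ties.method = "first" picks the leftmost matching column deterministically.
df$G <- names(df)[max.col(hits, ties.method = "first")]
df$G[n_hits == 0] <- NA                  # no match: do not report a column name
df$multiple <- n_hits > 1                # flag rows with more than one match
```

If an error is preferred over a flag, stopifnot(all(n_hits == 1)) placed before the assignment would stop on any ambiguous or unmatched row.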
I have two data frames.
The first is structured like this:
code. name.
1111 A B
1122 C D
2122 C D
2133 G H
The other is:
code_2. name.
11 F
21 G
I want to obtain a third data frame that, for each matching code, concatenates the names from the first data frame using " OR " as the separator. The code value I want to keep is the one from the second data frame. It is important that the match is made on the first two digits of the code in the first data frame.
code. name.
11 A B OR C D
21 C D OR G H
Thank you for your suggestions!
You can use aggregate, i.e.
aggregate(name. ~ substr(code., 1, 2), df, paste, collapse = ' OR ')
# substr(code., 1, 2) name.
#1 11 A B OR C D
#2 21 C D OR G H
You can take care of the column names as usual.
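One way to get a clean column name is to compute the two-digit prefix as its own column before aggregating. A minimal sketch, with the first data frame rebuilt from the question and the new column named code_2. after the second data frame (that naming is an assumption):

```r
# First data frame, rebuilt from the question.
df <- data.frame(code. = c(1111, 1122, 2122, 2133),
                 name. = c("A B", "C D", "C D", "G H"),
                 stringsAsFactors = FALSE)

# substr() coerces the numeric code to character before extracting the prefix.
df$code_2. <- substr(df$code., 1, 2)

# Aggregate on the named prefix column; collapse is passed through to paste().
res <- aggregate(name. ~ code_2., data = df, FUN = paste, collapse = " OR ")
```

Because the grouping variable is a real column here, the result has the headers code_2. and name. without any renaming step.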
If you prefer tidyverse, you can try something like:
library(dplyr)
library(stringr)
df %>%
group_by(code. = str_extract(as.character(code.), "^.{2}")) %>%
summarise(name. = paste(name., collapse = " OR "))
code. name.
<chr> <chr>
1 11 A B OR C D
2 21 C D OR G H
It groups by the first two characters of "code." and then combines the "name." values within each group.
Or the same using sub():
df %>%
group_by(code. = sub("^(.{2}).*", "\\1", as.character(code.))) %>%
summarise(name. = paste(name., collapse = " OR "))
Or the same using substring():
df %>%
group_by(code. = substring(as.character(code.), 1, 2)) %>%
summarise(name. = paste(name., collapse = " OR "))
I am self-taught useR so please bear with me.
I have something similar to the following dataset:
individual value
a 0.917741317
a 0.689673689
a 0.846208486
b 0.439198006
b 0.366260159
b 0.689985484
c 0.703381117
c 0.29467743
c 0.252435687
d 0.298108973
d 0.42951805
d 0.011187204
e 0.078516181
e 0.498118235
e 0.003877632
I would like to create a matrix with the values for a in column 1, the values for b in column 2, etc. [I also add a 1 at the bottom of every column for later algebra operations.]
I have tried so far:
for (i in unique(df$individual)) {
values <- subset(df$value, df$individual == i)
m <- cbind(c(values[1:3],1))
}
I get a (4,1) matrix with the values of the last individual. What is missing to make each iteration add a column, so that I end up with as many columns as individuals?
This operation is called "reshaping". There is a base function (reshape), but I find it easier with the reshape2 package:
DF <- read.table(text="individual value
a 0.917741317
a 0.689673689
a 0.846208486
b 0.439198006
b 0.366260159
b 0.689985484
c 0.703381117
c 0.29467743
c 0.252435687
d 0.298108973
d 0.42951805
d 0.011187204
e 0.078516181
e 0.498118235
e 0.003877632", header=TRUE)
DF$id <- 1:3
library(reshape2)
DF2 <- dcast(DF, id ~ individual)
DF2[,-1]
# a b c d e
#1 0.9177413 0.4391980 0.7033811 0.2981090 0.078516181
#2 0.6896737 0.3662602 0.2946774 0.4295180 0.498118235
#3 0.8462085 0.6899855 0.2524357 0.0111872 0.003877632
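If a matrix with the extra row of 1s is the goal, a base R sketch along these lines may also work. It assumes every individual has the same number of values and uses only the first two individuals from the question to keep it short:

```r
# First two individuals from the question's data.
DF <- data.frame(individual = rep(c("a", "b"), each = 3),
                 value = c(0.917741317, 0.689673689, 0.846208486,
                           0.439198006, 0.366260159, 0.689985484),
                 stringsAsFactors = FALSE)

# split() groups the values by individual; sapply() binds the equal-length
# pieces column-wise into a matrix, with a 1 appended to each column.
m <- sapply(split(DF$value, DF$individual), function(v) c(v, 1))
```

The result has one column per individual and one more row than there are values per individual. Unlike the loop in the question, nothing is overwritten between individuals.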