Change the row index of a data frame in R

I'm new to programming in R and I need something simple. I have a data frame like this:
A number
3 1 3
4 1 4
11 2 11
12 2 12
18 3 18
19 3 19
The first column is the row index that R adds by default. I'd like to replace it with the column "number", keeping the column's name. Something like this:
number A
3 1
4 1
11 2
12 2
18 3
19 3
I need to do this because the dataset is large and, as I work with it, the correspondence between the two columns gets lost.

It seems like you want to remove your row names?
df <- data.frame("Colours" = c("Red", "Red", "Green", "Yellow"),
"Number" = c(1,2,3,6))
rownames(df) <- c(1,2,3,6)
df
Colours Number
1 Red 1
2 Red 2
3 Green 3
6 Yellow 6
Setting the row names to NULL removes them, so the rows are simply labelled by their row number (1, 2, 3, ...).
rownames(df) <- NULL
df
Colours Number
1 Red 1
2 Red 2
3 Green 3
4 Yellow 6
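Applied to the data from the question, the same idea can be sketched as follows. This is only an illustration under the assumption that the data frame has the columns A and number shown above and that the row names simply duplicate number; the reordering step is an addition to the answer, not part of it.

```r
# Rebuild the question's data frame; the row names mirror the "number" column.
df <- data.frame(A = c(1, 1, 2, 2, 3, 3),
                 number = c(3, 4, 11, 12, 18, 19))
rownames(df) <- df$number

# Put "number" first, then drop the custom row names so that R falls back
# to the default 1..n labels.
df <- df[, c("number", "A")]
rownames(df) <- NULL
df
```

If the values only exist as row names and not as a column, something like df$number <- as.integer(rownames(df)) before the reordering step would copy them into a column first.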

Related

How to split a dataframe into multiple at the same time based on the value of one column

crop.genos <- data.frame(crop=rep(1:6, each=4),genos=rep(1:4, 6))
crop.genos$crop.genotype <- paste(crop.genos$crop, crop.genos$genos, sep="")
This gives a data frame with three columns: crop, genos and crop.genotype. I want to split it into six separate data frames based on the crop category (as in the example below), keeping all the other columns.
crop genos crop.genotype
1 1 1 11
2 1 2 12
3 1 3 13
Use split:
l <- split(crop.genos, crop.genos$crop)
names(l) <- paste0('df', names(l))
list2env(l, envir = .GlobalEnv)
output
> df1
# crop genos crop.genotype
#1 1 1 11
#2 1 2 12
#3 1 3 13
#4 1 4 14
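As a variation on the answer above, the pieces can also stay in the list returned by split instead of being copied into the global environment. The name by_crop is made up for this sketch; whether a list is more convenient depends on what is done with the pieces afterwards.

```r
crop.genos <- data.frame(crop = rep(1:6, each = 4), genos = rep(1:4, 6))
crop.genos$crop.genotype <- paste0(crop.genos$crop, crop.genos$genos)

# One data frame per crop value, kept together in a named list.
by_crop <- split(crop.genos, crop.genos$crop)
names(by_crop) <- paste0("df", names(by_crop))

by_crop$df1            # the rows with crop == 1
sapply(by_crop, nrow)  # number of rows in each piece
```

Keeping the list makes it straightforward to apply the same operation to every piece with lapply or sapply.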

Create new column with shared ID to randomly link two rows in R

I am using R and working with this sample dataframe.
library(tibble)
library(stats)
set.seed(111)
conditions <- factor(c("1","2","3"))
df_sim <-
  tibble::tibble(StudentID = 1:10,
                 Condition = sample(conditions,
                                    size = 10,
                                    replace = TRUE),
                 XP = stats::rpois(n = 10,
                                   lambda = 15))
This creates the following tibble.
StudentID  Condition  XP
1          2          8
2          3          11
3          3          16
4          3          12
5          1          22
6          3          16
7          1          18
8          3          8
9          2          14
10         1          17
I am trying to create a new column in my dataframe called DyadID. The purpose of this column is to hold a value that is shared by exactly two students in the dataframe. In other words, two students (e.g. Student 1 and Student 9) would share the same value (e.g. 4) in the DyadID column.
However, I only want observations linked together if they share the same Condition value. Condition contains three unique values (1, 2, 3). I want condition 1 observations linked with other condition 1 observations, 2 with 2, and 3 with 3.
Importantly, I'd like the students to be linked together randomly.
Ideally, I would like to stay within the tidyverse as that is what I am most familiar with. However, if that's not possible or ideal, any solution would be appreciated.
Here is a possible outcome I am hoping to achieve.
StudentID  Condition  XP  DyadID
1          2          8   4
2          3          11  1
3          3          16  2
4          3          12  1
5          1          22  3
6          3          16  NA
7          1          18  3
8          3          8   2
9          2          14  4
10         1          17  NA
Note that two students did not receive a pairing, because there was an odd number in condition 1 and condition 3. If there is an odd number, the DyadID can be NA.
Thank you for your help with this!
Use match to get an ID for each Condition, with sample shuffling which ID each condition receives. Note that this gives one ID per condition rather than one per pair of students.
library(dplyr)
df_sim <- df_sim %>% mutate(dyad_id = match(Condition,sample(unique(Condition))))
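To get IDs shared by exactly two students within a condition, one possible approach is to shuffle the students inside each condition and pair them off two at a time. The sketch below uses base R rather than the tidyverse and hard-codes the example table from the question; the helper pair_within and the intermediate names local_pair and key are invented for illustration.

```r
set.seed(111)
df_sim <- data.frame(
  StudentID = 1:10,
  Condition = factor(c(2, 3, 3, 3, 1, 3, 1, 3, 2, 1)),
  XP        = c(8, 11, 16, 12, 22, 16, 18, 8, 14, 17)
)

# For a group of n students: build pair slots 1, 1, 2, 2, ..., mark the
# leftover slot as NA when n is odd, then hand the slots out in random order.
pair_within <- function(n) {
  slot <- ceiling(seq_len(n) / 2)
  if (n %% 2 == 1) slot[n] <- NA
  slot[sample.int(n)]
}

# Pair number within each condition (NA for the unpaired student).
local_pair <- ave(seq_len(nrow(df_sim)), df_sim$Condition,
                  FUN = function(i) pair_within(length(i)))

# Combine condition and pair number into one ID that is unique across conditions.
key <- ifelse(is.na(local_pair), NA, paste(df_sim$Condition, local_pair))
df_sim$DyadID <- match(key, unique(key[!is.na(key)]))
df_sim
```

Every non-missing DyadID then appears exactly twice, always within a single condition, and each condition with an odd number of students leaves one student with NA.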

How to extract a list of columns name based on the means of their data?

I'm pretty new to R and hope I'll make myself clear enough.
I have a table with several columns that are factors. I want to compute a score for each of these columns, then calculate the mean of each score and display the list of columns ranked by their mean scores. Is that possible?
Table would be:
head(musico[,69:73])
AVIS1 AVIS2 AVIS3 AVIS4 AVIS5
1 2 1 2 3 2
2 2 5 2 3 2
3 3 2 5 5 1
4 1 2 5 5 5
5 1 5 1 3 1
6 4 1 4 5 4
I want to make a score for each:
musico$score1<-0
musico$score1[musico$AVIS1==1]<-1
musico$score1[musico$AVIS1==2]<-0.5
then do the mean of each column score: mean of score1, mean of score2, ...:
mean(musico$score1), mean(musico$score2), ...
My goal is to have a list of titles (avis1, avis2,...) ranked by their mean score.
Any advice appreciated !
Here's one way using base although it is somewhat unclear what you want. What does score1 have to do with AVIS1? I think you may be missing some of the data from musico.
Based on the example provided, here's a base R solution. vapply loops through the data.frame and produces the mean for each column. Then the stack and order are only there to make the output a dataframe that looks nice.
music <- read.table(text = "
AVIS1 AVIS2 AVIS3 AVIS4 AVIS5
1 2 1 2 3 2
2 2 5 2 3 2
3 3 2 5 5 1
4 1 2 5 5 5
5 1 5 1 3 1
6 4 1 4 5 4", header = TRUE)
means <- vapply(music, mean, 1)
stack(means[order(means, decreasing = TRUE)])
values ind
4 4.000000 AVIS4
3 3.166667 AVIS3
2 2.666667 AVIS2
5 2.500000 AVIS5
1 2.166667 AVIS1
This is how I would do it: first introduce a scores vector to be used as a lookup. I assume that the scores decrease in steps of 0.5 and that the number of scores needed equals the maximum number of levels found in your columns (i.e. 6 seen in AVIS1).
Then, using tidyr, you can reorganise your data set so that you have two variables (i.e. AVIS and Value) containing the respective levels. Next, add a score variable with the mutate function from dplyr, where the position of each score in the scores vector matches the value in the Value variable. From here you can find the mean scores for the AVIS levels, arrange them accordingly and put them in a list.
music <- read.table(text = "
AVIS1 AVIS2 AVIS3 AVIS4 AVIS5
1 2 1 2 3 2
2 2 5 2 3 2
3 3 2 5 5 1
4 1 2 5 5 5
5 1 5 1 3 1
6 4 1 4 5 4", header = TRUE) # your data
scores <- seq(1, by = -0.5, length.out = 6) # vector of scores
library(tidyr)
library(dplyr)
music2 <- music %>%
gather(AVIS, Value) %>% # here you tidy the data
mutate(score = scores[Value]) %>% # match score to value
group_by(AVIS) %>% # group AVIS levels
summarise(score.mean = mean(score)) %>% # find mean scores for AVIS levels
arrange(desc(score.mean))
list <- list(AVIS = music2$AVIS) # here is the list
> list$AVIS
[1] "AVIS1" "AVIS5" "AVIS2" "AVIS3" "AVIS4"
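The same lookup idea can also be sketched in base R without reshaping. This assumes the scores vector from the answer above (1, 0.5, 0, ...) and the six example rows; the names score_means and ranked are made up for this sketch.

```r
music <- data.frame(
  AVIS1 = c(2, 2, 3, 1, 1, 4),
  AVIS2 = c(1, 5, 2, 2, 5, 1),
  AVIS3 = c(2, 2, 5, 5, 1, 4),
  AVIS4 = c(3, 3, 5, 5, 3, 5),
  AVIS5 = c(2, 2, 1, 5, 1, 4)
)

# Lookup vector: level 1 scores 1, level 2 scores 0.5, and so on (assumed).
scores <- seq(1, by = -0.5, length.out = 6)

# Mean score per column, found by indexing the lookup with each column's values.
score_means <- vapply(music, function(col) mean(scores[col]), numeric(1))

# Column names ranked from highest to lowest mean score.
ranked <- names(sort(score_means, decreasing = TRUE))
ranked
# [1] "AVIS1" "AVIS5" "AVIS2" "AVIS3" "AVIS4"
```

Because the assumed scores fall as the level rises, the ranking is the reverse of the ranking by raw column means.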

Create a new level column based on unique row sets

I want to create a new column with new variables (preferably letters) to count the frequency of each set later on.
Let's say I have a data frame called datatemp, which looks like this:
datatemp = data.frame(colors=rep( c("red","blue"), 6), val = 1:6)
colors val
1 red 1
2 blue 2
3 red 3
4 blue 4
5 red 5
6 blue 6
7 red 1
8 blue 2
9 red 3
10 blue 4
11 red 5
12 blue 6
And I can see my unique row sets where colors and val columns have identical inputs together, such as:
unique(datatemp[c("colors","val")])
colors val
1 red 1
2 blue 2
3 red 3
4 blue 4
5 red 5
6 blue 6
What I really want to do is to create a new column in the same data frame where each unique set of row above has a level, such as:
colors val freq
1 red 1 A
2 blue 2 B
3 red 3 C
4 blue 4 D
5 red 5 E
6 blue 6 F
7 red 1 A
8 blue 2 B
9 red 3 C
10 blue 4 D
11 red 5 E
12 blue 6 F
I know that's very basic; however, I couldn't come up with a useful idea for a huge dataset.
To make the question clearer, here is another representation of the desired output:
colA colB newcol
10 11 A
12 15 B
10 11 A
13 15 C
Values in the new column should be based on uniqueness of first two columns before it.
www's solution maps the unique values in your val column to letters in the freq column. If you want to create a factor variable for each unique combination of colors and val, you could do something along these lines:
library(plyr)
datatemp = data.frame(colors=rep( c("red","blue"), 6), val = 1:6)
datatemp$freq <- factor(paste(datatemp$colors, datatemp$val), levels=unique(paste(datatemp$colors, datatemp$val)))
datatemp$freq <- mapvalues(datatemp$freq, from = levels(datatemp$freq), to = LETTERS[1:length(levels(datatemp$freq))])
I first create a new factor variable for each unique combination of val and colors, and then use plyr::mapvalues to rename the factor levels to letters.
We can concatenate the val and colors columns and convert the result to a factor, then replace the factor levels with letters.
datatemp$Freq <- as.factor(paste(datatemp$val, datatemp$colors, sep = "_"))
levels(datatemp$Freq) <- LETTERS[1:length(levels(datatemp$Freq))]
datatemp
# colors val Freq
# 1 red 1 A
# 2 blue 2 B
# 3 red 3 C
# 4 blue 4 D
# 5 red 5 E
# 6 blue 6 F
# 7 red 1 A
# 8 blue 2 B
# 9 red 3 C
# 10 blue 4 D
# 11 red 5 E
# 12 blue 6 F
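For the second example in the question (colA, colB, newcol), a similar result can be sketched with match on a pasted key, which labels the combinations in order of first appearance. The data frame d and the variable key are invented here, and LETTERS only covers up to 26 distinct combinations, so a larger dataset would need numeric IDs or another labelling scheme.

```r
d <- data.frame(colA = c(10, 12, 10, 13),
                colB = c(11, 15, 11, 15))

# One string per row identifying its (colA, colB) combination.
key <- paste(d$colA, d$colB, sep = "_")

# Position of each key among the unique keys, translated to a letter.
d$newcol <- LETTERS[match(key, unique(key))]
d
#   colA colB newcol
# 1   10   11      A
# 2   12   15      B
# 3   10   11      A
# 4   13   15      C
```

Dropping the LETTERS lookup and keeping match(key, unique(key)) gives integer IDs with no limit on the number of combinations.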

Assign ID across 2 columns of variable

I have a data frame in which each individual (row) has two data points per variable.
Example data:
df1 <- read.table(text = "IID L1.1 L1.2 L2.1 L2.2
1 1 38V1 38V1 48V1 52V1
2 2 36V1 38V2 50V1 48Y1
3 3 37Y1 36V1 50V2 48V1
4 4 38V2 36V2 52V1 50V2",
stringsAsFactors = FALSE, header = TRUE)
I have many more columns than this in the full dataset and would like to recode these values to label unique identifiers across the two columns. I know how to get identifiers and relabel a single column from previous questions (Creating a unique ID and How to assign a unique ID number to each group of identical values in a column) but I don't know how to include the information for two columns, as R identifies and labels factors per column.
Ultimately I want something that would look like this for the above data:
(df2)
IID L1.1 L1.2 L2.1 L2.2
1 1 1 1 1 4
2 2 2 4 2 5
3 3 3 2 3 1
4 4 1 5 4 3
It doesn't really matter what the numbers are, as long as they indicate unique values across both columns. I've tried creating a function based on the output from:
unique(df1[,1:2])
but am struggling as this still looks at unique entries per column, not across the two.
Something like this would work...
pairs <- (ncol(df1)-1)/2
for(i in 1:pairs){
refs <- unique(c(df1[,2*i],df1[,2*i+1]))
df1[,2*i] <- match(df1[,2*i],refs)
df1[,2*i+1] <- match(df1[,2*i+1],refs)
}
df1
IID L1.1 L1.2 L2.1 L2.2
1 1 1 1 1 4
2 2 2 4 2 5
3 3 3 2 3 1
4 4 4 5 4 3
You could reshape it to long format, assign the groups and then recast it to wide:
library(data.table)
df_m <- melt(as.data.table(df1), id.vars = "IID") # convert first so data.table's melt is used
df_m[, id := .GRP, by = .(sub("\\.[0-9]+$", "", variable), value)] # group by column prefix (L1, L2) and value
dcast(df_m, IID ~ variable, value.var = "id")
# IID L1.1 L1.2 L2.1 L2.2
#1 1 1 1 6 9
#2 2 2 4 7 10
#3 3 3 2 8 6
#4 4 4 5 9 8
This should also be easily expandable to more groups of columns, i.e. if you also have L3.1 and L3.2 columns it should work with those as well.
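A base R variation that groups columns by their name prefix, so that it is not tied to exactly two columns per variable, might look like the sketch below. It assumes column names of the form prefix.number as in the example; value_cols, prefix and df2 are names chosen for this illustration.

```r
df1 <- data.frame(
  IID  = 1:4,
  L1.1 = c("38V1", "36V1", "37Y1", "38V2"),
  L1.2 = c("38V1", "38V2", "36V1", "36V2"),
  L2.1 = c("48V1", "50V1", "50V2", "52V1"),
  L2.2 = c("52V1", "48Y1", "48V1", "50V2"),
  stringsAsFactors = FALSE
)

value_cols <- setdiff(names(df1), "IID")
prefix <- sub("\\.[0-9]+$", "", value_cols)   # "L1" "L1" "L2" "L2"

df2 <- df1
for (p in unique(prefix)) {
  cols <- value_cols[prefix == p]
  # All distinct values seen in any column of this group, in order of appearance.
  refs <- unique(unlist(df1[cols], use.names = FALSE))
  # Replace each value by its position in that shared reference vector.
  df2[cols] <- lapply(df1[cols], match, table = refs)
}
df2
```

The IDs restart at 1 for every prefix, as in the loop-based answer above, so they are only comparable within one group of columns.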
