Assigning complex values to character elements of data frame in R

My data frame has three character columns containing "A", "B", and "C" (the order can vary between data frames). I want to replace them with the values A = 1+0i, B = 2+3i, and C = 3+2i. I tried as.complex(factor(col1)), and the same for columns two and three, but that makes all three columns equal to 1+0i.
col1 <- c("A","A", "A")
col2 <- c("B", "B","B")
col3 <- c("C","C","C")
df <- data.frame(col1,col2,col3)
print(df)
A= 1+0i
B=2+3i
C=3+2i
df2<- transform(df, col1=as.complex(as.factor(col1)),col2=as.complex(as.factor(col2)),col3=as.complex(as.factor(col3)))
sapply(df2,class)
View(df2)

So this is a weird thing you're doing. You have a column of strings, letters like "A" and "B". Then you have objects with the same names, A = 1 + 0i, etc. Normally we don't treat object names as "data", but you're sort of mixing the two here. The solution I'd propose is to make everything data: combine your A, B, and C values into a vector, and give the vector names accordingly. Then we can replace the values in the data frame with the corresponding values from our named vector:
vec = c(A, B, C)
names(vec) = c("A", "B", "C")
df[] = lapply(df, \(x) vec[x])
df
# col1 col2 col3
# 1 1+0i 2+3i 3+2i
# 2 1+0i 2+3i 3+2i
# 3 1+0i 2+3i 3+2i
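One likely reason the original attempt returned 1+0i everywhere: each column holds a single distinct letter, so its factor has one level, and as.complex() converts the underlying integer code (1) rather than looking up the objects A, B, or C. The sketch below shows that, then repeats the named-vector lookup under two assumptions that are not in the answer above: the columns may be factors (as with stringsAsFactors = TRUE), so they are converted with as.character() first, and unname() is used to drop the letter names from the result.

```r
# Lookup table: names are the letters, values are the complex numbers
vec <- c(A = 1+0i, B = 2+3i, C = 3+2i)

# Assumption: the columns are stored as factors
df <- data.frame(col1 = rep("A", 3), col2 = rep("B", 3), col3 = rep("C", 3),
                 stringsAsFactors = TRUE)

# A one-level factor has integer code 1, so this is 1+0i for every row
codes_only <- as.complex(df$col2)

# Index the lookup table by the letters themselves, not by the factor codes
df[] <- lapply(df, function(x) unname(vec[as.character(x)]))
sapply(df, class)
```

Converting with as.character() is harmless when the columns are already character, so the same line should work for both storage types.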

Related

Finding the Frequency of Values Across Character Strings

I have three different character vectors of different lengths. Some have overlapping values, others have unique values. These values appear a different number of times in each vector. For example,
A <- c("A", "A", "B")
B <- c("A", "B", "C", "D")
C <- c("B", "A", "C", "E", "F")
I want to know:
How many unique values there are in total.
What those values are.
The frequency of each value across all lists, and I want to be able to filter it (e.g. values that appear less than or equal to two times across all lists).
Edit to clarify the above point: I want to know how many times a value comes up across all lists. For example, I want to know that the value A comes up 4 times and the value F only once.
How do I go about doing this? I can't find a stringr command to do this and I am new to working with strings.
#Unique items
> unique(A)
[1] "A" "B"
#count of unique items
> length(unique(A))
[1] 2
#frequency of each unique value
df_A <- data.frame(A = A) # prepare a data frame
> dplyr::mutate(dplyr::group_by(df_A, A), freq = dplyr::n())
# A tibble: 3 x 2
# Groups: A [2]
A freq
<chr> <int>
1 A 2
2 A 2
3 B 1
#filter
df_A <- dplyr::mutate(dplyr::group_by(df_A, A), freq = dplyr::n())
> df_A$A[df_A$freq < 2]
[1] "B"
EDIT
#unique items across all lists
> unique(c(A, B, C))
[1] "A" "B" "C" "D" "E" "F"
#Freq across all lists
tabulate(as.factor(c(A,B,C)))
[1] 4 3 2 1 1 1
#OR
> table(c(A, B, C))
A B C D E F
4 3 2 1 1 1
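The filtering step from the question can be sketched on top of the combined table. This is a minimal base R example; the names freq and rare are chosen here for illustration, and the threshold of two comes from the question.

```r
A <- c("A", "A", "B")
B <- c("A", "B", "C", "D")
C <- c("B", "A", "C", "E", "F")

# Frequency of every value across all three vectors
freq <- table(c(A, B, C))

# Number of distinct values, and the values themselves
n_unique <- length(freq)
values <- names(freq)

# Values that appear two times or fewer across all vectors
rare <- names(freq)[as.vector(freq) <= 2]
```

Subsetting the table itself, as in freq[freq <= 2], keeps the counts alongside the names.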
You can use the following steps:
To find the unique elements:
uq <- unique(A)
To recode the elements as numbers (using the car package):
library(car)
A1 <- recode(A, "'A' = 1; 'B' = 2")
# table(A1) gives the frequencies of all the elements; this returns the most frequent one(s)
names(which(table(A1) == max(table(A1))))
tab <- sort(table(A1)) # sort the result in ascending order of frequency
To count how many unique values there are in total:
length(unique(A1))

How to find common variables in different data frames?

I have several data frames with similar (but not identical) series of variables (columns). I want to find a way for R to tell me what are the common variables across different data frames.
Example:
a <- c(1, 2, 3)
b <- c(4, 5, 6)
c <- c(7, 8, 9)
df1 <- data.frame(a, b, c)
b <- c(1, 3, 5)
c <- c(2, 4, 6)
df2 <- data.frame(b, c)
With df1 and df2, I would want some way for R to tell me that the common variables are b and c.
1) For 2 data frames:
intersect(names(df1), names(df2))
## [1] "b" "c"
To get the names that are in df1 but not in df2:
setdiff(names(df1), names(df2))
1a) and for any number of data frames (i.e. get the names common to all of them):
L <- list(df1, df2)
Reduce(intersect, lapply(L, names))
## [1] "b" "c"
2) An alternative is to use duplicated since the common names will be the ones that are duplicated if we concatenate the names of the two data frames.
nms <- c(names(df1), names(df2))
nms[duplicated(nms)]
## [1] "b" "c"
2a) To generalize that to n data frames use table and look for the names that occur the same number of times as data frames:
L <- list(df1, df2)
tab <- table(unlist(lapply(L, names)))
names(tab[tab == length(L)])
## [1] "b" "c"
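As a sketch of the Reduce approach with more than two data frames, the call can be wrapped in a small helper. The function name common_names and the third data frame df3 are made up for this illustration.

```r
# Hypothetical helper: names shared by every data frame passed in
common_names <- function(...) {
  Reduce(intersect, lapply(list(...), names))
}

df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))
df2 <- data.frame(b = c(1, 3, 5), c = c(2, 4, 6))
df3 <- data.frame(c = c(0, 0, 0), d = c(1, 1, 1))

shared_two <- common_names(df1, df2)
shared_three <- common_names(df1, df2, df3)
```

Note that the table-based variant counts occurrences, so it assumes column names are not repeated within a single data frame; intersect does not depend on that.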
Use intersect:
intersect(colnames(df1),colnames(df2))
OR
We can also check for the colname using %in%:
colnames(df1)[colnames(df1) %in% colnames(df2)]
Output:
[1] "b" "c"

Split a column of character vectors and return a list

I have the following dataframe:
df <- data.frame(Sl.No = c(1:6),
Variable = c('a', 'a,b', 'a,b,c', 'b', 'c', 'b,c'))
Sl.No Variable
1 a
2 a,b
3 a,b,c
4 b
5 c
6 b,c
I want to separate the unique values in the variable column as list
myList <- c("a", "b", "c")
I have tried the following code:
separator <- function(x) strsplit(x, ",")[[1]][[1]]
unique(sapply(df$Variable, separator))
This however gives me the following output:
"a"
I request some help. I have searched but seem unable to find an answer to this.
We can split the Variable column at "," and get all the values and select only the unique ones.
unique(unlist(strsplit(df$Variable, ",")))
#[1] "a" "b" "c"
If the Variable column is a factor, convert it to character before using strsplit.
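A minimal sketch of the factor case, assuming the column was created with stringsAsFactors = TRUE. The trimws() call is an extra precaution for entries such as "a, b" and is not part of the answer above.

```r
df <- data.frame(Sl.No = 1:6,
                 Variable = c("a", "a,b", "a,b,c", "b", "c", "b,c"),
                 stringsAsFactors = TRUE)

# strsplit() needs character input, so convert the factor first
parts <- strsplit(as.character(df$Variable), ",", fixed = TRUE)

# Flatten the list, strip stray whitespace, and keep each value once
myList <- unique(trimws(unlist(parts)))
```

The question's own attempt returned only "a" because [[1]][[1]] keeps just the first piece of the first split string.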

Create a ordered list in a dataframe [duplicate]

This question already has answers here:
Sorting each row of a data frame [duplicate]
(2 answers)
Row wise Sorting in R
(2 answers)
Row-wise sort then concatenate across specific columns of data frame
(2 answers)
Closed 5 years ago.
I have the following data frame:
col1 <- c("a", "b", "c")
col2 <- c("c", "a", "d")
col3 <- c("b", "c", "a")
df <- data.frame(col1,col2,col3)
I want to create a new column in this data frame that has, for each row, the ordered list of the columns col1, col2, col3. So, for the first row it would be a list like "a", "b", "c".
The way I'm handling it is to create a loop but since I have 50k rows, it's quite inefficient, so I'm looking for a better solution.
rown <- nrow(df)
i = 0
while(i<rown){
  i = i +1
  col1 <- df$col1[i]
  col2 <- df$col2[i]
  col3 <- df$col3[i]
  col1 <- as.character(col1)
  col2 <- as.character(col2)
  col3 <- as.character(col3)
  list1 <- c(col1, col2, col3)
  list1 <- list1[order(sapply(list1, '[[', 1))]
  a <- list1[1]
  b <- list1[2]
  c <- list1[3]
  df$col.list[i] <- paste(a, b, c, sep = " ")
}
Any ideas on how to make this code more efficient?
EDIT: the other question is not relevant in my case since I need to paste the three columns after sorting each row, so it's the paste statement that is dynamic, I'm not trying to change the data frame by sorting.
Expected output:
col1 col2 col3 col.list
a c b a b c
b a c a b c
c d a a c d
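One possible way to avoid the explicit loop, offered as a sketch rather than a benchmarked solution: apply() over the rows, sorting and pasting each one. It assumes the three columns can be treated as character and that the default sort order is acceptable.

```r
df <- data.frame(col1 = c("a", "b", "c"),
                 col2 = c("c", "a", "d"),
                 col3 = c("b", "c", "a"))

cols <- c("col1", "col2", "col3")

# For each row: sort the three values, then paste them into one string
df$col.list <- unname(apply(df[cols], 1, function(r) paste(sort(r), collapse = " ")))
```

apply() still iterates row by row internally, but it replaces the repeated per-row assignments into the data frame with a single column assignment.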

Using row-wise column indices in a vector to extract values from data frame [duplicate]

This question already has an answer here:
Get the vector of values from different columns of a matrix
(1 answer)
Closed 5 years ago.
Using a vector of column positional indexes such as:
> i <- c(3,1,2)
How can I use the index to extract the 3rd value from the first row of a data frame, the 1st value from the second row, the 2nd value from the third row, etc.?
For example, using the above index and:
> dframe <- data.frame(x=c("a","b","c"), y=c("d","e","f"), z=c("g","h","i"))
> dframe
x y z
1 a d g
2 b e h
3 c f i
I would like to return:
> [1] "g", "b", "f"
Just use matrix indexing, like this:
dframe[cbind(seq_along(i), i)]
# [1] "g" "b" "f"
The cbind(seq_along(i), i) part creates a two column matrix of the relevant row and column that you want to extract.
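To make the index matrix concrete, the sketch below builds it as a separate object and applies it to the data converted to a matrix. The names idx and picked are introduced here for illustration only.

```r
dframe <- data.frame(x = c("a", "b", "c"),
                     y = c("d", "e", "f"),
                     z = c("g", "h", "i"))
i <- c(3, 1, 2)

# One (row, column) pair per element to extract:
# (1, 3), (2, 1), (3, 2)
idx <- cbind(row = seq_along(i), col = i)

# A two-column numeric matrix used as an index picks one cell per pair
picked <- as.matrix(dframe)[idx]
```

Because the data frame is converted to a matrix, columns of mixed types would be coerced to a common type (typically character) before extraction.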
How about this:
Df <- data.frame(
x=c("a","b","c"),
y=c("d","e","f"),
z=c("g","h","i"))
##
i <- c(3,1,2)
##
index2D <- function(v = i, DF = Df){
  sapply(seq_along(v), function(X){
    DF[X, v[X]]
  })
}
##
> index2D()
[1] "g" "b" "f"
