SparkR: Assign values of a column with condition - r

I want to replace values of a column with a certain condition.
Example of R data frame:
df <- data.frame(id=c(1:7),value=c("a", "b", "c", "d", "e", "c", "c"))
I want to replace values "c" and "d", in column value by "e".
In R, it can be done this way
df[df$value %in% c("c","d"),]$value <- "e"
I tried to do the same thing in SparkR. I tried the ifelse and when functions, but neither gave me the desired result.
Has anyone run into the same issue?

The first comment by mtoto works well (with Spark 3.0.1) and should be turned into an answer and accepted.
df$value <- ifelse(df$value %in% c("c","d"), "e", df$value)
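Another valid, slightly different method to replace strings in a column is regexp_replace. Note that the call below only rewrites "c" (including any "c" inside a longer string), so "d" would need its own pattern: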
df$value <- regexp_replace(df$value, "c", "e")

Related

Combining multiple lists into data frame w/ two columns: one for elements of all the lists, and one that has the name of the origin list

Sorry for the very basic question, but I imagine there's a very easy way to do what I want to do and I'm drawing a blank.
I have three basic lists in R, for example:
list_1 = c("A", "B", "C")
list_2 = c("D", "E", "F")
list_3 = c("G", "H", "I")
I want to combine these into a data frame with two columns, the first that has the elements of all of the lists -- "A", "B", "C", "D", "E", "F", "G", "H", "I" -- and the second that has the name of the list where the element was originally located -- "list_1", "list_1", "list_1", "list_2", "list_2", "list_2", "list_3", "list_3", "list_3."
I've tried various of the classic merging functions (rbind, bind_rows, append, etc.) but none seem to do specifically what I'm looking for. Hoping someone has the magic solution!
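One way this could be done in base R, as a minimal sketch: collect the vectors in a named list, flatten the values, and repeat each list name once per element. The names all_lists, combined, element, and source_list are illustrative choices, not part of the question.

```r
list_1 <- c("A", "B", "C")
list_2 <- c("D", "E", "F")
list_3 <- c("G", "H", "I")

# Put the vectors in a named list so each one carries its origin name.
all_lists <- list(list_1 = list_1, list_2 = list_2, list_3 = list_3)

combined <- data.frame(
  # All elements, in list order, without the auto-generated names.
  element = unlist(all_lists, use.names = FALSE),
  # Each list name repeated as many times as that list has elements.
  source_list = rep(names(all_lists), lengths(all_lists)),
  stringsAsFactors = FALSE
)

print(combined)
```

Because the names are repeated with lengths(), the same sketch should also cope with lists of unequal length.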

How to fix a subsetting issue in R

I am trying to subset my dataframe, but when I do, some of the rows I expect are left behind.
The first line of code below gives me a dataframe that has only 2048 obs, and after the second line I still have COW, Negative Control, and Positive Control in the subset.
Controls_data <- subset(data_all, SampleID == c('COW', 'Negative Control', 'Positive Control'))
Sample_data <- subset(data_all, SampleID != c("COW", "Negative Control", "Positive Control"))
I should have 6,144 in the Controls_data. I double checked this in excel because I thought that maybe they were spelled differently or had spaces.
As @arg0naut and @Gregor both point out, your problem is that == uses R's standard recycling rules and then does an element-by-element comparison. That is not what you want here.
Compare the outputs of the following lines of code:
letters == c("c", "e")
letters %in% c("c", "e")
letters == c("c", "e", "d")
Notice the warning in the last case. In your case, the length of the left-hand side happens to be a multiple of the length of the right-hand side, so you are not warned.
You could also use the match function in your case:
match(c("c", "e", "d"), letters)
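Applied to the subsetting in the question, the %in% version might look like the sketch below. The toy data_all and its reading column are made up for illustration; only the SampleID values come from the question.

```r
# Hypothetical stand-in for the real data_all.
data_all <- data.frame(
  SampleID = c("COW", "S1", "Negative Control", "S2", "Positive Control", "S3"),
  reading = 1:6,
  stringsAsFactors = FALSE
)

controls <- c("COW", "Negative Control", "Positive Control")

# %in% tests every row against the whole set, with no recycling.
Controls_data <- subset(data_all, SampleID %in% controls)
Sample_data <- subset(data_all, !(SampleID %in% controls))

print(Controls_data)
print(Sample_data)
```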

h2o.relevel to set an increasing order of a factor in R H2O

I used h2o.relevel to reorder the levels of a factor df$x. But when I tried to get the min or max using h2o.which_min(df$x) and h2o.which_max, the output was: NAN. This tells me that h2o.relevel does not impose an increasing order on the levels.
Example:
x: factor w/4 levels "B" "D" "A" "C". df is the dataframe.
I tried this: With h2o.relevel(df$x, levels = c("A", "B", "C", "D")), I'm able to rearrange the levels TO "A", "B", "C", "D", but A is not the minimum and D is not the maximum. h2o.which_min(df$x) and h2o.which_max return NAN.
How can I make A the min value and D the max value? Please help. Thank you
Enums (aka factors, aka categoricals) in H2O are not ordinal.
So it's not possible to do comparisons in this way.
If you really want to do this, I recommend duplicating the column so that the original remains a factor and the duplicate is an integer.
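The idea of keeping the factor and adding an integer copy can be illustrated in plain R, as a sketch only; this uses a base data.frame rather than an H2OFrame, and the column name x_rank and the vector desired_order are assumptions made for the example.

```r
# Factor with the level order from the question: "B" "D" "A" "C".
df <- data.frame(x = factor(c("B", "D", "A", "C"), levels = c("B", "D", "A", "C")))

# The ordering we want to impose for min/max purposes.
desired_order <- c("A", "B", "C", "D")

# Integer duplicate: position of each value in the desired ordering.
df$x_rank <- match(as.character(df$x), desired_order)

# Rows holding the smallest ("A") and largest ("D") values.
min_row <- which.min(df$x_rank)
max_row <- which.max(df$x_rank)
print(df[c(min_row, max_row), ])
```

The original column stays a factor, while the integer duplicate is the one that min and max style functions operate on.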

How to select columns from R dataframe?

I know we can extract specific columns from an R dataframe with df[,c("A","B","E")]. However, there are so many columns I want to pick out that I cannot type them one by one. I have a dataframe B that contains all the column headers that I want to extract from dataframe A. How can I extract columns from dataframe A based on the headers I put in dataframe B?
I tried A[,B[, 1]] but I got "incorrect number of dimensions". I got the same error when I tried to print B[, 1].
With dput(B) I got:
c("A", "B", "C",
"D", "E", "F",
"G")
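The dput output shows that B is a plain character vector rather than a dataframe, which would explain why B[, 1] fails: a vector has no dimensions. Under that reading, B can be used directly as the column index. A minimal sketch, with a made-up dataframe A whose values and extra columns H and I are purely illustrative:

```r
# Hypothetical dataframe with more columns than we want to keep.
A <- data.frame(A = 1:3, B = 4:6, C = 7:9, D = 10:12, E = 13:15,
                F = 16:18, G = 19:21, H = 22:24, I = 25:27)

# B as shown by dput(): a character vector of column names.
B <- c("A", "B", "C", "D", "E", "F", "G")

# Index the columns by name; drop = FALSE keeps a dataframe even for one column.
wanted <- A[, B, drop = FALSE]

# If some names in B might be absent from A, keep only the ones that exist.
safe <- A[, intersect(B, names(A)), drop = FALSE]

print(names(wanted))
```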

Comparing character vectors in R to find unique and/or missing values

I have two character vectors, x and y.
x <- c("a", "b", "c", "d", "e", "f", "g")
y <- c("a", "c", "d", "e", "g")
The values inside x do not ever repeat (i.e., they are all unique). The same goes for vector y. My question is, how can I get R to compare the two vectors, and then tell me which elements are missing from y with respect to x? Otherwise stated, I want R to tell me that "b" and "f" are missing from y.
(Note, in my real data, x and y each contain a few thousand observations, which is why I would like to do this programmatically. There is likely a very simple answer, but I wasn't sure what to search for in the R help files).
Thanks to anyone who can help!
setdiff(x,y)
Will do the job for you.
> x[!x %in% y]
[1] "b" "f"
or:
> x[-match(y,x)]
[1] "b" "f"
I think this should work:
x[!(x %in% y)]
First it checks for all x that are not in y, then it uses that as an index on the original.
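The approaches above agree on the example data, but they can differ at the edges. The sketch below checks one such case, an empty y, where the negative-index form with match returns nothing instead of all of x. The variable names via_setdiff, via_in, and via_match are only for illustration.

```r
x <- c("a", "b", "c", "d", "e", "f", "g")
y <- character(0)  # edge case: nothing to compare against

via_setdiff <- setdiff(x, y)      # every element of x is "missing" from y
via_in      <- x[!(x %in% y)]     # same result via logical indexing
via_match   <- x[-match(y, x)]    # match() gives integer(0), so nothing is selected

print(via_setdiff)
print(via_in)
print(via_match)
```

For that reason setdiff or the %in% form may be the safer default when y could be empty.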
