ggpairs only plotting 1 of 5 plots then error - r

I am getting the error below when trying to plot the dat data frame:
library(GGally)
library(ggplot2)
dat = data.frame(a=rnorm(5) , b= rnorm(5) ,c =rnorm(5) , d=rnorm(5) , e= c(1,2,3,4,5))
dat
a b c d e
1 0.21444531 1.9972134 2.1988103 -0.47624689 1
2 -0.32468591 0.6007088 1.3124130 -0.78860284 2
3 0.09458353 -1.2512714 -0.2651451 -0.59461727 3
4 -0.89536336 -0.6111659 0.5431941 1.65090747 4
5 -1.31080153 -1.1854801 -0.4143399 -0.05402813 5
ggpairs(dat ,mapping=aes(color =e),upper=list(continuous=wrap("cor",size=2)), columns = c("a","b","c","d"))
Error:
Error in $<-.data.frame(tmp, "label", value = ": ") :
replacement has 1 row, data has 0
I would like to color the data points using column "e"
Any ideas?

If you factorize e then it runs:
dat$e <- factor(dat$e)
ggpairs(dat,mapping=aes(color=e),upper=list(continuous=wrap("cor",size=2)), columns = c("a","b","c","d"))
But that is a pretty ugly figure, not to mention a useless comparison.
If you eliminate the mapping then the code also runs fine:
ggpairs(dat,upper=list(continuous=wrap("cor",size=2)), columns = c("a","b","c","d"))


Partitioning Data creates unexpected results

I am trying to partition my data to a 60% Training and 40% Test Set using the following code.
split <- sample.split(divdat, SplitRatio = 0.6)
split
train.div <- subset(divdat, split == "TRUE")
test.div <- subset(divdat, split == "FALSE")
However, when using this code it splits my data as if it were 50/50. I have two hundred observations and I get 100 observations in each set. Any ideas what I am doing wrong here?
sample.split() does not split by row; it splits a vector of labels. Change the first argument of sample.split() to the column that stores your labels, and you will then get a 60/40 ratio of training/test sets. For example:
library(caTools)
divdat <- data.frame(id = 1:10, chars = letters[1:10], labels = c("X", "Y"))
split <- sample.split(divdat$labels, SplitRatio = 0.6)
train.div <- subset(divdat, split == TRUE) # split is logical, so compare with TRUE rather than "TRUE"
test.div <- subset(divdat, split == FALSE)
train.div
test.div
Output:
> train.div
id chars labels
2 2 b Y
3 3 c X
5 5 e X
6 6 f Y
9 9 i X
10 10 j Y
> test.div
id chars labels
1 1 a X
4 4 d Y
7 7 g X
8 8 h Y
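If there is no label column to stratify on, a plain row-wise split may be all that is needed. The following is only a minimal sketch using base R's sample(); the divdat data frame and its columns here are made up for illustration, and the seed is arbitrary.

```r
# Hypothetical data: 200 rows, as in the question
set.seed(42)
divdat <- data.frame(id = 1:200, value = rnorm(200))

# Pick 60% of the row indices at random for the training set
n_train <- round(0.6 * nrow(divdat))
train_idx <- sample(seq_len(nrow(divdat)), size = n_train)

train.div <- divdat[train_idx, ]   # rows that were drawn
test.div  <- divdat[-train_idx, ]  # all remaining rows

nrow(train.div)  # 120
nrow(test.div)   # 80
```

Unlike sample.split(), this does not try to keep the label proportions equal in both sets, so it is only a reasonable choice when that does not matter.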

Fill series in R

I want to update column 2 so that the value pairs become (a,1)(b,1)(c,1) and (d,2)(e,2)(f,2) and (g,3)(h,3)(i,3) and so on. How do I loop through?
Here is the sample data frame:
data_set <- as.data.frame(matrix(nrow=9))
data_set$column1_set1 <- c("a","b","c","d","e","f","g","h","i")
data_set$column2_set1 <- c(0,0,0,0,0,0,0,0,0)
data_set <- data_set[,-1]
data_set <- data.frame(column1_set1 = letters[1:9],
column2_set1 = rep(1:3, each=3))
With the given data set, you can use this to update column 2 with the pairs a,1 and so on.
Paste each value of column1_set1, a comma, and rep(1:3, each=3) together:
data_set$column2_set1 =paste0(data_set$column1_set1,",",rep(1:3, each=3))
===
You could also use mutate from dplyr:
library(dplyr)
data_set %>%
mutate(column2_set1 = paste0(column1_set1,",",rep(1:3, each=3)))
output :
column1_set1 column2_set1
1 a a,1
2 b b,1
3 c c,1
4 d d,2
5 e e,2
6 f f,2
7 g g,3
8 h h,3
9 i i,3
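If the data frame can have any number of rows, the group number can be computed from the row position instead of hardcoding rep(1:3, each=3). This is just a sketch under the assumption that the groups are always three consecutive rows; the group_size name is made up for the example.

```r
data_set <- data.frame(column1_set1 = letters[1:9],
                       stringsAsFactors = FALSE)

group_size <- 3L  # assumed: every group is three consecutive rows

# Integer division of the zero-based row position gives 0,0,0,1,1,1,...
data_set$column2_set1 <- (seq_len(nrow(data_set)) - 1L) %/% group_size + 1L

data_set$column2_set1  # 1 1 1 2 2 2 3 3 3
```

No loop is needed, and the same line keeps working if more rows are added later.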

Merging columns with overlapping data in R data frames

a<-data.frame(cbind("Sample"=c("100","101","102","103"),"Status"=c("Y","","","partial")))
b<-data.frame(cbind("Sample"=c("100","101","102","103","106"),"Status"=c("NA","Y","","","Y")))
desired<-data.frame(cbind("Sample"=c("100","101","102","103","106"),"Status"=c("Y","Y","","partial","Y")))
I have sample processing data in multiple sources and I'd like to combine them into a master list. How can I merge the "Status" column between 2 data frames such that a overrules b in order to collate "Y" and "partial" for each sample? Thank you in advance.
require(data.table)
a<-data.table(cbind("Sample"=c("100","101","102","103"),"Status"=c("Y","","","partial")))
b<-data.table("Sample"=c("100","101","102","103","106"),"Status"=c("NA","Y","","","Y"))
c <- merge(a, b, by = "Sample", all=TRUE)
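# Prefer a's status unless it is missing (NA) or empty; otherwise fall back to b's
c[,Status := ifelse(!is.na(Status.x) & Status.x != "", Status.x, Status.y)]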
c[,`:=` (Status.x=NULL, Status.y = NULL)]
I assume you want to keep the values from a and b in order of priority: "Y" overrides "partial", which overrides NA, which overrides an empty string.
d <- merge(a,b,by="Sample",all=TRUE)
d$Status <- ""
d$Status[apply(d,1,function(x){any(is.na(x))})] <- "" # cleaning the NAs I introduced with the merge
d$Status[apply(d,1,`%in%`, x = "NA")] <- NA # or "NA" if you want to keep it this way, or "" if you want to get rid of them
d$Status[apply(d,1,`%in%`, x = "partial")] <- "partial"
d$Status[apply(d,1,`%in%`, x = "Y")] <- "Y"
d <- d[,c(1,4)]
# Sample Status
# 1 100 Y
# 2 101 Y
# 3 102
# 4 103 partial
# 5 106 Y
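The same idea can also be written in base R without data.table. This is a sketch only, assuming that "a overrules b" means: take a's Status whenever it is neither missing nor empty, otherwise take b's. The suffixes and the has_a helper are names chosen for this example.

```r
a <- data.frame(Sample = c("100", "101", "102", "103"),
                Status = c("Y", "", "", "partial"),
                stringsAsFactors = FALSE)
b <- data.frame(Sample = c("100", "101", "102", "103", "106"),
                Status = c("NA", "Y", "", "", "Y"),
                stringsAsFactors = FALSE)

# Full outer join so that samples present in only one source are kept
m <- merge(a, b, by = "Sample", all = TRUE, suffixes = c(".a", ".b"))

# TRUE where a supplies a usable value (not NA and not an empty string)
has_a <- !is.na(m$Status.a) & m$Status.a != ""
m$Status <- ifelse(has_a, m$Status.a, m$Status.b)

result <- m[, c("Sample", "Status")]
result$Status  # "Y" "Y" "" "partial" "Y"
```

For this data the result matches the desired data frame from the question.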

How to change values in a column of a data frame based on conditions in another column?

I would like to have an equivalent of the Excel function "if". It seems basic enough, but I could not find relevant help.
I would like to assign "NA" to specific cells if two consecutive cells in a different column are not identical. In Excel, the formula would be the following (say in C1): if(A1 = A2, B1, "NA"). I then just need to extend it to the rest of the column.
But in R, I am stuck!
Here is an equivalent of my R code so far.
df = data.frame(Type = c("1","2","3","4","4","5"),
File = c("A","A","B","B","B","C"))
df
To get the following Type of each Type in another column, I found a useful function on StackOverflow that does the job.
# determines the following Type of each Type
shift <- function(x, n){
c(x[-(seq(n))], rep(6, n))
}
df$TypeFoll <- shift(df$Type, 1)
df
Now, I would like to keep TypeFoll in a specific row when the File for this row is identical to the File on the next row.
Here is what I tried. It failed!
for(i in 1:length(df$File)){
df$TypeFoll2 <- ifelse(df$File[i] == df$File[i+1], df$TypeFoll, "NA")
}
df
In the end, my data frame should look like:
aim = data.frame(Type = c("1","2","3","4","4","5"),
File = c("A","A","B","B","B","C"),
TypeFoll = c("2","3","4","4","5","6"),
TypeFoll2 = c("2","NA","4","4","NA","6"))
aim
Oh, and by the way, if someone would know how to easily put the columns TypeFoll and TypeFoll2 just after the column Type, it would be great!
Thanks in advance
I would do it as follows (not keeping the result from the shift function)
df = data.frame(Type = c("1","2","3","4","4","5"),
File = c("A","A","B","B","B","C"), stringsAsFactors = FALSE)
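# These two vectors play the role of your shift function:
# A1 is File for rows 1..(len-1), A2 is File for the rows that follow them
len=nrow(df)
A1 <- df$File[1:(len-1)]
A2 <- df$File[2:len]
# Why do you save the result of the shift function in the df?
Then apply the equivalent of if(A1 = A2, B1, "NA"). As akrun mentioned, ifelse is vectorised, so no loop is needed. By the way, assigning to df$TypeFoll2 is how you append a column to a data.frame:
df$TypeFoll2 <- c(ifelse(A1 == A2, df$Type[2:len], NA), 6) #Why 6?
As 6 is hardcoded here, something like:
df$TypeFoll2 <- c(ifelse(A1 == A2, df$Type[2:len], NA), max(as.numeric(df$Type))+1)
is more generic. Note that df$Type[2:len] is used so that each row gets the following Type, and as.numeric() is needed because Type is stored as character.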
First off, 'for' loops are pretty slow in R, so try to think of this as vector manipulation instead.
df = data.frame(Type = c("1","2","3","4","4","5"),
File = c("A","A","B","B","B","C"))
Create shifted types and files and put it in new columns:
df$TypeFoll = c(as.character(df$Type[2:nrow(df)]), "NA")
df$FileFoll = c(as.character(df$File[2:nrow(df)]), "NA")
Now, df looks like this:
> df
Type File TypeFoll FileFoll
1 1 A 2 A
2 2 A 3 B
3 3 B 4 B
4 4 B 4 B
5 4 B 5 C
6 5 C NA NA
Then, create TypeFoll2 by combining these:
df$TypeFoll2 = ifelse(df$File == df$FileFoll, df$TypeFoll, "NA")
And you should have something that looks a lot like what you want:
> df
Type File TypeFoll FileFoll TypeFoll2
1 1 A 2 A 2
2 2 A 3 B NA
3 3 B 4 B 4
4 4 B 4 B 4
5 4 B 5 C NA
6 5 C NA NA NA
If you want to remove the FileFoll column:
df$FileFoll = NULL
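Putting the pieces together, and also covering the side question about placing TypeFoll and TypeFoll2 right after Type, one possible version looks like the sketch below. It uses real NA values rather than the string "NA", which is a choice made for this example; next_type and next_file are helper names that do not appear in the question.

```r
df <- data.frame(Type = c("1", "2", "3", "4", "4", "5"),
                 File = c("A", "A", "B", "B", "B", "C"),
                 stringsAsFactors = FALSE)

# Shift both columns up by one row; the last row has no successor, so pad with NA
next_type <- c(df$Type[-1], NA)
next_file <- c(df$File[-1], NA)

df$TypeFoll  <- next_type
# Keep the following Type only when the following row belongs to the same File
df$TypeFoll2 <- ifelse(df$File == next_file, next_type, NA)

# Reorder columns by name so the new ones sit right after Type
df <- df[, c("Type", "TypeFoll", "TypeFoll2", "File")]
df$TypeFoll2  # "2" NA "4" "4" NA NA
```

Reordering by name rather than by position keeps the last line readable and avoids surprises if columns are added earlier in the script.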

R $ operator is invalid for atomic vectors

I have a dataset where one of the columns contains only the "#" sign. I used the following code to remove this column.
ia <- as.data.frame(sapply(ia,gsub,pattern="#",replacement=""))
However, after this operation, one of my integer columns had changed to a factor.
I wonder what happened and how I can avoid that. I appreciate any help.
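The type changed because gsub() always returns character, so sapply() gives back a character matrix, and as.data.frame() converted character columns to factors by default before R 4.0.0. A more correct version of your code, which keeps the data frame and avoids factors, might be something like this:
> d <- data.frame(x = as.character(1:5),y = c("a","b","#","c","d"))
> d[] <- lapply(d,gsub,pattern = "#",replacement = "")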
> d
x y
1 1 a
2 2 b
3 3
4 4 c
5 5 d
But as you'll note, this approach will never actually remove the offending column. It's just replacing the # values with empty character strings. To remove a column of all # you might do something like this:
d <- data.frame(x = as.character(1:5),
y = c("a","b","#","c","d"),
z = rep("#",5))
> d[,!sapply(d,function(x) all(x == "#"))]
x y
1 1 a
2 2 b
3 3 #
4 4 c
5 5 d
Surely if you want to remove an offending column from a data frame, and you know which column it is, you can just subset. So, if it's the first column:
df <- df[,-1]
If it's a later column, use that column's index instead.
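To strip the "#" characters without touching the integer column at all, the replacement can be limited to character columns. This is a minimal sketch with a made-up ia data frame (the real one is not shown in the question); it assumes the columns are character rather than factor.

```r
# Hypothetical stand-in for the asker's data
ia <- data.frame(n    = 1:3,
                 txt  = c("a#", "b", "#c"),
                 hash = rep("#", 3),
                 stringsAsFactors = FALSE)

# Apply gsub() only to character columns so numeric columns keep their type
is_chr <- vapply(ia, is.character, logical(1))
ia[is_chr] <- lapply(ia[is_chr], gsub,
                     pattern = "#", replacement = "", fixed = TRUE)

# Drop character columns that are now entirely empty strings
all_empty <- vapply(ia, function(col) is.character(col) && all(col == ""),
                    logical(1))
ia <- ia[, !all_empty, drop = FALSE]

class(ia$n)  # "integer"
names(ia)    # "n" "txt"
```

drop = FALSE keeps the result a data frame even if only one column survives.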
