Using apply with a user-defined function in R

I have defined the following function in R:
#A function that compares color and dates to determine if there is a match
getTagColor <- function(color, date){
  for (i in (1:nrow(TwistTieFix))){
    if ((color == TwistTieFix$color_match[i]) &
        (date > TwistTieFix$color_match[i]) &
        (date <= TwistTieFix$julian_cut_off_date[i])) {
      Data$color_code <- TwistTieFix$color_code[i]
      print(Data$color_code)
    }
  }
}
I then used apply() in an attempt to apply the function to each row.
#Apply the above function to the data set
testData <- apply(Data, 1, getTagColor(Data$tag_color,Data$julian_date))
The goal of the code is to use two variables in Data and find another value to put into a new column in Data (color_code) that will be based on the information in TwistTieFix. When I run the code, I get a list of warnings saying
In if ((color == TwistTieFix$color_match[i]) & (date > ... :
the condition has length > 1 and only the first element will be used
I cannot determine why the function does not use the date and color from each row and use it in the function (at least that is what I think is going wrong here). Thanks!
Here are examples of the data frames being used:
TwistTieFix
color_name date color_code cut_off_date color_match julian_start julian_cut_off_date
yellow 2013-08-12 y1 2001-07-02 yellow 75 389
blue 2000-09-28 b1 2001-08-12 blue 112 430
Data
coll_date julian_date tag_color
2013-08-13 76 yellow
2013-08-14 76 yellow
2000-09-29 112 blue
Data has many more columns of different variables, but I am not allowed to include all of them. However, I have included the columns in Data that I reference in the function. The data sets are loaded into R using read.csv and come from Excel CSV files.

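The warnings arise because getTagColor(Data$tag_color, Data$julian_date) is evaluated once, with the whole columns, before apply() ever runs, so the if() condition receives vectors of length greater than 1. To me, it seems like you want to join Data and TwistTieFix where tag_color == color_match and julian_start <= julian_date <= julian_cut_off_date. Here are your sample data sets in dput form: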
TwistTieFix <- structure(list(color_name = structure(c(2L, 1L), .Label = c("blue",
"yellow"), class = "factor"), date = structure(c(2L, 1L), .Label = c("2000-09-28",
"2013-08-12"), class = "factor"), color_code = structure(c(2L,
1L), .Label = c("b1", "y1"), class = "factor"), cut_off_date = structure(1:2, .Label = c("2001-07-02",
"2001-08-12"), class = "factor"), color_match = structure(c(2L,
1L), .Label = c("blue", "yellow"), class = "factor"), julian_start = c(75L,
112L), julian_cut_off_date = c(389L, 430L)), .Names = c("color_name",
"date", "color_code", "cut_off_date", "color_match", "julian_start",
"julian_cut_off_date"), class = "data.frame", row.names = c(NA,
-2L))
Data <- structure(list(coll_date = structure(c(2L, 3L, 1L), .Label = c("2000-09-29",
"2013-08-13", "2013-08-14"), class = "factor"), julian_date = c(76L,
76L, 112L), tag_color = structure(c(2L, 2L, 1L), .Label = c("blue",
"yellow"), class = "factor")), .Names = c("coll_date", "julian_date",
"tag_color"), class = "data.frame", row.names = c(NA, -3L))
An easy way to perform this merge is with the data.table library. You can do
library(data.table)
#convert to data.table and set keys
ttf <- setDT(TwistTieFix)
setkey(ttf, color_match, julian_start)
dt <- setDT(Data)
setkey(dt, tag_color, julian_date)
#rolling join on the keys, then keep rows before the cut-off and extract columns
ttf[dt, roll=TRUE][julian_start<julian_cut_off_date, list(coll_date,
    julian_date=julian_start, tag_color=color_match, color_code)]
to get
coll_date julian_date tag_color color_code
1: 2000-09-29 112 blue b1
2: 2013-08-13 76 yellow y1
3: 2013-08-14 76 yellow y1
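The same join condition can also be sketched in base R, for readers who do not have data.table installed. This is only an illustration under assumptions: the two data frames below are cut down to the columns the join needs, and the names merged, in_range and result are hypothetical helpers that do not appear in the original answer.

```r
# Reduced copies of the sample data (only the columns the join needs)
TwistTieFix <- data.frame(
  color_code = c("y1", "b1"),
  color_match = c("yellow", "blue"),
  julian_start = c(75L, 112L),
  julian_cut_off_date = c(389L, 430L),
  stringsAsFactors = FALSE
)
Data <- data.frame(
  coll_date = c("2013-08-13", "2013-08-14", "2000-09-29"),
  julian_date = c(76L, 76L, 112L),
  tag_color = c("yellow", "yellow", "blue"),
  stringsAsFactors = FALSE
)

# Step 1: pair every Data row with the TwistTieFix rows of the same color
merged <- merge(Data, TwistTieFix, by.x = "tag_color", by.y = "color_match")

# Step 2: keep only the pairs whose julian_date falls inside the window
in_range <- merged$julian_date >= merged$julian_start &
  merged$julian_date <= merged$julian_cut_off_date
result <- merged[in_range, c("coll_date", "julian_date", "tag_color", "color_code")]
print(result)
```

Because merge() builds every color-matched pair before filtering, this approach may use more memory than the rolling join when TwistTieFix has many date windows per color.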

Related

create data frame from nested entries

I have a data frame test like this:
dput(test)
structure(list(X = 1L, entityId = structure(1L, .Label = "HOST-123", class = "factor"),
displayName = structure(1L, .Label = "server1", class = "factor"),
discoveredName = structure(1L, .Label = "server1", class = "factor"),
firstSeenTimestamp = 1593860000000, lastSeenTimestamp = 1603210000000,
tags = structure(1L, .Label = "c(\"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\", \"CONTEXTLESS\"), c(\"app1\", \"client\", \"org\", \"app1\", \"DATA_CENTER\", \"PURPOSE\", \"REGION\", \"Test\"), c(NA, \"NONE\", \"Host:Environment:test123\", \"111\", \"222\", \"GENERAL\", \"444\", \"555\")", class = "factor")), .Names = c("X",
"entityId", "displayName", "discoveredName", "firstSeenTimestamp",
"lastSeenTimestamp", "tags"), class = "data.frame", row.names = c(NA,
-1L))
There is a column called tags which should become a data frame. I need to get rid of the first vector in tags (which keeps saying CONTEXTLESS), expand the second vector in tags into columns, and lastly insert the third vector's values under each expanded column.
For example, it needs to look like this:
structure(list(entityId = structure(1L, .Label = "HOST-123", class = "factor"),
displayName = structure(1L, .Label = "server1", class = "factor"),
discoveredName = structure(1L, .Label = "server1", class = "factor"),
firstSeenTimestamp = 1593860000000, lastSeenTimestamp = 1603210000000,
app1 = NA, client = structure(1L, .Label = "None", class = "factor"),
org = structure(1L, .Label = "Host:Environment:test123", class = "factor"),
app1.1 = 111L, data_center = 222L, purppose = structure(1L, .Label = "general", class = "factor"),
region = 444L, test = 555L), .Names = c("entityId", "displayName",
"discoveredName", "firstSeenTimestamp", "lastSeenTimestamp",
"app1", "client", "org", "app1.1", "data_center", "purppose",
"region", "test"), class = "data.frame", row.names = c(NA, -1L
))
I need to remove the 1st vector that keeps saying "contextless" and turn the second vector into columns: each value of the 2nd vector should be a column name. The last vector should supply the values of the newly added columns.
If you are willing to drop the first "row" of garbage and then do a little cleanup of the parse side effects, then this might be a good place to start:
read.table(text=gsub("\\),", ")\n", test$tags[1]), sep=",", skip=1, #drops line
header=TRUE)
c.app1 client org app1 DATA_CENTER PURPOSE REGION Test.
1 c(NA NONE Host:Environment:test123 111 222 GENERAL 444 555)
The read.table function uses the scan function, which doesn't know that "c(" and ")" are meaningful. The other alternative might be to try eval(parse(text= .)) (which would know that they enclose vectors) on the second and third lines, but I couldn't see a clean way to do that. I initially tried to separate the lines using strsplit, but that caused me to lose the parens.
Here's a stab at some cleanup via the addition of some more gsub operations:
read.table(text=gsub("c\\(|\\)","", # gets rid of enclosing "c(" and ")"
gsub("\\),", "\n", # inserts line breaks
test$tags[1])),
sep=",", #lets commas be parsed
skip=1, #drops line
header=TRUE) # converts to colnames
app1 client org app1.1 DATA_CENTER PURPOSE REGION Test
1 NA NONE Host:Environment:test123 111 222 GENERAL 444 555
The reason for the added ".1" in the second instance of app1 is that R column names in data frames need to be unique unless you override that with check.names=FALSE.
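The effect of check.names can be seen on a tiny example. The text below is made-up input with a deliberately repeated header, not the tags column from the question.

```r
# Made-up CSV text with a repeated column name
txt <- "app1,client,app1\n1,2,3"

# Default: names are made syntactically valid and unique
checked <- read.table(text = txt, sep = ",", header = TRUE)
names(checked)    # "app1" "client" "app1.1"

# check.names = FALSE keeps the header exactly as written, duplicates included
unchecked <- read.table(text = txt, sep = ",", header = TRUE, check.names = FALSE)
names(unchecked)  # "app1" "client" "app1"
```

Keeping duplicate names is possible, but selecting a column by name then only returns the first match, so the ".1" suffix is usually the safer default.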
Here is a tidyverse approach
library(dplyr)
library(tidyr)
str2dataframe <- function(txt, keep = "all") {
  # If you can confirm that all vectors are of the same length, then we can make them into columns of a data.frame
  out <- eval(parse(text = paste0("data.frame(", as.character(txt),")")))
  # rename columns as X1, X2, ...
  nms <- make.names(seq_along(out), unique = TRUE)
  if (keep == "all")
    keep <- nms
  `names<-`(out, nms)[, keep]
}
df %>%
  mutate(
    tags = lapply(tags, str2dataframe, -1L),
    tags = lapply(tags, function(d) within(d, X2 <- make.unique(X2)))
  ) %>%
  unnest(tags) %>%
  pivot_wider(names_from = "X2", values_from = "X3")
df looks like this
> df
X entityId displayName discoveredName firstSeenTimestamp lastSeenTimestamp
1 1 HOST-123 server1 server1 1.59386e+12 1.60321e+12
tags
1 c("CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS"), c("app1", "client", "org", "app1", "DATA_CENTER", "PURPOSE", "REGION", "Test"), c(NA, "NONE", "Host:Environment:test123", "111", "222", "GENERAL", "444", "555")
Output looks like this
# A tibble: 1 x 14
X entityId displayName discoveredName firstSeenTimestamp lastSeenTimestamp app1 client org app1.1 DATA_CENTER PURPOSE REGION Test
<int> <fct> <fct> <fct> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 HOST-123 server1 server1 1593860000000 1603210000000 NA NONE Host:Environment:test123 111 222 GENERAL 444 555
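The eval(parse(text = .)) idea can also be sketched in base R alone by wrapping the string in list( ) so the three vectors are parsed in one go. This is a rough sketch, assuming the tags string always holds exactly three vectors of equal length; the shortened tags_txt and the names parts, values and wide are hypothetical. Note that eval(parse()) runs whatever code the string contains, so it should only be used on text you trust.

```r
# Shortened stand-in for one value of test$tags
tags_txt <- paste0(
  'c("CONTEXTLESS", "CONTEXTLESS", "CONTEXTLESS"), ',
  'c("app1", "client", "app1"), ',
  'c(NA, "NONE", "111")'
)

# Parse the three vectors into a list of character vectors
parts <- eval(parse(text = paste0("list(", tags_txt, ")")))

# Second vector becomes the (unique) column names, third vector the values
values <- as.list(parts[[3]])
names(values) <- make.unique(parts[[2]])

# One-row data frame; the first (CONTEXTLESS) vector is simply ignored
wide <- as.data.frame(values, stringsAsFactors = FALSE)
print(wide)
```

The resulting one-row data frame could then be bound to the remaining columns of the original row with cbind().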

Filter two tables with crosstalk

I am creating a Flexdashboard in R. I want the dashboard to contain both a table and a series of visualizations that would be filtered through inputs.
As I need to deliver a dashboard locally (without a server running in the background), I am unable to use Shiny, hence I rely on crosstalk.
I know that the crosstalk package provides limited functionality in the front-end. For instance, the documentation says that you can't aggregate the SharedData object.
Nonetheless, I am not clear if I can use the same inputs to filter two different dataframes.
For example, let's say I have:
Dataframe One: Contains original data
df1 <- structure(list(owner = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("John",
"Mark"), class = "factor"), hp = c(250, 120, 250, 100, 110),
car = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("benz",
"bmw"), class = "factor"), id = structure(1:5, .Label = c("car1",
"car2", "car3", "car4", "car5"), class = "factor")), .Names = c("owner",
"hp", "car", "id"), row.names = c(NA, -5L), class = "data.frame")
Dataframe Two: Contains aggregated data
df2 <- structure(list(car = structure(c(1L, 2L, 1L, 2L), .Label = c("benz",
"bmw"), class = "factor"), owner = structure(c(1L, 1L, 2L, 2L
), .Label = c("John", "Mark"), class = "factor"), freq = c(0L,
1L, 2L, 2L)), .Names = c("car", "owner", "freq"), row.names = c(NA,
-4L), class = "data.frame")
These two dataframes contain columns with identical values (car and owner), as well as additional columns.
I could create two different objects:
library(crosstalk)
shared_df1 <- SharedData$new(df1)
shared_df2 <- SharedData$new(df2)
and then:
filter_select("owner", "Car owner:", shared_df1, ~ owner)
filter_select("owner", "Car owner:", shared_df2, ~ owner)
However, that would mean that the user will need to fill inputs that are essentially identical, twice. Also, if the table is large, this would double the size of the memory needed to use the dashboard.
Is it possible to work around this problem in crosstalk?
I recently ran into this too. There is another argument to SharedData$new(..., group = ), and it seems to do the trick. I found out by accident when I had two dataframes and passed group = to both.
If you make a SharedData object, it will include
a dataframe
a key to select rows by - preferably unique, but not necessarily.
a group name
What I think happens is that crosstalk filters the SharedData by the key, for all SharedData objects in the same group. So as long as two dataframes use the same key, you should be able to filter them together in one group.
This should work for your example.
---
title: "blabla"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    source_code: embed
    theme: cerulean
---
```{r}
library(plotly)
library(crosstalk)
library(tidyverse)
```
```{r Make dataset}
df1 <- structure(list(owner = structure(c(1L, 2L, 2L, 2L, 2L), .Label = c("John", "Mark"), class = "factor"), hp = c(250, 120, 250, 100, 110), car = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("benz", "bmw"), class = "factor"), id = structure(1:5, .Label = c("car1", "car2", "car3", "car4", "car5"), class = "factor")), .Names = c("owner", "hp", "car", "id"), row.names = c(NA, -5L), class = "data.frame")
df2 <- structure(list(car = structure(c(1L, 2L, 1L, 2L), .Label = c("benz",
"bmw"), class = "factor"), owner = structure(c(1L, 1L, 2L, 2L
), .Label = c("John", "Mark"), class = "factor"), freq = c(0L,
1L, 2L, 2L)), .Names = c("car", "owner", "freq"), row.names = c(NA,
-4L), class = "data.frame")
```
#
##
### Filters
```{r}
library(crosstalk)
# Notice the 'group = ' argument - this does the trick!
shared_df1 <- SharedData$new(df1, ~owner, group = "Choose owner")
shared_df2 <- SharedData$new(df2, ~owner, group = "Choose owner")
filter_select("owner", "Car owner:", shared_df1, ~owner)
# You don't need this second filter now
# filter_select("owner", "Car owner:", shared_df2, ~ owner)
```
### Plot1 with plotly
```{r}
plot_ly(shared_df1, x = ~id, y = ~hp, color = ~owner) %>% add_markers() %>% highlight("plotly_click")
```
### Plots with plotly
```{r}
plot_ly(shared_df2, x = ~owner, y = ~freq, color = ~car) %>% group_by(owner) %>% add_bars()
```
##
### Dataframe 1
```{r}
DT::datatable(shared_df1)
```
### Dataframe 2
```{r}
DT::datatable(shared_df2)
```
I spent some time on this trying to extract data from plot_ly() using plotly_data(), without luck, until I figured out the answer. That's why there are some very simple plots with plotly.
Recently, I also wanted to use one filter to filter two visualizations.
Brief description of my situation
I wanted to use one filter to filter a boxplot and a table.
The source data was a data frame. I wanted to use some of its variables for the boxplot and also calculate some statistics (such as mean, standard deviation, mode, and number of records).
Functions I needed to display the results: plotly::plot_ly(), DT::datatable(), crosstalk::bscols().
I found that there are 3 key pieces of information for solving this situation
Key 1) The shared data must be created correctly.
In my case, I had to call crosstalk::SharedData$new() twice.
The shared data can serve as the source for the visualizations only once keys 2 and 3 are fulfilled.
Key 2) When creating shared data, use the same group argument, as "Lodewic Van Twillert" explained on 16 Mar 2018.
Key 3) Ensure that all SharedData instances refer conceptually to the same data points and share the same keys.
Start by ensuring that the data frame has row names, even if the row names are a character vector of numbers (like "1", "2", ...).
Literature used for key 3: https://rstudio.github.io/crosstalk/using.html. (I suggest mainly reading the subsection "Grouping".)
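Before wrapping two data frames in SharedData objects, it can help to check that their intended key columns cover the same values. The following is a small base R sketch of such a check, not part of crosstalk itself; the reduced df1 and df2 and the names keys_match and only_in_df1 are assumptions made for illustration.

```r
# Reduced versions of the two example data frames, keyed by owner
df1 <- data.frame(
  owner = c("John", "Mark", "Mark", "Mark", "Mark"),
  id = paste0("car", 1:5),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  car = c("benz", "bmw", "benz", "bmw"),
  owner = c("John", "John", "Mark", "Mark"),
  freq = c(0L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# TRUE when both data frames contain exactly the same set of key values
keys_match <- setequal(df1$owner, df2$owner)

# Key values that a shared filter could select in df1 but never find in df2
only_in_df1 <- setdiff(df1$owner, df2$owner)

keys_match
only_in_df1
```

If keys_match is FALSE, a selection made in one widget may have no counterpart in the other, which is one way the "same keys" condition can fail.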
Summary of the steps I used to fulfill the key information above
Key 3) Fulfilling the conditions of key 3 can be tricky.
The approach I chose creates one table containing all the data, and this table (data frame) is used to create both shared data instances.
I applied data manipulations to the original data frame (risk_scores_df), so it now has a new column.
I created a new data frame with the statistics.
I joined both data frames using
risk_scores_df <- dplyr::left_join... so now the original data frame contains all the prepared data.
I ran print(rownames(risk_scores_df)) to ensure that my updated data frame has row names.
At this point I had one data frame containing all the data (needed for both visualizations) that fulfills the conditions of key 3 above.
Key 2) I simply added group = "sd1" to both crosstalk::SharedData$new() calls.
Key 1) This one can also be tricky if the wrong approach is chosen.
Here, the key to creating correct shared data instances is to start from that one table with all the data and select only the rows and columns needed for each shared data instance.
Example: in my case, I ran the code in Option 1 to create the two shared data instances, but Option 2 is also possible.
Option 1 (selecting only the needed rows and columns inside crosstalk::SharedData$new())
rs_df_sd1 <- crosstalk::SharedData$new(
  risk_scores_df[, c(1, 2, 5)],
  group = "sd1"
)
rs_df_sd1a <- crosstalk::SharedData$new(
  risk_scores_df[risk_scores_df$NumRecords > 0 &
                   !is.na(risk_scores_df$NumRecords),
                 c(1, 6:11)],
  group = "sd1"
)
Option 2 (selecting only the needed rows and columns in additional variables)
sd1 <- risk_scores_df[, c(1, 2, 5)]
sd1a <- risk_scores_df[risk_scores_df$NumRecords > 0 &
                         !is.na(risk_scores_df$NumRecords),
                       c(1, 6:11)]
rs_df_sd1 <- crosstalk::SharedData$new(sd1, group = "sd1")
rs_df_sd1a <- crosstalk::SharedData$new(sd1a, group = "sd1")
Completing the solution
At this point I've created shared data instances rs_df_sd1 and rs_df_sd1a that can be used as main sources for visualizations that will be filtered using crosstalk::bscols().
Brief example:
box_n_jitter_chart1 <- plotly::plot_ly(rs_df_sd1) %>% add_trace(...
DT_table1 <- DT::datatable(rs_df_sd1a)
crosstalk::bscols(
  widths = c(6, 12, NA),
  crosstalk::filter_select(
    id = "idAvgRisk",
    label = "Account",
    sharedData = rs_df_sd1,
    group = ~Account,
    multiple = FALSE
  ),
  box_n_jitter_chart1,
  DT_table1
)
Note: DT::datatable() can also use rs_df_sd1a$data() and cells = list(values = base::rbind(... (note that cells = ... is used; see more about cells e.g. at https://plotly.com/r/reference/table/), but because the data() method is used (see more e.g. at https://rdrr.io/cran/crosstalk/man/SharedData.html#method-data), it will not work with crosstalk::bscols.

How to use ifelse and paste functions

I am learning to use the ifelse function from Zuur et al. (2009), A Beginner's Guide to R. In one exercise, there is a data frame called Owls which contains data about 27 nests and two nights of observations.
structure(list(Nest = structure(c(1L, 1L, 1L, 1L), .Label = "AutavauxTV", class = "factor"),
FoodTreatment = structure(c(1L, 2L, 1L, 1L), .Label = c("Deprived",
"Satiated"), class = "factor"), SexParent = structure(c(1L,
1L, 1L, 1L), .Label = "Male", class = "factor"), ArrivalTime = c(22.25,
22.38, 22.53, 22.56), SiblingNegotiation = c(4L, 0L, 2L,
2L), BroodSize = c(5L, 5L, 5L, 5L), NegPerChick = c(0.8,
0, 0.4, 0.4)), .Names = c("Nest", "FoodTreatment", "SexParent",
"ArrivalTime", "SiblingNegotiation", "BroodSize", "NegPerChick"
), row.names = c(NA, 4L), class = "data.frame")
The two nights differed in the feeding regime (satiated or deprived), which is indicated in the FoodTreatment variable. The task is to use the ifelse and paste functions to make a new categorical variable that defines observations from a single night at a particular nest.
In the solutions the following code is suggested:
Owls <- read.table(file = "Owls.txt", header = TRUE, dec = ".")
ifelse(Owls$FoodTreatment == "Satiated", Owls$NestNight <- paste(Owls$Nest, "1",sep = "_"), Owls$NestNight <- paste(Owls$Nest, "2",sep = "_"))
and apparently it creates a new variable whose values end in either "_1" or "_2".
However, when I inspect the original data frame, all "_1" endings in the NestNight variable have disappeared and turned into "_2".
Why does this happen? Did the authors miss something in the code, or is it me who is not getting it?
Many thanks
EDIT: Sorry, I wanted to give a reproducible example by copying my data using dput but it did not work. If you can let me know how I can correct it so that it appears properly, I'd be grateful too!
Solution
If you do the assignment outside the ifelse structure, it works:
Owls$NestNight <- ifelse(Owls$FoodTreatment == "Satiated",
paste(Owls$Nest, "1",sep = ""),
paste(Owls$Nest, "2",sep = ""))
Explanation
What happens in your case is the same as if you executed the following two lines:
Owls$NestNight <- paste(Owls$Nest, "1",sep = "")
Owls$NestNight <- paste(Owls$Nest, "2",sep = "")
You first assign paste(Owls$Nest, "1",sep = "") to Owls$NestNight and then you overwrite it with paste(Owls$Nest, "2",sep = ""). The ifelse call itself is not affected by this and still returns the right values, but you don't assign its result to any variable.
Maybe it is clearer if you test this simple code:
c(a <- 1:5, a <- 6:10) #c is your ifelse, a is your Owls$NestNight
a #[1] 6 7 8 9 10
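A small self-contained comparison may make the difference concrete. The three-row Owls data frame below is a made-up stand-in for Owls.txt, and Bochet is just a placeholder nest name.

```r
# Made-up miniature of the Owls data
Owls <- data.frame(
  Nest = c("AutavauxTV", "AutavauxTV", "Bochet"),
  FoodTreatment = c("Deprived", "Satiated", "Satiated"),
  stringsAsFactors = FALSE
)

# Assign the result of ifelse(), choosing per row between the two pasted vectors
Owls$NestNight <- ifelse(Owls$FoodTreatment == "Satiated",
                         paste(Owls$Nest, "1", sep = "_"),
                         paste(Owls$Nest, "2", sep = "_"))

# Equivalent and slightly shorter: let ifelse() pick only the suffix
Owls$NestNight2 <- paste(Owls$Nest,
                         ifelse(Owls$FoodTreatment == "Satiated", "1", "2"),
                         sep = "_")

Owls$NestNight  # "AutavauxTV_2" "AutavauxTV_1" "Bochet_1"
```

Both columns hold the same values; the second form only avoids building two full pasted vectors.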

prop.table doesn't work in a for-loop?

This may be a very simple question, but I don't see how to answer it.
I have the following reproducible code, where I have two small dataframes that I use to calculate a percentage value based on each column total:
#dataframe x
x <- structure(list(PROV = structure(c(1L, 1L), .Label = "AG", class = "factor"),
APT = structure(1:2, .Label = c("AAA", "BBB"), class = "factor"),
PAX.2013 = c(5L, 4L), PAX.2014 = c(4L, 2L), PAX.2015 = c(4L,0L)),
.Names = c("PROV", "APT", "PAX.2013", "PAX.2014", "PAX.2015"),
row.names = 1:2, class = "data.frame")
#dataframe y
y <- structure(list(PROV = structure(c(1L, 1L), .Label = "AQ", class = "factor"),
APT = structure(1:2, .Label = c("CCC", "AAA"), class = "factor"),
PAX.2013 = c(3L, 7L), PAX.2014 = c(2L, 1L), PAX.2015 = c(0L,3L)),
.Names = c("PROV", "APT", "PAX.2013", "PAX.2014", "PAX.2015"),
row.names = 1:2, class = "data.frame")
#list z (with x and y)
z <- list(x,y)
#percentage value of x and y based on columns total
round(prop.table(as.matrix(z[[1]][3:5]), margin = 2)*100,1)
round(prop.table(as.matrix(z[[2]][3:5]), margin = 2)*100,1)
As you can see, it works just fine.
Now I want to automate this for the whole list, but I can't figure out how to get the results. This is my simple code:
#for-loop that is not working
for (i in length(z))
{round(prop.table(as.matrix(z[[i]][3:5]), margin = 2)*100,1)}
You have two problems.
First, you have not put a range into your for loop so you are just trying to iterate over a single number and second, you are not assigning your result anywhere on each iteration.
Use 1:length(z) to define a range. Then assign the results to a variable.
This would work:
my_list <- list()
for (i in 1:length(z)){
my_list[[i]] <- round(prop.table(as.matrix(z[[i]][3:5]),
margin = 2)*100,1)
}
my_list
But it would be more efficient and idiomatic to use lapply:
lapply(1:length(z),
function(x) round(prop.table(as.matrix(z[[x]][3:5]), margin = 2)*100,1))
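As a variation on the lapply call above, the list can be iterated directly instead of through its indices. This sketch rebuilds x and y with data.frame() rather than structure(); the helper name pct is an assumption.

```r
# Same values as the question's x and y, built with data.frame()
x <- data.frame(PROV = "AG", APT = c("AAA", "BBB"),
                PAX.2013 = c(5L, 4L), PAX.2014 = c(4L, 2L), PAX.2015 = c(4L, 0L))
y <- data.frame(PROV = "AQ", APT = c("CCC", "AAA"),
                PAX.2013 = c(3L, 7L), PAX.2014 = c(2L, 1L), PAX.2015 = c(0L, 3L))
z <- list(x, y)

# lapply passes each data frame in turn, so no index bookkeeping is needed
pct <- lapply(z, function(d) {
  round(prop.table(as.matrix(d[3:5]), margin = 2) * 100, 1)
})

pct[[1]]  # column percentages for x
pct[[2]]  # column percentages for y
```

If an index is needed anyway, seq_along(z) is a safer range than 1:length(z), because it yields an empty sequence when the list is empty.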
Setting aside whether a for loop is the best approach, you had two issues. One, your for loop only iterates over 2 (which is length(z)) instead of 1:2. Two, you need to do something with the round(....) statement, because results are not printed automatically inside a for loop. In this solution, I added a print statement.
for (i in 1:length(z)){
print(round(prop.table(as.matrix(z[[i]][3:5]), margin = 2)*100,1))
}

Color data points based on sample classification

I have created a pairwise scatterplot showing the relationship between genes (columns in the data frame) across multiple samples (rows in the data frame). The samples belong to two distinct groups: group "A" and group "B". Since one dot in the plot represents one sample, I need to color the data points (dots) according to their group with two different colors, say group A with "green" and group B with "red". Is it possible to do that?
Any kind of help will be appreciated.
plot(DF[1:6], pch = 21) #command used for plotting, DF is data frame
Sample Data Frame Example:
CBX3 PSPH ATP2C1 SNX10 MMD ATP13A3
B 10.589844 6.842970 8.084550 8.475023 9.202490 10.403811
A 10.174385 5.517944 7.736994 9.094834 9.253766 10.133408
B 10.202084 5.669137 7.392141 7.522270 7.830969 9.123178
B 10.893231 6.630709 7.601690 7.894177 8.979142 9.791841
B 10.071038 5.091222 7.032585 8.305581 7.903737 8.994821
A 10.005002 4.708631 7.927246 7.292527 8.257853 10.054630
B 10.028055 5.080944 6.421961 7.616856 8.287496 9.642294
A 10.144115 6.626483 7.686203 7.970934 7.919615 9.475175
A 10.675386 6.874047 7.900560 7.605519 8.585158 8.858613
A 9.855063 5.164399 6.847923 8.072608 8.221344 9.077744
A 10.994228 6.545318 8.606128 8.426329 8.787876 9.857079
A 10.501266 6.677360 7.787168 8.444976 8.928174 9.542558
GGally has a good function for this as well.
library(GGally)
# dd is a data frame with a CLASS column plus the gene columns
ggpairs(dd, mapping = ggplot2::aes(color = CLASS), columns = 2:ncol(dd))
It might not be that easy to do with base graphics. You could easily do this with lattice. With this sample data.frame
dd<-structure(list(CLASS = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 1L), .Label = c("A", "B"), class = "factor"),
CBX3 = c(10.589844, 10.174385, 10.202084, 10.893231, 10.071038,
10.005002, 10.028055, 10.144115, 10.675386, 9.855063, 10.994228,
10.501266), PSPH = c(6.84297, 5.517944, 5.669137, 6.630709,
5.091222, 4.708631, 5.080944, 6.626483, 6.874047, 5.164399,
6.545318, 6.67736), ATP2C1 = c(8.08455, 7.736994, 7.392141,
7.60169, 7.032585, 7.927246, 6.421961, 7.686203, 7.90056,
6.847923, 8.606128, 7.787168), SNX10 = c(8.475023, 9.094834,
7.52227, 7.894177, 8.305581, 7.292527, 7.616856, 7.970934,
7.605519, 8.072608, 8.426329, 8.444976), MMD = c(9.20249,
9.253766, 7.830969, 8.979142, 7.903737, 8.257853, 8.287496,
7.919615, 8.585158, 8.221344, 8.787876, 8.928174), ATP13A3 = c(10.403811,
10.133408, 9.123178, 9.791841, 8.994821, 10.05463, 9.642294,
9.475175, 8.858613, 9.077744, 9.857079, 9.542558)), .Names = c("CLASS",
"CBX3", "PSPH", "ATP2C1", "SNX10", "MMD", "ATP13A3"), class = "data.frame", row.names = c(NA, -12L))
you can do
library(lattice)
splom(~dd[,-1], groups=dd$CLASS)
to get a scatterplot matrix with the points colored by group.
You can add color to the points by specifying the argument col to plot:
# read.table splits on any whitespace, so the pasted table parses as written
DF <- read.table(header = TRUE, text =
"category CBX3 PSPH ATP2C1 SNX10 MMD ATP13A3
B 10.589844 6.842970 8.084550 8.475023 9.202490 10.403811
A 10.174385 5.517944 7.736994 9.094834 9.253766 10.133408
B 10.202084 5.669137 7.392141 7.522270 7.830969 9.123178
B 10.893231 6.630709 7.601690 7.894177 8.979142 9.791841
B 10.071038 5.091222 7.032585 8.305581 7.903737 8.994821
A 10.005002 4.708631 7.927246 7.292527 8.257853 10.054630
B 10.028055 5.080944 6.421961 7.616856 8.287496 9.642294
A 10.144115 6.626483 7.686203 7.970934 7.919615 9.475175
A 10.675386 6.874047 7.900560 7.605519 8.585158 8.858613
A 9.855063 5.164399 6.847923 8.072608 8.221344 9.077744
A 10.994228 6.545318 8.606128 8.426329 8.787876 9.857079
A 10.501266 6.677360 7.787168 8.444976 8.928174 9.542558")
# group A in green and group B in red, as asked in the question
plot(DF[2:7],col = ifelse(DF$category == 'A','green','red'))
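When there are more than two groups, a named vector of colors indexed by the group label scales better than nested ifelse() calls. The sketch below uses a shortened, rounded subset of the sample data; palette_by_group and point_cols are hypothetical names.

```r
# Shortened, rounded subset of the sample data
DF <- data.frame(
  category = c("B", "A", "B", "A"),
  CBX3 = c(10.59, 10.17, 10.20, 10.01),
  PSPH = c(6.84, 5.52, 5.67, 4.71),
  ATP2C1 = c(8.08, 7.74, 7.39, 7.93),
  stringsAsFactors = FALSE
)

# One color per group; add entries here for further groups
palette_by_group <- c(A = "green", B = "red")

# Look up each sample's color by its group label
point_cols <- unname(palette_by_group[as.character(DF$category)])

# Pairwise scatterplot of the gene columns, colored by group
pairs(DF[-1], col = point_cols, pch = 21)
```

Converting with as.character() matters when category is a factor, since indexing by a factor would use its integer codes rather than its labels.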
A list of valid color values can be obtained by calling colors(). Vectors with a gradient of colors can be created via rainbow(), and just for fun, I use this little function for choosing pretty colors when making a figure.
(Edited per suggestions from @MrFlick)
#' @param n The number of colors to be selected
colorchoose <- function (n = 1, alpha, term = FALSE)
{
  cols <- colors()
  # lay the colors out on a roughly square grid
  mod <- ceiling(sqrt(length(cols)))
  plot(xlab = "", ylab = "", main = "click for color name",
       c(0, mod), c(0, mod), type = "n", axes = FALSE)
  s <- seq_along(cols)
  dev.hold()
  points(s%%mod, s%/%mod, col = cols, pch = 15, cex = 2.4)
  dev.flush()
  # wait for n mouse clicks and map each click back to a color index
  p <- locator(n)
  return(cols[round(p$y) * mod + round(p$x)])
}
