I'm trying to create a Venn diagram of two data frames, but I only get incorrect results. Both data sets have the same structure, for example:
Chemical      ChemID
Oxidopamine   D016627
Melatonin     D016627
I've only received incorrect results from the following:
VennDiagram::venn.diagram(
x = list(Lewy, Park),
category.names = c("ChemID, ChemID"),
filename ="venndiagramm.png",
output=TRUE)
Ideally, I would like to export an image showing the number of overlapping chemicals between the two sets.
Welcome to SO! As far as I can guess your data structure (two dataframes Lewy and Park, each with the column ChemID), try the following:
VennDiagram::venn.diagram(
x = list(Lewy$ChemID, Park$ChemID), # expects vectors, not dataframes
# category.names = c("ChemID, ChemID"), # see if these are rather to construct nice labels
filename ="venndiagramm.png",
output=TRUE)
You may increase the chance of a useful answer by providing a minimal working data sample with dput(). Of course you can use simulated data. Try to explain exactly what did not work.
See also ?venn.diagram
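To check the numbers the diagram should show, the overlap can be counted directly from the two ID vectors. This is only a minimal base R sketch: the two small data frames and their IDs are made up for illustration, and only the column name ChemID is taken from the question.

```r
# Hypothetical stand-ins for the two data frames (IDs are invented)
Lewy <- data.frame(Chemical = c("chem A", "chem B", "chem C"),
                   ChemID   = c("ID001", "ID002", "ID003"))
Park <- data.frame(Chemical = c("chem B", "chem C", "chem D", "chem E"),
                   ChemID   = c("ID002", "ID003", "ID004", "ID005"))

# Work on the ID vectors, not on the data frames themselves
lewy_ids <- unique(as.character(Lewy$ChemID))
park_ids <- unique(as.character(Park$ChemID))

shared    <- intersect(lewy_ids, park_ids)  # in both sets
only_lewy <- setdiff(lewy_ids, park_ids)    # only in Lewy
only_park <- setdiff(park_ids, lewy_ids)    # only in Park

# The three regions of a two-set Venn diagram
counts <- c(only_Lewy = length(only_lewy),
            shared    = length(shared),
            only_Park = length(only_park))
print(counts)
```

If these counts disagree with the exported image, the input passed to venn.diagram() is the first thing to check.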
Related
I'm trying to save vegan::simper() output as a data frame so that I can filter objects and eventually export as a table for publication. However the simper output is of class = list and I'm not sure how to convert this to a data frame. Here is some sample code using Dune.
library(vegan)  # provides simper() and the dune data sets
# Species and environmental data
dune <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.spe.txt', row.names = 1)
dune.env <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/dune2.env.txt', row.names = 1)
data(dune)
data(dune.env)
(sim <- with(dune.env, simper(dune, Management)))
summary(sim)
class(sim)
To complement the comment by @dcarlson: the simper result object is a complicated beast and there is no easy way of getting a table, especially as I don't know what kind of table you are looking for. The result object stores every pair of factor classes in a separate table. You can extract all those tables with summary(sim). If you want to see only one table, for instance for the pair SF_HF, use
summary(sim)$SF_HF
(and to see the available names, use names(sim)). Then it is up to you to collect the table you desire from these individual tables. All the information is there.
And read the warnings in the manual page.
If you want to get something similar as the short printed output, look at vegan:::print.simper to see how it can be done.
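One way to collect the individual tables into a single data frame is to stack them and keep the pair name as a column. The sketch below does not call vegan: pair_tables is a hypothetical stand-in for summary(sim), assuming that object behaves like a named list of data frames with species as row names. The column names and numbers are invented for illustration.

```r
# Hypothetical stand-in for summary(sim): one table per pair of classes
pair_tables <- list(
  SF_HF = data.frame(average = c(0.08, 0.05), cumsum = c(0.40, 0.65),
                     row.names = c("sp1", "sp2")),
  SF_NM = data.frame(average = c(0.10, 0.03), cumsum = c(0.50, 0.65),
                     row.names = c("sp3", "sp1"))
)

# Turn row names into a column and tag every row with its pair name
tagged <- Map(function(tab, pair) {
  data.frame(pair = pair, species = row.names(tab), tab, row.names = NULL)
}, pair_tables, names(pair_tables))

# Stack all pairs into one long data frame
long <- do.call(rbind, tagged)
row.names(long) <- NULL
print(long)
```

The resulting long table can then be filtered or exported, for example with subset() and write.csv().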
I am trying to create a Venn diagram for common differentially expressed genes across 3 data sets. I created a list that contains the differentially expressed genes, then I used the venn.diagram() function with the following arguments: x (which is my list of gene names in the three data sets), filename, category.names and output. However, the Venn diagram is turning out completely blank, with neither category names nor numbers inside the intersections.
My code looks like this:
venn.diagram(up, filename = 'venn_up.png', category.names = c('up_PC3', 'up_LAPC4', 'up_22Rv1'), output = TRUE)
Has anyone faced a similar problem? Thanks all!
Without a reproducible dataset it is hard to tell, so I created one:
genes <- paste("gene",1:1000,sep="")
x <- list(
up_PC3 = sample(genes,300),
up_LAPC4 = sample(genes,525),
up_22Rv1 = sample(genes,440)
)
You can use the following code to run a Venn diagram:
library(VennDiagram)
venn.diagram(x, filename = "venn_up.png", category.names = c('up_PC3', 'up_LAPC4', 'up_22Rv1'))
Then check your working directory for the output file:
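To verify the numbers in the exported image, the size of every region can also be computed without plotting. This is a rough base R sketch using the same kind of simulated list as above; the seed is my own addition so that the run is repeatable.

```r
set.seed(1)  # assumption: fixed seed, only for repeatability
genes <- paste0("gene", 1:1000)
x <- list(
  up_PC3   = sample(genes, 300),
  up_LAPC4 = sample(genes, 525),
  up_22Rv1 = sample(genes, 440)
)

all_genes <- unique(unlist(x))

# Logical matrix: one row per gene, one column per set
membership <- sapply(x, function(s) all_genes %in% s)

# Label each gene with the combination of sets it belongs to
region <- apply(membership, 1, function(r) paste(names(x)[r], collapse = " & "))
region_counts <- table(region)
print(region_counts)

# Genes shared by all three sets (the central region)
shared_by_all <- Reduce(intersect, x)
length(shared_by_all)
```

Each entry of region_counts corresponds to one of the seven areas of a three-set diagram.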
I'm currently facing quite a lot of challenges doing text analysis with R.
I have a table with the columns Date, Text and Likes.
I want to count how often a certain word occurs within the texts of a column (at most once per text) and how often it does not.
I want to plot the results like in this picture,
but with differently colored dots for "occurrence" and "non-occurrence" of the searched word, aggregated monthly on the y-axis and with likes on the x-axis.
It would be great if you could help me with this challenge.
Update: the sample data is available here: https://drive.google.com/file/d/1IWqDoRFBTL8er8VmvisHDeB5uM3BGgJe/view?usp=sharing
It looks like there are several moving parts here so let me outline the tasks I think you are looking for assistance with:
1. Determine if a word appears in text, row by row.
2. Plot this information.
3. Display the information by category, i.e. word found or not found.
4. Provide some sort of smoothed fit over the data.
You can accomplish the first task by using your choice of pattern matching function. grepl for example will search with the pattern as its first argument. You may want to look into other parameters such as case sensitivity to ensure they match your needs. You'll want to store this result into another column, assuming you use ggplot. Then, you can pass the data to ggplot and use the col argument to have it separate out categories for you.
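As a small illustration of the matching step on its own, here is a base R sketch. The example texts and the search word are made up; ignore.case is one of the grepl parameters worth checking against your needs.

```r
# Hypothetical example texts
texts <- c("I like Orange juice", "apples only", "orange peel")

# TRUE where the word occurs at least once, regardless of case
found <- grepl("orange", texts, ignore.case = TRUE)

# Store the result as a category that can later be mapped to colour
label <- ifelse(found, "word found", "word not found")

table(label)
```

Because grepl returns one TRUE/FALSE per text, a word that occurs several times in the same text is still counted only once.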
It doesn't appear that your data is readily available from your question. In the future, it generally helps if you can share some sample data. I have made my own sample which should be similar to what you describe. See the example code below.
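library(tidyverse)  # attaches ggplot2 and dplyr (for %>%)
set.seed(5)
# 60 daily rows; the three shuffled fruit names are recycled across them
data <- data.frame(Date = seq.Date(from = as.Date("2021-01-01"),
                                   to = as.Date("2021-03-01"),
                                   by = "day"),
                   fruit = sample(c("banana", "orange", "apple")),
                   likes = runif(60, 100, 1000))
# flag each row by whether the search word occurs in the text column
data$good_fruit <- ifelse(grepl("orange", data$fruit), "orange", "not orange")
data %>%
  ggplot() +
  geom_point(aes(Date, likes, col = good_fruit)) +  # one colour per category
  geom_smooth(aes(Date, likes))                     # smoothed fit over all points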
Since I threw together literally random data, there is not much of a pattern here, but I think this illustrates the general idea of what you wanted to show. If you want a more specific kind of aggregation, I would recommend performing that manipulation before passing the data to ggplot, but for a rough fit this should work.
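For the monthly aggregation mentioned in the question, one possible manipulation before plotting is sketched below in base R. The data frame posts and its contents are simulated stand-ins using the column names Date, Text and Likes from the question; summing likes per month is my assumption about the aggregation wanted.

```r
set.seed(5)
# Simulated stand-in for the real table
posts <- data.frame(
  Date  = seq.Date(as.Date("2021-01-01"), as.Date("2021-03-01"), by = "day"),
  Text  = rep(c("banana", "orange", "apple"), 20),
  Likes = round(runif(60, 100, 1000))
)

# Flag each row, then derive a year-month key from the date
posts$found <- grepl("orange", posts$Text)
posts$month <- format(posts$Date, "%Y-%m")

# One row per month and category, with likes summed
monthly <- aggregate(Likes ~ month + found, data = posts, FUN = sum)
print(monthly)
```

The monthly table has one row per month and category, so it can be passed to geom_point() with found mapped to colour.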
I would like to create a simple descriptive tree diagram from a data frame with minimal manual work.
It could look like this:
But it would need to have sample size in each of the boxes.
I am after the following functionality:
Generate the plot based on a data frame (rather than the more manual options shown here)
Ability to change the order of branches (e.g. sex/agegroup/status vs. status/sex/agegroup)
Add labels to each branch
Provide summary statistics for each branch (e.g. male\n=200 female\n=300), either counts or perhaps total length of stay up to that point in the tree.
I found this thread (here) that uses the ape package, which can draw phylogenetic trees that are close to what I am after.
Here is an example using the 'lung' dataset
library(survival)  # provides the lung dataset
library(ape)
lung$status <- factor(lung$status)
lung$sex <- factor(lung$sex)
lung$ph.ecog <- factor(lung$ph.ecog)
# dichotomise age into a new factor column
lung$Age[lung$age > 60] <- "60+"
lung$Age[lung$age <= 60] <- "<60"
lung$Age <- factor(lung$Age)
newdata <- as.phylo(x = ~sex/Age/status/ph.ecog, data = lung)
plot.phylo(x = newdata, show.tip.label = TRUE, show.node.label = TRUE, no.margin = TRUE, root.edge = TRUE)
This gives something close to what I want (although I am not interested in the final nodes, which are patients). It meets criteria 1 & 2, but not 3 and 4. The help for plot.phylo points towards the show.node.label argument, which might fix requirement 3, but I cannot get this to work. I have not found any example that helps with the 4th functionality requirement.
Using leaflet, I'm trying to plot some lines and set their color based on a 'speed' variable. My data start at an encoded polyline level (i.e. a series of lat/long points, encoded as an alphanumeric string) with a single speed value for each EPL.
I'm able to decode the polylines to get series of lat/long points (thanks to Max, here) and I'm able to create segments from those series of points and format them as a SpatialLines object (thanks to Kyle Walker, here).
My problem: I can plot the lines properly using leaflet, but I can't join the SpatialLines object to the base data to create a SpatialLinesDataFrame, and so I can't code the line color based on the speed var. I suspect the issue is that the IDs I'm assigning SL segments aren't matching to those present in the base df.
The objects I've tried to join, with SpatialLinesDataFrame():
"sl_object", a SpatialLines object with ~140 observations, one for each segment; I'm using Kyle's code, linked above, with one key change - instead of creating an arbitrary iterative ID value for each segment, I'm pulling the associated ID from my base data. (Or at least I'm trying to.) So, I've replaced:
id <- paste0("line", as.character(p))
with
lguy <- data.frame(paths[[p]][1])
id <- unique(lguy[,1])
"speed_object", a df with ~140 observations of a single speed var and row.names set to the same id var that I thought I created in the SL object above. (The number of observations will never exceed but may be smaller than the number of segments in the SL object.)
My joining code:
splndf <- SpatialLinesDataFrame(sl = sl_object, data = speed_object)
And the result:
row.names of data and Lines IDs do not match
Thanks, all. I'm posting this in part because I've seen some similar questions - including some referring specifically to changing the ID output of Kyle's great tool - and haven't been able to find a good answer.
EDIT: Including data samples.
From sl_obj, a single segment:
print(sl_obj)
Slot "ID":
[1] "4763655"
[[151]]
An object of class "Lines"
Slot "Lines":
[[1]]
An object of class "Line"
Slot "coords":
lon lat
1955 -74.05228 40.60397
1956 -74.05021 40.60465
1957 -74.04182 40.60737
1958 -74.03997 40.60795
1959 -74.03919 40.60821
And the corresponding record from speed_obj:
row.names speed
... ...
4763657 44.74
4763655 34.8 # this one matches the ID above
4616250 57.79
... ...
To get rid of this error message, either make the row.names of data and Lines IDs match by preparing sl_object and/or speed_object, or, in case you are certain that they should be matched in the order they appear, use
splndf <- SpatialLinesDataFrame(sl = sl_object, data = speed_object, match.ID = FALSE)
This is documented in ?SpatialLinesDataFrame.
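Before deciding between the two options, it can help to see exactly which IDs disagree. The following is a base R sketch only: line_ids is a hypothetical character vector standing in for the IDs stored in the SpatialLines object, the fourth ID is invented, and the speed values are the ones shown in the question.

```r
# Hypothetical IDs taken from the lines object (last one is invented)
line_ids <- c("4763655", "4763657", "4616250", "4700001")

# Speed table with the IDs as row names, as in the question
speed_object <- data.frame(
  speed = c(44.74, 34.8, 57.79),
  row.names = c("4763657", "4763655", "4616250")
)

# IDs present on one side but not on the other
missing_in_data <- setdiff(line_ids, row.names(speed_object))
extra_in_data   <- setdiff(row.names(speed_object), line_ids)

# TRUE only if both sides hold exactly the same set of IDs
ids_match <- setequal(line_ids, row.names(speed_object))

print(missing_in_data)
print(extra_in_data)
print(ids_match)
```

If either difference is non-empty, matching by ID cannot succeed until the two objects are brought in line.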
All right, I figured it out. The error came from my speed_obj not being the same length as my sl_obj, as mentioned here ("data = object of class data.frame; the number of rows in data should equal the number of Lines elements in sl").
Resolution: used a quick loop to pull out all of the unique lines IDs, then performed a left join against that list of uniques to create an exhaustive speed_obj (with NAs, which seem to be OK).
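library(plyr)  # assumed source of join(); its default is a left join
ids <- data.frame()
# collect the ID of every Lines element
for (i in seq_along(sl_obj@lines)) {
  id <- data.frame(sl_obj@lines[[i]]@ID)
  ids <- rbind(ids, id)
}
colnames(ids)[1] <- "linkId"
speed_full <- join(ids, speed_obj)
# drop the ID column from the joined table and use it as row names instead
speed_full_short <- data.frame(speed_full[,c(-1)])
row.names(speed_full_short) <- speed_full$linkId
splndf <- SpatialLinesDataFrame(sl_obj, data = speed_full_short, match.ID = TRUE)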
Works fine now!
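The same padding can be sketched without a loop or a join, using match(). This is a base R illustration under stated assumptions: line_ids is a hypothetical vector of line IDs (with sp it could presumably be collected with something like sapply(sl_obj@lines, function(l) l@ID)), and speed_obj is assumed to have the columns linkId and speed.

```r
# Hypothetical line IDs (last one has no speed record)
line_ids <- c("4763655", "4763657", "4616250", "4700001")

# Assumed layout of the speed table
speed_obj <- data.frame(
  linkId = c("4763657", "4763655", "4616250"),
  speed  = c(44.74, 34.8, 57.79)
)

# Position of each line ID in the speed table, NA where it is absent
idx <- match(line_ids, as.character(speed_obj$linkId))

# One row per line, in line order, with NA for lines without a speed
speed_full <- data.frame(speed = speed_obj$speed[idx], row.names = line_ids)
print(speed_full)
```

The result has exactly one row per line and its row names equal the line IDs, which is the shape the error message asks for.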
I may have deciphered the issue.
When I pull in my spatial lines data and check the class, it reads as "Spatial Lines Data Frame", even though I know it's a simple linear shapefile. I'm using readOGR to bring the data in, and I believe this is where the conversion occurs. With that in mind, the speed assignment is relatively easy.
sl_object$speed <- speed_object[ match( sl_object$ID , row.names( speed_object ) ) , "speed" ]
This should do the trick, as I'm willing to bet your class(sl_object) is "Spatial Lines Data Frame".
EDIT: I had received the same error as OP, driving me to check class()
I am under the impression that you got the error because you were trying to coerce a data frame into a data frame and R wasn't a fan of that.