Currently, I am trying to make an image of multiple violin graphs from data that I read in from a text file. The text file is formatted so that there is a "Count" column, which just increments by 1 to show the index of each result, and there are also multiple columns, each holding the results for a different variable size. Below is an example of a portion of the text file.
Count X1.1 X1.2 X1.3 X1.4
1 174.647 173.368 172.713 172.264
2 169.549 166.791 167.010 165.682
3 174.341 170.821 169.861 169.103
4 178.305 177.736 177.796 176.067
5 160.614 159.842 158.548 157.145
So I would like to create a new violin graph for each column using ggplot (1.1, 1.2, etc.) that can be displayed side by side.
library(ggplot2)
myData <- read.csv("E2_1_RingSize.text", sep = "\t", header=TRUE)
I've read in the file I want, and I am able to plot one column at a time by hard-coding the column name. See below:
graph1 <- ggplot(myData, aes(x=Count, y=X1.1)) + geom_violin()
But I'm unsure how to include all of the columns at once. It's most likely an easy fix, only 1-2 lines, but I'm not that experienced in R/RStudio and so I've got no clue.
What you need to do is pivot your data.frame so it's in long format:
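library(tidyr)  # provides pivot_longer() and re-exports the %>% pipe
# myData is the data frame read in by the question's read.csv() call
myData %>%
  pivot_longer(-Count) %>%
  ggplot(aes(x=as.factor(name), y=value)) + geom_violin()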
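To see what the pivot step produces, here is a small self-contained sketch that types in the five sample rows from the question by hand instead of reading the file. The names `long` and `p` are just illustrative; `name` and `value` are the default column names that `pivot_longer()` creates.

```r
library(tidyr)
library(ggplot2)

# The five sample rows from the question, typed in by hand
myData <- data.frame(
  Count = 1:5,
  X1.1 = c(174.647, 169.549, 174.341, 178.305, 160.614),
  X1.2 = c(173.368, 166.791, 170.821, 177.736, 159.842),
  X1.3 = c(172.713, 167.010, 169.861, 177.796, 158.548),
  X1.4 = c(172.264, 165.682, 169.103, 176.067, 157.145)
)

# Long format: one row per (Count, column) pair, so 5 x 4 = 20 rows
long <- pivot_longer(myData, -Count)

# One violin per original column, drawn side by side
p <- ggplot(long, aes(x = name, y = value)) + geom_violin()
```

Calling `print(p)` (or just typing `p` at the console) draws the plot.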
I am looking at some data downloaded from ICPSR and I am specifically using their R data file (.rda). Beneath the column name of each data file, there are some descriptions of the variables (a.k.a labels). An example is attached as well.
I tried various ways to get the labels, including base::label, Hmisc::label, labelled::var_label, sjlabelled::get_label, etc., but none of them worked.
Does anyone have any ideas on how to extract the labels from this data file?
Thanks very much in advance!
This could work using purrr:
#load library
library(purrr)
#get col n
n <- ncol(yourdata)
#extract labels as vector
labels <- map_chr(1:n, function(x) attr(yourdata[[x]], "label") )
This worked for me (I am working with ICPSR 35206):
attributes(yourdata)$variable.labels -> labels
Make sure that your attribute referring to the labels is actually called "variable.labels".
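If the labels are stored per column rather than on the data frame itself, a base R sketch along the following lines might help. It assumes each labelled column carries an attribute named "label" (that name may differ in your file), and the toy data frame and the helper name `get_labels` are made up for illustration. Unlike `map_chr()`, it returns NA for columns that have no label instead of failing.

```r
# Hypothetical toy data standing in for an ICPSR data frame
yourdata <- data.frame(AGE = c(34, 51), SEX = c(1, 2))
attr(yourdata$AGE, "label") <- "Age of respondent"
# SEX deliberately has no label

# Collect per-column labels; columns without a "label" attribute give NA
get_labels <- function(d) {
  vapply(d, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)
  }, character(1))
}

labels <- get_labels(yourdata)
```

Running `names(attributes(yourdata))` and `attributes(yourdata[[1]])` first is a quick way to check where (and under which name) the labels actually live.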
First of all, I am a beginner, so I appreciate your patience and the time you take to help me. I have one Excel file with 3 columns: Shopname, 2016 and 2017, which hold the values to compare.
I'd like to iterate over the Excel file and plot two bars per shop: one with the value for shop X in 2016 and the other for 2017.
I'll post what I have written so far. I can see the printed output but not the plots. What could I do better?
> #importing excel file
> #and ploting each line comparison between 2 columns
> library(xlsx)
> xl_data <- read.xlsx("File.xlsx", "Plan1")
> df<- data.frame(xl_data)
> # plot using facets
> ggplot(aes(x=time, y=sold, group=shop)) +geom_bar(stat="identity")+
facet_grid(.~xl_data)
Afonso,
You don't need a loop for that. One way to accomplish it would be with ggplot's facetting capability:
#### load needed libraries
library(tidyr)
library(ggplot2)
### load data -- this is coming from Excel
dt <- tribble(
~LOJAS, ~y2016, ~y2017,
"CD NEREU" , 168459.86, 223637.46,
"LJ CANOINH", 14480.03, 80006.86,
"LJ MAL338" , 21095.07, 62768.54,
"LJ SBENTO" , 43290.47, 43168.34)
### arrange data for plotting
dt %>%
gather(time, sold, y2016, y2017) %>%
# plot using facets
ggplot(aes(x=time, y=sold, group=LOJAS)) +
geom_bar(stat="identity") +
facet_grid(.~LOJAS)
I'm still in the process of learning R using Swirl and RStudio, and a goal I've set for myself is to recreate this graph. I have a small dataset that I will link below (it's saved as a plain text CSV file that I import into R with headings enabled).
If I try to plot that dataset without changing anything, I get this, which is obviously not the goal.
At first I thought the problem would be in the class of my imported dataset, defined as kt. After class(kt) turned out to be data.frame, I figured that wasn't the problem. Should I be trying to rewrite the table into something that R can plot directly, or should I be trying to extract each species individually, plot them separately and then combine the different plots into one graph? Perhaps there is something wrong with my dates; I know that R handles dates in a specific way. Maybe these solutions are not even needed and I'm just doing something stupidly simple wrong, but I can't find it myself.
Your help is much appreciated.
Dataset:
Species,week 0,week 1,week 2,week 3,week 4,week 5,week 6,week 7,week 8,week 9,week 10,week 11,week 12,week 13,week 14,week 15,week 16,week 17,week 18
Caesalpinia coriaria,0.0%,24.0%,28.0%,28.0%,32.0%,37.0%,40.0%,46.0%,52.0%,56.0%,63.0%,64.0%,68.0%,71.0%,72.0%,,,,
Coccoloba swartzii,0.0%,0.0%,1.0%,10.0%,19.0%,31.0%,33.0%,39.0%,43.0%,48.0%,52.0%,52.0%,52.0%,52.0%,52.0%,52.0%,52.0%,55.0%,
Cordia dentata,0.0%,5.0%,18.0%,21.0%,24.0%,26.0%,27.0%,30.0%,32.0%,32.0%,32.0%,32.0%,32.0%,32.0%,33.0%,33.0%,33.0%,34.0%,35.0%
Guaiacum officinale,0.0%,0.0%,0.0%,0.0%,4.0%,5.0%,5.0%,5.0%,7.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,,
Randia aculeata,0.0%,0.0%,0.0%,4.0%,13.0%,14.0%,18.0%,19.0%,21.0%,21.0%,21.0%,21.0%,21.0%,22.0%,22.0%,22.0%,22.0%,,
Schoepfia schreberi,0.0%,0.0%,0.0%,0.0%,0.0%,0.0%,1.0%,4.0%,8.0%,11.0%,13.0%,21.0%,21.0%,24.0%,24.0%,25.0%,27.0%,,
Prosopis juliflora,0.0%,7.5%,31.3%,34.2%,,,,,,,,,,,,,,,
Something like this??
# get rid of "%" signs
df <- data.frame(sapply(df, function(x) gsub("%", "", x, fixed=TRUE)))
# convert cols 2:20 to numeric
df[,2:20] <- sapply(df[,2:20],function(x)as.numeric(as.character(x)))
library(reshape2)
library(ggplot2)
gg <- melt(df,id="Species")
ggplot(gg,aes(x=variable,y=value,color=Species,group=Species)) +
geom_line()+
theme_bw()+
theme(legend.position="bottom", legend.title=element_blank())
There are lots of problems here.
First, if your dataset really has those % signs, then R interprets the data as character and imports it as factors. So first we have to get rid of the % (using gsub(...)), and then we have to convert what's left to numeric. With factors, you have to convert to character first, then numeric, so: as.numeric(as.character(...)). All of this could have been avoided if you exported the data without the % signs!!!
Plotting multiple curves with different colors is something the ggplot package was designed for (among many other things), so we use that. ggplot prefers data in "long" format: all the data in one column, with a second column distinguishing the different datasets. Your data is in "wide" format, with the data spread across different columns. So we convert to long using melt(...) from the reshape2 package. The result, gg, has three columns: Species, variable and value. value contains the actual data and variable contains the week number.
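As a minimal sketch of that conversion on its own, the helper below (the name `pct_to_num` is made up here) strips the percent sign and converts to numeric, going through character first so it behaves the same for factors and plain strings. Empty cells, like the trailing ones in the dataset, come out as NA.

```r
# Convert strings such as "24.0%" to numbers; empty cells become NA.
# as.character() first, so factor input is handled correctly too.
pct_to_num <- function(x) {
  as.numeric(sub("%", "", as.character(x), fixed = TRUE))
}

converted <- pct_to_num(c("0.0%", "24.0%", "", "7.5%"))
```

It could be applied to every week column with something like `df[, -1] <- lapply(df[, -1], pct_to_num)`, assuming the first column is Species.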
So now we create a ggplot object, setting the x-axis to the variable column, the y-axis to the value column, with color mapped to Species, and we tell ggplot to plot lines (using geom_line(...)).
The rest is to position the legend at the bottom, and turn off some of the ggplot default formatting.
I have a specific question: How can I choose either the fill or the color of a ggplot according to the data of a SpatialPolygonsDataFrame object? For example, consider the following SpatialPolygonsDataFrame sf:
sf <- readShapePoly("somePolygonShapeFile")
It allows me to access the example data field FK like:
sf$FK  # or
sf@data$FK
Now, I want to prepare a simple ggplot:
p <- ggplot(sf, aes(x=long, y=lat, group=group, FK=???))
However, I don't know what to pass to FK in aes(). Experience with gridded data frames (grid.extent(...)) made me think I could directly put in FK=FK. This does not seem to work for SpatialPolygonsDataFrame objects. Trying FK=sf$FK or FK=sf@data$FK is not allowed because:
Error: Aesthetics must either be length one, or the same length as the data
I guess, the solution is trivial, but I simply don't get it at the moment.
Thanks to @juba, @rsc and @SlowLearner I've found out that gpclib still had to be installed before gpclibPermit could be granted. With this done, fortifying sf using a specified region is no longer a problem. Using the explanation from the ggplot2 wiki, I am able to transfer all data fields of the original shapefile into a plotting-friendly data frame. The latter finally works as intended for plotting the shapefile in R. Here is the final code, with the actual content of the workingDir variable left out:
require("rgdal") # requires sp, will use proj.4 if installed
require("maptools")
require("ggplot2")
require("plyr")
workingDir <- ""
sf <- readOGR(dsn=workingDir, layer="BK50_Ausschnitt005")
sf@data$id <- rownames(sf@data)
sf.points <- fortify(sf, region="id")
sf.df <- join(sf.points, sf@data, by="id")
ggplot(sf.df,aes(x=long, y=lat, fill=NFK)) + coord_equal() + geom_polygon(colour="black", size=0.1, aes(group=group))
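The join is what resolves the "Aesthetics must either be length one, or the same length as the data" error: the attribute table has one row per polygon, while the fortified data has one row per vertex. A rough base R sketch of the same idea, using small made-up data frames in place of the fortify() output and sf@data (all values here are hypothetical):

```r
# Hypothetical stand-in for fortify() output: one row per polygon vertex
sf.points <- data.frame(
  id   = c("0", "0", "0", "1", "1", "1"),
  long = c(0, 1, 0, 1, 2, 2),
  lat  = c(0, 0, 1, 0, 0, 1)
)

# Hypothetical stand-in for sf@data: one row per polygon
sf.data <- data.frame(id = c("0", "1"), NFK = c(12.5, 30.0))

# Repeat each polygon's attributes on every one of its vertices
sf.df <- merge(sf.points, sf.data, by = "id")
```

Note that merge() may reorder rows, whereas plyr's join() keeps the order of the first data frame. Since vertex order matters when drawing polygons, with real fortify() output it would be safer to re-sort by its order column after a merge().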
First, you should use the readOGR function from the rgdal library to read your shapefile (then you won't have problems with gpclib). Here is an example of how to do that.
Second, are you trying to pass the sf object to ggplot as-is? If so, you need to use fortify() to convert your spatial object into a data frame. There should be some kind of identifying column in sf@data such as ID or NAME. So try something like:
sf.df <- fortify(sf, region = "NAME")
...and use sf.df for plotting using ggplot.