My dataframes sometimes contain NA values. These were previously blanks, characters like 'BAD' or actual 'NA' characters from the imported .csv file. I have changed everything in my dataframes to numeric - this changes all non-numeric characters to NA. So far, so good.
I am aware I can use the following using dataframe 'df' to ensure a line is always drawn between data points, ensuring there are no gaps:
ggplot(na.omit(df), aes(x=Time, y=pH)) +
geom_line()
However, sometimes I wish to plot 2 or more dataframes using ggplot2 to get a single plot. I do this because my x axis (Time) is indeed the same for all dataframes, but the specific numbers are different. I was having immense trouble merging these dataframes because the rows are not equal. Otherwise I would merge, melt the data and use ggplot2 as normal to make a multiple-lined line plot.
I have since learnt you can plot multiple dataframes manually on ggplot at the 'geom level':
ggplot() +
geom_line(data=df1, aes(x=Time1, y=pH1), colour='green') +
geom_line(data=df2, aes(x=Time2, y=pH2), colour='red') +
geom_line(data=df3, aes(x=Time3, y=pH3), colour='blue') +
geom_line(data=df4, aes(x=Time4, y=pH4), colour='yellow')
However, how can I now ensure NA values are omitted and the lines are connected?! It all seems to work, but my 4 plots have gaps in them where the NA values are!
I am new to R, but enjoying it so far and realise there are usually multiple solutions to an issue. Any help or advice appreciated.
EDIT (for anyone who later sees this)
So, after playing around for 30 mins I realised I could first use the na.omit function separately on each dataframe, name these new objects and then just plot these instead in ggplot. This works fine. Also, the above code was incorrect anyway if I wanted a suitable legend.
New, correct code:
df1.omit <- na.omit(df1)
df2.omit <- na.omit(df2)
df3.omit <- na.omit(df3)
df4.omit <- na.omit(df4)
ggplot() +
geom_line(data=df1.omit, aes(x=Time1, y=pH1, colour="Variable 1")) +
geom_line(data=df2.omit, aes(x=Time2, y=pH2, colour="Variable 2")) +
geom_line(data=df3.omit, aes(x=Time3, y=pH3, colour="Variable 3")) +
geom_line(data=df4.omit, aes(x=Time4, y=pH4, colour="Variable 4"))
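For anyone who wants a single data frame instead of four separate geoms, here is a minimal sketch of one alternative: stack the data frames into one long table with a shared Time/pH pair and a label column, drop the rows where the plotted values are NA, and let ggplot build the legend from the label. The toy df1 and df2 values below are made up for illustration, and the names combined and Variable are my own.

```r
library(ggplot2)

# Hypothetical stand-ins for the real data frames (values are invented)
df1 <- data.frame(Time1 = c(0, 1, 2, 3), pH1 = c(7.0, NA, 7.2, 7.1))
df2 <- data.frame(Time2 = c(0, 2, 4),    pH2 = c(6.5, 6.6, NA))

# Stack into one long data frame with common column names and a label
combined <- rbind(
  data.frame(Time = df1$Time1, pH = df1$pH1, Variable = "Variable 1"),
  data.frame(Time = df2$Time2, pH = df2$pH2, Variable = "Variable 2")
)

# Drop only rows where a plotted column is NA, so lines join across the gaps
combined <- combined[!is.na(combined$Time) & !is.na(combined$pH), ]

# One geom_line; colour is mapped to the label, which also gives a legend
p <- ggplot(combined, aes(x = Time, y = pH, colour = Variable)) +
  geom_line()
```

Note that na.omit(df) removes a row if any column is NA, not just the plotted ones, so filtering on the plotted columns (as above) can keep more data when the data frames have extra columns.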
I'm brand new to using R geospatially.
Working .kmz: https://www.cnrfc.noaa.gov/ - from the second drop down right below the map pane titled 'Download Overlay Files', I've downloaded and I'm using the "Drainage Basins" kml that should download as "basins.kml"
library(rgdal)
library(tidyverse)
From looking at the .kml in a text editor, it looks like the KML layer name is
"cnrfc_09122018_basins_thin", so reading it in with:
cnrfc_basins <- readOGR("basins.kml", "cnrfc_09122018_basins_thin")
gives me a "Large SpatialPolygonsDataFrame".
To be able to plot, it looks like I need to "fortify it" (?), and make a more ordinary data.frame, so from some other posts I've come across:
cnrfc_basins_fortify <- merge(broom::tidy(cnrfc_basins),
as.data.frame(cnrfc_basins), by.x="id", by.y=0)
plotting with this:
ggplot() + geom_path(data = cnrfc_basins_fortify, aes(x=long, y=lat, group = group)) +
coord_quickmap()
gives me the data I'm expecting:
But for these one hundred or so polygons, I have hundreds of thousands of data.frame rows. How do I reduce these so that I have just one row for each polygon? (Each polygon, which represents a particular basin, already has a unique five-digit ID in the 'Name' column.) Having fewer rows should make the file easier to work with and speed up joins when I join data to these unique polygons.
Any advice greatly appreciated.
All you have to do is directly extract the @data slot contained in the SpatialPolygonsDataFrame:
poly = cnrfc_basins@data
That should give you a 339-row data.frame with the unique identifiers you need (without the geometric metadata)
> head(poly)
Name
0 EFBC1
1 CSKC1
2 CMIC1
3 FMDC1
4 NMFC1
5 NFDC1
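To see why the attribute table has one row per polygon while the fortified table has one row per vertex, here is a small sketch with a made-up two-polygon object. It assumes the sp package is installed; toy_basins and the names AAAA1/BBBB1 are hypothetical and are not taken from basins.kml.

```r
library(sp)

# Two hypothetical unit squares standing in for basins
sq1 <- Polygon(cbind(c(0, 1, 1, 0, 0), c(0, 0, 1, 1, 0)))
sq2 <- Polygon(cbind(c(1, 2, 2, 1, 1), c(0, 0, 1, 1, 0)))
polys <- SpatialPolygons(list(Polygons(list(sq1), ID = "0"),
                              Polygons(list(sq2), ID = "1")))

# Attribute table: row names must match the polygon IDs
attrs <- data.frame(Name = c("AAAA1", "BBBB1"), row.names = c("0", "1"))
toy_basins <- SpatialPolygonsDataFrame(polys, attrs)

# The @data slot holds one row per polygon, regardless of vertex count
toy_attrs <- toy_basins@data
nrow(toy_attrs)
```

Joins on Name can then be done against this small table, keeping the vertex-level data frame only for drawing.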
First of all, I am a beginner, so I appreciate your patience and the time you take to help me. I have one Excel file with 3 columns: Shopname, 2016 and 2017, which hold the values to compare.
I'd like to iterate over the Excel file and plot two bars for each shop: one with the value for shop X in 2016 and another for 2017.
I'll post here what I have written so far. I can see the printed output but not the plots. What could I do better?
> #importing excel file
> #and ploting each line comparison between 2 columns
> library(xlsx)
> xl_data <- read.xlsx("File.xlsx", "Plan1")
> df<- data.frame(xl_data)
> # plot using facets
> ggplot(aes(x=time, y=sold, group=shop)) +geom_bar(stat="identity")+
facet_grid(.~xl_data)
Afonso,
You don't need a loop for that. One way to accomplish it would be with ggplot's faceting capability:
#### load needed libraries
library(tidyr)
library(ggplot2)
### load data -- this is coming from Excel
dt <- tribble(
~LOJAS, ~y2016, ~y2017,
"CD NEREU" , 168459.86, 223637.46,
"LJ CANOINH", 14480.03, 80006.86,
"LJ MAL338" , 21095.07, 62768.54,
"LJ SBENTO" , 43290.47, 43168.34)
### arrange data for plotting
dt %>%
gather(time, sold, y2016, y2017) %>%
# plot using facets
ggplot(aes(x=time, y=sold, group=LOJAS)) +
geom_bar(stat="identity") +
facet_grid(.~LOJAS)
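In current tidyr, gather() is superseded by pivot_longer(), and geom_col() is shorthand for geom_bar(stat="identity"). A sketch of the same reshape and plot in that style, using the same four shops, might look like this (the name long is my own):

```r
library(tidyr)
library(ggplot2)

# Same values as in the tribble above
dt <- data.frame(
  LOJAS = c("CD NEREU", "LJ CANOINH", "LJ MAL338", "LJ SBENTO"),
  y2016 = c(168459.86, 14480.03, 21095.07, 43290.47),
  y2017 = c(223637.46, 80006.86, 62768.54, 43168.34)
)

# Wide to long: one row per shop and year
long <- pivot_longer(dt, cols = c(y2016, y2017),
                     names_to = "time", values_to = "sold")

# One panel per shop, two bars (2016 and 2017) in each panel
p <- ggplot(long, aes(x = time, y = sold)) +
  geom_col() +
  facet_grid(. ~ LOJAS)
```

Calling print(p), or just typing p at the console, draws the plot.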
I would like to plot my figure using R (ggplot2). I'd like to have a line graph like image 2.
here my.data:
B50K,B50K+1000C50K,B50K+2000C50K,B50K+4000C50K,B50K+8000C50K,gen,xaxile
0.3795,0.4192,0.4675,0.5357,0.6217,T18-Yield,B50K
0.3178,0.3758,0.4249,0.5010,0.5870,T20-Yield,B50K+1000C50K
0.2795,0.3266,0.3763,0.4636,0.5583,T21-Yield,B50K+2000C50K
0.2417,0.2599,0.2898,0.3291,0.3736,T18-Fertility,B50K+4000C50K
0.2002,0.2287,0.2531,0.2962,0.3485,T19-Fertility,B50K+8000C50K
0.1642,0.1911,0.2151,0.2544,0.2951,T20-Fertility
***--> The delimiter is ",". By the way, I do not have any useful .r script to start from.
The image shows my figure as drawn in Microsoft Word.
I have tried several scripts from the internet, but none of them worked.
Would you please help me with a .r script that reads my data file (as in img1) and plots my data like the illustrated figure?
The trick is to reshape your data (using melt from the reshape2 package) so that you can easily map colours and linetypes to gen.
# Your data - note I also added an extra comma after the fifth column in row 6.
# It would be easier if you gave data using dput as described in comments above - thanks
dat <- read.table(text="B50K,B50K+1000C50K,B50K+2000C50K,B50K+4000C50K,B50K+8000C50K,xaxile,gen
0.3795,0.4192,0.4675,0.5357,0.6217,B50K,T18-Yield
0.3178,0.3758,0.4249,0.5010,0.5870,B50K+1000C50K,T20-Yield
0.2795,0.3266,0.3763,0.4636,0.5583,B50K+2000C50K,T21-Yield
0.2417,0.2599,0.2898,0.3291,0.3736,B50K+4000C50K,T18-Fertility
0.2002,0.2287,0.2531,0.2962,0.3485,B50K+8000C50K,T19-Fertility
0.1642,0.1911,0.2151,0.2544,0.2951,,T20-Fertility",
header=T, sep=",", na.strings="")
# load the packages you need
library(ggplot2)
library(reshape2)
# assume xaxile column is unneeded? - did you add this column yourself?
dat$xaxile <- NULL
# reshape data for plotting
dat.m <- melt(dat)
# plot
ggplot(dat.m, aes(x=variable, y=value, colour=gen,
shape=gen, linetype=gen, group=gen)) +
geom_point() +
geom_line()
You can then use scale_linetype_manual and scale_shape_manual to manually specify how you want the plot to look. This post will help, but there are many others as well
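As a rough sketch of what that could look like, here are the two manual scales applied to a cut-down version of the data (two gen levels and two x positions only). The particular linetypes and shape codes are arbitrary choices for illustration, and the name toy is my own.

```r
library(ggplot2)

# Cut-down long data: two gen levels, first two x positions
toy <- data.frame(
  variable = factor(rep(c("B50K", "B50K+1000C50K"), times = 2),
                    levels = c("B50K", "B50K+1000C50K")),
  value = c(0.3795, 0.4192, 0.2417, 0.2599),
  gen   = rep(c("T18-Yield", "T18-Fertility"), each = 2)
)

p <- ggplot(toy, aes(x = variable, y = value, colour = gen,
                     shape = gen, linetype = gen, group = gen)) +
  geom_point() +
  geom_line() +
  # Named vectors tie each gen level to a specific linetype and shape
  scale_linetype_manual(values = c("T18-Yield" = "solid",
                                   "T18-Fertility" = "dashed")) +
  scale_shape_manual(values = c("T18-Yield" = 16,
                                "T18-Fertility" = 17))
```

With all six gen levels, each values vector needs six entries; because the scales share the same name and labels, ggplot merges them into one legend.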
I'm still in the process of learning R using Swirl and RStudio, and a goal I've set for myself is to recreate this graph. I have a small dataset that I will link below (it's saved as a plain text CSV file that I import into R with headings enabled).
If I try to plot that dataset without changing anything, I get this, which is obviously not the goal.
At first I thought the problem would be in the class of my imported dataset, defined as kt. After class(kt) turned out to be data.frame I figured that wasn't the problem. Should I be trying to rewrite the table to something that R can plot instantly, or should I be trying to extract each species individually, plot them separately and then combining the different plots into one graph? Perhaps there is something wrong with my dates, I know that R handles dates in a specific way. Maybe these solutions are not even needed and I'm just doing something stupidly simple wrong, but I can't find it myself.
Your help is much appreciated.
Dataset:
Species,week 0,week 1,week 2,week 3,week 4,week 5,week 6,week 7,week 8,week 9,week 10,week 11,week 12,week 13,week 14,week 15,week 16,week 17,week 18
Caesalpinia coriaria,0.0%,24.0%,28.0%,28.0%,32.0%,37.0%,40.0%,46.0%,52.0%,56.0%,63.0%,64.0%,68.0%,71.0%,72.0%,,,,
Coccoloba swartzii,0.0%,0.0%,1.0%,10.0%,19.0%,31.0%,33.0%,39.0%,43.0%,48.0%,52.0%,52.0%,52.0%,52.0%,52.0%,52.0%,52.0%,55.0%,
Cordia dentata,0.0%,5.0%,18.0%,21.0%,24.0%,26.0%,27.0%,30.0%,32.0%,32.0%,32.0%,32.0%,32.0%,32.0%,33.0%,33.0%,33.0%,34.0%,35.0%
Guaiacum officinale,0.0%,0.0%,0.0%,0.0%,4.0%,5.0%,5.0%,5.0%,7.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,8.0%,,
Randia aculeata,0.0%,0.0%,0.0%,4.0%,13.0%,14.0%,18.0%,19.0%,21.0%,21.0%,21.0%,21.0%,21.0%,22.0%,22.0%,22.0%,22.0%,,
Schoepfia schreberi,0.0%,0.0%,0.0%,0.0%,0.0%,0.0%,1.0%,4.0%,8.0%,11.0%,13.0%,21.0%,21.0%,24.0%,24.0%,25.0%,27.0%,,
Prosopis juliflora,0.0%,7.5%,31.3%,34.2%,,,,,,,,,,,,,,,
Something like this??
# get rid of "%" signs
df <- data.frame(sapply(df,function(x)gsub("%","",x,fixed=T)))
# convert cols 2:20 to numeric
df[,2:20] <- sapply(df[,2:20],function(x)as.numeric(as.character(x)))
library(reshape2)
library(ggplot2)
gg <- melt(df,id="Species")
ggplot(gg,aes(x=variable,y=value,color=Species,group=Species)) +
geom_line()+
theme_bw()+
theme(legend.position="bottom", legend.title=element_blank())
There are lots of problems here.
First, if your dataset really has those % signs, then R interprets the data as character and (in R versions before 4.0, where stringsAsFactors defaults to TRUE) imports it as factors. So first we have to get rid of the % (using gsub(...)), and then we have to convert what's left to numeric. With factors, you have to convert to character first, then numeric, so: as.numeric(as.character(...)). All of this could have been avoided if you exported the data without the % signs!!!
Plotting multiple curves with different colors is something the ggplot package was designed for (among many other things), so we use that. ggplot prefers data in "long" format - all the data in one column, with a second column distinguishing different datasets. Your data is in "wide" format - data in different columns. So we convert to long using melt(...) from the reshape2 package. The result, gg, has three columns: Species, variable and value. value contains the actual data and variable contains the week label.
So now we create a ggplot object, setting the x-axis to the variable column, the y-axis to the value column, with color mapped to Species, and we tell ggplot to plot lines (using geom_line(...)).
The rest is to position the legend at the bottom, and turn off some of the ggplot default formatting.
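For reference, here is a self-contained sketch of the same steps on a three-week, two-species subset of the data. It additionally turns the week labels into numbers so the x-axis is continuous, and drops the blank (NA) cells. The names kt, week_cols, long and week are my own, and check.names = FALSE is an assumption about how you would want the headers imported.

```r
library(reshape2)
library(ggplot2)

# Subset of the dataset: weeks 2-4 for two species (last cell is blank)
kt <- read.csv(text = "Species,week 2,week 3,week 4
Cordia dentata,18.0%,21.0%,24.0%
Prosopis juliflora,31.3%,34.2%,",
               check.names = FALSE, stringsAsFactors = FALSE)

# Strip the % signs and convert every week column to numeric
week_cols <- setdiff(names(kt), "Species")
kt[week_cols] <- lapply(kt[week_cols],
                        function(x) as.numeric(sub("%", "", x, fixed = TRUE)))

# Wide to long, then turn "week 2" into the number 2
long <- melt(kt, id.vars = "Species", variable.name = "week")
long$week <- as.numeric(sub("week ", "", as.character(long$week), fixed = TRUE))

# Drop blank cells so each line simply stops where its data ends
long <- na.omit(long)

p <- ggplot(long, aes(x = week, y = value, colour = Species)) +
  geom_line() +
  theme_bw() +
  theme(legend.position = "bottom", legend.title = element_blank())
```

With stringsAsFactors = FALSE the columns arrive as character, so the as.character(...) step from the factor case is not needed here.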