Plot two columns in R from csv file - r

I have just started to learn R and I have a problem with plotting some values read from a CSV file.
I have managed to load the csv file:
timeseries <- read.csv(file="R/scripts/timeseries.csv",head=FALSE,sep=",")
When checking the content of timeseries, I get the correct results (so far, so good):
1 2016-12-29T19:00:00Z 6
...
17497 2016-12-30T00:00:00Z 3
Now, I am trying to plot the values - the date should be on the x-axis and the values on the y-axis.
I found an SO question on this topic, "How to plot a multicolumn CSV file?", but I am unable to make it work by following its instructions.
I tried:
matplot(timeseries[, 1], timeseries[, -1], type="1")
Also, I tried various barplot and matplot modifications but I usually get an error like this one: Error in plot.window(...) : need finite 'xlim' values
Could someone suggest how to tackle this problem? Sorry for the elementary question...

You need to make sure your dates have class Date.
dates <- c("2016-12-29T19:00:00Z", "2016-12-30T00:00:00Z")
values <- c(6,3)
df <- data.frame(dates, values)
df$dates <- as.Date(df$dates)
Then you could use ggplot2
library(ggplot2)
# qplot() is deprecated as of ggplot2 3.4.0; this is the ggplot() equivalent
ggplot(df, aes(dates, values)) + geom_point() + geom_line()
or even the default
plot(df$dates, df$values, type = "l")
or with lattice as in the question you referred to
library(lattice)
xyplot(df$values ~ df$dates, type = "l")
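One caveat with as.Date: it keeps only the calendar day, so hourly readings like those in the question would collapse onto a handful of dates. A minimal sketch that keeps the time of day with as.POSIXct instead, assuming the default column names V1 and V2 that read.csv assigns when header=FALSE (the two-row data frame is a stand-in for the real file):

```r
# Hypothetical two-row sample mirroring the timestamps shown in the question
timeseries <- data.frame(
  V1 = c("2016-12-29T19:00:00Z", "2016-12-30T00:00:00Z"),
  V2 = c(6, 3),
  stringsAsFactors = FALSE
)

# Parse the ISO 8601 strings as date-times; the trailing Z means UTC
timeseries$V1 <- as.POSIXct(timeseries$V1,
                            format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Base plot with time on the x-axis and the values on the y-axis
plot(timeseries$V1, timeseries$V2, type = "l",
     xlab = "Time (UTC)", ylab = "Value")
```

If day-level resolution is enough, the as.Date version above is simpler.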

Related

R: Cleaning GGally Plots

I am using the R programming language and I am new to the GGally library. I followed some basic tutorials online and ran the following code:
#load libraries
library(GGally)
library(survival)
library(plotly)
I changed some of the data types:
#manipulate the data
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
Now I visualize:
#make the plots
#I dont know why, but this comes out messy
ggparcoord(data, groupColumn = "sex")
#Cleaner
ggparcoord(data)
Both ggparcoord() code segments successfully ran, however the first one came out pretty messy (the axis labels seem to have been corrupted). Is there a way to fix the labels?
In the second graph, it is difficult to tell how the factor variables are labelled on their respective axes (e.g. for the "sex" column, is "male" the bottom point or is "female" the bottom point?). Does anyone know if there is a way to fix this?
Finally, is there a way to use the "ggplotly()" function for "ggally" objects?
e.g.
a = ggparcoord(data)
ggplotly(a)
Thanks
It looks like your data columns get converted to factors when you add the groupColumn. As for ggplotly: I'm not sure about the general case, but it works at least for ggparcoord.
To prevent the conversion, you could exclude the groupColumn from the columns to be plotted:
library(GGally)
library(survival)
data(lung)
data = lung
data$sex = as.factor(data$sex)
data$status = as.factor(data$status)
data$ph.ecog = as.factor(data$ph.ecog)
# plot every column except the grouping column "sex"
ggparcoord(data, columns = seq(ncol(data))[!names(data) %in% "sex"], groupColumn = "sex")
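To see what the column expression above actually passes to ggparcoord, here is a small base R sketch. The data frame toy is a made-up stand-in for lung, and which() is shown only as an equivalent spelling:

```r
# Hypothetical stand-in for the lung data: two numeric columns plus the grouping factor
toy <- data.frame(age = c(74, 68, 56),
                  sex = factor(c(1, 1, 2)),
                  wt.loss = c(NA, 15, 15))

# Positions of every column whose name is not "sex"
plot_cols <- seq(ncol(toy))[!names(toy) %in% "sex"]
plot_cols

# An equivalent, slightly shorter way to get the same positions
same_cols <- which(names(toy) != "sex")
```

Either vector can be supplied as the columns argument, so the grouping column is used for colouring only.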

Plotting Basic Time Series Data in R - Not Plotting Correctly

I'm trying to plot some time series data. My plot looks like the following:
I'm uncertain as to why it displays the date as such. I'm using R Markdown in RStudio. Below is my code:
agemployment<-read.csv("Employment-Level1.csv", header=TRUE)
Tried to change the class of Date:
as.Date(as.character(agemployment$Date),format="%m%d%Y")
That did nothing. Rest of code here:
attach(agemployment)
View(agemployment)
head(agemployment)
agemployment<-ts(agemployment,frequency=12,start=c(2008, 1))
plot(agemployment, col="black", main="Agriculture Employment Level",
     ylab="Total Employment Level (Thousands)", ylim=c(0, 250),lwd=2,
     xaxs="i", yaxs="i", lty=1)
This produces the above plot. I'm uncertain what I'm doing wrong. I would appreciate any help. Thank you!
EDIT:
Data here:
I suspect your issues are somehow driven by attach; attaching data frames is generally not good practice. The following super-simple code worked for me:
# small dataset from your example, I use package readr to load it as data frame
df = readr::read_csv("DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265")
ts <- ts(data = df$Employment, frequency = 12, start = c(2008, 1))
plot(ts)
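For reference, the as.Date() call in the question has two separate problems: its result is never assigned back to the column, and the format "%m%d%Y" contains no slashes, so it cannot match values such as 1/1/2008. Passing the whole data frame to ts() also turns the date column into a second series. A sketch of the corrected steps in base R, assuming the column names DATE and Employment from the sample above:

```r
# Small inline sample with the same layout as the CSV file
agemployment <- read.csv(text = "DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343", stringsAsFactors = FALSE)

# The format must mirror the text, slashes included, and the result must be stored
agemployment$DATE <- as.Date(agemployment$DATE, format = "%m/%d/%Y")

# Build the series from the numeric column only, not the whole data frame
emp_ts <- ts(agemployment$Employment, frequency = 12, start = c(2008, 1))
plot(emp_ts, ylab = "Total Employment Level (Thousands)")
```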
Using the file generated reproducibly in the Note at the end, read the file into a zoo object, making the index of class "yearmon" (representing year and month without day). Then plot it.
library(zoo)
z <- read.csv.zoo("Employment-Level1.csv", format = "%m/%d/%Y", FUN = as.yearmon)
plot(z)
or
library(ggplot2)
autoplot(z) + scale_x_yearmon()
If you wanted to convert z to a ts object or data frame:
tt <- as.ts(z)
DF <- fortify.zoo(z)
Note
Lines <- "DATE,Employment
1/1/2008,1245
2/1/2008,1280
3/1/2008,1343
4/1/2008,1251
5/1/2008,1236
6/1/2008,1265"
cat(Lines, file = "Employment-Level1.csv") # write out file
Note that providing the data only as an image means that everyone who answers must retype it, so in the future please provide the input data in a reproducible form, as we have done here.

define population level for PCA analysis in adegenet

I want to perform a PCA analysis in adegenet starting from a genepop file without defined populations.
I imported the data like this:
datapop <- read.genepop('tous.gen', ncode=3, quiet = FALSE)
It works, and I can perform a PCA after scaling the data.
But I would like to plot the results / individuals on the PCA axes according to their population of origin using s.class. I have a vcf file with a three-letter code for each individual. I imported it in R:
pops_list <- read.csv('liste_pops.csv', header=FALSE)
but now how can I use it to define population levels in the genind object datapop?
I tried something like this:
setPop(datapop, formula = NULL)
setPop(datapop) <- pops_list
but it doesn't work; even the first line doesn't work: I get this message:
"Erreur : formula must be a valid formula object."
And then how should I use it in s.class?
thanks
Didier
Without a working example it is hard to tell, but perhaps you can find the solution to your problem here: How to add strata information to a genind
Either way, given how the setPop method works, your line setPop(datapop, formula = NULL) would not work because it does not define anything. You would actually have to do:
setPop(datapop) <- pops_list
while also making sure that pops_list is a factor in the appropriate format.
I know this is a bit late, but the way to do this is to add pops_list as the strata and then use setPop() to select a certain column:
strata(datapop) <- pops_list
setPop(datapop) <- ~myPop # set the population to the column called "myPop" in the data frame
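One detail the formula relies on: because pops_list was read with header=FALSE, its single column is named V1, not myPop, so it needs renaming first. A sketch of that preparation step in base R, using made-up three-letter codes. The adegenet and ade4 calls are left as comments because they need the real genind object, and pca1 is a hypothetical name for the PCA result:

```r
# Hypothetical contents of liste_pops.csv: one population code per individual
pops_list <- read.csv(text = "ABC\nABC\nXYZ", header = FALSE,
                      stringsAsFactors = FALSE)
names(pops_list)             # "V1", the default name when header = FALSE

names(pops_list) <- "myPop"  # match the column name used in the formula ~myPop
pops_list$myPop <- factor(pops_list$myPop)

# Then, with adegenet loaded and one row per individual in datapop:
# strata(datapop) <- pops_list
# setPop(datapop) <- ~myPop
# s.class(pca1$li, fac = pop(datapop))  # pca1: hypothetical dudi.pca result
```

The number of rows in pops_list has to equal the number of individuals in datapop, in the same order.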

Stacked bar in R

I have a table exported in csv from PostgreSQL and I'd like to create a stacked bar graph in R. It's my first project in R.
Here's my data and what I want to do:
It's the quality of the feeder bus service for a certain provider in the area. For each user of the train, we assign a service quality based on the synchronization between the bus and the train at the train stations, and calculate the percentage of users who have an ideal or very good service, a correct service, a deficient service or no service at all (linked to that question in gis.stackexchange).
So, I'd like to use my first column as my x-axis labels and my headers as my categories. The data is already normalized to 100% for each row.
In Excel, it's a couple of clicks, and I wouldn't mind typing a couple of lines of code since it's the final result of an already quite long plpgsql script... I'd prefer to continue to code instead of moving to Excel (I also have dozens of those to do).
So, I tried to create a stacked bar using the examples in Nathan Yau's "Visualize This" and the book "R in Action" and wasn't quite successful. Normally, their examples use data that they aggregate with R and use that. Mine is already aggregated.
So, I've finally come up with something that works in R:
but I had to transform my data quite a bit:
I had to transpose my table and remove my now-row (ex-column) identifier.
Here's my code:
# load libraries
library(ggplot2)
library(reshape2)
# load data
stl <- read.csv("D:/TEMP/rabat/_stl_rabattement_stats_mtl.csv", sep=";", header=TRUE)
# reshape for plotting
stl_matrix <- as.matrix(stl)
# make a quick plot
barplot(stl_matrix, border=NA, space=0.1, ylim=c(0, 100), xlab="Trains", ylab="%",
main="Qualité du rabattement, STL", las = 3)
Is there any way that I could use my original csv and have the same result?
I'm a little lost here...
Thanks!!!!
Try the ggplot2 and reshape2 libraries. You should be able to get the chart you want with
stl$train_order <- as.numeric(rownames(stl))
stl.r <- melt(stl, id.vars = c("train_no", "train_order"))
stl.r$train_no <- factor(
stl.r$train_no,
levels = stl$train_no[order(stl$train_order)])
ggplot(stl.r, aes(x = factor(train_no), y = value, fill = variable)) + geom_bar(stat = 'identity')
It appears that you transposed the matrix manually. This can be done in R with the t() function.
Add the following line after the as.matrix(stl) line:
stl_matrix <- t(stl_matrix)
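Put together, the base R route from the original layout might look like the sketch below. The data frame is a made-up stand-in for the CSV (the column names train_no, ideal, correct, deficient and none are assumptions), with each row already summing to 100:

```r
# Hypothetical stand-in for the CSV: one row per train, categories as columns
stl <- data.frame(
  train_no  = c("T1", "T2", "T3"),
  ideal     = c(50, 40, 30),
  correct   = c(30, 30, 30),
  deficient = c(15, 20, 25),
  none      = c(5, 10, 15)
)

# Drop the identifier column, then transpose so each train becomes one bar
stl_matrix <- t(as.matrix(stl[, -1]))
colnames(stl_matrix) <- stl$train_no

# Stacked bars: one column of the matrix per bar, one row per category
barplot(stl_matrix, border = NA, space = 0.1, ylim = c(0, 100),
        xlab = "Trains", ylab = "%", las = 3, legend.text = TRUE)
```

This keeps the CSV in its original shape; the transposition and the removal of the identifier column both happen in R.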

Time data values in R

How can I have a data set of only time intervals (no dates) in R, like the following:
TREATMENT_A TREATMENT_B
1:01:12 0:05:00
0:34:56 1:08:09
and compute mean times, etc, and draw boxplots with time intervals in the y-axis?
I am new to R, and I searched for this but found no example on the net.
Thanks
The chron package has a 'times' class that supports arithmetic. You could also do all of that with POSIXct objects and format the date-time output to not include the date. I thought the axis.POSIXct function had a format argument that should let you have time outputs. However, it does not seem to get dispatched properly, so I needed to construct the axis "by hand."
# simulated data: two groups of date-times scattered around the current time
dft <- data.frame(x = factor(sample(1:2, 100, replace = TRUE)),
                  y = Sys.time() + rnorm(100) * 4000)
boxplot(y ~ x, data = dft, yaxt = "n")
# tick positions every 3000 seconds, labelled with the time of day only
ticks <- seq(from = min(dft$y), to = max(dft$y), by = 3000)
axis(2, at = ticks, labels = format(ticks, format = "%H:%M:%S"))
There did turn out to be an appropriate method, Axis.POSIXt (to which I thought boxplot should have been turning for plotting, but it did not seem to recognize the class of the 'y' argument):
boxplot(y~x, data=dft, yaxt='n')
Axis(side=2, x=range(dft$y), format ="%H:%M:%S")
Regarding your request for something "simpler", take a look at this ggplot2-based solution, using the dft data frame defined above with POSIXct times. (I did try with the chron 'times' object but got a message saying ggplot did not support that class):
require(ggplot2); p <- ggplot(dft, aes(x,y))
p + geom_boxplot()
Check out the "lubridate" package, and the "hms" function within it.
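If you would rather avoid extra packages, the durations can also be handled in base R by converting them to seconds. This is a sketch of that alternative, not the lubridate approach; fmt_hms is a hypothetical helper defined here, and the data frame reuses the values from the question:

```r
times <- data.frame(
  TREATMENT_A = c("1:01:12", "0:34:56"),
  TREATMENT_B = c("0:05:00", "1:08:09"),
  stringsAsFactors = FALSE
)

# Convert each "H:MM:SS" string to a number of seconds
secs <- sapply(times, function(x)
  as.numeric(as.difftime(x, format = "%H:%M:%S", units = "secs")))

# Hypothetical helper: format a number of seconds back to H:MM:SS
fmt_hms <- function(s) {
  s <- as.integer(round(s))
  sprintf("%d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

fmt_hms(colMeans(secs))  # mean duration per treatment

# Boxplot on the seconds scale, with the y-axis relabelled as H:MM:SS
boxplot(secs, yaxt = "n", ylab = "Duration")
ticks <- pretty(range(secs))
axis(2, at = ticks, labels = fmt_hms(ticks), las = 1)
```

Working in plain seconds means mean(), median() and boxplot() behave as they do for any numeric data; only the labels need converting back.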
