Iterating over an Excel file and plotting a two-column comparison - R

First of all, I am a beginner, so I appreciate your patience and the time you take to help me. I have one Excel file with 3 columns: Shopname, 2016 and 2017, which hold the values to compare.
I'd like to iterate over the Excel file and plot two bars per shop: one with the value for shop X in 2016 and another for 2017.
I'll post here what I have written so far. I can see the printed output but not the plots. What could I improve?
> #importing excel file
> #and plotting each line comparison between 2 columns
> library(xlsx)
> xl_data <- read.xlsx("File.xlsx", "Plan1")
> df<- data.frame(xl_data)
> # plot using facets
> ggplot(aes(x=time, y=sold, group=shop)) +geom_bar(stat="identity")+
facet_grid(.~xl_data)

Afonso,
You don't need a loop for that. One way to accomplish it is with ggplot2's faceting capability:
#### load needed libraries
library(tidyr)
library(ggplot2)
### load data -- this is coming from Excel
dt <- tribble(
~LOJAS, ~y2016, ~y2017,
"CD NEREU" , 168459.86, 223637.46,
"LJ CANOINH", 14480.03, 80006.86,
"LJ MAL338" , 21095.07, 62768.54,
"LJ SBENTO" , 43290.47, 43168.34)
### arrange data for plotting
dt %>%
gather(time, sold, y2016, y2017) %>%
# plot using facets
ggplot(aes(x=time, y=sold, group=LOJAS)) +
geom_bar(stat="identity") +
facet_grid(.~LOJAS)
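If the goal is literally two bars side by side per shop in a single panel, a dodged bar chart is an alternative to facets. The following is a minimal sketch under assumptions: the data frame is typed in by hand (in practice it would come from read.xlsx), the column names year and sold are made up for illustration, and pivot_longer() stands in for gather(). If a loop is used anyway, keep in mind that a ggplot object is only drawn there when it is passed to print() explicitly.

```r
library(tidyr)
library(ggplot2)

# Hand-typed stand-in for the Excel sheet (values copied from the answer above)
dt <- data.frame(
  LOJAS = c("CD NEREU", "LJ CANOINH", "LJ MAL338", "LJ SBENTO"),
  y2016 = c(168459.86, 14480.03, 21095.07, 43290.47),
  y2017 = c(223637.46, 80006.86, 62768.54, 43168.34)
)

# One row per shop and year; "year" and "sold" are illustrative names
long <- pivot_longer(dt, cols = c(y2016, y2017),
                     names_to = "year", values_to = "sold")

# Shops on the x axis, one bar per year placed side by side
p <- ggplot(long, aes(x = LOJAS, y = sold, fill = year)) +
  geom_col(position = "dodge")
print(p)
```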

Related

How to make Violin Plots from a text file

Currently, I am trying to make an image of multiple violin plots from data that I read in from a text file. The text file has a "Count" column, which just increments by 1 to give the index of each result, and several further columns, each holding the results for a different variable size. Below is a portion of the text file.
Count X1.1 X1.2 X1.3 X1.4
1 174.647 173.368 172.713 172.264
2 169.549 166.791 167.010 165.682
3 174.341 170.821 169.861 169.103
4 178.305 177.736 177.796 176.067
5 160.614 159.842 158.548 157.145
So I would like to create a violin plot for each column (1.1, 1.2, etc.) using ggplot, with the plots displayed side by side.
library(ggplot2)
myData <- read.csv("E2_1_RingSize.text", sep = "\t", header=TRUE)
I've read in the file I want and am able to plot one column at a time by hard-coding the column name. See below:
graph1 <- ggplot(myData, aes(x=Count, y=X1.1)) + geom_violin()
But I'm unsure how to include all of the columns at once. It's most likely an easy fix, only 1-2 lines, but I'm not that experienced in R/RStudio and so I've got no clue.
What you need to do is pivot your data.frame so it's in long format:
library(tidyr)
library(ggplot2)
myData %>%
tidyr::pivot_longer(-Count) %>%
ggplot(aes(x=as.factor(name), y=value)) + geom_violin()
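To see what the pivot does, here is a small sketch that uses the five sample rows from the question instead of the real file. With default arguments, pivot_longer() stores the old column names in a column called name and the measurements in a column called value, which is why the plot maps those two. The object names sample_data and long_data are chosen only for this illustration.

```r
library(tidyr)
library(ggplot2)

# The five sample rows from the question, read from inline text
sample_data <- read.table(header = TRUE, text = "
Count X1.1 X1.2 X1.3 X1.4
1 174.647 173.368 172.713 172.264
2 169.549 166.791 167.010 165.682
3 174.341 170.821 169.861 169.103
4 178.305 177.736 177.796 176.067
5 160.614 159.842 158.548 157.145
")

# Every column except Count is stacked into name/value pairs:
# 5 rows * 4 columns = 20 rows in long format
long_data <- pivot_longer(sample_data, -Count)

# One violin per original column
p <- ggplot(long_data, aes(x = name, y = value)) +
  geom_violin()
print(p)
```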

Two different issues with dates and qplots

I am a student and have been given a project to study climate data from Giovanni (NASA). Our code is provided and we are left to 'find our way', so other answers don't seem to relate to the style of code I've been given. On top of this, I am a beginner in R, so changing the code is very difficult.
Basically I'm trying to create a time-series plot from the following code:
## Function for loading Giovanni time series data
load_giovanni_time <- function(path){
file_data <- read.csv(path,
skip=6,
col.names = c("Date",
"Temperature",
"NA",
"Site",
"Bleached"))
file_data$Date <- parse_date_time(file_data$Date, orders="ymdHMS")
return(file_data)
}
## Create a list of files
file.list <- list.files("./Data/courseworktimeseries/")
file.list <- as.list(paste0("./Data/courseworktimeseries/", file.list))
# for(i in file.list){
# load_giovanni_time(i)
# }
#Load all the files
all_data <- lapply(X=file.list,
FUN=load_giovanni_time)
all_data <- as.data.frame(do.call(rbind, all_data))
## Inspect the data with a plot
p <- qplot(data=all_data,
x=Date,
y=Temperature,
colour=Site,
linetype=Bleached,
geom="line")
print(p)
Now the first problem is that when the data is merged into one dataset, all the dates change (the original date range is 2002-2015 and it becomes 2002-2030), which obviously ruins the plot. I found that I can stop the dates changing by deleting this code:
file_data$Date <- parse_date_time(file_data$Date, orders="ymdHMS")
However, when this is deleted, I get the following error:
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
Could anyone help me get round this without editing the code too much? I suspect the problem is that this line formats the date incorrectly, so I imagine it's only a small fix. I'm just very much a beginner and have to implement the code within 1-2 days.
Thanks
For anyone who ever has this problem: I found the solution.
file_data$Date <- parse_date_time(file_data$Date, orders="ymdHMS")
This line of code reads the date information from my CSV in year-month-day order. In Excel my dates were the other way round, so if a cell said 30th December 2015 (30/12/2015), R would read it in as 2030-12-20, screwing up the data.
In Excel, select all the dates, press CTRL+1, and then change the format to match the one R is parsing.
All done =)
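The same fix can also be made on the R side instead of in Excel, by telling the parser that the day comes first. The sketch below uses base R only and assumes the CSV holds day/month/year strings such as 30/12/2015. The real Giovanni export may well be laid out differently, so the format string is an assumption to adapt.

```r
# Example strings in day/month/year order (assumed layout, see above)
raw_dates <- c("30/12/2015", "01/07/2002")

# Spell out the field order explicitly so nothing has to be guessed
parsed <- as.Date(raw_dates, format = "%d/%m/%Y")

# Show the result in ISO year-month-day form
format(parsed)
```

With lubridate, the equivalent idea would be to pass parse_date_time() an orders value that starts with the day, such as "dmy", rather than "ymdHMS".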

Having trouble running googleVis with my dataset

I am new to R programming. I was trying to visualize a dataset using googleVis in R and was unable to do so.
The error I got was:
Error: Length of logical index vector must be 1 or 8, got: 14835
Can someone help?
Dataset is here:
https://www.kaggle.com/c/predict-west-nile-virus/data
Code is below
# Read competition data files:
library(readr)
data_dir <- "C:/Users/Wesley/Desktop/input"
train <- read_csv(file.path(data_dir, "train.csv"))
spray <- read_csv(file.path(data_dir, "spray.csv"))
# Generate output files with write_csv(), plot() or ggplot()
# Any files you write to the current directory get shown as outputs
# Install and read packages
library(lubridate)
library(googleVis)
# Create useful date columns
spray$Date <- as.Date(as.character(spray$Date),format="%Y-%m-%d")
spray$Week <- isoweek(spray$Date)
spray$Year <- year(spray$Date)
# Create a total count of measurements
spray$Total <- 1
for(i in 1:nrow(spray)) {
spray$Total[i] = i
}
# Aggregate data by Year, Week, Trap and order by old-new
spray_agg <- aggregate(cbind(Total)~Year+Week+Latitude+Longitude,data=spray,sum)
spray_agg <- spray[order(spray$Year,spray$Week),]
# Create a misc format for Week for Google Vis Motion Chart
spray_agg$Week_Format <- paste(spray_agg$Year,"W",spray_agg$Week,sep="")
# Function to create a motion chart together with a overview table
# It takes the aggregated data as input as well as a year of choice (2007,2009,2011,2013)
# It filters out "no presence" weeks since they distort the graphical view
# Next to that it creates an overview table of that year
# With gvisMerge you can merge the 3 html outputs into 1
create_motion <- function(data=spray_agg,year=2011){
data_motion <- data[data$Year==year]
motion <- gvisMotionChart(data=data_motion,idvar="Total",timevar="Week_Format",xvar="Longitude",yvar="Latitude"
,sizevar=0.1,colorvar="Blue",options=list(width="600"))
return(motion)
}
# Get the per year motion charts
#motion1 <- create_motion(spray_agg,2007)
#motion2 <- create_motion(spray_agg,2009)
motion3 <- create_motion(spray_agg,2011) : (Error: Length of logical index vector must be 1 or 8, got: 14835)
motion4 <- create_motion(spray_agg,2013) :(Error: Length of logical index vector must be 1 or 8, got: 14835)
# Merge them together into 1 dashboard
output <- gvisMerge(gvisMerge(motion1,motion2,horizontal=TRUE),gvisMerge(motion3,motion4,horizontal=TRUE),horizontal=FALSE)
plot(output)
# Plot the output in your browser
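One likely cause, offered as an assumption because the code was not run against the Kaggle files: data[data$Year==year] has no comma, so the logical vector is treated as a column selector rather than a row filter. read_csv() returns a tibble, and a complaint about a length of "1 or 8" would fit a table with 8 columns being indexed by a vector with one entry per row (14835 of them). A minimal base R sketch of row filtering, using a made-up toy table:

```r
# Toy stand-in for spray_agg; the values are invented for illustration
toy <- data.frame(
  Year  = c(2011, 2011, 2013, 2013, 2013),
  Week  = c(35, 36, 29, 30, 33),
  Total = 1:5
)

# With the comma, the condition filters rows and keeps every column.
# Without it, toy[toy$Year == 2011] would try to select columns instead.
rows_2011 <- toy[toy$Year == 2011, ]

nrow(rows_2011)
ncol(rows_2011)
```

Under that assumption, the corresponding line inside create_motion would read data_motion <- data[data$Year == year, ].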

How to plot several line plots in one

I would like to plot my figure using R (ggplot2). I'd like to have a line graph like image 2.
Here is my data:
B50K,B50K+1000C50K,B50K+2000C50K,B50K+4000C50K,B50K+8000C50K,gen,xaxile
0.3795,0.4192,0.4675,0.5357,0.6217,T18-Yield,B50K
0.3178,0.3758,0.4249,0.5010,0.5870,T20-Yield,B50K+1000C50K
0.2795,0.3266,0.3763,0.4636,0.5583,T21-Yield,B50K+2000C50K
0.2417,0.2599,0.2898,0.3291,0.3736,T18-Fertility,B50K+4000C50K
0.2002,0.2287,0.2531,0.2962,0.3485,T19-Fertility,B50K+8000C50K
0.1642,0.1911,0.2151,0.2544,0.2951,T20-Fertility
The delimiter is ",". By the way, I do not have any useful .r script yet.
The illustrated image shows my figure in Microsoft Word.
I have tried several scripts from the internet, but none of them worked.
Would you please help me with an .r script that reads my data file like img1 and plots my data like the illustrated figure?
The trick is to reshape your data (using melt from the reshape2 package) so that you can easily map colours and linetypes to gen.
# Your data - note i also added an extra comma after the fifth column in row 6.
# It would be easier if you gave data using dput as described in comments above - thanks
dat <- read.table(text="B50K,B50K+1000C50K,B50K+2000C50K,B50K+4000C50K,B50K+8000C50K,xaxile,gen
0.3795,0.4192,0.4675,0.5357,0.6217,B50K,T18-Yield
0.3178,0.3758,0.4249,0.5010,0.5870,B50K+1000C50K,T20-Yield
0.2795,0.3266,0.3763,0.4636,0.5583,B50K+2000C50K,T21-Yield
0.2417,0.2599,0.2898,0.3291,0.3736,B50K+4000C50K,T18-Fertility
0.2002,0.2287,0.2531,0.2962,0.3485,B50K+8000C50K,T19-Fertility
0.1642,0.1911,0.2151,0.2544,0.2951,,T20-Fertility",
header=T, sep=",", na.strings="")
# load the packages you need
library(ggplot2)
library(reshape2)
# assume xaxile column is unneeded? - did you add this column yourself?
dat$xaxile <- NULL
# reshape data for plotting
dat.m <- melt(dat)
# plot
ggplot(dat.m, aes(x=variable, y=value, colour=gen,
shape=gen, linetype=gen, group=gen)) +
geom_point() +
geom_line()
You can then use scale_linetype_manual and scale_shape_manual to manually specify how you want the plot to look. This post will help, but there are many others as well
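As a rough sketch of that last step, manual scales take a vector of values named after the levels of gen. The tiny data frame, the particular linetypes and shapes, and the object names below are placeholders chosen for illustration, not taken from the original figure.

```r
library(ggplot2)

# Small long-format stand-in for dat.m with two of the series
x_levels <- c("B50K", "B50K+1000C50K", "B50K+2000C50K")
dat.small <- data.frame(
  variable = factor(rep(x_levels, times = 2), levels = x_levels),
  value    = c(0.3795, 0.4192, 0.4675, 0.2417, 0.2599, 0.2898),
  gen      = rep(c("T18-Yield", "T18-Fertility"), each = 3)
)

# Placeholder styles, one named entry per level of gen
line_styles  <- c("T18-Yield" = "solid", "T18-Fertility" = "dashed")
point_shapes <- c("T18-Yield" = 16, "T18-Fertility" = 17)

p <- ggplot(dat.small, aes(x = variable, y = value, colour = gen,
                           shape = gen, linetype = gen, group = gen)) +
  geom_point() +
  geom_line() +
  scale_linetype_manual(values = line_styles) +
  scale_shape_manual(values = point_shapes)
print(p)
```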

Reading from CSV and Plotting Boxes in R

I am looking for the most convenient way of creating boxplots for different values and groups read from a CSV file in R.
First, I read my Sheet into memory:
Sheet <- read.csv("D:/mydata/Table.csv", sep = ";")
Which just works fine.
names(Sheet)
correctly gives me the headers of the different columns.
I can also access and filter different groups into separate lists, like
myData1 <- Sheet[Sheet$Group == 'Group1',]$MyValue
myData2 <- Sheet[Sheet$Group == 'Group2',]$MyValue
...
and draw a boxplot using
boxplot(myData1, myData2, ..., main = "Distribution")
where the ... stand for more lists I have filled using the selection method above.
However, I have seen that a formula can do the selection and the boxplotting in one go. But when I use something like
boxplot(Sheet~Group, Sheet)
it won't work because I get the following error:
invalid type (list) for variable 'Sheet'
The data in the CSV looks like this:
No;Gender;Type;Volume;Survival
1;m;HCM;150;45
2;m;UCM;202;103
3;f;HCM;192;5
4;m;T4;204;101
...
So I have multiple possible groups and different values which I'd like to represent as a box plot for each group. For example, I could group by gender or group by type.
How can I easily draw multiple boxes from my CSV data without having to grab them all manually out of the data?
Thanks for your help.
Try it like this:
Sheet <- data.frame(Group = gl(2, 50, labels=c("Group1", "Group2")),
MyValue = runif(100))
boxplot(MyValue ~ Group, data=Sheet)
Using ggplot2:
library(ggplot2)
ggplot(Sheet, aes(x = Group, y = MyValue)) +
geom_boxplot()
The advantage of using ggplot2 is that you have lots of possibilities for customizing the appearance of your boxplot.
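Applied to the CSV layout shown in the question, the formula takes a value column on the left and a grouping column on the right, both looked up in data. That is also why boxplot(Sheet~Group, Sheet) fails: Sheet is the whole data frame, not a column in it. A minimal sketch using only the four sample rows (so each box is based on very few points):

```r
# The sample rows from the question, read the same way as the real file
Sheet <- read.csv(sep = ";", text = "No;Gender;Type;Volume;Survival
1;m;HCM;150;45
2;m;UCM;202;103
3;f;HCM;192;5
4;m;T4;204;101")

# One box per Type for the Volume column
by_type <- boxplot(Volume ~ Type, data = Sheet, main = "Volume by Type")

# Swap the columns to compare Survival by Gender instead
by_gender <- boxplot(Survival ~ Gender, data = Sheet, main = "Survival by Gender")
```

No manual subsetting is needed: changing the grouping only means changing the name on the right-hand side of the formula.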
