Graphing distribution of independent variable for any given dataset - r

I am looking to automate the process of exploratory data analysis and would like to graph the distribution (using line plots, histograms, density curves, etc.) of all the input variables. As my code stands, I am simply getting 4 blank graphics windows. What am I doing incorrectly? If there is a better approach, I am open to that as well.
mydata <- read.csv("http://www.ats.ucla.edu/stat/data/binary.csv")
for (i in names(mydata)){
qplot(data=mydata,i,geom="bar", fill="admit")
dev.new()
}

This looks like a situation where aes_string would come in handy, along with wrapping your call to ggplot in print. Inside a for loop, ggplot objects are not printed automatically, which is why you only get blank graphics windows. I found this post showing how to use aes_string with ggplot inside a loop.
So it would look something like this:
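library(ggplot2)
mydata$admit <- factor(mydata$admit)  # fill needs a discrete variable
for (i in names(mydata)) {
  # i holds the column name as a string, so use aes_string; print() draws the plot
  print(ggplot(mydata) + geom_bar(aes_string(x = i, fill = "admit")))
}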
I'm working in RStudio so have skipped the dev.new part of your code. I found I needed to convert admit to a factor, as well.
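Since aes_string is soft-deprecated in ggplot2 3.0.0 and later, here is a minimal sketch of the same loop using the .data pronoun instead. The helper name plot_distributions and the use of mtcars as a stand-in for mydata are assumptions for illustration, not part of the original question.

```r
library(ggplot2)

# Hypothetical helper: one bar chart per column, filled by `fill_var`.
plot_distributions <- function(df, fill_var) {
  df[[fill_var]] <- factor(df[[fill_var]])  # fill needs a discrete variable
  vars <- setdiff(names(df), fill_var)      # skip the fill column itself
  plots <- lapply(vars, function(v) {
    # .data[[v]] looks up the column whose name is stored in the string v
    ggplot(df, aes(x = .data[[v]], fill = .data[[fill_var]])) +
      geom_bar() +
      labs(x = v, fill = fill_var)
  })
  names(plots) <- vars
  plots
}

# mtcars stands in for mydata here (an assumption, so the example runs offline).
plots <- plot_distributions(mtcars[, c("am", "cyl", "gear")], fill_var = "am")
for (p in plots) print(p)  # explicit print() is still needed inside a loop
```

Returning the plots as a named list rather than printing inside the helper leaves it up to the caller whether to print them, save them, or arrange them on one page.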

Related

How to plot a histogram of a specific data frame column in R

I am super new to coding with R; I'm taking it as part of a bachelor's degree program. I am super stuck on something I feel should be basic, but I cannot get my code to work and I am not sure why. The prompt is:
"In this problem we will be using the mpg data set, to get access to the data set you need to load the tidyverse library.
Complete the following steps:
Create a histogram for the cty column with 10 bins"
and for my code I have:
library(tidyverse)
print(mpg)
df <- mpg[ , c("city")]
histo <- ggplot(data = df, aes(x=median)) + geom_histogram(bins=10)
print(histo)
The first print was just to make sure the data loaded correctly, which it did. I am not sure about the second print function, the histo one. I've gotten various error messages or bugs, so I've just been moving stuff around and trying different commands to get it to work. I'm following the steps previously outlined in our reading, but I cannot seem to get this to work. Any help would be appreciated.
I have tried removing the print(histo) call and just leaving the ggplot, but that gives me a blank white box instead of a plot, or no plot is printed.
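A minimal sketch of what the prompt appears to ask for, assuming the only goal is a 10-bin histogram of cty. The attempt above refers to city and median, neither of which is a column of mpg; the prompt names the column cty. Loading ggplot2 alone instead of the whole tidyverse is a simplification made here, since mpg ships with ggplot2.

```r
library(ggplot2)  # the mpg data set ships with ggplot2 (part of the tidyverse)

# No subsetting is needed: map the cty column to the x axis and ask for 10 bins.
histo <- ggplot(data = mpg, aes(x = cty)) +
  geom_histogram(bins = 10)

print(histo)  # draws the plot; at the console, typing histo does the same
```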

Is there any analogs of uniPlot function?

I was looking through some functions and found library(MVN), where I wanted to use the uniPlot function, which is quite neat as it quickly provides summary plots for each column in data frame. I was using :
uniPlot(ready, type="histogram")
But the function was deprecated. I was wondering whether anyone knows of anything similar to this function (plotting histograms with an overlaid normal curve)?
You can do this in the current MVN version as follows:
library(MVN)
mvn(data = iris[1:50, 1:3], mvnTest = "royston", univariatePlot = "histogram")

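If the MVN interface changes again, a similar summary can be put together in base R. This is only a sketch under that assumption: the function name hist_with_normal is hypothetical, and it is not a reimplementation of uniPlot.

```r
# Hypothetical helper: for each numeric column, draw a density-scaled
# histogram and overlay a normal curve with the column's mean and sd.
hist_with_normal <- function(df) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  for (col in num_cols) {
    vals <- df[[col]]
    m <- mean(vals)
    s <- sd(vals)
    hist(vals, freq = FALSE, main = col, xlab = col)  # freq = FALSE: density scale
    curve(dnorm(x, mean = m, sd = s), add = TRUE)     # x is curve()'s own variable
  }
  invisible(num_cols)
}

# Same subset of iris as in the mvn() call above.
cols <- hist_with_normal(iris[1:50, 1:3])
```

Calling par(mfrow = c(1, 3)) beforehand would place the three plots side by side, closer to the grid that uniPlot produced.
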
Plotting functions in Julia

I am trying to plot a function in Julia, but keep getting errors. I don't understand what is wrong. The input and output of $\varphi$ is a scalar. I've used x=1530:1545 and still get an error-- can anyone enlighten me? I am very confused.
I am using Julia 0.7.
EDIT:
I got it to work with a slight modification--I changed
x = 1530:1545
and added the following two lines:
y = t.(x)
plot(x,y)
Why did I have to do this though?
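Passing a function directly to plot is currently not available in PyPlot.jl, which is why you had to evaluate it on the range yourself with t.(x). If you would like to have this feature in the future, your best bet is to file an issue.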
However, you can get that functionality via Plots.jl and using PyPlot as a backend.
It would look like this (I'll take a simpler function):
using Plots
pyplot()
start_point = 0
end_point = 10
plot_range = start_point:end_point
plot(sqrt,plot_range) # if you want the function exactly at 0,1,2,3...
plot(plot_range,sqrt) # works the same
plot(sqrt,start_point,end_point) # automatically chooses the interior points

Plot going off graph in gvisMotionChart

I have created a plot in R using googleVis, specifically gvisMotionChart, plotting a number of variables.
I am primarily using the line graph, and it is all good when I view the graph with all variables. However, when I select some of the individual variables, it zooms in such that some of the plot for this variable is no longer on the graph. I know it should zoom in just to view this variable and can exclude other variables (which is a good feature), but it zooms in too much, so that the variable I am after is not entirely on the graph.
This doesn't happen with all variables, and I can get around it by also selecting other variables either side of the one which I want to view, but it would be good if I could fix this. Has anyone come across a similar problem before and know a way around it?
Thanks in advance
EDIT: I have an example of this using the data Batting from the Lahman package. (I know nothing about baseball, so the analysis probably doesn't make sense; in fact, looking at the results, it almost certainly doesn't, but it demonstrates my point.) If you run the following code:
library(googleVis)
library(Lahman)
recent <- subset(Batting, yearID > 2000)
homeruns <- aggregate(HR ~ stint + yearID, data = recent, FUN = sum)
avgHR <- mean(homeruns$HR)
homeruns$HR <- homeruns$HR - avgHR
m <- gvisMotionChart(data = homeruns, idvar = "stint", timevar = "yearID")
plot(m)
Then select the line graph and subset on number 2: the top part of the graph is cut off.
It seems to be Google's bug. I could even reproduce this same error in their "Visualization Playground" (https://code.google.com/apis/ajax/playground/?type=visualization#motion_chart) making part of the data negative.
I've already reported the issue as a bug: https://code.google.com/p/google-visualization-api-issues/issues/detail?id=1479
Might the force be with them!
I just had the same problem w/ a Sankey plot. I resolved it by deleting entries with value==0. However, I just tried to reproduce your example and could not reproduce your bug, so perhaps this has already been solved?

R create a function to plot and do a regression for several rows of a data set

i am new to programming / R and i have a question that might be very easy.
my function is:
par(mfrow=c(2,2))
plot_QQ=function(x) {for(i in 2:x)
plot(c(data_raw[,Group1[i]]),c(data_raw[,Group1[1]]), xlab=paste("replicate",i), ylab="replicate 1")
abline(lm(c(data_raw[,Group1[i]])c(data_raw[,Group1[1]]))}
Group1 is a vector c("","","") used to grab specific columns of the data. This function is working, but R does not draw the abline() in all plots (only in the "last" plot, c(data_raw[,Group1[i=x]]),c(data_raw[,Group1[1]]), is the line drawn).
sorry for such an easy question and thx for helping
greetz
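In future you should supply some simulated data so that people can run your code; it's unclear what exactly you're trying to do. You don't need the c() calls, and your lm call isn't a proper formula (it needs a ~ between response and predictor, with the plotted y variable on the left). You also don't have curly braces around the body of your for loop, so only plot() runs on each iteration and abline() runs once after the loop ends, which is why only the last plot gets a line. Try this:
par(mfrow = c(2, 2))
plot_QQ <- function(x) {
  for (i in 2:x) {
    plot(data_raw[, Group1[i]], data_raw[, Group1[1]], xlab = paste("replicate", i), ylab = "replicate 1")
    # regress the plotted y (replicate 1) on the plotted x (replicate i)
    abline(lm(data_raw[, Group1[1]] ~ data_raw[, Group1[i]]))
  }
}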
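For anyone who wants to run this without the original data, here is a self-contained sketch with simulated replicates. The names data_raw and Group1 come from the question; the simulated values, the column names rep1 to rep3, and returning the fitted models are assumptions added for illustration.

```r
set.seed(1)

# Simulated stand-in for the asker's data: three noisy replicates of one signal.
signal <- rnorm(50)
data_raw <- data.frame(
  rep1 = signal + rnorm(50, sd = 0.2),
  rep2 = signal + rnorm(50, sd = 0.2),
  rep3 = signal + rnorm(50, sd = 0.2)
)
Group1 <- c("rep1", "rep2", "rep3")

plot_QQ <- function(x) {
  fits <- list()
  for (i in 2:x) {
    xi <- data_raw[[Group1[i]]]  # replicate i on the x axis
    y1 <- data_raw[[Group1[1]]]  # replicate 1 on the y axis
    plot(xi, y1, xlab = paste("replicate", i), ylab = "replicate 1")
    fit <- lm(y1 ~ xi)           # response ~ predictor, matching the plot axes
    abline(fit)                  # inside the braces, so every plot gets a line
    fits[[Group1[i]]] <- fit
  }
  invisible(fits)
}

old_par <- par(mfrow = c(1, 2))  # two panels: replicate 2 and replicate 3
fits <- plot_QQ(length(Group1))
par(old_par)
```

Keeping the fitted models in a list makes it easy to inspect slopes or R-squared values afterwards, for example with summary(fits$rep2).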