Plot DataFrames in Julia using Plots - julia

In Julia, is there a way to plot a DataFrame similarly to df.plot() in Python's pandas?
More specifically, I am using Plots with the plotlyjs() backend and the DataFrames package.

Just to add to Przemyslaw's answer: there's an extension of Plots.jl called StatsPlots.jl, which provides many extra methods and macros for plotting DataFrames concisely, including support for grouping variables in the data.
It's worth checking out the full README there to see what it can do, but two equivalent ways of reproducing the output of Przemyslaw's answer are:
julia> using StatsPlots
julia> @df df plot([:series1 :series2 :series3])
or, more concisely:
julia> @df df plot(cols())
Note that in either case StatsPlots labels each series with the name of its source column automatically.

Suppose you have a DataFrame:
using DataFrames, Plots
df = DataFrame(series1 = 1:10, series2 = sin.(1:10), series3=rand(10));
Then you can do:
plot(Matrix(df), labels=permutedims(names(df)), legend=:topleft)

Related

Creating multiple scatter plots with a for loop in R

I am looking for a way to use a for loop to loop through multiple columns of a csv and plot them.
Here is an example of how I have been making a scatter plot:
ggplot(top_scorers, aes(x = `Win%`, y = PER))
top_scorers is the data frame read from the csv, and Win% and PER are columns in it (Win% needs backticks because % is not allowed in a bare R name). I was hoping for a way to keep the x value the same while looping through different columns for the y value. If this is confusing please let me know and I will try to clear up any issues. Thanks
You can use an lapply-based solution over the columns of your data frame. Since you want to keep the first column as the x dimension, use aes_string in ggplot:
library(ggplot2)
# mtcars is a built-in dataset; its first column (mpg) stays on the x axis
lapply(colnames(mtcars)[-1], function(nm) {
  ggplot(mtcars) +
    geom_point(aes_string(x = colnames(mtcars)[1], y = nm))
})
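Since the question asks for a for loop specifically, here is a minimal sketch of the same idea written as one. It assumes the built-in mtcars data in place of top_scorers and uses the .data pronoun from recent ggplot2 versions instead of aes_string(); the names x_col, y_cols and plots are illustrative only.

```r
library(ggplot2)

x_col  <- "mpg"                              # column kept on the x axis
y_cols <- setdiff(colnames(mtcars), x_col)   # every other column becomes a y

plots <- list()
for (nm in y_cols) {
  # .data[[nm]] looks up the column whose name is stored in nm
  plots[[nm]] <- ggplot(mtcars, aes(x = .data[[x_col]], y = .data[[nm]])) +
    geom_point()
}
```

The loop builds a named list, so print(plots[["hp"]]) shows a single plot, and the same loop body could call ggsave() to write each plot to its own file.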

Dcast() weird output

I have two data frames. Applying the same dcast() call to both gives me different results. Both datasets have the same structure but different sizes; the first one has more than 950 rows.
The code I apply is:
trans_matrix_complete <- mod_attrib$transition_matrix
trans_matrix_complete[which(trans_matrix_complete$channel_from=="_3RDLIVE"),]
trans_matrix_complete <- rbind(trans_matrix_complete, df_dummy)
trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to,
levels = c(levels(trans_matrix_complete$channel_to)))
trans_matrix_complete <- dcast(trans_matrix_complete,
channel_from ~ channel_to,value.var = 'transition_probability')
The resulting trans_matrix_complete does not look right: with the smaller data frame of just a few rows, the same code gives the expected output.
In particular:
a) the number of rows is different, and I'm not sure why two dots are listed in the first case;
b) trying to assign row names with
row.names(trans_matrix_complete) <- trans_matrix_complete$channel_from
does not work for the large data frame: despite the row.names assignment, the data frame shows up exactly as before, without names assigned to the rows.
Any idea about this weird behavior?
I resolved this by switching from dcast() to spread() from the tidyr package (part of the tidyverse):
library(tidyr)
trans_matrix_complete <- spread(trans_matrix_complete,
                                channel_to, transition_probability)
With spread(), both data frames produce output in the same format, and it accepts row names without any issue.
So I suspect it is all related to the fact that dcast() and the reshape2 package are no longer actively maintained.
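For reference, the long-to-wide step itself can also be done in base R. The sketch below uses a small made-up transition table (not the poster's data) and tapply(), which returns a plain matrix whose row names come from channel_from. Note that tidyr now describes spread() as superseded and recommends pivot_wider() for new code.

```r
# Made-up example data in the same long layout as the question
trans <- data.frame(
  channel_from = c("A", "A", "B", "B", "C"),
  channel_to   = c("B", "C", "A", "C", "A"),
  transition_probability = c(0.4, 0.6, 0.3, 0.7, 1.0)
)

# Cross-tabulate: one row per channel_from, one column per channel_to.
# sum() adds up duplicate from/to pairs; combinations that never occur
# are filled with NA.
trans_matrix <- tapply(trans$transition_probability,
                       list(trans$channel_from, trans$channel_to),
                       sum)
```

Because the result is a matrix, rownames(trans_matrix) already holds the channel_from values, so no separate row.names() assignment is needed. The tidyr equivalent would be pivot_wider(trans, names_from = channel_to, values_from = transition_probability), which returns a tibble rather than a matrix.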

R ggplot2: create data frame for multiple ROC plots with different line lengths

I am trying to overlay the curves of 4 different "performance" objects from the ROCR package (if there is an easier way, I'm open to it). The gist is that each of these objects contains two vectors of equal length, one for the X values and one for the Y values, but the lengths of the X/Y vectors differ between objects.
Currently I am just extracting and plotting these curves manually with plot() and lines().
The result is not terrible, but I think I would have better control with ggplot. The only problem is that I can't think of a way to build a data.frame() from these vectors for ggplot, since they have different lengths.
ggplot prefers data in long format, so different lengths for different lines don't matter.
The structure is pretty simple: one column that identifies the line (iteration, in your case, with values 1, 2, 3 or 4; probably make this a factor), one column for x, and one column for y.
Since you don't provide any code or sample data, I'll assume that's as much of an answer as you're looking for. You can use c() on individual vectors or rbind() on individual data frames to combine them, or dplyr::bind_rows() or data.table::rbindlist() to combine a list of data frames.
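A minimal sketch of that long format, assuming the curves have already been pulled out of the ROCR objects (for a performance object perf, the vectors live in perf@x.values[[1]] and perf@y.values[[1]]). The two short curves below are made-up placeholders, not real ROC output, and curves and roc_long are illustrative names.

```r
library(ggplot2)

# Placeholder curves of different lengths; in practice each x/y pair would
# come from perf@x.values[[1]] and perf@y.values[[1]] of one performance object
curves <- list(
  list(x = c(0, 0.5, 1),      y = c(0, 0.7, 1)),
  list(x = c(0, 0.2, 0.6, 1), y = c(0, 0.5, 0.9, 1))
)

# One data frame per curve, tagged with its iteration number, then stacked
roc_long <- do.call(rbind, lapply(seq_along(curves), function(i) {
  data.frame(iteration = i, x = curves[[i]]$x, y = curves[[i]]$y)
}))
roc_long$iteration <- factor(roc_long$iteration)

# geom_path() joins points in row order, which suits ROC curves
p <- ggplot(roc_long, aes(x = x, y = y, colour = iteration)) +
  geom_path()
```

geom_path() is used instead of geom_line() because geom_line() reorders points by x, whereas the ROC points are already in the order they should be connected.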

How do you make vector of variables with sequenced name in R?

I'm trying to find an easier way to make a vector of sequentially named variables.
For example, there are many variables in the data, and I want to select h190361, h190362, h190363, h190364 and h190365.
In SAS, Stata, or SPSS, if you want to pick sequential variables, you can simply write 'h190361-h190365' or 'from h190361 to h190365'.
But I don't know any simple syntax for this in R.
The hard way would be to write out all the variable names:
x <- df[, c("h190361", "h190362", "h190363", "h190364", "h190365")]
but with many variables that is too much work.
Another way I thought of is to use paste():
k <- paste("h190", 361:365, sep = "")
x <- df[,k]
which returns the desired result.
However, this does not seem as natural or simple as in SAS, SPSS, or Stata.
Is there an easier way, or a simple syntax, to select sequential variables in R?
Thank you.
Maybe select() from the dplyr package?
select(df, h190361:h190365)
or with pipe:
df %>% select(h190361:h190365)
But be careful! select(df, X:Y) means "take columns X and Y from df plus everything in between", so if some other columns, say A, B and C, sit between h190361 and h190365, they will be included too.
If you can easily identify the positions of the columns you want, you can just do something like:
df2 <- df[,1:4]
However, this approach only works when the columns are adjacent.
Another approach is to use a regular expression:
df2 <- df[,grep("h190",colnames(df))]
You can change the pattern in grep() to suit your needs.
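Two base-R alternatives, sketched on a made-up data frame. The helper cols_between() is hypothetical (not part of any package): it mimics the h190361:h190365 behaviour by looking up the positions of the first and last names with match(), so, like dplyr's select(), it also returns any columns that sit between them. The second approach anchors the regular expression with ^ and $, because an unanchored pattern such as "h190" matches every column name that merely contains that string.

```r
df <- data.frame(id = 1:3,
                 h190361 = 1, h190362 = 2, h190363 = 3,
                 h190364 = 4, h190365 = 5,
                 h1903610 = 0)   # extra column that "h190" would also catch

# Hypothetical helper: all columns from `from` to `to`, by position
cols_between <- function(data, from, to) {
  i <- match(from, names(data))
  j <- match(to, names(data))
  if (is.na(i) || is.na(j)) stop("column name not found")
  data[, i:j, drop = FALSE]
}

by_position <- cols_between(df, "h190361", "h190365")

# Anchored pattern: exactly "h19036" followed by one digit from 1 to 5
by_pattern <- df[, grep("^h19036[1-5]$", names(df)), drop = FALSE]
```

drop = FALSE keeps the result a data frame even when only one column matches.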

Specifying names of columns to be used in a loop R

I have a df with over 30 columns and over 200 rows, but for simplicity I will use an example with 8 columns.
X1 <- sample(100, 25)
B  <- sample(4, 25, replace = TRUE)
C  <- sample(2, 25, replace = TRUE)
Y1 <- sample(100, 25)
Y2 <- sample(100, 25)
Y3 <- sample(100, 25)
Y4 <- sample(100, 25)
Y5 <- sample(100, 25)
df <- data.frame(X1, B, C, Y1, Y2, Y3, Y4, Y5)
I wrote a function that melts the data and generates a plot with X1 on the x-axis, faceted by the values in B and C.
library(reshape2)
library(ggplot2)
plotdata <- function(l) {
  # reshape to long format, keeping X1, B and C as identifier columns
  long <- melt(df, id.vars = c("X1", "B", "C"), measure.vars = l)
  p <- ggplot(long, aes(x = X1, y = value)) + geom_point()
  p2 <- p + facet_grid(B ~ C)
  ggsave(filename = paste0("X_vs_", l, "_faceted.jpeg"), plot = p2)
}
I can then manually input the required Y variable:
plotdata("Y1")
I don't want to generate plots for all columns. I could type each column of interest into plotdata and get the result, but this seems quite inelegant (and time-consuming). I would prefer to specify the columns of interest, e.g. "Y1", "Y3", "Y4", and then write a loop to process all of them.
However, I am new to writing for loops and can't find a way to loop over the specific column names that my function needs. A standard for (i in 1:length(df)) wouldn't be appropriate because I only want to loop over the user-specified columns.
Apologies if there is already an answer to this on Stack Overflow; I couldn't find one.
Thanks to Roland for providing the following answer:
Try
for (x in c("Y1","Y3","Y4")) {plotdata(x)}
The index variable doesn't have to be numeric.
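A minimal sketch of the same loop that returns the plots instead of only saving them, with the data passed in as an argument rather than read from a global df. It assumes ggplot2 and reuses the column layout from the question; plot_column(), wanted and plots are hypothetical names. Because only one Y column is plotted at a time, it maps that column directly with .data[[ycol]] instead of melting first.

```r
library(ggplot2)

set.seed(42)  # reproducible example data, laid out as in the question
df <- data.frame(X1 = sample(100, 25),
                 B  = sample(4, 25, replace = TRUE),
                 C  = sample(2, 25, replace = TRUE),
                 Y1 = sample(100, 25), Y2 = sample(100, 25),
                 Y3 = sample(100, 25), Y4 = sample(100, 25),
                 Y5 = sample(100, 25))

# Hypothetical helper: X1 against one Y column, faceted by B and C
plot_column <- function(data, ycol) {
  ggplot(data, aes(x = X1, y = .data[[ycol]])) +
    geom_point() +
    facet_grid(B ~ C)
}

wanted <- c("Y1", "Y3", "Y4")            # user-specified columns
stopifnot(all(wanted %in% names(df)))   # fail early on a misspelled name

plots <- list()
for (col in wanted) {
  plots[[col]] <- plot_column(df, col)
}
```

To write the files as the original function does, loop over the list and call ggsave(paste0("X_vs_", col, "_faceted.jpeg"), plots[[col]]) for each col in names(plots).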
