Trying to make a bar graph in ggplot using R with MongoDB

I am very new to R. I have a dataset in CSV format stored in MongoDB. I have already linked RStudio and MongoDB, and the data has been imported into RStudio successfully. Now I want to visualize the data with bar graphs, pie charts, heatmaps, etc. But all the tutorials I have seen use data frames with ggplot. How do I convert the data imported from the CSV file into a data frame? I know I might sound stupid, but I'm a beginner; any help would be appreciated. The dataset I'm using is the 2017 CSV file from this link: https://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page

What is the structure of the "csv" you've imported from the database (for example, what do class(x) and str(x) report)? You could try converting it to a data.frame using as.data.frame. If class(x) returns more than one class, e.g. tibble and data.frame, each function dispatches on the class it has a method for. So if your object has both the tibble and data.frame classes, ggplot will know how to handle it.
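A minimal sketch of the check-and-convert step described above. The `imported` list and its column names (`borough`, `stops`) are hypothetical stand-ins for whatever the MongoDB import actually returns; the real stop-and-frisk columns will differ. The plot is only built if ggplot2 is installed.

```r
# Hypothetical stand-in for the imported object: a plain list of columns.
imported <- list(
  borough = c("BRONX", "BROOKLYN", "BRONX", "QUEENS"),
  stops   = c(3L, 5L, 2L, 4L)
)

# Inspect what you actually have before converting.
class(imported)
str(imported)

# Convert to a data frame and confirm the result.
stops_df <- as.data.frame(imported, stringsAsFactors = FALSE)
is.data.frame(stops_df)

# Build a bar graph only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(stops_df, ggplot2::aes(x = borough, y = stops)) +
    ggplot2::geom_col()  # bars for the same borough are stacked
  # print(p) draws the plot
}
```

If class() already includes "data.frame", the as.data.frame step is unnecessary and the object can be passed to ggplot directly.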

Related

RangedSummarizedExperiment for DESeq2

I'm trying to use the DESeq2 package in R for differential gene expression, but I'm having trouble creating the required RangedSummarizedExperiment object from my input data. I have found several tutorials and vignettes for doing this, but they all seem to apply to a raw data set that is different from mine. My data has gene names as row names and patient id as column names, and the data is simply integer count data. There has to be a simple way to create the RangedSummarizedExperiment object from this type of input data, but I haven't yet found a way. Can anybody help? Thanks.
I had a similar problem understanding how to use this data structure. I eventually managed to do without it by using DESeqDataSetFromMatrix. You can see an example in the first code block of Modify r object with rpy2 (this code is pure R, rpy2 stuff comes after). In this example, I have genes as rows and samples as columns, so it is likely you will be able to adopt the same approach.
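A rough sketch of the DESeqDataSetFromMatrix approach mentioned above, assuming a count matrix with genes as rows and patients as columns. The gene and patient names, the simulated counts, and the `condition` grouping are all made up for illustration; the DESeq2 call only runs if the package is installed.

```r
set.seed(1)

# Simulated integer counts: 4 genes (rows) x 6 patients (columns).
counts <- matrix(
  rpois(24, lambda = 50),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("patient", 1:6))
)

# Sample table: one row per column of the count matrix, in the same order.
# The "condition" factor is a hypothetical grouping.
col_data <- data.frame(
  condition = factor(rep(c("control", "treated"), each = 3)),
  row.names = colnames(counts)
)
stopifnot(identical(rownames(col_data), colnames(counts)))

# Build the DESeq2 data set directly from the matrix.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = counts,
    colData   = col_data,
    design    = ~ condition
  )
}
```

The key requirement is that the row names of the sample table match the column names of the count matrix.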

How to convert Spark R dataframe into R list

This is my first time trying SparkR to do the same work I did with RStudio, on Databricks Cloud Community Edition, but I ran into some weird problems.
It seems that SparkR does support packages like ggplot2 and plyr, but the data has to be in R list format. I could generate this type of list in RStudio with train <- read.csv("R_basics_train.csv"); the variable train is reported as a list by typeof(train).
However, in SparkR, when I read the same CSV data as "train", it is converted into a DataFrame, and this is not the Spark Python DataFrame we have used before, since I cannot use the collect() function to convert it into a list. typeof(train) reports "S4", but in fact the object is a DataFrame.
So, is there any way in SparkR to convert a DataFrame into an R list so that I can use the methods in ggplot2 and plyr?
You can find the original .csv training data here:
train
Later I found that r_df <- collect(spark_df) converts a Spark DataFrame into an R data frame. R's summary() cannot be used on the Spark DataFrame, but with the R data frame we can do many R operations.
It looks like they changed SparkR, so you now need to use
r_df <- as.data.frame(spark_df)
Not sure whether you would call this a drawback of SparkR, but in order to leverage the many good features R has to offer, such as data exploration and the ggplot libraries, you need to convert your Spark DataFrame into a regular R data frame by calling collect:
df <- collect(df)
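The typeof() confusion in the question can be reproduced in plain R without Spark. This small sketch uses a made-up data frame; it shows that an ordinary R data frame reports "list" from typeof() while class() reports what it really is, which is why checking class() is the more useful test before handing data to ggplot2 or plyr.

```r
# A made-up local data frame standing in for read.csv() output.
train <- data.frame(id = 1:3, label = c("a", "b", "c"))

typeof(train)         # "list": the underlying storage type
class(train)          # "data.frame": what functions dispatch on
is.list(train)        # TRUE
is.data.frame(train)  # TRUE

# In a SparkR session (not run here), the equivalent check after
# r_df <- collect(spark_df) would be is.data.frame(r_df).
```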

Reaching max.print in R

I just found a bunch of weather data that I would like to play around with in glmnet in R. First I've been reading and organizing the data in R, and right now I am just trying to look at the raw data of each variable. Unfortunately, each variable has a lot of data and R isn't able to print it all. Is there a way I can view all the raw data in R or just in the file itself? I've tried opening the file in Excel with no success. Thanks!
Try using frequency tables; you can group the values into segments.
You can also use str(), summary(), table(), pairs(), plot(), etc. There are several libraries (such as decr) which facilitate analyzing numerical variables and factor levels. Let me know if you need help with any.
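As a sketch of the suggestions above, the following uses simulated data (the `weather` data frame and its `temp` and `station` columns are invented) to summarize a large variable instead of printing every value. It also shows raising the max.print option, in case printing more rows really is what you want.

```r
set.seed(42)

# Simulated stand-in for the weather data: 1000 rows.
weather <- data.frame(
  temp    = rnorm(1000, mean = 15, sd = 8),
  station = sample(c("A", "B", "C"), 1000, replace = TRUE)
)

str(weather)        # compact overview of every column
summary(weather)    # per-column summary statistics
head(weather, 10)   # first 10 rows only

# Frequency tables: counts per category, and counts per numeric segment.
station_counts <- table(weather$station)
temp_segments  <- table(cut(weather$temp, breaks = 5))

# Raise the print limit temporarily, then restore the previous setting.
old <- options(max.print = 5000)
options(old)
```

In RStudio, View(weather) opens the full data in a spreadsheet-style viewer, which avoids the print limit altogether.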

R monthly plot with GGPlot2 in RMarkdown

I have a CSV file with the following data:
RegistrationDate;User_Id;Items
RegistrationDate has format like '22.05.2014 14:25'
Is there an easy way to connect CSV data to an R Markdown script? All the examples I've seen use randomly generated data, which is not good for reproducible research.
I need to create 2 plots with ggplot2:
a plot of users count per month.
a plot of items collected per month
I've looked at a lot of graphs that come close to this, but didn't find the right version. It looks like I don't understand something about R plotting :(.
What do you mean when you say "connect" CSV data to an R Markdown script?
Do you mean reading it?
data <- read.csv("directory/name.csv", sep=";")
Or do you mean recording somewhere a specific relation, stating that this data is related to this analysis? If the latter, you can check the archivist package, hosted on GitHub, which provides a set of tools for archiving datasets and figures. There is information there on how to install the package.
For question 2, you will need an extra column that specifies the month.
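A minimal sketch of the "extra month column" idea, using a few made-up rows in the format described in the question (inline text replaces the real file so the example is self-contained). The per-month users count is taken here to mean distinct User_Id values, which is an assumption about what the question intends. The plots are only built if ggplot2 is installed.

```r
# Made-up sample rows in the question's format; with a real file, use
# read.csv("directory/name.csv", sep = ";") instead of text =.
csv_text <- "RegistrationDate;User_Id;Items
22.05.2014 14:25;1;3
30.05.2014 09:10;2;1
02.06.2014 18:00;1;4
15.06.2014 11:45;3;2"

data <- read.csv(text = csv_text, sep = ";", stringsAsFactors = FALSE)

# Parse '22.05.2014 14:25' style timestamps and derive a year-month column.
data$RegistrationDate <- as.POSIXct(
  data$RegistrationDate, format = "%d.%m.%Y %H:%M", tz = "UTC"
)
data$Month <- format(data$RegistrationDate, "%Y-%m")

# Distinct users per month, and total items per month.
users_per_month <- aggregate(
  User_Id ~ Month, data = data, FUN = function(x) length(unique(x))
)
items_per_month <- aggregate(Items ~ Month, data = data, FUN = sum)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  p_users <- ggplot2::ggplot(users_per_month,
                             ggplot2::aes(x = Month, y = User_Id)) +
    ggplot2::geom_col()
  p_items <- ggplot2::ggplot(items_per_month,
                             ggplot2::aes(x = Month, y = Items)) +
    ggplot2::geom_col()
  # print(p_users); print(p_items) draw the plots
}
```

Inside an R Markdown document, the same code goes in an R chunk, so the report is regenerated from the CSV file each time it is knitted.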

How can I save a data set created using the memisc package in r?

I'm using memisc to read in a massive SPSS file in order to reshape it. This seems to work very well.
However, I'd also like to be able to output the results to an SPSS-readable file that retains labels and descriptions--after all, this is the advantage of using the data-set construct memisc has set up.
I've searched the memisc documentation for any mention of an export, save, or output function. I've also tried passing a memisc data set to write.foreign (from the foreign package).
Why am I doing this in the first place? Reshaping massive data files in SPSS is a pain. R makes it easy. But the folks I'm doing this for want to maintain the labels and descriptions.
Thanks!
