R pasting data errors - r

I'm pulling my hair out because I need to get data sets into R for a test, but some of them are just too big to enter by hand, which is how I've been getting away with it up until now. Every time I try to paste one of the teacher's data sets, an error shows up below each row. It doesn't do this for my friend on his Mac (I have Windows), and I have tried deleting and redownloading, even from another server! I've also tried entering the data in Excel first, and that didn't work either. Please help, I don't know what I can do differently.
[Image: data set from web]
[Image: errors in R]

Related

R- Not Loading all 5 variables only 1

Last night I attended my first R class and I am having some difficulties using the read.csv function. When I run the function, it only loads the first variable.
Does anyone know why this is happening? I don't know if it makes a difference, but I am using a Mac.
I don't know if this helps, but sometimes when I create a .csv file from an Excel file, all of the variables collapse into one variable (I don't know why). Check the structure of your dataset using head(df) or str(df) (where df is the name of your data) and confirm that this didn't happen. If it did happen, you'll have to fix the .csv. If that isn't the problem, I'm not sure what the issue is.
For future reference it would help for you to supply a reproducible sample of your data as well as the full script that you are using.
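One possible cause of the "everything in one column" symptom is a separator mismatch: the file uses semicolons (or tabs) while read.csv expects commas. This is only a guess about the asker's file. The sketch below uses a made-up temporary file and made-up column names to show how str() and ncol() reveal the collapse and how naming the separator fixes it.

```r
# Write a small semicolon-separated file (hypothetical contents,
# standing in for a .csv exported from a spreadsheet).
path <- tempfile(fileext = ".csv")
writeLines(c("id;age;score", "1;34;7.5", "2;29;8.1"), path)

# read.csv() assumes commas, so each row collapses into a single column.
collapsed <- read.csv(path)
str(collapsed)
ncol(collapsed)   # 1

# Naming the real separator recovers all the columns.
fixed <- read.csv(path, sep = ";")
str(fixed)
ncol(fixed)       # 3
```

If the separator turns out to be something else, the same check applies with a different sep value (for example "\t" for tabs).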

How to continue project in new file in R

I have a large population survey dataset for a project and the first step is to make exclusions and have a final dataset for analyses. To organize my work, I must continue my work in a new file where I derive survey variables correctly. Is there a command used to continue work by saving all the previous data and code to the new file?
I don't think I understand the problem you have. You can always create multiple .R files and split the code among them as you wish, and you can also arrange those files as you see fit in the file system (group them in the same folder with informative names and comments, etc.).
As for the data side of the problem, you can load your data into R, make any changes / filters needed, and then save it to another file with one of the billions of functions to write stuff to the disk: write.table() from base, fwrite() from data.table (which can be MUCH faster), etc...
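A minimal sketch of that workflow, using only base R. The data frame, the exclusion rule (age >= 18), and the file paths are all invented for illustration; saveRDS()/readRDS() are an addition not mentioned above, shown because they keep column types intact between scripts.

```r
# --- first script: make exclusions and save the final dataset ---
raw <- data.frame(id = 1:6, age = c(17, 25, 40, 15, 33, 61))  # hypothetical survey data
final <- raw[raw$age >= 18, ]                                  # hypothetical exclusion rule

csv_path <- tempfile(fileext = ".csv")   # in practice, a path inside your project folder
rds_path <- tempfile(fileext = ".rds")

write.csv(final, csv_path, row.names = FALSE)  # plain text, readable by other tools
saveRDS(final, rds_path)                       # R-specific, preserves types exactly

# --- second script: pick up where the first one left off ---
final_again <- readRDS(rds_path)
from_csv    <- read.csv(csv_path)
str(final_again)
```

The code itself can be split the same way: keep the exclusion steps in one .R file and the analysis in another that starts by reading the saved file.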
I feel that my answer is way too obvious. When you say "project", do you mean "something I have to get done" or the actual projects that you can create in RStudio? If it's the first, then I think I have covered it. If it's the second, I never got to use that feature, so I am not going to be able to help :(
Maybe you can elaborate a bit more.

Using R with tidyquant and massive data

While working with R I encountered a strange problem:
I am processing data in the following manner:
Reading data from a database into a data frame, filling missing values, grouping and nesting the data by a combined primary key, creating a time series and forecasting it for every group, ungrouping and cleaning the data, and writing it back into the DB.
Something like this:
https://cran.rstudio.com/web/packages/sweep/vignettes/SW01_Forecasting_Time_Series_Groups.html
For small data sets this works like a charm, but with larger ones (over about 100,000 entries) I get the "R Session Aborted" screen from RStudio, and the native R GUI just stops execution and implodes.
There is no information in any of the log files that I've looked into. I suspect that it is some kind of memory issue (possibly a leak).
As a workaround I'm processing the data in chunks with a for-loop. But no matter how small the chunk size is, I still get the "R Session Aborted" screen, which looks a lot like leaking memory.
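For reference, a chunked loop of the kind described might look roughly like the sketch below. It is not the actual script: the data, the chunk size, and the per-group step (a simple mean standing in for the forecast) are all assumptions made for illustration.

```r
set.seed(1)
# Hypothetical stand-in for the data read from the database.
df <- data.frame(key   = rep(sprintf("k%02d", 1:10), each = 5),
                 value = rnorm(50))

# Split the group keys (not the rows) into chunks so no group is cut in half.
keys       <- unique(df$key)
chunk_size <- 3
key_chunks <- split(keys, ceiling(seq_along(keys) / chunk_size))

results <- vector("list", length(key_chunks))
for (i in seq_along(key_chunks)) {
  chunk <- df[df$key %in% key_chunks[[i]], ]
  # Placeholder for the real per-group forecasting step.
  results[[i]] <- aggregate(value ~ key, data = chunk, FUN = mean)
  rm(chunk)
  invisible(gc())   # ask R to release memory between chunks
}

out <- do.call(rbind, results)
nrow(out)   # one row per key
```

Printing gc() output or the chunk index inside the loop is one cheap way to see how far the run gets and how memory grows before the session dies.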
The whole data set consists of about 5 million rows.
I've looked a lot into packages like ff, the big* family, and matter; basically everything from https://cran.r-project.org/web/views/HighPerformanceComputing.html
but these do not seem to work well with tibbles and the tidyverse way of data processing.
So, how can I improve my script to work with massive amounts of data?
How can I gather clues about why the R session is aborted?
Check out the article at:
datascience.la/dplyr-and-a-very-basic-benchmark
There is a table that shows runtime comparisons for some of the data wrangling tasks you are performing. From the table, it looks as though dplyr with a data.table behind it is likely to do much better than dplyr with a data frame behind it.
There’s a link to the benchmarking code used to make the table, too.
In short, try adding a key, and try using a data.table instead of a data frame.
To make x your key, and say your data.table is named dt, use setkey(dt,x).
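A minimal sketch of that suggestion, assuming the data.table package is installed (the block skips itself otherwise). The table dt and its columns x and y are toy data invented here; only setkey(dt, x) comes from the answer above.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)

  # Toy data standing in for the real table.
  dt <- data.table(x = rep(c("a", "b", "c"), times = 4), y = 1:12)

  setkey(dt, x)       # sorts dt by x and marks x as the key
  print(key(dt))      # "x"

  # Grouped aggregation in data.table syntax.
  group_means <- dt[, .(mean_y = mean(y)), by = x]
  print(group_means)

  # With a key set, rows can be looked up by key value directly.
  print(dt["b"])
}
```

Whether this helps with the crash itself is a separate question; it mainly addresses speed and memory overhead of the grouping step.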
While Pake's answer deals with the described problem, I found a solution to the underlying problem. For compatibility reasons I had been using R version 3.4.3. Now I'm using the newer 3.5.1 version, which works fine.

Convert a database from MongoDB to a R data frame using Rmongo

I am trying to get a database from MongoDB into R so I can run analyses on it. The bridge between the two is an R package: Rmongo.
As I have some policy rules, I cannot show you the dataset and my output, so I will try to explain as best as possible.
My two first commands, after installing the package, are these ones:
mg1 <- mongoDbConnect("test", "localhost", 27018)
dbShowCollections(mg1)
Which works, as it shows the collection, or the different variables.
Then, I can use the commands made by the Rmongo package, meaning:
query = dbGetQuery(mg1, 'address_history','{}')
This normally returns a data frame with one variable per column. But because the documents are nested, I only get the first three variables (out of around fifty), since they are at the top of the nest. The rest come back as a single data frame column containing the JSON text (for approximately 50 variables), which I cannot seem to turn into a data frame. If someone is familiar with this, please help me.
I already saw a way on Stack Overflow to do it manually with gsub and pattern matching in general, but that code deals with a different structure, and doing it manually will not work here.
Furthermore, there is also another command via the Rmongo package:
query2 = dbGetQueryForKeys(mg1, 'address_history', '{}', '{address:1}')
which returns only the variable that I ask for. Unfortunately, because the documents are nested, it also cannot find the variables that are not at the top of the nest.
Is there another command or another package that I can use? I am open to any other way of getting this (very large) dataset into an R data frame so I can do my analysis.
Thank you very much!
I just tried setting up Rmongo and mongolite for R. I got mongolite working in minutes with the starter data locally. I could not even get the data I wanted inserted using Rmongo.
I think if you try installing mongolite you will find its documentation and package simpler. https://github.com/jeroen/mongolite
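As a rough sketch of how nested documents can end up as ordinary columns: the example below does not connect to MongoDB at all. It uses the jsonlite package (assumed to be installed) on a made-up JSON string with invented field names, and shows jsonlite::flatten() spreading a nested "address" object into separate columns. The commented-out mongolite lines reuse the database, collection, and port from the question and are untested here.

```r
if (requireNamespace("jsonlite", quietly = TRUE)) {
  # Made-up nested documents standing in for the real collection.
  txt <- '[{"id": 1, "address": {"city": "Lyon",  "zip": "69001"}},
           {"id": 2, "address": {"city": "Paris", "zip": "75001"}}]'

  nested <- jsonlite::fromJSON(txt)
  print(names(nested))   # "id" "address"  (address is a nested data frame)

  flat <- jsonlite::flatten(nested)
  print(names(flat))     # "id" "address.city" "address.zip"
}

# Not run: with a MongoDB server available, the same idea via mongolite
# would look something like this.
# m  <- mongolite::mongo(collection = "address_history", db = "test",
#                        url = "mongodb://localhost:27018")
# df <- jsonlite::flatten(m$find('{}'))
```

If only a JSON text column is available (as with the Rmongo output described in the question), parsing each value with jsonlite::fromJSON() before flattening would be the analogous step.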

Can't see variable name list when subsetting dataset in R

I'm using RStudio on a MacBook Pro, and this just started happening. I've tried re-downloading R and RStudio, and it seems to be a recurring issue. I'm not sure if I have a setting applied that's making this happen, but any help would be great!
As shown below, when I try to subset variables within the dataset (MFP_Post), all the variables are on the left-hand side, where I can't read any of the names.
