I have been messing around with R for the last year and now want to get a little deeper. I want to learn more about the ff package and other big-data packages, because I have been having trouble getting through some of the documentation.
I like to learn by doing, so let's say I have a huge CSV called data.csv and it's 300 MB. It has five columns: Url, PR, tweets, likes, and age. I want to deduplicate the rows based on URL, then plot PR against likes on a scatter plot to see if there is any correlation. How would I go about doing that basic analysis?
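For context, here's roughly what I would do if the file fit comfortably in memory (column names taken from the header above); what I don't know is how to translate this to ff:

data <- read.csv("data.csv", stringsAsFactors = FALSE)
deduped <- data[!duplicated(data$Url), ]  # keep the first row for each URL
plot(deduped$PR, deduped$likes, xlab = "PR", ylab = "Likes")  # scatter plot
cor(deduped$PR, deduped$likes, use = "complete.obs")  # quick correlation check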
I always get confused by the chunking in these big-data workflows and by the fact that you can't load everything in at once.
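To make it concrete, this is the kind of chunked loop I imagine, though I'm not sure it's the right pattern (the chunk size is just a placeholder):

con <- file("data.csv", open = "r")
hdr <- strsplit(readLines(con, n = 1), ",")[[1]]  # read the header line once
seen <- character(0)  # URLs already kept from earlier chunks
repeat {
  chunk <- tryCatch(read.csv(con, header = FALSE, nrows = 100000, col.names = hdr),
                    error = function(e) NULL)  # NULL once the file is exhausted
  if (is.null(chunk) || nrow(chunk) == 0) break
  chunk <- chunk[!duplicated(chunk$Url) & !(chunk$Url %in% seen), ]
  seen <- c(seen, chunk$Url)
  # ...process or append the deduplicated chunk here...
}
close(con)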
What are some common problems you have run into using the ff package or big data more generally?
Is there another package that works better?
Basically, any information on getting started with large data in R would be useful.
Thanks!
Nico
So I'm working with a network dataset from Stanford's SNAP Datasets. SNAP itself has wrappers for Python and C++ but not R; however, the data is still usable, since I believe it's a set of plain-text/CSV files.
I can actually read in the .edges file and form an igraph object (roughly as below), but I want to read in the other files, get the attributes, and add those attributes to the igraph object for analysis. I'm just confused about how to work with the .circles, .egofeat, .feat, and .featnames files, since the documentation on the dataset is very scarce. I'm hoping someone has worked with this dataset in R, or even another language, and has pointers to get started.
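Here's roughly how I'm building the graph right now (the file name is from the ego-Facebook archive; yours may differ):

library(igraph)
edges <- read.table("facebook/0.edges")  # two columns: from-node, to-node
g <- graph_from_data_frame(edges, directed = FALSE)  # undirected ego network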
Thank you!
I am struggling a bit with an analysis I need to do. I have collected data consisting of little owl calls that were recorded along transects. I want to analyse these recordings for similarity, in order to see which recorded calls are from the same owls and which are from different owls. That way I can estimate the size of the population in my study area.
I have done a bit of research and the warbleR package seems suitable for this. However, I am far from an R expert and am struggling a bit with how to go about it. Do any of you have experience with these types of analyses, and maybe have example scripts? It seems to me that I could use the function cross_correlation and maybe run a PCA. However, in the warbleR vignette I looked at, they only do this for different types of calls, not for the same call type from different individuals, so I am not sure if it would work.
To be able to run analyses with warbleR, you need to input the data using the "selection_table" format. Take a look at the example data "lbh_selec_table" to get a sense of the format:
library(warbleR)
data(lbh_selec_table)
head(lbh_selec_table)
The whole point of these objects is to tell R the time locations (in seconds) of the signals you want to analyze within your sound files. Take a look at this link for more details on the object structure and how to import it into R.
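As a rough sketch, building one from scratch looks something like this (the file names and times are invented; the columns mirror lbh_selec_table):

my_selecs <- data.frame(
  sound.files = c("owl1.wav", "owl1.wav", "owl2.wav"),  # hypothetical recordings in the working directory
  selec = c(1, 2, 1),  # selection ID within each sound file
  start = c(1.2, 4.8, 0.6),  # call start time in seconds
  end = c(2.0, 5.5, 1.4))  # call end time in seconds
st <- selection_table(X = my_selecs)  # checks that the files and times are valid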
Currently, we are working on an eye-tracking study, and we are not quite satisfied with the analysis options of SMI's BeGaze. Hence, I would like to ask if you know a good way to extract data from BeGaze that can be processed by a handy R package which still works under R 3.6.1, unlike ETRAN.
It would be great to add AOIs manually, do heat maps, and analyze saccades, fixation times, and ratios between AOIs.
We came across eyetrackingR, but we are still struggling with extracting BeGaze's data in a processable way.
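For reference, this is the input shape we understand eyetrackingR expects. Every column name below is a guess based on our own export settings, so please treat it as a sketch only:

library(eyetrackingR)
raw <- read.table("begaze_export.txt", header = TRUE, sep = "\t")  # hypothetical BeGaze text export
et_data <- make_eyetrackingr_data(raw,
  participant_column = "Participant",  # these names depend on your export
  trial_column = "Trial",
  time_column = "TrackTime",
  trackloss_column = "Trackloss",
  aoi_columns = c("AOI1", "AOI2"),
  treat_non_aoi_looks_as_missing = TRUE)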
Any help, tutorial, hint, etc. is much appreciated.
David
I am new to R and have just started to use it. I am currently experimenting with the quantmod, rugarch and rmgarch packages.
In particular, I'm using the last of those packages to run a multivariate portfolio analysis for the European markets. For that I need the 3-month German treasury bills, to use as the risk-free rate. However, as far as I know, I can't download that series from the Yahoo, Google, or FRED databases, so I have downloaded it from investing.com and want to load it into R.
The issue is that my data differ from what the getSymbols() function downloads from Yahoo, because in this case I only have two columns: the date and the closing price. To sum up, the question is: is there any way to load this kind of data into R for rmgarch purposes?
Thanks in advance
Not sure if this is the issue, but this is how you might go about getting the data from a csv file.
data <- read.csv(file="file/path/data.csv")
head(data) # Take a look at your data
# To pull out a single column, replace ColumnName with the actual column name
data_only <- data$ColumnName
It looks like the input data for rugarch needs to be an xts object, so you might want to take a look at this. You might also want to take a look at ?read.csv.
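A minimal sketch of that conversion, assuming the date column is called Date and the price column Close (the file name is made up, and you'll need to adjust the format string to match your file):

library(xts)
bills <- read.csv("bills.csv", stringsAsFactors = FALSE)  # hypothetical investing.com export
bills_xts <- xts(bills$Close, order.by = as.Date(bills$Date, format = "%m/%d/%Y"))
head(bills_xts)  # an xts series, ready for rugarch/rmgarch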
I just found a bunch of weather data that I would like to play around with in glmnet in R. I've been reading and organizing the data in R, and right now I'm just trying to look at the raw data for each variable. Unfortunately, each variable has a lot of data, and R isn't able to print it all. Is there a way I can view all the raw data in R, or just in the file itself? I've tried opening the file in Excel without success. Thanks!
Try frequency tables; you can group the data into segments.
str(), summary(), table(), pairs(), plot(), etc. There are several packages (such as descr) that facilitate summarizing numeric and factor variables. Let me know if you need help with any of them.
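For example, to get a quick look without printing everything (assuming your data frame is called weather, which is a placeholder name):

str(weather)  # column types and the first few values of each variable
summary(weather)  # per-column summary statistics
head(weather, 20)  # just the first 20 rows
View(weather)  # spreadsheet-style viewer, e.g. in RStudio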