Converting (parsing?) character string of Facebook fancount data using R - r

I need to extract Facebook country likes for a number of brands. My problem is I don't even know where to start. I have spent the past 4 hours searching, but to be honest I'm not even sure what to search for. I can get the data but am struggling to convert it into a usable format in R for time series analysis.
Any assistance would be gratefully accepted
The data I'm retrieving via the Facebook Graph API for likes by country (of Coca-Cola) is as follows:
[1] "{\"data\":[{\"id\":\"1542920115951985\/insights\/page_fans_country\/lifetime\",\"name\":\"page_fans_country\",\"period\":\"lifetime\",\"values\":[{\"value\":{\"BR\":17270087,\"US\":13567311,\"MX\":5674950,\"AR\":3616300,\"FR\":3409959,\"IN\":2949669,\"GB\":2670260,\"TH\":2657306,\"IT\":2401621,\"CO\":1946677,\"ID\":1921076,\"EG\":1805233,\"PK\":1665707,\"PH\":1614358,\"TR\":1607936,\"CL\":1504917,\"VN\":1384143,\"DE\":1312448,\"PL\":1201112,\"VE\":1084783,\"CA\":990114,\"RO\":932538,\"EC\":856116,\"PE\":815942,\"ES\":790320,\"AU\":759775,\"MA\":578003,\"TN\":515510,\"RS\":476986,\"NG\":476934,\"PT\":469059,\"MY\":435316,\"BE\":431930,\"ZA\":431509,\"IQ\":354145,\"SE\":352331,\"KE\":342997,\"GR\":333749,\"HU\":333281,\"NL\":330307,\"GT\":326328,\"CR\":304006,\"DZ\":300497,\"PR\":287430,\"DO\":278847},\"end_time\":\"2015-01-01T08:00:00+0000\"},{\"value\":{\"BR\":17270151,\"US\":13566624,\"MX\":5675012,\"AR\":3618242,\"FR\":3409837,\"IN\":2949969,\"GB\":2669934,\"TH\":2658044,\"IT\":2401726,\"CO\":1946797,\"ID\":1921156,\"EG\":1805337,\"PK\":1665824,\"PH\":1614402,\"TR\":1608104,\"CL\":1504979,\"VN\":1384782,\"DE\":1312138,\"PL\":1201212,\"VE\":1084776,\"CA\":990093,\"RO\":932788,\"EC\":856129,\"PE\":816002,\"ES\":790385,\"AU\":759775,\"MA\":578080,\"TN\":518210,\"RS\":477264,\"NG\":476965,\"PT\":469177,\"MY\":435296,\"ZA\":433741,\"BE\":431908,\"IQ\":364228,\"SE\":352267,\"KE\":343007,\"GR\":333771,\"HU\":333312,\"NL\":330232,\"GT\":326513,\"CR\":304021,\"DZ\":300587,\"PR\":287432,\"DO\":278892},\"end_time\":\"2015-01-02T08:00:00+0000\"}],\"title\":\"Lifetime Likes by Country\",\"description\":\"Lifetime: Aggregated Facebook location data, sorted by country, about the people who like your Page. 
(Unique Users)\"}],\"paging\":{\"previous\":\"https:\/\/graph.facebook.com\/cocacola\/insights\/page_fans_country?access_token=EAACEdEose0cBAMLTB1Ufx44l8Q2hT34jxjmVjONPzhqncAvv985cUXOY6Q9FZBLuL3OM8oLXDPTBroD5DY8SS9ZBd1OIhSAMwjrISQRgWh5kkJVu75Ss7aWESIlKrwBLyLt6VYHUEUlUlmCV72TSQGZBkkOeE4OaZA4gvHIZBngZDZD&since=1419897600&until=1420070400\",\"next\":\"https:\/\/graph.facebook.com\/cocacola\/insights\/page_fans_country?access_token=EAACEdEose0cBAMLTB1Ufx44l8Q2hT34jxjmVjONPzhqncAvv985cUXOY6Q9FZBLuL3OM8oLXDPTBroD5DY8SS9ZBd1OIhSAMwjrISQRgWh5kkJVu75Ss7aWESIlKrwBLyLt6VYHUEUlUlmCV72TSQGZBkkOeE4OaZA4gvHIZBngZDZD&since=1420243200&until=1420416000\"}}"
This data is for two days and the data I need to retrieve start from line 3 (BR has 17270087 fans for Coca-Cola) and ends on line 10 (DO has 278847 fans) plus the date (indicated by end time of 2015-01-01). I then need to repeat the extract for line 12 to line 19 plus the end time of 2015-01-02 for each of the country references. Ideally I also want to capture the Facebook ID on line 2 (1542920115951985) to be able to build a data frame with Facebook ID, Date, Country and Likes in each record.
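The response is JSON, so one way to approach this is to parse it and then build one row per country per day. The following is only a sketch: it assumes the jsonlite package is installed, and it uses a shortened stand-in string (two countries instead of 45) with the same structure as the response above. The column names FacebookID, Date, Country and Likes are illustrative.

```r
library(jsonlite)

# Shortened stand-in for the API response: same structure, fewer countries
raw <- '{"data":[{"id":"1542920115951985/insights/page_fans_country/lifetime",
  "name":"page_fans_country","period":"lifetime",
  "values":[
    {"value":{"BR":17270087,"US":13567311},"end_time":"2015-01-01T08:00:00+0000"},
    {"value":{"BR":17270151,"US":13566624},"end_time":"2015-01-02T08:00:00+0000"}
  ]}]}'

# Keep the nested list structure instead of letting jsonlite simplify it
parsed <- fromJSON(raw, simplifyVector = FALSE)
entry  <- parsed$data[[1]]

# The Facebook ID is the part of "id" before the first slash
fb_id <- sub("/.*$", "", entry$id)

# One small data frame per day, then stack them
rows <- lapply(entry$values, function(v) {
  data.frame(
    FacebookID = fb_id,
    Date       = as.Date(substr(v$end_time, 1, 10)),
    Country    = names(v$value),
    Likes      = as.numeric(unlist(v$value)),
    stringsAsFactors = FALSE
  )
})
likes <- do.call(rbind, rows)
likes
```

For several brands, the same steps could be wrapped in a function and applied to each response before binding the results together.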

Related

How to create a multi-line timeline graph of tweets posted

I have a dataset of 56,040 tweets in R called 'tweets', collected in the week following the Roe v. Wade announcement. I'm attempting to analyze it using sentiment analysis scores. I have three columns:
'stance' = included for each tweet - includes either 'life' or 'choice' depending on which stance the tweet is taking.
'POSIX' = the timestamp of when the tweet was posted, currently in YYYY-MM-DD-HH-MM-SS format.
'score' = the sentiment score for each tweet, ranging from around -10 to 10.
I've tried various ways without success and honestly don't know what I'm doing but figure this can't be that difficult. I'm attempting to create a line graph with two lines (one for life and another for choice) over the course of the timeframe (right after midnight on the 22nd until 11:59 on the 3rd) showing the average sentiment score of tweets by hour, controlling for the number of tweets that were sent out at that hour. Any suggestions?
So far I've attempted various ggplot and plotly attempts with no success. Pls help lol
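One possible approach is to parse the timestamp, truncate it to the hour, average the score per stance and hour, and then plot one line per stance. The sketch below assumes ggplot2 is installed and uses a few made-up rows in place of the real 'tweets' data; the helper columns 'time' and 'hour' are illustrative. Taking the mean within each hour already divides by the number of tweets in that hour, which may be what "controlling for the number of tweets" calls for.

```r
library(ggplot2)

# Made-up rows standing in for the real 'tweets' data frame
tweets <- data.frame(
  stance = c("life", "life", "life", "choice", "choice", "choice"),
  POSIX  = c("2022-06-22-00-05-10", "2022-06-22-00-40-00", "2022-06-22-01-10-30",
             "2022-06-22-00-15-00", "2022-06-22-01-20-45", "2022-06-22-01-50-05"),
  score  = c(2, 4, -1, 1, 3, 5),
  stringsAsFactors = FALSE
)

# Parse the YYYY-MM-DD-HH-MM-SS strings, then truncate to the start of the hour
tweets$time <- as.POSIXct(tweets$POSIX, format = "%Y-%m-%d-%H-%M-%S", tz = "UTC")
tweets$hour <- as.POSIXct(format(tweets$time, "%Y-%m-%d %H:00:00"), tz = "UTC")

# Average sentiment score per stance and hour
hourly <- aggregate(score ~ stance + hour, data = tweets, FUN = mean)

# One line per stance
p <- ggplot(hourly, aes(x = hour, y = score, colour = stance)) +
  geom_line() +
  labs(x = "Hour", y = "Mean sentiment score")
print(p)
```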

code to sort csv. Using full date (yyyy-mm-dd) to label with different visits (PACE01, PACE02 etc)

I have a csv. I want to load it into R to get the desired outcome (highlighted). For example, study id 256 was sampled on differing dates (i.e. different visits). I want PACE01, PACE02, PACE03 or PACE04 to be added for each recurring visit respectively.
Some years have two visits, so I can't just use the year. I need to take the whole date into account.
Hope this makes sense, I would really appreciate your help. I have 14,000+ samples to sort.
Desired Outcome
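A possible way to label the visits is to rank the distinct dates within each study id. This is a sketch under assumptions: the column names study_id and date are hypothetical, the rows are made up, and samples taken on the same date are treated as the same visit. In practice the data would come from read.csv() and the date column would be converted with as.Date() first.

```r
# Made-up rows; real data would be loaded with read.csv()
samples <- data.frame(
  study_id = c(256, 256, 256, 256, 301, 301),
  date     = as.Date(c("2019-03-04", "2019-03-04", "2019-09-17",
                       "2020-02-01", "2019-05-20", "2021-01-11"))
)

# Within each study id, number the distinct dates in chronological order
visit_no <- ave(as.numeric(samples$date), samples$study_id,
                FUN = function(d) match(d, sort(unique(d))))

# Turn 1, 2, 3, ... into PACE01, PACE02, PACE03, ...
samples$visit <- sprintf("PACE%02d", as.integer(visit_no))
samples
```

Because the ranking uses the full date, two visits in the same year receive different labels.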

Having trouble figuring out how to approach this exercise #R scraping #extracting web data

So, sometimes I need to get some data from the web and organize it into a data frame, and I waste a lot of time doing it manually. I've been trying to figure out how to optimize this process. I've tried some R scraping approaches but couldn't get them to work, and I thought there could be an easier way to do this. Can anyone help me out?
Fictional exercise:
Here's a webpage with countries listed by continents: https://simple.wikipedia.org/wiki/List_of_countries_by_continents
Each country name is also a link that leads to another webpage (specific to each country, e.g. https://simple.wikipedia.org/wiki/Angola).
As a final result I would like a data frame with number of observations (rows) = number of countries listed and 4 variables (columns): ID = country name, Continent = continent it belongs to, Language = official language (from the country's own webpage) and Population = most recent population count (from the country's own webpage).
Which steps should I follow in R in order to be able to reach to the final data frame?
This will probably get you most of the way. You'll want to play around with the different nodes and probably do some string manipulation (clean up) after you download what you need.
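As a rough sketch of the first step (country name, continent and link), assuming the rvest package is installed: the snippet below parses a tiny hand-written HTML fragment rather than the live page, and it assumes each continent is a heading followed by a list of links. The real page's markup may differ, so the selectors would need checking in the browser.

```r
library(rvest)

# Hand-written stand-in for the list page; the real markup may differ
page <- read_html('<html><body>
  <h2>Africa</h2>
  <ul><li><a href="/wiki/Angola">Angola</a></li>
      <li><a href="/wiki/Benin">Benin</a></li></ul>
  <h2>Asia</h2>
  <ul><li><a href="/wiki/Japan">Japan</a></li></ul>
</body></html>')

headings <- html_elements(page, "h2")

# For each continent heading, collect the links in the list that follows it
countries <- do.call(rbind, lapply(headings, function(h) {
  links <- html_elements(h, xpath = "following-sibling::ul[1]/li/a")
  data.frame(
    ID        = html_text2(links),
    Continent = html_text2(h),
    URL       = html_attr(links, "href"),
    stringsAsFactors = FALSE
  )
}))
countries
```

Language and Population would then come from reading each country's page in turn (for example with read_html() on the site address plus the URL column) and picking out the relevant infobox rows, followed by some string clean-up.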

Extract data for all days for last 30 days from R data frame

I am totally new to the R environment and I'm stuck on date operations. The scenario is: I have a daily database of customer activity for a certain store, and I need to extract the last 30 days of data, counting back from the current date.
In other words, suppose today is 18-NOV-2014, I need all the data from 18-OCT-2014 till today in a separate data-frame. To extract it, what kind of iteration logic should I write in R?
You don't need an iteration. Assuming your data.frame is called X and the date column is DATE, first convert the column to the Date class:
X$DATE <- as.Date(X$DATE, format = '%d-%b-%Y')
The 'format' argument matches the date format you give in your question (e.g. 18-NOV-2014); month names are parsed according to the current locale. Then, to get the rows you are interested in, something like:
X[X$DATE >= Sys.Date() - 30, ]
which keeps all the rows dated on or after today minus 30 days. Note the comma inside the brackets: it selects rows rather than columns.
Does this help at all?
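A small self-contained run of the same idea, with made-up data (the SALES column is hypothetical, and the dates are generated relative to today so the result does not depend on when it is run):

```r
# Three made-up rows: today, 10 days ago and 45 days ago
X <- data.frame(
  DATE  = format(Sys.Date() - c(0, 10, 45), "%d-%b-%Y"),
  SALES = c(120, 80, 95),
  stringsAsFactors = FALSE
)

# Convert the text dates, then keep rows from the last 30 days
X$DATE <- as.Date(X$DATE, format = "%d-%b-%Y")
recent <- X[X$DATE >= Sys.Date() - 30, ]
recent
```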

Creating New Variables in R that relate to

I have 7 different variables in an Excel spreadsheet that I have imported into R. Each is a column of length 3331. They are:
'Tribe' - there are 8 of them
'Month' - when the sampling was carried out
'Year' - the year when the sampling was carried out
'ID' - an identifier for each snail
'Weight' - weight of a snail in grams
'Length' - length of a snail shell in millimetres
'Width' - width of a snail shell in millimetres
This is a case where 8 different tribes have been asked to record data on a suspected endangered species of snail to see if they are getting rarer, or changing in size or weight.
This happened at different frequencies between 1993 and 1998.
I would like to know how to add new variables to the data, so that if I entered names(Snails) it would list the 7 given variables plus any variables that I have added.
The dataset is limited to the point where I would like to add new variables, such as the count of snails in any given month.
This would rely on me using Tribe, Month, Year and ID. If each ID (snail identifier) were listed according to the rates in any given month, then I would be able to sum them to see if there are any changes in counts. I have tried:
count=c(Tribe,Year,Month,ID)
count
But after doing things like that, R just has a large list that is 4X the size of the dataset. I would like to be able to create a new variable that has column size n=3331.
Or maybe I would like to create a simpler variable so I can see if a tribe collected at any given month. I don't know how I can do this.
I have looked at other forums and searched but, there is nothing that I can see that helps me in my case. I appreciate any help. Thanks
I'm guessing you need to organise your variables in a single structure, such as a data.frame.
See ?data.frame for the help file.
To get you started, you could do something like:
snails <- data.frame(Tribe,Year,Month,ID)
snails
# or for just the first few rows
head(snails)
Then this would have your data looking similar to your Excel file like:
  Tribe Year Month ID
1     1    1     1  a
2     2    2     2  b
3     3    3     3  c
<<etc>>
Then if you do names(snails) it will list out your column names.
You could possibly avoid some of this mucking about by just importing your Excel file either directly from Excel, or saving as a csv (comma separated values) file first and then using read.csv("name_of_your_file.csv")
See http://www.statmethods.net/input/importingdata.html for some more specifics on this.
To tabulate your data, you can do things like...
table(snails$Tribe)
...to see the number of snail records collected by each tribe. Or...
table(snails$Tribe,snails$Year)
...to see the trends in each tribe by each year. The $ character will let you access the named variable (column) inside a data.frame in the same way you are currently using the free floating variables. This might seem like more work initially, but it will pay off greatly when you need to do some more involved analysis.
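To tie this back to the original question about a new column of the same length as the data, here is a small sketch with made-up rows. The column name MonthlyCount is illustrative; it holds, for every row, the number of snail records from the same tribe in the same month and year.

```r
# Made-up rows standing in for the real 3331-row data
snails <- data.frame(
  Tribe = c(1, 1, 1, 2, 2),
  Year  = c(1993, 1993, 1994, 1993, 1993),
  Month = c(5, 5, 6, 5, 7),
  ID    = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# Cross-tabulation of records by tribe and year
by_tribe_year <- table(snails$Tribe, snails$Year)

# New column, one value per row: records per tribe, year and month
snails$MonthlyCount <- ave(seq_len(nrow(snails)),
                           snails$Tribe, snails$Year, snails$Month,
                           FUN = length)

names(snails)
snails
```

Since the new column is assigned with $, it shows up in names(snails) alongside the original variables.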
Take for example if you want to only analyse the weights from tribe "1", you could do:
snails$Weight[snails$Tribe==1]
# mean of these weights
mean(snails$Weight[snails$Tribe==1])
There are a lot more things I could explain but you would probably be better served by reading an excellent website like Quick-R here: http://www.statmethods.net/management/index.html to get you doing some more advanced analysis and plotting.
