How do I convert the data stored in Google Datastore back to its original numerical format?
I am currently saving data to Google Datastore through IoT, where the original data looks like:
CO2 438 gm3 3/18/19 at 10:13:48 am
CO2 436 gm3 3/18/19 at 10:12:43 am
CO2 438 gm3 3/18/19 at 10:11:38 am
CO2 438 gm3 3/18/19 at 10:10:33 am
CO2 439 gm3 3/18/19 at 10:09:28 am
CO2 440 gm3 3/18/19 at 10:08:23 am
and the Pub/Sub data structure looks like
{ gc_pub_sub_id: '312422947136384',
device_id: '38001c000851363136363935',
event: 'CO2',
data: <Buffer 34 34 32>,
published_at: '2019-03-18T09:14:53.711Z' }
the final data looks like
id=5639047607222272 NDQz 38001c000851363136363935 CO2 312423130805644 2019-03-18T09:15:58.764Z
id=5069056390463488 NDQy 38001c000851363136363935 CO2 312422947136384 2019-03-18T09:14:53.711Z
..
So the data, e.g. 438, ends up stored in an encoded form such as NDQz. My question is: how do I convert NDQz back to its original numerical format?
The format was Base64; decoding the stored string converts it back to the original numerical value.
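For example, a minimal sketch in R, assuming the base64enc package (any Base64 decoder will do the same job): the stored string NDQy from the record above decodes back to the ASCII digits that the Pub/Sub buffer contained.
library(base64enc)
# Base64 string -> raw bytes -> ASCII digits -> number
decoded <- rawToChar(base64decode("NDQy"))   # the <Buffer 34 34 32> payload, i.e. "442"
as.numeric(decoded)
# [1] 442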
I have an unbalanced panel by country, like the following:
cname year disability_PC family_PC ... allFunctions_PC
Denmark 1992 953.42 1143.25 ... 9672.43
Denmark 1995 1167.33 1361.62 ... 11002.45
Denmark 2000 1341 1470.54 ... 11200
Finland 1991 1095 955 ... 7164
Finland 1996 1067 1040 ... 7600
And so on for more years and countries. What I would like to do is compute the mobile index (each value as a percentage of the previous year's value) for each type of social expenditure (disability_PC, family_PC, ..., allFunctions_PC).
Therefore, I tried the following:
pdata %>%
  group_by(cname) %>%
  mutate_at(vars(disability_absPC, family_absPC, Health_absPC, oldage_absPC, unemp_absPC, housing_absPC, allFunctions_absPC),
            funs(chg = ((./lag(.))*100)))
The code seems to work: R prints the first 10 rows and correctly says "with 56 more rows, and 13 more variables". However, the new columns are not added to the data frame. I mean, typing
view(pdata)
the variables do not exist, as if the mutate command had not created them.
What am I doing wrong?
Thank you for the support.
We can make this simpler with one of the select_helpers; also, funs is deprecated, and list together with ~ should be used in its place:
library(dplyr)
pdata <- pdata %>%
  group_by(cname) %>%
  mutate_at(vars(ends_with('absPC')), list(chg = ~ (./lag(.)) * 100))
Regarding the issue of the variables not being created: in the OP's code, the output is never assigned back to an object or used to update the original object (with <-). Once the result is assigned, the columns will be created.
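A minimal sketch with made-up data (the toy columns below are assumptions, not the OP's dataset), showing that the _chg columns are created once the result is assigned back:
library(dplyr)
toy <- data.frame(
  cname = rep(c("Denmark", "Finland"), each = 3),
  disability_absPC = c(953, 1167, 1341, 1095, 1067, 1100),
  family_absPC = c(1143, 1361, 1470, 955, 1040, 1080)
)
toy <- toy %>%
  group_by(cname) %>%
  mutate_at(vars(ends_with('absPC')), list(chg = ~ (./lag(.)) * 100)) %>%
  ungroup()
names(toy)
# cname, disability_absPC, family_absPC, disability_absPC_chg, family_absPC_chg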
I am relatively new to R and am currently trying to fit a time series model to a data set to predict product volume for the next six months. My data set has 2 columns, Date (a timestamp) and the volume of product in inventory on that particular day, for example:
Date Volume
24-06-2013 16986
25-06-2013 11438
26-06-2013 3378
27-06-2013 27392
28-06-2013 24666
01-07-2013 52368
02-07-2013 4468
03-07-2013 34744
04-07-2013 19806
05-07-2013 69230
08-07-2013 4618
09-07-2013 7140
10-07-2013 5792
11-07-2013 60130
12-07-2013 10444
15-07-2013 36198
16-07-2013 11268
I need to predict six months of product volume required in inventory after the end date of my data set (the last row is "14-06-2019", "3131076"). I have approximately 6 years of data, from 24-06-2013 to 14-06-2019.
I tried using auto.arima on my data set and got many errors. I started researching ways to make my data suitable for time series analysis and came across the imputeTS and zoo packages.
I guess the dates matter for setting the frequency value in the model, so I did this: I created a new column with the weekday and counted how often each weekday occurs, and the counts are not the same:
data1 <- mutate(data, day = weekdays(as.Date(Date)))
> View(data1)
> table(data1$day)
Friday Monday Saturday Sunday Thursday Tuesday Wednesday
213 214 208 207 206 211 212
There are no missing values against the dates that are present, but we can see from the counts above that the weekdays do not occur the same number of times, so some dates are missing. How do I proceed with that?
I have hit a kind of dead end. I have tried going through various posts here on imputeTS and the zoo package, but without much success.
Can someone please guide me on how to proceed, and pardon me, admins and users, if you think this is spamming, but it is really important for me at the moment. I tried to go through various time series tutorials elsewhere, but almost all of them use the AirPassengers data set, which has none of these flaws.
Regards
RD
library(imputeTS)
library(dplyr)
library(forecast)
setwd("C:/Users/sittu/Downloads")
data <- read.csv("ts.csv")
str(data)
$ Date : Factor w/ 1471 levels "01-01-2014","01-01-2015",..: 1132 1181 1221 1272 1324 22 71 115 163 213 ...
$ Volume: Factor w/ 1468 levels "0","1002551",..: 379 116 840 706 643 1095 1006 864 501 1254 ...
data$Volume <- as.numeric(data$Volume)
data$Date <- as.Date(data$Date, format = "%d/%m/%Y")
str(data)
'data.frame': 1471 obs. of 2 variables:
$ Date : Date, format: NA NA NA ... ## 1st Error now showing NA instead of dates
$ Volume: num 379 116 840 706 643 ...
Let's try to reproduce that dataset. First, let's generate a dataset with missing days:
library(dplyr)
# daily dates and random volumes for the whole of 2018
dates <- seq(as.Date("2018-01-01"), as.Date("2018-12-31"), 1)
volume <- floor(runif(365, min = 2500, max = 50000))
dummy_df <- do.call(rbind, Map(data.frame, date = dates, Volume = volume))
# keep a random 80% of the rows to simulate missing days
df <- dummy_df %>% sample_frac(0.8)
Here we generated a data frame with a date and a volume for each day of 2018, then kept a random 80% of the rows (sample_frac(0.8)), so 20% of the days are missing.
This should correctly mimic your dataset, which has missing data for some days.
What we want from there is to find the days with no volume data:
# join the full date sequence to the data: missing days get NA for Volume
Df_full_dates <- as.data.frame(dates) %>%
  left_join(df, by = c('dates' = 'date'))
Now you want to replace the NA values (which correspond to the days with no data) with a volume. I used 0 here, but if these are genuinely missing measurements you might prefer the monthly average or some other value; I cannot tell from your sample what suits your data best:
Df_full_dates[is.na(Df_full_dates)] <- 0
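If a constant 0 is not a sensible fill for your series, the imputeTS package you already mentioned can interpolate the missing days instead; a minimal sketch (assuming a recent imputeTS version, where the function is na_interpolation):
library(imputeTS)
# replace the NA days by linear interpolation instead of 0
Df_full_dates$Volume <- na_interpolation(Df_full_dates$Volume)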
From there, you have a dataset with a value for each day, and you should be able to fit a model to predict the volume in future months.
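As a rough sketch of that last step (the weekly frequency of 7 and the 180-day horizon are assumptions on my part; adjust them to your data):
library(forecast)
# treat the completed daily series as weekly-seasonal and forecast roughly six months ahead
volume_ts <- ts(Df_full_dates$Volume, frequency = 7)
fit <- auto.arima(volume_ts)
fc <- forecast(fit, h = 180)
plot(fc)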
Tell me if you have any questions.
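As a side note on the str() output in your question: those NAs are not caused by the missing days. as.numeric() on a factor returns the factor level codes (hence Volume values like 379, 116, 840), and the format "%d/%m/%Y" expects slashes while your dates use dashes. A sketch of the usual fixes:
# factor -> character -> numeric, so you get the real volumes, not level codes
data$Volume <- as.numeric(as.character(data$Volume))
# match the dashed day-month-year format shown in the file
data$Date <- as.Date(as.character(data$Date), format = "%d-%m-%Y")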
I am doing a multi-part project. To begin with, I had a data set providing the deposits per district over the years. After scrubbing the data, I was able to create a data frame with the growth of deposits by district. I have deposit growth for 3 different kinds of institutions (foreign banks, public banks and private banks) in 3 different data frames, as the number of rows differs in each frame. I have been asked to create 3 heat maps of deposit growth, one for each kind of bank.
My data frame looks like the attached picture.
I want to make a heat map for the growth column.
Thanks.
Maybe I am providing some spam with this answer, so delete it without hesitation.
I'll show you how I make some heatmaps in R:
Fake data:
Gene Patient_A Patient_B Patient_C Patient_D
BRCA1 52 46 124 148
TP53 512 487 112 121
FOX3D 841 658 321 364
MAPK1 895 541 198 254
RASA1 785 554 125 69
ADAM18 12 65 85 121
library(gplots)  # for redgreen() and heatmap.2()
hmcols <- rev(redgreen(2750))
heatmap.2(hm_mx, scale="row", key=TRUE, lhei=c(2,5), symkey="FALSE", density.info="none", trace="none", cexRow=1.1, cexCol=1.1, col=hmcols, dendrogram = "none")
If you read the data in with read.table, you will probably have to convert the data frame to a matrix and set the first column as the row names to avoid errors from R:
hm <- read.table("hm1.txt", sep = '\t', header=TRUE, stringsAsFactors=FALSE)
row.names(hm) <- hm$Gene
hm_mx <- data.matrix(hm)
hm_mx <- hm_mx[,-c(1)]
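For the district data, the same recipe should apply once each bank-type data frame is reshaped into a numeric matrix with districts as row names. A purely hypothetical sketch, assuming the frame is called foreign_banks and has columns district, year and growth (I cannot read the exact names from the attached picture):
library(gplots)
# reshape long data (district, year, growth) into a district x year matrix
growth_mx <- with(foreign_banks, tapply(growth, list(district, year), mean))
heatmap.2(growth_mx, Rowv = FALSE, Colv = FALSE, dendrogram = "none", trace = "none",
          scale = "none", col = rev(redgreen(75)), key = TRUE, density.info = "none")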
I have a csv file with the following data:
Year 1000 Barrels/Day
1/15/2000 239
2/15/2000 267
3/15/2000 162
4/15/2000 264
5/15/2000 170
6/15/2000 210
7/15/2000 264
8/15/2000 405
9/15/2000 352
10/15/2000 337
I ran the following code to convert it to a time series format for processing.
library(xts)
library(forecast)
df<- read.csv("US-OIL.csv")
stocks <- xts(df[,-1], order.by=as.Date(df[,1], "%m/%d/%Y"))
ets(stocks)
But when I run the last line, I get an ETS(A,N,N) model as output.
I am not sure why this is happening, because when I run ets() on the preloaded elecequip dataset from the fpp package, I get an ETS(M,Ad,M) model.
I am not sure why there is this discrepancy. Please share your comments on this matter.
You are letting ets automatically choose a model based on AIC, AICc or BIC. The data in the elecequip dataset is different, so the chosen model is also different.
See slide 24:
http://robjhyndman.com/talks/RevolutionR/6-ETS.pdf
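One extra thing worth checking, as an aside (this is my assumption about the cause, not something the slides state): ets only considers seasonal components when the series has a seasonal frequency, and an xts object built from date-stamped monthly data may end up with frequency 1, in which case a model like ETS(M,Ad,M) is never even a candidate. A sketch of forcing a monthly frequency before fitting:
library(forecast)
# treat the column as a monthly series starting January 2000
stocks_ts <- ts(df[, 2], start = c(2000, 1), frequency = 12)
fit <- ets(stocks_ts)   # seasonal ETS models are now candidates
summary(fit)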
BACKGROUND
I have a list of 16 data frames. One data frame in the list looks like this, and all the others have a similar format. The DateTime column is of class Date, while the Value column is of a time series class.
> head(train_data[[1]])
DateTime Value
739 2009-07-31 49.9
740 2009-08-31 53.5
741 2009-09-30 54.4
742 2009-10-31 56.0
743 2009-11-30 54.4
744 2009-12-31 55.3
I am forecasting the Value column across all the data frames in this list. The following line of code prepares the data that is fed into the UCM model.
train_dataucm <- lapply(train_data, transform, Value = ifelse(Value > 50000 , Value/100000 , Value ))
The transform function is used to scale down large values because UCM has some issues rounding off large values (I don't know why, though). I learned that from user #KRC in this link.
One data frame was affected because it had large values, which got scaled down by the transformation. All the other data frames remained unaffected.
> head(train_data[[5]])
DateTime Value
715 2009-07-31 139901
716 2009-08-31 139492
717 2009-09-30 138818
718 2009-10-31 138432
719 2009-11-30 138659
720 2009-12-31 138013
I only found this out because I manually checked each of the data frames.
PROBLEM
Is there any function which can identify the data frames that were affected by the condition I inserted?
The function should list the data frames that were affected and put them into a list.
If I am able to do this, then I can apply the inverse transformation to the values and get back the actual values.
This way I can give the correct forecasts with minimal human intervention.
I hope I have specified the problem clearly.
Thank You.
Simply check whether any of the values in a data frame are too high:
has_too_high_values <- function(df) {
  any(df$Value > 50000)
}
And then collect them, e.g. using Filter:
Filter(has_too_high_values, train_data)
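To tie this back to undoing the transformation, a sketch (the /100000 factor is taken from the question; train_restored is just a name I am introducing) that flags the affected frames and scales their Value column back up:
# logical flag for each data frame in the original list
was_scaled <- vapply(train_data, has_too_high_values, logical(1))
# undo the /100000 scaling on those frames in the transformed list
train_restored <- train_dataucm
train_restored[was_scaled] <- lapply(train_restored[was_scaled],
                                     transform, Value = Value * 100000)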