I use SparkR to run a SQL query that returns a SparkR DataFrame, like the following:
data = sql(sql_query)
I can get the dimensions of data by using dim(data).
However, when I try to look at the data using head(data), it fails with this error:
java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.TimestampWritable cannot be cast to org.apache.hadoop.io.IntWritable
I tried the same query directly in Hive and it runs without any problem. The weird thing is that I can get the dimensions but cannot get the head.
Any idea?
Try View(head(data, 20)). Also, it would be helpful if you could give a bit more detail in your question.
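The error suggests a schema mismatch in the Hive table: a column declared as int actually holds timestamp data. dim() only reads metadata, while head() materializes rows, which is why only the latter fails. A minimal sketch of one workaround, casting the suspect column explicitly in the query (this assumes a SparkR session, and the table and column names my_table and ts_col are hypothetical):

```r
# Sketch only -- requires an active SparkR session.
# Casting the mismatched column in SQL forces a consistent type
# before head() materializes any rows.
data <- sql("SELECT CAST(ts_col AS timestamp) AS ts_col, other_col FROM my_table")
head(data, 20)
```

If the cast succeeds, the underlying fix is still to correct the column type in the Hive table definition.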
All I need to do for this simple assignment, which I have been able to do with no problem before, is take the mean of a column from a .csv dataset uploaded to R.
Here is the code I have:
library(readr)
X1888B_PSet03_Dataset1 <- read_csv("Downloads/188B_PSet03_Dataset1.csv")
View(X1888B_PSet03_Dataset1)
mean(FillerAcc)
and this is where I get the error message
object 'FillerAcc' not found
Meanwhile, there is literally a column in my data called FillerAcc.
Reference your column like this:
mean(X1888B_PSet03_Dataset1$FillerAcc)
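mean() cannot see a column by its bare name; the column only exists inside the data frame, so you have to qualify it with $ (or evaluate inside the frame with with()). A minimal base-R illustration with a toy data frame (the values are made up):

```r
# Toy data frame standing in for the imported CSV (values are made up).
dat <- data.frame(FillerAcc = c(0.8, 0.9, 1.0))

mean(dat$FillerAcc)         # 0.9 -- column referenced via $
with(dat, mean(FillerAcc))  # 0.9 -- same result, evaluated inside the data frame
```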
I have tried this, but it's giving me an error:
Datecreated<-c('Created Time')
I will get this data from the cloud using APIs. I need to define the format of this Created Time column as d-mon-yy; for example, 18-Nov-17.
How can I achieve this? I am new to R.
Any help would be appreciated.
Your screenshot shows that you assign the string 'Created Time' to the variable Datecreated. I'm not sure where or how you're getting your data, but I'm assuming that is your issue. Once you have your data, you can use akrun's answer of as.Date(column, "%d-%b-%y"):
MyDates <- c("17-Jan-16", "6-Feb-17")
FormattedDates <- as.Date(MyDates, "%d-%b-%y")
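as.Date() parses the strings into Date objects (stored internally in ISO order); format() renders them back in any layout you want. Note that %b matches abbreviated month names and is locale-dependent, and %d zero-pads single-digit days on output:

```r
# %d = day of month, %b = abbreviated month name (locale-dependent), %y = 2-digit year.
MyDates <- c("17-Jan-16", "6-Feb-17")
FormattedDates <- as.Date(MyDates, "%d-%b-%y")

FormattedDates                       # "2016-01-17" "2017-02-06"
format(FormattedDates, "%d-%b-%y")   # "17-Jan-16"  "06-Feb-17" (note the padded 06)
```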
This is my first post so I will try to be specific.
I have imported a few .csv files and I am trying to combine them.
When I am inspecting each individual data frame, as I import them, I can open them in RStudio View window and data looks correct.
However, once I combine the data frames using Master <- do.call("rbind", list(DF1, DF2, DF3, DF4)) and try to view the Master table, I get the following message:
Error in if (nchar(col_min_c) >= 16 || grepl("e", col_min_c, fixed =
TRUE) || : missing value where TRUE/FALSE needed
However, when I view all the original data frames I am able to see them with no problem.
If I use utils::View(Master) I am able to see the data frame.
So I am not sure where this issue comes from.
These are the packages I am loading:
require(data.table)
require(dplyr)
require(sqldf)
require(ggplot2)
require(stringr)
require(reshape2)
require(bit64)
Thanks for any help you can provide
I was able to get around this issue by transforming my table via:
Master<-sqldf("SELECT * FROM 'Master'")
So I hope this helps others in the case they come across a similar issue in the future.
I was also able to view the file if I removed NA values from a long numeric column (19 characters) on the far left-hand side of the table (column 1).
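For reference, the combining step itself is straightforward; the error above comes from RStudio's View() formatting code rather than from rbind, which is why utils::View() and the sqldf round-trip both work. A minimal base-R sketch of the combine (toy data frames standing in for the imported CSVs; given the bit64 package in the list and the 19-character column, an integer64 column containing NA is a plausible trigger for the View() failure, but that is an assumption):

```r
# Two toy data frames standing in for the imported CSVs (values are made up).
DF1 <- data.frame(id = c(1, 2), value = c("a", "b"))
DF2 <- data.frame(id = c(3, 4), value = c("c", "d"))

# rbind requires the frames to share column names and types.
Master <- do.call("rbind", list(DF1, DF2))
nrow(Master)   # 4
```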
I'm using the Alteryx R Tool to sign an Amazon HTTP request. To do so, I need the hmac function that is included in the digest package.
I'm using a text input tool that includes the key and a datestamp.
Key = "foo"
datestamp = "20120215"
Here's the issue. When I run the following script:
the.data <- read.Alteryx("1", mode="data.frame")
write.Alteryx(base64encode(hmac(the.data$key,the.data$datestamp,algo="sha256",raw = TRUE)),1)
I get an incorrect result when compared to when I run the following:
write.Alteryx(base64encode(hmac("foo","20120215",algo="sha256",raw = TRUE)),1)
The difference is that when I hardcode the values for the key and datestamp I get the correct result, but if I use the variables from the R data frame I get incorrect output.
Does the data frame alter the data in some way? Has anyone come across this when working with the R Tool in Alteryx?
Thanks for your input.
The issue appears to be that when creating the data frame, your character variables are converted to factors. The way to fix this with the data.frame constructor function is
the.data <- data.frame(Key="foo", datestamp="20120215", stringsAsFactors=FALSE)
I haven't used read.Alteryx but I assume it has a similar way of achieving this.
Alternatively, if your data frame has already been created, you can convert the factors back into character:
write.Alteryx(base64encode(hmac(
as.character(the.data$Key),
as.character(the.data$datestamp),
algo="sha256",raw = TRUE)),1)
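The factor problem can be seen with base R alone: a factor is stored as integer codes plus a levels table, so a function expecting a character string may hash or print something other than what you see. as.character() recovers the original string. (Note: since R 4.0, data.frame() defaults to stringsAsFactors = FALSE; forcing it to TRUE here reproduces the older behavior this answer describes.)

```r
# Reproduce the pre-R-4.0 default explicitly so the example is version-independent.
the.data <- data.frame(Key = "foo", datestamp = "20120215",
                       stringsAsFactors = TRUE)

is.factor(the.data$Key)           # TRUE  -- not a plain character string
as.character(the.data$Key)        # "foo" -- safe to pass to hmac()
as.character(the.data$datestamp)  # "20120215"
```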
I'm reading in data about an HTTP access log. I've got a file with columns for the ip address, year, month, day, hour and requested URL. I read the file in like this:
ipdata = scan(file="sample_r.log", what=list(ip="", year=0, month=0, day=0, hour=0, verb="", url=""))
This seems to work. RStudio says that ipdata is a list[7], and names(ipdata) returns
[1] "ip" "year" "month" "day" "hour" "verb" "url"
So that seems cool. I wanted to do something fun, like graph some data for a specific hour. I tried doing a subset:
s <- subset(ipdata, ipdata$hour==3)
This data looks remarkably different from the first data frame. s is a list[297275], and the following doesn't work right:
> table(ipdata$verb)
GET POST
2870709 1596748
> table(s$verb)
character(0)
Am I going about this the correct way? What I typically do is wrap my data frame in a table() and then barplot or dotplot it. Is R a good way to do this? I want to say "Show me all of the top URLs in hour 3", for example. Or "How many times did this IP address show up per hour?"
Update: It looks like by using read.table instead of scan I was able to get a data frame. Apparently scan returns a list of lists or something? Definitely confusing to a n00b like myself, but I'm feeling good about it now.
If you ran
dat <- as.data.frame(ipdata)
str(dat)
.... you would probably see that it was pretty much the same as the results of your read.table() operation. read.table is a wrapper for scan and does a lot of formatting and consistency checking.
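To make the difference concrete: scan() with what = list(...) returns a plain list of parallel vectors, and subset() on a bare list does not filter row-wise the way it does on a data frame. Converting with as.data.frame() keeps the fields aligned as rows. A tiny self-contained stand-in for the access log (three made-up records, read via scan's text argument):

```r
# A tiny stand-in for the access log: ip, hour, verb per record.
ipdata <- scan(text = "1.2.3.4 3 GET
5.6.7.8 4 POST
9.9.9.9 3 GET",
               what = list(ip = "", hour = 0, verb = ""), quiet = TRUE)

class(ipdata)                  # "list" -- not a data frame
dat <- as.data.frame(ipdata)   # now the fields line up as rows
s <- subset(dat, hour == 3)    # row-wise filtering works as expected
nrow(s)                        # 2
table(s$verb)                  # counts verbs for hour 3 only
```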