Using data.tree package in R - r

I am trying to adapt the code found here to import employee data from an Excel file into a data frame and then use the as.Node function from the data.tree package.
This is the code I have written so far:
library(data.tree)
library(readxl)
baseframe <- read_excel("Test Emplist.xlsx")
baseframe$pathstring <- paste("CompanyName",
baseframe$LastName,
baseframe$FirstName,
sep = "/")
stafflist <- as.Node(baseframe)
The data frame is created successfully. Below is its dput representation:
> dput(head(baseframe))
structure(list(LastName = c("Vasa", "Vasa", "Pras", "Tang", "Sing",
"Vats"), FirstName = c("Evan", "Koma", "Shil", "Hand", "Smri",
"Saur"), pathstring = c("CompanyName/Vasa/Evan", "CompanyName/Vasa/Koma",
"CompanyName/Pras/Shil", "CompanyName/Tang/Hand", "CompanyName/Sing/Smri",
"CompanyName/Vats/Saur")), .Names = c("LastName", "FirstName",
"pathstring"), row.names = c(NA, 6L), class = c("tbl_df", "tbl",
"data.frame"))
But when I get to the line stafflist <- as.Node(baseframe), I get this error message:
Error in strsplit(mypath, pathDelimiter, fixed = TRUE) :
non-character argument
I'm guessing the as.Node function calls strsplit somewhere internally. I have tried running strsplit myself like so:
strsplit(baseframe$pathstring, "/", fixed = TRUE)
which runs without a problem. Why is as.Node throwing the error?
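One likely cause, sketched here under the assumption that as.Node on a data frame looks for a column named pathString by default (via its pathName argument): the column above is called pathstring, with a lowercase "s". Column names in R are case-sensitive, so the lookup would return NULL, and strsplit rejects NULL as a non-character argument. The rows below are a toy stand-in for the Excel file, and the data.tree call is guarded so the snippet also runs where the package is not installed.

```r
# Toy stand-in for the Excel data (hypothetical rows, not the real file).
baseframe <- data.frame(
  LastName  = c("Vasa", "Vasa", "Pras"),
  FirstName = c("Evan", "Koma", "Shil"),
  stringsAsFactors = FALSE
)

# Note the capital "S": this is the name as.Node is assumed to look for.
baseframe$pathString <- paste("CompanyName",
                              baseframe$LastName,
                              baseframe$FirstName,
                              sep = "/")

# An exact lookup of the lowercase name finds nothing and returns NULL.
is.null(baseframe[["pathstring"]])

# Build the tree only if data.tree is available.
if (requireNamespace("data.tree", quietly = TRUE)) {
  stafflist <- data.tree::as.Node(baseframe)
  print(stafflist)
}
```

If you prefer to keep the lowercase column name, passing pathName = "pathstring" to as.Node should have the same effect, assuming your installed version exposes that argument.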

Related

Error in if (class(x) == "numeric") { : the condition has length > 1

I'm trying to visualise data of the following form:
date volaEUROSTOXX volaSA volaKENYA25 volaNAM volaNIGERIA
1 10feb2012 0.29844454 0.1675901 0.007862087 0.12084170 0.10247617
2 17feb2012 0.31811157 0.2260064 0.157017220 0.33648935 0.22584127
3 24feb2012 0.30013672 0.1039974 0.083863921 0.11694768 0.16388161
To do so, I first converted the date (stored as a character in the original data frame) into Date format, which works just fine:
vola$date <- as.Date(vola$date)
str(vola$date)
Date[1:543], format: "2012-02-10" "2012-02-17" "2012-02-24" "2012-03-02" "2012-03-09"
However, if I now try to graph my data by using the chart.TimeSeries command, I get the following:
chart.TimeSeries(volatility_annul_stringdate,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
Error in if (class(x) == "numeric") { : the condition has length > 1
I tried:
Converting my date variable (already in Date format) further into a time series object:
vola$date <- ts(vola$date, frequency=52, start=c(2012,9)) #returned the same error as above
Converting the whole data set using the xts command:
vol.xts <- xts(vola, order.by= vola$date, unique = TRUE ) # which then returned:
order.by requires an appropriate time-based object
#even though date is a time series
What am I doing wrong? I am rather new to RStudio.. I really want to use the chart.TimeSeries command. Can someone help me?
Thanks in advance!
My MRE:
library(PerformanceAnalytics)
vola <- structure(list(date_2 = c("2012-02-10", "2012-02-17", "2012-02-24",
"2012-03-02"), volaEUROSTOXX = c(0.298444539308548, 0.318111568689346,
0.300136715173721, 0.299697518348694), volaKENYA25 = c(0.00786208733916283,
0.157017216086388, 0.0838639214634895, 0.152377054095268), volaNAM = c(0.120841704308987,
0.336489349603653, 0.116947680711746, 0.157027021050453), volaNIGERIA = c(0.102476172149181,
0.225841268897057, 0.163881614804268, 0.317349642515182), volaSA = c(0.167590111494064,
0.226006388664246, 0.103997424244881, 0.193037077784538), date = structure(c(1328832000,
1329436800, 1330041600, 1330646400), tzone = "UTC", class = c("POSIXct",
"POSIXt"))), row.names = c(NA, -4L), class = c("tbl_df", "tbl",
"data.frame"))
vola <- subset(vola, select = -c(date))
vola$date_2 <- as.Date(vola$date_2)
chart.TimeSeries(vola,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
#This returns the above mentioned error message.
#Thus, I tried the following:
vola$date_2 <- ts(vola$date_2, frequency=52, start=c(2012,9))
chart.TimeSeries(vola,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
main="Log Volatility",lty=1,
legend.loc="topright")
#Which returned a different error (as described above)
#And I tried:
vol.xts <- xts(vola, order.by= vola$date_2, unique = TRUE )
#This also returned an error message.
#My intention was to then run:
#chart.TimeSeries(vol.xts,lwd=2,auto.grid=F,ylab="Annualized Log Volatility",xlab="Time",
#main="Log Volatility",lty=1,
#legend.loc="topright")
The documentation of PerformanceAnalytics::chart.TimeSeries is a bit vague. The issue is that when you pass a data frame, the dates have to be set as row.names. To this end I first convert your data (which is a tibble) to a data.frame. Afterwards I add the dates as row names and drop the date columns:
library(PerformanceAnalytics)
vola <- as.data.frame(vola)
vola <- subset(vola, select = -c(date))
row.names(vola) <- as.Date(vola$date_2)
vola$date_2 <- NULL
chart.TimeSeries(vola,
lwd = 2, auto.grid = F, ylab = "Annualized Log Volatility", xlab = "Time",
main = "Log Volatility", lty = 1,
legend.loc = "topright"
)
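If you would rather go through xts, as attempted in the question, here is a minimal sketch. It assumes the xts attempt failed because the date column had already been turned into a ts object (which is not a time-based index class) and because the date column was still part of the data passed in. The numbers are a shortened, hypothetical version of the data, and the xts call is guarded in case the package is missing.

```r
# Hypothetical mini data set shaped like the one in the question.
vola <- data.frame(
  date_2        = c("2012-02-10", "2012-02-17", "2012-02-24"),
  volaEUROSTOXX = c(0.298, 0.318, 0.300),
  volaSA        = c(0.168, 0.226, 0.104),
  stringsAsFactors = FALSE
)

# Keep the dates as a plain Date vector; do not wrap them in ts().
dates <- as.Date(vola$date_2)

# Pass only the numeric columns as the data part.
values <- as.matrix(vola[, setdiff(names(vola), "date_2")])

if (requireNamespace("xts", quietly = TRUE)) {
  vol.xts <- xts::xts(values, order.by = dates)
  print(vol.xts)
  # vol.xts can then be handed to chart.TimeSeries() in place of the data frame.
}
```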

Class and type of object is different in R. How should I make it consistent?

I downloaded some tweets using 'rtweet' library. Its search_tweets() function creates a list (type) object, while its class is "tbl_df" "tbl" "data.frame". To further work on it, I need to convert this search_tweets() output into a dataframe.
comments <- search_tweets(
queryString, include_rts = FALSE,
n = 18000, type = "recent",
retryonratelimit = FALSE)
typeof(comments)
list
class(comments)
"tbl_df" "tbl" "data.frame"
I tried to convert the list into a data frame using as.data.frame(), but that didn't change the type. I also tried as.data.frame(matrix(unlist(comments))), which didn't change the type either.
commentData <- data.frame(comments[,1])
for (column in c(2:ncol(comments))){
commentData <- cbind(commentData, comments[,column])
}
typeof(comments)
output : list
comments <- as.data.frame(comments)
output : list
Neither approach changed the type, only the class. How can I change the type? I'd like to store these tweets in a data frame and then write them to a CSV file (write_csv).
When I write 'comments' to CSV, it throws an error:
write_csv(comments, "comments.csv", append = TRUE)
Error: Error in stream_delim_(df, path, ..., bom = bom, quote_escape = quote_escape) :
Don't know how to handle vector of type list.
dput(comments)
structure(list(user_id = c("1213537010930970624", "770697053538091008",
"39194086", "887369171603931137", "924786826870587392", "110154561",
"110154561", "1110623370389782528", "1201410499788689408", "1208038347735805953",
"15608380", "54892886", "389914405", "432597210", "1196039261125918720"
), status_id = c("1217424480366026753", "1217197024405143552",
"1217057752918392832", "1217022975108616193", "1217002616757997568",
"1216987196714094592", "1216986705170923520", "1216978052472688640",
"1216947780129710080", "1216943924796739585", "1216925375789330432",
"1216925016605880320", "1216924608944734208", "1216921598294249472",
"1214991714688987136"), created_at = structure(c(1579091589,
1579037359, 1579004154, 1578995863, 1578991009, 1578987332, 1578987215,
1578985152, 1578977935, 1578977016, 1578972593, 1578972507, 1578972410,
1578971693, 1578511572), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
screen_name = c("SufferMario", "_Mohammadtausif", "avi_rules16",
"Deb05810220", "SriPappumaharaj", "Poison435", "Poison435",
"RajeshK38457619", "KK77979342", "beingskysharma", "tetisheri",
"sohinichat", "nehadixit123", "panwarsudhir1", "NisarMewati1"
),
desired output in csv
You don't need to do anything. comments is already a data.frame. It just happens to be a special type of data.frame known as a tibble. But you can use them interchangeably. What do you want to do with comments that you currently cannot? It already should do anything a data.frame can do.
The output from typeof() is rarely helpful as it only shows you how the object is stored, not what it is. Use class() to understand how an object behaves. Nearly all "complex" objects in R are stored as lists.
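As for the write_csv error itself, a likely explanation is that some columns of the tibble are list-columns (one vector per row), which a flat CSV cannot hold. The sketch below uses a made-up two-row data frame with a hypothetical hashtags column to show the typeof/class distinction and one way to collapse list-columns into plain strings before writing. It uses base R's write.csv to stay self-contained; the flattened data frame should work with write_csv just as well.

```r
# Hypothetical data frame with one list-column, mimicking tweet data.
comments <- data.frame(user_id = c("1", "2"), stringsAsFactors = FALSE)
comments$hashtags <- list(c("rstats", "data"), character(0))

typeof(comments)  # "list": how the object is stored
class(comments)   # "data.frame": how the object behaves

# Find the columns that are themselves lists.
is_list_col <- vapply(comments, is.list, logical(1))

# Collapse every list element into one space-separated string.
flat <- comments
flat[is_list_col] <- lapply(flat[is_list_col], function(col) {
  vapply(col, function(x) paste(x, collapse = " "), character(1))
})

# Every column is now atomic, so it can be written as CSV.
out_file <- file.path(tempdir(), "comments.csv")
write.csv(flat, out_file, row.names = FALSE)
```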

Use of purrr's "modify_if" with a function

I'm trying to apply the discretize_rgr function (here) of the package funModeling to multiple columns of a dataframe.
For a single column, it is working for me in this way:
discretize_rgr(input = df.div$to_be_discretized, target = df.div$TARGET, max_n_bins=10)
So, I'm trying to use the purrr package to manage multiple columns in this way:
df.div %>%
modify_if( is.numeric, ~ discretize_rgr(., target = df.div$TARGET, max_n_bins=10))
but I get the following error:
Error in order(fpoints_top) : argument 1 is not a vector
What's wrong?
UPDATE (example data)
structure(list(to_be_discretized = c(0.0152096300012854, 0.0132660373578711,
0.014699121782711, 0.0157102877064037, 0.0197417484744586, 0.019651999420645
), TARGET = c(27136, 30048, 34840, 138812, 191088, 240370)), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

get rows from dataframe matching element of a list

Here are a data frame/tibble and a character vector (the vector is one column of another tibble):
df1 <- structure(list(Twitter_name = c("CHESHIREKlD", "JellyComons",
"kirmiziburunlu", "erkekdeyimleri", "herosFrance", "IkishanShah"
), Declared_followers = c(60500L, 43100L, 31617L, 27852L, 26312L,
16021L), Real_followers = c(60241, 43054, 31073, 27853, 25736,
15856), Twitter_Id = c("783866366", "1424086592", "2367932244",
"3352977681", "2580703352", "521094407")), .Names = c("Twitter_name",
"Declared_followers", "Real_followers", "Twitter_Id"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
myId <- c("867211097882804224", "868806957133688832", "549124465","822580282452754432",
"109344546", "482666188", "61716107", "3642392237", "595318933",
"833365943044628480", "1045015087", "859830740669800448", "860562940059045888",
"2854457294", "871784135983067136", "866922354554814464", "4839343547",
"849451474572759040", "872084673526214656", "794841530053853184")
N.B.: df1 has been shortened; it actually has 128 observations.
I want to test every element of df1$Twitter_Id and see whether it is in myId. I can run this:
> match(myId[1], df1$Twitter_Id)
but:
it stops at the first occurrence
I need to apply the match() function to all elements of myId.
I can't find a clean and simple way to do this using lapply() or other functions from dplyr or the tidyverse packages.
Thank you for your help.
EDIT I need to be more explicit with the whole real case.
myTw <- structure(list(id_str = c("893445199661330433", "893116842558050304",
"892739336466305024", "892401780105019393", "892401594272296963",
"892365572486430720", "891964139756818432")), .Names = "id_str", row.names = c(NA,
-7L), class = c("tbl_df", "tbl", "data.frame"))
These are tweet IDs. What I am looking for is which Twitter users have retweeted them. To do this, I use the retweeters() function from the twitteR package.
library(twitteR)
MyRtw <- retweeters(myTw[1])
MyRtw <- c("889135428028084224", "867211097882804224", "868806957133688832",
"549124465", "822580282452754432", "109344546", "482666188",
"61716107", "3642392237", "595318933", "833365943044628480",
"1045015087", "859830740669800448", "860562940059045888", "2854457294",
"871784135983067136", "866922354554814464", "4839343547", "849451474572759040",
"872084673526214656")
This is a vector of Twitter user IDs.
Finally, I want to see which users from df1$Twitter_Id have retweeted myTw[1].
You can use the '%in%' operator.
Edit: Probably this is what you want. Here I used the data posted in your original post (before editing).
# For each ID in df1, count how many times it appears in myId
matchVector <- NULL
for (id in df1$Twitter_Id) {
  matchCounter <- sum(myId %in% id)
  matchVector <- c(matchVector, matchCounter)
}
df1$numberOfMatches <- matchVector
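Since %in% is vectorised, the loop is not strictly needed if all you want is the matching rows. A small sketch with made-up IDs (the names and values below are hypothetical, not taken from the question):

```r
# Hypothetical users and retweeter IDs for illustration.
df1 <- data.frame(
  Twitter_name = c("user_a", "user_b", "user_c"),
  Twitter_Id   = c("111", "222", "333"),
  stringsAsFactors = FALSE
)
myId <- c("333", "999", "111")

# One logical value per row of df1: is this ID among the retweeters?
is_match <- df1$Twitter_Id %in% myId

# Keep only the rows whose ID was found.
matched <- df1[is_match, ]
print(matched)
```

Unlike match(myId[1], ...), this checks every row of df1 against every element of myId in a single call.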

what is the "class" parameter in structure()?

I am trying to use the structure() function to create a data frame in R.
I saw something like this
structure(mydataframe, class="data.frame")
Where did class come from? I saw someone using it, but it is not listed in the R documentation.
Is this something programmers learned in another language and carried over? It works, but I am very confused.
Edit: I realized dput(), is what actually created a data frame looking like this. I got it figured out, cheers!
You probably saw someone using dput. dput is used to post (usually short) data. But normally you would not create a data frame like that. You would normally create it with the data.frame function. See below
> example_df <- data.frame(x=rnorm(3),y=rnorm(3))
> example_df
x y
1 0.2411880 0.6660809
2 -0.5222567 -0.2512656
3 0.3824853 -1.8420050
> dput(example_df)
structure(list(x = c(0.241188014013708, -0.522256746461544, 0.382485333260912
), y = c(0.666080872170054, -0.251265630627216, -1.84200501106852
)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame")
Then, if someone wants to "copy" your data.frame, he just has to run the following:
> copied_df <- structure(list(x = c(0.241188014013708, -0.522256746461544, 0.382485333260912
+ ), y = c(0.666080872170054, -0.251265630627216, -1.84200501106852
+ )), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame")
I put "copy" in quotes because note the following:
> identical(example_df,copied_df)
[1] FALSE
> all.equal(example_df,copied_df)
[1] TRUE
identical yields FALSE because dput prints numbers with a limited number of significant digits (15 by default), so the values in the posted output are slightly rounded.
'class' is not a specific argument to the structure function - that's why you didn't find it in the help file.
structure takes an object and then any number of name/value pairs and sets them as attributes on the object. In this case, class was such an attribute. You can try this to add fictional 'foo' and 'bar' attributes to a vector:
x <- structure(1:3, foo=42, bar='hello')
attributes(x)
#$foo
#[1] 42
#
#$bar
#[1] "hello"
And as Joshua Ulrich and Xu Wang mentioned, you should not create a data.frame like that.
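To make the link to data frames concrete, here is a small sketch (the values are made up) showing that a data frame is just a list of columns carrying a class attribute and a row.names attribute, which is exactly what the dput output spells out:

```r
# A plain list of equal-length columns.
cols <- list(x = 1:3, y = c(1.5, 2.5, 3.5))

# Attach the attributes that dput() would print for a data frame.
built <- structure(cols, row.names = c(NA, -3L), class = "data.frame")

# The usual way to create the same thing.
usual <- data.frame(x = 1:3, y = c(1.5, 2.5, 3.5))

is.data.frame(built)
all.equal(built, usual)

# structure() is shorthand for setting attributes one at a time.
plain <- 1:3
attr(plain, "foo") <- 42
identical(plain, structure(1:3, foo = 42))
```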
I'm scratching my head, wondering what "R Document" would not have said something about "class". It's a very basic component of the language and of how functions get applied. You should type this and read:
?class
?methods
