what is the "class" parameter in structure()? - r

I am trying to use the structure() function to create a data frame in R.
I saw something like this
structure(mydataframe, class="data.frame")
Where did class come from? I saw someone using it, but it is not listed in the R documentation for structure().
Is this something programmers learned in another language and carried over? It works, but I am very confused.
Edit: I realized dput(), is what actually created a data frame looking like this. I got it figured out, cheers!

You probably saw someone using dput. dput is used to post (usually short) data. But normally you would not create a data frame like that. You would normally create it with the data.frame function. See below
> example_df <- data.frame(x=rnorm(3),y=rnorm(3))
> example_df
x y
1 0.2411880 0.6660809
2 -0.5222567 -0.2512656
3 0.3824853 -1.8420050
> dput(example_df)
structure(list(x = c(0.241188014013708, -0.522256746461544, 0.382485333260912
), y = c(0.666080872170054, -0.251265630627216, -1.84200501106852
)), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame")
Then, if someone wants to "copy" your data.frame, they just have to run the following:
> copied_df <- structure(list(x = c(0.241188014013708, -0.522256746461544, 0.382485333260912
+ ), y = c(0.666080872170054, -0.251265630627216, -1.84200501106852
+ )), .Names = c("x", "y"), row.names = c(NA, -3L), class = "data.frame")
I put "copy" in quotes because note the following:
> identical(example_df,copied_df)
[1] FALSE
> all.equal(example_df,copied_df)
[1] TRUE
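identical yields FALSE because dput prints numbers to about 15 significant digits, so the values you get back from the posted output are slightly rounded. all.equal allows a small numerical tolerance, which is why it still returns TRUE.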
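To see the rounding in isolation, here is a minimal sketch that round-trips a single number through deparse(), which produces the same text dput() prints. The value 1/3 and the variable names are illustrative choices, not part of the example above.

```r
x <- 1 / 3
txt <- deparse(x)                # text with about 15 significant digits
y <- eval(parse(text = txt))     # what a reader gets after copy-pasting

identical(x, y)                  # FALSE: the last few bits were lost
isTRUE(all.equal(x, y))          # TRUE: equal within numerical tolerance

# Asking for 17 significant digits is enough to recover a double exactly
z <- eval(parse(text = deparse(x, control = "digits17")))
identical(x, z)                  # TRUE
```

For sharing data in a question this rarely matters, since all.equal is usually the comparison you care about.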

'class' is not a specific argument to the structure function - that's why you didn't find it in the help file.
structure takes an object and then any number of name/value pairs and sets them as attributes on the object. In this case, class was such an attribute. You can try this to add fictional 'foo' and 'bar' attributes to a vector:
x <- structure(1:3, foo=42, bar='hello')
attributes(x)
#$foo
#[1] 42
#
#$bar
#[1] "hello"
And as Joshua Ulrich and Xu Wang mentioned, you should not create a data.frame like that.
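As a small illustration of why the dput output works at all, the sketch below builds a data frame by hand from a list plus the row.names and class attributes, and compares it with the usual data.frame() call. The column values are made up for the example.

```r
# A list plus a couple of attributes is all a data frame is
by_hand <- structure(list(x = c(1.5, 2.5), y = c("a", "b")),
                     row.names = c(NA, -2L),
                     class = "data.frame")

# The normal way to create the same object
usual <- data.frame(x = c(1.5, 2.5), y = c("a", "b"),
                    stringsAsFactors = FALSE)

is.data.frame(by_hand)       # the class attribute is what makes this TRUE
attr(by_hand, "class")       # class is stored like any other attribute
identical(by_hand, usual)    # compare the two constructions
```

This is only meant to show what structure() is doing; data.frame() remains the right tool for everyday use.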

I'm scratching my head, wondering what "R Document" would not have said something about "class". It's a very basic component of the language and how functions get applied. You should type this and read:
?class
?methods

Related

Rolling Sample standard deviation in R

I wanted to get the standard deviation of the 3 previous row of the data, the present row and the 3 rows after.
This is my attempt:
mutate(ming_STDDEV_SAMP = zoo::rollapply(ming_f, list(c(-3:3)), sd, fill = 0)) %>%
Result
ming_f       ming_STDDEV_SAMP
4.235279667  0.222740262
4.265353     0.463348209
4.350810667  0.442607461
3.864739333  0.375839159
3.935632333  0.213821765
3.802632333  0.243294783
3.718387667  0.051625808
4.288542333  0.242010836
4.134689     0.198929941
3.799883667  0.112733475
This is what I expected:
ming_f       ming_STDDEV_SAMP
4.235279667  0.225532646
4.265353     0.212776157
4.350810667  0.23658801
3.864739333  0.253399417
3.935632333  0.26144862
3.802632333  0.246259684
3.718387667  0.20514358
4.288542333  0.208578409
4.134689     0.208615874
3.799883667  0.233948429
It doesn't match your output exactly, but perhaps this is what you need:
zoo::rollapply(quux$ming_f, 7, FUN=sd, partial=TRUE)
(It also works replacing 7 with list(-3:3).)
This expression isn't really different from your sample code, but the output is correct. Perhaps your original frame has a group_by still applied?
Data
quux <- structure(list(ming_f = c(4.235279667, 4.265353, 4.350810667, 3.864739333, 3.935632333, 3.802632333, 3.718387667, 4.288542333, 4.134689, 3.799883667), ming_STDDEV_SAMP = c(0.225532646, 0.212776157, 0.23658801, 0.253399417, 0.26144862, 0.246259684, 0.20514358, 0.208578409, 0.208615874, 0.233948429)), class = "data.frame", row.names = c(NA, -10L))
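For readers without zoo installed, here is a rough base R sketch of what a centered window of 7 with partial=TRUE computes: the window simply shrinks at the edges of the vector. The helper name roll_sd_partial and its half_width argument are hypothetical, written only for this illustration.

```r
ming_f <- c(4.235279667, 4.265353, 4.350810667, 3.864739333, 3.935632333,
            3.802632333, 3.718387667, 4.288542333, 4.134689, 3.799883667)

# Hypothetical helper: centered rolling sd whose window shrinks at the edges
roll_sd_partial <- function(x, half_width = 3) {
  n <- length(x)
  vapply(seq_len(n), function(i) {
    # Clamp the window to the valid index range
    window <- x[max(1, i - half_width):min(n, i + half_width)]
    sd(window)
  }, numeric(1))
}

res <- roll_sd_partial(ming_f)
res
```

The first value uses rows 1 to 4 only, the fourth value is the first one that sees a full 7-row window.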

Labelling Variables

I have a series of variables that fall under one related question: let's say there are 20 such variables in my dataframe, each one corresponding to an option on a multiple-choice question. They are named popn1, popn2, ..., popn20.
I want to label each variable by its option, as an example: (popn1 = Everyone; popn2=Children)
I'm using the labelVector package.
Is there a way I can do it without writing out each variable name? Ex. is there a paste function I can use, such as
df2 <- Set_label(df1,
(paste0(popn, 1:20) = "Everyone", "Children", .... "Youth"?)
This can be done in base R quite easily. Here's some sample data (using 5 columns instead of 20, to make it easier to view)
popn1 popn2 popn3 popn4 popn5
1 -0.4085141 3.240716 2.730837 6.428722 8.015210
2 3.1378943 2.512700 2.021546 3.333371 5.654401
3 2.4073278 1.475619 2.449742 2.817447 6.295569
It looks like you already have your new column names in a character vector:
your_column_names <- c("Everyone", "Youth", "Someone", "Something", "Somewhere")
Then you just use the setNames function on the column names for your data:
colnames(data) <- setNames(your_column_names, colnames(data))
Everyone Youth Someone Something Somewhere
1 -0.4085141 3.240716 2.730837 6.428722 8.015210
2 3.1378943 2.512700 2.021546 3.333371 5.654401
3 2.4073278 1.475619 2.449742 2.817447 6.295569
Sample Data:
data <- structure(list(popn1 = c(-0.408514139489243, 3.13789432899688,
2.40732780606037), popn2 = c(3.24071608151551, 2.51269963339946,
1.47561933493116), popn3 = c(2.73083728435832, 2.02154567048998,
2.44974180329751), popn4 = c(6.42872215439841, 3.3333709733048,
2.81744655980154), popn5 = c(8.0152099281755, 5.65440141443164,
6.29556905855252)), class = "data.frame", row.names = c(NA, -3L
))
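If the goal is to keep the popn1...popn20 column names and attach a descriptive label to each variable instead of renaming it, a base R sketch along these lines may help. It stores each label in a "label" attribute, which is a common convention; whether labelVector reads that attribute in exactly the same way is an assumption to check against its documentation. The data frame and the three labels are made up for the example.

```r
# Hypothetical stand-in for df1, with 3 options instead of 20
df1 <- data.frame(popn1 = c(1, 0), popn2 = c(0, 1), popn3 = c(1, 1))
option_labels <- c("Everyone", "Children", "Youth")

# Build the column names programmatically instead of typing them out
vars <- paste0("popn", seq_along(option_labels))

# Attach each label as a "label" attribute on the matching column
for (i in seq_along(vars)) {
  attr(df1[[vars[i]]], "label") <- option_labels[i]
}

# Look up the labels again, one per column
sapply(df1, attr, "label")
```

The column names stay untouched, so existing code that refers to popn1 and friends keeps working.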

Class and type of object is different in R. How should I make it consistent?

I downloaded some tweets using 'rtweet' library. Its search_tweets() function creates a list (type) object, while its class is "tbl_df" "tbl" "data.frame". To further work on it, I need to convert this search_tweets() output into a dataframe.
comments <- search_tweets(
queryString, include_rts = FALSE,
n = 18000, type = "recent",
retryonratelimit = FALSE)
typeof(comments)
list
class(comments)
"tbl_df" "tbl" "data.frame"
I tried to convert the list into a data frame with as.data.frame(), but that didn't change the type. I also tried as.data.frame(matrix(unlist(comments))), which didn't change the type either.
commentData <- data.frame(comments[,1])
for (column in c(2:ncol(comments))){
commentData <- cbind(commentData, comments[,column])
}
typeof(comments)
output : list
comments <- as.data.frame(comments)
output : list
Neither of these snippets changed the type, only the class. How should I change the type? I'd like to store these tweets in a data frame and then write them to a CSV file (write_csv).
As I write the 'comments' to csv, it throws an error.
write_csv(comments, "comments.csv", append = TRUE)
Error: Error in stream_delim_(df, path, ..., bom = bom, quote_escape = quote_escape) :
Don't know how to handle vector of type list.
dput(comments)
structure(list(user_id = c("1213537010930970624", "770697053538091008",
"39194086", "887369171603931137", "924786826870587392", "110154561",
"110154561", "1110623370389782528", "1201410499788689408", "1208038347735805953",
"15608380", "54892886", "389914405", "432597210", "1196039261125918720"
), status_id = c("1217424480366026753", "1217197024405143552",
"1217057752918392832", "1217022975108616193", "1217002616757997568",
"1216987196714094592", "1216986705170923520", "1216978052472688640",
"1216947780129710080", "1216943924796739585", "1216925375789330432",
"1216925016605880320", "1216924608944734208", "1216921598294249472",
"1214991714688987136"), created_at = structure(c(1579091589,
1579037359, 1579004154, 1578995863, 1578991009, 1578987332, 1578987215,
1578985152, 1578977935, 1578977016, 1578972593, 1578972507, 1578972410,
1578971693, 1578511572), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
screen_name = c("SufferMario", "_Mohammadtausif", "avi_rules16",
"Deb05810220", "SriPappumaharaj", "Poison435", "Poison435",
"RajeshK38457619", "KK77979342", "beingskysharma", "tetisheri",
"sohinichat", "nehadixit123", "panwarsudhir1", "NisarMewati1"
),
desired output in csv
You don't need to do anything. comments is already a data.frame. It just happens to be a special type of data.frame known as a tibble. But you can use them interchangeably. What do you want to do with comments that you currently cannot? It already should do anything a data.frame can do.
The output from typeof() is rarely helpful as it only shows you how the object is stored, not what it is. Use class() to understand how an object behaves. Nearly all "complex" objects in R are stored as lists.
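A short sketch of the typeof/class distinction on an ordinary data frame follows. It also shows one plausible reason for the write_csv error: the message mentions a vector of type list, which is what I would expect if one of the columns is itself a list (a list-column). That is an assumption, since the dput output above is truncated; the tags column here is hypothetical.

```r
df <- data.frame(id = 1:2, text = c("first", "second"),
                 stringsAsFactors = FALSE)
typeof(df)   # "list": how the object is stored
class(df)    # "data.frame": how the object behaves

# Hypothetical list-column, e.g. several tags per row
df$tags <- list(c("x", "y"), "z")

# Find list-columns and collapse each cell into a single string
is_list_col <- vapply(df, is.list, logical(1))
flat <- df
flat[is_list_col] <- lapply(flat[is_list_col], function(col) {
  vapply(col, paste, character(1), collapse = " ")
})

vapply(flat, is.list, logical(1))   # no list-columns remain
```

After flattening, every column is an atomic vector, which is the shape a CSV writer can handle.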

posix time comparison in r not behaving the same in for loop and apply function

Hello, I am having an interesting issue with R.
When I do:
touchtimepairs = structure(list(v..length.v.. = structure(c(1543323677.254, 1543323678.137, 1543323679.181, 1543323679.918, 1543323680.729, 1543323681.803, 1543323682.523, 1543323682.977,1543323683.519, 1543323684.454), class = c("POSIXct", "POSIXt"), tzone = "CEST"),v.2.length.v.. = structure(c(1543323678.137, 1543323679.181, 1543323679.918, 1543323680.729, 1543323681.803, 1543323682.523, 1543323682.977, 1543323683.519, 1543323684.454, 1543323690.793), class = c("POSIXct", "POSIXt"), tzone = "CEST")), .Names = c("v..length.v..", "v.2.length.v.."), row.names = c(NA, 10L), class = "data.frame")
data = data.frame(a = seq(1,10), b = seq(21,30), posixtime = touchtimepairs[,1])
for(x in seq(nrow(touchtimepairs))){
a = data[data$posixtime < touchtimepairs[x,2],]
}
it works without a problem and I get results back, but when I try to use apply
a = apply(touchtimepairs, 1,
function(x) data[data$posixtime < x[2],])
it does not work anymore: I get an empty data frame. The same happens with the subset() command.
Interestingly, when I use > instead of <, it works!
a = apply(touchtimepairs, 1,
function(x) data[data$posixtime > x[2],])
Then there is another issue:
apply in the case of the > comparison gives another result than the for loop
1951 lines with apply and
1897 with the for loop
Can anyone reproduce this behavior?
The POSIX times also include milliseconds, if that is of any interest.
Many thanks
If you look at your data inside the apply anonymous function, you'll see the symptom that is causing your trouble.
apply(touchtimepairs, 1, class)
# 1 2 3 4 5 6 7 8 9 10
# "character" "character" "character" "character" "character" "character" "character" "character" "character" "character"
(It should be returning a 2-row matrix with POSIXct and POSIXt.) I should also note that I kept getting warnings about unknown timezone 'CEST'. I fixed it temporarily with attr(touchtimepairs[[1]], "tzone") <- "UTC", though that's just a kludge to stop the warnings on my console. It doesn't fix the problem and might just be my system. :-)
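The coercion can be reproduced on a tiny data frame. apply() converts its input with as.matrix() first, and a matrix can hold only one type, so the POSIXct columns end up as character strings. The data frame tt below is a made-up two-row stand-in for touchtimepairs.

```r
# Made-up stand-in for touchtimepairs
tt <- data.frame(
  start = as.POSIXct(c(0, 10), origin = "1970-01-01", tz = "UTC"),
  end   = as.POSIXct(c(10, 20), origin = "1970-01-01", tz = "UTC")
)

sapply(tt, class)      # both columns are POSIXct / POSIXt

m <- as.matrix(tt)     # the conversion apply() performs internally
typeof(m)              # "character"

apply(tt, 1, class)    # "character" for every row

class(tt[1, 2])        # plain indexing keeps the POSIXct class
```

This is why the for loop, which indexes the data frame directly, behaves differently from apply.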
Depending on whether you need one or both columns of touchtimepairs, you have a few options:
If you really only need one column of touchtimepairs at a time, then lapply will work:
lapply(touchtimepairs[[1]],
function(x) subset(data, posixtime < x))
If you need to use both columns at the same time, use an index on the rows:
lapply(seq_len(nrow(touchtimepairs)),
function(i) subset(data, posixtime < touchtimepairs[i,2]))
(where you'd also reference touchtimepairs[i,1] somehow).
Especially if you are trying to use both columns simultaneously, you can use Map:
Map(function(a, b) subset(data, a < posixtime & posixtime <= b),
touchtimepairs[[1]], touchtimepairs[[2]])
(This does not return anything in your sample data, so either the data is not the best representative sample, or you are not intending to use it in this fashion. Most likely the latter, I'm just guessing :-)
The biggest difference between Map and lapply is that Map accepts one or more vectors/lists and zips them together. As an example of this "zipper" effect:
Map(func, 1:3, 11:13)
is effectively calling:
func(1, 11)
func(2, 12)
func(3, 13)
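A runnable version of the same idea, with a hypothetical add_pair standing in for func:

```r
# Hypothetical stand-in for func
add_pair <- function(a, b) a + b

# Map pairs up the i-th elements of both vectors
zipped <- Map(add_pair, 1:3, 11:13)
zipped

# The same three calls written out by hand
by_hand <- list(add_pair(1L, 11L), add_pair(2L, 12L), add_pair(3L, 13L))
identical(zipped, by_hand)   # TRUE
```

Map always returns a list, so wrap the result in unlist() or do.call(rbind, ...) if you need a vector or a single data frame.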

get rows from dataframe matching element of a list

Here are one dataframe/tibble and one character vector (this vector is one column of a tibble):
df1 <- structure(list(Twitter_name = c("CHESHIREKlD", "JellyComons",
"kirmiziburunlu", "erkekdeyimleri", "herosFrance", "IkishanShah"
), Declared_followers = c(60500L, 43100L, 31617L, 27852L, 26312L,
16021L), Real_followers = c(60241, 43054, 31073, 27853, 25736,
15856), Twitter_Id = c("783866366", "1424086592", "2367932244",
"3352977681", "2580703352", "521094407")), .Names = c("Twitter_name",
"Declared_followers", "Real_followers", "Twitter_Id"), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
myId <- c("867211097882804224", "868806957133688832", "549124465","822580282452754432",
"109344546", "482666188", "61716107", "3642392237", "595318933",
"833365943044628480", "1045015087", "859830740669800448", "860562940059045888",
"2854457294", "871784135983067136", "866922354554814464", "4839343547",
"849451474572759040", "872084673526214656", "794841530053853184")
N.B.: df1 has been shortened; the real one has 128 observations.
I am looking to test all row elements of df1$Twitter_Id and see if they are in myId. I can run this:
> match(myId[1], df1$Twitter_Id)
but:
it stops at the first occurrence
I need to apply the match() function to all elements of myId.
I can't find a clean and simple way to do this, using lapply() or other functions from dplyr or the tidyverse packages.
Thank you for help.
EDIT I need to be more explicit with the whole real case.
myTw <- structure(list(id_str = c("893445199661330433", "893116842558050304",
"892739336466305024", "892401780105019393", "892401594272296963",
"892365572486430720", "891964139756818432")), .Names = "id_str", row.names = c(NA,
-7L), class = c("tbl_df", "tbl", "data.frame"))
These are tweet IDs. What I am looking for is which Twitter users have retweeted them. To do this, I use the retweeters() function from the twitteR package.
library(twitteR)
MyRtw <- retweeters(myTw[1])
MyRtw <- c("889135428028084224", "867211097882804224", "868806957133688832",
"549124465", "822580282452754432", "109344546", "482666188",
"61716107", "3642392237", "595318933", "833365943044628480",
"1045015087", "859830740669800448", "860562940059045888", "2854457294",
"871784135983067136", "866922354554814464", "4839343547", "849451474572759040",
"872084673526214656")
This is a list of Twitter user IDs.
Finally, I want to see which users from df1$Twitter_Id have retweeted myTw[1].
You can use the '%in%' operator.
Edit: Probably this is what you want. Here I used the data posted in your original post (before editing).
matchVector = NULL
for (id in df1$Twitter_Id) {
matchCounter <- sum(myId %in% id)
matchVector <- c(matchVector, matchCounter)
}
df1$numberOfMatches <- matchVector
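Since %in% is vectorized, the same check can also be written without a loop. The sketch below uses hypothetical miniature versions of df1 and myId (the names user_a, user_b, user_c are invented) so that at least one id matches.

```r
# Hypothetical miniature versions of df1 and myId
df1 <- data.frame(
  Twitter_name = c("user_a", "user_b", "user_c"),
  Twitter_Id   = c("783866366", "549124465", "521094407"),
  stringsAsFactors = FALSE
)
myId <- c("867211097882804224", "549124465", "109344546")

# TRUE for every row of df1 whose id occurs anywhere in myId
is_retweeter <- df1$Twitter_Id %in% myId

# Keep only the matching rows
df1[is_retweeter, ]

# Positions in myId that correspond to someone in df1
which(myId %in% df1$Twitter_Id)
```

Unlike match(myId[1], ...), this tests every id at once and does not stop at the first occurrence.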

Resources