I have a table in Google Sheets which I imported into R with the googlesheets package. The numbers in the Google Sheets table are correct, but once imported into R they are displayed differently.
For example, I am analyzing the performance of players and I have data for the minutes played during a game, like 93 (meaning the player played 93 minutes). But in R the data are shown like this: 9.302000e+01.
Some numbers are displayed correctly and some are in the format above. I am really struggling to figure out what the problem might be. I have checked the format in Sheets as well as in Excel, and it is numeric.
As said, the table contains numbers like 93, 22, 93, 16, 45, 93, 46, 93 (minutes played).
But in R some are displayed incorrectly and some correctly.
For example:
93.0000000 2.200000e+01 93.0200000 1.600000e+01 4.500000e+01 93.0200000 46.0000000 9.302000e+01
install.packages("googlesheets")
library(googlesheets)
gs_auth(new_user = TRUE)
be <- gs_title("zilina_player_overview")
My expected result is, of course, a table in the same format as in Google Sheets or Excel. I mean that when a player played 93 minutes, I want to see the number 93 and not 9.302000e+01.
Any help is welcome, and thanks in advance for any advice.
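A note that may help here: 9.302000e+01 is scientific notation for 93.02, so the value is the same number in a different display form, and it hints that the cell may hold 93.02 rather than exactly 93. Below is a minimal sketch, assuming the column arrives either as text or as plain doubles; the vector name minutes_raw is made up for illustration and the values are copied from the output in the question.

```r
# Values copied from the question, treated here as text (an assumption)
minutes_raw <- c("93.0000000", "2.200000e+01", "93.0200000", "1.600000e+01",
                 "4.500000e+01", "93.0200000", "46.0000000", "9.302000e+01")

# as.numeric() understands scientific notation, so this yields ordinary doubles
minutes <- as.numeric(minutes_raw)

# Round to whole minutes if the fractional part is not wanted
whole_minutes <- round(minutes)

# Or keep the decimals and only switch off scientific notation when printing
format(minutes, scientific = FALSE)
```

If the column is already numeric in R, only the last two steps matter: round() changes the values, while format() changes only how they are printed.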
I have a data frame in RStudio Cloud that is 85 rows by 207 columns (ultimately it may be as wide as 366 columns). The actual data is created using the code in this thread:
Function to generate multiple rows of vectors in R
Now that I have the data frame set up, I'm trying to use the googlesheets library to transfer it to an existing sheet (and will periodically update it as I get more data). I'm using this code to edit the google sheet:
gs_edit_cells(sheet, ws = "Test", input=data, anchor="A2")
I let it run for a long time (more than an hour); it sat there with "Range affected by the update" and never went through. If I slice the data into several segments (say 30, 60, or 90 columns at a time), it does transfer, but it is prohibitively slow. This isn't a huge dataset. Is there something I could improve in my code? I have loaded the sheet in advance and it does recognize it.
I created a Qualtrics project to have subjects rate faces. I uploaded faces to the graphics library. I used Loop & Merge to repeat the same question for different faces.
How can I have a specific subject rate specific faces?
For example, consider a comma-separated-value file (or JavaScript code, or Python code, ...), with the following two lines:
123,31,41,59
124,26,31,41
Subject 123 should rate face 31, face 41, and face 59.
Subject 124 should rate face 26, face 31, and face 41.
(A face could have an alternate name, such as a Qualtrics-generated image name.)
Thank you in advance.
Add the faces to your contact list, so your csv contact list upload would look like:
ExternalDataReference,face1,face2,face3
123,31,41,59
124,26,31,41
Then pipe the face variables into your loop & merge setup:
1 ${e://Field/face1}
2 ${e://Field/face2}
3 ${e://Field/face3}
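If the assignments already exist as lines like the ones in the question, the contact list file can be generated rather than typed by hand. This is only a sketch in R, assuming every subject has the same number of faces; the output file name contacts.csv is hypothetical, and the column names follow the contact list layout shown above.

```r
# Assignments from the question: subject ID first, then the faces to rate
assignments <- c("123,31,41,59", "124,26,31,41")

# Split each line on commas and stack the pieces into one row per subject
parts <- strsplit(assignments, ",", fixed = TRUE)
contacts <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)

# Name the columns as in the contact list layout: ID column, then face1, face2, ...
names(contacts) <- c("ExternalDataReference",
                     paste0("face", seq_len(ncol(contacts) - 1)))

# Write the CSV (hypothetical file name) and read it back to inspect it
out_file <- file.path(tempdir(), "contacts.csv")
write.csv(contacts, out_file, row.names = FALSE, quote = FALSE)
csv_lines <- readLines(out_file)
```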
Tearing my hair out on this one. It took me hours just to get rJava up and running (because Mac OS X El Capitan did not want to play nice with Java) in order to load Excel-specific data importing packages etc. But in the end this hasn't helped my problem, and I'm just about at my wits' end. Please help.
Basic situation is this:
I have simple Excel data of time durations, over a span of a couple of years. So the two columns I'm importing are the time (duration) and the year (2016, 2017, etc.).
In Excel the data is formatted as [h]:mm:ss so it displays correctly (the data is the number of hours worked in a month, so typically something like 80:xx:xx ~ 120:xx:xx). I'm aware that in Excel, despite the cells being formatted as above and only showing the relevant period of hours, in reality Excel has appended an (irrelevant, arbitrary) date to this hours data. I have searched and searched and found no way around this limitation in the way Excel handles dates/times/durations.
I import this data into R via the "import data -> import from excel data set" menu item in the R Commander GUI, not the console.
However, when importing the data into R, the data displays as a single number, e.g. approx. 110 hrs is converted to 4.xxxxx, not hh:mm:ss. So when running analyses and generating plots etc., instead of the actual (meaningful) 110:xx:xx type data being displayed, a completely meaningless 4.xxxxxx is displayed.
If I change the formatting of the Excel cells to display the date as well as the time rather than use the [h]:mm:ss cell formatting, R erroneously interprets the data as something equally useless, like 1901/02/04 05:23 am.
I have installed and loaded a variety of packages such as xlsx, XLConnect, lubridate etc., but it hasn't made any difference to how R interprets the Excel data on import, from the GUI at least.
Please tell me how I can either
a) edit the raw data to a format that R will understand as a time duration (and nothing but a time duration) in hh:mm:ss format, or
b) format the current data from within R after importing, so that it displays the data in the correct way rather than a useless number or arbitrary date/time?
[Please note: I can use the console, when given the commands etc needed to be executed. But I need to find a solution that ultimately will allow the data to be imported and/or manipulated from within the GUI, not from typing a bunch of commands into the console, as the end user (not me) has zero programming ability and cannot use a console, and will only ever be using R via the GUI.]
Is your code importing the data from excel as seconds?
library(lubridate)
duration <- lubridate::as.duration(400000)  # 400000 seconds
as.numeric(duration, "hours")
# 111.1111
as.numeric(duration, "days")
# 4.62963
seconds_to_period(400000)
# "4d 15H 6M 40S"
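For what it's worth, Excel stores a time or duration as a number of days, which is why roughly 110 hours shows up as about 4.58 (110 / 24). Here is a minimal base R sketch of option b), assuming the imported column holds such day counts; the sample values and the helper name days_to_hms are made up for illustration.

```r
# Hypothetical imported values: durations expressed in days, as Excel stores them
excel_days <- c(4.583333, 3.5)

# Turn a day count into an "h:mm:ss" string where hours may exceed 24
days_to_hms <- function(days) {
  total_seconds <- round(days * 86400)          # 86400 seconds per day
  hours   <- total_seconds %/% 3600
  minutes <- (total_seconds %% 3600) %/% 60
  seconds <- total_seconds %% 60
  sprintf("%d:%02d:%02d",
          as.integer(hours), as.integer(minutes), as.integer(seconds))
}

hms_text <- days_to_hms(excel_days)
hms_text
# "110:00:00" "84:00:00"
```

The result is text, so it suits labels and tables; for arithmetic or plotting it is usually easier to keep a numeric column in hours (days * 24).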
I am using a main R function to call a series of R functions from different scripts. In order to reproduce results, I call set.seed at the beginning of my main script. In the code, sample() is used in function_8 to randomly select a couple of rows from a data frame, and rand() is used in function_6. So a simple workflow looks like this:
### Main R Function
library(dplyr)
set.seed(111)
### Begin calling other R scripts
output_1 <- function_1(...)
...
output_10 <- function_10(...)
### End Main R Function
Recently, I realized that if I make changes to my function_9, which does not contain any randomization, the random numbers generated in function_8 change. For example,
sample() in function_8 will get Row 2, 15, 23, 50, 54 before updating function_9.
sample() in function_8 will get Row 23, 44, 50, 95, 98 after updating function_9
However, results can be reproduced by starting a new R session.
So, I am wondering if anyone can give me some suggestions on how to properly use set.seed in this situation? Thanks in advance!
Update
Per a deleted comment, I changed the seed number to 123, which produces a consistent set of results. But I would appreciate it if someone could provide an in-depth explanation!
Maybe the seed 111 just happens to have a property that leaves function_8 unchanged. You may want to generate a time-based random seed instead. Here is a previous answer that may help you; it uses the system time.
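One thing that is certain about R's generator: all functions share a single random number stream, so any extra draw made before sample() changes what sample() returns. A possible way to make one step independent of the rest is to seed it right where the draw happens. This is only a sketch under that assumption, not a diagnosis of the original scripts; sample_rows and the toy data frame are hypothetical.

```r
# Hypothetical helper: draw n rows using a seed that belongs to this step only
sample_rows <- function(df, n, seed) {
  set.seed(seed)                        # reset the RNG right before the draw
  df[sample(nrow(df), n), , drop = FALSE]
}

df <- data.frame(id = 1:100)

first <- sample_rows(df, 5, seed = 111)
invisible(runif(3))                     # stand-in for unrelated random draws elsewhere
second <- sample_rows(df, 5, seed = 111)

identical(first, second)                # TRUE: the draw no longer depends on earlier code
```

Note that calling set.seed() inside a function also resets the global stream for everything that runs afterwards, so this trades one kind of coupling for another.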
I have not worked with SPSS (.sav) files before and am trying to work with some data files provided to me by importing them into R. I did not receive any explanation of the files, and because communication is difficult I am trying to figure out as much as I can on my own.
Here's my first question. This is what the Date field looks like in an R data frame after import:
> dataset2$Date[1:4]
[1] 13608172800 13608259200 13608345600 13608345600
I don't know what dates the data is supposed to be for, but I found that if I divide the above numbers by 10, that seems to give a reasonable date (in February 2013). Can anyone confirm this is indeed what the above represents?
My second question is regarding another column called Begin_time. Here's what that looks like:
> dataset2$Begin_time[1:4]
[1] 29520 61800 21480 55080
Any idea what this is representing? I want to believe this is some representation of time of day because the records are for wildlife observations, but I haven't got more info than that to try to guess. I noticed that if I take the difference between End_Time and Begin_time I get numbers like 120 and 180, which seems like minutes to me (3 hours seems reasonable to observe a wild animal), but the absolute numbers are far greater than the number of minutes in a day (1440), so that leaves me puzzled. Is this some time keeping format from SPSS? If so, what's the logic?
Unfortunately, I don't have access to SPSS, so any help would be much appreciated.
I had the same problem and this function is a good solution:
pss2date <- function(x) as.Date(x/86400, origin = "1582-10-14")
This is where I found the answer:
http://scs.math.yorku.ca/index.php/R:_Importing_dates_from_SPSS
Dates in SPSS Statistics are represented as floating point doubles holding the number of seconds since Oct 14, 1582. If you use the SPSS R plugin APIs, they can be automatically converted to R dates, but any proper converter should be able to do this for you.
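Applied to the values in the question, the conversion can be sketched as follows. The date part uses the same origin as the function above; reading Begin_time as seconds since midnight is an assumption based on how SPSS stores time values, and the helper name seconds_to_clock is made up for illustration.

```r
# Values copied from the question
spss_dates <- c(13608172800, 13608259200, 13608345600, 13608345600)
begin_time <- c(29520, 61800, 21480, 55080)

# Seconds since 1582-10-14, converted to days and then to Date
dates <- as.Date(spss_dates / 86400, origin = "1582-10-14")
format(dates)
# "2014-01-04" "2014-01-05" "2014-01-06" "2014-01-06"

# Assumption: Begin_time is seconds since midnight
seconds_to_clock <- function(s) {
  sprintf("%02d:%02d:%02d",
          as.integer(s %/% 3600),
          as.integer((s %% 3600) %/% 60),
          as.integer(s %% 60))
}
clock <- seconds_to_clock(begin_time)
clock
# "08:12:00" "17:10:00" "05:58:00" "15:18:00"
```

Under that reading, a difference of 120 or 180 between End_Time and Begin_time would be seconds, not minutes.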