Closed. This question needs debugging details. It is not currently accepting answers.
Edit the question to include desired behavior, a specific problem or error, and the shortest code necessary to reproduce the problem. This will help others answer the question.
Closed 5 years ago.
I am working on a project right now and encountered this problem.
I have a dataset with two date columns. One, call it x1, holds check-in dates; the other, call it x2, holds check-out dates.
Both are in "year-month-day" format and are stored as strings.
What I would like to do is work out how long each person stayed, using the check-in and check-out dates. I've tried several functions, such as as.Date, but all of them failed, and I believe I can't simply subtract these two dates directly, as the result wouldn't represent the actual length of stay.
Does anybody have any idea on how to do this in R?
Thanks!
If I understood your question, you want the difference between checkout and checkin? I would try this:
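library(lubridate)
# Example data: check-in (x1) and check-out (x2) dates stored as strings
df <- data.frame(x1 = c("2017-03-23", "2017-03-24"), x2 = c("2017-03-24", "2017-03-28"))
# Parse every column from "year-month-day" text into Date objects
df[] <- lapply(df, ymd)
# Length of stay, returned as a difftime in days
df$x2 - df$x1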
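If you would rather not depend on lubridate, the same idea can be sketched in base R. This is only a minimal example on made-up data; the column name stay_days is an assumption for illustration, and it presumes every string really is in "year-month-day" form.

```r
# Made-up example data; x1 = check-in, x2 = check-out, both stored as strings
df <- data.frame(
  x1 = c("2017-03-23", "2017-03-24"),
  x2 = c("2017-03-24", "2017-03-28"),
  stringsAsFactors = FALSE
)

# Parse the strings with an explicit format so nothing is guessed
check_in  <- as.Date(df$x1, format = "%Y-%m-%d")
check_out <- as.Date(df$x2, format = "%Y-%m-%d")

# difftime() makes the unit explicit; as.numeric() drops the difftime class
# stay_days is a hypothetical column name
df$stay_days <- as.numeric(difftime(check_out, check_in, units = "days"))
df
```

Any string that does not match the format becomes NA after as.Date, so checking anyNA(check_in) and anyNA(check_out) before subtracting is a cheap way to spot badly formatted rows.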
Related
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 days ago.
I have a problem with the coding of some variables. I am working in R on data for Lebanon from two different datasets, the World Values Survey and the Arab Barometer. Regardless of which dataset I use, when I try to code a variable for only one country (in this case Lebanon), the values of the variable after coding are entirely wrong.
I have tried the same coding with other variables and with another dataset, but the problem remains, and the values are still much larger than they should be.
As can be seen from the values in the 'table' command, the values after encoding are very different.
As a beginner, I'm sure my question will be trivial, but I'm asking for help to unblock the situation.
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
First off, I have looked at all the examples and previous questions and have not been able to find a usable answer to my situation.
I have a data set of 300ish independent variables I'm trying to bring into R. The variables are all classified as factors. In the CSV file I'm importing, all of the variables are pricing data with two decimal places. I have used the following code, and some of the variables have been converted with their decimals intact. However, many of the converted columns are filled with NAs; in fact, some columns are entirely NA.
dsl$price = as.numeric(as.factor(dsl$price)) # <- this completely changes the data into something unrecognizable
dsl$price = as.numeric(as.character(dsl$price)) # <- lots of NAs, or entirely NAs
I've tried to recode the variables in the original CSV file to numeric, but with no luck.
Convert the factor to character first; the character values can then be converted to numeric:
dsl$price <- as.numeric(as.character(dsl$price))
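Since the question reports that this same conversion produces NAs, it may help to look at which values fail to parse. The sketch below uses made-up prices and assumes the NAs come from currency symbols or thousands separators, which is only a guess about the real file; the names price, raw, bad and clean are hypothetical.

```r
# Made-up prices; two of them contain characters that as.numeric() cannot parse
price <- factor(c("10.50", "$3.25", "1,200.00", "7.00"))

# as.numeric() on a factor returns the internal level codes, not the prices
as.numeric(price)

# Go through character instead, and list the values that turn into NA
raw <- as.character(price)
bad <- raw[is.na(suppressWarnings(as.numeric(raw)))]
bad

# Assumption: the offending characters are symbols such as "$" and ",";
# strip everything except digits, the decimal point and a minus sign
clean <- as.numeric(gsub("[^0-9.-]", "", raw))
clean
```

Inspecting bad on the real columns should show what is actually in the file (for example stray text or a different decimal mark) before any cleaning rule is applied.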
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have an Excel file with a list of emails and the channels that collected them. How can I find out how many emails per channel are duplicated using R, and automate it (so that every time I import a different file I just have to run it and get the results)?
Thank you!!
Assuming the "df" dataframe has the relevant variables under the names "channel" and "email", then:
To get the number of unique channel-email pairs:
dim(unique(df[c("channel", "email")]))[1]
To get the sum of all channel-email observations:
sum(table(df$channel, df$email))
To get the number of duplicates, simply subtract the former from the latter:
sum(table(df$channel, df$email)) - dim(unique(df[c("channel", "email")]))[1]
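The subtraction above gives one overall total. For a count per channel that can be rerun on each new file, one possible sketch is below. The function name count_duplicates and the sample rows are made up for illustration, and it assumes the imported data frame has columns named "channel" and "email" as in the answer.

```r
# Hypothetical helper: number of repeated channel-email rows per channel
count_duplicates <- function(df) {
  pairs <- df[c("channel", "email")]
  # duplicated() flags every occurrence after the first one
  is_dup <- duplicated(pairs)
  tapply(is_dup, pairs$channel, sum)
}

# Made-up sample data
df <- data.frame(
  channel = c("web", "web", "web", "app", "app"),
  email   = c("a@x.com", "a@x.com", "b@x.com", "c@x.com", "c@x.com"),
  stringsAsFactors = FALSE
)

dups <- count_duplicates(df)
dups
```

The per-channel counts add up to the same total as the subtraction in the answer, so the two approaches can be used to check each other.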
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I am trying to convert the factors in a data frame to numeric using the command
data[] <- lapply (data, function(x) as.numeric(as.character(x))
But R keeps prompting me for more input. What am I doing wrong?
The data frame is named data and consists of 50 rows and 2 columns. Will this command convert every variable to numeric, or should I do something else?
screenshot after using 'dput' at http://imgur.com/Sde9QSk.png
Shouldn't you add ) at the end of your code?
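With the closing parenthesis added, the call is complete and R stops waiting for more input. A small sketch on a made-up two-column data frame (the column names a and b are assumptions, not taken from the screenshot):

```r
# Made-up stand-in for the 50 x 2 data frame of factors
data <- data.frame(
  a = factor(c("1.5", "2", "3.25")),
  b = factor(c("10", "20", "30"))
)

# Note the final ")" that closes lapply(); assigning into data[] keeps
# the data frame structure and converts every column
data[] <- lapply(data, function(x) as.numeric(as.character(x)))

str(data)
```

Every column is converted, so any column holding non-numeric text would become NA with a warning.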
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
Currently I have a column with times in the format yyyy-mm-dd hh:mm:ss (e.g. 2015-10-10 04:10:45), and I wish to extract the hour, possibly using as.POSIXlt(x)$hour, where x is my column.
Unfortunately, this function is returning a vector full of 0's, but if I do something like as.POSIXlt("2015-10-10 04:10:45")$hour I receive 4 which is what I want.
How can I do this with the whole column?
I was just doing the exact same thing on my dataset...
format(as.POSIXct(df$datetime, format="%Y-%m-%d %H:%M:%S"), format="%H:%M:%S")
#[1] "04:10:45"
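The line above returns the full time as text. If only the hour is wanted as a number for the whole column, a sketch along these lines may work. It assumes the column is called datetime as in the answer; the sample rows, the tz = "UTC" choice and the new column name hour are made up for illustration.

```r
# Made-up sample column in yyyy-mm-dd hh:mm:ss form
df <- data.frame(
  datetime = c("2015-10-10 04:10:45", "2015-10-11 17:30:00"),
  stringsAsFactors = FALSE
)

# Give the format explicitly so that every row is parsed with the time part;
# as.character() also covers the case where the column was read as a factor
parsed <- as.POSIXlt(as.character(df$datetime),
                     format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# $hour is vectorised, one integer per row
df$hour <- parsed$hour
df
```

One possible cause of the all-zero result is that, without an explicit format, a single format is guessed for the whole vector, and a date-only guess leaves every hour at 0; this is an assumption about the asker's data, not something the question confirms.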