Converting factors into numeric format with signs in R - r

Suppose I have a dataframe (df) in which every element is a factor:
df
---
+100.5
+120.2
-30.0
+75.0
-600.3
How can I convert df into a numeric df using R? I would be very glad for any help. Thanks a lot.

Converting factors to numeric values can be tricky: calling as.numeric() directly on a factor returns its internal integer codes rather than the values it displays. It is therefore usually necessary to convert the factor to character first, and then to numeric.
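To see why the intermediate character step matters, here is a minimal sketch. The factor f is a made-up stand-in for the column in the question (with the plus signs kept), not the asker's actual object:

```r
# Hypothetical factor holding signed numbers stored as text
f <- factor(c("+100.5", "+120.2", "-30.0", "+75.0", "-600.3"))

# as.numeric() on a factor returns the integer level codes, not the values
codes <- as.numeric(f)

# Going through character parses the labels themselves (a leading "+" is accepted)
values <- as.numeric(as.character(f))

# Equivalent idiom: convert the levels once, then index by the codes
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)
```

The two conversions in values and values2 give the same result; the second can be slightly faster on long factors with few levels because only the levels are parsed.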
This should work:
df_n <- as.data.frame(as.numeric(as.character(df[,1])))
colnames(df_n) <- "df_n"
head(df_n)
# df_n
#1 100.5
#2 120.2
#3 -30.0
#4 75.0
#5 -600.3
class(df_n[,1])
#[1] "numeric"
data
df <- structure(list(df = structure(c(4L, 5L, 2L, 3L, 1L),
.Label = c("-600.3", "-30", "75", "100.5", "120.2"),
class = "factor")), .Names = "df",
row.names = c(NA, -5L), class = "data.frame")
Hope this helps.

Related

Plot a frequency table in R

I have a table with the results of a survey. I mean, I don't have the full survey data, only the frequency tables, Likert scale counts. Is it possible to graph this table in R?
The table looks like this:
Question 1: (response is a factor)
response count
1 5
2 6
3 2
4 2
5 1
It's very easy to do in Excel, but I can't work out how in R. The only thing I could think of was to repeat the table values based on the count, but there must be a simpler way...
A histogram-like chart from plot
plot(df,type = "h")
Try this ggplot2 approach. You can set response as the x variable and count as the y variable, and use geom_col() to display the bars. Here is the code:
library(ggplot2)
#Plot
ggplot(df,aes(x=factor(response),y=count))+
geom_col(color='black',fill='cyan3')+
xlab('Response')
Some data used:
#Data
df <- structure(list(response = 1:5, count = c(5L, 6L, 2L, 2L, 1L)), class = "data.frame", row.names = c(NA,
-5L))
If we want to do this without installing any package, base R can do it in a single line with a named vector:
barplot(setNames(df$count, df$response))
Or with the formula method for data.frame
barplot(count ~ response, df)
data
df <- structure(list(response = 1:5, count = c(5L, 6L, 2L, 2L, 1L)),
class = "data.frame", row.names = c(NA,
-5L))
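For comparison, the "repeat the values by their count" idea from the question can be sketched as follows. It rebuilds the raw responses with rep() and tabulates them, which ends up with the same bar heights as the named vector passed to barplot() above. The object names expanded, tab and heights are illustrative only:

```r
# Same frequency table as in the question
df <- data.frame(response = 1:5, count = c(5L, 6L, 2L, 2L, 1L))

# Expand each response value by its count, then tabulate again
expanded <- rep(df$response, times = df$count)
tab <- table(factor(expanded, levels = df$response))

# The direct route: a named vector of counts, no expansion needed
heights <- setNames(df$count, df$response)

# Both objects describe the same bars; only draw when a device is available
if (interactive()) {
  barplot(heights, xlab = "Response", ylab = "Count")
}
```

Since the counts are already known, the named vector is the shorter path; expanding only makes sense if a function insists on raw observations.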

how to extract the specific words from a string in R?

How can I extract "7-9", "2-5" and "2-8", then paste them into a new column called event_time?
event_details
2.9(S) 7-9 street【Train】#2097
2.1(S) 2-5 street【Train】#2012
2.2(S) 2-8A TBC【Train】#202
You haven't really shared the logic for extracting the numbers, but based on the limited data you have shared we can do:
df$new_col <- sub('.*(\\d+-\\d+).*', '\\1', df$event_details)
df
# event_details new_col
#1 2.9(S) 7-9 street【Train】 7-9
#2 2.1(S) 2-5 street【Train】 2-5
#3 2.2(S) 2-8A TBC【Train】 2-8
Or the same using str_extract:
df$new_col <- stringr::str_extract(df$event_details, "\\d+-\\d+")
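One caveat with the sub() pattern, sketched below on made-up strings (not the asker's data): the leading .* is greedy, so with multi-digit numbers it swallows all but the last digit before the hyphen, and a string with no match is returned unchanged. A base R alternative is regexpr() with regmatches(); the helper name extract_first is hypothetical:

```r
# Illustrative inputs: a two-digit range and a string without any range
x <- c("2.9(S) 7-9 street", "2.1(S) 12-15 street", "no range here")

# Greedy .* leaves only one digit before the hyphen; non-matches pass through
greedy <- sub('.*(\\d+-\\d+).*', '\\1', x)

# Return the first match of `pattern` in each element, or NA when absent
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)
  out
}

first <- extract_first(x, "\\d+-\\d+")

print(greedy)
print(first)
```

For the three rows in the question both approaches agree, because every number there has a single digit.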
data
df <- structure(list(event_details = structure(c(3L, 1L, 2L),
.Label = c("2.1(S) 2-5 street【Train】",
"2.2(S) 2-8A TBC【Train】", "2.9(S) 7-9 street【Train】"), class =
"factor")), class = "data.frame", row.names = c(NA, -3L))

remove double quotes from factors in a dataframe

I have a dataframe to work on in which a bunch of variables are factors whose values are wrapped in quotation marks, like ""x1"".
str(df) gives me something like this:
$ x : Factor w/ 10 Levels "\"\"x1\"\"",..: 1 7 9 ...
I tried to get rid of the quotation marks with the gsub() function, but that didn't work, probably because I don't know what to use as the pattern. It would be great if somebody could solve this puzzle and maybe explain to me whether "\"\"x1\"\"" is the key to this.
An example for the dataframe would look like this:
structure(list(Sent = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("\"\"Opted out\"\"",
"\"\"Yes\"\""), class = "factor"), Responded = structure(c(2L,
2L, 2L, 2L, 2L), .Label = c("\"\"Complete\"\"", "\"\"No\"\"",
"\"\"Partial\"\""), class = "factor")), row.names = c(NA, -5L
), class = c("tbl_df", "tbl", "data.frame"), .Names = c("Sent",
"Responded"))
Thanks in advance!
vec = c('""x1""', '""x2""', '""x3""')
vec = factor(vec)
levels(vec) <- gsub('["\\]', "", levels(vec))
#> vec
#[1] x1 x2 x3
#Levels: x1 x2 x3
Note how I use ' as the string delimiter when I want to use " inside a string.
Another reason it didn't work for you is probably that you applied gsub() to the factor variable itself rather than to its levels attribute.
Factor variables are internally stored as 1, 2, 3,... numbers.
As you now have provided data, you can use: (df1 being your data with the factor columns)
df1[] <- lapply(df1, function(vec){ levels(vec) <- gsub('["\\]',"",levels(vec)); vec})
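A variant of the same idea, as a sketch under assumptions: the data frame below is a small made-up stand-in for the asker's data, and the helper strip_quotes is hypothetical. It matches the double quote literally with fixed = TRUE, so no regex escaping is involved, and it leaves non-factor columns untouched:

```r
# Made-up data resembling the question: factor levels wrapped in doubled quotes
df1 <- data.frame(
  Sent      = factor(c('""Yes""', '""Yes""', '""Opted out""')),
  Responded = factor(c('""No""', '""Complete""', '""Partial""'))
)

# Rewrite the levels of a factor column; return other column types as they are
strip_quotes <- function(col) {
  if (is.factor(col)) {
    levels(col) <- gsub('"', '', levels(col), fixed = TRUE)
  }
  col
}

# df1[] keeps the data.frame structure while replacing every column
df1[] <- lapply(df1, strip_quotes)

str(df1)
```

Editing the levels rather than the values keeps the integer codes as they were, so the rows keep their original categories.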

Error in setting up and cleaning a dataframe R

I am attempting to generate out-of-sample predictions and am getting this message after running the following code: Error: variable 'dummygen' was fitted with type "numeric" but type "factor" was supplied.
I checked the str to verify that the two variables I am using are both numeric and they appear to be. I did a bunch of hunting around on here and think this might be somewhat related, but I haven't been able to get the suggestions to work.
Here is the code I have so far.
library(foreign)
library(plyr)
library(rvest)
library(stringi)
library(purrr)
library(XLConnect)
library(splitstackshape)
library(tidyr)
library(dplyr)
donner_raw <- read.csv("donner.txt", sep="\t", header = FALSE)
colnames(donner_raw) <- c("age_gen", "survive")
donner_raw <- separate(donner_raw, age_gen, into = c("age", "gender"), "(?<=\\d)(?=[A-Za-z])")
logit <- glm(survive ~ age + dummygen,family=binomial(link='logit'),data=donner_raw)
newlogit <- data.frame(age=seq(1,6, length=20), dummygen=("0"))
ooslogit <- predict.glm(logit, newlogit, se.fit=TRUE)
I'm not sure where in the process of what I've done I messed up. Here is a reproducible part of the data.
dput(droplevels(head(donner_raw)))
structure(list(age = structure(c(6L, 4L, 5L, 3L, 2L, 1L), .Label = c("13", "3", "4", "45", "6", "60"), class = "factor"), gender = c("M", "F", "F", "F", "F", "F"), dummygen = structure(c(2L, 1L, 1L, 1L, 1L, 1L), .Label = c("0", "1"), class = "factor")), .Names = c("age", "gender", "survive", "dummygen"), row.names = c(NA, 6L), class = "data.frame")
Let's simply read and think about the error message:
Error: variable 'dummygen' was fitted with type "numeric" but type "factor" was supplied
This error occurs after the line:
ooslogit <- predict.glm(logit, newlogit, se.fit=TRUE)
(Presumably, at least, because your question isn't very clear about this and provides lots of code that doesn't seem related.)
So R is telling you that when the model was fit the variable dummygen was numeric, but now you've given it a factor.
So let's look:
str(newlogit)
'data.frame': 20 obs. of 2 variables:
$ age : num 1 1.26 1.53 1.79 2.05 ...
$ dummygen: Factor w/ 1 level "0": 1 1 1 1 1 1 1 1 1 1 ...
Yep!
So your problem was that you inexplicably created the data frame newlogit by specifying:
newlogit <- data.frame(age=seq(1,6, length=20), dummygen=("0"))
which clearly specifies that the variable dummygen is not going to be numeric. Just convert it back, or remove the quotes in the first place. For example:
newlogit <- data.frame(age=seq(1,6, length=20), dummygen= 0)
or
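# go through character: as.numeric() on the factor alone would return its level code (1), not 0
newlogit$dummygen <- as.numeric(as.character(newlogit$dummygen))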
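One caveat when converting a factor column such as dummygen back to numeric: the conversion has to go through character, otherwise the result is the level code. The sketch below forces stringsAsFactors = TRUE so the behaviour is the same regardless of R version (an assumption about how the data frame was built; recent R versions keep strings as character by default):

```r
# Rebuild the prediction frame with dummygen stored as a factor
newlogit <- data.frame(age = seq(1, 6, length.out = 20),
                       dummygen = "0",
                       stringsAsFactors = TRUE)

# Direct conversion yields the level code of the single level "0", which is 1
codes <- as.numeric(newlogit$dummygen)

# Converting via character recovers the intended value 0
newlogit$dummygen <- as.numeric(as.character(newlogit$dummygen))

print(unique(codes))
print(unique(newlogit$dummygen))
```

Passing the level codes to predict() would not raise an error, but it would silently predict for dummygen = 1 instead of 0.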

Compare dataframe column to another dataframe column

I have a dataframe column containing page paths (let's call it A):
pagePath
/text/other_text/123-string1-4571/text.html
/text/other_text/string2/15-some_other_txet.html
/text/other_text/25189-string3/45112-text.html
/text/other_text/text/string4/5418874-some_other_txet.html
/text/other_text/string5/text/some_other_txet-4157/text.html
/text/other_text/123-text-4571/text.html
/text/other_text/125-text-471/text.html
I also have another dataframe column of strings, let's call it B (the two dataframes are different and they don't have the same number of rows).
Here's an example of my column in dataframe B:
names
string1
string11
string4
string3
string2
string10
string5
string100
What I want to do is check whether my page paths (A) contain strings from my other dataframe (B).
I had difficulties because my two dataframes don't have the same length and the data are unorganized.
EXPECTED OUTPUT
I want to have this output as a result:
pagePath names exist
/text/other_text/123-string1-4571/text.html string1 TRUE
/text/other_text/string2/15-some_other_txet.html string2 TRUE
/text/other_text/25189-string3/45112-text.html string3 TRUE
/text/other_text/text/string4/5418874-some_other_txet.html string4 TRUE
/text/other_text/string5/text/some_other_txet-4157/text.html string5 TRUE
/text/other_text/123-text-4571/text.html NA FALSE
/text/other_text/125-text-471/text.html NA FALSE
If my question needs more clarification, please mention this.
We can generate the exist column with grepl()
# Collapse B$names into one string with "|"
onestring <- paste(B$names, collapse = "|")
# Generate new column
A$exist <- grepl(onestring, A$pagePath)
Not as nice, since it uses a for loop:
names <- rep(NA, length(A$pagePath))
exist <- rep(FALSE, length(A$pagePath))
for (name in B$names) {
names[grep(name, A$pagePath)] <- name
exist[grep(name, A$pagePath)] <- TRUE
}
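A loop-free sketch that returns both the matched name and the exist flag is shown below. It is an illustration under assumptions, not the answerers' method: the data are shortened made-up versions of A and B, the helper first_match is hypothetical, and the names are assumed to contain no regex metacharacters. Word boundaries (\\b) keep "string1" from matching inside "string10":

```r
# Shortened, made-up versions of the two data frames
A <- data.frame(pagePath = c("/text/other_text/123-string1-4571/text.html",
                             "/text/other_text/string10/15-some_other_txet.html",
                             "/text/other_text/123-text-4571/text.html"),
                stringsAsFactors = FALSE)
B <- data.frame(names = c("string1", "string10", "string4"),
                stringsAsFactors = FALSE)

# Return the first candidate found as a whole word in `path`, or NA if none is
first_match <- function(path, candidates) {
  hit <- vapply(candidates,
                function(nm) grepl(paste0("\\b", nm, "\\b"), path, perl = TRUE),
                logical(1))
  if (any(hit)) candidates[which(hit)[1]] else NA_character_
}

A$names <- vapply(A$pagePath, first_match, character(1),
                  candidates = B$names, USE.NAMES = FALSE)
A$exist <- !is.na(A$names)

print(A)
```

If the columns are factors, as in older R versions, wrapping them in as.character() before calling the helper keeps the behaviour the same.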
We can use str_extract_all from the stringr package, but as.character() turns paths without a match into the string "character(0)", so we have to replace those with NA afterwards:
library(stringr)
df$names <- as.character(str_extract_all(df$pagePath, "string[0-9]+"))
df$exist <- df$names %in% df1$names
df[df=="character(0)"] <- NA
df
# pagePath names exist
#1 /text/other_text/123-string1-4571/text.html string1 TRUE
#2 /text/other_text/string2/15-some_other_txet.html string2 TRUE
#3 /text/other_text/25189-string3/45112-text.html string3 TRUE
#4 /text/other_text/text/string4/5418874-some_other_txet.html string4 TRUE
#5 /text/other_text/string5/text/some_other_txet-4157/text.html string5 TRUE
#6 /text/other_text/123-text-4571/text.html <NA> FALSE
#7 /text/other_text/125-text-471/text.html <NA> FALSE
DATA
dput(df)
structure(list(pagePath = structure(c(1L, 5L, 4L, 7L, 6L, 2L,
3L), .Label = c("/text/other_text/123-string1-4571/text.html",
"/text/other_text/123-text-4571/text.html", "/text/other_text/125-text-471/text.html",
"/text/other_text/25189-string3/45112-text.html", "/text/other_text/string2/15-some_other_txet.html",
"/text/other_text/string5/text/some_other_txet-4157/text.html",
"/text/other_text/text/string4/5418874-some_other_txet.html"), class = "factor")), .Names = "pagePath", class = "data.frame", row.names = c(NA,
-7L))
dput(df1)
structure(list(names = structure(c(1L, 4L, 7L, 6L, 5L, 2L, 8L,
3L), .Label = c("string1", "string10", "string100", "string11",
"string2", "string3", "string4", "string5"), class = "factor")), .Names = "names", class = "data.frame", row.names = c(NA,
-8L))
Here is one way using apply:
df$exist <- apply( df,1,function(x){as.logical(grepl(x[2],x[1]))} )
