R - How to extract an element from a single column data frame? - r

I have a data frame and need to access the first row of the first column (Negative = 16). Printing the object gives:
[[1]]
              data
Negative        16
Neutral         36
Positive        28
Very Negative    7
Very Positive   19
and str() reports:
List of 1
 $ :'data.frame': 5 obs. of 1 variable:
  ..$ data: int [1:5] 16 36 28 7 19
I have tried the following:
x(1,1)
# Error in x(1, 1) : could not find function "x"
x[1,1]
# Error in x[1, 1] : incorrect number of dimensions
x['Negative',1]
# Error in x["Negative", 1] : incorrect number of dimensions
x['Negative']
# $<NA>
# NULL

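The str() output shows that x is not a data frame but a list holding one data frame, which is why x[1,1] fails with "incorrect number of dimensions". Extract the data frame with [[1]] first, then index it:
x[[1]][1, 1]
x[[1]]["Negative", "data"]
To keep the first row as a data frame (with its row name) instead of a bare value, add drop = FALSE:
x[[1]][1, , drop = FALSE]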
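A minimal sketch of the extraction, assuming an object shaped like the one in the question (the list x is rebuilt here by hand from the printed values, so its construction is an assumption, not the asker's actual code):

```r
# Rebuild a list that contains a single one-column data frame,
# matching the structure reported by str() in the question.
x <- list(
  data.frame(
    data = c(16L, 36L, 28L, 7L, 19L),
    row.names = c("Negative", "Neutral", "Positive",
                  "Very Negative", "Very Positive")
  )
)

# x is a list, so [[1]] pulls out the data frame before indexing it.
by_position <- x[[1]][1, 1]
by_name     <- x[[1]]["Negative", "data"]

# drop = FALSE keeps the result as a one-row data frame.
first_row <- x[[1]][1, , drop = FALSE]

print(by_position)
print(first_row)
```

Both by_position and by_name hold the value for "Negative"; first_row keeps the row name alongside it.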

Related

Convert comma separated decimals from character to numeric

For my exam I have to build some scatter plots in R. I created a data frame with 4 variables, and I want to add regression lines to my scatter plots.
The name of my data frame is "alle".
The variable names are: demo, tot, besch, usd.
With this code I tried to fit the regression line, but got the following result:
reg1<- lm(tot~demo, data=alle)
Warning messages:
1: In model.response(mf, "numeric") :
using type = "numeric" with a factor response will be ignored
2: In Ops.factor(y, z$residuals) : ‘-’ not meaningful for factors
Here is the structure of "alle":
str(alle)
'data.frame': 11 obs. of 4 variables:
$ demo : chr "498.300.775" "500.297.033" "502.090.235" "503.170.618" ...
$ tot : Factor w/ 11 levels "4.846.423","4.871.049",..: 1 3 4 5 2 8 7 6 10 9 ...
$ besch: Factor w/ 9 levels "68,4","68,6",..: 5 7 3 2 2 1 1 4 6 8 ...
$ usd : Factor w/ 44 levels "0,68434","0,72584",..: 26 30 29 23 28 22 24 25 15 14 ...
I tried to convert the column "demo" to numeric with
alle$demo <- as.numeric(as.character(alle$demo))
It converted the column to numeric, but now the rows are full of NAs.
I think that all columns must be numeric.
How can I convert all 4 columns to numeric and finally plot the regression lines?
Data:
> head(alle,6)
demo tot besch usd
1 498.300.775 4.846.423 69,8 1,3705
2 500.297.033 4.891.934 70,3 1,4708
3 502.090.235 4.901.358 69,0 1,3948
4 503.170.618 4.906.313 68,6 1,3257
5 502.964.837 4.871.049 68,6 1,3920
6 504.047.964 5.010.371 68,4 1,2848
thanks
Try doing it in two steps. First get rid of the dots, then replace the commas by decimal points and coerce to numeric.
alle[] <- lapply(alle, function(x) gsub("\\.", "", x))
alle[] <- lapply(alle, function(x) as.numeric(sub(",", ".", x)))
Note:
The above solution is split into two steps for readability. The following does the same with a single lapply loop, which should be faster if the dataset is big. If the dataset is small to medium, the two-step solution may be preferable.
alle[] <- lapply(alle, function(x){
as.numeric(sub(",", ".", gsub("\\.", "", x)))
})
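The conversion can be checked end to end on a few rows copied from the head(alle) output in the question. This is only a sketch: the helper name to_number is made up for illustration, and the three-row data frame stands in for the full data set.

```r
# Three rows taken from the question's head(alle) output; the columns are
# created as factors to mimic the str(alle) output.
alle <- data.frame(
  demo  = c("498.300.775", "500.297.033", "502.090.235"),
  tot   = c("4.846.423", "4.891.934", "4.901.358"),
  besch = c("69,8", "70,3", "69,0"),
  usd   = c("1,3705", "1,4708", "1,3948"),
  stringsAsFactors = TRUE
)

# Hypothetical helper: drop the thousands separators (dots), then turn the
# decimal comma into a decimal point before coercing to numeric.
to_number <- function(x) {
  x <- as.character(x)
  x <- gsub(".", "", x, fixed = TRUE)
  as.numeric(sub(",", ".", x, fixed = TRUE))
}

# alle[] keeps the data frame structure while replacing every column.
alle[] <- lapply(alle, to_number)
str(alle)

# With numeric columns, lm() no longer warns about a factor response.
reg1 <- lm(tot ~ demo, data = alle)
```

Once the columns are numeric, plot(tot ~ demo, data = alle) followed by abline(reg1) draws the scatter plot with its regression line.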
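With dplyr (this converts only besch and usd; demo and tot stay character and keep their dots, as the output shows):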
library(dplyr)
alle %>%
mutate_all(as.character) %>%
mutate_at(c("besch","usd"),function(x) as.numeric(as.character(gsub(",",".",x)))) ->alle
demo tot besch usd
1 498.300.775 4.846.423 69.8 1.3705
2 500.297.033 4.891.934 70.3 1.4708
3 502.090.235 4.901.358 69.0 1.3948
4 503.170.618 4.906.313 68.6 1.3257
5 502.964.837 4.871.049 68.6 1.3920
6 504.047.964 5.010.371 68.4 1.2848

"Number of observations <= number of random effects" error

I am using a package called diagmeta for meta-analysis purposes. I can use this package with a built in data set called Schneider2017. However when I make my own database/data set I get the following error:
Error: number of observations (=300) <= number of random effects (=3074) for term (Group * Cutoff | Study); the random-effects parameters and the residual variance (or scale parameter) are probably unidentifiable
Another thread here on SO suggests the error is caused by the data format of one or more columns. I have made sure every column's data type matches that in the Schneider2017 dataset - no effect.
Link to the other thread
I have tried extracting all of the data from the Schneider2017 dataset into excel and then importing a dataset from Excel through R studio. This again makes no difference. This suggests to me that something in the data format could be different, although I can't see how.
diag2 <- diagmeta(tpos, fpos, tneg, fneg, cutpoint,
studlab = paste(author,year,group),
data = SRschneider,
model = "DIDS", log.cutoff = FALSE,
check.nobs.vs.nRE = "ignore")
The dataset looks like this:
I expected the same successful execution and plotting as with the built-in data set, but keep getting this error.
Result from doing str(mydataset):
> str(SRschneider)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 150 obs. of 10 variables:
$ ...1 : num 1 2 3 4 5 6 7 8 9 10 ...
$ study_id: num 1 1 1 1 1 1 1 1 1 1 ...
$ author : chr "Arora" "Arora" "Arora" "Arora" ...
$ year : num 2006 2006 2006 2006 2006 ...
$ group : chr NA NA NA NA ...
$ cutpoint: chr "6" "7.0" "8.0" "9.0" ...
$ tpos : num 133 131 130 127 119 115 113 110 102 98 ...
$ fneg : num 5 7 8 11 19 23 25 28 36 40 ...
$ fpos : num 34 33 31 30 28 26 25 21 19 19 ...
$ tneg : num 0 1 3 4 6 8 9 13 15 15 ...
Just a quick follow-up on Ben's detailed answer.
The statistical method implemented in diagmeta() expects that argument cutpoint is a continuous variable. We added a corresponding check for argument cutpoint (as well as arguments TP, FP, TN, and FN) in version 0.3-1 of R package diagmeta; see commit in GitHub repository for technical details.
Accordingly, the following R commands will result in a more informative error message:
data(Schneider2017)
diagmeta(tpos, fpos, tneg, fneg, as.character(cutpoint),
studlab = paste(author, year, group), data = Schneider2017)
You said that you
have made sure every column's data type matches that in the Schneider2017 dataset
but that doesn't seem to be true. Besides differences between num (numeric) and int (integer) types (which actually aren't typically important), your data has
$ cutpoint: chr "6" "7.0" "8.0" "9.0" ...
while str(Schneider2017) has
$ cutpoint: num 6 7 8 9 10 11 12 13 14 15 ...
Having your cutpoint be a character rather than numeric means that R will try to treat it as a categorical variable (with many discrete levels). This is very likely the source of your problem.
The cutpoint variable is likely a character because R encountered some value in this column that can't be interpreted as numeric (something as simple as a typographic error). You can use SRschneider$cutpoint <- as.numeric(SRschneider$cutpoint) to convert the variable to numeric by brute force (values that can't be interpreted will be set to NA), but it would be better to go upstream and see where the problem is.
If you use tidyverse packages to load your data you should get a list of "parsing problems" that may be useful. You can also try cp <- SRschneider$cutpoint; cp[which(is.na(as.numeric(cp)))] to look at the values that can't be converted.
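The check for unconvertible values can be sketched as follows. The vector cp below is invented for illustration (it is not the asker's data) and includes a decimal comma and a text entry as examples of values that as.numeric() cannot parse.

```r
# Hypothetical cutpoint column read in as character.
cp <- c("6", "7.0", "8.0", "9,0", "n/a", NA)

# suppressWarnings() hides the "NAs introduced by coercion" warning.
converted <- suppressWarnings(as.numeric(cp))

# Flag values that were present but could not be converted;
# entries that were already NA are not counted as problems.
bad <- !is.na(cp) & is.na(converted)

print(which(bad))  # positions of the problem values
print(cp[bad])     # the problem values themselves
```

Fixing the flagged entries at the source, rather than letting them become NA, keeps the cutpoints that the model needs.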

Convert delimited string to numeric vector in dataframe

This is such a basic question, I'm embarrassed to ask.
Let's say I have a dataframe full of columns which contain data of the following form:
test <-"3000,9843,9291,2161,3458,2347,22925,55836,2890,2824,2848,2805,2808,2775,2760,2706,2727,2688,2727,2658,2654,2588"
I want to convert this to a numeric vector, which I have done like so:
test <- as.numeric(unlist(strsplit(test, split=",")))
I now want to convert a large dataframe containing a column full of this data into a numeric vector equivalent:
mutate(data,
converted = as.numeric(unlist(strsplit(badColumn, split=","))),
)
This doesn't work because presumably it's converting the entire column into a numeric vector and then replacing a single row with that value:
Error in mutate_impl(.data, dots) : Column converted must be
length 20 (the number of rows) or one, not 1274
How do I do this?
Here's some sample data that reproduces your error:
data <- data.frame(a = 1:3,
badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
stringsAsFactors = FALSE)
Here's the error:
library(tidyverse)
mutate(data, converted = as.numeric(unlist(strsplit(badColumn, split=","))))
# Error in mutate_impl(.data, dots) :
# Column `converted` must be length 3 (the number of rows) or one, not 18
A straightforward way would be to just use strsplit on the entire column, and lapply ... as.numeric to convert the resulting list values from character vectors to numeric vectors.
x <- mutate(data, converted = lapply(strsplit(badColumn, ",", TRUE), as.numeric))
str(x)
# 'data.frame': 3 obs. of 3 variables:
# $ a : int 1 2 3
# $ badColumn: chr "10,20,30,40,50" "1,2,3,4,5,6" "9,8,7,6,5,4,3"
# $ converted:List of 3
# ..$ : num 10 20 30 40 50
# ..$ : num 1 2 3 4 5 6
# ..$ : num 9 8 7 6 5 4 3
This might help:
library(purrr)
mutate(data, converted = map(badColumn, function(txt) as.numeric(unlist(strsplit(txt, split = ",")))))
What you get is a list column which contains the numeric vectors.
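Base R, for the single string:
A <- as.numeric(strsplit(test, ",")[[1]])
A
[1] 3000 9843 9291 2161 3458 2347 22925 55836 2890 2824 2848 2805 2808 2775 2760 2706 2727 2688 2727 2658 2654 2588
For a data frame column (here df$NEw holds the delimited strings), apply the same split to each element:
df$NEw2 <- lapply(df$NEw, function(x) as.numeric(strsplit(x, ",")[[1]]))
With dplyr, add rowwise() so that the split is done per row rather than on the first row only:
df %>% rowwise() %>% mutate(NEw2 = list(as.numeric(strsplit(NEw, ",")[[1]])))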
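A base R sketch of the list-column approach, reusing the sample data from the answer above. The final step, reshaping to one row per value, is an optional extra that is not part of the original answers; the name long is made up here.

```r
# Sample data from the answer above.
data <- data.frame(
  a = 1:3,
  badColumn = c("10,20,30,40,50", "1,2,3,4,5,6", "9,8,7,6,5,4,3"),
  stringsAsFactors = FALSE
)

# strsplit() returns one character vector per row; lapply() converts each
# of them to numeric, giving a list column with one vector per row.
data$converted <- lapply(strsplit(data$badColumn, ",", fixed = TRUE),
                         as.numeric)

# The vectors can have different lengths, so work on them element-wise.
n_values <- lengths(data$converted)
row_sums <- sapply(data$converted, sum)

# Optional: one row per value, repeating the id column as needed.
long <- data.frame(
  a     = rep(data$a, n_values),
  value = unlist(data$converted)
)

print(n_values)
print(row_sums)
```

A list column keeps the original row count, which is why mutate() accepts it where the flat vector from unlist() was rejected.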

Choosing multiple columns and changing their classes using a lookup table in R?

Is it possible to use a lookup table to assign or change the classes of variables in a data frame in R? I have thousands of columns with messed-up classes in one data frame (my_df), and a list of what they should be in another data frame (my_lt). In pseudocode, I was thinking of something like: take my_lt$variable_name, grep() it against colnames(my_df), and pass the matching columns through as.numeric if my_lt$variable_class == "numeric", with some form of if..else. Any help would be much appreciated!
input - my data frame (my_df)
my_df = data.frame(q1_hight_1=c(12,31,22,12),q1_hight_2=c(24,54,23,32),q1_hight_3=c(34,23,65,34),q2_shoe_size_1=c(2,2,3,4),q2_shoe_size_2=c(4,3,3,4))
input - my lookup table (my_lt)
my_lt = data.frame(variable_name=c("hight","shoe_size"),variable_class=c("numeric","integer"))
desired output (when checking classes)
$q1_hight_1
[1] "numeric"
$q1_hight_2
[1] "numeric"
$q1_hight_3
[1] "numeric"
$q2_shoe_size_1
[1] "integer"
$q2_shoe_size_2
[1] "integer"
This does the trick, given that there's no trap in the names you give to your variables (I use a very naïve grep).
library(dplyr)
library(purrr)
map2(as.character(my_lt$variable_name),
as.character(my_lt$variable_class),
function(nam,cl){ map(grep(nam,names(my_df)),function(i){class(my_df[[i]]) <<- cl})})
str(my_df)
# 'data.frame': 4 obs. of 5 variables:
# $ q1_hight_1 : num 12 31 22 12
# $ q1_hight_2 : num 24 54 23 32
# $ q1_hight_3 : num 34 23 65 34
# $ q2_shoe_size_1: int 2 2 3 4
# $ q2_shoe_size_2: int 4 3 3 4
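The same lookup can be sketched in base R without the global assignment (<<-). This is an alternative to the purrr answer, not a restatement of it, and it assumes that every entry in variable_class has a matching as.<class>() function (as.numeric, as.integer, as.character and so on).

```r
my_df <- data.frame(
  q1_hight_1 = c(12, 31, 22, 12), q1_hight_2 = c(24, 54, 23, 32),
  q1_hight_3 = c(34, 23, 65, 34),
  q2_shoe_size_1 = c(2, 2, 3, 4), q2_shoe_size_2 = c(4, 3, 3, 4)
)
my_lt <- data.frame(
  variable_name  = c("hight", "shoe_size"),
  variable_class = c("numeric", "integer"),
  stringsAsFactors = FALSE
)

for (k in seq_len(nrow(my_lt))) {
  # Look up the converter by name, e.g. "integer" -> as.integer.
  convert <- match.fun(paste0("as.", my_lt$variable_class[k]))
  # Naive substring match on the column names, as in the answer above.
  cols <- grep(my_lt$variable_name[k], names(my_df), fixed = TRUE)
  my_df[cols] <- lapply(my_df[cols], convert)
}

classes <- sapply(my_df, class)
print(classes)
```

Assigning into my_df[cols] changes only the matched columns, so columns that are absent from the lookup table keep their class.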

Shiny reactive Unexpected Behavior

I'm trying to create a reactive function that looks up the indices corresponding to the user's inputs in a data frame referred to as df in the code below. Just to give you an idea, here's what the data frame df looks like:
'data.frame': 87 obs. of 6 variables:
$ Job : Factor w/ 66 levels "Applications Engineer",..: 61 14 23 31 22 15 57 26 30 13 ...
$ Company : Factor w/ 102 levels "A10 Networks",..: 95 50 83 71 80 60 20 7 30 51 ...
$ Location: Factor w/ 64 levels "Ayr","Bangalore",..: 36 22 19 29 59 7 7 55 53 63 ...
$ Posted : num 2 3 2 3 1 1 2 5 4 1 ...
$ Source : Factor w/ 2 levels "Glassdoor","Indeed": 2 2 2 2 2 2 2 2 2 2 ...
$ url : chr "http://ca.indeed.com/rc/clk?jk=71f1abcd100850c6" "http://ca.indeed.com/rc/clk?jk=504724a4d74674fe" "http://ca.indeed.com/rc/clk?jk=d2e78fb67e8c86d6" "http://ca.indeed.com/rc/clk?jk=df790aa5fc7bdc3c" ...
The reactive function mostly uses the grep function to do a text search and find the respective indices. Here's the relevant chunk of the code from server.R:
#Create a reactive function to look up the indices corresponding to the inputs
index <- reactive({
ind.j <- if(input$j=='') NULL else grep(input$j,df[,'Job'],ignore.case = T)
ind.c <- {tmp<-lapply(input$c, function(x) {which(df[,'Company']==x)}); Reduce(union,tmp)}
ind.l <- if(input$l=='') NULL else grep(input$l,df[,'Location'],ignore.case = T)
ind.d <- which(df[,'Posted']<=input$d)
ind.s <- {tmp<-lapply(input$s, function(x) {which(df[,'Source']==x)}); Reduce(union,tmp)}
ind.all <- list(ind.j,ind.c,ind.l,ind.d,ind.s)
ind <- if(is.null(ind.s)) NULL else {ind.null<- which(lapply(ind.all,is.null)==TRUE) ;Reduce(intersect,ind.all[-ind.null])}
})
I have printed the results of ind.j, ind.c, ind.l, ind.d, ind.s, and ind.all to the console and they all produce the right results. However, when I test the result of ind it's not quite what I expect, so I'm wondering whether it's the reactivity or that line of code that doesn't work.
What ind is intended to do is take the list of all the looked-up indices, stored in ind.all, and apply the intersect function successively to find the elements common to all the sublists in ind.all.
The index function works fine for individual filters. However, when I enter values for all the filters, the function does not update to the correct list of indices as expected.
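This question has been answered in this post by jdharrison. I'm going to reiterate his answer here:
The problem you have is with the which function. When none of the elements of ind.all is NULL, which() returns integer(0), and indexing with -integer(0) drops every element of the list instead of none: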
> which(rep(FALSE, 5))
integer(0)
You can change:
ind <- if(is.null(ind.s)){
NULL
}else{
ind.null<- which(lapply(ind.all,is.null)==TRUE)
Reduce(intersect,ind.all[-ind.null])
}
to
ind <- if(is.null(ind.s)){
NULL
}else{
Reduce(intersect,ind.all[!sapply(ind.all,is.null)])
}
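The difference between the two versions can be reproduced outside Shiny. In this sketch the index vectors are invented, and every filter is assumed to have returned a result, which is the case that fails:

```r
# Hypothetical index vectors, one per filter; none of them is NULL.
ind.all <- list(c(1, 2, 3), c(2, 3, 4), c(3, 4, 5))

# Original approach: which() finds no NULL entries and returns integer(0).
ind.null <- which(sapply(ind.all, is.null))

# -integer(0) is still integer(0), so this selects nothing and
# Reduce() on an empty list returns NULL.
broken <- Reduce(intersect, ind.all[-ind.null])

# Logical indexing keeps every non-NULL entry, including when all are non-NULL.
fixed <- Reduce(intersect, ind.all[!sapply(ind.all, is.null)])

print(broken)
print(fixed)
```

This matches the reported symptom: each filter works on its own (some entry is NULL, so which() is non-empty), but filling in every filter empties the list before the intersection is taken.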
