Basic question, but I'm a beginner, sorry :-) And I still struggle with all these different data types etc. So I have a table with different variable names in column 1. In column 2 these variables have certain values. Now I want to extract the value for a certain variable.
VarNames<-read.table(paste("O:/Daten/RatsDaten/CodesandDescription/VarNamesDir.asc"), sep="", skip=0,header=FALSE)
And the table looks something like this:
Test1 5
Test2 7
Test3 1
So how do I access these Test variable values by their names? VarNames["Test1",2] didn't work, and neither did any other option I've tried. Are there better data type options for this, or how would I do it comfortably with a data frame?
You should be in one of these two situations. Either
Testxx are row names of VarNames (you can check this with rownames(VarNames)), and in this case you should do:
VarNames["Test1",1]
Or Testxx are the values of a column, and you should do something like this (V1 is the default name read.table gives the first column when header=FALSE):
VarNames[VarNames$V1 == 'Test1',2]
For the first option :
m <- matrix(1:3,ncol=1,dimnames=list(paste0('Test',1:3),NULL))
m['Test1',]
Test1
1
for the second option
m1 <- data.frame(v=paste0('Test',1:3),b=1:3)
m1[m1$v=='Test1',]
v b
1 Test1 1
As your example is not reproducible, it is unclear whether the first column denotes row names or a variable with values TestX.
In case it is a variable, your table actually looks like this:
V1 V2
Test1 5
Test2 7
Test3 1
So you can get value of Test2 by calling VarNames[VarNames$V1 == "Test2",] for the whole row or VarNames[VarNames$V1 == "Test2",2] for the value only. You specify 2 since it is the second column.
If the first column denotes row names, the call is VarNames["Test2",] for the whole row, or as @agstudy answered, VarNames["Test2",1] for the value alone. You specify 1 because, when Test2 is a row name, it is not stored in a column, so the values sit in the first column.
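To make both cases concrete, here is a minimal sketch that rebuilds the sample table from inline text. The text= input and the names by_col, by_row and lookup are assumptions for illustration, not from the question. It also shows a named vector, which may be a more comfortable structure when all you need is a name-to-value lookup:

```r
# Sample data from the question, supplied inline instead of the .asc file
txt <- "Test1 5\nTest2 7\nTest3 1"

# Case 1: the names are an ordinary column (read.table calls it V1)
by_col <- read.table(text = txt, header = FALSE, stringsAsFactors = FALSE)
val_col <- by_col[by_col$V1 == "Test2", 2]

# Case 2: the names are row names (row.names = 1 takes them from column 1)
by_row <- read.table(text = txt, header = FALSE, row.names = 1)
val_row <- by_row["Test2", 1]

# Alternative: a named vector, indexed directly by name
lookup <- setNames(by_col$V2, by_col$V1)
val_vec <- lookup[["Test2"]]
```

All three lookups should return the value stored next to Test2 in the sample table.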
I would like to use table() in R to count how many times a value occurs in a cell. But some cells contain several values separated by colons. An example is below:
example <- data.frame(c("A","B","A:::B"))
table(example)
the result is:
A A:::B B
1 1 1
but i want something like this
A B
2 2
I tried duplicating the rows that have this characteristic, but the dataset is already very large, and duplicating rows makes it impossible to use. How can I do this?
thanks
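We can split the column values on ::: and tabulate the pieces (as.character() guards against the column being a factor, which was the data.frame default before R 4.0):
table(unlist(strsplit(as.character(example[[1]]), ":+")))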
# A B
# 2 2
Assume I have an original dataset containing a complete set of "texts" (a string variable), and a second dataset that only contains those "texts" for which the new variable "value" takes a certain value (0, 1, or NA).
Now I would like to merge them back together so that the resulting dataset contains the full range of "texts" from the first dataset but also includes "value" which should be 0 if coded 0 and/or only present in the original dataset.
dat1<-data.frame(text=c("a","b","c","d","e","f","g","h")) # original dataset
dat2<-data.frame(text=c("e","f","g","h"), value=c(0,NA,1,1)) # second version
The final dataset should look like this:
> dat3
text value
1 a 0
2 b 0
3 c 0
4 d 0
5 e 0
6 f NA
7 g 1
8 h 1
However, what Base-R's merge() does is to introduce NAs where I want 0s instead:
dat3<-merge(dat1, dat2, by=c("text"), all=T)
Is there a way to define a default input for when the variable by which datasets are merged is only present in one but not the other dataset? In other words, how can I define 0 as standard input value instead of NA?
I am aware of the fact that I could temporarily change the coded NAs in the second dataset to something else to distinguish later on between "real" NAs and NAs that just get introduced, but I would really like to refrain from doing so, if there's another, cleaner way. Ideally, I would like to use merge() or plyr::join() for that purpose but couldn't find anything in the manual(s).
I know that this is not ideal either, but something to consider:
library(dplyr)
dat3 <- dplyr::left_join(dat1, dat2, by = "text")
# set value to 0 only for texts that never appear in dat2; NAs coded in dat2 are kept
dat3[!dat3$text %in% dat2$text, "value"] <- 0
Or wrap it in a function so the call is a one-liner:
merge_NA <- function(dat1, dat2) {
  dat3 <- dplyr::left_join(dat1, dat2, by = "text")
  dat3[!dat3$text %in% dat2$text, "value"] <- 0
  dat3
}
Now you only call:
merge_NA(dat1,dat2)
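If you would rather stay in base R, one possible sketch uses match() to tell "text missing from dat2" apart from "value coded as NA in dat2". The helper name idx is an assumption for illustration, and this is only one way to do it, not a built-in option of merge():

```r
dat1 <- data.frame(text = c("a", "b", "c", "d", "e", "f", "g", "h"))
dat2 <- data.frame(text = c("e", "f", "g", "h"), value = c(0, NA, 1, 1))

# Position of each dat1 text in dat2; NA means the text is absent from dat2
idx <- match(dat1$text, dat2$text)

dat3 <- dat1
# Absent texts get the default 0; matched texts keep their coded value,
# including a genuine NA such as the one for "f"
dat3$value <- ifelse(is.na(idx), 0, dat2$value[idx])
```

Because the default is decided from idx rather than from the merged column, the coded NAs never need to be temporarily recoded.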
As below, dataframe factorizedss is the factorized version of a sourcedata dataframe ss.
ss <- data.frame(c('a','b','a'), c(1,2,1)); #There are string columns and number columns.
#So, I factorized them as below.
factorizedss <- data.frame(lapply(ss, as.factor)); #factorized version
indices <- data.frame(c(1,1,2,2), c(1,1,1,2)); #Now, given integer indices
Given indices, and using factorizedss, is it possible to get the corresponding elements of the source data frame, as below? (The purpose is to access data frame elements by the integer code of their factor level.)
a 1
a 1
b 1
b 2
You can access the first column like this
factorizedss[indices[,1],][,1]
and the second in a similar way
factorizedss[indices[,2],][,2]
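It gets more difficult when trying to combine them; you might have to convert them back to native types (go through as.character() first, because as.numeric() on a factor returns the level codes rather than the original numbers):
cbind(as.character(factorizedss[indices[,1],][,1]), as.numeric(as.character(factorizedss[indices[,2],][,2])))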
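One caveat: the row-based indexing above picks rows, not factor levels, and it only gives the expected result here because rows 1 and 2 of the sample happen to hold levels 1 and 2. A hedged sketch that looks the values up through levels() instead is below. The column names x, y, i and j are assumptions added for readability, since the original data frames are unnamed:

```r
ss <- data.frame(x = c("a", "b", "a"), y = c(1, 2, 1))
factorizedss <- data.frame(lapply(ss, as.factor))
indices <- data.frame(i = c(1, 1, 2, 2), j = c(1, 1, 1, 2))

# levels() returns the level labels in code order, so indexing the labels
# with the integer codes recovers the source values
restored <- data.frame(
  x = levels(factorizedss$x)[indices$i],
  y = as.numeric(levels(factorizedss$y)[indices$j])
)
```

This keeps working when the data are reordered, because it never depends on which row a level first appears in.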
I have a data frame with 2 columns that contain two different types of text.
The first column contains codes that are strings in the form DD-HI-HO (DD being the code).
Column 2 is free text which anyone can enter.
I am trying to populate a third column based on three rules, using the logic below, to give a single vector column of 1 or 0.
I don't seem to be able to update a vector column to incorporate all three rules. Below is pseudo code.
Basic info:
Codes is a vector (basically a reference table with one column)
Fuzzy is a vector (basically another reference table with one column)
#----CHECK SEQUENCES----
# Check if code is applied in column 1
Data$Has.Code <- grepl(pattern = "(HC|HD|HE|HK|HM|HH|HY|HL)", Data.Raw$Col1)
# Check if string contains relevant text in col 2
Data$Has.DG <- if(length(intersect(Codes, Data$Contents)) > 0) {1}
# Check how closely Strings are related. Take the highest match If its over 45% then set flag as 1
levenshteinSim(Fuzzy ,Data$Contents)
-------Added table with sample data
Record, Col1, Col2, Col3
1, HC-IE, Ice-cream, 1
2, IE-GB, Volvo, 0
3, IE-DE, Iced_Lollipop, 1
Record 1,
Rule number 1 would catch "HC" in Col1 and so set Col 3 to 1 (boolean)
Rule number 2 would also catch something in Col2 for record 1, as the vector Codes contains "Ice" as an element. It wouldn't execute in any case because rule one supersedes it.
Record 2
None of the rules would return anything for the second item so col 3 is set to 0
Record 3
A bit of a daft example, but the Levenshtein similarity between Col 2 and one of the elements in the vector Fuzzy comes out at 75%. This is above our stated threshold, so col 3 is set to 1.
Can anyone help
Thank you for your help
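One way the three rules could be combined is to compute each as a logical vector and OR them together. This is a rough sketch only. The contents of Codes and Fuzzy below are made up for illustration, and the similarity is computed with base R's adist() as 1 - distance / longer string length, which is an assumption standing in for levenshteinSim():

```r
# Sample rows from the table above; Codes and Fuzzy are hypothetical
Data <- data.frame(
  Col1 = c("HC-IE", "IE-GB", "IE-DE"),
  Col2 = c("Ice-cream", "Volvo", "Iced_Lollipop"),
  stringsAsFactors = FALSE
)
Codes <- c("Ice")
Fuzzy <- c("Iced_Lollipops")

# Rule 1: a relevant code appears in Col1
has_code <- grepl("(HC|HD|HE|HK|HM|HH|HY|HL)", Data$Col1)

# Rule 2: Col2 contains any element of Codes as a literal substring
has_dg <- Reduce(`|`, lapply(Codes, grepl, x = Data$Col2, fixed = TRUE))

# Rule 3: best similarity against any Fuzzy element exceeds 45%
d <- utils::adist(Data$Col2, Fuzzy)                  # rows: Col2, cols: Fuzzy
len <- outer(nchar(Data$Col2), nchar(Fuzzy), pmax)   # longer length per pair
has_fuzzy <- apply(1 - d / len, 1, max) > 0.45

# A row is flagged when at least one rule fires
Data$Col3 <- as.integer(has_code | has_dg | has_fuzzy)
```

Since the flag is a plain OR, the order in which the rules "supersede" each other does not change the result; it would only matter if you wanted to skip the more expensive checks for rows already flagged.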
I need to subset my data frame, but I do not know what condition to use.
df2<-subset(df, condition )
A part of the dataframe, `df`:
state value
a 1
b 2
c 3
a 1
b 4
c 5
I count the sum of the value column for each state using : table(df$state)
I need to create a data frame where I show just the rows where the sum of the value column is bigger than a given value x.
If x is 3, I need to have in the new data frame just the rows that have the "state" column equal to b or c.
What should I replace "condition" with? How can I use : table(df$state) in the condition?
It is not clear what you are trying to do.
table(df$state) counts the occurrences of each state in your data, not the sum of the variable "value" for each "state". You should instead use something like this:
vv <- tapply(df$value,df$state,sum)
vv
a b c
2 6 8
Now you can use the result within subset to keep the states whose summed value is bigger than a given value x. For example, with x = 3:
subset(df,state %in% names(vv)[vv>3])
or without using subset(), with plain indexing (which the subset() help page recommends for programming):
df[df$state %in% names(vv)[vv>3],]
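As a possible alternative, ave() can attach the per-state sum to every row, so the filter becomes a single comparison. The names x, state_sum and df2 below are assumptions for illustration:

```r
df <- data.frame(
  state = c("a", "b", "c", "a", "b", "c"),
  value = c(1, 2, 3, 1, 4, 5)
)
x <- 3

# ave() returns a vector as long as df, holding each row's group sum
state_sum <- ave(df$value, df$state, FUN = sum)

# Keep the rows whose state has a total above x
df2 <- df[state_sum > x, ]
```

With the sample data this keeps the b and c rows, matching the result described in the question.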