How to retrieve columns in sequence from SQLite using R

I use sensor data that provide wavelengths from 100 to 300 nm in steps of 1, i.e. 100, 101, 102, 103, ..., 300. I keep the data in SQLite using R, in a table named data:
> data
obs 100 101 102 103 104 ... 300
1   0.1 0.1 0.9 0.1 0.2 ... 0.5
2   0.8 1.0 0.9 0.0 1.0 ... 0.4
3   0.7 0.8 0.3 0.8 0.5 ... 0.2
4   0.7 0.1 0.2 0.4 0.7 ... 0.6
5   0.9 0.4 0.6 0.6 0.6 ... 0.4
6   0.7 0.1 0.6 0.7 0.9 ... 0.9
I want to retrieve every fourth column, starting at 100, i.e. 100, 104, 108, ...
I tried sqldf("select 100, 104, 108, ... from data"), but this does not seem efficient. Can someone help me do this in R? Thanks!

You can use paste() inside of sqldf to make things like this easier. So the basic idea would be:
sqldf(paste("select",
paste0("`",seq(100,300,4),"`",collapse=", "),
"from data"))
Columns or tables with numeric names typically need to be surrounded with backticks. That is why I've adjusted the statement to select `100` instead of 100.
The full statement generated by the code above looks like this:
[1] "select `100`, `104`, `108`, `112`, `116`, `120`, `124`, `128`, `132`, `136`,
`140`, `144`, `148`, `152`, `156`, `160`, `164`, `168`, `172`, `176`,
`180`, `184`, `188`, `192`, `196`, `200`, `204`, `208`, `212`, `216`,
`220`, `224`, `228`, `232`, `236`, `240`, `244`, `248`, `252`, `256`,
`260`, `264`, `268`, `272`, `276`, `280`, `284`, `288`, `292`, `296`,
`300` from data"
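To see what the paste() call builds without scrolling through the full statement, here is a minimal sketch on a shorter range. The range 100 to 112 is only an assumption to keep the output small, and sqldf is not called here, so nothing needs to be installed:

```r
# Hypothetical shorter range, used only to keep the statement readable
cols <- seq(100, 112, by = 4)

# Wrap every column name in backticks so SQLite treats it as an identifier
quoted <- paste0("`", cols, "`", collapse = ", ")

# Assemble the statement; the table name "data" comes from the question
sql <- paste("select", quoted, "from data")
print(sql)
# [1] "select `100`, `104`, `108`, `112` from data"
```

The resulting string could then be passed to sqldf(sql) in the same way as above.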

sqldf loads the gsubfn package which provides fn$ for dealing with string interpolation. fn$ can preface any function invocation so, for example, use fn$sqldf("... $var ...") and then $var is replaced with its value.
Note that select 100 selects the number 100 and not the column named 100 so we use select [100] instead.
cn <- toString(sprintf("[%d]", seq(100, 300, 4))) # "[100], [104], ..."
fn$sqldf("select $cn from data")
or if we want to create the SQL statement in a variable and then run it:
sql <- fn$identity("select $cn from data")
sqldf(sql)
Note that this is pretty easy to do in straight R as well:
data[paste(seq(100, 300, 4))]
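As a rough check of the plain R approach, here is a small sketch on a toy data frame. The toy object and its values are made up for illustration and only cover the columns 100 to 112:

```r
# Toy data frame with columns named "100" ... "112" (values are placeholders)
wavelengths <- 100:112
toy <- setNames(as.data.frame(matrix(seq_along(wavelengths), nrow = 1)),
                as.character(wavelengths))

# Column names are character strings, so the numeric sequence is converted first
keep <- as.character(seq(100, 112, by = 4))
every_fourth <- toy[keep]
print(names(every_fourth))
# [1] "100" "104" "108" "112"
```

Selecting by name rather than by position means an extra leading column such as obs does not shift the result.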

Related

How to replace a specific character string with a number?

I'm working with a dataframe, entitled Clutch, of information about cards in a trading card game. One of the variables, CMD+, can consist of the following values:
"R+1"
"L+1"
"R+2"
"L+2"
0
What I want to do is to create a new variable, Clutch$C+, that takes these string values for each data point and replaces them with numbers. R+1 and L+1 are replaced with 0.5, and R+2 and L+2 are replaced with 1. 0 is unchanged.
How do I do this? Sorry if this is a basic question, my R skills aren't great at the minute, working on getting better.
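Probably not the most beautiful solution, but this should work. Because the column names contain a +, they have to be wrapped in backticks:
Clutch$`C+` <- 0 # default value, so rows where CMD+ is 0 stay 0
Clutch$`C+`[which(Clutch$`CMD+` == "R+1")] <- 0.5
Clutch$`C+`[which(Clutch$`CMD+` == "L+1")] <- 0.5
Clutch$`C+`[which(Clutch$`CMD+` == "R+2")] <- 1
Clutch$`C+`[which(Clutch$`CMD+` == "L+2")] <- 1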
You can try:
paste0(as.numeric(gsub("\\D", "\\1", x))/2, sub("\\D", "\\1", x))
[1] "0.5+1" "0.5+1" "1+2" "1+2"
Here is one way, using the fact that the result is half the digit in your string:
Clutch <- data.frame(`CMD+` = sample(c("R+1", "L+1", "R+2", "L+2", 0), 10, replace = TRUE))
Clutch[["C+"]] <- as.numeric(gsub("[^0-9]", "", Clutch$CMD))/2
Clutch
> Clutch
CMD. C+
1 R+1 0.5
2 R+2 1.0
3 R+1 0.5
4 L+1 0.5
5 L+1 0.5
6 R+1 0.5
7 R+1 0.5
8 L+1 0.5
9 0 0.0
10 L+1 0.5
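Note that in the output above the column is printed as CMD. because data.frame() rewrites non-syntactic names by default. A minimal sketch that keeps the literal names CMD+ and C+, assuming a fixed input instead of a random sample so the result is reproducible:

```r
# check.names = FALSE keeps the column name "CMD+" exactly as written
Clutch <- data.frame(`CMD+` = c("R+1", "L+1", "R+2", "L+2", "0"),
                     check.names = FALSE, stringsAsFactors = FALSE)

# Strip everything that is not a digit, then halve the remaining number
Clutch[["C+"]] <- as.numeric(gsub("[^0-9]", "", Clutch[["CMD+"]])) / 2
print(Clutch)
```

With [[ ]] the names are passed as plain strings, so no backticks are needed.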
You can simply use gsub, where a is the vector of CMD+ values:
> as.numeric(gsub(".*[+]","",a))/2
[1] 0.5 0.5 1.0 1.0 0.0
If it is a data frame, you can use this:
> library(data.table)
> dt <- data.frame(CMD = c("R+1", "L+1", "R+2", "L+2", 0))
> setDT(dt)[,CMD:=as.numeric(gsub(".*[+]","",CMD))/2]
> dt
CMD
1: 0.5
2: 0.5
3: 1.0
4: 1.0
5: 0.0
Another idea is to use a simple ifelse statement that looks for 1 in the string and replaces with 0.5, and 2 to replace with 1, i.e.
#where x is your column,
as.numeric(ifelse(grepl('1', x), 0.5, ifelse(grepl('2', x), 1, x)))
#[1] 0.5 0.5 1.0 1.0 0.0

How to combine matrices by rowname and insert empty space in non-matching elements in R?

I want to combine two matrices with partly overlapping rownames in R. When the rownames match, values from the two matrices should end up as adjacent columns. When the rownames only occur in one matrix, empty space should be inserted for the other matrix.
Data set:
testm1 <- cbind("est"=c(1.5,1.2,0.7,4.0), "lci"=c(1.1,0.9,0.5,0.9), "hci"=c(2.0,1.7,0.8,9.0))
rownames(testm1) <- c("BadFood","NoActivity","NoSunlight","NoWater")
testm1 #Factors associated with becoming sick
testm2 <- cbind("est"=c(3.0,2.0,0.9,7.0), "lci"=c(1.3,1.2,0.2,2.0), "hci"=c(5.0,3.1,1.7,9.0))
rownames(testm2) <- c("BadFood","NoActivity","Genetics","Age")
testm2 #Factors associated with dying
Desired output:
                Sick            Dying
            est lci hci     est lci hci
BadFood     1.5 1.1 2.0     3.0 1.3 5.0
NoActivity  1.2 0.9 1.7     2.0 1.2 3.1
NoSunlight  0.7 0.5 0.8     -   -   -
NoWater     4.0 0.9 9.0     -   -   -
Genetics    -   -   -       0.9 0.2 1.7
Age         -   -   -       7.0 2.0 9.0
Is there a simple way to do this that would work for all matrices?
Here is a base R method that keeps everything in matrix form:
# get rownames of new matrix
newNames <- union(rownames(testm1), rownames(testm2))
# construct new matrix
newMat <- matrix(NA, length(newNames), 2*ncol(testm2),
dimnames=list(c(newNames), rep(colnames(testm1), 2)))
# fill in new matrix
newMat[match(rownames(testm1), newNames), 1:ncol(testm1)] <- testm1
newMat[match(rownames(testm2), newNames), (ncol(testm1)+1):ncol(newMat)] <- testm2
In the final two lines, match is used to find the proper row indices by row name.
This returns
newMat
est lci hci est lci hci
BadFood 1.5 1.1 2.0 3.0 1.3 5.0
NoActivity 1.2 0.9 1.7 2.0 1.2 3.1
NoSunlight 0.7 0.5 0.8 NA NA NA
NoWater 4.0 0.9 9.0 NA NA NA
Genetics NA NA NA 0.9 0.2 1.7
Age NA NA NA 7.0 2.0 9.0
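Since the question asks for something that works for any pair of matrices, the same steps could be wrapped in a function. This is only a sketch: the name combine_by_rowname is made up here, and it assumes both inputs have unique row names. Printing with na.print = "-" gives the dashes from the desired output without changing the stored values:

```r
# Hypothetical helper: combine two matrices side by side, aligned by row name
combine_by_rowname <- function(m1, m2) {
  all_names <- union(rownames(m1), rownames(m2))
  out <- matrix(NA_real_, nrow = length(all_names), ncol = ncol(m1) + ncol(m2),
                dimnames = list(all_names, c(colnames(m1), colnames(m2))))
  # Row names are unique, so they can be used directly as row indices
  out[rownames(m1), seq_len(ncol(m1))] <- m1
  out[rownames(m2), ncol(m1) + seq_len(ncol(m2))] <- m2
  out
}

testm1 <- cbind(est = c(1.5, 1.2, 0.7, 4.0), lci = c(1.1, 0.9, 0.5, 0.9),
                hci = c(2.0, 1.7, 0.8, 9.0))
rownames(testm1) <- c("BadFood", "NoActivity", "NoSunlight", "NoWater")
testm2 <- cbind(est = c(3.0, 2.0, 0.9, 7.0), lci = c(1.3, 1.2, 0.2, 2.0),
                hci = c(5.0, 3.1, 1.7, 9.0))
rownames(testm2) <- c("BadFood", "NoActivity", "Genetics", "Age")

combined <- combine_by_rowname(testm1, testm2)
print(combined, na.print = "-")  # show missing cells as "-" when printing
```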
I think this does what you are after, though it's not that pretty and requires the data to be a data.frame rather than a matrix. Hope it helps at least!
(Code was adapted from this question and answer: https://stackoverflow.com/a/34530141/4651564)
library(dplyr)
dat1 <- as.data.frame(testm1)
dat2 <- as.data.frame(testm2)
full_join( dat1 %>% mutate(Symbol = rownames(dat1) ),
dat2 %>% mutate(Symbol = rownames(dat2) ),
by = 'Symbol')
You can do it using the merge() function.
First convert your test matrices to data frames, then use merge on the data frames, and finally convert the result back to a matrix (but do you necessarily need a matrix?).
Here's some example code:
testm1 <- as.data.frame(testm1)
testm2 <- as.data.frame(testm2)
# all = TRUE keeps the rows that appear in only one of the two data frames
result <- merge(testm1, testm2, by = "row.names", all = TRUE)
# move the merged Row.names column into the row names so the matrix stays numeric
rownames(result) <- result$Row.names
result <- as.matrix(result[, -1])
If you want to obtain a data frame, simply omit the last line of code. Hope this helps.

Creating a lookup based on two values

I have an Excel file that contains a matrix. Here is a screenshot of the matrix I want to use: https://www.flickr.com/photos/113328996#N07/23026818939/in/dateposted-public/
What I would like to do now is create some kind of lookup function. So when I have the row:
Arsenal - Aston Villa
It should look up 114.6.
Of course I could create rows with all distances like:
Arsenal - Aston Villa - 144.6
and perform a lookup on those, but my instincts tell me this is not the most efficient way.
Any feedback on how I can deal with the above most efficiently?
This lookup function is the basic [ operator for data.frames and matrices in R.
Take this example data:
a <- cbind(c(0.1,0.5,0.25),c(0.2,0.3,0.65),c(0.7,0.2,0.1))
rownames(a) <- c("Lilo","Chops","Henmans")
colnames(a) <- c("Product A","Product B","Product C")
a
Product A Product B Product C
Lilo 0.10 0.20 0.7
Chops 0.50 0.30 0.2
Henmans 0.25 0.65 0.1
The lookup function is this:
a["Lilo","Product A"] # 0.1
a["Henmans","Product B"] # 0.65
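If there are many pairs to look up at once, the same [ operator also accepts a two-column character matrix, one row per pair. A small sketch using the example matrix from this answer; the object names pairs and values are only illustrative:

```r
a <- cbind(c(0.1, 0.5, 0.25), c(0.2, 0.3, 0.65), c(0.7, 0.2, 0.1))
rownames(a) <- c("Lilo", "Chops", "Henmans")
colnames(a) <- c("Product A", "Product B", "Product C")

# Each row of 'pairs' is one lookup: first column = row name, second = column name
pairs <- cbind(c("Lilo", "Henmans"), c("Product A", "Product B"))
values <- a[pairs]
print(values)
# [1] 0.10 0.65
```

For the question's data the pairs would be the two team names, assuming the distance matrix has been read in with teams as both row and column names.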

How to assign weights to strings in WHERE clause?

The table is
id col1 col2
1 former good
2 future fair
3 now bad
4 former good
.............
GOAL: I need to SELECT only those rows that have a cumulative score higher than 0.8
1) If col1 = 'former' THEN the row gets 0.2 points, if 'now' THEN 0.7, if 'future' THEN 0.3
2) If col2 = 'good' THEN the row gets 0.8 points, if 'bad' THEN 0.1, if 'fair' THEN 0.5
Therefore I need to assign numeric values in the WHERE clause. I want to avoid changing values in the SELECT because the user needs to see the labels ('good', 'now', etc.), not numbers.
How can I do this?
SELECT *
FROM mytable
WHERE ?
Use a CASE to assign a weight based on your logic:
WHERE
CASE col1
WHEN 'former' THEN 0.2
WHEN 'now' THEN 0.7
WHEN 'future' THEN 0.3
ELSE 0
END +
CASE col2
WHEN 'good' THEN 0.8
WHEN 'bad' THEN 0.1
WHEN 'fair' THEN 0.5
ELSE 0
END > 0.8
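To sanity-check the weights outside the database, the same scoring can be sketched in R with named lookup vectors. This is not SQL and not part of the answer above, just a quick way to see which of the four sample rows pass; unknown labels get 0, mirroring the ELSE 0 branches:

```r
# The four sample rows from the question
mytable <- data.frame(id = 1:4,
                      col1 = c("former", "future", "now", "former"),
                      col2 = c("good", "fair", "bad", "good"),
                      stringsAsFactors = FALSE)

# Named vectors play the role of the two CASE expressions
w1 <- c(former = 0.2, now = 0.7, future = 0.3)
w2 <- c(good = 0.8, bad = 0.1, fair = 0.5)

s1 <- unname(w1[mytable$col1]); s1[is.na(s1)] <- 0  # ELSE 0
s2 <- unname(w2[mytable$col2]); s2[is.na(s2)] <- 0  # ELSE 0

# Keep the original labels; the score is only used for filtering
selected <- mytable[s1 + s2 > 0.8, ]
print(selected)
```

Only the former/good rows (ids 1 and 4) score above 0.8; the other two rows sum to 0.8 at most and are dropped by the strict comparison.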
SELECT * FROM myTable where col1 + col2 > 0.8
But provide us the real structure of the table.

How to summarize multiple files into one file based on an assigned rule?

I have ~100 files in the following format. Each file has its own file name, but all of them are saved in the same directory. Let's say filecd is as follows:
A B C D
ab 0.3 0.0 0.2 0.20
cd 0.7 0.0 0.3 0.77
ef 0.8 0.1 0.5 0.91
gh 0.3 0.5 0.6 0.78
fileabb is as follows:
A B C D
ab 0.3 0.9 1.0 0.20
gh 0.3 0.5 0.6 0.9
All these files have the same number of columns but different numbers of rows.
I want to summarize each file as one row (0 if all cells in a column are < 0.8; 1 if ANY of the cells in a column is larger than or equal to 0.8), and the summarized results should be saved in a separate csv file as follows:
A B C D
filecd 1 0 0 1
fileabb 0 1 1 1
..... till 100
Instead of reading and processing each file separately, can this be done efficiently in R? Could you help me with how to do so? Thanks.
For ease of discussion, I have added the following lines as sample input files:
file1 <- data.frame(A=c(0.3, 0.7, 0.8, 0.3), B=c(0,0,0.1,0.5), C=c(0.2,0.3,0.5,0.6), D=c(0.2,0.77,0.91, 0.78))
file2 <- data.frame(A=c(0.3, 0.3), B=c(0.9,0.5), C=c(1,0.6), D=c(0.2,0.9))
Please kindly give me some more advice. Many thanks.
First make a vector of all the filenames.
filenames <- dir(your_data_dir, full.names = TRUE) #full paths, so the files can be read from any working directory; you may also need the pattern argument
Then read the data into a list of matrices.
data_list <- lapply(filenames, function(fn) as.matrix(read.delim(fn)))
#maybe with other arguments passed to read.delim
Now calculate the summary.
summarised <- lapply(data_list, function(dfr)
{
# for each column: 1 if any value is >= 0.8, otherwise 0
apply(dfr, 2, function(col) as.integer(any(col >= 0.8)))
})
Convert this list into a matrix.
summary_matrix <- do.call(rbind, summarised)
Make the rownames match the file names (basename drops any directory part).
rownames(summary_matrix) <- basename(filenames)
Now write out to CSV.
write.csv(summary_matrix, "my_summary_matrix.csv")
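The summary step can be tried without any files on disk by using the two sample data frames from the question. A minimal sketch, assuming the data are already in a named list (the list names stand in for the file names):

```r
file1 <- data.frame(A = c(0.3, 0.7, 0.8, 0.3), B = c(0, 0, 0.1, 0.5),
                    C = c(0.2, 0.3, 0.5, 0.6), D = c(0.2, 0.77, 0.91, 0.78))
file2 <- data.frame(A = c(0.3, 0.3), B = c(0.9, 0.5),
                    C = c(1, 0.6), D = c(0.2, 0.9))

# Named list standing in for the files that would be read from the directory
data_list <- list(filecd = file1, fileabb = file2)

# One row per file: 1 if any value in the column is >= 0.8, otherwise 0
summary_matrix <- t(sapply(data_list, function(dfr) {
  as.integer(colSums(dfr >= 0.8) > 0)
}))
colnames(summary_matrix) <- names(file1)
print(summary_matrix)
#         A B C D
# filecd  1 0 0 1
# fileabb 0 1 1 1
```

This reproduces the table requested in the question; write.csv(summary_matrix, ...) would then save it as before.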
