How do I create SHA256 HMAC using Ironclad in Common Lisp? - common-lisp

There is a Python function I am trying to port to Common Lisp:
HEX(HMAC_SHA256(apiSecret, 'stupidstupid'))
How do I go about this with Ironclad?
The closest I've come is:
(ironclad:make-hmac apiSecret :sha256)
But it's not working; it signals a type error for apiSecret:
The value "V0mn1LLQIc6GNSiBpDfDmRo3Ji8leBZWqMIolNBsnaklScgI"
is not of type
(SIMPLE-ARRAY (UNSIGNED-BYTE 8) (*)).

Ironclad works internally with arrays of bytes, but it provides tools to convert ASCII strings to such arrays and byte arrays to hex strings. Here is an interactive session (note that I don't know much about crypto algorithms):
CL-USER> (in-package :ironclad)
#<PACKAGE "IRONCLAD">
Converting the secret:
CRYPTO> (ascii-string-to-byte-array "V0mn1LLQIc6GNSiBpDfDmRo3Ji8leBZWqMIolNBsnaklScgI")
#(86 48 109 110 49 76 76 81 73 99 54 71 78 83 105 66 112 68 102 68 109 82 111
51 74 105 56 108 101 66 90 87 113 77 73 111 108 78 66 115 110 97 107 108 83
99 103 73)
Building the HMAC from the previous value:
CRYPTO> (make-hmac * :sha256)
#<HMAC(SHA256) {1006214D93}>
Now, I am not sure this is what you want, but according to the documentation, you are supposed to update the hmac with one or more sequences:
CRYPTO> (update-hmac * (ascii-string-to-byte-array "stupidstupid"))
#<HMAC(SHA256) {1006214D93}>
... and then compute a digest:
CRYPTO> (hmac-digest *)
#(178 90 228 244 244 45 109 163 51 222 77 235 244 173 249 208 144 43 116 130
210 188 62 247 145 153 100 198 119 86 207 163)
The resulting array can be converted to a hex string:
CRYPTO> (byte-array-to-hex-string *)
"b25ae4f4f42d6da333de4debf4adf9d0902b7482d2bc3ef7919964c67756cfa3"
For completeness, here is how you could wrap those functions to replicate the original code, assuming you are in a package that imports the right symbols:
(defun hex (bytes)
  (byte-array-to-hex-string bytes))

(defun hmac_sha256 (secret text)
  (let ((hmac (make-hmac (ascii-string-to-byte-array secret) :sha256)))
    (update-hmac hmac (ascii-string-to-byte-array text))
    (hmac-digest hmac)))
Finally:
(HEX (HMAC_SHA256 "V0mn1LLQIc6GNSiBpDfDmRo3Ji8leBZWqMIolNBsnaklScgI"
                  "stupidstupid"))
=> "b25ae4f4f42d6da333de4debf4adf9d0902b7482d2bc3ef7919964c67756cfa3"
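As a cross-check when porting, the same computation can be sketched with Python's standard hmac and hashlib modules. The function name hmac_sha256_hex is made up for this illustration, and the sketch assumes, like the Lisp version above, that both the secret and the text are plain ASCII strings. If both ports are correct, the printed value should agree with the digest Ironclad returns.

```python
import hashlib
import hmac


def hmac_sha256_hex(secret: str, text: str) -> str:
    """Return the hex-encoded HMAC-SHA256 of `text` keyed by `secret`.

    Assumption: both arguments are ASCII, mirroring
    ascii-string-to-byte-array in the Lisp code.
    """
    key = secret.encode("ascii")       # string -> bytes, like the Lisp conversion
    message = text.encode("ascii")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Same inputs as the interactive session above.
    print(hmac_sha256_hex(
        "V0mn1LLQIc6GNSiBpDfDmRo3Ji8leBZWqMIolNBsnaklScgI",
        "stupidstupid",
    ))
```

The encode step plays the same role as the byte-array conversion that make-hmac was complaining about: HMAC is defined over bytes, not characters.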

Related

Placing Data into bins and averaging the values within those bins in r [duplicate]

I am very new to R but have been asked to use it by my professor to analyze our data. Currently, we are trying to conduct a changepoint analysis on a large set of data, which I know how to do. But we first want to place our data into time bins of 30 seconds. Our trials are 20 minutes in length, so we should have a total of 40 bins. We have columns for time, Flow, and MAP, and would like to average the values of Flow and MAP within each 30-second bin. This will condense 1120-2000 data points into a much cleaner 40 data points. We are having trouble binning the data and don't even know where to start. Once the data is binned, we would like to generate a table of those new 40 values (40 for MAP and 40 for Flow) so that we can use the changepoint package to find the changepoint in our set. We believe clip( could possibly be what we need.
Sorry if this is too confusing or too vague; we have no programming experience whatsoever.
Edit: I believe this is different from the bacteria question because I wanted a direct output into a table rather than interpolating from a graph and then into a table.
Here is a sample from our data:
RawMin Flow MAP
2.9982 51 77
3.0113 110 80
3.0240 84 77
3.0393 119 75
3.0551 93 75
3.0692 136 73
3.0839 81 73
3.0988 58 72
3.1138 125 71
3.1285 89 72
3.1432 160 73
3.1576 87 74
3.1714 128 74
3.1860 90 74
3.2015 63 76
3.2154 120 76
3.2293 65 76
3.2443 156 78
3.2585 66 78
3.2723 130 78
3.2876 89 77
3.3029 111 77
3.3171 90 75
3.3329 100 76
3.3482 127 76
3.3618 69 78
3.3751 155 78
3.3898 90 79
3.4041 127 80
3.4176 103 80
3.4325 87 79
3.4484 134 78
3.4637 57 77
3.4784 147 78
3.4937 75 78
3.5080 137 78
3.5203 123 78
3.5337 99 80
3.5476 170 80
3.5620 90 79
3.5756 164 78
3.5909 85 78
3.6061 164 77
3.6203 103 77
3.6348 140 79
3.6484 152 79
3.6611 79 80
3.6742 184 82
3.6872 128 81
3.7017 123 82
3.7152 176 81
3.7295 74 81
3.7436 153 80
3.7572 85 80
3.7708 115 79
3.7847 187 78
3.7980 105 78
3.8108 175 78
3.8252 124 79
3.8392 171 79
3.8528 127 78
3.8669 138 79
3.8811 198 79
3.8944 109 80
3.9080 171 80
3.9214 137 79
3.9341 109 81
3.9455 193 83
3.9575 108 85
3.9707 163 84
3.9853 136 82
4.0005 121 81
4.0164 164 79
4.0311 73 79
4.0450 171 78
4.0591 105 79
4.0716 117 79
4.0833 210 81
4.0940 103 85
4.1041 193 88
4.1152 163 84
4.1310 145 82
4.1486 126 79
4.1654 118 77
4.1811 130 75
4.1975 83 74
4.2127 176 73
4.2277 72 74
4.2424 177 74
4.2569 90 75
4.2705 148 76
4.2841 148 77
4.2986 123 77
4.3130 150 76
4.3280 71 77
4.3433 176 76
4.3583 90 76
4.3727 138 77
4.3874 136 79
4.4007 106 80
4.4133 167 83
4.4247 119 87
4.4360 123 88
4.4496 141 85
4.4673 117 84
4.4841 133 80
4.5005 83 79
4.5166 156 77
4.5324 97 77
4.5463 182 77
4.5605 110 79
4.5744 187 80
4.5882 121 81
4.6024 142 81
4.6171 178 81
4.6313 96 80
4.6452 180 80
4.6599 107 80
4.6741 151 79
4.6876 137 80
4.7009 132 82
4.7141 199 80
4.7279 91 81
4.7402 172 83
4.7531 172 80
4.7660 128 84
4.7785 197 83
4.7909 122 84
4.8046 129 84
4.8187 176 82
4.8328 102 81
4.8448 184 81
4.8556 145 83
4.8657 123 84
4.8768 138 86
4.8885 143 82
4.9040 135 81
4.9198 112 78
4.9362 134 77
4.9515 152 76
4.9651 83 76
4.9785 177 78
4.9912 114 79
5.0037 127 80
5.0167 200 81
5.0297 104 81
5.0429 175 81
5.0559 123 81
5.0685 106 81
5.0809 176 81
5.0937 113 82
5.1064 191 81
5.1181 178 79
5.1297 121 79
5.1404 176 80
5.1506 214 83
5.1606 132 85
5.1709 149 83
5.1829 175 80
5.1981 103 79
5.2128 169 76
5.2283 97 75
5.2431 149 74
5.2575 109 74
5.2709 97 74
5.2842 195 75
5.2975 104 75
5.3106 143 77
5.3231 185 76
5.3361 140 77
5.3487 132 78
5.3614 162 79
5.3750 98 78
5.3900 137 78
5.4047 108 76
5.4202 94 76
5.4341 186 75
5.4475 82 77
5.4608 157 80
5.4739 176 81
5.4867 90 83
5.4989 123 86
Assuming RawMin is time in minutes, you could do something like this...
df2 <- aggregate(df,                                          # the data frame
                 by = list(cut(df$RawMin, seq(0, 10, 0.5))),  # the bins (see below)
                 mean)                                        # the aggregating function
df2
  Group.1   RawMin     Flow      MAP
1 (2.5,3] 2.998200  51.0000 77.00000
2 (3,3.5] 3.251682 103.5588 76.20588
3 (3.5,4] 3.748994 135.9722 79.75000
4 (4,4.5] 4.240434 132.0857 79.25714
5 (4.5,5] 4.749781 140.1892 80.43243
6 (5,5.5] 5.246556 140.9231 78.89744
Binning is done with the cut function, here in 0.5-minute intervals between 0 and 10; for a 20-minute trial you would use seq(0,20,0.5) instead. The bin names are the intervals: e.g. (2.5,3] means greater than 2.5 and less than or equal to 3.
If you don't want RawMin included in the output, just use df[,-1] in the input to aggregate.
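To make the binning rule concrete outside of R, here is a minimal Python sketch of the same idea: assign each row to a right-closed interval (lower, upper] and average Flow and MAP per interval. The function name bin_means and the tuple layout are assumptions for this illustration, the sample rows are copied from the question's data, and times are assumed to be positive.

```python
import math
from collections import defaultdict


def bin_means(rows, width=0.5):
    """Average Flow and MAP within right-closed time bins (lower, upper].

    `rows` is an iterable of (time_in_minutes, flow, map) tuples.
    Returns a list of (lower, upper, mean_flow, mean_map), sorted by bin.
    """
    groups = defaultdict(list)
    for t, flow, map_value in rows:
        # ceil puts t == upper edge into the bin that ends at that edge,
        # matching the (lower, upper] convention of R's cut().
        upper_index = math.ceil(t / width)
        groups[upper_index].append((flow, map_value))

    result = []
    for idx in sorted(groups):
        values = groups[idx]
        n = len(values)
        mean_flow = sum(f for f, _ in values) / n
        mean_map = sum(m for _, m in values) / n
        result.append(((idx - 1) * width, idx * width, mean_flow, mean_map))
    return result


# A few rows taken from the sample data in the question.
sample = [
    (2.9982, 51, 77),
    (3.0113, 110, 80),
    (3.0240, 84, 77),
    (3.5080, 137, 78),
]

for lower, upper, flow, map_value in bin_means(sample):
    print(f"({lower}, {upper}]  Flow={flow}  MAP={map_value}")
```

Unlike aggregate with a fixed seq, this sketch only reports bins that actually contain data, so empty intervals simply do not appear in the result.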

Analysis of DEXSeq count table

I am using a platform called "Bcbio" for processing RNA-Seq fastq files. At the end of the process, it generates a number of files such as count tables, Sailfish raw data and so on. There is also a file called "combined.dexseq", which looks like this:
id Control_rep1_1 Control_rep1_2 50ng_1 50ng_2 250ng_1 250ng_2
ENSG00000000003:001 458 495 688 643 619 622
ENSG00000000003:002 143 140 204 153 166 163
ENSG00000000003:003 93 65 117 101 80 112
ENSG00000000003:004 50 47 68 73 54 89
ENSG00000000003:005 66 62 85 109 71 104
ENSG00000000003:006 97 93 152 163 131 153
I want to run a DEXSeq analysis following the vignette, but the problem is that the vignette only produces data in the form that I have at the very end, when the featureCounts() function is used.
Can anyone help me with estimating exon fold changes and using the other important analysis functions, starting from the file format that I have?

Filtering my R data frame is causing it to sort the data frame incorrectly

Consider the following two code snippets.
A:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5, nrows=190) # Specify nrows, get correct answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
# No need to remove unranked countries because we specified nrows
# No need to convert V2 from factor to numeric
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get KNA, correct answer
B:
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FGDP.csv", destfile = "./data/gdp.csv", method = "curl" )
gdp <- read.csv('./data/gdp.csv', header=F, skip=5) # Don't specify nrows, get incorrect answer
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FEDSTATS_Country.csv", destfile = "./data/education.csv", method = "curl" )
education = read.csv('./data/education.csv')
mergedData <- merge(gdp, education, by.x='V1', by.y='CountryCode')
mergedData = mergedData[which(mergedData$V2 != ""),] # Remove unranked countries
mergedData$V2 = as.numeric(mergedData$V2) # make V2 a numeric column
sortedMergedData = arrange(mergedData, -V2)
sortedMergedData[13,1] # Get SRB, incorrect answer
I would think the two code snippets would be equivalent, except that in A you never add the unranked countries to your data frame and in B you add them but then remove them. Why is the sorting different for these two code snippets?
The file downloads are from Coursera's Getting and Cleaning Data class (Quiz 3, Question 3).
Edit: To avoid security concerns, I've pasted the raw .csv files below
gdp.csv - http://pastebin.com/raw.php?i=4aRZwBRd
education.csv - http://pastebin.com/raw.php?i=0pbhDCSX
Edit2: The problem is occurring in the as.numeric step. For case B, here is mergedData$V2 before and after mergedData$V2 = as.numeric(mergedData$V2) is applied:
> mergedData$V2
[1] 161 105 60 125 32 26 133 172 12 27 68 162 25 140 128 59 76 93
[19] 138 111 69 169 149 96 7 153 113 167 117 165 11 20 36 2 99 98
[37] 121 30 182 166 81 67 102 51 4 183 33 72 48 64 38 159 13 103
[55] 85 43 155 5 185 109 6 114 86 148 175 176 110 42 178 77 160 37
[73] 108 71 139 58 16 10 46 22 47 122 40 9 116 92 3 50 87 145
[91] 120 189 178 15 146 56 136 83 168 171 70 163 84 74 94 82 62 147
[109] 141 132 164 14 188 135 129 137 151 130 118 154 127 152 34 123 144 39
[127] 126 18 23 107 55 66 44 89 49 41 187 115 24 61 45 97 54 52
[145] 8 142 19 73 119 35 174 157 100 88 186 150 63 80 21 158 173 65
[163] 124 156 31 143 91 170 184 101 79 17 190 95 106 53 78 1 75 180
[181] 29 57 177 181 90 28 112 104 134
194 Levels: .. Not available. 1 10 100 101 102 103 104 105 106 107 ... Note: Rankings include only those economies with confirmed GDP estimates. Figures in italics are for 2011 or 2010.
> mergedData$V2 = as.numeric(mergedData$V2)
> mergedData$V2
[1] 72 10 149 32 118 111 41 84 26 112 157 73 110 49 35 147 166 185
[19] 46 17 158 80 58 188 159 63 19 78 23 76 15 105 122 104 191 190
[37] 28 116 94 77 172 156 7 139 126 95 119 162 135 153 124 69 37 8
[55] 176 130 65 137 97 14 148 20 177 57 87 88 16 129 90 167 71 123
[73] 13 161 47 146 70 4 133 107 134 29 127 181 22 184 115 138 178 54
[91] 27 101 90 59 55 144 44 174 79 83 160 74 175 164 186 173 151 56
[109] 50 40 75 48 100 43 36 45 61 38 24 64 34 62 120 30 53 125
[127] 33 91 108 12 143 155 131 180 136 128 99 21 109 150 132 189 142 140
[145] 170 51 102 163 25 121 86 67 5 179 98 60 152 171 106 68 85 154
[163] 31 66 117 52 183 82 96 6 169 81 103 187 11 141 168 3 165 92
[181] 114 145 89 93 182 113 18 9 42
Can anyone explain why the numbers change when I apply as.numeric()?
The real reason for the different results is that, in the second case, the full dataset includes footer notes. read.csv read those too, so most of the columns became 'factor' class because of the character elements in the footer. This could have been avoided either by
limiting the rows read with the nrows argument in read.csv (skip only drops lines at the top of the file), or
using stringsAsFactors=FALSE in the read.csv call along with limiting the rows read.
Calling as.numeric on a factor returns the internal integer codes, i.e. each value's position among the sorted levels, not the numbers printed as labels, so the column ended up ordered by the levels of the factor.
If you have already read the files without dropping the footer lines, convert the columns to their proper classes. For a column that should be numeric, use as.numeric(as.character(df$column)) or as.numeric(levels(df$column))[df$column].
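The difference between a factor's labels and its integer codes can be imitated in a few lines of Python. This is only a rough model of the behaviour, not R itself: the helper as_factor is hypothetical, and it assumes the levels are the unique labels sorted as strings, which is what happens for digit-only labels like these.

```python
def as_factor(values):
    """Mimic an R factor: sorted unique labels plus 1-based integer codes."""
    levels = sorted(set(values))                    # string sort: "105" < "12" < "161" < "60"
    index = {label: i + 1 for i, label in enumerate(levels)}
    codes = [index[v] for v in values]              # what as.numeric(factor) would give back
    return levels, codes


ranks = ["161", "105", "60", "12"]                  # rankings read in as text
levels, codes = as_factor(ranks)

print(levels)                                       # the labels, in string order
print(codes)                                        # positions in `levels`, not the rankings
print([int(levels[c - 1]) for c in codes])          # going through the labels recovers the rankings
```

The last line corresponds to the as.numeric(levels(df$column))[df$column] idiom: index into the labels with the codes first, then convert the labels to numbers.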

Return id numbers if missing over a set of variables

I have a large database that includes an 'id' variable. I want to list all the variables of interest and get back, for each one, a list of the ids that are missing that particular variable.
#Fake Data:
set.seed(11100)
missdata<-data.frame(id<-1:1000,C1<-sample(c(1,NA),1000,replace=TRUE,prob=c(.8,.2)), C2<-sample(c(1,NA),1000,replace=TRUE,prob=c(.8,.2)))
names(missdata)<-c("id","v1","v2")
#One variable solution:
missdatatest<-subset(missdata, is.na(v1),select=id)
missdatatest[1:10,]
> missdatatest[1:10,]
[1] 5 30 44 47 48 49 57 65 68 74
#Looking to build a function...
FindMissings<-function(indata,varslist,printvar){
  printonevar<-function(var){
    missdatalist<-subset(indata, is.na(var),select=printvar)
    print(missdatalist)
  }
  lapply(vars,printonevar)
}
#Run function:
vars<-c("v1","v2")
FindMissings(missdata,vars,id)
#Error:
> FindMissings(missdata,vars,id)
Error in `[.data.frame`(x, r, vars, drop = drop) : undefined columns selected
Any help would be appreciated. I originally wrote a function to do this in SAS, and it works perfectly fine, but I'm trying to move a lot of my work into R.
There's no need for such a function. Just use lapply:
> lapply(missdata[-1], function(x) which(is.na(x)))
$v1
[1] 5 30 44 47 48 49 57 65 68 74 89 103 107 110 115 119 152 167
[19] 175 176 194 197 199 202 204 212 215 223 231 232 233 239 245 280 281 293...
<<SNIP>>
$v2
[1] 3 6 18 19 22 23 27 28 33 38 41 50 51 55 60 66 68 77
[19] 81 84 86 96 97 99 109 116 117 134 139 141 143 146 148 153 165 168...
<<SNIP>>
If you specifically wanted to return the values from your "id" column (not just the position of the NA values), you can modify the statement to be:
lapply(missdata[-1], function(x) missdata$id[which(is.na(x))])
If your concern is how to use this approach for specific variables, it's pretty straightforward:
vars <- c("v1","v2")
lapply(missdata[vars], function(x) which(is.na(x)))

Create a for loop which prints every number that is x%%3=0 between 1-200

Like the title says, I need a for loop which will write every number from 1 to 200 that is evenly divisible by 3.
Every other method posted so far generates the 1:200 vector then throws away two thirds of it. What a waste. In an attempt to be eco-conscious, this method does not waste any electrons:
seq(3,200,by=3)
You don't need a for loop; use the which function instead, as in:
which(1:200 %% 3 == 0)
[1] 3 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54 57 60 63 66 69 72 75 78 81
[28] 84 87 90 93 96 99 102 105 108 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162
[55] 165 168 171 174 177 180 183 186 189 192 195 198
Two other alternatives:
c(1:200)[c(F, F, T)]
c(1:200)[1:200 %% 3 == 0]
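None of the answers above shows the explicit loop the question asks for, so here is a minimal sketch of that logic, written in Python rather than R. The function name multiples_of_three is made up for this illustration; the stepped range at the end plays the role of seq(3,200,by=3).

```python
def multiples_of_three(limit=200):
    """Collect every n in 1..limit with n % 3 == 0 using an explicit loop."""
    found = []
    for n in range(1, limit + 1):
        if n % 3 == 0:          # same test as x %% 3 == 0 in R
            found.append(n)
    return found


looped = multiples_of_three()

# Stepping by 3 from the start avoids testing the other two thirds of the numbers.
stepped = list(range(3, 201, 3))

print(looped == stepped)
print(looped[:5], looped[-1])
```

Both approaches give the same numbers; the loop just does the filtering by hand that the vectorised R answers do in one expression.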

Resources