I can't find the error in this code, and I've been looking at it for hours. Valgrind says:
==23114== Invalid read of size 1
==23114== Invalid write of size 1
I tried debugging with some printfs, and I think the error is in this function.
void rdm_hide(char *name, Byte* img, Byte* bits, int msg, int n, int size)
{
    FILE *fp;
    int r;
    Byte* used;
    int i = 0, j = 0;
    int p;
    fp = fopen(name, "wb");
    used = malloc(sizeof(Byte) * msg);
    for(i = 0; i < msg; i++)
        used[i] = -1;
    while(i < 3)
    {
        if(img[j] == '\n')
            i++;
        j++;
    }
    for(i = 0; i < msg; i++)
    {
        r = genrand_int32();
        p = r % n;
        if(!search(p, used, msg))
        {
            used[i] = (Byte)p;
            if(bits[i] == (Byte)0)
                img[j + p] = img[j + p] & (~1);
            else if(bits[i] == (Byte)1)
                img[j + p] = img[j + p] | 1;
        }
        else
            i --;
    }
    for(i = 0; i < size; i++)
        fputc( (char) img[i], fp);
    fclose(fp);
    free(used);
}
Thanks for help!
==23114== Invalid read of size 1
==23114== Invalid write of size 1
I am pretty sure that's not all valgrind says.
You should:
1. Build your program with debug info (most likely the -g flag). This will let valgrind tell you exactly which line triggers the invalid read and write.
2. If the problem doesn't become obvious, edit your question and include the entire valgrind output.
3. Re-run with valgrind --track-origins=yes your-exe, which may provide additional useful info.
Lastly, your algorithm looks wrong. As far as I can tell, the first for loop leaves i equal to msg, and i is not reset before while(i < 3), so unless msg is less than 3 that loop never runs and j stays 0 instead of pointing past the header. Also, you reference img[j + p], where p is between 0 and n. If n is indeed the size of img, then it's little surprise that j + p indexes outside of the img limits as soon as j is non-zero, and triggers both errors.
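To make the bounds concern concrete, here is a minimal sketch. It assumes the image buffer starts with a text header of three newline-terminated lines (which is what the loop in the question seems to be skipping) and that Byte is unsigned char. payload_offset and set_lsb are hypothetical helper names, not part of the original code.

```c
#include <stddef.h>

typedef unsigned char Byte; /* assumption: the question's Byte is unsigned char */

/* Return the index of the first byte after `lines` newline characters,
   or `size` if the buffer ends before that many newlines are seen. */
size_t payload_offset(const Byte *img, size_t size, int lines)
{
    size_t j = 0;
    int seen = 0; /* a fresh counter, so an earlier loop cannot leave it stale */
    while (j < size && seen < lines) {
        if (img[j] == '\n')
            seen++;
        j++;
    }
    return j;
}

/* Set the least significant bit of img[offset + p] to `bit`.
   Returns 0 on success, -1 if the index would fall outside the buffer. */
int set_lsb(Byte *img, size_t size, size_t offset, size_t p, Byte bit)
{
    if (offset > size || p >= size - offset)
        return -1;
    if (bit)
        img[offset + p] |= 1u;
    else
        img[offset + p] &= (Byte)~1u;
    return 0;
}
```

With helpers like these, p would be drawn from the range [0, size - offset) rather than [0, size), so offset + p can never run past the end of img.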
Related
I am learning programming on my own at home. I'm interested in C right now, and this is my first question. I have been trying to figure out why my code is not printing the right answer, but I'm not sure what is wrong. I will attach the question and my code below. If someone could help me figure out what I am doing wrong I'd be really grateful.
"Write a program that takes a string and displays it, replacing each of its
letters by the letter 13 spaces ahead in alphabetical order.
'z' becomes 'm' and 'Z' becomes 'M'. Case remains unaffected.
The output will be followed by a newline.
If the number of arguments is not 1, the program displays a newline."
I'm using command line arguments to read in a string "My horse is Amazing." and the expected output should be "Zl ubefr vf Nznmvat." but I am getting this as my output "Zå ubeÇr vÇ Nznçvat."
This is my code:
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv){
    char ch, str[100], newStr[100];
    int i, len;
    if(argc != 2){
        printf("\n");
        return (-1);
    }
    strcpy(str, argv[1]);
    len = strlen(str);
    printf("%s %d\n\n", str, len);
    for (i = 0; i < len; i++)
    {
        ch = str[i];
        if ((ch >= 65) && (ch <= 90)){
            ch = ch + 13;
            if(ch > 90){
                ch = ch - 90;
                ch = ch + 64;
            }
        }else if ((ch >= 97) && (ch <= 122)){
            ch = ch + 13;
            if(ch > 122){
                ch = ch - 122;
                ch = ch + 96;
            }
        }
        newStr[i] = ch;
    }
    printf("%s \n", newStr);
    return 0;
}
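ch is a char, which on your platform is a signed 1-byte type, meaning it can only represent values between -128 and 127. 'y' is 121, so adding 13 gives 134, which is outside of that range; the value wraps to a negative number, and the ch > 122 check never sees it. The same happens for every letter from 's' onwards.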
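One way to avoid the overflow is to rotate the letter's offset within the alphabet instead of its raw character code. The sketch below assumes ASCII input; rot13_char and rot13_string are hypothetical names. It also terminates the output string, which the newStr buffer in the question never is.

```c
#include <stddef.h>

/* Rotate one ASCII letter by 13 places; other characters pass through.
   Working with the offset from 'a' or 'A' keeps every intermediate value
   between 0 and 38, so nothing can overflow a signed char. */
char rot13_char(char c)
{
    if (c >= 'a' && c <= 'z')
        return (char)('a' + (c - 'a' + 13) % 26);
    if (c >= 'A' && c <= 'Z')
        return (char)('A' + (c - 'A' + 13) % 26);
    return c;
}

/* Write the ROT13 of `src` into `dst`, always ending with '\0'.
   `dst` must have room for `cap` bytes; longer input is truncated. */
void rot13_string(const char *src, char *dst, size_t cap)
{
    size_t i = 0;
    if (cap == 0)
        return;
    while (src[i] != '\0' && i + 1 < cap) {
        dst[i] = rot13_char(src[i]);
        i++;
    }
    dst[i] = '\0';
}
```

In the question's main, this would be called as rot13_string(argv[1], newStr, sizeof newStr) before printing newStr.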
Below is a piece of C code run from R used to compare each row of a matrix to a vector. The number of identical values is stored in the first column of a two-column matrix.
I know it can easily be done in R (as done to check the results), but this is a first step for a more complex use case.
When OpenMP is not used, it works fine. When OpenMP is used, it gives correlated (0.99) but inconsistent results.
Question1: What am I doing wrong?
Question2: I use a double for loop to fill the output matrix (ret) with zeros. What would be a better solution?
Also, inconsistencies were observed when the code was used in a package. I tried to make the code reproducible using inline, but it does not recognize the openmp statements (I tried to include 'omp.h', in the parameters of cfunction, ...).
Question3: How can we make this code work with inline?
I'm (too?) far outside my comfort zone on this topic.
library(inline)
compare <- cfunction(c(x = "integer", vec = "integer"), "
const int I = nrows(x), J = ncols(x);
SEXP ret;
PROTECT(ret = allocMatrix(INTSXP, I, 2));
int *ptx = INTEGER(x), *ptvec = INTEGER(vec), *ptret = INTEGER(ret);
for (int i=0; i<I; i++)
for (int j=0; j<2; j++)
ptret[j * I + i] = 0;
int i, j;
#pragma omp parallel for default(none) shared(ptx, ptvec, ptret) private(i,j)
for (j=0; j<J; j++)
for (i=0; i<I; i++)
if (ptx[i + I * j] == ptvec[j]) {++ptret[i];}
UNPROTECT(1);
return ret;
")
N = 3e3
M = 1e4
m = matrix(sample(c(-1:1), N*M, replace = TRUE), nc = M)
v = sample(-1:1, M, replace = TRUE)
cc = compare(m, v)
cr = rowSums(t(t(m) == v))
all.equal(cc[,1], cr)
Thanks to the comments above, I reconsidered the data race issue.
IIUC, my loop was parallelized over j (the columns). Each thread had its own private i (the rows), but different threads could hold the same value of i at the same time and would then try to increment ptret[i] concurrently.
To avoid this, I now loop on i first, so that only a single thread will increment each row.
Then, I realized that I could move the zero-initialization of ptret within the first loop.
It seems to work. I get identical results, increased CPU usage, and 3-4x speedup on my laptop.
I guess that solves questions 1 and 2. I will have a closer look at the inline/openmp problem.
Code below, fwiw.
#include <omp.h>
#include <R.h>
#include <Rinternals.h>
#include <stdio.h>
SEXP c_compare(SEXP x, SEXP vec)
{
    const int I = nrows(x), J = ncols(x);
    SEXP ret;
    PROTECT(ret = allocMatrix(INTSXP, I, 2));
    int *ptx = INTEGER(x), *ptvec = INTEGER(vec), *ptret = INTEGER(ret);
    int i, j;
    #pragma omp parallel for default(none) shared(ptx, ptvec, ptret) private(i, j)
    for (i = 0; i < I; i++) {
        // init ptret to zero
        ptret[i] = 0;
        ptret[I + i] = 0;
        for (j = 0; j < J; j++)
            if (ptx[i + I * j] == ptvec[j]) {
                ++ptret[i];
            }
    }
    UNPROTECT(1);
    return ret;
}
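For anyone who wants to try the row-wise parallelization without R, here is a stripped-down sketch of the same idea on plain int arrays. count_row_matches is a hypothetical name, and the matrix is assumed to be stored column-major, as R does. If the compiler is run without OpenMP support, the pragma is ignored and the loop simply runs serially.

```c
/* Count, for each row i of a column-major I x J matrix `x`, how many entries
   equal the corresponding entry of `vec`. Each iteration of the outer loop
   owns counts[i], so no two threads ever write the same element. */
void count_row_matches(const int *x, const int *vec, int I, int J, int *counts)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < I; i++) {
        int c = 0; /* declared inside the loop, so private to each thread */
        for (int j = 0; j < J; j++)
            if (x[i + I * j] == vec[j])
                c++;
        counts[i] = c;
    }
}
```

Accumulating into a local variable and storing once per row also avoids repeated writes to the shared output array, though I have not measured whether that matters here.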
I am trying to write an OpenCL kernel that uses OpenCL pipes. The kernel code is given below.
uint tid = get_global_id(0);
uint numWorkItems = get_global_size(0);
int i;
int rid;
int temp = 0, temp1 = 0;
int val;
int szgr = get_local_size(0);
int lid = get_local_id(0);
for(i = tid + start_index; i < rLen; i = i + numWorkItems){
    temp = 0;
    val = input[i];
    temp = hashTable[val - 1];
    if(temp){
        temp1 = projection[val - 1];
    }
    reserve_id_t rid1 = work_group_reserve_write_pipe(c0, szgr);
    while(is_valid_reserve_id(rid1) == false){
        rid1 = work_group_reserve_write_pipe(c0, szgr);
    }
    if(is_valid_reserve_id(rid1))
    {
        write_pipe(c0,rid1,lid, &temp);
        work_group_commit_write_pipe(c0, rid1);
    }
    reserve_id_t rid2 = work_group_reserve_write_pipe(c1, szgr);
    while(is_valid_reserve_id(rid2) == false){
        rid2 = work_group_reserve_write_pipe(c1, szgr);
    }
    if(is_valid_reserve_id(rid2))
    {
        write_pipe(c1,rid2,lid, &temp1);
        work_group_commit_write_pipe(c1, rid2);
    }
}
But the work_group_reserve_write_pipe function always fails, and because of this the kernel hangs at the while loop. If I remove the while loop, the code doesn't hang, but nothing is written to the pipe. Can someone tell me why this is happening?
The pipe is declared as a __write_only pipe.
About work_group_reserve_write_pipe:
This built-in function must be encountered by all work-items in a
work-group executing the kernel with the same argument values;
otherwise the behavior is undefined.
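The loop starts from tid + start_index and strides by the global size, so work-items can run different numbers of iterations; after some iterations, some work-items no longer reach this call while others in the same work-group still do. The retry while loop can lead to the same undefined behaviour.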
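To see the divergence in numbers, here is a small host-side sketch in plain C (not OpenCL C, and not tested against any device). The parameter names mirror the kernel's variables; original_trip_count and uniform_trip_count are hypothetical helpers added only for illustration.

```c
/* How many iterations the question's strided loop runs for a given tid. */
unsigned original_trip_count(unsigned tid, unsigned start_index,
                             unsigned rLen, unsigned numWorkItems)
{
    unsigned count = 0;
    for (unsigned i = tid + start_index; i < rLen; i += numWorkItems)
        count++;
    return count;
}

/* Iteration count that is the same for every work-item: the number of
   elements to process divided by the global size, rounded up. */
unsigned uniform_trip_count(unsigned start_index, unsigned rLen,
                            unsigned numWorkItems)
{
    if (rLen <= start_index || numWorkItems == 0)
        return 0;
    return (rLen - start_index + numWorkItems - 1) / numWorkItems;
}
```

With rLen = 10 and 4 work-items, the first two work-items run 3 iterations and the last two run only 2. One possible restructuring, under the assumption that writing a placeholder value for out-of-range items is acceptable, is to have every work-item loop uniform_trip_count(...) times and guard only the reads of input with i < rLen, so that each work-item reaches the reserve and commit calls the same number of times.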
This code is for problem DIGJUMP.
It gives me correct output for all the inputs I have tried (and I have tried a lot of them). The problem is that it gets TLE when I submit it on CodeChef. I checked the editorial, and the same solution (concept-wise) gets accepted, so the algorithmic approach is correct. I must have something wrong in the implementation.
I tried it for a long time, but could not figure out what is wrong.
#include <string.h>
#include <vector>
#include <queue>
#include <stdio.h>
using namespace std;
class Node
{
public:
    int idx, steps;
};
int main()
{
    char str[100001];
    scanf("%s", str);
    int len = strlen(str);
    vector<int> adj[10];
    for(int i = 0; i < len; i++)
        adj[str[i] - '0'].push_back(i);
    int idx, chi, size, steps;
    Node tmpn;
    tmpn.idx = 0;
    tmpn.steps = 0;
    queue<Node> que;
    que.push(tmpn);
    bool * visited = new bool[len];
    for(int i = 0; i < len; i++)
        visited[i] = false;
    while(!que.empty())
    {
        tmpn = que.front();
        que.pop();
        idx = tmpn.idx;
        steps = tmpn.steps;
        chi = str[idx] - '0';
        if(visited[idx])
            continue;
        visited[idx] = true;
        if(idx == len - 1)
        {
            printf("%d\n", tmpn.steps);
            return 0;
        }
        if(visited[idx + 1] == false)
        {
            tmpn.idx = idx + 1;
            tmpn.steps = steps + 1;
            que.push(tmpn);
        }
        if(idx > 0 && visited[idx - 1] == false)
        {
            tmpn.idx = idx - 1;
            tmpn.steps = steps + 1;
            que.push(tmpn);
        }
        size = adj[chi].size();
        for(int j = 0; j < size; j++)
        {
            if(visited[adj[chi][j]] == false)
            {
                tmpn.idx = adj[chi][j];
                tmpn.steps = steps + 1;
                que.push(tmpn);
            }
        }
    }
    return 0;
}
This solution won't finish in acceptable time for the problem. Remember that BFS is O(V + E). In a string where O(n) positions hold the same digit, there are O(n^2) edges between those positions. For n = 10^5, O(n^2) is too much.
This needs an optimization, for example: if we reached the current node by jumping from a node with the same digit, we don't jump to same-digit nodes again, since they have all been enqueued already.
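A minimal sketch of one such optimization, written in C rather than the question's C++: positions holding the same digit are chained in a linked list, and a digit's list is dropped after its first expansion, so each same-digit edge is looked at only once. min_jumps and the array names are hypothetical, and nodes are marked when enqueued rather than when popped, which differs from the question's code.

```c
#include <stdlib.h>
#include <string.h>

/* Minimum number of jumps from the first to the last digit of `s`, where a
   jump goes to index i-1, i+1, or any index holding the same digit.
   Returns -1 if `s` is empty, contains a non-digit, or allocation fails. */
int min_jumps(const char *s)
{
    size_t len = strlen(s);
    int n = (int)len;
    int head[10];                  /* first position of each digit, -1 if none */
    int qh = 0, qt = 0, result = -1;
    int *dist, *queue, *next;

    if (len == 0)
        return -1;
    dist = malloc(len * sizeof *dist);
    queue = malloc(len * sizeof *queue);
    next = malloc(len * sizeof *next); /* next position with the same digit */
    if (!dist || !queue || !next)
        goto done;
    for (int d = 0; d < 10; d++)
        head[d] = -1;
    for (int i = n - 1; i >= 0; i--) {
        int d = s[i] - '0';
        if (d < 0 || d > 9)
            goto done;
        next[i] = head[d];
        head[d] = i;
        dist[i] = -1;              /* -1 means not reached yet */
    }
    dist[0] = 0;
    queue[qt++] = 0;
    while (qh < qt) {
        int u = queue[qh++];
        int d = s[u] - '0';
        if (u == n - 1) { result = dist[u]; break; }
        for (int v = head[d]; v != -1; v = next[v])
            if (dist[v] == -1) { dist[v] = dist[u] + 1; queue[qt++] = v; }
        head[d] = -1;              /* this digit's list is never scanned again */
        if (dist[u + 1] == -1) { dist[u + 1] = dist[u] + 1; queue[qt++] = u + 1; }
        if (u > 0 && dist[u - 1] == -1) { dist[u - 1] = dist[u] + 1; queue[qt++] = u - 1; }
    }
done:
    free(dist);
    free(queue);
    free(next);
    return result;
}
```

Because every position is enqueued at most once and every digit list is walked at most once, the total work is linear in the length of the string.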
I am trying to use 'cblas_dtrsv' properly, but I do not get the right output and I do not know why. Here is my example (dtrsv_example.c):
#include <stdio.h>
#include <stdlib.h>
#include "mkl.h"
int main()
{
    double *A, *b, *x;
    int m, n, k, i, j;
    m = 4, k = 4, n = 4;
    printf (" Allocating memory for matrices aligned on 64-byte boundary for better \n"
            " performance \n\n");
    A = (double *)mkl_malloc( m*k*sizeof( double ), 64 );
    b = (double *)mkl_malloc( n*sizeof( double ), 64 );
    x = (double *)mkl_malloc( n*sizeof( double ), 64 );
    if (A == NULL || b == NULL || x == NULL) {
        printf( "\n ERROR: Can't allocate memory for matrices. Aborting... \n\n");
        mkl_free(A);
        mkl_free(b);
        mkl_free(x);
        return 1;
    }
    A[0] = 11;
    for (i = 0; i < m; i++) {
        for (j = 0; j <= i; j++) {
            A[j + i*m] = (double)(j+i*m);
        }
    }
    for (i = 0; i < n; i++) {
        x[i] = (i+1)*5.0;
    }
    printf ("\n Computations completed.\n\n");
    printf ("\n Result x: \n");
    for (j = 0; j < n; j++) {
        printf ("%f\n", x[i]);
    }
    printf ("\n Deallocating memory \n\n");
    mkl_free(A);
    mkl_free(b);
    mkl_free(x);
    printf (" Example completed. \n\n");
    return 0;
}
Compilation seems fine:
icc -c -Wall -c -o dtrsv_example.o dtrsv_example.c
icc dtrsv_example.o -o dtrsv_example -L/opt/intel/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_sequential -lpthread -lm
However, I get the wrong result:
./dtrsv_example
Computations completed.
Result x:
0.000000
0.000000
0.000000
0.000000
Deallocating memory
Example completed.
Any ideas of what I might be doing wrong here?
Even though I thought I had checked it carefully, after a break I spotted my beginner mistake:
for (j = 0; j < n; j++) {
    printf ("%f\n", x[i]);
}
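It should be x[j] instead! The loop variable is j, but i still holds n from the previous loop, so x[i] reads past the end of the array.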
Hopefully other people can use my example to understand how the cblas_dtrsv interface is used.
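For readers who want to check results without MKL, here is a hand-written sketch of the computation a triangular solve performs for one particular case: a row-major, lower-triangular, non-transposed, non-unit-diagonal matrix, with the right-hand side overwritten by the solution. lower_solve is a hypothetical helper, not an MKL or CBLAS function, and it makes no attempt to match the library's performance or error handling.

```c
/* Solve L * x = b by forward substitution for a lower-triangular n x n
   matrix stored row-major in `a` with leading dimension `lda`.
   On entry `x` holds b; on exit it holds the solution.
   Returns 0 on success, -1 if a diagonal entry is zero. */
int lower_solve(int n, const double *a, int lda, double *x)
{
    for (int i = 0; i < n; i++) {
        double sum = x[i];
        /* subtract the contributions of the already-solved unknowns */
        for (int j = 0; j < i; j++)
            sum -= a[i * lda + j] * x[j];
        if (a[i * lda + i] == 0.0)
            return -1;
        x[i] = sum / a[i * lda + i];
    }
    return 0;
}
```

Note that a zero on the diagonal makes the system singular; the sketch reports that case instead of dividing by zero.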