RISC-V Examples Using Pointers - pointers

I have an engineering assignment where I have to use RISC-V to decide whether or not the first character of a string is capitalized. I'm having trouble finding the first character of my string. (I commented out my if condition)
enterString: .string "Enter a string: "
lowerCaseResult: .string "The first character is a lower-case letter\n"
upperCaseResult: .string "The first character is not a lower-case letter\n"
string: .space 82
li a7, 4
la a0, enterString
li a7, 8
addi a1, zero, 20
la a0, string
la s1, string
sb a1, 0(s1)
mv a0, a1
li a7, 4
#slt t0, a0,zero
##beq t0, zero, secondCondition
li a7, 10
li a7, 4
la a0, lowerCaseResult
li a7, 10
li a7, 4
la a0, upperCaseResult
li a7, 10
Any... pointers?
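For reference, the check the assignment asks for can be sketched in C. This is only a minimal sketch, and the name first_is_lowercase is hypothetical. In RISC-V terms, reading s[0] corresponds to a byte load (lbu) from the buffer address, whereas the sb in the code above stores a1 over the first byte of the buffer; the range test then becomes two branches against 'a' and 'z'.

```c
#include <stdbool.h>

/* Hypothetical helper: returns true when the first byte of s is an
   ASCII lower-case letter ('a'..'z'). An empty string yields false. */
bool first_is_lowercase(const char *s) {
    unsigned char c = (unsigned char)s[0];  /* load the first byte */
    return c >= 'a' && c <= 'z';            /* range check on the ASCII code */
}
```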


Recursive program in RISC-V (procedure call)

I'm trying to implement the recursive equation below, but I have no idea what's going wrong with my code.
It keeps outputting 1 no matter what input I give.
The function is as follows:
T(n) = T(n-100) + 2*T(n/2) + 5 if n > 1
= 1 otherwise
.globl __start
# read from standard input a0 = n
addi a0, zero, 5
# Recursive function #
# T(n) = T(n-100)+2*T((n/2))+5 if n>1 #
# = 1 otherwise #
addi s2,x0,1 #for base case
addi s3,x0,2 #for comparison
blt a0,s3,result
addi a2,a2,5
addi a3,a3,9
addi sp,sp,-8 #store 2 registers into stack
sw x1,4(sp)
sw a0,0(sp)
bge a0,s3,L1
bge a0,s3,L2
addi s2,x0,1
addi sp,sp,8
jal x0,result
L1: #to do T(n-100)
addi a0,a0,-100
jal x1,recur
addi x6,x10,0 #T(n-100) stored in x6
lw x10,0(sp)
lw x1,4(sp)
addi sp,sp,8
jalr x0,0(x1)
L2: #to do T(n/2)
div a0,a0,s3
jal x1,recur
addi x7,x10,0 #T(n/2) stored in x7
lw a0,0(sp)
lw x1,4(sp)
addi sp,sp,8
mul x7,x7,s3
add s2,x6,x7
addi s2,s2,5
jalr x0,0(x1)
# prints the result in s2
addi a0, zero, 1
addi a1, s2, 0
# ends the program with status code 0
addi a0, zero, 10
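
A small C reference model of the recurrence can help when checking the assembly's output. This is a sketch, not a fix for the code above; the function name T simply mirrors the formula. With the hard-coded input n = 5, working the recurrence by hand gives 22, so an output of 1 means the recursive branch is never producing a result.

```c
/* Reference model of the recurrence from the question:
   T(n) = T(n-100) + 2*T(n/2) + 5 for n > 1, and 1 otherwise. */
int T(int n) {
    if (n <= 1) {
        return 1;                 /* base case */
    }
    int first = T(n - 100);       /* must survive the second call */
    int second = T(n / 2);
    return first + 2 * second + 5;
}
```

Note that first has to stay intact across the second call. In assembly that generally means keeping it on the stack, or in a register that the callee preserves, rather than in a temporary that nested calls overwrite.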

RISC-V Recursive Factorial Function Debugging

I'm trying to create a recursive factorial function in RISC-V but am having some problems.
Here's what we have so far:
.globl factorial
n: .word 8
la t0, n
lw a0, 0(t0)
jal ra, factorial
addi a1, a0, 0
addi a0, x0, 1
ecall # Print Result
addi a1, x0, '\n'
addi a0, x0, 11
ecall # Print newline
addi a0, x0, 10
ecall # Exit
la t1, n
beq x0, t1, finish
addi t0, t1, -1
mul a0, t0, a0
j factorial
We tried adding registers and changing which ones we use, but it's still not loading the correct values into the correct registers. We're also kinda stuck on how to do this recursively. Would love some help!
Your main code looks fine. All of the issues I see are in the factorial function. First, there are four clear issues with your factorial function:
# This loads the address of n not the value at label n
# You need to additionally lw t1, 0(t1) to get the value
la t1, n
# t1 is never getting modified so why would this loop ever terminate?
beq x0, t1, finish
# You should do these two operations in the opposite order
# if t1 = 1, a0 would become 0
addi t0, t1, -1
mul a0, t0, a0
j factorial
finish:
ret
# Why ecall here? You have already returned. This is unreachable.
ecall
However, you can't just fix those and expect it to work. Your current implementation is lacking a plan of how to actually compute the factorial. I assume you were trying to make an implementation like the following:
int factorial_recursive(int n) {
    if (n == 0) {
        return 1;
    }
    int recursive = factorial_recursive(n-1);
    return n * recursive;
}
A direct translation of that C code would need to use the stack to save n and the return address and properly follow calling convention. I am not prepared to write out a complete explanation of that though, so I will explain how to convert the looping version of factorial to get you started in the right direction.
The C code I will implement in RISC-V assembly is:
int factorial_loop(int n) {
    int out = 1;
    while (n > 0) {
        out *= n;
        n -= 1;
    }
    return out;
}
For this code, n starts out in a0, but the return value also has to end up in a0, so we accumulate the product in a1 and copy it back at the end. With the registers allocated, the function looks like:
int factorial_loop(int a0) {
    int a1 = 1;
    while (a0 > 0) {
        a1 *= a0;
        a0 -= 1;
    }
    a0 = a1;
    return a0;
}
From here it is pretty easy to do a direct conversion.
factorial:
    li a1, 1              # int a1 = 1;
loop:
    bge x0, a0, finish    # while (a0 > 0) {   (exit when a0 <= 0)
    mul a1, a1, a0        #     a1 *= a0;
    addi a0, a0, -1       #     a0 -= 1;
    j loop                # }
finish:
    mv a0, a1             # a0 = a1;
    ret                   # return a0;

The program returns error "attempt to execute non-instruction at 0x00000000"

I have been given homework on RISC-V.
The task is to solve the recurrence T(n) = 2T(n/2) + n if the input n is >= 2; otherwise it returns 1. I have tried to write the solution, but it keeps giving me the error "(error) attempt to execute non-instruction at 0x00000000". Can someone please tell me where my mistake is and how to fix it?
Thank you for your time!
Note: I can only write code starting from the "write your recursive code here" comment.
.globl __start
msg_input: .string "Enter a number: "
msg_result: .string "The result is: "
newline: .string "\n"
# prints msg_input
li a0, 4
la a1, msg_input
# read from standard input
li a0, 5
# write your recursive code here, input is in a0, store the result(integer type) to t0
jal findsum
li t0, 2 #t0==2
blt a0, t0, L1 #if n<2 return 1
addi sp, sp, -8 #reserve stack area
sw ra, 0(sp) #save return address
sw a0, 4(sp) #save input
li t0, 2 #t0==2
div a0, a0, t0 #n=n/2
jal findsum #call findsum(n/2)
li t0, 2 #t0=2
mul a1, t0, a1 #a1=2*FindSum(n/2)
addi a1, a1, 2 #a1=2*FindSum(n/2)+2
j done
li a1, 1
lw ra, 0(sp)
addi sp, sp, 8
jr ra
# prints msg_result
li a0, 4
la a1, msg_result
# prints the result in t0
li a0, 1
mv a1, t0
# ends the program with status code 0
li a0, 10
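
As a cross-check for the expected output, the recurrence can be modelled in a few lines of C. This is a sketch under the problem statement above; the name find_sum is chosen only to mirror the findsum label and is otherwise hypothetical.

```c
/* Reference model of the recurrence from the question:
   T(n) = 2*T(n/2) + n for n >= 2, and 1 otherwise. */
int find_sum(int n) {
    if (n < 2) {
        return 1;                     /* base case */
    }
    return 2 * find_sum(n / 2) + n;   /* n is still needed after the call */
}
```

Because n is used again after the recursive call returns, an assembly version has to reload the saved n from the stack before adding it.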

"(error) attempting to write to an invalid memory address" When trying to store stack pointer

I'm trying to learn RISC-V under the Jupiter environment (risc32) and I came across a problem asking me to write a recursive program with RISC-V. I can't seem to get the sw instruction to work, as it always gives an "invalid address" error.
I've tried different offsets, different registers, etc., but nothing seems to work.
.globl __start
msg_input: .string "Enter a number: "
msg_result: .string "The result is: "
newline: .string "\n"
# prints msg_input
li a0, 4
la a1, msg_input
#read from standard input
li a0, 5
#initialize stack
addi x31, x0, 2
addi sp, x0, 800
mv x5, a0
jal x1, recfunc
mv t0, x5
addi sp, sp, -8
sw x1, 0(sp)
bge x5, x31, true
lw x1, 0(sp)
addi x10, x0, 1
addi sp,sp, 8
jalr x0, 0(x1)
div x5, x5, x31
jal x1, recfunc
lw x1, 0(sp)
addi sp,sp,8
mul x10, x10, x31
addi x10, x10, 1
jalr x0, 0(x1)
#prints msg_result
li a0, 4
la a1 msg_result
#prints the result in t0
li a0, 1
mv a1, t0
#ends the program with status code 0
li x5, 10
Error occurs at:
sw x1, 0 (x2)
(error) attempting to write to an invalid memory address 0x00000318
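
The faulting address can be traced back to the manual stack initialization. The sketch below models the two stack-pointer updates from the code above in C; the function name first_store_address is hypothetical. Starting from 800 and subtracting 8 gives 792, which is 0x318, exactly the address in the error message. That suggests 800 does not lie in memory the simulator treats as writable. Whether dropping the addi sp, x0, 800 line and relying on the simulator's default sp resolves it is an assumption to verify in Jupiter.

```c
#include <stdint.h>

/* Models the stack-pointer arithmetic that precedes the failing store. */
uint32_t first_store_address(uint32_t initial_sp) {
    uint32_t sp = initial_sp;  /* addi sp, x0, 800 */
    sp -= 8;                   /* addi sp, sp, -8  */
    return sp;                 /* sw x1, 0(sp) writes to this address */
}
```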

Performance degrade while using alternative for Intel intrinsics SSSE3

I am developing a performance-critical application that has to be ported to an Intel Atom processor, which supports only MMX, SSE, SSE2 and SSE3. My previous application used SSSE3 as well as AVX; now I want to downgrade it to the Intel Atom processor (MMX, SSE, SSE2, SSE3).
There is a serious performance degradation when I replace SSSE3 instructions, particularly _mm_hadd_epi16, with this code:
RegTemp1 = _mm_setr_epi16(RegtempRes1.m128i_i16[0], RegtempRes1.m128i_i16[2],
RegtempRes1.m128i_i16[4], RegtempRes1.m128i_i16[6],
Regfilter.m128i_i16[0], Regfilter.m128i_i16[2],
Regfilter.m128i_i16[4], Regfilter.m128i_i16[6]);
RegTemp2 = _mm_setr_epi16(RegtempRes1.m128i_i16[1], RegtempRes1.m128i_i16[3],
RegtempRes1.m128i_i16[5], RegtempRes1.m128i_i16[7],
Regfilter.m128i_i16[1], Regfilter.m128i_i16[3],
Regfilter.m128i_i16[5], Regfilter.m128i_i16[7]);
RegtempRes1 = _mm_add_epi16(RegTemp1, RegTemp2);
This is the best conversion I was able to come up with for this particular instruction, but this change has seriously affected the performance of the entire program.
Can anyone please suggest a more efficient alternative to the _mm_hadd_epi16 instruction using only MMX, SSE, SSE2 and SSE3 instructions? Thanks in advance.
_mm_hadd_epi16(a, b) can be simulated with the following code:
/* (b3, a3, b2, a2, b1, a1, b0, a0) */
__m128i ab0 = _mm_unpacklo_epi16(a, b);
/* (b7, a7, b6, a6, b5, a5, b4, a4) */
__m128i ba0 = _mm_unpackhi_epi16(a, b);
/* (b5, b1, a5, a1, b4, b0, a4, a0) */
__m128i ab1 = _mm_unpacklo_epi16(ab0, ba0);
/* (b7, b3, a7, a3, b6, b2, a6, a2) */
__m128i ba1 = _mm_unpackhi_epi16(ab0, ba0);
/* (b6, b4, b2, b0, a6, a4, a2, a0) */
__m128i ab2 = _mm_unpacklo_epi16(ab1, ba1);
/* (b7, b5, b3, b1, a7, a5, a3, a1) */
__m128i ba2 = _mm_unpackhi_epi16(ab1, ba1);
/* (b6+b7, b4+b5, b2+b3, b0+b1, a6+a7, a4+a5, a2+a3, a0+a1) */
__m128i c = _mm_add_epi16(ab2, ba2);
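
To validate a replacement like the one above, it can help to compare its output against a plain scalar model of what _mm_hadd_epi16 computes. The following is a sketch for testing only, not a fast path; the name hadd_epi16_ref is hypothetical, and the arrays hold lanes in low-to-high order (index 0 is a0).

```c
#include <stdint.h>

/* Scalar reference model of _mm_hadd_epi16(a, b): adjacent pairs of a
   fill the low four lanes, adjacent pairs of b fill the high four.
   Sums wrap modulo 2^16, like the non-saturating intrinsic. The final
   conversion to int16_t assumes the usual two's-complement behaviour. */
void hadd_epi16_ref(const int16_t a[8], const int16_t b[8], int16_t out[8]) {
    for (int i = 0; i < 4; i++) {
        out[i]     = (int16_t)(uint16_t)((uint16_t)a[2 * i] + (uint16_t)a[2 * i + 1]);
        out[i + 4] = (int16_t)(uint16_t)((uint16_t)b[2 * i] + (uint16_t)b[2 * i + 1]);
    }
}
```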
If your goal is to take the horizontal sum of 8 16-bit values you can do this with SSE2 like this:
__m128i sum1 = _mm_shuffle_epi32(a,0x0E); // 4 high elements
__m128i sum2 = _mm_add_epi16(a,sum1); // 4 sums
__m128i sum3 = _mm_shuffle_epi32(sum2,0x01); // 2 high elements
__m128i sum4 = _mm_add_epi16(sum2,sum3); // 2 sums
__m128i sum5 = _mm_shufflelo_epi16(sum4,0x01); // 1 high element
__m128i sum6 = _mm_add_epi16(sum4,sum5); // 1 sum
int16_t sum7 = _mm_cvtsi128_si32(sum6); // 16 bit sum
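
For checking the shuffle-and-add sequence above, a scalar equivalent of the full horizontal sum is easy to write. This is a reference sketch only; the name hsum_epi16_ref is hypothetical, and the result wraps to 16 bits in the same way as the truncating assignment to int16_t above.

```c
#include <stdint.h>

/* Scalar reference: sum of all eight 16-bit lanes, wrapped to 16 bits.
   The final conversion assumes the usual two's-complement behaviour. */
int16_t hsum_epi16_ref(const int16_t v[8]) {
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        sum = (uint16_t)(sum + (uint16_t)v[i]);  /* wrap modulo 2^16 */
    }
    return (int16_t)sum;
}
```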
