* fix(sac): make temperature a property to fix checkpoint resume bug
Temperature was stored as a plain float attribute that was not restored
when a checkpoint was loaded, so losses were computed with a stale value
until update_temperature() was called. It is now a property computed from
log_alpha on every access, so it is correct immediately after a checkpoint
load.
* simplify docstrings
* chore: replace hard-coded 'action' values with constants throughout the source code
* chore(tests): replace hard-coded 'action' values with constants throughout the test code
* chore(rl): move RL-related code to its own top-level directory
* chore(style): apply pre-commit to renamed headers
* test(rl): fix rl imports
* docs(rl): update rl headers doc
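
Illustration of the temperature fix in the first item. This is a minimal sketch, not the actual policy code: the class name `TemperatureHolder` and its `state_dict`/`load_state_dict` helpers are hypothetical, and `log_alpha` is a plain float here, whereas the real policy presumably holds it as a tensor parameter. Only `log_alpha` is saved, and `temperature` is derived from it on every access, so nothing can go stale after a resume.

```python
import math


class TemperatureHolder:
    """Hypothetical stand-in for the SAC policy; not the real class."""

    def __init__(self, init_temperature: float = 1.0):
        # Only log_alpha is stored, so it is the only value checkpointed.
        self.log_alpha = math.log(init_temperature)

    @property
    def temperature(self) -> float:
        # Recomputed from log_alpha on every access, never cached.
        return math.exp(self.log_alpha)

    def state_dict(self) -> dict:
        return {"log_alpha": self.log_alpha}

    def load_state_dict(self, state: dict) -> None:
        self.log_alpha = state["log_alpha"]


# Simulate training that moved log_alpha, then save a checkpoint.
trained = TemperatureHolder(init_temperature=1.0)
trained.log_alpha = math.log(0.2)
checkpoint = trained.state_dict()

# Resume into a fresh object: temperature follows the restored log_alpha
# without any explicit update_temperature() call.
resumed = TemperatureHolder(init_temperature=1.0)
resumed.load_state_dict(checkpoint)
```

With the previous plain-float design, `resumed.temperature` would have kept its initial value after the load; with the property, it reflects the checkpointed `log_alpha` right away.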