tamiwiki:internal:networks:tami_sre [2023/03/05 23:43] – 444b | tamiwiki:internal:networks:tami_sre [2023/05/26 21:51] (current) – removed corshunov | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== TAMI Site Reliability Engineering ====== | ||
- | This page details our efforts to keep our systems online and reliable. | ||
- | ---- | ||
- | Currently, a Raspberry Pi in the space displays a real-time status page from UptimeRobot. | ||
- | The Status page of our Services is [[https:// | ||
- | |||
- | |||
- | |||
- | |||
- | ===== Troubleshooting steps ===== | ||
- | The first step is to determine whether the root cause lies in the network or in the infrastructure. | ||
- | Use the [https:// | ||
- | * If everything is down (including TAMI's IP, 82.80.54.64), | ||
- | * If TAMI's IP address is reachable (ping and telnet) but the YunoHost services are down, it is likely just an infra issue | ||
- | * If things are still not working after trying all these steps, reach out to someone from the [[tamiwiki/ | ||
- | |||
- | ==== Network ==== | ||
- | Relevant Link: [[tamiwiki: | ||
- | |||
- | ==== Infra ==== | ||
- | Relevant Link: [[tamiwiki: | ||
- | === If there is an issue with a single service === | ||
- | * The first step is to see if you can log into [[https:// | ||
- | * Then check the service at [[https:// | ||
- | * Review the logs, restart the service if necessary, and consider sharing the logs via yunopaste in a relevant group in TAMI's communication channel | ||
- | === If there is an issue with multiple services === | ||
- | * Attempt the steps above for each service. If all services are affected, the problem may be with YunoHost itself or the device it is running on | ||
- | * Try to SSH into YunoHost. The password is your YunoHost SSO password | ||
- | * ssh < | ||
- | * Check the status output of the following services | ||
- | * sudo systemctl status nginx.service (for website issues) | ||
- | * sudo systemctl status mautrix_telegram.service (for telegram bridge issues) | ||
- | * If the output shows errors, or the service is already broken, try restarting it | ||
- | * sudo systemctl restart < | ||
- | * Failing this, try looking for more logs. Look up any error messages and go down the rabbit holes | ||
- | * sudo journalctl -u < | ||
- | |||
- | |||
- | |||
- | |||
- | ---- | ||
- | QR Code for page, do not delete | ||
- | |||
- | {{: |
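
The network-versus-infra triage from the troubleshooting steps could be scripted roughly as follows. This is only a sketch under assumptions, not a tool that exists on the server. The IP address comes from the page. The names `classify_outage` and `probe` and the service URL argument are hypothetical. The conclusion of the first bullet is cut off in the source, so the sketch assumes that an unreachable IP points to a network problem.

```shell
#!/bin/sh
# Triage sketch: does an outage look like a network or an infra problem?
# TAMI_IP is taken from the wiki page; everything else is illustrative.
TAMI_IP="${TAMI_IP:-82.80.54.64}"

# classify_outage IP_STATUS SERVICE_STATUS
# Each argument is an exit status: 0 means the check passed.
# Assumption: an unreachable IP means "network" (the page's bullet is truncated).
classify_outage() {
    if [ "$1" -ne 0 ]; then
        echo "network"
    elif [ "$2" -ne 0 ]; then
        echo "infra"
    else
        echo "ok"
    fi
}

# probe SERVICE_URL
# Pings the IP, then requests the given URL, and prints the verdict.
# Note: ping can fail where ICMP is filtered even though the host is up.
probe() {
    ping -c 3 "$TAMI_IP" >/dev/null 2>&1
    ip_status=$?
    curl -s -f -o /dev/null --max-time 10 "$1"
    service_status=$?
    classify_outage "$ip_status" "$service_status"
}

# Run the probe only when a service URL is passed on the command line.
if [ "$#" -gt 0 ]; then
    probe "$1"
fi
```

Saved under a hypothetical name such as `triage.sh`, it would be run as `sh triage.sh <service URL>`. A result of `infra` suggests moving on to the per-service checks with `systemctl status`, while `network` suggests starting from the network page.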