Will we fix our server problems by 7pm PST Thursday Oct 17th?
Resolved NO on Oct 18

State of affairs as of 11am Monday (Oct 14th):

  • Things started to deteriorate on the night of Oct 9th (PST)

  • API latency slowly builds, leading to site outages every few hours

  • Redeploying the API fixes things for a time

  • Localhost works fine during the outage

  • Simple queries start to take forever, filling up the connection pool
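
The last symptom can be made concrete with Little's law: the average number of connections in use equals the query arrival rate times the time each query holds a connection. The sketch below is only an illustration; the function names, the 200 queries/second rate, and the pool size of 20 are hypothetical and are not taken from our actual configuration.

```typescript
// Little's law: connections in use = arrival rate * time each query holds a connection.
// All names and numbers here are illustrative assumptions, not our real settings.
function connectionsInUse(queriesPerSecond: number, avgQueryMs: number): number {
  return (queriesPerSecond * avgQueryMs) / 1000
}

// True once steady-state demand meets or exceeds the pool, so new queries must queue.
function poolIsSaturated(
  queriesPerSecond: number,
  avgQueryMs: number,
  poolSize: number
): boolean {
  return connectionsInUse(queriesPerSecond, avgQueryMs) >= poolSize
}

const hypotheticalPoolSize = 20
// Healthy: 200 queries/s at 10 ms each needs about 2 connections.
console.log(connectionsInUse(200, 10), poolIsSaturated(200, 10, hypotheticalPoolSize))
// Degraded: the same traffic at 500 ms each needs about 100 connections.
console.log(connectionsInUse(200, 500), poolIsSaturated(200, 500, hypotheticalPoolSize))
```

Under these assumed numbers, traffic does not need to grow at all for the pool to fill: a 50x slowdown in query time alone raises demand from 2 connections to 100.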

So far we've:

  • Set up a read replica that the front end uses when querying the DB directly with the Supabase JS library

  • Updated our backend pg-promise version

  • Checked open connections on the API (1.3k), total ingress/egress bytes, total API query count, and memory usage (10%); all are normal

  • Reverted suspicious-looking commits over the past few days

  • Increased, then decreased pg pool size

  • Moved some requests from the API to the Supabase JS client, which talks directly to the DB's load balancer

  • Discovered that our server's CPU usage climbs until it saturates the single core that Node runs on, and that this coincides with our server's latency spikes

  • Opened a PR to integrate Datadog into our server to get some visibility into what is causing the CPU spikes
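
Because Node runs our JavaScript on a single thread, a saturated core shows up as event loop delay, which can be measured without any external tooling. Below is a minimal sketch using Node's built-in `perf_hooks.monitorEventLoopDelay`. It is not the Datadog integration from the PR; `summarizeDelay`, `startEventLoopMonitor`, and the 10-second reporting interval are hypothetical names and values chosen for illustration.

```typescript
import { monitorEventLoopDelay } from 'node:perf_hooks'

// The histogram reports nanoseconds; convert to milliseconds for readability.
const NS_PER_MS = 1e6

// The subset of the histogram that this sketch reads.
interface DelayHistogram {
  mean: number
  max: number
  percentile(p: number): number
}

// Hypothetical helper: turn raw histogram values into a small report.
function summarizeDelay(h: DelayHistogram) {
  return {
    meanMs: h.mean / NS_PER_MS,
    p99Ms: h.percentile(99) / NS_PER_MS,
    maxMs: h.max / NS_PER_MS,
  }
}

// Hypothetical helper: report event loop delay every `reportEveryMs`.
// Returns a function that stops the monitor.
function startEventLoopMonitor(
  reportEveryMs: number,
  onReport: (summary: ReturnType<typeof summarizeDelay>) => void
): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 })
  histogram.enable()
  const timer = setInterval(() => {
    onReport(summarizeDelay(histogram))
    histogram.reset() // start a fresh window for the next report
  }, reportEveryMs)
  return () => {
    clearInterval(timer)
    histogram.disable()
  }
}

// Usage: log a summary every 10 seconds, then stop after one minute.
const stop = startEventLoopMonitor(10_000, (s) => console.log('event loop delay', s))
setTimeout(stop, 60_000)
```

If the p99 delay rises in step with CPU usage ahead of an outage, that would support the theory that synchronous work on the main thread is starving request handling, rather than the database being slow.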


Typical API stats during an outage:

DB stats during an example outage:

I'm happy to provide more info, stats, etc.

repo: https://github.com/manifoldmarkets/manifold
previous market: https://manifold.markets/ian/will-we-fix-our-database-problems-b?play=true

This resolves NO if we haven't fixed the underlying problem, e.g. if the site only stays up because a cron job restarts the server every hour.
