Looking at the queries being processed by the postgres database:
select * from pg_stat_activity where state not like '%idle%' and waiting=false order by query_start;Will basically show you all connections that are currently doing work (not idle) and queries that are not waiting to be completed. This will give you a good idea of what queries are taking up a lot of your compute/IO time.
You'll get the following columns to view:
datid datname pid usesysid usename application_name client_addr client_hostname client_port backend_start xact_start query_start state_change waiting state query
Killing rogue processes:
SELECT pg_cancel_backend(pid);Using the above command you can kill processes from the database perspective that you may find from pg_stat_activity
select * from locks where not granted;Will find all queries in the database where locks have not yet been granted. If this continuously increments you may have a locking query that is not releasing its lock causing a deadlock.
Showing queries causing the locks as well as the locks themselves
SELECT w.current_query AS waiting_query, w.procpid AS w_pid, w.usename AS w_user, w.query_start AS query_start, l.current_query AS locking_query, l.procpid AS l_pid, l.usename AS l_user, t.schemaname || '.' || t.relname AS tablename FROM pg_stat_activity w JOIN pg_locks l1 ON w.procpid = l1.pid AND NOT l1.granted JOIN pg_locks l2 ON l1.relation = l2.relation AND l2.granted JOIN pg_stat_activity l ON l2.pid = l.procpid JOIN pg_stat_user_tables t ON l1.relation = t.relid WHERE w.waiting ORDER BY w.query_start;
Finding Web Based reports causing high IO:
Every apache server has a server-status page that displays active processes as well as how long/how much CPU time these services are using. You can track down rogue queries from here.
Tracking down connections:
To track down connections its a good idea to know how to use netstat or sockstat. If you see a connection coming in from a certain port you can remote into that machine and run one of the following to trace down the PID using the socket
Linux: netstat -pan | grep (socket#) FreeBSD: socket | grep (socket#)From those commands you will eventually find what PID is tied to the connection and can use
psto find the full command and kill it
ps auxwww | grep (pid)
Easiest tools to figure out which processes and which drives are impacted by high disk io are
iotop- shows processes using the highest amount of disk IO
iostat- shows which partitions/disks are using the highest amount of io. Use the -x option for extended stats