Logging errors and getting them Subscribe Pub Share

While detailing the post on my setup for Rapid Play Framework Development I missed one of the scripts i use.

I do keep an eye on the websites, quite obviously, and I get notified of any error via an email. When such an email comes however, the first thing you want to do is take a look at the latest log file.

Enter the getlog script, which will find the last log file, zip it, copy it to my laptop and then view it with less:

ssh vm3.coolscala.com "cd ~/app/logs; gzip -c log.log >log.log.gz"
scp vm3.coolscala.com:app/logs/log.log.gz log.log.gz
ssh vm3.coolscala.com "cd ~/app/logs; rm log.log.gz"
gunzip -f log.log.gz
less log.log

The email is something you can enable in Global, like so:

/** customize some global handling errors */
object Global extends WithFilters(LoggingFilter) {

  override def onError(request: RequestHeader, ex: Throwable) = {
    val m = ("ERR_onError", "Current count: " + lastErrorCount + " Request:" + request.toString, "headers:" + request.headers, "ex:" + ex.toString).toString
        admin.SendEmail.withSession { implicit mailSession =>
          Emailer.tellRaz("ERR_onError",
            currentUser.map(_.userName).mkString, m)
       //...
      }
    }

Sometimes you will see a flurry of emails, say whlie reloading the database or due to some network connectivity (yes, looking at you, Digital O) so I wrote a quick backoff scheme to stop my inbox from being flooded:

  • it will not send more than one error in a 5 minutes interval
  • it will not send more than 5 emails in 6 hours
  • quite arbitrary numbers :)
/** customize some global handling errors */
object Global extends WithFilters(LoggingFilter) {
  // EMAIL BACKOFF stuff
  val ERR_DELTA1 = 5 * 60 * 1000 // 5 min
  val ERR_DELTA2 = 6 * 60 * 60 * 1000 // 6 hours
  val ERR_EMAILS = 5 // per DELTA2
  var errEmails = 0 // sent per DELTA2
  var lastErrorTime = System.currentTimeMillis - ERR_DELTA1 // time last error email went out - just one every 5 min, eh
  var firstErrorTime = System.currentTimeMillis - ERR_DELTA2 // time first error email went out - just one every 5 min, eh
  var lastErrorCount = 0 // time last error email went out - just one every 5 min, eh

  override def onError(request: RequestHeader, ex: Throwable) = {
    Audit.logdb("ERR_onError", "request:" + request.toString, "headers:" + request.headers, "ex:" + ex.toString)
    val m = ("ERR_onError", "Current count: " + lastErrorCount + " Request:" + request.toString, "headers:" + request.headers, "ex:" + ex.toString).toString
    if (System.currentTimeMillis - lastErrorTime >= ERR_DELTA1) {
      if (errEmails <= ERR_EMAILS || System.currentTimeMillis - firstErrorTime >= ERR_DELTA2) {
        admin.SendEmail.withSession { implicit mailSession =>
          Emailer.tellRaz("ERR_onError",
            currentUser.map(_.userName).mkString, m)

          synchronized {
            if (errEmails == ERR_EMAILS || System.currentTimeMillis - firstErrorTime >= ERR_DELTA2) {
              errEmails = 0
              firstErrorTime = lastErrorTime
            }
            errEmails = errEmails + 1
            lastErrorTime = System.currentTimeMillis()
            lastErrorCount = 0
          }
        }
      } else {
        lastErrorCount = 0
      }
    } else {
      lastErrorCount = 0
    }

Note that if you log a lot and don't roll over the logs often, these can get quite large...

See the entire Global here: play-simple-logging.scala.

Auditing

As you can see, important events are audited in the database - via Audit.logdb(...).

I use a Mongo DB and have a simple web page to review and clear these audited events every now and then. It is very good habit to get into - as it can alert you to uses of the website and hacking attempts and other data integrity issues that you won't otherwise know about.

Logging requests

As mentioned in my Playing with fire 2 - the instability comes into play, I log each request via the LF.START and LF.STOP messages.

These are good for debugging stuff whenever you have issues and you can also use as a simple performance monitor.

Better options ?

Alternatively, you could log that particular exception and the surrounding say 100 lines in a separate file and place a link to this file in the email - that way you can quickly inspect the log and details about the exception strait from the email.

I have used this technique in the past:

  • keep the last 1000 log lines in a buffer in-memory
  • have a web page to inspect these lines
  • on exception, clone this buffer and keep it for a while, with its own special link

That way, it would be available for a while for easy inspection.

This way you can also handle multiple servers - the getlog doesn't quite handle the case where you run a cluster, yet.

Of course, there are more complicated options that may offer more flexibility on bigger deployments, like Splunk etc - but for small adventures I prefer localized simple solutions.


Was this useful?    

By: Razie | 2014-09-13 .. 2016-05-16 | Tags: post , scala , play , programming


See more in: Cool Scala Subscribe

Viewed 1427 times ( | History | Print ) this page.

You need to log in to post a comment!