Wednesday, June 9, 2010

Microsoft Communications Server “14”: Monitoring and Reporting

 

So you may be asking: why is Kevin posting a blog entry for every session now?  The truth is I took notes in every session, but now that I am taking the notes in Windows Live Writer, publishing is one button and the formatting happens in close to real time (while the speakers are bullshitting).  Very nice, Microsoft.  Now, back to the show.

CS 14 Health Monitoring Goals (Jared Zhang): 

  • Accurate Alerts
    • Filter out transient conditions to reduce noise
    • Distinguish alerts based on the impact to the system
    • Track the current state of alerts (active or resolved)
  • Actionable alerts
    • Cause and recommended actions
    • Relevant information to identify and isolate problems
    • Guidance for troubleshooting

CS 14 Health Monitoring

  • Health monitoring for CS 14
    • Service Monitoring
      • End-to-end verification of availability of CS services
    • Component monitoring
      • Monitoring components running on individual CS servers
    • Voice Quality Monitoring
      • Monitoring end-user-call reliability and media quality experience
  • CS 14 MP for SCOM 2007 R2
    • Monitoring and alerting on services, components, and voice quality
    • Central discovery of monitored objects from CS 14 Central Management Store (CMS)

Service monitoring with Synthetic Transactions

  • Synthetic Transactions (ST’s)
    • End-to-end scenario view
    • PowerShell cmdlets starting with the Test verb
      • Examples: 
        • Test-CsIM
        • Test-CsPresence
        • Test-CsPstnOutboundCall
    • Run with configured test accounts or real credentials
    • Provide a success/failure response
  • SCOM Alerting
    • Core set of ST’s are run periodically to verify service availability
    • ST failures result in high priority alerts
    • Alerts are auto-resolved if ST’s succeed in the next run

For example, running the IM synthetic transaction from PowerShell (Test-CsPstnOutboundCall is the equivalent for an outbound call):

PS C:\> Test-CsIm -TargetFqdn myocs.domain.com
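To make the alert lifecycle concrete, here is a minimal sketch in Python of how I understand it: a failed ST run raises an alert, and the next successful run auto-resolves it. This is not the management pack's code. The class and method names are my own, and the state strings are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SyntheticTransactionMonitor:
    """Tracks one alert per synthetic transaction (ST) name (hypothetical class)."""

    # Maps ST name -> alert state ("active" or "resolved").
    alerts: dict = field(default_factory=dict)

    def record_run(self, st_name: str, succeeded: bool) -> str:
        """Update the alert for st_name after one periodic run and return its state."""
        if not succeeded:
            # A failed run raises (or keeps) a high priority alert.
            self.alerts[st_name] = "active"
        elif self.alerts.get(st_name) == "active":
            # A successful run auto-resolves an existing alert.
            self.alerts[st_name] = "resolved"
        return self.alerts.get(st_name, "none")


monitor = SyntheticTransactionMonitor()
# Four periodic runs of the same ST: success, failure, failure, success.
states = [monitor.record_run("Test-CsIm", ok) for ok in (True, False, False, True)]
print(states)
```

The point of the design is that nobody has to close the alert by hand once the service is healthy again.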

Component Monitoring

  • Health modeling for CS14 components
    • Key health indicator (KHI) and non-KHI’s
      • Events and performance counters are categorized as service impacting aspects (KHI’s) and non-service impacting aspects (non-KHI’s)
      • KHI indicates a service impacting condition
  • SCOM Alerting
    • KHI’s result in medium priority alerts
    • KHI alerts are auto-resolved if the component returns to healthy
    • Non-KHI’s result in informational alerts that need manual resolution.
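The KHI versus non-KHI split boils down to a small policy table. A rough sketch in Python, assuming only what was on the slide (the function names and the dictionary layout are mine, not anything from the management pack):

```python
def alert_policy(is_khi: bool) -> dict:
    """Return the alert handling described in the session for a health indicator."""
    if is_khi:
        # KHIs are service impacting: medium priority, auto-resolved.
        return {"priority": "medium", "auto_resolve": True}
    # Non-KHIs are not service impacting: informational, resolved by hand.
    return {"priority": "informational", "auto_resolve": False}


def is_alert_closed(is_khi: bool, component_healthy: bool, manually_resolved: bool) -> bool:
    """Decide whether an alert is closed under the policy above (simplified)."""
    if alert_policy(is_khi)["auto_resolve"]:
        # Assumption: only the component's health matters for KHI alerts.
        return component_healthy
    return manually_resolved


print(alert_policy(True))
print(is_alert_closed(is_khi=False, component_healthy=True, manually_resolved=False))
```

So a non-KHI alert stays open even after the component recovers, until someone resolves it.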

Call Reliability Monitoring

  • Call reliability data are stored as Call Detail Records (CDR) data
  • Failures are classified as Expected and Unexpected, based on the ms-diagnostic ID.
    • Example: 52031 indicates media connectivity failure
  • SCOM Alerting
    • Categories for call reliability alerting:
      • Peer-to-peer audio/video calls
      • Audio/video conference calls
    • Alerts are raised for higher than expected failure rates
    • Each alert contains a CDR report link for troubleshooting
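Here is a rough sketch in Python of what "higher than expected failure rates" could look like per category. Everything numeric here is an assumption: the session did not give the threshold, and it did not say which ms-diagnostic IDs count as Unexpected. I treat 52031 as Unexpected purely for illustration, and -1 is a made-up placeholder for an Expected failure.

```python
# Assumption: IDs treated as Unexpected failures (illustrative only).
UNEXPECTED_IDS = {52031}
# Assumption: alert when more than 5% of calls fail unexpectedly.
ALERT_THRESHOLD = 0.05


def unexpected_failure_rate(diagnostic_ids):
    """Fraction of calls that ended with an Unexpected ms-diagnostic ID.

    diagnostic_ids holds one entry per call: None for a successful call,
    otherwise the ms-diagnostic ID recorded for the failure.
    """
    calls = list(diagnostic_ids)
    if not calls:
        return 0.0
    unexpected = sum(1 for diag in calls if diag in UNEXPECTED_IDS)
    return unexpected / len(calls)


def should_alert(diagnostic_ids, threshold=ALERT_THRESHOLD):
    """True when the Unexpected failure rate exceeds the threshold."""
    return unexpected_failure_rate(diagnostic_ids) > threshold


# One list of calls per alerting category; -1 is a placeholder Expected failure.
calls_by_category = {
    "peer_to_peer": [None] * 17 + [52031, 52031, -1],
    "conference": [None] * 20,
}
alerts = {name: should_alert(calls) for name, calls in calls_by_category.items()}
print(alerts)
```

Expected failures are deliberately left out of the numerator, so they never trigger an alert on their own.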

Media Quality Monitoring

  • Media Quality data are stored as Quality of Experience (QoE) data
  • Calls are classified as good or poor quality
  • Categories for media quality alerting:
    • A/V Conferencing Servers, Mediation Servers, Gateways
    • Network locations (subnets, sites, regions)
  • Alerts are raised for higher than expected poor quality call rates
  • Each alert contains a QoE report link for troubleshooting
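The per-location idea can be sketched the same way: group calls by network location and flag the locations whose poor quality call rate is too high. This is a Python sketch under my own assumptions. The subnets, the threshold, and the function names are invented for illustration, and real QoE records carry far more than a good/poor flag.

```python
from collections import defaultdict


def poor_call_rates(calls):
    """Map each location to its fraction of poor quality calls.

    calls is an iterable of (location, is_poor) pairs.
    """
    totals = defaultdict(int)
    poor = defaultdict(int)
    for location, is_poor in calls:
        totals[location] += 1
        if is_poor:
            poor[location] += 1
    return {location: poor[location] / totals[location] for location in totals}


def locations_to_alert(calls, threshold):
    """Sorted list of locations whose poor call rate exceeds the threshold."""
    rates = poor_call_rates(calls)
    return sorted(loc for loc, rate in rates.items() if rate > threshold)


# Hypothetical sample: two subnets with different call quality.
sample = (
    [("10.0.1.0/24", False)] * 9
    + [("10.0.1.0/24", True)]
    + [("10.0.2.0/24", True)] * 3
    + [("10.0.2.0/24", False)] * 2
)
flagged = locations_to_alert(sample, threshold=0.2)
print(flagged)
```

The same grouping would work with a server name (A/V Conferencing Server, Mediation Server, gateway) as the key instead of a subnet.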

 

The bottom line for this section is that there are really thorough monitoring and synthetic transaction cmdlets (Test-Cs*) built into PowerShell, and you can tie these into SCOM.

Health Monitoring for CS14 is a must for success – Antwan, build good health monitoring into our CS14 deployment from the ground up.

 

Reporting CS14 with the Monitoring Server Role - Arish Alreja

Improvements for CS14 Monitoring Server Role

  • Call Detail Record (CDR) data collection
    • Improved diagnostics information for all modalities in CS14
    • Registration diagnostics data
    • IP Phone Device data
  • Quality of Experience (QoE) data collection
    • Richer Endpoint Data (OS, MAC Address, CPU)
    • Richer Audio Metrics (User facing diagnostics, audio healer metrics)
    • Coverage on Media Bypass, Mediation Server – Multiple Gateways
  • Reporting Improvements
    • For ROI Analysis and Asset Management
      • Usage reports for visibility into deployment activity
      • IP Phone HW and SW versions
    • For Operational monitoring and diagnostics
      • Dashboard delivers a view into any call reliability/media quality issues
      • Call Reliability reports for monitoring and troubleshooting
    • For Helpdesk admins helping end users
      • User Activity Report
  • Reports can be configured for periodic email delivery
  • Reports are accessible from the CS Control Panel (CSCP)

Arish then moved directly into a demonstration of the reporting server and the CS Control Panel.  It was very impressive – this picture does not do it justice:

[image: ocs]

I look forward to seeing this in Beta back at Vanderbilt!
