Protocol Extension

Overview

The protocol extension feature gives R&D, operations, and product teams network protocol monitoring beyond HTTP/HTTPS. It supports performance analysis of modern communication protocols such as WebSocket and gRPC and can be extended to monitor custom protocols. Teams can use it to understand the network communication status of their applications, locate performance issues at the protocol level, and keep real-time communication and microservices architectures stable.

Core Value

  • Full Protocol Coverage: Supports HTTP/HTTPS, WebSocket, gRPC, and other protocols for end-to-end monitoring
  • Real-Time Communication Monitoring: Focuses on performance and stability analysis of real-time protocols like WebSocket
  • Microservices Architecture Support: Deep monitoring of microservices communication protocols like gRPC
  • Flexible Extension Capability: Customize monitoring metrics and dimensions for any protocol through API

Technical Requirements

  • SDK Version: ≥ 2.16.0
  • Supported Platforms: Android, iOS, Harmony OS
  • Data Collection: Predefined protocols are collected automatically; custom protocols require instrumentation

Use Cases

Use Case 1: Real-Time Communication Performance Optimization

When WebSocket is the application's core communication protocol, connection stability and message transmission performance need to be monitored.

Practice Case:

  • Social app uses WebSocket for instant messaging
  • Monitoring found message latency P95 reached 2s, failure rate 5%
  • Located connection issues in specific regions through protocol extension
  • After optimization, latency dropped to 500ms, failure rate dropped to 0.5%
  • User message delivery rate increased 95%

Use Case 2: Microservices Performance Monitoring

In a gRPC microservices architecture, the performance and error rates of inter-service calls need to be monitored.

Practice Case:

  • Application uses gRPC for mobile-to-microservices communication
  • Found a gRPC service call failure rate of 10%
  • Located timeout issues in specific methods through detail list
  • After backend team optimized the service, failure rate dropped to 1%
  • Overall application availability increased 3%

Use Case 3: Custom Protocol Monitoring

Applications that use private or special-purpose network protocols need a custom monitoring solution.

Practice Case:

  • IoT application uses MQTT protocol for communication
  • Customized MQTT protocol monitoring through SDK API
  • Established metric system: connection count, message volume, transmission latency
  • Discovered frequent disconnection and reconnection issues on device side
  • After optimizing heartbeat mechanism, connection stability increased 80%

Supported Protocol Scenarios

Predefined Protocols

WebSocket

  • Application Scenario: Mobile
  • Monitoring Metrics: Connection latency, message latency, failure rate, TPM
  • Data Collection: SDK automatically collects, no additional configuration needed

gRPC

  • Application Scenario: Mobile
  • Monitoring Metrics: Call latency, failure rate, throughput, errors
  • Data Collection: SDK automatically collects, no additional configuration needed

Custom Protocols

Custom Capabilities

  • Custom protocol name and type
  • Custom monitoring metrics (metrics)
  • Custom analysis dimensions (tags)
  • Custom detail data (data)

Core Features

1. Data Dashboard

The protocol extension data dashboard provides a global view of protocol performance, supporting multi-protocol comparison analysis.

Global Filters

Supports multi-dimensional data filtering to accurately locate problem scope:

Filter Dimensions

  • Protocol Extension Name: Specific protocol instance (e.g., Chat WebSocket, Push WebSocket)
  • Protocol Extension Type: Protocol category (WebSocket, gRPC, Custom)

2. Data Cards

Data cards display core performance metrics of protocols, providing a quick overview.

Basic Metrics

Metric | Description | Calculation Method
Access Count | Total number of protocol calls | Total calls within the statistical time period
TPM | Accesses per minute | Access Count / Statistical Minutes
Average Latency | Response time of protocol calls | Shows the P75 percentile by default
Failure Rate | Proportion of failed calls | Failed Count / Total Access Count × 100%
Affected Users | Number of unique users using this protocol | Deduplicated count
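
The calculation methods in the table can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the `CallRecord` shape and the `summarize` function are hypothetical names introduced here.

```typescript
// Hypothetical record of one protocol call; not an SDK type.
interface CallRecord {
  userId: string;
  failed: boolean;
}

interface CardMetrics {
  accessCount: number;   // total calls in the statistical window
  tpm: number;           // Access Count / Statistical Minutes
  failureRate: number;   // Failed Count / Total Access Count × 100
  affectedUsers: number; // deduplicated user count
}

function summarize(records: CallRecord[], windowMinutes: number): CardMetrics {
  if (windowMinutes <= 0) {
    throw new Error('windowMinutes must be positive');
  }
  const accessCount = records.length;
  const failedCount = records.filter((r) => r.failed).length;

  // A prototype-free object used as a set to deduplicate user IDs.
  const seen: { [id: string]: true } = Object.create(null);
  records.forEach((r) => {
    seen[r.userId] = true;
  });

  return {
    accessCount: accessCount,
    tpm: accessCount / windowMinutes,
    // Avoid dividing by zero when the window contains no calls.
    failureRate: accessCount === 0 ? 0 : (failedCount / accessCount) * 100,
    affectedUsers: Object.keys(seen).length,
  };
}
```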

Failure Determination Rules

WebSocket

  • status = -1: Connection failure or abnormal disconnection
  • Failure determination conditions can be customized through the SDK API

gRPC

  • gRPC status code non-zero: Call failure
  • Timeout without response: Call failure

Custom Protocol

  • Set failure conditions through SDK API
  • Flexibly define failure standards that meet business requirements
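
The default rules for the predefined protocols can be expressed as a single predicate. The sketch below assumes a hypothetical `ProtocolEvent` shape; the field names are illustrative and do not come from the SDK.

```typescript
// Hypothetical event shapes used only for this sketch.
type ProtocolEvent =
  | { protocol: 'websocket'; status: number }
  | { protocol: 'grpc'; statusCode: number | null; timedOut: boolean };

function isFailure(event: ProtocolEvent): boolean {
  if (event.protocol === 'websocket') {
    // status = -1 marks a connection failure or abnormal disconnection.
    return event.status === -1;
  }
  // gRPC: a timeout without response, or any status code other than 0 (OK),
  // counts as a failed call. A missing code (null) is also treated as a failure.
  return event.timedOut || event.statusCode !== 0;
}
```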

3. Trend Analysis

Displays trends of protocol performance metrics over time, helping to discover performance fluctuations and anomalies.

Dimension Analysis

Data Type Distribution

  • Displays access volume distribution of different protocol types

Name Failure Rate Distribution

  • Displays failure rates of different instances

Failure Rate Trend

Displays protocol failure rate trends over time:

Analysis Points

  • Sudden Rise: May be service failure or network issues
  • Sustained High Level: Requires in-depth investigation of the protocol implementation
  • Periodic Fluctuation: May be related to business peaks or network environment
  • Gradual Rise: May be performance degradation or user volume increase

Alert Recommendations

  • Failure Rate > 5%: Needs attention
  • Failure Rate > 10%: Needs urgent handling
  • Failure Rate month-over-month rise > 50%: Trigger alert
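
As a rough sketch, the thresholds above could be evaluated as follows. The function name, the result shape, and the decision to skip the relative-rise check when the previous period had no failures are assumptions made for this example.

```typescript
type AlertLevel = 'ok' | 'attention' | 'urgent';

interface AlertResult {
  level: AlertLevel;
  riseAlert: boolean; // true when the rate rose more than 50% versus the previous period
}

// Both rates are percentages, e.g. 6 means 6%.
function evaluateFailureRate(currentPct: number, previousPct: number): AlertResult {
  let level: AlertLevel = 'ok';
  if (currentPct > 10) {
    level = 'urgent';
  } else if (currentPct > 5) {
    level = 'attention';
  }
  // Relative rise is undefined when the previous rate is 0, so no alert is raised then.
  const relativeRise = previousPct > 0 ? (currentPct - previousPct) / previousPct : 0;
  return { level: level, riseAlert: relativeRise > 0.5 };
}
```

Note that the two checks are independent: a rate of 3% is below both absolute thresholds but still triggers the rise alert if the previous period was at 1.5%.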

Latency Trend

Displays protocol latency trends over time:

Percentile Analysis (can be customized in dashboard, default uses P75)

  • P50 (Median): 50% of calls complete within this latency; reflects the typical experience
  • P75 (Default): 75% of calls complete within this latency
  • P95: Focus on long-tail user experience
  • P99: Extreme scenario performance
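
To make the percentiles concrete, the sketch below computes them with the nearest-rank method. Which percentile definition the dashboard uses is not stated here, so the method is an assumption chosen for simplicity.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(latenciesMs: number[], p: number): number {
  if (latenciesMs.length === 0) {
    throw new Error('at least one sample is required');
  }
  if (p <= 0 || p > 100) {
    throw new Error('p must be in the range (0, 100]');
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = latenciesMs.slice().sort((a, b) => a - b);
  // Multiply before dividing to keep the rank free of rounding error for integer p.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

With 20 samples of 10 ms, 20 ms, up to 200 ms, this yields 100 ms for P50, 150 ms for P75, 190 ms for P95, and 200 ms for P99, which shows how P95 and P99 expose the long tail that P50 hides.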

Optimization Directions

  • Sudden latency increase: Check network or server-side issues
  • Consistently high: Optimize protocol implementation or network strategy
  • Large fluctuations: Optimize connection reuse and heartbeat mechanism

4. Detail List

The detail list displays statistical data of all protocols, supporting deep analysis and problem localization.

Metric List

Default Metrics (SDK automatically collects)

  • Name, Data Type, Access Count, TPM, Latency P75, Availability
  • Active Device Count, Affected User Count

Custom Metrics (uploaded through metrics)

  • Business-related metrics (e.g., message size, reconnection count)
  • Performance-related metrics (e.g., serialization latency, compression rate)
  • Quality-related metrics (e.g., message loss rate, out-of-order rate)

Device Dimensions (can be customized in dashboard)

Default Dimensions

  • Device Model, Operating System, App Version
  • Country, Province/State, City, ISP
  • Network Type (WiFi, 4G, 5G)

Custom Dimensions (uploaded through tags)

  • Business dimensions (e.g., chat room ID, user level)
  • Technical dimensions (e.g., server region, protocol version)
  • Environment dimensions (e.g., foreground/background, weak network/normal network)
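
Dimensions are what make it possible to break a metric down, for example to find the server region with the highest failure rate. The sketch below groups calls by one tag; `TaggedCall` and `failureRateByTag` are hypothetical names, and the real aggregation happens on the platform.

```typescript
// Hypothetical shape of a call with its custom dimensions (tags).
interface TaggedCall {
  tags: { [key: string]: string };
  failed: boolean;
}

// Returns the failure rate (in percent) for each value of the given tag.
function failureRateByTag(calls: TaggedCall[], tagKey: string): { [value: string]: number } {
  const buckets: { [value: string]: { total: number; failed: number } } = Object.create(null);
  calls.forEach((call) => {
    const value = call.tags[tagKey];
    if (value === undefined) {
      return; // calls without this tag are left out of the breakdown
    }
    const bucket = buckets[value] || (buckets[value] = { total: 0, failed: 0 });
    bucket.total += 1;
    if (call.failed) {
      bucket.failed += 1;
    }
  });

  const rates: { [value: string]: number } = Object.create(null);
  Object.keys(buckets).forEach((value) => {
    rates[value] = (buckets[value].failed / buckets[value].total) * 100;
  });
  return rates;
}
```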

Custom Protocol Configuration

Configuration Process

1. Define Data via SDK Instrumentation
↓ Set protocol name, metrics, dimensions

2. Metric System Extraction
↓ Configure custom metrics and dimensions

3. Dashboard Configuration
↓ Copy and customize dashboard

4. Data Display
↓ View protocol monitoring data

Step 1: SDK Instrumentation

Android SDK Example

// Start custom protocol monitoring
NBSAppAgent.startSpan(String name, String operation);

Parameter Description

  • name: Protocol name (e.g., MQTT Message, Custom RPC)
  • operation: Operation type (e.g., send, receive, connect)

iOS SDK Example

Objective-C

// Start custom protocol monitoring
+ (id<NBSSpan>)startSpanWithName:(NSString *)name
                       operation:(NSString *)operation;

Swift

// Start custom protocol monitoring
startSpan(withName: String!, operation: String!)

Parameter Description

  • name: Protocol name (e.g., MQTT Message, Custom RPC)
  • operation: Operation type (e.g., send, receive, connect)

Harmony OS SDK Example

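// SpanMetricUnits is used in step 4; it is assumed to be exported by the same package.
import tingyun, { SpanStatuses, SpanMetricUnits } from '@tingyun/sdk-core'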

// 1. Start protocol monitoring
const span = tingyun.startSpan('span_name', 'span_operation')

// 2. Set detail data
span.setData('dataKey', 'dataValue')

// 3. Set custom dimensions
span.setTag('tagKey', 'tagValue')

// 4. Set custom metrics
span.setMetric('time', 50, SpanMetricUnits.Millisecond)

// 5. Set duration
span.setDuration(100)

// 6. Set status
span.setStatus(SpanStatuses.OK)
span.setStatusCode('200')

// 7. Create child Span (optional)
const childSpan = span.startChild('child_span_name', 'child_span_operation')
childSpan.finish()

// 8. End monitoring
span.finish()

Parameter Description

  • span_name: Protocol name
  • span_operation: Operation type
  • setData(): Set detail data
  • setTag(): Set custom dimensions
  • setMetric(): Set custom metrics
  • setDuration(): Set duration (milliseconds)
  • setStatus(): Set status (OK/ERROR)
  • setStatusCode(): Set status code
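
In practice, the monitored operation can throw, and the span should still be finished with a duration and a status. One way to guarantee that is a small wrapper. The sketch below is not part of the SDK: `SpanLike` is a hypothetical structural type that lists only the methods the wrapper calls (the names follow the example above, the signatures are assumed), and `withSpan` is a helper introduced for illustration.

```typescript
// Hypothetical structural type; S stands for the SDK's status type.
interface SpanLike<S> {
  setData(key: string, value: string): void;
  setDuration(ms: number): void;
  setStatus(status: S): void;
  finish(): void;
}

// Runs `operation`, records its outcome on the span, and always finishes the span.
function withSpan<S, T>(
  span: SpanLike<S>,
  statuses: { ok: S; error: S },
  operation: () => T
): T {
  const startedAt = Date.now();
  try {
    const result = operation();
    span.setStatus(statuses.ok);
    return result;
  } catch (err) {
    span.setStatus(statuses.error);
    span.setData('error', String(err)); // keep the error text as detail data
    throw err;
  } finally {
    // Runs on both paths, so the span is never left open.
    span.setDuration(Date.now() - startedAt);
    span.finish();
  }
}
```

A caller would pass a span obtained from `tingyun.startSpan(...)` together with the SDK's OK and error status values. The wrapper handles synchronous operations only; an asynchronous variant would need to await the operation before finishing the span.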

Step 2: Metric System Configuration

Enter Metric System

  1. Navigate to: Platform Management → Metric System
  2. Select Data Source: Mobile Protocol Extension
  3. Click [New Metric]

Configure Custom Metrics

Configuration Item | Description | Example
Metric Name | Display name of the metric | Message Size
Metric Field | Key of the metric in the SDK | message_size
Aggregation Method | sum/avg/max/min/count | avg
Unit | Unit of the metric | KB
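
The aggregation method determines how the values reported for a metric field are combined. The sketch below shows what each option would compute over a set of reported values; it illustrates the usual semantics of these names and is not the platform's implementation.

```typescript
type Aggregation = 'sum' | 'avg' | 'max' | 'min' | 'count';

function aggregate(values: number[], method: Aggregation): number {
  if (method === 'count') {
    return values.length;
  }
  if (values.length === 0) {
    throw new Error(method + ' is undefined for an empty set of values');
  }
  if (method === 'max') {
    return values.reduce((a, b) => Math.max(a, b));
  }
  if (method === 'min') {
    return values.reduce((a, b) => Math.min(a, b));
  }
  const sum = values.reduce((acc, v) => acc + v, 0);
  return method === 'sum' ? sum : sum / values.length;
}
```

For a size metric such as `message_size`, `avg` describes the typical message while `max` reveals outliers; `sum` is more useful for volume metrics such as traffic consumption.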

Configure Custom Dimensions

Configuration Item | Description | Example
Dimension Name | Display name of the dimension | Message Topic
Dimension Field | Key of the tag in the SDK | topic
Dimension Type | String/Numeric | String

Step 3: Dashboard Configuration

Copy Default Dashboard

  1. Navigate to: Dashboard
  2. Find [Mobile Protocol Extension Dashboard]
  3. Click [Copy] button
  4. Rename to custom protocol dashboard (e.g., Socket Monitoring Dashboard)

Update Chart Configuration

  1. Enter edit mode
  2. Select the chart to modify
  3. Update chart configuration:
    • Metric: Select custom metric
    • Dimension: Select custom dimension
    • Filter: Add protocol type filter

Chart Configuration Example

Chart: Socket Message Size Trend
- Metric: Message Size (avg)
- Dimension: Time
- Filter: Protocol Type = Socket
- Chart Type: Line Chart

Chart: Message Topic Distribution
- Metric: Access Count
- Dimension: Message Topic
- Filter: Protocol Type = MQTT
- Chart Type: Pie Chart

Performance Optimization Guide

WebSocket Performance Optimization

Issue 1: Connection Latency Too Long

Symptom: WebSocket connection establishment latency > 2s

Common Causes

  • Slow DNS resolution
  • Slow TCP handshake
  • Slow SSL/TLS handshake
  • Slow server response

Optimization Recommendations

  1. DNS Optimization:

    • Use HTTPDNS
    • Cache DNS results locally
    • Pre-resolve domain names
  2. Connection Optimization:

    • Connection reuse and keep-alive
    • Reduce the number of handshakes
    • Optimize heartbeat strategy
  3. Protocol Optimization:

    • Prefer WSS (encrypted)
    • Enable compression (permessage-deflate)

Issue 2: High Message Latency

Symptom: Message transmission latency P95 > 1s

Common Causes

  • Message body too large
  • Network congestion
  • Slow server processing
  • Serialization/deserialization latency

Optimization Recommendations

  1. Message Optimization:

    • Compress message content
    • Use binary protocols (e.g., Protocol Buffers)
    • Fragment large messages for transmission
  2. Network Optimization:

    • Select optimal server nodes
    • Degradation strategy in weak network
    • Message priority queue
  3. Code Optimization:

    • Asynchronous message processing
    • Optimize serialization performance
    • Batch send messages
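
As an illustration of fragmenting large messages, the sketch below splits a binary payload into numbered fragments and reassembles them on the receiving side. The `Fragment` layout and both function names are assumptions for this example; a real protocol would also need a message ID so that fragments of different messages are not mixed.

```typescript
// Hypothetical fragment layout: position, total count, and a slice of the payload.
interface Fragment {
  index: number;
  total: number;
  payload: Uint8Array;
}

function fragmentMessage(message: Uint8Array, maxFragmentBytes: number): Fragment[] {
  if (maxFragmentBytes < 1 || Math.floor(maxFragmentBytes) !== maxFragmentBytes) {
    throw new Error('maxFragmentBytes must be a positive integer');
  }
  // An empty message still produces one (empty) fragment.
  const total = Math.max(1, Math.ceil(message.length / maxFragmentBytes));
  const fragments: Fragment[] = [];
  for (let i = 0; i < total; i++) {
    const start = i * maxFragmentBytes;
    // subarray clamps the end index, so the last fragment may be shorter.
    fragments.push({
      index: i,
      total: total,
      payload: message.subarray(start, start + maxFragmentBytes),
    });
  }
  return fragments;
}

// Fragments may arrive out of order, so sort by index before concatenating.
function reassemble(fragments: Fragment[]): Uint8Array {
  const ordered = fragments.slice().sort((a, b) => a.index - b.index);
  const size = ordered.reduce((n, f) => n + f.payload.length, 0);
  const out = new Uint8Array(size);
  let offset = 0;
  ordered.forEach((f) => {
    out.set(f.payload, offset);
    offset += f.payload.length;
  });
  return out;
}
```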

Issue 3: Frequent Connection Disconnections

Symptom: WebSocket failure rate > 5%

Common Causes

  • Unstable network
  • Unreasonable heartbeat mechanism
  • Server actively disconnects
  • App enters background

Optimization Recommendations

  1. Heartbeat Mechanism:

    • Reasonably set heartbeat interval (recommend 30-60s)
    • Auto-reconnect after heartbeat timeout
    • Distinguish foreground/background heartbeat strategy
  2. Reconnection Strategy:

    • Exponential backoff reconnection
    • Limit reconnection attempts
    • Pause reconnection when network is disconnected
  3. Connection Keep-Alive:

    • Maintain connection when app goes to background
    • Monitor network changes and actively reconnect
    • Tune the connection timeout on the server side
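
The reconnection strategy above (exponential backoff, a limit on attempts, and pausing while the network is down) can be sketched as a pure decision function. The names and option fields are hypothetical, and the concrete delays are left to the caller.

```typescript
interface BackoffOptions {
  baseDelayMs: number; // delay before the first reconnection attempt
  maxDelayMs: number;  // upper bound for any single delay
  maxAttempts: number; // give up after this many attempts
}

type ReconnectDecision =
  | { action: 'retry'; delayMs: number }
  | { action: 'pause' }   // network is down: wait for it to come back
  | { action: 'giveUp' }; // attempt limit reached

// `attempt` is 1-based: 1 for the first reconnection attempt.
function decideReconnect(
  attempt: number,
  networkAvailable: boolean,
  options: BackoffOptions
): ReconnectDecision {
  if (!networkAvailable) {
    return { action: 'pause' };
  }
  if (attempt < 1 || attempt > options.maxAttempts) {
    return { action: 'giveUp' };
  }
  // The delay doubles with every attempt and is capped at maxDelayMs.
  const delayMs = Math.min(options.baseDelayMs * Math.pow(2, attempt - 1), options.maxDelayMs);
  return { action: 'retry', delayMs: delayMs };
}
```

A caller would reset the attempt counter after a successful connection and call the function again when the network comes back. Adding random jitter to the delay is a common refinement that keeps many clients from reconnecting at the same moment; it is omitted here to keep the function deterministic.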

gRPC Performance Optimization

Issue 1: High Call Latency

Optimization Directions

  • Enable HTTP/2 multiplexing
  • Use connection pool
  • Enable gRPC compression
  • Optimize Protobuf serialization

Issue 2: High Call Failure Rate

Optimization Directions

  • Configure reasonable timeout
  • Implement retry mechanism
  • Circuit breaker and degradation strategy
  • Monitor server health status
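
A circuit breaker stops sending calls to a service that keeps failing and probes it again after a cooldown. The following is a minimal, simplified sketch: the class, its thresholds, and the injectable clock are assumptions for illustration, and it does not limit the number of concurrent trial calls in the half-open state.

```typescript
type BreakerState = 'closed' | 'open' | 'halfOpen';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  // `now` is injectable so the breaker can be tested without real waiting.
  constructor(failureThreshold: number, cooldownMs: number, now?: () => number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now || (() => Date.now());
  }

  // Returns true when a call may be attempted.
  allowRequest(): boolean {
    if (this.state === 'open' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'halfOpen'; // cooldown elapsed: let a trial call through
    }
    return this.state !== 'open';
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call, or too many consecutive failures, opens the circuit.
    if (this.state === 'halfOpen' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }

  currentState(): BreakerState {
    return this.state;
  }
}
```

While the circuit is open, the client can fall back to a degraded response (cached data, a default value) instead of waiting for a timeout on every call.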

Custom Protocol Optimization

General Optimization Principles

  1. Performance First:

    • Choose efficient serialization method
    • Reduce unnecessary data transmission
    • Optimize protocol handshake process
  2. Reliability First:

    • Implement message acknowledgment mechanism
    • Design retransmission strategy
    • Handle network exceptions
  3. Extensibility:

    • Design version compatibility mechanism
    • Reserve extension fields
    • Support protocol negotiation

Best Practices

1. Troubleshooting Process

1. Discover Issue
↓ Failure rate/latency abnormal

2. Confirm Scope
↓ Specific protocol/region/device

3. Analyze Trend
↓ When it started/whether it is continuous

4. View Details
↓ Error information/call stack

5. Locate Root Cause
↓ Client/network/server

6. Formulate Solution
↓ Optimize/degrade/rollback

7. Verify Effectiveness
↓ Compare data before and after optimization

2. Data Analysis Recommendations

Multi-Dimensional Comparison

  • Performance comparison of different protocols
  • Performance comparison of different regions
  • Performance comparison of different network environments

Correlation Analysis

  • Correlation between protocol performance and business metrics
  • Correlation between protocol performance and device performance
  • Correlation between protocol performance and version

Trend Prediction

  • Predict future trends based on historical data
  • Early detection of performance degradation
  • Plan capacity and optimization

FAQ

Q1: How to design a metric system for custom protocols?

A: Follow these principles:

Core Metrics (Required)

  • Performance Metrics:

    • Call latency (P50/P75/P95/P99)
    • TPM (calls per minute)
    • Failure rate
  • Business Metrics:

    • Metrics directly related to business
    • E.g., message size, queue length, concurrency

Auxiliary Metrics (Optional)

  • Quality Metrics:

    • Retry count
    • Message loss rate
    • Out-of-order rate
  • Resource Metrics:

    • Traffic consumption
    • CPU usage
    • Memory usage

Metric Design Example

MQTT Protocol Metric System:

Core Metrics:
- Connection latency, message latency, failure rate, TPM

Business Metrics:
- Message size, topic count, subscription count

Quality Metrics:
- Reconnection count, message loss rate, QoS distribution

Dimensions:
- Topic, QoS level, message type