...
- Service Overview
- Contributing Applications, Daemons, and Windows Services
- Hours of Operation
- Execution Design
- Infrastructure and Network Design
- Resilience, Fault Tolerance and High-Availability
- Throttling and Partial Shutdown
- Required Resources
- Expected Traffic and Load
- Hot or Peak Periods
- Warm Periods
- Cool or Quiet Periods
- Environmental Differences
- Tools
...
- System Backup and Restore
- Backup Requirements
- Special Files
- Backup Procedures
- Restore Procedures
...
- Error Messages
- Events
- Health Checks
- Other Messages
...
- Deployment
- Batch Processing
- Power Procedures
- Routine Checks
- System Rebuilds
- Troubleshooting
...
- Maintenance Procedures
- Patching
- Normal Cycle
- Zero-Day Vulnerabilities
- GMT/BST time changes
- Cleardown Activities
- Log Rotation
- Patching
- Testing
- Technical Testing
- Post-Deployment
...
- Failover
- Recovery
- Troubleshooting Failover and Recovery
...
Responding To Alerts P15 | 24x7x365 |
CloudNOC will respond to alerts within the response time defined in the support agreement. | |
CloudNOC will restore impacted services. | |
CloudNOC will escalate issues to the customer if necessary. | |
CloudNOC will perform root cause analysis once the issue is resolved. | |
CloudNOC will communicate resolution progress and update customer. | |
CloudNOC will maintain a record of all alerts for up to a year. | |
Monitoring | |
CloudNOC will ensure that all production systems are monitored. | |
CloudNOC will maintain appropriate alert levels and make adjustments as necessary. | |
CloudNOC will add and remove alerts upon request. | |
CloudNOC will keep the system free of false positive alerts. | |
CloudNOC will provide Advanced Kafka and Vault Monitoring. | |
| |
Database Backup Failure | |
Disk space as a percentage of total
Filling Up Disks | |
High CPU | |
High Load Balancer Latency | |
Low Memory | |
Number of Processes | |
Percentage of inodes | |
Server Unavailable | |
System swap size | |
Unhealthy Hosts Under Load Balancer | |
VM Memory Size | |
Web Page Slow or Unresponsive | |
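Several of the checks listed above (disk space and inodes as a percentage of total) reduce to comparing a usage ratio against a limit. The following is a minimal sketch of that comparison, not CloudNOC's actual monitoring configuration. The threshold values and all function names are assumptions made for illustration; the document does not state the real alert levels.

```python
import os
import shutil

# Hypothetical alert thresholds; the real values are not given in this document.
DEFAULT_LIMITS = {"disk_used_pct": 90.0, "inode_used_pct": 90.0}


def percent_used(used: float, total: float) -> float:
    """Return used/total as a percentage, or 0.0 when total is not positive."""
    return 0.0 if total <= 0 else 100.0 * used / total


def disk_used_percent(path: str = "/") -> float:
    """Disk space in use on the filesystem containing `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return percent_used(usage.used, usage.total)


def inode_used_percent(path: str = "/") -> float:
    """Inodes in use on the filesystem containing `path` (POSIX only)."""
    st = os.statvfs(path)
    return percent_used(st.f_files - st.f_ffree, st.f_files)


def breaches(metrics: dict, limits: dict = DEFAULT_LIMITS) -> list:
    """Return the sorted names of metrics whose value exceeds its limit."""
    return sorted(
        name for name, value in metrics.items()
        if name in limits and value > limits[name]
    )
```

A caller might collect `{"disk_used_pct": disk_used_percent("/"), "inode_used_pct": inode_used_percent("/")}` on each host and raise an alert for every name returned by `breaches`.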
| |
Services monitoring | |
API Gateway | |
Autoscaling | |
AWS DynamoDB | |
AWS EC2 | |
AWS EFS | |
AWS ElastiCache | |
AWS ELB | |
AWS Lambda | |
AWS SES | |
AWS SQS | |
Backup | |
CloudNOC will ensure that all data is backed up according to the retention policy. | |
CloudNOC will ensure backup data is available in the event of a disaster. | |
CloudNOC will restore data from backup upon customer request. A restore takes approximately 2 hours per TB of data. | |
CloudNOC will delete backup snapshots that are older than the retention period required by the policy. | |
CloudNOC will monitor the backup service and report any interruptions. | |
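The restore estimate and the snapshot cleanup rule above can be expressed as two small calculations. This is an illustrative sketch only: the rate of 2 hours per TB comes from the text, while the function names, the snapshot mapping, and the 30-day retention period used in the example are hypothetical and do not describe CloudNOC's tooling or any customer's actual policy.

```python
from datetime import datetime, timedelta, timezone

HOURS_PER_TB = 2.0  # approximate rate stated in the service description


def estimated_restore_hours(size_tb: float) -> float:
    """Approximate restore duration in hours for `size_tb` terabytes of data."""
    return size_tb * HOURS_PER_TB


def expired_snapshots(snapshots: dict, retention_days: int, now: datetime = None) -> list:
    """Return the sorted IDs of snapshots created before the retention cutoff.

    `snapshots` maps a snapshot ID to its timezone-aware creation time.
    `retention_days` is supplied by the caller; no default policy is assumed.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return sorted(sid for sid, created in snapshots.items() if created < cutoff)


# Example with made-up snapshot IDs and a hypothetical 30-day retention period.
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
snapshots = {
    "snap-old": datetime(2024, 1, 1, tzinfo=timezone.utc),
    "snap-new": datetime(2024, 2, 20, tzinfo=timezone.utc),
}
to_delete = expired_snapshots(snapshots, retention_days=30, now=now)
```

Under the stated rate, `estimated_restore_hours(3.5)` gives 7.0 hours for a 3.5 TB restore; actual durations will vary with the data and the storage involved.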
SOC | |
Scanning | |
CloudNOC will perform monthly scans of all customer infrastructure in AWS and provide the customer with a report of identified vulnerabilities. | |
CloudNOC will maintain reports associated with each scan. | |
Patching | |
CloudNOC will continuously monitor the environment and the CERT list of published vulnerabilities to determine whether any software packages need to be patched. | |
CloudNOC will work with the customer to schedule and perform patch application. | |
Access Control | |
CloudNOC will manage access control to AWS, servers residing within AWS, and other application systems managed by CloudNOC, such as Intrusion Detection, Chef, and Monitoring. | |
CloudNOC will generate an access control entitlement report on a monthly basis. | |
Network Security | |
CloudNOC will manage the firewall rules of all systems within AWS. | |
CloudNOC will generate a network policy report every quarter. | |
CloudNOC will create a ticket if the request was not submitted via the ticketing system. | |
Intrusion Detection | |
CloudNOC will maintain the Intrusion Detection Server and ensure that IDS agents are deployed on all servers within AWS. | |
CloudNOC will ensure attack signatures are kept up to date. | |
CloudNOC will investigate all IDS alerts within SLA based on their severity. | |
CloudNOC will provide quarterly reports summarizing IDS activities. | |
Incident Management by PagerDuty | |
CloudNOC will provide monthly reports from the ticketing system. | |
CloudNOC will ensure the ticketing system is available and secure. | |
CloudNOC will retain ticketing data for up to 5 years. | |
Compliance Engine | Custom policy request |