Before Terabytes Fall: Disk reliability in Windows Vista and beyond
1. Before Terabytes Fall: Disk reliability in Windows Vista and beyond
Frank Shu
Program Manager
WDEG-Storage
Microsoft Corporation
Matthew Kerner
Program Manager
Windows Diagnosis
Microsoft Corporation
2. Windows Storage Devices Strategic pillars
Storage Fabrics (Server/Enterprise)
Leading platform enabling storage fabric adoption
Personal Storage (Client/Consumer)
Optimized platform features enabling your Windows experience, here and now
Optical Platform (Client/Consumer)
Timely, comprehensive, quality platform support for optical devices
Preferred Storage Platform (Partner/Customer)
Preferred platform for developing, deploying, and using storage devices
3. Session Outline
Introduction (Frank Shu)
Windows Vista Disk Diagnostics (Matthew Kerner)
Future Technology (Frank Shu)
Demo (Microsoft and Samsung)
4. What Matters Most To Our Users?
A consumer bought a new computer and it
works great at work and at home. She
couldn’t do her everyday tasks without it.
What matters most to her?
a) CPU power
b) Network connection
c) Battery life
d) Something else…
5. The Answer Is…
The Data
6. Protecting Data: Windows Vista disk diagnostics
Matthew Kerner
7. Quantifying Disk Failures
Catastrophic disk failures
~200 disks replaced per week at Microsoft in 2003
Top driver of Microsoft support’s hardware-related support calls in both client and server
Based on Microsoft figures, disk failures cost many millions of dollars per year in enterprises
Localized failures (bad blocks)
Kernel and user-mode crashes
1.7% of customer-reported Microsoft Online Crash Analysis crashes are due to disk errors
Application hangs during read recovery
8. Disk Failure Mitigations
Prevention
Hybrid hard disks (mobile systems)
RAID
Catastrophic failure recovery
Data backup
Disk replacement
Localized failure recovery
Repair from redundant copy
Restore from backup
9. Windows Vista Disk Diagnostics
Purpose: Save user data before
catastrophic disk failure
Client SKUs
Self-Monitoring, Analysis and Reporting Technology
(S.M.A.R.T.) polling triggers diagnostic
Uses S.M.A.R.T. trip status – no
threshold/attribute comparison
Warns user of impending failure and walks
them through backup and replacement
Windows Vista backup improvements
10. Disk Diagnostics Details
Disk class driver polls S.M.A.R.T. status hourly, as it has done since Windows 2000
Based on industry feedback, no use of Disk
Self-Test or attribute comparison
Failure triggers user-mode code
Filter out duplicate failures
Log SMART READ LOG details to OS event log
Device error count from summary error log sector
Life timestamp from most recent error log entry
Trigger user-context interactive resolution
Customizable by Group Policy
Print instructions, walk user through backup
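The flow on this slide (poll the trip status, filter duplicates, log details, then start an interactive resolution) can be sketched roughly as below. This is a minimal Python simulation under stated assumptions, not the Windows implementation: `SmartStatus`, `DiskDiagnostic`, `on_poll`, and all field names are hypothetical, and the hourly poll is represented by one call to `on_poll` per poll.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass(frozen=True)
class SmartStatus:
    """Hypothetical snapshot of what one poll returns for a disk."""
    disk_id: str
    tripped: bool                 # S.M.A.R.T. trip status reported by the drive
    device_error_count: int = 0   # from the summary error log sector
    life_timestamp: int = 0       # from the most recent error log entry


@dataclass
class DiskDiagnostic:
    """Reacts to polled status: filter duplicates, log, then start resolution."""
    resolve: Callable[[str], None]                      # hypothetical user-facing hook
    event_log: List[str] = field(default_factory=list)  # stand-in for the OS event log
    _reported: Set[str] = field(default_factory=set)

    def on_poll(self, status: SmartStatus) -> bool:
        """Return True only when a new failure was logged and escalated."""
        # Only the trip status is used; no threshold/attribute comparison.
        if not status.tripped:
            return False
        # Filter out duplicate failures for a disk that was already reported.
        if status.disk_id in self._reported:
            return False
        self._reported.add(status.disk_id)
        self.event_log.append(
            f"disk={status.disk_id} errors={status.device_error_count} "
            f"life_timestamp={status.life_timestamp}"
        )
        # Hand off to the interactive resolution (backup, replacement).
        self.resolve(status.disk_id)
        return True


if __name__ == "__main__":
    warned: List[str] = []
    diag = DiskDiagnostic(resolve=warned.append)
    polls = [
        SmartStatus("disk0", tripped=False),
        SmartStatus("disk0", tripped=True, device_error_count=3, life_timestamp=1200),
        SmartStatus("disk0", tripped=True, device_error_count=4, life_timestamp=1201),
    ]
    for poll in polls:
        diag.on_poll(poll)
    print(warned)
    print(diag.event_log)
```

In this sketch the second tripped poll is dropped by the duplicate filter, so the user is warned once per disk rather than once per hour.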
11. Startup Repair/Windows Recovery Environment
Purpose: Recover from non-bootable states, including those caused by
disk failures
Automatic failover on boot failure
to recovery partition
Optionally deployed by OEM
Available on installation media
Hands-free diagnosis and repair
of top non-boot issues
12. Corrupted File Recovery
Purpose: Turn repeat user-mode crashes caused by corrupted system binaries into
one-time crash with silent repair
from cache
Windows Error Reporting crash handler
triggers diagnostic on inpage error
crashes due to bad blocks
Diagnoses corrupted system files
Silent repair from System File Cache
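The diagnose-then-repair idea on this slide can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: the manifest of known-good SHA-256 hashes, the `cache_dir` layout, and the function names are hypothetical and do not describe how Windows stores or validates its system file cache.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict, Optional


def sha256_of(path: Path) -> Optional[str]:
    """Hash a file, or return None if it cannot be read (e.g. a bad block)."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except OSError:
        return None


def repair_if_corrupted(faulting_file: Path,
                        known_good: Dict[str, str],
                        cache_dir: Path) -> str:
    """Check a file named in a crash against a manifest; restore it from cache."""
    expected = known_good.get(faulting_file.name)
    if expected is None:
        return "not-a-system-file"      # outside the scope of this repair
    if sha256_of(faulting_file) == expected:
        return "intact"                 # the crash had some other cause
    cached = cache_dir / faulting_file.name
    if sha256_of(cached) != expected:
        return "no-good-copy"           # cache copy is missing or also damaged
    shutil.copyfile(cached, faulting_file)  # silent repair, no user prompt
    return "repaired"
```

The function would be called once from the crash handler with the path of the file that faulted, so a repeat crash becomes a one-time crash followed by a repair.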
13. Windows Vista Disk Diagnostics
Matthew Kerner
Program Manager
Windows Diagnosis
14. Opportunities For Future Technology
Proactive failure prevention
Reduce scenario pain by enabling
resolutions other than just data recovery
Requires finer-grained failure description
to help host choose the best resolution
Increase warning time before failures
to allow users to save data
15. Future Technology: Protecting User Data And Preventing Hard Drive Failure Proactively
Frank Shu
16. What Is PRCS?
Proactive Reporting and Correcting Safeguard (PRCS) enables a device and
host to correct failure conditions proactively
Device can report hostile conditions before
damage or failure occurs
Host reacts to a device event in real time
based on policy and user preference
A proposal for the PRCS protocol has
been submitted to T13
17. Why Is PRCS Important?
User’s digital data is more valuable than ever before
Disk drive capacity continues to increase
Not every PC user can afford RAID
Deliver on opportunities for improvements
beyond S.M.A.R.T.
18. Goals Of PRCS
Proactively protect user data
Improve the user experience
when data is at risk
Reduce OEM’s customer support costs
Reduce warranty costs for disk
drive vendors
19. PRCS Features
Device monitors its own conditions in real time
Reduce host monitoring performance impact
Device sends meaningful PRCS events to
the host for correction of hostile conditions
and data protection
No translations or guesses required
Host acts on device’s PRCS event
proactively according to policy and
user preference
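The host side of this model, acting on a device event according to policy and user preference, can be sketched as a simple dispatch table. The slides do not define the PRCS event set or the host actions, so the event names, the actions, and the `Host` class below are all hypothetical, chosen only to illustrate the idea in Python.

```python
from enum import Enum
from typing import Callable, Dict, FrozenSet, List


class PrcsEvent(Enum):
    """Hypothetical event types; the actual PRCS proposal may define others."""
    HIGH_TEMPERATURE = "high_temperature"
    EXCESSIVE_VIBRATION = "excessive_vibration"
    SHOCK = "shock"


Action = Callable[[], str]


def raise_fan_speed() -> str:      # hypothetical corrective action
    return "fan speed raised"


def start_backup() -> str:         # hypothetical data-protection action
    return "backup started"


def notify_user() -> str:          # hypothetical user-facing action
    return "user notified"


class Host:
    """Maps each device event to actions, filtered by user preference."""

    def __init__(self, policy: Dict[PrcsEvent, List[Action]],
                 muted: FrozenSet[PrcsEvent] = frozenset()) -> None:
        self.policy = policy
        self.muted = muted  # events the user chose not to act on

    def handle(self, event: PrcsEvent) -> List[str]:
        # The event already says what is wrong, so the host does not
        # interpret raw attributes; it only looks up the configured response.
        if event in self.muted:
            return []
        return [action() for action in self.policy.get(event, [])]


if __name__ == "__main__":
    host = Host(
        policy={
            PrcsEvent.HIGH_TEMPERATURE: [raise_fan_speed, start_backup],
            PrcsEvent.SHOCK: [notify_user],
        },
        muted=frozenset({PrcsEvent.SHOCK}),
    )
    print(host.handle(PrcsEvent.HIGH_TEMPERATURE))
    print(host.handle(PrcsEvent.SHOCK))
```

Keeping the policy as data rather than code is one way to let an administrator or user change the response without touching the event handling itself.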
20. PRCS Advantages
PRCS is proactive
Taking a corrective action before errors occur
Protecting data when it is at risk
PRCS is designed for end users, not just
computer experts
No need to understand a cryptic message to
benefit from PRCS. For example: “The previous
self-test completed having the electrical element
of the test failed”
PRCS enables transparent mitigation of a hostile
condition or a recovery process
Users do not need to configure a self-test mode or
reporting method
Users control policy as desired
21. Proactive Disk Diagnostics
Debasis Baral
Vice President of Engineering
Samsung
22. HDD Reliability 101
HDD reliability and performance is negatively impacted by extremes
in the following operating conditions
Temperature (demo)
Vibration (demo)
Shock (demo)
Duty cycle
Altitude
Humidity
A combination of the above conditions
A history of the above combinations
23. Reliability Versus Temperature
HDD life decreases with temperature
Failure rates increase exponentially with temperature
for all HDD suppliers
Environmental temperature increase from 25 °C to 100 °C
could translate into 10 to 50x shorter life
Ref.: Samsung reliability tests
Samsung HDD Lab Engineering Sample Data
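To get a feel for what these endpoints imply, the short Python sketch below converts the 10x and 50x figures over the 25 °C to 100 °C range into a per-10 °C factor. It assumes a pure exponential between the two endpoints, which the slide suggests but does not quantify; the function names are illustrative only.

```python
import math


def factor_per_step(total_factor: float, t_low_c: float, t_high_c: float,
                    step_c: float = 10.0) -> float:
    """Life-shortening factor per `step_c` degrees, assuming a pure exponential."""
    return total_factor ** (step_c / (t_high_c - t_low_c))


def relative_life(total_factor: float, t_low_c: float, t_high_c: float,
                  t_c: float) -> float:
    """Life at `t_c` as a fraction of life at `t_low_c`, same assumption."""
    rate = math.log(total_factor) / (t_high_c - t_low_c)
    return math.exp(-rate * (t_c - t_low_c))


if __name__ == "__main__":
    for total in (10.0, 50.0):
        per_10c = factor_per_step(total, 25.0, 100.0)
        at_55c = relative_life(total, 25.0, 100.0, 55.0)
        print(f"{total:.0f}x overall: {per_10c:.2f}x per 10 C, "
              f"{at_55c:.2f} of baseline life at 55 C")
```

Under that assumption, the 10x and 50x endpoints correspond to roughly 1.36x and 1.68x shorter life for every additional 10 °C.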
24. Performance Versus Vibration
Data throughput or drive performance can be
significantly affected in the presence of vibration
Effect of vibration is reversible
Cumulative effects of vibration on long-term drive
reliability are a subject of ongoing research
Figure: Performance Loss With Vibration (throughput in MB/s and off-track in % of track pitch versus vibration level in arbitrary units). Samsung HDD Lab Engineering Sample Data
25. Reliability Versus Shock
Shock Modeling
Operating shock damage
Scratches
Damage by corners, leading edge,
and side edges of the slider.
Courtesy: E. Jayson and Frank Talke, UC San Diego
Excessive shock is the major
cause of failure in both PC
and consumer electronics
environments
Non-operating shock damage
26. Reliability Design Guidelines
Failure modes and failure rates of disk drives depend on their
operating environments
Temperature and Handling
(shock and vibration) are major factors
impacting HDD reliability
HDD reliability will be enhanced if OS
detects and manages reliability risks
and stress events intelligently (PRCS)
Users can improve HDD data reliability
by correctly responding to PRCS events
27. PRCS
Kai Chen
Microsoft Corporation
Debasis Baral
Samsung
28. Call To Action
Test your drives with Windows Vista Disk Diagnostics and send feedback
Ensure your drives comply with ATA-7
specs to surface device error count and
life timestamp
Engage with the Startup Repair team to
build a plan for Startup Repair in OEM
factory processes
Participate in T13 discussions on PRCS
Plan your device designs in line with
PRCS guidelines
29. Additional Resources
Whitepapers
Windows Recovery Environment/Startup Repair/Built-in Diagnostics:
http://www.microsoft.com/technet/windowsvista/evaluate/feat/relperf.mspx
Feedback/Questions
Windows Vista Disk Diagnosis:
Dfdfeed @ microsoft.com
Corrupt File Recovery: Dfdfeed @ microsoft.com
Windows Recovery Environment/Startup Repair:
Recovery @ microsoft.com
PRCS: Prcsdisc @ microsoft.com
30.
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.