{{indexmenu_n>06}}

====== Troubleshooting and Problem Solving ======

===== Overview =====

The LaCie 12big Rack Serial 2 includes an Enclosure Services Processor and associated monitoring and control logic that enable it to diagnose problems within the enclosure’s power, cooling and drive systems.

The sensors for power and cooling conditions are housed within the Power Cooling Modules (PCMs). Each module is monitored independently.

==== Initial Start-up Problems ====

=== Faulty Cords ===

First check that you have wired up the enclosure system correctly. Then, if:

  * cords are missing or damaged
  * plugs are incorrect
  * cords are too short

call your supplier for replacements.

=== Alarm Sounds On Power Up ===

Please refer to [[troubleshooting#troubleshooting|Troubleshooting]].

=== Computer Doesn’t Recognize the LaCie 12big Rack Serial 2 ===

  - Check that the interface cables from the LaCie 12big Rack Serial 2 to the host computer are fitted correctly.
  - Check that the LEDs on all installed Drive Carrier Modules are illuminated (Green). Note that the drive LEDs will not be lit during drive spinup.
  - Check that the Drive Carrier Modules have been correctly installed.
  - Check any visible SAS indicators.
  - Check the HBA BIOS for SAS/SATA target visibility.
  - Verify the operating system driver installation.

~~pgbreak~~

===== LEDs =====

Green LEDs always indicate a good or positive condition; they flash Green/Amber if non-critical conditions exist. Red or Amber LEDs indicate that a critical fault is present within the module.

==== Power Cooling Module LEDs ====

Under normal conditions the bi-color Power On LEDs (Figure 4–1) are illuminated constant GREEN. When a fault occurs the Power On LEDs are illuminated constant RED.

**Table 5–1: PCM LED States**

^ PCM OK (Green) ^ Fan Fail (Amber) ^ AC Fail (Amber) ^ DC Fail (Amber) ^ Status ^
| OFF | OFF | OFF | OFF | No AC (any PCM) |
| OFF | OFF | ON | ON | No AC (this PCM only) |
| ON | OFF | ON | ON | AC present, PCM on and OK |
| OFF | ON | OFF | OFF | PCM Fan Fail |
| OFF | ON | ON | ON | PCM fault (over amp, over voltage, over current) |
| FLASHING | OFF | OFF | OFF | Standby mode |
| OFF | FLASHING | FLASHING | FLASHING | PCM firmware download |

~~pgbreak~~

==== Ops Panel LEDs ====

The Ops Panel (Figure 4–2) displays the aggregated status of all the modules. The Ops Panel LEDs are defined in Table 5–2.

**Note:** The Ops Panel is supplied as an integral part of the enclosure core product and is not user replaceable.

**Table 5–2: Ops Panel LED States**

^ System Power \\ (Green/Amber) ^ Module Fault \\ (Amber) ^ Logical Fault \\ (Amber) ^ LED Display ^ Associated LEDs/Alarms ^ Status ^
| ON | OFF | OFF | X | | Aux present, overall power failed or switched off |
| ON | ON | X | X | Single beep, then double beep | Ops Panel Power On (5s), test state |
| ON | OFF | OFF | X | | Power On, all functions good |
| ON | ON | X | X | PCM Fault LEDs, Fan Fault LEDs | Any PCM fault, Fan fault, over or under temp. |
| ON | ON | X | X | SBB Module LEDs | Any SBB Module fault |
| ON | ON | X | X | | Enclosure logical fault |
| ON | Flash | X | X | Module state LED on SBB module | Unknown (invalid or mixed) SBB module type installed, DC bus failure (inter SBB comms), or LaCie 12big Rack Serial 2 VPD configuration error |
| ON | Flash | X | X | PCM Fault LEDs, Fan Fault LEDs | Unknown (invalid or mixed) PCM type installed or DC bus failure (PCM comms) |
| ON | X | ON | X | Array in failed or degraded state | Drive failure has occurred, causing loss of availability or redundancy |
| ON | X | Flash | X | Array in impacted state | Arrays operating background function |
| ON | Flash | Flash | X | SES state S! | Enclosure ID setting different from Start of Day |
| X | X | X | Flash | | SES controlled Enclosure ID or Invalid ID selected |

KEY: X = Disregard.

~~pgbreak~~

==== Drive Carrier Module LEDs ====

Disk drive status is indicated by a GREEN LED and an AMBER LED mounted on the front of each Drive Carrier Module, shown in Figure 5–1 and Figure 5–2. The LED conditions are defined in Table 5–3.

  * In normal operation the Green LED will be ON and will flicker as the drive operates.
  * In normal operation the Amber LED state will be:
    * OFF if there is no drive present,
    * OFF as the drive operates, and
    * ON if there is a drive fault present.

[{{:products:12big-rack-serial-2:12bigserial2_herondrive.png?300|Figure 5–1 - Drive Carrier LEDs}}]

**Table 5–3: Drive LED States**

^ Green Drive LED ^ Amber Drive LED ^ Associated Ops Panel LED ^ Status ^
| OFF | OFF | None | No drive installed |
| On/Blink/Off with activity or start up | X | None | Drive installed and operational |
| ON | Flash 1s On/1s Off | None | SES device identity set |
| ON | ON | Logical Fault (Amber) | SES device fault bit set |
| OFF | ON | Module Fault (Amber) | Power control circuit failure |
| ON | Flash 3s On/1s Off | Logical Fault (Amber) | Failed disk array |

KEY: X = Disregard.

**Important info:** Dummy Drive Carrier Modules must be fitted to all unused drive bays to maintain a balanced air flow.

~~pgbreak~~

==== I/O Module LEDs ====

I/O Module faults are indicated by the Amber Fault LED on the module faceplate. VPD errors are indicated by the Module OK LED flashing Green. Host port connections are monitored by Green Activity LEDs. LED states are shown in Table 5–4.

**Table 5–4: I/O Module LED States**

^ I/O Module OK (Green) ^ I/O Module Fault (Amber) ^ Host Port Activity (Green) ^ Status ^
| ON | OFF | X | I/O Module OK |
| OFF | ON | X | I/O Module Fault |
| X | X | OFF | No Host Port Connection |
| X | X | ON | Host Port Connection - No Activity |
| X | X | Flashing | Host Port Connection - Activity |
| Flashing | X | X | I/O Module VPD Error |

KEY: X = Disregard.

===== Audible Alarm =====

An Audible Alarm is located on the Ops Panel and is activated by the LaCie 12big Rack Serial 2 firmware in a variety of situations.

**Table 5–5: Alarm States**

^ Alarm State ^ Action ^ Action with Mute Button Pressed ^
| S0 | Normal Mode: Silent | Bleep twice |
| S1 | Fault Mode: 1s on/1s off | Transition to S2 or S3 (see Notes) |
| S2 | Remind Mode: Intermittent Bleep | None |
| S3 | Muted Mode: Silent | None |
| S4 | Critical Fault Mode: Continuous Alarm | None: Mute not active |

**Notes:**

  - In state S1, if Mute is not pressed within 2 minutes, the alarm automatically transitions to state S2 or S3 (VPD setup option).
  - Alarm states S1 to S4 return to S0 upon cessation of the fault.
  - Critical Fault state S4 can be entered from any other state.

The Audible Alarm can be muted by pressing the Mute button on the Ops Panel. Please refer to [[troubleshooting#troubleshooting|]] for more information.

~~pgbreak~~

===== Troubleshooting =====

The following sections describe common problems that can occur with your LaCie 12big Rack Serial 2, along with possible solutions. For details on how to remove and replace a module see [[replace-module|]].
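
The alarm behavior in Table 5–5 and its notes can be read as a small state machine. The Python sketch below is illustrative only and is not the enclosure firmware: the ''Event'' names, the ''next_state'' function and the ''after_s1'' parameter (standing in for the VPD setup option) are hypothetical, and the assumption that a non-critical fault moves the alarm from S0 to S1 is inferred from the alarm conditions listed in this chapter rather than stated outright.

```python
from enum import Enum


class AlarmState(Enum):
    S0 = "Normal Mode: Silent"
    S1 = "Fault Mode: 1s on/1s off"
    S2 = "Remind Mode: Intermittent Bleep"
    S3 = "Muted Mode: Silent"
    S4 = "Critical Fault Mode: Continuous Alarm"


class Event(Enum):
    # Hypothetical event names, chosen for this sketch only.
    FAULT = "fault"
    CRITICAL_FAULT = "critical fault"
    FAULT_CLEARED = "fault cleared"
    MUTE_PRESSED = "mute pressed"
    MUTE_TIMEOUT = "2 minutes in S1 without Mute"


def next_state(state, event, after_s1=AlarmState.S3):
    """Return the alarm state that follows `state` when `event` occurs.

    `after_s1` models the VPD setup option that selects S2 or S3.
    """
    if after_s1 not in (AlarmState.S2, AlarmState.S3):
        raise ValueError("after_s1 must be S2 or S3")
    # Note 3: S4 can be entered from any other state.
    if event is Event.CRITICAL_FAULT:
        return AlarmState.S4
    # Note 2: S1 to S4 return to S0 when the fault ceases.
    if event is Event.FAULT_CLEARED:
        return AlarmState.S0
    # Assumption: a non-critical fault raises the alarm from S0 to S1.
    if event is Event.FAULT and state is AlarmState.S0:
        return AlarmState.S1
    # Table 5-5 and Note 1: Mute or the 2 minute timeout leaves S1.
    if state is AlarmState.S1 and event in (Event.MUTE_PRESSED, Event.MUTE_TIMEOUT):
        return after_s1
    # Every other combination leaves the state unchanged,
    # including Mute in S4, where Mute is not active.
    return state
```

For example, ''next_state(AlarmState.S1, Event.MUTE_PRESSED)'' yields ''AlarmState.S3'' with the default option, while the same call in state S4 leaves the alarm in S4.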

**Table 5–6: Alarm Conditions**

^ Status ^ Severity ^ Alarm ^ Ops Panel LED ^
| PCM Alert – Loss of DC power from a single PCM | Fault – No Loss of Redundancy | S1 | Module Fault |
| PCM Alert – Loss of DC power from a single PCM | Fault – Loss of Redundancy | S1 | Module Fault |
| PCM Fan Fail | Fault – Loss of Redundancy | S1 | Module Fault |
| SBB Module detected PCM Fault | Fault | S1 | Module Fault |
| PCM Removed | Configuration Error | None | Module Fault |
| Enclosure Configuration Error (VPD) | Fault – Critical | S1 | Module Fault |
| Low Warning Temperature Alert | Warning | S1 | Module Fault |
| High Warning Temperature Alert | Warning | S1 | Module Fault |
| Over Temperature Alarm | Fault – Critical | S4 | Module Fault |
| I2C Bus failure | Fault – Loss of Redundancy | S1 | Module Fault |
| Ops Panel Communication Error (I2C) | Critical Fault | S1 | Module Fault |
| RAID Error | Fault – Critical | S1 | Module Fault |
| SBB Interface Module Fault | Fault – Critical | S1 | Module Fault |
| SBB Interface Module Fault – No functioning modules remaining | Fault – Critical | S4 | Module Fault |
| SBB Interface Module Removed | Warning | None | Module Fault |
| Drive Power Control Fault | Warning – No loss of drive power | S1 | Module Fault |
| Drive Power Control Fault | Fault – Critical – loss of drive power | S1 | Module Fault |
| Drive Removed | Warning | None | |
| Insufficient Power Available | Warning | None | Module Fault |

~~pgbreak~~

==== System Faults ====

^ Symptom ^ Cause ^ Action ^
| Audible alarm sounding | Internal fault detected (e.g. failure of an internal communications path) | Check for other AMBER LED indications on the PCMs. If there is a PCM error present there may be a communications problem with that PCM. Remove and then re-fit the PCM; if the problem persists, replace the PCM. |

==== Power Cooling Module Faults ====

^ Symptom ^ Cause ^ Action ^
| 1 Ops Panel Module Fault LED amber. \\ 2 Audible alarm sounding. \\ 3 Fan Fail LED is illuminated on PCM. | 1 Any power fault. \\ 2 A thermal condition which could cause PCM overheating. \\ 3 A fan failure. | 1 Check that the AC mains connection to the PCM is live. \\ 2 Disconnect the PCM from mains power and remove the PCM. Re-install it; if the problem persists, replace the PCM. \\ 3 Reduce the ambient temperature. |

==== Thermal Monitoring and Control ====

The LaCie 12big Rack Serial 2 uses extensive thermal monitoring and takes a number of actions to keep component temperatures low and to minimize acoustic noise. Air flows from the front to the rear of the enclosure.

^ Symptom ^ Cause ^ Action ^
| If the ambient air is below 25 °C and the fans are observed to increase in speed, then some restriction on airflow may be causing additional internal temperature rise. \\ Note: This is not a fault condition. | The first stage in the thermal control process is for the fans to automatically increase in speed when a thermal threshold is reached. This may be caused by higher ambient temperatures in the local environment and may be perfectly normal. \\ Note: This threshold changes according to the number of drives and power supplies fitted. | 1 Check the installation for any airflow restrictions at either the front or rear of the enclosure. A minimum gap of 25 mm at the front and 50 mm at the rear is recommended. \\ 2 Check for restrictions due to dust build-up; clean as appropriate. \\ 3 Check for excessive re-circulation of heated air from the rear to the front. Use in a fully enclosed rack installation is not recommended. \\ 4 Check that all blank modules are in place. \\ 5 Reduce the ambient temperature. |

~~pgbreak~~

==== Thermal Alarm ====

^ Symptom ^ Cause ^ Action ^
| 1 Ops Panel Module Fault LED amber. \\ 2 An amber LED on one or more PCMs. | If the internal temperature measured in the airflow through the enclosure exceeds a pre-set threshold, a thermal alarm will sound. | 1 Check that the local ambient temperature is below the 35 °C upper specification limit. \\ 2 Check the installation for any airflow restrictions at either the front or rear of the enclosure. A minimum gap of 25 mm at the front and 50 mm at the rear is recommended. \\ 3 Check for restrictions due to dust build-up; clean as appropriate. \\ 4 Check for excessive re-circulation of heated air from the rear to the front. Use in a fully enclosed rack installation is not recommended. \\ 5 If possible, shut down the enclosure and investigate the problem before continuing. |

===== PCM Firmware Programming Failure =====

If a firmware download to a PCM fails, the PCM fans will go to full speed and the PCM LEDs will flash as follows:

  * Power LED: Green
  * AC Fail LED: Amber
  * PCM Fail LED: Amber
  * AMD Fail LED: Amber

**Important info:** In this situation (where PCM programming has failed) the PCM can be reprogrammed, but it must not be moved between bays. If the PCM has been moved, it must be returned to its original bay before reprogramming can take place.

~~pgbreak~~

===== Dealing with Hardware Faults =====

Ensure that you have obtained a replacement module of the same type before removing any faulty module.

**Caution:** If the LaCie 12big Rack Serial 2 is powered up and you remove any module, replace it immediately. If the system is used with any modules missing for more than a few minutes, the enclosure can overheat, causing power failure and data loss. Such action will invalidate the warranty.

  * Replace a faulty drive with a drive of the same type and equivalent or greater capacity.
  * All drive bays must be fitted with a Drive Carrier Module or a Dummy Carrier Module in order to maintain a balanced air flow.
  * All the supplied plug-in power supply units, electronics modules and blank modules must be in place for the air to flow correctly around the cabinet.

**Caution:** Observe all conventional ESD precautions when handling LaCie 12big Rack Serial 2 modules and components. Avoid contact with Midplane components, module connectors, etc.

===== Continuous Operation During Replacement =====

Your hardware or software Enclosure Management application determines whether a failed disk can be replaced without loss of access to any file system on the enclosure. Enclosure access and use during this period are uninterrupted.

If an enclosure contains two or more PCMs, they can maintain power to the system while a faulty PCM is replaced.