2/06/2011

Diary of a reverse-engineer and the SPARC T3-1

How does one get into the business of reverse engineering? For me it started at an early age: I loved to take things apart. Of course, it's also desirable that whatever is being taken apart, if it was working, should work again once it's put back together. It took a little while before I was able to put things back together successfully. Sometimes the problem was that there were extra parts left over after re-assembly. But the thing I always took the most pride in was being able to fix something that was broken.

In the late sixties it was old watches, transistor radios and telephones that were some of my first "projects". The old crank type phones were some of my favorites. I still have an old 5-bar magneto from one of those phones. As the years went by the technology changed, but my interest and curiosity remained, progressing into computers and programming.

So, this year I find myself doing some reverse engineering again, bringing the new Oracle/Sun SPARC T3-1 systems into an existing Solaris jumpstart environment. The first issue is that this environment is customized and the person who wrote the scripts is no longer around, hence the reverse engineering I've been talking about.

First problem: the SPARC T3-1 wouldn't jumpstart in the existing environment, because the custom scripts try to specify the boot drives in one of two ways, and neither works on this system:

  1. Look up the drives in a disk map file, based on the machine type and a wild card for the logical disk names.
  2. Use physical paths (assigned to a custom devalias) to look up the logical path (/dev/dsk/c#t#d#) once the jumpstart is initiated.

In the product notes (821-2059-15) for the SPARC T3-1, it's stated that this system now uses a SAS 2.0 WWID (a 16-digit hexadecimal number) in place of the target number in the "t#" field. Of course this WWID is unique to each disk drive, so that rules out a wild card in the disk mapping file.

Option two was also a bust, because once the SPARC T3-1 boots, the physical path (whether taken from the format listing or from an "ls -l" of the logical disks) isn't the same physical path used at the eeprom boot level. This issue also comes into play during post-jumpstart processing, when we try to write more devalias values for the root drive and its mirror. But I'm getting ahead of myself.

Example: ls -l and partial format listing (both show the scsi_vhci as the path)

/dev/dsk/c0t5000CCA00AD91644d0s2 -> ../../devices/scsi_vhci/disk@g5000cca00ad91644:c

AVAILABLE DISK SELECTIONS:

0. c1t5000CCA00AD91644d0
/scsi_vhci/disk@g5000cca00ad91644

To solve the first problem, I did two things. First, I modified the script that looks up the boot drives so that it recognizes a devalias given as a logical device rather than a physical path. Second, I set those aliases from the "ok" prompt. This allowed the jumpstart to proceed normally and only took a few commands at the "ok" prompt to determine the WWID of the disks that I wanted to specify. In this case I wanted to use the first drive of each of the two internal controllers.

{0} ok probe-scsi-all

/pci@400/pci@2/pci@0/pci@4/scsi@0

#--------^^^^^ 2nd controller

FCode Version 1.00.54, MPT Version 2.00, Firmware Version 5.00.17.00

Target 9

Unit 0 Disk HITACHI H103030SCSUN300G A2A8 585937500 Blocks, 300 GB

SASDeviceName 5000cca0150611a0 SASAddress 5000cca0150611a1 PhyNum 0

#-------------^^^^^^^^^^^^^^^^ this will be the root mirror

/pci@400/pci@1/pci@0/pci@4/scsi@0

#--------^^^^^ 1st controller

FCode Version 1.00.54, MPT Version 2.00, Firmware Version 5.00.17.00

Target 9

Unit 0 Disk HITACHI H103030SCSUN300G A2A8 585937500 Blocks, 300 GB

SASDeviceName 5000cca01502f6e4 SASAddress 5000cca01502f6e5 PhyNum 0

#-------------^^^^^^^^^^^^^^^^ this will be the primary root disk


These disks are in HDD slot 0 (for pci@1) and HDD slot 4 (for pci@2). Note: convert any alphabetic characters in the WWID to uppercase, as that's how they are listed in the /dev/{r}dsk/ directory.
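The WWID-to-device-name conversion is easy to get wrong by hand, so here is a minimal sketch of it as a POSIX shell function. The function name wwid_to_disk is made up for illustration, and the controller number (c0) and LUN (d0) are assumptions taken from the nvalias commands that follow; check them against /dev/dsk on your own system.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts): turn the
# SASDeviceName printed by probe-scsi-all into a logical disk name.
# Assumes LUN d0; the controller number defaults to 0 (an assumption).
wwid_to_disk() {
    wwid=$1
    ctrl=${2:-0}
    # A SAS 2.0 WWID is expected to be exactly 16 hexadecimal digits.
    case "$wwid" in
        ""|*[!0-9a-fA-F]*)
            echo "not a hex WWID: $wwid" >&2
            return 1
            ;;
    esac
    if [ "${#wwid}" -ne 16 ]; then
        echo "WWID must be 16 digits: $wwid" >&2
        return 1
    fi
    # /dev/dsk lists the WWID in uppercase, so convert a-f to A-F.
    wwid=$(printf '%s' "$wwid" | tr 'abcdef' 'ABCDEF')
    printf '/dev/dsk/c%st%sd0\n' "$ctrl" "$wwid"
}
```

For example, "wwid_to_disk 5000cca01502f6e4" prints the value used for the boot1 alias.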

{0} ok nvalias boot1 /dev/dsk/c0t5000CCA01502F6E4d0

{0} ok nvalias boot2 /dev/dsk/c0t5000CCA0150611A0d0
{0} ok nvstore
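With aliases like these in place, the lookup script has to tell a logical device from a physical path. The real script is proprietary and can't be shown, so the following is only a rough sketch of that check, assuming POSIX shell. The function name disk_from_alias and the "logical"/"physical" output labels are made up for illustration, and how the alias value is read from the eeprom is left out.

```shell
#!/bin/sh
# Hypothetical helper (not the original script): classify a devalias
# value and, for a logical device, print the bare disk name
# (c#t<WWID>d#) with any slice suffix removed.
disk_from_alias() {
    alias_value=$1
    case "$alias_value" in
        /dev/dsk/*|/dev/rdsk/*)
            # Logical device: keep only the last path component.
            disk=${alias_value##*/}
            # Drop a trailing slice such as "s2", if there is one.
            disk=$(printf '%s\n' "$disk" | sed 's/s[0-9][0-9]*$//')
            printf 'logical %s\n' "$disk"
            ;;
        /*@*)
            # Looks like an OpenBoot physical path, e.g. /pci@400/...
            printf 'physical %s\n' "$alias_value"
            ;;
        *)
            echo "unrecognized alias value: $alias_value" >&2
            return 1
            ;;
    esac
}
```

The disk name printed for the logical case is the sort of value that could then be handed to a begin script as the root disk.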

Now the jumpstart could proceed as normal, but as I mentioned above, the additional post processing (more of the custom stuff) tries to create devalias entries for the root disk and its mirror. The problem here was the same as mentioned earlier, i.e., the "ls -l" doesn't provide the info needed to re-construct a physical path that would be valid at the eeprom level.

One other thing to note: during jumpstart, the normal post processing does write the boot path to the eeprom "boot-device" entry. However, it includes the unique WWID. This can be problematic if that disk ever needs to be replaced (i.e., how do we know the WWID of the new drive?). The solution I used in this case was to write the values for the root drive and its mirror using the default paths from the devalias listing for disk0 through disk7 (this system has an 8-disk backplane vs 16).

Example of the devalias listing. Note: the disk numbers match the HDD slot numbers.

disk7 /pci@400/pci@2/pci@0/pci@4/scsi@0/disk@p3

disk6 /pci@400/pci@2/pci@0/pci@4/scsi@0/disk@p2

disk5 /pci@400/pci@2/pci@0/pci@4/scsi@0/disk@p1

disk4 /pci@400/pci@2/pci@0/pci@4/scsi@0/disk@p0

disk3 /pci@400/pci@1/pci@0/pci@4/scsi@0/disk@p3

disk2 /pci@400/pci@1/pci@0/pci@4/scsi@0/disk@p2

disk1 /pci@400/pci@1/pci@0/pci@4/scsi@0/disk@p1

disk0 /pci@400/pci@1/pci@0/pci@4/scsi@0/disk@p0


So, at this point I decided to scrap the old method and create the bootdisk and bootmirror devalias values from these default disk devalias values, which are based on the disk port number (HDD slot position) and not on the WWID. Note the addition of ",0:a" to select a slice/partition (needed to support a possible live upgrade to an alternate partition).

{0} ok nvalias bootdisk /pci@400/pci@1/pci@0/pci@4/scsi@0/disk@p0,0:a

{0} ok nvalias bootmirror /pci@400/pci@2/pci@0/pci@4/scsi@0/disk@p0,0:a

{0} ok nvstore

{0} ok setenv boot-device bootdisk bootmirror

{0} ok boot
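The slot-to-path pattern in the devalias listing is regular enough to generate. Here is a minimal sketch in POSIX shell, assuming the 8-disk backplane layout shown above (slots 0-3 behind pci@1, slots 4-7 behind pci@2). The function name slot_to_bootpath is made up for illustration, and the mapping should be checked against the devalias output of the actual machine before being relied on.

```shell
#!/bin/sh
# Hypothetical helper: map an HDD slot number (0-7) to the default
# OpenBoot disk path from the devalias listing, with ",0:a" appended
# to select slice a. Assumes the 8-disk backplane layout.
slot_to_bootpath() {
    slot=$1
    case "$slot" in
        [0-3]) ctrl=1 ;;   # slots 0-3 are on the 1st controller (pci@1)
        [4-7]) ctrl=2 ;;   # slots 4-7 are on the 2nd controller (pci@2)
        *)
            echo "slot must be 0-7: $slot" >&2
            return 1
            ;;
    esac
    # Each controller numbers its own ports p0 through p3.
    port=$((slot % 4))
    printf '/pci@400/pci@%s/pci@0/pci@4/scsi@0/disk@p%s,0:a\n' "$ctrl" "$port"
}
```

Slots 0 and 4 give the bootdisk and bootmirror paths used above.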


This solution should be short-lived, as we are currently looking at implementing a different method of provisioning Solaris systems. If anyone has other suggestions for a solution, I welcome them.

2 comments:

Sean C. Payne said...

Could you post the modification you did to the script that looks up the boot drives? I'm assuming a change in "check". I'm trying to jumpstart some T3-1s right now and getting nowhere with rules & check.

Doug Curtis said...

There were no changes made to "check", but there is a begin script in the rules that sets up the profile by defining the filesys entries. If you have access to the "JumpStart Technology" book by Howard and Noordergraaf, it should have some examples. In my example the nvaliases boot1 and boot2 are "detected" by scripts that then parse the device name, set a disk name variable and pass that into the begin script. I can't post the code since it's in a corporate/proprietary env.