JUNOS hard-disk recovery

OK, this is not a complete hard disk recovery method. I’m simply sharing how I managed to fix the hard disk on my M10.

I was upgrading JUNOS from version 8 to 9 when I got a bunch of errors and suddenly the hard disk was no longer available. Unfortunately I did not grab a log when this happened.

When I rebooted the box I was seeing this error on startup:

mfs: /dev/ad1s1b: Device not configured
Can't open /dev/ad1s1f: Device not configured

Once the box finally started up from the compact flash, I tried to run diagnostics but could not:

root> request chassis routing-engine diagnostics hard-disk
-> Hard disk is absent from the boot list.
Hard disk may have had previous errors, skipping test.

So basically it looks like JUNOS has removed the hard drive from the boot list. How can we put it back? First you need to log in as root so you get the % shell prompt. Do NOT go into cli mode yet.


sysctl -w machdep.bootdevs=pcmcia-flash,compact-flash,disk,lan 

This tells JUNOS to put the boot order back to the default, including the hard drive.
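
As a rough illustration of what that value contains, here is a minimal Python sketch that splits a machdep.bootdevs string and checks whether the hard disk is in the list. The helper names are my own, and the second string is a hypothetical "broken" value; I did not record what the box actually had before the fix.

```python
def parse_bootdevs(value):
    """Split a machdep.bootdevs value into an ordered list of device names."""
    return [name.strip() for name in value.split(",") if name.strip()]


def hard_disk_in_boot_list(value):
    """Return True if the 'disk' entry appears in the boot device list."""
    return "disk" in parse_bootdevs(value)


# Default order, as set by the sysctl command above.
default_order = "pcmcia-flash,compact-flash,disk,lan"
# Hypothetical value with the hard disk dropped (an assumption, not captured output).
without_disk = "pcmcia-flash,compact-flash,lan"

print(hard_disk_in_boot_list(default_order))  # True
print(hard_disk_in_boot_list(without_disk))   # False
```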

Now reboot the box and the hard disk should come back into the boot order. This is what I saw on my boot:

ad0: 1953MB  [3970/16/63] at ata0-master PIO4
ad1: DMA limited to UDMA33
ad1: 11513MB  [23392/16/63] at ata0-slave UDMA33
Mounting root from ufs:/dev/ad0s1a
if_pfe_open: listener socket opened, listening...
Mounted jbase package on /dev/vn0...

So far so good. Once the box is back up go back into root mode and type:

root@% smartd -oe /dev/ad1

Hopefully your drive is now back up. If not, you can run some extended SMART tests on it. If you’ve lost your partitions you can repartition the disk by first going into cli mode and then:

root> request system partition hard-disk

WARNING:   The hard disk is about to be partitioned.  The contents
WARNING:   of /altroot and /altconfig will be saved and restored.
WARNING:   All other data is at risk.  This is the setup stage, the
WARNING:   partition happens during the next reboot.

Setting up to partition the hard disk ...

WARNING:   'request system reboot' command when you are ready to proceed
WARNING:   with the partitioning.  To abort the partition of the hard disk
WARNING:   use the 'request system partition abort' command.

root> request system reboot
Reboot the system ? [yes,no] (no) yes

I had to repartition my disk and reboot. All looks good:

root> show system storage
Filesystem              Size       Used      Avail  Capacity   Mounted on
/dev/ad0s1a             992M        51M       932M        5%  /
devfs                    16K        16K         0B      100%  /dev/
/dev/vn0                 14M        14M         0B      100%
/dev/vn1                 57M        57M         0B      100%
/dev/vn2                6.4M       6.4M         0B      100%
/dev/vn3                2.9M       2.9M         0B      100%
/dev/vn4                 20M        20M         0B      100%
/dev/vn5                8.3M       8.3M         0B      100%
/dev/vn6                9.6M       9.6M         0B      100%
mfs:260                 1.5G       8.0K       1.4G        0%  /tmp
mfs:267                 1.5G       248K       1.4G        0%  /mfs
/dev/ad0s1e             189M       5.0K       187M        0%  /config
procfs                  4.0K       4.0K         0B      100%  /proc
/dev/ad1s1f             7.7G       6.6M       7.1G        0%  /var
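
If you want to check this from a script rather than by eye, something like the following Python sketch could work. It assumes you have already saved the text of show system storage somehow; the function name is my own, and the sample is a shortened copy of the output above.

```python
def mounted_device(storage_output, mount_point):
    """Return the filesystem mounted on mount_point, or None if not found."""
    for line in storage_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Rows with no mount point (e.g. /dev/vn0) only have 5 fields.
        if len(fields) == 6 and fields[5] == mount_point:
            return fields[0]
    return None


SAMPLE = """\
Filesystem              Size       Used      Avail  Capacity   Mounted on
/dev/ad0s1a             992M        51M       932M        5%  /
/dev/vn0                 14M        14M         0B      100%
/dev/ad0s1e             189M       5.0K       187M        0%  /config
/dev/ad1s1f             7.7G       6.6M       7.1G        0%  /var
"""

print(mounted_device(SAMPLE, "/var"))  # /dev/ad1s1f
```

The point of the check is simply that /var lives on ad1 (the hard disk) again rather than being missing.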

After this I went from 8.1 to 9.3 and then to 10.4 and everything looks good :)

SPAN, RSPAN, Layer 2 control packets and VLANS

I know the title is quite a mouthful, but I did want to cover all the above in this post. Daniel asked me to check a few things as I have ready access to real switches.

You learn in your studies that layer 2 control packets are ‘special’, in that control traffic going over the trunk between 2 switches does not follow the standard tagging rules. Let’s use Wireshark to see exactly what is going on in a bunch of scenarios. It’ll also give me the opportunity to do a bit of testing with SPAN and RSPAN.

Let’s use the basic topology:

Let’s first set up a span session on the 3750. I will monitor port gi1/0/9 in both directions and send that traffic to gi1/0/24 to be picked up by the laptop.

monitor session 1 source interface Gi1/0/9
monitor session 1 destination interface Gi1/0/24

The first thing I noticed when I plugged in my laptop, however, is that Windows is of course very noisy. My capture was already filling up with stuff that Windows was sending out. So I’ve downloaded an NST (Network Security Toolkit) ISO, which I’ll boot on the laptop and capture with instead.

So now that I’ve booted up into NST and got Wireshark running, I hardly see anything at all happening between the 2 switches. Where is all the layer 2 control traffic? Well the problem is that control traffic is not automatically replicated to a SPAN port. You need to enable encapsulation replication in order for it to work. Let’s do so:

C3750#conf t
C3750(config)#monitor session 1 destination interface gi1/0/24 encapsulation replicate

Let’s verify:

C3750#sh monitor session 1
Session 1
Type                   : Local Session
Source Ports           :
    Both               : Gi1/0/9
Destination Ports      : Gi1/0/24
    Encapsulation      : Replicate
          Ingress      : Disabled

I can now see lots of control traffic in my Wireshark capture.

Both switches are already connected to each other. By default they’ll create a trunk link and vlan 1 will be the native vlan. I’ll then configure the switches to tag the native vlan and see what happens.

Let’s ensure the native vlan is not currently tagged:

C3750#sh vlan dot1q tag native
dot1q native vlan tagging is disabled

So what do my CDP/STP/DTP control packets look like in Wireshark? Note that I’m running the default mode of STP on the switch for now.
I do see something odd. I am seeing an STP packet with a dot1q tag of 1, one with a dot1q tag of 10, and an untagged one. The tag of 10 I can understand, because I have created vlan 10 and it has a separate STP instance. But why would the main one be tagged with vlan 1 if vlan 1 is the native vlan?
dot1q vlan 1

dot1q vlan 10

no dot1q tag

What about DTP and CDP?


Both CDP and DTP are currently sent with no vlan tag at all. CDP does carry information about the native vlan in its packet, which is why CDP complains when these don’t match on either end. But the important thing is that both are untagged.
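
For anyone who wants to check tagging on raw frames rather than in the Wireshark GUI, here is a minimal Python sketch. It relies only on the 802.1Q layout: a tagged frame carries 0x8100 where the EtherType/length field would normally sit, followed by a 16-bit field whose low 12 bits are the VLAN ID. The function name and the example frames are my own; the frames are hand-built with dummy payloads and are not taken from my captures.

```python
import struct

TPID_DOT1Q = 0x8100  # value in bytes 12-13 that marks an 802.1Q tag


def vlan_tag(frame: bytes):
    """Return the 802.1Q VLAN ID of a raw Ethernet frame, or None if untagged."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != TPID_DOT1Q:
        return None
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q tag")
    (tci,) = struct.unpack("!H", frame[14:16])
    return tci & 0x0FFF  # low 12 bits of the tag control field are the VLAN ID


dst = bytes.fromhex("01000ccccccc")  # Cisco multicast address used by CDP and DTP
src = bytes.fromhex("001122334455")  # hypothetical source MAC
body = struct.pack("!H", 0x0026) + b"\x00" * 38  # dummy length field and payload

untagged = dst + src + body
tagged_vlan1 = dst + src + struct.pack("!HH", TPID_DOT1Q, 1) + body

print(vlan_tag(untagged))      # None
print(vlan_tag(tagged_vlan1))  # 1
```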

Let’s now tag the native vlan and see what happens.

C3750(config)#vlan dot1q tag native
C3750#sh vlan dot1q tag native
dot1q native vlan tagging is enabled

Let’s bring up the interfaces again and see what we see.


Interesting. DTP has no tag, but CDP is using a tag of 1.

What about STP?

I’m seeing the same as what I saw above: a tagged vlan 1 STP frame, a tagged vlan 10 STP frame, and finally an untagged STP frame.

What happens if I change the native vlan to 10? Well, no need to paste output because it’s exactly the same as the previous example, i.e. vlan 10 is now the native vlan and tagged, but CDP is still using a tag value of 1. STP and DTP remain unchanged.

Now let’s try something else. Let’s keep vlan 10 as the tagged native, but let’s remove vlan 1 from the trunk:

interface GigabitEthernet1/0/9
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 10
 switchport trunk allowed vlan 2-4094
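
To make it explicit that vlan 1 really is outside that allowed range, here is a small Python sketch that expands an allowed-vlan string into a set of VLAN IDs. The parser is my own illustration and only handles comma-separated IDs and ranges, not keywords such as all, add or except.

```python
def parse_vlan_list(spec):
    """Expand a string like '2-4094' or '1,10,20-22' into a set of VLAN IDs."""
    vlans = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            vlans.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            vlans.add(int(part))
    return vlans


allowed = parse_vlan_list("2-4094")
print(1 in allowed)   # False: vlan 1 is not allowed on the trunk
print(10 in allowed)  # True: the native vlan 10 is still allowed
```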

DTP is unchanged, i.e. it’s still sending untagged traffic. CDP, however, is sending tagged traffic. In vlan 1!

As a quick test I created int vlan 1 on both switches in the same subnet and tried to ping across. I could not. Therefore it looks like Cisco will use vlan 1 tagged to send certain control data, even if vlan 1 is removed from the trunk, but no user data will be allowed on that vlan.

For STP I still have the same 3 outputs: a tagged vlan 1 STP frame, a tagged vlan 10 STP frame, and an untagged STP frame.

One last thing I wanted to test was RSPAN config. I’ve always been a little confused as to the correct config on a switch that is the RSPAN end-point and is also a source of traffic to be monitored. For example, let’s say that the 3550 above is monitoring traffic on vlan 2 with a destination of remote span vlan 500. The 3750 is the RSPAN end-point, which monitors RSPAN vlan 500 and sends it out a local port on the switch. What happens if the 3750 is also monitoring vlan 2 on its own ports and sending that out? Do we configure the destination as vlan 500 or straight to a local port?

Let’s configure it like so:

C3750#sh monitor session all
Session 1
Type                   : Remote Source Session
Source Ports           :
    Both               : Gi1/0/9
Dest RSPAN VLAN        : 500

Session 2
Type                   : Remote Destination Session
Source RSPAN VLAN      : 500
Destination Ports      : Gi1/0/24
    Encapsulation      : Replicate
          Ingress      : Disabled

I’ve tested sending the 3750’s own monitored traffic to RSPAN vlan 500 and I don’t see any traffic at all. As soon as I change the session to send traffic directly to the local port it works.

EDIT (05/06/12) – I’ve uploaded my captures to Cloudshark so you can take them apart to do your own research
Untagged native vlan 1
Tagged native vlan 1
Tagged native vlan 10
Tagged native vlan 10 with vlan 1 removed from trunk