Wednesday, November 20, 2013

Finally solved my Wii-U eShop error issue!

We've been trying to purchase the Pikmin 3 DLC for the last week or so. Every time we get about halfway through the eShop purchase, the eShop app gets an error code (useless!) and exits.  Sometimes we get further than other times, and once it fails, it seems to fail faster on the next attempt.

Trouble with the Nintendo Wii-U eShop seems to be a common complaint on the Nintendo forums. Originally it was blamed on Wi-Fi reception, but there still seem to be a few cases here and there, and we use a wired connection. I was stumped. The time of day didn't seem to matter either.

So today I was looking at my firewall logs and noticed:
[4582372.494624] iptables drop ratelimit: 
IN=eth0 OUT= MAC=... SRC= DST=... LEN=93 TOS=0x00 PREC=0x20 
TTL=59 ID=27348 DF PROTO=TCP SPT=443 DPT=4040 WINDOW=18666 RES=0x00 ACK PSH URGP=0 
A lot of those.  That's weird.  That IP is an Akamai address.  And I seem to get a lot of them, from Akamai's port 443 to a random port on my system.
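
For context, those lines come from a rate-limited LOG rule that sits just ahead of my DROP rules, so I can see what is being thrown away.  A rough sketch of such a rule (the log prefix matches what shows up above, but the limit value here is just illustrative, not my exact ruleset):

# Log, at most a few entries per minute, whatever is about to be dropped
-A INPUT_FORWARD_FW -i eth0 -m limit --limit 5/minute -j LOG --log-prefix "iptables drop ratelimit: "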

Akamai says:
Our firewall has detected that Akamai-controlled IP addresses are attempting to access our IP address via a number of different ports. This seems to be an attack. What is going on? 

The messages you see indicate that users behind your firewall are running the Akamai NetSession Interface. The Akamai NetSession Interface is a download manager client that is used on behalf of an Akamai customer to download software or other digital content. The Akamai NetSession Interface uses both TCP and UDP based protocols to download content and facilitate connectivity through network devices such as proxies, firewalls & NAT (network address translation) devices.
Huh! I wonder if Nintendo is using a similar download manager protocol in the eShop interface.  Or maybe I have downloads running in the background that are triggering a firewall rule.

Now, about that rule -- my rule is that if you send me enough packets that I decide to drop, I blacklist your IP:
-A INPUT_FORWARD_FW -i eth0 -j DROP -m recent --set --name badguys     
 Combined with the following to just drop all of their packets:
-A INPUT_FORWARD_FW -i eth0 -m recent --rttl --name badguys --update --seconds 60 -j DROP        
Once I removed the first rule, I was immediately able to complete an eShop transaction.
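
If you run a similar setup, the recent match exposes its lists through procfs, which makes it easy to see who has been blacklisted and to let someone back in without reloading the whole ruleset.  Roughly (older kernels use /proc/net/ipt_recent instead, and 192.0.2.1 is just a placeholder address):

# Show the addresses currently in the "badguys" list, with hit counts
cat /proc/net/xt_recent/badguys
# Remove a single address (say, an Akamai host you want to unblock)
echo -192.0.2.1 > /proc/net/xt_recent/badguys
# Or flush the whole list
echo / > /proc/net/xt_recent/badguys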

So I wonder how many people out there having trouble with the eShop are being tripped up by smart firewall routers (mine is Linux-based) that blacklist IP addresses when they try to contact blocked ports too often.

...And tomorrow we shall try out the new Pikmin levels!  :)

Saturday, October 5, 2013

Western Digital Green Drives and Linux - They're dying!

The last time I swapped the drives out in my home built NAS, I bought Western Digital Green drives.  They sounded great!  Low power, low heat, low noise...

Unfortunately, they appear not to have been built for this particular application.  They auto-park their heads, and the idle timer is set too low for Linux filesystems.  (I believe the default may be 8 seconds!)

My drives began to give me trouble (thankfully) even before the warranty ran out.  I found the load cycle count on my drives to be extremely high - tens or hundreds of thousands.  This is way beyond the expected count.

Here's how to query for it:

# smartctl -A /dev/sdX | grep ^193
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       XXXXXXX
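
To keep an eye on every drive at once (and to see whether the count is climbing by the minute rather than by the month), a quick loop around the same query does the job:

# Print the raw Load_Cycle_Count (last column of attribute 193) for each drive
for dev in /dev/sd[a-z]; do
    echo -n "$dev: "
    smartctl -A $dev | awk '/^193 Load_Cycle_Count/ {print $NF}'
done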

Luckily there is a solution -- run idle3-tools (the idle3ctl utility) to either turn off the feature or set the timeout much higher.

Make sure you follow the instructions closely -- you must power cycle the drives!
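
For reference, here is roughly how idle3ctl gets used (the device name is just an example; the -s timer math, where values above 128 count in 30-second units, is from the idle3-tools documentation, so double-check it against your firmware):

# Show the current idle3 (head park) timer value
/opt/idle3-tools/sbin/idle3ctl -g /dev/sdb
# Either disable the timer entirely...
/opt/idle3-tools/sbin/idle3ctl -d /dev/sdb
# ...or raise it to roughly 300 seconds ((138 - 128) * 30s)
/opt/idle3-tools/sbin/idle3ctl -s 138 /dev/sdb
# Then power the drive off and back on for the change to take effect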

Now every time I boot, I run this script, just in case I forget that I've put a new WD Green in:

# Disable the head-park timer on any WD drive, and give every non-SSD
# drive a saner standby timeout
for dev in /dev/sd[a-z]; do
    if ! hdparm -i $dev | grep -q 'Model=.*SSD'; then
        if hdparm -i $dev | grep -q 'Model=WDC'; then
            echo "Disabling western digital 8sec park on $dev"
            /opt/idle3-tools/sbin/idle3ctl -d $dev
        fi
        echo "Setting timeout for non SSD drive $dev"
        hdparm -S 251 $dev
    fi
done

(WD has mostly been sending me back WD Red drives as replacements, which is great, as I believe those are marketed for NAS use.)

SATA errors on Linux with Samsung SSD 840 Series with Asus M2N-E

This problem had been driving me crazy for weeks.  I run a home Linux server (currently Fedora 18) as a NAS and for various other services.  Ever since I upgraded the OS disks to SSDs, I've been getting SATA errors in my kernel logs:

Jul 31 00:02:26 kernel: [20859.208310] ata3:EH in SWNCQ mode,QC:qc_active 0x7E sactive 0x7E
Jul 31 00:02:26 kernel: [20859.208371] ata3: SWNCQ:qc_active 0x3E defer_bits 0x40 last_issue_tag 0x5
Jul 31 00:02:26 kernel: [20859.208482] ata3: ATA_REG 0x41 ERR_REG 0x84
Jul 31 00:02:26 kernel: [20859.208533] ata3: tag : dhfis dmafis sdbfis sactive
Jul 31 00:02:26 kernel: [20859.208585] ata3: tag 0x1: 1 0 0 1
Jul 31 00:02:26 kernel: [20859.208636] ata3: tag 0x2: 1 0 0 1
Jul 31 00:02:26 kernel: [20859.208697] ata3: tag 0x3: 1 0 0 1
Jul 31 00:02:26 kernel: [20859.208748] ata3: tag 0x4: 1 0 0 1
Jul 31 00:02:26 kernel: [20859.208800] ata3: tag 0x5: 0 0 0 1
Jul 31 00:02:26 kernel: [20859.208860] ata3.00: exception Emask 0x1 SAct 0x7e SErr 0x0 action 0x6 frozen
Jul 31 00:02:26 kernel: [20859.208914] ata3.00: Ata error. fis:0x21
Jul 31 00:02:26 kernel: [20859.208967] ata3.00: failed command: READ FPDMA QUEUED
Jul 31 00:02:26 kernel: [20859.209049] ata3.00: cmd 60/08:08:e8:23:bb/00:00:04:00:00/40 tag 1 ncq 4096 in
Jul 31 00:02:26 kernel: [20859.209251] ata3.00: status: { DRDY ERR }
Jul 31 00:02:26 kernel: [20859.209303] ata3.00: error: { ICRC ABRT }
Jul 31 00:02:26 kernel: [20859.209354] ata3.00: failed command: READ FPDMA QUEUED
Jul 31 00:02:26 kernel: [20859.209410] ata3.00: cmd 60/08:10:d8:26:bb/00:00:04:00:00/40 tag 2 ncq 4096 in
Jul 31 00:02:26 kernel: [20859.209605] ata3.00: status: { DRDY ERR }
Jul 31 00:02:26 kernel: [20859.209675] ata3.00: error: { ICRC ABRT }
Jul 31 00:02:26 kernel: [20859.209734] ata3.00: failed command: READ FPDMA QUEUED

Since the spinning SATA disks I replaced also gave me errors (I assumed they were dying...), it was a bit of a mystery.  Is the SATA controller dying?  The errors did seem to correlate with greater disk utilization.

I swapped the cables.  I swapped the port.  For a day I convinced myself the problem was associated with the port.  I forced the kernel to throttle the SATA to 1.5Gb/s (libata.force=1.5G).  I tried libata.noacpi=1.

Finally I found the answer: libata.noncq

It's even a bit obvious now in the log messages: SWNCQ

Apparently newer drives support NCQ (Native Command Queuing), which lets the kernel offload command ordering to the drive itself; after all, the drive knows its physical properties better than the kernel does.
I did file a bug about this.

Now that I've added libata.noncq to the kernel command line, my errors have gone away.  I'll try removing it someday when I upgrade the motherboard or the OS.  I suspect the real culprit is the SATA controller on the motherboard.
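
In case it helps anyone else, this is roughly how the change goes in on a grub2-based Fedora install, plus a quick way to confirm NCQ is actually off (treat it as a sketch and adjust the device names and paths for your own system):

# Add libata.noncq to GRUB_CMDLINE_LINUX in /etc/default/grub, then rebuild
# the grub config and reboot
grub2-mkconfig -o /boot/grub2/grub.cfg
# After rebooting, a queue_depth of 1 means NCQ is disabled
cat /sys/block/sda/device/queue_depth
# NCQ can also be turned off per-drive at runtime, without a reboot
echo 1 > /sys/block/sda/device/queue_depth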

When to ask for help when stuck on a technical problem

I thought this blog post by +Matthew Ringel was a concise and useful summary of how (and when) to ask for help when stuck on a technical problem.
I often find coworkers skipping step #1: working at the problem a little longer and documenting/recording/reviewing what you've already tried.

When I do step #1, I often solve the problem myself.  I might be in the middle of an email explaining what the problem is -- I owe it to the person I'm asking to explain what I've tried, what results I got, and so on.  Most of the time the solution then presents itself!  I just delete the email draft and keep plugging away.

But then sometimes I wait too long for step #3 - going and asking for help.

And then sometimes, I just need a rubber duck.

Removing Embedded JPGs from Nikon NEF Files with Exiftool

I've been migrating from Capture NX2 to Lightroom for editing my raw NEFs.  But I'm not quite ready to convert completely from NEF to DNG (Digital Negative).  I still might want to edit a photo in Capture - the control points are just too useful.  One tempting advantage of DNG is that the files are reportedly a little smaller than NEFs.

Why are the DNGs smaller?  I believe it is due to the NEF's embedded JPEGs.  But Lightroom doesn't really need the JPEG renderings stored in the NEF, and I could always recreate them later.  So how can I drop them?


+Jeffrey Friedl's blog post put me on the right track, but I think his information is out of date now.  I found that my NEF had three embedded JPEGs:
  • JpgFromRaw (Full Size!)
  • OtherImage (Fairly large!)
  • PreviewImage (Thumbnail-ish)
Let's see how big they are in an NEF that is 44133771 bytes:

$ exiftool D8H_2754_20131001_183719.NEF | egrep 'Binary data'
Jpg From Raw                    : (Binary data 3492514 bytes, use -b option to extract)
Other Image                     : (Binary data 858341 bytes, use -b option to extract)
Preview Image                   : (Binary data 99052 bytes, use -b option to extract)
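
Before deleting anything, it's worth pulling one of these out with the -b option the output itself suggests and eyeballing it (the output filename here is just an example):

$ exiftool -b -JpgFromRaw D8H_2754_20131001_183719.NEF > D8H_2754_jpgfromraw.jpg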

I first attempted to just replace the JpgFromRaw with the PreviewImage.  That worked, but then I would just be duplicating the JPEG -- still, here is how you do it:

$ exiftool -v -JpgFromRaw\<PreviewImage 2754_20131001_183719.NEF
$ exiftool -v -OtherImage\<PreviewImage 2754_20131001_183719.NEF

So how do I just delete them?  Just delete the tags:

$ exiftool -JpgFromRaw= -OtherImage= -overwrite_original_in_place -P 2754_20131001_183719.NEF

(I'm leaving the smallest (~100KB) PreviewImage in place.)

I ran this on a folder of about 11GB of NEFs and when done, the folder was 9.1GB.  That's 17% smaller!
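
If you want to do the same thing to a whole folder in one go, exiftool will take a directory as an argument; something like the following should work (the path is a placeholder, -ext NEF limits it to NEFs, and adding -r would recurse into subfolders):

$ exiftool -JpgFromRaw= -OtherImage= -overwrite_original_in_place -P -ext NEF /path/to/nef_folder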

I'm going to limit it to this folder for now, but will expand this to other parts of my archives as I gain more confidence that I really don't need the embedded jpgs.

Update:  This requires exiftool 9.03 (see the ExifTool forum topic: "Otherimage" in NEFs of D800).