Solution to MobaXterm X11 proxy: Authorisation not recognised

While using the remote SSH client MobaXterm to open an X11-forwarded GUI app with root privileges, an error message pops up:

MobaXterm X11 proxy: Authorisation not recognised


The GUI app actually crashes with this error and never shows up on the client desktop. Here's the solution:

From your MobaXterm SSH client console after login:

$
$ sudo xauth add $(xauth -f /home/[USER]/.Xauthority list|tail -1)
$


According to Mobatek's blog:

We receive a lot of emails asking how to keep X11-forwarding working after changing user to root inside a SSH session in MobaXterm. This is by default not allowed on Unix/Linux systems, because the X11 display connection belongs to the user you used to log with when connecting to your remote SSH server. X11-forwarding mechanism does not allow anyone to use the open display.
However, in some cases you may need to start a graphical application like nedit or firefox in a sudo or su context. In order to achieve this, you could manually retrieve X credentials in the su/sudo context by looking up the “xauth list” for the original username and then adding them using “xauth add” to the current context.
You can also use a single (magic) command in order to achieve this!
This single command solves the problem whenever you need a working X11 display through SSH after becoming root.

Update:

The aforementioned method does not survive another SSH login, as the MIT-MAGIC-COOKIE changes on every user login. The cookie previously copied to the root account becomes obsolete.

To resolve this permanently, we need root privileges to add the following line to the end of the file /etc/sudoers (preferably via visudo):

#
Defaults env_keep += "XAUTHORITY"
#

This is enough to allow sudo to open X11 programs. The env_keep entry preserves the environment variable XAUTHORITY, which holds the location of the .Xauthority file. You may therefore add the following export statement to ~/.bashrc, or set it globally in /etc/environment (CentOS 7):

#
export XAUTHORITY=/home/[YOUR_USERNAME]/.Xauthority
#

This ensures that opening X11 programs via sudo keeps working even after the next SSH login or an SSH server reboot.




CentOS: Remote SSH libGL error: failed to load swrast driver

While trying to load some GUI apps from a remote SSH server with X11 forwarding, the following error popped up:

libGL error: failed to load driver: swrast

To debug it, try the following command:

$ LIBGL_DEBUG=verbose glxinfo | grep renderer


Clearly, this is an OpenGL issue over the remote SSH connection. After searching many forum posts, one promising solution was found here.

Simply exporting a variable in the SSH terminal solves the issue:

$ export LIBGL_ALWAYS_INDIRECT=1



Python: Custom global fonts for Matplotlib without installation

To use a custom TTF font of our own, there are several similar posts on the forums. However, none of them worked for my case except this one.

Here are the highlights:


  • Dynamically add a custom font to the current Python script from outside the default system font folders, especially on Linux
  • No big changes to the rest of the Python source, i.e. no need to pass a FontProperties kwarg to every text/label call


Here's an example:

# Set global font style
import os
import logging
import matplotlib
import matplotlib.font_manager as fm
import matplotlib.pyplot as plt

mpl = matplotlib
basedir = os.path.join(os.path.dirname(__file__), 'static', 'fonts')
fpath = os.path.join(basedir, 'times.ttf')
prop = fm.FontProperties(fname=fpath)

# Register every font found under basedir with the current font manager
font_files = fm.findSystemFonts(fontpaths=basedir)
font_list = fm.createFontList(font_files)
fm.fontManager.ttflist.extend(font_list)

logging.debug('Register font family: %s' % prop.get_name())
mpl.rcParams['font.family'] = prop.get_name()
mpl.rcParams.update({'font.size': 10})

plt.switch_backend('agg')

This code snippet sets basedir relative to the current Python script's location, with a sub-folder structure like:

[current_folder]/static/fonts/

The best part is creating a custom font list and extending the current font manager's TTF list with it. The custom font is then applied to the whole Python script as the default font family, with no need to touch any other statements.

This example uses Times New Roman TTF font. You can use whatever TTF font you like.
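Note that fm.createFontList has been deprecated in newer matplotlib releases (around 3.2, if memory serves). Here's a minimal sketch of the equivalent registration using FontManager.addfont, assuming such a newer matplotlib:

# Alternative for newer matplotlib (assumption: matplotlib >= 3.2)
import os
import matplotlib
import matplotlib.font_manager as fm

basedir = os.path.join(os.path.dirname(__file__), 'static', 'fonts')
for font_file in fm.findSystemFonts(fontpaths=basedir):
    fm.fontManager.addfont(font_file)    # replaces createFontList/ttflist.extend

prop = fm.FontProperties(fname=os.path.join(basedir, 'times.ttf'))
matplotlib.rcParams['font.family'] = prop.get_name()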





Compiling a custom version of PHP on CentOS 7 with non-root privileges

I was trying to test a PHP webapp on a commodity server. Some PHP extensions were not in place, and Apache's config files were out of reach due to non-root user privileges.

After a couple of Google searches, the closest match was how to custom-compile the PHP engine on a web hosting server, which fits the non-root installation scenario.

First, check out the git source from here:
https://github.com/php/php-src

So far I have been on PHP v7.2.11 and the outcome has been satisfactory after compiling from source.

Second, choose and create a destination directory for the new PHP installation, which can be something like:
/home/user/local

Third, steps to compile:

$
$ cd {PHP source directory}
$ make clean
$ ./configure \
  --prefix=/home/user/local \
  --enable-calendar \
  --enable-pcntl \
  --enable-shmop \
  --enable-sockets \
  --enable-mbstring \
  --enable-bcmath \
  --with-gd \
  --with-curl \
  --with-openssl \
  --with-xmlrpc \
  --enable-soap \
  --enable-zip \
  --enable-opcache \
  --with-jpeg-dir \
  --with-png-dir \
  --with-mysqli \
  --with-pdo-mysql \
  --with-pdo-sqlite \
  --with-pgsql \
  --with-freetype-dir \
  --enable-intl \
  --with-xsl \
  --with-zlib \
  --enable-simplexml \
  --with-sqlite3 \
  --enable-xmlreader \
  --enable-xmlwriter \
  --with-gettext \
  --with-gdbm
$ make
$ make install


Once done, the files will be placed in the directory specified by the flag --prefix.

Note that the installation goes into the directory given by --prefix, i.e. /home/user/local, which prevents overwriting the default/previous PHP installation and works even without root privileges.

For the fcgid setup, the following files should be created in the top-level directory of the target PHP app, where index.php normally lives:

This .htaccess targets the wiki webapp; the first three rewrite lines may be omitted.

############################ File ".htaccess" (permission:755) ###########################################
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule (api|load)\.php - [L]

Options +Indexes +FollowSymLinks +ExecCGI
AddHandler php-fastcgi72 .php
Action php-fastcgi72 {php_webapp_directory}/php7.fcgi

# DISABLE CACHING

 Header set Cache-Control "no-cache, no-store, must-revalidate"
 Header set Pragma "no-cache"
 Header set Expires 0

############################ End of File ".htaccess"  ###########################################


File "php7.fcgi" must be assigned an execute permission to Apache user/group.
############################ File "php7.fcgi" (permission:775)###########################################
#!/bin/bash

export PHP_INI_SCAN_DIR="/home/user/local/lib/php.d"
export PHP_FCGI_CHILDREN=4
export PHP_FCGI_MAX_REQUESTS=10000
exec /home/user/local/bin/php-cgi -c /home/user/local/lib
############################ End of File "php7.fcgi" ###########################################

I was having trouble enabling opcache for PHP in FCGI mode. However, enabling opcache may not be a good idea for PHP scripts running in CGI mode (see https://ma.ttias.be/how-to-clear-php-opcache/). Anyway, it's up to you whether to chase further PHP performance.






Integrating Dash and Flask with DispatcherMiddleware

The situation: an existing Flask app in Python is running nicely, and a new project using Plotly Dash would like to join in. It's absolutely possible to let Dash run on its own built-in Flask server. In my case, however, the existing Flask app is here to stay and I don't want to change it much.

So the question is: what if we want to run both apps in parallel?

The werkzeug.wsgi package provides the DispatcherMiddleware class to let multiple apps run concurrently under a parent app. Everything works at the WSGI middleware level, and the child apps basically share the same port.

Here's the code snippet:
# Assuming Flask object "app" is created for the original app
#

import dash
from flask import Flask
from werkzeug.wsgi import DispatcherMiddleware
from werkzeug.serving import run_simple
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

app = Flask(__name__)
# Assuming source code for original Flask "app" is already in place
# ...

dash_app = dash.Dash(__name__)
dash_app.config.supress_callback_exceptions = True
dash_app.css.config.serve_locally = True
dash_app.scripts.config.serve_locally = True
dash_app.config.update({
    'routes_pathname_prefix':'/app2/',
    'requests_pathname_prefix': '/app2/'
})
dash_app.layout = html.Div([html.H1('Hello world')])

# Wrapping original "app" with DispatcherMiddleware
app.wsgi_app = DispatcherMiddleware(app.wsgi_app, {'/app2': dash_app.server})

# Entry point
if __name__ == '__main__':
    run_simple('0.0.0.0', 80, app, use_debugger=True, use_reloader=True, threaded=True)


To link up both apps, we need to make sure "app.wsgi_app" and "dash_app.server" are called correctly within DispatcherMiddleware.

Also, run_simple() plays an important role here, easing the pain of complicated configuration and avoiding unnecessary error messages. You probably won't want to use app.run() anymore.
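One caveat: in newer Werkzeug releases (1.0 and later, as far as I know), DispatcherMiddleware has moved out of werkzeug.wsgi, so the import above becomes:

# Newer Werkzeug only (assumption: werkzeug >= 1.0)
from werkzeug.middleware.dispatcher import DispatcherMiddleware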

The beauty is that the original Flask app stays untouched while app2, or even app3, can be loaded in for testing, all under the same hostname and port number.

Let's imagine the URLs can be accessed like these:

# Original Flask app
http://localhost

# app2 using Dash
http://localhost/app2/

# app3 and so on...
http://localhost/app3/


Custom-building TensorFlow to support Intel CPU-specific instruction sets on CentOS 6

Custom-building TensorFlow is getting popular, as developers push their hardware to the limit beyond the standard build, which enables no acceleration features on commodity machines (generally those without a decent GPU of compute capability 3.0 or above).

Different machines have different hardware specifications, but commodity servers very likely come with a powerful multi-core Intel CPU, which opens the path of compiling against the Intel MKL library, i.e. boosting deep learning performance on the CPU.

CentOS 6.9 comes with an older GCC compiler that cannot build recent versions of TensorFlow. So the first thing to do is install a newer GCC:

$ yum install "http://ftp.scientificlinux.org/linux/scientific/6x/external_products/softwarecollections/yum-conf-softwarecollections-2.0-1.el6.noarch.rpm"
$ yum install devtoolset-6

Here's how the TensorFlow CPU version is compiled on CentOS 6.9:

Assuming we are working in a conda environment:

$ conda activate $CONDA_ENVIRONMENT_NAME
$ scl enable devtoolset-6 bash
$ cd tensorflow
$ bazel build --linkopt=-lrt --config=mkl --copt="-DEIGEN_USE_VML" -c opt //tensorflow/tools/pip_package:build_pip_package
$ bazel-bin/tensorflow/tools/pip_package/build_pip_package ../tensorflow_pkg
$ pip install --upgrade --user ../tensorflow_pkg/<wheel_name.whl>


Funny thing: Intel has a similar instruction page which actually recommends installing TensorFlow via a conda command:

$ conda install tensorflow-mkl

They claim that a warning message like:

Warning: “The TensorFlow library was not compiled to use **** instructions, but these are available on your machine and could speed up CPU computations.”

is actually harmless, because the package is built against the MKL-DNN library. The warning can be ignored, since the Intel MKL-DNN library with which TensorFlow is compiled uses the latest instruction sets available in your processor for compute-intensive operations.
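After installing the wheel, a quick smoke test can confirm that the custom package imports and runs. This is just a sketch against the TensorFlow 1.x API; nothing in it is specific to the MKL build:

# Smoke test for the freshly installed wheel (TensorFlow 1.x API assumed)
import tensorflow as tf

print("TensorFlow version:", tf.__version__)

a = tf.random_normal([1000, 1000])
b = tf.random_normal([1000, 1000])
c = tf.matmul(a, b)

with tf.Session() as sess:
    result = sess.run(c)   # a CPU matmul; the MKL build should not print the
    print(result.shape)    # "not compiled to use ... instructions" warning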

It's up to you whether to try the easy or the hard way to set things up. The performance may not differ much after all.


Deep Learning: It's about information bottleneck

A theory has come out to demystify deep learning, and it seems to explain how deep learning works behind the scenes. The graphs in the referenced article show how a deep neural network evolves through various stages of training.

Ref: https://www.quantamagazine.org/new-theory-cracks-open-the-black-box-of-deep-learning-20170921/

Photos app on macOS Sierra quits unexpectedly upon opening

Problems came one after another on my old MacBook, which showed signs of data corruption in the Photos app. First things first: it was still on macOS Sierra, but the Photos app might have gone through a recent update.

A message popped up when opening the Photos app, asking to repair the Photos library after a recovery from a Time Machine backup. I wouldn't have minded if it fixed things, so I tried. After the repair operation, the Photos app could no longer be opened at all; along with the crash, an error message said 'Photos quit unexpectedly'.

After reading a post about a similar problem, it seems to be a mismatched dynamic library causing the crash:

/Library/Caches/com.apple.xbs/Sources/PhotoApp_SubFrameworks/PhotoApp-3161.4.14 0/lib/photolibrary/PhotoLibraryPrivate/PhotoLibraryPrivate.m:23
One explanation:

The cached version of a dynamic library is newer than the installed version. First try starting your Mac in safe mode to clear any caches, then try to run Photos once in safe mode; see: Use safe mode to isolate issues with your Mac - Apple Support

To tackle the Photos app compatibility issue, two options are available:

1. Clear the library caches in safe mode, in the hope that the Photos app will run again long enough for a further backup. This looks like a temporary solution, though; I have a feeling I may need to upgrade to macOS High Sierra if the problem is tied to a newer Photos app.

2. Reinstall macOS on top of the running system (same version) in order to reinstall the Photos app (as one of the built-in apps). Make sure a full Time Machine backup is stored somewhere for recovery.

For option 1, it's worth clarifying what safe mode is:

Safe mode (sometimes called safe boot) is a way to start up your Mac so that it performs certain checks and prevents some software from automatically loading or opening. Starting your Mac in safe mode does the following:
    Verifies your startup disk and attempts to repair directory issues, if needed
    Loads only required kernel extensions
    Prevents startup items and login items from opening automatically
    Disables user-installed fonts
    Deletes font caches, kernel cache, and other system cache files
If your Mac has an issue that goes away when you start up in safe mode, you might be able to isolate the cause.
So here's how to start up in safe mode:

  1. Start or restart your Mac, then immediately press and hold the Shift key. The Apple logo appears on your display. If you don't see the Apple logo, learn what to do.
  2. Release the Shift key when you see the login window. If your startup disk is encrypted with FileVault, you might be asked to log in twice: once to unlock the startup disk, and again to log in to the Finder.
To leave safe mode, restart your Mac without pressing any keys during startup.

For option 2, reinstalling macOS from recovery mode (to refresh the built-in apps) does not erase user data, so hopefully valuable personal content such as photos and videos can be retained this way. Of course, it's absolutely important to have a full backup first.





disk0s2: I/O error again - Seriously?

My white MacBook could not boot up and shut itself down during the boot process. I decided to handle it in single-user mode using the boot key combination Command + S.

After a couple of rounds of fsck checks, it still showed errors like "disk0s2: I/O error".

It felt hopeless that fsck didn't seem to try its best to repair the problems. I was terrified when other users suggested things like reinstalling macOS, or buying a big external drive to back up whatever was left on the suspect hard drive.

One success story gave me hope when I bumped into a post suggesting that fsck can be forced to repair errors with special options.

Press Command + S during boot to enter single-user mode and try the following command:

$ /sbin/fsck_hfs -dryf /dev/disk0s2

The device name disk0s2 may vary depending on your MacBook model and configuration. As discussed on the forum, this command may need to be run multiple times for a successful repair.

After a series of 'orphaned file hard link', 'Missing thread record', and 'Invalid directory count' messages, lines like '***** The volume was modified *****' and 'repaired successfully' finally showed up on the screen. It's then time to reboot the machine to see if the system boots properly.

$ reboot now

Here's a reference for the fsck_hfs command:





fsck_hfs
File system consistency check for HFS and HFS+ (Hierarchical File System) volumes

fsck_hfs -q [-df] special ...    # quick check whether the volume was unmounted cleanly
fsck_hfs -p [-df] special ...    # preen: fix common, minor inconsistencies
fsck_hfs [-n | -y | -r] [-dfgxlES] [-D flags] [-b size] [-B path] [-m mode] [-c size] [-R flags] /dev/disknsp ...
                                 # full check and repair of inconsistencies

Example:

       > sudo /sbin/fsck_hfs    -d -D0x33 /dev/disk0s10
      journal_replay(/dev/disk0s10) returned 0
      ** /dev/rdisk0s10
          Using cacheBlockSize=32K cacheTotalBlock=32768 cacheSize=1048576K.
         Executing fsck_hfs (version hfs-305.10.1).
      ** Checking non-journaled HFS Plus Volume.
         The volume name is untitled
      ** Checking extents overflow file.
      ** Checking catalog file.
      ** Checking multi-linked files.
      ** Checking catalog hierarchy.
      ** Checking extended attributes file.
      ** Checking volume bitmap.
      ** Checking volume information.
      ** The volume untitled appears to be OK.
          CheckHFS returned 0, fsmodified = 0
Started by fsck(8) from /etc/rc.boot during boot, preen mode (-p) fixes common inconsistencies for file systems that were not unmounted cleanly. If more serious problems are found, fsck_hfs does not try to fix them, indicates that it was not successful, and exits.

With no options, fsck_hfs checks and attempts to fix the specified file systems.

-d        Print debugging information.
-D flags  Print extra debugging information. Flags:
              0x0001  informational messages
              0x0002  error messages
              0x0010  extended attributes
              0x0020  overlapped extents
              0x0033  include all of the above
-b size   Size, in bytes, of the physical blocks used by the -B option.
-B path   Output the files containing the physical blocks listed in the file path.
          The file contains decimal, octal (with leading 0) or hexadecimal (with leading 0x) physical block
          numbers, separated by white space, relative to the start of the partition. For block numbers
          relative to the start of the device, subtract the block number of the start of the partition.
          The size of a physical block is given with the -b option; the default is 512 bytes per block.
-f        With -p, force a check of "clean" file systems; otherwise, force a check and repair of journaled
          HFS+ file systems.
-g        Generate output strings in GUI format. This option is used when another application with a
          graphical user interface (like the Mac OS X Disk Utility) is invoking the fsck_hfs tool.
-x        Generate output strings in XML (plist) format; implies -g.
-l        Lock down the file system (rather than "limit parallel checks" as in other versions of fsck) and
          perform a test-only check. This makes it possible to check a file system that is currently
          mounted, although no repairs can be made.
-m mode   Octal permissions for the lost+found directory if it is created (700 is suggested instead of the
          default); orphaned files and directories are moved to lost+found at the root of the volume.
          The default mode is 01777.
-c size   Size of the cache used by fsck_hfs internally. A bigger cache can give better performance but can
          result in a deadlock when used with -l. Accepts a decimal, octal or hexadecimal number; a trailing
          k, m or g means kilobytes, megabytes or gigabytes.
-p        Preen the specified file systems.
-q        Quickly check whether the volume was unmounted cleanly. If it was, the exit status is 0; otherwise
          the exit status is non-zero. In either case, a message describing whether the volume was clean or
          dirty is printed to standard output.
-y        Always attempt to repair any damage that is found.
-n        Never attempt repairs; assume a "no" answer to every repair question.
-E        Exit (with a value of 47) if any major error is encountered. A "major" error is one which would
          impact using the volume in normal usage; an inconsistency which would not impact such use is
          considered "minor" for this option. Only valid with the -n option.
-S        Scan the entire device looking for I/O errors, attempting to map the blocks with errors to names,
          similar to the -B option.
-R flags  Rebuild the requested btree. The following flags are supported:
              a  attribute btree
              c  catalog btree
              e  extents overflow btree
          Requires free space on the file system for the new btree file, and fsck_hfs must be able to
          traverse each of the nodes in the requested btree successfully. Rebuilding btrees is not supported
          on HFS Standard volumes.
-r        Rebuild the catalog btree. This is synonymous with -Rc.

Because of inconsistencies between the block device and the buffer cache, the raw device should always be used.
Example:
 > sudo fsck_hfs -l -d -D 0x0033 -B ~/.profile /dev/disk0s8
0 blocks to match:
** /dev/rdisk0s8 (NO WRITE)
    Using cacheBlockSize=32K cacheTotalBlock=32768 cacheSize=1048576K.
   Executing fsck_hfs (version hfs-305.10.1).
** Performing live verification.
** Checking Journaled HFS Plus volume.
   The volume name is DATA
** Checking extents overflow file.
** Checking catalog file.
** Checking multi-linked files.
   Orphaned open unlinked file temp7479645
** Checking catalog hierarchy.
** Checking extended attributes file.
** Checking volume bitmap.
** Checking volume information.
    invalid VHB attributesFile.clumpSize 
   Volume header needs minor repair
(2, 0)
   Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
                  CBTStat = 0x0000 CatStat = 0x00000000
   Volume header needs minor repair
(2, 0)
   Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
                  CBTStat = 0x0000 CatStat = 0x00000000
** The volume DATA was found corrupt and needs to be repaired.
    volume type is pure HFS+ 
    primary MDB is at block 0 0x00 
    alternate MDB is at block 0 0x00 
    primary VHB is at block 2 0x02 
    alternate VHB is at block 88769454 0x54a83ae 
   Volume header needs minor repair
(2, 0)
   Verify Status: VIStat = 0x8000, ABTStat = 0x0000 EBTStat = 0x0000
                  CBTStat = 0x0000 CatStat = 0x00000000
** The volume DATA was found corrupt and needs to be repaired.
    volume type is pure HFS+ 
    primary MDB is at block 0 0x00 
    alternate MDB is at block 0 0x00 
    primary VHB is at block 2 0x02 
    alternate VHB is at block 88769454 0x54a83ae 
    sector size = 512 0x200 
    VolumeObject flags = 0x07 
    total sectors for volume = 88769456 0x54a83b0 
    total sectors for embedded volume = 0 0x00 
    CheckHFS returned 7, fsmodified = 0
 > echo $?
8
EXIT VALUES
0 No errors found, or successfully repaired.
3 A quick-check (the -n option) found a dirty filesystem; no repairs were made.
4 During boot, the root filesystem was found to be dirty; repairs were made, and the filesystem was remounted. The system should be rebooted.
8 A corrupt filesystem was found during a check, or repairs did not succeed.
47 A major error was found with -E.

Here's the LDAP Servers and where to find them

It comes in handy to be able to find the unknown LDAP servers in a local domain via nslookup.

Here's the command to find them on Windows/Linux:

#
# Windows 
C:\> 
C:\> nslookup -type=srv _ldap._tcp.dc._msdcs.[DOMAIN_NAME]
#
# Linux
$
$ nslookup -type=srv _ldap._tcp.dc._msdcs.[DOMAIN_NAME]


where [DOMAIN_NAME] is the real name of the local domain.
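If you prefer to script the same SRV lookup in Python, the dnspython package can do it. This is just a sketch (assumptions: dnspython 2.x is installed, and [DOMAIN_NAME] is replaced with your real domain):

# Query the SRV records that advertise LDAP servers for a domain
import dns.resolver

answers = dns.resolver.resolve('_ldap._tcp.dc._msdcs.[DOMAIN_NAME]', 'SRV')
for srv in answers:
    print(srv.target, srv.port, srv.priority, srv.weight)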



Leap year handling in Python

According to Wikipedia, A leap year (also known as an intercalary year or bissextile year) is a calendar year containing one additional day (or, in the case of lunisolar calendars, a month) added to keep the calendar year synchronized with the astronomical or seasonal year. Because seasons and astronomical events do not repeat in a whole number of days, calendars that have the same number of days in each year drift over time with respect to the event that the year is supposed to track. By inserting (also called intercalating) an additional day or month into the year, the drift can be corrected. A year that is not a leap year is called a common year.

In other words, a normal year has 365 days while a Leap Year has 366 days (the extra day is the 29th of February).

Having discussed datetime handling with a few scientists, we had a question: are the concepts of leap years and leap seconds really handled by common programming languages, especially Python? I have been satisfied with Python's strength and capability, just not so sure how well it deals with calendar issues like leap years and leap seconds. To check whether Python handles leap years correctly, here are some examples:

In [2]: import datetime

In [3]: datetime.datetime(2011, 2, 28) + datetime.timedelta(days=10)
Out[3]: datetime.datetime(2011, 3, 10, 0, 0)

In [4]: datetime.datetime(2011, 2, 28) + datetime.timedelta(days=1)
Out[4]: datetime.datetime(2011, 3, 1, 0, 0)

In [5]: datetime.datetime(2012, 2, 28) + datetime.timedelta(days=1)
Out[5]: datetime.datetime(2012, 2, 29, 0, 0)

As a highlight, there were only 28 days in February in 2011 while there were 29 days in February in 2012. Sounds good.
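The standard library can also answer the question directly; here's a small check with the calendar module:

import calendar

# isleap() applies the Gregorian rule: divisible by 4, except century years
# that are not divisible by 400
for year in (2011, 2012, 2049, 2052, 2100, 2400):
    print(year, calendar.isleap(year))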

So how about datetime object itself?

According to Python library manual, a datetime object is a single object containing all the information from a date object and a time object. Like a date object, datetime assumes the current Gregorian calendar extended in both directions; like a time object, datetime assumes there are exactly 3600*24 seconds in every day.

Okay, so what about Leap Seconds?

According to a post in September 2016 on Stackoverflow, leap seconds are occasionally manually scheduled. Currently, computer clocks have no facility to honour leap seconds; there is no standard to tell them up-front to insert one. Instead, computer clocks periodically re-synch their time keeping via the NTP protocol and adjust automatically after the leap second has been inserted.
Next, computer clocks usually report the time as seconds since the epoch. It'd be up to the datetime module to adjust its accounting when converting that second count to include leap seconds. It doesn't do this at present. time.time() will just report a time count based on the seconds-since-the-epoch.
So, nothing different will happen when the leap second is officially in effect, other than that your computer clock will be 1 second off for a little while.
The issues with datetime only cover representing a leap second timestamp, which it can't. It won't be asked to do so anyway.

Rest assured that Python has handled leap years well in the past and hopefully will keep doing so in the future.


In [6]:  datetime.datetime(2049, 2, 28) + datetime.timedelta(days=1)
Out[6]: datetime.datetime(2049, 3, 1, 0, 0)

In [7]:  datetime.datetime(2050, 2, 28) + datetime.timedelta(days=1)
Out[7]: datetime.datetime(2050, 3, 1, 0, 0)

In [8]:  datetime.datetime(2051, 2, 28) + datetime.timedelta(days=1)
Out[8]: datetime.datetime(2051, 3, 1, 0, 0)

In [9]:  datetime.datetime(2052, 2, 28) + datetime.timedelta(days=1)
Out[9]: datetime.datetime(2052, 2, 29, 0, 0)

Pip install via local proxy server behind secured firewall

Just found that pip install works fine at home but not in the office. One of the restrictions is a firewall policy that prevents non-HTTP traffic from passing through during package installation. Here's a trick:

Changing from this:
$
$ pip install --upgrade git+git://github.com/XXXXX/YYYYY.git


to this:
$ # x.y.z.s is IP address
$ # port is port number
$ pip --proxy=x.y.z.s:port install --upgrade git+https://github.com/XXXXX/YYYYY.git

Most firewalls won't block HTTP/HTTPS traffic, which is categorized as web traffic.

Get Plotly offline working in Jupyter Lab

You might encounter a blank plot if you simply install and use the plotly module in JupyterLab.

When plotting data with a library like Plotly, you will be asked to create a user account and log in to use the online APIs. However, Plotly does provide an offline mode. It takes a couple of steps to get it working manually.

This solution works on Windows, and may also work on Linux/macOS; mileage varies.

To make Plotly offline work in JupyterLab, try the following:

Install the Plotly extension for JupyterLab:
> jupyter labextension install @jupyterlab/plotly-extension

For details, please visit https://github.com/jupyterlab/jupyter-renderers

You might also encounter an ETIMEDOUT error while installing the labextension if your computer is behind a proxy server. Here's how to resolve it:
>
> npm config set proxy <proxy address:port>
> npm config set https-proxy <proxy address:port>


For the issue of Plotly chart output from a big dataset, the key is to increase the maximum data rate for the output stream on the Jupyter server.
Edit the following entry in configuration file C:\Users\%USERNAME%\.jupyter\jupyter_notebook_config.py:
c.NotebookApp.iopub_data_rate_limit = 1.0e10


Here's a Plotly offline sample code block tested on JupyterLab v0.31.12:
from plotly import __version__
import plotly
from plotly.offline import init_notebook_mode, plot
from plotly.graph_objs import Scatter

init_notebook_mode()

print("plotly version:", __version__)
plotly.offline.iplot([Scatter(x=[1, 2, 3], y=[3, 1, 6])])
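As a side note, the snippet above targets the older Plotly 2.x/3.x offline API. On Plotly 4 and later (an assumption about your environment), the offline setup is no longer needed and a figure renders directly:

# Plotly 4+ sketch: no init_notebook_mode() or iplot() required
import plotly.graph_objects as go

fig = go.Figure(go.Scatter(x=[1, 2, 3], y=[3, 1, 6]))
fig.show()   # renders inline in JupyterLab once the Plotly JupyterLab extension is installed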


Conversion from Magnetic North to True North

Here's the website to obtain the True North calculation based on coordinates and date:

http://www.ga.gov.au/oracle/geomag/agrfform.jsp



Quick recap of single-line commands to replace strings in heaps of files

I was looking for a fast way to replace strings recursively in all related Python script files at the terminal.

Here's the standard one:
$ find . -type f -name "*.py" -print | xargs sed -i 's/foo/bar/g'


xargs combines the line-by-line output of find and runs the command with multiple arguments, multiple times if necessary to avoid the maximum command-line length limit. In this case we combine xargs with sed.


Here's a variation:
$ find . -type f -name "*.py" -exec sed -i "s/foo/bar/g" {} \;


This one is a bit different but may be easier to remember. It uses find to produce the list of files; for each file found, the filename is substituted for {} in the sed command, which replaces 'foo' with 'bar'. The escaped ';' terminates the -exec command.

Effectively, this command runs something like:
$ sed -i "s/foo/bar/g" script1_found.py;
$ sed -i "s/foo/bar/g" script2_found.py;
$ sed -i "s/foo/bar/g" script3_found.py;
$ sed -i "s/foo/bar/g" script4_found.py;
$ sed -i "s/foo/bar/g" script5_found.py;
$ sed -i "s/foo/bar/g" script6_found.py;
...
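For completeness, a rough Python equivalent of the same recursive replacement (a sketch, not a drop-in for the one-liners above):

# Recursively replace 'foo' with 'bar' in every *.py file under the current directory
from pathlib import Path

for path in Path('.').rglob('*.py'):
    text = path.read_text()
    if 'foo' in text:
        path.write_text(text.replace('foo', 'bar'))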









WRF & ARW

What is WRF?

WRF is the short form of Weather Research and Forecasting Model, i.e., a numerical weather prediction system. WRF is a state-of-the-art atmospheric modeling system designed for both meteorological research and numerical weather prediction. It offers a host of options for atmospheric processes and can run on a variety of computing platforms.
  • Used for both research and operational forecasting
  • It is a supported "community model", i.e. a free and shared resource with distributed development and centralized support
  • Its development is led by NCAR, NOAA/ESRL and NOAA/NCEP/EMC with partnerships at AFWA, FAA, DOE/PNNL and collaborations with universities and other government agencies in the US and overseas

WRF Community Model

  • Version 1.0 WRF was released December 2000
  • Version 2.0: May 2004 (add nesting)
  • Version 3.0: April 2008 (add global ARW version)
  • ...  (major releases in April, minor releases in summer)
  • Version 3.8: April 2016
  • Version 3.8.1: August 2016
  • Version 3.9: April 2017 
  • Version 3.9.1(.1) (August 2017)

What is ARW?


WRF has two dynamical cores: the Advanced Research WRF (ARW) and the Non-hydrostatic Mesoscale Model (NMM). A dynamical core covers mostly advection, pressure gradients, Coriolis, buoyancy, filters, diffusion, and time-stepping.

  • Both are Eulerian mass dynamical cores with terrain-following vertical coordinates
  • ARW support and development are centered at NCAR/MMM
  • NMM development is centered at NCEP/EMC and support is provided by NCAR/DTC (operationally now only used for HWRF)


Usage of WRF

ARW and NMM

  • Atmospheric physics/parameterization research
  • Case-study research
  • Real-time NWP and forecast system research
  • Data assimilation research
  • Teaching dynamics and NWP


ARW only

  • Regional climate and seasonal time-scale research
  • Coupled-chemistry applications
  • Global simulations
  • Idealized simulations at many scales (e.g. convection, baroclinic waves, large eddy simulations)

Examples of WRF Forecast

  1. Hurricane Katrina (August, 2005): Moving 4 km nest in a 12 km outer domain
  2. US Convective System (June, 2005): Single 4 km central US domain


Real-Data Applications

  • Numerical weather prediction
  • Meteorological case studies
  • Regional climate
  • Applications: air quality, wind energy, hydrology, etc.
Ref: https://www.climatescience.org.au/sites/default/files/WRF_Overview_Dudhia_3.9.pdf
Ref: http://www2.mmm.ucar.edu/wrf/users/

CALPUFF

CALPUFF is an advanced, integrated Lagrangian puff modeling system for the simulation of atmospheric pollution dispersion distributed by the Atmospheric Studies Group at TRC Solutions.

It is maintained by the model developers and distributed by TRC. The model has been adopted by the United States Environmental Protection Agency (EPA) in its Guideline on Air Quality Models  as a preferred model for assessing long range transport of pollutants and their impacts on Federal Class I areas and on a case-by-case basis for certain near-field applications involving complex meteorological conditions.

The integrated modeling system consists of three main components and a set of preprocessing and postprocessing programs. The main components of the modeling system are CALMET (a diagnostic 3-dimensional meteorological model), CALPUFF (an air quality dispersion model), and CALPOST (a postprocessing package). Each of these programs has a graphical user interface (GUI). In addition to these components, there are numerous other processors that may be used to prepare geophysical (land use and terrain) data in many standard formats, meteorological data (surface, upper air, precipitation, and buoy data), and interfaces to other models such as the Penn State/NCAR Mesoscale Model (MM5), the National Centers for Environmental Prediction (NCEP) Eta model and the RAMS meteorological model.


The CALPUFF model is designed to simulate the dispersion of buoyant, puff or continuous point and area pollution sources as well as the dispersion of buoyant, continuous line sources. The model also includes algorithms for handling the effect of downwash by nearby buildings in the path of the pollution plumes.

NetCDF

NetCDF (Network Common Data Form) is a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. The project homepage is hosted by the Unidata program at the University Corporation for Atmospheric Research (UCAR). They are also the chief source of netCDF software, standards development, updates, etc. The format is an open standard. NetCDF Classic and 64-bit Offset Format are an international standard of the Open Geospatial Consortium.
The project started in 1989 and is still actively supported by UCAR. Version 3.x (released in 1997) is still widely used across the world and maintained by UCAR (most recent update 2017). Version 4.0 (released in 2008) allows the use of the HDF5 data file format. Version 4.1 (2010) adds support for C and Fortran client access to specified subsets of remote data via OPeNDAP. Further releases have improved performance, added features, and fixed bugs.

The format was originally based on the conceptual model of the Common Data Format developed by NASA, but has since diverged and is not compatible with it.

Format

The netCDF libraries support multiple different binary formats for netCDF files:
  • The classic format was used in the first netCDF release, and is still the default format for file creation.
  • The 64-bit offset format was introduced in version 3.6.0, and it supports larger variable and file sizes.
  • The netCDF-4/HDF5 format was introduced in version 4.0; it is the HDF5 data format, with some restrictions.
  • The HDF4 SD format is supported for read-only access.
  • The CDF5 format is supported, in coordination with the parallel-netcdf project.
All formats are "self-describing". This means that there is a header which describes the layout of the rest of the file, in particular the data arrays, as well as arbitrary file metadata in the form of name/value attributes. The format is platform independent, with issues such as endianness being addressed in the software libraries. The data are stored in a fashion that allows efficient subsetting.
Starting with version 4.0, the netCDF API allows the use of the HDF5 data format. NetCDF users can create HDF5 files with benefits not available with the netCDF format, such as much larger files and multiple unlimited dimensions.

Full backward compatibility in accessing old netCDF files and using previous versions of the C and Fortran APIs is supported.

Access libraries

The software libraries supplied by UCAR provide read-write access to netCDF files, encoding and decoding the necessary arrays and metadata. The core library is written in C, and provides an API for C, C++ and two APIs for Fortran applications, one for Fortran 77, and one for Fortran 90. An independent implementation, also developed and maintained by Unidata, is written in 100% Java, which extends the core data model and adds additional functionality. Interfaces to netCDF based on the C library are also available in other languages including R (ncdf, ncvar and RNetCDF packages), Perl, Python, Ruby, Haskell, Mathematica, MATLAB, IDL, and Octave. The specification of the API calls is very similar across the different languages, apart from inevitable differences of syntax. The API calls for version 2 were rather different from those in version 3, but are also supported by versions 3 and 4 for backward compatibility. Application programmers using supported languages need not normally be concerned with the file structure itself, even though it is available as an open format.
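As a small illustration of the Python interface, the netCDF4 package exposes dimensions, variables and attributes directly. This is only a sketch; 'sample.nc' and the 'temperature' variable are hypothetical names:

# Minimal read example using the netCDF4 Python bindings
from netCDF4 import Dataset

ds = Dataset('sample.nc', 'r')            # 'sample.nc' is a hypothetical file
print(list(ds.dimensions))                # dimension names
print(list(ds.variables))                 # variable names
temp = ds.variables['temperature'][:]     # 'temperature' is a hypothetical variable
print(temp.shape)
ds.close()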

Common uses

It is commonly used in climatology, meteorology and oceanography applications (e.g., weather forecasting, climate change) and GIS applications.
It is an input/output format for many GIS applications, and for general scientific data exchange.

Parallel-NetCDF

An extension of netCDF for parallel computing called Parallel-NetCDF (or PnetCDF) has been developed by Argonne National Laboratory and Northwestern University. This is built upon MPI-IO, the I/O extension to MPI communications. Using the high-level netCDF data structures, the Parallel-NetCDF libraries can make use of optimizations to efficiently distribute the file read and write applications between multiple processors. The Parallel-NetCDF package can read/write only classic and 64-bit offset formats. Parallel-NetCDF cannot read or write the HDF5-based format available with netCDF-4.0. The Parallel-NetCDF package uses different, but similar APIs in Fortran and C.
Parallel I/O in the Unidata netCDF library has been supported since release 4.0, for HDF5 data files. Since version 4.1.1 the Unidata NetCDF C library supports parallel I/O to classic and 64-bit offset files using the Parallel-NetCDF library, but with the NetCDF API.

GRIB format

GRIB (GRIdded Binary or General Regularly-distributed Information in Binary form) is a concise data format commonly used in meteorology to store historical and forecast weather data. It is standardized by the World Meteorological Organization's Commission for Basic Systems, known under number GRIB FM 92-IX, described in WMO Manual on Codes No.306.

Currently there are three versions of GRIB.

  1. Version 0 was used to a limited extent by projects such as TOGA, and is no longer in operational use. 
  2. The first edition (current sub-version is 2) is used operationally worldwide by most meteorological centers, for Numerical Weather Prediction output (NWP). 
  3. A newer generation has been introduced, known as GRIB second edition, and data is slowly changing over to this format. Some of the second-generation GRIB are used for derived product distributed in Eumetcast of Meteosat Second Generation. Another example is the NAM (North American Mesoscale) model.


File Format

GRIB files are a collection of self-contained records of 2D data, and the individual records stand alone as meaningful data, with no references to other records or to an overall schema. So collections of GRIB records can be appended to each other or the records separated.

Each GRIB record has two components - the part that describes the record (the header), and the actual binary data itself. The data in GRIB-1 are typically converted to integers using scale and offset, and then bit-packed. GRIB-2 also has the possibility of compression.

GRIB superseded the Aeronautical Data Format (ADF).
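For reading GRIB records in Python, the pygrib package follows the same record-by-record structure. Again, only a sketch; 'sample.grb' and the 'Temperature' field are hypothetical:

# Iterate over GRIB records and decode one field
import pygrib

grbs = pygrib.open('sample.grb')             # 'sample.grb' is a hypothetical file
for grb in grbs:
    print(grb)                               # one-line summary of each record header
grbs.seek(0)
grb = grbs.select(name='Temperature')[0]     # assumes a 'Temperature' field exists
data = grb.values                            # 2D array of decoded values
lats, lons = grb.latlons()                   # matching latitude/longitude grids
grbs.close()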

The World Meteorological Organization (WMO) Commission for Basic Systems (CBS) met in 1985 to create the GRIB (GRIdded Binary) format. The WGDM in February 1994, after major changes, approved revision 1 of the GRIB format. GRIB Edition 2 format was approved in 2003 at Geneva.

Problems with GRIB

There is no way in GRIB to describe a collection of GRIB records:
  • Each record is independent, with no way to reference the GRIB writer's intended schema
  • No foolproof way to combine records into the multidimensional arrays from which they were derived

GRIB relies on external tables to describe the meaning of the data:
  • No authoritative place for centers to publish their local tables
  • Inconsistent and incorrect methods of versioning local tables
  • No machine-readable versions of the WMO tables (now available for GRIB-2, but not GRIB-1)

GRIB 1 Header

There are two parts to the GRIB 1 header - one mandatory (Product Definition Section, PDS) and one optional (Grid Description Section, GDS). The PDS describes who created the data (the research/operations center), the numerical model/process involved (NWP or GCM), the data that is actually stored (such as wind, temperature, ozone concentration, etc.), the units of the data (meters, pressure, etc.), the vertical coordinate system of the data (constant height, constant pressure, constant potential temperature), and the time stamp.
If a description of the spatial organization of the data is needed, the GDS must be included as well. This information includes spectral (harmonics of divergence and vorticity) vs gridded data (Gaussian, X-Y grid), horizontal resolution, and the location of the origin.



apt install through corporate proxy

Assuming proxy service like CNTLM is up and running on Ubuntu machine, one can use apt-get to install package with specifying http proxy inf...