.. title: PAR Class 10, Tues 2020-02-18
.. slug: class10
.. date: 2020-02-18
.. tags: class
.. category: 
.. link: 
.. description: 
.. type: text
.. has_math: true

.. raw:: html

   <style> .red {color:red} </style>
   <style> .blue {color:blue} </style>

.. role:: red
.. role:: blue

.. sectnum::
.. contents:: Table of contents
..

More on ssh
-----------

#. **ssh-agent**  on the source machine starts a daemon to manage your keys.  I start it in a login script.

#. **ssh-keygen** on the source machine creates a key pair in **~/.ssh**.  Do it once.  **ssh-add** then registers the private key with the running ssh-agent.

#. **ssh-add -l**  lists the keys registered with ssh-agent, i.e., that are available to use with future ssh commands in this session.

#. **~/.ssh/authorized_keys** on the destination machine stores the public keys of source machines allowed to connect.  Copy your source machines' public keys into it.  Do that once.

#. If your user name on the destination machine is different, then instead of typing **user@destination** every time, you can set a default user name for each destination in **~/.ssh/config** on the source machine.

#. **ssh -v destination** shows the handshaking.

#. Now **ssh**, **scp**, Emacs tramp mode, mounting the remote filesystem, etc. work w/o typing passwords.
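
As an illustration of the **~/.ssh/config** item above, here is a minimal sketch of one entry.  The alias ``dest``, the host name, and the user name are all hypothetical placeholders, not values from this class::

   # ~/.ssh/config on the source machine (hypothetical values)
   Host dest
       HostName destination.example.org
       User remoteuser

With an entry like this, ``ssh dest`` should behave like ``ssh remoteuser@destination.example.org``.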

   


PGI compilers on parallel.ecse
------------------------------

I've installed pgc++ directly.  Run it thus::

  	/local/pgi/linux86-64/19.10/bin/pgc++ foo.cc -o foo

Here's a set of good switches::
  
  	/local/pgi/linux86-64/19.10/bin/pgc++ -fast -mp -Msafeptr -O3 -Minfo=all -Mconcur=allcores  -ta:tesla -acc foo.cc -o foo

In parallel.ecse:/parallel-class/matmul, I experiment with gcc and PGI, using OpenMP on the Xeon and OpenACC on the GPU, to multiply matrices stored four ways: as global data, on the local stack, as STL vectors of vectors, and as a heap array.  Some of my conclusions:

#. For sequential code, g++ is twice as fast as pgc++.
#. OpenMP works well.
#. OpenACC works only on pgc++.  It is very fast.
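
To make the experiment concrete, here is a minimal sketch of the kind of loop being timed.  It is not the code in /parallel-class/matmul.  The function name ``matmul``, the flat row-major layout, and the matrix size are assumptions made for illustration.  The OpenMP pragma takes effect only when OpenMP is enabled (**-mp** for pgc++, **-fopenmp** for g++); otherwise the loop runs sequentially.

.. code-block:: cpp

   #include <cassert>
   #include <cstdio>
   #include <vector>

   // Multiply two n x n row-major matrices stored as flat arrays: c = a * b.
   // (Illustrative sketch; names and layout are assumptions.)
   void matmul(const float *a, const float *b, float *c, int n) {
     // OpenMP: split the rows of c across the CPU cores.
     // An OpenACC variant for the GPU might instead use something like
     //   #pragma acc parallel loop collapse(2) \
     //       copyin(a[0:n*n], b[0:n*n]) copyout(c[0:n*n])
     #pragma omp parallel for
     for (int i = 0; i < n; ++i) {
       for (int j = 0; j < n; ++j) {
         float sum = 0.0f;
         for (int k = 0; k < n; ++k)
           sum += a[i * n + k] * b[k * n + j];
         c[i * n + j] = sum;
       }
     }
   }

   int main() {
     const int n = 256;
     // Flat std::vector storage lives on the heap, like the heap-array case.
     std::vector<float> a(n * n, 1.0f), b(n * n, 2.0f), c(n * n, 0.0f);
     matmul(a.data(), b.data(), c.data(), n);
     // Every entry is the sum of n products 1*2, i.e. 2*n.
     assert(c[0] == 2.0f * n);
     std::printf("c[0] = %g\n", c[0]);
     return 0;
   }

Each thread writes a distinct set of rows of ``c``, so the parallel loop needs no locking.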

	

Nvidia GPU and accelerated computing, ctd.
------------------------------------------

This material accompanies **Programming Massively Parallel Processors: A Hands-on Approach, Third Edition, by David B. Kirk and Wen-mei W. Hwu**.  I recommend it.  (The slides etc. are free but the book isn't.)

Continuing  /parallel-class/GPU-Teaching-Kit-Accelerated-Computing, Modules 13 to 17 slide 14.

Notes about parallel.ecse
-------------------------

Since it has 256GB of main memory, there's no paging, and pinning memory is not a problem.

Linux now has `Heterogeneous Memory Management (HMM) <https://www.kernel.org/doc/html/latest/vm/hmm.html>`_ .





   
   
   
