FAQ

From SHARCNETHelp

=== How do I contact SHARCNET? ===
For technical inquiries, you may send E-mail to [mailto:support@tech.alliancecan.ca support@tech.alliancecan.ca] or [mailto:help@sharcnet.ca help@sharcnet.ca], or [https://www.sharcnet.ca/my/contact/directory#techteam contact your local system administrator or HPC specialist]. For general inquiries, you may contact the SHARCNET [http://www.sharcnet.ca/my/contact main office].

== Getting an Account with SHARCNET and Related Issues ==

=== I have encountered a problem while using an Alliance /SHARCNET system and need help, who should I talk to? ===
We encourage you to use the problem ticketing system (described in detail [[FAQ#Alliance_Problem_Ticket_System|below]]). This is the most efficient way of reporting a problem.

You are also welcome to contact system administrators and/or high performance technical computing consultants at any time. You may find their contact information on the [https://www.sharcnet.ca/my/contact/directory directory] page.
==== How do I submit a ticket? ====

In general, you can submit a new ticket by emailing  [mailto:support@tech.alliancecan.ca support@tech.alliancecan.ca]  with the email address associated with your Alliance account.  If you are using another email address, please provide your full name, your Alliance default username (if available) and your university or institution.

If you like, you can also target your inquiry more specifically, by using the following addresses to submit your ticket:

* [mailto:help@sharcnet.ca help@sharcnet.ca] - questions specific to SHARCNET, for example relating to contributed SHARCNET systems
* [mailto:globus@tech.alliancecan.ca globus@tech.alliancecan.ca] -- Questions about [[Globus]] file transfer services
* [mailto:cloud@tech.alliancecan.ca cloud@tech.alliancecan.ca] -- Questions about using Alliance cloud resources
* [mailto:accounts@tech.alliancecan.ca accounts@tech.alliancecan.ca] -- Questions about Alliance accounts
* [mailto:renewals@tech.alliancecan.ca renewals@tech.alliancecan.ca] -- Questions about Alliance account renewals

=== I am new to parallel programming, can you help me get started with my project? ===

=== Can you install a package on a cluster for me? ===
Certainly.  We suggest you make the request by sending e-mail to [mailto:support@tech.alliancecan.ca support@tech.alliancecan.ca] with the specific request.

=== I am in a process of purchasing computer equipment for my research, would you be able to provide technical advice on that? ===
=== Does SHARCNET provide any training on programming and using the systems? ===

Yes. SHARCNET provides workshops on specific topics from time to time and offers courses at some sites. Every summer (usually late May to early June), SHARCNET holds [https://helpwiki.sharcnet.ca/wiki/Summer_Schools an annual HPC Summer School] with a variety of in-depth, hands-on workshops.  

SHARCNET also offers a series of online seminars (so-called "General interest webinars"), typically delivered every second Wednesday at lunch time. These are announced via the SHARCNET events mailing list, and the schedule is available on the [https://www.sharcnet.ca/my/news/calendar SHARCNET event calendar]. Past seminars are recorded and posted on [http://youtube.sharcnet.ca our YouTube channel]. A full listing of the past webinars is available on the [https://helpwiki.sharcnet.ca/wiki/Online_Seminars Online Seminars] page.

=== How do I login to SHARCNET? ===
You access the SHARCNET clusters using ssh.  For Graham and other national systems Alliance credentials are required.  For the remaining ("Legacy") systems, you will require SHARCNET credentials (submit a ticket to help@sharcnet.ca if you have questions).

====Unix/Linux/OS X====
To log in to a system, you need to use a [http://www.openssh.com/ Secure Shell (SSH)] connection. If you are logging in from a UNIX-based machine, make sure it has an SSH client (ssh) installed (this is almost always the case on UNIX/Linux/OS X). If you have the same login name on both your local system and SHARCNET, and you want to log in to, say, <tt>graham</tt>, you may use the command:

  ssh graham.alliancecan.ca

If your Alliance username is different from the username on your local systems, then you may use either of the following forms:

  ssh graham.alliancecan.ca -l username
  ssh username@graham.alliancecan.ca

If you want to establish an X window connection so that you can use graphics applications such as <tt>gvim</tt> and <tt>ddt</tt>, you can add a <tt>-Y</tt> to the command:

  ssh -Y username@graham.alliancecan.ca

This will automatically set the X DISPLAY variable when you log in.
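If you log in often, the username and <tt>-Y</tt> option shown above can be stored in your OpenSSH client configuration instead of being typed each time. The sketch below is a minimal example, assuming an OpenSSH client on your local machine; the alias <tt>graham</tt> and the helper name <tt>graham_ssh_stanza</tt> are hypothetical choices, and <tt>username</tt> stands for your Alliance username. <tt>ForwardX11 yes</tt> together with <tt>ForwardX11Trusted yes</tt> is the configuration-file counterpart of <tt>ssh -Y</tt>.

```shell
#!/usr/bin/env bash
# Sketch: print an OpenSSH client config stanza so that "ssh graham"
# expands to the full host name, your Alliance username, and trusted
# X11 forwarding (the config-file equivalent of "ssh -Y").
# The alias "graham" and this function name are hypothetical choices.
graham_ssh_stanza() {
    local user="$1"   # your Alliance username
    printf '%s\n' \
        "Host graham" \
        "    HostName graham.alliancecan.ca" \
        "    User ${user}" \
        "    ForwardX11 yes" \
        "    ForwardX11Trusted yes"
}

# Review the output first, then append it to your client configuration:
#   graham_ssh_stanza username >> ~/.ssh/config
graham_ssh_stanza username
```

Once the stanza is in <tt>~/.ssh/config</tt>, <tt>ssh graham</tt> (and <tt>scp foo.f graham:</tt>) should behave like the longer forms above. Check the file first, since it may already contain an entry for the same alias.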
To transfer files to and from a cluster on a UNIX machine, you may use <tt>scp</tt> or <tt>sftp</tt>. For example, to upload the file <tt>foo.f</tt> from your machine <tt>myhost</tt> to the cluster graham, use the following command:

  myhost$ scp foo.f graham.alliancecan.ca:

assuming that your machine has <tt>scp</tt> installed. To transfer a file from Windows or Mac, you need to have an <tt>scp</tt> or <tt>sftp</tt> client for Windows or Mac installed.

To transfer the file <tt>foo.f</tt> between Alliance clusters, say from your home directory on cedar to your home directory on graham, simply use the following command:

  [username@cdr-login2:~]$ scp foo.f graham:/home/username/
To transfer a directory between a UNIX machine and a cluster, use the <tt>scp</tt> command with the <tt>-r</tt> (recursive) option; adding <tt>-p</tt> also preserves the files' modification times, access times and modes. For instance, to download the subdirectory <tt>foo</tt> of the directory <tt>project</tt> in your home directory on graham to your local UNIX machine, run the following command on your local machine:

  myhost$ scp -rp graham.alliancecan.ca:project/foo .

Similarly, you can transfer the subdirectory between Alliance clusters. The following command

  [username@cdr-login2:~]$ scp -r graham.alliancecan.ca:/home/username/scratch/foo .

will download subdirectory <tt>foo</tt> from your scratch directory on graham to your home directory on cedar (note that the prompt indicates you are currently logged on to cedar).
Then on your local machine myhost, use <tt>scp</tt> to copy the tar file

  myhost$ scp graham.alliancecan.ca:project/foo.tar.gz .

Note that on most Linux distributions, <tt>tar</tt> has a <tt>-z</tt> option that compresses the <tt>.tar</tt> file using <tt>gzip</tt>.
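The pack-then-copy approach can be sketched as a small script. This is a minimal example, assuming a <tt>tar</tt> with gzip support (<tt>-z</tt>), as on most Linux distributions; the helper names <tt>pack_dir</tt> and <tt>unpack_archive</tt> and the throwaway directory are hypothetical stand-ins for <tt>project/foo</tt> and <tt>foo.tar.gz</tt>.

```shell
#!/usr/bin/env bash
# Sketch of the pack-then-copy pattern: bundle a directory into one
# gzip-compressed archive so scp moves a single file, then check the
# archive and unpack it on the receiving side.
# Function names and paths are hypothetical placeholders.
set -eu

pack_dir() {
    local dir="$1" archive="$2"
    # -c create, -z gzip, -f archive file; -C keeps member paths
    # relative to the parent directory (so they start with "foo/")
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}

unpack_archive() {
    local archive="$1" dest="$2"
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}

# Example run in a throwaway directory standing in for project/foo:
work="$(mktemp -d)"
mkdir -p "$work/project/foo"
echo "hello" > "$work/project/foo/data.txt"
pack_dir "$work/project/foo" "$work/foo.tar.gz"
tar -tzf "$work/foo.tar.gz"          # list the contents before copying
unpack_archive "$work/foo.tar.gz" "$work/restored"
```

On a cluster you would run only the <tt>pack_dir</tt> step there, copy the single archive with <tt>scp</tt>, and unpack it locally.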
====How can I best transfer large quantities of data to/from SHARCNET and what transfer rate should I expect?====

In general, most users should be fine using ''scp'' or ''rsync'' to transfer data to and from SHARCNET systems. If you need to transfer a lot of files, ''rsync'' is recommended: because it skips files that have already been transferred, you do not need to restart the transfer from scratch should the connection fail. Although you can use ''scp'' and ''rsync'' with any cluster's login node(s), it is often best to use gra-dtn1.alliancecan.ca, as it is dedicated to data transfer.

In general, one should expect the following transfer rates with ''scp'':
Ordinarily, simple use of the atime property would be sufficient, as it is updated by the system in sync with the ctime. However, userspace programs are able to alter atime, potentially to times in the past, which could result in early expiration of a file. The use of ctime as a fallback guards against this undesirable behaviour.
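The effect of the ctime fallback can be illustrated with <tt>find</tt>. This is a minimal sketch, assuming the rule means a file only counts as unused when both its atime and its ctime are older than the cut-off; the function name <tt>purge_candidates</tt>, the 60-day cut-off and the demo file are hypothetical, and the actual purge tooling may work differently. It shows why back-dating atime does not expire a file: changing a file's timestamps with <tt>touch</tt> itself updates the ctime.

```shell
#!/usr/bin/env bash
# Sketch of the expiry rule described above, ASSUMING a file counts as
# "in use" if EITHER its atime or its ctime is recent. A file is listed
# only when both timestamps are more than DAYS days old, so back-dating
# atime alone cannot make it a purge candidate.
# Hypothetical helper; the real purge tooling may differ.
purge_candidates() {
    local dir="$1" days="$2"
    # -atime +N / -ctime +N: last access / last status change
    # more than N days ago
    find "$dir" -type f -atime +"$days" -ctime +"$days"
}

# Demonstration: back-date a file's atime to 1 Jan 2000 (POSIX -t format).
demo="$(mktemp -d)"
echo data > "$demo/results.dat"
touch -a -t 200001010000 "$demo/results.dat"
# The atime now looks ancient, but setting it changed the ctime to "now",
# so nothing is printed here: the file is not a candidate.
purge_candidates "$demo" 60
```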

It is also your responsibility to manage your stored data. Most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and an Alliance system, or between two Alliance systems, should generally be done using Globus.

==== How to archive my data? ====
The most likely cause of this behaviour is repeated failed login attempts. Our security policies include blocking the IP address of machines that attempt multiple logins with incorrect passwords over a short period of time; many brute-force attacks on systems do exactly this, looking for poor passwords, badly configured accounts, etc. Unfortunately, it isn't uncommon for a user to forget their password, make repeated login attempts with incorrect passwords, and end up with that machine blacklisted and unable to connect at all.

A temporary solution is simply to attempt to log in from another machine. If you have access to another machine at your site, you can shell to that machine first, and then shell to the SHARCNET system (as that machine's IP shouldn't be blacklisted). To have your machine unblocked, you will have to email [mailto:support@tech.alliancecan.ca support@tech.alliancecan.ca], as a system administrator must manually intervene to fix it.

NOTE: there are other situations that can produce this message; however, they are rarer and more transient. If you are unable to log in from one machine but can from another, IP blacklisting is most likely the problem, and the above will provide a temporary work-around while your problem ticket is processed.
'''Please first read the following:

https://docs.alliancecan.ca/wiki/SSH_security_improvements'''

Suppose you attempt to login to SHARCNET, but instead get an alarming message like this:
If your desktop supports FUSE, it's very convenient to simply mount your home tree like this:
  mkdir sharcnet
  sshfs graham.alliancecan.ca: sharcnet
you can then use any local editor of your choice.

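If you run emacs on your desktop, you can also edit a remote file from within your local emacs client using [http://www.gnu.org/software/tramp Tramp], opening and saving a file as /ssh:username@cluster.alliancecan.ca:path/file (recent Tramp versions require the connection method, here <tt>ssh</tt>, at the start of the remote file name).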

== Research at SHARCNET ==
=== I need access to more CPU cores or storage than are available by default, what programs exist to support demanding computation? ===

SHARCNET participates in the Alliance national RAC (Resource Allocation Competition) and provides a continual competition for groups that require more than the default level of access to our resources.  Please see [[Dedicated Resources]] for further information.

=== I heard SHARCNET offers fellowships, where can I get more information? ===
These include:
* Programming support of more modest duration (several days to one month engagement, usually part time)
* Training on a variety of topics through [https://helpwiki.sharcnet.ca/wiki/Training workshops, seminars and online training materials]
* Consultation. This may include user-initiated interactions on particular programs, algorithms, techniques, debugging, optimization etc., as well as unsolicited help to ensure effective use of SHARCNET systems
* Site Leaders play an important role in working with the community to help researchers connect with SHARCNET staff and to obtain appropriate help and support.

Latest revision as of 13:01, 17 September 2024


About SHARCNET

What is SHARCNET?

SHARCNET stands for Shared Hierarchical Academic Research Computing Network. Established in 2000, SHARCNET is the largest high performance computing consortium in Canada, involving 19 universities and colleges across southern, central and northern Ontario.

SHARCNET is a member consortium in the Digital Research Alliance of Canada (Alliance) national HPC platform.

Where is SHARCNET?

The main office of SHARCNET is located in the Western Science Centre at The University of Western Ontario.

What does SHARCNET have?

The primary SHARCNET compute system is the Graham heterogeneous cluster, located at the University of Waterloo. It is named after Wes Graham, the first director of the Computing Centre at Waterloo. It consists of 41,548 cores and 520 GPU devices, spread across 1,185 nodes of different configurations.

What can I do with SHARCNET?

If you have a program that takes months to run on your PC, you could probably run it within a few hours using hundreds of processors on SHARCNET / Alliance, provided your program is inherently parallelisable. If you have hundreds or thousands of test cases to run on your PC or the computers in your lab, running those cases independently on hundreds of processors will significantly reduce your test cycles.

Who is running SHARCNET?

The daily operation and development of SHARCNET computational facilities is managed by a group of highly qualified system administrators. In addition, we have a team of high performance technical computing consultants, who are responsible for technical support on libraries, programming and application analysis.

How do I contact SHARCNET?

For technical inquiries, you may send E-mail to support@tech.alliancecan.ca or help@sharcnet.ca, or contact your local system administrator or HPC specialist. For general inquiries, you may contact the SHARCNET main office.

Getting an Account with SHARCNET and Related Issues

To use SHARCNET (and any other national facilities in Alliance) one has to apply for an Alliance account.

Getting Help

I have encountered a problem while using an Alliance /SHARCNET system and need help, who should I talk to?

We encourage you to use the problem ticketing system (described in detail below). This is the most efficient way of reporting a problem.

You are also welcome to contact system administrators and/or high performance technical computing consultants at any time. You may find their contact information on the directory page.

How long should I expect to wait for support?

Unfortunately, Alliance/SHARCNET does not have adequate funding to provide support 24 hours a day, 7 days a week. User support and system monitoring are limited to regular business hours: there is no official support on weekends or holidays, or outside 9:00 to 17:00 EST.

Please note that this includes monitoring of our systems and operations, so typically when there are problems overnight or on weekends/holidays system notices will not be posted until the next business day.

Alliance Problem Ticket System

What is a "problem ticket system"?

This is a system that allows anyone with an Alliance account to start a persistent email thread that is referred to as a "problem ticket". When a user submits a new ticket it will be brought to the attention of an appropriate and available Alliance/SHARCNET staff member for resolution.

You interact with the ticket system entirely via email.

What do I need to specify in a ticket?

To help us address your question faster, please try to do the following when submitting a ticket:

  1. specify which of our systems is involved
  2. if the problem pertains to a job, then report the jobid associated with the job; this is an integer that is returned by the scheduler when you submit the job
  3. report the exact commands necessary to duplicate the problem, as well as any error output that helps identify the problem; if relevant, this should include how the code is compiled, how the job is submitted, and/or anything else you are doing from the command line relating to the problem
  4. if you would like a particular staff member to be aware of the ticket, mention them

How do I submit a ticket?

In general, you can submit a new ticket by emailing support@tech.alliancecan.ca with the email address associated with your Alliance account. If you are using another email address, please provide your full name, your Alliance default username (if available) and your university or institution.

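If you like, you can also target your inquiry more specifically by using the following addresses to submit your ticket:

  • help@sharcnet.ca - questions specific to SHARCNET, for example relating to contributed SHARCNET systems
  • globus@tech.alliancecan.ca - questions about Globus file transfer services
  • cloud@tech.alliancecan.ca - questions about using Alliance cloud resources
  • accounts@tech.alliancecan.ca - questions about Alliance accounts
  • renewals@tech.alliancecan.ca - questions about Alliance account renewals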

I am new to parallel programming, can you help me get started with my project?

Absolutely. We will be glad to help you at every stage, from planning the project, architecting your application programs with appropriate algorithms, and choosing efficient tools to solve the associated numerical problems, to debugging and analyzing your code. We will do our best to help you speed up your research. If your programming project would involve a significant amount of staff time, you should consider applying for Dedicated Programming support. (We run the competition annually; see https://www.sharcnet.ca/my/research/programming).

Can you install a package on a cluster for me?

Certainly. We suggest you make the request by sending e-mail to support@tech.alliancecan.ca with the specific request.

I am in a process of purchasing computer equipment for my research, would you be able to provide technical advice on that?

If you tell us what you want, we may be able to help you out.

Does SHARCNET provide any training on programming and using the systems?

Yes. SHARCNET provides workshops on specific topics from time to time and offers courses at some sites. Every summer (usually late May to early June), SHARCNET holds an annual HPC Summer School with a variety of in-depth, hands-on workshops.

SHARCNET also offers a series of online seminars (so-called "General interest webinars"), typically delivered every second Wednesday at lunch time. These are announced via the SHARCNET events mailing list, and the schedule is available on the SHARCNET event calendar. Past seminars are recorded and posted on our YouTube channel. A full listing of the past webinars is available on the Online Seminars page.

Attending SHARCNET Webinars

SHARCNET makes a number of seminar events available online (New User Webinar, general interest talks, etc.) using Zoom. Zoom can be used either by installing an app (available for Windows, Mac, Linux, etc.) or by running it in a browser (no installation required). A (free) Zoom account is required to attend our webinars. The Zoom registration link is provided on the event page in our Events calendar, https://www.sharcnet.ca/my/news/calendar.

VERY IMPORTANT: During registration, you have to provide the email address which is associated with your Zoom account, otherwise you won't be able to register and/or attend the webinar!

Please note that if your device has a microphone (highly recommended) and/or webcam, Zoom will use them to transmit your audio and video to all seminar participants. They are off by default; you can enable them by clicking the corresponding button at the bottom of your Zoom window, but only when the host allows it. Normally we do not allow attendees to use their microphones during the webinar, but there is dedicated time for questions and answers at the end of each webinar when microphones can be enabled. Generally, please keep your microphone muted (and webcam disabled) unless you want to ask a question.

We normally record our seminars and make them available to all SHARCNET users. The recordings are posted on our YouTube channel:

http://youtube.sharcnet.ca 

The links to the video recordings, slides and abstracts can be found on our online seminars page.

To subscribe to our events mailing list (which advertises upcoming webinars, summer schools and such), send an email to

events+subscribe@sharcnet.ca

To unsubscribe, send an email to

events+unsubscribe@sharcnet.ca

Logging in to Systems, Transferring and Editing Files

How do I login to SHARCNET?

You access the SHARCNET clusters using ssh. For Graham and other national systems Alliance credentials are required. For the remaining ("Legacy") systems, you will require SHARCNET credentials (submit a ticket to help@sharcnet.ca if you have questions).

Unix/Linux/OS X

To log in to a system, you need to use a Secure Shell (SSH) connection. If you are logging in from a UNIX-based machine, make sure it has an SSH client (ssh) installed (this is almost always the case on UNIX/Linux/OS X). If you have the same login name on both your local system and SHARCNET, and you want to log in to, say, graham, you may use the command:

ssh graham.alliancecan.ca

If your Alliance username is different from the username on your local systems, then you may use either of the following forms:

ssh graham.alliancecan.ca -l username
ssh username@graham.alliancecan.ca

If you want to establish an X window connection so that you can use graphics applications such as gvim and ddt, you can add a -Y to the command:

ssh -Y username@graham.alliancecan.ca

This will automatically set the X DISPLAY variable when you log in.

Windows

If you are logging in from a computer running Windows and need some pointers, we recommend consulting our SSH tutorial.

What is the difference between Login Nodes and Compute Nodes?

Login Nodes

Most of our clusters have distinct login nodes associated with them that you are automatically redirected to when you log in to the cluster (some systems, e.g. SMPs and smaller specialty systems, are logged into directly). You can use these to do most of your work preparing for jobs (compiling, editing configuration files) and other low-intensity tasks like moving and copying files.

You can also use them for other quick tasks, like simple post-processing, but any significant work should be submitted as a job to the compute nodes. On most login nodes, each process is limited to 1 CPU-hour; this will be noticeable if you perform anything compute-intensive, and can affect I/O-oriented activity as well (such as very large scp or rsync operations).

How can I suspend and resume my session?

The program screen can start persistent terminals from which you can detach and reattach. The simplest use of screen is

screen -dR

which will either reattach you to an existing session or create a new one if none exists. To terminate the current screen session, type exit. To detach manually (you are automatically detached if the connection is lost), press ctrl+a followed by d; you can then resume later as above (ideal for running background jobs). Note that ctrl+a is screen's escape sequence, so you have to press ctrl+a followed by a to get the regular effect of ctrl+a inside a screen session (e.g., moving the cursor to the start of the line in a shell).

For a list of other ctrl+a key sequences, press ctrl+a followed by ?. For further details and command line options, see the screen manual (or type man screen on any of the clusters).

Other notes:

  • If you want to create additional "text windows", use Ctrl-A Ctrl-C. Remember to type "exit" to close it.
  • To switch to a "text window" with a certain number, use Ctrl-A # (where # is 0 to 9).
  • To see a list of window numbers use Ctrl-A w
  • To be presented a list of windows and select one to use, use Ctrl-A " (This is handy if you've made too many windows.)
  • If the program running in a screen "text window" refuses to die (i.e., it needs to be killed), you can use Ctrl-A k
  • For brief help on keystrokes use Ctrl-A ?
  • For extensive help, run "man screen".

What operating systems are supported?

UNIX in general. Currently, Linux is the only operating system used within SHARCNET.

What makes a cluster different than my UNIX workstation?

If you are familiar with UNIX, then using a cluster is not much different from using a workstation. When you log in to a cluster, you in fact only log in to one of the cluster nodes. In most cases, each cluster node is a physical machine, usually a server-class machine with one or several CPUs, that is more or less the same as a workstation you are familiar with. The difference is that these nodes are linked by special interconnect devices, and the way you run your program is slightly different. On SHARCNET clusters, you are not expected to run your program interactively; instead, you run it through a queueing system. That also means that where and when your program runs is decided not by you, but by the queueing system.

What programming languages are supported?

The primary programming languages, such as C, C++, Fortran, and Python, are supported. Other languages, such as Java, Pascal and Ada, are also supported, but with limited technical support from us.

How do I organize my files?

See the documentation describing the main file systems on our national systems.

How do I transfer files/directories to/from or between clusters?

Unix/Linux

To transfer files between a UNIX machine and a cluster, you may use scp or sftp. For example, if you want to upload file foo.f to cluster graham from your machine myhost, use the following command:

myhost$ scp foo.f graham.alliancecan.ca:

assuming that your machine has scp installed. If you want to transfer a file from Windows or Mac, you need to have an scp or sftp client for Windows or Mac installed.

To transfer file foo.f between Alliance clusters, say from your home directory on cedar to your home directory on graham, simply use the following command:

[username@cdr-login2:~]$ scp foo.f graham.alliancecan.ca:/home/username/

To transfer whole directories between a UNIX machine and a cluster, use scp with the -r option. For instance, to download the subdirectory foo of the directory project in your home directory on graham to your local UNIX machine, run the following command on your local machine:

myhost$ scp -rp graham.alliancecan.ca:project/foo .

Similarly, you can transfer the subdirectory between Alliance clusters. The following command

[username@cdr-login2:~]$ scp -r graham.alliancecan.ca:/home/username/scratch/foo .

will download subdirectory foo from your scratch directory on graham to your home directory on cedar (note that the prompt indicates you are currently logged on to cedar).

The -p option used above preserves the modification times, access times and permissions of each file. On Windows and Mac, check the documentation of your scp client for the equivalent feature. Do not use the -p switch with the scp or cp commands on the clusters when the destination is your project space, or you might run into the infamous "Disk quota exceeded" issue.

You may also tar and compress the entire directory and then use scp, to save bandwidth. In the above example, first log in to graham, then do the following:

[username@gra-login2:~]$ cd project
[username@gra-login2:~/project]$ tar -cvf foo.tar foo
[username@gra-login2:~/project]$ gzip foo.tar

Then on your local machine myhost, use scp to copy the tar file

myhost$ scp graham.alliancecan.ca:project/foo.tar.gz .

Note that on most Linux distributions, tar has a -z option that compresses the archive with gzip as it is created (e.g. tar -czvf foo.tar.gz foo), so the separate gzip step is not needed.

Windows

You may read the instructions on using an SSH client.

How can I best transfer large quantities of data to/from SHARCNET and what transfer rate should I expect?

In general, most users should be fine using scp or rsync to transfer data to and from SHARCNET systems. If you need to transfer a lot of files, rsync is recommended: if the connection fails, rerunning rsync skips the files that were already transferred instead of restarting the whole transfer from scratch. Although you can use scp and rsync with any cluster's login node(s), it is often best to use gra-dtn1.alliancecan.ca, which is dedicated to data transfer.

In general one should expect the following transfer rates with scp:

  • If you are connecting to SHARCNET through a Research/Education network site (ORION, CANARIE, Internet2) and are on a fast local network (this is the case for most users connecting from academic institutions), then you should be able to attain sustained transfer speeds in excess of 10 MB/s. If your path is all gigabit or better, you should be able to reach rates above 50 MB/s.
  • If you are transferring data over the wider internet, you will not be able to attain these speeds, as all traffic that does not enter/exit SHARCNET via the R&E network is restricted to a limited-bandwidth commercial feed. In this case one will typically see rates on the order of 1 MB/s or less.

Keep in mind that filesystems and networks are shared resources and suffer from contention; if they are busy, the above rates may not be attainable.
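As a rough illustration of what these rates mean in practice, the sketch below estimates transfer time as size divided by rate for a hypothetical 100 GB dataset (taking 1 GB = 1000 MB). It ignores protocol overhead and contention, so treat the results as optimistic lower bounds rather than measurements.

```shell
#!/bin/bash
# Back-of-the-envelope transfer time: seconds = size / rate.
# The 100 GB size is a hypothetical example; the rates are the ballpark
# figures quoted above. Protocol overhead and contention are ignored.
transfer_seconds() {
    local size_mb=$1 rate_mb_s=$2
    echo $(( size_mb / rate_mb_s ))
}

size_mb=100000   # 100 GB, using 1 GB = 1000 MB
for rate_mb_s in 1 10 50; do
    seconds=$(transfer_seconds "$size_mb" "$rate_mb_s")
    # Print the rate, the total seconds, and the same time as hours/minutes
    printf '%3d MB/s -> %6d s (about %dh %02dm)\n' \
        "$rate_mb_s" "$seconds" $(( seconds / 3600 )) $(( seconds % 3600 / 60 ))
done
```

At 1 MB/s the hypothetical 100 GB takes roughly 28 hours, compared with about half an hour at 50 MB/s, so the network path matters a great deal for large datasets.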

For transferring large amounts of data (many gigabytes) the best approach is to use the online tool Globus.

How do I access the same file from different subdirectories on the same cluster?

You should not need to copy large files on the same cluster (e.g. from one user to another, or to use the same file in different subdirectories). Instead of using scp, consider creating a symbolic ("soft") link. Assume that you need access to the file large_file1 in subdirectory /home/user1/subdir1, and you need it to be in your subdirectory /home/my_account/my_dir, from where you will invoke it under the name my_large_file1. Then go to that directory and type:

ln -s /home/user1/subdir1/large_file1    my_large_file1

As another example, assume that in subdirectory /home/my_account/PROJ1 you have several subdirectories called CASE1, CASE2, ... In each subdirectory CASEn you have a slightly different code, but all of them process the same data file called test_data. Rather than copying the test_data file into each CASEn subdirectory, place test_data in the parent directory, i.e. in /home/my_account/PROJ1, and then in each CASEn subdirectory issue the following "soft link" command:

ln -s ../test_data  test_data

A soft link can be removed with the rm command; this deletes only the link, not the file it points to. For example, to remove the soft link from /home/my_account/PROJ1/CASE2, type the following command from that subdirectory:

rm test_data

Typing the same command from subdirectory /home/my_account/PROJ1 would instead remove the actual file, and then none of the CASEn subdirectories would have access to it.
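The sketch below reproduces the PROJ1/CASEn example inside a temporary directory (created with mktemp, so nothing in your real home directory is touched) and checks that removing a link leaves the shared file in place. The file contents are made up for illustration.

```shell
#!/bin/bash
# Rebuild the PROJ1/CASEn layout from the FAQ in a throwaway directory.
# Directory names mirror the example; the data is made up.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # clean up when the script exits

mkdir -p "$workdir/PROJ1/CASE1" "$workdir/PROJ1/CASE2"
echo "shared input data" > "$workdir/PROJ1/test_data"   # the single real copy

# In each CASEn directory, create a relative soft link to ../test_data
for case_dir in "$workdir"/PROJ1/CASE*; do
    (cd "$case_dir" && ln -s ../test_data test_data)
done

# Both links resolve to the same file; no data was copied
cat "$workdir/PROJ1/CASE1/test_data"
readlink "$workdir/PROJ1/CASE2/test_data"   # shows the link target: ../test_data

# Removing a link deletes only the link, not the file it points to
rm "$workdir/PROJ1/CASE2/test_data"
```

Because the links are relative (../test_data), the whole PROJ1 tree can be moved or archived without breaking them.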

How are files deleted from the /home/userid/scratch filesystems?

All files on /home/userid/scratch that are over 2 months old (not "old" in the common sense; see below) are automatically deleted. Data needed for long-term storage and reference should be kept in /project, /nearline, or other archival storage areas. Purging follows a monthly schedule:

  • At the end of each month, the scratch filesystem is scanned for files that will be candidates for expiry on the 15th of the following month.
  • On the 1st of the month, a login message is posted and a notification e-mail is sent to every user with at least one candidate file; the e-mail gives the location of a file listing all of that user's candidates.
  • On the 12th, a second scan checks for files that will still expire on the 15th, and users who still have candidates are sent a second warning e-mail.
  • On the 15th, all files due for expiry are removed.

An unconventional aspect of this system is that it does not determine the age of a file from its modification time (the "mtime", which is the date reported by default by commands such as ls -l and is frequently inaccurate). Instead, the age of a file is determined by whether or not its data contents (i.e., the information stored in the file) have been changed or accessed, and this age is stored externally to the file. Only files whose contents have been changed or accessed will have their age counter "reset".

A list of your candidate files for expiry can be found in /home/scratch_to_delete/$USER (where $USER is your userid).

How do I check the age of a file?

We define a file's age as the most recent of:

  • the access time (atime), and
  • the change time (ctime).

You can find the ctime of a file using

[name@server ~]$ ls -lc <filename>

while the atime can be obtained with the command

[name@server ~]$ ls -lu <filename>

We do not use the modify time (mtime) of the file because it can be modified by the user or by other programs to display incorrect information.

Ordinarily, simple use of the atime property would be sufficient, as it is updated by the system in sync with the ctime. However, userspace programs are able to alter atime, potentially to times in the past, which could result in early expiration of a file. The use of ctime as a fallback guards against this undesirable behaviour.
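To compute this age directly, the sketch below reads both timestamps with stat and reports the number of whole days since the more recent one. It assumes GNU coreutils stat, as found on Linux (the %X and %Z formats print atime and ctime in seconds since the epoch); stat on macOS/BSD takes different options. The function name file_age_days is hypothetical, and the result only approximates what the purge system sees.

```shell
#!/bin/bash
# Hypothetical helper: age of a file in whole days, defined as the time
# since the more recent of its atime and ctime. Assumes GNU coreutils stat.
file_age_days() {
    local file=$1 atime ctime newest now
    atime=$(stat -c %X -- "$file")   # last access, seconds since the epoch
    ctime=$(stat -c %Z -- "$file")   # last status change, seconds since the epoch
    newest=$(( atime > ctime ? atime : ctime ))
    now=$(date +%s)
    echo $(( (now - newest) / 86400 ))   # 86400 seconds per day
}

# Example (hypothetical path): file_age_days ~/scratch/results.dat
```

A file that has just been created or read reports an age of 0 days.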

It is also your responsibility to manage your stored data: most of the filesystems are not intended to provide an indefinite archiving service, so when a given file or directory is no longer needed, you need to move it to a more appropriate filesystem, which may well mean your personal workstation or some other storage system under your control. Moving significant amounts of data between your workstation and an Alliance system, or between two Alliance systems, should generally be done using Globus.

How to archive my data?

Use tar to archive files and directories

The primary archiving utility on all Linux and Unix-like systems is the tar command. It bundles a set of files or directories together into a single file, called an archive file or tar-file. By convention, an archive file has .tar as the file name extension. When you archive a directory with tar, it will, by default, include all the files and sub-directories contained within it, and sub-sub-directories contained in those, and so on. So the command tar --create --file project1.tar project1 will pack all the content of directory project1 into the file project1.tar. The original directory will remain unchanged, so this may double the amount of disk space occupied!

You can extract files from an archive using the same command with a different option: tar --extract --file project1.tar. If there is no directory with the original name, it will be created. If a directory of that name exists and contains files with the same names as those in the archive, they will be overwritten. The --directory (-C) option can be added to specify the destination directory into which the archive's content is extracted.

Compress and uncompress tar files

The tar archiving utility can compress an archive file at the same time it creates it. There are a number of compression methods to choose from. We recommend either xz or gzip, which can be used as follows:

[user_name@localhost]$ tar --create --xz --file project1.tar.xz project1
[user_name@localhost]$ tar --extract --xz --file project1.tar.xz
[user_name@localhost]$ tar --create --gzip --file project1.tar.gz project1
[user_name@localhost]$ tar --extract --gzip --file project1.tar.gz

Typically, --xz will produce a smaller compressed file (a "better compression ratio") but takes longer and uses more RAM while working. --gzip does not typically compress as small, but may be used if you encounter difficulties due to insufficient memory or excessive run time during tar --create. A third option, --bzip2, is also available; it typically does not compress as small as xz but takes longer than gzip.

You can also run tar --create first without compression and then use the commands xz or gzip in a separate step, although there is rarely a reason to do so. Similarly, you can run xz -d or gzip -d to decompress an archive file before running tar --extract, but again there is rarely a reason to do so.
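The round trip below creates a gzip-compressed archive, lists its contents with --list, and extracts it into a separate directory with the --directory option, all inside a temporary directory. It assumes GNU tar (the long option names used above are GNU tar's); the project1 files are made up for the demonstration.

```shell
#!/bin/bash
# Create, list, and extract a compressed archive with GNU tar.
# The project1 directory and its contents are made up for this demo.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

mkdir -p project1/results
echo "run 1" > project1/results/run1.txt
echo "run 2" > project1/results/run2.txt

# Create and compress in one step (short form: tar -czf project1.tar.gz project1)
tar --create --gzip --file project1.tar.gz project1

# List the archive's contents without extracting anything
tar --list --gzip --file project1.tar.gz

# Extract into another directory instead of the current one
mkdir restored
tar --extract --gzip --file project1.tar.gz --directory restored
```

After this runs, restored/project1 is a copy of project1, and the original directory is still in place.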

The commands gzip or xz can be used to compress any file, not just archive files:

[user_name@localhost]$ gzip bigfile
[user_name@localhost]$ xz bigfile

These commands will produce the files bigfile.gz and bigfile.xz respectively.

Archival Storage

On Graham, files copied to ~/nearline will subsequently be moved to offline (tape-based) storage. See the nearline storage documentation for more details.

How can I see the hidden files in a directory?

A "." at the beginning of a file name means that the file is "hidden". You have to use the -a option with ls to see it, i.e. ls -a .

If you want to display only the hidden files then type:

ls -d .*

Note: there is an alias which is loaded from /etc/bashrc (see your .bashrc file). The alias is defined by alias l.='ls -d .* --color=auto' and if you type:

l.

you will also display only the hidden files.
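If you want hidden files without the special entries . and .., ls -A ("almost all") is a convenient alternative. The sketch below, run in a throwaway directory with made-up file names, compares ls -a with ls -A and then filters the list down to names starting with a dot.

```shell
#!/bin/bash
# Compare ls -a and ls -A, then keep only dot files.
# The file names are made up for the example.
set -euo pipefail
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
touch visible.txt .hidden_config .profile_demo

ls -a    # all entries, including the special entries . and ..
ls -A    # all entries except . and ..

# Keep only the names that start with a dot (one name per line)
hidden=$(ls -A | grep '^\.')
printf '%s\n' "$hidden"
```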

How can I count the number of files in a directory?

One can use the following command to count the number of files in a directory and all of its subdirectories (in this example, your home directory):

find /home/$USER -type f   | wc -l

It is always a good idea to archive and/or compress files that are no longer needed on the filesystem (see below). This helps minimize your footprint on the filesystem, and therefore your impact on other users of the shared resource.
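When the total is large, it helps to know which subdirectories hold most of the files. The hypothetical helper below counts regular files under each top-level subdirectory (hidden directories are skipped by the */ glob) and sorts the results, largest first, using only find, wc and sort.

```shell
#!/bin/bash
# Hypothetical helper: count regular files under each top-level
# subdirectory of a directory (default: your home directory).
# Hidden subdirectories are not matched by the */ glob.
count_files_per_dir() {
    local top=${1:-$HOME} dir
    for dir in "$top"/*/; do
        [ -d "$dir" ] || continue                # skip if the glob matched nothing
        printf '%8d  %s\n' "$(find "$dir" -type f | wc -l)" "$dir"
    done | sort -rn                               # largest counts first
}

# Example: count_files_per_dir "$HOME/scratch"
```

Directories that show up at the top of the list are the best candidates for packing into an archive.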

How to organize a large number of files?

With parallel cluster filesystems, you will get the best I/O performance by writing data to a small number of large files. Since all metadata operations on each of our parallel filesystems are handled by a single file server, the server can become overwhelmed when many files are being accessed, leading to poor overall I/O performance for all users. If your workflow involves storing data in a large number of files, it is best to pack these files into a small number of larger archives, e.g. using the tar command:

tar cvf archiveFile.tar directoryToArchive

For better performance with many files inside your archive, we recommend using DAR (Disk ARchive utility), which is a disk analog of tar (Tape ARchive). DAR can extract files from anywhere in the archive much faster than tar. The dar command is available by default on SHARCNET systems. It can be used to pack files into a dar archive with a command like:

dar -s 1G -w -c archiveFile -g directoryToArchive

In this example we split the archive into 1GB chunks, and the archive files will be named archiveFile.1.dar, archiveFile.2.dar, and so on. To list the contents of the archive, you can type:

dar -l archiveFile

To temporarily extract files into the current directory for post-processing, you would type:

dar -R . -O -x archiveFile -v -g pathToYourFile/fileToExtract

I am unable to connect to one of the clusters; when I try, I am told the connection was closed by the remote host

The most likely cause of this behaviour is repeated failed login attempts. Our security policies include blocking the IP address of machines that attempt multiple logins with incorrect passwords over a short period of time, since many brute-force attacks on systems do exactly this, looking for poor passwords, badly configured accounts, etc. Unfortunately, it isn't uncommon for a user to forget their password, make repeated login attempts with incorrect passwords, and end up with their machine blacklisted and unable to connect at all.

A temporary solution is simply to attempt to log in from another machine. If you have access to another machine at your site, you can ssh to that machine first, and then ssh to the SHARCNET system (that machine's IP address shouldn't be blacklisted). To have your machine unblocked, email support@tech.alliancecan.ca, as a system administrator must intervene manually to fix it.

NOTE: there are other situations that can produce this message, however they are rarer and more transient. If you are unable to log in from one machine, but can from another, it is most likely the IP blacklisting that is the problem and the above will provide a temporary work-around while your problem ticket is processed.

I am unable to ssh/scp from SHARCNET to my local computer

Most campus networks are behind some sort of firewall. If you can ssh out to SHARCNET, but cannot establish a connection in the other direction, then you are probably behind a firewall and should speak with your local system administrator or campus IT department to determine if there are any exceptions or workarounds in place.

SSH tells me SOMEONE IS DOING SOMETHING NASTY!?

Please first read the following:

https://docs.alliancecan.ca/wiki/SSH_security_improvements

Suppose you attempt to login to SHARCNET, but instead get an alarming message like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
fe:65:ab:89:9a:23:34:5a:50:1e:05:d6:bf:ec:da:67.
Please contact your system administrator.
Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
Offending key in /home/user/.ssh/known_hosts:42
RSA host key for requin has changed and you have requested strict checking.
Host key verification failed. 

SSH begins a connection by verifying that the host you're connecting to is authentic. It does this by caching the host's "hostkey" in your ~/.ssh/known_hosts file and comparing it on each later connection. At times, a hostkey may be changed legitimately; when this happens, you may see such a message. It's a good idea to verify the change with us; you may also be able to check the fingerprint yourself by logging in to another SHARCNET system and running:

ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key.pub 

If the fingerprint is OK, the normal way to fix the problem is to simply remove the old hostkey from your known_hosts file. On machines with commandline SSH, you can do this with:

ssh-keygen -R thehostname

You can use your choice of editor if you're comfortable doing so (it's a plain text file, but has long lines). On a Unix-compatible machine, you can also use the following very small script, substituting the line number printed in the warning message (42 in the example above) for 42:

perl -pi -e 'undef $_ if (++$line == 42)' ~/.ssh/known_hosts
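If you prefer sed to perl, GNU sed can delete a line by number in place (on macOS/BSD sed, write -i '' instead of -i). The sketch below works on a scratch file with made-up entries rather than your real ~/.ssh/known_hosts, and keeps a backup before editing; on your own machine you would use the line number from the warning.

```shell
#!/bin/bash
# Delete one line, by number, from a known_hosts-style file using GNU sed.
# The host names and keys below are made up; this never touches ~/.ssh.
set -euo pipefail
hosts=$(mktemp)
trap 'rm -f "$hosts" "$hosts.bak"' EXIT
printf '%s\n' "hostA ssh-ed25519 AAAA-key-A" \
              "hostB ssh-rsa AAAA-stale-key" \
              "hostC ssh-ed25519 AAAA-key-C" > "$hosts"

offending_line=2                     # the line number reported by ssh
cp "$hosts" "$hosts.bak"             # keep a backup before editing
sed -i "${offending_line}d" "$hosts" # remove only that line
cat "$hosts"
```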

Another solution is brute force: remove the whole known_hosts file. This discards every saved host key, so host-key checking starts over: afterwards, your first connection to any machine will prompt you to accept a newly discovered host key.

Ssh works, but scp doesn't!

If you can ssh to a cluster successfully but cannot scp to it, the problem is likely that your login scripts print unexpected messages which confuse scp. scp is based on the same SSH protocol, but assumes that the connection is "clean": that is, that it does not produce any unrequested content. If your login script contains something like:

echo "Hello, Master; I await your command..."

scp will be confused by the salutation. To avoid this, simply ensure that the message is only printed on an interactive login:

if [ -t 0 ]; then
    echo "Hello, Master; I await your command..."
fi

or in csh/tcsh syntax:

if ( -t 0 ) then
    echo "Hello, Master; I await your command..."
endif
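You can check that a greeting is properly guarded without involving scp: run the script with standard input redirected from /dev/null, which, like the session scp opens, is not a terminal. The sketch below writes the bash version of the guard to a temporary file (standing in for your real login script) and confirms it prints nothing. Against the real cluster, ssh username@graham.alliancecan.ca true should likewise produce no output.

```shell
#!/bin/bash
# Verify that a guarded greeting stays silent when stdin is not a terminal.
# The temporary file stands in for your real login script.
set -euo pipefail
rcfile=$(mktemp)
trap 'rm -f "$rcfile"' EXIT
cat > "$rcfile" <<'EOF'
if [ -t 0 ]; then
    echo "Hello, Master; I await your command..."
fi
EOF

# Simulate a non-interactive session: stdin comes from /dev/null
output=$(bash "$rcfile" < /dev/null)
if [ -z "$output" ]; then
    echo "OK: no output when stdin is not a terminal"
else
    echo "Problem: the script printed: $output"
fi
```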

How do I edit my program on a cluster?

We provide a variety of editors, such as the traditional text-mode emacs and vi (vim), as well as a simpler one called nano. If you have X on your desktop (and tunneled through SSH), you can use the GUI versions (xemacs, gvim).

If your desktop supports FUSE, it's very convenient to simply mount your home tree like this:

mkdir sharcnet
sshfs graham.alliancecan.ca: sharcnet

You can then use any local editor of your choice. When you are done, unmount the directory with fusermount -u sharcnet (on Linux).

If you run emacs on your desktop, you can also edit a remote file from within your local emacs client using Tramp, opening and saving the file as /ssh:username@cluster.alliancecan.ca:path/file.

Research at SHARCNET

I have a research project I would like to collaborate on with SHARCNET, who should I talk to?

You may contact SHARCNET head office or contact members of the SHARCNET technical staff.

How can I contribute compute resources to SHARCNET so that other researchers can share it?

Most people's research is "bursty": there are usually sparse periods of time when some computation is urgently needed, and other periods when there is less demand. One problem with this is that if you purchase the equipment you need to meet your "burst" needs, it will probably sit underutilized the rest of the time.

An alternative is to donate control of this equipment to SHARCNET, and let us arrange for other users to use it when you are not. We prefer to be involved in the selection and configuration of such equipment. Our promise to contributors is that, as much as possible, they should obtain as much benefit from the cluster as if it were not shared. Owners get preferential access. Naturally, owners are also able to burst to higher peak usage, since their equipment has been pooled with other contributions. (Technically, SHARCNET cannot itself own such equipment; it remains owned by the institution in question, and will be returned to the contributor upon request.) If you think this model will work for you and you would like to contribute your computational resources to help the research community at SHARCNET, you can contact us to make such an arrangement.

I do not know much about computation, nor is it my research interest. But I am interested in getting my research done faster with the help of the high performance computing technology. In other words, I do not care about the process and mechanism, but only the final results. Can SHARCNET provide this type of help?

We will be happy to bring the technology of high performance computing to you to accelerate your research, if at all possible. If you would like to discuss your plan with us, please feel free to contact our high performance computing specialists. They will be happy to listen to your needs and are ready to provide appropriate suggestions and assistance.

I need access to more CPU cores or storage than are available by default, what programs exist to support demanding computation?

SHARCNET participates in the Alliance national RAC (Resource Allocation Competition) and provides a continual competition for groups that require more than the default level of access to our resources. Please see Dedicated Resources for further information.

I heard SHARCNET offers fellowships, where can I get more information?

SHARCNET no longer actively runs a fellowship program. You may find information regarding past fellowships and other dedicated resource opportunities on the Research Fellowships page of the web portal.

I would like to do some research at SHARCNET as a visiting scholar, how should I apply?

In general, you will need to find a hosting department or a person affiliated with one of the SHARCNET institutions. You may also contact us directly for more specific information.

I would like to send my students to SHARCNET to do some work for me. How should I proceed?

See above.

Contacting SHARCNET

How do I contact SHARCNET for research, academic exchanges, and technical issues?

Please contact SHARCNET head office.

How do I contact SHARCNET for business development, education and other issues?

Please contact SHARCNET head office.

How do I contact a specific staff member at SHARCNET?

See staff directory for contact information.


How to Acknowledge SHARCNET in Publications

How do I acknowledge SHARCNET in my publications?

We recommend citing the following:

This work was made possible by the facilities of the Shared Hierarchical 
Academic Research Computing Network (SHARCNET:www.sharcnet.ca) and Digital Research Alliance of Canada (https://alliancecan.ca/en).

I've seen different spellings of the name, what is the standard spelling of SHARCNET?

We suggest the spelling SHARCNET, all in upper case.


What types of research programs / support are provided to the research community?

Our overall intent is to provide support that can both respond to the range of needs that the user community presents and help to increase the sophistication of the community and enable new and larger-in-scope applications making use of SHARCNET's HPC facilities. The range of support can perhaps best be understood in terms of a pyramid:

Level 1

At the apex of the pyramid, SHARCNET supports a small number of projects with dedicated programmer support. The intent is to enable projects that will have a lasting impact and may lead to a "step change" in the way research is done at SHARCNET. Inter-disciplinary and inter-institutional projects are particularly welcomed. For the latest information about the program, including application guidelines, please see the Programming Competition page in our web portal.

Level 2

The middle layers of support are provided through a number of initiatives.

These include:

  • Programming support of more modest duration (several days to one month engagement, usually part time)
  • Training on a variety of topics through workshops, seminars and online training materials
  • Consultation. This may include user-initiated interactions on particular programs, algorithms, techniques, debugging, optimization etc., as well as unsolicited help to ensure effective use of SHARCNET systems
  • Site Leaders play an important role in working with the community to help researchers connect with SHARCNET staff and to obtain appropriate help and support.

Level 3

The base level of the pyramid handles the very large number of small requests that are essential to keeping the user community working effectively with the infrastructure on a day-to-day basis. Several of these can be answered by this FAQ; many of the issues are presented through the ticketing system. The support is largely problem oriented with each problem being time limited.