Multi threading

From Ilianko

PRACE

Supercomputer infrastructure...


Desislava Ivanova


PRACE

  • Provides access to PRACE resources
    • Schrödinger equation...
    • 1024*4 PowerPC processors: the Bulgarian computer (Blue Gene/P ... (in Bulgaria)), 27.85 TFLOPS

http://www.prace-ri.eu/

Resources available to us

  • Blue Gene ... (Blue Gene/P) in Bulgaria
  • Jugene (Blue Gene/P)
  • Curie
  • Hermit
  • SuperMUC
  • CRAY XE6
  • Jurora
  • NIIFI - ClusterGrid
  • CINECA

Members

  • Austria
  • ... many European countries

To participate, you must demonstrate that you have a problem that uses 512 cores and is of general human significance...

Types

Fine granularity: one input/output system, many cores (fine-grained parallelization)

Message Passing Interface: coarse-grained parallelization ...

Grid computing, Cloud Computing vs Parallel Computing....

Isomorphic system network ... the way the individual nodes are connected. OMNeT++

6-dimensional topology

Clusters

  • Homogeneous
  • Heterogeneous

BlueVision

A Bulgarian idea: analysis of satellite images to forecast soil salinity and to predict fires, polluted waters, floods ..., based on multispectral analysis


Gantt chart

Latency, throughput

pattern ...

flit

YARC chip - http://www.ece.gatech.edu/research/labs/esl/classes/ECE6101/For6101-1/yarc.ppt


SAN (system area network): hybrid design, 3D torus + fat tree
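To make the torus part of this design concrete, here is a minimal sketch of how the six neighbors of a node in a 3D torus can be computed. The `coord3` type and the `torus_neighbors` helper are hypothetical names introduced for illustration only; they do not come from the lecture or from any Blue Gene API.

```c
#include <assert.h>

/* Coordinates of a node in a 3D torus (hypothetical type). */
typedef struct { int x, y, z; } coord3;

/* Wrap a coordinate into [0, n). The wrap-around links are what
   distinguish a torus from a plain mesh. */
static int wrap(int v, int n)
{
    return ((v % n) + n) % n;
}

/* Fill out[0..5] with the six neighbors of node c in a torus of
   size dx * dy * dz: -x, +x, -y, +y, -z, +z, in that order. */
void torus_neighbors(coord3 c, int dx, int dy, int dz, coord3 out[6])
{
    assert(dx > 0 && dy > 0 && dz > 0);
    out[0] = (coord3){ wrap(c.x - 1, dx), c.y, c.z };
    out[1] = (coord3){ wrap(c.x + 1, dx), c.y, c.z };
    out[2] = (coord3){ c.x, wrap(c.y - 1, dy), c.z };
    out[3] = (coord3){ c.x, wrap(c.y + 1, dy), c.z };
    out[4] = (coord3){ c.x, c.y, wrap(c.z - 1, dz) };
    out[5] = (coord3){ c.x, c.y, wrap(c.z + 1, dz) };
}
```

Every node has exactly six neighbors regardless of its position, because nodes on the "edge" of the grid are linked to the opposite side.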

William James Dally, Principles and Practices of Interconnection Networks (The Morgan Kaufmann Series in Computer Architecture and Design)

routing algorithm <-> topology (packet size)

store-and-forward .... wormhole .... gossip, reduce ... traffic testing, collective network, 3-dimensional torus, low-latency global barrier and interrupt
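The difference between store-and-forward and wormhole switching can be illustrated with a simplified zero-load latency model (no contention, one fixed delay per router). This is only a sketch under those assumptions; the function and parameter names are made up for the example.

```c
#include <assert.h>

/* Store-and-forward: each of the `hops` routers receives the whole
   packet before forwarding it, so the transfer time is paid per hop. */
double store_and_forward_time(int hops, double router_delay,
                              double packet_bits, double bandwidth)
{
    assert(hops >= 0 && bandwidth > 0.0);
    return hops * (router_delay + packet_bits / bandwidth);
}

/* Wormhole (cut-through): only the header pays the per-hop delay;
   the rest of the packet is pipelined behind it. */
double wormhole_time(int hops, double router_delay,
                     double packet_bits, double bandwidth)
{
    assert(hops >= 0 && bandwidth > 0.0);
    return hops * router_delay + packet_bits / bandwidth;
}
```

In this model the advantage of wormhole switching grows with both the packet size and the number of hops.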

Fat tree, Benes network

Decomposition

Rendezvous

Point-to-point communication: peer <=> processor

Collective communication

  • Broadcast: non-personalized communication (one to all)
  • All to All
  • Gossiping: personalized global communication (the heaviest mode)
  • Reduction: all processes send to one (the root becomes a hot spot)
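One common way to relieve the hot spot at the root is to combine values along a tree instead of having every process send directly to the root. The sketch below simulates a binomial-tree sum in plain C, with an array standing in for the local values of the processes. It is an illustration only: `tree_reduce_sum` is a hypothetical helper, and a real MPI library is free to choose a different algorithm.

```c
#include <assert.h>

/* Simulate a binomial-tree sum over p "processes", where values[i]
   stands for the local value of rank i. In each round, rank i with
   i % (2 * step) == 0 receives from rank i + step, so no rank
   receives more than one message per round. The total ends up in
   values[0] (the root). Returns the number of communication rounds. */
int tree_reduce_sum(int values[], int p)
{
    assert(p > 0);
    int rounds = 0;
    for (int step = 1; step < p; step *= 2) {
        for (int i = 0; i + step < p; i += 2 * step) {
            values[i] += values[i + step];  /* "receive" and combine */
        }
        rounds++;
    }
    return rounds;
}
```

With 8 processes the root receives 3 messages instead of 7; in general the number of rounds grows logarithmically with the number of processes.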

Networks

Torus, fat tree


Every program imposes a communication pattern. The network has a maximum capacity. Offered vs. accepted traffic (offered vs. delivered)

Divide and Conquer

Composition and decomposition

  • Coarse granularity: message-passing approach (MPI)
  • Shared memory => fine granularity (OpenMP)
  • Hybrid: MPI + OpenMP

...

MPI

  • MPI_Init
  • MPI_Comm_rank
  • MPI_Comm_size
  • MPI_Finalize
  • MPI_Barrier: waits until all processes reach this point, e.g. mpi_err = MPI_Barrier(MPI_COMM_WORLD);
  • MPI_Wtime: wall-clock time, used to measure the program's execution time
  • MPI_Wtick: resolution of the MPI_Wtime clock, in seconds
 
/* C Example */
#include <stdio.h>
#include <mpi.h>


int main (int argc, char *argv[])
{
  int rank, size;

  MPI_Init (&argc, &argv);	/* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank);	/* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size);	/* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}

Message-passing model

System network: full connectivity


A parallel program is represented only as a graph, not as a flowchart


Distributed-memory system

Memory + CPU

Memory + CPU

...


Types of programs

  • Statically parallel: the number of processes is constant
  • Dynamically parallel: for example search engines (the required resources are not known in advance and must be allocated at run time)



MPI_COMM_WORLD


Compiling

% mpicc -o myprog ....


Collective communication

MPI_Reduce

MPI_REDUCE combines the elements provided in the input buffer of each process in the group, using the operation op, and returns the combined value in the output buffer of the process with rank root. The input buffer is defined by the arguments sendbuf, count and datatype; the output buffer is defined by the arguments recvbuf, count and datatype; both have the same number of elements, with the same type. The routine is called by all group members using the same arguments for count, datatype, op, root and comm. Thus, all processes provide input buffers and output buffers of the same length, with elements of the same type. Each process can provide one element, or a sequence of elements, in which case the combine operation is executed element-wise on each entry of the sequence. For example, if the operation is MPI_MAX and the send buffer contains two elements that are floating point numbers (count = 2 and datatype = MPI_FLOAT), then recvbuf(1) = global max(sendbuf(1)) and recvbuf(2) = global max(sendbuf(2)).
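The element-wise behaviour described above can be modelled in plain C without MPI. The sketch below is not an MPI call: `simulate_reduce_max` is a hypothetical helper that takes the send buffers of all processes in one flat array and computes what the root would find in its receive buffer after a reduction with MPI_MAX.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C model of the result of MPI_Reduce with MPI_MAX at the root.
   sendbufs holds nprocs consecutive buffers of `count` floats each
   (the buffer of rank r starts at sendbufs[r * count]); recvbuf[i]
   becomes the maximum of element i over all ranks. */
void simulate_reduce_max(size_t nprocs, size_t count,
                         const float sendbufs[], float recvbuf[])
{
    assert(nprocs > 0);
    for (size_t i = 0; i < count; i++) {
        recvbuf[i] = sendbufs[i];            /* element i of rank 0 */
        for (size_t r = 1; r < nprocs; r++) {
            float v = sendbufs[r * count + i];
            if (v > recvbuf[i]) {
                recvbuf[i] = v;
            }
        }
    }
}
```

With three processes holding (1, 9), (4, 2) and (3, 5), the root ends up with (4, 9): each position is reduced independently of the others.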


Speedup

Speedup is the goal of parallel programming

Superlinear speedup

... ...

Underload, overhead, superlinear speedup
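These notions can be written down with the usual definitions: speedup is the sequential run time divided by the parallel run time, and efficiency is the speedup divided by the number of processes. The helper names below are illustrative and not taken from the lecture.

```c
#include <assert.h>

/* Speedup: sequential time divided by parallel time. */
double speedup(double t_serial, double t_parallel)
{
    assert(t_serial > 0.0 && t_parallel > 0.0);
    return t_serial / t_parallel;
}

/* Efficiency: speedup per process; 1.0 means ideal linear speedup. */
double efficiency(double t_serial, double t_parallel, int p)
{
    assert(p > 0);
    return speedup(t_serial, t_parallel) / p;
}

/* Superlinear speedup: p processes are more than p times faster
   than one. Returns 1 if that is the case, 0 otherwise. */
int is_superlinear(double t_serial, double t_parallel, int p)
{
    return speedup(t_serial, t_parallel) > (double)p;
}
```

For example, a program that takes 100 s sequentially and 25 s on 8 processes has a speedup of 4 and an efficiency of 0.5; the missing half is lost to overhead or underloaded processes.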



Sieve of Eratosthenes

Counting the prime numbers

  1. Mark the even numbers (multiples of 2)
  2. Mark the numbers divisible by 3
  3. Mark the numbers divisible by 5
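The marking steps above continue with every remaining unmarked number. A minimal sequential sketch in C follows; it is the baseline that a parallel version would decompose, not the lecture's own code, and `count_primes` is a name chosen for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the primes in [2, n] with a sequential sieve of Eratosthenes.
   Returns -1 if the marking array cannot be allocated. */
long count_primes(long n)
{
    assert(n >= 0);
    if (n < 2) {
        return 0;
    }
    char *marked = calloc((size_t)n + 1, 1);  /* 0 = not marked */
    if (marked == NULL) {
        return -1;
    }
    /* For each unmarked k, mark its multiples starting at k * k;
       smaller multiples were already marked by smaller primes. */
    for (long k = 2; k * k <= n; k++) {
        if (marked[k]) {
            continue;
        }
        for (long m = k * k; m <= n; m += k) {
            marked[m] = 1;
        }
    }
    long count = 0;
    for (long i = 2; i <= n; i++) {
        if (!marked[i]) {
            count++;
        }
    }
    free(marked);
    return count;
}
```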

Block data decomposition
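A common way to implement block data decomposition is to give each process one contiguous range of indices, with block sizes differing by at most one. The sketch below shows one such scheme; the function names are hypothetical and the formulas are an assumption about which variant is meant here.

```c
#include <assert.h>

/* First index owned by process `id` when n elements are split into
   p contiguous blocks. id == p is allowed so that block_low(p, ...)
   gives the end of the last block. */
long block_low(int id, int p, long n)
{
    assert(p > 0 && id >= 0 && id <= p);
    return (long)id * n / p;
}

/* Last index owned by process `id`. */
long block_high(int id, int p, long n)
{
    return block_low(id + 1, p, n) - 1;
}

/* Number of elements owned by process `id`. */
long block_size(int id, int p, long n)
{
    return block_low(id + 1, p, n) - block_low(id, p, n);
}

/* Rank of the process that owns element `index`. */
int block_owner(long index, int p, long n)
{
    assert(index >= 0 && index < n);
    return (int)(((long)p * (index + 1) - 1) / n);
}
```

For n = 10 and p = 4 this gives the blocks [0, 1], [2, 4], [5, 6] and [7, 9]. In an SPMD program every process would call these helpers with its own rank to find the part of the data it is responsible for.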

SPMD

Functional decomposition

References

http://geco.mines.edu/workshop/class2/examples/mpi/index.html