Are there any famous problems/algorithms in scientific computing that cannot be sped up by parallelisation? Reading a book on CUDA gives the impression that most things can be.
Answers:
The central issue is the length of the critical path, C, relative to the total amount of computation, T. If C is proportional to T, then parallelism offers at best a constant speed-up. If C is asymptotically smaller than T, there is more room for parallelism as the problem size grows. For algorithms in which T is polynomial in the input size N, the best case is C ~ log T, because very few useful quantities can be computed in less than logarithmic time.
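As a small sketch of the bound being used here (this is the standard work/span argument, stated in the same notation; nothing beyond that is assumed):

```latex
% T = total work, C = critical-path length, p = number of processors.
% Any parallel schedule needs at least max(T/p, C) time, so
\[
  \text{speedup} \;\le\; \frac{T}{\max(T/p,\; C)} \;\le\; \frac{T}{C}.
\]
% If C is proportional to T, the speedup is capped at a constant;
% if C ~ log T, the cap grows with the problem size.
```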
The complexity class NC characterizes those problems that can be solved efficiently in parallel (e.g., in polylogarithmic time). It is not known whether NC = P, but it is widely conjectured to be false. If that is indeed the case, then the P-complete problems characterize those that are "inherently sequential" and cannot be sped up significantly by parallelism.
In terms of complexity theory, NC is the class of problems solvable in polylogarithmic time on a polynomial number of parallel processors. It is still unknown whether NC = P (although most people suspect it is not), where P is the set of problems solvable in polynomial time. The "hardest" problems to parallelize are known as P-complete problems, in the sense that every problem in P can be reduced to a P-complete problem via NC reductions. If you show that a single P-complete problem is in NC, you prove that P = NC (although that is probably false, as mentioned above).
So any problem that is P-complete would intuitively be hard to parallelize (although big constant-factor speedups are still possible). A P-complete problem for which we do not have even very good constant-factor speedups is linear programming (see this comment on OR-exchange).
Start by grokking Amdahl's Law (see the formula below). Basically, anything dominated by a long chain of serial steps will benefit little from parallelism. A few examples include parsing, regex matching, and most high-ratio compression.
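For reference, Amdahl's Law in its usual form, where f is the fraction of the runtime that stays serial and p is the number of processors (the numbers in the comment are only an illustrative example):

```latex
\[
  S(p) \;=\; \frac{1}{f + \dfrac{1 - f}{p}},
  \qquad
  \lim_{p \to \infty} S(p) \;=\; \frac{1}{f}.
\]
% Example: with f = 0.1 (10% of the runtime serial), even infinitely
% many processors give at most a 10x speedup.
```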
Aside from that, the key issue is often a bottleneck in memory bandwidth. In particular, on most GPUs the theoretical flop rate vastly outstrips the rate at which floating-point numbers can be delivered to the ALUs, so algorithms with low arithmetic intensity (flops per memory access) will spend the vast majority of their time waiting on RAM.
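To make "low arithmetic intensity" concrete, here is a roofline-style back-of-the-envelope estimate; the peak-flop and bandwidth figures below are hypothetical, chosen only to show the scale of the imbalance:

```latex
\[
  \text{attainable flop rate} \;\approx\;
  \min\bigl(\text{peak flop rate},\;
            \text{memory bandwidth} \times \text{arithmetic intensity}\bigr).
\]
% Hypothetical numbers: with 10 TFLOP/s peak and 500 GB/s bandwidth, a
% kernel doing 1 flop per 8-byte load (0.125 flop/byte) is capped near
% 500 GB/s * 0.125 flop/byte = 62.5 GFLOP/s, well under 1% of peak.
```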
Lastly, any time a piece of code requires heavy branching, it is unlikely to perform well, since ALUs typically outnumber the control logic.
In conclusion, a really simple example of something that is hard to speed up on a GPU is counting the number of zeros in an array of integers: you may have to branch often, perform at most one operation (an increment) when you find a zero, and make at least one memory fetch per element (see the sketch below).
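A minimal Python/NumPy sketch of the zero-counting example, assuming the input is a NumPy integer array (the function names are mine, not from the answer):

```python
import numpy as np

def count_zeros_loop(a):
    """Scalar version: one branch and one memory fetch per element,
    and at most one increment -- almost no arithmetic to hide latency."""
    count = 0
    for x in a:
        if x == 0:        # data-dependent branch
            count += 1    # the single cheap operation
    return count

def count_zeros_vectorized(a):
    """Branch-free reformulation: the comparison becomes a mask followed
    by a reduction, which is how one would map it to a GPU -- but it is
    still limited by memory bandwidth rather than flops."""
    return int(np.count_nonzero(a == 0))

a = np.random.randint(0, 4, size=1_000_000)
assert count_zeros_loop(a) == count_zeros_vectorized(a)
```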
An example that is free of the branching problem is computing a vector that is the cumulative sum of another vector ([1, 2, 1] -> [1, 3, 4]); here the obstacle is the chain of dependencies rather than branching (see the sketch below).
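A small sketch of the cumulative-sum example; the serial loop makes the dependency explicit (each output needs the previous one), while the NumPy call shows the equivalent library routine. Parallel scan algorithms do exist, but they require extra passes and extra work compared with the plain serial loop.

```python
import numpy as np

def cumsum_serial(v):
    """Each element depends on the one before it: out[i] = out[i-1] + v[i].
    This dependency chain is exactly the critical path."""
    out = []
    running = 0
    for x in v:
        running += x
        out.append(running)
    return out

v = [1, 2, 1]
print(cumsum_serial(v))          # [1, 3, 4]
print(np.cumsum(v).tolist())     # same result via the library routine
```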
I don't know if these count as "famous", but there are certainly many problems that parallel computing will not help you with.
The (famous) fast marching method for solving the Eikonal equation cannot be sped up by parallelization. There are other methods (for example fast sweeping methods) for solving the Eikonal equation that are more amenable to parallelization, but even here the potential for (parallel) speedup is limited.
The problem with the Eikonal equation is that the flow of information depends on the solution itself. Loosely speaking, the information flows along the characteristics (i.e., light rays in optics), but the characteristics depend on the solution. The flow of information for the discretized Eikonal equation is even worse, requiring additional approximations (like those implicitly present in fast sweeping methods) if any parallel speedup is desired.
To see the difficulties for parallelization, imagine a nice labyrinth like in some of the examples on Sethian's webpage. The number of cells on the shortest path through the labyrinth (probably) is a lower bound for the minimal number of steps/iterations of any (parallel) algorithm solving the corresponding problem.
(I write "(probably) is", because lower bounds are notoriously difficult to prove, and often require some reasonable assumptions on the operations used by an algorithm.)
Another class of problems that are hard to parallelize in practice consists of problems sensitive to rounding errors, where numerical stability is achieved by serialization.
Consider, for example, the Gram–Schmidt process and its serial modification (modified Gram–Schmidt; a sketch follows below). The algorithm works with vectors, so you might use parallel vector operations, but that does not scale well. If the number of vectors is large and the vector size is small, using parallel classical Gram–Schmidt with reorthogonalization may be stable and faster than a single pass of modified Gram–Schmidt, even though it does several times more work.
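A compact NumPy sketch of the textbook modified Gram–Schmidt algorithm (the function name and shapes are my choice); the loop over previously computed columns is the serial part being discussed:

```python
import numpy as np

def modified_gram_schmidt(A):
    """Orthonormalize the columns of A (m x n, m >= n).
    Each new column is orthogonalized against all *already finished*
    columns, so the outer loop is inherently sequential."""
    A = np.array(A, dtype=float)
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):                      # depends on columns 0..j-1
            v -= np.dot(Q[:, i], v) * Q[:, i]   # project out finished column i
        Q[:, j] = v / np.linalg.norm(v)
    return Q

A = np.random.rand(6, 4)
Q = modified_gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(4)))          # True (up to rounding)
```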