And what do we have here? It seems that Nvidia and AMD are already on top of the idea of offloading AI onto GPUs.
The target here seems to be path-finding algorithms, which is unsurprising. And, of course, this would be using the relevant GPGPU tech from each company: CUDA and ATI Stream. The article touches on the competing efforts and the need for standards in the space – which I have commented on previously.
Now, call me a pessimist, but as much as I am looking forward to this, I’m not inclined to think it will revolutionize gaming AI. Your average triple-A video game is going to squeeze everything it can out of your PC hardware. This is not a new thing. It isn’t unusual for the owner of a game’s AI component to be told that it can only use a small percentage of the CPU at any time. I don’t think moving that load to the GPU changes the picture meaningfully: the AI component will still be given only a small percentage of the available PC horsepower, except that now the AI team has to bargain with the graphics team for its cut of the machine cycles. And I’m not sure that many game studios would be willing to dial down the slick anti-aliased vertex-shaded blah-blah-buzzword graphics of their game simply to give more cycles to the AI. But I could be wrong.
However, this re-opens a recent favorite topic of mine – using video cards to throw large numbers of cores at parallel AI problems. There is certainly potential here, and maybe there will be a game that really utilizes it and surprises us all.
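To make the idea concrete, here is a minimal sketch of the pattern in CUDA: one thread per agent/waypoint pair, all scored in parallel. Everything here (Agent, score_waypoints, the cost function) is hypothetical and only illustrates the shape of the approach, not anything from the article.

```
// Hypothetical sketch of the one-thread-per-work-item pattern for game AI.
// Each thread scores one (agent, waypoint) pair; all names are made up.
#include <cuda_runtime.h>
#include <math.h>

struct Agent {
    float x, y;            // current position
    float goal_x, goal_y;  // where this agent wants to end up
};

__global__ void score_waypoints(const Agent* agents, const float2* waypoints,
                                float* scores, int n_agents, int n_waypoints)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_agents * n_waypoints) return;

    Agent a  = agents[i / n_waypoints];    // which agent this thread works for
    float2 w = waypoints[i % n_waypoints]; // which candidate waypoint it scores

    // Straight-line estimate: agent-to-waypoint plus waypoint-to-goal.
    float dx1 = w.x - a.x,      dy1 = w.y - a.y;
    float dx2 = a.goal_x - w.x, dy2 = a.goal_y - w.y;
    scores[i] = sqrtf(dx1 * dx1 + dy1 * dy1) + sqrtf(dx2 * dx2 + dy2 * dy2);
}
```

The attraction is that every pair is independent, so a few thousand agents times a few dozen candidates maps neatly onto thousands of GPU threads; whether real game AI decomposes that cleanly is exactly the open question.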
All that said, I’m still looking forward to seeing the technology and to watching how all this plays out.
(Tip of the hat to Kotaku for the link.)
Greetings:
Using multi-cores (or multiple CPUs) was explored in the late 1980s by Drs. Gupta, Forgy and Newell. Dr. Anoop Gupta did this for his Ph.D.; Dr. Forgy was his mentor, and Dr. Newell was on the board at that time. Most of it is documented in a program they called “Para-OPS”, which was done in LISP.
Something old is now something new. :-)
SDG
jco
Hi James. Thanks for dropping in again.
This blog has a small number of readers, all of whom I may well know personally. Therefore, I assumed that most readers would be familiar with the prior art in the space. Dr. Gupta’s thesis was actually published as a paperback at one point – that’s the edition I have at my office.
Although, now that you mention it – a survey of the prior art might make a good posting, or a good ORF presentation. A lot of folks don’t know the history.
I’m tracking this space for a few reasons:
* Modern multi-cores are far more affordable than the specialized hardware required for much of the prior research in the space.
* As both Dr. Forgy and Gary Riley pointed out at ORF 2008, when it comes to rules, the problems being tackled seem to scale over time to the available hardware.
I think the trend of multi-core on the desktop is going to open up some additional opportunities that we haven’t had before.
Has a lot of this been studied before? Absolutely.
Will a lot of people waste time covering ground that was already trailblazed by folks such as Newell, Forgy and Gupta? Probably.
(While I’m at it – do you know if the source for ParaOPS5 is online anywhere?)
I just came across your question about Parallel OPS5 sources. The C code can be found at http://cs.ucsb.edu/~acha/software.html
It is good to see interest in this area again.
Thanks for the link, Dr. Forgy! I hadn’t found it yet.
Actually, I’m slowly putting together a set of links to known available OPS5 sources and will post it here and keep it updated. So, I encourage other readers to send me source code links as well.
(Best wishes and hope to see you again at the conference this fall.)
The execution and memory models of GPUs are vastly different from those of standard CPU threads. Running truly general-purpose threaded code on GPUs will at best yield minimal gains, if any at all. GPUs are still based on streaming code passes over data grids, with a core-spanning SIMD model in which divergence in execution paths between threads causes complete serialization. Obviously, matrix-heavy HPC applications gobble this stuff up, but that’s very much a minority case.
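To make the divergence point concrete, here is a toy CUDA kernel (purely illustrative; work_a and work_b are made-up stand-ins for real per-thread work) in which adjacent threads take different sides of a branch, forcing each warp to execute both sides one after the other:

```
#include <cuda_runtime.h>
#include <math.h>

// Made-up "expensive" device functions standing in for real per-thread work.
__device__ float work_a(float x) { return sinf(x) * sinf(x); }
__device__ float work_b(float x) { return cosf(x) * cosf(x); }

// Within each 32-thread warp, half the threads take each side of the branch,
// so the warp runs work_a and then work_b serially, idling half its lanes
// each time. Both paths end up costing their full latency.
__global__ void divergent(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (i % 2 == 0)
        data[i] = work_a(data[i]);
    else
        data[i] = work_b(data[i]);
}
```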
In our initial testing, GPGPU looks more feasible for parallel index searching than for an independent Rete node per core, provided the compiler/architecture uses ARM-style execution flags instead of plain branching. Since our forward-chaining engine is built on top of a backward-chaining core, and is thus a cached indexing system rather than a set of interconnected agents, this is definitely an area we’re looking to benefit from.
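For contrast, here is the same toy kernel written the branchless way just described: both values are computed and a select picks one, so the warp never diverges. This is only a sketch of the general predication technique, not the engine code in question.

```
#include <cuda_runtime.h>
#include <math.h>

// Branchless version of the kernel above: both candidate values are computed
// and a select picks the right one, keeping control flow uniform across the
// warp. Some arithmetic is wasted, but there is zero divergence, which is a
// win when the branch bodies are short, as in index-lookup style workloads.
__global__ void predicated(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float a = sinf(data[i]) * sinf(data[i]);
    float b = cosf(data[i]) * cosf(data[i]);
    data[i] = (i % 2 == 0) ? a : b;  // a select, not a branch
}
```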
Thanks for dropping by, David. Obviously, there’s a ton of research in the area going back over 20 years.
It’s also clear that when it comes to parallelism in general, it’s not one size fits all.
Still, the end of the CPU clock-speed wars and the rise of multi-core make for interesting times in a number of ways.