Showing posts with label resiliency. Show all posts

Friday, May 05, 2017

Wednesday, April 05, 2017

[Links of the Day] 05/04/2017 : Large network resilience, Distributed Systems, Machine Learning & Bayesian reasoning





Wednesday, November 23, 2016

[Links of the Day] 23/11/2016 : AMD Exascale vision, Hardware Resiliency myths and truths, MIT EmTech


  • Resiliency for Reliability – Myths and Truths : this slide deck provides an overview of resiliency issues and how Intel tackles them for hardware faults, from fans down to soft errors (e.g. neutron beams ... yes, this can £%£ your system). The authors present two types of approach: reactive and proactive handling of errors.
  • AMD's Exascale computing vision : It's all about 3D stacked chips with future interconnects. The interesting bits are the ROCm platform, P2P multi-GPU, and P2P with RDMA. Slowly we are removing the need for a full server to deploy GPUs, one step closer to a fully modular system with each resource pooled and optimized in its own enclosure. It's a lot easier to design power supplies, cooling systems, etc. when you do not have to deal with heterogeneous hardware with different power and cooling profiles (CPU, memory, disk, etc. in the same enclosure).
  • MIT EmTech 16 : This year MIT EmTech is all about AI & machine learning ... reaching maximum hype in the domain


Friday, February 13, 2015

Links of the day 13 - 02 - 2015

Today's links 13/02/2015: Scalable and resilient website, CAP theorem, automated log analyzer and Exascale HPC challenges
  • Scalable and Resilient website : lessons learned from all the biggest sites on the internet about how to build scalable and resilient architectures. 
  • Perspectives on the CAP theorem : paper summary that places the CAP theorem in the broader context of a family of results in distributed computing theory showing the impossibility of guaranteeing both safety and liveness in an unreliable distributed system.
  • Sequence: Automated Analyzer for Reducing 100,000's of Log Messages to 10's of Patterns
  • Algorithmic and Software Challenges at Extreme Scales : presentation by Jack Dongarra, quite redundant as the same themes are common to all HPC exascale challenges: matrix operation optimizations, power, resilience, scalability, etc.
IBM Photonic 3D chip

Wednesday, November 19, 2014

Links of the day 19 - 11 - 2014

Today's links 19/11/2014: #resiliency, #cloud , distributed system, #stream processing


Wednesday, October 15, 2014

Links of the day 15 - 10 - 2014

Today's links 15/10/2014: Resiliency , CEPH , storage, RDMA, NVM