A blog about life, Engineering, Business, Research, and everything else (especially everything else)
Showing posts with label linux. Show all posts
Tuesday, June 09, 2020
[Links of the Day] 09/06/2020 : #WASM on #K8S , Fast Anomaly Detection on Graphs, Linux one ring
- Krustlet : WebAssembly keeps getting more pervasive. We have kernel WASM, WASM for deep learning, and now Krustlet offers WASM for Kubernetes via the Kubelet.
- Fast Anomaly Detection in Graphs : Really cool real-time anomaly detection on dynamic graphs. The authors claim to be 644 times faster than SOTA with 42-48% higher accuracy. What is even more attractive is the constant memory usage, which is fantastic for production deployment. [github]
- io_uring : this will dominate the future of the Linux interface. It is currently eating up every single IO interface and probably won't stop there.
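The constant-memory claim in the anomaly detection link is worth a small illustration. One common way to count edges in a stream with fixed memory is a count-min sketch. The snippet below is a generic sketch of that data structure, not the linked project's code; the class name, table sizes and IP addresses are assumptions made for the example.

```python
import hashlib
from typing import List, Tuple


class CountMinSketch:
    """Approximate counter whose memory is fixed at depth * width integers."""

    def __init__(self, depth: int = 4, width: int = 1024) -> None:
        self.depth = depth
        self.width = width
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _columns(self, edge: Tuple[str, str]) -> List[int]:
        # One independent hash per row, obtained by salting with the row index.
        label = f"{edge[0]}->{edge[1]}".encode()
        cols = []
        for row in range(self.depth):
            digest = hashlib.blake2b(label, digest_size=8, salt=bytes([row])).digest()
            cols.append(int.from_bytes(digest, "big") % self.width)
        return cols

    def add(self, edge: Tuple[str, str], count: int = 1) -> None:
        for row, col in enumerate(self._columns(edge)):
            self.table[row][col] += count

    def estimate(self, edge: Tuple[str, str]) -> int:
        # Collisions only ever inflate a cell, so the minimum is the best guess.
        return min(self.table[row][col] for row, col in enumerate(self._columns(edge)))


# Hypothetical edge stream: one very busy edge and one rare edge.
sketch = CountMinSketch()
for _ in range(1000):
    sketch.add(("10.0.0.1", "10.0.0.2"))
sketch.add(("10.0.0.1", "10.0.0.9"))

print(sketch.estimate(("10.0.0.1", "10.0.0.2")))
```

The table never grows, however many distinct edges are seen; the price is that estimates can only overshoot the true count, never undershoot it.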
Labels: anomaly detection, graph, kernel, Kubernetes, links of the day, linux, wasm
Thursday, March 05, 2020
[Links of the Day] 05/03/2020 : Linux AI tuning, easy AutoML , lightweight container development environment
- OpenEuler : Huawei's Linux distribution. An interesting side project is A-tune, which relies on AI to identify the workload running on the OS and tries to tune the system to optimise its performance.
- AutoGluon : AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on deep learning and real-world applications spanning image, text, or tabular data. [github]
- k3c : kubernetes but lightweight and easy to use for container development
Labels: Artificial intelligence, automl, containers, docker, huawei, Kubernetes, links of the day, linux, machine learning
Tuesday, March 03, 2020
[Links of the Day] 03/03/2020 : Embedded linux build toolchain, Cyber security body of knowledge, AWS API change tracker
- BuildRoot : a tool to generate embedded Linux systems through cross-compilation.
- CyBOK v1.0 : aims to codify the foundational and generally recognised knowledge on cyber security. In the same fashion as SWEBOK, CyBOK is meant to be a guide to the body of knowledge; the knowledge that it codifies already exists in literature such as textbooks, academic research articles, technical reports, white papers, and standards. [website]
- AWS API Change : Feeling overwhelmed by the pace of change of the AWS API stack? Have no idea why your code doesn't work anymore? Want to use the latest, shiniest AWS feature? You need this: the web page tracks all the API changes in the AWS stack.
Labels: aws, cyber security, embedded, links of the day, linux
Thursday, April 18, 2019
[Links of the Day] 18/04/2019 : Decentralised networks made easy, Reload Linux system over ssh, Self Hosted push notification
- Octopus : Octopus allows you to streamline the creation of decentralized networks.
- reload.sh : Wipe, reinstall or restore your system from a running GNU/Linux distribution. Via SSH, without rebooting.
- Gotify : A self-hosted push notification service. Beware, though: this will use noticeably more power than Google Cloud Messaging. Google, Apple and others have special deals with carriers, which might make such a solution cost prohibitive.
Labels: carrier, cellular, decentralized, links of the day, linux, network, push notification, ssh, telecom
Wednesday, May 09, 2018
A look at Google gVisor OCI runtime
Google released a new OCI container runtime: gVisor. This runtime aims to solve part of the security concerns associated with the current container technology stack. One of the big arguments of the virtualisation crowd has always been that the lack of explicit partitioning and protection of resources can facilitate “leakage” from a container to the host or to adjacent containers.
This stems from the historical evolution of containers in Linux. Linux has no native concept of a container, unlike BSD with Jails or Solaris with Zones. Containers in Linux are the result of the gradual emergence of a stack of security and isolation technologies introduced into the Linux kernel. As a result, Linux ended up with a very broad technology stack whose parts can be separately turned on, turned off or tuned. However, there is no such thing as a pure sandbox solution. It is the classic jack-of-all-trades curse: a master-of-none solution.
The Docker runtime (containerd) packages all of the Linux kernel isolation and security stack (namespaces, cgroups, capabilities, seccomp, AppArmor, SELinux) into a neat solution that is easy to deploy and use.
It allows the user to restrict what the application can do, such as which files it can access and with which permissions, and to limit resource consumption such as network, disk I/O or CPU. It allows applications to happily share resources without stepping on each other's toes. It also limits the risk of any of their data being accessed while sitting on the same machine.
A correct configuration (the default one is quite reasonable) blocks anything that is not authorised and, in principle, protects against any leak from a malicious or badly coded piece of code running in the container.
However, you have to understand that Docker already has some real limitations. It has only limited support for user namespaces. A user namespace allows an application to have UID 0 (aka root) inside the container while the container, and the user running it, has a lower privilege level on the host. As a result, each container can run under a different user ID without stepping on each other's toes.
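As a minimal sketch of the UID remapping described above, the snippet below translates a UID seen inside a container into the UID the host sees. It follows the three-column layout of `/proc/<pid>/uid_map` (first ID inside the namespace, first ID outside, range length). The mapping values and the helper names are hypothetical, chosen for illustration; this is not Docker code.

```python
from typing import List, Optional, Tuple

# One entry per uid_map line: (first UID inside the namespace,
# first UID outside the namespace, number of UIDs in the range).
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse text laid out like /proc/<pid>/uid_map."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_uid(uid_map: UidMap, container_uid: int) -> Optional[int]:
    """Return the host UID backing container_uid, or None if unmapped."""
    for inside, outside, length in uid_map:
        if inside <= container_uid < inside + length:
            return outside + (container_uid - inside)
    return None


# Hypothetical mappings: each container gets its own 65536-wide host range.
container_a = parse_uid_map("0 100000 65536\n")
container_b = parse_uid_map("0 165536 65536\n")

print(to_host_uid(container_a, 0))      # root in container A
print(to_host_uid(container_b, 0))      # root in container B
print(to_host_uid(container_a, 70000))  # outside the mapped range
```

Both containers believe they run as UID 0, yet on the host they map to distinct unprivileged UIDs, which is the separation the paragraph describes.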
All of these features rely on the reliability and security (as in no bugs) of the Linux kernel. Most of Docker's advanced features rely on kernel features, and getting new features is a multi-year effort: it took a while for good resource isolation mechanisms to percolate from the first RFC to the stable branch, for example. As a result, Docker and the current container ecosystem are directly dependent on the Linux kernel's update inertia as well as its code quality. While the kernel is excellent, no system is entirely free of bugs, not to mention the eternal race to patch them when they are discovered and fixed.
Hence the idea: rather than sharing one kernel between all the containers, have one kernel per container, explicitly limiting potential leakage and interference and reducing the attack surface. gVisor adopts this approach, which is not new, as Kata Containers already implemented something similar. Kata Containers is the result of the merger of Clear Containers (Intel) and runV (Hyper). Kata Containers uses KVM to run a minimalistic kernel dedicated to the container runtime. But you still need to manage the host machine to ensure fair resource sharing and to secure it. This additional layer of indirection limits the attack surface: even if a kernel bug is discovered, you will be hard pressed to exploit it to escape to an adjacent container or to the underlying host, as the kernels are not shared.
gVisor can use KVM as its underlying platform; however, it was initially, and still is, primarily designed around ptrace. User Mode Linux already used the same technique, which is to start a userspace process for the subsystem that will run on top, similar to the hypervisor model used by virtual machines. All system calls are executed with the permissions of the userspace process on behalf of the subsystem, via an interception mechanism.

Now, how do you intercept these system calls, which would normally be executed by the kernel? UML and gVisor divert ptrace from its primary goal (debugging) and use it to trap and stop the process at every system call. Once a call is caught, the new userspace kernel executes it on behalf of the original process, within userspace. It works well but, as you guessed, there is no free lunch. This method was heavily used by the first virtualisation solutions. But processor vendors rapidly realised that offering hardware-specific acceleration would be highly beneficial (and would sell more at the same time).
KVM and other hypervisors leverage such accelerators. Now you even have AWS and Azure deploying completely dedicated coprocessors for handling virtualisation-related acceleration, allowing VMs to run at almost the same speed as a bare-metal system.
And like QEMU leveraging KVM, gVisor also offers KVM as an underlying runtime environment. However, there is significant work to be done to enable any container to run on top of it. While ptrace allows gVisor to directly leverage the existing Linux stack, with KVM you need to reimplement a good chunk of the system to make it work. Have a look at the QEMU code to understand the complexity of the task. This is the reason behind the limited set of supported applications, as not all syscalls are implemented yet.
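The interception model, and the consequence of an incomplete syscall table, can be sketched with a toy dispatcher. This is a pure-Python simulation under stated assumptions: it uses neither ptrace nor KVM, it is not gVisor's implementation, and the class name, the two handled calls and the fake PID are all hypothetical.

```python
import errno
from typing import Callable, Dict


class UserspaceKernel:
    """Toy stand-in for a userspace kernel that services intercepted syscalls."""

    def __init__(self) -> None:
        self._files: Dict[int, bytearray] = {1: bytearray()}  # fd 1 = fake stdout
        self._handlers: Dict[str, Callable[..., int]] = {
            "write": self._sys_write,
            "getpid": self._sys_getpid,
        }

    def _sys_write(self, fd: int, data: bytes) -> int:
        if fd not in self._files:
            return -errno.EBADF
        self._files[fd].extend(data)
        return len(data)

    def _sys_getpid(self) -> int:
        return 1  # the sandboxed process sees itself as PID 1

    def handle(self, name: str, *args) -> int:
        """Service one intercepted call; unknown calls fail with ENOSYS."""
        handler = self._handlers.get(name)
        if handler is None:
            return -errno.ENOSYS
        return handler(*args)

    def stdout(self) -> bytes:
        return bytes(self._files[1])


# A pretend guest issuing three calls; the last one is not implemented.
kernel = UserspaceKernel()
results = [
    kernel.handle("write", 1, b"hello\n"),
    kernel.handle("getpid"),
    kernel.handle("io_uring_setup", 8),
]
print(results)
```

An application that depends on the third call would simply not run in this sandbox, which is the practical meaning of "not all syscalls are implemented yet".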
As is, gVisor is probably not ready for production yet. However, it looks like a promising solution providing a middle ground between the Docker approach and the virtualisation one, while taking some of the excellent ideas coming from the unikernel world. I hope that this technology gets picked up and that the KVM runtime becomes the default for gVisor. It will allow the system to benefit from rock-solid hardware acceleration with all the paravirtualisation goodies such as virtio.
Labels: container, docker, google, isolation, kernel, kvm, linux, performance, security, unikernel, virtualization
Thursday, January 18, 2018
[Links of the Day] - 18/01/2018 : Stellar Cryptocurrency Consensus protocol, Optimizing linux server for high throughput and low latency, performance impact of meltdown patch on HPC Filesystem
- Stellar Consensus Protocol : from a Ripple fork to a full-blown rewrite. Stellar looks like an impressive protocol addressing many of the shortcomings and risks of Ripple. Also, the authors seem to be smart enough to avoid jumping too fast onto the smart contract aspect, as it is a really tough nut to crack. With all the mayhem surrounding cryptocurrency, the Stellar approach seems rather measured. Worth keeping an eye on.
- Optimizing web servers for high throughput and low latency : very good post on how to optimise your Linux system. A lot of it has already been described many times, but it never hurts to repeat it.
- The performance impact of Meltdown patches on HPC FS (Lustre) : no surprise here, IO-intensive applications are the ones most heavily impacted. However, I wasn't expecting a 40% performance penalty, and up to 45% for large folders.

Labels: bug, consensus, cryptocurrency, filesystem, HPC, links of the day, linux, meltdown, optimization, patch, performance
Friday, April 14, 2017
[Links of the Day] 14/04/2017 : OpenFabric Workshop , Docker's Containerd , Category Theory
- OpenFabrics Workshop 2017 : Some interesting talks this year at the OpenFabrics conference:
  - uRDMA : Userspace RDMA using DPDK. This opens up a certain number of possibilities, especially for object storage solutions. [Video , Slides, github]
  - Crail : Uses the uRDMA above to deliver an accelerated storage solution for Apache big data projects. [Slides, github]
  - Remote Persistent Memory : I think this is the next killer app for RDMA, if Intel doesn't jump onto it and deliver a DPDK-like solution. [Video, Slides]
  - On Demand paging : slowly the tech is crawling its way up to upstream acceptance. While on-demand paging introduces a certain performance cost, it also allows greater flexibility in consuming RDMA. One of the interesting aspects that nobody has mentioned yet is how this feature could be used with persistent memory. I think there is some good potential for p2p NVM storage solutions. [Video, Slides]
- Containerd : containerd moves to GitHub; Docker's "industry standard" container runtime is also reaching its v0.2.x release. [github]
- Category Theory : If you are into functional programming and Haskell, this is a must-read book for you.
Labels: category theory, container, docker, haskell, kernel, linux, nvm, nvme, openfabrick, rdma, user space
Wednesday, April 12, 2017
[Links of the Day] 12/04/2017: Linux Perf tools, libp2p and Contagion of Information in Social Media
- Perf Tools : miscellaneous collection of in-development and unsupported performance analysis tools for Linux ftrace and perf_events.
- Contagion of Information in Social Media : The authors look at how information spreads on social media (Twitter). They model contagion behaviour in the hope of creating effective defences against "fake news" and other propaganda. However, to some extent the research can also be used to optimise the spread of such malicious information.
- libp2p : really cool network stack (used by IPFS) that tackles a lot of the nitty-gritty details of p2p applications. It should allow devs to focus on the actual value of their p2p apps rather than the underlying technical problems of p2p itself. [github]
Labels: links of the day, linux, p2p, performance, social media, twitter
Monday, July 11, 2016
[Links of the day] 11/07/2016: SSD failures, BCC , NUMA deep dives
- SSD Failures in Datacenters : Best student paper : SSD fail, What? When? And Why?
- BCC : I have been trying to trace some nasty RCU stall bug (which turned out to be just the symptom of another problem) and BCC was really useful in this ordeal. It is quickly turning into the Linux Swiss Army knife of debugging. BPF is an amazing piece of software.
- NUMA Deep Dive Series : Start of a series of posts looking into history and modern NUMA architecture.
Labels: datacenter, kernel, links of the day, linux, numa, SSD
Thursday, June 16, 2016
[Links of the day] 16/06/2016 : Bayesian Betancourt's lecture, Physics breakthrough & Distributed systems, Linux Kernel Radix tree
- Betancourt Binge : Michael Betancourt’s video lectures in Tokyo, all about Bayesian modelling.
- Standing on Distributed Shoulders of Giants : as usual, an excellent ACM Queue article drawing parallels between physics breakthroughs and the world of distributed systems.
- Multi-order radix tree : The Linux kernel radix tree is a data structure at the centre of the memory management system. With the advent of new memory models (persistent ones), this data structure needs to evolve. However, as with everything touching memory, it will take some time and many, many re-submissions to the mailing list. This article presents a possible evolution path for this venerable data structure.
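For readers unfamiliar with the data structure in the last link, here is a minimal sketch of a radix tree keyed by integers, where each level of the tree consumes a fixed number of bits of the key. The 6-bit fan-out (64 slots per node) and the fixed height are assumptions made for the example; the multi-order extension discussed in the article is not modelled, and this is not the kernel's code.

```python
from typing import Any, List, Optional

MAP_SHIFT = 6                 # bits of the key consumed per level
MAP_SIZE = 1 << MAP_SHIFT     # 64 slots per node
MAP_MASK = MAP_SIZE - 1


class RadixTree:
    """Fixed-height radix tree keyed by non-negative integers."""

    def __init__(self, height: int = 3) -> None:
        self.height = height
        self.max_key = (1 << (MAP_SHIFT * height)) - 1
        self.root: List[Any] = [None] * MAP_SIZE

    def _slot(self, key: int, level: int) -> int:
        # Level 0 is the root and uses the most significant chunk of the key.
        shift = MAP_SHIFT * (self.height - 1 - level)
        return (key >> shift) & MAP_MASK

    def insert(self, key: int, value: Any) -> None:
        if not 0 <= key <= self.max_key:
            raise KeyError(key)
        node = self.root
        for level in range(self.height - 1):
            idx = self._slot(key, level)
            if node[idx] is None:
                node[idx] = [None] * MAP_SIZE   # allocate interior nodes lazily
            node = node[idx]
        node[self._slot(key, self.height - 1)] = value

    def lookup(self, key: int) -> Optional[Any]:
        if not 0 <= key <= self.max_key:
            return None
        node = self.root
        for level in range(self.height - 1):
            node = node[self._slot(key, level)]
            if node is None:
                return None
        return node[self._slot(key, self.height - 1)]


tree = RadixTree()
tree.insert(4097, "page at offset 4097")
print(tree.lookup(4097))
print(tree.lookup(4098))
```

Lookups cost one array index per level regardless of how many entries are stored, and sparse key ranges cost no memory because interior nodes are only allocated on insert.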
Labels: Distributed systems, kernel, links of the day, linux, machine learning, physics, radix tree
Tuesday, May 03, 2016
[Links of the day] 03/05/2016: Linux Storage, Filesystem, and Memory-Management Summit 2016
Linux Storage, Filesystem, and Memory-Management Summit : Loads of really good talks; here is a selection:
- VM as containers : Current efforts focus on solving two main problems: 1. the VM's total memory consumption exceeds what the application running in it needs; 2. storage access: a lot of the storage work focuses on moving the storage stack back to the host (providing DAX or FUSE). However, all these aspects require careful design in order to avoid compromising the security and isolation features of virtual machines.
- Bulk memory-allocation APIs : What do we want? Loads of memory, fast! When do we want it? N...O...W :) [slides]
- Persistent memory as remote storage : a look into leveraging RDMA for remote persistent storage access. A really good discussion around the possibility of moving from PULL to PUSH mode for remote access. However, this would require a lot of changes and additions to work with the RDMA stack, probably too many for it to be a viable option in the short term. Another aspect of the discussion was related to the durability guarantees of remote storage protocols. It is interesting to see that there is a consensus on the need for an API that hides the different durability behaviours of the fabric / protocol / HW. This is sorely missing, and it is why storage solutions often trap you on a certain path and cannot evolve to adopt new tech, fabrics, and hardware.
Labels: API, containers, links of the day, linux, memory, nvm, storage, vm
Tuesday, April 19, 2016
[Links of the day] 19/04/2016: All CPU docs, Linux scheduling waste and probabilistic programming
- Decade of Wasted Cores : the main victims of the forever war of Linux scheduler optimisation: your CPU cores. Well, not really, but as always the jack-of-all-trades default config is a master of none, which implies that as soon as you have a specific workload you need to spend time optimising for it, and sometimes a suitable configuration doesn't exist. This paper looks into the impact of the Linux scheduler's policies and design.
- Pamela : Probabilistic Advanced Modeling and Execution Learning Architecture
- Awesome CPU : All CPU and MCU documentation in one place
Labels: cpu, links of the day, linux, probabilistic, schedulers
Monday, February 01, 2016
[Links of the day] 01/02/2016: Linux internals, best paper and Spotify goes SDN
- Best Papers : Best Paper Awards in Computer Science (since 1996)
- Linux Internals : a very good ebook on the internals of the Linux kernel, from boot to memory.
- SDN Internet Router [part2] : a very impressive demonstration of the implications of SDN technology for companies. It allowed Spotify to replace routers that would have cost half a million dollars each with a couple of SDN switches.
Labels: kernel, links of the day, linux, papers, sdn
Wednesday, December 02, 2015
Links of the day 02/12/2015: fast linux perf analysis, Path to AI, Datacenter transport
- Linux Performance Analysis in 60,000 Milliseconds : a fast, minimal performance analysis approach that lets you quickly narrow down where an issue might come from.
- Path to general AI : interesting essay on AI and why the current path will not allow the emergence of a true intelligence as humans understand it.
- pHost : Distributed Near-Optimal Datacenter Transport Over Commodity Network Fabric. This is an improvement over the previous Fastpass transport, as it allows end-hosts to make scheduling decisions directly, thus avoiding the overheads of a centralized scheduler architecture.
Labels: ai, analysis, links of the day, linux, network, performance
Wednesday, November 04, 2015
Links of the day 04/11/2015: Intel ISA-L , Linux Kernel userland page fault handling, Evolution of CI at stratoscale
- ISA-L : brief introduction to the Intel Intelligent Storage Acceleration Library (ISA-L). Some nice features for erasure coding in there. [intel 01 website]
- Evolution of CI at Stratoscale : How the development team develops, tests, deploys and operates it. How do we get tens of developers to work productively at a high velocity while maintaining system cohesion and quality? How can we tame the inherent complexity of such a distributed system? How do we continuously integrate, test, deploy and operate it? How do we take devops to the limit? And just how tasty is our own dog food?
- Userfaultfd : Nice to see the code for userland page fault resolution making it into the upstream release.
Labels: CI, intel, kernel, library, links of the day, linux, storage, stratoscale
Friday, August 28, 2015
Links of the day 28/08/2015 : libfabric, IO visor and demoscene
- IO visor : it seems that the efforts from PLUMgrid around eBPF are picking up speed, and an official Linux Foundation project is being set up. However, one must wonder how such a solution will compete against the pure user space solutions relying on DPDK and the like. You can find a more in-depth slide deck of the concept [here].
- Libfabric : Intel is announcing with great pomp the "open source" library supporting its Omni-Path fabric. However, it is none other than the fantastic libfabric. This library offers a set of next-generation, community-driven, ultra-low latency networking APIs. The APIs are not tied to any particular networking hardware model: it supports InfiniBand / iWARP, usNIC from Cisco, and OPA from Intel. What is interesting is that it goes one step further than the RDMA library while maintaining a good balance between low-level tuning and high-level programmability. While the learning curve might be a little steeper compared to Accelio from Mellanox, it delivers (I think) greater advantages and flexibility.
- Winning 1kb intro : released at Assembly 2015, prepare to be amazed
Labels: demoscene, kernel, links of the day, linux, network fabric, rdma
Wednesday, May 27, 2015
Links of the day 27 - 05 - 2015
Today's links 27/05/2015: L3 #datacenter #networking, #Linux futex #bug, Distributed system simultaneity problem
- Calico : pure L3 approach to data center networking. It uses BGP route management rather than an encapsulating overlay network, and thus avoids NAT and port proxies, doesn't require a fancy virtual switch setup, and supports IPv6. Looks like a viable, and probably more scalable, alternative to the virtual switch approach.
- Futex bug : The linux futex_wait call has been broken for about a year (in upstream since 3.14, around Jan 2014), and has just recently been fixed (in upstream 3.18, around October 2014). More importantly this breakage seems to have been back ported into major distros (e.g. into RHEL 6.6 and its cousins, released in October 2014), and the fix for it has only recently been back ported (e.g. RHEL 6.6.z and cousins have the fix).
- There is No Now : explores the problem with simultaneity in distributed systems.
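The pure L3 approach in the Calico link comes down to ordinary IP routing: each host holds routes to workload prefixes and forwards by longest-prefix match, with no encapsulation involved. The snippet below is a minimal sketch of that lookup only. The route table, the addresses and the `next_hop` helper are hypothetical, and no BGP exchange is modelled.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical routes as a host might learn them over BGP:
# destination prefix -> next hop (another host in the data center).
routes: Dict[str, str] = {
    "10.65.0.0/24": "192.168.1.10",
    "10.65.1.0/24": "192.168.1.11",
    "10.65.1.7/32": "192.168.1.12",   # a workload that moved to another host
    "fd00:10:65::/64": "fd00:192:168::10",
}


def next_hop(destination: str) -> Optional[str]:
    """Pick the most specific route covering destination (longest-prefix match)."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        # Only compare addresses and prefixes of the same IP version.
        if addr.version == net.version and addr in net:
            if best is None or net.prefixlen > best[0].prefixlen:
                best = (net, hop)
    return best[1] if best else None


print(next_hop("10.65.1.8"))      # covered by the /24 only
print(next_hop("10.65.1.7"))      # the /32 wins over the /24
print(next_hop("fd00:10:65::5"))  # IPv6 handled by the same logic
```

Because a more specific prefix always wins, a workload can move between hosts by advertising a host route, and IPv6 needs no special casing.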
Labels: bug, distributed system, links of the day, linux, networking
Thursday, March 19, 2015
Links of the day 19 - 03 - 2015
Today's links 19/03/2015:#python framework, #linux perf tools, #bitcoin API, Memory-centric distributed #storage
- SuPPort : in-development distillation of PayPal Python Infrastructure. SuPPort is an event-driven server framework built on top of several open-source technologies designed for building scalable and maintainable services and clients.
- Linux Performance Analysis: New Tools and Old Secrets : Brendan Gregg's excellent talk on new Linux performance tools: the perf-tools collection. These use existing kernel frameworks, ftrace and perf_events, which are built into most Linux kernel distributions by default, including the Linux cloud instances he analyzes at Netflix.
- Coinkite : Bitcoin API providing simple and powerful REST integration for adding bitcoin functions into applications.
- Tachyon: A memory-centric storage system enabling reliable file sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. [SOCC 13] [Github]
Labels: bitcoin, links of the day, linux, memory, performance, python, storage
Monday, February 16, 2015
Links of the day 16 - 02 - 2015
Today's links 16/02/2015: Memory Analysis, CPU instruction for NVM,SR-IOV and Linux kernel live patching
- ANATOMY: an analytic model of memory system performance able to summarize key workload characteristics, namely row buffer hit rate, bank-level parallelism, and request spread which are used as inputs to the queuing model to estimate memory performance. [slides]
- CLWB and PCOMMIT : a look at the new CPU instructions specific to NVM. The real benefit will start to appear when devs start using them in applications such as in-memory DBs or persistent logging.
- SR-IOV : The Single-root I/O virtualization (SR-IOV) standard allows an I/O device to be shared by multiple Virtual Machines (VMs) without losing runtime performance. A series of videos covering topics for your virtualization environment such as VXLAN Tunnel End Point (VTEP), live VM migration, and HPC clustering.
- Live patching : kGraft and kpatch merged into a single patchset for kernel live patching.
Labels: cpu, kernel, links of the day, linux, live patching, memory, nvm, ram, sr-iov
Friday, October 31, 2014
Links of the day 31 - 10 - 2014
Today's links 31/10/2014: TCP, kernel, NVMe, network fabric
- lwIP on BareMetal : a lightweight TCP/IP stack running on an ultra-lightweight kernel ( coded in x86 assembly )
- NVMe over Fabrics : a look at the performance of NVMe over various fabrics and what it implies for the future of storage.