Multi-Segment Distributed Storage for Kubernetes

Working with a group of researchers on the Calit2 project helped us identify an interesting EdgeFS use case: stretching storage within a single Kubernetes cluster, and across continents, over a high-throughput yet high-latency network backbone.

The challenge of long-distance, high-throughput data transfer is an old problem, and scientific networking communities such as CENIC have solved it. However, sharing data among researchers, even over a dedicated DMZ, remains a complex task, with data management being a real pain point.

Latency across geographies is high even with optical backbones, so stretching a single storage namespace across them is not going to be efficient.

Datasets are distributed, and some can be very large. Copying data spreads it around, imposing security-control challenges and content-consistency uncertainties. The good news is that in the majority of cases, not all datasets need to be accessed all at once.

So we thought EdgeFS could help here.

What is EdgeFS? It is a new storage provider in the CNCF Rook project. While it is a scale-out storage cluster, it can also operate in a so-called “solo” mode: a single-node Docker container with the ability to scale out your deployment as it grows, by simply connecting more nodes and/or geographically distributed cluster segments to it.

And the nice thing about Kubernetes is that, with its built-in namespace isolation, segmentation within the same Kubernetes cluster can be easily achieved. Here is a picture of how EdgeFS segments can be geographically distributed, providing a global storage namespace in such a cluster:

[Figure: geographically distributed EdgeFS segments forming a global storage namespace within one Kubernetes cluster]

Each EdgeFS segment can run either within the same Kubernetes cluster, as a dedicated namespace, or across Kubernetes clusters.

Inter-segment Gateway links (ISGW) can be used in a variety of use cases:

  • connecting two segments bi-directionally in master-master or master-secondary mode;
  • connecting segments in bi-directional star-like topology where the central segment would re-distribute subscribed datasets;
  • connecting segments in bi-directional circular-like topology where modification at any segment would spread out to neighbor segments and eventually to all chained segments;
  • enabling remote access to a fully replicated dataset;
  • enabling remote access to a metadata-only replicated dataset, with data chunks fetched on demand (data chunks are cached, and objects can be converted to be persistent);
  • enabling a remote collecting segment, where local data is expunged after a short period of time while being kept at the destination.
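As an illustration, in Rook an ISGW link is itself a CRD. A sketch of a bi-directional link from the San Francisco segment to a remote endpoint might look like the following (the link name and endpoint address are made up, and field names should be verified against your Rook version's EdgeFS ISGW documentation):

```yaml
apiVersion: edgefs.rook.io/v1
kind: ISGW
metadata:
  name: isgw-to-newyork                # illustrative link name
  namespace: rook-edgefs-sanfrancisco  # segment that owns this link
spec:
  direction: send+receive              # bi-directional (master-master style) link
  remoteURL: ccow://10.0.0.10:14000    # remote segment endpoint (made-up address)
```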

When configured to run within the same Kubernetes cluster, each EdgeFS segment uses its own Kubernetes namespace; this is the use case this article focuses on.

Let’s set it up!

With the Rook EdgeFS operator, we can configure two or more segments within the same Kubernetes cluster installation. This can be useful when Kubernetes nodes span multiple regions and cross-region latency is high, or links can be temporarily offline.

EdgeFS ISGW Links can be set up to ensure consistent synchronization of all segments on a per-bucket basis.

Each segment has to have its own region sub-namespace, local to the region's tenants and users. For example, two syncing segments of the same global namespace can be listed in efscli as follows:

# efscli cluster list
SanFrancisco
NewYork

# efscli tenant list SanFrancisco
Biology
MedicalSupply

# efscli tenant list NewYork
Marketing
Finance

Configuring Segments

Ensure that each segment is configured in its own Kubernetes namespace. To do that, copy the cluster.yaml CRD file and modify all occurrences of:

  • Namespace name
  • PodSecurityPolicy metadata name
  • ClusterRole metadata name
  • ClusterRoleBinding system-psp and cluster-psp metadata name and roleRef
  • namespace metadata
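For example, for the San Francisco segment, the renamed stanzas of the copied cluster.yaml might look like this (a trimmed sketch; the real file also contains the PodSecurityPolicy, ClusterRole, and ClusterRoleBinding renames listed above plus the full cluster spec):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: rook-edgefs-sanfrancisco
---
apiVersion: edgefs.rook.io/v1
kind: Cluster
metadata:
  name: rook-edgefs-sanfrancisco
  namespace: rook-edgefs-sanfrancisco
spec:
  # ... node/device selectors and the rest of the cluster spec ...
```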

Create the new cluster CRD and observe that it is created in its own namespace. Pay attention to the node- and device-filtering selectors: a node's devices cannot be shared between namespaces.

The end result may look like this:

# kubectl get svc -n rook-edgefs-sanfrancisco
NAME
rook-edgefs-mgr
rook-edgefs-restapi
rook-edgefs-target
rook-edgefs-ui
# kubectl get svc -n rook-edgefs-newyork
NAME
rook-edgefs-mgr
rook-edgefs-restapi
rook-edgefs-target
rook-edgefs-ui

Here each cluster segment has its own management endpoints, yet all are controlled by the same Rook operator instance.

Configuring Services

After all EdgeFS targets of a new segment are up and running, verify that the EdgeFS UI can be accessed and that it can create service CRDs. While creating services, it automatically picks up the operating Kubernetes namespace and uses it in the CRD's metadata.

Similarly, if you prefer to operate cluster segments via the CLI, you can use the neadm management command (neadm service enable|disable NAME), which creates or deletes the CRD just as the GUI does, using the same REST API calls.

Finally, the alternative is to prepare the CRD YAML file manually, per the instructions on the Rook documentation website, and specify the target Kubernetes namespace.
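As a sketch of the manual route, an NFS service CRD pinned to the San Francisco segment might look like the following (the service name nfs01 and the instances value are illustrative; consult the Rook EdgeFS documentation for the full schema):

```yaml
apiVersion: edgefs.rook.io/v1
kind: NFS
metadata:
  name: nfs01
  namespace: rook-edgefs-sanfrancisco  # the target segment's namespace
spec:
  instances: 1                         # number of NFS server instances
```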

Configuring CSI provisioner

At the time of writing this article, CSI Topology Awareness is still in beta, and as such the EdgeFS CSI driver does not support it just yet.

However, work has been done to enable multi-segment usage in the latest version of the EdgeFS CSI provisioner. Here is how.

  • Edit the configuration secret file and add:
k8sEdgefsNamespaces: ["rook-edgefs-sanfrancisco", "rook-edgefs-newyork"]

  • For dynamically provisioned volumes, add the segment's name and the segment's EdgeFS service name to the storage class YAML file:
...
parameters:
  segment: rook-edgefs-sanfrancisco
  service: nfs01
...
  • When scheduling a new Pod with a pre-provisioned volume, make sure the volume specifies the segment's name and the EdgeFS service name:
...
volumeHandle: sanfrancisco:nfs01@cluster/tenant/bucket
...
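Putting the dynamic-provisioning pieces together, a storage class for the San Francisco segment might look like this (the storage class name, provisioner string, and NFS service name are illustrative; check your EdgeFS CSI driver release for the exact provisioner name):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: edgefs-sf-nfs           # illustrative name
provisioner: io.edgefs.csi.nfs  # provisioner string is an assumption
parameters:
  segment: rook-edgefs-sanfrancisco
  service: nfs01
```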

With the settings above, a single instance of the EdgeFS CSI provisioner can handle topology-aware PV/PVC orchestration, so a request to create an NFS or iSCSI PV can be redirected to the specified region.
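For clarity, the volumeHandle shown above follows the pattern segment:service@cluster/tenant/bucket. A small shell sketch pulling the parts apart (the handle value is the example from this article):

```shell
# Decompose an EdgeFS CSI volumeHandle of the form
#   <segment>:<service>@<cluster>/<tenant>/<bucket>
handle="sanfrancisco:nfs01@cluster/tenant/bucket"

segment="${handle%%:*}"   # text before the first ':'  -> segment name
rest="${handle#*:}"       # everything after the first ':'
service="${rest%%@*}"     # text between ':' and '@'   -> EdgeFS service
path="${rest#*@}"         # cluster/tenant/bucket path

echo "segment=$segment service=$service path=$path"
```

This is why the CSI provisioner can route the request: the segment prefix selects the region, the service selects the NFS/iSCSI endpoint, and the path addresses the bucket.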

Summary

While my general recommendation would be to use federated Kubernetes clusters, I have to admit that single-cluster flat networking has its advantages: management simplification, lightweight namespace isolation, and no need for extra federation replication complexity.

With EdgeFS in the mix, cross-region data access can be easily set up and tightly integrated with Kubernetes via the Rook operator and the CSI provisioner.

Once set up, Kubernetes PVs can float across data segments without additional data-management complexity. Just reschedule a Pod in a different segment, repoint it to the synchronized bucket, and access the dataset immediately.

Even when data changes are not yet fully synced up, EdgeFS guarantees consistency of locally synced reads. By utilizing the metadata-only syncing feature, consistent datasets can be distributed extremely fast, with data chunks then fetched on demand.

Give it a try today!

Follow us at EdgeFS and Rook communities!
