When your cluster runs out of disk space, object storage becomes the solution. S3 scales seamlessly, costs less than block storage, and works well for logs, archives, and build artifacts. However, you can’t mount S3 directly to a pod — Kubernetes expects volumes and doesn’t know how to work with S3’s HTTP API. That’s where CSI S3 comes in.
This guide explains how to use S3 object storage in Kubernetes with CSI. We’ll cover how the CSI S3 driver works, what a Kubernetes S3 StorageClass is, how to prepare your bucket and cluster, how to configure a PVC, and how to mount an S3 bucket as a volume in your application.
What is S3 and How to Use It in Kubernetes
S3 is object storage, not a traditional filesystem. Data lives in buckets as objects with keys and metadata. Access happens over HTTP, and operations are API requests. This isn’t POSIX, and it’s not a network drive.
Kubernetes can’t use object storage in this form. Pods work with filesystems mounted by kubelet. Kubernetes storage uses these resources to manage mounted filesystems:
- PersistentVolume and PersistentVolumeClaim
- StorageClass
- VolumeAttachment
- CSI Driver
To bridge these worlds, Kubernetes relies on CSI. CSI (Container Storage Interface) defines a standard interface between Kubernetes and storage drivers. Third-party CSI drivers implement this interface and turn S3 into a volume that can be mounted in a pod.
When to Use S3 Storage in Kubernetes
S3 doesn’t replace node disks. But it excels in scenarios where capacity and reliability matter more than minimal latency.
Most common uses for S3 storage in Kubernetes:
- Application and infrastructure logs. Pods write logs to a mounted directory, and through K8S CSI S3, they land in a bucket where they can stay for years.
- Archives and backups. Database backups, dumps, and configuration archives are convenient to store in object storage, where you can set retention periods and storage classes.
- Analytics and machine learning data. Datasets, calculation results, model artifacts. Jobs inside the cluster read and write objects directly, and S3 becomes a shared layer for different services.
- Static content. Images, CSS and JS, user-uploaded files. A pod can serve them directly or use S3 as a source for further distribution.
For transactional workloads, relational databases, and scenarios with thousands of small writes per second, S3 isn’t suitable. Block disks and traditional Kubernetes storage are still needed there.
What is CSI and How the CSI S3 Driver Works
CSI is a specification that describes which operations a driver must support so that Kubernetes can:
- Create and delete volumes
- Mount and unmount volumes on nodes
- Query storage parameters
The drivers themselves are implemented by various vendors. For S3, popular options include csi-s3 (also published as s3-csi) and its forks for specific clouds. In documentation, you’ll often see kubernetes s3 csi driver or simply CSI S3.
A typical k8s CSI S3 driver works like this:
- CSI Node runs on each node. It handles actual volume mounting and works with FUSE or another filesystem layer on top of S3.
- CSI Controller runs in the cluster. It creates and deletes volumes, processes storage allocation requests, and monitors volume lifecycle.
- CSI Identity contains information about the CSI driver.
- CSI Volume is the volume that can be mounted to pods.
Kubernetes communicates with the driver through the standard CSI Kubernetes API. For Kubernetes, it’s just another storage type.
Inside, the S3-compatible driver uses a FUSE mounter, such as GeeseFS. The container sees a regular directory, and all read and write operations turn into requests to object storage.
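To make the FUSE layer concrete, this is roughly the kind of mount the node plugin performs under the hood. It’s an illustrative sketch only: the exact GeeseFS flags and credential handling depend on the mounter version and the driver, and the endpoint and bucket names are the placeholders used later in this guide.

# Illustration only: a manual GeeseFS mount, approximating what the CSI node plugin does
export AWS_ACCESS_KEY_ID=<ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<SECRET_KEY>
geesefs --endpoint https://s3.example.com my-app-bucket /mnt/s3
# From here on, reads and writes under /mnt/s3 become requests to object storage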
Limitations and Considerations for S3 via CSI
This approach has important characteristics to remember before production deployment:
- Not a full POSIX filesystem. Permission changes, ownership modifications, partial rewrites, hard links, and other complex file operations don’t always work correctly.
- Latency depends on the network and the S3 service. Each file access is essentially an object request, and much depends on the storage class. For logs and static content, cold storage is fine; for real-time chat or OLTP it isn’t, although for moderately demanding workloads, placing the bucket in a nearby region and choosing a suitable storage class can help.
- Consistency is usually eventual. If one pod writes a file, another pod might see the old directory state with a slight delay.
That’s why Kubernetes S3 CSI is ideal for logs, archives, static content, and artifacts of any size, but not for databases and critical transactions.
Preparing Storage: Bucket, Keys, Permissions
To connect S3 to your cluster, first configure the storage itself:
- Create a bucket in the desired region. Plan your key and prefix naming scheme, such as logs/, backup/, ml/.
- Configure a user and access keys. Generate an access key and secret key that will later go into a Secret for CSI S3.
- Restrict permissions. Use bucket policies to limit actions in this bucket, for example, so CSI S3 can’t accidentally delete someone else’s data.
- Enable encryption and retention policy. Configure object encryption, lifecycle rules for hot and old data, and versioning if needed.
These steps are convenient to describe in Terraform. One module creates the bucket, another creates the user and key, a third handles access policies. Then Kubernetes object storage will follow the same IaC principles as the rest of your infrastructure.
Preparing Your Kubernetes Cluster
Next, prepare the cluster so Kubernetes S3 CSI works without surprises:
- Verify CSI support. You need a Kubernetes version with support for the CSI v1 API, which in practice means Kubernetes 1.13 or newer (CSI support first appeared as beta in 1.10).
- Configure network access from nodes to the S3 service. Nodes must be able to reach object storage over DNS and port 443, and security rules shouldn’t block this traffic.
- Prepare permissions in the cluster itself. CSI S3 will create and update PersistentVolume (PV), work with PersistentVolumeClaim (PVC) and StorageClass, so appropriate roles and bindings are needed.
In managed services, some settings are already there, but security policies may impose restrictions. If the driver requires privileged access or hostPath access, pay attention to these areas.
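A quick way to check network access from inside the cluster is to run a throwaway pod and hit the S3 endpoint over HTTPS. The endpoint below is the placeholder used throughout this guide; any HTTP response (even 403) means the endpoint is reachable.

kubectl run s3-net-check --rm -it --restart=Never \
  --image=curlimages/curl -- curl -sI https://s3.example.com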
Installing and Configuring CSI for S3
Choosing and Preparing the CSI S3 Driver
For S3, you typically use one of the csi-s3 forks or a ready-made module from your cloud provider. When choosing a Kubernetes S3 CSI driver, check several things:
- Compatibility with your cluster version. An old driver may not support new Kubernetes versions.
- Instructions for StorageClass and PVC. Good documentation saves hours of debugging.
- Supported mounter. GeeseFS or another FUSE layer directly affects performance. For example, s3fs provides a large set of POSIX functions, but goofys is optimized for performance with some POSIX feature trade-offs.
Installation via Helm or Manifest
The easiest way is to install the driver via Helm; the chart is often called csi-s3 or s3-csi. An example installation might look like this:
helm repo add s3-csi-driver https://example.com/helm/s3-csi
helm repo update
helm install csi-s3 s3-csi-driver/csi-s3 \
  --namespace kube-system \
  --create-namespace \
  --set secret.accessKey=<ACCESS_KEY> \
  --set secret.secretKey=<SECRET_KEY> \
  --set secret.region=<REGION> \
  --set secret.endpoint=https://s3.example.com
This way, Helm handles deploying all necessary components. If Helm isn’t an option, installation via regular Kubernetes manifest files with Deployment, DaemonSet, and CRDs will work.
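After the release is installed, it’s worth checking that the driver components are running and that the driver has registered with the cluster. Pod names and labels depend on the specific chart, so treat these commands as a pattern rather than exact output:

# Controller and node (DaemonSet) pods should be Running
kubectl get pods -n kube-system | grep csi-s3
# The driver should appear here under the name used as "provisioner" in the StorageClass
kubectl get csidrivers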
Secrets and Credentials
Access to S3 on the cluster side is configured through Secret and ServiceAccount:
apiVersion: v1
kind: Secret
metadata:
  name: csi-s3-secret
  namespace: kube-system
type: Opaque
data:
  accessKey: <base64-access-key>
  secretKey: <base64-secret-key>
  endpoint: <base64-endpoint-url>
  region: <base64-region>
This Secret will be used by the CSI S3 driver. It’s important to restrict access to it with RBAC and not to store credentials in ConfigMaps.
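If you prefer not to base64-encode values by hand, the same Secret can be created imperatively and kubectl will do the encoding. The key names here mirror the manifest above; your driver build may expect different ones:

kubectl create secret generic csi-s3-secret \
  --namespace kube-system \
  --from-literal=accessKey=<ACCESS_KEY> \
  --from-literal=secretKey=<SECRET_KEY> \
  --from-literal=endpoint=https://s3.example.com \
  --from-literal=region=<REGION>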
Defining the S3 StorageClass
Now we need to describe what an S3 volume looks like from the cluster’s perspective. For this, we create a Kubernetes StorageClass for S3, often referred to simply as an S3 storage class.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-s3
provisioner: csi.s3.example.com
parameters:
  bucket: my-app-bucket
  prefix: kubernetes/
  mounter: geesefs
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
Here we’ve described a Kubernetes S3 storage class. It knows which bucket to work with, what prefix to use for reading and writing objects, and what to do with the volume after the PVC is deleted (reclaimPolicy: Retain keeps the data in the bucket).
Creating PVC and Verifying S3 Bucket Mounting
Next, create a PVC and verify that the bucket actually mounts as a volume:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-s3-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: csi-s3
  resources:
    requests:
      storage: 50Gi
This PVC requests a volume in the csi-s3 class. For Kubernetes, this is a regular storage request. For the driver, it’s a command to prepare an area in S3.
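After applying the manifest, check the claim’s status. Note that with volumeBindingMode: WaitForFirstConsumer in the StorageClass above, the PVC stays Pending until the first pod that uses it is scheduled; that’s expected, not an error. Assuming the manifest is saved as csi-s3-pvc.yaml:

kubectl apply -f csi-s3-pvc.yaml
kubectl get pvc csi-s3-pvc
kubectl describe pvc csi-s3-pvc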
To verify everything works, spin up a test pod and mount the PVC there:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s3-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: s3-app
  template:
    metadata:
      labels:
        app: s3-app
    spec:
      containers:
      - name: s3-app-container
        image: nginx
        volumeMounts:
        - name: s3-storage
          mountPath: /usr/share/nginx/html
      volumes:
      - name: s3-storage
        persistentVolumeClaim:
          claimName: csi-s3-pvc
In this form, the s3-app deployment sees /usr/share/nginx/html as an ordinary directory. In reality, behind it is an S3 bucket connected through CSI S3.
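To confirm data actually lands in object storage, write a file through the mounted path and then list the bucket with any S3 client. The aws CLI invocation below assumes it is configured with the same keys; depending on the driver, objects may sit under an additional per-volume prefix inside kubernetes/.

# Write a file through the FUSE mount inside the pod
kubectl exec deploy/s3-app -- sh -c 'echo "hello from kubernetes" > /usr/share/nginx/html/test.txt'
# List objects via the S3 API (endpoint and bucket from the StorageClass above)
aws s3 ls --recursive s3://my-app-bucket/kubernetes/ --endpoint-url https://s3.example.com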
Using S3 in Your Kubernetes Application
From the developer’s perspective, everything looks simple. A volume appears in the Deployment manifest, and a directory appears in the container to work with.
Several things to keep in mind:
- Access modes. Typically ReadWriteMany is used so multiple pods can read and write to the same volume.
- Write patterns. Better to write data in large blocks rather than thousands of small files. For logs, appending to a large file stored as a single object is convenient, but keep in mind that frequent small writes to large objects are expensive over object storage.
- Shared access considerations. With concurrent writes from multiple pods, race conditions are possible. If the application isn’t ready for this, consider how to serialize access or separate pods by prefixes.
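One simple way to separate pods by prefix is the subPath field on the volume mount: each workload gets its own subdirectory, and therefore its own key prefix in the bucket. This is a fragment of the Deployment shown earlier with a hypothetical service-a prefix; check that your driver handles subPath correctly before relying on it.

volumeMounts:
- name: s3-storage
  mountPath: /usr/share/nginx/html
  subPath: service-a   # this pod only sees and writes keys under .../service-a/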
With these considerations in mind, Kubernetes S3 CSI can be used without rewriting application code. It simply writes to the filesystem, and the CSI S3 driver handles the rest.
Automation and Operations
To prevent the solution from turning into manual assembly, everything should be described as code:
- Manage storage and cluster through Terraform. Bucket, keys, policies, cluster, and Helm release with CSI S3 can all be described in one set of modules.
- Monitor the driver and object storage. Watch CSI pod status, mounting errors, timeouts, and quota limits.
- Plan scaling in advance. If load grows, it’s important to understand how quickly S3 and the FUSE layer will handle the additional request flow.
- Consider admission webhooks for volume injection. If you need to add S3 volumes or PVC references to manifests on the fly as they are applied, Kubernetes mutating admission webhooks can do this.
With this approach, Kubernetes storage based on S3 becomes a managed part of infrastructure, not a set of manual configurations.
Working with Data: Reading, Writing, Compatibility, Latency
When reading through Kubernetes S3 CSI Driver, each operation in the mounted directory turns into a request to Object Storage. Sequential reading of large files and caching frequently requested objects give predictable response times, while random access to thousands of small files increases latency and network load.
For critical data reads, it’s better to keep a hot tier on block storage, and use Kubernetes Object Storage through CSI as a slower but more capacious level.
Writing also requires careful design. It’s better to focus on append-only models and batch writes rather than partial file rewrites: this makes it easier for the driver to map file operations to objects in the bucket. Also note that the mounter typically uploads large files via multipart upload, and modifying part of a large object can force a re-upload of much of it, so in-place writes to large objects should be kept to a minimum.
POSIX compatibility is limited: locks, fsync, and complex shared access scenarios may not work as the application expects. This should be explicitly considered in Deployment and PVC descriptions where a volume from S3 StorageClass Kubernetes is used by multiple pods.
Data operation latency through CSI S3 depends on distance to the S3 region, cache settings, storage class, and quotas. For operational practice, it’s important to measure not just average read and write times, but p95–p99 latencies, retry counts, and errors.
If applications are sensitive to these parameters, they’re better separated into a dedicated PVC and StorageClass, and K8s CSI S3 primarily used for less latency-sensitive tasks.
Usage Scenarios: Logs, Archives, Machine Learning Data
For logs and technical records, S3 is best suited as a cheap and virtually unlimited storage layer. Applications sequentially append entries to files, and Kubernetes S3 CSI Driver transparently places them in a bucket. In S3 StorageClass Kubernetes, you can describe a separate class for logs with an aggressive lifecycle policy and separate prefixes for services, and bake uniform PVCs for logging into Helm and Deployment.
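As an illustration, a dedicated class for logs might look like the one below. It follows the same shape as the StorageClass defined earlier; the names and the choice of reclaimPolicy: Delete are hypothetical, and the lifecycle rules themselves live on the bucket side, not in the StorageClass.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-s3-logs
provisioner: csi.s3.example.com
parameters:
  bucket: my-app-bucket
  prefix: logs/
  mounter: geesefs
reclaimPolicy: Delete          # log volumes can be cleaned up when the PVC is removed
volumeBindingMode: WaitForFirstConsumer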
Archives and backups are the second natural scenario for Kubernetes Object Storage. This is where rare but valuable data goes: SQL dumps, configuration archives, export sets. For such PVCs, you can configure a cheaper storage class and longer object lifetimes, and run the archiving tasks themselves as Jobs that use the same K8s CSI S3 but write to their own bucket prefixes.
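A minimal sketch of such a Job, assuming a separate PVC named csi-s3-backups-pvc bound to a backup-oriented StorageClass and a connection string supplied via an existing Secret (both names are hypothetical):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-pg-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: postgres:16
            # Dump the database straight into the mounted S3-backed directory
            command: ["sh", "-c", "pg_dump \"$DATABASE_URL\" > /backup/dump-$(date +%F).sql"]
            env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: backup-db-credentials   # hypothetical Secret with the connection string
                  key: url
            volumeMounts:
            - name: backups
              mountPath: /backup
          volumes:
          - name: backups
            persistentVolumeClaim:
              claimName: csi-s3-backups-pvc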
Machine learning data is convenient to keep in separate buckets and PVCs to separate training datasets, raw exports, and model artifacts. Through Terraform, you can describe a module that creates a bucket, access policy, and PVC for a specific ML project, and mount this volume in Helm Chart for Jobs and Deployments for training and inference. This way, Kubernetes S3 CSI Driver becomes the standard way to distribute datasets inside the cluster without hard binding to local disks.
Troubleshooting and Common Errors
Most failures when working with CSI S3 fall into several scenarios:
- Invalid keys or access permissions. The driver can’t authenticate to S3, PVC gets stuck in Pending status, and logs show AccessDenied.
- Unavailable endpoint. Connection errors and timeouts during mounting. Check DNS, routing rules, and service address.
- RBAC errors. The CSI driver can’t create PVs or read the StorageClass, and it crashes with API access errors.
- Version incompatibility. Old csi-s3 module doesn’t understand new Kubernetes versions and crashes at startup.
- S3 access problems. The account being used might be read-only, or the bucket you’re writing to might have special rules for prefixes.
Diagnostics are standard. Look at kubectl get pods, kubectl describe, and kubectl logs for CSI pods and for applications using PVC. Often one look at the logs is enough to understand exactly what the problem is.
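In practice, the first pass usually looks something like this; the driver’s pod and label names depend on how it was installed, so adjust the grep and the pod name accordingly:

# Why is the PVC stuck? Events usually name the exact provisioning error
kubectl describe pvc csi-s3-pvc
# Are the driver pods healthy?
kubectl get pods -n kube-system | grep csi-s3
# Driver logs: replace the pod name with one from the previous command
kubectl logs -n kube-system <csi-s3-pod-name>
# Recent cluster events in chronological order
kubectl get events -A --sort-by=.metadata.creationTimestamp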
Conclusion
S3 in Kubernetes via CSI is a convenient way to connect object storage as a volume. Kubernetes storage keeps working with volumes and PVCs, and the CSI S3 driver handles the S3 integration.
The proper setup looks like this: we configure the bucket and access permissions, prepare the cluster, install s3-csi or another module via Helm, create S3 StorageClass Kubernetes and PVC, then mount the volume in the application.
For developers, this looks like regular directory mounting, and for administrators, like another managed storage type — and the software doesn’t even realize it’s working with S3.
The key is to remember your workload profile and your S3 provider’s limitations, not try to move everything to it, and keep the configuration under control through Terraform and monitoring. Then Kubernetes S3 CSI will be not a source of failures, but a reliable and understandable tool.