How to Streamline Your Data Archival Process using the Cloud

Data archiving is the process of moving data that is no longer essential to a separate data store for long-term retention. Archived data consists of older data that may still matter to the organization, for example for future reference or for regulatory and compliance work.

Why do organizations need the cloud for a streamlined data archiving process? Storing archived data in the cloud has been found to be cheaper than doing so on-premises, where maintenance is costly and troublesome. Cloud archival eliminates the need to buy and upgrade physical disk or tape hardware, as well as the need to purchase installed software to manage and store non-primary data. Furthermore, it is easy to create policies in the cloud that streamline the data archival process.

All organizations have three types of data: transactional data (which describes business operations), performance data, and master data. With cloud-based archival, the time taken to search for important but old data is reduced. Search data can also be saved easily to help predict which exact result a user wants. And with data archived properly, a case-by-case review can be done quickly for legal and compliance procedures.

With benefits like long-term storage and the ability to apply analytics to stored data, a cloud archival solution is very helpful for organizational productivity. Some archival strategies are discussed below with regard to Amazon S3, focusing on lowering storage costs and streamlining the workflow, i.e. how to retrieve data as quickly as possible without incurring gigantic bills. The exact steps in the AWS graphical user interface will be shown afterwards.

There are three major costs associated with storing data in S3 (as of 2019, for the Amazon Web Services Asia Pacific (Singapore) region): storage, API requests, and data transfer. S3 Standard-Infrequent Access storage ($0.02 per GB) is cheaper than S3 Standard storage ($0.025 per GB). For API requests operating on the files, 10,000 read requests cost around $0.005, while 10,000 write requests cost $0.05. Transferring data from the Internet into S3 buckets is free, whereas transfer in the other direction is charged once the data exceeds 1 GB. Check the Amazon S3 pricing page for more details.

Strategies for Amazon S3

Ensure EC2 and S3 are in the same AWS region. Data transfer is free between an EC2 instance and an S3 bucket in the same region, so processing the data in the same region eliminates the S3-to-EC2 inter-region data transfer cost. If the S3 bucket is in a different region from the EC2 instance, and each file in the bucket is downloaded an average of 3 times per month, the inter-region transfer charge of $0.02 per GB is paid three times over: 3 x $0.02 = $0.06 per GB per month.
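The arithmetic above can be sketched as a small Python helper. This is only an illustration: the function name and parameters are hypothetical, and the $0.02 per GB rate is the article's figure, not a current AWS price.

```python
def monthly_transfer_cost(size_gb, downloads_per_month, same_region,
                          rate_per_gb=0.02):
    """Estimate the monthly S3-to-EC2 transfer charge in USD.

    rate_per_gb is the inter-region rate quoted in the article (an
    assumption here; check the S3 pricing page for real figures).
    """
    if same_region:
        # Transfer between EC2 and S3 in the same region is free.
        return 0.0
    return size_gb * downloads_per_month * rate_per_gb


if __name__ == "__main__":
    cross = monthly_transfer_cost(1, 3, same_region=False)
    local = monthly_transfer_cost(1, 3, same_region=True)
    print(f"cross-region: ${cross:.2f} per GB, same region: ${local:.2f}")
```

Keeping the bucket and the instance in one region makes the second figure zero regardless of how often the data is read.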

Avoid starting with Amazon Glacier right away. Glacier is typically used by experienced software developers who understand their application’s storage requirements well, including how those requirements change over the development cycle. Developers who do not plan to access certain objects anymore can begin with the Infrequent Access storage class, which is more suitable for their needs.

Standard: S3 Standard offers highly durable, available, and performant object storage for frequently accessed data. Because it delivers low latency and high throughput, S3 Standard is appropriate for a wide variety of use cases, including cloud applications, dynamic websites, content distribution, mobile and gaming applications, and big data analytics.

Infrequent Access: S3 Standard-IA is for data that is accessed less frequently but requires rapid access when needed. S3 Standard-IA offers the high durability, high throughput, and low latency of S3 Standard, with a low per GB storage price and a per GB retrieval fee. This combination of low cost and high performance makes S3 Standard-IA ideal for long-term storage, backups, and as a data store for disaster recovery files.

Glacier: S3 Glacier is a secure, durable, and low-cost storage class for data archiving. Glacier can reliably store any amount of data at costs that are competitive with or cheaper than on-premises solutions. To keep costs low yet suitable for varying needs, S3 Glacier provides three retrieval options that range from a few minutes to hours.

When using a versioned S3 bucket, the “lifecycle rules” feature lets us delete old versions that are no longer needed. By default, S3 keeps all data forever and bills for it for as long as it stays in storage. In most cases, developers want to keep older versions only for a certain time, and setting up a lifecycle rule is the appropriate way to do that. When uploading a vast number of large objects to S3, any interruption to the upload might leave some parts of the objects invisible to the user, yet the user still has to pay for them. If an upload is still incomplete after 7 days, one should either restart it or cancel it completely.
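As a rough sketch of what such rules can look like, the Python below builds a lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The rule IDs, the 30-day retention for old versions, and the helper names are assumptions made for illustration; only the 7-day limit for incomplete uploads comes from the text above.

```python
def build_lifecycle_configuration(noncurrent_days=30, abort_upload_days=7):
    """Return a lifecycle configuration dict for a versioned bucket.

    noncurrent_days is an illustrative default, not a recommendation.
    """
    return {
        "Rules": [
            {
                # Delete object versions some days after they stop being current.
                "ID": "expire-old-versions",
                "Filter": {"Prefix": ""},  # empty prefix: apply to the whole bucket
                "Status": "Enabled",
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
            },
            {
                # Clean up parts left behind by interrupted multipart uploads.
                "ID": "abort-incomplete-uploads",
                "Filter": {"Prefix": ""},
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {
                    "DaysAfterInitiation": abort_upload_days
                },
            },
        ]
    }


def apply_lifecycle(bucket_name, configuration):
    """Apply the configuration; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=configuration
    )
```

Calling `apply_lifecycle("my-archive-bucket", build_lifecycle_configuration())` would install both rules, where the bucket name is a placeholder.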

The Infrequent Access (IA) storage class uses the same API as regular S3 storage and performs just as well. IA is cheaper than standard S3 storage ($0.007 per GB per month versus $0.03 per GB per month); however, retrieval costs $0.01 per GB on IA, whereas it is free on standard S3.

If only about 20% of a set of S3 objects is downloaded in a given month, it makes more sense to keep those objects in the Infrequent Access class. It is recommended to access objects stored in IA only when necessary, for example when an EC2 instance goes down or when data migration is needed. The monthly cost saving for 1 GB of data stored in IA equals the S3 Standard storage cost minus the IA storage cost minus the IA access cost. IA bills each object as at least 128 KB and for a minimum of 30 days of storage. Moving an object to or from the S3 Standard class takes one API call, which is billed.
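The saving formula can be written out as a short Python sketch. The default prices are the article's own figures for Standard storage, IA storage, and IA retrieval; the function names are hypothetical, and "20% downloaded" is read here as retrieving 20% of the stored volume each month, which is an interpretation rather than something the text states exactly.

```python
def ia_monthly_saving_per_gb(fraction_retrieved,
                             standard_price=0.03,
                             ia_price=0.007,
                             ia_retrieval_price=0.01):
    """Saving in USD per GB per month from using IA instead of Standard.

    fraction_retrieved is the share of the stored data read back per
    month (0.2 means 20%). A negative result means IA costs more.
    """
    ia_access_cost = ia_retrieval_price * fraction_retrieved
    return standard_price - ia_price - ia_access_cost


def break_even_fraction(standard_price=0.03, ia_price=0.007,
                        ia_retrieval_price=0.01):
    """Retrieved fraction at which IA and Standard cost the same."""
    return (standard_price - ia_price) / ia_retrieval_price


if __name__ == "__main__":
    print(f"saving at 20% retrieval: ${ia_monthly_saving_per_gb(0.2):.3f} per GB")
    print(f"break-even retrieval fraction: {break_even_fraction():.1f}")
```

The sketch ignores the per-request charge for moving objects between classes and the 128 KB and 30-day minimums, so it overstates the saving for small or short-lived objects.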

IA has several advantages over Glacier. It has a more user-friendly interface. Furthermore, recovering data stored in Glacier vaults can take very long, possibly a month, and any increase in the speed of data transfer costs more. Recovering 1 TB in an hour requires a peak transfer rate of 998 GB per hour, which costs $7186. Recovering the same data over 2 hours costs $3592.

API calls are charged per request regardless of the object’s size. Uploading 1 byte costs the same as uploading 1 GB, so developers should avoid uploading a large object in numerous small parts. If 10 GB is uploaded to S3 as a single file, the API cost is negligible. Uploading the same 10 GB in 5 MB parts costs around $0.01, and uploading it in 10 KB parts raises the cost to about $5.00, because the charge grows in proportion to the number of requests.
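These figures can be reproduced with a small Python sketch. It assumes the write-request price of $0.05 per 10,000 requests quoted earlier in the article, decimal units (1 GB = 10^9 bytes), and one billed request per part; the function names are hypothetical.

```python
WRITE_PRICE_PER_10K = 0.05  # USD per 10,000 write requests (article figure)


def count_parts(object_bytes, part_bytes):
    """Number of parts needed to upload the object (ceiling division)."""
    return -(-object_bytes // part_bytes)


def upload_request_cost(object_bytes, part_bytes,
                        price_per_10k=WRITE_PRICE_PER_10K):
    """Request charge in USD, counting one write request per part."""
    return count_parts(object_bytes, part_bytes) * price_per_10k / 10_000


if __name__ == "__main__":
    ten_gb = 10 * 10**9
    for label, part in [("5 MB", 5 * 10**6), ("10 KB", 10 * 10**3)]:
        parts = count_parts(ten_gb, part)
        cost = upload_request_cost(ten_gb, part)
        print(f"{label} parts: {parts} requests, about ${cost:.2f}")
```

Because the cost is the request count times a fixed price, shrinking the part size by a factor of 500 multiplies the bill by 500.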

Databases like DynamoDB or MySQL are more suitable if developers have numerous tiny files. A database can group many small objects together so that they reach S3 as one larger object instead of many tiny ones. S3 key names are not a database, so it is recommended not to over-rely on S3 LIST calls to find objects. Designing, populating, and then uploading a database to S3 is the better way to use S3 for archiving such data.
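One way to picture this is the Python sketch below, which packs many small records into a single SQLite file that could then be uploaded to S3 as one object. SQLite is swapped in here purely because it ships with Python; the article itself names DynamoDB and MySQL, and the table layout and function names are assumptions.

```python
import sqlite3


def pack_records(db_path, records):
    """Store many small (name, payload) records in one SQLite file."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS archive ("
                "name TEXT PRIMARY KEY, payload BLOB NOT NULL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO archive (name, payload) VALUES (?, ?)",
                records,
            )
    finally:
        conn.close()


def read_record(db_path, name):
    """Look up one record by name; returns None if it is absent."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT payload FROM archive WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```

After something like `pack_records("archive.db", [("a.txt", b"hello")])`, the single `archive.db` file is what gets uploaded, so the request count no longer depends on how many small records it holds, and lookups go through the index rather than S3 LIST calls.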
