Can AWS S3 be used as a SQL/NoSQL table? Yes, by using tags
AWS S3: a short intro
AWS S3 is the ‘Simple Storage Service’, where data is stored as files, called objects. The two familiar terms are 1) buckets and 2) objects. Please refer to the links below for additional information: a) https://medium.com/faun/what-is-amazon-s3-91b0480dedcc b) https://medium.com/@me.sanjeev3d/amazon-s3-4b2ae15f6c4d c) https://medium.com/@yjhyjhyjh0/aws-s3-overview-38bca96047b0
Data format of S3 objects
S3 can store any kind of file. Typical S3 objects are JSON, JSON Lines (jl), CSV, text, ZIP, JPEG, PNG, XLSX files, etc. Please refer to the Data Format section in https://streamsets.com/documentation/datacollector/latest/help/datacollector/UserGuide/Destinations/AmazonS3.html for more info.
Concept of using S3 objects as a table
Whatever the kind of file or object, the key idea is to use the tag keys of the S3 object as table columns and the tag values as the column values.
Each tag on an S3 object is a key-value pair: the key can be used as the table column, and the value acts as the respective column value. Both the keys and the values can be changed dynamically. While parsing/reading the S3 objects inside a bucket, the program only needs to read attributes like the tags attached to each object and does not need to open or download the file itself from the AWS cloud.
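To make the idea concrete, here is a minimal sketch of reading the tags of every object in a bucket as table rows. It assumes the boto3 library and configured AWS credentials. The function names and the extra "_key" column are hypothetical and only there for illustration.

```python
from typing import Dict, Iterator, List


def tags_to_row(tag_set: List[Dict[str, str]]) -> Dict[str, str]:
    """Turn an S3 TagSet ([{'Key': ..., 'Value': ...}, ...]) into a column -> value dict."""
    return {tag["Key"]: tag["Value"] for tag in tag_set}


def iter_rows(bucket: str) -> Iterator[Dict[str, str]]:
    """Yield one row per object in the bucket, built from its tags only."""
    import boto3  # imported lazily so tags_to_row works without boto3 installed

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            # One GetObjectTagging request per object; the object body is never downloaded.
            response = s3.get_object_tagging(Bucket=bucket, Key=obj["Key"])
            row = tags_to_row(response["TagSet"])
            row["_key"] = obj["Key"]  # hypothetical extra column holding the object key
            yield row
```

A query like "WHERE status = 'active'" then becomes a plain filter, for example [r for r in iter_rows("my-table-bucket") if r.get("status") == "active"], where the bucket name and the "status" column are made up. Note that listing a bucket does not return the tags, so this sketch makes one extra request per object.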
Code to insert / add tags
Please refer to the code below to add / insert tags and create an object in S3.
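As a rough sketch (not necessarily the exact code this post originally used), creating an object together with its tags, and later rewriting the tags, could look like this with boto3. The helper names, and any bucket, key, and column names you pass in, are hypothetical. AWS credentials are assumed to be configured.

```python
from typing import Dict, List
from urllib.parse import quote, urlencode


def encode_tagging(row: Dict[str, str]) -> str:
    """Encode a row as the URL query string that put_object expects in Tagging."""
    return urlencode(row, quote_via=quote)


def to_tag_set(row: Dict[str, str]) -> List[Dict[str, str]]:
    """Encode a row as the TagSet list that put_object_tagging expects."""
    return [{"Key": key, "Value": value} for key, value in row.items()]


def insert_row(bucket: str, key: str, body: bytes, row: Dict[str, str]) -> None:
    """Create the object and attach the row as tags in a single request."""
    import boto3  # imported lazily so the encoders above work without boto3

    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=body, Tagging=encode_tagging(row))


def update_row(bucket: str, key: str, row: Dict[str, str]) -> None:
    """Replace ALL tags on an existing object with the given row."""
    import boto3

    s3 = boto3.client("s3")
    s3.put_object_tagging(Bucket=bucket, Key=key, Tagging={"TagSet": to_tag_set(row)})
```

For example, insert_row("my-table-bucket", "users/alice.json", b"{}", {"name": "alice", "city": "New York"}) would store the file with two "columns". Keep in mind that put_object_tagging replaces the whole tag set, so to change one column you pass the full row again.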
Known caveats
- Cannot have more than 10 tags per S3 object, so a "table" is limited to 10 columns
- Cannot have more than 50 tags per S3 bucket
- Cannot create an S3 bucket with underscores “_” in its name
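These limits can be checked on the client side before calling AWS. The following is a small sketch under the limits listed above. The function and constant names are hypothetical, and it does not cover every S3 naming or tagging rule.

```python
from typing import Dict

MAX_TAGS_PER_OBJECT = 10  # limit on tags (columns) per S3 object
MAX_TAGS_PER_BUCKET = 50  # limit on tags attached to the bucket itself


def check_row(row: Dict[str, str]) -> None:
    """Raise ValueError if the row has more columns than an object can hold as tags."""
    if len(row) > MAX_TAGS_PER_OBJECT:
        raise ValueError(
            f"{len(row)} columns given, but an S3 object allows at most "
            f"{MAX_TAGS_PER_OBJECT} tags"
        )


def check_bucket_name(name: str) -> None:
    """Raise ValueError if the bucket name contains an underscore."""
    if "_" in name:
        raise ValueError(f"bucket name {name!r} must not contain underscores")
```

Calling check_row(row) before uploading gives a clear local error instead of a failed request.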