It is quite easy to develop new extensions to CloudFusion, as this has been one of its design goals from the start. CloudFusion provides a generic test framework that verifies the implementations in a continuous testing cycle, which prevents untested or error-prone code from being delivered to the end user. CloudFusion's API can also be used directly, without the file system interface, although accessing it through the file system has the advantage that it works with arbitrary programming languages.
Here are some example scripts showing how to use CloudFusion's internal API, in case you do not want to use it through the file system interface. The implementations are interchangeable: if you want to use Amazon S3 instead of Dropbox, you only need to change the client and everything else stays the same. The Amazon S3 example is a simple Hello World! example. The WebDAV example shows how to use the BulkGetMetadata interface to quickly get the metadata of all files within a single directory. Every example was tested before it was written down here. If you find any bug, please report it as an issue on GitHub: https://github.com/joe42/CloudFusion/issues.
Tutorial 1: Dropbox - Screencast1
Tutorial 2: Directories - Screencast2
This is an extensive example script on how to use CloudFusion's internal API to access Dropbox. All methods except for get_config can also be used with the other Store implementations. The wrapper MetadataCachingStore caches metadata such as that returned by directory listings. The wrapper MultiprocessingCachingStore caches downloaded files, which allows quick access without downloading them again, and uploads files asynchronously; it waits some time before uploading a file. The subclass TransparentMultiprocessingCachingStore has the same abilities and additionally offers information about which files have not been uploaded yet, how much space the cache occupies, which errors have occurred, etc.:
from cloudfusion.store.dropbox.dropbox_store import DropboxStore
from cloudfusion.store.metadata_caching_store import MetadataCachingStore
from cloudfusion.store.transparent_caching_store import TransparentMultiprocessingCachingStore
from StringIO import StringIO
def main():
config = DropboxStore.get_config()
'''
{'consumer_key': 'eTBkeDNqM2cwc2lvZ2ow',
'consumer_secret': 'cWF4dHptbGswcWJmMGRs',
'password': '',
'root': 'dropbox',
'user': ''}
'''
#Add username/password
config['user'] = 'quirkquarks@web.de'
config['password'] = 'MySecret123$!'
#Create the actual client
client = DropboxStore(config)
#Get directory listing (may take a few seconds)
client.get_directory_listing("/")
'''
[u'/ordner', u'/neuer ordner', u'/test_4', u'/My DB']
'''
client.get_metadata('/test_4')
'''
{'bytes': 91750400,
'is_dir': False,
'modified': 1390147673,
'path': u'/test_4'}
'''
#Wrap with metadata caching store, to speed up consecutive calls to get_metadata, and get_directory_listing
client = MetadataCachingStore(client)
#Wrap with multiprocessing caching store, for asynchronous uploads, and to speed up consecutive calls to get_file
#When given the same session id even after computer crash or restart, it will remember cached files,
#including files that still need to be uploaded
client = TransparentMultiprocessingCachingStore(client, cache_id='session1')
file_content = client.get_file('/test_4') #takes a long time, since test_4 is large (90 MB)
file_content = client.get_file('/test_4') #very fast, since the file is cached on the local hard disk
#Wrap file content into a file object
fileobj = StringIO()
fileobj.write(file_content)
fileobj.seek(0) # set to beginning of file for further read operations
client.store_fileobject(fileobj, '/test_5') # returns immediately, since uploads are now asynchronous
client.get_dirty_files() #list files that are not entirely uploaded
'''
['/test_5']
'''
client.get_dirty_files() #call again after a few minutes, when the file is uploaded
'''
[]
'''
client.get_cachesize() # get amount of cached data in MB
'''
91
'''
client.get_exception_stats() # get exceptions that occurred
'''
{}
'''
client.get_downloaded() # get amount of downloaded data in MB (1000*1000 bytes)
client.get_download_rate() # get download rate in MB/s (1000*1000 bytes per second)
'''
2.288234288664367
'''
client.get_upload_rate()
'''
0.22592696222436814
'''
#Even though it is not yet uploaded, you can access the newly created file:
client.get_directory_listing("/")
'''
[u'/ordner',
u'/neuer ordner',
'/test_5',
u'/test_4',
u'/My DB']
'''
if __name__ == '__main__':
main()
WebDAV, Amazon, and Google Storage provide subclasses supporting the BulkGetMetadata interface with the method cloudfusion.store.bulk_get_metadata.BulkGetMetadata.get_bulk_metadata(), e.g. cloudfusion.store.webdav.bulk_get_metadata_webdav_store.BulkGetMetadataWebdavStore.get_bulk_metadata(). These subclasses can be used to quickly get the metadata of all files within a directory (see the WebDAV example).
Here is an example for WebDAV:
from cloudfusion.store.webdav.webdav_store import WebdavStore # instead of WebdavStore we use BulkGetMetadataWebdavStore
from cloudfusion.store.webdav.bulk_get_metadata_webdav_store import BulkGetMetadataWebdavStore
from cloudfusion.store.metadata_caching_store import MetadataCachingStore
from cloudfusion.store.transparent_caching_store import TransparentMultiprocessingCachingStore
from StringIO import StringIO
def main():
config = {}
#url may also contain an existing subfolder to access, e.g. https://webdav.mediencenter.t-online.de/myfolder
#url may also contain the port of the WebDAV server, e.g. https://webdav.mediencenter.t-online.de:443
config['url'] = 'https://webdav.mediencenter.t-online.de'
config['user'] = 'joe42' #your account username (this might be your e-mail address for other providers)
config['password'] = 'MySecret!23$' #your account password
#Create the actual client
client = BulkGetMetadataWebdavStore(config)
#Get metadata of all files in the root directory (may take a few seconds)
client.get_bulk_metadata("/")
'''
{u'/My Folder': {'bytes': 0,
'is_dir': True,
'modified': 1400352031,
'path': u'/My Folder'},
u'/_tower of god 2_000_1#0#95#1': {'bytes': 143360,
'is_dir': False,
'modified': 1396871330,
'path': u'/_tower of god 2_000_1#0#95#1'},
u'/_tower of god 2_000_10#0#104#1': {'bytes': 143360,
'is_dir': False,
'modified': 1396871390,
'path': u'/_tower of god 2_000_10#0#104#1'},
u'/_tower of god_2_014_9#0#439#1': {'bytes': 143360,
'is_dir': False,
'modified': 1396956942,
'path': u'/_tower of god_2_014_9#0#439#1'},
u'/dir3': {'bytes': 0,
'is_dir': True,
'modified': 1400503492,
'path': u'/dir3'},
u'/dirr': {'bytes': 0,
'is_dir': True,
'modified': 1400503492,
'path': u'/dirr'}}
'''
#...
if __name__ == '__main__':
main()
Here is an example for Amazon S3. The bucket will be created if it does not exist. A bucket is similar to a subfolder to which CloudFusion's access is restricted. The key and secret can be obtained from console.aws.amazon.com/s3/home.
And here is the code:
from cloudfusion.store.s3.amazon_store import AmazonStore
from cloudfusion.store.metadata_caching_store import MetadataCachingStore
from cloudfusion.store.transparent_caching_store import TransparentMultiprocessingCachingStore
from StringIO import StringIO
def main():
config = {}
config['consumer_key'] = 'FDS54548SDF8D2S311DF'
config['consumer_secret'] = 'D370JKD=564++873ZHFD9FDKDD'
config['bucket_name'] = 'cloudfusion'
#Create the actual client
client = AmazonStore(config)
fileobj = StringIO()
fileobj.write('Hello World!')
fileobj.seek(0)
client.store_fileobject(fileobj, '/hello world.txt') #stores the text in the file hello world.txt and returns the modification time stamp (not important here)
'''
1400093364
'''
client.get_file('/hello world.txt') # get the contents of the file:
'''
'Hello World!'
'''
#Get directory listing
client.get_directory_listing("/")
'''
[u'/chunk_IHvsqNQ=_-A==.tar',
u'/chunk_IHvsqNQ=_-Q==.tar',
u'/chunk_IHvsqNQ=_1A==.tar',
u'/chunk_IHvsqNQ=_1Q==.tar',
u'/chunk_IHvsqNQ=_1g==.tar',
u'/chunk_IHvsqNQ=_zw==.tar',
u'/hello world.txt',
u'/My Amazon',
u'/Untitled Folder',
u'/directory',
u'/logs']
'''
#...
if __name__ == '__main__':
main()
Here is an example for Google Storage. The bucket will be created if it does not exist. A bucket is similar to a subfolder to which CloudFusion's access is restricted. The key and secret can be obtained from the developer's console:
And here is the code:
from cloudfusion.store.gs.google_store import GoogleStore
from cloudfusion.store.metadata_caching_store import MetadataCachingStore
from cloudfusion.store.transparent_chunk_caching_store import TransparentChunkMultiprocessingCachingStore
from StringIO import StringIO
def main():
config = {}
config['client_id'] = 'FDS54548SDF8D2S311DF'
config['client_secret'] = 'D370JKD=564++873ZHFD9FDKDD'
config['bucket_name'] = 'cloudfusion'
#Create the actual client
client = GoogleStore(config)
#Get directory listing (may take a few seconds)
client.get_directory_listing("/")
'''
[u'/neuer ordner', u'/test', u'/My Google']
'''
#Wrap with metadata caching store, to speed up consecutive calls to get_metadata, and get_directory_listing
client = MetadataCachingStore(client)
#Wrap with archiving caching store, for putting small files in the same directory in one archive,
#for asynchronous uploads, and to speed up consecutive calls to get_file
#When given the same session id even after computer crash or restart, it will remember cached files,
#including files that still need to be uploaded
client = TransparentChunkMultiprocessingCachingStore(client, cache_id='session2')
#Wrap some example content into a file object
fileobj = StringIO()
fileobj.write('Hello World!')
fileobj.seek(0) # set to beginning of file for further read operations
# uploading multiple small files in the same directory (in this case the directory is '/')
# upload is delayed for a few minutes, but the actual upload is very fast once it starts
client.store_fileobject(fileobj, '/test_1') # returns immediately, since uploads are now asynchronous
fileobj.seek(0) #seek to beginning of file, so that it can be read again
client.store_fileobject(fileobj, '/test_2')
fileobj.seek(0)
client.store_fileobject(fileobj, '/test_3')
fileobj.seek(0)
client.store_fileobject(fileobj, '/test_4')
fileobj.seek(0)
client.store_fileobject(fileobj, '/test_5')
fileobj.seek(0)
client.store_fileobject(fileobj, '/test_6')
fileobj.seek(0)
client.store_fileobject(fileobj, '/test_7')
fileobj.seek(0)
client.store_fileobject(fileobj, '/test_8')
fileobj.seek(0)
#...
client.store_fileobject(fileobj, '/test_100')
client.get_dirty_files() #list files that are not entirely uploaded
'''
['/test_1',...]
'''
client.get_dirty_files() #call again after ~5 minutes, when the file is uploaded
'''
[]
'''
if __name__ == '__main__':
main()
Here is an example for accessing SugarSync:
from cloudfusion.store.sugarsync.sugarsync_store import SugarsyncStore
from cloudfusion.store.metadata_caching_store import MetadataCachingStore
from cloudfusion.store.transparent_caching_store import TransparentMultiprocessingCachingStore
from StringIO import StringIO
def main():
config = SugarsyncStore.get_config()
config['user'] = 'me@emailserver.com' #your account username/e-mail address
config['password'] = 'MySecret!23$' #your account password
#Create the actual client
client = SugarsyncStore(config)
#Get directory listing (may take a few seconds)
client.get_directory_listing("/")
'''
[u'/ordner', u'/neuer ordner', u'/test_4', u'/My DB']
'''
#...
if __name__ == '__main__':
main()
Here is an example for accessing Google Drive:
from cloudfusion.store.gdrive.google_drive import GoogleDrive
from cloudfusion.store.metadata_caching_store import MetadataCachingStore
from cloudfusion.store.transparent_caching_store import TransparentMultiprocessingCachingStore
from StringIO import StringIO
def main():
config = {}
config['client_id'] = 'FDS54548SDF8D2S311DF'
config['client_secret'] = 'D370JKD=564++873ZHFD9FDKDD'
#Create the actual client
client = GoogleDrive(config)
#Get directory listing (may take a few seconds)
client.get_directory_listing("/")
'''
[u'/ordner', u'/neuer ordner', u'/test_4', u'/My DB']
'''
#...
if __name__ == '__main__':
main()
To support a new cloud storage provider or protocol, implement the central Store interface. This simple interface frees developers from having to program caching mechanisms, multithreading, or a file system interface, and allows the implementation to be integrated into CloudFusion's continuous testing cycle with automatically generated tests for quality assurance. Please contact me on GitHub if you want to develop a new extension for CloudFusion; I would be glad to help you.
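To give a rough idea of what such an extension looks like, here is a minimal, hypothetical sketch of a new backend that simply keeps files in memory. It is not taken from the CloudFusion sources: the import path cloudfusion.store.store.Store and the constructor call are assumptions, and only the four methods used in the examples above are shown; a real backend also has to implement the remaining methods of the Store interface (such as deleting and moving files).
import time
from cloudfusion.store.store import Store # assumed location of the Store base class

class InMemoryStore(Store):
    '''Hypothetical example backend keeping all files in a dictionary in RAM.'''
    def __init__(self, config):
        super(InMemoryStore, self).__init__() # the actual base class constructor might expect arguments
        self.config = config
        self.entries = {} # maps path -> (content, modification time stamp)

    def get_file(self, path_to_file):
        content, _ = self.entries[path_to_file]
        return content

    def store_fileobject(self, fileobject, path):
        modified = int(time.time())
        self.entries[path] = (fileobject.read(), modified)
        return modified # the examples above return the modification time stamp

    def get_metadata(self, path):
        content, modified = self.entries[path]
        return {'path': path, 'bytes': len(content), 'modified': modified, 'is_dir': False}

    def get_directory_listing(self, directory):
        return [path for path in self.entries if path.startswith(directory)]
Once the interface is implemented, the new class can be wrapped with MetadataCachingStore and TransparentMultiprocessingCachingStore just like the stores in the examples above.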
To generate this documentation first call:
cloudfusion/doc/generate_modules.py -d cloudfusion/doc -f -m 5 cloudfusion main.py dropbox cloudfusion/fuse.py cloudfusion/conf cloudfusion/doc third_party
Then call:
make -f cloudfusion/doc/Makefile html
Also, to be able to use the generated stylesheets, run:
mv cloudfusion/doc/_build/html/_static cloudfusion/doc/_build/html/static
First create configuration files for all services in “cloudfusion/config”. The following table shows the names of the configuration files that are required for each service.
Service | Configuration file
---|---
Dropbox | dropbox_testing.ini
SugarSync | sugarsync_testing.ini
Amazon S3 | AmazonS3_testing.ini
Google Storage | Google_testing.ini
GMX | Webdav_gmx_testing.ini
T-Online | Webdav_tonline_testing.ini
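The exact contents of these files depend on the service. Purely as a hypothetical illustration (the section and key names below are assumptions, not the documented format), such a file is a plain ini file carrying the same kind of credentials as the configuration dictionaries in the examples above:
; hypothetical sketch of dropbox_testing.ini - the section and key names are
; assumptions, not the authoritative format
[store]
user = quirkquarks@web.de
password = MySecret123$!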
Simple tests that need no setup can be run with:
nosetests -v -s -x -I db_logging_thread_test.py -I synchronize_proxy_test.py -I store_tests.py -I transparent_store_test_with_sync.py -I store_test_gdrive.py
See the store_tests.py module for how to run provider-specific tests. For example, the module contains a test for the local hard drive store, which can be executed with:
nosetests -v -s -x cloudfusion.tests.store_tests:test_local
-v and -s are optional flags: -v enables verbose output and -s shows anything printed to stdout.
To run tests automatically during development as soon as you change something, call:
nosy -c cloudfusion/config/nosy.cfg
The configuration file nosy.cfg needs to be adapted first.