API Reference
Submodules
Module contents
-
megfile.smart_access(path: Union[str, os.PathLike], mode: megfile.pathlike.Access) → bool[source] Test if path has access permission described by mode
- Parameters
path – Path to be tested
mode – Access mode (Access.READ, Access.WRITE, Access.BUCKETREAD, Access.BUCKETWRITE)
- Returns
True if the path has the access permission described by mode, else False
-
megfile.smart_cache(path, cacher=<class 'megfile.smart.SmartCacher'>, **options)[source] Return a path to Posixpath Interface
- Parameters
path – Path to cache
cacher – Cacher for s3 path
options – Optional arguments for cacher
-
megfile.smart_combine_open(path_glob: str, mode: str = 'rb', open_func=<function smart_open>) → megfile.lib.combine_reader.CombineReader[source] Open a unified reader that supports multi file reading.
- Parameters
path_glob – A path may contain shell wildcard characters
mode – Mode to open file, supports ‘rb’
- Returns
A `CombineReader`
-
megfile.smart_copy(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike], callback: Optional[Callable[int, None]] = None, followlinks: bool = False) → None[source] Copy file from source path to destination path
Here are a few examples:
>>> from tqdm import tqdm
>>> from megfile import smart_copy, smart_stat
>>> class Bar:
...     def __init__(self, total=10):
...         self._bar = tqdm(total=10)
...
...     def __call__(self, bytes_num):
...         self._bar.update(bytes_num)
...
>>> src_path = 'test.png'
>>> dst_path = 'test1.png'
>>> smart_copy(src_path, dst_path, callback=Bar(total=smart_stat(src_path).size), followlinks=False)
856960it [00:00, 260592384.24it/s]
- Parameters
src_path – Given source path
dst_path – Given destination path
callback – Called periodically during the copy; its argument is the number of bytes copied since the last call
followlinks – If False, a symlink is regarded as a file; if True, the link is followed
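The callback contract described above (each call receives only the bytes copied since the previous call) can be illustrated with a minimal standard-library sketch. The helper `copy_with_callback` and its `chunk_size` argument are hypothetical names used for illustration; this is not megfile's implementation.

```python
import io


def copy_with_callback(src, dst, callback=None, chunk_size=4):
    """Copy src to dst in chunks, reporting each chunk's size to callback."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        if callback is not None:
            # Report only the bytes copied since the previous call.
            callback(len(chunk))


reported = []
source = io.BytesIO(b"0123456789")
target = io.BytesIO()
copy_with_callback(source, target, callback=reported.append)
print(reported, sum(reported))
```

Summing the reported values gives the total number of bytes copied, which is why a progress bar can simply call `update` with each value.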
-
megfile.smart_exists(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if path or s3_url exists
- Parameters
path – Path to be tested
- Returns
True if path exists, else False
-
megfile.smart_getmtime(path: Union[str, os.PathLike]) → float[source] Get the last-modified time of the file at the given s3_url or file path (as a Unix timestamp). If the path is an existing directory, return the latest modified time of all files in it. The mtime of an empty directory is 1970-01-01 00:00:00
- Parameters
path – Given path
- Returns
Last-modified time
- Raises
FileNotFoundError
-
megfile.smart_getsize(path: Union[str, os.PathLike]) → int[source] Get the file size at the given s3_url or file path (in bytes). If the path is a directory, return the total size of all files in it, including files in subdirectories (if any). The result excludes the size of the directory itself; in other words, an empty directory returns 0 bytes.
- Parameters
path – Given path
- Returns
File size
- Raises
FileNotFoundError
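The directory rule described above (sum of all files including subdirectories, 0 for an empty directory) can be sketched for local paths with the standard library. `directory_size` is a hypothetical helper, not part of megfile, and it makes no claim about how megfile handles s3 paths or symlinks.

```python
import os
import tempfile


def directory_size(path):
    """Sum the sizes of all regular files under path, recursively."""
    if not os.path.exists(path):
        raise FileNotFoundError(path)
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


with tempfile.TemporaryDirectory() as tmp:
    empty_size = directory_size(tmp)  # nothing inside yet
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.bin"), "wb") as f:
        f.write(b"abc")
    with open(os.path.join(tmp, "sub", "b.bin"), "wb") as f:
        f.write(b"de")
    nested_size = directory_size(tmp)  # 3 bytes + 2 bytes
```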
-
megfile.smart_glob_stat(pathname: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → Iterator[megfile.pathlike.FileEntry][source] Given a pathname that may contain shell wildcard characters, return an iterator of tuples of path and file stat, in ascending alphabetical order, for the paths that match the glob pattern
- Parameters
pathname – A path pattern that may contain shell wildcard characters
recursive – If False, this function will not glob recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
-
megfile.smart_glob(pathname: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → List[str][source] Given a pathname that may contain shell wildcard characters, return a list of the paths that match the glob pattern, in ascending alphabetical order
- Parameters
pathname – A path pattern that may contain shell wildcard characters
recursive – If False, this function will not glob recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
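How `recursive` and `missing_ok` interact with the sorted result can be sketched for local paths using the standard `glob` module. `sorted_glob` is a hypothetical stand-in written for illustration; it is not megfile's implementation and does not cover s3 paths.

```python
import glob
import os
import tempfile


def sorted_glob(pathname, recursive=True, missing_ok=True):
    """Return matching paths in ascending order, optionally failing on no match."""
    matches = sorted(glob.glob(pathname, recursive=recursive))
    if not matches and not missing_ok:
        raise FileNotFoundError(pathname)
    return matches


with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt", os.path.join("sub", "c.txt")):
        full = os.path.join(tmp, name)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    flat = [os.path.relpath(p, tmp) for p in sorted_glob(os.path.join(tmp, "*.txt"))]
    deep = [os.path.relpath(p, tmp) for p in sorted_glob(os.path.join(tmp, "**", "*.txt"))]

    try:
        sorted_glob(os.path.join(tmp, "*.png"), missing_ok=False)
        raised = False
    except FileNotFoundError:
        raised = True
```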
-
megfile.smart_iglob(pathname: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → Iterator[str][source] Given a pathname that may contain shell wildcard characters, return an iterator over the paths that match the glob pattern, in ascending alphabetical order
- Parameters
pathname – A path pattern that may contain shell wildcard characters
recursive – If False, this function will not glob recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
-
megfile.smart_isdir(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if a file path or an s3 url is directory
- Parameters
path – Path to be tested
- Returns
True if path is directory, else False
-
megfile.smart_isfile(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if a file path or an s3 url is file
- Parameters
path – Path to be tested
- Returns
True if path is file, else False
-
megfile.smart_listdir(path: Union[str, os.PathLike, None] = None) → List[str][source] Get all contents of given s3_url or file path. The result is in ascending alphabetical order.
- Parameters
path – Given path
- Returns
All contents of given s3_url or file path in ascending alphabetical order.
- Raises
FileNotFoundError, NotADirectoryError
-
megfile.smart_load_content(path: Union[str, os.PathLike], start: Optional[int] = None, stop: Optional[int] = None) → bytes[source] Get the content of the specified file in the byte range [start, stop)
- Parameters
path – Specified path
start – start index
stop – stop index
- Returns
bytes content in range [start, stop)
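The half-open range [start, stop) means the byte at offset `start` is included and the byte at offset `stop` is not. A minimal sketch on an in-memory stream follows; `read_range` is a hypothetical helper, and treating `None` as "from the beginning" or "to the end" is an assumption made here, not something the reference states.

```python
import io


def read_range(stream, start=None, stop=None):
    """Read the half-open byte range [start, stop) from a seekable stream."""
    start = 0 if start is None else start  # assumption: None means offset 0
    stream.seek(start)
    if stop is None:                       # assumption: None means end of file
        return stream.read()
    return stream.read(max(stop - start, 0))


payload = io.BytesIO(b"0123456789")
first_three = read_range(payload, 0, 3)  # offsets 0, 1, 2
tail = read_range(payload, 7)            # offset 7 to the end
whole = read_range(payload)              # everything
```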
-
megfile.smart_save_content(path: Union[str, os.PathLike], content: bytes) → None[source] Save bytes content to specified path
- Parameters
path – Path to save content
-
megfile.smart_load_from(path: Union[str, os.PathLike]) → BinaryIO[source] Read all content in binary on specified path and write into memory
User should close the BinaryIO manually
- Parameters
path – Specified path
- Returns
BinaryIO
-
megfile.smart_load_text(path: Union[str, os.PathLike]) → str[source] Read content from path
- Parameters
path – Path to be read
-
megfile.smart_save_text(path: Union[str, os.PathLike], text: str) → None[source] Save text to specified path
- Parameters
path – Path to save text
-
megfile.smart_makedirs(path: Union[str, os.PathLike], exist_ok: bool = False) → None[source] Create a directory if the path is on fs. If the path is on s3, this only checks whether the target exists and whether the bucket has WRITE access
- Parameters
path – Given path
exist_ok – If False and target directory exists, raise FileExistsError
- Raises
PermissionError, FileExistsError
-
megfile.smart_open(path: Union[str, os.PathLike], mode: str = 'r', s3_open_func: Callable[[str, str], BinaryIO] = <function s3_buffered_open>, encoding: Optional[str] = None, errors: Optional[str] = None, **options) → IO[AnyStr][source] Open a file on the path
Note
On fs, the difference between this function and io.open is that this function creates directories automatically, instead of raising FileNotFoundError.
Currently, supported protocols are:
s3: “s3://<bucket>/<key>”
http(s): http(s) url
stdio: “stdio://-”
FS file: Besides the protocols mentioned above, all other paths are considered fs paths
Here are a few examples:
>>> import cv2
>>> import numpy as np
>>> from megfile import smart_open
>>> raw = smart_open('https://ss2.bdstatic.com/70cFvnSh_Q1YnxGkpoWK1HF6hhy/it/u=2275743969,3715493841&fm=26&gp=0.jpg', 'rb').read()
>>> img = cv2.imdecode(np.frombuffer(raw, np.uint8), cv2.IMREAD_ANYDEPTH | cv2.IMREAD_COLOR)
- Parameters
path – Given path
mode – Mode to open file, supports r’[rwa][tb]?+?’
s3_open_func – Function used to open s3_url. The function must accept 2 required parameters: file path and mode
encoding – encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode.
errors – errors is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode.
- Returns
File-Like object
- Raises
FileNotFoundError, IsADirectoryError, ValueError
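The protocol list in the note above suggests a dispatch on the URL scheme, with everything else falling back to the local filesystem. The sketch below shows one way such a classification could look; `guess_protocol` is a hypothetical helper and is not how megfile necessarily decides.

```python
from urllib.parse import urlsplit


def guess_protocol(path):
    """Classify a path by scheme; anything unrecognized is treated as fs."""
    scheme = urlsplit(path).scheme.lower()
    if scheme == "s3":
        return "s3"
    if scheme in ("http", "https"):
        return "http"
    if scheme == "stdio":
        return "stdio"
    return "fs"


kinds = [
    guess_protocol("s3://bucket/key"),
    guess_protocol("https://example.com/a.jpg"),
    guess_protocol("stdio://-"),
    guess_protocol("/tmp/data.txt"),
    guess_protocol("relative/file.txt"),
]
```

This sketch does not cover the s3[+profile_name]:// form; a scheme such as `s3+name` would fall through to "fs" here.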
-
megfile.smart_path_join(path: Union[str, os.PathLike], *other_paths: Union[str, os.PathLike]) → str[source] Concatenate two or more paths into a complete path
- Parameters
path – Given path
other_paths – Paths to be concatenated
- Returns
Concatenated complete path
Note
For URI, the difference between this function and os.path.join is that this function ignores left side slash (which indicates absolute path) in other_paths and will directly concat.
e.g. os.path.join(‘s3://path’, ‘to’, ‘/file’) => ‘/file’, and smart_path_join(‘s3://path’, ‘to’, ‘/file’) => ‘/path/to/file’
But for fs path, this function behaves exactly like os.path.join
e.g. smart_path_join(‘/path’, ‘to’, ‘/file’) => ‘/file’
-
megfile.smart_realpath(path: Union[str, os.PathLike])[source] Return the real path of given path
- Parameters
path – Given path
- Returns
Real path of given path
-
megfile.smart_remove(path: Union[str, os.PathLike], missing_ok: bool = False) → None[source] Remove the file or directory on s3 or fs. Removing s3:// or s3://bucket is not permitted
- Parameters
path – Given path
missing_ok – If False and the target file/directory does not exist, raise FileNotFoundError
- Raises
PermissionError, FileNotFoundError
-
megfile.smart_move(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → None[source] Move a file/directory on s3 or fs. Moving s3:// or s3://bucket is not allowed
- Parameters
src_path – Given source path
dst_path – Given destination path
-
megfile.smart_rename(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → None[source] Move a file on s3 or fs. Moving s3:// or s3://bucket is not allowed
- Parameters
src_path – Given source path
dst_path – Given destination path
-
megfile.smart_save_as(file_object: BinaryIO, path: Union[str, os.PathLike]) → None[source] Write the opened binary stream to specified path, but the stream won’t be closed
- Parameters
file_object – Stream to be read
path – Specified target path
-
megfile.smart_scan_stat(path: Union[str, os.PathLike], missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a tuple of path string and file stat
- Parameters
path – Given path
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Raises
UnsupportedError
- Returns
A file path generator
-
megfile.smart_scan(path: Union[str, os.PathLike], missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a path string.
If path is a file path, yield the file only
If path is a non-existent path, return an empty generator
If path is a bucket path, return all file paths in the bucket
- Parameters
path – Given path
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Raises
UnsupportedError
- Returns
A file path generator
-
megfile.smart_scandir(path: Union[str, os.PathLike, None] = None) → Iterator[megfile.pathlike.FileEntry][source] Get all content of given s3_url or file path.
- Parameters
path – Given path
- Returns
An iterator over all contents that have path as prefix
- Raises
FileNotFoundError, NotADirectoryError
-
megfile.smart_stat(path: Union[str, os.PathLike], follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of s3_url or file path
- Parameters
path – Given path
- Returns
StatResult
- Raises
FileNotFoundError
-
megfile.smart_sync(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike], callback: Optional[Callable[[str, int], None]] = None, followlinks: bool = False, callback_after_copy_file: Optional[Callable[[str, str], None]] = None, src_file_stats: Optional[Iterable[megfile.pathlike.FileEntry]] = None, map_func: Callable[[Callable, Iterable], Any] = <class 'map'>, force: bool = False) → None[source] Sync file or directory on s3 and fs
Note
When the parameter is a file, this function behaves like smart_copy.
If a file and a directory have the same name at the same level, sync treats it as a file first.
Here are a few examples:
>>> from tqdm import tqdm
>>> from threading import Lock
>>> from megfile import smart_sync, smart_stat, smart_glob
>>> class Bar:
...     def __init__(self, total_file):
...         self._total_file = total_file
...         self._bar = None
...         self._now = None
...         self._file_index = 0
...         self._lock = Lock()
...     def __call__(self, path, num_bytes):
...         with self._lock:
...             if path != self._now:
...                 self._file_index += 1
...                 print("copy file {}/{}:".format(self._file_index, self._total_file))
...                 if self._bar:
...                     self._bar.close()
...                 self._bar = tqdm(total=smart_stat(path).size)
...                 self._now = path
...             self._bar.update(num_bytes)
>>> total_file = len(list(smart_glob('src_path')))
>>> smart_sync('src_path', 'dst_path', callback=Bar(total_file=total_file))
- Parameters
src_path – Given source path
dst_path – Given destination path
callback – Called periodically during the copy; its arguments are the file path and the number of bytes copied since the last call
followlinks – If False, a symlink is regarded as a file; if True, the link is followed
callback_after_copy_file – Called after a file is copied successfully; its arguments are the src file path and the dst file path
src_file_stats – If this parameter is not None, only the files it lists are synced, and src_path is the root path of these files, used to calculate the path of each target file. This parameter exists to reduce the number of file traversals.
map_func – A Callable func like map. You can use ThreadPoolExecutor.map, Pool.map and so on if you need concurrent capability. default is standard library map.
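The map_func parameter accepts any callable with the same shape as the built-in `map`. The sketch below shows why `ThreadPoolExecutor.map` is a drop-in replacement; `apply_to_all` and `fake_copy` are hypothetical stand-ins, and nothing here calls megfile.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_to_all(func, items, map_func=map):
    """Run func over items with any map-like callable and collect the results."""
    return list(map_func(func, items))


def fake_copy(name):
    # Hypothetical stand-in for copying one file.
    return name.upper()


names = ["a.txt", "b.txt", "c.txt"]

serial = apply_to_all(fake_copy, names)  # built-in map, one item at a time

with ThreadPoolExecutor(max_workers=2) as pool:
    # Executor.map has the same (func, iterable) shape and keeps input order.
    threaded = apply_to_all(fake_copy, names, map_func=pool.map)
```

Because `Executor.map` yields results in input order, the two calls return the same list; only the scheduling differs.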
-
megfile.smart_touch(path: Union[str, os.PathLike])[source] Create a new file on path
- Parameters
path – Path to create file
-
megfile.smart_unlink(path: Union[str, os.PathLike], missing_ok: bool = False) → None[source] Remove the file on s3 or fs
- Parameters
path – Given path
missing_ok – If False and the target file does not exist, raise FileNotFoundError
- Raises
PermissionError, FileNotFoundError, IsADirectoryError
-
megfile.smart_walk(path: Union[str, os.PathLike], followlinks: bool = False) → Iterator[Tuple[str, List[str], List[str]]][source] Generate the file names in a directory tree by walking the tree top-down. For each directory in the tree rooted at directory path (including path itself), it yields a 3-tuple (root, dirs, files).
root: a string of current path
dirs: name list of subdirectories (excluding ‘.’ and ‘..’ if they exist) in ‘root’. The list is sorted by ascending alphabetical order
files: name list of non-directory files (link is regarded as file) in ‘root’. The list is sorted by ascending alphabetical order
If path does not exist, return an empty generator
If path is a file, return an empty generator
If walk() is applied to an unsupported path, raise UnsupportedError
- Parameters
path – Given path
- Raises
UnsupportedError
- Returns
A 3-tuple generator
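For local paths, the top-down 3-tuple traversal with sorted names resembles `os.walk` with sorting added. `sorted_walk` below is a hypothetical sketch of that behaviour, not megfile's implementation; in particular it does not reproduce the rule that links are regarded as files.

```python
import os
import tempfile


def sorted_walk(path):
    """Yield (root, dirs, files) top-down with names in ascending order."""
    if not os.path.isdir(path):
        return  # non-existent paths and plain files yield nothing
    for root, dirs, files in os.walk(path):
        dirs.sort()  # in-place sort also fixes the order of descent
        yield root, list(dirs), sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "b"))
    os.makedirs(os.path.join(tmp, "a"))
    for name in ("z.txt", "y.txt"):
        open(os.path.join(tmp, name), "w").close()

    walked = [(os.path.relpath(r, tmp), d, f) for r, d, f in sorted_walk(tmp)]
    missing = list(sorted_walk(os.path.join(tmp, "nope")))
```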
-
megfile.smart_getmd5(path: Union[str, os.PathLike], recalculate: bool = False)[source] Get md5 value of file
- Parameters
path – File path
recalculate – Whether to calculate md5 in real-time; if not, return the s3 etag when path is on s3
-
megfile.smart_symlink(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → None[source] Create a symbolic link named dst_path pointing to src_path.
- Parameters
src_path – Source path
dst_path – Destination path
-
megfile.smart_readlink(path: Union[str, os.PathLike]) → Union[str, os.PathLike][source] Return a string representing the path to which the symbolic link points.
- Parameters
path – Path to be read
- Returns
A string representing the path to which the symbolic link points.
-
megfile.smart_lstat(path: Union[str, os.PathLike]) → megfile.pathlike.StatResult[source] Get StatResult of path but do not follow symbolic links
- Parameters
path – Given path
- Returns
StatResult
- Raises
FileNotFoundError
-
megfile.smart_concat(src_paths: List[Union[str, os.PathLike]], dst_path: Union[str, os.PathLike]) → None[source] Concatenate src_paths to dst_path
- Parameters
src_paths – List of source paths
dst_path – Destination path
-
megfile.is_s3(path: Union[str, os.PathLike]) → bool[source] According to aws-cli, test if a path is an s3 path.
megfile also supports paths like s3[+profile_name]://bucket/key
- Parameters
path – Path to be tested
- Returns
True if path is s3 path, else False
-
megfile.s3_access(path: Union[str, os.PathLike], mode: megfile.pathlike.Access = <Access.READ: 1>, followlinks: bool = False) → bool[source] Test if path has access permission described by mode
- Parameters
path – Given path
mode – access mode
- Returns
True if the bucket of s3_url has the requested read/write access, else False
-
megfile.s3_buffered_open(s3_url: Union[str, os.PathLike], mode: str, followlinks: bool = False, *, max_concurrency: Optional[int] = None, max_buffer_size: int = 134217728, forward_ratio: Optional[float] = None, block_size: int = 8388608, limited_seekable: bool = False, buffered: bool = True, share_cache_key: Optional[str] = None, cache_path: Optional[str] = None) → Union[megfile.lib.s3_prefetch_reader.S3PrefetchReader, megfile.lib.s3_buffered_writer.S3BufferedWriter, _io.BufferedReader, _io.BufferedWriter, megfile.lib.s3_memory_handler.S3MemoryHandler][source] Open an asynchronous prefetch reader, to support fast sequential read
Note
User should make sure that reader / writer are closed correctly
Supports context manager
Some parameter settings may perform well: max_concurrency=10 or 20, block_size=8 or 16 MB; the default value None means using the global thread pool
- Parameters
max_concurrency – Max download thread number, None by default
max_buffer_size – Max cached buffer size in memory, 128MB by default
block_size – Size of a single block, 8MB by default. Each block will be uploaded or downloaded by a single thread.
limited_seekable – Whether the write-handle supports limited seek (both the head part and the tail part of the file can seek block_size). Note: this parameter is valid only for write-handles. Read-handles support arbitrary seek
- Returns
An opened S3PrefetchReader object
- Raises
S3FileNotFoundError
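The defaults in the signature are given in bytes, so it can help to work out what they mean in blocks. The arithmetic below uses only the two default values shown above; how megfile actually schedules or evicts blocks is not described here, so treat the interpretation as an assumption.

```python
import math

BLOCK_SIZE = 8388608         # default block_size from the signature (8 MB)
MAX_BUFFER_SIZE = 134217728  # default max_buffer_size from the signature (128 MB)


def blocks_needed(file_size, block_size=BLOCK_SIZE):
    """Number of blocks required to cover a file of file_size bytes."""
    return math.ceil(file_size / block_size)


# How many default-sized blocks fit into the default buffer.
buffered_blocks = MAX_BUFFER_SIZE // BLOCK_SIZE

# A 100 MB object spans 12.5 blocks, so 13 blocks are needed.
blocks_for_100mb = blocks_needed(100 * 1024 * 1024)
```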
-
megfile.s3_cached_open(s3_url: Union[str, os.PathLike], mode: str, followlinks: bool = False, *, cache_path: Optional[str] = None) → megfile.lib.s3_cached_handler.S3CachedHandler[source] Open a local-cache file reader / writer, for frequent random read / write
Note
User should make sure that reader / writer are closed correctly
Supports context manager
cache_path can specify the path of cache file. Performance could be better if cache file path is on ssd or tmpfs
- Parameters
mode – Mode to open file, could be one of “rb”, “wb” or “ab”
cache_path – cache file path
- Returns
An opened BufferedReader / BufferedWriter object
-
megfile.s3_copy(src_url: Union[str, os.PathLike], dst_url: Union[str, os.PathLike], followlinks: bool = False, callback: Optional[Callable[int, None]] = None) → None[source] File copy on S3. Copy the content of the file at src_url to dst_url. It’s the caller’s responsibility to ensure that s3_isfile(src_url) == True
- Parameters
src_url – Given path
dst_url – Target file path
callback – Called periodically during the copy; its argument is the number of bytes copied since the last call
-
megfile.s3_download(src_url: Union[str, os.PathLike], dst_url: Union[str, os.PathLike], followlinks: bool = False, callback: Optional[Callable[int, None]] = None) → None[source] Downloads a file from s3 to local filesystem.
- Parameters
src_url – source s3 path
dst_url – target fs path
callback – Called periodically during copy, and the input parameter is the data size (in bytes) of copy since the last call
-
megfile.s3_exists(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if s3_url exists
If the bucket of s3_url is not permitted to be read, return False
- Parameters
path – Given path
- Returns
True if s3_url exists, else False
-
megfile.s3_getmd5(path: Union[str, os.PathLike], recalculate: bool = False, followlinks: bool = False) → str[source] Get md5 meta info in files that uploaded/copied via megfile
If meta info is lost or non-existent, return None
- Parameters
path – Given path
recalculate – calculate md5 in real-time or return s3 etag
followlinks – If is True, calculate md5 for real file
- Returns
md5 meta info
-
megfile.s3_getmtime(path: Union[str, os.PathLike], follow_symlinks: bool = False) → float[source] Get the last-modified time of the file at the given s3_url path (as a Unix timestamp). If the path is an existing directory, return the latest modified time of all files in it. The mtime of an empty directory is 1970-01-01 00:00:00
If s3_url is not an existent path, which means s3_exists(s3_url) returns False, then raise S3FileNotFoundError
- Parameters
path – Given path
- Returns
Last-modified time
- Raises
S3FileNotFoundError, UnsupportedError
-
megfile.s3_getsize(path: Union[str, os.PathLike], follow_symlinks: bool = False) → int[source] Get the file size at the given s3_url path (in bytes). If the path is a directory, return the total size of all files in it, including files in subdirectories (if any). The result excludes the size of the directory itself; in other words, an empty directory returns 0 bytes.
If s3_url is not an existent path, which means s3_exists(s3_url) returns False, then raise S3FileNotFoundError
- Parameters
path – Given path
- Returns
File size
- Raises
S3FileNotFoundError, UnsupportedError
-
megfile.s3_glob_stat(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Return a generator of tuples of path and file stat, in ascending alphabetical order, for the paths that match the glob pattern. Note: globbing only works within a bucket. Trying to match the bucket with wildcard characters raises UnsupportedError
- Parameters
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Raises
UnsupportedError, when bucket part contains wildcard characters
- Returns
A generator of tuples of path and file stat, in which paths match s3_pathname
-
megfile.s3_glob(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True, followlinks: bool = False) → List[str][source] Return a list of the s3 paths that match the glob pattern, in ascending alphabetical order. Note: globbing only works within a bucket. Trying to match the bucket with wildcard characters raises UnsupportedError
- Parameters
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Raises
UnsupportedError, when bucket part contains wildcard characters
- Returns
A list of the paths that match s3_pathname
-
megfile.s3_hasbucket(path: Union[str, os.PathLike]) → bool[source] Test if the bucket of s3_url exists
- Parameters
path – Given path
- Returns
True if bucket of s3_url exists, else False
-
megfile.s3_iglob(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source] Return an iterator over the s3 paths that match the glob pattern, in ascending alphabetical order. Note: globbing only works within a bucket. Trying to match the bucket with wildcard characters raises UnsupportedError
- Parameters
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Raises
UnsupportedError, when bucket part contains wildcard characters
- Returns
An iterator over the paths that match s3_pathname
-
megfile.s3_isdir(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if an s3 url is a directory. Specific procedures are as follows:
If there exists a suffix for which os.path.join(s3_url, suffix) is a file
If the url is an empty bucket or s3://
- Parameters
path – Given path
followlinks – The result is the same whether followlinks is True or False, because s3 symlinks do not support directories.
- Returns
True if path is s3 directory, else False
-
megfile.s3_isfile(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if an s3_url is file
- Parameters
path – Given path
- Returns
True if path is s3 file, else False
-
megfile.s3_listdir(path: Union[str, os.PathLike], followlinks: bool = False) → List[str][source] Get all contents of given s3_url. The result is in ascending alphabetical order.
- Parameters
path – Given path
- Returns
All contents that have the prefix s3_url, in ascending alphabetical order
- Raises
S3FileNotFoundError, S3NotADirectoryError
-
megfile.s3_load_content(s3_url, start: Optional[int] = None, stop: Optional[int] = None, followlinks: bool = False) → bytes[source] Get the content of the specified file in the byte range [start, stop)
- Parameters
s3_url – Specified path
start – start index
stop – stop index
- Returns
bytes content in range [start, stop)
-
megfile.s3_load_from(path: Union[str, os.PathLike], followlinks: bool = False) → BinaryIO[source] Read all content in binary on specified path and write into memory
User should close the BinaryIO manually
- Parameters
path – Given path
- Returns
BinaryIO
-
megfile.s3_makedirs(path: Union[str, os.PathLike], exist_ok: bool = False)[source] Create an s3 directory. Purely creating a directory is invalid because it’s unavailable on OSS. This function only tests whether the target bucket has WRITE access.
- Parameters
path – Given path
exist_ok – If False and target directory exists, raise S3FileExistsError
- Raises
S3BucketNotFoundError, S3FileExistsError
-
megfile.s3_memory_open(s3_url: Union[str, os.PathLike], mode: str, followlinks: bool = False) → megfile.lib.s3_memory_handler.S3MemoryHandler[source] Open a memory-cache file reader / writer, for frequent random read / write
Note
User should make sure that reader / writer are closed correctly
Supports context manager
- Parameters
mode – Mode to open file, could be one of “rb”, “wb”, “ab”, “rb+”, “wb+” or “ab+”
- Returns
An opened BufferedReader / BufferedWriter object
-
megfile.s3_open(s3_url: Union[str, os.PathLike], mode: str, followlinks: bool = False, *, max_concurrency: Optional[int] = None, max_buffer_size: int = 134217728, forward_ratio: Optional[float] = None, block_size: int = 8388608, limited_seekable: bool = False, buffered: bool = True, share_cache_key: Optional[str] = None, cache_path: Optional[str] = None) → Union[megfile.lib.s3_prefetch_reader.S3PrefetchReader, megfile.lib.s3_buffered_writer.S3BufferedWriter, _io.BufferedReader, _io.BufferedWriter, megfile.lib.s3_memory_handler.S3MemoryHandler] Open an asynchronous prefetch reader, to support fast sequential read
Note
User should make sure that reader / writer are closed correctly
Supports context manager
Some parameter settings may perform well: max_concurrency=10 or 20, block_size=8 or 16 MB; the default value None means using the global thread pool
- Parameters
max_concurrency – Max download thread number, None by default
max_buffer_size – Max cached buffer size in memory, 128MB by default
block_size – Size of a single block, 8MB by default. Each block will be uploaded or downloaded by a single thread.
limited_seekable – Whether the write-handle supports limited seek (both the head part and the tail part of the file can seek block_size). Note: this parameter is valid only for write-handles. Read-handles support arbitrary seek
- Returns
An opened S3PrefetchReader object
- Raises
S3FileNotFoundError
-
megfile.s3_path_join(path: Union[str, os.PathLike], *other_paths: Union[str, os.PathLike]) → str[source] Concatenate two or more paths into a complete path
- Parameters
path – Given path
other_paths – Paths to be concatenated
- Returns
Concatenated complete path
Note
The difference between this function and os.path.join is that this function ignores left side slash (which indicates absolute path) in other_paths and will directly concat.
e.g. os.path.join(‘/path’, ‘to’, ‘/file’) => ‘/file’, but s3_path_join(‘/path’, ‘to’, ‘/file’) => ‘/path/to/file’
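The contrast in the note can be reproduced with the standard library. `posixpath.join` is used so the result does not depend on the operating system, and `join_ignoring_leading_slashes` is a hypothetical helper that mimics the described behaviour; it is not megfile's implementation.

```python
import posixpath


def join_ignoring_leading_slashes(path, *other_paths):
    """Join paths, treating a leading slash in later parts as insignificant."""
    result = path
    for other in other_paths:
        other = other.lstrip("/")  # drop the "absolute path" marker
        if result.endswith("/"):
            result += other
        else:
            result += "/" + other
    return result


# The standard join restarts at the last absolute component.
standard = posixpath.join("/path", "to", "/file")

# The sketch keeps every component, matching the example in the note.
concatenated = join_ignoring_leading_slashes("/path", "to", "/file")
```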
-
megfile.s3_pipe_open(s3_url: Union[str, os.PathLike], mode: str, followlinks: bool = False, *, join_thread: bool = True) → megfile.lib.s3_pipe_handler.S3PipeHandler[source] Open an asynchronous read-write reader / writer, to support fast sequential read / write
Note
User should make sure that reader / writer are closed correctly
Supports context manager
When join_thread is False, closing the file handle does not wait for the asynchronous write to finish. False doesn’t affect read-handles, but it can speed up write-handles because the file is written asynchronously. However, asynchronous behaviour cannot guarantee that the file is written successfully, and frequent execution may cause thread and file handle exhaustion
- Parameters
mode – Mode to open file, either “rb” or “wb”
join_thread – Whether to wait, after function execution, until s3 finishes writing
- Returns
An opened BufferedReader / BufferedWriter object
-
megfile.s3_prefetch_open(s3_url: Union[str, os.PathLike], mode: str = 'rb', followlinks: bool = False, *, max_concurrency: Optional[int] = None, max_block_size: int = 8388608) → megfile.lib.s3_prefetch_reader.S3PrefetchReader[source] Open an asynchronous prefetch reader, to support fast sequential read and random read
Note
User should make sure that reader / writer are closed correctly
Supports context manager
Some parameter setting may perform well: max_concurrency=10 or 20, max_block_size=8 or 16 MB, default value None means using global thread pool
- Parameters
max_concurrency – Max download thread number, None by default
max_block_size – Max data size downloaded by each thread, in bytes, 8MB by default
- Returns
An opened S3PrefetchReader object
- Raises
S3FileNotFoundError
-
megfile.s3_remove(path: Union[str, os.PathLike], missing_ok: bool = False) → None[source] Remove the file or directory on s3. Removing s3:// or s3://bucket is not permitted
- Parameters
path – Given path
missing_ok – If False and the target file/directory does not exist, raise S3FileNotFoundError
- Raises
S3PermissionError, S3FileNotFoundError, UnsupportedError
-
megfile.s3_rename(src_url: Union[str, os.PathLike], dst_url: Union[str, os.PathLike])[source] Move s3 file path from src_url to dst_url
- Parameters
dst_url – Given destination path
-
megfile.s3_move(src_url: Union[str, os.PathLike], dst_url: Union[str, os.PathLike]) → None[source] Move file/directory path from src_url to dst_url
- Parameters
src_url – Given path
dst_url – Given destination path
-
megfile.s3_sync(src_url: Union[str, os.PathLike], dst_url: Union[str, os.PathLike], followlinks: bool = False, force: bool = False) → None[source] Copy file/directory on src_url to dst_url
- Parameters
src_url – Given path
dst_url – Given destination path
followlinks – False if regard symlink as file, else True
force – Force the sync; do not skip files that are the same
-
megfile.s3_save_as(file_object: BinaryIO, path: Union[str, os.PathLike])[source] Write the opened binary stream to specified path, but the stream won’t be closed
- Parameters
path – Given path
file_object – Stream to be read
-
megfile.s3_scan_stat(path: Union[str, os.PathLike], missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a tuple of path string and file stat
- Parameters
path – Given path
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Raises
UnsupportedError
- Returns
A file path generator
-
megfile.s3_scan(path: Union[str, os.PathLike], missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source] Iteratively traverse only files in given s3 directory, in alphabetical order. Every iteration on generator yields a path string.
If s3_url is a file path, yield the file only
If s3_url is a non-existent path, return an empty generator
If s3_url is a bucket path, return all file paths in the bucket
If s3_url is an empty bucket, return an empty generator
If s3_url doesn’t contain any bucket, which is s3_url == ‘s3://’, raise UnsupportedError. walk() on complete s3 is not supported in megfile
- Parameters
path – Given path
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Raises
UnsupportedError
- Returns
A file path generator
-
megfile.s3_scandir(path: Union[str, os.PathLike], followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Get all contents of given s3_url, the order of result is not guaranteed.
- Parameters
path – Given path
- Returns
All contents that have the prefix s3_url
- Raises
S3FileNotFoundError, S3NotADirectoryError
-
megfile.s3_stat(path: Union[str, os.PathLike], follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of s3_url file, including file size and mtime, referring to s3_getsize and s3_getmtime
If s3_url is not an existent path, which means s3_exists(s3_url) returns False, raise S3FileNotFoundError
If attempting to get StatResult of complete s3, such as s3_dir_url == ‘s3://’, raise S3BucketNotFoundError
- Parameters
path – Given path
- Returns
StatResult
- Raises
S3FileNotFoundError, S3BucketNotFoundError
-
megfile.s3_lstat(path: Union[str, os.PathLike]) → megfile.pathlike.StatResult[source] Like Path.stat() but, if the path points to a symbolic link, return the symbolic link’s information rather than its target’s.
-
megfile.s3_unlink(path: Union[str, os.PathLike], missing_ok: bool = False) → None[source] Remove the file on s3
- Parameters
path – Given path
missing_ok – If False and the target file does not exist, raise S3FileNotFoundError
- Raises
S3PermissionError, S3FileNotFoundError, S3IsADirectoryError
-
megfile.s3_upload(src_url: Union[str, os.PathLike], dst_url: Union[str, os.PathLike], callback: Optional[Callable[int, None]] = None, **kwargs) → None[source] Upload a file from the local filesystem to s3.
- Parameters
src_url – Source fs path
dst_url – Target s3 path
callback – Called periodically during copy, and the input parameter is the data size (in bytes) of copy since the last call
-
megfile.s3_walk(path: Union[str, os.PathLike], followlinks: bool = False) → Iterator[Tuple[str, List[str], List[str]]][source] Iteratively traverse the given s3 directory, in top-bottom order. In other words, firstly traverse parent directory, if subdirectories exist, traverse the subdirectories in alphabetical order. Every iteration on generator yields a 3-tuple: (root, dirs, files)
root: Current s3 path;
dirs: Name list of subdirectories in current directory. The list is sorted by name in ascending alphabetical order;
files: Name list of files in current directory. The list is sorted by name in ascending alphabetical order;
If s3_url is a file path, return an empty generator
If s3_url is a non-existent path, return an empty generator
If s3_url is a bucket path, bucket will be the top directory, and will be returned at first iteration of generator
If s3_url is an empty bucket, only yield one 3-tuple (note: s3 doesn’t have empty directories)
If s3_url doesn’t contain any bucket, which is s3_url == ‘s3://’, raise UnsupportedError. walk() on complete s3 is not supported in megfile
- Parameters
path – Given path
followlinks – The result is the same whether followlinks is True or False, because s3 symlinks do not support directories.
- Raises
UnsupportedError
- Returns
A 3-tuple generator
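The traversal order described above can be illustrated without touching s3 at all. The following is a minimal sketch, not megfile's implementation: `walk_keys` is a hypothetical helper that takes a flat list of object keys and yields (root, dirs, files) tuples top-down with both lists sorted. The exact form of `root` (here without a trailing slash) is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Set, Tuple


def walk_keys(bucket: str, keys: List[str]) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (root, dirs, files) top-down from a flat list of object keys.

    Hypothetical helper for illustration only; it does not call s3.
    """
    dirs: Dict[str, Set[str]] = defaultdict(set)
    files: Dict[str, Set[str]] = defaultdict(set)
    for key in keys:
        parts = key.split("/")
        # Register every intermediate prefix as a subdirectory of its parent.
        for depth in range(len(parts) - 1):
            dirs["/".join(parts[:depth])].add(parts[depth])
        files["/".join(parts[:-1])].add(parts[-1])

    def visit(prefix: str):
        root = f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"
        subdirs = sorted(dirs[prefix])
        # Parent first, then each subdirectory in ascending name order.
        yield root, subdirs, sorted(files[prefix])
        for name in subdirs:
            yield from visit(f"{prefix}/{name}" if prefix else name)

    yield from visit("")


if __name__ == "__main__":
    for entry in walk_keys("bucket", ["b/y.txt", "a/x.txt", "top.txt", "a/c/z.txt"]):
        print(entry)
```

With an empty key list the sketch yields exactly one tuple for the bucket itself, which matches the empty-bucket case listed above.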
-
megfile.s3_symlink(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → None[source] Create a symbolic link pointing to src_path named dst_path.
- Parameters
src_path – Given path
dst_path – Destination path
- Raises
S3NameTooLongError, S3BucketNotFoundError, S3IsADirectoryError
-
megfile.s3_readlink(path) → str[source] Return a string representing the path to which the symbolic link points.
- Returns
Return a string representing the path to which the symbolic link points.
- Raises
S3NameTooLongError, S3BucketNotFoundError, S3IsADirectoryError, S3NotALinkError
-
megfile.s3_concat(src_paths: List[Union[str, os.PathLike]], dst_path: Union[str, os.PathLike], block_size: int = 8388608, max_workers: int = 128) → None[source] Concatenate s3 files to one file.
- Parameters
src_paths – Given source paths
dst_path – Given destination path
-
megfile.is_fs(path: Union[PathLike, int]) → bool[source] Test if a path is fs path
- Parameters
path – Path to be tested
- Returns
True if a path is fs path, else False
-
megfile.fs_abspath(path: Union[str, os.PathLike]) → str[source] Return the absolute path of given path
- Parameters
path – Given path
- Returns
Absolute path of given path
-
megfile.fs_access(path: Union[str, os.PathLike], mode: megfile.pathlike.Access = <Access.READ: 1>) → bool[source] Test if path has access permission described by mode, using os.access
- Parameters
path – Given path
mode – access mode
- Returns
bool, if the path has the read/write access described by mode.
-
megfile.fs_exists(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if the path exists
Note
The difference between this function and os.path.exists is that this function regards symlink as file. In other words, this function is equal to os.path.lexists
- Parameters
path – Given path
followlinks – False if regard symlink as file, else True
- Returns
True if the path exists, else False
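The difference described in the note shows up most clearly with a dangling symlink. Below is a minimal stdlib sketch of that behaviour, not megfile's code: `exists` is a hypothetical stand-in, and mapping `followlinks=True` to `os.path.exists` is an assumption based on the parameter description. It needs a platform where creating symlinks is allowed.

```python
import os
import tempfile


def exists(path: str, followlinks: bool = False) -> bool:
    """Hypothetical stdlib analogue of the documented behaviour."""
    # lexists() treats the link itself as the file; exists() follows it.
    return os.path.exists(path) if followlinks else os.path.lexists(path)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "dangling")
        # The link target is never created, so the link is broken.
        os.symlink(os.path.join(tmp, "missing-target"), link)
        return exists(link), exists(link, followlinks=True)


if __name__ == "__main__":
    print(demo())
```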
-
megfile.fs_getmtime(path: Union[str, os.PathLike], follow_symlinks: bool = False) → float[source] Get last-modified time of the file on the given path (in Unix timestamp format). If the path is an existent directory, return the latest modified time of all files in it.
- Parameters
path – Given path
- Returns
last-modified time
-
megfile.fs_getsize(path: Union[str, os.PathLike], follow_symlinks: bool = False) → int[source] Get file size on the given file path (in bytes). If the path is a directory, return the sum of the sizes of all files in it, including files in subdirectories (if they exist). The result excludes the size of the directory itself. In other words, return 0 bytes on an empty directory path.
- Parameters
path – Given path
- Returns
File size
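The directory rule above (sum of file sizes, directory entries themselves excluded) can be sketched with the standard library. This is an illustration under stated assumptions rather than megfile's implementation: `tree_size` is a hypothetical helper, and using `os.lstat` so that symlinks are not followed is an assumption.

```python
import os
import tempfile


def tree_size(path: str) -> int:
    """Hypothetical helper: size of a file, or total size of files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            # Only files contribute; directory entries themselves add nothing.
            total += os.lstat(os.path.join(root, name)).st_size
    return total


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        empty = tree_size(tmp)
        os.makedirs(os.path.join(tmp, "sub"))
        with open(os.path.join(tmp, "a.bin"), "wb") as f:
            f.write(b"12345")
        with open(os.path.join(tmp, "sub", "b.bin"), "wb") as f:
            f.write(b"123")
        return empty, tree_size(tmp)


if __name__ == "__main__":
    print(demo())
```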
-
megfile.fs_glob_stat(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → Iterator[megfile.pathlike.FileEntry][source] Return a list contains tuples of path and file stat, in ascending alphabetical order, in which path matches glob pattern
- If the pattern doesn’t match any path, return an empty list
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
A list containing tuples of path and file stat for the paths that match pathname
-
megfile.fs_glob(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → List[str][source] Return path list in ascending alphabetical order, in which path matches glob pattern
- If the pattern doesn’t match any path, return an empty list
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
A list containing the paths that match pathname
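Two of the notes above, that hidden files are skipped and that results come back in ascending order, can be checked against the standard library's glob, which the notes use as the reference. The sketch below is an illustration only: `sorted_unique` is a hypothetical helper showing one way a caller might also drop the repeated paths that overlapping ** segments can produce, which is not something the documentation says fs_glob does.

```python
import glob
import os
import tempfile


def sorted_unique(paths):
    """Hypothetical helper: ascending order with repeated paths removed."""
    return sorted(set(paths))


def demo() -> list:
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("b.txt", "a.txt", ".hidden.txt"):
            open(os.path.join(tmp, name), "w").close()
        # A leading dot is not matched by * unless the pattern spells it out.
        matches = glob.glob(os.path.join(tmp, "**", "*.txt"), recursive=True)
        return [os.path.basename(p) for p in sorted_unique(matches)]


if __name__ == "__main__":
    print(demo())
```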
-
megfile.fs_iglob(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → Iterator[str][source] Return path iterator in ascending alphabetical order, in which path matches glob pattern
- If the pattern doesn’t match any path, return an empty list
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
An iterator over the paths that match pathname
-
megfile.fs_isabs(path: Union[str, os.PathLike]) → bool[source] Test whether a path is absolute
- Parameters
path – Given path
- Returns
True if a path is absolute, else False
-
megfile.fs_isdir(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if a path is directory
Note
The difference between this function and os.path.isdir is that this function regards symlink as file
- Parameters
path – Given path
followlinks – False if regard symlink as file, else True
- Returns
True if the path is a directory, else False
-
megfile.fs_isfile(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if a path is file
Note
The difference between this function and os.path.isfile is that this function regards symlink as file
- Parameters
path – Given path
followlinks – False if regard symlink as file, else True
- Returns
True if the path is a file, else False
-
megfile.fs_islink(path: Union[str, os.PathLike]) → bool[source] Test whether a path is a symbolic link
- Parameters
path – Given path
- Returns
If path is a symbolic link return True, else False
- Return type
bool
-
megfile.fs_ismount(path: Union[str, os.PathLike]) → bool[source] Test whether a path is a mount point
- Parameters
path – Given path
- Returns
True if a path is a mount point, else False
-
megfile.fs_listdir(path: Union[str, os.PathLike]) → List[str][source] Get all contents of given fs path. The result is in ascending alphabetical order.
- Parameters
path – Given path
- Returns
All contents in the path, in ascending alphabetical order
-
megfile.fs_load_from(path: Union[str, os.PathLike]) → BinaryIO[source] Read all content on specified path and write into memory
User should close the BinaryIO manually
- Parameters
path – Given path
- Returns
Binary stream
-
megfile.fs_makedirs(path: Union[str, os.PathLike], exist_ok: bool = False)[source] make a directory on fs, including parent directory
If there exists a file on the path, raise FileExistsError
- Parameters
path – Given path
exist_ok – If False and target directory exists, raise FileExistsError
- Raises
FileExistsError
-
megfile.fs_realpath(path: Union[str, os.PathLike]) → str[source] Return the real path of given path
- Parameters
path – Given path
- Returns
Real path of given path
-
megfile.fs_relpath(path: Union[str, os.PathLike], start: Optional[str] = None) → str[source] Return the relative path of given path
- Parameters
path – Given path
start – Given start directory
- Returns
Relative path from start
-
megfile.fs_remove(path: Union[str, os.PathLike], missing_ok: bool = False) → None[source] Remove the file or directory on fs
- Parameters
path – Given path
missing_ok – If False and the target file/directory does not exist, raise FileNotFoundError
-
megfile.fs_rename(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → None[source] rename file on fs
- Parameters
src_path – Given path
dst_path – Given destination path
-
megfile.fs_move(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → None[source] rename file on fs
- Parameters
src_path – Given path
dst_path – Given destination path
-
megfile.fs_sync(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike], followlinks: bool = False, force: bool = False) → None[source] Force write of everything to disk.
- Parameters
src_path – Given path
dst_path – Target file path
followlinks – False if regard symlink as file, else True
force – Force the sync; files that are already the same are not skipped
-
megfile.fs_save_as(file_object: BinaryIO, path: Union[str, os.PathLike])[source] Write the opened binary stream to path If parent directory of path doesn’t exist, it will be created.
- Parameters
path – Given path
file_object – stream to be read
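The behaviour described for this function (parent directories are created when missing, and the data comes from an already opened binary stream) can be sketched with the standard library. This is an illustration under assumptions, not megfile's code: `save_as` is a hypothetical helper, and leaving the input stream open afterwards mirrors what the s3_save_as entry states rather than anything stated here.

```python
import io
import os
import shutil
import tempfile
from typing import BinaryIO


def save_as(file_object: BinaryIO, path: str) -> None:
    """Hypothetical helper: write a binary stream to path, creating parents."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as dst:
        # Copy in chunks; the source stream is deliberately left open.
        shutil.copyfileobj(file_object, dst)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "new", "nested", "out.bin")
        stream = io.BytesIO(b"payload")
        save_as(stream, target)
        with open(target, "rb") as f:
            return f.read(), stream.closed


if __name__ == "__main__":
    print(demo())
```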
-
megfile.fs_scan_stat(path: Union[str, os.PathLike], missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a tuple of path string and file stat
- Parameters
path – Given path
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Returns
A file path generator
-
megfile.fs_scan(path: Union[str, os.PathLike], missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a path string.
If path is a file path, yield the file only
If path is a non-existent path, return an empty generator
If path is a bucket path, return all file paths in the bucket
- Parameters
path – Given path
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Returns
A file path generator
-
megfile.fs_scandir(path: Union[str, os.PathLike]) → Iterator[megfile.pathlike.FileEntry][source] Get all content of given file path.
- Parameters
path – Given path
- Returns
An iterator over all contents that have the prefix path
-
megfile.fs_stat(path: Union[str, os.PathLike], follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of file on fs, including file size and mtime, referring to fs_getsize and fs_getmtime
- Parameters
path – Given path
- Returns
StatResult
-
megfile.fs_lstat(path: Union[str, os.PathLike]) → megfile.pathlike.StatResult[source] Like Path.stat() but, if the path points to a symbolic link, return the symbolic link’s information rather than its target’s.
- Parameters
path – Given path
- Returns
StatResult
-
megfile.fs_unlink(path: Union[str, os.PathLike], missing_ok: bool = False) → None[source] Remove the file on fs
- Parameters
path – Given path
missing_ok – If False and the target file does not exist, raise FileNotFoundError
-
megfile.fs_walk(path: Union[str, os.PathLike], followlinks: bool = False) → Iterator[Tuple[str, List[str], List[str]]][source] Generate the file names in a directory tree by walking the tree top-down. For each directory in the tree rooted at directory path (including path itself), it yields a 3-tuple (root, dirs, files).
root: a string of current path
dirs: name list of subdirectories (excluding ‘.’ and ‘..’ if they exist) in ‘root’. The list is sorted by ascending alphabetical order
files: name list of non-directory files (link is regarded as file) in ‘root’. The list is sorted by ascending alphabetical order
If path does not exist, or path is a file (link is regarded as file), return an empty generator
Note
Be aware that setting followlinks to True can lead to infinite recursion if a link points to a parent directory of itself. fs_walk() does not keep track of the directories it visited already.
- Parameters
path – Given path
followlinks – False if regard symlink as file, else True
- Returns
A 3-tuple generator
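The rules above (top-down order, sorted dirs and files, links counted as files, empty generator for a file or missing path) can be sketched on top of os.walk. This is a hedged illustration, not megfile's implementation: `sorted_walk` is a hypothetical helper, and moving symlinked directories into `files` when followlinks is False is an assumption drawn from "link is regarded as file".

```python
import os
import tempfile


def sorted_walk(path: str, followlinks: bool = False):
    """Hypothetical helper: os.walk with sorted output and links treated as files."""
    if not os.path.isdir(path) or (os.path.islink(path) and not followlinks):
        return  # missing path, plain file, or link regarded as file
    for root, dirs, files in os.walk(path, followlinks=followlinks):
        if not followlinks:
            links = [d for d in dirs if os.path.islink(os.path.join(root, d))]
            dirs[:] = [d for d in dirs if d not in links]
            files = files + links
        dirs.sort()  # in place, so os.walk descends in sorted order
        yield root, list(dirs), sorted(files)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "b"))
        os.makedirs(os.path.join(tmp, "a"))
        for rel in ("top.txt", os.path.join("a", "z.txt"), os.path.join("a", "y.txt")):
            open(os.path.join(tmp, rel), "w").close()
        entries = [
            (os.path.relpath(root, tmp), dirs, files)
            for root, dirs, files in sorted_walk(tmp)
        ]
        on_file = list(sorted_walk(os.path.join(tmp, "top.txt")))
        return entries, on_file


if __name__ == "__main__":
    print(demo())
```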
-
megfile.fs_expanduser(path: Union[str, os.PathLike])[source] Expand ~ and ~user constructions. If user or $HOME is unknown, do nothing.
-
megfile.fs_resolve(path: Union[str, os.PathLike]) → str[source] Equal to fs_realpath, return the real path of given path
- Parameters
path – Given path
- Returns
Real path of given path
-
megfile.fs_getmd5(path: Union[str, os.PathLike], recalculate: bool = False, followlinks: bool = True)[source] Calculate the md5 value of the file
- Parameters
path – Given path
recalculate – Ignore this parameter, just for compatibility
followlinks – Ignore this parameter, just for compatibility
- Returns
md5 of file
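For readers who want to verify a value independently, an md5 can be computed in chunks with hashlib so that large files never need to fit in memory. This is a sketch only: `md5_of_stream` is a hypothetical helper, and the assumption that the documented function returns a hexadecimal digest string is not confirmed by the text above.

```python
import hashlib
import io
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hex md5 of a binary stream, read chunk by chunk."""
    digest = hashlib.md5()
    # iter() with a sentinel stops when read() returns empty bytes at EOF.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    print(md5_of_stream(io.BytesIO(b"hello")))
```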
-
megfile.fs_symlink(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → None[source] Create a symbolic link pointing to src_path named dst_path.
- Parameters
src_path – Given path
dst_path – Destination path
-
megfile.fs_readlink(path) → str[source] Return a string representing the path to which the symbolic link points.
- Returns
A string representing the path to which the symbolic link points.
-
megfile.is_http(path: Union[str, os.PathLike]) → bool[source] http scheme definition: http(s)://domain/path
- Parameters
path – Path to be tested
- Returns
True if path is http url, else False
-
megfile.http_open(path: Union[str, os.PathLike], mode: str = 'rb', *, encoding: Optional[str] = None, errors: Optional[str] = None, max_concurrency: Optional[int] = None, max_buffer_size: int = 134217728, forward_ratio: Optional[float] = None, block_size: int = 8388608, **kwargs) → Union[_io.BufferedReader, megfile.lib.http_prefetch_reader.HttpPrefetchReader][source] Open a BytesIO to read binary data of given http(s) url
Note
Essentially, it reads data of http(s) url to memory by requests, and then return BytesIO to user.
- Parameters
path – Given path
mode – Only supports ‘rb’ mode now
encoding – encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode.
errors – errors is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode.
max_concurrency – Max download thread number, None by default
max_buffer_size – Max cached buffer size in memory, 128MB by default
block_size – Size of single block, 8MB by default. Each block will be uploaded or downloaded by single thread.
- Returns
BytesIO initialized with http(s) data
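The default sizes above relate to each other by simple arithmetic: 134217728 bytes is 128 MiB, 8388608 bytes is 8 MiB, so a full buffer holds 16 blocks. The sketch below shows how a response of a given length could be cut into block-sized byte ranges. It is an illustration only: `block_ranges` is a hypothetical helper, and how megfile actually schedules or prefetches blocks is not stated in this reference.

```python
from typing import List, Tuple

BLOCK_SIZE = 8388608         # 8 MiB, the documented default block_size
MAX_BUFFER_SIZE = 134217728  # 128 MiB, the documented default max_buffer_size


def block_ranges(content_length: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Hypothetical helper: inclusive (start, end) byte ranges, one per block."""
    return [
        (start, min(start + block_size, content_length) - 1)
        for start in range(0, content_length, block_size)
    ]


if __name__ == "__main__":
    print(MAX_BUFFER_SIZE // BLOCK_SIZE)      # blocks that fit in the buffer
    print(block_ranges(20, block_size=8))     # small numbers for readability
```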
-
megfile.http_stat(path: Union[str, os.PathLike], follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of http_url response, including size and mtime, referring to http_getsize and http_getmtime
- Parameters
path – Given path
follow_symlinks – Ignore this parameter, just for compatibility
- Returns
StatResult
- Raises
HttpPermissionError, HttpFileNotFoundError
-
megfile.http_getsize(path: Union[str, os.PathLike], follow_symlinks: bool = False) → int[source] Get file size on the given http_url path.
If the http response headers don’t include Content-Length, return None
- Parameters
path – Given path
follow_symlinks – Ignore this parameter, just for compatibility
- Returns
File size (in bytes)
- Raises
HttpPermissionError, HttpFileNotFoundError
-
megfile.http_getmtime(path: Union[str, os.PathLike], follow_symlinks: bool = False) → float[source] Get Last-Modified time of the http request on the given http_url path.
If the http response headers don’t include Last-Modified, return None
- Parameters
path – Given path
follow_symlinks – Ignore this parameter, just for compatibility
- Returns
Last-Modified time (in Unix timestamp format)
- Raises
HttpPermissionError, HttpFileNotFoundError
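Both http_getsize and http_getmtime are described in terms of response headers, with None returned when the header is absent. A minimal stdlib sketch of that mapping follows. The helper names are hypothetical, the headers are modelled as a plain case-sensitive dict for simplicity, and this is not megfile's implementation.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def mtime_from_headers(headers: Mapping[str, str]) -> Optional[float]:
    """Hypothetical helper: Last-Modified as a Unix timestamp, or None."""
    value = headers.get("Last-Modified")
    if value is None:
        return None
    # HTTP dates use the RFC 5322 style format that email.utils can parse.
    return parsedate_to_datetime(value).timestamp()


def size_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Hypothetical helper: Content-Length in bytes, or None."""
    value = headers.get("Content-Length")
    return None if value is None else int(value)


if __name__ == "__main__":
    sample = {"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT", "Content-Length": "856960"}
    print(mtime_from_headers(sample), size_from_headers(sample))
```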
-
megfile.http_exists(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if http path exists
- Parameters
path – Given path
followlinks (bool, optional) – ignore this parameter, just for compatibility
- Returns
return True if exists
- Return type
bool
-
megfile.is_stdio(path: Union[str, os.PathLike]) → bool[source] stdio scheme definition: stdio://-
Note
Only tests protocol
- Parameters
path – Path to be tested
- Returns
True if a path is stdio url, else False
-
megfile.stdio_open(path: Union[str, os.PathLike], mode: str = 'rb', encoding: Optional[str] = None, errors: Optional[str] = None, **kwargs) → IO[AnyStr][source] Used to read or write stdio
Note
Essentially invoke sys.stdin.buffer | sys.stdout.buffer to read or write
- Parameters
path – Given path
mode – Only supports ‘rb’ and ‘wb’ now
- Returns
STDReader, STDWriter
-
megfile.is_sftp(path: Union[str, os.PathLike]) → bool[source] Test if a path is sftp path
- Parameters
path – Path to be tested
- Returns
True if a path is sftp path, else False
-
megfile.sftp_readlink(path: Union[str, os.PathLike]) → str[source] Return a SftpPath instance representing the path to which the symbolic link points.
- Parameters
path – Given path
- Returns
A SftpPath instance representing the path to which the symbolic link points.
-
megfile.sftp_absolute(path: Union[str, os.PathLike]) → megfile.sftp_path.SftpPath[source] Make the path absolute, without normalization or resolving symlinks. Returns a new path object
-
megfile.sftp_glob(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → List[str][source] Return path list in ascending alphabetical order, in which path matches glob pattern
- If the pattern doesn’t match any path, return an empty list
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
path – Given path
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
A list containing the paths that match pathname
-
megfile.sftp_iglob(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → Iterator[str][source] Return path iterator in ascending alphabetical order, in which path matches glob pattern
- If the pattern doesn’t match any path, return an empty list
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
path – Given path
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
An iterator over the paths that match pathname
-
megfile.sftp_glob_stat(path: Union[str, os.PathLike], recursive: bool = True, missing_ok: bool = True) → Iterator[megfile.pathlike.FileEntry][source] Return a list contains tuples of path and file stat, in ascending alphabetical order, in which path matches glob pattern
- If the pattern doesn’t match any path, return an empty list
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. sftp_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
path – Given path
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
A list containing tuples of path and file stat for the paths that match pathname
-
megfile.sftp_resolve(path: Union[str, os.PathLike], strict=False) → str[source] Equal to fs_realpath
- Parameters
path – Given path
strict – Ignore this parameter, just for compatibility
- Returns
Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path.
- Return type
-
megfile.sftp_isdir(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if a path is directory
Note
The difference between this function and os.path.isdir is that this function regards symlink as file
- Parameters
path – Given path
followlinks – False if regard symlink as file, else True
- Returns
True if the path is a directory, else False
-
megfile.sftp_exists(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if the path exists
- Parameters
path – Given path
followlinks – False if regard symlink as file, else True
- Returns
True if the path exists, else False
-
megfile.sftp_scandir(path: Union[str, os.PathLike]) → Iterator[megfile.pathlike.FileEntry][source] Get all content of given file path.
- Parameters
path – Given path
- Returns
An iterator over all contents that have the prefix path
-
megfile.sftp_getmtime(path: Union[str, os.PathLike], follow_symlinks: bool = False) → float[source] Get last-modified time of the file on the given path (in Unix timestamp format). If the path is an existent directory, return the latest modified time of all files in it.
- Parameters
path – Given path
- Returns
last-modified time
-
megfile.sftp_getsize(path: Union[str, os.PathLike], follow_symlinks: bool = False) → int[source] Get file size on the given file path (in bytes). If the path is a directory, return the sum of the sizes of all files in it, including files in subdirectories (if they exist). The result excludes the size of the directory itself. In other words, return 0 bytes on an empty directory path.
- Parameters
path – Given path
- Returns
File size
-
megfile.sftp_isfile(path: Union[str, os.PathLike], followlinks: bool = False) → bool[source] Test if a path is file
Note
The difference between this function and os.path.isfile is that this function regards symlink as file
- Parameters
path – Given path
followlinks – False if regard symlink as file, else True
- Returns
True if the path is a file, else False
-
megfile.sftp_listdir(path: Union[str, os.PathLike]) → List[str][source] Get all contents of given sftp path. The result is in ascending alphabetical order.
- Parameters
path – Given path
- Returns
All contents in the path, in ascending alphabetical order
-
megfile.sftp_load_from(path: Union[str, os.PathLike]) → BinaryIO[source] Read all content on specified path and write into memory
User should close the BinaryIO manually
- Parameters
path – Given path
- Returns
Binary stream
-
megfile.sftp_makedirs(path: Union[str, os.PathLike], mode=511, parents: bool = False, exist_ok: bool = False)[source] make a directory on sftp, including parent directory
If there exists a file on the path, raise FileExistsError
- Parameters
path – Given path
mode – If mode is given, it is combined with the process’ umask value to determine the file mode and access flags.
parents – If parents is true, any missing parents of this path are created as needed; if parents is false (the default), a missing parent raises FileNotFoundError.
exist_ok – If False and target directory exists, raise FileExistsError
- Raises
FileExistsError
-
megfile.sftp_realpath(path: Union[str, os.PathLike]) → str[source] Return the real path of given path
- Parameters
path – Given path
- Returns
Real path of given path
-
megfile.sftp_rename(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → megfile.sftp_path.SftpPath[source] rename file on sftp
- Parameters
src_path – Given path
dst_path – Given destination path
-
megfile.sftp_move(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → megfile.sftp_path.SftpPath[source] move file on sftp
- Parameters
src_path – Given path
dst_path – Given destination path
-
megfile.sftp_remove(path: Union[str, os.PathLike], missing_ok: bool = False) → None[source] Remove the file or directory on sftp
- Parameters
path – Given path
missing_ok – If False and the target file/directory does not exist, raise FileNotFoundError
-
megfile.sftp_scan(path: Union[str, os.PathLike], missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a path string.
If path is a file path, yield the file only
If path is a non-existent path, return an empty generator
If path is a bucket path, return all file paths in the bucket
- Parameters
path – Given path
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Returns
A file path generator
-
megfile.sftp_scan_stat(path: Union[str, os.PathLike], missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a tuple of path string and file stat
- Parameters
path – Given path
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Returns
A file path generator
-
megfile.sftp_stat(path: Union[str, os.PathLike], follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of file on sftp, including file size and mtime, referring to fs_getsize and fs_getmtime
- Parameters
path – Given path
- Returns
StatResult
-
megfile.sftp_lstat(path: Union[str, os.PathLike]) → megfile.pathlike.StatResult[source] Get StatResult of file on sftp, including file size and mtime, referring to fs_getsize and fs_getmtime
- Parameters
path – Given path
- Returns
StatResult
-
megfile.sftp_unlink(path: Union[str, os.PathLike], missing_ok: bool = False) → None[source] Remove the file on sftp
- Parameters
path – Given path
missing_ok – If False and the target file does not exist, raise FileNotFoundError
-
megfile.sftp_walk(path: Union[str, os.PathLike], followlinks: bool = False) → Iterator[Tuple[str, List[str], List[str]]][source] Generate the file names in a directory tree by walking the tree top-down. For each directory in the tree rooted at directory path (including path itself), it yields a 3-tuple (root, dirs, files).
root: a string of current path
dirs: name list of subdirectories (excluding ‘.’ and ‘..’ if they exist) in ‘root’. The list is sorted by ascending alphabetical order
files: name list of non-directory files (link is regarded as file) in ‘root’. The list is sorted by ascending alphabetical order
If path does not exist, or path is a file (link is regarded as file), return an empty generator
Note
Be aware that setting
followlinksto True can lead to infinite recursion if a link points to a parent directory of itself. fs_walk() does not keep track of the directories it visited already.- Parameters
path – Given path
followlinks – False if regard symlink as file, else True
- Returns
A 3-tuple generator
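The shape of the yielded 3-tuples can be sketched on a local directory with os.walk. This is only an approximation under stated assumptions: sorted_walk is a hypothetical helper, and os.walk lists symlinks to directories under dirs, which differs from the link handling described above.

```python
import os
import tempfile


def sorted_walk(top: str):
    """Hypothetical helper: top-down os.walk with dirs and files sorted by name."""
    for root, dirs, files in os.walk(top, topdown=True, followlinks=False):
        dirs.sort()   # sorting in place also fixes the traversal order
        files.sort()
        yield root, dirs, files


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "b"))
    os.makedirs(os.path.join(tmp, "a"))
    for name in ("z.txt", "y.txt"):
        open(os.path.join(tmp, name), "w").close()
    open(os.path.join(tmp, "a", "1.txt"), "w").close()

    # Store roots relative to tmp so the result is easy to read.
    result = [
        (os.path.relpath(root, tmp), list(dirs), list(files))
        for root, dirs, files in sorted_walk(tmp)
    ]
```

The top directory is yielded first, followed by its subdirectories in alphabetical order.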
-
megfile.sftp_path_join(path: Union[str, os.PathLike], *other_paths: Union[str, os.PathLike]) → str[source] Concatenate two or more paths into a complete path
- Parameters
path – Given path
other_paths – Paths to be concatenated
- Returns
Concatenated complete path
Note
The difference between this function and os.path.join is that this function ignores a leading slash (which indicates an absolute path) in other_paths and concatenates directly. e.g. os.path.join('/path', 'to', '/file') => '/file', but sftp_path_join('/path', 'to', '/file') => '/path/to/file'
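The contrast in the note can be reproduced without an SFTP server. The sketch below uses posixpath.join for the standard-library behavior and a hypothetical helper, join_ignoring_leading_slash, that only mimics the described result; it is not megfile's implementation.

```python
import posixpath


def join_ignoring_leading_slash(path: str, *other_paths: str) -> str:
    """Hypothetical helper: join components, ignoring leading slashes in later ones."""
    result = path
    for other in other_paths:
        # Drop the leading slash so the component cannot reset the result.
        result = result.rstrip("/") + "/" + other.lstrip("/")
    return result


std_result = posixpath.join("/path", "to", "/file")
sketch_result = join_ignoring_leading_slash("/path", "to", "/file")
```

Here std_result is '/file' because an absolute component discards everything before it, while sketch_result is '/path/to/file'.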
-
megfile.sftp_getmd5(path: Union[str, os.PathLike], recalculate: bool = False, followlinks: bool = True)[source] Calculate the md5 value of the file
- Parameters
path – Given path
recalculate – Ignore this parameter, just for compatibility
followlinks – Ignore this parameter, just for compatibility
- Returns
md5 of file
-
megfile.sftp_symlink(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike]) → None[source] Create a symbolic link pointing to src_path named dst_path.
- Parameters
src_path – Given path
dst_path – Destination path
-
megfile.sftp_islink(path: Union[str, os.PathLike]) → bool[source] Test whether a path is a symbolic link
- Parameters
path – Given path
- Returns
If path is a symbolic link return True, else False
- Return type
bool
-
megfile.sftp_save_as(file_object: BinaryIO, path: Union[str, os.PathLike])[source] Write the opened binary stream to path. If the parent directory of path doesn’t exist, it will be created.
- Parameters
path – Given path
file_object – stream to be read
-
megfile.sftp_open(path: Union[str, os.PathLike], mode: str = 'r', buffering=-1, encoding: Optional[str] = None, errors: Optional[str] = None, **kwargs) → IO[AnyStr][source] Open a file on the path.
- Parameters
path – Given path
mode – Mode to open file
buffering – buffering is an optional integer used to set the buffering policy.
encoding – encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode.
errors – errors is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode.
- Returns
File-Like object
-
megfile.sftp_chmod(path: Union[str, os.PathLike], mode: int, follow_symlinks: bool = True)[source] Change the file mode and permissions, like os.chmod().
- Parameters
path – Given path
mode – the file mode you want to change
follow_symlinks – Ignore this parameter, just for compatibility
-
megfile.sftp_rmdir(path: Union[str, os.PathLike])[source] Remove this directory. The directory must be empty.
-
megfile.sftp_copy(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike], callback: Optional[Callable[int, None]] = None, followlinks: bool = False)[source] Copy the file to the given destination path.
- Parameters
src_path – Given path
dst_path – The destination path to copy the file to.
callback – An optional callback function that takes an integer parameter and is called periodically during the copy operation to report the number of bytes copied.
followlinks – Whether to follow symbolic links when copying directories.
- Raises
IsADirectoryError – If the source is a directory.
OSError – If there is an error copying the file.
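A progress callback of this kind can be sketched with in-memory streams. The example assumes the callback receives the number of bytes written since the previous call (as the smart_copy documentation describes); copy_stream and chunk_size are hypothetical names, not part of megfile.

```python
import io
from typing import BinaryIO, Callable, Optional


def copy_stream(
    src: BinaryIO,
    dst: BinaryIO,
    callback: Optional[Callable[[int], None]] = None,
    chunk_size: int = 4096,
) -> None:
    """Hypothetical helper: chunked copy that reports progress after each write."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        if callback is not None:
            # Report only the bytes written since the previous call.
            callback(len(chunk))


reported = []
source = io.BytesIO(b"x" * 10000)
target = io.BytesIO()
copy_stream(source, target, callback=reported.append)
```

Summing the reported increments gives the total number of bytes copied, which is what a progress bar needs.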
-
megfile.sftp_sync(src_path: Union[str, os.PathLike], dst_path: Union[str, os.PathLike], followlinks: bool = False, force: bool = False)[source] Copy file/directory from src_path to dst_path
- Parameters
src_path – Given path
dst_path – Given destination path
followlinks – False to regard a symlink as a file, else True
force – Force the sync; do not skip files that are the same
-
megfile.sftp_concat(src_paths: List[Union[str, os.PathLike]], dst_path: Union[str, os.PathLike]) → None[source] Concatenate sftp files to one file.
- Parameters
src_paths – Given source paths
dst_path – Given destination path
-
class
megfile.S3Path(path: Union[str, os.PathLike], *other_paths: Union[str, os.PathLike])[source] Bases:
megfile.pathlike.URIPath
-
absolute() → megfile.s3_path.S3Path[source] Make the path absolute, without normalization or resolving symlinks. Returns a new path object
-
access(mode: megfile.pathlike.Access = <Access.READ: 1>, followlinks: bool = False) → bool[source] Test if path has access permission described by mode
- Parameters
mode – access mode
- Returns
bool, if the bucket of s3_url has read/write access.
-
copy(dst_url: Union[str, os.PathLike], followlinks: bool = False, callback: Optional[Callable[int, None]] = None) → None[source] File copy on S3. Copy content of file on src_path to dst_path. It’s the caller’s responsibility to ensure that s3_isfile(src_url) == True
- Parameters
dst_url – Target file path
callback – Called periodically during copy, and the input parameter is the data size (in bytes) of copy since the last call
-
cwd() → megfile.s3_path.S3Path[source] Return current working directory
- Returns
Current working directory
-
exists(followlinks: bool = False) → bool[source] Test if s3_url exists
If the bucket of s3_url is not permitted to be read, return False
- Returns
True if s3_url exists, else False
-
getmtime(follow_symlinks: bool = False) → float[source] Get last-modified time of the file on the given s3_url path (in Unix timestamp format). If the path is an existing directory, return the latest modified time of all files in it. The mtime of an empty directory is 1970-01-01 00:00:00
If s3_url is not an existent path, which means s3_exist(s3_url) returns False, then raise S3FileNotFoundError
- Returns
Last-modified time
- Raises
S3FileNotFoundError, UnsupportedError
-
getsize(follow_symlinks: bool = False) → int[source] Get file size on the given s3_url path (in bytes). If the path is a directory, return the sum of all file sizes in it, including files in subdirectories (if any exist). The result excludes the size of the directory itself. In other words, return 0 bytes for an empty directory path.
If s3_url is not an existent path, which means s3_exist(s3_url) returns False, then raise S3FileNotFoundError
- Returns
File size
- Raises
S3FileNotFoundError, UnsupportedError
-
glob(pattern, recursive: bool = True, missing_ok: bool = True, followlinks: bool = False) → List[megfile.s3_path.S3Path][source] Return s3 path list in ascending alphabetical order, in which path matches glob pattern. Notes: Only glob in bucket. If trying to match bucket with wildcard characters, raise UnsupportedError
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Raises
UnsupportedError, when bucket part contains wildcard characters
- Returns
A list of paths that match s3_pathname
-
glob_stat(pattern, recursive: bool = True, missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Return a generator of tuples of path and file stat, in ascending alphabetical order, in which path matches glob pattern. Notes: Only glob in bucket. If trying to match bucket with wildcard characters, raise UnsupportedError
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Raises
UnsupportedError, when bucket part contains wildcard characters
- Returns
A generator of tuples of path and file stat, in which paths match s3_pathname
-
hasbucket() → bool[source] Test if the bucket of s3_url exists
- Returns
True if bucket of s3_url exists, else False
-
iglob(pattern, recursive: bool = True, missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.s3_path.S3Path][source] Return s3 path iterator in ascending alphabetical order, in which path matches glob pattern. Notes: Only glob in bucket. If trying to match bucket with wildcard characters, raise UnsupportedError
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Raises
UnsupportedError, when bucket part contains wildcard characters
- Returns
An iterator of paths that match s3_pathname
-
is_dir(followlinks: bool = False) → bool[source] Test if an s3 url is a directory. Specific procedures are as follows:
If there exists a suffix for which os.path.join(s3_url, suffix) is a file
If the url is an empty bucket or s3://
- Parameters
followlinks – The result is the same whether followlinks is True or False, because s3 symlinks do not support directories.
- Returns
True if path is s3 directory, else False
-
is_file(followlinks: bool = False) → bool[source] Test if an s3_url is file
- Returns
True if path is s3 file, else False
-
is_symlink() → bool[source] Test whether a path is link
- Returns
True if a path is link, else False
- Raises
S3NotALinkError
-
iterdir(followlinks: bool = False) → Iterator[megfile.s3_path.S3Path][source] Get all contents of given s3_url. The result is in ascending alphabetical order.
- Returns
All contents that have the prefix s3_url, in ascending alphabetical order
- Raises
S3FileNotFoundError, S3NotADirectoryError
-
listdir(followlinks: bool = False) → List[str][source] Get all contents of given s3_url. The result is in ascending alphabetical order.
- Returns
All contents that have the prefix s3_url, in ascending alphabetical order
- Raises
S3FileNotFoundError, S3NotADirectoryError
-
load(followlinks: bool = False) → BinaryIO[source] Read all content in binary on specified path and write into memory
User should close the BinaryIO manually
- Returns
BinaryIO
-
lstat() → megfile.pathlike.StatResult[source] Like Path.stat() but, if the path points to a symbolic link, return the symbolic link’s information rather than its target’s.
-
md5(recalculate: bool = False, followlinks: bool = False) → str[source] Get md5 meta info in files that uploaded/copied via megfile
If meta info is lost or non-existent, return None
- Parameters
recalculate – calculate md5 in real-time or return s3 etag
followlinks – If True, calculate md5 for the real file
- Returns
md5 meta info
-
mkdir(mode=511, parents: bool = False, exist_ok: bool = False)[source] Create an s3 directory. Purely creating a directory is invalid because it’s unavailable on OSS. This function only tests whether the target bucket has WRITE access.
- Parameters
mode – Ignored; present only for compatibility with pathlib.Path
parents – Ignored; present only for compatibility with pathlib.Path
exist_ok – If False and target directory exists, raise S3FileExistsError
- Raises
S3BucketNotFoundError, S3FileExistsError
-
move(dst_url: Union[str, os.PathLike]) → None[source] Move file/directory path from src_url to dst_url
- Parameters
dst_url – Given destination path
-
open(mode: str = 'r', *, encoding: Optional[str] = None, errors: Optional[str] = None, s3_open_func: Callable[[str, str], BinaryIO] = <function s3_buffered_open>, **kwargs) → IO[AnyStr][source] Open the file with mode.
-
path_with_protocol[source] Return path with protocol, like file:///root, s3://bucket/key
-
path_without_protocol[source] Return path without protocol, example: if path is s3://bucket/key, return bucket/key
-
protocol= 's3'
-
readlink() → megfile.s3_path.S3Path[source] Return a S3Path instance representing the path to which the symbolic link points.
- Returns
Return a S3Path instance representing the path to which the symbolic link points.
- Raises
S3NameTooLongError, S3BucketNotFoundError, S3IsADirectoryError, S3NotALinkError
-
remove(missing_ok: bool = False) → None[source] Remove the file or directory on s3, s3:// and s3://bucket are not permitted to remove
- Parameters
missing_ok – If False and the target file/directory does not exist, raise S3FileNotFoundError
- Raises
S3PermissionError, S3FileNotFoundError, UnsupportedError
-
rename(dst_path: Union[str, os.PathLike]) → megfile.s3_path.S3Path[source] Move s3 file path from src_url to dst_url
- Parameters
dst_path – Given destination path
-
save(file_object: BinaryIO)[source] Write the opened binary stream to specified path, but the stream won’t be closed
- Parameters
file_object – Stream to be read
-
scan(missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source] Iteratively traverse only files in given s3 directory, in alphabetical order. Every iteration on generator yields a path string.
If s3_url is a file path, yield the file only. If s3_url is a non-existent path, return an empty generator. If s3_url is a bucket path, return all file paths in the bucket. If s3_url is an empty bucket, return an empty generator. If s3_url doesn’t contain any bucket, which is s3_url == ‘s3://’, raise UnsupportedError. walk() on complete s3 is not supported in megfile
- Parameters
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Raises
UnsupportedError
- Returns
A file path generator
-
scan_stat(missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a tuple of path string and file stat
- Parameters
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Raises
UnsupportedError
- Returns
A file path generator
-
scandir(followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Get all contents of given s3_url, the order of result is not guaranteed.
- Returns
All contents have prefix of s3_url
- Raises
S3FileNotFoundError, S3NotADirectoryError
-
stat(follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of s3_url file, including file size and mtime, referring to s3_getsize and s3_getmtime
If s3_url is not an existent path, which means s3_exist(s3_url) returns False, then raise S3FileNotFoundError. If attempting to get StatResult of complete s3, such as s3_dir_url == ‘s3://’, raise S3BucketNotFoundError
- Returns
StatResult
- Raises
S3FileNotFoundError, S3BucketNotFoundError
-
symlink(dst_path: Union[str, os.PathLike]) → None[source] Create a symbolic link pointing to src_path named dst_path.
- Parameters
dst_path – Destination path
- Raises
S3NameTooLongError, S3BucketNotFoundError, S3IsADirectoryError
-
sync(dst_url: Union[str, os.PathLike], followlinks: bool = False, force: bool = False) → None[source] Copy file/directory on src_url to dst_url
- Parameters
dst_url – Given destination path
followlinks – False if regard symlink as file, else True
force – Force the sync; do not skip files that are the same
-
unlink(missing_ok: bool = False) → None[source] Remove the file on s3
- Parameters
missing_ok – If False and the target file does not exist, raise S3FileNotFoundError
- Raises
S3PermissionError, S3FileNotFoundError, S3IsADirectoryError
-
walk(followlinks: bool = False) → Iterator[Tuple[str, List[str], List[str]]][source] Iteratively traverse the given s3 directory in top-down order. In other words, first traverse the parent directory and, if subdirectories exist, traverse the subdirectories in alphabetical order. Every iteration on the generator yields a 3-tuple: (root, dirs, files)
root: Current s3 path;
dirs: Name list of subdirectories in current directory. The list is sorted by name in ascending alphabetical order;
files: Name list of files in current directory. The list is sorted by name in ascending alphabetical order;
If s3_url is a file path, return an empty generator. If s3_url is a non-existent path, return an empty generator. If s3_url is a bucket path, the bucket will be the top directory, and will be returned at the first iteration of the generator. If s3_url is an empty bucket, only yield one 3-tuple (notes: s3 doesn’t have empty directories). If s3_url doesn’t contain any bucket, which is s3_url == ‘s3://’, raise UnsupportedError. walk() on complete s3 is not supported in megfile
- Parameters
followlinks – The result is the same whether followlinks is True or False, because s3 symlinks do not support directories.
- Raises
UnsupportedError
- Returns
A 3-tuple generator
-
-
class
megfile.FSPath(path: Union[PathLike, int], *other_paths: Union[str, os.PathLike])[source] Bases:
megfile.pathlike.URIPath
file protocol, e.g. file:///data/test/ or /data/test
-
absolute() → megfile.fs_path.FSPath[source] Make the path absolute, without normalization or resolving symlinks. Returns a new path object
-
access(mode: megfile.pathlike.Access = <Access.READ: 1>) → bool[source] Test if path has access permission described by mode, using os.access
- Parameters
mode – access mode
- Returns
bool, whether the path has the read/write access described by mode.
-
chmod(mode: int, *, follow_symlinks: bool = True)[source] Change the file mode and permissions, like os.chmod().
This method normally follows symlinks. Some Unix flavours support changing permissions on the symlink itself; on these platforms you may add the argument follow_symlinks=False, or use lchmod().
-
copy(dst_path: Union[str, os.PathLike], callback: Optional[Callable[int, None]] = None, followlinks: bool = False)[source] File copy on file system. Copy content (excluding metadata) of file on src_path to dst_path. dst_path must be a complete file name
Note
The differences between this function and shutil.copyfile are:
If parent directory of dst_path doesn’t exist, create it
Allow a callback function, None by default. callback: Optional[Callable[[int], None]], where the int is the size (in bytes) of the written data, passed periodically
This function is thread-unsafe
- Parameters
dst_path – Target file path
callback – Called periodically during copy, and the input parameter is the data size (in bytes) of copy since the last call
followlinks – False if regard symlink as file, else True
-
cwd() → megfile.fs_path.FSPath[source] Return current working directory
- Returns
Current working directory
-
exists(followlinks: bool = False) → bool[source] Test if the path exists
Note
The difference between this function and os.path.exists is that this function regards a symlink as a file. In other words, this function is equal to os.path.lexists
- Parameters
followlinks – False if regard symlink as file, else True
- Returns
True if the path exists, else False
-
getmtime(follow_symlinks: bool = False) → float[source] Get last-modified time of the file on the given path (in Unix timestamp format). If the path is an existing directory, return the latest modified time of all files in it.
- Returns
last-modified time
-
getsize(follow_symlinks: bool = False) → int[source] Get file size on the given file path (in bytes). If the path is a directory, return the sum of all file sizes in it, including files in subdirectories (if any exist). The result excludes the size of the directory itself. In other words, return 0 bytes for an empty directory path.
- Returns
File size
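The directory rule (sum of file sizes, subdirectories included, 0 for an empty directory) can be sketched with os.walk. directory_size is a hypothetical helper written for illustration and is not how megfile computes the value.

```python
import os
import tempfile


def directory_size(top: str) -> int:
    """Hypothetical helper: total bytes of all non-directory entries under top."""
    total = 0
    for root, _dirs, files in os.walk(top):
        for name in files:
            # lstat so that a symlink counts as itself, not as its target.
            total += os.lstat(os.path.join(root, name)).st_size
    return total


with tempfile.TemporaryDirectory() as tmp:
    empty_size = directory_size(tmp)
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.bin"), "wb") as f:
        f.write(b"12345")
    with open(os.path.join(tmp, "sub", "b.bin"), "wb") as f:
        f.write(b"123")
    tree_size = directory_size(tmp)
```

The empty directory contributes 0 bytes, and the two files add up to 8 bytes regardless of nesting.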
-
glob(pattern, recursive: bool = True, missing_ok: bool = True) → List[megfile.fs_path.FSPath][source] Return path list in ascending alphabetical order, in which path matches glob pattern
- If no path matches, return an empty list
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in result is different, which means:
Assume there exists a path /a/b/c/b/d.txt; when globbing with a path pattern like /**/b/**/*.txt, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
A list of paths that match pathname
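Because a result may contain the same path more than once, callers that need unique paths can de-duplicate. The sketch below uses glob.glob from the standard library on a temporary tree shaped like the /a/b/c/b/d.txt case; it makes no claim about how many times a particular implementation returns the file.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "a", "b", "c", "b")
    os.makedirs(nested)
    target = os.path.join(nested, "d.txt")
    open(target, "w").close()

    pattern = os.path.join(tmp, "**", "b", "**", "*.txt")
    matches = glob.glob(pattern, recursive=True)
    # Normalize each match, then de-duplicate with a set and sort.
    unique = sorted({os.path.normpath(m) for m in matches})
    expected = [os.path.normpath(target)]
```

After de-duplication, unique holds exactly one entry for d.txt.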
-
glob_stat(pattern, recursive: bool = True, missing_ok: bool = True) → Iterator[megfile.pathlike.FileEntry][source] Return an iterator of tuples of path and file stat, in ascending alphabetical order, in which path matches glob pattern
- If no path matches, return an empty iterator
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in result is different, which means:
Assume there exists a path /a/b/c/b/d.txt; when globbing with a path pattern like /**/b/**/*.txt, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
An iterator of tuples of path and file stat, in which paths match pathname
-
group() → str[source] Return the name of the group owning the file. KeyError is raised if the file’s gid isn’t found in the system database.
-
iglob(pattern, recursive: bool = True, missing_ok: bool = True) → Iterator[megfile.fs_path.FSPath][source] Return path iterator in ascending alphabetical order, in which path matches glob pattern
- If no path matches, return an empty iterator
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in result is different, which means:
Assume there exists a path /a/b/c/b/d.txt; when globbing with a path pattern like /**/b/**/*.txt, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
An iterator of paths that match pathname
-
is_absolute() → bool[source] Test whether a path is absolute
- Returns
True if a path is absolute, else False
-
is_block_device() → bool[source] Return True if the path points to a block device (or a symbolic link pointing to a block device), False if it points to another kind of file.
False is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated.
-
is_char_device() → bool[source] Return True if the path points to a character device (or a symbolic link pointing to a character device), False if it points to another kind of file.
False is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated.
-
is_dir(followlinks: bool = False) → bool[source] Test if a path is directory
Note
The difference between this function and os.path.isdir is that this function regards a symlink as a file
- Parameters
followlinks – False if regard symlink as file, else True
- Returns
True if the path is a directory, else False
-
is_fifo() → bool[source] Return True if the path points to a FIFO (or a symbolic link pointing to a FIFO), False if it points to another kind of file.
False is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated.
-
is_file(followlinks: bool = False) → bool[source] Test if a path is file
Note
The difference between this function and os.path.isfile is that this function regards a symlink as a file
- Parameters
followlinks – False if regard symlink as file, else True
- Returns
True if the path is a file, else False
-
is_mount() → bool[source] Test whether a path is a mount point
- Returns
True if a path is a mount point, else False
-
is_socket() → bool[source] Return True if the path points to a Unix socket (or a symbolic link pointing to a Unix socket), False if it points to another kind of file.
False is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated.
-
is_symlink() → bool[source] Test whether a path is a symbolic link
- Returns
If path is a symbolic link return True, else False
- Return type
bool
-
iterdir() → Iterator[megfile.fs_path.FSPath][source] Get all contents of given fs path. The result is in ascending alphabetical order.
- Returns
All contents in the path, in ascending alphabetical order
-
joinpath(*other_paths: Union[str, os.PathLike]) → megfile.fs_path.FSPath[source] Calling this method is equivalent to combining the path with each of the other arguments in turn
-
listdir() → List[str][source] Get all contents of given fs path. The result is in ascending alphabetical order.
- Returns
All contents in the path, in ascending alphabetical order
-
load() → BinaryIO[source] Read all content on specified path and write into memory
User should close the BinaryIO manually
- Returns
Binary stream
-
lstat() → megfile.pathlike.StatResult[source] Like Path.stat() but, if the path points to a symbolic link, return the symbolic link’s information rather than its target’s.
- Returns
StatResult
-
md5(recalculate: bool = False, followlinks: bool = True)[source] Calculate the md5 value of the file
- Parameters
recalculate – Ignore this parameter, just for compatibility
followlinks – Ignore this parameter, just for compatibility
- Returns
md5 of file
-
mkdir(mode=511, parents: bool = False, exist_ok: bool = False)[source] make a directory on fs, including parent directory
If there exists a file on the path, raise FileExistsError
- Parameters
mode – If mode is given, it is combined with the process’ umask value to determine the file mode and access flags.
parents – If parents is true, any missing parents of this path are created as needed; if parents is false (the default), a missing parent raises FileNotFoundError.
exist_ok – If False and target directory exists, raise FileExistsError
- Raises
FileExistsError
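The default mode=511 in the signature is the decimal form of 0o777. How a requested mode combines with the process umask can be worked through as plain arithmetic; the umask value 0o022 below is an assumption chosen for illustration, not something megfile sets.

```python
requested_mode = 511            # default from the signature, i.e. 0o777
umask = 0o022                   # a common umask, assumed for illustration

# Bits set in the umask are cleared from the requested mode.
effective_mode = requested_mode & ~umask
effective_octal = oct(effective_mode)
```

Under that assumed umask the directory ends up with mode 0o755.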
-
open(mode: str = 'r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, **kwargs) → IO[AnyStr][source] Open the file with mode.
-
owner() → str[source] Return the name of the user owning the file. KeyError is raised if the file’s uid isn’t found in the system database.
-
property
path_with_protocol Return path with protocol, like file:///root, s3://bucket/key
-
protocol= 'file'
-
readlink() → megfile.fs_path.FSPath[source] Return a FSPath instance representing the path to which the symbolic link points.
- Returns
Return a FSPath instance representing the path to which the symbolic link points.
-
relpath(start: Optional[str] = None) → str[source] Return the relative path of given path
- Parameters
start – Given start directory
- Returns
Relative path from start
-
remove(missing_ok: bool = False) → None[source] Remove the file or directory on fs
- Parameters
missing_ok – If False and the target file/directory does not exist, raise FileNotFoundError
-
rename(dst_path: Union[str, os.PathLike]) → megfile.fs_path.FSPath[source] rename file on fs
- Parameters
dst_path – Given destination path
-
replace(dst_path: Union[str, os.PathLike]) → megfile.fs_path.FSPath[source] move file on fs
- Parameters
dst_path – Given destination path
-
resolve(strict=False) → megfile.fs_path.FSPath[source] Equal to fs_realpath
- Returns
Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path.
- Return type
megfile.fs_path.FSPath
-
save(file_object: BinaryIO)[source] Write the opened binary stream to path. If the parent directory of path doesn’t exist, it will be created.
- Parameters
file_object – stream to be read
-
scan(missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a path string.
If path is a file path, yield the file only. If path is a non-existent path, return an empty generator. If path is a bucket path, return all file paths in the bucket.
- Parameters
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Returns
A file path generator
-
scan_stat(missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a tuple of path string and file stat
- Parameters
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Returns
A file path generator
-
scandir() → Iterator[megfile.pathlike.FileEntry][source] Get all content of given file path.
- Returns
An iterator of all contents that have the prefix path
-
stat(follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of file on fs, including file size and mtime, referring to fs_getsize and fs_getmtime
- Returns
StatResult
-
symlink(dst_path: Union[str, os.PathLike]) → None[source] Create a symbolic link pointing to src_path named dst_path.
- Parameters
dst_path – Destination path
-
sync(dst_path: Union[str, os.PathLike], followlinks: bool = False, force: bool = False) → None[source] Force write of everything to disk.
- Parameters
dst_path – Target file path
followlinks – False if regard symlink as file, else True
force – Force the sync; do not skip files that are the same
-
unlink(missing_ok: bool = False) → None[source] Remove the file on fs
- Parameters
missing_ok – If False and the target file does not exist, raise FileNotFoundError
-
utime(atime: Union[float, int], mtime: Union[float, int])[source] Set the access and modified times of the file specified by path.
- Parameters
atime – a float or int representing the access time to be set. If it is set to None, the access time is set to the current time.
mtime – a float or int representing the modified time to be set. If it is set to None, the modified time is set to the current time.
- Returns
None
-
walk(followlinks: bool = False) → Iterator[Tuple[str, List[str], List[str]]][source] Generate the file names in a directory tree by walking the tree top-down. For each directory in the tree rooted at directory path (including path itself), it yields a 3-tuple (root, dirs, files).
root: a string of the current path
dirs: name list of subdirectories (excluding ‘.’ and ‘..’ if they exist) in ‘root’. The list is sorted in ascending alphabetical order
files: name list of non-directory files (a link is regarded as a file) in ‘root’. The list is sorted in ascending alphabetical order
If path does not exist, or path is a file (a link is regarded as a file), return an empty generator
Note
Be aware that setting followlinks to True can lead to infinite recursion if a link points to a parent directory of itself. fs_walk() does not keep track of the directories it visited already.
- Parameters
followlinks – False if regard symlink as file, else True
- Returns
A 3-tuple generator
-
-
class megfile.HttpPath(path: Union[str, os.PathLike], *other_paths: Union[str, os.PathLike])[source]
Bases: megfile.pathlike.URIPath
-
exists(followlinks: bool = False) → bool[source] Test if http path exists
- Parameters
followlinks (bool, optional) – ignore this parameter, just for compatibility
- Returns
True if the path exists
- Return type
bool
-
getmtime(follow_symlinks: bool = False) → float[source] Get Last-Modified time of the http request on the given http_url path.
If the HTTP response headers do not include Last-Modified, None is returned
- Parameters
follow_symlinks – Ignore this parameter, just for compatibility
- Returns
Last-Modified time (in Unix timestamp format)
- Raises
HttpPermissionError, HttpFileNotFoundError
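Converting a Last-Modified header into a Unix timestamp, with None as the fallback, can be sketched with the standard library as follows. How megfile parses the header internally is not stated here, so the parsing call below is an assumption; last_modified_timestamp is a hypothetical helper.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def last_modified_timestamp(headers: Mapping[str, str]) -> Optional[float]:
    """Return Last-Modified as a Unix timestamp, or None if the header is absent."""
    value = headers.get("Last-Modified")
    if not value:
        return None
    # HTTP dates use the RFC 5322 style format that email.utils understands.
    return parsedate_to_datetime(value).timestamp()


with_header = last_modified_timestamp(
    {"Last-Modified": "Wed, 21 Oct 2015 07:28:00 GMT"}
)
without_header = last_modified_timestamp({"Content-Type": "text/plain"})
```

A plain dict is case-sensitive, whereas real HTTP header mappings usually are not; callers of this sketch would need to pass the header name exactly as written.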
-
getsize(follow_symlinks: bool = False) → int[source] Get file size on the given http_url path.
If the HTTP response headers do not include Content-Length, None is returned
- Parameters
follow_symlinks – Ignore this parameter, just for compatibility
- Returns
File size (in bytes)
- Raises
HttpPermissionError, HttpFileNotFoundError
-
open(mode: str = 'rb', *, max_concurrency: Optional[int] = None, max_buffer_size: int = 134217728, forward_ratio: Optional[float] = None, block_size: int = 8388608, **kwargs) → Union[_io.BufferedReader, megfile.lib.http_prefetch_reader.HttpPrefetchReader][source] Open a BytesIO to read binary data of given http(s) url
Note
Essentially, it reads the data of the http(s) url into memory using requests, and then returns a BytesIO to the user.
- Parameters
mode – Only supports ‘rb’ mode now
encoding – encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode.
errors – errors is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode.
max_concurrency – Max download thread number, None by default
max_buffer_size – Max cached buffer size in memory, 128MB by default
block_size – Size of single block, 8MB by default. Each block will be uploaded or downloaded by single thread.
- Returns
BytesIO initialized with http(s) data
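The relationship between the two size defaults is simple arithmetic: a 128MB buffer holds sixteen 8MB blocks. The sketch below also shows one way a file could be divided into block-sized byte ranges; the exact splitting strategy of HttpPrefetchReader is an assumption here, and block_ranges is a hypothetical helper.

```python
from typing import List, Tuple

DEFAULT_BLOCK_SIZE = 8 * 2**20         # 8388608, the documented default
DEFAULT_MAX_BUFFER_SIZE = 128 * 2**20  # 134217728, the documented default


def block_ranges(
    content_size: int, block_size: int = DEFAULT_BLOCK_SIZE
) -> List[Tuple[int, int]]:
    """Split [0, content_size) into inclusive (first_byte, last_byte) ranges."""
    return [
        (start, min(start + block_size, content_size) - 1)
        for start in range(0, content_size, block_size)
    ]


blocks_in_buffer = DEFAULT_MAX_BUFFER_SIZE // DEFAULT_BLOCK_SIZE
ranges_for_20mb = block_ranges(20 * 2**20)
```

Each pair has the inclusive form used by an HTTP Range header such as bytes=0-8388607.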
-
protocol= 'http'
-
stat(follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of http_url response, including size and mtime, referring to http_getsize and http_getmtime
- Parameters
follow_symlinks – Ignore this parameter, just for compatibility
- Returns
StatResult
- Raises
HttpPermissionError, HttpFileNotFoundError
-
-
class megfile.HttpsPath(path: Union[str, os.PathLike], *other_paths: Union[str, os.PathLike])[source]
Bases: megfile.http_path.HttpPath
-
protocol= 'https'
-
-
class megfile.StdioPath(path: Union[str, os.PathLike])[source]
Bases: megfile.pathlike.BaseURIPath
-
open(mode: str = 'rb', encoding: Optional[str] = None, errors: Optional[str] = None, **kwargs) → IO[AnyStr][source] Used to read or write stdio
Note
Essentially, it uses sys.stdin.buffer or sys.stdout.buffer to read or write
- Parameters
mode – Only supports ‘rb’ and ‘wb’ now
- Returns
STDReader, STDWriter
-
protocol= 'stdio'
-
-
class megfile.SmartPath(path: Union[str, os.PathLike, int], *other_paths: Union[str, os.PathLike])[source]
Bases: megfile.pathlike.BasePath
-
absolute(*args, **kwargs)
-
abspath(*args, **kwargs)
-
access(*args, **kwargs)
-
property
anchor
-
as_posix(*args, **kwargs)
-
as_uri(*args, **kwargs)
-
chmod(*args, **kwargs)
-
cwd(*args, **kwargs)
-
property
drive
-
exists(*args, **kwargs)
-
expanduser(*args, **kwargs)
-
getmtime(*args, **kwargs)
-
getsize(*args, **kwargs)
-
glob(*args, **kwargs)
-
glob_stat(*args, **kwargs)
-
group(*args, **kwargs)
-
hardlink_to(*args, **kwargs)
-
home(*args, **kwargs)
-
iglob(*args, **kwargs)
-
is_absolute(*args, **kwargs)
-
is_block_device(*args, **kwargs)
-
is_char_device(*args, **kwargs)
-
is_dir(*args, **kwargs)
-
is_fifo(*args, **kwargs)
-
is_file(*args, **kwargs)
-
is_mount(*args, **kwargs)
-
is_relative_to(*args, **kwargs)
-
is_reserved(*args, **kwargs)
-
is_socket(*args, **kwargs)
-
is_symlink(*args, **kwargs)
-
iterdir(*args, **kwargs)
-
joinpath(*args, **kwargs)
-
lchmod(*args, **kwargs)
-
listdir(*args, **kwargs)
-
load(*args, **kwargs)
-
lstat(*args, **kwargs)
-
match(*args, **kwargs)
-
md5(*args, **kwargs)
-
mkdir(*args, **kwargs)
-
property
name
-
open(*args, **kwargs)
-
owner(*args, **kwargs)
-
property
parent
-
property
parents
-
property
parts
-
property
protocol
-
read_bytes(*args, **kwargs)
-
read_text(*args, **kwargs)
-
readlink(*args, **kwargs)
-
realpath(*args, **kwargs)
-
relative_to(*args, **kwargs)
-
relpath(*args, **kwargs)
-
remove(*args, **kwargs)
-
rename(*args, **kwargs)
-
replace(*args, **kwargs)
-
resolve(*args, **kwargs)
-
rglob(*args, **kwargs)
-
rmdir(*args, **kwargs)
-
property
root
-
samefile(*args, **kwargs)
-
save(*args, **kwargs)
-
scan(*args, **kwargs)
-
scan_stat(*args, **kwargs)
-
scandir(*args, **kwargs)
-
stat(*args, **kwargs)
-
property
stem
-
property
suffix
-
property
suffixes
-
symlink(*args, **kwargs)
-
symlink_to(*args, **kwargs)
-
touch(*args, **kwargs)
-
unlink(*args, **kwargs)
-
utime(*args, **kwargs)
-
walk(*args, **kwargs)
-
with_name(*args, **kwargs)
-
with_stem(*args, **kwargs)
-
with_suffix(*args, **kwargs)
-
write_bytes(*args, **kwargs)
-
write_text(*args, **kwargs)
-
-
class megfile.SftpPath(path: Union[str, os.PathLike], *other_paths: Union[str, os.PathLike])[source]
Bases: megfile.pathlike.URIPath
sftp protocol
uri format:
- absolute path
sftp://[username[:password]@]hostname[:port]//file_path
- relative path
sftp://[username[:password]@]hostname[:port]/file_path
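The two URI forms differ only in the number of slashes after the host part: the first slash is a separator, so a second slash makes the remote path absolute. A minimal sketch of that reading using urllib.parse follows; SftpLocation and parse_sftp_uri are hypothetical names, the host and credentials are made up, and megfile's own parsing may differ in details.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class SftpLocation(NamedTuple):
    username: Optional[str]
    password: Optional[str]
    hostname: Optional[str]
    port: Optional[int]
    path: str
    is_absolute: bool


def parse_sftp_uri(uri: str) -> SftpLocation:
    """Split an sftp uri into its login, host and remote path components."""
    parts = urlsplit(uri)
    if parts.scheme != "sftp":
        raise ValueError(f"not an sftp uri: {uri!r}")
    # The first "/" after hostname[:port] only separates host from path.
    remote_path = parts.path[1:]
    return SftpLocation(
        parts.username,
        parts.password,
        parts.hostname,
        parts.port,
        remote_path,
        remote_path.startswith("/"),
    )


absolute = parse_sftp_uri("sftp://alice:secret@example.com:2222//data/a.txt")
relative = parse_sftp_uri("sftp://example.com/data/a.txt")
```

Here absolute.path is /data/a.txt, while relative.path is data/a.txt, which the server would typically resolve against the login directory.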
-
absolute() → megfile.sftp_path.SftpPath[source] Make the path absolute, without normalization or resolving symlinks. Returns a new path object
-
chmod(mode: int, follow_symlinks: bool = True)[source] Change the file mode and permissions, like os.chmod().
- Parameters
mode – the file mode you want to change
follow_symlinks – Ignore this parameter, just for compatibility
-
copy(dst_path: Union[str, os.PathLike], callback: Optional[Callable[int, None]] = None, followlinks: bool = False)[source] Copy the file to the given destination path.
- Parameters
dst_path – The destination path to copy the file to.
callback – An optional callback function that takes an integer parameter and is called periodically during the copy operation to report the number of bytes copied.
followlinks – Whether to follow symbolic links when copying directories.
- Raises
IsADirectoryError – If the source is a directory.
OSError – If there is an error copying the file.
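The callback contract (called periodically with the number of bytes copied since the previous call) can be illustrated with a chunked local copy. This is a standard-library sketch of the contract only, not how SftpPath.copy is implemented; copy_with_callback and the chunk size are assumptions.

```python
import os
import tempfile
from typing import Callable, Optional


def copy_with_callback(
    src_path: str,
    dst_path: str,
    callback: Optional[Callable[[int], None]] = None,
    chunk_size: int = 64 * 1024,
) -> None:
    """Copy a file chunk by chunk, reporting each chunk's size to callback."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            if callback is not None:
                callback(len(chunk))  # bytes copied since the last call


reported = []
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "src.bin")
    dst_file = os.path.join(tmp, "dst.bin")
    with open(src_file, "wb") as handle:
        handle.write(b"x" * 150_000)
    copy_with_callback(src_file, dst_file, callback=reported.append)
    copied_size = os.path.getsize(dst_file)
```

Since each call reports an increment rather than a running total, a progress bar should add the value to its counter, as the tqdm-based Bar example for smart_copy does with update.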
-
cwd() → megfile.sftp_path.SftpPath[source] Return current working directory
- Returns
Current working directory
-
exists(followlinks: bool = False) → bool[source] Test if the path exists
- Parameters
followlinks – False if regard symlink as file, else True
- Returns
True if the path exists, else False
-
getmtime(follow_symlinks: bool = False) → float[source] Get last-modified time of the file on the given path (in Unix timestamp format). If the path is an existing directory, return the latest modified time of all files in it.
- Returns
last-modified time
-
getsize(follow_symlinks: bool = False) → int[source] Get file size on the given file path (in bytes). If the path is a directory, return the sum of the sizes of all files in it, including files in subdirectories (if any). The result excludes the size of the directory itself. In other words, return 0 bytes for an empty directory path.
- Returns
File size
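The directory rule above (sum of all contained file sizes, directory entries themselves excluded, 0 for an empty directory) can be sketched for a local tree as follows. tree_size is a hypothetical helper built on os.walk, shown only to make the rule concrete.

```python
import os
import tempfile


def tree_size(path: str) -> int:
    """Return a file's size, or the total size of all files under a directory."""
    if os.path.isfile(path):
        return os.path.getsize(path)
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            # lstat measures the entry itself and never follows a symlink.
            total += os.lstat(os.path.join(root, name)).st_size
    return total


with tempfile.TemporaryDirectory() as tmp:
    empty_size = tree_size(tmp)
    os.mkdir(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "top.bin"), "wb") as handle:
        handle.write(b"0" * 10)
    with open(os.path.join(tmp, "sub", "nested.bin"), "wb") as handle:
        handle.write(b"0" * 5)
    filled_size = tree_size(tmp)
```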
-
glob(pattern, recursive: bool = True, missing_ok: bool = True) → List[megfile.sftp_path.SftpPath][source] Return path list in ascending alphabetical order, in which path matches glob pattern
- If no path matches, return an empty list
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a path pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
A list of paths that match the pattern
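Since the same path may appear more than once in the result, callers that need unique paths have to remove duplicates themselves. The sketch below uses glob.glob from the standard library, which the notes above describe as the reference behaviour, and drops repeats while keeping the sorted order; unique_glob is a hypothetical helper.

```python
import glob
import os
import tempfile
from typing import List


def unique_glob(pattern: str) -> List[str]:
    """Return sorted recursive glob matches with repeated paths removed."""
    matches = sorted(glob.glob(pattern, recursive=True))
    # dict.fromkeys keeps the first occurrence of each path, in order.
    return list(dict.fromkeys(matches))


# Recreate the layout from the note: <tmp>/a/b/c/b/d.txt
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "a", "b", "c", "b", "d.txt")
    os.makedirs(os.path.dirname(target))
    open(target, "w").close()
    pattern = os.path.join(tmp, "**", "b", "**", "*.txt")
    raw_matches = glob.glob(pattern, recursive=True)
    unique_matches = unique_glob(pattern)
    found_target = unique_matches == [target]
```

The same approach works on path objects, provided they are hashable and compare equal for equal paths.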
-
glob_stat(pattern, recursive: bool = True, missing_ok: bool = True) → Iterator[megfile.pathlike.FileEntry][source] Return tuples of path and file stat, in ascending alphabetical order, for paths that match the glob pattern
- If no path matches, the result is empty
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. sftp_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a path pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
Tuples of path and file stat, for paths that match the pattern
-
iglob(pattern, recursive: bool = True, missing_ok: bool = True) → Iterator[megfile.sftp_path.SftpPath][source] Return path iterator in ascending alphabetical order, in which path matches glob pattern
- If no path matches, return an empty iterator
Notice:
glob.glob in the standard library returns [‘a/’] instead of an empty list when pathname is like a/**, recursive is True and directory ‘a’ doesn’t exist. fs_glob behaves like glob.glob in the standard library under such circumstances.
- No guarantee that each path in the result is different, which means:
Assume there exists a path /a/b/c/b/d.txt. If a path pattern like /**/b/**/*.txt is used to glob, the path above will be returned twice
** will match any matched file, directory, symlink and ‘’ by default, when recursive is True
fs_glob returns the same as glob.glob(pathname, recursive=True), in ascending alphabetical order.
Hidden files (filenames starting with ‘.’) will not be found in the result
- Parameters
pattern – Glob the given relative pattern in the directory represented by this path
recursive – If False, ** will not search directory recursively
missing_ok – If False and target path doesn’t match any file, raise FileNotFoundError
- Returns
An iterator over the paths that match the pattern
-
is_dir(followlinks: bool = False) → bool[source] Test if a path is directory
Note
The difference between this function and os.path.isdir is that this function regards a symlink as a file
- Parameters
followlinks – False if regard symlink as file, else True
- Returns
True if the path is a directory, else False
-
is_file(followlinks: bool = False) → bool[source] Test if a path is file
Note
The difference between this function and os.path.isfile is that this function regards a symlink as a file
- Parameters
followlinks – False if regard symlink as file, else True
- Returns
True if the path is a file, else False
-
is_symlink() → bool[source] Test whether a path is a symbolic link
- Returns
If path is a symbolic link return True, else False
- Return type
bool
-
iterdir() → Iterator[megfile.sftp_path.SftpPath][source] Get all contents of given sftp path. The result is in ascending alphabetical order.
- Returns
All contents of the path, in ascending alphabetical order
-
listdir() → List[str][source] Get all contents of given sftp path. The result is in ascending alphabetical order.
- Returns
All contents of the path, in ascending alphabetical order
-
load() → BinaryIO[source] Read all content on specified path and write into memory
User should close the BinaryIO manually
- Returns
Binary stream
-
lstat() → megfile.pathlike.StatResult[source] Get StatResult of file on sftp, including file size and mtime, referring to fs_getsize and fs_getmtime
- Returns
StatResult
-
md5(recalculate: bool = False, followlinks: bool = True)[source] Calculate the md5 value of the file
- Parameters
recalculate – Ignore this parameter, just for compatibility
followlinks – Ignore this parameter, just for compatibility
- Returns
md5 of file
-
mkdir(mode=511, parents: bool = False, exist_ok: bool = False)[source] make a directory on sftp, including parent directory
If there exists a file on the path, raise FileExistsError
- Parameters
mode – If mode is given, it is combined with the process’ umask value to determine the file mode and access flags.
parents – If parents is true, any missing parents of this path are created as needed; if parents is false (the default), a missing parent raises FileNotFoundError.
exist_ok – If False and target directory exists, raise FileExistsError
- Raises
FileExistsError
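The parents and exist_ok flags follow the same convention as pathlib.Path.mkdir, and the default mode 511 is the decimal form of 0o777. The sketch below exercises the four cases on a local temporary directory as an analogue; it assumes the SFTP variant raises the same exception types, as the parameter descriptions state, and mkdir_outcome is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def mkdir_outcome(path: Path, parents: bool = False, exist_ok: bool = False) -> str:
    """Try to create a directory and report 'ok' or the exception name."""
    try:
        path.mkdir(mode=0o777, parents=parents, exist_ok=exist_ok)
    except (FileNotFoundError, FileExistsError) as error:
        return type(error).__name__
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b"
    outcomes = [
        mkdir_outcome(nested),                 # parent "a" is missing
        mkdir_outcome(nested, parents=True),   # creates "a" and "a/b"
        mkdir_outcome(nested),                 # already exists
        mkdir_outcome(nested, exist_ok=True),  # existing directory tolerated
    ]
```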
-
open(mode: str = 'r', buffering=-1, encoding: Optional[str] = None, errors: Optional[str] = None, **kwargs) → IO[AnyStr][source] Open a file on the path.
- Parameters
mode – Mode to open file
buffering – buffering is an optional integer used to set the buffering policy.
encoding – encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode.
errors – errors is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode.
- Returns
File-Like object
-
protocol= 'sftp'
-
readlink() → megfile.sftp_path.SftpPath[source] Return a SftpPath instance representing the path to which the symbolic link points.
- Returns
A SftpPath instance representing the path to which the symbolic link points
-
remove(missing_ok: bool = False) → None[source] Remove the file or directory on sftp
- Parameters
missing_ok – If False and the target file or directory does not exist, raise FileNotFoundError
-
rename(dst_path: Union[str, os.PathLike]) → megfile.sftp_path.SftpPath[source] rename file on sftp
- Parameters
dst_path – Given destination path
-
replace(dst_path: Union[str, os.PathLike]) → megfile.sftp_path.SftpPath[source] move file on sftp
- Parameters
dst_path – Given destination path
-
resolve(strict=False) → megfile.sftp_path.SftpPath[source] Equal to sftp_realpath
- Parameters
strict – Ignore this parameter, just for compatibility
- Returns
Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path.
- Return type
megfile.sftp_path.SftpPath
-
save(file_object: BinaryIO)[source] Write the opened binary stream to path. If the parent directory of path doesn’t exist, it will be created.
- Parameters
file_object – stream to be read
-
scan(missing_ok: bool = True, followlinks: bool = False) → Iterator[str][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a path string.
If path is a file path, yields the file only
If path is a non-existent path, return an empty generator
If path is a bucket path, return all file paths in the bucket
- Parameters
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Returns
A file path generator
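The three cases above for a file path, a missing path and a directory can be sketched for a local tree as follows. This standard-library version collects and sorts the full paths before yielding, which is an assumption made for simplicity; the real method iterates lazily, and scan_files is a hypothetical name.

```python
import os
import tempfile
from typing import Iterator


def scan_files(path: str) -> Iterator[str]:
    """Yield file paths only: the file itself, or every file under a directory."""
    if os.path.isfile(path):
        yield path
        return
    if not os.path.isdir(path):
        return  # non-existent path: empty generator
    found = []
    for root, _dirs, files in os.walk(path):
        found.extend(os.path.join(root, name) for name in files)
    yield from sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "a"))
    for relative in ("b.txt", os.path.join("a", "x.txt")):
        open(os.path.join(tmp, relative), "w").close()
    scanned = [os.path.relpath(p, tmp) for p in scan_files(tmp)]
    single = list(scan_files(os.path.join(tmp, "b.txt"))) == [os.path.join(tmp, "b.txt")]
    nothing = list(scan_files(os.path.join(tmp, "nope")))
```

Unlike walk, directories never appear in the output, so an empty directory yields nothing.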
-
scan_stat(missing_ok: bool = True, followlinks: bool = False) → Iterator[megfile.pathlike.FileEntry][source] Iteratively traverse only files in given directory, in alphabetical order. Every iteration on generator yields a tuple of path string and file stat
- Parameters
missing_ok – If False and there’s no file in the directory, raise FileNotFoundError
- Returns
A file path generator
-
scandir() → Iterator[megfile.pathlike.FileEntry][source] Get all content of given file path.
- Returns
An iterator over all contents that have the path as prefix
-
stat(follow_symlinks=True) → megfile.pathlike.StatResult[source] Get StatResult of file on sftp, including file size and mtime, referring to fs_getsize and fs_getmtime
- Returns
StatResult
-
symlink(dst_path: Union[str, os.PathLike]) → None[source] Create a symbolic link pointing to src_path named dst_path.
- Parameters
dst_path – Destination path
-
sync(dst_path: Union[str, os.PathLike], followlinks: bool = False, force: bool = False)[source] Copy file/directory on src_url to dst_url
- Parameters
dst_path – Given destination path
followlinks – False if regard symlink as file, else True
force – Force the sync; do not skip files that are identical
-
unlink(missing_ok: bool = False) → None[source] Remove the file on sftp
- Parameters
missing_ok – If False and the target file does not exist, raise FileNotFoundError
-
utime(atime: Union[float, int], mtime: Union[float, int]) → None[source] Set the access and modified times of the file specified by path.
- Parameters
atime (Union[float, int]) – The access time to be set.
mtime (Union[float, int]) – The modification time to be set.
- Returns
None
-
walk(followlinks: bool = False) → Iterator[Tuple[str, List[str], List[str]]][source] Generate the file names in a directory tree by walking the tree top-down. For each directory in the tree rooted at directory path (including path itself), it yields a 3-tuple (root, dirs, files).
root: a string of current path
dirs: name list of subdirectories (excluding ‘.’ and ‘..’ if they exist) in ‘root’. The list is sorted in ascending alphabetical order
files: name list of non-directory files (link is regarded as file) in ‘root’. The list is sorted in ascending alphabetical order
If path does not exist, or path is a file (link is regarded as file), return an empty generator
Note
Be aware that setting followlinks to True can lead to infinite recursion if a link points to a parent directory of itself. fs_walk() does not keep track of the directories it has already visited.
- Parameters
followlinks – False if regard symlink as file, else True
- Returns
A 3-tuple generator