Softwares Archives

Tools & software

Web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Web archivists typically employ Web crawlers for automated capture due to the massive scale of the Web. Ever-evolving Web standards require continuous evolution of archiving tools to keep up with the changes in Web technologies to ensure reliable and meaningful capture and replay of archived web pages.

Training/Documentation

Resources for Web Publishers

These resources can help when working with individuals or organisations who publish on the web, and who want to make sure their site can be archived.

Tools & Software

This list of tools and software is intended to briefly describe some of the most important and widely-used tools related to web archiving. For more details, we recommend you refer to (and contribute to!) these excellent resources from other groups:

  • Comparison of web archiving software
  • Awesome Website Change Monitoring
  • Web Crawl @ COPTR

Acquisition

  • ArchiveBox - A tool which maintains an additive archive from RSS feeds, bookmarks, and links using wget, headless Chrome, and other methods. (In Development)
  • archivenow - A Python library to push web resources into on-demand web archives. (Stable)
  • Brozzler - A distributed web crawler that uses a real browser (Chrome or Chromium) to fetch pages and embedded URLs and to extract links. (Stable)
  • Chronicler - Web browser with record and replay functionality. (In Development)
  • Crawl - A simple web crawler in Golang. (Stable)
  • crocoite - Crawl websites using headless Google Chrome/Chromium and save resources, static DOM snapshot and page screenshots to WARC files. (In Development)
  • F(b)arc - A commandline tool and Python library for archiving data from Facebook using the Graph API. (Stable)
  • freeze-dry - JavaScript library to turn page into static, self-contained HTML document; useful for browser extensions. (In Development)
  • grab-site - The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns. (Stable)
  • Heritrix - An open source, extensible, web-scale, archival quality web crawler. (Stable)
  • html2warc - A simple script to convert offline data into a single WARC file. (Stable)
  • HTTrack - An open source website copying utility. (Stable)
  • monolith - CLI tool to save a web page as a single HTML file. (Stable)
  • SingleFile - Browser extension for Firefox/Chrome and CLI tool to save a faithful copy of a complete page as a single HTML file. (Stable)
  • SiteStory - A transactional archive that selectively captures and stores transactions that take place between a web client (browser) and a web server. (Stable)
  • Social Feed Manager - Open source software that enables users to create social media collections from Twitter, Tumblr, Flickr, and Sina Weibo public APIs. (Stable)
  • Squidwarc - An open source, high-fidelity, page interacting archival crawler that uses Chrome or Chrome Headless directly. (In Development)
  • StormCrawler - A collection of resources for building low-latency, scalable web crawlers on Apache Storm. (Stable)
  • twarc - A command line tool and Python library for archiving Twitter JSON data. (Stable)
  • WARCreate - A Google Chrome extension for archiving an individual webpage or website to a WARC file. (Stable)
  • Warcworker - An open source, dockerized, queued, high fidelity web archiver based on Squidwarc with a simple web GUI. (Stable)
  • WAIL - A graphical user interface (GUI) atop multiple web archiving tools intended to be used as an easy way for anyone to preserve and replay web pages; Python, Electron. (Stable)
  • Web2Warc - An easy-to-use and highly customizable crawler that enables anyone to create their own little Web archives (WARC/CDX). (Stable)
  • WebMemex - Browser extension for Firefox and Chrome which lets you archive web pages you visit. (In Development)
  • Webrecorder - Create high-fidelity, interactive recordings of any web site you browse. (Stable)
  • Wget - An open source file retrieval utility that, as of version 1.14, supports writing WARC files. (Stable)
  • Wget-lua - Wget with Lua extension. (Stable)
  • Wpull - A Wget-compatible (or remake/clone/replacement/alternative) web downloader and crawler. (Stable)

Replay

  • ReplayWeb.Page - A browser-based, fully client-side replay engine for both local and remote WARC files.
  • PyWb - A Python (2 and 3) implementation of web archival replay tools, sometimes also known as 'Wayback Machine'. (Stable)
  • OpenWayback - The open source project aimed to develop Wayback Machine, the key software used by web archives worldwide to play back archived websites in the user's browser. (Stable)
  • InterPlanetary Wayback (ipwb) - Web Archive (WARC) indexing and replay using IPFS.
  • Reconstructive - Reconstructive is a ServiceWorker module for client-side reconstruction of composite mementos by rerouting resource requests to corresponding archived copies (JavaScript).

Search & Discovery

  • Mink - A Google Chrome extension for querying Memento aggregators while browsing and integrating live-archived web navigation. (Stable)
  • SecurityTrails - Web based archive for WHOIS and DNS records. REST API available free of charge.
  • Tempas v1 - Temporal web archive search based on Delicious tags. (Stable)
  • Tempas v2 - Temporal web archive search based on links and anchor texts extracted from the German web from 1996 to 2013 (results are not limited to German pages, e.g., Obama@2005-2009 in Tempas). (Stable)
  • webarchive-discovery - WARC and ARC full-text indexing and discovery tools, with a number of associated tools capable of using the index shown below. (Stable)
    • Shine - A prototype web archives exploration UI, developed with researchers as part of the Big UK Domain Data for the Arts and Humanities project. (Stable)
    • SolrWayback - A prototype web archives exploration UI with integrated playback functionality for WARCs. (In Development)
    • Warclight - A Project Blacklight based Rails engine that supports the discovery of web archives held in the WARC and ARC formats. (In Development)
    • Wasp - A fully functional prototype of a personal web archive and search system. (In Development)
    • Other possible options for building a front end are listed in the wiki.

Utilities

  • ArchiveTools - Collection of tools to extract and interact with WARC files (Python).
  • har2warc - Convert HTTP Archive (HAR) -> Web Archive (WARC) format (Python).
  • httpreserve.info - Service to return the status of a web page or save it to the Internet Archive. Returns JSON in the browser or on the command line via curl using GET (Golang package). (Stable)
  • HTTPreserve Workbench - Tool and API to describe the status of a web page encoded in a simple JSON output describing current status, and earliest and latest links on wayback.org. Save a web page to the Internet Archive. Audit lists of URIs and output a CSV with the data described above (Golang). (In Development)
  • MementoMap - A Tool to Summarize Web Archive Holdings (Python). (In Development)
  • MemGator - A Memento Aggregator CLI and Server (Golang). (Stable)
  • node-cdxj - CDXJ file parser (Node.js). (Stable)
  • OutbackCDX - RocksDB-based capture index (CDX) server supporting incremental updates and compression. Can be used as backend for OpenWayback, PyWb and Heritrix. (Stable)
  • py-wasapi-client - Command line application to download crawls from WASAPI (Python). (Stable)
  • The Archive Browser - The Archive Browser is a program that lets you browse the contents of archives, as well as extract them. It will let you open files from inside archives, and lets you preview them using Quick Look. WARC is supported (macOS only, Proprietary app).
  • The Unarchiver - Program to extract the contents of many archive formats, inclusive of WARC, to a file system. Free variant of The Archive Browser (macOS only, Proprietary app).
  • tikalinkextract - Extract hyperlinks as a seed for web archiving from folders of document types that can be parsed by Apache Tika (Golang, Apache Tika Server). (In Development)
  • wasapi-downloader - Java command line application to download crawls from WASAPI. (Stable)
  • WarcPartitioner - Partition (W)ARC Files by MIME Type and Year. (Stable)
  • webarchive-indexing - Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system.
  • wikiteam - Tools for downloading and preserving wikis. (Stable)

WARC I/O Libraries

  • HadoopConcatGz - A splittable Hadoop InputFormat for concatenated GZIP files. (Stable)
  • jwarc - Read and write WARC files with a typesafe API (Java).
  • Jwat - Libraries and tools for reading/writing/validating WARC/ARC/GZIP files (Java). (Stable)
  • node-warc - Parse WARC files or create WARC files using either Electron or chrome-remote-interface (Node.js). (Stable)
  • Warcat - Tool and library for handling Web ARChive (WARC) files (Python). (Stable)
  • warcio - Streaming WARC/ARC library for fast web archive IO (Python).
  • warctools - Library to work with ARC and WARC files (Python).
  • webarchive - Golang readers for ARC and WARC webarchive formats (Golang).
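
To give a feel for what the libraries above read and write, here is a minimal sketch of a single WARC record built and parsed by hand with only the Python standard library. The helper names `build_record` and `parse_record` are hypothetical and exist only for this illustration, and the sketch covers just one uncompressed "resource" record; for real work, one of the listed libraries (for example warcio or jwarc) is the better choice.

```python
import uuid
from datetime import datetime, timezone


def build_record(target_uri, payload, content_type="text/plain"):
    """Serialize one WARC 'resource' record (hypothetical helper, sketch only)."""
    headers = [
        ("WARC-Type", "resource"),
        ("WARC-Record-ID", "<urn:uuid:{}>".format(uuid.uuid4())),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", content_type),
        ("Content-Length", str(len(payload))),
    ]
    head = "WARC/1.0\r\n" + "".join("{}: {}\r\n".format(k, v) for k, v in headers)
    # A blank line separates the headers from the block; two CRLFs end the record.
    return head.encode("utf-8") + b"\r\n" + payload + b"\r\n\r\n"


def parse_record(raw):
    """Split a single record back into version line, header fields, and block."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    fields = dict(line.split(": ", 1) for line in lines[1:])
    length = int(fields["Content-Length"])
    return lines[0], fields, rest[:length]


# Illustrative URI and payload, not taken from any real crawl.
record = build_record("http://example.com/", b"hello")
version, fields, body = parse_record(record)
print(version, fields["WARC-Type"], body)
```

The record is deliberately left uncompressed so that its layout stays visible: a version line, named header fields, a blank line, the content block, and a terminating pair of CRLFs.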

Analysis

  • ArchiveSpark - An Apache Spark framework (not only) for Web Archives that enables easy data processing, extraction as well as derivation. (Stable)
  • Archives Unleashed Cloud - Archives Unleashed Cloud (AUK) is a web interface for analysing web archives. Currently, it can sync with Archive-It collections and extract hyperlink networks, full text, and other information from your collections. (Stable)
  • Archives Unleashed Notebooks - Notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit. (Stable)
  • Archives Unleashed Toolkit - Archives Unleashed Toolkit (AUT) is an open-source platform for analyzing web archives with Apache Spark. (Stable)
  • Tweet Archives Unleashed Toolkit - An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark. (In Development)


Archive file

An archive file is a file that is composed of one or more computer files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compress files to use less storage space. Archive files often store directory structures, error detection and correction information, arbitrary comments, and sometimes use built-in encryption.

Applications

Archive files are particularly useful in that they store file system data and metadata within the contents of a particular file, and thus can be stored on systems or sent over channels that do not support the file system in question, only file contents – examples include sending a directory structure over email.
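
As a minimal sketch of this idea, the following Python snippet uses the standard-library `tarfile` module to pack a small directory tree into one in-memory byte string and then list it back. The paths and file contents are made up for illustration, and the helper names `pack_tree` and `list_tree` are hypothetical.

```python
import io
import tarfile


def pack_tree(files):
    """Pack a {path: bytes} mapping into a single tar archive held in memory."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, data in sorted(files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mode = 0o644  # file system metadata travels inside the archive
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def list_tree(blob):
    """Return the member paths and sizes stored in a tar archive."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as archive:
        return [(member.name, member.size) for member in archive.getmembers()]


# Hypothetical example tree; the names are illustrative only.
example_files = {
    "project/README.txt": b"hello\n",
    "project/src/main.py": b"print('hi')\n",
}
blob = pack_tree(example_files)
print(list_tree(blob))
```

The resulting `blob` is just a sequence of bytes, so it can be attached to an email or sent over any channel that knows nothing about directories, and the receiver can rebuild the tree from it.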

Beyond archival purposes, archive files are frequently used for packaging software for distribution, as software contents are often naturally spread across several files; the archive is then known as a package. While the archival file format is the same, there are additional conventions about contents, such as requiring a manifest file, and the resulting format is known as a package format. Examples include deb for Debian, JAR for Java, and APK for Android.
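
The relationship between an archive format and a package format can be sketched with a JAR-like file, which is a ZIP archive that by convention carries a manifest at `META-INF/MANIFEST.MF`. The snippet below is only an illustration built with Python's `zipfile` module: the helper names, the class path, and the placeholder bytes are assumptions, and it is not a substitute for the real `jar` tool.

```python
import io
import zipfile

MANIFEST_PATH = "META-INF/MANIFEST.MF"


def build_package(entries, main_class):
    """Write a ZIP archive that follows the JAR manifest convention (sketch)."""
    manifest = "Manifest-Version: 1.0\r\nMain-Class: {}\r\n\r\n".format(main_class)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        package.writestr(MANIFEST_PATH, manifest)
        for path, data in entries.items():
            package.writestr(path, data)
    return buffer.getvalue()


def is_package(blob):
    """In this sketch, a ZIP counts as a package only if it carries the manifest."""
    with zipfile.ZipFile(io.BytesIO(blob)) as package:
        return MANIFEST_PATH in package.namelist()


# Placeholder bytes stand in for compiled class data; the names are hypothetical.
package_blob = build_package(
    {"com/example/Main.class": b"\xca\xfe\xba\xbe"}, "com.example.Main"
)
print(is_package(package_blob))
```

The container is an ordinary ZIP file either way; what turns it into a package is the extra agreement about which entries must be present.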

Features

Features supported by various kinds of archives include:

Some archive programs have self-extraction, self-installation, source volume and medium information, and package notes/description.

The file extension or file header of an archive file indicates the file format used. Computer archive files are created by file archiver software, optical disc authoring software, and disk image software.
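
A rough sketch of header-based detection is shown below. It checks the leading bytes of a file against a few well-known signatures, plus the `ustar` marker that POSIX tar stores at byte offset 257. The function name `sniff_format` is hypothetical, and the signature table is intentionally short, so unknown or unusual files simply come back as "unknown".

```python
import gzip
import io
import zipfile

# Well-known leading byte signatures ("magic numbers") for common formats.
SIGNATURES = [
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"Rar!\x1a\x07", "rar"),
]


def sniff_format(data):
    """Guess an archive format from its header bytes (illustrative only)."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # POSIX tar keeps the marker "ustar" at offset 257 of the first header block.
    if data[257:262] == b"ustar":
        return "tar"
    return "unknown"


# Build two small archives in memory to try the function on.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")

print(sniff_format(zip_buffer.getvalue()))
print(sniff_format(gzip.compress(b"hello")))
print(sniff_format(b"just some text"))
```

Checking the header is more robust than trusting the extension, since a file can be renamed without its contents changing.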

Archive formats

An archive format is the file format of an archive file. Some formats are well-defined by their authors and have become conventions supported by multiple vendors and communities.

Types

  • Archiving-only formats store metadata and concatenate files.
  • Compression-only formats only compress files.
  • Multi-function formats can store metadata, concatenate, compress, encrypt, create error detection and recovery information, and package the archive into self-extracting and self-expanding files.
  • Software packaging formats are used to create software packages that may be self-installing files.
  • Disk image formats are used to create disk images of mass storage volumes.
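
The difference between the first two types in the list above can be sketched with tar (archiving only) and gzip (compression only), which are commonly chained to get both effects. The log file names and contents here are invented for the example, and `make_tar` is a hypothetical helper.

```python
import gzip
import io
import tarfile


def make_tar(files):
    """Concatenate files with metadata into an uncompressed tar archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:  # "w" means no compression
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


# Hypothetical, highly repetitive input so the effect of compression is visible.
files = {"a.log": b"status=ok\n" * 500, "b.log": b"status=ok\n" * 500}

tar_blob = make_tar(files)               # archiving only: many files, no compression
gz_blob = gzip.compress(files["a.log"])  # compression only: a single byte stream
tar_gz_blob = gzip.compress(tar_blob)    # both steps chained, as in a .tar.gz file

print(len(tar_blob), len(gz_blob), len(tar_gz_blob))
```

The tar output is larger than the raw data because of headers and padding, while the chained result is much smaller; a multi-function format such as zip performs both jobs inside a single container.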

Examples

Filename extensions used to distinguish different types of archives include zip, rar, 7z, and tar.

Java also introduced a whole family of archive extensions such as jar and war (j is for Java and w is for web). They are used to exchange entire bytecode deployments. Sometimes they are also used to exchange source code and other text, HTML, and XML files. By default they are all compressed.

Error detection and recovery

Archive files often include parity checks and other checksums for error detection; for instance, zip files use a cyclic redundancy check (CRC). RAR archives may include redundant error correction data (called recovery records).
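
The CRC check in zip files can be observed directly with Python's `zipfile` and `zlib` modules. The sketch below stores a made-up payload without compression, so that the raw bytes can be located and damaged, then compares the CRC recorded in the archive with one computed independently and shows that a single flipped bit is detected.

```python
import io
import zipfile
import zlib

# Illustrative payload; any bytes would do.
payload = b"web archives need integrity checks\n" * 4

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
    archive.writestr("notes.txt", payload)

with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    stored_crc = archive.getinfo("notes.txt").CRC  # CRC-32 recorded in the archive
    intact = archive.testzip() is None             # None means every member checked out

# Flip one bit inside the stored file data to simulate corruption.
damaged = bytearray(buffer.getvalue())
offset = damaged.find(payload)
damaged[offset] ^= 0x01

with zipfile.ZipFile(io.BytesIO(bytes(damaged))) as archive:
    first_bad_member = archive.testzip()  # name of the first member failing its CRC

print(stored_crc == zlib.crc32(payload), intact, first_bad_member)
```

A CRC only detects the damage; it carries too little information to repair it, which is why formats that want recovery add separate redundant data.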

Archive files are sometimes accompanied by separate parity archive (PAR) files that allow for additional error detection and recovery, particularly in recovery of missing files in a multi-file archive.
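
The idea of recovering a missing file from parity data can be illustrated with plain XOR parity. This is a deliberately simplified stand-in: it is not the scheme that PAR files use, and it can rebuild only one missing block, with all blocks the same length. The block contents and function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(blocks):
    """Compute one parity block over all data blocks."""
    return xor_blocks(blocks)


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors plus the parity block."""
    return xor_blocks(surviving_blocks + [parity])


# Three equal-length "volumes" of a multi-file archive (made-up contents).
volumes = [b"part-one", b"part-two", b"part-3!!"]
parity = make_parity(volumes)

# Pretend the second volume was lost in transit and rebuild it.
recovered = recover_missing([volumes[0], volumes[2]], parity)
print(recovered)
```

Because XOR-ing a value with itself cancels it out, combining the surviving blocks with the parity block leaves exactly the missing one.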


Kodi (formerly XBMC) is a free and open-source media player software application developed by the XBMC Foundation, a non-profit technology consortium. Kodi is available for multiple operating systems and hardware platforms, with a software 10-foot user interface for use with televisions and remote controls. It allows users to play and view most streaming media, such as videos, music, podcasts, and videos from the Internet, as well as all common digital media files from local and network storage...

A collection of APK (Android Package) Software Programs uploaded by various users.

The Vintage Software collection gathers various efforts by groups to classify, preserve, and provide historical software. These older programs, many of them running on defunct and rare hardware, are provided for purposes of study, education, and historical reference. 

The Internet Arcade is a web-based library of arcade (coin-operated) video games from the 1970s through to the 1990s, emulated in JSMAME, part of the JSMESS software package. Containing hundreds of games ranging through many different genres and styles, the Arcade provides research, comparison, and entertainment in the realm of the Video Game Arcade.   The game collection ranges from early "bronze-age" videogames, with black and white screens and simple sounds, through to large-scale...

The Open Source Software Collection includes computer programs and/or data which are licensed under an Open Source Initiative or Free Software license, or are in the public domain. In general, items in this collection should be software for which the source code is freely available and able to be used and distributed without undue restrictions, and/or computer data which conforms to an openly published format.
Topics: software, public domain, open source, opensource, oss, free software, gpl, gnu, public domain...

The Internet Archive Software Library is the ultimate software crate-digger's dream: Tens of thousands of playable software titles from multiple computer platforms, allowing instant access to decades of computer history in your browser through the JSMESS emulator. The intention is to ultimately have most major computer platforms available; currently, the collection includes the Apple II, Atari 800, and ZX Spectrum computers. In each case, sub-collections contain vast sets of disk and...
Topics: software, floppies, images, disks, emulation, Apple II, Atari 800, Atari 8-Bit, ZX Spectrum

MS-DOS (/ˌɛmɛsˈdɒs/ em-es-doss; short for Microsoft Disk Operating System) is an operating system for x86-based personal computers mostly developed by Microsoft. It was the most commonly used member of the DOS family of operating systems, and was the main operating system for IBM PC compatible personal computers during the 1980s to the mid-1990s. IF YOU ARE EXPERIENCING ANY ISSUES WITH RUNNING THESE PROGRAMS, PLEASE READ THE FAQ. Thanks to eXo for contributions and assistance with this...

The Internet Archive Console Living Room harkens back to the revolution of the change in the hearth of the home, when the fireplace and later television were transformed by gaming consoles into a center of videogame entertainment. Connected via strange adapters and relying on the television's speaker to put out beeps and boops, these games were resplendent with simple graphics and simpler rules. The home console market is credited with slowly shifting attention from the arcade craze of the...
