Apache Parquet

Apache Parquet

Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem inspired by Google Dremel interactive ad-hoc query system for analysis of read-only nested data. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides data compression and encoding schemes with enhanced performance to handle complex data in bulk. == History == The open-source project to build Apache Parquet began as a joint effort between Twitter and Cloudera using the record shredding and assembly algorithm as described in Google's Dremel. Parquet was designed as an improvement on the Trevni columnar storage format created by Doug Cutting, the creator of Hadoop. The name 'parquet' (lit. 'small compartment') refers to a style of decorative flooring and was chosen to "evoke the bottom layer of a database with an interesting layout". The first version, Apache Parquet 1.0, was released in July 2013. Since April 27, 2015, Apache Parquet has been a top-level Apache Software Foundation (ASF)-sponsored project. == Features == Apache Parquet is implemented using the record-shredding and assembly algorithm, which accommodates the complex data structures that can be used to store data. The values in each column are stored in contiguous memory locations, providing the following benefits: Column-wise compression is efficient in storage space Encoding and compression techniques specific to the type of data in each column can be used Queries that fetch specific column values need not read the entire row, thus improving performance Apache Parquet is implemented using the Apache Thrift framework, which increases its flexibility; it can work with a number of programming languages like C++, Java, Python, PHP, etc. As of August 2015, Parquet supports the big-data-processing frameworks including Apache Hive, Apache Drill, Apache Impala, Apache Crunch, Apache Pig, Cascading, Presto and Apache Spark. It is one of the external data formats used by the pandas Python data manipulation and analysis library. == Compression and encoding == In Parquet, compression is performed column by column, which enables different encoding schemes to be used for text and integer data. This strategy also keeps the door open for newer and better encoding schemes to be implemented as they are invented. Parquet supports various compression formats: snappy, gzip, LZO, brotli, zstd, and LZ4. === Dictionary encoding === Parquet has an automatic dictionary encoding enabled dynamically for data with a small number of unique values (i.e. below 105) that enables significant compression and boosts processing speed. === Bit packing === Storage of integers is usually done with dedicated 32 or 64 bits per integer. For small integers, packing multiple integers into the same space makes storage more efficient. === Run-length encoding (RLE) === To optimize storage of multiple occurrences of the same value, run-length encoding is used, which is where a single value is stored once along with the number of occurrences. Parquet implements a hybrid of bit packing and RLE, in which the encoding switches based on which produces the best compression results. This strategy works well for certain types of integer data and combines well with dictionary encoding. == Cloud Storage and Data Lakes == Parquet is widely used as the underlying file format in modern cloud-based data lake architectures. Cloud storage systems such as Amazon S3, Azure Data Lake Storage, and Google Cloud Storage commonly store data in Parquet format due to its efficient columnar representation and retrieval capabilities. Data lakehouse frameworks—including Apache Iceberg, Delta Lake, and Apache Hudi —build an additional metadata layer on top of Parquet files to support features such as schema evolution, time-travel queries, and ACID-compliant transactions. In these architectures, Parquet files serve as the immutable storage layer while the table formats manage data versioning and transactional integrity. == Comparison == Apache Parquet is comparable to RCFile and Optimized Row Columnar (ORC) file formats — all three fall under the category of columnar data storage within the Hadoop ecosystem. They all have better compression and encoding with improved read performance at the cost of slower writes. In addition to these features, Apache Parquet supports limited schema evolution, i.e., the schema can be modified according to the changes in the data. It also provides the ability to add new columns and merge schemas that do not conflict. Apache Arrow is designed as an in-memory complement to on-disk columnar formats like Parquet and ORC. The Arrow and Parquet projects include libraries that allow for reading and writing between the two formats. == Implementations == Known implementations of Parquet include:

Document-oriented database

A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving, and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one of the main categories of NoSQL databases, and the popularity of the term "document-oriented database" has grown alongside the adoption of NoSQL itself. XML databases are a subclass of document-oriented databases optimized for XML documents. Graph databases are similar, but add another layer, the relationship, which allows them to link documents for rapid traversal. Document-oriented databases are conceptually an extension of the key–value store, another type of NoSQL database. In key-value stores, data is treated as opaque by the database, whereas document-oriented systems exploit the internal structure of documents to extract metadata and optimize storage and queries. Although in practice the distinction can be minimal due to modern tooling, document stores are designed to provide a richer programming experience with modern programming techniques. Document databases differ significantly from traditional relational databases (RDBs). Relational databases store data in predefined tables, often requiring an object to be split across multiple tables. In contrast, document databases store all information for a given object in a single document, with each document potentially having a unique structure. This design eliminates the need for object-relational mapping when loading data into the database. == Documents == The central concept of a document-oriented database is the notion of a document. Although implementations vary in their specific definitions, document-oriented databases generally treat documents as self-contained units that encapsulate and encode data in a standardized format. Common encoding formats include XML, YAML, JSON, as well as binary representations such as BSON. Documents in a document store are equivalent to the programming concept of an object. They are not required to adhere to a fixed schema, and documents within the same collection may contain different fields or structures. Fields may be optional, and documents of the same logical type may differ in composition. For example, the following illustrates a document encoded in JSON: A second document might be encoded in XML as: The two example documents share some structural elements but also contain unique fields. The structure, text, and other data within each document are collectively referred to as the document's content and can be accessed or modified using retrieval or editing operations. Unlike relational databases, in which each record contains the same fields and unused fields are left empty, document-oriented databases do not require uniform fields across documents. This design allows new information to be added to some documents without affecting the structure of others. Document databases often support the storage of additional metadata alongside the document content. Such metadata may relate to organizational features, security, indexing, or other implementation-specific features. === CRUD operations === The core operations supported by a document-oriented database for manipulating documents are similar to those in other databases. Although terminology is not perfectly standardized, these operations are generally recognized as Create, Read, Update, and Delete (CRUD). Creation (C): Adds a new document to the database. Retrieval (R): Retrieves documents or fields based on queries. Update (U): Modifies the contents of existing documents. Deletion (D): Removes documents from the database. === Keys === Documents in a document-oriented database are addressed via a unique identifier. This identifier, often a string, URI, or path, can be used to retrieve the document from the database. Most document stores maintain an index on the key to optimize retrieval, and in some implementations the key is required when creating or inserting a new document. === Retrieval === In addition to key-based access, document-oriented databases typically provide an API or query language that enables retrieval based on document content or associated metadata. For example, a query may return all documents with a specific field matching a given value. The available query features, indexing options, and performance characteristics vary across implementations. Document stores differ from key-value stores in that they exploit the internal structure and metadata of stored documents. In many key-value stores, values are treated as opaque or "black-box" data, meaning the database system does not interpret their internal structure. By contrast, document-oriented databases can classify and interpret document content. This enables queries that distinguish between types of data––for example, retrieving all phone numbers containing "555" without also matching a postal code such as "55555." === Editing === Document databases typically provide mechanisms for updating or editing the content or metadata of a document. Updates may involve replacing the entire document or modifying individual elements or fields within the document. === Organization === Document database implementations support a variety of methods for organizing documents, including: Collections: Groups of documents. Depending on the implementation, a document may be required to belong to a single collection or may be allowed in multiple collections. Tags and non-visible metadata: Additional data stored outside the main document content. Directory hierarchies: Documents organized in a tree-like structure, often based on path or URI. These organizational structures may differ between logical and physical representations (e.g. on disk or in memory). == Relationship to other databases == === Relationship to key-value stores === A document-oriented database can be viewed as a specialized form of key-value store, which is itself a category of NoSQL database. In a basic key-value store, the stored value is typically treated as opaque by the database system. By contrast, a document-oriented database provides APIs or a query and update language that allows queries and modifications based on the internal structure of the document. For users who do not require advanced query, retrieval, or update capabilities, the distinction between document-oriented databases and key-value stores may be minimal. === Relationship to search engines === Some search engine and information retrieval systems, such as Apache Solr and Elasticsearch, provide document storage and support core document operations. As a result, they may meet certain functional definitions of a document-oriented database, although their primary design goals differ. === Relationship to relational databases === In a relational database, data is organized into predefined types represented as tables. Each table contains rows (records) with a fixed set of columns (fields), so all records in a table share the same structure. Administrators typically define indexes on selected fields to improve query performance. A central principle of relational database design is database normalization, in which data that might otherwise be repeated is stored in separate tables and linked using keys. When records in different tables are related, a foreign key is used to associate them. For example, an address book application may store a contact's name, image, phone numbers, mailing addresses, and email addresses. In a normalized relational design, separate tables might be created for contacts, phone numbers, and email addresses. The phone number table would include a foreign key referencing the associated contact. To reconstruct a complete contact record, the database retrieves related information from each table using the foreign keys and combines it into a single record. In contrast, a document-oriented database stores all data related to an object within a single document, and stored in the database as a single entry. In the address book example,the contact's name, image, and contact information may be stored together in one document. The document is retrieved using a unique key, and all related information is returned together, without needing to look up multiple tables. A key difference between the document-oriented and relational models is that the data formats are not predefined in the document case. In most cases, any sort of document can be stored in a database, and documents can change in type and form over time. For example, a new field such as COUNTRY_FLAG can be added to new documents as they are inserted without affecting existing documents. To aid retrieval, document-oriented systems generally allow the administrator to provide hints to the database for locating certain types of information. These hints work in a similar fashion to indexes in relational databases. Many systems also allow additional metadata outside the content of the document itself

Artificial intelligence in architecture

Artificial intelligence in architecture is the use of artificial intelligence in automation, design, and planning in the architectural process or in assisting human skills in the field of architecture. AI has been used by some architects for design, and has been proposed as a way to automate planning and routine tasks in the field. == Implications == === Benefits === Artificial intelligence, according to ArchDaily, is said to potentially significantly augment the architectural profession through its ability to improve the design and planning process as well as increasing productivity. Through its ability to handle a large amount of data, AI is said to potentially allow architects a range of design choices with criteria considerations such as budget, requirements adjusted to space, and sustainability goals calculated as part of the design process. ArchDaily said this may allow the design of optimized alternatives that can then undergo human review. AI tools are also said to potentially allow architects to assimilate urban and environmental data to inform their designs, streamlining initial stages of project planning and increasing efficiency and productivity. The advances in generative design through the input of specific prompts allow architects to produce visual designs, including photorealistic images, and thus render and explore various material choices and spatial configurations. ArchDaily noted this could speed the creative process as well as allow for experimentation and sophistication in the design. Additionally, AI's capacity for pattern recognition and coding could aid architects in organizing design resources and developing custom applications, thus enhancing the efficiency and collaboration between both architects and AI. AI is thought to also be able to contribute to the sustainability of buildings by analyzing various factors and following recommended energy-efficient modifications, thus pushing the industry towards greener practices. The use of AI in building maintenance, project management, and the creation of immersive virtual reality experiences are also thought of as potentially augmenting the architectural design process and workflow. Examples include the use of text-to-image systems such as Midjourney to create detailed architectural images, and the use of AI optimization systems from companies such as Finch3D and Autodesk to automatically generate floor plans from simple programmatic inputs. In contrast to digital-only creative practices, the high materiality of architectural outputs requires transitions from ephemeral digital files to permanent physical structures that are subject to strict safety regulations, material constraints, sensory intuition, and site-specific cultural contexts, making full automation difficult. Early adopters such as architect Stephen Coorlas have actively challenged the boundaries of architectural practice through AI. His early experimental initiative, Speculations on AI and Architecture, confronts the discipline's traditional workflows by training text-to-image AI tools such as Midjourney, Luma AI, and PromeAI to generate more nuanced architectural illustrations including construction documents, architectural details, and assembly sequences for various structures. Coorlas inputs precise terminology and architectural language to provoke the AI into producing axonometric drawings that resemble conventional documentation, then experiments with animating the outputs using AI generated depth maps and other AI image-to-3D wireframe tools. Stephen's inventive process invites architects and designers to reconsider authorship, automation, and the future of visual communication in the built environment. Rather than treating AI as a peripheral tool, Stephen has advocated for AI to be a speculative collaborator capable of engaging with discipline-specific challenges. His work contributes to the growing discourse on generative design, parametric optimization, and the philosophical implications of machine-assisted creativity raising urgent questions about how such technologies will reshape architectural agency, precision, and pedagogy. Another prominent advocate is Architect Andrew Kudless, who in an interview to Dezeen recounted that he uses AI to innovate in architectural design by incorporating materials and scenes not usually present in initial plans, which he believes can significantly alter client presentations. He told Dezeen he believes one should show clients renderings from the onset, with AI assisting in this work, arguing that changes in design should be a positive aspect of the client-designer relationship by actively involving clients in the process. Additionally, Kudless highlighted the AI's potential to facilitate labor in architectural firms, particularly in automating rendering tasks, thus reducing the workload on junior staff while maintaining control over the creative output. === Emergent aesthetics === In an interview for the AItopia series to Dezeen, designer Tim Fu discussed the transformative potential of AI in architecture, and proposed a future where AI could herald a "neoclassical futurist" style, blending the grandeur of classical aesthetics with futuristic design. Through his collaborative project, The AI Stone Carver, Fu showcased how AI can innovate traditional practices by generating design concepts that are then realized through human craftsmanship, such as stone carving by mason Till Apfel. This approach, he believed, celebrated the fusion of diverse architectural styles and also emphasized the unique capabilities of AI in enhancing creative design processes. Fu told Dezeen he envisions the integration of AI in design as a means to revive the ornamentation and detailed aesthetics characteristic of classical architecture, moving away from minimalism, which he said dominates contemporary architecture. He argued that AI's involvement in the ideation phase of design allows for a reversal in the roles of machine and human, enabling architects and designers to focus on creating more intricate and ornamental structures. Fu's optimistic outlook extended to the broader impact of AI on the architectural field, seeing it as an indispensable tool that will shift rather than replace human roles, enriching the field with innovative designs that pay homage to the beauty and qualities of classical architecture not present in contemporary architecture while embracing new technologies. This perspective resonates with designers like Manas Bhatia, whose explorations similarly embrace generative AI as a co-creator and a medium to express ideas, blend architectural traditions, and speculate spatial futures. === Concerns === As AI continues to expand its presence across various industries, its impact on the architectural profession has become a topic of growing discussion. These discussions focus on how AI processes may influence traditional architectural practices, potentially altering job roles, and shaping the nature of creativity. While AI-driven processes may increase efficiency in some aspects of the profession, they also raise questions about the potential loss of unique design perspectives. These thoughts have been countered by many prominent creative figures in the realm of AI architecture, such as Stephen Coorlas, Tim Fu, Hassan Ragab, and Manas Bhatia who have showcased the amplification of creativity in design and potential benefits in terms of restoring creative power to the designer. A key concern is that AI-powered tools could diminish the need for human involvement in specific tasks traditionally performed by architects. This has led to speculation that the profession may increasingly shift toward roles focused on oversight, coordination, and strategic decision-making rather than hands-on design work. In some design scenarios, algorithmically generated solutions can be adjusted to prioritize efficiency and cost-effectiveness, which some argue may overshadow the creative and contextual nuances that define individual architectural styles. As with any discipline though, it has been determined that AI can be configured to provide beneficial results based on inputs and end goals the architect or designer assigns it. There are also concerns about the potential for AI to exacerbate inequalities within the architectural profession. For instance, larger firms with greater resources to invest in advanced AI technologies may gain a competitive edge over smaller firms and independent architects. This dynamic could contribute to industry consolidation, potentially limiting the diversity of architectural practice and stifling innovation. Ethical considerations in regard to cultural sensitivity have also been raised due to the datasets used to train AI. Without proper vetting of data or implementing failsafe overrides, AI generated outcomes can trend toward overly documented and prioritized content.

Magic Quadrant

Magic Quadrant (MQ) is a series of market research reports published by research and advisory firm Gartner that rely on proprietary qualitative data analysis methods to demonstrate market trends, such as direction, maturity, and participants. Their analyses are conducted for several specific technology industries and are updated every 1–2 years: once an updated report has been published, its predecessor is "retired". == Rating == Gartner rates vendors upon two criteria: completeness of vision and ability to execute. Completeness of vision – Reflects the vendor's innovation, and whether the vendor drives or follows the market. Ability to execute – Summarizes factors such as the vendor's financial viability, market responsiveness, product development, sales channels and customer base. The two component scores lead to a vendor position in one of four quadrants: === Leaders === Vendors in the "Leaders" quadrant have the highest composite scores for their completeness of vision and ability to execute. A vendor in the Leaders quadrant has the market share, credibility, and marketing & sales capabilities needed to drive the acceptance of new technologies. These vendors demonstrate a clear understanding of market needs, they are innovators and thought leaders, and they have well-articulated plans that customers and prospects can use when designing their infrastructures and strategies. In addition, they have a presence in the five major geographical regions, consistent financial performance, and broad platform support. === Challengers === Vendors in the "Challengers" quadrant have high scores mainly for their ability to execute. They both participate in the market and execute well enough to be a serious threat to vendors in the "Leaders" quadrant. They have strong products, as well as sufficiently credible market position and resources to sustain continued growth. Financial viability is not an issue for vendors in the "Challengers" quadrant, but they lack the size and influence of vendors in the "Leaders" quadrant due to their relative lack of vision. === Visionaries === Vendors in the "Visionaries" quadrant have high scores mainly for their completeness of vision. They deliver innovative products that address operationally or financially important end-user problems at a broad scale, but have not yet demonstrated the ability to capture market share or maintain sustainable levels of profitability. Visionary vendors are frequently privately held companies and acquisition targets for larger, established companies. The likelihood of acquisition often reduces the risks associated with installing their systems. === Niche Players === Vendors in the "Niche Players" quadrant have relatively low scores for both their ability to execute and their completeness of vision. They are often narrowly focused on specific market or vertical segments. This quadrant often also includes vendors that are adapting their existing products to enter the market under consideration, or larger vendors having difficulty developing and executing on their vision. == Gartner Critical Capabilities == Gartner Critical Capabilities complement Magic Quadrant analysis to offer deeper insight into the products and services offered by multiple vendors by a comparative analysis that scores competing products or services against a set of critical differentiators identified by Gartner. Gartner has periodically ended Magic Quadrant listings for IT Service Management, Web Content Management, and other industries as those markets have fully matured or other factors rendered the analytic framework inapplicable. == Criticism == The Magic Quadrant, and analysts in general, skew the market: according to research, by applying their methodologies to describe a market, they change that marketplace to fit their tools. Another criticism is that open source vendors are not considered sufficiently by analysts like Gartner, as has been published in an online discussion between a VP from Talend and a German Research VP from Gartner. On May 29, 2009 (2009-05-29), software vendor ZL Technologies filed a federal lawsuit against Gartner that challenged the "legitimacy" of Gartner's Magic Quadrant rating system. Gartner filed a motion to dismiss by claiming First Amendment protection since it contends that its MQ reports contain "pure opinion", which legally means opinions that are not based on fact. The court threw out the ZL case because it lacked a specific complaint. The decision was upheld on appeal.

Jump-and-Walk algorithm

Jump-and-Walk is an algorithm for point location in triangulations (though most of the theoretical analysis were performed in 2D and 3D random Delaunay triangulations). Surprisingly, the algorithm does not need any preprocessing or complex data structures except some simple representation of the triangulation itself. The predecessor of Jump-and-Walk was due to Lawson (1977) and Green and Sibson (1978), which picks a random starting point S and then walks from S toward the query point Q one triangle at a time. But no theoretical analysis was known for these predecessors until after mid-1990s. Jump-and-Walk picks a small group of sample points and starts the walk from the sample point which is the closest to Q until the simplex containing Q is found. The algorithm was a folklore in practice for some time, and the formal presentation of the algorithm and the analysis of its performance on 2D random Delaunay triangulation was done by Devroye, Mucke and Zhu in mid-1990s (the paper appeared in Algorithmica, 1998). The analysis on 3D random Delaunay triangulation was done by Mucke, Saias and Zhu (ACM Symposium of Computational Geometry, 1996). In both cases, a boundary condition was assumed, namely, Q must be slightly away from the boundary of the convex domain where the vertices of the random Delaunay triangulation are drawn. In 2004, Devroye, Lemaire and Moreau showed that in 2D the boundary condition can be withdrawn (the paper appeared in Computational Geometry: Theory and Applications, 2004). Jump-and-Walk has been used in many famous software packages, e.g., QHULL, Triangle and CGAL.

Aarogya Setu

Aarogya Setu (lit. 'The bridge to health') is an Indian COVID-19 "contact tracing, syndromic mapping and self-assessment" digital service, primarily a mobile app, developed by the National Informatics Centre under the Ministry of Electronics and Information Technology (MeitY). The app reached more than 100 million installs in 40 days. On 26 May, amid growing privacy and security concerns, the source code of the app was made public. == Full view == The stated purpose of this app is to spread awareness of COVID-19 and to connect essential COVID-19-related health services to the people of India. This app augments the initiatives of the Department of Health to contain COVID-19 and shares best practices and advisories. It is a tracking app which uses the smartphone's GPS and Bluetooth features to track COVID-19 cases. The app is available for Android and iOS mobile operating systems. With Bluetooth, it tries to determine the risk if one has been near (within six feet of) a COVID-19-infected person, by scanning through a database of known cases across India. Using location information, it determines whether the location one is in belongs to one of the infected areas based on the data available. This app is an updated version of an earlier app called Corona Kavach (now discontinued) which was released earlier by the Government of India. == Features and tools == Aarogya Setu has four sections: User Status (tells the risk of getting COVID-19 for the user) Self Assess (helps the users identify COVID-19 symptoms and their risk profile) COVID-19 Updates (gives updates on local and national COVID-19 cases) E-pass integration (if applied for E-pass, it will be available) See Recent Contacts option (allows the users to assess the risk level of their Bluetooth contacts) It tells how many COVID-19 positive cases are likely in a radius of 500 m, 1 km, 2 km, 5 km and 10 km from the user. The app is built on a platform that can provide an application programming interface (API) so that other computer programs, mobile applications, and web services can make use of the features and data available in Aarogya Setu. == Response == Aarogya Setu crossed five million downloads within three days of its launch, making it one of the most popular government apps in India. It became the world's fastest-growing mobile app, beating Pokémon Go, with more than 50 million installs 13 days after launching in India on 2 April 2020. It reached 100 million installs by 13 May 2020, that is in 40 days since its launch. In an order on 29 April 2020 the central government made it mandatory for all employees to download the app and use it – "Before starting for office, they must review their status on Aarogya Setu and commute only when the app shows safe or low risk". The Union Home Ministry also said that the application is mandatory for all living in the COVID-19 containment zone. The government gave the announcement along with the nationwide lockdown extension by two weeks from the 4 May with certain relaxations. On 21 May 2020, the Airport Authority of India issued a Standard Operating Procedure (SOP) stating that all departing passengers must compulsorily be registered with the Aarogya Setu app. It added that the app would not be mandatory for children below 14 years. However, the next day, Civil Aviation Minister Hardeep Singh Puri clarified that the app would not be mandatory for any passengers. On 26 May 2020, the Aarogya Setu app code was made open to developers across the globe to help other countries manage contact tracing in their fight against COVID-19 pandemic. In March 2021, Co-WIN portal was integrated with the app. This allowed users to schedule an appointment through the app for COVID-19 vaccine by registering their phone number and providing relevant documents. == Effectiveness == NITI Aayog CEO revealed that "the app has been able to identify more than 3,000 hotspots in 3–17 days ahead of time." However, users and experts in India and around the world say the app raises huge data security concerns. The app collects name, number, gender, travel history, and uses a phone's Bluetooth and location data to let users know if they have been near a person with COVID-19 by scanning a database of known cases of infection, and also share it with the government simultaneously. This is the major area of concern as the app's constant access to a phone's Bluetooth imposes a form of security threat. But it stood to clarify itself that the informations received are not going to be made public. Amidst all these, the app hits a record of about one-hundred million downloads. == Reception == Rahul Gandhi, leader of the Congress party, termed the Aarogya Setu application a "sophisticated surveillance system" after the government announced that downloading the app would be mandatory for both government and private employees. Following this, others raised the same concerns about the Aarogya Setu app. The Ministry of Electronics and Information Technology (MeitY) responded to these concerns by asserting that Gandhi's claims were false, and that the app was being appreciated internationally. On 5 May, French ethical hacker Robert Baptiste, who goes by the name Elliot Alderson on Twitter, claimed that there were security issues with the app. The Indian government, as well as the app developers, responded to this claim by thanking the hacker for his attention, but dismissed his concerns. The developers of the app stated that the fetching of location data is a documented feature of the app, rather than a flaw, since the app is designed to track the distribution of the virus-infected population. They also asserted that no personal information of any user has been proven to be at risk. On 6 May, Robert Baptiste tweeted that security vulnerabilities in Aarogya Setu allowed hackers to "know who is infected, unwell, [or] made a self assessment in the area of his choice". He also gave details of how many people were unwell and infected at the Prime Minister's Office, the Indian Parliament and the Home Office. The Economic Times pointed out that a clause in the app's Terms and Conditions stated that the user "agrees and acknowledges that the Government of India will not be liable for ... any unauthorised access to your information or modification thereof". In response, several software developers called for the source code to be made public. On 12 May, former Supreme Court Judge Justice B.N. Srikrishna termed the government's push mandating the use of Aarogya Setu app "utterly illegal". He said so far it is not backed by any law and questioned "under what law, government is mandating it on anyone". MIT Technology Review gave 2 out of 5 stars to Aarogya Setu app after analyzing the COVID contact tracing apps launched in 25 countries. The app got stars only for the policy which suggests that data collected is deleted after a period of time and that the data collection, as far as user inputs go, is minimal. It also highlighted that India is the only democracy making its app mandatory for millions of people. The rating was further downgraded from 2 to 1 for collecting more information than the app needs to function. Following this, the MeitY made the source code of the Android app public on GitHub on 26 May, which will be followed by iOS and API documentation. Further, the Government has also launched a "bug bounty program". This was done to "promote transparency and ensure security and integrity of the app". However, experts stated that the server-side code had not yet been publicly released, which meant that public opinion on security and privacy was yet to be completely assuaged. Following this, ZDNet noted that the source code seemed to confirm the government's claim that user location data, if collected, would be anonymised and would be deleted after 45 days, or 60 days for high-risk individuals.

Harold Borko

Harold Borko (1922-2012) was an American psychologist and researcher working primarily in the field of information science. == Biography == Borko was born in 1922 in New York City, New York. After serving in the US Army from 1942 to 1946 he obtained a BA in Psychology from the University of California, Los Angeles in 1948 and both his MA and PhD from the University of Southern California in Psychology in 1952. He returned to the army as a psychologist until 1956 after which he began a career working in and teaching information science. He died in California in 2012. == Information Science Career == After leaving the military Borko began working at the RAND Corporation as a Systems Training Specialist in 1956 and moved to the Systems Development Corporation a year later working in the Language Processing and Retrieval department. Alongside this work he taught Psychology at USC from 1957-65 and then moved into teaching Library Science at UCLA from 1965. In 1967 Borko left his role at the Systems Development Corporation and continued as a full-time professor at UCLA until his retirement in 1993.. From 1961 to 1995 Borko authored and co-authored over 100 articles on new developments in the field as well as the historiography of information science. He served as an editor of the Journal of Educational Data Processing from 1963-1975 and as President of the American Society for Information Science in 1966 == Partial list of works == Borko, H. (1962, May). The construction of an empirically based mathematically derived classification system. In Proceedings of the May 1-3, 1962, spring joint computer conference (pp. 279-289). Borko, H., & Bernick, M. (1963). Automatic document classification. Journal of the ACM (JACM), 10(2), 151-162. Borko, H. (1964). The Storage and Retrieval of Educational Information. Journal of Teacher Education, 15(4), 449-452. Borko, H. (1964). Measuring the reliability of subject classification by men and machines. American Documentation, 15(4), 268-273. Borko, H. (1965). The conceptual foundations of information systems. Borko, H. (1968), Information science: What is it?†. Amer. Doc., 19: 3-5. https://doi.org/10.1002/asi.5090190103 Borko, H. (1970). Experiments in book indexing by computer. Information storage and retrieval, 6(1), 5-16. Borko, H. (1985). An introduction to computer-based library systems (Lucy A. Tedd). Education for Information, 3(1), 61.