BigData | Digital Data News and Information

Keeping You Up On The Latest

Posts tagged ‘BigData’

Bitcoins

Bitcoin collecting has prompted a wave of techies to build supercomputers they hope will make them a small fortune, even if the machines cost a lot of cash to power.

Specially developed Bitcoin ‘mining’ computers are either homemade or can be purchased from one of the growing number of online stores dedicated to cashing in on the supply side of the cult currency trend.

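The 21 million Bitcoins are released gradually across an internet-based network. Most are expected to have been found by 2040, although the very last coins will not be released until around 2140. To unearth them, computers have to solve the complex, processor-intensive mathematical puzzles that lock them.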
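To make "solving the puzzle" concrete, here is a toy proof-of-work search in Python. It is a simplified sketch, not Bitcoin's real block format: the `block_data` string and the `difficulty` parameter (the number of leading zero hex digits required) are illustrative assumptions. Real Bitcoin mining double-hashes an 80-byte block header with SHA-256 and compares the result against a numeric target.

```python
import hashlib
from typing import Optional, Tuple


def mine(block_data: str, difficulty: int, max_nonce: int = 10_000_000) -> Optional[Tuple[int, str]]:
    """Toy proof of work: find a nonce such that the SHA-256 hash of
    block_data + nonce starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    for nonce in range(max_nonce):
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest  # puzzle solved: this nonce "unlocks" the block
    return None  # no valid nonce found below max_nonce


result = mine("example block", difficulty=4)
if result is not None:
    nonce, digest = result
    print(f"nonce={nonce} hash={digest}")
```

Checking an answer takes a single hash, but finding one takes on average 16 tries per required zero digit, multiplied together (16^4 = 65,536 tries here). That asymmetry is why mining is expensive while verification is cheap.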

What is a Bitcoin?

It’s a piece of data locked in an internet-based network by a complex mathematical puzzle that computers can solve.

Once released it can be traded and used like money online and can be purchased with real cash.

Many websites now accept Bitcoins as a form of currency. Beyond the currency itself, Bitcoin miners enjoy the competitive nature of unlocking the coins.

Famous fans include the Winklevoss twins, who own around 1 percent of all Bitcoins, currently worth around $11 million.

It has been dismissed by some as a Ponzi scheme and touted by others as the future of money. It is not centrally controlled, and according to its developers, its unique and complex setup means the market cannot be altered or hacked.

There are 21 million coins, predicted to last until 2140, and their finite supply means they behave more like a commodity such as gold. Bitcoin was first proposed in 2008 and launched as a network in 2009.
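The 21 million cap and the 2140 date both follow from Bitcoin's issuance rule: the block reward started at 50 BTC and halves every 210,000 blocks, with blocks arriving roughly every ten minutes. A short Python sketch of that arithmetic, working in satoshis (1 BTC = 100,000,000 satoshis) and assuming a steady ten-minute block time:

```python
from typing import Tuple

INITIAL_REWARD_SATS = 50 * 100_000_000  # 50 BTC per block at launch, in satoshis
HALVING_INTERVAL = 210_000              # blocks between reward halvings
BLOCK_TIME_SECONDS = 600                # ~10-minute target block spacing


def issuance_schedule() -> Tuple[int, int]:
    """Return (total satoshis ever issued, number of blocks that pay a reward)."""
    total, blocks, reward = 0, 0, INITIAL_REWARD_SATS
    while reward > 0:
        total += HALVING_INTERVAL * reward
        blocks += HALVING_INTERVAL
        reward //= 2  # integer halving: fractions of a satoshi are dropped
    return total, blocks


total_sats, reward_blocks = issuance_schedule()
years = reward_blocks * BLOCK_TIME_SECONDS / (365.25 * 24 * 3600)
print(f"{total_sats / 100_000_000:,.8f} BTC over ~{years:.0f} years")
```

The total comes out just under 21 million BTC because integer halving rounds the reward down, and the roughly 6.93 million rewarded blocks at ten minutes each span about 132 years from the 2009 launch, i.e. around 2140.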

They were introduced by an obscure programmer known as Satoshi Nakamoto, whose real identity is a mystery; the name is thought to be a pseudonym.

Users choose a virtual wallet from one of the various providers, which enables them to receive, give and trade coins with other users. Bitcoins can be bought from specialist currency exchanges and online marketplaces such as eBay. Bitcoin mining machines can unlock many of these potentially lucrative coins (essentially computer codes), which are then used as currency online, potentially making their collectors very rich.

The Bitcoin network is also designed so that mining becomes more difficult as more miners join the search for Bitcoins, opening the potential for ever-developing mining machines.
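Concretely, the network keeps blocks arriving about every ten minutes by recalculating its difficulty every 2,016 blocks. The Python sketch below is a simplified version of that rule; it omits details of the real implementation, such as the maximum-target cap, and the starting target value is arbitrary.

```python
RETARGET_INTERVAL = 2016                   # blocks per adjustment period
TARGET_TIMESPAN = RETARGET_INTERVAL * 600  # two weeks, in seconds


def retarget(old_target: int, actual_timespan: int) -> int:
    """Simplified Bitcoin-style retarget: scale the hash target by how long the
    last 2016 blocks actually took, limited to a factor of 4 either way.
    A *lower* target means mining is *harder*."""
    clamped = max(TARGET_TIMESPAN // 4, min(actual_timespan, TARGET_TIMESPAN * 4))
    return old_target * clamped // TARGET_TIMESPAN


# If twice as much hashing power joins, 2016 blocks arrive in ~1 week instead of 2,
# so the target halves and each block becomes twice as hard to find.
old = 1 << 224
new = retarget(old, TARGET_TIMESPAN // 2)
print(new == old // 2)
```

This feedback loop is what drives the arms race in mining hardware: extra computing power only wins a larger share of blocks, not more blocks overall.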

Category:

computers, technology, Uncategorized

Walmart Takes On Big Data

Many of Walmart's big data tools have been developed at Walmart Labs, which was created after Walmart acquired Kosmix in 2011. Products developed at Walmart Labs include ‘Social Genome’, ‘ShoppyCat’ and ‘Get on the Shelf’.

The Social Genome product allows Walmart to reach customers, or friends of customers, who have mentioned a product online, to inform them about that exact product and offer a discount.
Public data from the web is combined with social data and proprietary data such as customer purchasing data and contact information. The result is a constantly changing, up-to-date knowledge base with hundreds of millions of entities and relationships, which gives Walmart a better understanding of what its customers are saying online. In one example cited by Walmart Labs, a woman tweets regularly about movies. When she tweets “I love Salt”, Walmart is able to understand that she is talking about the movie Salt and not the condiment.
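Walmart Labs has not published the Social Genome code, so the following is only a toy illustration of the general idea behind the “Salt” example: pick the sense of an ambiguous word whose associated cue words best match the user's other posts. The sense inventory, cue words and tweets are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical sense inventory: each candidate meaning of "salt" with cue words.
SENSES: Dict[str, Set[str]] = {
    "Salt (movie)": {"movie", "film", "cinema", "trailer", "actor", "watch"},
    "salt (condiment)": {"recipe", "cook", "pepper", "kitchen", "dinner", "taste"},
}


def disambiguate(history: List[str], senses: Dict[str, Set[str]]) -> str:
    """Pick the sense whose cue words occur most often in the user's past posts."""
    words = Counter(w.strip(".,!?#\"").lower() for post in history for w in post.split())
    scores = {sense: sum(words[cue] for cue in cues) for sense, cues in senses.items()}
    return max(scores, key=scores.get)


tweets = [
    "Going to watch a movie tonight",
    "That trailer was great, best film of the year",
]
print(disambiguate(tweets, SENSES))  # → Salt (movie)
```

A production knowledge base would resolve entities against far richer context (the hundreds of millions of entities and relationships mentioned above), but the principle of letting surrounding context choose the meaning is the same.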

The ShoppyCat product developed by Walmart is able to recommend suitable products to Facebook users based on the hobbies and interests of their friends.

Get on the Shelf was a crowdsourcing initiative that gave anyone the chance to promote his or her product in front of a large online audience. The best products would be sold at Walmart, with the potential to suddenly reach millions of customers.

Category:

Big Data, Data, Data Scientist, Mobile Technology, Research, technology, Uncategorized

Big Data Changing The Way You Think

Category:

Big Data, computers, Data, Data and Visualization, Data Scientist, Digital Humanities, Digital Libraries, Higher Education, Metadata, Research, technology

Digital Humanities Awards 2012

Best DH tool or suite of tools
Winner: Omeka http://omeka.org/
1st Runner Up: Paper Machines https://github.com/chrisjr/papermachines
2nd Runner Up: Isidore http://www.rechercheisidore.fr/
Total votes in category: 877

Best DH blog, article, or short publication
Winner: Digital Humanities Now http://digitalhumanitiesnow.org/
1st Runner Up: http://leonardoflores.net/
2nd Runner Up: http://www.thespace.lrb.co.uk/
Total votes in category: 1494

Best DH visualization or infographic
Winner: A Thousand Words: Advanced Visualization for the Humanities http://www.tacc.utexas.edu/tacc-projects/a-thousand-words
1st Runner Up: e-Diasporas Atlas http://maps.e-diasporas.fr/
2nd Runner Up: ORBIS: The Stanford Geospatial Network Model of the Roman World http://orbis.stanford.edu/
Total votes in category: 1099

Best professional resources for learning about or doing DH work
Winner: Digital Humanities Tool Box http://www.scoop.it/t/digital-humanities-tool-box
1st Runner Up: Livingstone’s 1871 Field Diary: A Multispectral Critical Edition: Project History, pages starting from http://livingstone.library.ucla.edu/1871diary/initial_history.htm
2nd Runner Up: Bamboo DiRT http://dirt.projectbamboo.org/
Total votes in category: 1048

Best DH project for public audiences
Winner: CEISMIC: Canterbury Earthquake Digital Archive http://www.ceismic.org.nz/
1st Runner Up: La Biblioteca Virtual de la Biblioteca Luis Ángel Arango http://www.banrepcultural.org/blaavirtual/indice
2nd Runner Up: Dickens Journals Online http://www.djo.org.uk/
Total votes in category: 3161

Best use of DH for fun
Winner: The Future of the Past http://newspapers.wraggelabs.com/fotp/
1st Runner Up: DigitalNZ magic squares http://wraggelabs.com/shed/magicsquares/
2nd Runner Up: 10 PRINT ebooks https://twitter.com/10print_ebooks
Total votes in category: 911

Category:

Big Data, Data and Visualization, Data Scientist, Digital Humanities, Digital Libraries, Higher Education, Libraries, Metadata, Mobile Technology, Research, technology

BIG DATA AND THE OSCARS

Category:

Big Data, Research, Uncategorized

Written on February 10, 2013

DH Awards Voting 2012

Category:

Data, Data Scientist, Digital Humanities, Digital Libraries, Higher Education, Research, technology

Although Oracle has released a patch to fix a massive security vulnerability in Java 7, the U.S. Department of Homeland Security has reiterated its warning that Java still poses risks.

Category:

computers, Data Scientist, Metadata, technology

The Cloud

Cloud Vmware

Written on January 9, 2013

Human Face Of Big Data

Category:

Big Data, computers, Data, Data and Visualization, Data Scientist, Health care, Higher Education, Hospitals, Metadata, technology, Uncategorized

A Data Visualization: The Internet Map

Category:

Big Data, computers, Data, Data and Visualization, Data Scientist, Digital Humanities, Digital Libraries, technology

Data Visualization Tools

Sunburst, which displays data as a series of rings to demonstrate hierarchies.
Zoom, a tool for analyzing different ranges across data sets.
Trellis, an aggregation of scatter plots designed to help visualize patterns within Big Data.
Treemap, which is similar to Sunburst but displays hierarchies as rectangles instead of rings.
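The treemap idea (a hierarchy drawn as nested rectangles whose areas are proportional to values) can be sketched with the classic slice-and-dice layout, which alternates vertical and horizontal cuts at each level. This is one simple layout algorithm among several (squarified layouts are common in practice), not how any particular tool implements it; the sample hierarchy is made up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height)


@dataclass
class Node:
    name: str
    size: float = 0.0  # leaf weight; internal nodes use the sum of their children
    children: List["Node"] = field(default_factory=list)

    def total(self) -> float:
        return self.size if not self.children else sum(c.total() for c in self.children)


def slice_and_dice(node: Node, rect: Rect, depth: int = 0) -> List[Tuple[str, Rect]]:
    """Split `rect` among the node's children in proportion to their totals,
    cutting along x at even depths and along y at odd depths."""
    out = [(node.name, rect)]
    x, y, w, h = rect
    total = node.total()
    offset = 0.0
    for child in node.children:
        frac = child.total() / total if total else 0.0
        if depth % 2 == 0:  # even depth: side-by-side columns
            child_rect = (x + offset * w, y, frac * w, h)
        else:               # odd depth: stacked rows
            child_rect = (x, y + offset * h, w, frac * h)
        out.extend(slice_and_dice(child, child_rect, depth + 1))
        offset += frac
    return out


tree = Node("sales", children=[
    Node("east", children=[Node("NY", 30), Node("MA", 10)]),
    Node("west", children=[Node("CA", 60)]),
])
for name, (x, y, w, h) in slice_and_dice(tree, (0, 0, 100, 100)):
    print(f"{name:>5}: x={x:.0f} y={y:.0f} w={w:.0f} h={h:.0f}")
```

Each leaf's area is its share of the 100 × 100 canvas (NY covers 30 percent, CA 60 percent), which is exactly the property that lets a treemap show both hierarchy and magnitude at once.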

Category:

Data, Data and Visualization, Digital Humanities, Higher Education, Metadata, technology

Can Big Data Survive Without Data Scientists?

A 2011 McKinsey & Co. survey pointed out that many organizations don’t have the skilled personnel needed to mine big data for insights and the structures and incentives required to use big data to make informed decisions and act on them.

Big data is a mixture of distributed data architectures and tools like Hadoop, NoSQL, Hive and R. Data scientists serve as the gatekeepers and mediators between these systems and the people who run the business – the domain experts.

The data scientist serves three main roles: data architecture, machine learning, and analytics. While these roles are important, not every company actually needs a highly specialized data team of the sort you’d find at Google or Facebook.

Most of the standard challenges that require big data, like recommendation engines and personalization systems, can be abstracted out. Feature creation, which depends on each domain, could in turn be templatized on a per-domain basis. What if domain experts could directly encode their ideas and representations of their domains into the system, bypassing the data scientist as middleman and translator?
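As a hypothetical sketch of what templatized feature creation could look like: an engineer writes a few generic feature templates once, and a domain expert only fills in parameters (which event, which field, which time window) to declare the features that matter in their domain. Every name and record below is invented for illustration.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List

Record = Dict[str, object]
Feature = Callable[[List[Record], datetime], float]


# Generic templates, written once by an engineer (hypothetical).
def count_in_window(event_type: str, days: int) -> Feature:
    def feature(records: List[Record], now: datetime) -> float:
        start = now - timedelta(days=days)
        return float(sum(1 for r in records
                         if r["type"] == event_type and start <= r["time"] <= now))
    return feature


def sum_of_field(field_name: str, event_type: str) -> Feature:
    def feature(records: List[Record], now: datetime) -> float:
        return float(sum(r[field_name] for r in records if r["type"] == event_type))
    return feature


# A domain expert (here, a hypothetical retail analyst) declares features by name.
RETAIL_FEATURES: Dict[str, Feature] = {
    "purchases_last_30d": count_in_window("purchase", 30),
    "returns_last_90d": count_in_window("return", 90),
    "lifetime_spend": sum_of_field("amount", "purchase"),
}


def build_features(records: List[Record], now: datetime) -> Dict[str, float]:
    return {name: f(records, now) for name, f in RETAIL_FEATURES.items()}


now = datetime(2013, 1, 31)
history = [
    {"type": "purchase", "time": datetime(2013, 1, 20), "amount": 25.0},
    {"type": "purchase", "time": datetime(2012, 11, 2), "amount": 40.0},
    {"type": "return", "time": datetime(2013, 1, 25), "amount": 25.0},
]
print(build_features(history, now))
```

The design choice is that the domain knowledge lives entirely in the `RETAIL_FEATURES` declaration, so a different domain would swap in its own dictionary without touching the templates.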

Category:

Big Data, computers, Data Scientist, technology

National Strategy for Information Sharing and Safeguarding

The White House has released a new strategy for federal information sharing that lists among its top priorities the adoption of metadata standards and the further extension of the Federal Identity, Credential, and Access Management (FICAM) framework. The document defines policy objectives, including ones that cover metadata.

Category:

Big Data, computers, Data, Digital Curation, Digital Libraries, Higher Education, Libraries, Metadata, technology

Data Becoming Bigger and Better 2013

Mortar

Infochimps

Microsoft Windows Azure HDInsight

Some companies, including Continuuity, Platfora and Drawn to Scale, are trying to make Hadoop more useful by turning it into a platform for something other than running MapReduce jobs.

Category:

Big Data, computers, Data, Data and Visualization, technology

Federal mHealth Projects

Category:

Big Data, computers, Higher Education, mHealth, Mobile Technology, technology

U.S. Refuses To Sign ITU Treaty

Part of the U.S. delegation to the Dubai summit, which said today it could not sign the proposed treaty because of its harmful effects on the Internet.

An alliance of Western democracies including the United States, the United Kingdom, and Canada has rejected a proposed treaty over concerns that it hands repressive governments too much authority over the Internet.

Ambassador Terry Kramer, head of the U.S. delegation to the Dubai summit, said: “This conference was never meant to focus on Internet issues and the Internet has given the world unimaginable economic and social benefit during these past 24 years, all without U.N. regulation.”

Delegates from the Netherlands, New Zealand, Denmark, Sweden, the Philippines, Poland, and the Czech Republic also said they could not sign the proposed International Telecommunication Union treaty, which is scheduled to be finished by today. Kenya’s delegate appeared to take the same position, saying “we reserve our rights” to “go back home and do more consultations” before signing, and India has signaled it agrees with the U.S. position. Japan’s delegation said it needed to consult with Tokyo before proceeding.

Deep divisions became apparent over the mere mention of “human rights obligations” in the treaty — a proposal that China and Iran opposed — and whether the U.N. was the proper organization to oversee key decisions about how the Internet should be managed. Currently groups including the Internet Engineering Task Force and the Internet Corporation for Assigned Names and Numbers, or ICANN, fulfill that role.

Canada said it was forced to reject the proposed treaty because of its commitment to an Internet “in which people are free to participate, communicate, organize and exchange information.”

The refusal of at least a dozen nations, especially the United States, to sign has likely doomed the entire summit, which was convened to draft a new treaty, unless a competing alliance including China and Algeria is willing to offer a dramatic last-minute compromise. ITU secretary general Hamadoun Touré said in September that “no proposal is going to be passed if it does not have very wide support from all involved.”

Category:

Big Data, computers, Data, Data and Visualization, Data Scientist, Digital Humanities, Digital Libraries, Education, Higher Education, Librarian, Metadata, technology, Uncategorized

PHD Comics

Category:

Big Data, Books, computers, Data and Visualization, Data Scientist, Digital Curation, Digital Humanities, Digital Libraries, Higher Education, Librarian, Metadata, technology, Uncategorized

Big Data

After having become accustomed to terms like MegaByte, GigaByte, and TeraByte, we must now prepare ourselves for a whole new vocabulary, such as PetaByte, ExaByte, and ZettaByte, which will soon be as common as the aforementioned.
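Each step up this vocabulary is a factor of 1,000. A small Python helper makes the ladder explicit; it assumes decimal SI prefixes (1 KB = 1,000 bytes, so a zettabyte is 10²¹ bytes), whereas some operating systems report sizes in binary multiples of 1,024 instead.

```python
# Decimal (SI) prefixes: each step is a factor of 1,000.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes: float) -> str:
    """Express a byte count using the largest SI unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000
    raise AssertionError("unreachable: the last unit always returns")


for exponent in (6, 9, 12, 15, 18, 21):
    print(human_readable(10 ** exponent))
```

Printed in order, the loop walks from megabytes to zettabytes, one new unit every three powers of ten.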

Dr Riza Berkan, CEO and Board Member of Hakia, provides a list of mechanisms generating big data:

Data from scientific measurements and experiments (astronomy, physics, genetics, etc.)
Peer to peer communication (text messaging, chat lines, digital phone calls)
Broadcasting (News, blogs)
Social Networking (Facebook, Twitter)
Authorship (digital books, magazines, Web pages, images, videos)
Administrative (enterprise or government documents, legal and financial records)
Business (e-commerce, stock markets, business intelligence, marketing, advertising)
Other

Dr Riza Berkan says Big Data can be a blessing and a curse.

He says that although there should be clear boundaries between data segments that belong to specific objectives, this very concept is misleading and can undermine potential opportunities. For example, scientists working on human genome data may improve their analysis if they could take the entire content (publications) of Medline (or PubMed) and analyze it in conjunction with the human genome data. However, this requires natural language processing (semantic) technology combined with bioinformatics algorithms, which is an unusual coupling at best. Two different data segments in different formats, when combined, actually define a new “big data”. Now add a third data segment, such as the FBI’s DNA bank or Genealogy.com, and you’ll see that the complications and opportunities can go on and on. This is where the mystery and the excitement of big data reside.

Super Big Data Software

Dr Riza Berkan asks whether we are prepared for generating data at colossal volumes, and suggests looking at this question in two stages: (1) the platform and (2) analytics “super” software.

Apache Hadoop’s open source software enables the distributed processing of large data sets across clusters of commodity servers, a model closely associated with cloud computing. IBM’s Platform Symphony is another example of grid management suitable for a variety of distributed computing and big data analytics applications. Oracle, HP, SAP, and Software AG are very much in the game for this $10 billion industry. While these giants are offering a variety of solutions for distributed computing platforms, there is still a huge void at the level of analytics Super Software. According to Dr Berkan, Super Software’s main function would be to discover new knowledge that would otherwise be impossible to acquire via manual means.

Discovery requires the following functions:

Finding associations across information in any format
Visualization of associations
Search
Categorization, compacting, summarization
Characterization of new data (where it fits)
Alerting
Cleaning (deleting unnecessary clogging information)
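The first function, finding associations across information in any format, can be illustrated in its most basic form by co-occurrence counting: once each record (whatever its original format) has been reduced to a set of terms, pairs of terms that keep appearing together are candidate associations. This is a deliberately simple stand-in, not Dr Berkan's design, and the records are hypothetical placeholders; real systems would add entity extraction and statistical measures such as lift or mutual information to filter out coincidental pairs.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def co_occurrences(docs: Iterable[Set[str]], min_count: int = 2) -> List[Tuple[Tuple[str, str], int]]:
    """Count how often each pair of terms appears in the same record and
    return the pairs seen at least `min_count` times, most frequent first."""
    pairs = Counter()
    for terms in docs:
        pairs.update(combinations(sorted(terms), 2))  # sorted -> one key per unordered pair
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]


# Hypothetical records from different sources, already reduced to term sets.
docs = [
    {"gene_x", "disease_y", "variant_z"},       # e.g. terms from a paper abstract
    {"gene_x", "disease_y", "screening"},       # e.g. terms from a clinical note
    {"gene_x", "variant_z", "family_history"},  # e.g. terms from a genealogy record
]
for (a, b), n in co_occurrences(docs):
    print(f"{a} <-> {b}: {n}")
```

Because the counting works on term sets rather than on the raw formats, the same step applies whether the terms came from text, tables or DNA records.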

Moreover, Dr Berkan says that “Super Software would be able to identify genetic patterns of a disease from human genome data, supported by clinical results reported in Medline, and further analyzed to unveil mutation possibilities using FBI’s DNA bank of millions of DNA information. One can extend the scope and meaning of top level objectives which is only limited by our imagination.”

However, Dr Berkan warns that big data can also be a curse if cleaning (deleting) technologies are not considered part of the Super Software operation. In a previous post, “information pollution”, he emphasized the danger of the uncontrollable growth of information, which he calls the invisible devil of the information age.

credits: Search Engine Journal/SEG

Category:

Big Data, computers, Data, Data and Visualization, Data Scientist, Education, technology

Singapore’s Ten Year Road Map and Big Data

In June 2006, Singapore launched its 10-year roadmap, dubbed Intelligent Nation 2015. The objective was to ensure the city-state would achieve economic and social benefits through the innovative use of infocomm technologies.

Category:

Bigdata, computers, Data, Data and Visualization, Data Scientist, Education, technology

Big Data On The Campus

Mining Data To Help Students

Category:

Bigdata, computers, Data and Visualization, Digital Curation, Education, technology

Big Data and the Legal Profession

Category:

computers, Data, Data Scientist, Education, Law, technology

IBM’s Understanding Big Data e-book

Category:

computers, Data, Data Scientist, Education, technology

Data Citation Index

In October 2012, Thomson Reuters was set to release the Data Citation Index on the Web of Knowledge platform, providing a single point of access to quality research data from repositories across disciplines and around the world.

Category:

Data, Data Scientist, Digital Humanities, Digital Libraries, Metadata, technology

BYOD

According to Gartner, bring your own device (BYOD) is an alternative strategy that allows employees, business partners and other users to use a personally selected and purchased client device to execute enterprise applications and access data. For most organizations, the program is limited to smartphones and tablets, but the strategy may also be used for PCs. It may or may not include subsidies for equipment or service fees.

Category:

computers, Data, Data Scientist, technology

Big Data and Other Technologies

Currently, Big Data is synonymous with technologies like Hadoop and the “NoSQL” class of databases such as MongoDB (document stores) and Cassandra (wide-column stores). Today it’s possible to stream real-time analytics with ease, and spinning clusters up and down is a (relative) cinch, accomplished in 20 minutes or less.

Now there are new, largely untapped open source technologies out there.

STORM AND KAFKA

Storm and Kafka are used at a number of high-profile companies including Groupon, Alibaba, and The Weather Channel.

Storm and Kafka are said to handle data velocities of tens of thousands of messages every second.
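Data velocity is usually tracked with a running metric such as messages per second over a sliding window, the kind of figure a stream processor computes continuously. The following is a plain-Python sketch of that idea only; it does not use the Storm or Kafka APIs, it assumes timestamps arrive in order, and the simulated message rate is arbitrary.

```python
from collections import deque
from typing import Deque


class SlidingWindowRate:
    """Count events seen in the last `window` seconds, using caller-supplied
    timestamps so the logic is deterministic and easy to test."""

    def __init__(self, window: float = 1.0) -> None:
        self.window = window
        self.events: Deque[float] = deque()

    def record(self, timestamp: float) -> None:
        self.events.append(timestamp)
        self._evict(timestamp)

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        self._evict(now)
        return len(self.events) / self.window

    def _evict(self, now: float) -> None:
        # Drop timestamps older than now - window (assumes in-order arrival).
        while self.events and self.events[0] < now - self.window:
            self.events.popleft()


meter = SlidingWindowRate(window=1.0)
for i in range(20_000):  # simulate 20,000 messages spread evenly over 2 seconds
    meter.record(i / 10_000)
print(meter.rate(now=2.0))
```

Keeping only the timestamps inside the window bounds memory by the message rate rather than by the total stream length, which is what makes this style of metric practical on unbounded streams.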

DRILL AND DREMEL

Drill and Dremel are said to put power in the hands of business analysts, not just data engineers.

R

R is an open source statistical programming language. It is incredibly powerful, and over two million analysts (and counting) use it. R works very well with Hadoop.

GREMLIN AND GIRAPH

Gremlin and Giraph help empower graph analysis, and are often used together with graph databases like Neo4j or InfiniteGraph, or, in the case of Giraph, with Hadoop.
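Graph analysis comes down to traversals over vertices and edges. As a plain-Python sketch (not Gremlin or Giraph code, and using a made-up graph), here is a breadth-first search that finds "friends of friends", roughly what a Gremlin traversal chaining two `out('knows')` steps would express.

```python
from collections import deque
from typing import Dict, List

# Hypothetical "knows" graph as an adjacency list.
KNOWS: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def within_hops(graph: Dict[str, List[str]], start: str, max_hops: int) -> Dict[str, int]:
    """Breadth-first search: map every vertex reachable from `start` in at most
    `max_hops` edges to its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand beyond the hop limit
        for w in graph.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


# Friends of friends: vertices exactly two hops from alice.
fof = sorted(v for v, d in within_hops(KNOWS, "alice", 2).items() if d == 2)
print(fof)  # → ['dave', 'erin']
```

A graph database runs the same kind of traversal over stored edges, while Giraph distributes it across a Hadoop cluster, expanding one hop per superstep.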

SAP HANA

SAP HANA is an in-memory analytics platform that includes an in-memory database and a suite of tools and software for creating analytical processes and moving data in and out in the right formats.

Category:

Data, Data Scientist, technology

Big Data and High Performance Computing

Category:

computers, Data, Data Scientist, technology

Big Data

Big data is measured in terabytes, petabytes, or more. Data becomes “big data” when it outgrows your current ability to process it, store it, and cope with it efficiently. Storage has become very cheap in the past ten years, allowing loads of data to be collected. However, our ability to actually process the loads of data quickly has not scaled as fast. Traditional tools to analyze and store data — SQL databases, spreadsheets, the Chinese abacus — were not designed to deal with vast data problems. The amount of information in the world is now measured in zettabytes. A zettabyte, which is 10²¹ bytes (that is 1 followed by twenty-one zeroes), is a big number. Imagine writing three paragraphs describing your favorite movie – that’s about 1 kilobyte. Next, imagine writing three paragraphs for every grain of sand on the earth — that amount of information is in the zettabyte range.
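Putting the paragraph's own figures together: at about 1 kilobyte (10³ bytes) per three-paragraph movie write-up, a single zettabyte holds

```latex
\[
1\ \text{ZB} = 10^{21}\ \text{bytes}
= \frac{10^{21}\ \text{bytes}}{10^{3}\ \text{bytes per write-up}}
= 10^{18}\ \text{write-ups},
\]
```

that is, a billion billion movie descriptions.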

The best tool available today for processing and storing herculean amounts of big data is Hadoop. Hundreds or thousands of computers are thrown at the big data problem, rather than a single computer.

Hadoop makes data mining, analytics, and processing of big data cheap and fast. Hadoop can take most of your big data problems and unlock the answers, because you can keep all your data, including all of your historical data, and get an answer before your children graduate college.

Apache Hadoop is an open-source project inspired by Google’s research, notably its papers on MapReduce and the Google File System. Hadoop is named after the stuffed toy elephant of the lead programmer’s son. In Hadoop parlance, the group of coordinated computers is called a cluster, and the individual computers in the cluster are called nodes.
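Hadoop's cluster model is usually explained with the classic MapReduce word count. The sketch below runs all three phases (map, shuffle, reduce) in a single Python process purely to show the data flow; it is not Hadoop's Java API. In a real cluster each input split would be mapped on a different node, and the shuffle would move intermediate data across the network to the reducers.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values emitted for the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine the grouped values for one key."""
    return key, sum(values)


# Each "split" stands in for a block of input that one node would process.
splits = ["big data big clusters", "data nodes", "big nodes"]
mapped = chain.from_iterable(map_phase(s) for s in splits)
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # → {'big': 3, 'data': 2, 'clusters': 1, 'nodes': 2}
```

Because mappers never talk to each other and each reducer sees only its own keys, adding nodes adds throughput, which is why the approach scales to the hundreds or thousands of machines described above.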

Category:

Uncategorized

New York City Shows Promising Signs of Becoming The Next Silicon Valley

Tech giants Google and Facebook have established a presence in New York in recent years, and some big-name newcomers are headquartered here. Plans for an elite technology graduate school, attracted with city money, are getting enough attention that a federal patent officer is being stationed on campus in a first-of-its-kind arrangement.

Entrepreneurs say New York also faces particular challenges, including problematic broadband access in a few areas and a limited tech talent base, though the city is trying to address these concerns. New York stands on solid ground, so to speak, in financial technology and online publishing, but the growth of social media and digital marketing opens new prospects for a city known for communications, design and advertising. Prominent startups established in New York in the past five years include Foursquare, Tumblr, Kickstarter and Gilt Groupe.

The city’s biggest move was offering 12 acres of land and up to $100 million in improvements for a tech-focused graduate school. Cornell University and Technion-Israel Institute of Technology won a competition to run the school, set to start with a handful of students in January. It will be the first institution in the country to have an on-campus patent officer, acting U.S. Commerce Secretary Rebecca Blank announced this month. Columbia University and New York University were also offered $15 million apiece in incentives to create new technology programs.

Category:

Data, Data Scientist, Education, technology

Interactive Data Visualizations

Category:

computers, Data, Data and Visualization, Data Scientist, technology

New York City Data Week 2012

New York City Data Week At Silicon Alley

Category:

Data, Data and Visualization, Data Scientist, Education, Metadata, technology

Business Intelligence Applications Making Transitions

Business intelligence applications have begun to transition from OLAP to a new type of service that connects data from social networks, third-party apps and other sources. NoSQL has emerged as a popular option because it scales across cheap, commodity-based nodes, which is much cheaper than scaling vertically integrated systems that require attaching expensive storage arrays.

A new generation of big data applications is turning up, which in turn has put pressure on enterprise vendors to modify their existing software suites. Venture capitalists will continue to invest in data infrastructure and big data apps, which represent a clear disruption in IT.
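Scaling out across commodity nodes depends on deciding which node stores which key. One widely used technique in NoSQL systems is consistent hashing (Cassandra's token ring is a well-known example). The sketch below is a minimal Python illustration, not any product's implementation; the node names, key format and number of virtual nodes are arbitrary choices.

```python
import bisect
import hashlib
from typing import List, Tuple


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self.ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first virtual node at or after the key's hash."""
        idx = bisect.bisect_left(self.ring, (_hash(key), ""))
        return self.ring[idx % len(self.ring)][1]  # wrap around past the end


keys = [f"customer:{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")  # scale out by one more commodity server
moved = sum(before[k] != ring.node_for(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly a quarter
```

Only the keys taken over by the new node move. With naive `hash(key) % number_of_nodes` placement, going from three to four nodes would relocate about three quarters of the keys, which is why ring-style placement suits clusters that grow one cheap node at a time.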

Category:

computers, Data, technology
