Pulling a subset of data from a larger data set

Hi,

I've got a file that's ~8,500 lines long and I want to pull out a subset of ~1,500 of these lines. Each line is a specific sample and I have a list of samples that I want to pull from this main data file. Just wondering which command would be best to implement this.

Regards,
Blair
Feb 19 '14 #1
Banfa
9,065 Expert Mod 8TB
Assuming that you have the unique ids of the 1,500 samples you want in a file (ids), that the data is in a file (samples), and that the final order of the retrieved samples is not important, you could do something like this:
    cat ids | xargs -I '{}' grep '{}' samples
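
A possible alternative, sketched here and not part of the original answer: grep can read the whole id list itself with -f, so only one grep process runs instead of one per id. The demo data below is invented for illustration; the file names ids and samples follow the post. -F treats each id as a literal string, and -w (available in GNU and BSD grep, but not required by POSIX) restricts matches to whole words so that an id such as S10 does not also pull in S100.

```shell
# Invented demo data in a scratch directory (assumption: one id per line,
# and each sample line starts with its id followed by a comma).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'S10' 'S2' > ids
printf '%s\n' 'S1,0.5' 'S2,0.7' 'S10,0.9' 'S100,1.1' > samples

# -f ids : read the patterns from the file "ids", one per line
# -F     : treat each pattern as a fixed string, not a regular expression
# -w     : match whole words only, so S10 does not select S100
grep -F -w -f ids samples > subset

cat subset
```

Unlike the xargs loop, which prints matches grouped in the order of the ids, this prints the selected lines in the order they appear in samples.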
Feb 19 '14 #2
blairb
2
Many thanks for the response, it's much appreciated.

My coding is far from advanced, so just to clarify: do I need to put any parameters in the '{}' field?
That is, I've got a file with the samples I want (ids.csv) and the data file (samples.xls), so my code would simply be:

    cat ids.csv | xargs -I '{}' grep '{}' samples.xls > catdata.xls
Is this correct?
Mar 11 '14 #3
Banfa
9,065 Expert Mod 8TB
The command line looks about right. Nothing needs placing in the '{}' field. Read the man page for xargs (man xargs): the -I switch specifies a string that is replaced with each name read from standard input (that is, the output of the cat command in this case).
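
To make the substitution visible, here is a minimal sketch that uses echo in place of grep, so xargs prints each command it would build and does not run it. The three ids are invented for illustration.

```shell
# Feed three hypothetical ids to xargs, one per line. For every input line,
# xargs replaces '{}' with that line and runs the resulting command once.
commands=$(printf '%s\n' 'S1' 'S2' 'S3' |
  xargs -I '{}' echo "grep '{}' samples")

printf '%s\n' "$commands"
# prints:
#   grep 'S1' samples
#   grep 'S2' samples
#   grep 'S3' samples
```

So '{}' is only a placeholder; xargs fills it in, and grep is started once per id.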

However, I am concerned that you appear to be trying to work on xls (Excel) files as though they were text files, which they are not (unless they happen to contain csv data, in which case why not use the csv extension?). You may be able to achieve what you want by exporting the xls to a csv first, though.
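
A rough way to check this before running grep, sketched under some assumptions: the helper name is_text and both demo files are hypothetical, and the check relies on the -I option of GNU and BSD grep, which makes grep treat a binary file as containing no match. The last step also strips carriage returns, since a csv exported on Windows may end its lines with CR LF, and that stray CR can stop ids from matching.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for real files: a plain-text csv with CR LF line endings, and a
# file containing NUL bytes, as a binary .xls workbook would.
printf 'S1,0.5\r\nS2,0.7\r\n' > export.csv
printf 'S1\000\000\000binary\n' > workbook.xls

# Hypothetical helper: succeeds only if grep regards the file as text.
is_text() {
  grep -Iq . "$1"
}

for f in export.csv workbook.xls; do
  if is_text "$f"; then
    echo "$f: looks like text"
  else
    echo "$f: looks binary, export it to csv first"
  fi
done

# Remove the carriage returns so each line ends with a plain newline.
tr -d '\r' < export.csv > export.unix.csv
```

The cleaned file can then be searched with grep in the same way as the samples file above.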
Mar 11 '14 #4
